CN113971737A - Object recognition method for robot, electronic device, medium, and program product

Object recognition method for robot, electronic device, medium, and program product

Info

Publication number
CN113971737A
Authority
CN
China
Prior art keywords
image
sample
object recognition
feature vector
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111222348.1A
Other languages
Chinese (zh)
Inventor
彭政睿
宫新一
魏本刚
徐湘忆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, State Grid Shanghai Electric Power Co Ltd filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202111222348.1A
Publication of CN113971737A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention provides an object recognition method for a robot, an electronic device, a medium, and a program product. The method includes acquiring an image of an object to be recognized that is collected by the robot, and inputting that image into an object recognition model to perform object class prediction, obtaining the object class output by the model, where the object recognition model is trained based on metric learning. Metric learning enlarges the similarity distance between object images of different object classes and reduces the similarity distance between object images of the same object class, so that object images of different classes are distinguished more clearly and object recognition efficiency is improved. On this basis, even when the number of samples is small and the number of sample classes is large, object images of different object classes remain clearly distinguished, so an object recognition model with high accuracy can be trained and the robot can recognize objects accurately.

Description

Object recognition method for robot, electronic device, medium, and program product
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to an object recognition method for a robot, an electronic device, a medium, and a program product.
Background
With the rapid development of artificial intelligence and computer vision technologies, robots have gradually entered many aspects of human life. For example, they are widely used in homes, office buildings, hotels, restaurants, factories, and warehouses.
At present, robots identify objects by building deep learning models. However, a robot captures only a few images of each target object, so the amount of sample data is small, while the number of target object types is large. As a result, an object recognition model either cannot be trained at all or, once trained, has low recognition accuracy and cannot meet the robot's object recognition requirements.
In summary, how to train an object recognition model with high accuracy when samples are few and sample types are many is a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an object recognition method for a robot, an electronic device, a medium, and a program product, which address the defect in the prior art that an object recognition model with high accuracy cannot be trained when the number of samples is small and the sample types are many, thereby achieving accurate object recognition.
The invention provides an object recognition method for a robot, comprising the following steps:
acquiring an image of an object to be identified, which is acquired by a robot;
and inputting the image of the object to be recognized into an object recognition model, and predicting the object class to obtain the object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training.
The object recognition method for the robot provided by the invention further comprises the following steps:
acquiring training sample data, wherein the training sample data comprises a sample object image obtained by shooting through a robot and an object class label corresponding to the sample object image;
training a model to be trained by adopting the training sample data and a preset loss function to obtain an object recognition model, wherein the preset loss function is obtained by aggregating a classification loss function and a metric loss function corresponding to metric learning.
According to the object recognition method for the robot provided by the invention, the metric loss function comprises a central loss function and a discriminant loss function;
the center loss function is used for reducing the similarity distance between the sample object images of the same category in the training sample data;
and the discriminant loss function is used for expanding the similarity distance between the sample object images of different classes in the training sample data.
According to the object recognition method for the robot provided by the invention, the center loss function is determined based on similarity distances between feature vectors of a batch of sample object images in the training sample data and a feature vector mean of the batch of sample object images.
According to the object recognition method for the robot provided by the invention, the discriminant loss function is determined based on the standard Euclidean distance between the feature vector of a target sample object image and the feature vector mean of the corresponding category of the target sample object image, and the standard Euclidean distance between the feature vector of a non-target sample object image and the feature vector mean of the corresponding category of the non-target sample object image;
the target sample object image is an object image in the training sample data that is currently input into the model to be trained, and the feature vector mean of its corresponding category is the mean of the feature vectors, within a batch of sample object images, of the sample object images bearing the same object category label as the target sample object image;
a non-target sample object image is any object image in the batch of sample object images other than the target sample object image, and the feature vector mean of its corresponding category is the mean of the feature vectors, within the batch of sample object images, of the sample object images bearing the same object category label as that non-target sample object image.
According to the object recognition method for the robot provided by the invention, after the training of the model to be trained is performed by adopting the training sample data and the preset loss function to obtain the object recognition model, the method further comprises the following steps:
inputting test sample data into the object recognition model, and performing feature extraction to obtain a first feature vector corresponding to a test object image in the test sample data, wherein the test object image is an object image obtained by shooting through a robot;
inputting support sample data to the object identification model, performing feature extraction, and obtaining a second feature vector corresponding to a support object image in the support sample data, wherein the support sample data comprises the support object image obtained by shooting through a robot and an object class label corresponding to the support object image;
and performing cosine similarity calculation on the first characteristic vector and the second characteristic vector to obtain a cosine similarity calculation result, wherein the cosine similarity calculation result is used for testing the object identification model.
According to the object recognition method for the robot provided by the invention, the object image to be recognized is input to an object recognition model, object type prediction is carried out, and the object type output by the object recognition model is obtained, the method comprises the following steps:
based on a feature extractor of the object recognition model, performing feature extraction on the object image to be recognized to obtain an image feature vector, wherein the feature extractor comprises a plurality of residual blocks, and one residual block comprises a convolution layer, a batch normalization layer and an activation function;
and performing category prediction on the image feature vector based on the full connection layer of the object recognition model to obtain the object category output by the object recognition model.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the object recognition method for a robot as described in any of the above when executing the program.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the object identification method for a robot as claimed in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the object recognition method for a robot as defined in any one of the above.
According to the object recognition method, electronic device, medium, and program product for a robot provided by the invention, the image of an object to be recognized collected by the robot is acquired and input into the object recognition model, which performs object class prediction and outputs the object class, thereby realizing the robot's object recognition function. The object recognition model is trained based on metric learning, which enlarges the similarity distance between object images of different object classes and reduces the similarity distance between object images of the same object class, so that object images of different classes are distinguished more clearly and object recognition efficiency is improved. On this basis, even when the number of samples is small and the number of sample classes is large, object images of different object classes remain clearly distinguished, so an object recognition model with high accuracy can be trained and the robot can recognize objects accurately.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an object recognition method for a robot according to the present invention;
fig. 2 is a second flowchart of the object recognition method for a robot according to the present invention;
fig. 3 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an object recognition method for a robot according to the present invention, and as shown in fig. 1, the object recognition method for a robot according to the present invention includes:
step 110, acquiring an image of an object to be identified, which is acquired by a robot;
in this embodiment, before the robot operates on the target object, the robot needs to identify the target object to determine whether the target object is an object that the robot needs to operate. For example, in a scene where the robot grasps the target object, the category of the target object should be determined first, so as to determine the corresponding grasping operation.
The image of the object to be recognized is acquired by a camera deployed on the robot, and of course, the image of the object to be recognized can also be acquired by combining with other sensor devices.
Step 120, inputting the image of the object to be recognized into an object recognition model, and performing object class prediction to obtain an object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training;
in this embodiment, the types of the object categories can be set according to actual needs, for example, cylindrical wood blocks, cubic wood blocks, spherical wood blocks, rectangular wood blocks, apples, cups, and the like.
The object recognition model is a machine learning model, specifically, the object recognition model is a deep learning network model based on metric learning, that is, the object recognition model is obtained based on metric learning and deep learning training.
In a particular embodiment, the object recognition model includes a feature extractor and a classifier. Specifically, based on a feature extractor in the trained object recognition model, image feature information in an object image to be recognized is extracted, and then classification prediction is performed on the image feature information according to the image feature information and a classifier in the object recognition model, so that a classification prediction result, namely an object class output by the object recognition model, is obtained.
Specifically, the classifier obtains a classification probability vector (i.e., a probability prediction for each class of the input image to be recognized) and then determines the object class corresponding to the maximum probability value in that vector. The classifier includes a fully connected layer and a softmax function.
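As a minimal PyTorch sketch of that classification step (the feature dimension of 512 and the class count of 6 are illustrative assumptions, not values given by the patent):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a 512-dim image feature vector and 6 object
# classes (e.g., cylindrical block, cubic block, ...); both are assumptions.
fc = nn.Linear(512, 6)  # fully connected layer of the classifier

def predict_class(image_feature: torch.Tensor) -> torch.Tensor:
    logits = fc(image_feature)             # (batch, num_classes)
    probs = torch.softmax(logits, dim=-1)  # classification probability vector
    return probs.argmax(dim=-1)            # class with the maximum probability

feature = torch.randn(1, 512)              # one extracted image feature vector
print(predict_class(feature))              # e.g., tensor([3])
```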
In some embodiments, the feature extractor is an encoder, the encoder includes a plurality of residual blocks and a global average pooling layer, a residual block includes convolutional layers and activation functions, the classifier includes fully-connected layers, and the step 120 includes:
based on the encoder of the object recognition model, performing feature extraction on the object image to be recognized to obtain an image feature vector; and performing category prediction on the image feature vector based on the full connection layer of the object recognition model to obtain the object category output by the object recognition model.
The specific implementation process of the fully-connected layer is to obtain a classification probability vector (i.e. a probability prediction of the input image to be recognized for each class) through a softmax function, and then determine an object class corresponding to the maximum classification probability value in the classification probability vector.
The number of residual blocks may be set according to actual needs, for example, 3, 4, and 5, which is not limited herein.
In an embodiment, the encoder includes a first residual block, a second residual block, a third residual block, and a fourth residual block, and the step of performing feature extraction on the object image to be recognized based on the object recognition model to obtain an image feature vector includes:
based on a first residual block of the object recognition model, performing feature extraction on the object image to be recognized to obtain a first feature vector; performing feature extraction on the first feature vector based on a second residual block of the object identification model to obtain a second feature vector; performing feature extraction on the second feature vector based on a third residual block of the object identification model to obtain a third feature vector; and performing feature extraction on the third feature vector based on a fourth residual block of the object recognition model to obtain an image feature vector.
Further, the encoder further includes a global average pooling layer, and after the step of extracting features of the third feature vector based on a fourth residual block of the object recognition model to obtain an image feature vector, the object recognition method for the robot further includes:
and performing category prediction on the image feature vector based on a global average pooling layer of the object identification model to obtain an object category output by the object identification model. It should be noted that, by replacing the original full-connection layer with the global average pooling layer, global average pooling can be performed on the whole picture of each feature map, so that each feature map can obtain one output.
In addition, global mean pooling is performed on the image feature vectors, so that network parameters can be greatly reduced, and overfitting is avoided. Specifically, by enhancing the consistency of the feature map and the category, the convolution structure is simpler; parameter optimization is not needed, so that overfitting can be avoided; the spatial information is summed, so that the input spatial transformation is more stable.
In an embodiment, the residual block includes a convolution layer and an activation function, and the step of performing feature extraction on the object image to be recognized based on the first residual block of the object recognition model to obtain a first feature vector includes:
performing a convolution operation on the object image to be identified based on the convolution layer in the first residual block to obtain a first convolution result; and performing nonlinear processing on the first convolution result using the activation function to obtain a first feature vector. The activation function may be set as actually needed, for example the ReLU activation function.
Further, the step of obtaining a first feature vector by performing feature extraction on the object image to be recognized based on the first residual block of the object recognition model includes:
performing a convolution operation on the object image to be identified based on the convolution layer in the first residual block to obtain a first convolution result; normalizing the first convolution result based on the batch normalization layer in the first residual block to obtain a first normalization result; and performing nonlinear processing on the first normalization result using the activation function to obtain a first feature vector.
It should be noted that the batch normalization (BN) layer speeds up network training and convergence while helping to avoid overfitting and gradient explosion; max-pooling downsampling reduces the number of network parameters; and the activation function introduces a nonlinear relationship.
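A minimal PyTorch sketch of such an encoder follows; the channel sizes, input resolution, and the identity-shortcut form of the residual connection are illustrative assumptions, since the text only specifies that each residual block contains a convolution layer, a batch normalization layer, and an activation function, followed by global average pooling:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU with an identity shortcut (channel sizes assumed)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection so the shortcut matches the output channel count
        self.shortcut = (nn.Conv2d(in_ch, out_ch, kernel_size=1)
                         if in_ch != out_ch else nn.Identity())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.conv(x)) + self.shortcut(x))

class Encoder(nn.Module):
    """Four residual blocks followed by a global average pooling layer."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(
            ResidualBlock(3, 64),     # first residual block
            ResidualBlock(64, 128),   # second residual block
            ResidualBlock(128, 256),  # third residual block
            ResidualBlock(256, 512),  # fourth residual block
        )
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gap(self.blocks(x)).flatten(1)  # (batch, 512) feature vector

encoder = Encoder()
print(encoder(torch.randn(2, 3, 84, 84)).shape)  # torch.Size([2, 512])
```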
In order to obtain the object recognition model through training, the method further comprises the following steps:
acquiring a sample object image, wherein the sample object image is an object image obtained by shooting through a robot; labeling the sample object image with object class labels to obtain object class label data; obtaining a model to be trained, and selecting training sample data from the sample object image and the object class label data; and performing iterative training on the model to be trained based on the training sample data and a preset loss function to obtain the object recognition model, wherein the preset loss function is obtained by aggregating a classification loss function and a metric loss function corresponding to metric learning.
Specifically, each object image representation value in the sample object image is extracted, and then a corresponding object type label is matched for the sample object image based on each object image representation value, so that object image label data is obtained; selecting a training sample from training sample data, inputting an object image and an object class label corresponding to the training sample into a model to be trained, executing model prediction to obtain a model output label, calculating model loss based on a preset loss function, and updating the model to be trained based on the model loss until the iteration number of the model to be trained reaches the preset iteration number or a corresponding loss function (target function) reaches a preset value.
It should be noted that the preset number of iterations may be set to 1500. In addition, through gradient descent, the optimal weight value which enables the target function to be minimum can be found, the weight value can be automatically learned through training, and then the model to be trained is updated.
In addition, it should be noted that training with the training sample data drives the preset loss function to be as small as possible; after each round of training, the model is evaluated on test sample data and its weights are saved, until the model converges and the final object recognition model is obtained.
Wherein the sample object image includes at least one object image. The training sample data includes at least one training sample, and the training sample includes one object image from the sample object image and one object class label from the object image label data.
Of course, the encoder in the model to be trained may be the convolutional neural network described above, for example, ResNet-18, or may be a cyclic neural network, a codec neural network, or the like.
According to the object recognition method for a robot provided by this embodiment, the image of the object to be recognized collected by the robot is acquired and input into the object recognition model, which performs object class prediction and outputs the object class, thereby realizing the robot's object recognition function. The object recognition model is trained based on metric learning, which enlarges the similarity distance between object images of different object classes and reduces the similarity distance between object images of the same object class, so that object images of different classes are distinguished more clearly and object recognition efficiency is improved. On this basis, even when the number of samples is small and the number of sample classes is large, object images of different object classes remain clearly distinguished, so an object recognition model with high accuracy can be trained and the robot can recognize objects accurately.
Further, based on the above-described first embodiment, a second embodiment of the object recognition method for a robot of the present invention is proposed. Fig. 2 is a second flowchart of the object recognition method for a robot according to the present invention, as shown in fig. 2, in this embodiment, the method further includes a training method of the object recognition model:
step 210, acquiring training sample data, wherein the training sample data comprises a sample object image obtained by shooting through a robot and an object class label corresponding to the sample object image;
in this embodiment, the training sample data includes the sample object image and the object class label corresponding thereto. The sample object image is an object image captured by a robot. The type of the object type label can be set according to actual conditions.
The training sample data may include N types of object images, each type including N object images; that is, the training sample data may include N × N object images. The training sample data may be few-shot data, i.e., training sample data with few samples but many sample types.
In an embodiment, the training sample data is obtained by: acquiring a sample object image, wherein the sample object image is an object image obtained by shooting through a robot; labeling labels aiming at object types on the sample object image to obtain object type label data; and selecting training sample data from the sample object image and the object class label data.
Specifically, each object image representation value in the sample object image is extracted, and then a corresponding object type label is matched for the sample object image based on each object image representation value, so that object image label data is obtained, and training sample data is selected from the sample object image and the object type label data.
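A sketch of how such a batch of labeled training samples might be assembled (the sampling scheme and data layout are illustrative assumptions, not specified by the patent):

```python
import random
from collections import defaultdict

def sample_training_batch(labeled_images, num_classes, images_per_class):
    """Assemble one training batch: `num_classes` object classes, each
    contributing `images_per_class` (image, object-class-label) pairs.
    `labeled_images` is assumed to be a list of (image, label) pairs."""
    by_class = defaultdict(list)
    for image, label in labeled_images:
        by_class[label].append(image)
    chosen_classes = random.sample(sorted(by_class), num_classes)
    batch = []
    for label in chosen_classes:
        for image in random.sample(by_class[label], images_per_class):
            batch.append((image, label))
    return batch  # num_classes * images_per_class labeled samples

# Example with toy string "images" standing in for robot-captured photos
data = [(f"img_{c}_{i}", c) for c in ["cup", "apple", "cube"] for i in range(5)]
print(len(sample_training_batch(data, num_classes=2, images_per_class=3)))  # 6
```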
And step 220, training the model to be trained by adopting the training sample data and a preset loss function to obtain an object recognition model, wherein the preset loss function is obtained by aggregating a classification loss function and a metric loss function corresponding to metric learning.
In this embodiment, the preset loss function is a loss function (objective function) of the model to be trained, and is used for performing optimization training on the model parameters in the model to be trained by using gradient descent.
Specifically, an object image and an object class label corresponding to a training sample in training sample data are input into a model to be trained, model prediction is performed to obtain a model output label, model loss is calculated based on a preset loss function, and then the model to be trained is updated based on the model loss, namely model parameters in the model to be trained are updated until the iteration number of the model to be trained reaches the preset iteration number or the corresponding preset loss function (target function) reaches a preset value. The preset iteration times and the preset value can be set according to actual needs.
The classification loss function is the loss function corresponding to deep learning, and is used to compute the loss between the prediction result and the true value for an input training sample, i.e., the difference between the model output label and the object class label corresponding to the training sample. The classification loss function may be a cross entropy loss function, a binary logarithmic loss function, or another type of loss function.
The cross entropy loss function $L_{ce}$ is:

$$L_{ce} = -\sum_{i} q_i \log p_i$$

where $q_i$ is the true value indicating whether the input object image belongs to the i-th object class ($q_i = 1$ if it does, otherwise $q_i = 0$), and $p_i$ is the predicted probability that the input object image belongs to the i-th object class.
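As a quick worked check of this formula (the tensors are toy values; for a one-hot $q$ the sum reduces to $-\log p_{target}$ and matches PyTorch's built-in cross entropy):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # one input image, three classes (toy values)
target = torch.tensor([0])                 # true class i = 0, so q_0 = 1, other q_i = 0

p = torch.softmax(logits, dim=-1)          # predicted probabilities p_i
manual = -torch.log(p[0, target[0]])       # -sum_i q_i * log p_i with one-hot q
builtin = F.cross_entropy(logits, target)  # PyTorch's built-in cross entropy
print(manual.item(), builtin.item())       # both print the same value
```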
The metric loss function is a loss function corresponding to metric learning, and is used for expanding the similarity distance between object images of different object classes and reducing the similarity distance between object images of the same object class.
In one embodiment, the metric loss function includes a central loss function and a discriminant loss function;
the center loss function is used for reducing the similarity distance between the sample object images of the same category in the training sample data;
and the discriminant loss function is used for expanding the similarity distance between the sample object images of different classes in the training sample data.
Specifically, the central loss function is configured to reduce the similarity distance between sample object images of the same category within a batch of sample object images in the training sample data, and the discriminant loss function is used to expand the similarity distance between sample object images of different categories within a batch of sample object images in the training sample data.
The similarity distance may be a euclidean distance, or may be other distances, such as a manhattan distance, a chebyshev distance, or the like.
In an embodiment, the center loss function is determined based on similarity distances between feature vectors of a batch of the sample object images in the training sample data and a feature vector mean of the batch of sample object images.
Here, the batch of sample object images is the batch currently input into the model to be trained.
Each feature vector of a batch of sample object images comprises a feature vector of each sample object image in a batch of sample object images, and the feature vector of each sample object image is a feature vector obtained by extracting features of the sample object image by using a model to be trained.
The feature vector mean of the batch of sample object images is the mean of the feature vectors of all sample object images in the batch, where the feature vector of each sample object image is obtained by the model to be trained performing feature extraction on that image.
In one embodiment, the center loss function $L_c$ is:

$$L_c = \frac{1}{B} \sum_{i=1}^{B} \left\| x_i - z_{y_i} \right\|^n$$

where the batch size $B$ is the number of sample object images in the batch, the feature vector $x_i$ is the feature map of the i-th sample object image in the batch, and the feature mean $z_{y_i}$ is the mean of the feature vectors of the batch's sample object images (the subscript $y_i$ denoting the class label of the i-th sample). In addition, the reduction degree $n$ is greater than 1 and represents the degree to which the similarity distance between sample object images of the same category in the training sample data is reduced.
Further, with $n = 2$, the center loss function $L_c$ becomes:

$$L_c = \frac{1}{B} \sum_{i=1}^{B} \left\| x_i - z_{y_i} \right\|^2$$

with $B$, $x_i$, and $z_{y_i}$ defined as above.
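A minimal PyTorch sketch of this center loss follows; treating $z_{y_i}$ as the in-batch per-class mean (suggested by the subscript $y_i$) is an assumption of this sketch, and the exponent $n$ is exposed as a parameter:

```python
import torch

def center_loss(features: torch.Tensor, labels: torch.Tensor, n: float = 2.0):
    """Center loss L_c: average over the batch (size B) of the distance,
    raised to the power n, between each feature vector x_i and the in-batch
    mean z_{y_i} of its class y_i (per-class means are an assumption here)."""
    loss = features.new_zeros(())
    for c in labels.unique():
        mask = labels == c
        z_c = features[mask].mean(dim=0)                 # class mean z_{y_i}
        loss = loss + ((features[mask] - z_c).norm(dim=1) ** n).sum()
    return loss / features.size(0)                       # divide by batch size B

feats = torch.randn(8, 64)                               # toy batch of features
labs = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(center_loss(feats, labs, n=2.0))
```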
In one embodiment, the discriminant loss function is determined based on a standard euclidean distance between a feature vector of a target sample object image and a feature vector mean of a corresponding category of the target sample object image, and a standard euclidean distance between a feature vector of a non-target sample object image and a feature vector mean of a corresponding category of the non-target sample object image;
the target sample object image is an object image of the model to be trained which is currently input into the training sample data, and the feature vector mean value of the corresponding category of the target sample object image is the feature vector mean value of a plurality of sample object images corresponding to the object category labels corresponding to the target sample object image in a batch of sample object images;
the non-target sample object image is an object image excluding the target sample image from the batch of sample object images, and the feature vector mean value of the corresponding category of the non-target sample object image is the feature vector mean value of a plurality of sample object images corresponding to the object category labels corresponding to the non-target sample object images in the batch of sample object images.
Here, the batch of sample object images is the batch currently input into the model to be trained.
The feature vector of the target sample object image is obtained by performing feature extraction on the target sample object image by using the model to be trained.
The corresponding category of the target sample object image is an object category label corresponding to the target sample object image, and the feature vector mean value corresponding to the category is the mean value of all feature vectors of a plurality of sample object images corresponding to the object category label in a batch of sample object images.
The non-target sample object image is an object image except the target sample object image in a batch of the sample object images. The feature vector of the non-target sample object image is obtained by performing feature extraction on the non-target sample object image by the model to be trained.
The corresponding category of the non-target sample object image is an object category label corresponding to the non-target sample object image, and the feature vector mean value corresponding to the category is the mean value of all feature vectors of a plurality of sample object images corresponding to the object category label in a batch of sample object images.
In one embodiment, the discriminant loss function $L_d$ is:

$$L_d = \frac{ED(o_k, z_k)}{\frac{1}{N} \sum_{k' \neq k} ED(o_{k'}, z_{k'})}$$

where $ED(\cdot,\cdot)$ denotes the standard Euclidean distance, the feature vector $o_k$ is that of a sample object image belonging to the K-th object class in the batch of sample object images, the feature mean $z_k$ is the mean of the feature vectors of all sample object images of the K-th object class in the batch, the feature vector $o_{k'}$ is that of a sample object image not belonging to the K-th object class, the feature mean $z_{k'}$ is the mean of the feature vectors of the sample object images of the corresponding class $k'$ in the batch, and $N$ is the number of object classes other than the K-th class.
In particular, the feature mean $z_k$ is:

$$z_k = \frac{1}{B} \sum_{x_i^k \in D_{train}} f_\theta\left(x_i^k\right)$$

where $B$ is the number of sample object images of the K-th object class in the batch, $x_i^k$ denotes the i-th sample object image of the K-th object class in the batch, $D_{train}$ denotes the batch of sample object images, and $f_\theta(x_i^k)$ is the feature vector obtained by the model to be trained performing feature extraction on $x_i^k$.
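Under that reconstruction, a sketch of the class means $z_k$ and the discriminant loss follows; the ratio form and the use of plain (rather than variance-standardized) Euclidean distance are assumptions of this sketch:

```python
import torch

def class_means(features: torch.Tensor, labels: torch.Tensor) -> dict:
    """z_k: in-batch mean feature vector of each object class k."""
    return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

def discriminant_loss(features, labels, target_class: int, eps: float = 1e-8):
    """L_d for target class K: mean distance of class-K features to z_K,
    divided by the mean distance of each other class's features to its own
    class mean. Plain Euclidean distance stands in here for the patent's
    'standard Euclidean distance'."""
    z = class_means(features, labels)
    num = (features[labels == target_class] - z[target_class]).norm(dim=1).mean()
    other = [
        (features[labels == c] - z[int(c)]).norm(dim=1).mean()
        for c in labels.unique() if int(c) != target_class
    ]
    return num / (torch.stack(other).mean() + eps)

feats = torch.randn(12, 64)  # 12 feature vectors of dimension 64 (toy sizes)
labs = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
print(discriminant_loss(feats, labs, target_class=0))
```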
The metric loss function is obtained by aggregating the central loss function and the discriminant loss function. In one embodiment, the aggregation is addition: denoting the central loss function by $L_c$ and the discriminant loss function by $L_d$, the metric loss function $L_{metric}$ is:

$$L_{metric} = L_c + L_d$$
In addition, it should be noted that the preset loss function is obtained by aggregating the classification loss function and the metric loss function corresponding to metric learning. In one embodiment, the aggregation is a weighting. For ease of understanding, assume the classification loss function is $L_{ce}$ and the metric loss function is $L_{metric}$; the preset loss function $L_{final}$ is then:

$$L_{final} = \alpha L_{ce} + L_{metric}$$

where $\alpha$ is a balance parameter that may be set according to actual requirements, for example, 0.5.
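Putting the pieces together, one training step under $L_{final} = \alpha L_{ce} + L_{metric}$ might look as follows; it assumes the `encoder`, `fc`, `center_loss`, and `discriminant_loss` sketches above, with $\alpha = 0.5$ as in the example:

```python
import torch
import torch.nn.functional as F

# Assumes `encoder`, `fc`, `center_loss`, and `discriminant_loss` from the
# sketches above; alpha = 0.5 follows the example balance parameter.
alpha = 0.5
params = list(encoder.parameters()) + list(fc.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    features = encoder(images)                    # image feature vectors
    l_ce = F.cross_entropy(fc(features), labels)  # classification loss L_ce
    l_c = center_loss(features, labels)           # pulls same-class features together
    l_d = torch.stack([                           # pushes classes apart, averaged over K
        discriminant_loss(features, labels, int(c)) for c in labels.unique()
    ]).mean()
    loss = alpha * l_ce + l_c + l_d               # L_final = alpha*L_ce + L_metric
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```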
In this embodiment, the loss function of the model to be trained is set as a function obtained by aggregating the classification loss function and the measurement loss function, so that in the process of training the model to be trained, the similarity distance between object images of different object categories is expanded, and the similarity distance between object images of the same object category is reduced, so that the object images of different object categories are more obviously distinguished, and then an object recognition model with high accuracy can be trained, thereby improving the accuracy and recognition efficiency of robot recognition.
Further, based on the above second embodiment, a third embodiment of the object recognition method for a robot of the present invention is proposed. In this embodiment, after the step 220, the method for identifying an object for a robot further includes:
step 230, inputting test sample data into the object recognition model, and performing feature extraction to obtain a first feature vector corresponding to a test object image in the test sample data, wherein the test object image is an object image obtained by shooting through a robot;
in the present embodiment, the test sample data includes a test object image. The test object image is an object image captured by the robot. The test sample data may be less sample data, i.e. the test sample data has less number of samples and more types of samples.
Specifically, based on a feature extractor in the trained object recognition model, image feature information in a test object image is extracted, and then a first feature vector of the test object image is obtained. The manner of extracting the features of the test sample data is basically the same as the manner of extracting the features of the image of the object to be recognized in the first embodiment, and details are not repeated here.
Step 240, inputting support sample data to the object identification model, performing feature extraction, and obtaining a second feature vector corresponding to a support object image in the support sample data, wherein the support sample data comprises the support object image obtained by shooting through a robot and an object type label corresponding to the support object image;
in this embodiment, the support sample data includes a support object image. The supported object image is an object image captured by the robot. The support sample data may be less sample data, i.e. support sample data with less number of samples and more types of samples.
Specifically, based on a feature extractor in the trained object recognition model, image feature information in the supporting object image is extracted, and then a second feature vector of the supporting object image is obtained. The manner of extracting the features of the support sample data is basically the same as the manner of extracting the features of the image of the object to be recognized in the first embodiment, and is not repeated here.
The support sample data may include M types of object images, each type including M object images; that is, the support sample data may include M × M object images. The number of object classes in the support sample data should be less than or equal to the number of object classes in the training sample data.
In one embodiment, the method for obtaining the support sample data is as follows: acquiring a supporting object image, wherein the supporting object image is an object image obtained by shooting through a robot; labeling labels aiming at object types on the supporting object image to obtain object type label data; and selecting support sample data from the support object image and the object class label data.
Specifically, each object image representation value in the support object image is extracted, and then a corresponding object type label is matched for the support object image based on each object image representation value, so that object image label data is obtained, and therefore support sample data is selected from the support object image and the object type label data.
In addition, it should be noted that the object images in the test sample data and the support sample data do not overlap with those in the training sample data. The test sample data and the support sample data are used to test the trained object recognition model.
And 250, performing cosine similarity calculation on the first feature vector and the second feature vector to obtain a cosine similarity calculation result, wherein the cosine similarity calculation result is used for testing the object identification model.
Specifically, the cosine similarity is calculated as:

$$\cos(x_s, x_q) = \frac{x_s \cdot x_q}{\left\| x_s \right\| \left\| x_q \right\|}$$

where $x_s$ is the first feature vector corresponding to a test object image in the test sample data and $x_q$ is the second feature vector corresponding to a support object image in the support sample data.
In this embodiment, by calculating the cosine similarity between the feature vector of each test object image in the test sample data and the feature vector of each labeled support object image in the support sample data, recognition predictions for the test object images can be made, thereby testing the trained object recognition model and further improving its object recognition accuracy.
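A sketch of this test step, assigning each test image the label of its most cosine-similar support image (the nearest-neighbor assignment over the support set is an assumption of this sketch):

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(test_feats: torch.Tensor,
                             support_feats: torch.Tensor) -> torch.Tensor:
    """cos(x_s, x_q) = x_s . x_q / (||x_s|| * ||x_q||) for every pair."""
    t = F.normalize(test_feats, dim=1)       # (T, D) unit-length test features
    s = F.normalize(support_feats, dim=1)    # (S, D) unit-length support features
    return t @ s.T                           # (T, S) cosine similarities

def predict_by_support(test_feats, support_feats, support_labels):
    sims = cosine_similarity_matrix(test_feats, support_feats)
    nearest = sims.argmax(dim=1)             # most similar labeled support image
    return support_labels[nearest]           # take its object class label

test = torch.randn(5, 64)                    # first feature vectors (toy sizes)
support = torch.randn(20, 64)                # second feature vectors
support_labels = torch.randint(0, 4, (20,))
print(predict_by_support(test, support, support_labels))
```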
Further, based on the first embodiment described above, a fourth embodiment of the object recognition method for a robot of the present invention is proposed. In this embodiment, the step 120 includes:
step 121, extracting features of the object image to be recognized based on a feature extractor of the object recognition model to obtain an image feature vector, wherein the feature extractor comprises a plurality of residual blocks, and one residual block comprises a convolutional layer, a batch normalization layer and an activation function;
the number of the residual blocks can be set according to actual needs, for example, 3, 4, and 5, which is not limited herein.
The convolution layer is used to perform the convolution operation. The batch normalization (BN) layer is used for normalization; it speeds up network training and convergence while helping to avoid overfitting and gradient explosion. The activation function is used to introduce a nonlinear relationship and may be set as actually needed, for example the ReLU activation function.
In one embodiment, the feature extractor comprises four residual blocks, namely a first residual block, a second residual block, a third residual block, and a fourth residual block, and the step 121 comprises:
based on a first residual block of the object recognition model, performing feature extraction on the object image to be recognized to obtain a first intermediate feature vector; performing feature extraction on the first intermediate feature vector based on a second residual block of the object recognition model to obtain a second intermediate feature vector; performing feature extraction on the second intermediate feature vector based on a third residual block of the object recognition model to obtain a third intermediate feature vector; and performing feature extraction on the third intermediate feature vector based on a fourth residual block of the object recognition model to obtain an image feature vector.
In a specific embodiment, the step of extracting features of the to-be-recognized object image based on the first residual block of the object recognition model to obtain a first intermediate feature vector includes:
performing a convolution operation on the object image to be identified based on the convolution layer in the first residual block to obtain a first convolution result; normalizing the first convolution result based on the batch normalization layer in the first residual block to obtain a first normalization result; and performing nonlinear processing on the first normalization result using the activation function to obtain a first intermediate feature vector.
In addition, the execution flow of the second, third and fourth residual blocks is basically the same as that of the first residual block, and is not repeated here.
And step 122, performing category prediction on the image feature vector based on the full connection layer of the object recognition model to obtain an object category output by the object recognition model.
Specifically, the specific implementation procedure of the fully-connected layer is to obtain a classification probability vector (i.e., a probability prediction of each class of the input image to be recognized) through a softmax function, and then determine an object class corresponding to the maximum classification probability value in the classification probability vector.
In this embodiment, the object recognition model includes a plurality of residual blocks, and each residual block includes the batch normalization layer, thereby accelerating the speed of network training and convergence through the batch normalization layer, avoiding overfitting and gradient explosion, and further improving the object recognition accuracy and the object recognition efficiency of the object recognition model.
The invention also provides an object recognition device for the robot, which is deployed on the robot. The function implementation of each module in the object recognition apparatus for a robot corresponds to each step in the embodiment of the object recognition method for a robot, and the function and implementation process are not described in detail here.
Fig. 3 illustrates a physical structure diagram of an electronic device, which, as shown in fig. 3, may include: a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform an object recognition method for a robot, the method comprising: acquiring an image of an object to be identified, which is acquired by a robot; and inputting the image of the object to be recognized into an object recognition model, and predicting the object class to obtain the object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the object identification method for a robot provided by the above methods, the method comprising: acquiring an image of an object to be identified, which is acquired by a robot; and inputting the image of the object to be recognized into an object recognition model, and predicting the object class to obtain the object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an object recognition method for a robot provided to perform the above methods, the method including: acquiring an image of an object to be identified, which is acquired by a robot; and inputting the image of the object to be recognized into an object recognition model, and predicting the object class to obtain the object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of software products, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a robot, a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An object recognition method for a robot, comprising:
acquiring an image of an object to be identified, which is acquired by a robot;
and inputting the image of the object to be recognized into an object recognition model, and predicting the object class to obtain the object class output by the object recognition model, wherein the object recognition model is obtained based on metric learning training.
2. The object recognition method for a robot according to claim 1, further comprising a training method of the object recognition model:
acquiring training sample data, wherein the training sample data comprises a sample object image obtained by shooting through a robot and an object class label corresponding to the sample object image;
training a model to be trained by adopting the training sample data and a preset loss function to obtain an object recognition model, wherein the preset loss function is obtained by aggregating a classification loss function and a metric loss function corresponding to metric learning.
3. The object recognition method for a robot according to claim 2, wherein the metric loss function includes a center loss function and a discriminant loss function;
the center loss function is used for reducing the similarity distance between the sample object images of the same category in the training sample data;
and the discriminant loss function is used for expanding the similarity distance between the sample object images of different classes in the training sample data.
4. The object recognition method for a robot according to claim 3, wherein the center loss function is determined based on a similarity distance between each feature vector of a batch of the sample object images in the training sample data and a feature vector mean of the batch of the sample object images.
5. The object recognition method for a robot according to claim 3, wherein the discriminant loss function is determined based on a standard Euclidean distance between a feature vector of a target sample object image and a feature vector mean of a corresponding category of the target sample object image, and a standard Euclidean distance between a feature vector of a non-target sample object image and a feature vector mean of a corresponding category of the non-target sample object image;
the target sample object image is an object image in the training sample data that is currently input into the model to be trained, and the feature vector mean of its corresponding category is the mean of the feature vectors, within a batch of sample object images, of the sample object images bearing the same object category label as the target sample object image;
and a non-target sample object image is any object image in the batch of sample object images other than the target sample object image, and the feature vector mean of its corresponding category is the mean of the feature vectors, within the batch of sample object images, of the sample object images bearing the same object category label as that non-target sample object image.
6. The object recognition method for a robot according to claim 2, wherein after training the model to be trained with the training sample data and the preset loss function to obtain the object recognition model, the method further comprises:
inputting test sample data into the object recognition model and performing feature extraction to obtain first feature vectors corresponding to the test object images in the test sample data, wherein the test object images are object images captured by a robot;
inputting support sample data into the object recognition model and performing feature extraction to obtain second feature vectors corresponding to the support object images in the support sample data, wherein the support sample data comprises support object images captured by a robot and object class labels corresponding to the support object images;
and computing the cosine similarity between the first feature vectors and the second feature vectors to obtain a cosine similarity result, wherein the cosine similarity result is used to test the object recognition model (see the sketch after the claims).
7. The object recognition method for a robot according to any one of claims 1 to 6, wherein inputting the image of the object to be recognized into the object recognition model, performing object class prediction, and obtaining the object class output by the object recognition model comprises:
performing feature extraction on the image of the object to be recognized with the feature extractor of the object recognition model to obtain an image feature vector, wherein the feature extractor comprises a plurality of residual blocks, each residual block comprising a convolution layer, a batch normalization layer, and an activation function (a model sketch follows the claims);
and performing class prediction on the image feature vector with the fully connected layer of the object recognition model to obtain the object class output by the object recognition model.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the object recognition method for a robot according to any one of claims 1 to 7.
9. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the object recognition method for a robot according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the object recognition method for a robot according to any one of claims 1 to 7.
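
The aggregated loss of claims 2 to 5 can be illustrated with a short sketch. The claims fix the ingredients (a classification loss, a center loss over in-batch class means, and a discriminant loss over standardized Euclidean distances) but not how they are weighted or combined, so the PyTorch-style code below is an assumed formulation: the weights lam_c and lam_d, the margin form of the discriminant term, and all function names are illustrative, not taken from the patent.

    import torch
    import torch.nn.functional as F

    def class_means(features, labels):
        # In-batch feature vector mean per object class (claims 4 and 5).
        return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

    def center_loss(features, labels):
        # Shrink within-class similarity distances by pulling each feature
        # vector toward the in-batch mean of its own class (claims 3 and 4).
        means = class_means(features, labels)
        centers = torch.stack([means[int(c)] for c in labels])
        return ((features - centers) ** 2).sum(dim=1).mean()

    def discriminant_loss(features, labels, margin=1.0):
        # Enlarge between-class distances: penalize feature vectors that lie
        # within `margin` of another class's in-batch mean (claims 3 and 5).
        # Distances are scaled by the per-dimension batch std as one reading
        # of the "standardized Euclidean distance" in claim 5.
        std = features.std(dim=0) + 1e-6
        means = class_means(features, labels)
        penalties = []
        for i in range(features.size(0)):
            for c, mean in means.items():
                if c != int(labels[i]):
                    d = torch.norm((features[i] - mean) / std)
                    penalties.append(F.relu(margin - d))
        return torch.stack(penalties).mean() if penalties else features.new_zeros(())

    def preset_loss(logits, features, labels, lam_c=0.1, lam_d=0.1):
        # Aggregate the classification loss with the two metric losses
        # (claim 2); lam_c and lam_d are illustrative weights.
        return (F.cross_entropy(logits, labels)
                + lam_c * center_loss(features, labels)
                + lam_d * discriminant_loss(features, labels))

A margin was chosen for the discriminant term because pushing distances apart without a bound would let the loss grow without limit; any bounded between-class penalty over the same distances would fit the claim equally well.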
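
Claim 6 tests the trained model by comparing first feature vectors from unlabeled test images against second feature vectors from a labeled support set using cosine similarity. A minimal sketch follows, assuming the trained model exposes a feature-extraction call; extract_features is an assumed name, not from the patent.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def cosine_similarity_test(model, test_images, support_images, support_labels):
        # First feature vectors from the test object images and second
        # feature vectors from the labeled support object images (claim 6).
        f_test = F.normalize(model.extract_features(test_images), dim=1)
        f_sup = F.normalize(model.extract_features(support_images), dim=1)
        sims = f_test @ f_sup.t()        # cosine-similarity matrix
        nearest = sims.argmax(dim=1)     # most similar support sample per test image
        return support_labels[nearest], sims

Assigning each test image the label of its most similar support image and comparing the assignments with ground truth yields a test accuracy for the object recognition model.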
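
Claim 7 describes the model structure: a feature extractor built from residual blocks, each containing a convolution layer, a batch normalization layer, and an activation function, followed by a fully connected layer for class prediction. The sketch below is one plausible realization under assumed hyperparameters (channel width, block count, pooling), which the claims leave open; extract_features matches the assumed call in the test sketch above.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # One residual block per claim 7: convolution, batch normalization,
        # and an activation function, with an identity skip connection.
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn = nn.BatchNorm2d(channels)
            self.act = nn.ReLU()

        def forward(self, x):
            return self.act(x + self.bn(self.conv(x)))

    class ObjectRecognitionModel(nn.Module):
        # A feature extractor of stacked residual blocks followed by a fully
        # connected layer that outputs class scores (claims 1 and 7).
        def __init__(self, num_classes, channels=64, num_blocks=4):
            super().__init__()
            self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
            self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Linear(channels, num_classes)

        def extract_features(self, images):
            x = self.blocks(self.stem(images))
            return self.pool(x).flatten(1)   # image feature vector

        def forward(self, images):
            return self.fc(self.extract_features(images))  # object class scores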
CN202111222348.1A 2021-10-20 2021-10-20 Object recognition method for robot, electronic device, medium, and program product Pending CN113971737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111222348.1A CN113971737A (en) 2021-10-20 2021-10-20 Object recognition method for robot, electronic device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111222348.1A CN113971737A (en) 2021-10-20 2021-10-20 Object recognition method for robot, electronic device, medium, and program product

Publications (1)

Publication Number Publication Date
CN113971737A true CN113971737A (en) 2022-01-25

Family

ID=79587928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111222348.1A Pending CN113971737A (en) 2021-10-20 2021-10-20 Object recognition method for robot, electronic device, medium, and program product

Country Status (1)

Country Link
CN (1) CN113971737A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879004A (en) * 2022-12-21 2023-03-31 北京百度网讯科技有限公司 Target model training method, apparatus, electronic device, medium, and program product


Similar Documents

Publication Publication Date Title
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN112328715B (en) Visual positioning method, training method of related model, related device and equipment
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN110716792B (en) Target detector and construction method and application thereof
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
CN113902010A (en) Training method of classification model, image classification method, device, equipment and medium
CN113128518B (en) Sift mismatch detection method based on twin convolution network and feature mixing
CN113971737A (en) Object recognition method for robot, electronic device, medium, and program product
CN113987236A (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN112200862B (en) Training method of target detection model, target detection method and device
CN116151319A (en) Method and device for searching neural network integration model and electronic equipment
CN109934352B (en) Automatic evolution method of intelligent model
CN111611917A (en) Model training method, feature point detection device, feature point detection equipment and storage medium
CN114119970B (en) Target tracking method and device
CN110738194A (en) three-dimensional object identification method based on point cloud ordered coding
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN115861595A (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN115620083A (en) Model training method, face image quality evaluation method, device and medium
CN113627522A (en) Image classification method, device and equipment based on relational network and storage medium
CN111783711B (en) Skeleton behavior identification method and device based on body component layer
CN116341396B (en) Complex equipment digital twin modeling method based on multi-source data fusion
CN117372787B (en) Image multi-category identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination