CN112733067B - Data set selection method for robot target detection algorithm

Info

Publication number: CN112733067B
Application number: CN202011542396.4A
Authority: CN
Inventors: 沈文婷; 陆林东; 郑军奇
Original assignee: Shanghai Robot Industrial Technology Research Institute Co Ltd
Current assignee: Shanghai Robot Industrial Technology Research Institute Co Ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2023-05-09
Anticipated expiration: 2040-12-22
Also published as: CN112733067A

Abstract

To address the problem of how to select data sets for a robot target detection algorithm model across various application scenarios, the invention provides a data set selection method for a robot target detection algorithm. Using a machine learning approach, the invention automatically selects data sets for training and testing a model according to the differing requirements of the algorithm model; it can effectively replace selection by manual experience while improving the robustness and generalization performance of the algorithm model. In the provided method, the metadata features of the data sets that affect the model's detection performance are extracted according to the test conclusions, and the row vectors are re-encoded, so that the similarity value of the next matching can be effectively reduced. Through continuous iterative update learning, the row vector encoding of the data sets in the first step is improved, a more suitable data set can be provided for the next new robot target recognition algorithm model, and the robustness and generalization performance of the algorithm model are improved.

Description

Data set selection method for robot target detection algorithm

Technical Field

The invention relates to the field of machine learning, in particular to a data set selection method for a robot target detection algorithm.

Background

With the rapid development of deep learning in artificial intelligence, computer-vision-based target detection techniques have been applied in a wide range of scenarios. In robotics in particular, target detection technologies for industrial robots, service robots, unmanned aerial vehicles, security monitoring, and similar scenarios have steadily developed and matured. Different application scenarios typically call for corresponding data sets for training and testing a model. How to select a suitable data set so that the algorithm model achieves optimal performance has therefore become a research focus for algorithm model developers in recent years.

Existing research on data set recommendation for robot target detection algorithm models generally relies on manual experience, and such subjective selection often requires extensive model parameter tuning.

Disclosure of Invention

The purpose of the invention is to provide a machine-learning-based data set selection method for a robot target detection algorithm model.

To achieve the above purpose, the invention provides a data set selection method for a robot target detection algorithm, characterized by comprising the following steps:

Step 1: performing row vector encoding on the metadata features of each existing data set, comprising:

Step 101: each metadata feature contains a different number of feature elements; letting the total number of feature elements across all metadata features be n, a 1×n matrix is constructed in which each element corresponds to one feature element of one metadata feature;

Step 102: setting all elements of the 1×n matrix obtained in step 101 to 0, thereby obtaining, for each data set, a row vector of n zeros;

Step 103: obtaining the row vector A corresponding to each data set, as follows: if the current data set contains a given feature element of a given metadata feature, the value of the corresponding element in the row vector obtained in step 102 is changed from 0 to 1, so that the current data set corresponds to a 1×n matrix containing several elements with value 1; this 1×n matrix is defined as the row vector A;

Step 2: determining the target detection object required by the robot target detection algorithm model, and performing row vector encoding on the metadata features of the target detection object using the same method as step 1, thereby establishing the row vector B corresponding to the metadata features of the target detection object;

Step 3: based on the row vector A obtained in step 1 and the row vector B obtained in step 2, calculating the similarity (the first similarity) between the metadata features of the target detection object in the robot field and the metadata features of each existing data set, wherein the higher the similarity, the better the current data set matches the target detection object, and the lower the similarity, the worse the match;

Step 4: taking the data set with the highest similarity in step 3 as the reference data set recommended for the target detection object in the robot field, and calculating, from the row vectors corresponding to the data sets, the value of a second similarity between each remaining data set and the reference data set;

Step 5: given a first similarity threshold and a second similarity threshold, taking all data sets whose first similarity value is higher than the first similarity threshold and whose second similarity value is higher than the second similarity threshold, together with the reference data set determined in step 4, as the recommended data sets for the target detection object.

Preferably, the cosine similarity between the row vector A of the current data set and the row vector B corresponding to the target detection object is calculated using a cosine similarity formula, and the result is taken as the value of the first similarity between the target detection object and the current data set; the larger the cosine similarity value, the better the target detection object matches the current data set.

Preferably, the cosine similarity is calculated as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where cos(A, B) denotes the cosine similarity of row vector A and row vector B, ‖A‖ denotes the modulus of row vector A, and ‖B‖ denotes the modulus of row vector B.
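As a small worked illustration of this formula (the two vectors below are made-up values chosen for the example, not encodings given in the patent), take n = 4 with A = (1, 0, 1, 1) and B = (1, 1, 1, 0):

```latex
\begin{aligned}
A \cdot B &= 1\cdot 1 + 0\cdot 1 + 1\cdot 1 + 1\cdot 0 = 2,\\
\|A\| &= \sqrt{1^2 + 0^2 + 1^2 + 1^2} = \sqrt{3}, \qquad
\|B\| = \sqrt{1^2 + 1^2 + 1^2 + 0^2} = \sqrt{3},\\
\cos(A, B) &= \frac{2}{\sqrt{3}\cdot\sqrt{3}} = \frac{2}{3} \approx 0.667 .
\end{aligned}
```

Because the encodings contain only 0s and 1s, the dot product simply counts the feature elements shared by the two encodings, and each squared modulus counts the elements set in one encoding.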

Preferably, in step 3, the calculated similarity values of all data sets are sorted from high to low, thereby ordering the corresponding data sets.

Preferably, in step 4, all second similarity values are sorted from high to low.

Preferably, after step 5, the method further comprises:

Step 6: after training and testing with the recommended data sets for the robot field obtained in step 5, extracting, according to the test conclusions, the metadata features of the data sets that affect the model's detection performance, and re-encoding the row vectors, so that the similarity value of the next matching can be effectively reduced; through continuous iterative update learning, the row vector encoding of the data sets in step 1 is improved, a more suitable data set is provided for the next new robot target recognition algorithm model, and the robustness and generalization performance of the algorithm model are improved.

To address the problem of how to select data sets for a robot target detection algorithm model across various application scenarios, the invention provides a method for recommending suitable data sets to developers of robot target detection algorithm models. Using a machine learning approach, the invention automatically selects data sets for training and testing a model according to the differing requirements of the algorithm model; it can effectively replace selection by manual experience while improving the robustness and generalization performance of the algorithm model. The provided method analyzes the metadata features of the data sets that affect the model's detection performance and re-encodes the row vectors, so that the similarity value of the next matching can be effectively reduced. Through continuous iterative updating, a more suitable data set can be provided for the next robot target recognition algorithm model, and the robustness and generalization performance of the algorithm model are improved.

Drawings

FIG. 1 is a schematic overall flow chart of a data set selection method for a robot target detection algorithm model provided by the invention;

FIG. 2 is a schematic diagram of the row vector encoding of metadata features provided by the invention.

Detailed Description

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.

As shown in FIG. 1, the data set selection method for the robot target recognition algorithm provided by the invention comprises the following steps:

Step one: perform row vector encoding on the metadata features of each existing data set.

The metadata features include the application scene, the target detection object category, the target detection object size, the illumination brightness, and the like. The target detection object of the algorithm model developer is matched for similarity against each metadata feature in turn, specifically as follows:

Each metadata feature contains a different number of feature elements. For example, the robot application scene is one metadata feature, and it contains several feature elements corresponding to different scenes such as home, market, and park. Letting the total number of feature elements across all metadata features be n, a 1×n matrix is constructed. At initialization, all elements of the 1×n matrix are set to 0, so that a row vector of n zeros is obtained.

Among the n feature elements of the 1×n matrix, the first metadata feature is the robot application scene: the 1st to n₁-th feature elements belong to the first metadata feature and correspond to different scenes such as home, market, and park. The (n₁+1)-th to n₂-th feature elements belong to the second metadata feature, which is the target detection object category. The (n₂+1)-th to n-th feature elements belong to the third metadata feature, which is the target detection object size. Encoding is carried out in the initialized 1×n matrix according to this correspondence. During encoding, if the current data set contains a given feature element of a given metadata feature, the value of the corresponding element in the 1×n matrix is changed from 0 to 1, so that each data set corresponds to a 1×n matrix containing several elements with value 1; this 1×n matrix is defined as the row vector A. Each data set can thus be represented as a different row vector A.
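The encoding described above amounts to a multi-hot vector over a fixed column layout. The following Python sketch illustrates it under stated assumptions: the feature names, the feature elements, and the helper `encode` are hypothetical choices made for the example and are not listed in the patent.

```python
from typing import Dict, List, Set

# Hypothetical metadata vocabulary (illustrative assumption, not from the patent).
METADATA_FEATURES: Dict[str, List[str]] = {
    "application_scene": ["home", "market", "park"],
    "object_category": ["person", "vehicle", "furniture"],
    "object_size": ["small", "medium", "large"],
}

# Fixed column layout: one column per feature element, n columns in total.
# Columns of the same metadata feature are contiguous, as in the description.
COLUMNS = [
    (feature, element)
    for feature, elements in METADATA_FEATURES.items()
    for element in elements
]


def encode(metadata: Dict[str, Set[str]]) -> List[int]:
    """Return a 1 x n row vector of 0s and 1s for one set of metadata."""
    row = [0] * len(COLUMNS)  # initialise all n elements to 0
    for index, (feature, element) in enumerate(COLUMNS):
        if element in metadata.get(feature, set()):
            row[index] = 1  # feature element present: change 0 to 1
    return row


# Hypothetical data set description.
dataset_meta = {
    "application_scene": {"home", "market"},
    "object_category": {"person"},
    "object_size": {"medium", "large"},
}
A = encode(dataset_meta)
print(A)  # [1, 1, 0, 1, 0, 0, 0, 1, 1]
```

Because the column layout is fixed once, every data set is encoded against the same n positions, which is what makes the row vectors comparable in the later similarity steps.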

Step two: determine the target detection object required by the robot target detection algorithm model, and perform row vector encoding on its metadata features using the same method as step one. As in step one, if the target detection object contains a given feature element of a given metadata feature, the value of the corresponding element in the 1×n matrix is changed from 0 to 1, thereby establishing the row vector B corresponding to the metadata features of the target detection object.

Step three: calculate the similarity between the metadata features of the target detection object in the robot field and the metadata features of each existing data set. The higher the similarity, the better the current data set matches the target detection object; the lower the similarity, the worse the match.

Using the cosine similarity formula, the cosine similarity between the row vector A of the current data set and the row vector B corresponding to the target detection object is calculated, and the result is taken as the value of the first similarity between the target detection object and the current data set. The larger the cosine similarity value, the better the target detection object matches the current data set. The cosine similarity is calculated as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where cos(A, B) denotes the cosine similarity of row vector A and row vector B, ‖A‖ denotes the modulus of row vector A, and ‖B‖ denotes the modulus of row vector B.
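A minimal Python sketch of this calculation follows. The function name `cosine_similarity`, the example vectors, and the choice to return 0.0 for an all-zero encoding are assumptions made for illustration; the patent does not say how an all-zero vector should be handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(A, B) = (A . B) / (||A|| * ||B||) for two equal-length row vectors."""
    if len(a) != len(b):
        raise ValueError("row vectors must have the same length n")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an encoding with no feature elements set matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


A = [1, 1, 0, 1, 0]  # hypothetical data set encoding
B = [1, 0, 0, 1, 1]  # hypothetical target detection object encoding
print(round(cosine_similarity(A, B), 3))  # 0.667
```

For 0/1 encodings the result always lies between 0 (no shared feature elements) and 1 (identical encodings).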

The calculated similarity values of all data sets are sorted from high to low, thereby ordering the corresponding data sets.
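The ranking in step three could be sketched as below. The data set names, their encodings, and the helper names `cos_sim` and `rank_by_first_similarity` are hypothetical and chosen only for the example.

```python
import math
from typing import Dict, List, Tuple


def cos_sim(a: List[int], b: List[int]) -> float:
    """Cosine similarity of two equal-length row vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_first_similarity(
    datasets: Dict[str, List[int]], target: List[int]
) -> List[Tuple[str, float]]:
    """Score every data set against the target and sort from high to low."""
    scored = [(name, cos_sim(row, target)) for name, row in datasets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical encodings with n = 4.
datasets = {
    "ds_home": [1, 0, 1, 0],
    "ds_park": [0, 1, 1, 1],
    "ds_market": [1, 0, 0, 0],
}
target = [1, 0, 1, 0]

ranking = rank_by_first_similarity(datasets, target)
print([name for name, _ in ranking])  # ['ds_home', 'ds_market', 'ds_park']
```

The first entry of the ranking is the candidate that the next step uses as the reference data set.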

Step four: take the data set ranked first in step three as the reference data set recommended for the target detection object in the robot field, and calculate the similarity between each remaining data set and the reference data set. As in steps one, two, and three, the second similarity values are calculated from the row vectors corresponding to the data sets, and all second similarity values are sorted from high to low.
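Step four can be sketched as follows, assuming the first similarity values are already available as a dictionary. The names `second_similarity`, `first_similarity`, and the data set encodings are hypothetical illustrations rather than anything specified in the patent.

```python
import math
from typing import Dict, List, Tuple


def cos_sim(a: List[int], b: List[int]) -> float:
    """Cosine similarity of two equal-length row vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def second_similarity(
    datasets: Dict[str, List[int]], first_similarity: Dict[str, float]
) -> Tuple[str, List[Tuple[str, float]]]:
    """Pick the reference data set and score every other data set against it."""
    # Reference data set: the one with the highest first similarity.
    reference = max(first_similarity, key=first_similarity.get)
    reference_row = datasets[reference]
    others = [
        (name, cos_sim(row, reference_row))
        for name, row in datasets.items()
        if name != reference
    ]
    others.sort(key=lambda item: item[1], reverse=True)  # high to low
    return reference, others


# Hypothetical encodings and first similarity values.
datasets = {
    "ds_home": [1, 0, 1, 0],
    "ds_park": [0, 1, 1, 1],
    "ds_market": [1, 0, 0, 0],
}
first_similarity = {"ds_home": 1.0, "ds_park": 0.408, "ds_market": 0.707}

reference, second = second_similarity(datasets, first_similarity)
print(reference)                       # ds_home
print([name for name, _ in second])    # ['ds_market', 'ds_park']
```

The second similarity compares data sets with each other rather than with the target, so it measures how close each remaining data set is to the best match.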

Step five: given a first similarity threshold and a second similarity threshold, take all data sets whose first similarity value is higher than the first similarity threshold and whose second similarity value is higher than the second similarity threshold, together with the reference data set determined in step four, as the recommended data sets for the target detection object.
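The two-threshold filter in step five might look like the sketch below. The threshold values 0.6 and 0.5, the similarity values, and the function name `recommend` are assumptions for illustration; the patent does not state concrete thresholds.

```python
from typing import Dict, List


def recommend(
    reference: str,
    first_similarity: Dict[str, float],
    second_similarity: Dict[str, float],
    first_threshold: float,
    second_threshold: float,
) -> List[str]:
    """Return the reference data set plus every data set passing both thresholds."""
    selected = [reference]  # the reference data set is always recommended
    for name, second_value in second_similarity.items():
        passes_first = first_similarity[name] > first_threshold
        passes_second = second_value > second_threshold
        if passes_first and passes_second:
            selected.append(name)
    return selected


# Hypothetical similarity values.
first_similarity = {"ds_home": 0.95, "ds_market": 0.71, "ds_park": 0.41, "ds_street": 0.80}
second_similarity = {"ds_market": 0.70, "ds_park": 0.45, "ds_street": 0.30}

result = recommend("ds_home", first_similarity, second_similarity, 0.6, 0.5)
print(result)  # ['ds_home', 'ds_market']
```

Here `ds_street` is close to the target but far from the reference data set, so it fails the second threshold, while `ds_park` already fails the first.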

Step six: after training and testing with the recommended data sets for the robot field obtained in step five, extract, according to the test conclusions, the metadata features of the data sets that affect the model's detection performance, and re-encode the row vectors, which can effectively reduce the similarity value of the next matching. Through continuous iterative update learning, the row vector encoding of the data sets in step one is improved, a more suitable data set can be provided for the next new robot target recognition algorithm model, and the robustness and generalization performance of the algorithm model are improved.
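The description does not specify how the row vectors are re-encoded. Purely as one hypothetical interpretation, the sketch below clears the elements of a data set's row vector that testing flagged as harming detection, and shows that this lowers the data set's similarity to the same target in the next matching. The function `recode`, the flagged column index, and the vectors are all assumptions, not the patent's procedure.

```python
import math
from typing import Iterable, List


def cos_sim(a: List[int], b: List[int]) -> float:
    """Cosine similarity of two equal-length row vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recode(row: List[int], flagged_columns: Iterable[int]) -> List[int]:
    """Return a copy of `row` with the flagged feature elements cleared to 0.

    Hypothetical rule: a column is flagged when the test conclusion indicates
    that the corresponding feature element degraded detection performance.
    """
    updated = list(row)  # leave the stored encoding untouched
    for index in flagged_columns:
        updated[index] = 0
    return updated


target = [1, 1, 0, 1]        # hypothetical target encoding
dataset_row = [1, 1, 0, 1]   # hypothetical data set encoding before feedback

before = cos_sim(dataset_row, target)
after = cos_sim(recode(dataset_row, [1]), target)
print(after < before)  # True
```

Other update rules (for example, adding newly discovered feature elements as extra columns) would be equally compatible with the text; this sketch only shows that a re-encoding can lower the next similarity value.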

Claims

1. The data set selection method for the robot target detection algorithm is characterized by comprising the following steps of:

Step 103: obtaining the row vector A corresponding to each data set, as follows:

Step 3: based on the row vector A obtained in step 1 and the row vector B obtained in step 2

Step 5: given a first similarity threshold and a second similarity threshold, taking all data sets whose first similarity value is higher than the first similarity threshold and whose second similarity value is higher than the second similarity threshold, together with the reference data set determined in step 4, as the recommended data sets for the target detection object;

2. The data set selection method for a robot target detection algorithm according to claim 1, wherein a cosine similarity formula is used to calculate the cosine similarity between the row vector A of the current data set and the row vector B corresponding to the target detection object

3. The data set selection method for a robot target detection algorithm according to claim 2, wherein the cosine similarity is calculated as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where cos(A, B) denotes the cosine similarity of row vector A and row vector B, ‖A‖ denotes the modulus of row vector A, and ‖B‖ denotes the modulus of row vector B.

4. The data set selection method for a robot target detection algorithm according to claim 1, wherein in step 3, the calculated similarity values of all data sets are sorted from high to low, thereby ordering the corresponding data sets.

5. The data set selection method for a robot target detection algorithm according to claim 1, wherein in step 4, all second similarity values are sorted from high to low.