CN114373075A - Target component detection data set construction method, detection method, device and equipment - Google Patents

Target component detection data set construction method, detection method, device and equipment

Info

Publication number
CN114373075A
CN114373075A
Authority
CN
China
Prior art keywords
data set
target
images
image
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111683433.8A
Other languages
Chinese (zh)
Inventor
石光明
白洁
李旭阳
饶承炜
谢雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority claimed from application CN202111683433.8A
Publication of CN114373075A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing a target component detection data set, together with a detection method, an apparatus, and a device. The construction method comprises the following steps: acquiring a first group of images from a preset public data set according to preset object categories; acquiring a second group of images through image capture according to the object categories; disassembling each target object in each image of the first and second groups of images into components according to a preset component disassembly standard, to obtain a disassembled first group of images and a disassembled second group of images; constructing an initial data set from the two disassembled groups of images; and performing subclass labeling on the disassembled components of each target object in each image of the initial data set to obtain the target component detection data set. By disassembling the target objects into components and labeling each component, the technical scheme of the invention obtains more accurate object information, thereby improving the grasping success rate of a robot in practical applications.

Description

Target component detection data set construction method, detection method, device and equipment
Technical Field
The invention relates to the technical field of image processing and computer vision, and in particular to a method for constructing a target component detection data set, a target detection method, a target detection apparatus, a computer-readable storage medium, and a terminal device.
Background
Target detection is an active direction in computer vision and digital image processing. It is widely applied in fields such as robotics, intelligent video surveillance, industrial inspection, and aerospace, and reducing the consumption of human capital through computer vision has important practical significance. Target detection has therefore become a research hotspot in both theory and application in recent years: it is an important branch of image processing and computer vision, a core part of intelligent surveillance systems, and a basic algorithm in the field of identity recognition, playing a vital role in subsequent tasks such as face recognition, gait recognition, crowd counting, and instance segmentation. Training data is indispensable when developing neural networks for target detection, and the best-known data sets currently used in target detection are PASCAL VOC, MS COCO, and ImageNet.
The PASCAL VOC challenge is a benchmark for the classification, recognition, and detection of visual objects, providing a standard annotated image data set and a standard evaluation system for detection algorithms and learning performance. PASCAL VOC provides a standardized set of excellent data sets for image recognition and classification, which can be used for image classification, object detection, and image segmentation. One of the tasks of the PASCAL VOC challenge is the Person Layout competition, i.e. predicting the bounding boxes and corresponding labels of body parts (head, hands, feet); the images used for this task are additionally annotated with the body parts of each person.
The MS COCO data set is a large-scale image data set developed and maintained by Microsoft and is currently the data set most commonly used for image detection and localization. It is an image recognition, segmentation, and captioning data set whose annotations include not only category and position information but also semantic text descriptions of the images.
The ImageNet data set originated from a computer vision recognition project and is currently the largest image recognition database in the world, established by computer scientists at Stanford University in the United States by modeling the human recognition system. ImageNet is a large-scale labeled image data set organized according to the WordNet hierarchy, comprising approximately 15 million images in about 22,000 classes, with each image strictly screened and labeled manually.
However, the existing data sets used for object detection, such as those mentioned above, include many object categories that a robot does not need to manipulate, and each object is labeled as a whole with a horizontal bounding box. The resulting recognition of the object is coarse, which makes it difficult to determine the exact position at which to grasp the object, leading to a low grasping success rate for the robot.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device, which obtain more accurate object information by disassembling target objects into components and labeling them, thereby improving the grasping success rate of a robot in practical applications.
In order to solve the technical problem, an embodiment of the present invention provides a method for constructing a target component detection data set, including:
acquiring a first group of images from a preset public data set according to a preset object type, wherein the first group of images comprises at least one image, and each image comprises a target object corresponding to at least one object type;
acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
performing component disassembly on each target object in each image in the first group of images and the second group of images according to a preset component disassembly standard to obtain a disassembled first group of images and a disassembled second group of images;
constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
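The five steps of the construction method above can be sketched as follows. This is a minimal illustration, not part of the claimed method; all function names and interfaces are hypothetical stand-ins.

```python
# Minimal sketch of the five construction steps (all names hypothetical).

def build_target_component_dataset(object_classes, public_images, captured_images,
                                   disassemble, label_subclasses):
    """Construct a target component detection data set.

    object_classes   -- preset object categories suitable for robot operation
    public_images    -- images screened from a public data set (first group)
    captured_images  -- images collected by camera (second group)
    disassemble      -- function mapping a target object to its components
    label_subclasses -- function labeling each disassembled component
    """
    # Step 3: disassemble every target object in every image into components.
    group1 = [disassemble(img, object_classes) for img in public_images]
    group2 = [disassemble(img, object_classes) for img in captured_images]
    # Step 4: merge the two disassembled groups into an initial data set.
    initial_dataset = group1 + group2
    # Step 5: subclass-label each component to obtain the final data set.
    return [label_subclasses(img) for img in initial_dataset]
```

In use, `disassemble` and `label_subclasses` would be the component-disassembly and annotation procedures described in the detailed embodiments below.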
Further, the method performs subclass labeling on the jth disassembled component of the ith target object through the following steps:
generating a rectangular bounding box corresponding to the jth disassembled component based on the target image where the ith target object is located; and
adjusting the rectangular bounding box according to the position of the jth disassembled component in the target image to obtain a rotated bounding box corresponding to the jth disassembled component, wherein, in an image coordinate system established based on the target image, the rotated bounding box is represented by (x, y, w, h, θ): (x, y) denotes the coordinates of the center point of the rotated bounding box, (w, h) denotes its width and height, and θ denotes its rotation angle relative to the X axis, with 0 ≤ θ < π.
Further, the acquiring of a second group of images through image capture according to the object categories specifically includes:
acquiring the number of images corresponding to each object category in the first group of images; and
acquiring, for the object categories whose number of images is smaller than a preset threshold, a second group of images through image capture.
Further, the subclass labeling of the disassembled components of each target object in each image in the initial data set to obtain the target component detection data set specifically includes:
dividing the initial data set into a first data set and a second data set, and performing subclass labeling on the disassembled components of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a test set, training a preset network model according to the training set, and optimizing the trained network model according to the test set;
performing subclass labeling on the disassembled components of each target object in each image in the second data set according to the optimized network model; and
obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
Further, after the subclass labeling of the disassembled components of each target object in each image in the second data set according to the optimized network model, the method further includes:
inspecting the labeling result of each image in the second data set, and correcting images whose labels are defective;
the obtaining of the target component detection data set according to the labeled first data set and the labeled second data set then specifically includes:
obtaining the target component detection data set from the labeled first data set and the inspected and corrected second data set.
In order to solve the above technical problem, an embodiment of the present invention further provides a target detection method, including:
training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by any one of the above methods for constructing a target component detection data set, and the target detection model is the optimized network model in the above embodiment; and
performing target detection on an image to be detected according to the trained target detection model to obtain a component detection result of the target object.
In order to solve the above technical problem, an embodiment of the present invention further provides an apparatus for constructing a target component detection data set, configured to implement any one of the above methods for constructing a target component detection data set, where the apparatus includes:
a first image acquisition module, configured to acquire a first group of images from a preset public data set according to preset object categories, wherein the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
a second image acquisition module, configured to acquire a second group of images through image capture according to the object categories, wherein the second group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
the image component disassembling module is used for performing component disassembling on each target object in each image in the first group of images and the second group of images according to a preset component disassembling standard to obtain a disassembled first group of images and a disassembled second group of images;
the initial data set construction module is used for constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and the target component detection data set construction module is used for carrying out subclass labeling on the disassembled component of each target object in each image in the initial data set to obtain a target component detection data set.
In order to solve the above technical problem, an embodiment of the present invention further provides a target detection apparatus, configured to implement the target detection method in the foregoing embodiment, where the apparatus includes:
a target detection model training module, configured to train a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by any one of the above methods for constructing a target component detection data set, and the target detection model is the optimized network model in the above embodiment; and
a target object detection module, configured to perform target detection on an image to be detected according to the trained target detection model to obtain a component detection result of the target object.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; the computer program, when running, controls an apparatus where the computer-readable storage medium is located to execute any one of the above methods for constructing a target component detection data set, or the target detection methods in the above embodiments.
The embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements the method for constructing the target component detection data set according to any one of the above embodiments when executing the computer program, or the target detection method according to the above embodiment.
Compared with the prior art, the embodiments of the present invention provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device. A first group of images is acquired from a preset public data set according to preset object categories; a second group of images is acquired through image capture according to the object categories; each target object in each image of the two groups is disassembled into components according to a preset component disassembly standard, yielding a disassembled first group of images and a disassembled second group of images; an initial data set is constructed from the disassembled images; and the disassembled components of each target object in each image of the initial data set are subclass-labeled to obtain the target component detection data set. The embodiments of the present invention thus construct a target component detection data set suited to robot grasping, consisting mainly of images of objects a robot can grasp. Disassembling and labeling the target objects contained in these images makes the spatial structure of each component of an object explicit, so that more accurate object information is obtained and the grasping success rate of the robot is improved in practical applications.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for constructing a target component inspection data set provided by the present invention;
FIG. 2 is a flow chart of a preferred embodiment of a method for object detection provided by the present invention;
FIG. 3 is a block diagram of a preferred embodiment of an apparatus for constructing a target component inspection data set according to the present invention;
FIG. 4 is a block diagram of a preferred embodiment of an object detection apparatus provided in the present invention;
fig. 5 is a block diagram of a preferred embodiment of a terminal device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
An embodiment of the present invention provides a method for constructing a target component detection data set. As shown in fig. 1, which is a flowchart of a preferred embodiment of the construction method provided by the present invention, the method includes steps S11 to S15:
step S11, acquiring a first group of images from a preset public data set according to a preset object type, where the first group of images includes at least one image, and each image includes a target object corresponding to at least one object type.
Specifically, in the embodiment of the present invention, a plurality of object categories are preset, each corresponding to one type of target object. Images containing these object categories can therefore be screened from a preset public data set, and the first group of images obtained from the screened images. It is understood that the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category.
It should be noted that the preset object categories are generally categories suitable for robot operation. Based mainly on the complexity of human manipulation, objects that a single person can manipulate simply are selected as target objects and assigned corresponding categories. Such a target object has a simple structure, its weight and volume are not too large, and it is an object that is common in daily life and operated frequently, such as a cup, a bottle, or a door. Objects whose manipulation is complicated, such as complex mechanical or electronic equipment like bicycles and automobiles, are not considered.
Illustratively, the public data sets mainly include MS COCO and Open Images. Images containing the preset object categories can be screened from these two data sets, and during the actual screening, the number of screened images containing each target object can be set at different levels according to how frequently that object is used. For example, cups, bottles, and doors are the most common objects in daily life and are used frequently, so a large number of images containing such objects is kept, while a relatively small number of images is screened for the other objects.
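As an illustration of this screening step, the following sketch selects the IDs of images whose annotations contain one of the preset categories. It assumes a COCO-style annotation JSON file; the function name and file layout are assumptions for illustration, not the patent's procedure.

```python
import json

def screen_images_by_category(annotation_path, wanted_names):
    """Return IDs of images containing at least one wanted category.

    Assumes a COCO-style annotation file with 'categories' and
    'annotations' lists (an assumption, not the patent's format).
    """
    with open(annotation_path) as f:
        coco = json.load(f)
    # Map the wanted category names to their numeric IDs.
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] in wanted_names}
    # Keep every image that has at least one annotation of a wanted category.
    return sorted({a["image_id"] for a in coco["annotations"]
                   if a["category_id"] in wanted_ids})
```

The returned ID list would then be used to copy the matching images into the first group.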
Step S12, acquiring a second group of images through image acquisition according to the object types, where the second group of images includes at least one image, and each image includes a target object corresponding to at least one object type.
Specifically, in addition to screening images containing the object categories from the public data set, images of different target objects in different scenes can be captured with an image acquisition device based on the object categories, yielding the second group of images. It is understood that the second group of images includes at least one image, and each image includes a target object corresponding to at least one object category.
Step S13, performing component disassembly on each target object in each image in the first set of images and the second set of images according to a preset component disassembly standard, and obtaining a disassembled first set of images and a disassembled second set of images.
Specifically, in the embodiment of the present invention, a component disassembly standard for disassembling target objects into components is preset according to the functions or structural features of the components. According to this preset standard, each target object contained in each image of the first group is disassembled into components, yielding the disassembled first group of images, and each target object contained in each image of the second group is likewise disassembled, yielding the disassembled second group of images.
It can be understood that the disassembly in the embodiment of the present invention does not disassemble the image itself. Rather, based on the target object corresponding to the selected object category, each component of the target object contained in the image is delineated, so that one target object is divided into different disassembled components, and each target object includes at least one disassembled component.
And step S14, constructing an initial data set according to the disassembled first group of images and the disassembled second group of images.
Specifically, after the disassembled first group of images and the disassembled second group of images are obtained, they can be sorted and merged to construct the initial data set.
And step S15, performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
Specifically, after the initial data set is obtained, subclass labeling is performed on the disassembled components of each target object contained in each image of the initial data set, yielding a labeled initial data set, which is the final target component detection data set.
For example, assume that 45 object categories suitable for robot operation are selected, corresponding to 45 target objects. The components of each target object contained in each image of the initial data set are disassembled according to the set component disassembly standard, which is based on the function or structural features of each component. For instance, a hand drill can be disassembled into three components, namely the drill handle, the drill body, and the drill bit; a bottle can be disassembled into the bottle body, the bottle neck, and the bottle cap; and other target objects are disassembled according to the same standard. The 45 object categories are thereby divided into 88 subclasses. The 45 object classes (target objects), their corresponding component disassemblies, and the number of images in the initial data set are shown in Table 1.
It can be understood that when subclass labeling is performed on the disassembled components of the target objects contained in an image, the target object is not labeled as a whole; instead, each disassembled component obtained by the disassembly is labeled individually. For example, if an image simultaneously contains four target objects, namely a bottle, a plate, a knife, and a fork, then the bottle cap, bottle neck, bottle body, plate edge, plate center, knife handle, knife head, fork head, and fork handle are subclass-labeled in turn.
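The component disassembly standard for the two objects named above can be represented as a simple class-to-subclass mapping. Only the hand drill and bottle entries come from the text; the data structure itself is an assumption for illustration.

```python
# Component disassembly standard as a class -> subclass mapping.
# Only "hand drill" and "bottle" are given in the text; the structure is illustrative.
DISASSEMBLY_STANDARD = {
    "hand drill": ["drill handle", "drill body", "drill bit"],
    "bottle": ["bottle body", "bottle neck", "bottle cap"],
}

def subclasses_of(object_class):
    """Return the subclass names a target object is disassembled into."""
    return DISASSEMBLY_STANDARD[object_class]

def total_subclasses(standard):
    """Number of subclasses across all object classes
    (88 for the full 45-class standard described in the text)."""
    return sum(len(parts) for parts in standard.values())
```

With the full 45-entry mapping, `total_subclasses` would return the 88 subclasses mentioned above.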
Table 1: Data for the 45 object classes (the table body is provided as images in the original patent publication and is not reproduced here).
According to the method for constructing a target component detection data set provided by the embodiment of the present invention, a first group of images is acquired from a preset public data set according to preset object categories; a second group of images is acquired through image capture according to the object categories; each target object in each image of the two groups is disassembled into components according to the preset component disassembly standard; an initial data set is constructed from the disassembled images; and the disassembled components of each target object are subclass-labeled to obtain the target component detection data set. The embodiment thus constructs a data set suited to robot grasping, consisting mainly of images of objects a robot can grasp. Disassembling and labeling the target objects contained in the images makes the spatial structure of each component of an object explicit, so that more accurate object information is obtained and the grasping success rate of the robot is improved in practical applications.
In another preferred embodiment, the method performs subclass labeling on the jth disassembled component of the ith target object through the following steps:
generating a rectangular bounding box corresponding to the jth disassembled component based on the target image where the ith target object is located; and
adjusting the rectangular bounding box according to the position of the jth disassembled component in the target image to obtain a rotated bounding box corresponding to the jth disassembled component, wherein, in an image coordinate system established based on the target image, the rotated bounding box is represented by (x, y, w, h, θ): (x, y) denotes the coordinates of the center point of the rotated bounding box, (w, h) denotes its width and height, and θ denotes its rotation angle relative to the X axis, with 0 ≤ θ < π.
Specifically, with reference to the above embodiment, when labeling the initial data set, subclass labeling needs to be performed on each disassembled component of each target object contained in each image of the initial data set. The labeling process is described below, taking the jth disassembled component of the ith target object as an example (i and j are positive integers greater than 0):
First, the target image containing the ith target object is determined, and an image coordinate system is established based on it: for example, the upper-left vertex of the target image is taken as the coordinate origin, the horizontal direction to the right along the image as the positive X axis, and the vertical direction downward as the positive Y axis. In this coordinate system, an initial rectangular bounding box is generated for the jth disassembled component, represented by (x0, y0, w0, h0, θ0), where (x0, y0) are the initial coordinates of the center point, (w0, h0) are the initial width and height, and θ0 is the initial clockwise rotation angle relative to the X axis. These initial values are then adjusted according to the position of the jth disassembled component in the target image to obtain the oriented (rotated) bounding box corresponding to the component, represented by (x, y, w, h, θ), where (x, y) are the coordinates of the center point, (w, h) are the width and height, and θ is the clockwise rotation angle relative to the X axis, expressed in radians with 0 ≤ θ < π.
It should be noted that, in the embodiment of the present invention, each disassembled component of each target object contained in an image is subclass-labeled, and a rotated bounding box is used during labeling, so that the bounding box tightly encloses the complete disassembled component without including redundant background information.
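The (x, y, w, h, θ) representation above determines the four corner points of the rotated box. A minimal sketch of that conversion, using only the standard library and the coordinate convention just described, is:

```python
import math

def rotated_box_corners(x, y, w, h, theta):
    """Corner points of a rotated bounding box (x, y, w, h, theta).

    (x, y) is the center, (w, h) the width and height, and theta the
    clockwise rotation relative to the X axis in radians, 0 <= theta < pi
    (image coordinates: origin at top-left, Y axis pointing down).
    """
    # With the Y axis pointing down, the standard rotation matrix
    # [[c, -s], [s, c]] produces a clockwise rotation on screen.
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]
```

At θ = 0 this reduces to the ordinary axis-aligned rectangle around the center point.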
In another preferred embodiment, the acquiring of a second group of images through image capture according to the object categories specifically includes:
acquiring the number of images corresponding to each object category in the first group of images; and
acquiring, for the object categories whose number of images is smaller than a preset threshold, a second group of images through image capture.
Specifically, with reference to the foregoing embodiment, when the second group of images is obtained through image acquisition, the number of images corresponding to each object category in the first group of images may be counted on the basis of the obtained first group of images and compared with a preset number threshold, so as to obtain the object categories whose number of images is smaller than the preset number threshold; these object categories are taken as object categories to be supplemented. Then, for each object category to be supplemented, images containing that category are acquired through image acquisition to form the second group of images, so that the number of images containing each object category to be supplemented is brought up to a certain number.
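The selection of categories to be supplemented can be illustrated with a short sketch. This is only an illustrative example assuming each image carries a list of the object categories it contains; the function name and arguments are hypothetical:

```python
from collections import Counter

def categories_to_supplement(image_labels, all_categories, threshold):
    """Count, for each object category, how many images in the first group
    contain it, and return the categories whose image count falls below
    the preset number threshold (these need supplementary acquisition)."""
    counts = Counter()
    for labels in image_labels:
        for category in set(labels):  # count each image at most once per category
            counts[category] += 1
    return sorted(c for c in all_categories if counts[c] < threshold)
```

For instance, with three images labeled `["cup", "bottle"]`, `["cup"]`, `["cup"]` and a threshold of 2, only "bottle" and "pen" (a category with no images at all) would be supplemented.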
It can be understood that the second group of images includes at least one image, and each image includes at least one target object corresponding to the object class to be supplemented.
In another preferred embodiment, the sub-class labeling of the disassembled part of each target object in each image in the initial data set to obtain the target part detection data set specifically includes:
dividing the initial data set into a first data set and a second data set, and carrying out subclass labeling on a disassembled part of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a testing set, training a preset network model according to the training set, and optimizing the trained network model according to the testing set;
performing subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
Specifically, with reference to the above embodiment, when performing subclass labeling on the disassembled component of each target object contained in each image in the initial data set, first, the initial data set is divided into a first data set and a second data set; in actual division, a portion of the images in the initial data set (for example, half of the images) may be randomly selected to constitute the first data set, and the remaining images (for example, the other half) constitute the second data set. Then, subclass labeling is performed on the disassembled part of each target object contained in each image in the first data set (the labeling method is the same as that in the above embodiment) to obtain a labeled first data set; the labeled first data set is divided into a training set and a test set, a preset network model is trained according to the training set, and the trained network model is optimized according to the test set to obtain an optimized network model. Then, the second data set is input into the optimized network model, and subclass labeling is performed on the disassembled part of each target object contained in each image in the second data set according to the optimized network model to obtain a labeled second data set. Finally, the labeled first data set and the labeled second data set are sorted and combined to obtain the target component detection data set.
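The random division described above can be sketched as follows; this is an illustrative sketch only, and the fraction, seed, and function name are assumptions:

```python
import random

def split_initial_dataset(images, first_fraction=0.5, seed=0):
    """Randomly divide the initial data set into a first data set
    (to be labeled by hand) and a second data set (to be labeled
    later by the trained network model)."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = list(images)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * first_fraction)
    return shuffled[:k], shuffled[k:]
```

With the default fraction of 0.5, the two halves together cover the initial data set exactly once.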
It should be noted that the preset network model mainly includes a feature extraction network and a transmission network: the feature extraction network is used to extract features of the images in the training set, mainly the color, shape, texture, and spatial-distribution features of the target objects in the images, and the transmission network is a Convolutional Neural Network (CNN) for target detection, which can locate the target objects in the images and detect their subcategories.
When the preset network model is trained according to the training set, first, the images in the training set are input into the feature extraction network and features are extracted from the images; then the extracted image features are input into the transmission network, the output of the transmission network is fitted to the labeled class labels of the images in the training set, and the network is trained end to end by back propagation, thereby obtaining the trained network model.
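The end-to-end fitting step can be illustrated with a deliberately tiny stand-in model. The feature extractor and the one-parameter logistic head below are hypothetical simplifications for illustration only; the actual embodiment uses a feature extraction network and a convolutional transmission network:

```python
import math

def extract_features(image):
    # Stand-in for the feature extraction network: the mean pixel value.
    # A real network would produce a deep feature vector capturing color,
    # shape, texture, and spatial distribution.
    return sum(image) / len(image)

def train_end_to_end(train_set, epochs=200, lr=0.5):
    """Fit a one-parameter logistic head to the labeled training set by
    back propagation, mirroring the end-to-end training described above."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for image, label in train_set:
            x = extract_features(image)
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # head output
            grad = p - label                          # d(cross-entropy)/d(logit)
            w -= lr * grad * x                        # back-propagate to the weights
            b -= lr * grad
    return w, b

def predict(params, image):
    w, b = params
    p = 1.0 / (1.0 + math.exp(-(w * extract_features(image) + b)))
    return 1 if p >= 0.5 else 0
```

After training on a small separable set, the fitted head reproduces the class labels of the training images.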
When the trained network model is optimized according to the test set, the images in the test set are input into the trained network model to judge whether the network model performs well. The quality of the network model may be determined by two evaluation indices, namely detection speed and detection accuracy: after the images in the test set are input into the trained network model, the detection speed and detection accuracy of the current network model on the test set are obtained, and if both evaluation indices are greater than their respective thresholds, the performance of the current network model is considered to meet the requirements, that is, the network model is judged to be good. If the network model is judged to be not good, the network structure, network parameters, loss function and the like need to be adjusted, and the training and optimization steps are repeated until the performance of the trained network model meets the requirements, thereby obtaining the optimized network model.
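The pass/fail decision on the two evaluation indices can be expressed as a simple check; the threshold values below are placeholders, not values specified by the embodiment:

```python
def model_is_acceptable(detection_speed_fps, detection_accuracy,
                        min_fps=20.0, min_accuracy=0.9):
    """Return True when both evaluation indices exceed their thresholds,
    i.e. the trained network model is judged to be good."""
    return detection_speed_fps > min_fps and detection_accuracy > min_accuracy
```

If either index falls at or below its threshold, the model is rejected and the structure, parameters, or loss function are adjusted before retraining.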
As an improvement of the above solution, after the subclassing the disassembled parts of each target object in each image in the second data set according to the optimized network model, the method further includes:
inspecting the labeling result of each image in the second data set, and correcting the image labeled with the defect;
then, the obtaining the target component detection data set according to the labeled first data set and the labeled second data set specifically includes:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
Specifically, with reference to the foregoing embodiment, after the labeled second data set is obtained, the labeling result corresponding to each image in the second data set may be further inspected to determine whether the labeling result of each target object is qualified; if not, that is, if the corresponding image has a labeling defect, the image is corrected, and the inspected and corrected second data set is obtained accordingly. At this point, all labeling work on the initial data set is finished, and the labeled first data set and the inspected and corrected second data set are sorted and combined to obtain the target component detection data set.
It should be noted that images with labeling defects generally occur among the images whose labeling information was generated by the network model; such labeling information may not be very accurate, which is related to the specific network structure. For example, an improperly chosen center coordinate, an inaccurate width or height, or an improper rotation angle of the rotating bounding box all cause the rotating bounding box to locate the target object inaccurately, and the corresponding image then has a labeling defect, so the center coordinates, width, height and rotation angle of the rotating bounding box need to be corrected. For example, manual correction may be performed, the specific operation being: the position, size and rotation angle of the rotating bounding box are manually adjusted so that the rotating bounding box exactly encloses the complete disassembled part without redundant background information.
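One way to make a rotating bounding box fit a part tightly, given a corrected rotation angle, is to project the part's pixel coordinates onto the box axes and recompute the center, width and height. The following is an illustrative sketch of such a correction; the function and its interface are assumptions, not part of the embodiment:

```python
import math

def refit_rotating_box(points, theta):
    """Recompute (x, y, w, h) of a rotating bounding box at angle theta
    (radians, clockwise from the x axis) so that it tightly encloses
    the given part pixels, removing redundant background."""
    c, s = math.cos(theta), math.sin(theta)
    # Project every pixel onto the rotated box axes.
    us = [px * c + py * s for px, py in points]
    vs = [-px * s + py * c for px, py in points]
    u_min, u_max = min(us), max(us)
    v_min, v_max = min(vs), max(vs)
    u_c, v_c = (u_min + u_max) / 2, (v_min + v_max) / 2
    # Rotate the box center back into image coordinates.
    x = u_c * c - v_c * s
    y = u_c * s + v_c * c
    return x, y, u_max - u_min, v_max - v_min
```

For an unrotated part occupying pixels (0, 0) through (4, 2), this recovers a box centered at (2, 1) with width 4 and height 2.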
An embodiment of the present invention further provides a target detection method, which is a flowchart of a preferred embodiment of the target detection method provided by the present invention, as shown in fig. 2, and the method includes steps S21 to S22:
step S21, training a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by using the method for constructing a target component detection data set according to any one of the above embodiments, and the target detection model is the optimized network model according to the above embodiment;
and step S22, carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
Specifically, with reference to the above embodiments, in the embodiment of the present invention, a target component detection data set and a target detection model are preset, wherein the target component detection data set is obtained by the method for constructing a target component detection data set according to any of the above embodiments, and the target detection model is the optimized network model obtained in the above embodiment after training and optimization on the first data set. On this basis, the preset target detection model is trained according to the preset target component detection data set to obtain a trained target detection model, which is then applied in an actual environment: an image to be detected is obtained by an image acquisition device and input into the trained target detection model, and target detection is performed on the image to be detected by the trained target detection model, thereby completing the detection of each component of each target object contained in the image to be detected and obtaining the component detection result of the target object.
It should be noted that the target detection model itself is the optimized network model, and the embodiment of the present invention trains the optimized network model again with the target component detection data set. The specific training steps include: (1) dividing the target component detection data set into a training set and a test set; (2) inputting the images in the training set into the feature extraction network to extract features from the images, and then inputting the extracted image features into the transmission network; (3) fitting the output of the transmission network to the labeled class labels of the images in the training set, and training the network end to end by back propagation to obtain a trained network model; (4) inputting the images in the test set into the trained network model and judging whether the network model performs well, where the quality of the network model may be determined by two evaluation indices, namely detection speed and detection accuracy, and if both indices are greater than their respective thresholds, the performance of the current network model is considered to meet the requirements, that is, the network model is judged to be good; if the network model is judged to be not good, the network structure, network parameters, loss function and the like need to be adjusted, and the training and optimization steps are repeated until the performance of the trained network model meets the requirements, thereby obtaining the trained target detection model.
The target detection method provided by the embodiment of the invention constructs a target component detection data set suitable for robot grasping, which mainly comprises images of objects for a robot to grasp; the target objects contained in the images in the data set are disassembled into components and labeled, and a rotating bounding box is used in the labeling process, so that the rotating bounding box exactly encloses a complete disassembled part without redundant background information. Compared with the prior art in which a horizontal bounding box is used for labeling, more and more accurate object information is obtained, such as the placement angle of the target object and the grasping position of the target object, and the expression of the spatial structure of each component of the object is enhanced, which solves the problems that, in the task of a robot grasping an object, the grasping position is difficult to locate and the angle of the object cannot be determined, so that the grasping success rate of the robot is improved.
An embodiment of the present invention further provides a device for constructing a target component detection data set, which is used to implement the method for constructing a target component detection data set according to any of the above embodiments, and is shown in fig. 3, which is a block diagram of a preferred embodiment of the device for constructing a target component detection data set according to the present invention, and the device includes:
a first group image obtaining module 11, configured to obtain a first group of images from a preset public data set according to a preset object category, where the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
a second group image obtaining module 12, configured to obtain a second group of images through image acquisition according to the object type, where the second group of images includes at least one image, and each image includes a target object corresponding to at least one object type;
an image component disassembling module 13, configured to perform component disassembling on each target object in each of the first group of images and the second group of images according to a preset component disassembling standard, so as to obtain a disassembled first group of images and a disassembled second group of images;
an initial data set constructing module 14, configured to construct an initial data set according to the disassembled first group of images and the disassembled second group of images;
and a target component detection data set construction module 15, configured to perform subclass labeling on the disassembled component of each target object in each image in the initial data set, so as to obtain a target component detection data set.
Preferably, the target component detection data set construction module 15 specifically includes an image component labeling unit, configured to perform subclass labeling on a jth dismantling component of an ith target object through the following steps:
generating a rectangular boundary frame corresponding to the jth disassembling component based on the target image of the ith target object;
and adjusting the rectangular boundary frame according to the position of the jth dismantling component in the target image to obtain a rotating boundary frame corresponding to the jth dismantling component, wherein in an image coordinate system established based on the target image, the rotating boundary frame is represented by (x, y, w, h, θ), (x, y) represents the center point coordinates of the rotating boundary frame, (w, h) represents the width and height of the rotating boundary frame, θ represents the rotation angle of the rotating boundary frame relative to the x axis, and 0 ≤ θ < π.
Preferably, the second group of image acquisition modules 12 specifically includes:
the image quantity acquiring unit is used for acquiring the corresponding image quantity of each object type in the first group of images;
and the second group of image acquisition units are used for acquiring a second group of images through image acquisition aiming at the object class of which the number of the images is less than a preset number threshold.
Preferably, the target component detection data set construction module 15 further includes:
the first data set labeling unit is used for dividing the initial data set into a first data set and a second data set and carrying out subclass labeling on the disassembling component of each target object in each image in the first data set;
the network model training unit is used for dividing the marked first data set into a training set and a test set, training a preset network model according to the training set, and optimizing the trained network model according to the test set;
the second data set labeling unit is used for carrying out subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and the target component detection data set acquisition unit is used for acquiring the target component detection data set according to the labeled first data set and the labeled second data set.
Preferably, the target component detection data set construction module 15 further includes:
the annotation correction unit is used for detecting the annotation result of each image in the second data set and correcting the image with the annotation defect;
then, the target component detection data set acquisition unit is specifically configured to:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
It should be noted that, the apparatus for constructing a target component detection data set according to an embodiment of the present invention can implement all the processes of the method for constructing a target component detection data set according to any one of the above embodiments, and the functions and achieved technical effects of each module and unit in the apparatus are respectively the same as those of the method for constructing a target component detection data set according to the above embodiment, and are not described herein again.
An embodiment of the present invention further provides a target detection apparatus, configured to implement the target detection method described in the foregoing embodiment, and as shown in fig. 4, the target detection apparatus is a block diagram of a preferred embodiment of the target detection apparatus provided in the present invention, where the apparatus includes:
a target detection model training module 21, configured to train a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by using the method for constructing the target component detection data set according to any one of the above embodiments, and the target detection model is the optimized network model according to the above embodiment;
and the target object detection module 22 is configured to perform target detection on the image to be detected according to the trained target detection model, so as to obtain a component detection result of the target object.
It should be noted that, the target detection apparatus provided in the embodiment of the present invention can implement all the processes of the target detection method described in the above embodiment, and the functions and implemented technical effects of each module in the apparatus are respectively the same as those of the target detection method described in the above embodiment, and are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein the computer program, when running, controls an apparatus on which the computer-readable storage medium is located to execute the method for constructing a target component detection data set according to any one of the above embodiments, or the method for detecting a target according to the above embodiment.
An embodiment of the present invention further provides a terminal device, as shown in fig. 5, which is a block diagram of a preferred embodiment of the terminal device provided in the present invention, where the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when the computer program is executed, the processor 10 implements the method for constructing the target component detection data set according to any one of the above embodiments, or the target detection method according to the above embodiment.
Preferably, the computer program can be divided into one or more modules/units (e.g., computer program 1, computer program 2, and so on), which are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
The processor 10 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; the general purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the terminal device and uses various interfaces and lines to connect the various parts of the terminal device.
The memory 20 mainly includes a program storage area that may store an operating system, an application program required for at least one function, and the like, and a data storage area that may store related data and the like. In addition, the memory 20 may be a high speed random access memory, may also be a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, or the memory 20 may also be other volatile solid state memory devices.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory, and those skilled in the art will understand that the structural block diagram of fig. 5 is only an example of the terminal device and does not constitute a limitation to the terminal device, and may include more or less components than those shown, or combine some components, or different components.
To sum up, the embodiments of the present invention provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device. A target component detection data set suitable for robot grasping is constructed, which mainly comprises images of objects for a robot to grasp; the target objects contained in the images in the data set are disassembled into components and labeled, and a rotating bounding box is used in the labeling process, so that the rotating bounding box exactly encloses a complete disassembled part without redundant background information. Compared with the prior art in which a horizontal bounding box is used for labeling, more and more accurate object information is obtained, such as the placement angle of the target object and the grasping position of the target object, and the expression of the spatial structure of each component of the object is enhanced, which solves the problems that, in the task of a robot grasping an object, the grasping position is difficult to locate and the angle of the object cannot be determined, so that the grasping success rate of the robot is improved.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method of constructing a target component inspection dataset, comprising:
acquiring a first group of images from a preset public data set according to a preset object type, wherein the first group of images comprises at least one image, and each image comprises a target object corresponding to at least one object type;
acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
performing component disassembly on each target object in each image in the first group of images and the second group of images according to a preset component disassembly standard to obtain a disassembled first group of images and a disassembled second group of images;
constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
2. The method of constructing a target component detection data set according to claim 1, wherein the method subclasses the jth disassembled component of the ith target object by:
generating a rectangular boundary frame corresponding to the jth disassembling component based on the target image of the ith target object;
and adjusting the rectangular boundary frame according to the position of the jth dismantling component in the target image to obtain a rotating boundary frame corresponding to the jth dismantling component, wherein in an image coordinate system established based on the target image, the rotating boundary frame is represented by (x, y, w, h, θ), (x, y) represents the center point coordinates of the rotating boundary frame, (w, h) represents the width and height of the rotating boundary frame, θ represents the rotation angle of the rotating boundary frame relative to the x axis, and 0 ≤ θ < π.
3. The method for constructing a target component inspection dataset according to claim 1, wherein said acquiring a second set of images by image acquisition based on said object class comprises:
acquiring the number of images corresponding to each object type in the first group of images;
and acquiring a second group of images through image acquisition aiming at the object categories with the number of the images smaller than a preset number threshold.
4. The method according to claim 1, wherein the sub-class labeling of the disassembled parts of each target object in each image in the initial data set to obtain the target part detection data set specifically comprises:
dividing the initial data set into a first data set and a second data set, and carrying out subclass labeling on a disassembled part of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a testing set, training a preset network model according to the training set, and optimizing the trained network model according to the testing set;
performing subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
5. The method of constructing a target component detection data set of claim 4, wherein after subclassing the disassembled component of each target object in each image in the second data set according to the optimized network model, the method further comprises:
inspecting the labeling result of each image in the second data set, and correcting the image labeled with the defect;
then, the obtaining the target component detection data set according to the labeled first data set and the labeled second data set specifically includes:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
6. A method of object detection, comprising:
training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by adopting the construction method of the target component detection data set according to any one of claims 1-5, and the target detection model is the optimized network model according to claim 4;
and carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
7. A target component detection data set construction device for implementing the target component detection data set construction method according to any one of claims 1 to 5, the device comprising:
the first group of image acquisition modules are used for acquiring a first group of images from a preset public data set according to preset object categories, wherein the first group of images comprise at least one image, and each image comprises a target object corresponding to at least one object category;
the second group of image acquisition module is used for acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
the image component disassembling module is used for performing component disassembling on each target object in each image in the first group of images and the second group of images according to a preset component disassembling standard to obtain a disassembled first group of images and a disassembled second group of images;
the initial data set construction module is used for constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and the target component detection data set construction module is used for carrying out subclass labeling on the disassembled component of each target object in each image in the initial data set to obtain a target component detection data set.
8. An object detection apparatus for implementing the object detection method according to claim 6, the apparatus comprising:
the target detection model training module is used for training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by adopting the construction method of the target component detection data set according to any one of claims 1 to 5, and the target detection model is the optimized network model according to claim 4;
and the target object detection module is used for carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of constructing a target component detection data set according to any one of claims 1 to 5, or the target detection method according to claim 6.
10. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method of constructing a target component detection data set according to any one of claims 1 to 5 or the target detection method according to claim 6.
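The train-then-detect flow of the apparatus claims can be sketched with a toy stand-in. This is an illustrative assumption, not the claimed detector: "training" here only memorises the component-subclass vocabulary of the data set, and "detection" keeps candidate detections whose subclass was seen during training, whereas the actual apparatus trains an optimized network model on image pixels.

```python
class ComponentDetector:
    """Toy stand-in for the claim-8 modules: a training step over a component
    detection data set, then a detection step on a new image's candidates."""

    def __init__(self):
        self.known_subclasses = set()

    def train(self, dataset):
        # dataset: list of images; each image is a list of (subclass, bbox) pairs.
        for annotations in dataset:
            for subclass, _bbox in annotations:
                self.known_subclasses.add(subclass)
        return self

    def detect(self, candidates):
        # candidates: proposal detections as (subclass, bbox) pairs;
        # keep only those whose subclass appeared in the training data set.
        return [c for c in candidates if c[0] in self.known_subclasses]
```

Chaining the two modules then reads `ComponentDetector().train(dataset).detect(candidates)`, mirroring the training module feeding the detection module.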
CN202111683433.8A 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment Pending CN114373075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111683433.8A CN114373075A (en) 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment

Publications (1)

Publication Number Publication Date
CN114373075A (en) 2022-04-19

Family

ID=81141168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111683433.8A Pending CN114373075A (en) 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN114373075A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858569A (en) * 2019-03-07 2019-06-07 中国科学院自动化研究所 Multi-tag object detecting method, system, device based on target detection network
CN111222395A (en) * 2019-10-21 2020-06-02 杭州飞步科技有限公司 Target detection method and device and electronic equipment
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN112633355A (en) * 2020-12-18 2021-04-09 北京迈格威科技有限公司 Image data processing method and device and target detection model training method and device
CN113095434A (en) * 2021-04-27 2021-07-09 深圳市商汤科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113139096A (en) * 2021-05-10 2021-07-20 中国科学院深圳先进技术研究院 Video data set labeling method and device
CN113408566A (en) * 2020-11-17 2021-09-17 腾讯科技(深圳)有限公司 Target detection method and related equipment

Similar Documents

Publication Publication Date Title
US20200210702A1 (en) Apparatus and method for image processing to calculate likelihood of image of target object detected from input image
CN103439348B (en) Remote controller key defect detection method based on difference image method
US8542912B2 (en) Determining the uniqueness of a model for machine vision
Liang et al. In-line inspection solution for codes on complex backgrounds for the plastic container industry
CN113989944B (en) Operation action recognition method, device and storage medium
CN111368682B (en) Method and system for detecting and identifying station caption based on master RCNN
KR20210020065A (en) Systems and methods for finding and classifying patterns in images with vision systems
CN112613579A (en) Model training method and evaluation method for human face or human head image quality and selection method for high-quality image
US8542905B2 (en) Determining the uniqueness of a model for machine vision
US10937150B2 (en) Systems and methods of feature correspondence analysis
CN110956656A (en) Spindle positioning method based on depth target detection
CN114219753A (en) Power equipment surface defect detection method based on deep learning and terminal
CN111104942B (en) Template matching network training method, recognition method and device
CN113743434A (en) Training method of target detection network, image augmentation method and device
CN112434581A (en) Outdoor target color identification method and system, electronic device and storage medium
CN111079752A (en) Method and device for identifying circuit breaker in infrared image and readable storage medium
CN114373075A (en) Target component detection data set construction method, detection method, device and equipment
CN113420839B (en) Semi-automatic labeling method and segmentation positioning system for stacking planar target objects
CN114998357A (en) Industrial detection method, system, terminal and medium based on multi-information analysis
CN107491780A (en) A kind of anti-down hanging method of calligraphy based on SIFT
WO2012092132A2 (en) Determining the uniqueness of a model for machine vision
WO2021192024A1 (en) Work management device and work status determination method
Mentari et al. Detecting Objects Using Haar Cascade for Human Counting Implemented in OpenMV
CN105184244A (en) Video face detection method and apparatus
CN110738260A (en) Method, device and equipment for detecting placement of space boxes of retail stores of types

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220419