CN114373075A - Target component detection data set construction method, detection method, device and equipment - Google Patents

Target component detection data set construction method, detection method, device and equipment

Info

Publication number
CN114373075A
CN114373075A
Authority
CN
China
Prior art keywords
data set
target
images
image
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111683433.8A
Other languages
Chinese (zh)
Inventor
石光明
白洁
李旭阳
饶承炜
谢雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority claimed from application CN202111683433.8A
Publication of CN114373075A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing a target component detection data set, together with a detection method, an apparatus, and a device. The construction method comprises the following steps: acquiring a first group of images from a preset public data set according to preset object categories; acquiring a second group of images through image capture according to the object categories; disassembling each target object in each image of the first and second groups of images into components according to a preset component disassembly standard, to obtain a disassembled first group of images and a disassembled second group of images; constructing an initial data set from the two disassembled groups of images; and performing subclass labeling on the disassembled components of each target object in each image of the initial data set to obtain the target component detection data set. By disassembling the target objects into components and labeling each component, the technical scheme of the invention obtains more accurate object information, thereby improving the grasping success rate of a robot in practical applications.

Description

Target component detection data set construction method, detection method, device and equipment
Technical Field
The invention relates to the technical field of image processing and computer vision, and in particular to a method for constructing a target component detection data set, a target detection method, a target detection apparatus, a computer-readable storage medium, and a terminal device.
Background
Target detection is an active direction in computer vision and digital image processing. It is widely applied in fields such as robotics, intelligent video surveillance, industrial inspection, and aerospace, and reducing the consumption of human capital through computer vision has important practical significance. Target detection has therefore become a research hotspot in both theory and application in recent years: it is an important branch of image processing and computer vision, a core part of intelligent surveillance systems, and a basic algorithm in the field of identity recognition, playing a vital role in subsequent tasks such as face recognition, gait recognition, crowd counting, and instance segmentation. Training data is indispensable when developing neural networks for target detection, and the best-known data sets currently used in target detection are PASCAL VOC, MS COCO, and ImageNet.
The PASCAL VOC challenge is a benchmark for the classification, recognition, and detection of visual objects, providing a standard annotated image data set and a standard evaluation system for detection algorithms and learning performance. PASCAL VOC provides a standardized set of excellent data sets for image recognition and classification, which can be used for image classification, object detection, and image segmentation. One of the tasks of the PASCAL VOC challenge is the Person Layout competition, i.e. predicting the bounding boxes and corresponding labels of body parts (head, hands, feet); the images used for this task are additionally annotated with the body parts of each person.
The MS COCO data set is a large-scale image data set developed and maintained by Microsoft and is currently the data set most commonly used for image detection and localization. It is an image recognition, segmentation, and captioning data set whose annotations include not only category and position information but also semantic text descriptions of the images.
The ImageNet data set originated from a computer vision recognition project and is currently the largest image recognition database in the world, established by computer scientists at Stanford University in the United States by modeling the human recognition system. ImageNet is a large-scale labeled image data set organized according to the WordNet hierarchy, comprising approximately 15 million images in about 22,000 classes, with each image strictly screened and labeled manually.
However, the existing data sets used for object detection, such as those mentioned above, include many object categories that a robot does not need to manipulate, and each object is labeled as a whole with a horizontal bounding box. The resulting recognition of the object is coarse, which makes it difficult to determine the exact position at which to grasp the object, leading to a low grasping success rate for the robot.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device, which obtain more accurate object information by disassembling target objects into components and labeling them, thereby improving the grasping success rate of a robot in practical applications.
In order to solve the technical problem, an embodiment of the present invention provides a method for constructing a target component detection data set, including:
acquiring a first group of images from a preset public data set according to a preset object type, wherein the first group of images comprises at least one image, and each image comprises a target object corresponding to at least one object type;
acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
performing component disassembly on each target object in each image in the first group of images and the second group of images according to a preset component disassembly standard to obtain a disassembled first group of images and a disassembled second group of images;
constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
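The five steps of the construction method above can be sketched as follows. This is a minimal illustration, not part of the claimed method; all function names and interfaces are hypothetical stand-ins.

```python
# Minimal sketch of the five construction steps (all names hypothetical).

def build_target_component_dataset(object_classes, public_images, captured_images,
                                   disassemble, label_subclasses):
    """Construct a target component detection data set.

    object_classes   -- preset object categories suitable for robot operation
    public_images    -- images screened from a public data set (first group)
    captured_images  -- images collected by camera (second group)
    disassemble      -- function mapping a target object to its components
    label_subclasses -- function labeling each disassembled component
    """
    # Step 3: disassemble every target object in every image into components.
    group1 = [disassemble(img, object_classes) for img in public_images]
    group2 = [disassemble(img, object_classes) for img in captured_images]
    # Step 4: merge the two disassembled groups into an initial data set.
    initial_dataset = group1 + group2
    # Step 5: subclass-label each component to obtain the final data set.
    return [label_subclasses(img) for img in initial_dataset]
```

In use, `disassemble` and `label_subclasses` would be the component-disassembly and annotation procedures described in the detailed embodiments below.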
Further, the method performs subclass labeling on the jth disassembled component of the ith target object through the following steps:
generating a rectangular bounding box corresponding to the jth disassembled component based on the target image where the ith target object is located; and
adjusting the rectangular bounding box according to the position of the jth disassembled component in the target image to obtain a rotated bounding box corresponding to the jth disassembled component, wherein, in an image coordinate system established based on the target image, the rotated bounding box is represented by (x, y, w, h, θ): (x, y) denotes the coordinates of the center point of the rotated bounding box, (w, h) denotes its width and height, and θ denotes its rotation angle relative to the X axis, with 0 ≤ θ < π.
Further, the acquiring of a second group of images through image capture according to the object categories specifically includes:
acquiring the number of images corresponding to each object category in the first group of images; and
acquiring, for the object categories whose number of images is smaller than a preset threshold, a second group of images through image capture.
Further, the subclass labeling of the disassembled components of each target object in each image in the initial data set to obtain the target component detection data set specifically includes:
dividing the initial data set into a first data set and a second data set, and performing subclass labeling on the disassembled components of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a test set, training a preset network model according to the training set, and optimizing the trained network model according to the test set;
performing subclass labeling on the disassembled components of each target object in each image in the second data set according to the optimized network model; and
obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
Further, after the subclass labeling of the disassembled components of each target object in each image in the second data set according to the optimized network model, the method further includes:
inspecting the labeling result of each image in the second data set, and correcting images whose labels are defective;
the obtaining of the target component detection data set according to the labeled first data set and the labeled second data set then specifically includes:
obtaining the target component detection data set from the labeled first data set and the inspected and corrected second data set.
In order to solve the above technical problem, an embodiment of the present invention further provides a target detection method, including:
training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by any one of the above methods for constructing a target component detection data set, and the target detection model is the optimized network model in the above embodiment; and
performing target detection on an image to be detected according to the trained target detection model to obtain a component detection result of the target object.
In order to solve the above technical problem, an embodiment of the present invention further provides an apparatus for constructing a target component detection data set, configured to implement any one of the above methods for constructing a target component detection data set, where the apparatus includes:
a first image acquisition module, configured to acquire a first group of images from a preset public data set according to preset object categories, wherein the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
a second image acquisition module, configured to acquire a second group of images through image capture according to the object categories, wherein the second group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
the image component disassembling module is used for performing component disassembling on each target object in each image in the first group of images and the second group of images according to a preset component disassembling standard to obtain a disassembled first group of images and a disassembled second group of images;
the initial data set construction module is used for constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and the target component detection data set construction module is used for carrying out subclass labeling on the disassembled component of each target object in each image in the initial data set to obtain a target component detection data set.
In order to solve the above technical problem, an embodiment of the present invention further provides a target detection apparatus, configured to implement the target detection method in the foregoing embodiment, where the apparatus includes:
a target detection model training module, configured to train a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by any one of the above methods for constructing a target component detection data set, and the target detection model is the optimized network model in the above embodiment; and
a target object detection module, configured to perform target detection on an image to be detected according to the trained target detection model to obtain a component detection result of the target object.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; the computer program, when running, controls an apparatus where the computer-readable storage medium is located to execute any one of the above methods for constructing a target component detection data set, or the target detection methods in the above embodiments.
The embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements the method for constructing the target component detection data set according to any one of the above embodiments when executing the computer program, or the target detection method according to the above embodiment.
Compared with the prior art, the embodiments of the present invention provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device. A first group of images is acquired from a preset public data set according to preset object categories; a second group of images is acquired through image capture according to the object categories; each target object in each image of the two groups is disassembled into components according to a preset component disassembly standard, yielding a disassembled first group of images and a disassembled second group of images; an initial data set is constructed from the disassembled images; and the disassembled components of each target object in each image of the initial data set are subclass-labeled to obtain the target component detection data set. The embodiments of the present invention thus construct a target component detection data set suited to robot grasping, consisting mainly of images of objects a robot can grasp. Disassembling and labeling the target objects contained in these images makes the spatial structure of each component of an object explicit, so that more accurate object information is obtained and the grasping success rate of the robot is improved in practical applications.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for constructing a target component inspection data set provided by the present invention;
FIG. 2 is a flow chart of a preferred embodiment of a method for object detection provided by the present invention;
FIG. 3 is a block diagram of a preferred embodiment of an apparatus for constructing a target component inspection data set according to the present invention;
FIG. 4 is a block diagram of a preferred embodiment of an object detection apparatus provided in the present invention;
fig. 5 is a block diagram of a preferred embodiment of a terminal device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
An embodiment of the present invention provides a method for constructing a target component detection data set. As shown in fig. 1, which is a flowchart of a preferred embodiment of the construction method provided by the present invention, the method includes steps S11 to S15:
step S11, acquiring a first group of images from a preset public data set according to a preset object type, where the first group of images includes at least one image, and each image includes a target object corresponding to at least one object type.
Specifically, in the embodiment of the present invention, a plurality of object categories are preset, each corresponding to one type of target object. Images containing these object categories can therefore be screened from a preset public data set, and the first group of images obtained from the screened images. It is understood that the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category.
It should be noted that the preset object categories are generally categories suitable for robot operation. Based mainly on the complexity of human manipulation, objects that a single person can manipulate simply are selected as target objects and assigned corresponding categories. Such a target object has a simple structure, its weight and volume are not too large, and it is an object that is common in daily life and operated frequently, such as a cup, a bottle, or a door. Objects whose manipulation is complicated, such as complex mechanical or electronic equipment like bicycles and automobiles, are not considered.
Illustratively, the public data sets mainly include MS COCO and Open Images. Images containing the preset object categories can be screened from these two data sets, and during the actual screening, the number of screened images containing each target object can be set at different levels according to how frequently that object is used. For example, cups, bottles, and doors are the most common objects in daily life and are used frequently, so a large number of images containing such objects is kept, while a relatively small number of images is screened for the other objects.
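As an illustration of this screening step, the following sketch selects the IDs of images whose annotations contain one of the preset categories. It assumes a COCO-style annotation JSON file; the function name and file layout are assumptions for illustration, not the patent's procedure.

```python
import json

def screen_images_by_category(annotation_path, wanted_names):
    """Return IDs of images containing at least one wanted category.

    Assumes a COCO-style annotation file with 'categories' and
    'annotations' lists (an assumption, not the patent's format).
    """
    with open(annotation_path) as f:
        coco = json.load(f)
    # Map the wanted category names to their numeric IDs.
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] in wanted_names}
    # Keep every image that has at least one annotation of a wanted category.
    return sorted({a["image_id"] for a in coco["annotations"]
                   if a["category_id"] in wanted_ids})
```

The returned ID list would then be used to copy the matching images into the first group.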
Step S12, acquiring a second group of images through image acquisition according to the object types, where the second group of images includes at least one image, and each image includes a target object corresponding to at least one object type.
Specifically, in addition to screening images containing the object categories from the public data set, images of different target objects in different scenes can be captured with an image acquisition device based on the object categories, yielding the second group of images. It is understood that the second group of images includes at least one image, and each image includes a target object corresponding to at least one object category.
Step S13, performing component disassembly on each target object in each image in the first set of images and the second set of images according to a preset component disassembly standard, and obtaining a disassembled first set of images and a disassembled second set of images.
Specifically, in the embodiment of the present invention, a component disassembly standard for disassembling target objects into components is preset according to the functions or structural features of the components. According to this preset standard, each target object contained in each image of the first group is disassembled into components, yielding the disassembled first group of images, and each target object contained in each image of the second group is likewise disassembled, yielding the disassembled second group of images.
It can be understood that the disassembly in the embodiment of the present invention does not disassemble the image itself. Rather, based on the target object corresponding to the selected object category, each component of the target object contained in the image is delineated, so that one target object is divided into different disassembled components, and each target object includes at least one disassembled component.
And step S14, constructing an initial data set according to the disassembled first group of images and the disassembled second group of images.
Specifically, after the disassembled first group of images and the disassembled second group of images are obtained, they can be sorted and merged to construct the initial data set.
And step S15, performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
Specifically, after the initial data set is obtained, subclass labeling is performed on the disassembled components of each target object contained in each image of the initial data set, yielding a labeled initial data set, which is the final target component detection data set.
For example, assume that 45 object categories suitable for robot operation are selected, corresponding to 45 target objects. The components of each target object contained in each image of the initial data set are disassembled according to the set component disassembly standard, which is based on the function or structural features of each component. For instance, a hand drill can be disassembled into three components, namely the drill handle, the drill body, and the drill bit; a bottle can be disassembled into the bottle body, the bottle neck, and the bottle cap; and other target objects are disassembled according to the same standard. The 45 object categories are thereby divided into 88 subclasses. The 45 object classes (target objects), their corresponding component disassemblies, and the number of images in the initial data set are shown in Table 1.
It can be understood that when subclass labeling is performed on the disassembled components of the target objects contained in an image, the target object is not labeled as a whole; instead, each disassembled component obtained by the disassembly is labeled individually. For example, if an image simultaneously contains four target objects, namely a bottle, a plate, a knife, and a fork, then the bottle cap, bottle neck, bottle body, plate edge, plate center, knife handle, knife head, fork head, and fork handle are subclass-labeled in turn.
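The component disassembly standard for the two objects named above can be represented as a simple class-to-subclass mapping. Only the hand drill and bottle entries come from the text; the data structure itself is an assumption for illustration.

```python
# Component disassembly standard as a class -> subclass mapping.
# Only "hand drill" and "bottle" are given in the text; the structure is illustrative.
DISASSEMBLY_STANDARD = {
    "hand drill": ["drill handle", "drill body", "drill bit"],
    "bottle": ["bottle body", "bottle neck", "bottle cap"],
}

def subclasses_of(object_class):
    """Return the subclass names a target object is disassembled into."""
    return DISASSEMBLY_STANDARD[object_class]

def total_subclasses(standard):
    """Number of subclasses across all object classes
    (88 for the full 45-class standard described in the text)."""
    return sum(len(parts) for parts in standard.values())
```

With the full 45-entry mapping, `total_subclasses` would return the 88 subclasses mentioned above.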
Table 1: Data for the 45 object classes (the table body is provided as images in the original patent publication and is not reproduced here).
According to the method for constructing a target component detection data set provided by the embodiment of the present invention, a first group of images is acquired from a preset public data set according to preset object categories; a second group of images is acquired through image capture according to the object categories; each target object in each image of the two groups is disassembled into components according to the preset component disassembly standard; an initial data set is constructed from the disassembled images; and the disassembled components of each target object are subclass-labeled to obtain the target component detection data set. The embodiment thus constructs a data set suited to robot grasping, consisting mainly of images of objects a robot can grasp. Disassembling and labeling the target objects contained in the images makes the spatial structure of each component of an object explicit, so that more accurate object information is obtained and the grasping success rate of the robot is improved in practical applications.
In another preferred embodiment, the method performs subclass labeling on the jth disassembled component of the ith target object through the following steps:
generating a rectangular bounding box corresponding to the jth disassembled component based on the target image where the ith target object is located; and
adjusting the rectangular bounding box according to the position of the jth disassembled component in the target image to obtain a rotated bounding box corresponding to the jth disassembled component, wherein, in an image coordinate system established based on the target image, the rotated bounding box is represented by (x, y, w, h, θ): (x, y) denotes the coordinates of the center point of the rotated bounding box, (w, h) denotes its width and height, and θ denotes its rotation angle relative to the X axis, with 0 ≤ θ < π.
Specifically, with reference to the above embodiment, when labeling the initial data set, subclass labeling needs to be performed on each disassembled component of each target object contained in each image of the initial data set. The labeling process is described below, taking the jth disassembled component of the ith target object as an example (i and j are positive integers greater than 0):
First, the target image containing the ith target object is determined, and an image coordinate system is established based on it: for example, the upper-left vertex of the target image is taken as the coordinate origin, the horizontal direction to the right along the image as the positive X axis, and the vertical direction downward as the positive Y axis. In this coordinate system, an initial rectangular bounding box is generated for the jth disassembled component, represented by (x0, y0, w0, h0, θ0), where (x0, y0) are the initial coordinates of the center point, (w0, h0) are the initial width and height, and θ0 is the initial clockwise rotation angle relative to the X axis. These initial values are then adjusted according to the position of the jth disassembled component in the target image to obtain the oriented (rotated) bounding box corresponding to the component, represented by (x, y, w, h, θ), where (x, y) are the coordinates of the center point, (w, h) are the width and height, and θ is the clockwise rotation angle relative to the X axis, expressed in radians with 0 ≤ θ < π.
It should be noted that, in the embodiment of the present invention, each disassembled component of each target object contained in an image is subclass-labeled, and a rotated bounding box is used during labeling, so that the bounding box tightly encloses the complete disassembled component without including redundant background information.
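The (x, y, w, h, θ) representation above determines the four corner points of the rotated box. A minimal sketch of that conversion, using only the standard library and the coordinate convention just described, is:

```python
import math

def rotated_box_corners(x, y, w, h, theta):
    """Corner points of a rotated bounding box (x, y, w, h, theta).

    (x, y) is the center, (w, h) the width and height, and theta the
    clockwise rotation relative to the X axis in radians, 0 <= theta < pi
    (image coordinates: origin at top-left, Y axis pointing down).
    """
    # With the Y axis pointing down, the standard rotation matrix
    # [[c, -s], [s, c]] produces a clockwise rotation on screen.
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]
```

At θ = 0 this reduces to the ordinary axis-aligned rectangle around the center point.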
In another preferred embodiment, the acquiring of a second group of images through image capture according to the object categories specifically includes:
acquiring the number of images corresponding to each object category in the first group of images; and
acquiring, for the object categories whose number of images is smaller than a preset threshold, a second group of images through image capture.
Specifically, with reference to the foregoing embodiment, when the second group of images is obtained through image acquisition, the number of images corresponding to each object category in the first group of images may be counted on the basis of the obtained first group of images and compared with a preset number threshold, so as to obtain the object categories whose number of images is smaller than the preset number threshold; these object categories are taken as object categories to be supplemented. Then, for each object category to be supplemented, images containing that category are acquired through image acquisition to form the second group of images, so that the number of images containing each object category to be supplemented is brought up to a certain number.
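The selection of categories to be supplemented can be illustrated with a short sketch. This is only an illustrative example assuming each image carries a list of the object categories it contains; the function name and arguments are hypothetical:

```python
from collections import Counter

def categories_to_supplement(image_labels, all_categories, threshold):
    """Count, for each object category, how many images in the first group
    contain it, and return the categories whose image count falls below
    the preset number threshold (these need supplementary acquisition)."""
    counts = Counter()
    for labels in image_labels:
        for category in set(labels):  # count each image at most once per category
            counts[category] += 1
    return sorted(c for c in all_categories if counts[c] < threshold)
```

For instance, with three images labeled `["cup", "bottle"]`, `["cup"]`, `["cup"]` and a threshold of 2, only "bottle" and "pen" (a category with no images at all) would be supplemented.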
It can be understood that the second group of images includes at least one image, and each image includes at least one target object corresponding to the object class to be supplemented.
In another preferred embodiment, the sub-class labeling of the disassembled part of each target object in each image in the initial data set to obtain the target part detection data set specifically includes:
dividing the initial data set into a first data set and a second data set, and carrying out subclass labeling on a disassembled part of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a testing set, training a preset network model according to the training set, and optimizing the trained network model according to the testing set;
performing subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
Specifically, with reference to the above embodiment, when performing subclass labeling on the disassembled component of each target object contained in each image in the initial data set, first, the initial data set is divided into a first data set and a second data set; in actual division, a portion of the images in the initial data set (for example, half of the images) may be randomly selected to constitute the first data set, and the remaining images (for example, the other half) constitute the second data set. Then, subclass labeling is performed on the disassembled part of each target object contained in each image in the first data set (the labeling method is the same as that in the above embodiment) to obtain a labeled first data set; the labeled first data set is divided into a training set and a test set, a preset network model is trained according to the training set, and the trained network model is optimized according to the test set to obtain an optimized network model. Then, the second data set is input into the optimized network model, and subclass labeling is performed on the disassembled part of each target object contained in each image in the second data set according to the optimized network model to obtain a labeled second data set. Finally, the labeled first data set and the labeled second data set are sorted and combined to obtain the target component detection data set.
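The random division described above can be sketched as follows; this is an illustrative sketch only, and the fraction, seed, and function name are assumptions:

```python
import random

def split_initial_dataset(images, first_fraction=0.5, seed=0):
    """Randomly divide the initial data set into a first data set
    (to be labeled by hand) and a second data set (to be labeled
    later by the trained network model)."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = list(images)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * first_fraction)
    return shuffled[:k], shuffled[k:]
```

With the default fraction of 0.5, the two halves together cover the initial data set exactly once.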
It should be noted that the preset network model mainly includes a feature extraction network and a transmission network: the feature extraction network is used to extract features of the images in the training set, mainly the color, shape, texture, and spatial-distribution features of the target objects in the images, and the transmission network is a Convolutional Neural Network (CNN) for target detection, which can locate the target objects in the images and detect their subcategories.
When the preset network model is trained according to the training set, first, the images in the training set are input into the feature extraction network and features are extracted from the images; then the extracted image features are input into the transmission network, the output of the transmission network is fitted to the labeled class labels of the images in the training set, and the network is trained end to end by back propagation, thereby obtaining the trained network model.
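The end-to-end fitting step can be illustrated with a deliberately tiny stand-in model. The feature extractor and the one-parameter logistic head below are hypothetical simplifications for illustration only; the actual embodiment uses a feature extraction network and a convolutional transmission network:

```python
import math

def extract_features(image):
    # Stand-in for the feature extraction network: the mean pixel value.
    # A real network would produce a deep feature vector capturing color,
    # shape, texture, and spatial distribution.
    return sum(image) / len(image)

def train_end_to_end(train_set, epochs=200, lr=0.5):
    """Fit a one-parameter logistic head to the labeled training set by
    back propagation, mirroring the end-to-end training described above."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for image, label in train_set:
            x = extract_features(image)
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # head output
            grad = p - label                          # d(cross-entropy)/d(logit)
            w -= lr * grad * x                        # back-propagate to the weights
            b -= lr * grad
    return w, b

def predict(params, image):
    w, b = params
    p = 1.0 / (1.0 + math.exp(-(w * extract_features(image) + b)))
    return 1 if p >= 0.5 else 0
```

After training on a small separable set, the fitted head reproduces the class labels of the training images.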
When the trained network model is optimized according to the test set, the images in the test set are input into the trained network model to judge whether the network model performs well. The quality of the network model may be determined by two evaluation indices, namely detection speed and detection accuracy: after the images in the test set are input into the trained network model, the detection speed and detection accuracy of the current network model on the test set are obtained, and if both evaluation indices are greater than their respective thresholds, the performance of the current network model is considered to meet the requirements, that is, the network model is judged to be good. If the network model is judged to be not good, the network structure, network parameters, loss function and the like need to be adjusted, and the training and optimization steps are repeated until the performance of the trained network model meets the requirements, thereby obtaining the optimized network model.
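The pass/fail decision on the two evaluation indices can be expressed as a simple check; the threshold values below are placeholders, not values specified by the embodiment:

```python
def model_is_acceptable(detection_speed_fps, detection_accuracy,
                        min_fps=20.0, min_accuracy=0.9):
    """Return True when both evaluation indices exceed their thresholds,
    i.e. the trained network model is judged to be good."""
    return detection_speed_fps > min_fps and detection_accuracy > min_accuracy
```

If either index falls at or below its threshold, the model is rejected and the structure, parameters, or loss function are adjusted before retraining.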
As an improvement of the above solution, after the subclassing the disassembled parts of each target object in each image in the second data set according to the optimized network model, the method further includes:
inspecting the labeling result of each image in the second data set, and correcting the image labeled with the defect;
then, the obtaining the target component detection data set according to the labeled first data set and the labeled second data set specifically includes:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
Specifically, with reference to the foregoing embodiment, after the labeled second data set is obtained, the labeling result corresponding to each image in the second data set may be further inspected to determine whether the labeling result of each target object is qualified; if not, that is, if the corresponding image has a labeling defect, the image is corrected, and the inspected and corrected second data set is obtained accordingly. At this point, all labeling work on the initial data set is finished, and the labeled first data set and the inspected and corrected second data set are sorted and combined to obtain the target component detection data set.
It should be noted that images with labeling defects generally occur among the images whose labeling information was generated by the network model; such labeling information may not be very accurate, which is related to the specific network structure. For example, an improperly chosen center coordinate, an inaccurate width or height, or an improper rotation angle of the rotating bounding box all cause the rotating bounding box to locate the target object inaccurately, and the corresponding image then has a labeling defect, so the center coordinates, width, height and rotation angle of the rotating bounding box need to be corrected. For example, manual correction may be performed, the specific operation being: the position, size and rotation angle of the rotating bounding box are manually adjusted so that the rotating bounding box exactly encloses the complete disassembled part without redundant background information.
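One way to make a rotating bounding box fit a part tightly, given a corrected rotation angle, is to project the part's pixel coordinates onto the box axes and recompute the center, width and height. The following is an illustrative sketch of such a correction; the function and its interface are assumptions, not part of the embodiment:

```python
import math

def refit_rotating_box(points, theta):
    """Recompute (x, y, w, h) of a rotating bounding box at angle theta
    (radians, clockwise from the x axis) so that it tightly encloses
    the given part pixels, removing redundant background."""
    c, s = math.cos(theta), math.sin(theta)
    # Project every pixel onto the rotated box axes.
    us = [px * c + py * s for px, py in points]
    vs = [-px * s + py * c for px, py in points]
    u_min, u_max = min(us), max(us)
    v_min, v_max = min(vs), max(vs)
    u_c, v_c = (u_min + u_max) / 2, (v_min + v_max) / 2
    # Rotate the box center back into image coordinates.
    x = u_c * c - v_c * s
    y = u_c * s + v_c * c
    return x, y, u_max - u_min, v_max - v_min
```

For an unrotated part occupying pixels (0, 0) through (4, 2), this recovers a box centered at (2, 1) with width 4 and height 2.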
An embodiment of the present invention further provides a target detection method, which is a flowchart of a preferred embodiment of the target detection method provided by the present invention, as shown in fig. 2, and the method includes steps S21 to S22:
step S21, training a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by using the method for constructing a target component detection data set according to any one of the above embodiments, and the target detection model is the optimized network model according to the above embodiment;
and step S22, carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
Specifically, with reference to the above embodiments, in the embodiment of the present invention, a target component detection data set and a target detection model are preset, wherein the target component detection data set is obtained by the method for constructing a target component detection data set according to any of the above embodiments, and the target detection model is the optimized network model obtained in the above embodiment after training and optimization on the first data set. On this basis, the preset target detection model is trained according to the preset target component detection data set to obtain a trained target detection model, which is then applied in an actual environment: an image to be detected is obtained by an image acquisition device and input into the trained target detection model, and target detection is performed on the image to be detected by the trained target detection model, thereby completing the detection of each component of each target object contained in the image to be detected and obtaining the component detection result of the target object.
It should be noted that the target detection model itself is the optimized network model, and the embodiment of the present invention trains the optimized network model again with the target component detection data set. The specific training steps include: (1) dividing the target component detection data set into a training set and a test set; (2) inputting the images in the training set into the feature extraction network to extract features from the images, and then inputting the extracted image features into the transmission network; (3) fitting the output of the transmission network to the labeled class labels of the images in the training set, and training the network end to end by back propagation to obtain a trained network model; (4) inputting the images in the test set into the trained network model and judging whether the network model performs well, where the quality of the network model may be determined by two evaluation indices, namely detection speed and detection accuracy, and if both indices are greater than their respective thresholds, the performance of the current network model is considered to meet the requirements, that is, the network model is judged to be good; if the network model is judged to be not good, the network structure, network parameters, loss function and the like need to be adjusted, and the training and optimization steps are repeated until the performance of the trained network model meets the requirements, thereby obtaining the trained target detection model.
The target detection method provided by the embodiment of the invention constructs a target component detection data set suitable for robot grasping, which mainly comprises images of objects for a robot to grasp; the target objects contained in the images in the data set are disassembled into components and labeled, and a rotating bounding box is used in the labeling process, so that the rotating bounding box exactly encloses a complete disassembled part without redundant background information. Compared with the prior art in which a horizontal bounding box is used for labeling, more and more accurate object information is obtained, such as the placement angle of the target object and the grasping position of the target object, and the expression of the spatial structure of each component of the object is enhanced, which solves the problems that, in the task of a robot grasping an object, the grasping position is difficult to locate and the angle of the object cannot be determined, so that the grasping success rate of the robot is improved.
An embodiment of the present invention further provides a device for constructing a target component detection data set, which is used to implement the method for constructing a target component detection data set according to any of the above embodiments, and is shown in fig. 3, which is a block diagram of a preferred embodiment of the device for constructing a target component detection data set according to the present invention, and the device includes:
a first group image obtaining module 11, configured to obtain a first group of images from a preset public data set according to a preset object category, where the first group of images includes at least one image, and each image includes a target object corresponding to at least one object category;
a second group image obtaining module 12, configured to obtain a second group of images through image acquisition according to the object type, where the second group of images includes at least one image, and each image includes a target object corresponding to at least one object type;
an image component disassembling module 13, configured to perform component disassembling on each target object in each of the first group of images and the second group of images according to a preset component disassembling standard, so as to obtain a disassembled first group of images and a disassembled second group of images;
an initial data set constructing module 14, configured to construct an initial data set according to the disassembled first group of images and the disassembled second group of images;
and a target component detection data set construction module 15, configured to perform subclass labeling on the disassembled component of each target object in each image in the initial data set, so as to obtain a target component detection data set.
Preferably, the target component detection data set construction module 15 specifically includes an image component labeling unit, configured to perform subclass labeling on a jth dismantling component of an ith target object through the following steps:
generating a rectangular boundary frame corresponding to the jth disassembling component based on the target image of the ith target object;
and adjusting the rectangular boundary frame according to the position of the jth dismantling component in the target image to obtain a rotating boundary frame corresponding to the jth dismantling component, wherein in an image coordinate system established based on the target image, the rotating boundary frame is represented by (x, y, w, h, θ), (x, y) represents the center point coordinates of the rotating boundary frame, (w, h) represents the width and height of the rotating boundary frame, θ represents the rotation angle of the rotating boundary frame relative to the x axis, and 0 ≤ θ < π.
Preferably, the second group of image acquisition modules 12 specifically includes:
the image quantity acquiring unit is used for acquiring the corresponding image quantity of each object type in the first group of images;
and the second group of image acquisition units are used for acquiring a second group of images through image acquisition aiming at the object class of which the number of the images is less than a preset number threshold.
Preferably, the target component detection data set construction module 15 further includes:
the first data set labeling unit is used for dividing the initial data set into a first data set and a second data set and carrying out subclass labeling on the disassembling component of each target object in each image in the first data set;
the network model training unit is used for dividing the marked first data set into a training set and a test set, training a preset network model according to the training set, and optimizing the trained network model according to the test set;
the second data set labeling unit is used for carrying out subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and the target component detection data set acquisition unit is used for acquiring the target component detection data set according to the labeled first data set and the labeled second data set.
Preferably, the target component detection data set construction module 15 further includes:
the annotation correction unit is used for detecting the annotation result of each image in the second data set and correcting the image with the annotation defect;
then, the target component detection data set acquisition unit is specifically configured to:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
It should be noted that, the apparatus for constructing a target component detection data set according to an embodiment of the present invention can implement all the processes of the method for constructing a target component detection data set according to any one of the above embodiments, and the functions and achieved technical effects of each module and unit in the apparatus are respectively the same as those of the method for constructing a target component detection data set according to the above embodiment, and are not described herein again.
An embodiment of the present invention further provides a target detection apparatus, configured to implement the target detection method described in the foregoing embodiment, and as shown in fig. 4, the target detection apparatus is a block diagram of a preferred embodiment of the target detection apparatus provided in the present invention, where the apparatus includes:
a target detection model training module 21, configured to train a preset target detection model according to a preset target component detection data set, where the target component detection data set is obtained by using the method for constructing the target component detection data set according to any one of the above embodiments, and the target detection model is the optimized network model according to the above embodiment;
and the target object detection module 22 is configured to perform target detection on the image to be detected according to the trained target detection model, so as to obtain a component detection result of the target object.
It should be noted that, the target detection apparatus provided in the embodiment of the present invention can implement all the processes of the target detection method described in the above embodiment, and the functions and implemented technical effects of each module in the apparatus are respectively the same as those of the target detection method described in the above embodiment, and are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein the computer program, when running, controls an apparatus on which the computer-readable storage medium is located to execute the method for constructing a target component detection data set according to any one of the above embodiments, or the method for detecting a target according to the above embodiment.
An embodiment of the present invention further provides a terminal device, as shown in fig. 5, which is a block diagram of a preferred embodiment of the terminal device provided in the present invention, where the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when the computer program is executed, the processor 10 implements the method for constructing the target component detection data set according to any one of the above embodiments, or the target detection method according to the above embodiment.
Preferably, the computer program can be divided into one or more modules/units (e.g., computer program 1, computer program 2, and so on), which are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
The processor 10 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; the general purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the terminal device and uses various interfaces and lines to connect the various parts of the terminal device.
The memory 20 mainly includes a program storage area that may store an operating system, an application program required for at least one function, and the like, and a data storage area that may store related data and the like. In addition, the memory 20 may be a high speed random access memory, may also be a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, or the memory 20 may also be other volatile solid state memory devices.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory, and those skilled in the art will understand that the structural block diagram of fig. 5 is only an example of the terminal device and does not constitute a limitation to the terminal device, and may include more or less components than those shown, or combine some components, or different components.
To sum up, the embodiments of the present invention provide a method for constructing a target component detection data set, a target detection method, an apparatus, a computer-readable storage medium, and a terminal device. A target component detection data set suitable for robot grasping is constructed, which mainly comprises images of objects for a robot to grasp; the target objects contained in the images in the data set are disassembled into components and labeled, and a rotating bounding box is used in the labeling process, so that the rotating bounding box exactly encloses a complete disassembled part without redundant background information. Compared with the prior art in which a horizontal bounding box is used for labeling, more and more accurate object information is obtained, such as the placement angle of the target object and the grasping position of the target object, and the expression of the spatial structure of each component of the object is enhanced, which solves the problems that, in the task of a robot grasping an object, the grasping position is difficult to locate and the angle of the object cannot be determined, so that the grasping success rate of the robot is improved.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method of constructing a target component inspection dataset, comprising:
acquiring a first group of images from a preset public data set according to a preset object type, wherein the first group of images comprises at least one image, and each image comprises a target object corresponding to at least one object type;
acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
performing component disassembly on each target object in each image in the first group of images and the second group of images according to a preset component disassembly standard to obtain a disassembled first group of images and a disassembled second group of images;
constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and performing subclass labeling on the disassembled part of each target object in each image in the initial data set to obtain a target part detection data set.
2. The method of constructing a target component detection data set according to claim 1, wherein the method subclasses the jth disassembled component of the ith target object by:
generating a rectangular boundary frame corresponding to the jth disassembling component based on the target image of the ith target object;
and adjusting the rectangular boundary frame according to the position of the jth dismantling component in the target image to obtain a rotating boundary frame corresponding to the jth dismantling component, wherein in an image coordinate system established based on the target image, the rotating boundary frame is represented by (x, y, w, h, θ), (x, y) represents the center point coordinates of the rotating boundary frame, (w, h) represents the width and height of the rotating boundary frame, θ represents the rotation angle of the rotating boundary frame relative to the x axis, and 0 ≤ θ < π.
3. The method for constructing a target component inspection dataset according to claim 1, wherein said acquiring a second set of images by image acquisition based on said object class comprises:
acquiring the number of images corresponding to each object type in the first group of images;
and acquiring a second group of images through image acquisition aiming at the object categories with the number of the images smaller than a preset number threshold.
4. The method according to claim 1, wherein the sub-class labeling of the disassembled parts of each target object in each image in the initial data set to obtain the target part detection data set specifically comprises:
dividing the initial data set into a first data set and a second data set, and carrying out subclass labeling on a disassembled part of each target object in each image in the first data set;
dividing the labeled first data set into a training set and a testing set, training a preset network model according to the training set, and optimizing the trained network model according to the testing set;
performing subclass labeling on the disassembled part of each target object in each image in the second data set according to the optimized network model;
and obtaining the target component detection data set according to the labeled first data set and the labeled second data set.
5. The method of constructing a target component detection data set of claim 4, wherein after subclassing the disassembled component of each target object in each image in the second data set according to the optimized network model, the method further comprises:
inspecting the labeling result of each image in the second data set, and correcting the image labeled with the defect;
then, the obtaining the target component detection data set according to the labeled first data set and the labeled second data set specifically includes:
and obtaining the target component detection data set according to the marked first data set and the inspection corrected second data set.
6. A method of object detection, comprising:
training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by adopting the construction method of the target component detection data set according to any one of claims 1-5, and the target detection model is the optimized network model according to claim 4;
and carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
7. A target component detection data set construction device for implementing the target component detection data set construction method according to any one of claims 1 to 5, the device comprising:
the first group of image acquisition modules are used for acquiring a first group of images from a preset public data set according to preset object categories, wherein the first group of images comprise at least one image, and each image comprises a target object corresponding to at least one object category;
the second group of image acquisition module is used for acquiring a second group of images through image acquisition according to the object types, wherein the second group of images comprise at least one image, and each image comprises a target object corresponding to at least one object type;
the image component disassembling module is used for performing component disassembling on each target object in each image in the first group of images and the second group of images according to a preset component disassembling standard to obtain a disassembled first group of images and a disassembled second group of images;
the initial data set construction module is used for constructing an initial data set according to the disassembled first group of images and the disassembled second group of images;
and the target component detection data set construction module is used for carrying out subclass labeling on the disassembled component of each target object in each image in the initial data set to obtain a target component detection data set.
8. An object detection apparatus for implementing the object detection method according to claim 6, the apparatus comprising:
the target detection model training module is used for training a preset target detection model according to a preset target component detection data set, wherein the target component detection data set is obtained by adopting the construction method of the target component detection data set according to any one of claims 1 to 5, and the target detection model is the optimized network model according to claim 4;
and the target object detection module is used for carrying out target detection on the image to be detected according to the trained target detection model to obtain a component detection result of the target object.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of constructing a target component detection data set according to any one of claims 1 to 5, or the target detection method according to claim 6.
10. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method of constructing a target component detection data set according to any one of claims 1 to 5 or the target detection method according to claim 6.
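The train-then-detect flow of the apparatus claims can be sketched with a toy stand-in. This is an illustrative assumption, not the claimed detector: "training" here only memorises the component-subclass vocabulary of the data set, and "detection" keeps candidate detections whose subclass was seen during training, whereas the actual apparatus trains an optimized network model on image pixels.

```python
class ComponentDetector:
    """Toy stand-in for the claim-8 modules: a training step over a component
    detection data set, then a detection step on a new image's candidates."""

    def __init__(self):
        self.known_subclasses = set()

    def train(self, dataset):
        # dataset: list of images; each image is a list of (subclass, bbox) pairs.
        for annotations in dataset:
            for subclass, _bbox in annotations:
                self.known_subclasses.add(subclass)
        return self

    def detect(self, candidates):
        # candidates: proposal detections as (subclass, bbox) pairs;
        # keep only those whose subclass appeared in the training data set.
        return [c for c in candidates if c[0] in self.known_subclasses]
```

Chaining the two modules then reads `ComponentDetector().train(dataset).detect(candidates)`, mirroring the training module feeding the detection module.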
CN202111683433.8A 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment Pending CN114373075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111683433.8A CN114373075A (en) 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment

Publications (1)

Publication Number Publication Date
CN114373075A (en) 2022-04-19

Family

ID=81141168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111683433.8A Pending CN114373075A (en) 2021-12-31 2021-12-31 Target component detection data set construction method, detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN114373075A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858569A (en) * 2019-03-07 2019-06-07 中国科学院自动化研究所 Multi-tag object detecting method, system, device based on target detection network
CN111222395A (en) * 2019-10-21 2020-06-02 杭州飞步科技有限公司 Target detection method and device and electronic equipment
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN112633355A (en) * 2020-12-18 2021-04-09 北京迈格威科技有限公司 Image data processing method and device and target detection model training method and device
CN113095434A (en) * 2021-04-27 2021-07-09 深圳市商汤科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113139096A (en) * 2021-05-10 2021-07-20 中国科学院深圳先进技术研究院 Video data set labeling method and device
CN113408566A (en) * 2020-11-17 2021-09-17 腾讯科技(深圳)有限公司 Target detection method and related equipment

Similar Documents

Publication Publication Date Title
US20200210702A1 (en) Apparatus and method for image processing to calculate likelihood of image of target object detected from input image
CN103439348B (en) Remote controller key defect detection method based on difference image method
US8542912B2 (en) Determining the uniqueness of a model for machine vision
Liang et al. In-line inspection solution for codes on complex backgrounds for the plastic container industry
CN113989944B (en) Operation action recognition method, device and storage medium
CN111368682B (en) Method and system for detecting and identifying station caption based on master RCNN
KR20210020065A (en) Systems and methods for finding and classifying patterns in images with vision systems
CN112613579A (en) Model training method and evaluation method for human face or human head image quality and selection method for high-quality image
US8542905B2 (en) Determining the uniqueness of a model for machine vision
US10937150B2 (en) Systems and methods of feature correspondence analysis
CN110956656A (en) Spindle positioning method based on depth target detection
CN114219753A (en) Power equipment surface defect detection method based on deep learning and terminal
CN111104942B (en) Template matching network training method, recognition method and device
CN113743434A (en) Training method of target detection network, image augmentation method and device
CN112434581A (en) Outdoor target color identification method and system, electronic device and storage medium
CN111079752A (en) Method and device for identifying circuit breaker in infrared image and readable storage medium
CN114373075A (en) Target component detection data set construction method, detection method, device and equipment
CN113420839B (en) Semi-automatic labeling method and segmentation positioning system for stacking planar target objects
CN114998357A (en) Industrial detection method, system, terminal and medium based on multi-information analysis
CN107491780A (en) A kind of anti-down hanging method of calligraphy based on SIFT
WO2012092132A2 (en) Determining the uniqueness of a model for machine vision
WO2021192024A1 (en) Work management device and work status determination method
Mentari et al. Detecting Objects Using Haar Cascade for Human Counting Implemented in OpenMV
CN105184244A (en) Video face detection method and apparatus
CN110738260A (en) Method, device and equipment for detecting placement of space boxes of retail stores of types

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220419