CN112966777B - Semi-automatic labeling method and system based on human-computer interaction - Google Patents

Semi-automatic labeling method and system based on human-computer interaction

Info

Publication number: CN112966777B
Application number: CN202110328124.2A
Authority: CN (China)
Prior art keywords: image, Gaussian, generating, semi-automatic labeling
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN112966777A
Inventors: 张新钰, 李骏, 李志伟, 刘宇红, 王力, 卢一倩
Current and original assignee: Tsinghua University
Application filed by Tsinghua University; priority to CN202110328124.2A; application granted

Classifications

    • G06F 18/40: Pattern recognition; software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G06F 18/241: Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods


Abstract

The invention discloses a semi-automatic labeling method and system based on human-computer interaction. The method comprises: fusing an RGB image to be labeled with a generated first Gaussian heat map; preprocessing the fused image; inputting the preprocessed fused image into a pre-established and trained semi-automatic labeling model, which labels the RGB image with a plurality of prediction boxes; and, when a prediction box does not meet the requirements, correcting it by generating a second Gaussian heat map. The method uses the Gaussian heat map as prior information for target detection, achieving the intended effect of semi-automatic labeling; it overcomes the time-consuming and labor-intensive drawbacks of manual labeling and improves labeling precision.

Description

Semi-automatic labeling method and system based on human-computer interaction
Technical Field
The invention belongs to the field of target detection. It particularly relates to a method that achieves the labeling purpose by using extra clicks as prior information fused with an RGB (red, green, blue) image, and more particularly to a semi-automatic labeling method and system based on human-computer interaction.
Background
With the rapid development of technologies such as the Internet, machine learning, big data and cloud computing, information data keep growing exponentially. Against this big-data background, computer vision has matured and spawned diverse industrial applications. Labeling data sets is an essential step for deep learning, but labeling data is very tedious work, and semi-automatic labeling can reduce the workload. Although open-source semi-automatic labeling tools exist, their usefulness presupposes a high-precision model: if the detection results are inaccurate, the workload actually increases and the tool becomes unusable. Moreover, such tools may produce inaccurate detections or miss targets entirely, which then requires manual labeling to fix. Since model efficiency and precision are difficult to balance, fully manual labeling is currently still the mainstream approach.
However, manually labeling data sets has several disadvantages:
1. High labor cost: target detection algorithms need massive labeled samples, and today's large-scale labeling tasks depend on manual work ("as much manual work, so much intelligence"), so producing a labeled data set is expensive.
2. Quality of manual labeling is hard to guarantee: different annotators and annotation teams judge by inconsistent scales; the labeling task is strongly affected by the subjectivity of annotators and reviewers, which introduces labeling errors and makes data consistency hard to guarantee.
3. High labeling threshold for target detection data sets: compared with the massive demand for labeled data, annotators with domain expertise are too scarce, so the entry threshold is high and labeling scales are difficult to keep consistent.
Disclosure of Invention
The invention aims to overcome the above technical defects and provides a method that realizes semi-automatic labeling by fusing simulated clicks into the RGB channels, built on the Faster R-CNN model. When Faster R-CNN fails to detect a target, the method supplies the picture with position information for detecting it, addressing the false and missed detections that Faster R-CNN suffers under conditions such as changing illumination, occlusion and shadows. By constraining prediction-box generation with the position information provided in advance, the probability of generating false detection boxes is suppressed, the human-computer-interaction-based semi-automatic labeling process is effectively realized, and the detection effect and robustness of the model are improved.
In order to achieve the above object, embodiment 1 of the present invention provides a semi-automatic labeling method based on human-computer interaction, the method comprising:
fusing an RGB image to be labeled with a generated first Gaussian heat map;
preprocessing the fused image;
inputting the preprocessed fused image into a pre-established and trained semi-automatic labeling model, and labeling the RGB image to be labeled with a plurality of prediction boxes;
and, when a prediction box does not meet the requirements, correcting it by generating a second Gaussian heat map.
As an improvement of the above method, fusing the RGB image to be labeled with the generated first Gaussian heat map specifically comprises:
determining the region where a target is located on the RGB image using the upper-left and lower-right coordinates in the xml file corresponding to the image, setting all pixel points inside the target region to 255 and all pixel points outside it to 0, thereby generating a Mask image;
randomly generating a plurality of simulation points within the central range of the Mask image, thereby generating the first Gaussian heat map heat1:

$$\mathrm{heat}_1(x,y)=\sum_{m=1}^{M}\exp\!\left(-\frac{(x-x_{1,m})^2+(y-y_{1,m})^2}{2\sigma^2}\right),\qquad (x-x_{1,m})^2+(y-y_{1,m})^2\le r_1^2$$

where (x, y) are the coordinates of a point on the picture, (x_{1,m}, y_{1,m}) are the coordinates of the m-th simulation point, and M is the number of simulation points; σ = 10 and r_1 = 4, r_1 representing the radiation range of a simulation point;
performing an add operation on the RGB image to be labeled and the generated first Gaussian heat map heat1.
As an improvement of the above method, the semi-automatic labeling model is a Resnet50+FPN structure with an added attention mechanism.
As an improvement of the above method, the method further comprises training the semi-automatic labeling model, which specifically comprises:
establishing a data set for training the model; the data set comprises a label file set and an image file set, where the label file set contains a number of xml files, the image file set contains a number of RGB images, and the xml files correspond one-to-one with the RGB images;
traversing each RGB image in the data set, determining the region where a target is located using the upper-left and lower-right coordinates in the corresponding xml file, setting all pixel points inside the target region to 255 and all pixel points outside it to 0, thereby generating a Mask image, and then generating a first Gaussian heat map from the Mask image;
fusing the three-channel RGB image with the first Gaussian heat map and preprocessing the fused image as the input of the semi-automatic labeling model;
and setting the encoder and decoder sizes, batch size, number of training epochs and per-epoch learning rate for training the semi-automatic labeling model, and training the model.
As an improvement of the above method, when a prediction box does not meet the requirements, the correction operation performed on it by generating a second Gaussian heat map specifically comprises:
step S1) computing the intersection-over-union of each prediction box output by the model with its GT box to obtain the IOU value; if the IOU is larger than the threshold, the prediction box is not corrected; otherwise it is processed and step S2) is entered;
step S2) computing the deviation between the prediction box to be corrected and the coordinates in the xml file, and randomly generating a number of simulation points according to the deviation, thereby generating the second Gaussian heat map heat2:

$$\mathrm{heat}_2(x,y)=\sum_{n=1}^{N}\exp\!\left(-\frac{(x-x_{2,n})^2+(y-y_{2,n})^2}{2\sigma^2}\right),\qquad (x-x_{2,n})^2+(y-y_{2,n})^2\le r_2^2$$

where (x, y) are the coordinates of a point on the image, (x_{2,n}, y_{2,n}) is the n-th simulation point, and N is the number of simulation points; σ = 10 and r_2 = 6;
step S3) stitching the three-channel RGB image with the single channel of the generated second Gaussian heat map, inputting the result into the semi-automatic labeling model again, outputting the image with labeled prediction boxes, and returning to step S1) until the intersection-over-union of every prediction box with its GT box is no less than the threshold.
Embodiment 2 of the invention provides a semi-automatic labeling system based on human-computer interaction, which comprises: a trained semi-automatic labeling model, a fusion module, a preprocessing module, a labeling module and a correction module;
the fusion module is used for fusing the RGB image to be labeled with the generated first Gaussian heat map;
the preprocessing module is used for preprocessing the fused image;
the labeling module is used for inputting the preprocessed fused image into the trained semi-automatic labeling model and labeling the RGB image with a plurality of prediction boxes;
and the correction module is used for correcting any prediction box that does not meet the requirements by generating a second Gaussian heat map.
Compared with the prior art, the invention has the following advantages:
1. The method first uses extra clicks as prior information to detect targets, achieving the intended effect of semi-automatic labeling; it then reduces the false detection rate by penalizing falsely detected targets, further improving the detection effect.
2. The semi-automatic labeling model not only greatly improves the precision of the original model but also transfers well across different data sets; it overcomes the time-consuming and labor-intensive drawbacks of manual labeling, since the annotator only needs to supply prior information about a target's position and the model actively strengthens detection near that position.
3. The predicted position may occasionally deviate slightly, which would otherwise require manual correction, and manual correction is costly and slow; the invention therefore adds a correction stage that fixes targets with unsatisfactory predictions so that the final predictions are correct.
4. The invention improves the original structure by adding a correction operation on top of interactive detection, thereby achieving the correction purpose.
Drawings
FIG. 1 is a schematic diagram of a semi-automatic labeling method based on human-computer interaction according to embodiment 1 of the present invention;
FIG. 2 is a diagram of a semi-automatic labeling model of the present invention;
FIG. 3 is a schematic diagram of the SKNet part of the semi-automatic labeling model.
the specific implementation mode is as follows:
in order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Before describing the embodiments of the present invention, the related terms related to the embodiments of the present invention are first explained as follows:
RGB image: a color image acquired by a monocular camera; it is a three-channel image.
Labeling: the class labels used for supervised training of the target detection neural network; each target in the color image is labeled with its class.
As shown in fig. 1, embodiment 1 of the present invention provides a semi-automatic labeling method based on human-computer interaction, which includes the following specific implementation steps:
step 1) fusing the RGB image to be labeled with a Gaussian heat map generated from simulated clicks; specifically:
step 101) generating a Mask image from the xml file corresponding to the image to be labeled;
the region where a target is located is determined on the RGB image using the upper-left and lower-right coordinates in the xml file; all pixel points inside the target region are set to 255 (any number larger than 0 works) and all pixel points outside it are set to 0, so that the target region can be separated from the background region; this yields the Mask image, which is a gray-level image.
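A minimal sketch of step 101), assuming the label files follow the Pascal VOC xml layout (object/bndbox elements with xmin, ymin, xmax, ymax fields; the patent only speaks of upper-left and lower-right coordinates, so these field names are an assumption):

```python
import xml.etree.ElementTree as ET
import numpy as np

def make_mask(xml_path, height, width):
    """Build a single-channel Mask image from a VOC-style xml label file:
    pixels inside every target box become 255, the background stays 0."""
    mask = np.zeros((height, width), dtype=np.uint8)
    root = ET.parse(xml_path).getroot()
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin = int(float(box.find("xmin").text))
        ymin = int(float(box.find("ymin").text))
        xmax = int(float(box.find("xmax").text))
        ymax = int(float(box.find("ymax").text))
        mask[ymin:ymax, xmin:xmax] = 255  # any value > 0 also works
    return mask
```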
step 102) generating the first Gaussian heat map from the Mask image;
a point is randomly generated within the central range of the Mask image and a Gaussian heat map is generated around it; if the simulated click consists of several points, all the generated Gaussian heat maps are added together to form the first Gaussian heat map. The generated first Gaussian heat map is fused with the RGB image to be labeled as prior information, and the penalty coefficient for falsely detected targets is raised to reduce the false detection rate.
The coordinates of a simulated click are the center point of its Gaussian heat map; because each picture may contain more than one target, the resulting first Gaussian heat map combines the per-point maps additively.
The first Gaussian heat map heat1 is generated by:

$$\mathrm{heat}_1(x,y)=\sum_{m=1}^{M}\exp\!\left(-\frac{(x-x_{1,m})^2+(y-y_{1,m})^2}{2\sigma^2}\right),\qquad (x-x_{1,m})^2+(y-y_{1,m})^2\le r_1^2$$

where (x, y) are the coordinates of a point on the picture, (x_{1,m}, y_{1,m}) are the coordinates of the point generated by the m-th simulated click, and M is the number of simulation points; σ = 10 and r_1 = 4, r_1 representing the radiation range of a simulated click.
The above process yields a matrix, which must be converted into the same format as the picture. A matrix of the same size as the picture is created and initialized with dimensions (h, w, 1), where h is the image height, w is the image width and 1 is the number of channels. Since the result at this point is an array while the required data type is a tensor, the array is finally converted into a tensor.
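The heat-map construction and the array-to-tensor conversion described above might look as follows; truncating each Gaussian at the radiation range r_1 is one plausible reading of the formula, not something the text spells out:

```python
import numpy as np
import torch

def gaussian_heatmap(points, height, width, sigma=10.0, radius=4):
    """Sum one truncated Gaussian per simulated click, then convert the
    (h, w, 1) array into a tensor, mirroring the steps described above."""
    heat = np.zeros((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for (px, py) in points:
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        g = np.exp(-d2 / (2.0 * sigma ** 2))
        g[d2 > radius ** 2] = 0.0          # zero outside the radiation range
        heat += g                          # additive fusion of per-point maps
    heat = heat[..., None]                 # (h, w, 1), a single channel
    return torch.from_numpy(heat)          # array -> tensor, as the text requires
```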
Step 103) fusing the RGB image to be annotated with the generated first Gaussian heat map;
first, it is checked whether the RGB image and the Gaussian heat map generated above have the same dimensionality; if they do, an add operation is performed on the RGB image and the first Gaussian heat map.
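A sketch of one plausible reading of the add operation, broadcasting the single heat-map channel over the three RGB channels after the dimension check:

```python
def fuse(rgb, heat):
    """Add the heat map onto the RGB image once their spatial dims match."""
    # rgb: (h, w, 3) tensor, heat: (h, w, 1) tensor
    assert rgb.shape[:2] == heat.shape[:2], "image and heat map dims differ"
    return rgb + heat  # the 'add' operation; broadcasts over the channel axis
```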
Step 2) preprocessing the fused image: whitening, flipping, cropping and similar operations are applied to the image;
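One possible realization of this preprocessing with torchvision; none of the parameter values below are given in the patent, they are placeholders:

```python
import torchvision.transforms as T

# Expects a float (3, H, W) tensor; whitening is done as channel normalization.
preprocess = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # flipping
    T.RandomCrop(600),               # cropping (assumes input >= 600x600)
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```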
step 3) constructing and training the semi-automatic labeling model, inputting the preprocessed fused image into it, and labeling prediction boxes on the fused image;
step 301) constructing the semi-automatic labeling model;
as shown in fig. 2, the basic backbone of the semi-automatic labeling model adopts a supervised model: the backbone network is Resnet50, optimized with an FPN structure. In Faster R-CNN, the RPN works on the features of the last layer only: those features are passed through a 3x3 convolution to obtain a 256-channel convolutional layer, followed by two 1x1 convolutions that produce the class scores and the box regression results. The RPN sub-network behind the feature layer is referred to here as the network header. At each point of the feature layer, 9 boxes are preset in the manner of anchors, covering different scales and different aspect ratios.
The FPN improves on the RPN by applying a network header to every P layer. Because each P layer carries a different scale relative to the original picture, the scale information of the original RPN is separated out so that each P layer handles only a single scale. Specifically, anchors of the five scales {32², 64², 128², 256², 512²} correspond to the five feature layers {P2, P3, P4, P5, P6}, respectively. Each feature layer handles candidate boxes of the three aspect ratios 1:1, 1:2 and 2:1. P6, obtained by down-sampling P5, is designed specifically for the RPN to handle candidate boxes of size 512. The parameters of the five network headers are shared.
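This anchor assignment can be written down with torchvision's AnchorGenerator, one size per P layer and the three shared aspect ratios:

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Five anchor areas {32**2 ... 512**2} assigned one per level to P2..P6,
# each level using the aspect ratios 1:1, 1:2 and 2:1.
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((1.0, 0.5, 2.0),) * 5,
)
```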
In order to further improve the model accuracy, the invention adds an attention mechanism in the front part of the backbone network, and the attention mechanism adopts a sknet method, as shown in fig. 3;
specifically, Selective Kernel Networks, which initiate the concept of dynamically modulating their own receptive field from cortical neurons according to different stimuli, are products that combine the ideas of SE operator, Merge-and-Run maps, and anchorage on entrapment block. In terms of design concept, simple is compared, namely Selective Kernel modification is carried out on all convolution kernels larger than 1, the convenience of smaller theoretical parameters and floats brought by group/depthwise convolution is fully utilized, and the design of adding multiple paths and dynamic selection does not bring large overhead (but the actual acceleration optimization of group/depthwise is not particularly good at present, so that the actual speed is slightly slow), and table 1 can be specifically referred. The design makes it very easy to select Kernel for any network, and only the convolution larger than 1 needs to be switched.
TABLE 1 (reproduced in the original as an image; its contents are not recoverable here)
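For illustration, a compact Selective Kernel unit in the spirit of SKNet; the patent does not give the exact branch configuration, so the kernel sizes, group count and reduction ratio below are assumptions:

```python
import torch
import torch.nn as nn

class SKConv(nn.Module):
    """Compact Selective Kernel unit: two branches with different receptive
    fields, fused by a learned soft attention over kernels (a sketch only)."""
    def __init__(self, channels, reduction=16, groups=32):
        super().__init__()
        mid = max(channels // reduction, 32)
        self.branch3 = nn.Sequential(  # 3x3 receptive field
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(  # dilated 3x3, acts like a 5x5 kernel
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2,
                      groups=groups, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.squeeze = nn.Sequential(  # SE-style global descriptor
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.select = nn.Conv2d(mid, channels * 2, 1)  # per-branch attention logits

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        s = self.squeeze(u3 + u5)
        a = self.select(s).view(x.size(0), 2, -1, 1, 1).softmax(dim=1)
        return a[:, 0] * u3 + a[:, 1] * u5  # dynamic kernel selection

# channels must be divisible by `groups`, e.g. SKConv(256) for a ResNet stage.
```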
Step 302) training the semi-automatic labeling model;
the model takes its own input values as the supervised label values of the neural network; an exemplary training process of the semi-automatic labeling model is described below.
Step 302-1) building a data set for training the model; the data set comprises a label file set and an image file set, where the label file set contains a number of xml files, the image file set contains a number of RGB images, and the xml files correspond one-to-one with the RGB images.
Step 302-2) traversing each RGB image in the data set, determining the region where the target is located using the upper-left and lower-right coordinates in the xml file, then setting all pixel points inside the target region to 255 (any number larger than 0 works) and pixel points outside it to 0, so that the target region can be separated from the background; this yields a Mask image, from which the first Gaussian heat map is generated;
step 302-3) fusing the three-channel RGB image with the first Gaussian heat map and preprocessing the fused image as the input of the semi-automatic labeling model;
step 302-4) implementing a supervised neural network based on a tool pytorre, setting the size, batch processing number, training round times, learning rate of each round and other hyper-parameters needing to be artificially defined of an encoder and a decoder of the network, starting training, obtaining a prior frame with a larger IOU corresponding to a GT frame in a label coding process, calculating a prediction result which should be obtained by an anchor with a larger IOU (Intersection-over-Unit) corresponding to a real frame, and finding the anchor corresponding to each real frame of each picture for training, namely the prior frame with the largest IOU. For the predicted frame position obtained by network prediction not being the actual frame position, the position of the final frame can be obtained after the combination processing with the prior frame.
The decoding process requires two parameters, the convolved location and anchor. And then, calculating errors according to the loss function, updating network parameters by using a back propagation algorithm, and completing the first round of training until all rounds of training are completed. In order to accelerate the training speed of the network, the activation function of the network is selected as a ReLU function when the network is trained.
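A schematic PyTorch training step matching this description; the torchvision-style interface in which the model returns a dict of losses is an assumption, as are all hyper-parameter values:

```python
import torch

def train(model, train_loader, num_epochs=20, lr=1e-3):
    """Schematic training loop; assumes a torchvision-style detection model
    that returns a dict of losses when called with (images, targets)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(num_epochs):
        for fused_images, targets in train_loader:
            loss_dict = model(fused_images, targets)  # RPN + detection-head losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()                           # back-propagation update
            optimizer.step()
```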
The loss function of the RPN is:

$$L(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*)+\lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*)$$

where:
p_i: the probability that anchor i is predicted to be a target;
p_i^*: the ground-truth label, 0 for a negative label and 1 for a positive label;
t_i = (t_x, t_y, t_w, t_h): a vector of the 4 parameterized coordinates of the prediction box, with t_x the abscissa of the box center, t_y the ordinate of the box center, t_w the box width and t_h the box height;
t_i^*: the coordinate vector of the GT box corresponding to an anchor in which a target is detected;
L_cls: the classification loss, a cross-entropy (logarithmic) loss over the two classes (target and non-target):

$$L_{cls}(p_i,p_i^*)=-\log\bigl[p_i\,p_i^*+(1-p_i)(1-p_i^*)\bigr]$$

L_reg: the regression loss, computed with the smooth L1 function:

$$L_{reg}(t_i,t_i^*)=\mathrm{smooth}_{L1}(t_i-t_i^*),\qquad \mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^2,&|x|<1\\|x|-0.5,&\text{otherwise}\end{cases}$$

The factor p_i^* in front of L_reg means that only foreground anchors (p_i^* = 1) incur a regression loss; for background anchors (p_i^* = 0) there is none. The outputs of the classification layer and the regression layer consist of {p_i} and {t_i} respectively, normalized by N_cls and N_reg and balanced by the weight λ = 10. The classification term is normalized by the mini-batch size, i.e. N_cls = 256, and the regression term by the number of anchor locations, i.e. N_reg = 2400 (40 × 60), so the classification and regression terms carry roughly equal weight.
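The smooth L1 term and the two-part RPN loss above can be written directly; a sketch under the stated values λ = 10, N_cls = 256, N_reg = 2400:

```python
import torch
import torch.nn.functional as F

def smooth_l1(x):
    """smooth_L1(x) = 0.5 x**2 if |x| < 1, |x| - 0.5 otherwise."""
    return torch.where(x.abs() < 1, 0.5 * x ** 2, x.abs() - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Binary cross-entropy over the sampled anchors plus smooth-L1 regression
    counted only for foreground anchors (p_star is 0 or 1, p in (0, 1))."""
    cls = F.binary_cross_entropy(p, p_star, reduction="sum") / n_cls
    reg = (p_star.unsqueeze(-1) * smooth_l1(t - t_star)).sum() / n_reg
    return cls + lam * reg
```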
The loss function of Faster R-CNN is:

$$L(p,u,t^u,v)=L_{cls}(p,u)+\lambda\,[u\ge 1]\,L_{loc}(t^u,v)$$

where L_loc(·) is the regression loss function and the indicator [u ≥ 1] distinguishes the two cases: when u is foreground the regression term is active, and when u is background it is dropped. Here p is the probability distribution predicted by the classifier; u is the true category label; t^u is the t matrix corresponding to the GT box; v is the t matrix corresponding to the prediction. In this case λ = 1, the total loss is the weighted sum of the two terms, and the regression loss is not counted if the classification is background.
Step 302-5) when a prediction box is not ideal, a correction operation can be performed, specifically:
step 302-5-1) while training a batch of data, the intersection-over-union of each prediction box output by the model with its GT box (target prediction region) is computed to obtain the IOU value; if the IOU is larger than a certain threshold, the prediction box is by default not corrected; otherwise it is processed and step 302-5-2) is entered. The threshold is set to 0.7 here, meaning that prediction boxes whose intersection-over-union is less than 0.7 are reprocessed.
Step 302-5-2) computing the deviation between the prediction box to be corrected and the coordinates in the corresponding xml file, and randomly generating a number of new simulation points according to the deviation, thereby generating the second Gaussian heat map heat2:

$$\mathrm{heat}_2(x,y)=\sum_{n=1}^{N}\exp\!\left(-\frac{(x-x_{2,n})^2+(y-y_{2,n})^2}{2\sigma^2}\right),\qquad (x-x_{2,n})^2+(y-y_{2,n})^2\le r_2^2$$

where (x, y) are the coordinates of a point on the image, (x_{2,n}, y_{2,n}) is the n-th simulation point, and N is the number of simulation points; σ = 10 and r_2 = 6, r_2 representing the radiation range of a simulation point. The radiation range is set larger than in the previous step, meaning the generated Gaussian heat map puts more emphasis on the correction points.
Step 302-5-3) the three-channel RGB image is then stitched with the single channel of the second Gaussian heat map: the network input of size (batch_size, 3, h, w) becomes (batch_size, 4, h, w) after stitching.
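The stitching itself is a single channel-wise concatenation; the shapes below are illustrative:

```python
import torch

rgb_batch = torch.rand(8, 3, 600, 600)    # (batch_size, 3, h, w)
heat2_batch = torch.rand(8, 1, 600, 600)  # one correction heat map per image
fused_input = torch.cat([rgb_batch, heat2_batch], dim=1)
print(fused_input.shape)                  # torch.Size([8, 4, 600, 600])
```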
The trained model is then further optimized, the visualization of human-computer interaction is realized, and the transferability across data sets is tested, improving the migration effect as much as possible.
Step 303) inputting the preprocessed fused image into a trained semi-automatic labeling model, and labeling a plurality of prediction frames on the fused image;
step 4) when the prediction boxes output in step 303) include an unsatisfactory one, correcting it, as follows:
correction is an independent stage; if the prediction is not ideal, the correction stage starts working. New simulation points are generated from the deviation between the GT box and the prediction box, enabling the model to better detect the target.
Step 401) judging whether correction is needed according to the value of the IOU;
performing intersection ratio calculation on each prediction frame output by the model and a GT frame (target prediction region) to obtain the value of the IOU; if the IOU is larger than a certain threshold value, the predicted frame is not corrected by default, otherwise, the predicted frame is processed, and the step 402 is entered); the threshold is set to 0.7 at this time, meaning that prediction blocks with an intersection ratio less than 0.7 are reprocessed.
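The IOU test of step 401) is the standard intersection-over-union computation; for reference:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```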
Step 402) computing the deviation between the prediction box needing correction and the coordinates in the xml file, and randomly generating a number of new simulation points according to the deviation, thereby generating the second Gaussian heat map heat2:

$$\mathrm{heat}_2(x,y)=\sum_{n=1}^{N}\exp\!\left(-\frac{(x-x_{2,n})^2+(y-y_{2,n})^2}{2\sigma^2}\right),\qquad (x-x_{2,n})^2+(y-y_{2,n})^2\le r_2^2$$

where (x, y) are the coordinates of a point on the image, (x_{2,n}, y_{2,n}) is the n-th simulation point, and N is the number of simulation points; σ = 10 and r_2 = 6, r_2 representing the radiation range of a simulation point. The radiation range is set larger than in the previous step, meaning the generated Gaussian heat map puts more emphasis on the correction points.
Step 403) fusing the three-channel RGB image with the single channel of the generated second Gaussian heat map, inputting the result into the semi-automatic labeling model again, outputting the image with labeled prediction boxes, and returning to step 401) until the intersection-over-union of every prediction box is no less than 0.7.
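Putting steps 401) to 403) together as a schematic loop; it reuses iou() and gaussian_heatmap() from the sketches above, and generate_points() is a hypothetical helper standing in for the deviation-to-simulation-point step the text describes:

```python
import torch

def correct(rgb, pred_boxes, gt_boxes, model, thr=0.7):
    """Schematic correction loop for steps 401)-403). `generate_points` is a
    hypothetical placeholder, not an API defined by the patent."""
    while True:
        bad = [(p, g) for p, g in zip(pred_boxes, gt_boxes) if iou(p, g) < thr]
        if not bad:
            return pred_boxes                       # every box passes the IOU test
        points = generate_points(bad)               # simulation points from deviations
        h, w = rgb.shape[-2], rgb.shape[-1]
        heat2 = gaussian_heatmap(points, h, w, sigma=10.0, radius=6)
        heat2 = heat2.permute(2, 0, 1).unsqueeze(0)  # (1, 1, h, w)
        x = torch.cat([rgb, heat2], dim=1)           # (1, 4, h, w) stitched input
        pred_boxes = model(x)                        # re-label with corrected input
```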
Embodiment 2 of the present invention provides a semi-automatic labeling system based on human-computer interaction, including: a trained semi-automatic labeling model, a fusion module, a preprocessing module, a labeling module and a correction module;
the fusion module is used for fusing the RGB image to be labeled with the generated first Gaussian heat map;
the preprocessing module is used for preprocessing the fused image;
the labeling module is used for inputting the preprocessed fused image into the trained semi-automatic labeling model and labeling the RGB image with a plurality of prediction boxes;
and the correction module is used for correcting any prediction box that does not meet the requirements by generating a second Gaussian heat map.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A semi-automatic labeling method based on human-computer interaction, the method comprising:
fusing an RGB image to be labeled with a generated first Gaussian heat map;
preprocessing the fused image;
inputting the preprocessed fused image into a pre-established and trained semi-automatic labeling model, and labeling the RGB image to be labeled with a plurality of prediction boxes;
when a prediction box does not meet the requirements, correcting it by generating a second Gaussian heat map;
wherein fusing the RGB image to be labeled with the generated first Gaussian heat map specifically comprises:
determining the region where a target is located on the RGB image using the upper-left and lower-right coordinates in the xml file corresponding to the image, setting all pixel points inside the target region to 255 and all pixel points outside it to 0, thereby generating a Mask image;
randomly generating a plurality of simulation points within the central range of the Mask image, thereby generating the first Gaussian heat map heat1:

$$\mathrm{heat}_1(x,y)=\sum_{m=1}^{M}\exp\!\left(-\frac{(x-x_{1,m})^2+(y-y_{1,m})^2}{2\sigma^2}\right),\qquad (x-x_{1,m})^2+(y-y_{1,m})^2\le r_1^2$$

where (x, y) are the coordinates of a point on the image, (x_{1,m}, y_{1,m}) are the coordinates of the m-th simulation point, and M is the number of simulation points; σ = 10 and r_1 = 4, r_1 representing the radiation range of a simulation point;
and performing an add operation on the RGB image to be labeled and the generated first Gaussian heat map heat1.
2. The semi-automatic labeling method based on human-computer interaction of claim 1, wherein the semi-automatic labeling model is a Resnet50+FPN structure with an added attention mechanism.
3. The semi-automatic labeling method based on human-computer interaction of claim 2, further comprising training the semi-automatic labeling model, specifically comprising:
establishing a data set for training the model, the data set comprising a label file set and an image file set, wherein the label file set contains a number of xml files, the image file set contains a number of RGB images, and the xml files correspond one-to-one with the RGB images;
traversing each RGB image in the data set, determining the region where a target is located using the upper-left and lower-right coordinates in the corresponding xml file, setting all pixel points inside the target region to 255 and pixel points outside it to 0, thereby generating a Mask image, and then generating a first Gaussian heat map from the Mask image;
fusing the three-channel RGB image with the first Gaussian heat map and preprocessing the fused image as the input of the neural network model;
and setting the encoder and decoder sizes, batch size, number of training epochs and per-epoch learning rate for training the neural network model, and training the model.
4. The semi-automatic labeling method based on human-computer interaction of claim 3, wherein when a prediction box does not meet the requirements, correcting it by generating a second Gaussian heat map specifically comprises:
step S1) computing the intersection-over-union of each prediction box output by the semi-automatic labeling model with its GT box to obtain the IOU value; if the IOU is larger than the threshold, leaving the prediction box uncorrected; otherwise processing it and entering step S2);
step S2) computing the deviation between the prediction box to be corrected and the coordinates in the xml file, and randomly generating a number of simulation points according to the deviation, thereby generating the second Gaussian heat map heat2:

$$\mathrm{heat}_2(x,y)=\sum_{n=1}^{N}\exp\!\left(-\frac{(x-x_{2,n})^2+(y-y_{2,n})^2}{2\sigma^2}\right),\qquad (x-x_{2,n})^2+(y-y_{2,n})^2\le r_2^2$$

where (x, y) are the coordinates of a point on the image, (x_{2,n}, y_{2,n}) is the n-th simulation point, and N is the number of simulation points; σ = 10 and r_2 = 6;
and step S3) stitching the three-channel RGB image with the single channel of the generated second Gaussian heat map, inputting the result into the semi-automatic labeling model again, outputting the image with labeled prediction boxes, and returning to step S1) until the intersection-over-union of every prediction box with its GT box is no less than the threshold.
5. A semi-automatic labeling system based on human-computer interaction, characterized in that the system comprises: a trained semi-automatic labeling model, a fusion module, a preprocessing module, a labeling module and a correction module;
the fusion module is used for fusing the RGB image to be labeled with the generated first Gaussian heat map;
the preprocessing module is used for preprocessing the fused image;
the labeling module is used for inputting the preprocessed fused image into the trained semi-automatic labeling model and labeling the RGB image with a plurality of prediction boxes;
the correction module is used for correcting any prediction box that does not meet the requirements by generating a second Gaussian heat map;
wherein the specific processing procedure of the fusion module comprises:
determining the region where a target is located on the RGB image using the upper-left and lower-right coordinates in the xml file corresponding to the image, setting all pixel points inside the target region to 255 and all pixel points outside it to 0, thereby generating a Mask image;
randomly generating a plurality of simulation points within the central range of the Mask image, thereby generating the first Gaussian heat map heat1:

$$\mathrm{heat}_1(x,y)=\sum_{m=1}^{M}\exp\!\left(-\frac{(x-x_{1,m})^2+(y-y_{1,m})^2}{2\sigma^2}\right),\qquad (x-x_{1,m})^2+(y-y_{1,m})^2\le r_1^2$$

where (x, y) are the coordinates of a point on the image, (x_{1,m}, y_{1,m}) are the coordinates of the m-th simulation point, and M is the number of simulation points; σ = 10 and r_1 = 4, r_1 representing the radiation range of a simulation point;
and performing an add operation on the RGB image to be labeled and the generated first Gaussian heat map heat1.
CN202110328124.2A 2021-03-26 2021-03-26 Semi-automatic labeling method and system based on human-computer interaction Active CN112966777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110328124.2A CN112966777B (en) 2021-03-26 2021-03-26 Semi-automatic labeling method and system based on human-computer interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110328124.2A CN112966777B (en) 2021-03-26 2021-03-26 Semi-automatic labeling method and system based on human-computer interaction

Publications (2)

Publication Number Publication Date
CN112966777A CN112966777A (en) 2021-06-15
CN112966777B true CN112966777B (en) 2021-11-30

Family

ID=76278690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110328124.2A Active CN112966777B (en) 2021-03-26 2021-03-26 Semi-automatic labeling method and system based on human-computer interaction

Country Status (1)

Country Link
CN (1) CN112966777B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743416B (en) * 2021-08-24 2024-03-05 的卢技术有限公司 Data enhancement method for non-real sample situation in OCR field
CN116612474B (en) * 2023-07-20 2023-11-03 深圳思谋信息科技有限公司 Object detection method, device, computer equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108227912B (en) * 2017-11-30 2021-05-11 北京市商汤科技开发有限公司 Device control method and apparatus, electronic device, computer storage medium
CN108520223B (en) * 2018-04-02 2021-11-12 广州方硅信息技术有限公司 Video image segmentation method, segmentation device, storage medium and terminal equipment
CN108846309A (en) * 2018-04-27 2018-11-20 淘然视界(杭州)科技有限公司 A kind of pedestrian's automatic marking method and system
CN110570352B (en) * 2019-08-26 2021-11-05 腾讯科技(深圳)有限公司 Image labeling method, device and system and cell labeling method
CN110866908B (en) * 2019-11-12 2021-03-26 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, server, and storage medium

Also Published As

Publication number Publication date
CN112966777A (en) 2021-06-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant