WO2023126280A1 - A system and method for quality check of labelled images - Google Patents
- Publication number
- WO2023126280A1 (application PCT/EP2022/087313)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/091—Active learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
- G06V10/7784—Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
- G06V10/7788—Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors the supervisor being a human, e.g. interactive learning with a human teacher
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
Abstract
Methods (200) and systems (100) for identifying mislabeled images from a set of labelled images, for a deep neural network, are described. A sequence of a plurality of input labelled images (102) is provided as an input to a segmentation network (116), which generates predictions for each image from said set of labelled images (102). A scoring module (118) is configured to compute two or more scoring functions for each image from the set of images (102) using the predictions generated by the segmentation network (116). A quality check module (120) is configured to identify mislabeled images from the set of labelled images (102) by visualizing said computed two or more scoring functions in a multi-dimensional graphical representation.
Description
A SYSTEM AND METHOD FOR QUALITY CHECK OF
LABELLED IMAGES
FIELD OF THE INVENTION
[0001] The present subject matter relates, in general, to a system and method providing a unified interactive framework for identifying labelling errors to improve the labeling process, specifically for autonomous driving applications.
BACKGROUND OF THE INVENTION
[0002] Some machine learning techniques use models that can learn from unlabeled data without any human intervention. Such techniques are known as unsupervised learning. For example, a deep learning model may segment data into groups (or clusters) based on patterns the model finds in the data. These groups can then be used as labels so that the data can be used to train a supervised learning model.
[0003] Recent deep learning technologies for supervised tasks in the domain of computer vision require labelled training data. Human labeling effort is costly and grows exponentially with the size of the dataset, costing industries huge amounts for labeling. Automated semantic segmentation, which involves assigning a class label to each pixel of an image, is an example of a task for which obtaining labelled training data is especially expensive. The problem becomes even more acute when the domain of interest lies in the area of autonomous driving. In most domains that develop such autonomous driving functions, huge numbers of video/image sequences are captured by vehicles mounted with different reference sensors, covering over a hundred thousand miles. Manual curation of such datasets becomes an issue when the budget for annotating data is limited.
[0004] A huge amount of clean labelled data is required for training a supervised deep neural network. Training such networks requires a big dataset, such as MSCOCO, Mapillary Vistas, or Youtube-8M. Conventional approaches for handling such huge amounts of data have focused on unsupervised methods (no labels required), weakly/semi-supervised methods (partial labels required), and (semi-)automatic labelling of data. Such methods may produce segmentation predictions as polygon instances by annotating a set number of pixels in each iteration; human intervention is still required for this annotation. The major focus of these works is to drastically reduce the annotation time for human labelers. While most of them win on reducing annotation time, at least a few iterations of the algorithm are required to provide human-comparable results.
[0005] Current works in the literature focus on reducing the amount of annotated data required by training a model on annotated data and then using the model for prediction. The predicted images are added to the dataset after manual inspection by humans. However, this process is flawed. Firstly, the dataset may be poorly labeled and hence may lead to poor performance of the model. Secondly, each predicted image to be added to the dataset needs to be verified manually, which is a resource-intensive operation. Therefore, there is a need for a solution that reduces this resource-intensive process by limiting the number of images to be verified by humans.
[0006] A prior art document, WO2019137196A1, discloses an image annotation information processing method and apparatus, a server, and a system. Supervision and judgment processing logic for a plurality of nodes with different processing results can be provided. When image annotation information goes wrong, the results can be automatically returned so that operators can perform review, modification, etc. The professional ability of the operators can be improved by continuous auditing feedback interaction, image annotation efficiency is gradually improved, and training-set picture annotation accuracy is greatly improved. According to the embodiments, annotation quality can be effectively ensured, timely and effective information feedback is provided in the workflow, and the operating efficiency of sample image annotation information is improved.
[0007] Another prior art document, CN105404896A, discloses an annotation data processing method and an annotation data processing system. The annotation data processing method comprises the following steps: step S110: the similarity of multiple annotation results related to annotation tasks is calculated; step S120: the similarity is compared with a similarity threshold, the process going to step S130 if the similarity is greater than or equal to the similarity threshold, and to step S140 if the similarity is less than the similarity threshold; step S130: it is determined that the multiple annotation results pass quality detection; and step S140: it is determined that the multiple annotation results do not pass quality detection. According to this annotation data processing method and system, the quality of the annotation results is automatically detected using the similarity, so that annotation staff can obtain the quality of the annotation results in a timely manner and correct annotation errors promptly; thus annotation accuracy can be effectively enhanced.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWING
[0008] The detailed description is provided with reference to the accompanying figures, wherein:
[0009] FIG. 1 illustrates a system environment for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter;
[0010] FIG. 2 illustrates a flow chart of a method for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter; and
[0011] FIGs. 3a & 3b illustrate graphical representations of the scoring functions for each image in a 2-D plane, in accordance with an example implementation of the present subject matter.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0012] FIG. 1 illustrates a system environment for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter. The present subject matter describes various approaches to obtain a set of correctly labelled images from a larger set of input labelled images and to send the mislabeled images for further labelling. In an example, the set of labelled input images 102 may contain autonomous driving scene images with varied semantic layout and content. In an example, the set of labelled input images 102 may be images from a driving scene comprising a traffic sign, vehicle(s), pedestrian(s), and so on.
[0013] The system environment may include a computing system 100 and a neural network architecture. The computing system 100 may be communicatively coupled to the neural network architecture. In an example, the computing system 100 may be directly or remotely coupled to the neural network architecture. Examples of the computing system 100 may include, but are not limited to, a laptop, a notebook computer, a desktop computer, and so on.
[0014] The computing system 100 may include a memory 110. The memory 110 may include any non-transitory computer-readable medium including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0015] In an example, the computing system 100 may also include a processor 112 coupled to the memory 110. The processor 112 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any other devices that manipulate signals and data based on computer-readable instructions. Further, functions of the various elements shown in the figures, including any functional blocks labelled as “processor(s)”, may be provided using dedicated hardware as well as hardware capable of executing computer-readable instructions. Further, the computing system 100 may include interface(s) 114. The interface(s) 114 may include a variety of interfaces, for example, interface(s) for users. The interface(s) 114 may include data output devices. In an example, the interface(s) 114 may provide an interactive platform for receiving the input images from a user.
[0016] In an example implementation of the present subject matter, a method and a system are proposed for a unified and interactive framework for identifying labelling errors. The computing system 100 includes a segmentation network 116, a scoring module 118, and a quality check module 120. In one embodiment, the segmentation network 116 is communicatively coupled to the deep neural network 300.
[0017] The segmentation network 116 is configured to generate predictions for each image from the set of labelled images 102. In one embodiment, the segmentation network 116 generates predictions of segmentation based on a plurality of classifiers and a plurality of training data from the deep neural network 300. In this embodiment, the segmentation network 116 is trained to identify pixels belonging to different classes. For generating the predictions, each labelled sample image is treated as an oracle, but the predictions and the labeling might differ. This difference can be used to compute a scoring function, defined between the labels and the segmentation predictions of each image, which measures the dissimilarity/similarity between them.
[0018] For the labelled images, each pixel in an image belongs to a single class from the classification schema. The belongingness of a pixel to a class is deterministic. For labelling images, labels are provided to each pixel using this class ideology, and hence labels do not have probability values associated with them. This leads to unambiguous, concrete markings of the labels of the labelled images 102.
[0019] The scoring module 118 is configured to compute two or more scoring functions for each image from the set of images 102 using the predictions generated by the segmentation network 116. In one example, the two or more scoring functions may include a performance metric (IoU) score, probability scores, uncertainty scores, and/or combinations thereof. In one embodiment, the performance metric score may be used to determine the accuracy of the segmentation network 116. In one example, the performance metric score may be the Intersection over Union (IoU), which for semantic segmentation measures the similarity score. In one embodiment, the scoring module 118 is further configured to compute a confidence score using probability values over the classes for the pixels of each image from the set of labelled images 102, obtained from the segmentation network 116.
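As an illustration only (not the claimed implementation), the per-image performance-metric score described above can be sketched as a mean Intersection-over-Union between a human label mask and the network's predicted mask; the function name and the rule of skipping classes absent from both masks are assumptions of this sketch.

```python
import numpy as np

def mean_iou(label: np.ndarray, prediction: np.ndarray, num_classes: int) -> float:
    """Mean IoU between two HxW integer class-id masks (illustrative sketch)."""
    ious = []
    for c in range(num_classes):
        gt = label == c
        pred = prediction == c
        union = np.logical_or(gt, pred).sum()
        if union == 0:
            continue  # class absent from both masks: skip it (assumed convention)
        intersection = np.logical_and(gt, pred).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 1.0
```

A low mean IoU signals that the label and the network prediction disagree for that image, which is one of the coordinates later used by the quality check module's graphical representation.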
[0020] In one embodiment, for computing the performance metric (IoU), the scoring module 118 compares the human labels with the prediction output generated by the segmentation network 116. In a similar manner, to compute the uncertainty score (entropy), the scoring module 118 interprets the neural network outputs as probability scores.
[0021] The quality check module 120 is configured to identify mislabeled images from the set of labelled images 102 by visualizing image patches from the set of labelled images 102, obtained from a multi-dimensional graphical representation. Herein, the multi-dimensional graphical representation is obtained from the varying values of the two or more scoring functions for each image from the labelled images 102. In one embodiment, the quality check module 120 is further configured to generate a two-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of labelled images 102.
[0022] The scoring module 118 enables a scoring mechanism which treats the labels (from the original annotators) as oracles. The process of quality check approves the oracle status of each labelled image from the set of images 102 individually; hence, it might be erroneous to treat the labels as oracles for every instance. This insight motivates another scoring function, which captures the certainty/confidence score for the predictions and/or labels generated by the segmentation network 116.
[0023] In one example, the segmentation network 116 provides, for each pixel, probability values over the classes, thus enabling the use of an uncertainty/confidence score. The confidence score is used to define a (class-specific) uncertainty score on each prediction generated by the segmentation network 116. In this example, entropy computes the uncertainty of the prediction, i.e., high entropy values indicate that the network is unsure about its prediction, while low entropy denotes strong confidence. In addition to entropy, other metrics may be envisaged which are variants of the basic entropy definition.
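A minimal sketch of the entropy-based uncertainty score in this example, assuming the segmentation network 116 exposes per-pixel softmax probabilities of shape (C, H, W); the epsilon guard is an assumption added for numerical stability.

```python
import numpy as np

def mean_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Mean per-pixel Shannon entropy of a (C, H, W) probability map (sketch)."""
    pixel_entropy = -np.sum(probs * np.log(probs + eps), axis=0)  # shape (H, W)
    return float(pixel_entropy.mean())
```

High values indicate the network is unsure about its prediction; low values indicate strong confidence, matching the interpretation given above.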
[0024] In this embodiment, the identified mislabeled images are further sent for re-labelling. Herein, the set of mislabeled images is a sub-set of the set of labelled images 102.
[0025] FIG. 2 illustrates a flow chart of a method 200 for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter. The method 200 may be implemented by the computing system 100 including the memory 110, the processor 112, and the interface(s) 114 of FIG. 1. Further, the computing system 100 may be communicatively coupled with the neural network architecture as described in FIG. 1. Although the method 200 is described in the context of a system that is similar to the computing system 100 of FIG. 1, other suitable devices or systems may be used for execution of the method 200.
[0026] Referring to FIG. 2, at block 201, the method 200 may include receiving a set of a plurality of labelled images 102 and generating predictions for each image from said set of labelled images 102, by a segmentation network 116. In an example, the input labelled images 102 may be images from a driving scene comprising a traffic sign, vehicle(s), pedestrian(s), and so on.
[0027] At block 202, the method 200 may include computing two or more scoring functions by using the generated predictions for each of the labelled images 102 by a scoring module 118.
[0028] At block 203, the method 200 may include identifying mislabeled images from the set of labelled images 102 by visualizing image patches from the set of labelled images 102, obtained from a multi-dimensional graphical representation, by using a quality check module 120. Herein, the multi-dimensional graphical representation is obtained from the varying values of said two or more scoring functions for each image from the labelled images 102. In one embodiment, the method 200 further comprises a step 204 of generating a multi-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of labelled images 102. In this embodiment, the identified mislabeled images provide a compact sub-set of the sequence of the set of labelled images 102 for further labelling.
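Tying blocks 201-203 together, a hedged end-to-end sketch might look as follows; the model `net`, the iterable of (image, label) pairs, and the reuse of the `mean_iou` and `mean_entropy` sketches above are all illustrative assumptions, not the patented implementation.

```python
import numpy as np

def score_dataset(net, labelled_images):
    """Collect (mean IoU, mean entropy) per image for later visualization."""
    scores = []
    for image, label in labelled_images:   # block 201: receive images, predict
        probs = net(image)                 # assumed (C, H, W) softmax output
        pred = probs.argmax(axis=0)        # predicted class id per pixel
        iou = mean_iou(label, pred, num_classes=probs.shape[0])  # block 202
        unc = mean_entropy(probs)
        scores.append((iou, unc))
    return np.asarray(scores)  # blocks 203/204 plot these as a scatter
```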
[0029] In one embodiment, at block 205, the method 200 further comprises a step in which a QC admin directly selects regions of abundant mislabeling in said multi-dimensional graphical representation. Herein, the QC admin can select regions of abundant mislabeling after step 203 of identifying mislabeled images from the set of labelled images 102. In this embodiment, a scatter application showing the multi-dimensional graphical representation may be presented to the QC admin for selecting regions of abundant mislabeling.
[0030] Further in this embodiment, at block 206, the method 200 comprises a step in which a QC worker receives regions of abundant mislabeling in said multi-dimensional graphical representation from the QC admin, iterates over each image in the assigned grid, and marks the images for relabeling with comments, to be sent for further labelling. All images assigned for relabeling are then passed to labelers for relabeling.
[0031] FIG. 3 is an exemplary graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of labelled images, generated by the quality check module 120. Herein, in FIG. 3a, taking the scoring functions as the performance metric score (e.g., IoU) and the confidence score (i.e., uncertainty), each image can be placed in a 2-D scatter plot. This scatter plot is defined by the metric score/IoU on the y-axis and the confidence score/uncertainty on the x-axis. In FIG. 3a, since images are not point objects, the centers of the images are aligned with the corresponding IoU and uncertainty scores.
[0032] Instead of providing vanilla images on the scatter plots shown in FIGs. 3a & 3b, where visualizing errors might be difficult, the present invention uses different (colored) views of the images. To enable fast tagging of mislabeled examples, the scatter plots have regions of interest, shown in FIG. 3b, where finding images with mislabeling and/or incorrect labeling is easy. This follows from the fact that when the network predictions and labels disagree (low IoU) but the network remains confident about its predictions (low uncertainty), the probability of finding mislabeling increases. Similarly, images with high IoU and low uncertainty point towards no or minor errors (and hence no QC use-case). As shown in FIG. 3b, the scatter plot is divided into 4 regions, as shown in Table I, where Region 2 is of interest, containing mislabeled images for quality check. In scenarios where the deep learning network reflects a high confidence score in its predictions, these region boundaries may be heavily distorted. But as shown in FIG. 3a, such regions are still easy to segregate with little effort from a human annotator. FIG. 3b shows an exemplary scatter plot for image patches containing objects such as vehicles from the BDD dataset. Here we observe that the Regions (1, 2, 3, 4) are distorted, but a structure can nevertheless be seen in the scatter plot.
TABLE I
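The region logic of FIG. 3b can be sketched as below; the 0.5 thresholds splitting the plane into four regions are purely illustrative assumptions (the actual boundaries may be heavily distorted, as noted above), and the random scores stand in for real per-image values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
iou = rng.random(200)            # placeholder per-image mean IoU scores
uncertainty = rng.random(200)    # placeholder per-image mean entropy scores

IOU_T, UNC_T = 0.5, 0.5          # assumed region boundaries
candidate = (iou < IOU_T) & (uncertainty < UNC_T)  # low IoU, low uncertainty

fig, ax = plt.subplots()
ax.scatter(uncertainty[~candidate], iou[~candidate], c="grey", s=10)
ax.scatter(uncertainty[candidate], iou[candidate], c="red", s=10,
           label="likely mislabeled (low IoU, low uncertainty)")
ax.axhline(IOU_T, ls="--", lw=0.5)
ax.axvline(UNC_T, ls="--", lw=0.5)
ax.set_xlabel("uncertainty (mean entropy)")
ax.set_ylabel("mean IoU")
ax.legend()
plt.show()
```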
[0033] The present invention builds an interactive user application using the scatter plot. The application provides a region selector tool which can be utilized to select any desired region. The selector can then be used effectively to choose datapoints with mislabeling. This selection tool, combined with the regions of interest in the scatter plots, allows for faster selection of mislabeled patches. FIG. 3 shows the main window of the scatter QC application.
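The patent does not specify the UI toolkit of the scatter QC application; as one hedged possibility, a bounding-box region selector can be built with matplotlib's RectangleSelector widget.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import RectangleSelector

rng = np.random.default_rng(1)
iou, uncertainty = rng.random(200), rng.random(200)  # placeholder scores

fig, ax = plt.subplots()
ax.scatter(uncertainty, iou, s=10)
ax.set_xlabel("uncertainty")
ax.set_ylabel("mean IoU")

def on_select(press, release):
    # Corners of the dragged box, in data coordinates
    x0, x1 = sorted((press.xdata, release.xdata))
    y0, y1 = sorted((press.ydata, release.ydata))
    picked = np.flatnonzero((uncertainty >= x0) & (uncertainty <= x1)
                            & (iou >= y0) & (iou <= y1))
    print(f"{picked.size} images selected for quality check")

selector = RectangleSelector(ax, on_select, useblit=True)  # keep a reference
plt.show()
```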
[0034] In one embodiment of the present invention, a two-step procedure for identifying the mislabeled images for relabeling may be employed. In the first step of this procedure, a quality check (QC) admin uses the scatter QC application directly and selects regions of abundant mislabeling from a scatter plot such as the one shown in FIG. 3b. This region of interest is automatically divided further into smaller grids, each of which is passed down to a QC worker. The QC worker iterates over each image in the assigned grid and marks images for relabeling with remarks. All images assigned for relabeling are then passed to labelers for relabeling. The scatter QC application allows for bounding-box-based region selection on the scatter plots. Human-assisted inputs utilizing this bounding-box selection can effectively focus on the regions where mislabeling is abundant. The QC admin and QC workers further divide and select images for relabeling while focusing on regions of mislabeling. In the last stage of Labeling as a Service (LaaS), the human annotators relabel the patch.
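The grid subdivision of a selected region, and its assignment to QC workers, might be sketched as follows; the grid size and the round-robin assignment are assumptions for illustration only.

```python
import numpy as np

def split_region_into_grids(x0, x1, y0, y1, nx=3, ny=3):
    """Return (x_lo, x_hi, y_lo, y_hi) cells covering the selected region."""
    xs = np.linspace(x0, x1, nx + 1)
    ys = np.linspace(y0, y1, ny + 1)
    return [(xs[i], xs[i + 1], ys[j], ys[j + 1])
            for i in range(nx) for j in range(ny)]

# Hypothetical assignment: each cell goes to one of four QC workers, who
# iterate over the images falling inside their cell and mark them for
# relabeling with comments.
for k, cell in enumerate(split_region_into_grids(0.0, 0.5, 0.0, 0.5)):
    print(f"grid {k} -> worker {k % 4}: {tuple(round(v, 2) for v in cell)}")
```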
[0035] The core of the present invention is a unified system and an interactive framework for identifying labelling errors. In alternate embodiments of the present invention, the application areas of the computing system 100 may include semantic segmentation, object detection, classification and the like.
[0036] The present invention focuses on finding mislabeling for a specific class instead of finding potential mislabeling for all classes in the schema at once. Moreover, the present invention provides approaches for the computation of an evaluation metric (mean IoU) and an uncertainty score (mean entropy) for a sample labelled image. While most of the literature and patents have focused on (semi-)automated labeling and on reducing the time and clicks required of annotators, the present invention focuses on the quality check process. In general, quality check is a tedious, repetitive process where errors are supposedly fewer (as compared to the corrections to be done in automated labeling), and hence it is likely that poor-quality labels pass through quality check. In contrast, the present invention is not completely dependent on human intervention for the quality check of a huge number of labelled images.
[0037] Although aspects for the present disclosure have been described in a language specific to structural features and/or methods, it is to be understood that the appended claims are not limited to the specific features or methods described herein. Rather, the specific features and methods are disclosed as examples of the present disclosure.
Claims
1. A computing system (100) comprising: a memory (110); and a processor (112), coupled to the memory (110), configured to provide a set of a plurality of labelled images (102) to a segmentation network (116), wherein said segmentation network (116) is configured to generate predictions for each image from said set of labelled images (102); characterized in that: a scoring module (118) is configured to compute two or more scoring functions for each image from the labelled images (102), using the predictions generated by the segmentation network (116); and a quality check module (120) is configured to identify mislabeled images from the set of labelled images (102) by visualizing image patches from the set of labelled images (102), obtained from a multi-dimensional graphical representation, wherein said multi-dimensional graphical representation is obtained from varying values of said two or more scoring functions for each image from the labelled images (102).
2. The computing system (100) as claimed in claim 1, wherein said quality check module (120) is configured to generate a two-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images (102).
3. The computing system (100) as claimed in claim 1, wherein said segmentation network (116) is communicatively coupled to a deep neural network (300).
4. The computing system (100) as claimed in claim 3, wherein said segmentation network (116) is configured to generate predictions of segmentation based on a plurality of classifiers and a plurality of training data from the deep neural network (300).
5. The computing system (100) as claimed in claim 1, wherein said segmentation network (116) is a network trained on a plurality of labeled images.
6. The computing system (100) as claimed in claim 1, wherein said two or more scoring functions include functions such as a performance metric (IoU), probability scores, uncertainty scores, and/or combinations thereof.
7. A computer-implemented method (200) for identifying mislabeled images from a set of labelled images, for a deep neural network, the method (200) comprising the steps of: receiving (201) a set of a plurality of labelled images (102), and generating predictions for each image from said set of labelled images (102), by a segmentation network (116); computing (202) two or more scoring functions by using the generated predictions for each of the labelled images (102), by a scoring module (118); and identifying (203) mislabeled images from the set of labelled images (102) by visualizing image patches from the set of labelled images (102), obtained from a multi-dimensional graphical representation, wherein said multi-dimensional graphical representation is obtained from varying values of said two or more scoring functions for each image from the labelled images (102), by using a quality check module (120).
8. The computer-implemented method (200) as claimed in claim 7, wherein said method (200) further comprises a step (204) of generating a multi-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images (102).
9. The computer-implemented method (200) as claimed in claim 7, wherein said method (200), after step (203) of identifying mislabeled images from the set of labelled images (102), further comprises a step (205) wherein a QC admin directly selects regions of abundant mislabeling in said multi-dimensional graphical representation.
10. The computer-implemented method (200) as claimed in claim 9, wherein said method (200), after step (205) by the QC admin, further comprises a step (206) wherein a QC worker receives regions of abundant mislabeling in said multi-dimensional graphical representation and iterates over each image in the assigned grid, marking them for relabeling with comments to be sent for further labelling.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN202141061727 | 2021-12-30 | ||
IN202141061727 | 2021-12-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023126280A1 true WO2023126280A1 (en) | 2023-07-06 |
Family
ID=84901557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2022/087313 WO2023126280A1 (en) | 2021-12-30 | 2022-12-21 | A system and method for quality check of labelled images |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW202345104A (en) |
WO (1) | WO2023126280A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105404896A (en) | 2015-11-03 | 2016-03-16 | 北京旷视科技有限公司 | Annotation data processing method and annotation data processing system |
WO2019137196A1 (en) | 2018-01-11 | 2019-07-18 | 阿里巴巴集团控股有限公司 | Image annotation information processing method and device, server and system |
- 2022-12-21: WO PCT/EP2022/087313 patent/WO2023126280A1/en, active, Application Filing
- 2022-12-28: TW TW111150348A patent/TW202345104A/en, unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105404896A (en) | 2015-11-03 | 2016-03-16 | 北京旷视科技有限公司 | Annotation data processing method and annotation data processing system |
WO2019137196A1 (en) | 2018-01-11 | 2019-07-18 | 阿里巴巴集团控股有限公司 | Image annotation information processing method and device, server and system |
Non-Patent Citations (1)
Title |
---|
UMAA REBBAPRAGADA ET AL: "Active Label Correction", DATA MINING (ICDM), 2012 IEEE 12TH INTERNATIONAL CONFERENCE ON, IEEE, 10 December 2012 (2012-12-10), pages 1080 - 1085, XP032311139, ISBN: 978-1-4673-4649-8, DOI: 10.1109/ICDM.2012.162 * |
Also Published As
Publication number | Publication date |
---|---|
TW202345104A (en) | 2023-11-16 |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22840190; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase | Ref document number: 2022840190; Country of ref document: EP
NENP | Non-entry into the national phase | Ref country code: DE
ENP | Entry into the national phase | Ref document number: 2022840190; Country of ref document: EP; Effective date: 20240730