TW202345104A - A system and method for quality check of labelled images - Google Patents

Info

Publication number
TW202345104A
TW202345104A (application TW111150348A)
Authority
TW
Taiwan
Prior art keywords
images
labeled
image
labeled images
graphical representation
Prior art date
Application number
TW111150348A
Other languages
Chinese (zh)
Inventor
阿貝西吉特 庫瑪爾
阿米特 阿文德 凱樂
艾比尼澤 戈伊爾 皮萊 文諾伊 約翰 豪森
柯斯達夫 穆利克
Original Assignee
Robert Bosch GmbH (Germany)
Robert Bosch Technology and Business Solutions Private Limited (India)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH (Germany) and Robert Bosch Technology and Business Solutions Private Limited (India)
Publication of TW202345104A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/091 Active learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 Active pattern-learning, e.g. online learning of image or video features, based on feedback from supervisors
    • G06V10/7788 Active pattern-learning based on feedback from supervisors, the supervisor being a human, e.g. interactive learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

A method (200) and system (100) for identifying mislabelled images from a set of labelled images, for a deep neural network, are described. A sequence of a plurality of input labelled images (102) is provided as input to a segmentation network (116), which generates predictions for each image from the set of labelled images (102). A scoring module (118) is configured to compute two or more scoring functions for each image from the set of images (102) using the predictions generated by the segmentation network (116). A quality check module (120) is configured to identify mislabelled images from the set of labelled images (102) by visualising the computed two or more scoring functions in a multi-dimensional graphical representation.

Description

A system and method for quality check of labelled images

The present subject matter generally relates to a system and method providing a unified, interactive framework for identifying labelling errors in order to improve the labelling process, particularly for autonomous-driving applications.

Few machine learning techniques use models that can learn from unlabelled data without any human intervention; this type of machine learning is known as unsupervised learning. For example, a deep learning model can segment data into groups (or clusters) based on patterns the model finds in the data. These groups can then serve as labels, so that the data can be used to train supervised learning models.

State-of-the-art deep learning techniques for supervised tasks in computer vision require labelled training data. Manual labelling is expensive and grows exponentially with the size of the dataset, imposing substantial labelling costs on industry. Automated semantic segmentation, which involves assigning a class label to each pixel of an image, is an example of a task for which obtaining labelled training data is particularly costly. The problem becomes even more acute in the autonomous-driving domain: in most areas of developing such autonomous-driving functions, large volumes of video/image sequences are captured by vehicles equipped with different reference sensors, covering hundreds of thousands of miles. However, solutions relying on manually curated datasets such as those described above become problematic when the budget for annotating the data is limited.

Training supervised deep neural networks requires large amounts of clean, labelled data; large datasets such as MSCOCO, Mapillary Vistas and YouTube-8M are needed to train such networks. Conventional approaches to handling large amounts of data have focused on unsupervised methods (requiring no labels), weakly/semi-supervised methods (requiring partial labels), and (semi-)automatic labelling of data. Such methods may use segmentation predictions as polygon instances by annotating a set number of pixels in each iteration; manual intervention is still required for this annotation. The main focus of such work is to substantially reduce the annotation time of human labellers. Although most of these methods succeed in reducing annotation time, at least a few iterations of the algorithm are needed before the results are comparable to human annotation.

Current work in the literature focuses on reducing the amount of annotated data required: a model is trained on the annotated data and used for prediction, and the predicted images are added to the dataset after manual inspection by humans. This procedure has two flaws. First, the dataset may be poorly labelled and can therefore degrade model performance. Second, each predicted image to be added to the dataset must be verified manually, which is a resource-intensive operation. A solution is therefore needed that reduces this resource-intensive process by limiting the number of images a human must verify.

Prior art WO2019137196A1 discloses an image-annotation information processing method and device, a server and a system. Supervision and judgement processing logic can be provided for a plurality of nodes with different processing results. When image-annotation information is erroneous, the result can be returned automatically so that an operator can review and modify it. Continuous audit-feedback interaction improves the operators' proficiency, gradually improves annotation efficiency, and greatly improves the annotation accuracy of the training-set images. According to specific examples, annotation quality can be effectively ensured, timely and effective feedback can be provided in the workflow, and the efficiency of handling sample-image annotation information is improved.

Another prior art, CN105404896A, discloses an annotation-data processing method and system. The method comprises the following steps. Step S110: compute the similarity of multiple annotation results for an annotation task. Step S120: compare the similarity with a similarity threshold; if the similarity is greater than or equal to the threshold, proceed to step S130, and if it is below the threshold, proceed to step S140. Step S130: determine that the annotation results pass the quality check. Step S140: determine that the annotation results fail the quality check. Because the quality of annotation results is detected automatically using similarity, annotators can learn the quality of their results promptly, correct annotation errors in time, and thereby effectively improve annotation accuracy.

The present invention provides a computing system (100) comprising: a memory (110); and a processor (112) coupled to the memory (110), configured to provide a set of a plurality of labelled images (102) to a segmentation network (116), wherein the segmentation network (116) is configured to generate predictions for each image from the set of the plurality of labelled images (102). The system is characterised in that: a scoring module (118) is configured to compute, using the predictions generated by the segmentation network (116), two or more scoring functions for each image from the plurality of labelled images (102); and a quality check module (120) is configured to identify mislabelled images from the set of labelled images (102) by visualising image patches from that set obtained from a multi-dimensional graphical representation, wherein the multi-dimensional graphical representation is obtained from the different values of the two or more scoring functions for each image from the plurality of labelled images (102).

The present invention also provides a computer-implemented method (200) for identifying mislabelled images from a set of labelled images for a deep neural network, the method (200) comprising the steps of: receiving (201), by a segmentation network (116), a set of a plurality of labelled images (102) and generating predictions for each image from the set of labelled images (102); computing (202), by a scoring module (118), two or more scoring functions using the predictions generated for each of the plurality of labelled images (102); and identifying (203), by using a quality check module (120), mislabelled images from the set of labelled images (102) by visualising image patches from that set obtained from a multi-dimensional graphical representation, wherein the multi-dimensional graphical representation is obtained from the different values of the two or more scoring functions for each image from the plurality of labelled images (102).

Figure 1 illustrates a system environment for identifying mislabelled images from a set of labelled images for a deep neural network, according to an example implementation of the present subject matter. The present subject matter describes various methods for obtaining a correct set of labelled images from a larger set of input labelled images and sending the mislabelled images for further labelling. In one example, the set of labelled input images 102 may contain autonomous-driving scene images with varying semantic layout and content, for instance images from a driving scene containing traffic signals, vehicles, pedestrians and so on.

The system environment may include the computing system 100 and a neural network architecture. The computing system 100 may be communicatively coupled to the neural network architecture, either directly or remotely. Examples of the computing system 100 include, but are not limited to, laptops, notebooks, desktop computers and the like.

The computing system 100 may include a memory 110. The memory 110 may include any non-transitory computer-readable medium, including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory such as read-only memory (ROM), erasable programmable ROM, flash memory, hard disks, optical discs and magnetic tape.

In one example, the computing system 100 may also include a processor 112 coupled to the memory 110. The processor 112 may include a microprocessor, microcomputer, microcontroller, digital signal processor, central processing unit, state machine, logic circuitry and/or any other device that manipulates signals and data based on computer-readable instructions. In addition, dedicated hardware, as well as hardware capable of executing computer-readable instructions, may be used to provide the functionality of the various elements shown in the figures, including any functional block labelled "processor". The computing system 100 may further include an interface 114. The interface 114 may include a variety of interfaces, for example interfaces for users, and may include a data output device. In one example, the interface 114 may provide an interactive platform for receiving input images from a user.

In an example implementation of the present subject matter, a method and system providing a unified, interactive framework for identifying labelling errors are proposed. The computing system 100 includes a segmentation network 116, a scoring module 118 and a quality check module 120. In one specific example, the segmentation network 116 is communicatively coupled to a deep neural network 300.

The segmentation network 116 is configured to generate predictions for each image from the set of labelled images 102. In one specific example, the segmentation network 116 generates segmentation predictions based on a plurality of classifiers and a plurality of training data from the deep neural network 300; in this example, the segmentation network 116 is trained to identify pixels belonging to different classes. To generate the predictions, each labelled sample is treated as an oracle, but the prediction and the label may differ. This difference can be used to compute a scoring function, defined between the label and the prediction of each segmented image, that measures the dissimilarity/similarity between label and prediction.
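The dissimilarity between a label and a prediction can be made concrete with a per-class Intersection over Union. The following is a minimal sketch, not taken from the patent; the function name `per_class_iou` and the flat-list encoding of the class maps are illustrative assumptions:

```python
def per_class_iou(label, pred, cls):
    """IoU between an annotated class map and the network's predicted class
    map for one class id. Both maps are flat lists of per-pixel class ids."""
    inter = sum(1 for a, b in zip(label, pred) if a == cls and b == cls)
    union = sum(1 for a, b in zip(label, pred) if a == cls or b == cls)
    return inter / union if union else 1.0  # class absent from both maps

# Toy 16-pixel image: the annotator marked 4 pixels as class 1,
# while the network predicts 6 pixels for that class.
label = [1, 1, 0, 0,  1, 1, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0]
pred  = [1, 1, 1, 0,  1, 1, 1, 0,  0, 0, 0, 0,  0, 0, 0, 0]
print(round(per_class_iou(label, pred, 1), 3))  # intersection 4, union 6 -> 0.667
```

A low value of this score flags a disagreement between the annotator and the network, which is exactly the signal the scoring functions below exploit.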

For a labelled image, each pixel belongs to a single class from the classification schema; the assignment of a pixel to a class is deterministic. A label is assigned to each pixel using this class scheme, and the label therefore has no probability value associated with it. This results in a definite, unambiguous label for each of the labelled images 102.

The scoring module 118 is configured to compute, using the predictions generated by the segmentation network 116, two or more scoring functions for each image from the set of images 102. In one example, the two or more scoring functions may include function types such as a performance metric (IoU) score, a probability score, an uncertainty score and/or combinations thereof. In one specific example, the performance metric score may be used to determine the accuracy of the segmentation network 116; for instance, it may be the Intersection over Union (IoU), a similarity score commonly used for semantic segmentation. In one specific example, the scoring module 118 is further configured to compute a confidence score using the probability values, obtained from the segmentation network 116, of the classes of the pixels of each image in the set of labelled images 102.

In one specific example, to compute the performance metric (IoU), the scoring module 118 compares the manual labels with the prediction output generated by the segmentation network 116. Similarly, to compute the uncertainty score (entropy), the scoring module 118 interprets the neural network output as probability scores.
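As a sketch of how the network output, interpreted as per-pixel class probabilities, yields an uncertainty score, the following computes per-pixel Shannon entropy and averages it into an image-level score. All names here are illustrative assumptions, not from the patent:

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector.
    High entropy: the network is unsure; low entropy: a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(prob_map):
    """Average entropy over all pixels: the image-level uncertainty score."""
    return sum(pixel_entropy(p) for p in prob_map) / len(prob_map)

# Two toy 2-pixel "images" with 3 classes each.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
unsure    = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]
print(mean_entropy(confident) < mean_entropy(unsure))  # True
```

The mean entropy plays the role of the uncertainty axis in the scatter plots discussed later; the mean IoU plays the role of the performance-metric axis.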

The quality check module 120 is configured to identify mislabelled images from the set of labelled images 102 by visualising image patches from that set obtained from a multi-dimensional graphical representation. Here, the multi-dimensional graphical representation is obtained from the different values of the two or more scoring functions for each of the labelled images 102. In one specific example, the quality check module 120 is further configured to generate a two-dimensional graphical representation using the computed scoring functions, allowing faster selection of mislabelled images from the set of labelled images 102.

The scoring module 118 enables a scoring mechanism that treats the labels (from the original annotator) as an oracle. The quality-check process approves the oracle status of each labelled image from the set 102 individually; treating the label as an oracle for every instance may therefore be wrong. This insight motivates a further scoring function that provides a certainty/confidence score for the predictions and/or labels generated by the segmentation network 116.

In one example, the segmentation network 116 provides, for each pixel, a probability value over the classes, thereby enabling the use of an uncertainty/confidence score. The confidence score is used to define a (class-specific) uncertainty score for each prediction produced by the segmentation network 116. In this example, entropy measures the uncertainty of a prediction: a high entropy value indicates that the network is uncertain about its prediction, while low entropy indicates strong confidence. Besides entropy, other metrics that are variants of the basic entropy definition are also conceivable.

In this specific example, the identified mislabelled images are further sent for relabelling. Here, the set of mislabelled images is a subset of the set of labelled images 102.

Figure 2 shows a flowchart of a method 200 for identifying mislabelled images from a set of labelled images for a deep neural network, according to an example implementation of the present subject matter. The method 200 may be implemented by the computing system 100 of Figure 1, which includes the memory 110, the processor 112 and the interface 114. Additionally, the computing system 100 may be communicatively coupled with a neural network architecture, as described with reference to Figure 1. Although the method 200 is described in the context of a system similar to the computing system 100 of Figure 1, other suitable devices or systems may be used to perform the method 200.

Referring to Figure 2, at block 201 the method 200 may include receiving, by the segmentation network 116, a set of a plurality of labelled images 102 and generating predictions for each image from that set. In one example, the input labelled images 102 may be images from a driving scene containing traffic signals, vehicles, pedestrians and so on.

At block 202, the method 200 may include computing, by the scoring module 118, two or more scoring functions using the predictions generated for each of the labelled images 102.

At block 203, the method 200 may include identifying mislabelled images from the set of labelled images 102 by using the quality check module 120 to visualise image patches from that set obtained from a multi-dimensional graphical representation. Here, the multi-dimensional graphical representation is obtained from the different values of the two or more scoring functions for each of the labelled images 102. In one specific example, the method 200 further comprises a step 204 of generating a multi-dimensional graphical representation using the computed scoring functions to allow faster selection of mislabelled images from the set of labelled images 102. In this specific example, the identified mislabelled images provide a compact subset of the sequence of the set of labelled images 102 for further labelling.
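The flow of blocks 201-203 can be sketched end to end as follows. Everything here is a toy stand-in: `predict` plays the role of the segmentation network (116), the two lambdas play the scoring module (118), and `flag_region` plays the quality check module's region test; none of these names come from the patent:

```python
def quality_check(images, predict, score_fns, flag_region):
    """Blocks 201-203 in miniature: score every labelled image with two or
    more functions, then flag those whose score tuple falls in the suspect
    region of the multi-dimensional representation."""
    flagged = []
    for img in images:
        pred = predict(img)                               # block 201
        scores = tuple(f(img, pred) for f in score_fns)   # block 202
        if flag_region(scores):                           # block 203
            flagged.append((img["id"], scores))
    return flagged  # compact subset sent back for relabelling

# Toy run: two "images" whose precomputed scores stand in for real IoU/entropy.
images = [{"id": "a", "iou": 0.9, "ent": 0.1},
          {"id": "b", "iou": 0.2, "ent": 0.1}]
predict = lambda img: img  # predictions already folded into the toy scores
score_fns = [lambda img, p: img["iou"], lambda img, p: img["ent"]]
# Suspect region of the scatter plot: low IoU but low uncertainty.
flag_region = lambda s: s[0] < 0.5 and s[1] < 0.5
print(quality_check(images, predict, score_fns, flag_region))  # [('b', (0.2, 0.1))]
```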

In one specific example, at block 205, the method 200 further comprises the step of a QC manager directly selecting regions of the multi-dimensional graphical representation that contain a sufficient number of mislabels. Here, the QC manager may select such regions after step 203 of identifying mislabelled images from the set of labelled images 102. In this specific example, a scatter application displaying the multi-dimensional graphical representation may be presented to the QC manager for selecting the regions with sufficient mislabels.

Further in this specific example, at block 206, the method 200 comprises the step of QC workers receiving the regions of the multi-dimensional graphical representation with sufficient mislabels obtained by the QC manager, iterating over each image in the assigned grid, and marking, with annotations, the regions that need relabelling for further labelling. All images assigned for relabelling are then passed to labellers for relabelling.

Figure 3 is an illustrative graphical representation of using the computed scoring functions to allow faster selection of the mislabelled images identified by the quality check module 120 from a set of labelled images. In Figure 3a, the scoring functions are taken to be a performance metric score (e.g. IoU) and a confidence score (i.e. uncertainty), which can be used to place each image in a two-dimensional scatter plot. The scatter plot is defined by the metric score/IoU on the y-axis and the confidence score/uncertainty on the x-axis. In Figure 3a, since an image is not a point object, the centre of the image is aligned with its corresponding IoU and uncertainty scores.

Rather than providing plain images on the scatter plots shown in Figures 3a and 3b, where visualising errors can be difficult, the present invention uses a different (colour) view of the images. To enable quick marking of mislabelled instances, the scatter plot has a region of interest, shown in Figure 3b, in which it is easy to find images with wrong and/or incorrect labels. This stems from the fact that when the network's prediction disagrees with the label (low IoU) but the network remains confident in its prediction (low uncertainty), the chance of finding a mislabel increases. Similarly, images with high IoU and low uncertainty point to no or minor errors (and therefore no QC use case). As shown in Figure 3b, the scatter plot is divided into four regions, as shown in Table I, of which region 2 is the focus of the quality check because of its mislabelled images. In settings where the deep learning network reports high confidence scores for its predictions, these region boundaries may be severely distorted; but as shown in Figure 3a, such regions can still easily be separated by human annotators with minimal effort. Figure 3b shows an illustrative scatter plot for image patches containing objects (such as vehicles) on the BDD dataset. Here we observe that the regions (1, 2, 3, 4) are distorted, but structure can nevertheless be seen in the scatter plot.

Table I:

              | Uncertainty low         | Uncertainty high
    IoU high  | Top-left (region 1)     | Top-right (region 4)
    IoU low   | Bottom-left (region 2)  | Bottom-right (region 3)
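Under illustrative thresholds (the patent leaves the actual region boundaries data-dependent), the quadrant assignment of Table I can be sketched as:

```python
def quadrant(iou, uncertainty, iou_t=0.5, unc_t=0.5):
    """Map an image's (IoU, uncertainty) pair to the four regions of Table I.
    The 0.5 thresholds are assumptions for illustration only."""
    if iou >= iou_t:
        return 1 if uncertainty < unc_t else 4   # top-left / top-right
    return 2 if uncertainty < unc_t else 3       # bottom-left / bottom-right

# Region 2 (low IoU, low uncertainty) is where mislabels concentrate:
# the network disagrees with the annotator yet is confident in its prediction.
print(quadrant(iou=0.15, uncertainty=0.10))  # 2 -> candidate for relabelling
print(quadrant(iou=0.90, uncertainty=0.05))  # 1 -> label looks fine
```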

The present invention uses the scatter plot to build an interactive user application. The application provides a region-selector tool that can be used to select any desired region; the selector can then be used efficiently to select data points with wrong labels. This selection tool, combined with the region of interest in the scatter plot, allows faster selection of mislabelled patches. Figure 3 shows the main window of the scatter QC application.

In one embodiment of the present invention, a two-step procedure may be employed for identifying mislabeled images for relabeling. In the first step of this procedure, a quality-check (QC) manager works directly with the Scatter QC application and selects a region rich in labeling errors from the scatter plot shown in Figure 3b. This region of interest is automatically subdivided further into smaller grids, each of which is passed to a QC worker. The procedure then iterates over every image in the assigned grid and flags it for relabeling, together with comments from the QC worker. All images flagged for relabeling are then passed to labelers for relabeling. The Scatter QC application supports bounding-box-based region selection on the scatter plot; human-assisted input using this bounding-box selection can be focused efficiently on error-rich regions. The QC manager and QC workers progressively subdivide and select images for relabeling while concentrating on the mislabeled regions. In the final stage of Labeling as a Service (LaaS), human annotators relabel the patches.
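The automatic subdivision of the selected region into smaller grids for individual QC workers can be sketched as a plain uniform split; the actual grid layout used by the Scatter QC application is not specified in this document, so this is an assumption for illustration:

```python
def split_into_grids(box, n_rows, n_cols):
    """Split a selected region into n_rows x n_cols sub-grids.

    box: (x0, x1, y0, y1) in scatter-plot coordinates; each returned
    sub-grid uses the same format and can be assigned to one QC worker.
    """
    x0, x1, y0, y1 = box
    dx = (x1 - x0) / n_cols
    dy = (y1 - y0) / n_rows
    return [
        (x0 + c * dx, x0 + (c + 1) * dx, y0 + r * dy, y0 + (r + 1) * dy)
        for r in range(n_rows)
        for c in range(n_cols)
    ]
```

Each worker then iterates only over the images whose scatter points fall inside their assigned sub-grid.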

At the core of the invention is a unified system and interactive framework for identifying labeling errors. In alternative embodiments of the present invention, application areas of the computing system 100 may include semantic segmentation, object detection, classification, and the like.

The present invention focuses on finding labeling errors for a specific class, rather than finding potential labeling errors for all classes in the schema at once. In addition, the present invention provides methods for computing an evaluation metric (mean IoU) and an uncertainty score (mean entropy) for each labeled sample image. While most of the literature and patents concentrate on (semi-)automated labeling and on reducing time and clicks for the annotator, the present invention concentrates on the quality-check process. In general, quality checking is a tedious, repetitive process in which errors are assumed to be rare (compared with the corrections made during automated labeling), and poor-quality labels are therefore likely to pass the quality check. In contrast, the present invention does not depend entirely on human intervention to quality-check large numbers of labeled images.
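The two scoring functions named here can be sketched for a segmentation setting as follows, assuming a binary ground-truth mask, a binary predicted mask, and a per-pixel softmax output; this is an illustrative reading of "mean IoU" and "mean entropy", not the patent's exact formulation:

```python
import numpy as np

def mean_iou(label_mask, pred_mask):
    """Intersection-over-union between a label mask and a predicted mask."""
    label_mask = np.asarray(label_mask, dtype=bool)
    pred_mask = np.asarray(pred_mask, dtype=bool)
    union = np.logical_or(label_mask, pred_mask).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    inter = np.logical_and(label_mask, pred_mask).sum()
    return float(inter) / float(union)

def mean_entropy(probs, eps=1e-12):
    """Mean per-pixel entropy of an (H, W, C) softmax output."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    per_pixel = -(p * np.log(p)).sum(axis=-1)  # Shannon entropy per pixel
    return float(per_pixel.mean())
```

A low `mean_iou` combined with a low `mean_entropy` (a confident but disagreeing prediction) places a patch in the mislabeling-prone Region 2 of Table I.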

Although aspects of the present disclosure have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not limited to the specific features or methods described herein. Rather, the specific features and methods are disclosed as examples of the present disclosure.

100: Computing system
102: Labeled input images
110: Memory
112: Processor
114: Interface
116: Segmentation network
118: Scoring module
120: Quality check module
200: Method
201: Block
202: Block
203: Block/Step
204: Step
205: Block
206: Block
300: Deep neural network

Embodiments are described with reference to the accompanying drawings, in which:

[Fig. 1] illustrates a system environment for identifying mislabeled images from a set of labeled images for a deep neural network, according to an example implementation of the present subject matter;

[Fig. 2] illustrates a flowchart of a method for identifying mislabeled images from a set of labeled images for a deep neural network, according to an example implementation of the present subject matter;

[Fig. 3a] and [Fig. 3b] illustrate graphical representations of the scoring functions for each image in a two-dimensional plane, according to example implementations of the present subject matter.


Claims (10)

1. A computing system (100), comprising:
a memory (110); and
a processor (112), coupled to the memory (110), configured to provide a set of a plurality of labeled images (102) to a segmentation network (116), wherein the segmentation network (116) is configured to generate a prediction for each image from the set of the plurality of labeled images (102);
characterized by:
a scoring module (118) configured to compute, using the predictions generated by the segmentation network (116), two or more scoring functions for each image from the plurality of labeled images (102); and
a quality check module (120) configured to identify mislabeled images from the set of the plurality of labeled images (102) by visualizing image patches from the set of the plurality of labeled images (102) obtained from a multidimensional graphical representation, wherein the multidimensional graphical representation is obtained from the different values of the two or more scoring functions for each image from the plurality of labeled images (102).

2. The computing system (100) of claim 1, wherein the quality check module (120) is configured to generate a two-dimensional graphical representation using the computed two or more scoring functions, to allow faster selection of the mislabeled images from the set of the plurality of labeled images (102).

3. The computing system (100) of claim 1, wherein the segmentation network (116) is communicatively coupled to a deep neural network (300).
4. The computing system (100) of claim 3, wherein the segmentation network (116) is configured to generate segmentation predictions based on a plurality of classifiers and a plurality of training data from the deep neural network (300).

5. The computing system (100) of claim 1, wherein the segmentation network (116) is a network trained on the plurality of labeled images.

6. The computing system (100) of claim 1, wherein the two or more scoring functions include functions of a performance metric (IoU), a probability score, an uncertainty score, and/or combinations thereof.

7. A computer-implemented method (200) for identifying mislabeled images from a set of labeled images for a deep neural network, the computer-implemented method (200) comprising the steps of:
receiving (201), by a segmentation network (116), a set of a plurality of labeled images (102), and generating a prediction for each image from the set of the plurality of labeled images (102);
computing (202), by a scoring module (118), two or more scoring functions using the predictions generated for each of the plurality of labeled images (102);
identifying (203), by using a quality check module (120), mislabeled images from the set of the plurality of labeled images (102) by visualizing image patches from the set of the plurality of labeled images (102) obtained from a multidimensional graphical representation, wherein the multidimensional graphical representation is obtained from the different values of the two or more scoring functions for each image from the plurality of labeled images (102).

8. The computer-implemented method (200) of claim 7, wherein the method (200) further comprises a step (204) of generating a multidimensional graphical representation using the computed two or more scoring functions, to allow faster selection of the mislabeled images from the set of the plurality of labeled images (102).

9. The computer-implemented method (200) of claim 7, wherein after the step (203) of identifying the mislabeled images from the set of the plurality of labeled images (102), the method (200) further comprises a step (205) in which a QC manager directly selects error-rich regions of the multidimensional graphical representation.

10. The computer-implemented method (200) of claim 9, wherein after step (205) is performed by the QC manager, the method (200) further comprises a step (206) in which a QC worker receives the error-rich regions of the multidimensional graphical representation, iterates over each image in an assigned grid, and flags, with comments to be sent, those error-rich regions that need relabeling for further labeling.
TW111150348A 2021-12-30 2022-12-28 A system and method for quality check of labelled images TW202345104A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202141061727 2021-12-30
IN202141061727 2021-12-30

Publications (1)

Publication Number Publication Date
TW202345104A true TW202345104A (en) 2023-11-16

Family

ID=84901557

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111150348A TW202345104A (en) 2021-12-30 2022-12-28 A system and method for quality check of labelled images

Country Status (2)

Country Link
TW (1) TW202345104A (en)
WO (1) WO2023126280A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404896B (en) 2015-11-03 2019-04-19 北京旷视科技有限公司 Labeled data processing method and labeled data processing system
CN108197658B (en) 2018-01-11 2020-08-14 阿里巴巴集团控股有限公司 Image annotation information processing method, device, server and system

Also Published As

Publication number Publication date
WO2023126280A1 (en) 2023-07-06
