KR102141302B1

KR102141302B1 - Object detection method based on deep learning regression model and image processing apparatus

Info

Publication number: KR102141302B1
Application number: KR1020190024740A
Authority: KR
Inventors: 김정태; 조희연; 김경실
Original assignee: 이화여자대학교 산학협력단
Priority date: 2019-03-04
Filing date: 2019-03-04
Publication date: 2020-08-04

Abstract

An image object detection method based on a deep learning regression model includes the steps of: inputting a source image including a target object and a background to a deep learning regression model by an image processing device; and generating an output image in which the target object is classified from the source image using the deep learning regression model by the image processing device, wherein the deep learning regression model is a convolutional encoder-decoder structure and the deep learning regression model distinguishes between the target object and the background using a pixel-based regression method.

Description

OBJECT DETECTION METHOD BASED ON DEEP LEARNING REGRESSION MODEL AND IMAGE PROCESSING APPARATUS

The technique described below relates to detecting a specific object in an image. In particular, it relates to an object detection technique that uses a pixel-wise regression model.

Machine vision has been widely studied because, in applications such as defect inspection, classification, and recognition, it can replace a manual inspector and repeatedly perform high-precision inspection.

Machine learning, which has recently attracted attention, can learn the way a manual inspector makes judgments, so it is being actively studied in applications where existing machine vision techniques were difficult to apply.

Matthias Haselmann, Dieter P. Gruber, Paul Tabatabai, Anomaly Detection using Deep Learning based Image Completion, 2018 17th IEEE International Conference on Machine Learning and Applications

It is not easy to detect a specific object (such as a defective region) in an image whose background has a similar pattern. The prior art determines a difference value by comparing the source image with an image from which the specific object has been excluded. However, detecting a specific object against a similarly patterned or noisy background is itself difficult, so the accuracy of product defect judgment is poor.

The technique described below aims to detect a specific object included in a source image using a pixel-wise regression deep learning model. It further aims to detect the specific object by comparing the object with a background from which distortion components have been removed.

An image object detection method based on a regression deep learning model includes: inputting, by an image processing device, a source image including a target object and a background into the regression deep learning model; and generating, by the image processing device using the regression deep learning model, an output image in which the target object is distinguished from the source image.

An object detection device based on a regression deep learning model includes: an input device that receives a source image including a target object and a background; a storage device that stores a regression deep learning model for generating an output image in which the target object is distinguished from the source image; and a computing device that inputs the source image into the regression deep learning model to generate the output image.

The regression deep learning model has a convolutional encoder-decoder structure and distinguishes the target object from the background by pixel-wise regression.

The technique described below uses pixel-wise regression to detect a specific object accurately even in a source image with many distortion components. It also generates an image in which the specific object is identified (distinguished), which makes product defects easy to detect.

FIG. 1 is an example of a defect detection system using a regression deep learning model.
FIG. 2 is an example of a convolutional encoder-decoder.
FIG. 3 is an example of the learning network structure of a regression deep learning model.
FIG. 4 is an example of a source image and an output image.
FIG. 5 is an example of the configuration of an object detection device.

The technique described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technique to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the technique described below.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the technique described below, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

As used in this specification, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "includes" mean that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and should be understood not to exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the components in this specification are distinguished merely by the main function each component is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. In addition to its own main function, each component described below may additionally perform some or all of the functions of other components, and some of the main functions of a component may be performed exclusively by another component.

In addition, when a method or an operating method is performed, the steps of the method may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The technique described below detects product defects using a machine learning model. It can be applied to machine vision techniques that inspect product quality by analyzing images of a product. As is widely known, there are many kinds of machine learning models. The technique described below is assumed to analyze images using an artificial neural network.

Hereinafter, the entity that analyzes an image using an artificial neural network is referred to as an image processing device. The image processing device is a computing device capable of data processing and computation. For example, it may be implemented as a PC, a smart device, a server, or the like. The image processing device processes an input image using a previously trained artificial neural network model.

The image processing device may analyze an image that includes a product (object). The input image to be analyzed is hereinafter referred to as a source image. The image processing device inputs the source image into the artificial neural network to generate an analysis result, and may produce that result in the form of an image. The specific object that the image processing device detects in the source image is referred to as a target object. For example, the target object may be a defective region, a specific part, or a specific kind of object. The source image includes the target object and a background.

FIG. 1 is an example of a defect detection system 100 that uses a regression deep learning model to detect defective-region objects. The regression deep learning model, described later, is a neural network model that detects a specific object in a source image. The defect detection system 100 is therefore one application of the regression deep learning model.

The defect detection system 100 of FIG. 1 may be a system installed on a product process line. The cameras 111, 112, and 113 photograph the exterior of the product to be inspected and capture the source image. FIG. 1 shows three cameras 111, 112, and 113 as an example. Defect inspection may be performed using an individual camera on each of a plurality of process lines. Alternatively, defect inspection may be performed using a plurality of cameras that photograph one product from different viewpoints.

FIG. 1 shows a detection server 150 and a detection PC 180 as examples of the image processing device that detects defects. The defect detection system 100 may include either the detection server 150 or the detection PC 180, or it may include both.

The detection server 150 receives the source image from at least one of the cameras 111, 112, and 113. The detection server 150 may receive the source image over a wired or wireless network. The detection server 150 generates an analysis result indicating whether the product is defective, or a segmented image, and delivers it to the user 10. The user 10 can check the analysis result or the segmented image from the detection server 150 through a user terminal. The user terminal is a device such as a PC, a smart device, or a display device.

The detection PC 180 receives the source image from at least one of the cameras 111, 112, and 113. The detection PC 180 may receive the source image over a wired or wireless network. The detection PC 180 generates an analysis result indicating whether the product is defective, or a segmented image, and delivers it to the user 20. The user 20 can check whether the product is defective through the detection PC 180.

The image processing device may detect the target object in the source image using the regression deep learning model. The regression deep learning model may have a convolutional encoder-decoder structure. The convolutional encoder-decoder is described first.

FIG. 2 is an example of a general convolutional encoder-decoder. A convolutional encoder-decoder consists of a convolutional encoder and a convolutional decoder, which are mirror images of each other. The convolutional encoder-decoder reduces the size of the feature map extracted from the source image and then enlarges it back to the size of the source image, assigning a class to each pixel of the source image.

The convolutional encoder has a plurality of layers. FIG. 2 shows a convolutional encoder with five layers as an example. Each layer has a convolutional layer and a pooling layer. Although FIG. 2 labels it only as a convolutional layer, the layer may further include a batch normalization layer and a nonlinear activation layer in addition to the convolutional layer. In that case the order within one layer may be convolutional layer, batch normalization layer, nonlinear activation layer, and this sequence may be repeated a number of times. FIG. 2 shows the sequence repeated twice. In other words, one layer has the structure (i) (convolutional layer, batch normalization layer, nonlinear activation layer) × n followed by (ii) a pooling layer.

The convolutional layer outputs a feature map by applying a convolution operation to the input image. The filter that performs the convolution operation is also called a kernel, and its size is called the filter size or kernel size. The operation parameters that make up the kernel are called kernel parameters, filter parameters, or weights. A convolutional layer may apply several different filters to a single input.

The convolutional layer performs the convolution operation on a specific region of the input image. This region is called a window. The window can move one position at a time from the top left to the bottom right of the image, and the distance it moves in one step can be adjusted. This step size is called the stride. The convolutional layer moves the window across the input image and performs the convolution operation over every region of the image. The convolutional layer may also pad the border of the image so that the dimensions of the input image are preserved after the convolution operation.
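The window, stride, and padding described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patent's implementation: the function name `conv2d`, the 4×4 test image, and the 3×3 summing kernel are assumptions made for the example, and the kernel is applied without flipping, as is common in deep learning practice.

```python
def conv2d(image, kernel, stride=1, pad=0):
    """Slide `kernel` over a zero-padded `image` (list of rows) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    # Zero-pad the border so the output can keep the input size.
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # window position for this output pixel
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += padded[top + u][left + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1.0] * 3 for _ in range(3)]            # hypothetical 3x3 summing kernel
same = conv2d(image, box, stride=1, pad=1)     # 4x4 output: padding keeps the input size
strided = conv2d(image, box, stride=2, pad=1)  # 2x2 output: stride 2 skips every other position
```

With a 3×3 kernel, one pixel of zero padding and stride 1 keep the 4×4 size, while stride 2 reduces the output to 2×2.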

The pooling layer subsamples the feature map produced by the convolutional layer. Pooling operations include max pooling and average pooling. Max pooling selects the largest sample value within the window. Average pooling samples the mean of the values in the window. In general, pooling uses a stride equal to the window size.
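As a minimal sketch of the two pooling operations, assuming non-overlapping 2×2 windows (stride equal to window size) and a hypothetical 4×4 feature map; the function name `pool2d` is an assumption for the example:

```python
def pool2d(fmap, size=2, mode="max"):
    """Subsample a feature map with non-overlapping windows (stride == window size)."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + u][j + v] for u in range(size) for v in range(size)]
            if mode == "max":
                row.append(max(window))                # max pooling: largest value
            else:
                row.append(sum(window) / len(window))  # average pooling: mean value
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]
max_pooled = pool2d(fmap, 2, "max")  # [[4, 5], [3, 7]]
avg_pooled = pool2d(fmap, 2, "avg")  # [[2.5, 2.0], [1.5, 2.75]]
```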

The nonlinear operation layer determines the output value of each neuron (node). It uses a transfer function, such as the ReLU or sigmoid function.
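For reference, the two transfer functions named above have the following standard definitions; the helper names are chosen for this sketch only.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, clamps negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


activations = [relu(v) for v in (-2.0, 0.0, 3.0)]  # [0.0, 0.0, 3.0]
```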

The convolutional encoder generates a feature map for the source image.

The convolutional decoder generates an image from the feature map produced by the convolutional encoder. The convolutional decoder consists of a plurality of layers. FIG. 2 shows a convolutional decoder with five layers as an example.

Each layer has an upsampling layer and a deconvolutional layer. The deconvolutional layer performs the inverse operation of the convolutional layer of the convolutional encoder. In the deconvolutional layer, the sequence of convolutional layer, batch normalization layer, and nonlinear activation layer may be repeated. Because the convolutional decoder is symmetric to the convolutional encoder, it has the same number of convolutional layers and the same convolutional filter size and count as the convolutional encoder.

The upsampling layer performs the inverse operation of the pooling layer. Unlike the pooling layer, it enlarges the dimensions of its input.

The deconvolutional layer performs the inverse operation of the convolutional layer, carrying out the convolution operation in the opposite direction. It takes a feature map as input and generates an output image through a convolution operation with a kernel. With a stride of 1, the deconvolutional layer outputs an image whose width and height equal those of the feature map. With a stride of 2, the deconvolutional layer outputs an image half the width and height of the feature map.

FIG. 3 is an example of the learning network structure of the regression deep learning model. The regression deep learning model has a convolutional encoder-decoder structure. In FIG. 3, the learning network N denotes the regression deep learning model. The learning network N includes an encoder N1 and a decoder N2.

The encoder N1 receives the source image and outputs a feature map. The decoder N2 outputs an image based on the feature map it receives. The image output by the decoder N2 is referred to as the output image.

The encoder N1 generates a feature map of the source image and, while reducing the size of the feature map, extracts only the salient feature values. The decoder N2 enlarges the feature map based on the feature values output by the encoder N1, and finally outputs an output image of the same size as the source image.
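The shrink-then-grow behavior can be traced with a small sketch. It assumes three stages per side, as in FIG. 3, and assumes 2×2 pooling and upsampling with an input size divisible by 8; the pooling factor and the 256×256 input are illustrative assumptions, not values stated in the text.

```python
def trace_sizes(height, width, stages=3, pool=2):
    """Return the feature-map size after each encoder stage and each decoder stage."""
    sizes = [(height, width)]
    for _ in range(stages):  # encoder: each pooling layer shrinks the map
        height, width = height // pool, width // pool
        sizes.append((height, width))
    for _ in range(stages):  # decoder: each upsampling layer enlarges it again
        height, width = height * pool, width * pool
        sizes.append((height, width))
    return sizes


sizes = trace_sizes(256, 256)
# [(256, 256), (128, 128), (64, 64), (32, 32), (64, 64), (128, 128), (256, 256)]
```

The last entry equals the first, matching the statement that the output image has the same size as the source image.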

The encoder N1 includes a plurality of encoder stages. FIG. 3 shows three encoder stages (encoder 1, encoder 2, and encoder 3) as an example.

An encoder stage may consist of a plurality of convolutional blocks and one pooling layer. One convolutional block includes a convolutional layer, a nonlinear activation layer, and a normalization layer. A ReLU layer may be used as the nonlinear activation layer, and a batch normalization layer may be used as the normalization layer. Each layer operates as described above.

The pooling layer may perform max pooling. The pooling layer stores the position of the pixel that holds the maximum value and passes it to the upsampling layer of the corresponding decoder stage (shown as dotted arrows in FIG. 3).

The decoder N2 includes a plurality of decoder stages. FIG. 3 shows three decoder stages (decoder 1, decoder 2, and decoder 3) as an example. As described above, the decoder N2 has a structure that mirrors the encoder N1. A decoder stage may therefore consist of one upsampling layer and a plurality of deconvolutional blocks, equal in number to the convolutional blocks. A decoder stage performs the operation of the encoder stage in reverse.

The upsampling layer outputs the value of the input feature map at the maximum-value pixel position received from the max pooling layer of the corresponding encoder stage, and outputs '0' at every other position. A decoder stage has the same number of filters, filter size, and so on as the convolutional blocks and convolutional layers of the corresponding encoder stage.
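The index-passing scheme between a max pooling layer and its paired upsampling layer can be sketched as follows. The function names, the 2×2 window, and the test feature map are assumptions for illustration; the sketch also assumes the feature-map dimensions are divisible by the window size.

```python
def max_pool_with_indices(fmap, size=2):
    """Max-pool a feature map and remember where each maximum came from."""
    pooled, indices = [], []
    for i in range(0, len(fmap), size):
        prow, irow = [], []
        for j in range(0, len(fmap[0]), size):
            best = (i, j)
            for u in range(size):
                for v in range(size):
                    if fmap[i + u][j + v] > fmap[best[0]][best[1]]:
                        best = (i + u, j + v)
            prow.append(fmap[best[0]][best[1]])
            irow.append(best)  # position handed to the paired decoder stage
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(values, indices, out_h, out_w):
    """Place each value at its stored maximum position; every other pixel is 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for vrow, irow in zip(values, indices):
        for value, (r, c) in zip(vrow, irow):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]
pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, 4, 4)
# restored == [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [3, 0, 0, 0]]
```

In the actual decoder the upsampling layer would receive the decoder's own feature values rather than `pooled`; the pooled values are reused here only to make the placement easy to see.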

A deep learning model is trained by updating its parameters in the direction that reduces a loss function; minimizing the loss function over the course of training optimizes the weights of the neural network. For example, the weights may be optimized with the gradient descent method. The image processing device calculates, for the defined loss function, the gradient d_parameter of each parameter to be learned. The image processing device may optimize the weights by moving toward values at which the gradient approaches zero, using the gradient of each parameter to update that parameter's value. The updated parameter parameter_new may be determined as parameter_new = parameter_prev - learnRate * d_parameter. Here, the learning rate learnRate is a value set directly by the user when the model is trained. The learning network of FIG. 3 may use the loss function L of Equation 1 below.
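The update rule parameter_new = parameter_prev - learnRate * d_parameter can be illustrated on a toy one-parameter problem. The quadratic loss, the starting value, and the step count below are assumptions chosen for the example and have nothing to do with the network's actual loss.

```python
def gradient_descent(grad_fn, parameter, learn_rate=0.1, steps=100):
    """Repeatedly apply parameter_new = parameter_prev - learnRate * d_parameter."""
    for _ in range(steps):
        d_parameter = grad_fn(parameter)                  # gradient of the loss at this point
        parameter = parameter - learn_rate * d_parameter  # step against the gradient
    return parameter


# Toy loss (p - 3)^2 has gradient 2 * (p - 3) and its minimum at p = 3.
learned = gradient_descent(lambda p: 2.0 * (p - 3.0), parameter=0.0)
```

Each step shrinks the distance to the minimum by a constant factor here, so `learned` ends up very close to 3.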

In Equation 1, n is the number of training samples, d is the number of channels, c is the number of columns, and r is the number of rows. Columns and rows indicate pixel positions in the image. Y is the output data and T is the ground-truth data. R is a regularization term. The loss function is the sum of the mean squared error (MSE) between the output data and the ground-truth data and the regularization term. The MSE drives the output data toward the ground-truth data. λ is a hyperparameter set in advance; it determines the weight of the regularization term in the loss function.

Equation 2 below gives the regularization term.

In Equation 2, ε is an arbitrary small positive number that prevents the value inside the square root from becoming 0.
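The equation images themselves are not reproduced in this text. One form consistent with the symbol definitions given here (n, d, r, c, Y, T, λ, R, ε) is sketched below. It is an assumed reconstruction, not the patent's exact equations: the normalization constant of the squared-error term, the index names i, k, p, q, and the choice of an isotropic total-variation form for R are all assumptions.

```latex
% Assumed reconstruction; normalization constant and TV form are not given in the text.
L = \frac{1}{n\,d\,r\,c} \sum_{i=1}^{n} \sum_{k=1}^{d} \sum_{p=1}^{r} \sum_{q=1}^{c}
      \left( Y_{i,k,p,q} - T_{i,k,p,q} \right)^{2} + \lambda R
\qquad \text{(cf. Equation 1)}

R = \sum_{i,k,p,q} \sqrt{ \left( Y_{i,k,p+1,q} - Y_{i,k,p,q} \right)^{2}
      + \left( Y_{i,k,p,q+1} - Y_{i,k,p,q} \right)^{2} + \varepsilon }
\qquad \text{(cf. Equation 2)}
```

In this form ε keeps the argument of the square root strictly positive even where neighboring output pixels are equal, which matches the role described for it.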

The regularization term R keeps the output values from being scattered sporadically and makes each value similar to those of its surrounding pixels. R regularizes the total variation (TV) between neighboring pixels of the output data Y in the row and column directions. TV regularization gives neighboring pixels similar values while relatively preserving boundaries where values in the image change abruptly. The regularization term thus makes a given pixel take a value similar, within a certain range, to its adjacent pixels.

The background of the source image may contain noise or non-uniform illumination components. The regularization term removes or smooths such distortion components in the background.
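A loss of the kind described, a mean squared error plus a weighted total-variation term R, can be sketched for a single one-channel image as follows. The function names, the value of `lam` (standing in for λ), the per-pixel averaging of the squared error, and the isotropic TV form are assumptions for this sketch rather than the patent's exact definitions.

```python
import math


def tv_regularizer(y, eps=1e-8):
    """Total variation of image y: row- and column-direction differences under a square root."""
    total = 0.0
    for p in range(len(y) - 1):
        for q in range(len(y[0]) - 1):
            dr = y[p + 1][q] - y[p][q]  # difference to the next row
            dc = y[p][q + 1] - y[p][q]  # difference to the next column
            total += math.sqrt(dr * dr + dc * dc + eps)  # eps keeps the root argument positive
    return total


def loss(y, t, lam=0.1, eps=1e-8):
    """Mean squared error between output y and target t, plus lam times the TV term."""
    rows, cols = len(y), len(y[0])
    mse = sum((y[p][q] - t[p][q]) ** 2
              for p in range(rows) for q in range(cols)) / (rows * cols)
    return mse + lam * tv_regularizer(y, eps)


flat = [[1.0] * 4 for _ in range(4)]                                  # smooth output
noisy = [[float((p + q) % 2) for q in range(4)] for p in range(4)]    # checkerboard output
tv_flat, tv_noisy = tv_regularizer(flat), tv_regularizer(noisy)
```

The checkerboard output has a much larger TV value than the flat one, so the term penalizes sporadic pixel values and favors outputs whose neighbors agree.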

The trained regression deep learning model shown in FIG. 3 uses pixel-wise regression to calculate, for each pixel, its difference from the background. While the encoder N1 extracts features pixel by pixel, it removes background noise by regression and calculates the difference between the denoised background and the current pixel (for example, part of a specific object). This allows the background and the target object to be distinguished even when the background is complex or has a specific pattern. The decoder N2 generates the output image based on the feature map that carries these distinguishing features.

FIG. 4 is an example of a source image and an output image. In FIG. 4, A shows an example of a source image and B shows an example of an output image. To present the result intuitively, the output image renders the difference values as colors. Here, a difference value is the difference between a pixel and the background. Red pixels (regions) indicate the highest difference values, yellow pixels indicate difference values lower than red, and blue indicates background with no difference.
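A color rendering of this kind can be sketched as a piecewise-linear map from difference value to RGB. The text states only the ordering blue, yellow, red; the exact colormap, the function name `difference_to_rgb`, and the normalization by `max_value` are hypothetical choices made for this example.

```python
def difference_to_rgb(value, max_value=1.0):
    """Map a difference value to (R, G, B): blue = none, yellow = medium, red = highest."""
    x = min(max(value / max_value, 0.0), 1.0)  # clamp to [0, 1]
    if x < 0.5:
        t = x / 0.5                            # blend blue -> yellow
        return (round(255 * t), round(255 * t), round(255 * (1 - t)))
    t = (x - 0.5) / 0.5                        # blend yellow -> red
    return (255, round(255 * (1 - t)), 0)


background = difference_to_rgb(0.0)  # (0, 0, 255)   blue
medium = difference_to_rgb(0.5)      # (255, 255, 0) yellow
defect = difference_to_rgb(1.0)      # (255, 0, 0)   red
```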

FIG. 5 shows an example configuration of the object detection device 200. The object detection device 200 corresponds to the image processing apparatus described above. The object detection device 200 may also correspond to the detection server 150 or the detection PC 180 of FIG. 1.

The object detection device 200 uses the regression deep learning model to generate, from a source image, an output image in which the object is identified. The object detection device 200 may be physically implemented in various forms. For example, it may take the form of a computer device such as a PC, a network server, or a chipset dedicated to image processing. The computer device may include a mobile device such as a smart device.

The object detection device 200 includes a storage device 210, a memory 220, a computing device 230, an interface device 240, and a communication device 250.

The storage device 210 stores a neural network model for image processing. For example, the storage device 210 may store a regression deep learning model such as that shown in FIG. 3. The regression deep learning model is assumed to have been trained in advance. The storage device 210 may further store programs or source code required for image processing. The storage device 210 may also store the input source image and the generated segmented image.

The memory 220 may store the source image received by the object detection device 200, as well as data and information generated in the course of producing the output image.

The interface device 240 is a device that receives commands and data from outside. The interface device 240 may receive a source image from a physically connected input device or an external storage device. The interface device 240 may receive various neural network models for image processing. The interface device 240 may also receive training data, information, and parameter values for generating a neural network model.

The communication device 250 is a component that receives and transmits information over a wired or wireless network. The communication device 250 may receive a source image from an external entity. The communication device 250 may also receive various neural network models and data for model training. The communication device 250 may transmit the generated output image to an external entity.

The communication device 250 and the interface device 240 are devices that receive data or commands from outside. Either the communication device 250 or the interface device 240 may be referred to as an input device.

The computing device 230 generates an output image in which the target object is detected, using the neural network model or program stored in the storage device 210. The computing device 230 may train the regression deep learning model used in the image processing, using given training data. The computing device 230 may generate an output image for a source image using the regression deep learning model built through the process described above. The computing device 230 may be a device that processes data and performs given operations, such as a processor, an AP, or a chip with an embedded program.
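The division of roles among the components above can be sketched as a small data structure. Everything in this example is hypothetical: the class `ObjectDetectionDevice`, its method names, and the `placeholder_model` (a simple threshold standing in for the trained regression deep learning model) are assumptions for illustration, not an implementation taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Image = List[List[float]]
Model = Callable[[Image], Image]


@dataclass
class ObjectDetectionDevice:
    """Hypothetical sketch of the object detection device 200."""
    storage: Dict[str, Model] = field(default_factory=dict)  # storage device 210
    memory: Dict[str, Image] = field(default_factory=dict)   # memory 220

    def receive(self, image: Image) -> None:
        """Input path: interface device 240 or communication device 250."""
        self.memory["source"] = image

    def detect(self, model_name: str) -> Image:
        """Computing device 230: run the stored model on the received source image."""
        model = self.storage[model_name]
        output = model(self.memory["source"])
        self.memory["output"] = output
        return output


def placeholder_model(image: Image) -> Image:
    """Stand-in for a trained model: marks pixels above 0.5 as object."""
    return [[1.0 if pixel > 0.5 else 0.0 for pixel in row] for row in image]


device = ObjectDetectionDevice()
device.storage["regression_model"] = placeholder_model
device.receive([[0.1, 0.9], [0.2, 0.8]])
result = device.detect("regression_model")
```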

In addition, the image processing method, the method of generating an image in which a target object is distinguished from the background, and the defect detection method described above may each be implemented as a program (or application) containing an executable algorithm that can be run on a computer. The program may be provided stored on a non-transitory computer readable medium.

A non-transitory readable medium is not a medium that stores data only briefly, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be provided stored on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

The present embodiments and the drawings attached to this specification merely illustrate part of the technical idea included in the technology described above. It is evident that all modifications and specific embodiments that a person skilled in the art can readily infer within the scope of the technical idea contained in the specification and drawings fall within the scope of rights of the technology described above.

100: defect detection system
111, 112, 113: cameras
150: detection server
180: detection PC
10, 20: users
200: object detection device
210: storage device
220: memory
230: computing device
240: interface device
250: communication device

Claims

A method for detecting an image object based on a regression deep learning model, the method comprising:
inputting, by an image processing apparatus, a source image including a target object and a background into a regression deep learning model; and
generating, by the image processing apparatus, an output image in which the target object is distinguished in the source image using the regression deep learning model,
wherein the regression deep learning model has a convolutional encoder-decoder structure and distinguishes the target object from the background by a pixel-wise regression method, and the loss function of the encoder is defined by the following equation.

,

(where L is the loss function, n is the number of data items used for training, d is the number of channels, c is the number of columns, r is the number of rows, Y is the output data, T is the ground-truth data, R is the regularization term, λ is a weight, and ε is an arbitrary small positive number)

The method of claim 1,
wherein the encoder extracts features in units of pixels to generate a feature map, and the decoder receives the feature map as input and outputs the output image.

The method of claim 1,
wherein the encoder smooths a distortion component in the background using pixel-wise regression, and extracts features for the target object and for the background from which the distortion component has been removed.

The method of claim 1,
wherein the encoder includes a plurality of encoder blocks and one pooling layer.

The method of claim 1,
wherein the loss function of the encoder includes a regularization term that adjusts a specific pixel to a value similar, within a predetermined range, to the values of its surrounding pixels.

delete