KR20190051619A

KR20190051619A - Image data processing apparatus using ensemble and fine-tuning and controlling method thereof

Info

Publication number: KR20190051619A
Application number: KR1020170147464A
Authority: KR
Inventors: 정희철; 권순; 김준광; 이진희; 정우영; 최민국
Original assignee: 재단법인대구경북과학기술원
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2019-05-15
Also published as: KR102426491B1

Abstract

Disclosed are an image data processing apparatus and a control method thereof. The control method comprises the steps of: receiving image data; calculating an average value of output values outputted from a plurality of CNNs based on the received image data; calculating a loss value of an average model including the CNNs included in the average value calculation and calculating an auxiliary loss value for each CNN included in the average value calculation; calculating a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss value; and performing fine-tuning on the average value based on the calculated total loss value.

Description

The present disclosure relates to an image data processing apparatus and a control method thereof, and more particularly, to an image data processing apparatus and a control method using fine-tuning and an ensemble that combines multiple network models.

The development of intelligent traffic surveillance systems has recently emerged as an important issue. One of the most important functions of such a system is to analyze and extract useful information from recognized image data. In particular, classifying and localizing vehicles or pedestrians is a basic process for traffic surveillance analysis.

Conventional techniques for vehicle classification and localization obtain a real traffic detection environment by capturing images at regular intervals with a large number of traffic cameras, and then analyze the many captured images to classify and localize vehicles. However, the accuracy of the analysis results of the conventional techniques is not satisfactory.

One way to increase the accuracy of image recognition is to use a plurality of image recognition network models. Averaging the output values of a plurality of network models can improve recognition performance, but it is inefficient because the synergy among the networks is not maximized. Accordingly, there is a need for an efficient image recognition apparatus that uses a plurality of network models to improve image recognition performance while maximizing their synergy.

The present disclosure is intended to solve the above problem, and an object of the present disclosure is to provide an image data processing apparatus and a control method that improve image recognition performance and maximize the synergy among a plurality of network models.

According to an embodiment of the present disclosure for achieving the above object, a control method of an image data processing apparatus includes: receiving image data; calculating an average value of output values output from a plurality of CNNs based on the received image data; calculating a loss value of an average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation; calculating a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values; and fine-tuning the average value based on the calculated total loss value.

The received image data may be normalized, and the loss value of the average model may be a loss value for the average of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and the learned weight parameters.

Meanwhile, the calculating of the average value may exclude randomly selected CNNs from among all of the plurality of CNNs and then calculate the average value.

The calculating of the average value may repeat the process of randomly selecting and excluding CNNs and calculating the average value.

In addition, the calculating of the average value may uniformly select, with a predetermined probability, the CNNs to be excluded from among all of the plurality of CNNs.

According to an embodiment of the present disclosure for achieving the above object, an image data processing apparatus includes an input unit that receives image data, a plurality of CNNs that perform learning based on the received image data, and a control unit that calculates an average value of output values output from the plurality of CNNs. The control unit calculates a loss value of an average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation, calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value.

Meanwhile, the control unit may exclude randomly selected CNNs from among all of the plurality of CNNs and then calculate the average value.

The control unit may repeat the process of randomly selecting and excluding CNNs and calculating the average value.

In addition, the control unit may uniformly select, with a predetermined probability, the CNNs to be excluded from among all of the plurality of CNNs.

As described above, according to various embodiments of the present disclosure, the image data processing apparatus and the control method can improve image recognition performance and maximize the synergy among a plurality of network models.

FIG. 1 is a block diagram of an image data processing apparatus according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating fine-tuning for network fusion according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating a network fusion model according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a region-based image data processing procedure according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating a multi-network fusion model for a region-based data processing apparatus according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing the precision and recall results of the network fusion model.
FIG. 7 is a flowchart of a control method of an image data processing apparatus according to an embodiment of the present disclosure.

Various embodiments are described below in more detail with reference to the accompanying drawings. The embodiments described herein may be modified in various ways. Specific embodiments may be depicted in the drawings and explained in detail in the detailed description. However, the specific embodiments shown in the accompanying drawings are intended only to make the various embodiments easier to understand. Accordingly, the technical idea is not limited by the specific embodiments shown in the accompanying drawings, and should be understood to include all equivalents or substitutes falling within the spirit and technical scope of the invention.

Terms including ordinals, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

In this specification, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

Meanwhile, a "module" or "unit" for a component used in this specification performs at least one function or operation. A "module" or "unit" may perform the function or operation by hardware, software, or a combination of hardware and software. In addition, a plurality of "modules" or "units", other than a "module" or "unit" that must run on specific hardware or on at least one control unit, may be integrated into at least one module. Singular expressions include plural expressions unless the context clearly indicates otherwise.

In addition, in describing the present invention, where a detailed description of a related known function or configuration is judged to unnecessarily obscure the gist of the invention, that description is abbreviated or omitted.

FIG. 1 is a block diagram of an image data processing apparatus according to an embodiment of the present disclosure.

Referring to FIG. 1, the image data processing apparatus 100 includes an input unit 110, a plurality of CNNs (Convolutional Neural Networks) 120, and a control unit 130. The input unit 110 receives image data. For example, the image data may be an image containing a vehicle. The input unit 110 may include, for example, a communication module connected through a communication interface or an input terminal connected through an input interface. That is, when the input unit 110 is implemented as a communication module, the image data processing apparatus 100 may receive image data using a wired or wireless communication scheme. Alternatively, when the input unit 110 is implemented as an input terminal, the image data processing apparatus 100 may receive image data from an internal or external storage device.

The plurality of CNNs 120 may include at least two CNNs 120-1, 120-2, and 120-3. Each of the plurality of CNNs 120 receives the input image data and performs learning. During learning, the plurality of CNNs 120 can improve the recognition performance for image data through a fine-tuning process and an ensemble process. The fine-tuning process corrects errors in order to raise the image recognition rate, and the ensemble process averages the output values of the plurality of CNNs.

The control unit 130 calculates the average value of the initially learned output values output from the plurality of CNNs 120. The control unit 130 then calculates a loss value of the average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation. In the ensemble process, the image data processing apparatus 100 may exclude the output values of some of the CNNs. That is, the control unit 130 may calculate the average of the output values of all of the CNNs 120, or may exclude at least one of the first to n-th CNNs 120-1, 120-2, and 120-3 and calculate the average of the output values of the remaining CNNs. The control unit 130 may select the excluded CNNs at random. The control unit 130 may also repeat the process of calculating the average of the CNN output values, randomly selecting and excluding different CNNs at each repetition. Of course, the CNNs used in repeated average value calculations may include the same CNNs. The control unit 130 may uniformly select, with a predetermined probability, the CNNs to be excluded from the average value calculation. The ensemble process is described in detail later.

The control unit 130 calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value (joint fine-tuning). For example, the received image data may be normalized, and the loss value of the average model may be a loss value for the average of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and the learned weight parameters. The fine-tuning process is described in detail later.

FIG. 2 is a diagram illustrating fine-tuning for network fusion according to an embodiment of the present disclosure.

Referring to FIG. 2, a plurality of CNNs and loss values are shown. In one embodiment, the CNN for the classification task may be based on an 18-layer pre-activation ResNet that uses shortcut connections. The image data for training may be an image of size 224×224×3. Because the training images vary in size, the control unit 130 may resize each training image and randomly crop it to 224×224×3. The RGB pixels of each image may be normalized as in the following equation.

---------- (1)

Here, x(i, j) may be a vector holding the RGB pixel values at coordinates (i, j). Photometric distortion and color augmentation may be applied to obtain good results. Each CNN may use ReLU (Rectified Linear Unit) as its activation function, and batch normalization may be used in each layer. The number of outputs of the network may equal the number of categories in the dataset, for example 11. A softmax layer may be applied to the final output of each CNN.
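The random 224×224×3 crop applied to each resized training image can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `random_crop`, the use of NumPy, and the assumption that the image has already been resized to at least 224 pixels on each side are all hypothetical.

```python
import numpy as np

def random_crop(image, size=224, rng=None):
    """Cut a random size x size window out of an H x W x 3 image."""
    rng = np.random.default_rng() if rng is None else rng
    height, width, _ = image.shape
    if height < size or width < size:
        raise ValueError("image must be resized to at least size x size first")
    # Top-left corner drawn uniformly from all positions that fit.
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    return image[top:top + size, left:left + size, :]

# Hypothetical resized frame (256 x 320 RGB), cropped for training.
frame = np.zeros((256, 320, 3), dtype=np.uint8)
crop = random_crop(frame, rng=np.random.default_rng(0))
```

Drawing a new window on every pass means the network sees a slightly different view of the same image each time.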

The image data processing apparatus may perform a fine-tuning process in order to ensemble some of the CNNs. The image data processing apparatus calculates a total loss value based on the loss value of the average model and the calculated auxiliary loss values. The total loss may be defined as follows.

---------- (2)

Here, N is the number of CNN models, and the loss terms are the loss function of the average model and the auxiliary loss of the i-th CNN. The loss values may be used only in the training phase. o is defined as follows.

---------- (3)

Here, the terms of o are the output of the i-th CNN with learned weight parameters w_i and the input data normalized by equation (1). Each network may be fine-tuned with a very small learning rate in order to minimize the total loss. For example, the learning rate may be 1e-5. The weight parameters may initially take arbitrary values. The weight parameters then change as learning proceeds and may be updated continuously through backpropagation until learning ends.

That is, FIG. 2 shows the first to n-th CNNs. An auxiliary loss value may be calculated for each CNN, and a loss value may be calculated for the average of the CNNs used in the average value calculation (the average model). The total loss value may be calculated based on the loss value of the average model and the calculated auxiliary loss values. The image data processing apparatus may calculate the loss values so as to minimize the total loss value and fine-tune the plurality of CNNs based on the calculated result.

FIG. 3 is a diagram illustrating a network fusion model according to an embodiment of the present disclosure.

Referring to FIGS. 3(a) and 3(b), the ensemble process is shown. The ensemble process selects some of the CNNs and calculates the average value. The image data processing apparatus may select the CNNs used in the average value calculation uniformly at random. Put the other way around, the image data processing apparatus may select CNNs uniformly at random and exclude them from the average value calculation.

The image data processing apparatus may select each CNN to be used in the average value calculation with probability 1-P_d, where P_d denotes the drop rate. In one embodiment, P_d may be 0.1.

The image data processing apparatus may repeat the ensemble process. Because the apparatus selects CNNs at random, a CNN selected for the average value calculation in the first pass may be either selected again or excluded in the second pass.

For example, referring to FIG. 3(a), in the first ensemble pass the first, second, fourth, and fifth CNNs may be selected for the average value calculation, and the third, sixth, and seventh CNNs may be excluded. Referring to FIG. 3(b), in the second ensemble pass the first, second, fourth, sixth, and seventh CNNs may be selected for the average value calculation, and the third and fifth CNNs may be excluded.
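The random selection described above, in which each CNN takes part in the average with probability 1-P_d, can be sketched as follows. The function names and the fallback that keeps one CNN when every CNN happens to be dropped are assumptions made for this illustration; the disclosure does not say how that case is handled.

```python
import numpy as np

def sample_kept_cnns(num_cnns, drop_rate=0.1, rng=None):
    """Return a boolean mask; each CNN is kept with probability 1 - drop_rate."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(num_cnns) >= drop_rate
    if not keep.any():
        # Assumption: never average over an empty set, so keep one at random.
        keep[rng.integers(num_cnns)] = True
    return keep

def ensemble_average(outputs, keep):
    """Average the (N, classes) outputs over the CNNs marked in keep."""
    return outputs[keep].mean(axis=0)

# Selection from FIG. 3(a): CNNs 1, 2, 4, 5 kept and CNNs 3, 6, 7 excluded.
outputs = np.arange(7 * 11, dtype=float).reshape(7, 11)  # hypothetical outputs
keep_a = np.array([True, True, False, True, True, False, False])
average_a = ensemble_average(outputs, keep_a)

# A freshly sampled mask for another pass, with P_d = 0.1.
keep_b = sample_kept_cnns(7, drop_rate=0.1, rng=np.random.default_rng(0))
```

Because a new mask is drawn on every pass, a CNN used in one pass may be used again or excluded in the next.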

The selected CNNs may be used to calculate the average value for prediction in the training phase. Calculating the average value over only some of the CNNs can help produce good predictions in the training phase. That is, as shown in Table 1, the ensemble process yields better results than performing prediction with all of the models in the training phase.

[Table 1]

Method            Ensemble   JF       This embodiment
Cohen Kappa       0.9567     0.9576   0.9578
Accuracy          0.9721     0.9727   0.9728
Mean Precision    0.9293     0.9314   0.9307
Mean Recall       0.9018     0.8985   0.8991
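For readers unfamiliar with the first two metrics in the table, the sketch below computes accuracy and Cohen's kappa from a confusion matrix. It assumes the standard unweighted definition of Cohen's kappa, since the disclosure does not state how the metric was computed, and the confusion matrix is made up purely for illustration.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Accuracy and unweighted Cohen's kappa from a square confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total  # observed agreement (accuracy)
    # Agreement expected by chance, from the row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix (rows: truth, columns: prediction).
accuracy, kappa = accuracy_and_kappa([[20, 5], [10, 15]])
```

Kappa discounts the agreement that the class frequencies alone would produce, which is why it reads lower than accuracy on an imbalanced dataset.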

For more accurate prediction in the test phase, the image data processing apparatus may perform a multi-crop test. For example, the image data processing apparatus may be implemented by converting the fully connected layers of the network into fully convolutional form, and may average the values obtained at multiple scales. After the fine-tuning process, the image data processing apparatus may average the softmax outputs of the models trained through the ensemble process described above.

FIG. 4 is a diagram illustrating a region-based image data processing procedure according to an embodiment of the present disclosure.

Referring to FIG. 4, the structure of a region-based convolutional detector including a CFE (Convolutional Feature Extraction) layer and an RFL (RoI Feature Localization) layer is shown.

The region-based convolutional detector may be a region-based CNN that combines a position-sensitive score map and position-sensitive RoI pooling in a region-based fully convolutional network (R-FCN). The region-based convolutional detector may include a CFE layer and an RFL layer. The CFE layer may extract CNN features from a backbone architecture. The RFL layer may localize bounding boxes using RoI features and NMS (Non-Maximum Suppression).

In one embodiment, the image data processing apparatus may be trained on several hundred thousand training images and tested on about 30,000 test images covering 11 object classes. For example, the 11 object classes may be articulated truck, bicycle, bus, car, motorcycle, non-motorized vehicle, pedestrian, pickup truck, single unit truck, work van, and background.

In one embodiment, the image data processing apparatus may train a classifier on the provided training data with an objective function composed of the sum of a cross-entropy loss and a box regression loss.

---------- (4)

Here, c* is the ground-truth label of the RoI, with c* = 0 denoting the background label, and I(c*) is an indicator that equals 1 except when c* is the background label (c* = 0). The classification term is the cross-entropy loss given the class output O_c*, and the regression term is the bounding box regression loss with t_b = (t(x), t(y), t(w), t(h)). The balance weight may be 1.
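The structure of this objective, a classification loss plus a box regression loss that is switched off for background RoIs, can be sketched for a single RoI as follows. The smooth L1 form of the regression loss is an assumption borrowed from common region-based detectors, since the disclosure does not specify it, and the function names and argument layout are hypothetical.

```python
import numpy as np

def smooth_l1(diff):
    """Smooth L1 penalty summed over the four box coordinates (assumed form)."""
    abs_diff = np.abs(diff)
    return np.where(abs_diff < 1.0, 0.5 * diff**2, abs_diff - 0.5).sum()

def detection_loss(class_probs, true_class, pred_box, true_box, balance=1.0):
    """Cross-entropy plus indicator-gated box regression loss for one RoI."""
    cls_loss = -np.log(class_probs[true_class] + 1e-12)
    is_object = 1.0 if true_class != 0 else 0.0  # indicator I(c*), 0 = background
    reg_loss = smooth_l1(np.asarray(pred_box) - np.asarray(true_box))
    return float(cls_loss + balance * is_object * reg_loss)

probs = np.array([0.5, 0.5])  # hypothetical scores: background, one object class
background = detection_loss(probs, 0, (0, 0, 0, 0), (0.5, 0, 0, 2))
vehicle = detection_loss(probs, 1, (0, 0, 0, 0), (0.5, 0, 0, 2))
```

For the background RoI only the classification term contributes, because the indicator zeroes out the regression term.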

FIG. 5 is a diagram illustrating a multi-network fusion model for a region-based data processing apparatus according to an embodiment of the present disclosure, and FIG. 6 is a diagram showing the precision and recall results of the network fusion model.

Referring to FIG. 5, a plurality of CFE layers and a plurality of RFL layers are shown. After each R-FCN model of the image data processing apparatus has been trained, the models may be ensembled into a single localization model as shown in FIG. 5. Each model may receive image data as input and generate initial bounding boxes for the image data. The bounding boxes of the individual models may finally be merged by a single NMS. As shown in Table 2, the ensemble model using four trained R-FCNs of the present disclosure achieved 79.24% mAP over all classes, a higher accuracy than the conventional techniques.

[Table 2]

Referring to FIG. 6, the detailed mAP for each class of the final ensemble model is shown.
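The merging step described for FIG. 5, in which the bounding boxes produced by each model are pooled and passed through a single NMS, can be sketched as follows. This is a class-agnostic greedy NMS written for illustration only: the IoU threshold of 0.5, the (x1, y1, x2, y2) box format, and the function names are assumptions, and boxes are assumed to have positive area.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def merge_with_nms(boxes_per_model, scores_per_model, iou_threshold=0.5):
    """Pool the boxes of all models, then apply one greedy NMS pass."""
    boxes = np.concatenate(boxes_per_model).astype(float)
    scores = np.concatenate(scores_per_model).astype(float)
    order = np.argsort(-scores)  # highest score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return boxes[kept], scores[kept]

# Two hypothetical models that both detect the same object.
model_a = (np.array([[0, 0, 10, 10]]), np.array([0.9]))
model_b = (np.array([[1, 1, 10, 10], [20, 20, 30, 30]]), np.array([0.8, 0.7]))
merged_boxes, merged_scores = merge_with_nms(
    [model_a[0], model_b[0]], [model_a[1], model_b[1]]
)
```

A per-class variant would run the same routine separately for each object class.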

지금까지 영상 데이터 처리 장치의 다양한 실시 예를 설명하였다. 아래에서는 영상 데이터 처리 장치 제어 방법의 흐름도를 설명한다.Various embodiments of the image data processing apparatus have been described so far. A flowchart of the video data processing apparatus control method will be described below.

도 7은 본 개시의 일 실시 예에 따른 영상 데이터 처리 장치의 흐름도이다.7 is a flowchart of an image data processing apparatus according to an embodiment of the present disclosure.

데이터 처리 장치는 영상 데이터를 입력받는다(S710). 일 실시 예로서, 영상 데이터는 통신 인터페이스로 연결된 통신 모듈, 입력 인터페이스로 연결된 입력 단자 등을 통해 데이터 처리 장치로 입력될 수 있다.The data processing apparatus receives the image data (S710). In one embodiment, the image data may be input to a data processing apparatus through a communication module connected to the communication interface, an input terminal connected to the input interface, or the like.

데이터 처리 장치는 입력받은 영상 데이터에 기초하여 복수의 CNN에서 출력되는 출력값의 평균값을 산출한다(S720). 데이터 처리 장치는 복수의 CNN 전체 중 임의로 선택된 CNN을 배제시키고 평균값을 산출할 수 있다. 그리고, 데이터 처리 장치는 CNN을 임의로 선택하여 배제시키고 평균값을 산출하는 과정을 반복할 수 있다. 데이터 처리 장치는 복수의 CNN 전체 중 기 설정된 확률로 배제시킬 CNN을 균일하게 선택할 수 있다. 따라서, 평균값 산출이 반복될 때, 이전 과정의 평균값 산출에 선택된 CNN이 다음 과정의 평균값 산출에 선택될 수도 있고, 배제될 수도 있다. 그러나, 데이터 처리 장치는 CNN 선택을 균일하게 하기 때문에 평균값 산출의 횟수가 반복되면 전체적으로 선택된 CNN은 균일해질 수 있다.The data processing apparatus calculates an average value of output values output from a plurality of CNNs based on the input image data (S720). The data processing apparatus can exclude arbitrarily selected CNNs from all the plurality of CNNs and calculate an average value. Then, the data processing apparatus can repeat the process of arbitrarily selecting and excluding CNN and calculating an average value. The data processing apparatus can uniformly select CNNs to be excluded with a predetermined probability among the plurality of CNNs. Therefore, when the average value calculation is repeated, the CNN selected for the calculation of the average value of the previous process may be selected or excluded from the calculation of the average value of the next process. However, since the data processing apparatus makes the CNN selection uniform, the CNN selected as a whole can be made uniform if the average value calculation is repeated a number of times.

The data processing apparatus calculates a loss value of an average model that includes the plurality of CNNs included in the average value calculation, and an auxiliary loss value for each of those CNNs (S730). For example, the received image data may be normalized. The loss value of the average model may be the loss value for the average of the output values output from each of the CNNs included in the average value calculation, based on the normalized image data and the learned weight parameters.
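As a rough illustration of step S730, the sketch below computes one loss for the averaged output and one auxiliary loss per CNN. This passage does not name a loss function, so cross-entropy over class probabilities is used purely as an assumption; `cross_entropy` and `ensemble_losses` are hypothetical names.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability of the true class (clamped to avoid log(0))."""
    return -math.log(max(probs[label], 1e-12))

def ensemble_losses(outputs, label):
    """Return (loss of the average model, list of per-CNN auxiliary losses).

    outputs: one class-probability vector per CNN included in the average.
    """
    n = len(outputs)
    dim = len(outputs[0])
    mean = [sum(o[k] for o in outputs) / n for k in range(dim)]
    avg_loss = cross_entropy(mean, label)                  # average model
    aux_losses = [cross_entropy(o, label) for o in outputs]  # each CNN
    return avg_loss, aux_losses

# Hypothetical outputs of two CNNs for an image whose true class is 0.
avg_loss, aux_losses = ensemble_losses([[0.5, 0.5], [0.25, 0.75]], label=0)
```

Note that the loss of the average model is computed on the averaged output, which is generally not the same as the average of the individual losses.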

The data processing apparatus calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values (S740). The data processing apparatus then fine-tunes the average value based on the total loss value (S750).
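Steps S740 and S750 might be sketched as follows. The passage only says the total loss is "based on" the two kinds of loss, so the weighted sum with `aux_weight=0.3` is an assumption, as are the cross-entropy loss, the tiny two-parameter sigmoid models standing in for CNNs, and the finite-difference gradient step used in place of backpropagation. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model_output(params, x):
    """Stand-in for a CNN: a two-parameter binary classifier."""
    w, b = params
    p = sigmoid(w * x + b)
    return [1.0 - p, p]

def cross_entropy(probs, label):
    return -math.log(max(probs[label], 1e-12))

def total_loss(all_params, x, label, aux_weight=0.3):
    """Loss of the averaged output plus weighted per-model auxiliary losses."""
    outputs = [model_output(p, x) for p in all_params]
    n = len(outputs)
    mean = [sum(o[k] for o in outputs) / n for k in range(2)]
    avg_loss = cross_entropy(mean, label)
    aux = sum(cross_entropy(o, label) for o in outputs)
    return avg_loss + aux_weight * aux

def fine_tune_step(all_params, x, label, lr=0.1, eps=1e-6):
    """One gradient-descent step on the total loss (numerical gradients)."""
    base = total_loss(all_params, x, label)
    updated = []
    for i, params in enumerate(all_params):
        new_params = []
        for j, value in enumerate(params):
            bumped = [list(p) for p in all_params]
            bumped[i][j] = value + eps
            grad = (total_loss(bumped, x, label) - base) / eps
            new_params.append(value - lr * grad)
        updated.append(new_params)
    return updated

params = [[0.5, 0.0], [-0.2, 0.1]]  # two stand-in models
before = total_loss(params, x=1.0, label=1)
params = fine_tune_step(params, x=1.0, label=1)
after = total_loss(params, x=1.0, label=1)
```

Because every model's parameters receive gradients from both the shared average-model term and its own auxiliary term, a single step updates all models jointly; in this toy setting `after` is smaller than `before`.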

The control method of the image data processing apparatus according to the various embodiments described above may be provided as a computer program product. The computer program product may include the software program itself or a non-transitory computer readable medium in which the software program is stored.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or scope of the present invention.

100: image data processing apparatus 110: input unit
120: CNN 130: controller

Claims

1. A control method of an image data processing apparatus, the method comprising:
receiving image data;
calculating an average value of output values output from a plurality of CNNs based on the received image data;
calculating a loss value of an average model including the plurality of CNNs included in the average value calculation, and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation;
calculating a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values; and
fine-tuning the average value based on the calculated total loss value.

2. The method according to claim 1,
wherein the loss value of the average model is
a loss value for an average value of the output values output from each of the plurality of CNNs included in the average value calculation, based on normalized image data and learned weight parameters.

3. The method according to claim 1,
wherein the calculating of the average value comprises
excluding CNNs arbitrarily selected from among all of the plurality of CNNs and calculating the average value.

4. The method according to claim 3,
wherein the calculating of the average value comprises
repeating the process of arbitrarily selecting and excluding CNNs and calculating the average value.

5. The method according to claim 4,
wherein the calculating of the average value comprises
uniformly selecting, with a preset probability, the CNNs to be excluded from among all of the plurality of CNNs.

6. An image data processing apparatus comprising:
an input unit for receiving image data;
a plurality of CNNs for performing learning based on the received image data; and
a controller for calculating an average value of output values output from the plurality of CNNs,
wherein the controller
calculates a loss value of an average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation, calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value.

7. The apparatus according to claim 6,
wherein the controller
normalizes the received image data and calculates, as the loss value of the average model, a loss value for an average value of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and learned weight parameters.

8. The apparatus according to claim 6,
wherein the controller
excludes CNNs arbitrarily selected from among all of the plurality of CNNs and calculates the average value.

9. The apparatus according to claim 8,
wherein the controller
repeats the process of arbitrarily selecting and excluding CNNs and calculating the average value.

10. The apparatus according to claim 9,
wherein the controller
uniformly selects, with a preset probability, the CNNs to be excluded from among all of the plurality of CNNs.