KR102426491B1

KR102426491B1 - Image data processing apparatus using ensamble and fine tunning and controlling method thereof

Info

Publication number: KR102426491B1
Application number: KR1020170147464A
Authority: KR
Inventors: 정희철; 권순; 김준광; 이진희; 정우영; 최민국
Original assignee: 재단법인대구경북과학기술원
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2022-07-28
Also published as: KR20190051619A

Abstract

An image data processing apparatus and a control method are disclosed. The control method of the image data processing apparatus includes: receiving image data; calculating an average value of the output values output from a plurality of CNNs based on the received image data; calculating a loss value of an average model comprising the plurality of CNNs included in the average value calculation, and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation; calculating a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values; and fine-tuning the average value based on the calculated total loss value.

Description

Image data processing apparatus and control method using ensemble and fine adjustment

The present disclosure relates to an image data processing apparatus and a control method, and more particularly, to an image data processing apparatus and a control method that use an ensemble of several network models together with fine-tuning.

The development of intelligent traffic monitoring systems has recently emerged as an important issue. One of the most important functions of an intelligent traffic monitoring system is to analyze and extract useful information from recognized image data. In particular, techniques for classifying and localizing vehicles or pedestrians are a fundamental process for traffic monitoring analysis.

Existing techniques for vehicle classification and localization capture images at regular intervals with numerous traffic cameras to obtain a real traffic monitoring environment. The numerous images obtained are then analyzed to classify and localize vehicles. However, the accuracy of the analysis results of the existing techniques is not satisfactory.

One way to increase the accuracy of image recognition is to use a plurality of image recognition network models. Averaging the output values of a plurality of network models can improve image recognition performance, but it is inefficient because the synergy between the networks is not maximized. Therefore, there is a need for an efficient image recognition apparatus that uses a plurality of network models to improve image recognition performance and maximize synergy.

The present disclosure is intended to solve the above-described problem, and an object of the present disclosure is to provide an image data processing apparatus and a control method that improve image recognition performance and maximize synergy between a plurality of network models.

According to an embodiment of the present disclosure for achieving the above object, a control method of an image data processing apparatus includes: receiving image data; calculating an average value of the output values output from a plurality of CNNs based on the received image data; a loss value calculation step of calculating a loss value of an average model comprising the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation; calculating a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values; and fine-tuning the average value based on the calculated total loss value.

The loss value of the average model may be a loss value for the average value of the output values output from each of the plurality of CNNs included in the average value calculation, where the received image data is normalized and each output is based on the normalized image data and the learned weight parameters.

Meanwhile, the calculating of the average value may include excluding an arbitrarily selected CNN from among all of the plurality of CNNs and calculating the average value.

In addition, the calculating of the average value may include repeating the process of arbitrarily selecting and excluding a CNN and calculating the average value.

In addition, the calculating of the average value may include uniformly selecting, with a preset probability, the CNN to be excluded from among all of the plurality of CNNs.

According to an embodiment of the present disclosure for achieving the above object, an image data processing apparatus includes an input unit that receives image data, a plurality of CNNs that perform learning based on the received image data, and a control unit that calculates an average value of the output values output from the plurality of CNNs. The control unit calculates a loss value of an average model comprising the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation, calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value.

Meanwhile, the control unit may exclude an arbitrarily selected CNN from among all of the plurality of CNNs and calculate the average value.

In addition, the control unit may repeat the process of arbitrarily selecting and excluding a CNN and calculating the average value.

In addition, the control unit may uniformly select, with a preset probability, the CNN to be excluded from among all of the plurality of CNNs.

As described above, according to various embodiments of the present disclosure, the image data processing apparatus and the control method can improve image recognition performance and maximize synergy between a plurality of network models.

FIG. 1 is a block diagram of an image data processing apparatus according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating fine-tuning for network fusion according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating a network fusion model according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a region-based image data processing process according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating a multi-network fusion model for a region-based data processing apparatus according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing precision and recall results of the network fusion model.
FIG. 7 is a flowchart of an image data processing apparatus according to an embodiment of the present disclosure.

Hereinafter, various embodiments are described in more detail with reference to the accompanying drawings. The embodiments described herein may be modified in various ways. Specific embodiments may be depicted in the drawings and described in detail in the detailed description. However, the specific embodiments disclosed in the accompanying drawings are provided only to facilitate understanding of the various embodiments. Therefore, the technical idea is not limited by the specific embodiments disclosed in the accompanying drawings, and should be understood to include all equivalents or substitutes falling within the spirit and technical scope of the invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

In this specification, terms such as "comprise" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

Meanwhile, a "module" or "unit" for a component used in this specification performs at least one function or operation. A "module" or "unit" may perform the function or operation by hardware, software, or a combination of hardware and software. In addition, a plurality of "modules" or "units" may be integrated into at least one module, except for a "module" or "unit" that must be implemented in specific hardware or that is executed by at least one control unit. A singular expression includes the plural unless the context clearly indicates otherwise.

In addition, in describing the present invention, where a detailed description of a related known function or configuration is judged to unnecessarily obscure the gist of the present invention, that detailed description is abbreviated or omitted.

FIG. 1 is a block diagram of an image data processing apparatus according to an embodiment of the present disclosure.

Referring to FIG. 1, the image data processing apparatus 100 includes an input unit 110, a plurality of convolutional neural networks (CNNs) 120, and a control unit 130. The input unit 110 receives image data. For example, the image data may be an image that includes a vehicle. The input unit 110 may include, for example, a communication module connected through a communication interface or an input terminal connected through an input interface. That is, when the input unit 110 is implemented as a communication module, the image data processing apparatus 100 may receive the image data by wired or wireless communication. Alternatively, when the input unit 110 is implemented as an input terminal, the image data processing apparatus 100 may receive the image data from an internal or external storage device.

The plurality of CNNs 120 may include at least two CNNs 120-1, 120-2, and 120-3. Each of the plurality of CNNs 120 receives the input image data and performs learning. During learning, the plurality of CNNs 120 may improve the recognition performance for the image data through a fine-tuning process and an ensemble process. The fine-tuning process corrects errors in order to increase the image recognition rate, and the ensemble process averages the output values of the plurality of CNNs.

The control unit 130 calculates an average value of the initially trained output values output from the plurality of CNNs 120. The control unit 130 then calculates a loss value of the average model comprising the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation. In the ensemble process, the image data processing apparatus 100 may exclude the output values of some of the CNNs. That is, the control unit 130 may calculate the average value by averaging the output values of all of the CNNs 120, or may exclude at least one of the first to n-th CNNs 120-1, 120-2, and 120-3 and average the output values of the remaining CNNs. The control unit 130 may arbitrarily select the CNN to be excluded. The control unit 130 may also repeat the process of calculating the average value of the CNN output values, arbitrarily selecting a different CNN to exclude at each repetition. Of course, the CNNs used in the repeated average value calculations may include the same CNN. The control unit 130 may uniformly select, with a preset probability, the CNN to be excluded from the average value calculation. The ensemble process is described in detail later.

The control unit 130 calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value (joint fine-tuning). For example, the loss value of the average model may be a loss value for the average value of the output values output from each of the plurality of CNNs included in the average value calculation, where the received image data is normalized and each output is based on the normalized image data and the learned weight parameters. The fine-tuning process is described in detail later.

FIG. 2 is a diagram illustrating fine-tuning for network fusion according to an embodiment of the present disclosure.

FIG. 2 shows a plurality of CNNs and their loss values. In one embodiment, the CNN for the classification task may be based on an 18-layer pre-activation ResNet that uses shortcut connections. The image data for training may be an image of size 224×224×3. Because the training images vary in size, the control unit 130 may resize each training image and randomly crop it to 224×224×3. The RGB pixels of each image may be normalized as in the following equation.

---------- (1)

Here, x(i, j) may be a vector holding the RGB pixel values at coordinates (i, j). Photometric distortion and color augmentation may be applied to obtain good results. Each CNN may use the rectified linear unit (ReLU) as its activation function, and batch normalization may be used in each layer. The number of network outputs may equal the number of categories in the dataset, for example 11. A softmax layer may be applied to the final output of each CNN.
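The preprocessing described above can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the image is a plain nested list of RGB triples, `random_crop`, `normalize_pixel`, and `preprocess` are hypothetical helper names, and because Equation (1) is not reproduced in this text, the scaling to [-1, 1] used here is only an assumed stand-in for it.

```python
import random

CROP = 224  # crop height and width stated in the description


def random_crop(image, size=CROP, rng=random):
    """Cut a random size x size window out of an H x W x 3 nested list."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image must be resized to at least the crop size first")
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def normalize_pixel(rgb):
    """ASSUMPTION: stand-in for Equation (1); maps 0..255 onto [-1, 1]."""
    return tuple((channel - 127.5) / 127.5 for channel in rgb)


def preprocess(image, rng=random):
    """Randomly crop an already resized image, then normalize every pixel."""
    crop = random_crop(image, rng=rng)
    return [[normalize_pixel(px) for px in row] for row in crop]
```

Resizing, photometric distortion, and color augmentation are left out of the sketch; in practice an image library would handle them before the crop.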

The image data processing apparatus may perform a fine-tuning process in order to ensemble some of the CNNs. The image data processing apparatus calculates a total loss value based on the loss value of the average model and the calculated auxiliary loss values. The total loss may be defined as follows.

---------- (2)

Here, N is the number of CNN models, and Equation (2) is composed of the loss function of the average model and the auxiliary loss of the i-th CNN. The loss values may be used only in the training stage. o is defined as follows.

---------- (3)

Here, Equation (3) is expressed in terms of the output of the i-th CNN, which has the learned weight parameters w_i, and the input data normalized by Equation (1). Each network may be fine-tuned to minimize the loss using a very small learning rate, for example 1e-5. The weight parameters may initially have arbitrary values. The weight parameters then change as learning proceeds and may be updated continuously through backpropagation until learning ends.
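The relationship between the average-model loss, the auxiliary losses, and the total loss can be sketched as below. Equations (2) and (3) are not reproduced in this text, so the concrete forms are assumptions: o is taken to be the element-wise mean of the per-CNN softmax outputs, cross-entropy is used as the loss, and the total is an unweighted sum of the average-model loss and the N auxiliary losses. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (assumed loss function)."""
    return -math.log(max(probs[label], 1e-12))


def average_output(outputs):
    """Element-wise mean of the per-CNN output vectors (assumed form of o)."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]


def total_loss(logits_per_cnn, label):
    """ASSUMPTION: average-model loss plus an unweighted sum of auxiliary losses."""
    outputs = [softmax(logits) for logits in logits_per_cnn]
    avg_loss = cross_entropy(average_output(outputs), label)
    aux_losses = [cross_entropy(out, label) for out in outputs]
    return avg_loss + sum(aux_losses), avg_loss, aux_losses
```

In a real training loop the total would be minimized by backpropagation through all CNNs with the small learning rate mentioned above; the sketch only evaluates the loss values.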

That is, FIG. 2 shows the first to n-th CNNs. An auxiliary loss value may be calculated for each CNN, and a loss value may be calculated for the average value of the CNNs used in the average value calculation (the average model). The total loss value may be calculated based on the loss value of the average model and the calculated auxiliary loss values. The image data processing apparatus may calculate the loss values so as to minimize the total loss value and may fine-tune the plurality of CNNs based on the calculated result.

FIG. 3 is a diagram illustrating a network fusion model according to an embodiment of the present disclosure.

FIGS. 3(a) and 3(b) illustrate the ensemble process. The ensemble process selects some of the CNNs and calculates the average value of their outputs. The image data processing apparatus may select the CNNs used in the average value calculation uniformly at random. Put the other way around, the image data processing apparatus may select CNNs uniformly at random and exclude them from the average value calculation.

The image data processing apparatus may select each CNN to be used in the average value calculation with probability 1-P_d, where P_d denotes the drop rate. In one embodiment, P_d may be 0.1.

The image data processing apparatus may repeat the ensemble process. Because the image data processing apparatus selects CNNs at random, the same CNN may be selected for, or excluded from, the average value calculation in both the first and the second pass.

For example, referring to FIG. 3(a), in the first ensemble pass the first, second, fourth, and fifth CNNs may be selected for the average value calculation, and the third, sixth, and seventh CNNs may be excluded. Referring to FIG. 3(b), in the second ensemble pass the first, second, fourth, sixth, and seventh CNNs may be selected for the average value calculation, and the third and fifth CNNs may be excluded.
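The random exclusion illustrated in FIG. 3 can be sketched as follows. This is an illustrative reading, not the patented implementation: each CNN is assumed to be kept independently with probability 1-P_d, the fallback that keeps one CNN when all would be dropped is an added assumption, and `select_cnns` and `ensemble_average` are hypothetical names.

```python
import random


def select_cnns(num_cnns, drop_rate=0.1, rng=random):
    """Keep each CNN independently with probability 1 - drop_rate."""
    kept = [i for i in range(num_cnns) if rng.random() >= drop_rate]
    if not kept:  # ASSUMPTION: never average over an empty set
        kept = [rng.randrange(num_cnns)]
    return kept


def ensemble_average(outputs, kept):
    """Average the output vectors of the selected CNNs only."""
    chosen = [outputs[i] for i in kept]
    return [sum(col) / len(chosen) for col in zip(*chosen)]
```

Calling `select_cnns` again on each pass draws a fresh subset, so a CNN excluded in one pass may be included in the next, matching the two passes described for FIG. 3(a) and FIG. 3(b).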

The selected CNNs may be used to calculate the average value for prediction in the training stage. Selecting only some of the CNNs to calculate the average value can help produce good predictions in the training stage. That is, as shown in Table 1, the ensemble process yields better results than performing prediction with all of the models in the training stage.

[Table 1]
Method           Ensemble   JF       This embodiment
Cohen Kappa      0.9567     0.9576   0.9578
Accuracy         0.9721     0.9727   0.9728
Mean Precision   0.9293     0.9314   0.9307
Mean Recall      0.9018     0.8985   0.8991

For more accurate prediction in the test stage, the image data processing apparatus may perform a multi-crop test. For example, the image data processing apparatus may be implemented by converting the fully connected layer of each network into a fully convolutional form, and may average the values over multiple scales. In addition, after the fine-tuning process, the image data processing apparatus may average the softmax output values of the models trained through the above-described ensemble process.
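The test-time averaging can be sketched as below. This is a simplified illustration under stated assumptions: each model is represented by a hypothetical callable that maps one crop to a list of class scores, the crops are assumed to be prepared elsewhere, and the fully convolutional, multi-scale evaluation mentioned above is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vectors(vectors):
    """Element-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def multi_crop_predict(models, crops):
    """Average softmax outputs over crops for each model, then over models."""
    per_model = []
    for model in models:  # each model: crop -> list of class scores
        per_model.append(mean_vectors([softmax(model(c)) for c in crops]))
    probs = mean_vectors(per_model)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

Averaging probabilities rather than raw scores keeps every model and crop on the same scale, which is one plausible reason for applying softmax before the mean.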

FIG. 4 is a diagram illustrating a region-based image data processing process according to an embodiment of the present disclosure.

FIG. 4 shows the structure of a region-based convolutional detector that includes a convolutional feature extraction (CFE) layer and an RoI feature localization (RFL) layer.

The region-based convolutional detector may be a region-based CNN that combines a position-sensitive score map and position-sensitive RoI pooling in a region-based fully convolutional network (R-FCN). The region-based convolutional detector may include a CFE layer and an RFL layer. The CFE layer may extract CNN features from the backbone architecture. The RFL layer may localize bounding boxes using RoI features and non-maximum suppression (NMS).

In one embodiment, the image data processing apparatus may be trained with hundreds of thousands of training images and tested with about 30,000 test images covering 11 object classes. For example, the 11 object classes may be articulated truck, bicycle, bus, car, motorcycle, non-motorized vehicle, pedestrian, pickup truck, single unit truck, work van, and background.

In one embodiment, the image data processing apparatus may train a classifier on the provided training data with an objective function composed of the sum of a cross-entropy loss and a box regression loss.

---------- (4)

Here, c* is the ground-truth label of the RoI, with c* = 0 denoting the background label, and I(c*) is an indicator that equals 1 except when c* is the background label (c* = 0). Equation (4) combines the cross-entropy loss for classification, given the class output O_c*, with the bounding box regression loss over t_b = (t(x), t(y), t(w), t(h)). The balance weight may be 1.
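The objective described around Equation (4) can be sketched for a single RoI as follows. Equation (4) itself is not reproduced in this text, so the form here is an assumption: the cross-entropy term is added to the box regression term, which is scaled by the balance weight and switched off for background RoIs by the indicator. The smooth L1 penalty is also an assumption, since the text does not name the regression loss, and `detection_loss` is a hypothetical name.

```python
import math


def smooth_l1(diff):
    """Smooth L1 penalty; ASSUMED form of the box regression loss."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def detection_loss(class_probs, true_label, box_pred, box_target, balance=1.0):
    """Cross-entropy plus balance * I(c*) * box regression loss for one RoI."""
    cls_loss = -math.log(max(class_probs[true_label], 1e-12))
    indicator = 0.0 if true_label == 0 else 1.0  # label 0 is background
    reg_loss = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls_loss + balance * indicator * reg_loss
```

With a background label the box term drops out entirely, so only the classification error contributes to the loss for that RoI.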

FIG. 5 is a diagram illustrating a multi-network fusion model for a region-based data processing apparatus according to an embodiment of the present disclosure, and FIG. 6 is a diagram showing precision and recall results of the network fusion model.

FIG. 5 shows a plurality of CFE layers and a plurality of RFL layers. After each R-FCN model of the image data processing apparatus is trained, the models may be ensembled into a single localization model as shown in FIG. 5. Each model may receive the image data as input and generate initial bounding boxes for the image data. The bounding boxes of the individual models may finally be merged by a single NMS. As shown in Table 2, the ensemble model using four trained R-FCNs of the present disclosure achieved 79.24% mAP over all classes, a higher accuracy than the existing techniques.
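Merging the boxes of several detectors with one NMS pass can be sketched as below. This is a generic greedy NMS under stated assumptions, not the patented implementation: boxes are (x1, y1, x2, y2) tuples paired with a confidence score, the IoU threshold of 0.5 is an arbitrary illustrative value, class labels are ignored, and `merge_with_nms` is a hypothetical name.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_with_nms(detections_per_model, iou_threshold=0.5):
    """Pool (box, score) pairs from all models and apply one greedy NMS pass."""
    pooled = [det for dets in detections_per_model for det in dets]
    pooled.sort(key=lambda det: det[1], reverse=True)
    kept = []
    for box, score in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

A multi-class detector would normally run this per class; the sketch pools everything to keep the merging step itself visible.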

[Table 2]

FIG. 6 shows the detailed mAP of the final ensemble model for each class.

Various embodiments of the image data processing apparatus have been described so far. The flowchart of the method for controlling the image data processing apparatus is described below.

FIG. 7 is a flowchart of an image data processing apparatus according to an embodiment of the present disclosure.

The data processing apparatus receives image data (S710). In one embodiment, the image data may be input to the data processing apparatus through a communication module connected through a communication interface, an input terminal connected through an input interface, or the like.

The data processing apparatus calculates an average value of the output values output from a plurality of CNNs based on the received image data (S720). The data processing apparatus may exclude an arbitrarily selected CNN from among all of the plurality of CNNs and calculate the average value. The data processing apparatus may also repeat the process of arbitrarily selecting and excluding a CNN and calculating the average value. The data processing apparatus may uniformly select, with a preset probability, the CNN to be excluded from among all of the plurality of CNNs. Therefore, when the average value calculation is repeated, a CNN selected for the previous calculation may be either selected for or excluded from the next calculation. However, because the data processing apparatus selects CNNs uniformly, the selection of CNNs may become uniform overall as the average value calculation is repeated.

The data processing apparatus calculates a loss value of an average model including the plurality of CNNs included in the average value calculation, and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation (S730). For example, the received image data may be normalized. The loss value of the average model may be an average value of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and learned weight parameters.

The data processing apparatus calculates a total loss value based on the calculated loss value of the average model and the calculated auxiliary loss values (S740). The data processing apparatus then fine-tunes the average value based on the total loss value (S750).
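Steps S730 and S740 can be sketched as follows, combining the terms by summation as recited in the claims. The use of cross-entropy as the loss function, the unweighted sum, and all function names are assumptions made for illustration; the disclosure at this point does not name a specific loss function. The gradient-based update of step S750 is not shown.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (assumed loss function)."""
    return -math.log(max(probs[label], 1e-12))


def total_loss(outputs, label):
    """Sum the average-model loss and one auxiliary loss per included CNN.

    outputs: per-CNN class-probability vectors for a single image, already
             restricted to the CNNs included in the average value calculation.
    label:   index of the true class.
    Returns (total, average_model_loss, auxiliary_losses).
    """
    n = len(outputs)
    n_classes = len(outputs[0])
    # Average model: element-wise mean of the included CNN outputs.
    avg = [sum(o[c] for o in outputs) / n for c in range(n_classes)]
    avg_loss = cross_entropy(avg, label)
    # Auxiliary losses: one per included CNN, computed on its own output.
    aux_losses = [cross_entropy(o, label) for o in outputs]
    return avg_loss + sum(aux_losses), avg_loss, aux_losses
```

In this sketch the auxiliary terms keep each individual CNN accountable for its own prediction, while the average-model term scores the ensemble output that is actually used.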

The method for controlling the image data processing apparatus according to the various embodiments described above may also be provided as a computer program product. The computer program product may include the software program itself or a non-transitory computer readable medium in which the software program is stored.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short moment, such as a register, cache, or memory. Specifically, the various applications or programs described above may be stored in and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present invention.

100: image data processing apparatus 110: input unit
120: CNN 130: control unit

Claims

1. A method of controlling an image data processing apparatus, the method comprising:
receiving image data;
calculating an average value of output values output from a plurality of CNNs based on the received image data;
a loss value calculation step of calculating a loss value of an average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation;
calculating a total loss value by summing the calculated loss value of the average model and the calculated auxiliary loss values; and
fine-tuning the average value based on the calculated total loss value,
wherein the input image data is normalized, and the loss value of the average model is a loss value with respect to the average value of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and learned weight parameters.

2. (Deleted)

3. The method of claim 1,
wherein calculating the average value comprises excluding a randomly selected CNN from among all of the plurality of CNNs and calculating the average value.

4. The method of claim 3,
wherein calculating the average value comprises repeating the process of randomly selecting and excluding a CNN and calculating the average value.

5. The method of claim 4,
wherein calculating the average value comprises uniformly selecting, with a preset probability, the CNN to be excluded from among all of the plurality of CNNs.

6. An image data processing apparatus comprising:
an input unit for receiving image data;
a plurality of CNNs that learn based on the received image data; and
a control unit for calculating an average value of output values output from the plurality of CNNs,
wherein the control unit calculates a loss value of an average model including the plurality of CNNs included in the average value calculation and an auxiliary loss value for each of the plurality of CNNs included in the average value calculation, calculates a total loss value by summing the calculated loss value of the average model and the calculated auxiliary loss values, and fine-tunes the average value based on the calculated total loss value, and
wherein the input image data is normalized, and the loss value of the average model is a loss value with respect to the average value of the output values output from each of the plurality of CNNs included in the average value calculation, based on the normalized image data and learned weight parameters.

7. (Deleted)

8. The apparatus of claim 6,
wherein the control unit calculates the average value by excluding a randomly selected CNN from among all of the plurality of CNNs.

9. The apparatus of claim 8,
wherein the control unit repeats the process of randomly selecting and excluding a CNN and calculating the average value.

10. The apparatus of claim 9,
wherein the control unit uniformly selects, with a preset probability, the CNN to be excluded from among all of the plurality of CNNs.