KR102599753B1

KR102599753B1 - YUV Image Processing Method and System Using A Neural Network Composed Of Dual-Path Blocks

Info

Publication number: KR102599753B1
Application number: KR1020230095048A
Authority: KR
Inventors: 이은수; 김정욱
Original assignee: 주식회사 카비
Priority date: 2023-07-21
Filing date: 2023-07-21
Publication date: 2023-11-08

Abstract

Disclosed are a YUV image processing method and device that use the YUV data format and a neural network composed of dual-path blocks suited to that format. The YUV image processing method comprises: inputting and processing, by a neural network, image data in YUV format to generate a luma feature map and a chroma feature map; extracting, by dual-path blocks of the neural network, features from the input image data; and performing, by the neural network, a task based on the extracted features. The dual-path blocks are stacked in multiple layers; in each block, two paths, each composed of two branches, are arranged in parallel, and the branches are trained to extract luma (Y) feature information and chroma (UV) feature information separately and to exchange and combine the extracted information. According to the present invention, an efficient and effective image processing method can be provided by utilizing a neural network composed of dual-path blocks that takes YUV image data as its input.

Description

YUV image processing method and device using a neural network composed of dual-path blocks {YUV Image Processing Method and System Using A Neural Network Composed Of Dual-Path Blocks}

The present invention relates to a YUV image processing method and device using a neural network composed of dual-path blocks. More specifically, it relates to a YUV image processing method and device that use the YUV image data format, which carries less data than the RGB format, together with a neural network composed of dual-path blocks suited to that format.

Recently, deep learning technology using neural networks has attracted attention in the image processing field for its remarkable performance. Neural networks can learn complex features from data, which enables the development of highly effective image processing techniques.

Deep learning technology has developed with a focus on improving a model's inference accuracy and operating speed in a balanced manner. For example, making a model deeper and more complex improves inference accuracy but lowers operating speed. Conversely, lowering the image resolution makes the model run faster but makes inference less accurate. Combining these factors well can improve the accuracy-speed trade-off of a neural network.

In general, the RGB format, a standard color representation, is used for the input image data of a neural network. However, using RGB data as model input can raise the following problems.

First, RGB data can represent a wide range of colors by expressing each of the three colors red, green, and blue as a value between 0 and 255, but it has the disadvantage of high memory usage.

In addition, because processes such as YUV conversion, the discrete cosine transform, and quantization are used to store RGB data efficiently, an additional decoding cost is incurred during training or inference.

Republic of Korea Patent Publication No. 10-2023-0013989; Republic of Korea Patent No. 2234097; Republic of Korea Patent No. 2200496

The present invention was devised in view of the problems described above. Its purpose is to provide a YUV image processing method and device using a neural network composed of dual-path blocks, which uses less data than the RGB format.

To solve the above problem, the YUV image processing device of the present invention includes: YUV image data consisting of a luma (Y) component, which represents brightness, and a chroma (UV) component, which represents color difference; and a neural network comprising an input unit that receives the YUV image data and separately inputs and processes the luma component and the chroma component, a feature detection unit that extracts features from the luma and chroma components supplied by the input unit using dual-path blocks, and a task execution unit that performs a task based on the features extracted by the dual-path blocks. The dual-path blocks are stacked in multiple layers; in each block, two paths, each composed of two branches, are arranged in parallel, and the branches are trained to extract luma (Y) feature information and chroma (UV) feature information separately and to exchange and combine the extracted information.

To solve the other problem above, the YUV image processing method of the present invention comprises: inputting and processing, by a neural network, image data in YUV format consisting of a luma (Y) component and a chroma (UV) component; extracting features from the luma (Y) and chroma (UV) components input to dual-path blocks of the neural network; and performing, by the neural network, a task based on the extracted features. The dual-path blocks are stacked in multiple layers; in each block, two paths, each composed of two branches, are arranged in parallel, and the branches are trained to extract luma (Y) feature information and chroma (UV) feature information separately and to exchange and combine the extracted information.

In the present invention, skip connections may be added between the dual-path blocks stacked in multiple layers to facilitate training.

In the present invention, the step of extracting features from the image input to the dual-path block may include: inputting the luma component (Y) into an enhanced luma component feature extraction branch that includes a convolution layer, to generate an enhanced luma feature map; inputting the chroma component (UV) into an enhanced chroma component feature extraction branch that includes a convolution layer, to generate an enhanced chroma feature map; inputting the luma component (Y) into an exchange luma component feature extraction branch to generate an exchange luma feature map, and connecting it to the output of the enhanced chroma component feature extraction branch; inputting the chroma component (UV) into an exchange chroma component feature extraction branch to generate an exchange chroma feature map, and connecting it to the output of the enhanced luma component feature extraction branch; and combining the connected feature maps.

In the present invention, the exchange luma feature map may be generated at an adjusted size by passing the input luma component (Y) through a pooling layer (pool) and a convolution layer.

In the present invention, the exchange chroma feature map may be generated at an adjusted size by passing the input chroma component (UV) through a convolution layer and a nearest neighbor interpolation layer (up).

In the present invention, the four convolution layers arranged in parallel (one in each branch) may be merged into a single convolution layer and executed together at run time.

In the present invention, in the step of combining the connected feature maps, each pair of connected feature maps may be merged by an element-wise sum and then passed through batch normalization (BN) and an activation function to output an enriched feature map.

According to the present invention configured as described above, an efficient and effective image processing method can be provided by utilizing a neural network composed of dual-path blocks that takes YUV image data as its input.

Using YUV image data as input instead of RGB image data has two major advantages.

First, the step of converting the YUV format to the RGB format can be omitted from image decoding, so image processing can be faster.

Second, the YUV 4:2:0 format needs only one quarter of the data for color representation that was needed before. Memory consumption during training and inference is therefore lower, and computation can be faster.

Furthermore, the performance of the neural network can be improved by using the dual-path block, a structure suited to processing YUV image data.
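
The data saving from 4:2:0 subsampling can be checked with simple arithmetic. The sketch below is illustrative only: it assumes 8-bit samples and a frame with even width and height, and the function names are hypothetical rather than taken from the patent.

```python
def rgb_samples(width: int, height: int) -> int:
    """Samples in an RGB frame: three full-resolution planes."""
    return 3 * width * height


def yuv420_samples(width: int, height: int) -> int:
    """Samples in a YUV 4:2:0 frame (assumes even width and height).

    Y is stored at full resolution; U and V are each subsampled by 2
    horizontally and vertically, so each holds a quarter of the samples.
    """
    luma = width * height
    chroma_plane = (width // 2) * (height // 2)
    return luma + 2 * chroma_plane


w, h = 32, 32  # CIFAR-10-sized frame, used here only as an example
print(rgb_samples(w, h))     # 3072
print(yuv420_samples(w, h))  # 1536
```

Under these assumptions each chroma plane carries one quarter of the samples of a full-resolution plane, and the whole frame needs half the samples of its RGB counterpart.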

Figure 1 is a block diagram showing a YUV image processing device using a neural network composed of dual-path blocks according to an embodiment of the present invention.
Figure 2 is a flowchart showing a YUV image processing method of a neural network using dual-path blocks according to an embodiment of the present invention.
Figure 3 is a flowchart explaining the operation of a dual-path block according to an embodiment of the present invention.
Figure 4 is a diagram explaining how the convolution layers are actually computed according to an embodiment of the present invention.

Hereinafter, a YUV image processing method and device using a neural network composed of dual-path blocks according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

The present invention provides an efficient and effective deep learning-based image processing method. To this end, it uses image data in the YUV format, which carries less data than the RGB format, and proposes a neural network structure composed of dual-path blocks suited to processing that format.

Specifically, it presents a method that reduces memory usage and computation by using image data in the YUV 4:2:0 format as input, and that achieves good inference performance by implementing a dual-path block suited to processing images in that format.

A further advantage of using YUV image data is that the image decoding process can be simplified. For example, decoding a JPEG-compressed image includes converting from the YUV format to the RGB format. When the YUV image is used directly, this conversion can be omitted, which slightly reduces computation and lowers latency in real applications.
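
For context, the color conversion that a JPEG decoder normally performs, and that can be skipped when the network consumes YUV directly, looks roughly like the sketch below. It uses the full-range BT.601 coefficients defined for JFIF files, which is an assumption about the decoder; the patent does not specify the conversion, and the function name is hypothetical.

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one 8-bit full-range YCbCr sample to 8-bit RGB (JFIF / BT.601)."""
    def clamp(v: float) -> int:
        # Round to the nearest integer and keep the result in [0, 255].
        return max(0, min(255, int(round(v))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128): neutral chroma gives gray
```

This per-pixel multiply-add work, together with the chroma upsampling that typically precedes it for subsampled images, is the part of decoding that a direct-YUV pipeline can avoid.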

Figure 1 is a block diagram showing a YUV image processing device using a neural network composed of dual-path blocks according to an embodiment of the present invention.

Referring to Figure 1, the YUV image processing device using a neural network consists of YUV image data (10) and a neural network (50) that processes it.

The YUV image data (10) consists of a luma (Y) component, which represents brightness, and a chroma (UV) component, which represents color difference.

The neural network (50) includes an input unit (100) that receives and processes the YUV image data (10), separating it into the luma component and the chroma component; a feature detection unit (200) that extracts features from the luma and chroma components supplied by the input unit (100) using dual-path blocks (210); and a task execution unit (300) that performs a task based on the features extracted by the dual-path blocks (210).

In the dual-path block (210), two paths, each composed of two branches, are arranged in parallel, and the branches are trained to extract luma (Y) feature information and chroma (UV) feature information separately and to exchange and combine the extracted information.

Figure 2 is a flowchart showing a YUV image processing method of a neural network using dual-path blocks according to an embodiment of the present invention.

Referring to Figures 1 and 2, the method includes a step in which the neural network (50) inputs (S100) and processes (S110) image data in YUV format, a step of extracting features from the input image data using dual-path blocks (S200), and a step of performing a task based on the extracted features (S300). In the present invention, image data in YUV format is divided into a Y (luma) component and a UV (chroma) component.

The step of inputting (S100) and processing (S110) the image includes a convolution layer with a stride of 2 and a 7×7 kernel, a batch normalization layer, a ReLU activation function, and a max pooling layer with a stride of 2 and a 3×3 kernel. After this step, the height and width of the luma (Y) and chroma (UV) components are each reduced by a factor of 4. For example, when a 32×32 image is input, an 8×8 luma feature map and a 4×4 chroma feature map are generated.
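
The 32×32 example can be reproduced with the usual output-size formula for strided layers. This is a minimal sketch under stated assumptions: the paddings (3 for the 7×7 convolution, 1 for the 3×3 pooling) follow the common ResNet-style stem and are not given in the patent, the chroma planes are taken to be half-resolution as in YUV 4:2:0, and the function names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out(size: int) -> int:
    """7x7 stride-2 convolution followed by 3x3 stride-2 max pooling.

    Paddings of 3 and 1 are assumptions (ResNet-style); the patent
    gives only the kernel sizes and strides.
    """
    after_conv = conv_out(size, kernel=7, stride=2, padding=3)
    return conv_out(after_conv, kernel=3, stride=2, padding=1)


luma_in = 32               # Y plane of a 32x32 image
chroma_in = luma_in // 2   # U and V planes in YUV 4:2:0
print(stem_out(luma_in), stem_out(chroma_in))  # 8 4
```

With these paddings each stride-2 layer halves the side length, so the 32×32 luma plane becomes 8×8 and the 16×16 chroma planes become 4×4, matching the sizes stated above.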

In the step of extracting features from the input image using dual-path blocks (210) (S200), the input luma and chroma components pass through dual-path blocks (210) connected in series, which enriches the information contained in the feature maps. The operation of the dual-path block (210) is described later.

The deeper the neural network (50) is made by stacking more layers of dual-path blocks (210), the more inference performance can improve.

In addition, skip connections (211) may be added between the stacked dual-path blocks (210) to facilitate training.
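
One way to picture stacked blocks with skip connections is the sketch below. It assumes an additive, ResNet-style skip applied separately to the luma and chroma paths, which the patent does not spell out; feature maps are flattened to plain lists, and `with_skip`, `stack`, and `toy_block` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple

# A (luma, chroma) pair of feature maps, flattened to lists for brevity.
FeaturePair = Tuple[List[float], List[float]]
Block = Callable[[FeaturePair], FeaturePair]


def with_skip(block: Block) -> Block:
    """Wrap a block so that its input is added to its output on both paths."""
    def wrapped(pair: FeaturePair) -> FeaturePair:
        luma, chroma = pair
        out_luma, out_chroma = block(pair)
        return (
            [a + b for a, b in zip(luma, out_luma)],
            [a + b for a, b in zip(chroma, out_chroma)],
        )
    return wrapped


def stack(blocks: List[Block]) -> Block:
    """Connect blocks in series."""
    def run(pair: FeaturePair) -> FeaturePair:
        for block in blocks:
            pair = block(pair)
        return pair
    return run


def toy_block(pair: FeaturePair) -> FeaturePair:
    """Stand-in for a dual-path block: halves every value."""
    luma, chroma = pair
    return [0.5 * v for v in luma], [0.5 * v for v in chroma]


network = stack([with_skip(toy_block) for _ in range(2)])
print(network(([4.0], [8.0])))  # ([9.0], [18.0])
```

Because the input is carried forward unchanged alongside each block's output, gradients have a direct path through the stack, which is the usual reason skip connections ease training of deep networks.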

In the step of performing a specific task based on the extracted features (S300), the extracted feature maps pass through a task-specific head (310) suited to the task to be performed. The neural network of the present invention can be used for a variety of image processing tasks, such as image classification, face recognition, and autonomous driving.

Figure 3 is a flowchart explaining the operation of the dual-path block (210) according to an embodiment of the present invention.

As described above, the image data in YUV format has already been separated into a luma component (Y) and a chroma component (UV) and processed (S100, S110).

First, the dual-path block (210) inputs the luma component (Y) into a convolution-based enhanced luma component feature extraction branch (S210).

The enhanced luma component feature extraction branch (S210) includes a convolution layer with a 3×3 kernel. As the input luma component (Y) passes through this layer, the information contained in the luma feature map is enhanced and an enhanced luma feature map is generated.

Next, the chroma component (UV) is input into a convolution-based enhanced chroma component feature extraction branch (S220).

The enhanced chroma component feature extraction branch (S220) includes a convolution layer with a 3×3 kernel. As the input chroma component (UV) passes through this layer, the information contained in the chroma feature map is enhanced and an enhanced chroma feature map is generated.

Next, the input luma component (Y) is fed into an exchange luma component feature extraction branch (S230) to generate an exchange luma feature map, which is connected to the output of the enhanced chroma component feature extraction branch (S220). For this purpose, the branch includes an average pooling layer (pool) with a stride of 2 and a 2×2 kernel, and a convolution layer with a 3×3 kernel. The input luma component (Y) is resized by the pooling layer (pool) to the same size as the enhanced chroma feature map, and then passes through the convolution layer, which extracts the information to be passed to the enhanced chroma feature map.

Next, the input chroma component (UV) is fed into an exchange chroma component feature extraction branch (S240) to generate an exchange chroma feature map, which is connected to the output of the enhanced luma component feature extraction branch (S210). For this purpose, the branch includes a convolution layer with a 3×3 kernel and a nearest neighbor interpolation layer (up). The input chroma component (UV) passes through the convolution layer, which extracts the information to be passed to the enhanced luma feature map, and is then resized by the nearest neighbor interpolation layer (up) so that it can be connected to the enhanced luma feature map.

Next, the connected feature maps are combined (S250, S260). Combining the connected feature maps can further diversify the luma and chroma feature information.

The combination of connected feature maps includes an element-wise sum, batch normalization (BN), and a ReLU activation function (ReLU). Each pair of connected feature maps is merged by an element-wise sum and then passed through batch normalization (BN) and the ReLU activation function (ReLU) to output an enriched combined feature map.
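
The four branches and the two combinations described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: it handles a single sample, uses stride 1 and zero padding of 1 for every 3×3 convolution, replaces batch normalization with a per-channel normalization that has no learned parameters, and all function and weight names (`w_ll`, `w_cc`, `w_lc`, `w_cl`) are hypothetical.

```python
import numpy as np


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """3x3 convolution, stride 1, zero padding 1. x: (C, H, W), w: (O, C, 3, 3)."""
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Contract the input-channel axis for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 (assumes even H and W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor upsampling by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def norm_relu(x: np.ndarray) -> np.ndarray:
    """Per-channel normalization (a stand-in for batch norm) followed by ReLU."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return np.maximum((x - mean) / (std + 1e-5), 0.0)


def dual_path_block(luma, chroma, w_ll, w_cc, w_lc, w_cl):
    enhanced_luma = conv3x3(luma, w_ll)                  # S210
    enhanced_chroma = conv3x3(chroma, w_cc)              # S220
    exchange_luma = conv3x3(avg_pool2(luma), w_lc)       # S230: pool, then conv
    exchange_chroma = upsample2(conv3x3(chroma, w_cl))   # S240: conv, then up
    # S250/S260: element-wise sum, normalization, ReLU on each path.
    out_luma = norm_relu(enhanced_luma + exchange_chroma)
    out_chroma = norm_relu(enhanced_chroma + exchange_luma)
    return out_luma, out_chroma


rng = np.random.default_rng(0)
ch = 4  # channels per path, kept small for the example
luma = rng.standard_normal((ch, 8, 8))
chroma = rng.standard_normal((ch, 4, 4))
weights = [rng.standard_normal((ch, ch, 3, 3)) * 0.1 for _ in range(4)]
out_luma, out_chroma = dual_path_block(luma, chroma, *weights)
print(out_luma.shape, out_chroma.shape)  # (4, 8, 8) (4, 4, 4)
```

The output shapes equal the input shapes, which is what allows such blocks to be connected in series. Pooling before the convolution and upsampling after it keeps both exchange convolutions at the smaller chroma resolution.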

Here, the four convolution layers arranged in parallel may be merged into a single convolution layer and executed together at run time.

In general, a convolution operation in a neural network is optimized by converting a 4-dimensional feature map tensor of size (B, iC, iH, iW) and a 4-dimensional weight kernel tensor of size (oC, iC, k, k) into 2-dimensional matrices and performing a GEMM (GEneral Matrix Multiplication) operation. The process of converting the 4-dimensional feature map tensor of size (B, iC, iH, iW) into a 2-dimensional matrix of size (iC×k×k, B×oH×oW) is called im2col. With appropriate use of this process, the luma and chroma feature maps can be merged into one 2-dimensional matrix of size (iC_C×k×k + iC_L×k×k, 1.5×B×oH×oW). Likewise, the weight kernel tensors of the four convolution layers can simply be concatenated into a 2-dimensional weight kernel matrix of size (oC_C + oC_L, iC_C×k×k + iC_L×k×k), so the four convolutions can be performed simultaneously with a single GEMM operation.

Here, B is the batch size; iH and iW are the height and width of the input feature map; iC is the number of channels of the input feature map; k is the weight kernel size; oH and oW are the height and width of the output feature map; and oC is the number of channels of the output feature map. The subscripts L and C denote luma and chroma, respectively.
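
The idea of folding several convolutions into one GEMM can be illustrated with a simplified NumPy sketch. It covers only the two convolutions whose outputs are summed on the chroma path (chroma to chroma, and pooled luma to chroma), assumes stride 1 with "same" padding, and does not reproduce the patent's full four-way matrix layout; `im2col` and `conv_gemm` here are hypothetical helpers written for this example.

```python
import numpy as np


def im2col(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Unfold x of shape (B, iC, H, W) into (iC*k*k, B*H*W); stride 1, 'same' padding."""
    b, c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    rows = []
    for ci in range(c):
        for dy in range(k):
            for dx in range(k):
                # One row per (channel, kernel offset), one column per output pixel.
                rows.append(xp[:, ci, dy:dy + h, dx:dx + w].reshape(-1))
    return np.stack(rows)


def conv_gemm(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Convolution as one matrix product. weight: (oC, iC, k, k). Returns (oC, B*H*W)."""
    return weight.reshape(weight.shape[0], -1) @ im2col(x, weight.shape[-1])


rng = np.random.default_rng(0)
chroma = rng.standard_normal((2, 3, 4, 4))        # (B, iC_C, H, W)
pooled_luma = rng.standard_normal((2, 5, 4, 4))   # (B, iC_L, H, W), already pooled
w_cc = rng.standard_normal((6, 3, 3, 3))          # chroma -> chroma
w_lc = rng.standard_normal((6, 5, 3, 3))          # luma -> chroma

# Two separate GEMMs followed by the element-wise sum used in the block.
separate = conv_gemm(chroma, w_cc) + conv_gemm(pooled_luma, w_lc)

# One GEMM: concatenate weights along columns and im2col matrices along rows.
merged_w = np.hstack([w_cc.reshape(6, -1), w_lc.reshape(6, -1)])   # (oC, (iC_C + iC_L)*k*k)
merged_x = np.vstack([im2col(chroma), im2col(pooled_luma)])        # ((iC_C + iC_L)*k*k, B*H*W)
merged = merged_w @ merged_x

print(merged_w.shape, merged_x.shape)   # (6, 72) (72, 32)
print(np.allclose(separate, merged))    # True
```

The equality holds because a block matrix product sums the partial products of its column and row blocks, so the element-wise sum of the two branch outputs comes for free. The inner dimension iC_C×k×k + iC_L×k×k matches the form of the weight kernel matrix given above.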

Figure 4 is a diagram explaining how the convolution layers are actually computed according to an embodiment of the present invention.

Referring to Figure 4, the output feature maps are generated by a single GEMM operation between the weight kernel and the input feature maps.

Referring again to Figure 1, the feature maps extracted by the multi-layer dual-path blocks (210) pass through a task-specific head suited to the task to be performed. The neural network (50) of the present invention can be used for a variety of image processing tasks, such as image classification, face recognition, and autonomous driving.

An image classification task using the present invention is described below as one embodiment.

To verify the efficiency of the neural network (50) of the present invention, which takes YUV image data (10) as input and includes dual-path blocks (210), it is compared with the well-known ResNet-18 network. To give the two networks a similar size, the neural network of the present invention is built from 16 layers of dual-path blocks. In each block, the luma and chroma feature maps each have 32 channels.

The neural network of the present invention has 11.1M learnable parameters, similar in size to ResNet-18 with 11.2M. However, the computation (FLOPs) of the neural network of the present invention, which takes YUV image data as input, is 48.3M, about 30% less than the 70.5M of ResNet-18, which takes RGB image data as input, so it can operate more efficiently.
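
The quoted reduction can be checked directly from the reported figures. The sketch below reads the FLOPs values as 48.3M and 70.5M (treating the comma in the original figures as a decimal separator, which is an assumption); the helper name is hypothetical.

```python
params_ours, params_resnet18 = 11.1e6, 11.2e6
flops_ours, flops_resnet18 = 48.3e6, 70.5e6


def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is smaller than `baseline`."""
    return 1.0 - ours / baseline


print(f"{relative_reduction(flops_ours, flops_resnet18):.1%}")    # 31.5%
print(f"{relative_reduction(params_ours, params_resnet18):.1%}")  # 0.9%
```

Under that reading, the computation is roughly 31% lower while the parameter count differs by less than 1%, consistent with the "about 30%" stated above.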

In addition, decoding YUV image data costs less than decoding RGB image data, so the latency of the whole pipeline can be lower.

Specifically, the effectiveness of the neural network of the present invention can be verified using the CIFAR-10 dataset, which has 10 classes with 6,000 images per class. The dataset consists of 50,000 training images and 10,000 evaluation images, and every image is 32×32 pixels.

For a fair comparison, the training hyperparameters are set identically for both networks.

Each network is trained for 200 epochs on the training images, and the accuracy of its inference results on the evaluation images is measured. The batch size is 128, the loss function is cross-entropy, and the optimizer is SGD (Stochastic Gradient Descent). The initial learning rate is set to 0.01, and a schedule is used that reduces it by a factor of 10 at epochs 134 and 178.
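
The learning-rate schedule described above can be written as a small step function. This is a sketch under assumptions the patent does not state: epochs are counted from 0, and each drop takes effect at the milestone epoch itself. The function name and parameter names are hypothetical.

```python
def learning_rate(epoch: int, base_lr: float = 0.01,
                  milestones: tuple = (134, 178), gamma: float = 0.1) -> float:
    """Step schedule: multiply the rate by `gamma` once per milestone reached.

    Epochs are counted from 0, and the drop is assumed to take effect at
    the milestone epoch itself; the patent does not state this convention.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


for epoch in (0, 133, 134, 177, 178, 199):
    print(epoch, f"{learning_rate(epoch):.0e}")
```

Over the 200 training epochs this gives three phases, at 0.01, 0.001, and 0.0001, with the first phase covering about two thirds of training.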

On the evaluation dataset, the inference accuracy is 89% for the neural network of the present invention and 86% for ResNet-18, so the present invention achieves better performance.

The present invention described above is not limited by the foregoing drawings and detailed description. Various modifications and changes made by those skilled in the art without departing from the spirit and scope of the invention as set forth in the claims below also fall within the scope of the present invention.

10: YUV image data 50: Neural network
100: Input unit 200: Feature detection unit
210: Dual-path block 211: Skip connection
300: Task execution unit 310: Head

Claims

1. (Deleted)

2. A YUV image processing method comprising:
inputting and processing, by a neural network, image data in YUV format consisting of a luma (Y) component and a chroma (UV) component;
extracting features from the luma (Y) and chroma (UV) components input to a dual-path block of the neural network; and
performing, by the neural network, a task based on the extracted features,
wherein the dual-path block is stacked in multiple layers, with two paths, each composed of two branches, arranged in parallel, and the branches are trained to extract luma (Y) feature information and chroma (UV) feature information separately and to exchange and combine the extracted information, and
wherein the step of extracting features from the image input to the dual-path block comprises:
inputting the luma component (Y) into an enhanced luma component feature extraction branch that includes a convolution layer, to generate an enhanced luma feature map;
inputting the chroma component (UV) into an enhanced chroma component feature extraction branch that includes a convolution layer, to generate an enhanced chroma feature map;
inputting the luma component (Y) into an exchange luma component feature extraction branch to generate an exchange luma feature map, and connecting it to the output of the enhanced chroma component feature extraction branch;
inputting the chroma component (UV) into an exchange chroma component feature extraction branch to generate an exchange chroma feature map, and connecting it to the output of the enhanced luma component feature extraction branch; and
combining the connected feature maps.

3. The YUV image processing method of claim 2, wherein skip connections are added between the dual-path blocks stacked in multiple layers to facilitate training.

4. (Deleted)

5. The YUV image processing method of claim 2, wherein the exchange luma feature map is generated at an adjusted size by passing the input luma component (Y) through a pooling layer (pool) and a convolution layer, and
the exchange chroma feature map is generated at an adjusted size by passing the input chroma component (UV) through a convolution layer and a nearest neighbor interpolation layer (up).

6. The YUV image processing method of claim 5, wherein the four convolution layers arranged in parallel are merged into a single convolution layer when executed.

7. The YUV image processing method of claim 2, wherein, in the step of combining the connected feature maps, the connected feature maps are merged by an element-wise sum and passed through batch normalization (BN) and an activation function to output a combined feature map.