KR102269034B1

KR102269034B1 - Apparatus and method of using AI metadata related to image quality

Info

Publication number: KR102269034B1
Application number: KR1020200023848A
Authority: KR
Inventors: 백상욱; 천민수; 박용섭; 박재연; 최광표
Original assignee: 삼성전자주식회사
Priority date: 2019-11-20
Filing date: 2020-02-26
Publication date: 2021-06-24
Also published as: KR20210061906A

Abstract

Provided is an image providing apparatus including: a memory storing one or more instructions; one or more processors configured to execute the one or more instructions stored in the memory; and an output unit, wherein the one or more processors, by executing the one or more instructions, use a first artificial intelligence (AI) network to generate AI metadata including class information, which includes at least one class corresponding to a type of object included in a first image from among a plurality of predefined objects, and at least one class map indicating a region corresponding to each class in the first image, encode the first image to generate an encoded image, and output the encoded image and the AI metadata through the output unit.

Description

Apparatus and method of using AI metadata related to image quality

Embodiments of the present disclosure relate to an image providing apparatus, a method of controlling an image providing apparatus, an image reproducing apparatus, a method of controlling an image reproducing apparatus, and a computer program.

When a digital image is transmitted between devices over a network, various techniques such as encoding and image downscaling are used to reduce the amount of transmitted data. However, when the amount of image data is reduced in this way and the receiving device reconstructs and reproduces the image, the image quality of the original image cannot be fully reproduced, and image quality deteriorates.

To improve the image quality restored by an image reproducing apparatus, attempts have been made to extract features of the encoded input image while reproducing it and to use those features to improve image quality. However, extracting image features from the input image requires considerable resources and processing time, which increases the processing load on the image reproducing apparatus and reduces its processing speed.

Embodiments of the present disclosure are intended to make the AI image quality network used for generating AI metadata lightweight.

In addition, embodiments of the present disclosure are intended to make the AI network used for image quality processing in an image reproducing apparatus lightweight.

In addition, embodiments of the present disclosure are intended to improve the performance of the AI image quality network by generating AI metadata from the original image.

According to an aspect of an embodiment of the present disclosure, there is provided an image providing apparatus including: a memory storing one or more instructions; one or more processors configured to execute the one or more instructions stored in the memory; and an output unit, wherein the one or more processors, by executing the one or more instructions, use a first artificial intelligence (AI) network to generate AI metadata including class information, which includes at least one class corresponding to a type of object included in a first image from among a plurality of predefined objects, and at least one class map indicating a region corresponding to each class in the first image, encode the first image to generate an encoded image, and output the encoded image and the AI metadata through the output unit.

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, input the first image to the first AI network to generate a plurality of segmentation probability maps, one for each type of the plurality of predefined objects; define, based on the plurality of segmentation probability maps, at least one class corresponding to a type of object included in the first image; generate class information including the at least one class; and generate, from the plurality of segmentation probability maps, the at least one class map for each of the at least one class.

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, calculate, for each of the plurality of segmentation probability maps, an average value of the pixels excluding pixels having a value of 0, and define the at least one class by selecting some of the plurality of predefined objects based on the magnitude of the average value of each of the plurality of segmentation probability maps.
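The class selection described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the object names, the dictionary layout, and the `top_k` cutoff are hypothetical, since the text only states that selection is based on the magnitude of the average values.

```python
def nonzero_mean(prob_map):
    """Mean of the pixels in a 2-D probability map, ignoring pixels equal to 0."""
    values = [p for row in prob_map for p in row if p != 0]
    return sum(values) / len(values) if values else 0.0


def select_classes(prob_maps, top_k):
    """Return the top_k object types ranked by non-zero mean probability.

    prob_maps: dict mapping an object-type name to its 2-D probability map.
    top_k is a hypothetical parameter introduced for this sketch only.
    """
    means = {name: nonzero_mean(m) for name, m in prob_maps.items()}
    return sorted(means, key=means.get, reverse=True)[:top_k]


# Hypothetical 2x2 segmentation probability maps for three predefined objects.
prob_maps = {
    "sky":   [[0.9, 0.8], [0.0, 0.0]],
    "grass": [[0.0, 0.0], [0.7, 0.6]],
    "water": [[0.1, 0.0], [0.0, 0.2]],
}
selected = select_classes(prob_maps, top_k=2)
```

Excluding zero-valued pixels keeps a small but confidently detected object from being averaged down by the large area in which it is absent.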

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, map at least some of the objects, among the plurality of predefined objects, that were not selected as the at least one class to the at least one class, and generate the class map by combining the segmentation probability map of each object mapped to the at least one class with the segmentation probability map of the class to which that object is mapped.
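As a rough sketch of this merging step, the following Python code folds the probability map of an unselected object into the map of the class it is mapped to. The combining rule (pixel-wise sum clipped to 1.0), the object names, and the `object_to_class` mapping are assumptions made for illustration; the text does not specify how the maps are combined.

```python
def merge_maps(class_map, object_map):
    """Combine two same-sized maps pixel by pixel (assumed rule: sum clipped to 1.0)."""
    return [
        [min(1.0, c + o) for c, o in zip(class_row, object_row)]
        for class_row, object_row in zip(class_map, object_map)
    ]


def build_class_maps(prob_maps, selected, object_to_class):
    """Start from each selected class's own map and fold in the objects mapped to it."""
    class_maps = {name: [row[:] for row in prob_maps[name]] for name in selected}
    for obj, cls in object_to_class.items():
        class_maps[cls] = merge_maps(class_maps[cls], prob_maps[obj])
    return class_maps


# Hypothetical example: "sea" was not selected, so it is mapped onto "water".
prob_maps = {
    "water": [[0.5, 0.0], [0.0, 0.0]],
    "sea":   [[0.25, 0.5], [0.0, 0.0]],
    "sky":   [[0.0, 0.0], [0.75, 1.0]],
}
class_maps = build_class_maps(prob_maps, ["water", "sky"], {"sea": "water"})
```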

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, generate, from the first AI network, a segmentation probability map for frequency for each of a plurality of predefined frequency domains, and generate, based on the segmentation probability maps for frequency, AI metadata including frequency information, which includes information about the frequency domains of the first image, and at least one frequency map corresponding to each of the frequency domains included in the frequency information.

In addition, according to an embodiment of the present disclosure, the first image may include a plurality of frames, and the class information and the at least one class map may be generated for each of the plurality of frames. The one or more processors may, by executing the one or more instructions, define, from the plurality of frames, at least one sequence including at least one frame, and generate, for each sequence, sequence class information indicating the classes included in the at least one frame of that sequence and frame class information indicating the classes included in each of the at least one frame of that sequence. The frame class information may indicate a combination of the classes included in the corresponding frame, from among the classes included in the sequence class information, and may have fewer bits than the sequence class information.
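One way to see why the frame class information can be shorter is the following sketch, which assumes a simple flag-per-class encoding. The list of predefined objects and the flag layout are hypothetical and are not taken from the disclosure.

```python
# Hypothetical list of predefined object types (8 entries, so 8 flags per sequence).
PREDEFINED = ["sky", "grass", "water", "face", "building", "text", "road", "tree"]


def sequence_class_info(frames):
    """One flag per predefined object: 1 if any frame in the sequence contains it."""
    present = set().union(*frames)
    return [1 if name in present else 0 for name in PREDEFINED]


def frame_class_info(frame, seq_info):
    """One flag per class set in seq_info: 1 if this particular frame contains it."""
    seq_classes = [name for name, bit in zip(PREDEFINED, seq_info) if bit]
    return [1 if name in frame else 0 for name in seq_classes]


# Each frame is represented by the set of classes detected in it.
frames = [{"sky", "grass"}, {"sky"}, {"sky", "water"}]
seq_info = sequence_class_info(frames)
frame_infos = [frame_class_info(f, seq_info) for f in frames]
```

In this toy case the sequence needs 8 flags, while each frame needs only 3, because a frame only has to say which of the sequence's classes it contains.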

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, generate, based on the at least one class map, a lightweight class map in which each pixel has a representative value corresponding to one of the at least one class, generate lightweight AI metadata including the class information and the lightweight class map, and output the encoded image and the lightweight class map through the output unit.
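A lightweight class map of this kind could be produced, for example, by keeping only the index of the strongest class at each pixel. The sketch below assumes an argmax rule and uses the class index as the representative value; both choices are illustrative assumptions rather than details stated in the text.

```python
def lightweight_class_map(class_maps):
    """Collapse per-class maps into a single map of class indices.

    class_maps: list of same-sized 2-D maps, one per class, in class-information order.
    Each output pixel holds the index of the class with the highest value at that
    position (assumed rule; ties go to the lowest index).
    """
    height, width = len(class_maps[0]), len(class_maps[0][0])
    return [
        [
            max(range(len(class_maps)), key=lambda k: class_maps[k][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical 2x2 class maps for two classes.
sky = [[0.9, 0.6], [0.1, 0.2]]
grass = [[0.1, 0.4], [0.9, 0.8]]
light = lightweight_class_map([sky, grass])
```

Replacing several floating-point maps with one small-integer map is what reduces the metadata size in this sketch.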

In addition, according to an embodiment of the present disclosure, the first AI network may include: a 1-1 AI network that includes at least one convolution layer and at least one max pooling layer and generates a feature map from the first image; and a 1-2 AI network that includes a first layer group, which includes at least one convolution layer and at least one activation layer and receives and processes the feature map from the 1-1 AI network, an up-scaler that up-scales the output of the first layer group, and a second layer group, which includes at least one convolution layer and at least one min pooling layer and receives the output of the up-scaler to generate a segmentation probability map for each of the plurality of predefined objects.

In addition, according to an embodiment of the present disclosure, the first AI network may be trained jointly with a second AI network, and the second AI network may be included in an apparatus that receives the encoded image and the AI metadata and decodes the encoded image, and may perform, from the AI metadata, image quality processing on image data corresponding to the encoded image.

In addition, according to an embodiment of the present disclosure, the one or more processors may, by executing the one or more instructions, generate the encoded image by performing downscale processing and encoding processing on the first image.

In addition, according to another aspect of an embodiment of the present disclosure, there is provided a method of controlling an image providing apparatus, the method including: generating, using a first artificial intelligence (AI) network, AI metadata including class information, which includes at least one class corresponding to a type of object included in a first image from among a plurality of predefined objects, and at least one class map indicating a region corresponding to each object in the first image; encoding the first image to generate an encoded image; and outputting the encoded image and the AI metadata.

In addition, according to yet another aspect of an embodiment of the present disclosure, there is provided an image reproducing apparatus including: a memory storing one or more instructions; one or more processors configured to execute the one or more instructions stored in the memory; an input unit that receives an encoded image corresponding to a first image and AI metadata corresponding to the encoded image; and an output unit, wherein the AI metadata includes class information, which includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image, and wherein the one or more processors, by executing the one or more instructions, decode the encoded image to generate a second image corresponding to the first image, generate, using a second AI network, a third image on which image quality improvement processing has been performed from the second image and the AI metadata, and output the third image through the output unit.

In addition, according to an embodiment of the present disclosure, the second AI network may include a 2-1 AI network including at least one convolution layer, at least one modulation layer, and at least one activation layer, and the at least one modulation layer may process a feature map input to that modulation layer based on a modulation parameter set generated from the AI metadata.

In addition, according to an embodiment of the present disclosure, the 2-1 AI network may include a plurality of modulation layers, the second AI network may include modulation parameter generation networks that correspond to the respective modulation layers and operate on the AI metadata, and each modulation parameter generation network may generate the modulation parameter set for its corresponding modulation layer.

In addition, according to an embodiment of the present disclosure, the modulation parameter set may include a first operation modulation parameter that is combined with the data values of an input feature map by multiplication, and a second operation modulation parameter that is combined, by addition, with the result of multiplying the first operation modulation parameter by the data values of the input feature map, and the at least one modulation layer may process the input feature map based on the first operation modulation parameter and the second operation modulation parameter.
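The multiply-then-add operation of a modulation layer can be sketched in a few lines of Python. The sketch assumes that the two modulation parameters have the same shape as the feature map and are applied element by element; the names `scale` and `shift` are illustrative stand-ins for the first and second operation modulation parameters.

```python
def modulate(feature_map, scale, shift):
    """Apply out = feature * scale + shift to every element of a 2-D feature map.

    scale plays the role of the first (multiplicative) modulation parameter and
    shift the role of the second (additive) one; element-wise shapes are assumed.
    """
    return [
        [f * s + b for f, s, b in zip(f_row, s_row, b_row)]
        for f_row, s_row, b_row in zip(feature_map, scale, shift)
    ]


feature = [[1.0, 2.0], [3.0, 4.0]]
scale = [[2.0, 2.0], [0.5, 0.5]]   # would be derived from the AI metadata
shift = [[0.0, 1.0], [0.0, 1.0]]   # would be derived from the AI metadata
out = modulate(feature, scale, shift)
```

Because the parameters are derived from the class maps, regions belonging to different classes can be scaled and shifted differently within the same feature map.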

In addition, according to an embodiment of the present disclosure, the 2-1 AI network may include a plurality of residual blocks, each of which includes: a main stream that includes at least one convolution layer, at least one modulation layer, and at least one activation layer and generates a residual-version processing result value; and at least one second skip processing path that bypasses at least one layer included in the block to generate a prediction-version processing result. The 2-1 AI network may further include at least one first skip processing path that skips at least one of the plurality of residual blocks to generate a prediction-version processing result value.
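The two levels of skip paths can be pictured with the following simplified sketch. It assumes that each skip path carries its input forward unchanged and that the skip output is added to the main-stream output; the `halve` and `relu` functions are placeholders standing in for real convolution, modulation, and activation layers.

```python
def add(a, b):
    """Element-wise sum of two equally long feature vectors."""
    return [x + y for x, y in zip(a, b)]


def residual_block(x, layers):
    """Main stream runs every layer; the block-level (second) skip path bypasses them."""
    residual = x
    for layer in layers:
        residual = layer(residual)
    return add(x, residual)


def network(x, blocks):
    """Chain the residual blocks, then add the network-level (first) skip path."""
    y = x
    for layers in blocks:
        y = residual_block(y, layers)
    return add(x, y)


def halve(v):
    """Placeholder for a learned layer such as a convolution."""
    return [0.5 * t for t in v]


def relu(v):
    """Placeholder activation layer."""
    return [max(0.0, t) for t in v]


out = network([2.0, -4.0], blocks=[[halve, relu], [halve, relu]])
```

With identity skips, the layers in the main stream only need to learn a correction to their input rather than the full output.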

In addition, according to an embodiment of the present disclosure, the second AI network may include an up-scaler that receives the output of the 2-1 AI network and performs up-scaling, and a second image quality processing unit that includes at least one machine-learned layer and receives the output of the up-scaler to generate and output the third image.

In addition, according to an embodiment of the present disclosure, the AI metadata input through the input unit may include a lightweight class map, which is a lightened form of the at least one class map for each of the at least one class, and the one or more processors may, by executing the one or more instructions, generate a restored class map for each of the at least one class from the lightweight class map, where each pixel of the restored class map has a value indicating whether that pixel corresponds to the class.
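Restoration on the reproducing side can be sketched as the inverse of an index-based lightweight map. The code below assumes that each pixel of the lightweight class map stores a class index, which is an illustrative encoding rather than one specified in the text.

```python
def restore_class_maps(light_map, num_classes):
    """Expand a map of class indices into one binary map per class.

    In the k-th restored map a pixel is 1 if the lightweight map assigns that
    pixel to class k, and 0 otherwise.
    """
    return [
        [[1 if value == k else 0 for value in row] for row in light_map]
        for k in range(num_classes)
    ]


restored = restore_class_maps([[0, 0], [1, 1]], num_classes=2)
```

The restored maps are binary, so the per-pixel probabilities of the original class maps are not recovered in this sketch.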

In addition, according to yet another aspect of an embodiment of the present disclosure, there is provided a method of controlling an image reproducing apparatus, the method including: receiving an encoded image corresponding to a first image and AI metadata corresponding to the encoded image; decoding the encoded image to generate a second image corresponding to the first image; generating, using a second AI network, a third image on which image quality improvement processing has been performed from the second image and the AI metadata; and outputting the third image, wherein the AI metadata includes class information, which includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image.

In addition, according to yet another aspect of an embodiment of the present disclosure, there is provided a computer-readable recording medium storing computer program instructions that, when executed by a processor, perform a method of controlling an image reproducing apparatus, the method including: receiving an encoded image corresponding to a first image and AI metadata corresponding to the encoded image; decoding the encoded image to generate a second image corresponding to the first image; generating, using a second AI network, a third image on which image quality improvement processing has been performed from the second image and the AI metadata; and outputting the third image, wherein the AI metadata includes class information, which includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image.

According to embodiments of the present disclosure, the AI image quality network used for generating AI metadata can be made lightweight.

In addition, according to embodiments of the present disclosure, the AI network used for image quality processing in the image reproducing apparatus can be made lightweight.

In addition, according to embodiments of the present disclosure, the performance of the AI image quality network can be improved by generating the AI metadata from the original image.

FIG. 1 is a diagram illustrating the structures of an image providing apparatus and an image reproducing apparatus according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating the structure of a processor of an image providing apparatus according to an embodiment.
FIG. 3 is a diagram illustrating the processing performed by a meta information extraction unit according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a process of generating object information and class information from a first image in image characteristic analysis processing, according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating the operation of an information generation network of a meta information extraction unit according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating the operation of a meta information lightweighting unit of an image providing apparatus according to an embodiment of the present disclosure.
FIG. 7 is a diagram illustrating the operation of a meta information compression unit according to an embodiment of the present disclosure.
FIG. 8 is a diagram illustrating a method by which a meta information compression unit compresses class information, according to an embodiment of the present disclosure.
FIG. 9 is a diagram illustrating the operation of a meta information compression unit of an image providing apparatus according to an embodiment of the present disclosure.
FIG. 10 is a flowchart illustrating a method of controlling an image providing apparatus according to an embodiment of the present disclosure.
FIG. 11 is a diagram illustrating the structures of an input unit, a processor, and an output unit of an image reproducing apparatus according to an embodiment of the present disclosure.
FIG. 12 is a diagram illustrating the operation of a meta information synthesis unit according to an embodiment of the present disclosure.
FIG. 13 is a diagram illustrating the structure of a residual block 1212 according to an embodiment of the present disclosure.
FIG. 14 is a diagram illustrating the operations of a 2-1 AI network and a 2-2 AI network according to an embodiment of the present disclosure.
FIG. 15A is a diagram illustrating the structures of a 2-1 AI network and a 2-2 AI network according to an embodiment of the present disclosure.
FIG. 15B is a diagram illustrating the structures of a 2-1 AI network and a 2-2 AI network according to an embodiment of the present disclosure.
FIG. 16 is a diagram illustrating the operations of a metadata synthesis unit and a metadata processing unit according to an embodiment of the present disclosure.
FIG. 17 is a diagram for describing a method of training a first AI network of an image providing apparatus and a second AI network of an image reproducing apparatus according to an embodiment of the present disclosure.
FIG. 18 is a flowchart illustrating a method of controlling an image reproducing apparatus according to an embodiment of the present disclosure.
FIG. 19 is a diagram illustrating the configurations of an image providing apparatus and an image reproducing apparatus according to an embodiment of the present disclosure.

This specification clarifies the scope of the claims and explains and discloses the principles of the embodiments so that a person of ordinary skill in the art to which the embodiments of the present disclosure pertain can practice the embodiments recited in the claims. The disclosed embodiments may be implemented in various forms.

Throughout the specification, like reference numerals refer to like elements. This specification does not describe every element of the embodiments, and omits content that is general knowledge in the technical field to which the embodiments pertain or that overlaps between embodiments. The term "module" or "unit" used in the specification may be implemented as one of, or a combination of two or more of, software, hardware, and firmware. Depending on the embodiment, a plurality of "modules" or "units" may be implemented as a single element, or a single "module" or "unit" may include a plurality of elements.

In describing the embodiments, a detailed description of related known technology is omitted where it is determined that it could unnecessarily obscure the subject matter of the present disclosure. In addition, the ordinal numbers used in the description (for example, first, second, and so on) are merely identifiers for distinguishing one element from another.

In addition, in this specification, when one element is described as being "connected" or "coupled" to another element, the element may be directly connected or directly coupled to the other element, but it should be understood that, unless specifically stated otherwise, the elements may also be connected or coupled through yet another element in between.

Hereinafter, the operating principles of the present disclosure and various embodiments are described with reference to the accompanying drawings.

In this specification, a 'first artificial intelligence (AI) network' and a 'second AI network' refer to AI models that have been machine-learned using a large amount of training data. The first AI network and the second AI network may include various machine learning models and may include, for example, a deep neural network (DNN) structure.

In addition, in this specification, an 'image' or 'picture' may correspond to a still image, a moving image composed of a plurality of consecutive still images (or frames), or a video.

In addition, in this specification, a 'first image' refers to an image to be encoded, and an 'encoded image' refers to an image generated by encoding the first image. A 'second image' refers to an image generated by decoding the encoded image, and a 'third image' refers to an image obtained by performing image quality improvement processing on the second image.

FIG. 1 is a diagram illustrating the structures of an image providing apparatus and an image reproducing apparatus according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the image providing apparatus 110 generates an encoded image and AI metadata from a first image and transmits them to the image reproducing apparatus 150. The image reproducing apparatus 150 receives the encoded image and the AI metadata, decodes the encoded image, performs image quality improvement processing, and outputs the result. The image providing apparatus 110 and the image reproducing apparatus 150 may be connected through a communication network or through various input/output interfaces.
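The data flow between the two apparatuses can be summarized in a short sketch. Everything here is a hypothetical stand-in: the `Transmission` container, the function names, and the toy encoder, decoder, and networks exist only to make the flow runnable and do not represent a real codec or the networks of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Transmission:
    """What the providing apparatus sends: the encoded image plus its AI metadata."""
    encoded_image: bytes
    ai_metadata: dict


def provide(first_image, first_ai_network, encode):
    """Providing side: extract AI metadata from the first image and encode it."""
    return Transmission(encode(first_image), first_ai_network(first_image))


def reproduce(received, decode, second_ai_network):
    """Reproducing side: decode to the second image, then enhance it using the metadata."""
    second_image = decode(received.encoded_image)
    return second_ai_network(second_image, received.ai_metadata)


# Toy stand-ins so the sketch runs; none of these are real codecs or networks.
def toy_first_ai_network(image):
    return {"classes": ["sky"]}


def toy_encode(image):
    return image.encode("utf-8")


def toy_decode(data):
    return data.decode("utf-8")


def toy_second_ai_network(image, metadata):
    return f"{image} enhanced for {metadata['classes']}"


sent = provide("frame", toy_first_ai_network, toy_encode)
third_image = reproduce(sent, toy_decode, toy_second_ai_network)
```

The point of the split is that the metadata is computed once from the original first image on the providing side, so the reproducing side does not have to analyze the degraded second image itself.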

The image providing apparatus 110 may be implemented as various types of electronic devices. According to an embodiment, the image providing apparatus 110 may correspond to a server of a service provider that provides image content. The image providing apparatus 110 may be implemented not only as a physically independent electronic device but also in the form of a cloud server.

The image providing apparatus 110 includes a processor 112, a memory 116, and an output unit 118.

The processor 112 controls the overall operation of the image providing apparatus 110. The processor 112 may be implemented as one or more processors. The processor 112 may perform a predetermined operation by executing instructions or commands stored in the memory 116.

The processor 112 may receive the first image, extract AI metadata related to image quality processing, and encode the first image to generate an encoded image. The processor 112 may extract the AI metadata from the first image by using the first AI network 114.

AI metadata refers to metadata that is extracted from the first image and relates to image quality improvement processing. The AI metadata includes information about the types of objects in the first image. The AI metadata is extracted from the first image by using the first AI network 114.

The processor 112 may process input image data by using the first AI network 114. According to an embodiment, the first AI network 114 may be provided in the image providing apparatus 110. According to another embodiment, the first AI network 114 may be provided in another device, and the image providing apparatus 110 may use the first AI network 114 provided in that external device through a communication interface or the like. This specification mainly describes the embodiment in which the first AI network 114 is provided in the image providing apparatus 110, but embodiments of the present disclosure are not limited thereto.

The first AI network 114 refers to an artificial intelligence model that has been machine-learned using a plurality of training data. The first AI network 114 may include machine learning models of various structures and may include, for example, a deep neural network (DNN) structure. The first AI network 114 may also have a structure such as a convolutional neural network (CNN) or a recurrent neural network (RNN). In addition, the first AI network 114 may employ a machine learning algorithm such as a Spatial Feature Transform Generative Adversarial Network (SFTGAN) or an Enhanced Super-resolution Generative Adversarial Network (ESRGAN).

The first AI network 114 may be composed of storage spaces and a processor that processes the data stored in each storage space. The first AI network 114 may define at least one node and at least one layer, and the data processing between nodes and layers, by means of an algorithm and parameter values that define the processing between the data in the storage spaces. The first AI network 114 may include a plurality of nodes and a plurality of layers, and may process input data to generate output data by passing data among the plurality of nodes and the plurality of layers.

The memory 116 stores computer program instructions, information, and content necessary for the operation of the image providing apparatus 110. The memory 116 may include a volatile storage medium, a non-volatile storage medium, or a combination thereof. The memory 116 may be implemented as various types of storage media. The memory 116 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

The output unit 118 outputs data or control signals generated or processed by the image providing apparatus 110. The output unit 118 outputs the encoded image and the AI metadata generated by the image providing apparatus 110. The output unit 118 may include various types of output interfaces, such as a communication interface, a display, a speaker, and a touch screen.

According to an embodiment, the output unit 118 may include a communication unit or a communication interface. The output unit 118 may output data or signals to an external device, such as the image reproducing apparatus 150, through the communication interface. The communication interface may output the encoded image and the AI metadata to the image reproducing apparatus 150 directly or via another device.

The image reproducing apparatus 150 may be implemented as various types of electronic devices. According to an embodiment, the image reproducing apparatus 150 may correspond to a user terminal. The image reproducing apparatus 150 may be implemented as various forms of electronic devices, such as a portable communication terminal, a smartphone, a wearable device, a laptop computer, a desktop computer, a tablet PC, or a kiosk.

The image reproducing apparatus 150 may include an input unit 152, a processor 154, a memory 158, and an output unit 160.

The input unit 152 receives data or control signals and passes them to the processor 154. The input unit 152 receives the encoded image and the AI metadata output from the image providing apparatus 110.

The input unit 152 may include a communication unit or a communication interface. It may also include an input/output interface such as a touch screen, a keyboard, a mouse, or a touch pad.

The processor 154 controls the overall operation of the image reproducing apparatus 150. The processor 154 may be implemented as one or more processors. The processor 154 may perform a predetermined operation by executing instructions or commands stored in the memory 158.

The processor 154 may process input image data by using the second AI network 156. The processor 154 may receive the encoded image and the AI metadata input from the image providing apparatus 110, obtain the second image by decoding the encoded image, and obtain the third image by performing image quality improvement processing on the second image using the AI metadata. Using the second AI network 156, the processor 154 may obtain the third image as output from the second image and the AI metadata as input.

According to an embodiment, the image reproducing apparatus 150 may receive a low-resolution encoded image and also perform upscaling on the second image obtained from the encoded image. The processor 154 may perform image quality improvement processing and upscaling on the second image. The processor 154 may perform the image quality improvement processing and the upscaling by using the second AI network 156.

According to an embodiment, the second AI network 156 may be provided in the image reproducing apparatus 150. According to another embodiment, the second AI network 156 may be provided in another device, and the image reproducing apparatus 150 may use the second AI network 156 provided in that external device through a communication interface or the like. This specification mainly describes the embodiment in which the second AI network 156 is provided in the image reproducing apparatus 150, but embodiments of the present disclosure are not limited thereto.

The second AI network 156 refers to an artificial intelligence model that has been machine-learned using a plurality of training data. The second AI network 156 may include machine learning models of various structures and may include, for example, a deep neural network (DNN) structure. The second AI network 156 may also have a structure such as a convolutional neural network (CNN) or a recurrent neural network (RNN). In addition, the second AI network 156 may employ a machine learning algorithm such as a Spatial Feature Transform Generative Adversarial Network (SFTGAN) or an Enhanced Super-resolution Generative Adversarial Network (ESRGAN).

The second AI network 156 may be composed of storage spaces and a processor that processes the data stored in each storage space. The second AI network 156 may define at least one node and at least one layer, and the data processing between nodes and layers, by means of an algorithm and parameter values that define the processing between the data in the storage spaces. The second AI network 156 may include a plurality of nodes and a plurality of layers, and may process input data to generate output data by passing data among the plurality of nodes and the plurality of layers.

The memory 158 stores computer program instructions, information, and content necessary for the operation of the image reproducing apparatus 150. The memory 158 may include a volatile storage medium, a non-volatile storage medium, or a combination thereof. As described above for the memory 116 of the image providing apparatus 110, the memory 158 may be implemented as various types of storage media.

The output unit 160 outputs data or control signals generated or processed by the image reproducing apparatus 150. The output unit 160 outputs the third image from the image reproducing apparatus 150. The output unit 160 may include various types of output interfaces, such as a communication interface, a display, a speaker, and a touch screen.

According to an embodiment, the output unit 160 may correspond to a display and may output the third image by displaying it.

According to another embodiment, the output unit 160 may include a communication unit or a communication interface. The output unit 160 may transmit the third image to an external device, a display device, a multimedia device, or the like through the communication interface. For example, the output unit 160 may transmit display data of the third image to an external display device.

FIG. 2 is a diagram illustrating the structure of a processor of an image providing apparatus according to an embodiment.

In the present disclosure, the blocks within the processor 112 or 154 may correspond to at least one software processing block, at least one dedicated hardware processor, or a combination thereof. The blocks defined within the processor 112 or 154 are merely one example of software processing units for carrying out the embodiments of the present disclosure; processing units that carry out the embodiments may be defined in various other ways.

According to an embodiment, the processor 112 of the image providing apparatus 110 may include a metadata generator 210 and an encoder 220. The metadata generator 210 receives the first image and generates AI metadata.

According to an embodiment, the first image may be downscaled and then encoded by the processor 112a. For example, the first image may be downscaled by the metadata generator 210, by the encoder 220, or by a separate downscaler block. According to an embodiment, the metadata generator 210 generates the AI metadata from the high-resolution, high-quality image before it is downscaled. By generating the AI metadata from the high-resolution, high-quality image, the image providing apparatus 110 can generate the AI metadata without losing information related to image quality.

The metadata generator 210 may use the first AI network 114 for part of its processing. The first AI network 114 may be provided within the processor 112a, provided within the image providing apparatus 110 as a separate dedicated processor, or provided in an external device.

The metadata generator 210 may include a meta information extractor 212 and a meta information lightweighting unit 214. According to an embodiment, the metadata generator 210 may further include a meta information compressor 216. The meta information compressor 216 may be omitted depending on the embodiment.

The meta information extractor 212 extracts AI metadata from the first image. The AI metadata may include class information indicating the types of objects included in the first image, and a class map indicating the class corresponding to each region of the first image. The class map is information that defines, for the pixel array corresponding to the first image, the class corresponding to each pixel. A class map may be generated for each of the at least one class included in the class information of the first image. For example, if four classes (default, sky, grass, and plant) are defined for the first image, so that the class information defines four classes, a class map is generated for each of default, sky, grass, and plant, for a total of four class maps.

The meta information extractor 212 may detect objects in the first image, define a predetermined number of classes corresponding to the types of the detected objects, and map any object not defined as a class to a predetermined class. An object is a subject included in the image. The types of objects may be defined in various ways depending on the embodiment and may include, for example, sky, water, grass, mountain, building, plant, or animal. If an object corresponding to water is detected in the first image but is not included in the class information because too few pixels correspond to it, the region corresponding to the water object may be mapped to one of the classes defined for the first image. A class map may be implemented as a probability map that indicates, for each pixel, the probability that the pixel corresponds to the class. For example, the class map for sky may indicate the probability that each pixel corresponds to sky. The resolution of a class map may be equal to or lower than the resolution of the first image.
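The handling of minor objects described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function name `select_main_classes`, the per-pixel label input format, the rule of keeping the most frequent classes, and the choice of `default` as the class that absorbs minor objects are all assumptions made for illustration.

```python
from collections import Counter

def select_main_classes(label_map, max_classes, fallback="default"):
    """Keep the most frequent classes and remap every other label to a fallback class.

    label_map: 2-D list of class names, one per pixel (hypothetical input format).
    max_classes: number of classes to keep, including the fallback class.
    """
    counts = Counter(label for row in label_map for label in row)
    counts.pop(fallback, None)  # the fallback class is always kept
    kept = [name for name, _ in counts.most_common(max_classes - 1)]
    class_info = [fallback] + kept
    remapped = [[label if label in class_info else fallback for label in row]
                for row in label_map]
    return class_info, remapped

# 'water' covers only one pixel, so it is dropped from the class information.
labels = [
    ["sky", "sky", "sky", "sky"],
    ["sky", "plant", "plant", "water"],
    ["grass", "grass", "grass", "grass"],
]
class_info, remapped = select_main_classes(labels, max_classes=4)
```

In this toy input the class information becomes default, sky, grass, and plant, and the single water pixel is assigned to the default class.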

The meta information extractor 212 may extract the AI metadata from the first image by using the first AI network 114. According to an embodiment, the meta information extractor 212 may use the first AI network 114 to extract the class maps from the first image and the class information.

The meta information extractor 212 generates the AI metadata in its form prior to lightweighting; this AI metadata may include class information and a plurality of class maps, one for each class.

The meta information lightweighting unit 214 performs lightweighting processing that reduces the data size of the AI metadata extracted by the meta information extractor 212. The meta information lightweighting unit 214 reduces the number of bits of the AI metadata by lightweighting the plurality of class maps. It generates a lightweight class map by defining a representative value for each position (pixel) in a single map. Each representative value of the lightweight class map indicates the class corresponding to that pixel. Accordingly, the lightweight AI metadata includes the lightweight class map and the class information.
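One plausible way to pick the representative value, shown here only as a sketch, is to store at each pixel the index of the class whose probability map is highest there. The text does not state how the representative value is chosen, so the argmax rule and the name `lighten_class_maps` are assumptions.

```python
def lighten_class_maps(class_maps):
    """Merge per-class probability maps into one map of class indices.

    class_maps: list of 2-D lists, one probability map per class, all the same size.
    Returns a 2-D list holding, for each pixel, the index of the most probable class
    (assumed selection rule; ties go to the lowest index).
    """
    height, width = len(class_maps[0]), len(class_maps[0][0])
    return [[max(range(len(class_maps)), key=lambda c: class_maps[c][y][x])
             for x in range(width)]
            for y in range(height)]

# Three 2x2 probability maps, ordered as the class information lists them.
sky = [[0.9, 0.8], [0.1, 0.2]]
grass = [[0.1, 0.2], [0.7, 0.3]]
plant = [[0.0, 0.0], [0.2, 0.5]]
index_map = lighten_class_maps([sky, grass, plant])
```

Replacing N floating-point maps with a single map of small integers is what reduces the bit count: each pixel then needs only enough bits to index one of the N classes.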

The meta information compressor 216 reduces the size of the class map lightweighted by the meta information lightweighting unit 214 by reducing its width and height. The meta information compressor 216 compresses the lightweight class map by applying a lossless compression scheme used in image compression. The meta information compressor 216 compresses the lightweight class map and the class information to generate and output compressed AI metadata. The compressed AI metadata includes the lightweight class map and the class information.
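As a rough illustration of lossless compression of a lightweight class map, the sketch below uses DEFLATE from Python's standard `zlib` module. The text does not name a specific codec, so the use of `zlib`, the one-byte-per-pixel layout, and the function names are assumptions.

```python
import zlib

def compress_index_map(index_map):
    """Losslessly compress a 2-D map of class indices (each index must fit in one byte)."""
    height, width = len(index_map), len(index_map[0])
    raw = bytes(value for row in index_map for value in row)
    return height, width, zlib.compress(raw)

def decompress_index_map(height, width, payload):
    """Restore the 2-D index map from its compressed payload."""
    raw = zlib.decompress(payload)
    return [list(raw[y * width:(y + 1) * width]) for y in range(height)]

# A 64x64 map whose top half is class 0 and bottom half is class 1.
index_map = [[0] * 64 for _ in range(32)] + [[1] * 64 for _ in range(32)]
h, w, payload = compress_index_map(index_map)
restored = decompress_index_map(h, w, payload)
```

Class maps tend to contain large uniform areas, which is why a general-purpose lossless coder already shrinks them considerably while restoring every pixel exactly.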

The encoder 220 encodes the first image to generate an encoded image. The encoder 220 may encode the first image using various types of image encoding algorithms. The encoder 220 may perform a process of predicting the first image to generate prediction data, a process of generating residual data corresponding to the difference between the first image and the prediction data, a process of transforming the residual data, which is a spatial-domain component, into a frequency-domain component, a process of quantizing the residual data transformed into the frequency-domain component, and a process of entropy-encoding the quantized residual data. Such an encoding process may be implemented through one of the image compression methods that use frequency transformation, such as MPEG-2, H.264 AVC (Advanced Video Coding), MPEG-4, HEVC (High Efficiency Video Coding), VC-1, VP8, VP9, and AV1 (AOMedia Video 1).
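The prediction, residual, and quantization steps can be illustrated with a deliberately simplified one-dimensional sketch. It is not any of the codecs listed above: it predicts each sample from the previously reconstructed sample, and it omits the frequency transform and the entropy coding entirely. The function names and the uniform step size are assumptions.

```python
def encode_row(samples, step):
    """Closed-loop predictive coding of one row of pixel values.

    Each sample is predicted from the previously reconstructed sample,
    and the prediction residual is quantized with a uniform step size.
    """
    levels, prediction = [], 0
    for sample in samples:
        residual = sample - prediction   # residual data
        level = round(residual / step)   # quantization (the lossy step)
        levels.append(level)
        prediction += level * step       # value the decoder will reconstruct
    return levels

def decode_row(levels, step):
    """Rebuild the row by adding dequantized residuals to the running prediction."""
    row, prediction = [], 0
    for level in levels:
        prediction += level * step
        row.append(prediction)
    return row

row = [100, 102, 104, 110, 90]
levels = encode_row(row, step=4)
decoded = decode_row(levels, step=4)
```

Because the encoder predicts from reconstructed values rather than from the original samples, the error of each decoded sample stays within half a quantization step instead of accumulating along the row.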

The compressed AI metadata may be output as part of the header of the encoded image or output separately from the encoded image. According to an embodiment, in addition to the class information and the class map, the meta information extractor 212 may generate image characteristic information and image characteristic maps corresponding to other types of image characteristics, and the AI metadata may include this additional image characteristic information and these image characteristic maps. The additional image characteristics may include at least one of frequency, texture, semantics, or shooting parameters, or a combination thereof.

According to an embodiment, the meta information extractor 212 may analyze frequency information of the first image to generate frequency information and frequency maps. The AI metadata may include the frequency information and the frequency maps. The frequency information may include information about the frequency bands present in the first image. A frequency map may indicate, for a given frequency band, the probability that each part of the first image belongs to that band. For example, five frequency ranges may be defined, and five frequency maps, one for each frequency band, may be generated. According to an embodiment, because a low-frequency region is a flat region of the image in which image quality processing yields little improvement in detail, it may be excluded from the image characteristic information, and the image characteristic information may be defined only for high-frequency regions at or above a predetermined frequency.

According to another embodiment, the meta information extractor 212 may analyze texture information of the first image to generate texture information and a texture map. The AI metadata may include the texture information and the texture map. The texture information is information corresponding to the density in each region of the first image. The meta information extractor 212 may calculate the density of each region of the first image and generate the texture information according to the density values. A high density may indicate that the region has high complexity and contains a variety of patterns. The density (spatial information, SI) may be calculated based on Equation 1 below.

[Equation 1]

$$\mathrm{SI} = \mathrm{stdev}\left[\mathrm{Sobel}(I_{HR})\right]$$

To calculate the density, the first image may be filtered with a Sobel filter, Sobel(I_HR). Here, I_HR denotes the image input to the filter, which in this embodiment is the first image. Next, the standard deviation of the pixels of the Sobel-filtered first image is calculated, stdev[Sobel(I_HR)]. Predetermined density ranges may be defined according to the distribution of density values, and the density ranges detected in the first image may be included in the texture information. The texture map is map information indicating the region of the first image that corresponds to each density range included in the texture information. The texture map may be expressed in the form of a probability map.
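The density calculation can be sketched in plain Python as follows. This is an illustrative reading of Equation 1 rather than the disclosed implementation: the use of the gradient magnitude of the horizontal and vertical 3x3 Sobel kernels, the skipping of border pixels, and the population standard deviation are assumptions, as are the function names.

```python
import math
import statistics

def sobel_magnitude(image):
    """Return the Sobel gradient magnitude for every interior pixel of a 2-D grayscale image."""
    magnitudes = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            # Horizontal gradient: right column minus left column, centre row weighted by 2.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, centre column weighted by 2.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    return magnitudes

def spatial_information(image):
    """SI = stdev[Sobel(I)], computed over the interior pixels of the image or region."""
    return statistics.pstdev(sobel_magnitude(image))

flat = [[128] * 6 for _ in range(6)]             # uniform region
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]  # region with one vertical edge
si_flat = spatial_information(flat)
si_edge = spatial_information(edge)
```

A uniform region yields an SI of zero, while a region containing an edge yields a larger value, which matches the description of higher density indicating more complex content.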

According to another embodiment, the meta information extractor 212 may extract shooting parameter information and shooting environment information applied to the first image and record them in the AI metadata. The shooting parameters may include, for example, a zoom parameter, aperture information, focal length information, flash usage information, white balance information, ISO information, exposure time information, shooting mode information, out-focusing information, auto-focusing (AF) information, auto-exposure (AE) information, person detection information, and scene information. The shooting environment information may include illuminance information. An image characteristic map may also be generated when the shooting parameter information or shooting environment information allows a value to be defined for each pixel of the first image. When a value cannot be defined for each pixel of the first image, an image characteristic value may be defined for the first image as a whole. For example, for the zoom parameter, aperture information, focal length information, flash usage information, white balance information, ISO information, exposure time information, shooting mode information, out-focusing information, AF information, AE information, scene information, and illuminance information, an image characteristic value corresponding to the first image may be defined. For the person detection information, either an image characteristic map indicating the region of the first image in which a person is detected or an image characteristic value indicating the coordinates of that region may be defined.

The present disclosure mainly describes an embodiment in which the meta information extraction unit 212 generates AI metadata including class information and a class map. However, embodiments of the present disclosure are not limited to class information and class maps.

FIG. 3 is a diagram illustrating the processing performed by a meta information extraction unit according to an embodiment of the present disclosure.

The meta information extraction unit 212a receives the first image 302, generates AI metadata, and outputs the AI metadata. The first image 302 may correspond to a high-quality image before image encoding. The first image 302 may also be a high-resolution image before downscaling.

The meta information extraction unit 212a receives the first image 302 and performs an image characteristic analysis process 312. The image characteristic analysis process 312 generates image characteristic information from the first image 302 and a plurality of segmentation probability maps. The image characteristic information includes class information indicating the types of classes included in the first image 302. The image characteristic analysis process 312 may use various types of image characteristic analysis algorithms to generate the image characteristic information from the first image 302. According to an embodiment, the image characteristic analysis process 312 may generate the class information by extracting the main classes included in the first image 302 from the plurality of segmentation probability maps generated by the information generation network 316. As an image characteristic analysis algorithm, the image characteristic analysis process 312 may use, for example, a Fourier transform or a wavelet transform, which can analyze frequency information. The image characteristic analysis process 312 may also use a Sobel filter, a Laplacian filter, a Canny filter, or the like, which can analyze edge information in an image. In addition, the meta information extraction unit 212a may use morphological erosion and dilation operations, which can analyze texture information in an image. In the image characteristic analysis process 312, the meta information extraction unit 212a may combine a plurality of image characteristic analysis algorithms to subdivide the characteristics of the image into various image characteristics. For example, the meta information extraction unit 212a may use a plurality of image characteristic analysis algorithms in the image characteristic analysis process 312 to calculate at least one of frequency information, edge information, and texture information, or a combination thereof.

Using the plurality of segmentation probability maps generated by the information generation network 316, the image characteristic analysis process 312 may select some of the objects detected in the first image 302 and define them as a predetermined number of classes. As described above, when the number of detected object types is greater than a predetermined number, the meta information extraction unit 212a may extract a predetermined number of objects in descending order of the area of their corresponding regions and define them as the classes. For example, when a total of five objects (sky, grass, plant, water, and default) are detected in the first image 302 and the predetermined number of classes is four, the image characteristic analysis process 312 may extract the objects in descending order of area, define sky, grass, plant, and default as classes, and leave water undefined as a class. According to an embodiment, the meta information extraction unit 212a may always include the default class, which indicates a region not classified as any specific object, among the classes regardless of the area of that region.
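The selection rule described above can be sketched as follows. This is a minimal illustration only: the function name `select_classes`, the area figures, and the choice to reserve exactly one slot for the default class are assumptions made for the example and are not specified in the disclosure.

```python
def select_classes(area_by_object, max_classes, default_name="default"):
    """Return up to max_classes object names, largest area first.

    Assumption for this sketch: one slot is always reserved for the
    default class, whatever its area.
    """
    others = sorted(
        (name for name in area_by_object if name != default_name),
        key=lambda name: area_by_object[name],
        reverse=True,
    )
    return others[: max_classes - 1] + [default_name]


# Hypothetical area fractions for the five detected objects.
areas = {"sky": 0.40, "grass": 0.25, "plant": 0.20, "water": 0.10, "default": 0.05}
selected = select_classes(areas, max_classes=4)
print(selected)  # ['sky', 'grass', 'plant', 'default']
```

With these hypothetical areas, water is the smallest non-default object and is therefore not defined as a class, which mirrors the example in the text.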

The information generation network 316 receives the first image 302 and generates segmentation probability maps for a plurality of predefined objects. For example, 32 objects may be predefined for the information generation network 316, and the information generation network 316 may receive the first image 302 and generate a segmentation probability map for each of the 32 objects. Each segmentation probability map indicates the probability that each pixel or region of the first image 302 corresponds to the object associated with that map. The information generation network 316 may include the first AI network 114 or may use the first AI network 114. According to an embodiment, the first AI network 114 may generate the plurality of segmentation probability maps from the first image 302 using the SFTGAN algorithm.

The meta information extraction unit 212a also performs a metadata generation process 318 that generates AI metadata including the class information and the class maps. From among the plurality of segmentation probability maps generated by the information generation network 316, the metadata generation process 318 extracts the segmentation probability maps of the objects included in the class information generated by the image characteristic analysis process 312, and defines them as class maps. The metadata generation process 318 then outputs AI metadata including the plurality of class maps and the class information. The meta information extraction unit 212a may generate the AI metadata including the class information and the class maps based on a predetermined standard.

FIG. 4 is a diagram illustrating a process of generating object information and class information from a first image in the image characteristic analysis process, according to an embodiment of the present disclosure.

According to an embodiment, in the image characteristic analysis process 312, the meta information extraction unit 212a extracts, from the first image 302, information indicating the types of objects included in the first image 302. The information generation network 316 generates, from the first image 302, a plurality of segmentation probability maps 412a, 412b, ..., 412h corresponding to predefined objects such as sky, water, grass, mountain, building, plant, and animal. The image characteristic analysis process 312 generates the class information from the plurality of segmentation probability maps 412a, 412b, ..., 412h generated by the information generation network 316.

Each of the plurality of segmentation probability maps 412a, 412b, ..., 412h is generated for one object and indicates the probability that each region of the first image corresponds to that object. In the present disclosure, a probability map indicating the probability of belonging to a given class is referred to as a class map, and a probability map indicating the probability of corresponding to a given object is referred to as a segmentation probability map. The class maps may be defined for the classes included in the class information after the class information has been defined in the image characteristic analysis process 312.

According to an embodiment, the plurality of segmentation probability maps 412a, 412b, ..., 412h may indicate, for each pixel or region of the first image 302, the probability of belonging to each object type. The plurality of segmentation probability maps 412a, 412b, ..., 412h may be defined at the same resolution as the first image 302 or at a lower resolution than the first image 302.

The plurality of segmentation probability maps 412a, 412b, ..., 412h may include one segmentation probability map for each of at least one predefined object type. Each segmentation probability map 412a, 412b, ..., 412h includes information indicating whether each pixel or region of the first image 302 corresponds to the associated object type, or the probability that it does. For example, the segmentation probability map 412b corresponding to the sky may have a value of 1 in regions of the first image 302 that correspond to the sky and a value of 0 in regions that do not. As another example, each pixel of the segmentation probability maps 412a, 412b, ..., 412h may have a positive real value representing a probability. A segmentation probability map 412a corresponding to the default may also be calculated, which may indicate regions that do not correspond to any of the predefined object types. According to an embodiment, the pixel values of the plurality of segmentation probability maps 412a, 412b, ..., 412h may be set so that, for each pixel, the sum of the values of that pixel across all of the segmentation probability maps 412a, 412b, ..., 412h is 1.
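The sum-to-one constraint can be illustrated with a small sketch that rescales non-negative per-object scores at each pixel. The disclosure does not say how the constraint is enforced, so the division-by-total rule, the function name `normalize_per_pixel`, the fallback that assigns an all-zero pixel to the default map, and the sample scores are all assumptions made for illustration.

```python
def normalize_per_pixel(maps, default_name="default"):
    """Rescale maps so that, at every pixel, the values across maps sum to 1.

    maps: dict of object name -> 2D list of non-negative scores (same shape).
    Assumption: a pixel with no positive score is assigned to the default map,
    so default_name must be a key of maps.
    """
    names = list(maps)
    rows = len(maps[names[0]])
    cols = len(maps[names[0]][0])
    out = {name: [[0.0] * cols for _ in range(rows)] for name in names}
    for r in range(rows):
        for c in range(cols):
            total = sum(maps[name][r][c] for name in names)
            if total > 0:
                for name in names:
                    out[name][r][c] = maps[name][r][c] / total
            else:
                out[default_name][r][c] = 1.0
    return out


# Hypothetical raw scores for a 1x2 image and three objects.
scores = {
    "default": [[0.0, 1.0]],
    "sky": [[3.0, 1.0]],
    "building": [[1.0, 2.0]],
}
probs = normalize_per_pixel(scores)
print(probs["sky"])  # [[0.75, 0.25]]
```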

In the image characteristic analysis process 312, the meta information extraction unit 212a derives, from the plurality of segmentation probability maps 412a, 412b, ..., 412h, the n classes whose segmentation probability maps have the highest average values, and thereby determines the classes to be included in the class information. The image characteristic analysis process 312 may be performed through the steps of segmentation probability map processing 430, top-rank extraction 440, and class information generation 450.

For each of the plurality of segmentation probability maps 412a, 412b, ..., 412h, the segmentation probability map processing 430 calculates the average of the non-zero pixel values and removes the pixel values of that map that are lower than the average. For example, when the average of the non-zero pixel values in the segmentation probability map 412b for the sky object is 0.4, pixel values smaller than 0.4 are removed from the segmentation probability map 412b. This processing prevents misclassification and removes unnecessary values.
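A minimal sketch of this thresholding step is given below. It assumes that "removing" a pixel value means setting it to 0, which the disclosure does not state explicitly; the function name `suppress_below_mean` and the sample map are hypothetical.

```python
def suppress_below_mean(prob_map):
    """Zero out values below the mean of the non-zero values of prob_map.

    prob_map: 2D list of probabilities for one object.
    Assumption: a removed value is represented as 0.0.
    """
    nonzero = [v for row in prob_map for v in row if v != 0]
    if not nonzero:
        # Nothing detected for this object; return an unchanged copy.
        return [row[:] for row in prob_map]
    mean = sum(nonzero) / len(nonzero)
    return [[v if v >= mean else 0.0 for v in row] for row in prob_map]


# Hypothetical 2x2 map: the non-zero values 0.25, 0.5, 0.75 have mean 0.5.
sky = [[0.0, 0.25], [0.5, 0.75]]
cleaned = suppress_below_mean(sky)
print(cleaned)  # [[0.0, 0.0], [0.5, 0.75]]
```

Values equal to the mean are kept, matching the wording that only values smaller than the average are removed.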

Next, the top-rank extraction process 440 determines the top k segmentation probability maps 412a, 412b, and 412f based on the average value of each segmentation probability map 412a, 412b, ..., 412h. Here, k is a natural number predefined as the number of classes. A high average value of a segmentation probability map 412a, 412b, ..., 412h means that the corresponding object occupies a large area of the first image 302. By extracting the k top-ranked objects and defining them as classes, the image characteristic analysis process 312 can accurately extract the types of objects that occupy large areas of the first image 302. The segmentation probability maps 412a, 412b, and 412f selected as top-ranked are only an example, and the top-ranked segmentation probability maps may be determined differently in each embodiment.
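The ranking step can be sketched as follows, assuming that the average is taken over all pixels of each map, so that a larger covered area yields a larger average. The function name `top_k_by_mean` and the toy maps are hypothetical.

```python
def top_k_by_mean(maps, k):
    """Return the names of the k maps with the highest mean pixel value.

    maps: dict of object name -> 2D list of probabilities.
    Assumption: the mean is taken over every pixel of the map.
    """
    def mean(prob_map):
        values = [v for row in prob_map for v in row]
        return sum(values) / len(values)

    ranked = sorted(maps, key=lambda name: mean(maps[name]), reverse=True)
    return ranked[:k]


# Hypothetical 2x2 maps with means 0.5, 0.25, 0.125, and 0.0625.
maps = {
    "plant": [[0.0, 0.0], [0.0, 0.25]],
    "sky": [[1.0, 1.0], [0.0, 0.0]],
    "default": [[0.0, 0.0], [0.0, 0.5]],
    "building": [[0.0, 0.0], [1.0, 0.0]],
}
top = top_k_by_mean(maps, k=3)
print(top)  # ['sky', 'building', 'default']
```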

Next, the class information generation process 450 defines the objects corresponding to the top-ranked segmentation probability maps 412a, 412b, and 412f as classes and generates class information 420. The class information 420 indicates the plurality of classes selected as top-ranked. The class information 420 is, for example, information on the classes whose segmentation probability maps 412a, 412b, and 412f have average values within the top k; in the example of FIG. 4, it includes Default, Sky, and Building.

According to an embodiment, among the objects whose segmentation probability maps 412a, 412b, ..., 412h have non-zero detection values, the meta information extraction unit 212a maps each object that was not selected as a class to one of the selected objects, that is, to one of the classes. For example, the object corresponding to a plant (corresponding to the segmentation probability map 412g) was not selected among the top three objects, but may be mapped to the class corresponding to the sky among the defined classes and transmitted as such. The types of objects that are mapped to each other may be predefined. Among the predefined object types, those whose detail is restored similarly when actually applied to the AI network may be predefined as mutually mapped objects. For example, human hair and animal fur may be defined as mutually mapped objects. The meta information extraction unit 212a may perform the mapping between objects using information on mutually mapped objects stored in advance in the memory 116 or the like. According to an embodiment, when no mapped object is defined for a given object, that object may be mapped to the object corresponding to the default.
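The mapping of unselected objects can be sketched with a simple lookup table. The table contents, the function name `map_to_class`, and the rule that an object whose mapped partner was not itself selected also falls back to the default are assumptions for this example; the disclosure states only that mutually mapped objects are predefined and that objects without a defined mapping go to the default.

```python
# Hypothetical table of mutually mapped objects, e.g. loaded from memory.
SIMILAR_OBJECT = {"fur": "hair", "plant": "sky"}


def map_to_class(obj, selected, similar=SIMILAR_OBJECT, default_name="default"):
    """Return the class to which obj is assigned.

    A selected object keeps its own class; an unselected object is mapped
    through the table, or to the default when no usable mapping exists.
    """
    if obj in selected:
        return obj
    target = similar.get(obj)
    return target if target in selected else default_name


selected = ["default", "sky", "building"]
print(map_to_class("plant", selected))  # sky
print(map_to_class("water", selected))  # default
```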

By performing this object mapping, the image characteristic analysis process 312 may generate class maps in which several of the segmentation probability maps 412a, 412b, ..., 412h are combined. For example, the image characteristic analysis process 312 may generate a class map by combining at least one of the segmentation probability maps 412c, 412d, 412e, 412g, and 412h of the unselected objects with the segmentation probability maps 412a, 412b, and 412f of the objects selected as classes. The segmentation probability maps 412a, 412b, ..., 412h may be combined by averaging pixel values, by weighted averaging, or the like. When the mapping is performed on the segmentation probability maps 412a, 412b, ..., 412h, the metadata generation process 318 receives the class maps generated by the mapping from the image characteristic analysis process 312. The metadata generation process 318 may receive the class information and the class maps from the image characteristic analysis process 312 and generate the AI metadata. As another example, the image characteristic analysis process 312 determines which objects are to be mapped to each other and passes information on the mapping between objects to the metadata generation process 318. Based on this information, the metadata generation process 318 generates the class maps from the segmentation probability maps generated by the information generation network 316. In the metadata generation process 318, the segmentation probability maps 412a, 412b, ..., 412h may be combined by averaging pixel values, by weighted averaging, or the like.
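The combination of a selected map with the maps mapped onto it can be sketched as a pixel-wise average or weighted average. The function name `merge_maps`, the weights, and the sample maps are hypothetical, and the disclosure does not fix how the weights would be chosen.

```python
def merge_maps(maps, weights=None):
    """Combine equally sized 2D maps by a pixel-wise (weighted) average.

    maps: list of 2D lists; weights: optional list of positive numbers.
    With weights omitted, a plain average is computed.
    """
    if weights is None:
        weights = [1.0] * len(maps)
    total = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [
            sum(w * m[r][c] for w, m in zip(weights, maps)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 1x2 maps: a selected class (sky) and an object mapped to it.
sky = [[1.0, 0.0]]
plant = [[0.0, 0.5]]
print(merge_maps([sky, plant]))              # [[0.5, 0.25]]
print(merge_maps([sky, plant], [3.0, 1.0]))  # [[0.75, 0.125]]
```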

FIG. 5 is a diagram illustrating the operation of an information generation network of a meta information extraction unit according to an embodiment of the present disclosure.

The information generation network 316 receives the first image 502 and generates segmentation probability maps 504 for a plurality of predefined objects. The plurality of objects may be predefined when the information generation network 316 is designed. For example, 32 objects may be predefined for the information generation network 316, and the information generation network 316 may receive the first image 302 and generate a segmentation probability map 504 for each of the 32 objects. The information generation network 316 is trained to generate the segmentation probability maps 504 for the preset objects. The information generation network 316 includes a plurality of computation layers and performs the processing of those layers based on parameter values obtained through training on a large amount of training data.

According to an embodiment, the information generation network 316 may be configured to generate the segmentation probability maps 504 based not only on objects but also on other image characteristic information such as frequency characteristics and texture characteristics. For example, the information generation network 316 may generate and output a segmentation probability map 504 for each of a plurality of preset frequency ranges.

The information generation network 316 may correspond to a first AI network 114a including a plurality of layers, or may use the first AI network 114a. The first AI network 114a may include a 1-1 AI network 510 and a 1-2 AI network 520. The first AI network 114a may have, for example, a CNN structure. In the present disclosure, the terms 1-1 AI network 510 and 1-2 AI network 520 refer to groups of layers and do not mean that the first AI network 114a is implemented as two separate networks. The first AI network 114a may be configured in various ways with a plurality of layers and may include one or more layer groups.

The 1-1 AI network 510 generates a feature map from the first image. The feature map is data in which certain condensed information has been extracted from the first image.

The 1-1 AI network 510 may include a plurality of layers and have a single forward pass structure. According to an embodiment, the 1-1 AI network 510 may include a combination of at least one convolution layer and at least one max pooling layer. For example, the 1-1 AI network 510 may have a structure in which an arrangement of a predetermined number of convolution layers followed by a max pooling layer is repeated. The first image 502 and the class information input to the 1-1 AI network 510 are processed sequentially as they pass through the plurality of layers.

The 1-1 AI network 510 generates the feature map by processing the first image through a plurality of convolution layers. The 1-1 AI network 510 may repeatedly process intermediate data, which is the result of passing input data through a predetermined number of convolution layers, with a max pooling layer so that only the main feature values are extracted from the intermediate data. The 1-1 AI network 510 may thus generate a feature map that is downscaled relative to the first image. The feature map output from the 1-1 AI network 510 is input to the 1-2 AI network 520.

A convolution layer performs convolution processing on input data using a predetermined filter kernel. The filter kernel extracts a predetermined feature from the input data. For example, the convolution layer may perform a convolution operation on the input data using a filter kernel of size 3×3. The convolution layer determines each pixel value of the feature map through multiplication and addition operations between the parameter values of the filter kernel and the corresponding pixel values in the input data. The parameter values of the filter kernel and the parameters of the convolution operation may be determined in advance through training. The convolution layer may additionally perform padding so that the output data has the same size as the input data.
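The multiply-and-add operation with padding can be sketched as follows. The sketch assumes zero padding, a stride of 1, and the unflipped kernel orientation commonly used in CNNs (cross-correlation); the disclosure does not specify these details, and the function name `conv2d_same`, the box kernel, and the sample image are hypothetical.

```python
def conv2d_same(image, kernel):
    """Apply kernel to image with zero padding so the output keeps the input size.

    image, kernel: 2D lists; the kernel is assumed to have odd height and width.
    """
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    # Positions outside the image contribute zero (padding).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # hypothetical 3x3 kernel
result = conv2d_same(image, box)
print(result[1][1])  # 45.0, the sum of all nine pixels
print(result[0][0])  # 12.0, the sum 1 + 2 + 4 + 5 of the in-bounds pixels
```

In a trained network the kernel values would be learned parameters rather than the fixed box kernel used here.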

A max pooling layer applies a predetermined filter to input data and extracts and outputs the maximum value among the input data within the filter. For example, the max pooling layer moves a 2×2 filter over the input data and extracts and outputs the maximum value within each 2×2 window. By applying the filter to the input data in blocks corresponding to the filter size, the max pooling layer may reduce the size of the input data in proportion to the filter size. For example, a max pooling layer using a 2×2 filter reduces both the width and the height of the input data by half. The max pooling layer may be used to pass only the main values of the input data to the next layer.
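Block-wise pooling can be sketched as follows, assuming non-overlapping windows (stride equal to the filter size) and input dimensions that are multiples of the filter size. The function name `pool2d`, its `reduce_fn` parameter, and the sample feature map are hypothetical.

```python
def pool2d(data, size=2, reduce_fn=max):
    """Reduce each non-overlapping size x size block of data to one value.

    reduce_fn selects the value kept per block; max gives max pooling.
    """
    rows, cols = len(data), len(data[0])
    return [
        [
            reduce_fn(data[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


feature = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 8],
]
pooled = pool2d(feature)
print(pooled)  # [[4, 5], [3, 8]]
```

The 4×4 input becomes a 2×2 output, so the width and height are each halved. Passing `reduce_fn=min` to the same sketch would keep the minimum of each block instead.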

The 1-2 AI network 520 receives the feature map and generates the plurality of segmentation probability maps 504. The 1-2 AI network 520 may include a plurality of layers and have a single forward pass structure. According to an embodiment, the 1-2 AI network 520 may include various combinations of at least one convolution layer, at least one activation layer, and at least one min pooling layer.

An activation layer performs processing that imparts non-linear characteristics to the input data. The activation layer may perform an operation based on, for example, a sigmoid function, a tanh function, or a rectified linear unit (ReLU) function, but is not limited thereto.
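The three activation functions named above can be written element-wise as follows. This is an illustrative sketch using the standard definitions; the helper name `apply_activation` and the sample values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)


def apply_activation(feature_map, fn):
    """Apply an activation function to every element of a 2D list."""
    return [[fn(v) for v in row] for row in feature_map]


feature_map = [[-1.0, 0.0, 2.0]]
print(apply_activation(feature_map, relu))       # [[0.0, 0.0, 2.0]]
print(apply_activation(feature_map, math.tanh))  # values in (-1, 1)
print(sigmoid(0.0))                              # 0.5
```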

A min pooling layer applies a predetermined filter to input data and extracts and outputs the minimum value among the input data within the filter. For example, the min pooling layer moves a 2×2 filter over the input data and extracts and outputs the minimum value within each 2×2 window. By applying the filter to the input data in blocks corresponding to the filter size, the min pooling layer may reduce the size of the input data in proportion to the filter size. For example, a min pooling layer using a 2×2 filter reduces both the width and the height of the input data by half.

According to an embodiment, the 1-2 AI network 520 may include a first layer group 522, an upscaler 524, and a second layer group 526. The first layer group 522, the upscaler 524, and the second layer group 526 may be arranged in a single forward pass structure.

The first layer group 522 receives the feature map, performs non-linear processing on it, and outputs the result. The first layer group 522 may include a combination of at least one convolution layer and at least one activation layer. For example, the first layer group 522 may have a structure in which convolution layers and activation layers are arranged alternately.

The upscaler 524 receives the output of the first layer group 522 and performs upscaling. The upscaler 524 may be implemented using various types of upscaling algorithms. According to an embodiment, the upscaler 524 may correspond to an AI upscaler having a DNN structure.

The second layer group 526 receives the output of the upscaler 524 and performs additional processing. The second layer group 526 may include a combination of at least one convolutional layer and at least one min pooling layer. For example, the second layer group 526 may have a structure in which a plurality of convolutional layers are arranged, followed by a min pooling layer at the end.

When image characteristics other than the object type are analyzed together, a separate first AI network 114a may be provided for each image characteristic. For example, when class, frequency, and texture are analyzed as image characteristics, a first AI network corresponding to class, a first AI network corresponding to frequency, and a first AI network corresponding to texture may each be provided separately.

FIG. 6 is a diagram illustrating the operation of the meta information lightweighting unit of an image providing apparatus according to an embodiment of the present disclosure.

The metadata generator 210 of the processor 112a includes a meta information lightweighting unit 214a. The meta information lightweighting unit 214a receives the AI metadata generated by the meta information extraction unit 212 and performs lightweighting processing that reduces the data size of the AI metadata.

The meta information lightweighting unit 214a receives the class information and the class maps output from the meta information extraction unit 212 and converts them into a lightweight class map 612, which holds representative value information for each position, and class information 614, which indicates the class corresponding to each representative value, thereby generating lightweight AI metadata 610.

From the class information and the class maps 620a, 620b, 620c, and 620d, the meta information lightweighting unit 214a generates a lightweight class map 612 in which each pixel has the representative value of the class of the segmentation region corresponding to that pixel. The lightweight class map 612 is a map holding representative value information for each pixel. According to an embodiment, a color is defined for each representative value, and each pixel of the lightweight class map 612 may have the color corresponding to its value (616).

The representative value of each pixel may be determined based on the probability values in the plurality of class maps 420. For example, the representative value in the lightweight class map 612 may be defined as the value corresponding to the class with the highest probability at that pixel. The number of representative values of the lightweight class map 612 may be determined based on the number of classes (k) included in the class information. Accordingly, the number of bits allocated to each pixel of the lightweight class map 612 may be determined as (log₂k), based on the number of classes (k) included in the class information. The lightweight class map 612 may therefore be expressed in (log₂k)*w*h bits, where w is the width and h is the height of the lightweight class map. The width and height of the lightweight class map may be equal to, or smaller than, those of the first image 602.
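
The selection of a representative value per pixel can be sketched as below. This is a minimal illustration, not the disclosed implementation: the function names, the use of the class index as the representative value, and the sample probabilities are assumptions, and the per-pixel bit count rounds log₂k up to a whole number of bits.

```python
def lighten_class_maps(prob_maps):
    """Collapse k per-class probability maps (each h x w) into one map whose
    pixels hold the index of the class with the highest probability."""
    k = len(prob_maps)
    h, w = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [max(range(k), key=lambda c: prob_maps[c][y][x]) for x in range(w)]
        for y in range(h)
    ]


def bits_per_pixel(k):
    """Whole bits needed to index k classes, i.e. log2(k) rounded up (assumption)."""
    return (k - 1).bit_length()


# Hypothetical probability maps for three classes on a 2x2 image.
water = [[0.7, 0.2], [0.1, 0.1]]
grass = [[0.2, 0.5], [0.3, 0.8]]
default = [[0.1, 0.3], [0.6, 0.1]]

light_map = lighten_class_maps([water, grass, default])
```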

The meta information lightweighting unit 214a generates lightweight class information 614, which includes the class types contained in the class information generated by the meta information extraction unit 212 and information on the class corresponding to each representative value of the lightweight class map 612. For example, when the selected classes are water, grass, and default, the class information 614 may indicate that the selected classes are water, grass, and default, and that a first value is assigned to water, a second value to grass, and a third value to default. The first, second, and third values may be defined to be as small as possible so that they can be expressed with the minimum number of bits.

The meta information lightweighting unit 214a outputs lightweight AI metadata 610 including the lightweight class map 612 and the lightweight class information 614.

When the AI metadata is not lightweighted, the plurality of class maps include one class map 620a, 620b, 620c, and 620d per class, so k*w*h bits may be required for one frame. Furthermore, when each pixel of the class maps 620a, 620b, 620c, and 620d indicates the probability of belonging to the corresponding class (618), the class maps may have to be generated as floating-point values to express the probabilities; if each floating-point value takes 32 bits, 32*k*w*h bits may be required for the class maps of one frame. In contrast, when the AI metadata is lightweighted as in the present embodiment, the AI metadata for one frame can be expressed in (log₂k)*w*h bits. Lightweighting can therefore reduce the number of bits of the AI metadata to a level of ((log₂k) / (ck)) of the original, where c is the number of bits allocated to one pixel of the original class maps 620a, 620b, 620c, and 620d (32 bits in the example above). Thus, according to the present embodiment, the number of bits required for the AI metadata can be reduced significantly.
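
To make the bit counts above concrete, the following sketch evaluates c*k*w*h and (log₂k)*w*h for one frame. The function name and the example figures (k = 4 classes, a 1920x1080 map, c = 32) are hypothetical and chosen only for illustration.

```python
def metadata_bits(k, w, h, c=32):
    """Return (bits before lightweighting, bits after lightweighting) for one frame.

    before: k class maps with c bits per pixel         -> c * k * w * h
    after:  one map with log2(k) bits per pixel (rounded up) -> log2(k) * w * h
    """
    index_bits = (k - 1).bit_length()  # log2(k) rounded up to whole bits
    before = c * k * w * h
    after = index_bits * w * h
    return before, after


# Hypothetical example: 4 classes, full-HD class map, 32-bit probabilities.
before, after = metadata_bits(k=4, w=1920, h=1080, c=32)
ratio = after / before  # equals log2(k) / (c * k) = 2 / 128
```

With these assumed figures the lightweight map needs 1/64 of the original bits, which matches (log₂k) / (ck) = 2 / (32*4).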

FIG. 7 is a diagram illustrating the operation of the meta information compression unit according to an embodiment of the present disclosure.

The meta information compression unit 216 compresses the class information lightweighted by the meta information lightweighting unit 214. The meta information compression unit 216 may compress both the class information and the lightweight class map. Compression of the class information is described with reference to FIGS. 7 and 8, and compression of the lightweight class map is described with reference to FIG. 9.

According to an embodiment, the class information 730 may have a bit value allocated to each of the predefined objects 710. For example, the class information 730 may assign a value of 1 to each predefined object 710 that is selected as a class and a value of 0 to the remaining objects 710. For example, when eight predefined objects 710 exist for the AI metadata, 8-bit class information may be defined, with one object 710 assigned to each bit. The bit corresponding to each object 710 has a value of 0 or 1 depending on whether that object 710 is selected as a class. The class information of FIG. 7 thus represents class information 730 in which the default class, the sky class, and the grass class are selected as classes.
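
The one-bit-per-object class information can be sketched as follows. Only default, sky, grass, and water are named in the text; the remaining object names and their ordering in `PREDEFINED_OBJECTS`, as well as the function names, are hypothetical placeholders.

```python
# Hypothetical list of eight predefined objects; order defines the bit position.
PREDEFINED_OBJECTS = [
    "default", "sky", "grass", "water", "tree", "building", "road", "person",
]


def encode_class_info(selected):
    """Return one bit per predefined object: 1 if selected as a class, else 0."""
    return [1 if name in selected else 0 for name in PREDEFINED_OBJECTS]


def decode_class_info(bits):
    """Recover the selected class names from the bit list."""
    return [name for name, bit in zip(PREDEFINED_OBJECTS, bits) if bit]


class_bits = encode_class_info({"default", "sky", "grass"})
selected_names = decode_class_info(class_bits)
```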

The meta information compression unit 216 may compress the class information 730. The meta information compression unit 216 may compress the class information 730 and the class map using a compression method used for symbol compression and transmit the result as a bitstream. For example, the meta information compression unit 216 may compress the class information 730 and the class map using run-length encoding, Huffman coding, or arithmetic coding.
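
Of the symbol compression methods named above, run-length encoding is the simplest to show. The sketch below is a generic run-length coder applied to one hypothetical row of representative values; it is not taken from the disclosure, and the function names are illustrative.

```python
def rle_encode(symbols):
    """Run-length encode a sequence into (symbol, run length) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([s, 1])   # start a new run
    return [(s, n) for s, n in runs]


def rle_decode(runs):
    """Invert rle_encode."""
    out = []
    for s, n in runs:
        out.extend([s] * n)
    return out


# Hypothetical row of a lightweight class map.
row = [0, 0, 0, 1, 1, 2, 2, 2, 2]
encoded = rle_encode(row)
decoded = rle_decode(encoded)
```

Because class maps tend to contain large uniform regions, long runs of the same representative value are common, which is the case where run-length encoding is effective.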

FIG. 8 is a diagram illustrating how the meta information compression unit compresses class information according to an embodiment of the present disclosure. As in the embodiment described above with reference to FIG. 7, the description of FIG. 8 focuses on an embodiment in which one bit is allocated to each object and the class information is expressed with values of 1 and 0.

According to an embodiment, the first image is video data including a plurality of frames, and the class information may be defined for each sequence including a plurality of frames (Frame_1, Frame_2, ..., Frame_n).

According to an embodiment, the number of frames included in a sequence may be preset to a predetermined number.

According to another embodiment, the number of frames included in a sequence may be defined dynamically based on the image content of the first image. For example, a new sequence may be defined when a change in image characteristics at or above a predetermined reference value is detected in the first image, when a scene change is detected, and so on. In addition, a maximum number of frames per sequence may be set; even when no condition satisfying the sequence change criterion is detected in the image content of the first image, the meta information extraction unit 212a may define an additional sequence once the number of frames in the current sequence reaches the maximum, so that each sequence contains no more than the maximum number of frames.
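
The dynamic sequence definition can be sketched as below, assuming that scene changes have already been detected by some other means and are supplied as a per-frame flag. The function name, the flag representation, and the half-open `(start, end)` ranges are illustrative choices, not part of the disclosure.

```python
def split_sequences(scene_change, max_frames):
    """Split frames into sequences.

    scene_change[i] is True when frame i starts a new scene (assumed input).
    A new sequence starts at a scene change or when the current sequence
    already holds max_frames frames. Returns half-open (start, end) ranges.
    """
    sequences = []
    start = 0
    for i in range(1, len(scene_change)):
        if scene_change[i] or i - start >= max_frames:
            sequences.append((start, i))
            start = i
    if scene_change:
        sequences.append((start, len(scene_change)))
    return sequences


# Hypothetical 10-frame clip with a scene change at frame 3, at most 4 frames each.
flags = [False] * 10
flags[3] = True
sequences = split_sequences(flags, max_frames=4)
```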

According to another embodiment, the frames included in a sequence may be determined based on the candidate classes detected in the frames of that sequence. The meta information extraction unit 212a may define a sequence so that the number of candidate classes having a detection value at or above a predetermined value in the frames of the sequence does not exceed a predetermined number. For example, when there are eight predefined objects, the meta information compression unit 216 may define a sequence so that the number of objects having a detection value at or above the predetermined value in the frames of the sequence does not exceed four.

According to an embodiment, the class information 810 may include sequence class information 812 and frame class information 814. The class information 810 may include as many items of frame class information 814 as there are frames in the sequence. The class information 810 may also include additional information such as the frame range and the number of frames of the sequence corresponding to the class information 810. In addition, according to an embodiment, the class information 810 may include additional information such as the length of the class information 810 and the ranges occupied by the sequence class information 812 and the frame class information 814 within the class information 810.

The sequence class information 812 indicates the candidate classes detected in the frames included in the sequence. The sequence class information 812 may have a number of bits corresponding to the number of predefined candidate classes. When k candidate classes are selected in the class information of each frame, the sequence class information 812 may have a value of 1 for k or more candidate classes. For example, when the frame class information 814 indicates the three candidate classes with the highest object detection values, the sequence class information 812 may have three or more 1 values (for example, four). In this case, the sequence class information means that each frame in the sequence contains objects of a predetermined number (for example, three) of those four candidate classes.

The frame class information 814 is information corresponding to each frame (Frame_1, Frame_2, ..., Frame_n) included in the sequence and includes information on the candidate classes corresponding to the objects detected in each frame. The frame class information 814 may be arranged sequentially in frame order and recorded in the bitstream.

The frame class information 814 may have a data size (number of bits) corresponding to the number of bits having a value of 1 in the sequence class information 812. For example, when four bits have a value of 1 in the sequence class information 812, each item of frame class information 814 for that sequence may be defined with 4 bits. Accordingly, according to an embodiment, the number of bits of the frame class information 814 may vary with the sequence class information 812. According to another embodiment, the number of bits of the frame class information 814 is preset, and the sequence class information 812 may have as many 1 values as that number of bits. To this end, a sequence may be defined so that its sequence class information 812 has no more 1 values than the number of bits of the frame class information 814.
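
One possible reading of the two-level layout is sketched below: the sequence class information carries one bit per predefined object, and each frame then needs only one bit per class that is marked 1 at the sequence level. The function name, the use of class indices, and the interpretation of each frame bit as "this class is selected in this frame" are assumptions made for illustration.

```python
def pack_class_info(frame_classes, num_predefined):
    """Build sequence class bits and shortened per-frame class bits.

    frame_classes: list of sets of class indices selected in each frame.
    Returns (sequence_bits, frame_bits), where each entry of frame_bits has
    one bit per class that is set to 1 in sequence_bits.
    """
    present = sorted(set().union(*frame_classes))
    sequence_bits = [1 if c in present else 0 for c in range(num_predefined)]
    frame_bits = [
        [1 if c in classes else 0 for c in present] for classes in frame_classes
    ]
    return sequence_bits, frame_bits


# Hypothetical sequence of three frames, each selecting three of eight classes.
frames = [{0, 1, 2}, {0, 2, 5}, {0, 1, 5}]
sequence_bits, frame_bits = pack_class_info(frames, num_predefined=8)
total_bits = len(sequence_bits) + sum(len(b) for b in frame_bits)
```

In this assumed example the sequence needs 8 + 3*4 = 20 bits instead of 3*8 = 24, and the saving grows with the number of frames per sequence.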

FIG. 9 is a diagram illustrating the operation of the meta information compression unit of an image providing apparatus according to an embodiment of the present disclosure.

The meta information compression unit 216 compresses the lightweight class map 902 generated by the meta information lightweighting unit 214a by downscaling it, and generates and outputs a compressed class map 904. The compression of the lightweight class map 902 may use a lossless compression method used for image compression. For example, the meta information compression unit 216 may compress the lightweight class map 902 by grouping the pixel values in the lightweight class map 902 and extracting the maximum-probability class information. The meta information compression unit 216 may compress the lightweight class map 902 using, for example, run-length encoding, Huffman coding, or arithmetic coding. According to an embodiment, the meta information compression unit 216 may compress the lightweight class information using a predetermined compression method. The meta information compression unit 216 generates compressed AI metadata including the compressed class information and the compressed class map 904.
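
The text does not specify how pixel values are grouped when the map is downscaled. One plausible sketch, shown here purely as an assumption, groups pixels into fixed-size blocks and keeps the most frequent representative value in each block. The function name and the block size are hypothetical.

```python
from collections import Counter


def downscale_class_map(class_map, block=2):
    """Downscale a class map by keeping the most frequent value in each
    block x block group of pixels (majority vote; an assumed grouping rule)."""
    h, w = len(class_map), len(class_map[0])
    out = []
    for y in range(0, h, block):
        row = []
        for x in range(0, w, block):
            values = [
                class_map[yy][xx]
                for yy in range(y, min(y + block, h))
                for xx in range(x, min(x + block, w))
            ]
            row.append(Counter(values).most_common(1)[0][0])
        out.append(row)
    return out


# Hypothetical 4x4 lightweight class map reduced to 2x2.
light = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 1, 0],
    [2, 2, 0, 0],
]
compressed = downscale_class_map(light, block=2)
```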

According to the applicant's experimental results based on embodiments of the present disclosure, it was confirmed that even when the lightweight class map 902 of the AI metadata is compressed, the image reproducing apparatus generates similar detail without degradation of image quality. Thus, according to the present embodiment, the size of the AI metadata can be reduced without degrading image quality, which increases the efficiency of data storage and transmission.

FIG. 10 is a flowchart illustrating a method of controlling an image providing apparatus according to an embodiment of the present disclosure.

Each step of the method of controlling an image providing apparatus of the present disclosure may be performed by various types of electronic devices that include a processor, a memory, and an output unit and that use a machine learning model. This specification focuses on embodiments in which the image providing apparatus 110 according to embodiments of the present disclosure performs the method. Accordingly, the embodiments described for the image providing apparatus 110 are applicable to embodiments of the method of controlling an image providing apparatus, and conversely, the embodiments described for the method are applicable to embodiments of the image providing apparatus 110. The method of controlling an image providing apparatus according to the disclosed embodiments is not limited to being performed by the image providing apparatus 110 disclosed herein and may be performed by various types of electronic devices.

The image providing apparatus receives a first image (S1002).

The image providing apparatus performs AI metadata extraction processing on the first image to generate AI metadata (S1004). The image providing apparatus performs a predetermined image characteristic analysis on the first image to generate image characteristic information and an image characteristic map. The image providing apparatus inputs the first image to the first AI network to generate a plurality of segmentation probability maps for the predefined objects. From the plurality of segmentation probability maps, the image providing apparatus generates class information, which is information on the objects included in the first image, and generates a class map for each class defined in the class information. The image providing apparatus generates AI metadata including the class information and the class maps.

Next, the image providing apparatus lightweights the AI metadata including the class information and the class maps. The image providing apparatus reduces the plurality of class maps to a single lightweight class map.

Next, the image providing apparatus compresses the lightweight class map to generate a compressed class map. The lightweight class information may also be compressed. The image providing apparatus generates and outputs compressed AI metadata including the compressed class information and the compressed class map. The description of AI metadata generation, lightweighting, and compression given above for the metadata generator 210 applies similarly to the AI metadata generation operation of step S1004, so redundant description is omitted.

The image providing apparatus also encodes the first image to generate an encoded image (S1006). The image providing apparatus may encode the first image using various types of image encoding algorithms. The image providing apparatus may perform a process of predicting the first image to generate prediction data, a process of generating residual data corresponding to the difference between the first image and the prediction data, a process of transforming the residual data, which is a spatial domain component, into a frequency domain component, a process of quantizing the residual data transformed into the frequency domain component, and a process of entropy-encoding the quantized residual data. Such an encoding process may be implemented through one of the image compression methods using frequency transformation, such as MPEG-2, H.264 AVC (Advanced Video Coding), MPEG-4, HEVC (High Efficiency Video Coding), VC-1, VP8, VP9, and AV1 (AOMedia Video 1).

Next, the image providing apparatus outputs the compressed AI metadata and the encoded image (S1008). The image providing apparatus may output the compressed AI metadata and the encoded image through various types of output interfaces, such as a communication interface, a display, a speaker, and a touch screen.

FIG. 11 is a diagram illustrating the structure of the input unit, the processor, and the output unit of an image reproducing apparatus according to an embodiment of the present disclosure.

The image reproducing apparatus 150 receives the encoded image and the compressed AI metadata through the input unit 152. The encoded image and the compressed AI metadata correspond to the encoded image and the compressed AI metadata generated by the image providing apparatus 110. The input unit 152 outputs the encoded image to the decoder 1110 and outputs the compressed AI metadata to the image quality processing unit 1120.

The input unit 152 may correspond to a communication unit, an input/output interface, or the like that receives the encoded image and the compressed AI metadata from an external device.

The processor 154a includes a decoder 1110 and an image quality processing unit 1120.

The decoder 1110 decodes the encoded image to generate a second image. The decoder 1110 may decode the encoded image using a decoding method corresponding to the image encoding algorithm applied to the encoded image. Information on the image encoding algorithm applied to the encoded image may be obtained from the header of the encoded image or from additional information input together with the encoded image. The decoder 1110 may decode the encoded image using a method corresponding to the encoding method used by the encoder 220 of the image providing apparatus 110. The decoding performed by the decoder 1110 may include a process of entropy-decoding the image data to generate quantized residual data, a process of inverse-quantizing the quantized residual data, a process of generating prediction data, and a process of reconstructing the second image using the prediction data and the residual data. Such a decoding process may be implemented through an image reconstruction method corresponding to whichever of the image compression methods using frequency transformation, such as MPEG-2, H.264, MPEG-4, HEVC, VC-1, VP8, VP9, and AV1, was used to encode the encoded image.

The image quality processing unit 1120 receives the decoded second image and the compressed AI metadata, performs image quality enhancement processing, and generates and outputs a third image. The image quality enhancement processing may include various existing types of image quality processing on the decoded second image as well as image quality processing using AI metadata according to embodiments of the present disclosure. The image quality processing unit 1120 may include an input information processing unit 1122, a meta information synthesis unit 1124, a meta information restoration unit 1126, and a meta information processing unit 1128.

The AI metadata input through the input unit 152 is passed to the meta information restoration unit 1126. The meta information restoration unit 1126 receives the compressed AI metadata input from the image providing apparatus 110 and performs decompression processing 1130 and lightweight restoration processing 1132 to restore the compressed AI metadata into a plurality of class maps and class information. The plurality of class maps and the class information correspond to the class maps and class information output from the meta information extraction unit 212 of the image providing apparatus 110.

The meta information restoration unit 1126 performs decompression processing 1130 on the AI metadata that was lightweighted by the meta information lightweighting unit 214 and compressed by the meta information compression unit 216 of the image providing apparatus 110, thereby generating a lightweight class map and lightweight class information. The decompression processing 1130 may be performed by a decoding process corresponding to the encoding algorithm used by the meta information compression unit 216 of the image providing apparatus 110. The lightweight class map and lightweight class information generated by the decompression processing 1130 correspond to the lightweight class map 612 and the lightweight class information 614 generated earlier by the meta information lightweighting unit 214a of the image providing apparatus 110.

In addition, the meta information restoration unit 1126 performs the lightweight restoration process 1132 to generate a plurality of restored class maps and class information from the lightweight class map and the lightweight class information. The lightweight restoration process 1132 uses the representative values of the lightweight class map to generate a plurality of restored class maps, one for each class. The lightweight restoration process 1132 generates the restored class map for each class based on the information, recorded in the lightweight class information, about which class corresponds to each representative value. For example, the lightweight restoration process 1132 obtains, from the lightweight class information, the information that a first representative value corresponds to the water class, extracts the region corresponding to the first representative value from the lightweight class map, and generates the restored class map corresponding to the water class. In this way, the lightweight restoration process 1132 generates a restored class map for each representative value, and thereby generates a plurality of restored class maps for the classes recorded in the lightweight class information. The lightweight restoration process 1132 also generates class information that records the classes corresponding to the plurality of class maps. Compared with the class maps generated by the meta information extraction unit 212 of the image providing apparatus, the plurality of restored class maps generated by the lightweight restoration process 1132 have the probability information removed and indicate only whether each pixel belongs to the corresponding class. Accordingly, a restored class map may have a smaller capacity than the class map generated by the meta information extraction unit 212 of the image providing apparatus. The class information generated by the lightweight restoration process 1132 corresponds to the class information generated by the meta information extraction unit 212.
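The lightweight restoration described above can be sketched as follows. This is a minimal illustration under assumptions, not the disclosed implementation: the function name `restore_class_maps`, the dictionary layout of the lightweight class information, and the "grass" class are hypothetical; only the water class comes from the description.

```python
def restore_class_maps(light_map, light_class_info):
    """Split one lightweight class map into per-class binary maps.

    light_map: 2-D list holding one representative value per pixel.
    light_class_info: dict mapping representative value -> class name
    (an assumed layout for the lightweight class information).
    """
    restored = {}
    for rep_value, class_name in light_class_info.items():
        # 1 where the pixel carries this representative value, else 0.
        restored[class_name] = [
            [1 if pixel == rep_value else 0 for pixel in row]
            for row in light_map
        ]
    # Class information listing the classes of the restored maps.
    class_info = list(restored.keys())
    return restored, class_info


# Hypothetical 2x3 lightweight class map with two representative values.
light_map = [
    [1, 1, 2],
    [1, 2, 2],
]
light_class_info = {1: "water", 2: "grass"}
maps, class_info = restore_class_maps(light_map, light_class_info)
```

Each restored map is binary, which reflects the statement that the probability information is no longer present after restoration.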

The meta information processing unit 1128 generates modulation parameters for image quality processing from the AI metadata restored by the meta information restoration unit 1126, that is, from the plurality of restored class maps and the class information. The meta information processing unit 1128 includes a 2-2 AI network 1150 that generates the modulation parameters from the plurality of restored class maps and the class information. A modulation parameter is a parameter applied to a modulation layer of the 2-1 AI network 1140 of the meta information synthesis unit 1124. The 2-1 AI network 1140 includes a plurality of modulation layers. The 2-2 AI network 1150 of the meta information processing unit 1128 generates modulation parameters for each of the plurality of modulation layers included in the 2-1 AI network 1140. In addition, the AI metadata includes a plurality of restored class maps and class information corresponding to each frame, and the 2-2 AI network 1150 of the meta information processing unit 1128 may generate modulation parameters corresponding to each frame.

The input information processing unit 1122 processes the second image decoded by the decoding unit 1110 into the form required by the meta information synthesis unit 1124 and inputs it to the meta information synthesis unit 1124. According to an embodiment, the input information processing unit 1122 may generate a feature map and output it to the meta information synthesis unit 1124. According to an embodiment, the input information processing unit 1122 may include a combination of at least one convolutional layer and at least one activation layer for generating the feature map.

The meta information synthesis unit 1124 receives the feature map output from the input information processing unit 1122 and the modulation parameters generated by the meta information processing unit 1128, performs image quality improvement processing on the second image, and outputs the resulting third image. The meta information synthesis unit 1124 may perform the image quality improvement processing on the second image using the 2-1 AI network 1140.

The meta information synthesis unit 1124 receives the feature map from the input information processing unit 1122 and performs image quality processing based on the AI metadata. The meta information synthesis unit 1124 includes the 2-1 AI network 1140, which performs the meta information synthesis processing. The meta information synthesis unit 1124 applies the modulation parameters input from the meta information processing unit 1128 to the plurality of modulation layers included in the 2-1 AI network 1140 to perform image quality processing on the second image. According to an embodiment, the modulation processing by the 2-1 AI network 1140 is performed using an affine transformation based on the modulation parameters.

The output unit 160 receives and outputs the third image generated by the meta information synthesis unit 1124. According to an embodiment, the output unit 160 may correspond to a display that displays the third image. According to another embodiment, the output unit 160 may correspond to a communication unit that transmits the third image to an external device.

FIG. 12 is a diagram illustrating an operation of the meta information synthesis unit according to an embodiment of the present disclosure.

The meta information synthesis unit 1124 includes the 2-1 AI network 1140, which includes at least one layer. The 2-1 AI network 1140 may include a combination of at least one convolutional layer and at least one modulation layer. The 2-1 AI network 1140 may further include at least one activation layer. In addition, the 2-1 AI network 1140 may include an upscaler 1220 that receives the feature map generated from the low-resolution second image and upscales the resolution of the input data.

The 2-1 AI network 1140 may include a first image quality processing unit 1210 including at least one residual block 1212, the upscaler 1220, and a second image quality processing unit 1230.

In this specification, the first image corresponds to a high-resolution original image, the second image corresponds to a downscaled low-resolution image, and the third image corresponds to a super-resolution image upscaled by the restoration process of the image reproducing apparatus 150.

The first image quality processing unit 1210 includes a plurality of residual blocks 1212. The first image quality processing unit 1210 receives a feature vector and performs image quality improvement processing. The first image quality processing unit 1210 may include a first skip processing path 1214 that skips the plurality of residual blocks 1212. The first skip processing path 1214 may pass the input data through unchanged or may pass it on after performing predetermined processing. The first image quality processing unit 1210 generates a residual-version processing result using the plurality of residual blocks 1212 and generates a prediction-version processing result through the first skip processing path 1214. The first image quality processing unit 1210 then sums the residual-version processing result and the prediction-version processing result and outputs the sum as its processing result.

According to an embodiment, the start point and the end point of the first skip processing path 1214 may be set so that the path skips all of the plurality of residual blocks 1212. The start point and the end point of the first skip processing path 1214 may be determined differently depending on the embodiment. The number of first skip processing paths 1214 may also vary, and the first image quality processing unit 1210 may include one or more first skip processing paths 1214.

The structure of the residual block 1212 is described with reference to FIG. 13.

FIG. 13 is a diagram illustrating the structure of the residual block 1212 according to an embodiment of the present disclosure.

The residual block 1212 receives and processes a feature map. The residual block 1212 may include at least one convolutional layer, at least one modulation layer, or a combination thereof. According to an embodiment, the residual block 1212 may further include at least one activation layer. According to an embodiment, the residual block 1212 may have a structure in which layers are arranged repeatedly in the order of a convolutional layer, a modulation layer, and an activation layer.

The residual block 1212 may include a second skip processing path 1320 that bypasses a main stream 1310 including a plurality of layers and runs from the input end to the output end. The main stream 1310 generates a residual-version processing result for the input data, and the second skip processing path 1320 generates a prediction-version processing result. The residual block 1212 sums the residual-version processing result and the prediction-version processing result and outputs the sum as the processing result of the residual block 1212.
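The summation of the main stream and the skip path can be sketched as follows. This is only an illustrative sketch: `residual_block`, `toy_main_stream`, and the scalar arithmetic standing in for the convolutional, modulation, and activation layers are assumptions made for the example, and the skip path is assumed to pass its input through unchanged.

```python
def relu(values):
    """Elementwise ReLU, used here as a stand-in activation layer."""
    return [max(0.0, v) for v in values]


def toy_main_stream(features):
    """Hypothetical main stream: a scalar affine step followed by ReLU.

    A real main stream would stack convolutional, modulation, and
    activation layers; this stand-in only keeps the example runnable.
    """
    return relu([0.5 * v - 1.0 for v in features])


def residual_block(features, main_stream, skip=lambda x: list(x)):
    """Sum the residual version (main stream) and prediction version (skip)."""
    residual = main_stream(features)   # residual-version result
    prediction = skip(features)        # prediction-version result
    return [r + p for r, p in zip(residual, prediction)]


out = residual_block([1.0, 2.0, 4.0], toy_main_stream)
```

Passing a different `skip` callable models the case where the skip path applies predetermined processing instead of forwarding the input as it is.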

According to an embodiment, the start point and the end point of the second skip processing path 1320 may be set so that the path skips all of the plurality of layers in the residual block 1212. The start point and the end point of the second skip processing path 1320 may be determined differently depending on the embodiment. The number of second skip processing paths 1320 may also vary, and the residual block 1212 may include one or more second skip processing paths 1320.

According to the present embodiment, with the structure including the main stream 1310 and the second skip processing path 1320, the main stream including the plurality of layers learns only the difference between the super-resolution image corresponding to the output of the 2-1 AI network 1140 and the low-resolution image corresponding to the input of the 2-1 AI network 1140, while the remaining information is carried through the second skip processing path 1320, which can improve the efficiency of learning details. Here, the residual data F_main-stream(I_LR) used for training the main stream 1310 is the difference image between the super-resolution image I_SR and the low-resolution image I_LR, and may be defined as follows.

[Equation 2]

F_main-stream(I_LR) = I_SR - I_LR

A modulation layer performs modulation processing on its input data. For example, the modulation layer may perform affine transformation processing on the input data. A separate modulation parameter may be defined for each of the plurality of modulation layers included in the first image quality processing unit 1210. The meta information processing unit 1128 may individually generate the modulation parameters for each of the plurality of modulation layers included in the first image quality processing unit 1210 and output them to the meta information synthesis unit 1124. To this end, the 2-2 AI network 1150 of the meta information processing unit 1128 may include a separate network for each modulation layer.

Referring again to FIG. 12, the processing result of the first image quality processing unit 1210 is input to the upscaler 1220. The upscaler 1220 performs upscaling on the processing result of the first image quality processing unit 1210. If the second image is upscaled before the 2-1 AI network 1140 and a feature map of the high-resolution second image is then input to the 2-1 AI network 1140, the upscaler 1220 may be omitted from the 2-1 AI network 1140. The upscaler 1220 may be implemented using various upscaling algorithms and may be implemented as an AI network having a DNN structure.

The second image quality processing unit 1230 performs additional image quality processing on the output of the upscaler 1220 to generate and output the third image. If the upscaler 1220 is omitted from the 2-1 AI network 1140, the second image quality processing unit 1230 performs the additional image quality processing on the processing result of the first image quality processing unit 1210. The second image quality processing unit 1230 may include a combination of at least one convolutional layer and at least one activation layer.
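The ordering of the three stages, including the case where the upscaler is omitted, can be sketched as follows. The names `run_network` and `nearest_upscale` are hypothetical, nearest-neighbor repetition is only one assumed choice among the "various upscaling algorithms" mentioned, and the two image quality stages are replaced by identity functions to keep the example small.

```python
def nearest_upscale(fmap, factor=2):
    """Nearest-neighbor upscaling of a 2-D list (an assumed algorithm)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend([list(wide) for _ in range(factor)])  # repeat rows
    return out


def run_network(fmap, first_stage, second_stage, upscaler=None):
    """First quality stage -> optional upscaler -> second quality stage."""
    x = first_stage(fmap)
    if upscaler is not None:
        # Skipped when the input was already upscaled before the network.
        x = upscaler(x)
    return second_stage(x)


def identity(x):
    """Placeholder for a quality stage that leaves its input unchanged."""
    return x


low_res = [[1, 2], [3, 4]]
with_upscaler = run_network(low_res, identity, identity, nearest_upscale)
without_upscaler = run_network(low_res, identity, identity)
```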

FIG. 14 is a diagram illustrating operations of the 2-1 AI network and the 2-2 AI network according to an embodiment of the present disclosure.

According to an embodiment, the meta information processing unit 1128 generates modulation parameters from the AI metadata received from the image providing apparatus 110 and restored, and outputs them to the 2-1 AI network 1140 of the meta information synthesis unit 1124. The meta information processing unit 1128 may include the 2-2 AI network 1150 or may use a 2-2 AI network 1150 provided in an external device.

The 2-2 AI network 1150 includes a combination of at least one convolutional layer and at least one activation layer. The modulation parameters include a plurality of modulation parameter sets, and the 2-2 AI network 1150 may include separate AI networks 1410 and 1420 for each of the plurality of modulation parameter sets. For one modulation layer, a modulation parameter set including a plurality of parameters is output, and the 2-2 AI network 1150 may generate and output a plurality of modulation parameter sets for a plurality of modulation layers. The 2-2 AI network 1150 may generate and output a first modulation parameter set for a first modulation layer and a second modulation parameter set for a second modulation layer. The 2-2 AI network 1150 may include AI networks 1410a and 1420a corresponding to the first modulation layer, AI networks 1410b and 1420b corresponding to the second modulation layer, and AI networks 1410c and 1420c corresponding to a third modulation layer. As the number of modulation layers increases, the number of AI networks 1410 and 1420 in the 2-2 AI network 1150 also increases. According to an embodiment, a modulation parameter set includes a first modulation parameter and a second modulation parameter, and the 2-2 AI network 1150 may include a first modulation parameter generator 1410 that generates the first modulation parameter and a second modulation parameter generator 1420 that generates the second modulation parameter. The first modulation parameter generator 1410 and the second modulation parameter generator 1420 may share some layers with each other or may be configured separately with no shared layers.

The 2-2 AI network 1150 generates a plurality of modulation parameter sets and outputs them to the 2-1 AI network 1140. The 2-1 AI network 1140 uses the plurality of modulation parameter sets input from the 2-2 AI network 1150 to set the modulation parameter set of each of its at least one modulation layer. Each modulation layer of the 2-1 AI network 1140 performs modulation processing on its input data based on the modulation parameter set that has been set for it.

The plurality of modulation parameter sets output from the 2-2 AI network 1150 may be recorded in the form of a vector or tensor of a predetermined dimension. For example, the 2-2 AI network 1150 may output the plurality of modulation parameter sets to the 2-1 AI network through 64 channels, and each channel may correspond to one element of the vector representing the plurality of modulation parameter sets. The number of channels of the 2-2 AI network 1150 may be determined according to the number of elements included in the plurality of modulation parameter sets.
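One way to picture the relationship between channels and parameter sets is the sketch below. It assumes, purely for illustration, that the 64 channel values form one flat vector and that every modulation parameter set has the same number of elements (16 here); neither the helper name `split_parameter_sets` nor that layout is stated in the description.

```python
def split_parameter_sets(channels, elements_per_set):
    """Cut a flat channel vector into equally sized parameter sets.

    Assumes every set has `elements_per_set` elements, which is an
    illustrative simplification rather than a disclosed layout.
    """
    if len(channels) % elements_per_set != 0:
        raise ValueError("channel count must be a multiple of the set size")
    return [
        channels[start:start + elements_per_set]
        for start in range(0, len(channels), elements_per_set)
    ]


# 64 channels, one value per channel (placeholder values 0..63).
channels = list(range(64))
parameter_sets = split_parameter_sets(channels, 16)
```

Under this assumption the channel count is simply the number of sets multiplied by the elements per set, which matches the statement that the channel count follows from the number of elements.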

FIG. 15A is a diagram illustrating structures of the 2-1 AI network and the 2-2 AI network according to an embodiment of the present disclosure.

According to an embodiment, the 2-2 AI network 1150 may include as many modulation parameter generation networks 1520a, 1520b, 1520c, and 1520d as there are modulation layers 1512a, 1512b, 1512c, and 1512d in the first image quality processing unit 1210 of the 2-1 AI network 1140. Each of the modulation parameter generation networks 1520a, 1520b, 1520c, and 1520d receives the AI metadata (AI_META) and generates and outputs the modulation parameters (P_SET1, P_SET2, P_SET3, P_SET4) for the modulation layer 1512a, 1512b, 1512c, or 1512d that corresponds to it. Each of the modulation parameter generation networks 1520a, 1520b, 1520c, and 1520d may include a first modulation parameter generator 1410 and a second modulation parameter generator 1420. Depending on the embodiment, the modulation parameter generation networks 1520a, 1520b, 1520c, and 1520d may share some layers or may share no layers.
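The one-generator-per-modulation-layer arrangement can be sketched as follows. The generators here are toy closures, not trained networks: `make_generator`, its weights, and the use of the mean of a flattened class map are all hypothetical choices made so that the example runs; only the idea that each generator maps the same AI metadata to its own parameter set comes from the description.

```python
def make_generator(scale_weight, shift_weight):
    """Build a toy stand-in for one modulation parameter generation network.

    The returned function maps AI metadata to a (multiplicative, additive)
    pair, mimicking a first and a second modulation parameter generator.
    """
    def generate(ai_meta):
        mean = sum(ai_meta) / len(ai_meta)
        first_param = 1.0 + scale_weight * mean   # multiplicative parameter
        second_param = shift_weight * mean        # additive parameter
        return (first_param, second_param)
    return generate


# Four generators, one per modulation layer (hypothetical weights).
generators = [
    make_generator(0.5, 0.0),
    make_generator(0.0, 2.0),
    make_generator(1.0, 1.0),
    make_generator(2.0, -2.0),
]

ai_meta = [0, 1, 1, 0]  # flattened binary class map, assumed input format
p_sets = [generate(ai_meta) for generate in generators]
```

The list `p_sets` plays the role of P_SET1 through P_SET4, with entry k intended for the k-th modulation layer.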

FIG. 15B is a diagram illustrating structures of the 2-1 AI network and the 2-2 AI network according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the image reproducing apparatus 150 may receive, from the image providing apparatus 110, a plurality of AI metadata corresponding to a plurality of image characteristics. The plurality of image characteristics may include, for example, at least one of an object, a frequency, a texture, semantics, or a shooting parameter, or a combination thereof. The image reproducing apparatus 150 may include modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d corresponding to the respective image characteristics. For example, the image reproducing apparatus 150 may receive first AI metadata (AI_META1) corresponding to an object characteristic and second AI metadata (AI_META2) corresponding to a frequency characteristic. The first AI metadata (AI_META1) may include class information and a plurality of class maps, and the second AI metadata (AI_META2) may include frequency information and a plurality of frequency maps. The first AI metadata (AI_META1) and the second AI metadata (AI_META2) are delivered to the image reproducing apparatus 150 in a lightened and compressed form, undergo decompression and lightweight restoration by the meta information restoration unit 1126, and are then passed to the 2-2 AI network 1150.

The modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d receive the first AI metadata (AI_META1) and the second AI metadata (AI_META2) and generate a first modulation parameter set (P_SET1), a second modulation parameter set (P_SET2), a third modulation parameter set (P_SET3), and a fourth modulation parameter set (P_SET4), respectively. As the number of types of input image characteristics increases, the modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d may be trained to receive a combination of the AI metadata corresponding to the respective image characteristics and generate the modulation parameters.

According to an embodiment, each of the modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d may correspond to a predetermined image characteristic. For example, the modulation parameter generation networks 1540a and 1540b may correspond to the object characteristic, and the modulation parameter generation networks 1540c and 1540d may correspond to the frequency characteristic. The modulation parameter generation networks 1540a and 1540b corresponding to the object characteristic input the first modulation parameter set (P_SET1) and the second modulation parameter set (P_SET2) to the modulation layers 1532a and 1532b of the first residual block 1530a. The modulation parameter generation networks 1540c and 1540d corresponding to the frequency characteristic receive the second AI metadata (AI_META2) and generate the third modulation parameter set (P_SET3) and the fourth modulation parameter set (P_SET4), respectively. The modulation parameter generation networks 1540c and 1540d corresponding to the frequency characteristic input the third modulation parameter set (P_SET3) and the fourth modulation parameter set (P_SET4) to the modulation layers 1532c and 1532d of the second residual block 1530b.

The modulation parameter generation network 1540a, 1540b, 1540c, or 1540d corresponding to each image characteristic may be activated when AI metadata corresponding to that image characteristic is input. The image reproducing apparatus 150 includes the modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d corresponding to the plurality of image characteristics, and may selectively activate them according to the image characteristic of the input AI metadata. When AI metadata is input, the meta information processing unit 1128 recognizes the image characteristic corresponding to that AI metadata and passes the AI metadata to the modulation parameter generation network 1540a, 1540b, 1540c, or 1540d corresponding to that image characteristic. In addition, when the modulation parameter sets (P_SET1, P_SET2, P_SET3, P_SET4) are generated by the modulation parameter generation networks 1540a, 1540b, 1540c, and 1540d corresponding to the image characteristic of the AI metadata and are input to the modulation layers 1532a, 1532b, 1532c, and 1532d, the corresponding modulation layers are activated, and image quality processing by the activated modulation layers may be performed on the second image.
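The selective activation can be pictured as a routing table from image characteristic to generation networks and from each network to its modulation layer. In the sketch below, the reference numerals are used as plain string labels and the pairing follows the example given in the description (object: 1540a and 1540b; frequency: 1540c and 1540d); the table layout and the function name `active_layers` are assumptions.

```python
# Which generation networks serve which image characteristic
# (pairing taken from the example in the description).
GENERATORS_FOR = {
    "object": ["1540a", "1540b"],
    "frequency": ["1540c", "1540d"],
}

# Which modulation layer each generation network feeds.
LAYER_FOR = {
    "1540a": "1532a",
    "1540b": "1532b",
    "1540c": "1532c",
    "1540d": "1532d",
}


def active_layers(received_characteristics):
    """Return the modulation layers activated by the received AI metadata.

    received_characteristics: image characteristics for which AI metadata
    arrived; characteristics without a registered network are ignored.
    """
    layers = []
    for characteristic in received_characteristics:
        for network in GENERATORS_FOR.get(characteristic, []):
            layers.append(LAYER_FOR[network])
    return layers


only_object = active_layers(["object"])
both = active_layers(["object", "frequency"])
```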

FIG. 16 is a diagram illustrating operations of a metadata synthesizing unit and a metadata processing unit according to an embodiment of the present disclosure.

The metadata synthesizing unit may include at least one residual block 1212, and each residual block 1212 may include at least one modulation layer 1640. Each modulation layer 1640 receives a modulation parameter set generated individually for that modulation layer 1640.

According to an embodiment, the modulation parameter set input to the modulation layer 1640 of the residual block 1212 includes a first operation modulation parameter a_i, which is multiplied with the input data, and a second operation modulation parameter b_i, which is added to the input data. The input value f_i of the modulation layer 1640 may take the form of an input feature map f_i(w, h, c) 1642 having a width w, a length c, and a height h. Here, w, c, and h denote numbers of element values and are natural numbers. The first operation modulation parameter a_i may take the form of a weight map a_i(w, h, c) of the same size as f_i(w, h, c). The second operation modulation parameter b_i may take the form of a weight map b_i(w, h, c) of the same size as f_i(w, h, c). The input feature map 1642, the first operation modulation parameter a_i, and the second operation modulation parameter b_i may each have w*h*c element values.

The 2-2 AI network 1150 receives the AI metadata and generates the first operation modulation parameter a_i and the second operation modulation parameter b_i corresponding to each modulation layer 1640. The 2-2 AI network 1150 may include a common layer 1610 used in common for generating the first operation modulation parameter a_i and the second operation modulation parameter b_i, a first operation modulation parameter generating layer 1620 that generates the first operation modulation parameter a_i, and a second operation modulation parameter generating layer 1630 that generates the second operation modulation parameter b_i. The arrangement and connection structure of the common layer and the individual layers used for generating the first operation modulation parameter a_i and the second operation modulation parameter b_i may be determined in various ways.
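As a rough illustration of the shared-then-split structure, the sketch below passes the metadata through one common layer and then through two separate heads that output a_i and b_i. The use of plain linear layers, the ReLU activation, the flattened vector representation, and all weight values are assumptions chosen for the example; the disclosure leaves the layer arrangement open.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; one row of weights per output element."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def generate_modulation_params(
    metadata: Vector, common_w: Matrix, scale_w: Matrix, shift_w: Matrix
) -> Tuple[Vector, Vector]:
    shared = relu(linear(common_w, metadata))  # stands in for common layer 1610
    a_i = linear(scale_w, shared)              # stands in for generating layer 1620
    b_i = linear(shift_w, shared)              # stands in for generating layer 1630
    return a_i, b_i

# Toy weights (assumed): 2 metadata values in, 3 parameter elements out.
common_w = [[1.0, 0.0], [0.0, 1.0]]
scale_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shift_w = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
a_i, b_i = generate_modulation_params([1.0, 2.0], common_w, scale_w, shift_w)
```

Sharing the first layer means the features extracted from the metadata are computed once and reused by both heads.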

The modulation layer 1640 receives the first operation modulation parameter a_i and multiplies it with the input feature map f_i input from the previous layer. When multiplying the input feature map f_i by the first operation modulation parameter a_i, the modulation layer 1640 may perform a point-wise multiplication that multiplies element values at the same position. The modulation layer 1640 adds the second operation modulation parameter b_i to the multiplication result 1644 obtained using the first operation modulation parameter a_i, to generate the output value 1646 of the modulation layer 1640. When adding the multiplication result 1644 and the second operation modulation parameter b_i, the modulation layer 1640 adds element values at the same position. The output value 1646 may be defined as in Equation 3.

[Equation 3]

$$\text{output}_i(w, h, c) = a_i(w, h, c) \odot f_i(w, h, c) + b_i(w, h, c)$$

Each of the plurality of modulation layers 1640 generates an operation result as in Equation 3 and outputs it to the next layer. The metadata processing unit 1128 may perform image quality improvement through the processing of the plurality of modulation layers 1640 included in the 2-1 AI network 1140, and may thereby obtain a high-quality image in which the texture of the image is restored.
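The element-wise multiply-then-add that the text attributes to each modulation layer can be sketched as follows. The feature map and both parameter maps are stored here as flat lists of w*h*c values; that storage layout, the function name `modulate`, and the sample numbers are assumptions made for the illustration.

```python
from typing import List

def modulate(f_i: List[float], a_i: List[float], b_i: List[float]) -> List[float]:
    """Multiply each feature element by a_i and add b_i at the same position."""
    if not (len(f_i) == len(a_i) == len(b_i)):
        raise ValueError("f_i, a_i and b_i must have the same number of elements")
    return [a * f + b for f, a, b in zip(f_i, a_i, b_i)]

# A 2 x 2 x 1 feature map flattened to w*h*c = 4 values.
w, h, c = 2, 2, 1
f_i = [1.0, 2.0, 3.0, 4.0]
a_i = [2.0, 2.0, 0.5, 0.5]  # per-position multipliers
b_i = [0.0, 1.0, 0.0, 1.0]  # per-position offsets
out = modulate(f_i, a_i, b_i)
```

Because a_i and b_i have one value per position, different regions of the feature map (for example, regions belonging to different classes) can be scaled and shifted differently.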

According to another embodiment of the present disclosure, the output value of the modulation layer 1640 may be defined as in Equation 4. In this embodiment, the output value of the modulation layer 1640 is defined by a multi-order (polynomial) function. The 2-2 AI network 1150 may generate and output a_i, b_i, ..., and n_i defined in Equation 4 for each modulation layer 1640.

[Equation 4]

According to another embodiment of the present disclosure, the output value of the modulation layer 1640 may be defined as in Equation 5. In this embodiment, the output value of the modulation layer 1640 is defined by a logarithmic function. The 2-2 AI network 1150 may generate and output a_i, b_i, ..., and n_i defined in Equation 5 for each modulation layer 1640.

[Equation 5]

According to another embodiment of the present disclosure, the output value of the modulation layer 1640 may be defined as in Equation 6. In this embodiment, the output value of the modulation layer 1640 is defined by an exponential function. The 2-2 AI network 1150 may generate and output a_i, b_i, ..., and n_i defined in Equation 6 for each modulation layer 1640.

[Equation 6]

According to an embodiment, the 2-1 AI network 1140 may include a combination of modulation layers 1640 having different modulation functions. For example, a first modulation layer may perform modulation according to Equation 4, and a second modulation layer may perform modulation according to Equation 6.

According to an embodiment, the function of the modulation layer 1640 of the 2-1 AI network 1140 may vary depending on the type of image characteristic that is input and on the combination of image characteristics. For example, when only AI metadata for objects is input, the 2-1 AI network 1140 may perform image quality processing by the modulation layer 1640 using Equation 3, and when AI metadata for objects and AI metadata for frequency are input together, it may perform image quality processing by the modulation layer 1640 using Equation 4. The processor 1120 may change the type of modulation performed by the modulation layer 1640 according to the type and combination of image characteristics corresponding to the input AI metadata. For example, the AI metadata may include information on the type of the corresponding image characteristic, and the meta information processing unit 1128 may generate, according to that type, a modulation processing parameter indicating the type of modulation and output it to the meta information synthesizing unit 1124. The meta information synthesizing unit 1124 may set the type of modulation based on the modulation processing parameter and perform the modulation.
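One way to picture the selection of a modulation type from the set of image characteristics present is a lookup keyed by that set. The sketch below encodes only the two combinations named in the text; the string labels, the table name, and the fallback to Equation 3 for unlisted combinations are assumptions made for the example.

```python
from typing import Iterable

# Assumed lookup table: only the two combinations mentioned in the text.
MODULATION_BY_CHARACTERISTICS = {
    frozenset({"object"}): "equation_3",
    frozenset({"object", "frequency"}): "equation_4",
}

def select_modulation(characteristics: Iterable[str],
                      default: str = "equation_3") -> str:
    """Return the label of the modulation function for the given metadata types.

    The default for unlisted combinations is a hypothetical choice.
    """
    return MODULATION_BY_CHARACTERISTICS.get(frozenset(characteristics), default)

only_object = select_modulation(["object"])
object_and_frequency = select_modulation(["frequency", "object"])
```

Using a frozenset as the key makes the lookup independent of the order in which the metadata types arrive.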

The modulation layer 1640 may improve the image quality of the second image by performing the modulation operation of Equation 3 on the input feature map. Through the modulation operation using the modulation parameter set determined by the 2-2 AI network 1150, the modulation layer 1640 restores the object-related texture of the second image and generates a third image having a texture nearly identical to that of the original image. In particular, through the modulation using the modulation parameter set determined by the 2-2 AI network 1150, detailed portions of the image related to image characteristics of the original image, such as objects and frequency, are restored, so that the image quality of the third image may be improved.

FIG. 17 is a diagram for explaining a method of training a first AI network of an image providing apparatus and a second AI network of an image reproducing apparatus according to an embodiment of the present disclosure.

According to an embodiment, the second AI network of the image reproducing apparatus may be trained using a plurality of training data items, each including a low-resolution image LR, AI metadata, and a high-resolution image HR. According to an embodiment, the learning processing unit 1700, which trains the second AI network, may input the low-resolution image LR and the AI metadata to the second AI network 156, compare the super-resolution image SR generated by the second AI network 156 with the high-resolution image HR, and update the second AI network 156 based on the comparison result. Various types of Generative Adversarial Network (GAN) algorithms may be used to compare the high-resolution image HR with the super-resolution image SR and to update the second AI network 156 according to the comparison result.

According to an embodiment, the second AI network may be trained based on the Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) algorithm. The learning processing unit 1700 uses a relativistic discriminator to predict the probability that a real image x_r is relatively more realistic than a fake image x_f. Here, the real image x_r corresponds to the high-resolution image HR, and the fake image x_f corresponds to the super-resolution image SR. To this end, the learning processing unit 1700 compares the super-resolution image SR with the high-resolution image HR using the relativistic discriminator D_Ra. The relativistic discriminator D_Ra calculates a discrimination result based on Equation 7 below.

[Equation 7]

$$D_{Ra}(x_r, x_f) = \sigma\left(C(x_r) - E_{x_f}\left(C(x_f)\right)\right)$$

Here, σ denotes the sigmoid function, C(x) denotes the output of a discriminator that determines whether an input image is a real image, and E_xf(.) denotes averaging over all fake data in a mini-batch. The learning processing unit 1700 performs training by updating the parameter values of the second AI network 156 based on the processing result of the relativistic discriminator D_Ra.
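As a small numeric illustration of the relativistic discriminator, the sketch below follows the relativistic average form used by ESRGAN, in which the sigmoid is applied to the difference between the discriminator output for the real image and the mini-batch mean of the outputs for the fake images. The function names and the sample scores are assumptions for the example; a real discriminator C would be a trained network rather than a list of numbers.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def relativistic_score(c_real: float, c_fake_batch: Sequence[float]) -> float:
    """sigmoid(C(x_r) - mean of C(x_f) over the mini-batch)."""
    mean_fake = sum(c_fake_batch) / len(c_fake_batch)
    return sigmoid(c_real - mean_fake)

# If the real image scores the same as the average fake image, the
# discriminator cannot tell them apart and the score is exactly 0.5.
undecided = relativistic_score(1.0, [1.0, 1.0])

# A real image that scores higher than the average fake gives a value above 0.5.
confident = relativistic_score(3.0, [0.0, 2.0])
```

The score measures how much more realistic the real image looks than the fakes on average, rather than judging either image in isolation.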

According to an embodiment, the first AI network 114 and the second AI network 156 may be trained jointly. Using training data including the high-resolution image HR, the learning processing unit 1700 may generate the super-resolution image SR through the processing of the first AI network 114 and the second AI network 156, and may update the first AI network 114 and the second AI network 156 based on a comparison between the high-resolution image HR and the super-resolution image SR.

According to another embodiment, the training of the first AI network 114 and the second AI network 156 may include individual training, in which the first AI network 114 and the second AI network 156 are each trained separately, and joint training of the first AI network 114 and the second AI network 156. In the individual training, the first AI network 114 and the second AI network 156 are trained separately using a plurality of training data items including the high-resolution image HR, the low-resolution image LR, and the AI metadata. Next, joint training is performed on the individually trained first AI network 114 and second AI network 156 using the high-resolution image HR as training data. Through the joint training, the individually trained first AI network 114 and second AI network 156 may be further updated.

FIG. 18 is a flowchart illustrating a method of controlling an image reproducing apparatus according to an embodiment of the present disclosure.

Each step of the method of controlling an image reproducing apparatus of the present disclosure may be performed by various types of electronic devices that include an input unit, a processor, a memory, and an output unit and that use a machine learning model. This specification mainly describes embodiments in which the image reproducing apparatus 150 according to embodiments of the present disclosure performs the method. Accordingly, embodiments described for the image reproducing apparatus 150 are applicable to embodiments of the method of controlling an image reproducing apparatus, and conversely, embodiments described for the method are applicable to embodiments of the image reproducing apparatus 150. The method of controlling an image reproducing apparatus according to the disclosed embodiments is not limited to being performed by the image reproducing apparatus 150 disclosed herein, and may be performed by various types of electronic devices.

The image reproducing apparatus receives an encoded image and AI metadata (S1802). The AI metadata is generated by the image providing apparatus. The AI metadata may be input to the image reproducing apparatus in a lightweight, compressed form. The AI metadata includes class information corresponding to the encoded image and a plurality of class maps.

Next, the image reproducing apparatus decodes the encoded image to generate a second image (S1804). The image reproducing apparatus generates the second image from the encoded image produced by the image providing apparatus, using a decoding process corresponding to the encoding process used in the image providing apparatus. Since the decoding process is the same as or similar to the operation of the decoding unit 1110 described above, a redundant description is omitted.

In addition, the image reproducing apparatus generates, from the AI metadata and the second image, a third image on which image quality improvement has been performed using the second AI network (S1806). The image reproducing apparatus performs decompression and lightweight restoration on the AI metadata. Through the decompression and lightweight restoration, the AI metadata is restored in the form of class information and a plurality of reconstructed class maps. The class information and the plurality of reconstructed class maps are converted into a plurality of modulation parameter sets by the 2-2 AI network. The 2-2 AI network generates a modulation parameter set for each modulation layer in the residual blocks used for image quality processing. The image reproducing apparatus performs image quality processing on the second image using the 2-1 AI network. The 2-1 AI network includes at least one residual block, and each residual block includes at least one modulation layer. The parameters of each modulation layer are set by the modulation parameter set generated by the 2-2 AI network. A feature map generated from the second image is processed by the at least one residual block. The modulation layer in the residual block performs image quality processing on the feature map, for example by using an affine transform. The 2-1 AI network performs image quality processing on the second image to generate and output the third image. Since the image quality improvement is the same as or similar to the operation of the image quality processing unit 1120 described above, a redundant description is omitted.

Next, the image reproducing apparatus outputs the third image (S1808). The image reproducing apparatus may display the third image or output it by transmitting it to another device. Since the output processing is the same as or similar to the operation of the output unit 160 described above, a redundant description is omitted.
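The sequence of steps S1802 to S1808 can be sketched as a small pipeline in which each stage is passed in as a callable. The function name `reproduce`, the use of flat lists of floats as images, and the toy stand-ins for the decoder and the second AI network are all assumptions for the illustration; they only show the order of the steps, not the actual processing.

```python
from typing import Callable, List

Image = List[float]

def reproduce(
    encoded: Image,
    ai_metadata: List[float],
    decode: Callable[[Image], Image],
    enhance: Callable[[Image, List[float]], Image],
    output: Callable[[Image], None],
) -> Image:
    # S1802: the encoded image and the AI metadata arrive as arguments.
    second_image = decode(encoded)                    # S1804: decoding
    third_image = enhance(second_image, ai_metadata)  # S1806: quality improvement
    output(third_image)                               # S1808: display or transmit
    return third_image

# Toy stand-ins (assumed): halve values to "decode", add metadata to "enhance".
shown: List[Image] = []
result = reproduce(
    [2.0, 4.0],
    [0.5, 0.5],
    decode=lambda img: [v / 2 for v in img],
    enhance=lambda img, meta: [v + m for v, m in zip(img, meta)],
    output=shown.append,
)
```

Passing the stages as callables keeps the control flow separate from the codec and the network, which matches the statement that the method may run on various types of electronic devices.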

FIG. 19 is a diagram illustrating the configuration of an image providing apparatus and an image reproducing apparatus according to an embodiment of the present disclosure.

The image providing apparatus and the image reproducing apparatus may be implemented as various types of electronic devices 1900. The electronic device 1900 includes a processor 1910, a memory 1920, and an input/output unit 1930. The processor 112 of the image providing apparatus 110 may correspond to the processor 1910 of the electronic device 1900, the memory 116 of the image providing apparatus 110 may correspond to the memory 1920 of the electronic device 1900, and the output unit 118 of the image providing apparatus 110 may correspond to the input/output unit 1930 of the electronic device 1900. The input unit 152 and the output unit 160 of the image reproducing apparatus 150 may correspond to the input/output unit 1930 of the electronic device 1900, the processor 154 of the image reproducing apparatus 150 may correspond to the processor 1910 of the electronic device 1900, and the memory 158 of the image reproducing apparatus 150 may correspond to the memory 1920 of the electronic device 1900.

The processor 1910 may include one or more processors. The processor 1910 may include dedicated processors such as a central control unit, an image processing unit, and an AI processing unit.

The memory 1920 may include a volatile storage medium, a non-volatile storage medium, or a combination thereof. The memory 1920 may also include various kinds of memory, such as main memory, cache memory, registers, and non-volatile memory. The memory 1920 may be implemented as various types of storage media. For example, the memory 1920 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.

The input/output unit 1930 may include various types of input/output interfaces and a communication unit 1940. The input/output unit 1930 may include a display unit, a touch screen, a touch pad, a communication unit, or a combination thereof. The communication unit 1940 may include various types of communication modules. The communication unit 1940 may include a short-range communication unit 1942, a mobile communication unit 1944, a broadcast receiving unit 1946, and the like. The short-range communication unit 1942 may communicate using Bluetooth, BLE (Bluetooth Low Energy), Near Field Communication, RFID (Radio Frequency Identification), WLAN (Wi-Fi), Zigbee, IrDA (Infrared Data Association) communication, WFD (Wi-Fi Direct), UWB (ultra wideband), Ant+ communication, or a combination thereof.

Meanwhile, the above-described embodiments of the present disclosure may be written as a program executable on a computer, and the written program may be stored in a medium.

The medium may continuously store a computer-executable program, or may temporarily store it for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROM and DVD, magneto-optical media such as floptical disks, and media configured to store program instructions, including ROM, RAM, and flash memory. Other examples of the medium include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Meanwhile, the above-described AI network model may be implemented as a software module. When implemented as a software module (for example, a program module including instructions), the AI network model may be stored in a computer-readable recording medium.

The AI network model may also be integrated in the form of a hardware chip and form part of the image providing apparatus or the image reproducing apparatus described above. For example, the AI network model may be manufactured as a dedicated hardware chip for artificial intelligence, or as part of an existing general-purpose processor (for example, a CPU or an application processor) or a graphics-dedicated processor (for example, a GPU).

The AI network model may also be provided in the form of downloadable software. A computer program product may include a product in the form of a software program (for example, a downloadable application) distributed electronically through a manufacturer or an electronic marketplace. For electronic distribution, at least part of the software program may be stored in a storage medium or generated temporarily. In this case, the storage medium may be a storage medium of a server of the manufacturer or the electronic marketplace, or of a relay server.

The technical idea of the present disclosure has been described above in detail with reference to preferred embodiments. However, the technical idea of the present disclosure is not limited to those embodiments, and various modifications and changes may be made by those of ordinary skill in the art within the scope of the technical idea of the present disclosure.

Claims

An image providing apparatus comprising:
a memory storing one or more instructions;
one or more processors configured to execute the one or more instructions stored in the memory; and
an output unit,
wherein the one or more processors, by executing the one or more instructions,
generate, using a first artificial intelligence (AI) network, AI metadata including class information, which includes at least one class corresponding to a type of an object included in a first image among a plurality of predefined objects, and at least one class map indicating a region corresponding to each class in the first image,
encode the first image to generate an encoded image, and
output the encoded image and the AI metadata through the output unit, and
wherein the one or more processors, by executing the one or more instructions,
input the first image to the first AI network to generate a plurality of segmentation probability maps, one for each of the plurality of predefined object types,
define the at least one class corresponding to the type of the object included in the first image based on the plurality of segmentation probability maps,
generate the class information including the at least one class, and
generate the at least one class map for each of the at least one class from the plurality of segmentation probability maps.

(Deleted)

The image providing apparatus of claim 1,
wherein the one or more processors, by executing the one or more instructions:
calculate, for each of the plurality of segmentation probability maps, an average value of the pixels excluding pixels having a value of 0; and
define the at least one class by selecting some of the plurality of predefined objects based on the magnitude of the average value of each of the plurality of segmentation probability maps.
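
The class selection recited above can be illustrated with a minimal Python sketch. It assumes that each segmentation probability map is a 2-D NumPy array and that the objects with the largest nonzero averages are kept; the function name `select_classes`, the `num_classes` parameter, and the sample object names are hypothetical, and the claim itself does not fix the selection rule beyond "based on the magnitude of the average value".

```python
import numpy as np

def select_classes(prob_maps, num_classes):
    """Rank objects by the mean of their nonzero probabilities.

    prob_maps: dict mapping an object name to a 2-D probability map.
    num_classes: how many objects to keep as classes (assumed rule: top-k).
    """
    scores = {}
    for name, pmap in prob_maps.items():
        nonzero = pmap[pmap != 0]          # exclude pixels whose value is 0
        scores[name] = float(nonzero.mean()) if nonzero.size else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_classes], scores

# Hypothetical 2x2 probability maps for three predefined objects.
prob_maps = {
    "sky":   np.array([[0.9, 0.8], [0.0, 0.0]]),
    "grass": np.array([[0.0, 0.0], [0.6, 0.4]]),
    "water": np.array([[0.1, 0.0], [0.0, 0.1]]),
}
selected, scores = select_classes(prob_maps, num_classes=2)
print(selected)
```

Excluding zero-valued pixels keeps a small but confidently detected object from being penalized for occupying only part of the frame.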

The image providing apparatus of claim 3,
wherein the one or more processors, by executing the one or more instructions:
map at least some of the objects that are not selected as the at least one class, among the plurality of predefined objects, to the at least one class; and
generate the class map by combining the segmentation probability map of each object mapped to the at least one class with the segmentation probability map of the class to which that object is mapped.
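
As a rough illustration of this mapping step, the sketch below folds the probability map of an unselected object into the map of the class it is mapped to. The combining operation is not specified in the claim; an element-wise maximum is assumed here purely for illustration, and `build_class_maps`, `mapping`, and the object names are hypothetical.

```python
import numpy as np

def build_class_maps(prob_maps, selected, mapping):
    """Combine maps of unselected objects into the maps of selected classes.

    prob_maps: dict of object name -> 2-D probability map.
    selected:  list of object names chosen as classes.
    mapping:   dict of unselected object name -> selected class name.
    """
    class_maps = {name: prob_maps[name].copy() for name in selected}
    for obj, target in mapping.items():
        # Assumed combining rule: element-wise maximum of the two maps.
        class_maps[target] = np.maximum(class_maps[target], prob_maps[obj])
    return class_maps

prob_maps = {
    "sky":   np.array([[0.9, 0.1], [0.0, 0.0]]),
    "water": np.array([[0.0, 0.0], [0.7, 0.2]]),
    "river": np.array([[0.0, 0.0], [0.1, 0.6]]),
}
class_maps = build_class_maps(prob_maps, ["sky", "water"], {"river": "water"})
print(class_maps["water"])
```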

The image providing apparatus of claim 1,
wherein the one or more processors, by executing the one or more instructions:
generate, from the first AI network, a frequency segmentation probability map for each of a plurality of predefined frequency domains; and
generate, based on the frequency segmentation probability maps, AI metadata including frequency information that includes information on a frequency domain of the first image, and at least one frequency map corresponding to each frequency domain included in the frequency information.

The image providing apparatus of claim 1,
wherein the first image includes a plurality of frames, and the class information and the at least one class map are generated for each of the plurality of frames,
the one or more processors, by executing the one or more instructions:
define, from the plurality of frames, at least one sequence including at least one frame; and
generate, for each sequence, sequence class information indicating the classes included in the at least one frame of the sequence, and frame class information indicating the classes included in each of the at least one frame of the sequence, and
the frame class information indicates a combination of the classes included in the corresponding frame from among the classes included in the sequence class information, and has fewer bits than the sequence class information.
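
One way to see why the frame class information can use fewer bits is the following sketch. It assumes a bitmask layout in which the sequence class information has one bit per predefined object and the frame class information has one bit per class present in the sequence; this layout, the `PREDEFINED` list, and the function names are illustrative assumptions and are not taken from the claim.

```python
# Hypothetical list of predefined objects (8 entries -> 8-bit sequence info).
PREDEFINED = ["sky", "grass", "water", "building", "face", "text", "road", "tree"]

def encode_sequence_classes(frames):
    """Return the classes present anywhere in the sequence and their bitmask."""
    present = set().union(*frames)
    seq_classes = [c for c in PREDEFINED if c in present]
    seq_bits = "".join("1" if c in present else "0" for c in PREDEFINED)
    return seq_classes, seq_bits

def encode_frame_classes(frame, seq_classes):
    """One bit per sequence class: set when the frame contains that class."""
    return "".join("1" if c in frame else "0" for c in seq_classes)

frames = [{"sky", "grass"}, {"sky"}, {"sky", "water"}]
seq_classes, seq_bits = encode_sequence_classes(frames)
frame_bits = [encode_frame_classes(f, seq_classes) for f in frames]
print(seq_bits, frame_bits)
```

Under this assumed layout, each frame is described with 3 bits instead of 8, because a frame only needs to indicate which of the sequence's classes it contains.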

The image providing apparatus of claim 1,
wherein the one or more processors, by executing the one or more instructions:
generate, based on the at least one class map, a lightweight class map in which each pixel has a representative value corresponding to one of the at least one class;
generate lightweight AI metadata including the class information and the lightweight class map; and
output the encoded image and the lightweight class map through the output unit.
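
A lightweight class map of this kind can be sketched as follows, assuming the representative value of a pixel is the index of the class whose map has the highest value at that pixel. The claim does not state how the representative value is chosen, so the argmax rule and the name `to_lightweight` are assumptions.

```python
import numpy as np

def to_lightweight(class_maps):
    """Collapse per-class maps into one map of class indices.

    class_maps: list of 2-D arrays, ordered as in the class information.
    Returns a 2-D uint8 array whose pixels hold the index of the
    highest-valued class at that position (assumed representative value).
    """
    stacked = np.stack(class_maps, axis=0)      # shape: (num_classes, H, W)
    return np.argmax(stacked, axis=0).astype(np.uint8)

class_maps = [
    np.array([[0.9, 0.2], [0.4, 0.1]]),   # class 0
    np.array([[0.1, 0.8], [0.6, 0.9]]),   # class 1
]
lightweight = to_lightweight(class_maps)
print(lightweight)
```

The single index map replaces one floating-point map per class, which is what makes the metadata lighter to transmit.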

The image providing apparatus of claim 1,
wherein the first AI network includes:
a 1-1 AI network that includes at least one convolution layer and at least one max pooling layer and generates a feature map from the first image; and
a 1-2 AI network that includes: a first layer group that includes at least one convolution layer and at least one activation layer and receives and processes the feature map from the 1-1 AI network; an upscaler that upscales an output of the first layer group; and a second layer group that includes at least one convolution layer and at least one min pooling layer and receives an output of the upscaler to generate a segmentation probability map for each of the plurality of predefined objects.
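
Min pooling, as named in the claim, is less common than max pooling, so a small sketch may help. It assumes a non-overlapping 2x2 window on a single-channel feature map; the window size and the function name `min_pool_2x2` are illustrative assumptions, since the claim does not specify them.

```python
import numpy as np

def min_pool_2x2(feature_map):
    """Non-overlapping 2x2 min pooling on a 2-D feature map."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    cropped = feature_map[:h2 * 2, :w2 * 2]     # drop an odd trailing row/column
    # Split into 2x2 blocks and keep the smallest value of each block.
    return cropped.reshape(h2, 2, w2, 2).min(axis=(1, 3))

x = np.array([
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 8, 1, 0],
    [7, 6, 2, 3],
])
pooled = min_pool_2x2(x)
print(pooled)
```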

The image providing apparatus of claim 1,
wherein the first AI network is trained jointly with a second AI network, and
the second AI network is included in an apparatus that receives the encoded image and the AI metadata and decodes the encoded image, and performs image quality processing on image data corresponding to the encoded image using the AI metadata.

The image providing apparatus of claim 1,
wherein the one or more processors, by executing the one or more instructions,
generate the encoded image by performing downscale processing and encoding processing on the first image.

A method of controlling an image providing apparatus, the method comprising:
generating, using a first artificial intelligence (AI) network, AI metadata including class information that includes at least one class corresponding to a type of object included in a first image among a plurality of predefined objects, and at least one class map indicating a region corresponding to each object in the first image;
generating an encoded image by encoding the first image; and
outputting the encoded image and the AI metadata,
wherein the generating of the AI metadata includes:
generating a plurality of segmentation probability maps, one for each type of the plurality of predefined objects, by inputting the first image to the first AI network;
defining at least one class corresponding to the type of object included in the first image based on the plurality of segmentation probability maps;
generating the class information including the at least one class; and
generating the at least one class map for each of the at least one class from the plurality of segmentation probability maps.

An image reproducing apparatus comprising:
a memory storing one or more instructions;
one or more processors configured to execute the one or more instructions stored in the memory;
an input unit configured to receive an encoded image corresponding to a first image and AI metadata corresponding to the encoded image; and
an output unit,
wherein the AI metadata includes class information that includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image,
the one or more processors, by executing the one or more instructions:
generate a second image corresponding to the first image by decoding the encoded image;
generate, using a second AI network, a third image on which image quality improvement processing has been performed, from the second image and the AI metadata; and
output the third image through the output unit,
the second AI network includes a 2-1 AI network including at least one convolution layer, at least one modulation layer, and at least one activation layer, and
the at least one modulation layer processes a feature map input to the modulation layer based on a modulation parameter set generated from the AI metadata.

(Deleted)

The image reproducing apparatus of claim 12,
wherein the 2-1 AI network includes a plurality of modulation layers, and
the second AI network includes a plurality of modulation parameter generation networks respectively corresponding to the plurality of modulation layers, each of the modulation parameter generation networks generating, from the AI metadata, the modulation parameter set for the corresponding modulation layer.

The image reproducing apparatus of claim 14,
wherein the modulation parameter set includes a first operation modulation parameter that is combined with the data values of the input feature map by a multiplication operation, and a second operation modulation parameter that is combined, by an addition operation, with the result of multiplying the first operation modulation parameter by the data values of the input feature map, and
the at least one modulation layer processes the input feature map based on the first operation modulation parameter and the second operation modulation parameter.
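
The multiply-then-add processing recited above amounts to an element-wise affine transform of the feature map. The sketch below shows that arithmetic; the names `gamma` (first operation modulation parameter) and `beta` (second operation modulation parameter), and the assumption that both have the same shape as the feature map, are illustrative choices rather than details stated in the claim.

```python
import numpy as np

def modulate(feature_map, gamma, beta):
    """Scale the feature map by gamma, then shift the product by beta."""
    return gamma * feature_map + beta

# Hypothetical 1-channel, 2x2 feature map with spatially varying parameters.
feature_map = np.full((1, 2, 2), 2.0)
gamma = np.array([[[1.0, 2.0], [3.0, 4.0]]])   # multiplied with the features
beta = np.array([[[0.5, 0.5], [0.0, 0.0]]])    # added to the product
modulated = modulate(feature_map, gamma, beta)
print(modulated)
```

Because the parameters are generated from the AI metadata, regions belonging to different classes can receive different scale and shift values within the same layer.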

The image reproducing apparatus of claim 12,
wherein the 2-1 AI network includes a plurality of residual blocks,
each of the plurality of residual blocks includes:
a main stream that includes at least one convolution layer, at least one modulation layer, and at least one activation layer, and generates a residual-version processing result; and
at least one second skip processing path that bypasses at least one layer included in the corresponding block to generate a prediction-version processing result, and
the 2-1 AI network includes at least one first skip processing path that skips at least one residual block among the plurality of residual blocks to generate a prediction-version processing result.
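
The two levels of skip paths can be sketched in a few lines. The example below is a simplification under stated assumptions: the convolution is omitted, the main stream is reduced to a modulation followed by a ReLU activation, and the prediction-version and residual-version results are assumed to be combined by addition, which the claim does not state. All function names and parameter values are hypothetical.

```python
import numpy as np

def main_stream(x, gamma, beta):
    """Stand-in for conv -> modulation -> activation (convolution omitted)."""
    return np.maximum(gamma * x + beta, 0.0)    # modulation followed by ReLU

def residual_block(x, gamma, beta):
    """Second skip path: the block input bypasses the main stream."""
    return x + main_stream(x, gamma, beta)

def network(x, params):
    """First skip path: the network input bypasses all residual blocks."""
    out = x
    for gamma, beta in params:
        out = residual_block(out, gamma, beta)
    return x + out

x = np.array([1.0, -2.0])
params = [(0.5, 0.0), (1.0, 1.0)]   # hypothetical (gamma, beta) per block
result = network(x, params)
print(result)
```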

The image reproducing apparatus of claim 12,
wherein the second AI network includes:
an upscaler that receives an output of the 2-1 AI network and performs upscale processing; and
a second image quality processing unit that includes at least one machine-learned layer, receives an output of the upscaler, and generates and outputs the third image.

The image reproducing apparatus of claim 12,
wherein the AI metadata input through the input unit includes a lightweight class map obtained by lightening the at least one class map for each of the at least one class,
the one or more processors, by executing the one or more instructions,
generate, from the lightweight class map, a restored class map for each of the at least one class, and
the restored class map has, for each pixel, a value indicating whether the pixel corresponds to the corresponding class.
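
Restoring per-class maps from a lightweight class map can be sketched as a one-hot expansion. The example assumes that each pixel of the lightweight map stores a class index and that the restored maps are binary (1 where the pixel belongs to the class, 0 elsewhere); the function name `restore_class_maps` and the index encoding are assumptions made for illustration.

```python
import numpy as np

def restore_class_maps(lightweight_map, num_classes):
    """Expand a map of class indices into one binary map per class."""
    return [(lightweight_map == k).astype(np.uint8) for k in range(num_classes)]

lightweight_map = np.array([[0, 1], [1, 1]], dtype=np.uint8)
restored = restore_class_maps(lightweight_map, num_classes=2)
print(restored[0])
print(restored[1])
```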

A method of controlling an image reproducing apparatus, the method comprising:
receiving an encoded image corresponding to a first image and AI metadata corresponding to the encoded image;
generating a second image corresponding to the first image by decoding the encoded image;
generating, from the second image and the AI metadata, a third image on which image quality improvement has been performed, by using a second AI network; and
outputting the third image,
wherein the AI metadata includes class information that includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image,
the second AI network includes a 2-1 AI network including at least one convolution layer, at least one modulation layer, and at least one activation layer, and
the at least one modulation layer processes a feature map input to the corresponding modulation layer based on a modulation parameter set generated from the AI metadata.

A computer-readable recording medium storing computer program instructions that, when executed by a processor, perform a method of controlling an image reproducing apparatus, the method comprising:
receiving an encoded image corresponding to a first image and AI metadata corresponding to the encoded image;
generating a second image corresponding to the first image by decoding the encoded image;
generating, from the second image and the AI metadata, a third image on which image quality improvement has been performed, by using a second AI network; and
outputting the third image,
wherein the AI metadata includes class information that includes at least one class corresponding to a type of object included in the first image, and at least one class map indicating a region corresponding to each class in the first image,
the second AI network includes a 2-1 AI network including at least one convolution layer, at least one modulation layer, and at least one activation layer, and
the at least one modulation layer processes a feature map input to the modulation layer based on a modulation parameter set generated from the AI metadata.