KR20220067463A

KR20220067463A - Electronic apparatus and controlling method thereof

Info

Publication number: KR20220067463A
Application number: KR1020210030947A
Authority: KR
Inventors: 박승호; 문영수
Original assignee: 삼성전자주식회사
Priority date: 2020-11-17
Filing date: 2021-03-09
Publication date: 2022-05-24

Abstract

Disclosed is an electronic device that divides an input image into a plurality of regions and converts the divided regions based on different style information. The electronic device includes: a display; a memory storing a trained artificial intelligence model; and a processor that uses the trained artificial intelligence model to obtain, from a low-resolution input image below a threshold resolution, a high-resolution output image at or above the threshold resolution, and that controls the display to display the obtained high-resolution image. The processor identifies, in the low-resolution input image, a first region including first characteristic information and a second region including second characteristic information. The processor obtains a first image, in which texture processing of a first intensity is performed on the first region, and a second image, in which texture processing of a second intensity different from the first intensity is performed on the second region. The processor upscales the first image and the second image individually and obtains the high-resolution image by combining the upscaled first and second images.

Description

ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF

The present disclosure relates to an electronic device and a method for controlling the same and, more particularly, to an electronic device that changes resolution by dividing an input image into a plurality of regions, and a method for controlling the same.

An artificial intelligence system is a computer system that implements human-level intelligence: the machine learns and makes judgments on its own, and its recognition rate improves the more it is used.

Artificial intelligence technology consists of machine learning (deep learning) technology, which uses algorithms that classify and learn the features of input data on their own, and element technologies that use machine learning algorithms to simulate functions of the human brain such as cognition and judgment.

The element technologies may include, for example, at least one of: linguistic understanding technology for recognizing human language and text; visual understanding technology for recognizing objects as human vision does; reasoning/prediction technology for judging information and logically inferring and predicting; knowledge representation technology for processing human experience information as knowledge data; and motion control technology for controlling autonomous driving of a vehicle and movement of a robot.

An artificial intelligence model can be used to convert a low-resolution image into a high-resolution image.

An artificial intelligence model trained based on perceptual loss and adversarial loss may be used to generate a super-resolution image. A model that uses perceptual loss and adversarial loss can improve the overall visual quality of the reconstruction. In particular, a generative adversarial network (GAN) may be used to perform the image conversion operation and generate a high-resolution image.

However, such a model may be unable to correct an unnatural region unless it is retrained. It is also limited in switching continuous image effects. In addition, because the same image conversion method is applied to the entire image rather than varied by region, an unnatural image may be generated.

Furthermore, a typical image conversion model computes the perceptual loss for the entire image in the same feature space, so its results are monotonous and unnatural.

The present disclosure has been devised to address the above problems, and an object of the present disclosure is to provide an electronic device that divides an input image into a plurality of regions and converts the divided regions based on different style information, and a control method thereof.

To achieve the above object, an electronic device according to an embodiment includes a display, a memory storing a trained artificial intelligence model, and a processor configured to obtain, using the trained artificial intelligence model, a high-resolution output image at or above a threshold resolution from a low-resolution input image below the threshold resolution, and to control the display to display the obtained high-resolution image. The processor identifies, in the low-resolution input image, a first region including first characteristic information and a second region including second characteristic information; obtains a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region; upscales the first image and the second image respectively; and combines the upscaled first and second images to obtain the high-resolution image.
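The processing order described above (region-specific texture processing, separate upscaling, then combination) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the disclosure uses a trained artificial intelligence model, whereas here `texture` is a simple local-detail amplifier and `upscale` is nearest-neighbour repetition, both hypothetical stand-ins. The image is a grayscale list of rows, and `mask` marks the first region with 1 and the second region with 0.

```python
def box_mean(img, y, x):
    """Mean of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return sum(vals) / len(vals)


def texture(img, strength):
    """Stand-in texture processing (assumption): amplify local detail by `strength`."""
    return [[img[y][x] + strength * (img[y][x] - box_mean(img, y, x))
             for x in range(len(img[0]))]
            for y in range(len(img))]


def upscale(img, factor=2):
    """Stand-in upscaler (assumption): nearest-neighbour repetition."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def super_resolve(img, mask, s1, s2, factor=2):
    """Texture-process at two intensities, upscale each result, combine by region."""
    first = upscale(texture(img, s1), factor)    # first image, first intensity
    second = upscale(texture(img, s2), factor)   # second image, second intensity
    big_mask = upscale(mask, factor)             # region map at output resolution
    return [[first[y][x] if big_mask[y][x] else second[y][x]
             for x in range(len(first[0]))]
            for y in range(len(first))]


img = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [0, 0]]          # top-left pixel is the first region
out = super_resolve(img, mask, s1=1.0, s2=0.0)
```

With `s2 = 0.0` the second region passes through unchanged, so only the first region shows the stronger texture processing in the combined output.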

Meanwhile, the first characteristic information may include object region information and the second characteristic information may include background region information, and the processor may identify a region including the object region information as the first region and a region including the background region information as the second region.

Meanwhile, if the type of an object included in the first region is a first type, the processor may perform texture processing on the first region at an intensity corresponding to the first type, and if the type of the object included in the first region is a second type, the processor may perform texture processing on the first region at an intensity corresponding to the second type.
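One simple way to picture a type-dependent intensity is a lookup table. The object types, the intensity values, and the default below are hypothetical and chosen only for illustration; the disclosure does not specify them here.

```python
# Hypothetical mapping from object type to texture-processing intensity.
TEXTURE_INTENSITY_BY_TYPE = {
    "face": 0.2,    # assumed: keep faces smooth
    "grass": 0.9,   # assumed: emphasise fine texture
}


def intensity_for(object_type, default=0.5):
    """Return the intensity for `object_type`, or `default` for unknown types."""
    return TEXTURE_INTENSITY_BY_TYPE.get(object_type, default)


first_type_intensity = intensity_for("face")
second_type_intensity = intensity_for("grass")
```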

Meanwhile, the processor may input the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, the first intensity, identification information for the second region, and the second intensity. The first artificial intelligence model may be trained to output, when an image is input, identification information for regions of the input image having different characteristics and a texture intensity for each identified region.

Meanwhile, the identification information for the first region may include coordinate information of the first region within the low-resolution input image, and the identification information for the second region may include coordinate information of the second region within the low-resolution input image.
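As a hedged illustration of how coordinate information could identify the two regions, the sketch below assumes the coordinates form an axis-aligned box `(x0, y0, x1, y1)` with exclusive upper bounds. The disclosure only says "coordinate information", so this representation and the helper name `box_to_mask` are assumptions.

```python
def box_to_mask(height, width, box):
    """Return a 0/1 mask that is 1 inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0
             for x in range(width)]
            for y in range(height)]


# First region: a 2x2 box in a 4x4 image; second region: everything else.
first_mask = box_to_mask(4, 4, (1, 1, 3, 3))
second_mask = [[1 - v for v in row] for row in first_mask]
```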

Meanwhile, the processor may input an image including the identification information for the first region and the second region, together with the first intensity and the second intensity, into a second artificial intelligence model to obtain the texture-processed and upscaled first and second images.

Meanwhile, the second artificial intelligence model may be implemented to include a Residual-in-Residual Dense Block (RRDB) that performs a Spatial Feature Transformation (SFT).
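In the super-resolution literature, SFT is commonly described as a per-pixel affine modulation of a feature map, `gamma * F + beta`, where `gamma` and `beta` are predicted from a condition map. The sketch below assumes that form and assumes the condition is a per-pixel texture-intensity map. `condition_to_params` and its weights are hypothetical stand-ins for a small learned network, and plain Python lists stand in for tensors.

```python
def condition_to_params(style_map, scale_w=1.0, shift_w=0.5):
    """Hypothetical stand-in for the learned mapping from condition to (gamma, beta)."""
    gamma = [[1.0 + scale_w * s for s in row] for row in style_map]
    beta = [[shift_w * s for s in row] for row in style_map]
    return gamma, beta


def sft(feature, gamma, beta):
    """Element-wise affine modulation: gamma * feature + beta."""
    return [[g * f + b for f, g, b in zip(f_row, g_row, b_row)]
            for f_row, g_row, b_row in zip(feature, gamma, beta)]


feature = [[2.0, 2.0]]
style_map = [[0.0, 1.0]]   # assumed: 0 = no extra texture, 1 = strong texture
gamma, beta = condition_to_params(style_map)
modulated = sft(feature, gamma, beta)
```

Where the condition is 0 the feature passes through unchanged; where it is 1 the same feature value is scaled and shifted, which is how a single network can treat regions differently.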

Meanwhile, the second artificial intelligence model may include a layer having a plurality of parameters. A loss value may be calculated by comparing a reference image with a plurality of images obtained based on an image identified as a plurality of regions having different characteristics and on the texture intensities corresponding to those regions, and the plurality of parameters may be trained based on the calculated loss value.

Meanwhile, the second artificial intelligence model may calculate the loss value based on at least one of a reconstruction loss, an adversarial loss, or a perceptual loss.
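As a hedged illustration, one common way to combine these three losses is a weighted sum. The weights $\lambda$ are assumptions introduced here, not values from the disclosure; setting a weight to zero drops that term, which is consistent with "at least one of".

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{rec}}\,\mathcal{L}_{\text{rec}}
  + \lambda_{\text{adv}}\,\mathcal{L}_{\text{adv}}
  + \lambda_{\text{per}}\,\mathcal{L}_{\text{per}},
\qquad
\lambda_{\text{rec}},\ \lambda_{\text{adv}},\ \lambda_{\text{per}} \ge 0
```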

Meanwhile, the second artificial intelligence model may be implemented as a Generative Adversarial Network (GAN).

According to an embodiment of the present disclosure, a method for controlling an electronic device that stores an artificial intelligence model trained to obtain a high-resolution output image at or above a threshold resolution from a low-resolution input image below the threshold resolution includes: identifying, in the low-resolution input image, a first region including first characteristic information and a second region including second characteristic information; obtaining a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region; upscaling the first image and the second image respectively; and combining the upscaled first and second images to obtain the high-resolution image.

Meanwhile, the first characteristic information may include object region information and the second characteristic information may include background region information, and the identifying of the first region and the second region may include identifying a region including the object region information as the first region and a region including the background region information as the second region.

Meanwhile, the obtaining of the first image and the second image may include performing texture processing on the first region at an intensity corresponding to a first type if the type of an object included in the first region is the first type, and performing texture processing on the first region at an intensity corresponding to a second type if the type of the object included in the first region is the second type.

Meanwhile, the method may further include inputting the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, the first intensity, identification information for the second region, and the second intensity. The first artificial intelligence model may be trained to output, when an image is input, identification information for regions of the input image having different characteristics and a texture intensity for each identified region.

Meanwhile, the obtaining of the first image and the second image and the upscaling of the first image and the second image may include inputting an image including the identification information for the first region and the second region, together with the first intensity and the second intensity, into a second artificial intelligence model to obtain the texture-processed and upscaled first and second images.

FIG. 1 is a diagram for describing an image conversion operation according to an embodiment.
FIG. 2 is a diagram for describing an image conversion operation according to another embodiment.
FIG. 3 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.
FIG. 4 is a block diagram illustrating a detailed configuration of the electronic device of FIG. 3.
FIG. 5 is a flowchart illustrating an image conversion operation according to an embodiment.
FIG. 6 is a flowchart describing the image conversion operation of FIG. 5 in detail.
FIG. 7 is a diagram for describing an operation of dividing an input image into a plurality of regions.
FIG. 8 is a flowchart illustrating an operation in which an image conversion model is trained.
FIG. 9 is a diagram for describing an image conversion model according to an embodiment of the present disclosure.
FIG. 10 is a diagram for describing a calculation used in an image conversion operation.
FIG. 11 is a diagram for describing a calculation used in a training operation according to an embodiment of the present disclosure.
FIG. 12 is a diagram for describing in detail an image conversion model according to another embodiment of the present disclosure.
FIG. 13 is a diagram for describing a basic block used in an image conversion model according to an embodiment of the present disclosure.
FIG. 14 is a diagram for describing a style identification model according to an embodiment of the present disclosure.
FIG. 15 is a diagram for describing a calculation used in a training operation according to another embodiment of the present disclosure.
FIG. 16 is a diagram for describing an effect of an image conversion operation of an electronic device according to various embodiments of the present disclosure.
FIG. 17 is a diagram for comparing image conversion operations of an electronic device according to an embodiment.
FIG. 18 is a diagram for comparing image conversion operations of an electronic device according to another embodiment.
FIG. 19 is a diagram for comparing image conversion operations of an electronic device according to yet another embodiment.
FIG. 20 is a diagram for describing a method of controlling an electronic device according to an embodiment of the present disclosure.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.

The terms used in the embodiments of the present disclosure are general terms that are currently in wide use, selected where possible in consideration of their functions in the present disclosure; however, they may vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases, terms have been arbitrarily selected by the applicant, in which case their meaning will be described in detail in the corresponding part of the description. Therefore, the terms used in the present disclosure should be defined based on their meaning and the overall content of the present disclosure, not simply on their names.

In this specification, expressions such as "have," "may have," "include," or "may include" indicate the presence of a corresponding feature (e.g., a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

The expression "at least one of A and/or B" should be understood to mean "A", "B", or "A and B".

Expressions such as "1st," "2nd," "first," or "second," as used in this specification, may modify various components regardless of order and/or importance, and are used only to distinguish one component from another without limiting those components.

When a component (e.g., a first component) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component).

A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "consist of" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should not be understood to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the present disclosure, a "module" or "unit" performs at least one function or operation and may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules" or "units" may be integrated into at least one module and implemented by at least one processor (not shown), except for a "module" or "unit" that needs to be implemented in specific hardware.

In this specification, the term "user" may refer to a person who uses the electronic device or a device (e.g., an artificial intelligence electronic device) that uses the electronic device.

Hereinafter, an embodiment of the present disclosure will be described in more detail with reference to the accompanying drawings.

Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory. The processor may consist of one or more processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a digital signal processor (DSP); a graphics-dedicated processor such as a GPU or a vision processing unit (VPU); or an artificial-intelligence-dedicated processor such as a neural network processing unit (NPU). The one or more processors control input data to be processed according to a predefined operation rule or an artificial intelligence model stored in the memory. Alternatively, when the one or more processors are AI-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rule or artificial intelligence model is characterized by being created through training. Here, being created through training means that a base artificial intelligence model is trained by a learning algorithm using a large amount of training data, thereby producing a predefined operation rule or artificial intelligence model set to perform a desired characteristic (or purpose). Such training may be performed on the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

An artificial intelligence model may consist of a plurality of neural network layers. Each of the layers has a plurality of weight values and performs its neural network operation by combining the operation result of the previous layer with those weights. The weights of the plurality of neural network layers may be optimized by the training result of the artificial intelligence model. For example, the weights may be updated so that a loss value or a cost value obtained from the artificial intelligence model during training is reduced or minimized. The artificial neural network may include a deep neural network (DNN); examples include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.
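The idea of updating weights so that a loss value decreases can be illustrated with plain gradient descent on a one-weight model. This is a generic sketch, not the training procedure of the disclosure; the model `y = w * x`, the mean-squared-error loss, the learning rate, and the step count are all assumptions made for illustration.

```python
def train(xs, ys, lr=0.1, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad   # move the weight against the gradient
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = train(xs, ys)
```

After training, `w` is close to 2 and the loss is far below its value at the initial weight `w = 0.0`.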

Meanwhile, knowledge representation is a technology for automatically processing human experience information into knowledge data, and may include knowledge construction (data generation/classification), knowledge management (data utilization), and the like.

FIG. 1 is a diagram for describing an image conversion operation according to an embodiment.

Referring to FIG. 1, the image conversion model 101 may receive an input image 10 and generate an output image 20. Here, the input image 10 may be a low-resolution image and the output image 20 may be a high-resolution image. The image conversion operation performed by the image conversion model 101 may be image super-resolution, and the image conversion model 101 may correspond to a super-resolution network.

The image conversion model 101 may perform the super-resolution operation using a style identification model 102. The style identification model 102 may determine the style applied to the image conversion operation. Specifically, the style identification model 102 may hold a plurality of pieces of style information and may determine the style corresponding to the input image 10.

Here, the style information may mean a style map. The style map may also be referred to as a region map, a control map, or the like.

The image conversion model 101 may transmit information about the input image 10 to the style identification model 102, and the style identification model 102 may identify a style suitable for the input image 10 based on the received information. The style identification model 102 may then transmit the identified style to the image conversion model 101.

The image conversion model 101 may then super-resolve the input image 10 into the output image 20 based on the style identified by the style identification model 102.

FIG. 2 is a diagram for describing an image conversion operation according to another embodiment.

Referring to FIG. 2, the image conversion model 101 may receive the input image 10 and generate the output image 20. Here, the image conversion model 101 may include the style identification model 102, which may determine the style corresponding to the input image 10. Specifically, the style identification model 102 may identify a style suitable for the input image 10, and the image conversion model 101 may super-resolve the input image 10 into the output image 20 based on the identified style.

FIG. 3 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 3, the electronic device 100 may include a memory 110 and a processor 120.

The electronic device 100 according to various embodiments of the present specification may include, for example, at least one of a smartphone, a tablet PC, a mobile phone, a desktop PC, a laptop PC, a PDA, or a portable multimedia player (PMP). In some embodiments, the electronic device 100 may include, for example, at least one of a television, a digital video disk (DVD) player, or a media box (e.g., Samsung HomeSync™, Apple TV™, or Google TV™).

The memory 110 may be implemented as internal memory included in the processor 120, such as ROM (e.g., electrically erasable programmable read-only memory (EEPROM)) or RAM, or as memory separate from the processor 120. In the latter case, the memory 110 may be implemented as memory embedded in the electronic device 100 or as memory detachable from the electronic device 100, depending on the purpose of data storage. For example, data for driving the electronic device 100 may be stored in memory embedded in the electronic device 100, and data for extended functions of the electronic device 100 may be stored in memory detachable from the electronic device 100.

Meanwhile, the memory embedded in the electronic device 100 may be implemented as at least one of a volatile memory (e.g., dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM)) or a non-volatile memory (e.g., one time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, flash memory (e.g., NAND flash or NOR flash), a hard drive, or a solid state drive (SSD)). The memory detachable from the electronic device 100 may be implemented as a memory card (e.g., compact flash (CF), secure digital (SD), micro secure digital (Micro-SD), mini secure digital (Mini-SD), extreme digital (xD), or multi-media card (MMC)), an external memory connectable to a USB port (e.g., a USB memory), or the like.

프로세서(120)는 전자 장치(100)의 전반적인 제어 동작을 수행할 수 있다. 구체적으로, 프로세서(120)는 전자 장치(100)의 전반적인 동작을 제어하는 기능을 한다.The processor 120 may perform an overall control operation of the electronic device 100 . Specifically, the processor 120 functions to control the overall operation of the electronic device 100 .

The processor 120 may be implemented as a digital signal processor (DSP) that processes digital signals, a microprocessor, or a timing controller (TCON). However, the processor 120 is not limited thereto, and may include one or more of a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a graphics-processing unit (GPU), a communication processor (CP), or an ARM processor, or may be defined by the corresponding term. In addition, the processor 120 may be implemented as a System on Chip (SoC) or large scale integration (LSI) in which a processing algorithm is embedded, or in the form of a field programmable gate array (FPGA). In addition, the processor 120 may perform various functions by executing computer executable instructions stored in the memory 110.

The processor 120 may obtain, by using the learned artificial intelligence model, a high-resolution output image having a resolution equal to or greater than a threshold resolution from a low-resolution input image having a resolution less than the threshold resolution, and may control the display 140 to display the obtained high-resolution image. In addition, the processor 120 may identify, in the low-resolution input image, a first region including first characteristic information and a second region including second characteristic information; obtain a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region; upscale the first image and the second image respectively; and combine the upscaled first and second images to obtain the high-resolution image.

여기서, 인공 지능 모델은 임계 해상도 미만의 저해상도 이미지를 임계 해상도 이상의 고해상도 이미지로 변환하는 모델일 수 있다. 여기서, 임계 해상도는 사용자의 설정에 의해 변경 가능하다. 예를 들어, 프로세서(120)는 제1 해상도의 입력 이미지를 제1 해상도보다 큰 크기의 제2 해상도로 변경할 수 있다. 여기서, 프로세서(120)는 생성된 고해상도 이미지를 표시하도록 디스플레이(140)를 제어할 수 있다.Here, the artificial intelligence model may be a model that converts a low-resolution image less than the threshold resolution into a high-resolution image greater than or equal to the threshold resolution. Here, the threshold resolution can be changed by a user's setting. For example, the processor 120 may change the input image having the first resolution to the second resolution having a size larger than the first resolution. Here, the processor 120 may control the display 140 to display the generated high-resolution image.

일 실시 예에 따라, 인공 지능 모델은 전자 장치(100)의 메모리(110)에 저장될 수 있다. 따라서, 프로세서(120)는 메모리(110)에 저장된 인공 지능 모델을 이용하여 저해상도 이미지를 고해상도 이미지로 변환할 수 있다.According to an embodiment, the artificial intelligence model may be stored in the memory 110 of the electronic device 100 . Accordingly, the processor 120 may convert the low-resolution image into a high-resolution image by using the artificial intelligence model stored in the memory 110 .

다른 실시 예에 따라, 인공 지능 모델은 외부 서버에 저장될 수 있다. 따라서, 프로세서(120)는 통신 인터페이스(130)를 통해 저해상도 이미지를 외부 서버에 전송할 수 있다. 여기서, 외부 서버는 수신된 저해상도 이미지를 고해상도 이미지로 변환할 수 있으며, 변환된 고해상도 이미지를 다시 전자 장치(100)에 전송할 수 있다. 프로세서(120)는 통신 인터페이스(130)를 통해 외부 서버로부터 고해상도 이미지를 수신할 수 있다. 그리고, 프로세서(120)는 수신된 고해상도 이미지를 표시하도록 디스플레이(140)를 제어할 수 있다.According to another embodiment, the artificial intelligence model may be stored in an external server. Accordingly, the processor 120 may transmit the low-resolution image to the external server through the communication interface 130 . Here, the external server may convert the received low-resolution image into a high-resolution image, and transmit the converted high-resolution image back to the electronic device 100 . The processor 120 may receive a high-resolution image from an external server through the communication interface 130 . In addition, the processor 120 may control the display 140 to display the received high-resolution image.

Here, the processor 120 may identify (or distinguish) the first region and the second region in the low-resolution input image. Here, the first region may mean a region, within the entire region of the low-resolution input image, in which the first characteristic information is identified, and the second region may mean a region, within the entire region of the low-resolution input image, in which the second characteristic information is identified.

Meanwhile, the first characteristic information may include object region information and the second characteristic information may include background region information. The processor 120 may identify a region including the object region information as the first region and a region including the background region information as the second region.

예를 들어, 프로세서(120)는 저해상도 입력 이미지의 전체 영역에서 오브젝트 영역 및 백그라운드 영역을 식별할 수 있다. 여기서, 오브젝트 영역은 사람, 사물 등이 포함된 영역을 의미할 수 있다. 또한, 오브젝트 영역은 텍스처 처리가 많이 필요한 영역을 의미할 수 있다. 여기서, 백그라운드 영역은 하늘, 구름, 바다 등이 포함된 영역을 의미할 수 있다. 또한, 백그라운드 영역은 텍스처 처리가 많이 필요하지 않은 영역을 의미할 수 있다.For example, the processor 120 may identify the object region and the background region from the entire region of the low-resolution input image. Here, the object area may mean an area including people, things, and the like. Also, the object area may mean an area requiring a lot of texture processing. Here, the background area may mean an area including the sky, clouds, sea, and the like. Also, the background area may mean an area that does not require much texture processing.

여기서, 특성 정보는 오브젝트 영역 정보 또는 백그라운드 영역 정보를 포함할 수 있다. 여기서, 오브젝트 영역 정보는 오브젝트를 포함하고 있음을 나타내는 특성을 의미할 수 있으며, 백그라운드 영역 정보는 백그라운드를 포함하고 있음을 나타내는 특성을 의미할 수 있다.Here, the characteristic information may include object area information or background area information. Here, the object region information may mean a characteristic indicating that an object is included, and the background region information may indicate a characteristic indicating that the background is included.

여기서, 프로세서(120)는 저해상도 입력 이미지에 기초하여 특성 정보를 획득할 수 있다. 구체적으로, 프로세서(120)는 저해상도 입력 이미지를 분석하여 위치별 특성 정보를 판단할 수 있다. 그리고, 프로세서(120)는 판단된 특성 정보에 기초하여 저해상도 입력 이미지의 전체 영역을 복수의 영역으로 구분할 수 있다.Here, the processor 120 may acquire characteristic information based on the low-resolution input image. Specifically, the processor 120 may analyze the low-resolution input image to determine characteristic information for each location. In addition, the processor 120 may divide the entire region of the low-resolution input image into a plurality of regions based on the determined characteristic information.

예를 들어, 프로세서(120)는 저해상도 이미지를 분석하여 위치에 따라 오브젝트 특성을 갖는 위치 및 배경 특성을 갖는 위치를 획득할 수 있다. 그리고, 프로세서(120)는 획득된 오브젝트 특성을 갖는 위치 및 배경 특성을 갖는 위치에 기초하여 복수의 영역을 식별할 수 있다. 구체적으로, 프로세서(120)는 오브젝트 특성을 갖는 위치들을 그룹핑하여 하나의 영역으로 결정하고, 배경 특성을 갖는 위치들을 그룹핑하여 하나의 영역으로 결정할 수 있다.For example, the processor 120 may analyze the low-resolution image to obtain a position having an object characteristic and a position having a background characteristic according to a position. In addition, the processor 120 may identify the plurality of regions based on the obtained position having the object characteristic and the position having the background characteristic. Specifically, the processor 120 may group positions having object characteristics to determine one region, and group positions having background characteristics to determine one region.
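
The grouping step described above can be illustrated with a minimal sketch. It assumes that an earlier analysis step has already produced a per-position label map; the label strings and the function name `group_positions` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def group_positions(label_map):
    """Group pixel positions by their characteristic label.

    label_map is a 2D list of labels such as "object" or "background",
    assumed here to be the output of some image analysis step.
    Returns a dict mapping each label to a list of (row, column) positions.
    """
    regions = defaultdict(list)
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            regions[label].append((y, x))
    return dict(regions)


# Hypothetical 2x3 label map for illustration only.
label_map = [
    ["background", "background", "object"],
    ["background", "object", "object"],
]
regions = group_positions(label_map)
```

Each key of `regions` corresponds to one region, so the same sketch extends to three or more regions when more labels are present.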

여기서, 설명의 편의를 위해 2개의 영역으로 구분하는 실시 예를 기재하였지만, 사용자 설정에 따라서 2개의 영역이 아닌 3개 이상의 영역으로 구분될 수 있다.Here, although an embodiment in which two regions are divided for convenience of description has been described, the region may be divided into three or more regions instead of two regions according to user settings.

Meanwhile, when the type of the object included in the first region is a first type, the processor 120 may perform texture processing of an intensity corresponding to the first type on the first region, and when the type of the object included in the first region is a second type, the processor 120 may perform texture processing of an intensity corresponding to the second type on the first region.

여기서, 프로세서(120)는 오브젝트 특성을 갖는 영역에서 오브젝트 타입에 기초하여 오브젝트 영역을 복수의 영역으로 구분할 수 있다. 그리고, 오브젝트 타입이 상이한 영역이 식별된 경우, 프로세서(120)는 오브젝트 타입에 따라 상이한 강도의 텍스처 처리를 수행할 수 있다.Here, the processor 120 may divide the object region into a plurality of regions based on the object type in the region having object characteristics. In addition, when regions having different object types are identified, the processor 120 may perform texture processing of different intensity according to the object type.

For example, the processor 120 may distinguish, within the object region, a region including a person from a region including a building. The processor 120 may then perform texture processing of an intensity corresponding to a person (e.g., level 3) on the region including the person, and texture processing of an intensity corresponding to a building (e.g., level 5) on the region including the building.
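
The type-dependent intensity selection in this example can be sketched as a simple lookup. Only the person (level 3) and building (level 5) values come from the text; the table name, the fallback level, and the region identifiers below are assumptions made for illustration.

```python
# Lookup from object type to texture intensity level.
# Only "person" -> 3 and "building" -> 5 are taken from the description.
INTENSITY_BY_TYPE = {"person": 3, "building": 5}

# Assumed fallback for object types that are not listed.
DEFAULT_INTENSITY = 1


def texture_intensity(object_type):
    """Return the texture intensity level for one object type."""
    return INTENSITY_BY_TYPE.get(object_type, DEFAULT_INTENSITY)


def intensities_for_regions(region_types):
    """Map {region id: object type} to {region id: intensity level}."""
    return {rid: texture_intensity(t) for rid, t in region_types.items()}


# Hypothetical regions found inside the object region.
levels = intensities_for_regions({"r1": "person", "r2": "building", "r3": "tree"})
```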

Here, intensity information such as the first intensity and the second intensity may be described as style information. A different intensity for texture processing may mean a different style. The style information may include information indicating which processing is to be performed on a specific region. For example, first style information may include information indicating that texture processing of the first intensity is to be performed, and second style information may include information indicating that texture processing of the second intensity is to be performed. The processor 120 may obtain the first style information and the second style information based on, respectively, the first characteristic information corresponding to the first region and the second characteristic information corresponding to the second region included in the input image 10. The processor 120 may input the first style information and the second style information into the artificial intelligence model to obtain a high-resolution image corresponding to the first region and a high-resolution image corresponding to the second region, and may obtain the output image 20 based on these two high-resolution images.

여기서, 프로세서(120)는 입력 이미지(10)를 획득할 수 있다. 여기서, 입력 이미지(10)는 저해상도 이미지를 의미할 수 있다.Here, the processor 120 may acquire the input image 10 . Here, the input image 10 may mean a low-resolution image.

또한, 프로세서(120)는 획득된 입력 이미지(10)에 대응되는 전체 영역을 복수의 영역으로 구분(또는 식별)할 수 있다. 구체적으로, 프로세서(120)는 입력 이미지(10)를 분석하여 영역별로 특성이 상이 한지 판단할 수 있다. 입력 이미지(10)에 포함된 복수의 영역의 특성이 다르다면 상이한 방식으로 고해상도 변환 동작을 수행할 필요성이 있기 때문이다. 따라서, 프로세서(120)는 입력 이미지(10)에 포함된 서로 다른 특성을 갖는 복수의 영역을 구분할 수 있다.Also, the processor 120 may divide (or identify) the entire region corresponding to the acquired input image 10 into a plurality of regions. Specifically, the processor 120 may analyze the input image 10 to determine whether the characteristics are different for each region. This is because, if characteristics of a plurality of regions included in the input image 10 are different, it is necessary to perform a high-resolution conversion operation in a different manner. Accordingly, the processor 120 may distinguish a plurality of regions having different characteristics included in the input image 10 .

구체적으로, 프로세서(120)는 입력 이미지(10)에서 오브젝트를 식별할 수 있으며, 식별된 오브젝트에 기초하여 복수의 영역을 구분할 수 있다.Specifically, the processor 120 may identify an object in the input image 10 , and may classify a plurality of regions based on the identified object.

한편, 프로세서(120)는 입력 이미지(10)에 포함된 제1 오브젝트 및 제2 오브젝트를 식별할 수 있고, 제1 오브젝트의 위치 및 제2 오브젝트의 위치에 기초하여 제1 오브젝트를 포함하는 제1 영역 및 제2 오브젝트를 포함하는 제2 영역을 구분할 수 있다. 여기서, 특성 정보는 구분된(또는 식별된) 영역에 대응되는 특성을 포함할 수 있다.Meanwhile, the processor 120 may identify the first object and the second object included in the input image 10 , and based on the position of the first object and the position of the second object, the first object including the first object The region and the second region including the second object may be divided. Here, the characteristic information may include a characteristic corresponding to the divided (or identified) area.

한편, 제1 특성 정보는 제1 오브젝트의 특성 정보를 포함할 수 있고, 제2 특성 정보는 제2 오브젝트의 특성 정보를 포함할 수 있다.Meanwhile, the first characteristic information may include characteristic information of the first object, and the second characteristic information may include characteristic information of the second object.

예를 들어, 특성 정보는 오브젝트의 종류, 이름을 포함할 수 있다. 제1 오브젝트가 배경 오브젝트이면, 제1 특성 정보는 배경 특성일 수 있으며, 추가적으로 하늘 배경, 숲 배경, 건물 배경, 땅 배경 등 특정된 배경을 포함할 수 있다.For example, the characteristic information may include the type and name of the object. When the first object is a background object, the first characteristic information may be a background characteristic, and may additionally include a specific background such as a sky background, a forest background, a building background, and a ground background.

In addition, when the second object is a thing object, the second characteristic information may be a thing characteristic, and may additionally indicate a specific kind of thing such as a person, an animal, a building, or a metal surface.

한편, 제1 스타일 정보는 기 결정된 이미지 스타일의 제1 레벨에 기초하여 제1 영역에 대응되는 이미지를 변환하기 위한 정보를 포함할 수 있고, 제2 스타일 정보는 기 결정된 이미지 스타일의 제1 레벨과 상이한 제2 레벨에 기초하여 제2 영역에 대응되는 이미지를 변환하기 위한 정보를 포함할 수 있다.Meanwhile, the first style information may include information for transforming an image corresponding to the first region based on a first level of a predetermined image style, and the second style information includes a first level and a predetermined image style. Information for transforming an image corresponding to the second region based on a different second level may be included.

여기서, 기 결정된 이미지 스타일은 인공 지능 모델이 고해상도 이미지를 생성하면서 추가적으로 고려할 스타일을 의미할 수 있다. 여기서, 인공 지능 모델은 복수의 이미지 스타일 중 적어도 하나의 스타일에 기초하여 고해상도 이미지를 생성할 수 있다. 예를 들어, 기 결정된 이미지 스타일은 질감(texture) 효과를 반영하는 스타일을 의미할 수 있다.Here, the predetermined image style may mean a style to be additionally considered while the artificial intelligence model generates a high-resolution image. Here, the artificial intelligence model may generate a high-resolution image based on at least one style among a plurality of image styles. For example, the predetermined image style may mean a style reflecting a texture effect.

In addition, the predetermined image style may be reflected in the image conversion operation at a plurality of levels. Here, a level may mean information indicating the degree to which the predetermined image style is reflected. For example, a style reflecting a texture effect may be reflected in the image conversion operation at three levels. Here, the first level may be a level that reflects the texture effect minimally or not at all, the second level may be a level that reflects the texture effect to an intermediate degree, and the third level may be a level that reflects the texture effect maximally. Accordingly, the processor 120 may identify the level corresponding to the obtained characteristic information and perform the image conversion operation based on the identified level.
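
As a rough illustration of how three levels could control the degree of a texture effect, the sketch below blends a detail layer into a base image with a level-dependent weight. The weights 0.0, 0.5, and 1.0 and the additive blending rule are assumptions chosen for illustration; the disclosure does not specify how a level is applied numerically.

```python
# Assumed weights: level 1 = no texture, level 2 = intermediate, level 3 = full.
LEVEL_WEIGHT = {1: 0.0, 2: 0.5, 3: 1.0}


def apply_texture_level(base, detail, level):
    """Add a texture detail layer to a base image, scaled by the level weight.

    base and detail are 2D lists of floats with the same shape.
    """
    w = LEVEL_WEIGHT[level]
    return [
        [b + w * d for b, d in zip(base_row, detail_row)]
        for base_row, detail_row in zip(base, detail)
    ]


# Hypothetical 1x2 base image and detail layer.
base = [[10.0, 20.0]]
detail = [[4.0, -2.0]]
level_1 = apply_texture_level(base, detail, 1)
level_2 = apply_texture_level(base, detail, 2)
level_3 = apply_texture_level(base, detail, 3)
```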

The style information may include a predetermined image style or level information of the predetermined image style. Here, when no level corresponding to the predetermined image style exists, the style information may not include level information. In addition, the level information may already be reflected in the predetermined image style itself. For example, the style information may include at least one of a first style reflecting a texture effect of a first level, a second style reflecting a texture effect of a second level, or a third style reflecting a texture effect of a third level.

한편, 인공 지능 모델은 복수의 이미지 스타일을 포함하는 스타일 식별 모델(102) 및 입력 이미지(10)를 출력 이미지(20)로 변환하는 이미지 변환 모델(101)을 포함할 수 있다.Meanwhile, the artificial intelligence model may include a style identification model 102 including a plurality of image styles and an image conversion model 101 for converting an input image 10 into an output image 20 .

Meanwhile, the processor 120 may input the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, a first intensity that is the texture intensity with which the first region is to be processed, identification information for the second region, and a second intensity that is the texture intensity with which the second region is to be processed. The first artificial intelligence model may be trained such that, when an image is input, it outputs identification information for regions of the input image having different characteristics and a texture intensity for each identified region.

Here, the first artificial intelligence model may be the style identification model 102 of FIG. 12. Here, the first artificial intelligence model may receive an image and output regions and the texture intensity corresponding to each region. Specifically, the first artificial intelligence model may divide the image into a plurality of regions, obtain a characteristic corresponding to each divided region, and obtain a texture intensity corresponding to the obtained characteristic. For example, the first artificial intelligence model may divide the image into a first region and a second region, and determine that the first region requires texture processing of the first intensity and the second region requires texture processing of the second intensity.

여기서, 제1 영역에 대한 식별 정보는 제1 영역의 위치 정보 또는 좌표 정보를 의미할 수 있다. 여기서, 제2 영역에 대한 식별 정보는 제2 영역의 위치 정보 또는 좌표 정보를 의미할 수 있다.Here, the identification information on the first region may mean location information or coordinate information of the first region. Here, the identification information on the second region may mean location information or coordinate information of the second region.

Meanwhile, the identification information for the first region may include coordinate information for the first region included in the low-resolution input image, and the identification information for the second region may include coordinate information for the second region included in the low-resolution input image.

여기서, 좌표 정보는 이미지 상의 위치 정보를 의미할 수 있다.Here, the coordinate information may mean location information on an image.
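
One possible form of such coordinate information is a bounding box. The sketch below derives a box from a binary region mask; the (x_min, y_min, x_max, y_max) representation and the function name `bounding_box` are assumptions, since the disclosure only states that the identification information may be location or coordinate information on the image.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero cells in a 2D mask.

    Returns None when the mask contains no nonzero cell.
    """
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 3x4 mask marking the first region with 1.
first_region_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
first_region_box = bounding_box(first_region_mask)
empty_box = bounding_box([[0, 0], [0, 0]])
```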

Meanwhile, the processor 120 may input an image including the identification information for the first region and the second region, together with the first intensity and the second intensity, into a second artificial intelligence model to obtain the first and second images on which texture processing and upscaling have been performed.

Here, the processor 120 may input identification information for each of the plurality of regions and texture intensity information for each of the plurality of regions into the second artificial intelligence model. Here, the second artificial intelligence model may be the image conversion model 101 of FIG. 12.

여기서, 제2 인공 지능 모델은 제1 영역에 대하여 제1 강도의 텍스처 처리를 수행하여 제1 이미지를 획득할 수 있다. 그리고, 제2 인공 지능 모델은 제2 영역에 대하여 제2 강도의 텍스처 처리를 수행하여 제2 이미지를 획득할 수 있다.Here, the second artificial intelligence model may acquire the first image by performing texture processing of the first intensity on the first region. In addition, the second artificial intelligence model may obtain a second image by performing texture processing of a second intensity on the second region.

여기서, 제2 인공 지능 모델은 획득된 제1 이미지를 업스케일링하여 업스케일링된 제1 이미지를 획득할 수 있다. 그리고, 제2 인공 지능 모델은 획득된 제2 이미지를 업스케일링하여 업스케일링된 제2 이미지를 획득할 수 있다.Here, the second artificial intelligence model may acquire the upscaled first image by upscaling the acquired first image. In addition, the second artificial intelligence model may acquire an upscaled second image by upscaling the acquired second image.
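
The sequence of upscaling each processed image and then combining the results can be sketched as follows. Nearest-neighbor upscaling and mask-based selection are stand-ins chosen for simplicity; in the disclosure the upscaling is performed by the learned second artificial intelligence model, and the helper names below are hypothetical.

```python
def upscale_nearest(image, factor):
    """Upscale a 2D list by an integer factor using nearest-neighbor repetition."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def combine(first, second, first_mask):
    """Take pixels from `first` where the mask is nonzero, otherwise from `second`."""
    return [
        [f if m else s for f, s, m in zip(f_row, s_row, m_row)]
        for f_row, s_row, m_row in zip(first, second, first_mask)
    ]


# Hypothetical 2x2 texture-processed images and a mask of the first region.
first_image = [[1, 1], [1, 1]]
second_image = [[9, 9], [9, 9]]
first_region_mask = [[1, 0], [0, 0]]

high_res = combine(
    upscale_nearest(first_image, 2),
    upscale_nearest(second_image, 2),
    upscale_nearest(first_region_mask, 2),
)
```

Upscaling the mask with the same factor keeps the region boundary aligned with the upscaled images.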

Meanwhile, the second artificial intelligence model may be implemented to include a Residual-in-Residual Dense Block (RRDB) that performs spatial feature transformation (SFT). Specifically, the image conversion model 101 included in the artificial intelligence model may use an RRDB including an SFT layer as its basic block. A detailed description thereof is given later with reference to FIG. 13.
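
Spatial feature transformation is commonly described as an element-wise affine modulation of a feature map, in which each position is multiplied by a scale and offset by a shift derived from a condition such as a style map. The sketch below shows only that modulation on a single-channel feature map; how the scale and shift are produced, and how the SFT layer sits inside the RRDB, are not shown and would follow FIG. 13 of the disclosure. The function and variable names are hypothetical.

```python
def sft_modulate(features, scale, shift):
    """Apply element-wise affine modulation: out = features * scale + shift.

    features, scale, and shift are 2D lists of floats with the same shape.
    In an SFT layer the scale and shift would be predicted from a condition
    (for example a per-region style map); here they are given directly.
    """
    return [
        [f * g + b for f, g, b in zip(f_row, g_row, b_row)]
        for f_row, g_row, b_row in zip(features, scale, shift)
    ]


# Hypothetical 2x2 feature map; the second row belongs to a region
# that receives stronger modulation than the first row.
features = [[1.0, 2.0], [3.0, 4.0]]
scale = [[1.0, 1.0], [2.0, 2.0]]
shift = [[0.0, 0.0], [0.5, 0.5]]
modulated = sft_modulate(features, scale, shift)
```

Because the scale and shift vary by position, different regions of the same feature map can be processed with different intensities in a single pass.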

Here, the image conversion model 101 and the style identification model 102 may be implemented as separate artificial intelligence models. Because each model performs its own distinct operation, the data processing speed may be improved or the operation algorithm may be simplified.

여기서, 이미지 변환 모델(101)은 저해상도 이미지를 고해상도 이미지로 변환하는 동작을 수행할 수 있으며, 변환 동작에서 스타일 식별 모델(102)에서 식별된 스타일 정보를 추가적으로 고려할 수 있다.Here, the image conversion model 101 may perform an operation of converting a low-resolution image into a high-resolution image, and may additionally consider the style information identified in the style identification model 102 in the conversion operation.

For example, the style identification model 102 may generate style information that reflects the texture effect minimally in the first region and maximally in the second region, and may transmit the generated style information to the image conversion model 101. Here, based on the style information received from the style identification model 102, the image conversion model 101 may convert the image corresponding to the first region into a high-resolution image while reflecting the texture effect minimally, and may convert the image corresponding to the second region into a high-resolution image while reflecting the texture effect maximally.

In addition, the processor 120 may obtain the first characteristic information and the second characteristic information based on the image conversion model 101, and may obtain, based on the style identification model 102, first style information corresponding to the first characteristic information and second style information corresponding to the second characteristic information.

한편, 프로세서(120)는 제1 스타일 정보에 기초하여 제1 영역을 저해상도의 제1 이미지로 변환할 수 있고, 제2 스타일 정보에 기초하여 제2 영역을 저해상도 제2 이미지로 변환할 수 있고, 제1 이미지를 업스케일링하여 제1 영역에 대응되는 고해상도 이미지를 획득할 수 있고, 제2 이미지를 업스케일링하여 제2 영역에 대응되는 고해상도 이미지를 획득할 수 있다.Meanwhile, the processor 120 may convert the first region into a low-resolution first image based on the first style information, and convert the second region into a low-resolution second image based on the second style information, A high-resolution image corresponding to the first area may be obtained by upscaling the first image, and a high-resolution image corresponding to the second area may be obtained by upscaling the second image.

여기서, 프로세서(120)는 이미지 변환 모델(101)에 기초하여 이미지 변환 동작을 수행할 수 있는데, 저해상도 처리 동작, 업스케일링 동작 및 고해상도 처리 동작을 수행할 수 있다. 여기서, 저해상도 처리 동작은 업스케일링 동작 이전에 입력 이미지(10)를 변환하는 동작을 의미할 수 있다. 여기서, 저해상도 처리 동작은 도 12의 저해상도 처리부(1201)에 의하여 수행될 수 있다. 여기서, 업스케일링 동작은 도 12의 업스케일링부(1202)에 의하여 수행될 수 있다. 여기서, 고해상도 처리 동작은 도 12의 고해상도 처리부(1203)에 의하여 수행될 수 있다.Here, the processor 120 may perform an image conversion operation based on the image conversion model 101 , and may perform a low-resolution processing operation, an upscaling operation, and a high-resolution processing operation. Here, the low-resolution processing operation may refer to an operation of converting the input image 10 before the upscaling operation. Here, the low-resolution processing operation may be performed by the low-resolution processing unit 1201 of FIG. 12 . Here, the upscaling operation may be performed by the upscaling unit 1202 of FIG. 12 . Here, the high-resolution processing operation may be performed by the high-resolution processing unit 1203 of FIG. 12 .

여기서, 이미지 변환 모델(101) 및 스타일 식별 모델(102)을 이용한 다양한 실시 예는 도 1 및 도 2에서 기재하였다. 또한, 구체적인 동작은 도 12에서 후술한다.Here, various embodiments using the image conversion model 101 and the style identification model 102 have been described in FIGS. 1 and 2 . In addition, a specific operation will be described later with reference to FIG. 12 .

Meanwhile, the above description relates to the operation of converting the input image 10 into the output image 20. The image conversion model 101 and the style identification model 102 may be used in the conversion operation, and the parameters that the image conversion model 101 uses in the image conversion operation may be final parameters that have already been learned. That is, the image conversion model 101 may have already completed learning based on a plurality of input images and reference images corresponding to the plurality of input images. The electronic device 100 may perform the image conversion operation based on the trained image conversion model 101.

Meanwhile, the second artificial intelligence model may include a layer having a plurality of parameters. A loss value may be calculated by comparing, with a reference image, a plurality of images obtained based on an image identified as having a plurality of regions with different characteristics and on the texture intensities corresponding to the plurality of regions, and the plurality of parameters may be trained based on the calculated loss value.

A process by which the image conversion model 101 is trained is described below. The artificial intelligence model may additionally use a loss value calculation model 103 for training.

Specifically, the artificial intelligence model may include a style identification model 102 including a plurality of image styles, an image conversion model 101 that converts the input image 10 into the output image 20, and a loss value calculation model 103 used in a training operation for determining a plurality of parameters corresponding to the image conversion model 101. The image conversion model 101 may convert a training input image into a training output image based on style information, the loss value calculation model 103 may obtain a loss corresponding to the training output image based on the style information, and the image conversion model 101 may determine the plurality of parameters based on the loss.

Meanwhile, the second artificial intelligence model may calculate the loss value based on at least one of a reconstruction loss, an adversarial loss, or a perceptual loss.

Meanwhile, the second artificial intelligence model may be implemented as a Generative Adversarial Network (GAN).

Here, the perceptual loss may be obtained based on the style information, and the image conversion model 101 may determine the plurality of parameters based on at least one of the losses.

Here, the images used in the loss value acquisition operation may be the output image 20 generated by the image conversion model 101 and the reference image 30 corresponding to the input image 10. The reference image 30 may be a high-resolution version of the input image 10 and may serve as the ground truth.

Here, the reconstruction loss may be information including a difference value between the output image 20 and the reference image 30. The specific calculation of the reconstruction loss will be described later with reference to Equation 1111 of FIG. 11.
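As a rough illustration of a difference-based reconstruction loss, the following sketch computes the mean absolute per-pixel difference between an output image and a reference image. The choice of an L1 (mean absolute) difference is an assumption made only for illustration; the exact form is given by Equation 1111 of FIG. 11, which is not reproduced in this passage. The function name `reconstruction_loss` and the list-of-rows image representation are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def reconstruction_loss(output: Image, reference: Image) -> float:
    """Mean absolute per-pixel difference (an assumed L1 form, not Equation 1111)."""
    if len(output) != len(reference) or any(
        len(a) != len(b) for a, b in zip(output, reference)
    ):
        raise ValueError("output and reference must have the same shape")
    total = 0.0
    count = 0
    for row_out, row_ref in zip(output, reference):
        for o, r in zip(row_out, row_ref):
            total += abs(o - r)
            count += 1
    return total / count


# Toy 2x2 example: two of the four pixels differ by 0.5 each.
output_image = [[0.0, 0.5], [1.0, 0.25]]
reference_image = [[0.0, 1.0], [1.0, 0.75]]
loss = reconstruction_loss(output_image, reference_image)
```

A loss of zero would indicate that the output image matches the reference image pixel for pixel.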

Here, the perceptual loss may refer to a loss between the feature maps obtained after passing the training images (at least one of the input image 10, the output image 20, or the reference image 30) through a separate deep-learning-based image classification network. Here, the image classification network may refer to an image conversion network corresponding to the style information. For example, the image conversion model 101 may use a first image conversion network to reflect first style information. In this case, the loss value calculation model 103 may obtain a difference value between the output image 20 and the reference image 30 based on the first image conversion network. The specific calculation will be described later with reference to Equations 1121 and 1122 of FIG. 11.

Here, the adversarial loss may refer to a loss obtained based on a Generative Adversarial Network (GAN). The adversarial loss may be obtained based on the difference between the probability of judging a real image (the reference image) to be real and the probability of judging a fake image (the output image) to be fake. The specific calculation will be described later with reference to Equations 1131, 1132, and 1133 of FIG. 11.

Meanwhile, the loss value calculation model 103 may obtain a discriminator loss based on a discriminator included in the loss value calculation model 103, and the image conversion model 101 may determine the plurality of parameters based on the discriminator loss.

Here, the discriminator may refer to a classification network used in a Generative Adversarial Network (GAN). The discriminator may perform an operation of judging a real image (the reference image) to be real and judging a fake image (the output image) to be fake.

Here, the discriminator loss may refer to a loss obtained in the process of training the discrimination operation of the discriminator itself.

The adversarial loss and the discriminator loss have in common that both are obtained from the operation of judging a real image (the reference image) to be real and a fake image (the output image) to be fake. They differ in that the adversarial loss is obtained in the image conversion operation, whereas the discriminator loss is obtained in the discrimination operation. The specific calculation will be described later with reference to Equations 1501, 1502, and 1503 of FIG. 15.
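The distinction between the two losses can be sketched with the widely used binary cross-entropy GAN formulation. This is an assumption for illustration only: the actual expressions are those of Equations 1131 to 1133 of FIG. 11 and Equations 1501 to 1503 of FIG. 15, which are not reproduced here. The function names and the probability arguments are hypothetical; each probability stands for the discriminator's estimate that an image is real.

```python
import math

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def discriminator_loss(p_real_on_reference: float, p_real_on_output: float) -> float:
    """Loss used to train the discriminator itself (assumed cross-entropy form).

    It is small when the reference image is judged real and the output
    image is judged fake.
    """
    return -(
        math.log(p_real_on_reference + EPS)
        + math.log(1.0 - p_real_on_output + EPS)
    )


def adversarial_loss(p_real_on_output: float) -> float:
    """Loss used to train the image conversion model (assumed non-saturating form).

    It is small when the discriminator judges the output image to be real.
    """
    return -math.log(p_real_on_output + EPS)


# A confident, correct discriminator has a lower discriminator loss ...
good_d = discriminator_loss(p_real_on_reference=0.9, p_real_on_output=0.1)
weak_d = discriminator_loss(p_real_on_reference=0.6, p_real_on_output=0.4)

# ... while the conversion model is rewarded when its output looks real.
convincing = adversarial_loss(p_real_on_output=0.9)
unconvincing = adversarial_loss(p_real_on_output=0.1)
```

Both functions consume the same discriminator probabilities, which reflects the common ground noted above; they differ in which model's parameters the resulting loss is used to update.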

Meanwhile, the electronic device 100 may optimize the image conversion model 101 using a weighted sum of a plurality of perceptual losses in order to perform image conversion based on various style information.
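A weighted sum of several perceptual losses can be sketched as follows. Everything here is a simplified stand-in: real perceptual losses compare feature maps produced by deep networks, whereas the two extractors below (`identity_features`, `edge_features`) are hypothetical toy functions over a one-dimensional signal, and the mean squared feature distance and the weights are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence

Features = List[float]
Extractor = Callable[[List[float]], Features]


def feature_distance(a: Features, b: Features) -> float:
    """Mean squared difference between two equally sized feature vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and equally sized")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_perceptual_loss(
    output: List[float],
    reference: List[float],
    extractors: Sequence[Extractor],
    weights: Sequence[float],
) -> float:
    """Weighted sum of one perceptual loss per feature extractor."""
    if len(extractors) != len(weights):
        raise ValueError("one weight is required per extractor")
    return sum(
        w * feature_distance(f(output), f(reference))
        for f, w in zip(extractors, weights)
    )


def identity_features(signal: List[float]) -> Features:
    """Toy extractor: the raw values themselves."""
    return list(signal)


def edge_features(signal: List[float]) -> Features:
    """Toy extractor: differences between neighbouring values."""
    return [b - a for a, b in zip(signal, signal[1:])]


output_signal = [1.0, 2.0, 4.0]
reference_signal = [1.0, 3.0, 4.0]
total = weighted_perceptual_loss(
    output_signal,
    reference_signal,
    extractors=[identity_features, edge_features],
    weights=[0.5, 0.25],
)
```

Changing the weights shifts which style-related features dominate the optimization, which is one way a single model could be steered toward different styles.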

In addition, the electronic device 100 may generate a high-resolution image adapted to each of the plurality of regions by using the image conversion model 101, which uses a conditional objective function capable of generating a reconstruction style.

Meanwhile, there may be various embodiments of converting input data into an output image using the artificial intelligence model.

According to an embodiment, the electronic device 100 may store the artificial intelligence model, receive input data, obtain an output image corresponding to the received input data based on the artificial intelligence model, and provide the output image to the user. Here, providing the output image to the user may mean displaying the image on a display or outputting audio through a speaker. For example, a TV (the electronic device) may receive an input image, obtain an output image by changing the resolution of the received input image, and directly display the obtained output image on its own display.

According to another embodiment, the electronic device 100 may store the artificial intelligence model, receive input data, obtain output data corresponding to the received input data based on the artificial intelligence model, and transmit the obtained output data to an external device. Here, the external device may be a user terminal device such as a TV or a smartphone. In this case, the electronic device 100 only obtains the output image, and the operation of actually providing the output image to the user may be performed by the external device. For example, a TV (the external device) may receive an input image and transmit the received input image to an artificial intelligence server (the electronic device). The artificial intelligence server may convert the received input image into an output image and transmit the converted output image back to the TV.

According to yet another embodiment, the electronic device 100 and an external device may obtain the output data together. Assume that a plurality of operations are required to obtain the output data. Some of these operations may be performed by the electronic device 100, and the rest may be performed by the external device. Accordingly, obtaining the output data may require not only the electronic device 100 storing the artificial intelligence model but also the external device. For example, a TV (the external device) may receive an input image and divide the received input image into a plurality of regions. The TV may then obtain style information corresponding to each of the divided regions, and may transmit the input image and the style information corresponding to each region to an artificial intelligence server (the electronic device). The artificial intelligence server may obtain an output image based on the input image and the per-region style information received from the TV, and may transmit the output image to the TV. The TV may then display the output image received from the artificial intelligence server. Specifically, the TV (the external device) may store the style identification model 102, and the artificial intelligence server (the electronic device) may store the image conversion model 101.

Meanwhile, the above description states that the electronic device 100 obtains a high-resolution image by performing texture processing. Depending on the implementation, however, the electronic device 100 may perform various processing operations other than texture processing.

Meanwhile, the above description states that the upscaling operation is performed after the texture processing. Depending on the implementation, however, the electronic device 100 may perform the upscaling operation first and then perform texture processing on the upscaled image. In this case, the upscaling operation may be applied to each of the plurality of regions, and the texture processing may likewise be applied to each of the upscaled regions.

Meanwhile, although only a simple configuration of the electronic device 100 has been illustrated and described above, various additional components may be provided in an actual implementation. This is described below with reference to FIG. 4.

FIG. 4 is a block diagram illustrating a detailed configuration of the electronic device of FIG. 3.

Referring to FIG. 4, the electronic device 100 may include a memory 110, a processor 120, a communication interface 130, a display 140, a manipulation interface 150, and an input/output interface 160.

Meanwhile, redundant descriptions of the operations of the memory 110 and the processor 120 that are the same as those described above are omitted.

The communication interface 130 is a component that communicates with various types of external devices according to various types of communication methods. The communication interface 130 includes a Wi-Fi module, a Bluetooth module, an infrared communication module, a wireless communication module, and the like. Here, each communication module may be implemented in the form of at least one hardware chip.

The Wi-Fi module and the Bluetooth module of the communication interface 130 may perform communication using the Wi-Fi method and the Bluetooth method, respectively. When the Wi-Fi module or the Bluetooth module is used, various connection information such as an SSID and a session key may be exchanged first, a communication connection may be established using that information, and various information may then be transmitted and received.

The infrared communication module performs communication according to infrared data association (IrDA) technology, which transmits data wirelessly over a short distance using infrared light, which lies between visible light and millimeter waves.

In addition to the communication methods described above, the wireless communication module may include at least one communication chip that performs communication according to various wireless communication standards such as Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), LTE-A (LTE Advanced), 4G (4th Generation), and 5G (5th Generation).

In addition, the communication interface 130 may include at least one wired communication module that performs communication using a LAN (Local Area Network) module, an Ethernet module, a pair cable, a coaxial cable, an optical fiber cable, or a UWB (Ultra Wide-Band) module.

According to an example, the communication interface 130 may use the same communication module (for example, a Wi-Fi module) to communicate with an external device such as a remote control device and with an external server.

According to another example, the communication interface 130 may use different communication modules to communicate with an external device such as a remote control device and with an external server. For example, the communication interface 130 may use at least one of an Ethernet module or a Wi-Fi module to communicate with the external server, and may use a Bluetooth module to communicate with an external device such as a remote control device. However, this is only an example, and when communicating with a plurality of external devices or external servers, the communication interface 130 may use at least one of various communication modules.

The display 140 may be implemented as various types of displays, such as an LCD (Liquid Crystal Display), an OLED (Organic Light Emitting Diodes) display, or a PDP (Plasma Display Panel). The display 140 may also include a driving circuit, which may be implemented in a form such as an a-Si TFT, an LTPS (low temperature poly silicon) TFT, or an OTFT (organic TFT), as well as a backlight unit. Meanwhile, the display 140 may be implemented as a touch screen combined with a touch sensor, a flexible display, a three-dimensional (3D) display, or the like.

In addition, according to an embodiment of the present disclosure, the display 140 may include not only a display panel that outputs an image but also a bezel that houses the display panel. In particular, according to an embodiment of the present disclosure, the bezel may include a touch sensor (not shown) for detecting user interaction.

According to an embodiment, the electronic device 100 may include the display 140. Specifically, the electronic device 100 may directly display an obtained image or content on the display 140.

Meanwhile, according to another embodiment, the electronic device 100 may not include the display 140. The electronic device 100 may be connected to an external display device and may transmit an image or content stored in the electronic device 100 to the external display device. Specifically, the electronic device 100 may transmit the image or content to the external display device together with a control signal for causing the external display device to display the image or content. Here, the external display device may be connected to the electronic device 100 through the communication interface 130 or the input/output interface 160. For example, the electronic device 100, like a set-top box (STB), may not include a display. Alternatively, the electronic device 100 may include only a small display capable of showing only simple information such as text. In these cases, the electronic device 100 may transmit the image or content to the external display device by wire or wirelessly through the communication interface 130, or through the input/output interface 160.

The manipulation interface 150 may be implemented as a device such as a button, a touch pad, a mouse, or a keyboard, or as a touch screen capable of performing both the display function described above and a manipulation input function. Here, the button may be any of various types of buttons, such as a mechanical button, a touch pad, or a wheel, formed in an arbitrary area such as the front, side, or rear of the exterior of the main body of the electronic device 100.

The input/output interface 160 may be any one of the following interfaces: HDMI (High Definition Multimedia Interface), MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (Display Port), Thunderbolt, a VGA (Video Graphics Array) port, an RGB port, D-SUB (D-subminiature), and DVI (Digital Visual Interface).

The input/output interface 160 may input and output at least one of an audio signal and a video signal. Depending on the implementation, the input/output interface 160 may include a port that inputs and outputs only audio signals and a separate port that inputs and outputs only video signals, or may be implemented as a single port that inputs and outputs both audio and video signals.

Meanwhile, the electronic device 100 may transmit at least one of an audio signal and a video signal to an external device (for example, an external display device or an external speaker) through the input/output interface 160. Specifically, an output port included in the input/output interface 160 may be connected to the external device, and the electronic device 100 may transmit at least one of the audio signal and the video signal to the external device through the output port.

The electronic device 100 may further include a microphone 170. The microphone is a component that receives a user's voice or other sound and converts it into audio data.

The microphone 170 may receive the user's voice in an activated state. For example, the microphone 170 may be formed integrally with the electronic device 100 on its upper side, front, or side. The microphone 170 may include various components such as a microphone that collects the user's voice in analog form, an amplifier circuit that amplifies the collected voice, an A/D conversion circuit that samples the amplified voice and converts it into a digital signal, and a filter circuit that removes noise components from the converted digital signal.

There may be various embodiments in which the electronic device 100 performs an operation corresponding to a user voice signal received through the microphone 170.

As an example, the electronic device 100 may control the display 140 based on a user voice signal received through the microphone 170. For example, when a user voice signal for displaying content A is received, the electronic device 100 may control the display 140 to display content A.

As another example, the electronic device 100 may control an external display device connected to the electronic device 100 based on a user voice signal received through the microphone 170. Specifically, the electronic device 100 may generate a control signal for causing the external display device to perform an operation corresponding to the user voice signal, and may transmit the generated control signal to the external display device. Here, the electronic device 100 may store a remote control application for controlling the external display device. The electronic device 100 may transmit the generated control signal to the external display device using at least one of Bluetooth, Wi-Fi, or infrared communication. For example, when a user voice signal for displaying content A is received, the electronic device 100 may transmit, to the external display device, a control signal for causing content A to be displayed on the external display device. Here, the electronic device 100 may be any of various terminal devices on which a remote control application can be installed, such as a smartphone or an AI speaker.

As yet another example, the electronic device 100 may use a remote control device to control an external display device connected to the electronic device 100 based on a user voice signal received through the microphone 170. Specifically, the electronic device 100 may transmit, to the remote control device, a control signal for causing the external display device to perform an operation corresponding to the user voice signal. The remote control device may then transmit the control signal received from the electronic device 100 to the external display device. For example, when a user voice signal for displaying content A is received, the electronic device 100 may transmit, to the remote control device, a control signal for causing content A to be displayed on the external display device, and the remote control device may transmit the received control signal to the external display device.

FIG. 5 is a flowchart illustrating an image conversion operation according to an embodiment.

Referring to FIG. 5, the electronic device 100 may obtain an input image 10 (S505). The electronic device 100 may then divide the input image 10 into a plurality of regions (S510). Here, the electronic device 100 may divide the input image 10 into the plurality of regions based on the objects identified in the input image 10. For example, the electronic device 100 may distinguish a background object from a thing object in the input image 10, and may identify a region including the background object and a region including the thing object.

In addition, the electronic device 100 may obtain characteristic information corresponding to each of the divided regions (S515). Here, the characteristic information may include characteristics corresponding to the objects included in the regions. For example, the characteristic information may be information related to a background object or information related to a thing object.

In addition, the electronic device 100 may identify, for each of the plurality of regions, style information corresponding to the characteristic information (S520). Here, the style information may refer to a style used for the image conversion operation. For example, the style information may refer to at least one of texture, sharpness, or contrast. For example, the style information may include at least one of a texture intensity, a sharpness intensity, or a contrast intensity. For example, first style information may include information indicating that texture processing should be performed at a first intensity, and second style information may include information indicating that texture processing should be performed at a second intensity.

In addition, the electronic device 100 may obtain the output image 20 based on the identified style information (S525). Here, the electronic device 100 may convert the input image 10 into the output image 20 based on the identified style information.
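The steps from region division to style identification (S510 to S520) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the disclosed models: a brightness threshold stands in for object-based segmentation, the labels "thing" and "background" stand in for characteristic information, and the `STYLE_TABLE` lookup with its intensity values is entirely hypothetical.

```python
from typing import Dict, List

Image = List[List[float]]
Mask = List[List[bool]]

# Hypothetical lookup: characteristic label -> texture intensity (style information).
STYLE_TABLE: Dict[str, float] = {"thing": 0.8, "background": 0.2}


def split_regions(image: Image, threshold: float = 0.5) -> Dict[str, Mask]:
    """Stand-in for S510/S515: bright pixels are labeled 'thing', the rest 'background'."""
    thing_mask = [[pixel >= threshold for pixel in row] for row in image]
    background_mask = [[not flag for flag in row] for row in thing_mask]
    return {"thing": thing_mask, "background": background_mask}


def identify_styles(regions: Dict[str, Mask]) -> Dict[str, float]:
    """Stand-in for S520: map each region's characteristic label to a texture intensity."""
    return {label: STYLE_TABLE[label] for label in regions}


input_image = [[0.9, 0.1], [0.7, 0.2]]
regions = split_regions(input_image)   # one boolean mask per region
styles = identify_styles(regions)      # one texture intensity per region
```

The per-region intensities in `styles` are what a conversion step corresponding to S525 would consume when turning the input image into the output image.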

FIG. 6 is a flowchart illustrating the image conversion operation of FIG. 5 in detail.

도 6을 참조하면, 전자 장치(100)는 입력 이미지(10)를 획득할 수 있다 (S605). 그리고, 전자 장치(100)는 입력 이미지(10)를 제1 영역 및 제2 영역으로 구분할 수 있다 (S610). 그리고, 전자 장치(100)는 제1 영역에 대응되는 제1 특성 정보 및 제2 영역에 대응되는 제2 특성 정보를 획득할 수 있다 (S615).Referring to FIG. 6 , the electronic device 100 may acquire an input image 10 ( S605 ). Then, the electronic device 100 may divide the input image 10 into a first area and a second area (S610). Then, the electronic device 100 may obtain first characteristic information corresponding to the first area and second characteristic information corresponding to the second area ( S615 ).

Also, the electronic device 100 may obtain (or identify) first style information corresponding to the first characteristic information and second style information corresponding to the second characteristic information (S620).

Also, the electronic device 100 may convert the image corresponding to the first region into a high-resolution image based on the first style information, and may convert the image corresponding to the second region into a high-resolution image based on the second style information (S625).

Also, the electronic device 100 may obtain an output image based on the high-resolution images corresponding to the respective regions (S630). Specifically, the electronic device 100 may obtain the output image 20, which is the entire image, by combining the converted high-resolution image corresponding to the first region and the converted high-resolution image corresponding to the second region.
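As a non-limiting illustration, the flow of steps S610 to S630 can be sketched in plain Python. Everything named here is hypothetical and not part of the disclosure: `apply_style` is only a stand-in for the style-conditioned conversion (it merely scales pixel values), `upscale_nearest` stands in for the learned upscaling, and the regions are assumed to be given as a boolean mask.

```python
# Hypothetical sketch of the FIG. 6 flow; all names are illustrative assumptions.

def upscale_nearest(image, factor):
    """Nearest-neighbour upscaling of a 2D list of pixel values."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in image for _ in range(factor)]

def apply_style(image, mask, strength):
    """Stand-in for style-conditioned conversion.

    Pixels inside the mask are scaled by (1 + strength); pixels outside
    the mask are set to 0 so that regions can be merged by addition.
    """
    return [[p * (1.0 + strength) if m else 0.0 for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mask)]

def convert(image, first_mask, first_strength, second_strength, factor=2):
    # S610: the second region is taken as everything outside the first region.
    second_mask = [[not m for m in row] for row in first_mask]
    # S625: convert each region with its own style intensity, then upscale it.
    first_hr = upscale_nearest(apply_style(image, first_mask, first_strength), factor)
    second_hr = upscale_nearest(apply_style(image, second_mask, second_strength), factor)
    # S630: the regions are disjoint, so element-wise addition combines them.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(first_hr, second_hr)]
```

In this sketch each region is processed with its own intensity before the results are merged, which mirrors the per-region conversion followed by combination described above.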

FIG. 7 is a diagram for describing an operation of dividing an input image into a plurality of regions.

Referring to FIG. 7, the electronic device 100 may analyze the input image 10 and divide the input image 10 into a plurality of regions.

According to an embodiment, the electronic device 100 may identify an object included in the input image 10 and may divide the input image 10 into a plurality of regions based on the position of the identified object.

According to another embodiment, the electronic device 100 may identify an outline (or boundary line) included in the input image 10 and may divide the input image 10 into a plurality of regions based on the identified outline (or boundary line).

The input image 10 is divided into a plurality of regions so that image conversion of a different style can be performed for each region. For example, a background region may need almost no texture effect, whereas a thing region may need a texture effect. Here, an image of higher quality can be obtained only if the texture effect is not added to the background region as well. Accordingly, the electronic device 100 may perform image conversion operations of different styles on different regions.

Here, the electronic device 100 may divide the entire area of the input image 10 into a first region 710 and a second region 720.

Meanwhile, depending on the implementation, the electronic device 100 may identify the first region 710 and then identify the area other than the first region 710 as the second region 720.

FIG. 8 is a flowchart for describing an operation of training the image conversion model.

Referring to FIG. 8, the electronic device 100 may obtain a low-resolution image and a high-resolution image (S805). Here, the low-resolution image may be used as a training input image in the image conversion operation. The high-resolution image may be used as a reference image for obtaining a loss value in the image conversion operation.

Also, the electronic device 100 may divide the low-resolution image into a plurality of regions (S810). Then, the electronic device 100 may obtain characteristic information corresponding to each of the regions (S815). Then, the electronic device 100 may identify, for each of the regions, style information corresponding to the characteristic information (S820).

Also, the electronic device 100 may convert the low-resolution image into an output image based on the style information identified in step S820. Here, the conversion into the output image may be performed for each of the regions divided in step S810. Here, the output image may be an estimated image that meets the purpose of the image conversion model 101. If the image conversion model 101 is a model for super-resolution of an image, the output image may be a high-resolution image generated by the image conversion model 101.

Here, the electronic device 100 may obtain, based on a pre-stored loss function, a loss value between the output image obtained in step S825 and the high-resolution image obtained in step S805. Since the high-resolution image is a ground-truth image that already has a high resolution, it may serve as a reference for how accurately the image conversion model 101 has performed the conversion. Accordingly, the electronic device 100 may obtain the loss value in order to determine how closely the output image 20 obtained by the image conversion model 101 matches the high-resolution image.

Here, the electronic device 100 may determine an image conversion parameter of the style identification model 102 that minimizes the loss value (S835).

Then, the electronic device 100 may identify whether to end the training (S840). If it is determined that the training has ended (S840-Y), the electronic device 100 may finalize the parameter determined in step S835. Here, the criterion for identifying the end of training may be the loss value. For example, if the loss value is less than a threshold value, the electronic device 100 may end the training.

If it is identified that the training has not ended (S840-N), the electronic device 100 may obtain a new low-resolution image and a new high-resolution image and repeat steps S805 to S840.

Meanwhile, depending on the implementation, the electronic device 100 may perform the image conversion operation on the same low-resolution image and the same high-resolution image using changed parameters. Specifically, the electronic device 100 may change the parameters that were used in the image conversion model 101. Then, the electronic device 100 may perform steps S805 to S840 again based on the changed parameters.
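As a non-limiting illustration, the repeat-until-threshold structure of FIG. 8 can be sketched with a toy model. The sketch assumes a single scalar parameter `theta`, scalar "images", a mean squared error loss, and plain gradient descent; none of these choices is stated in the disclosure, and the names `train`, `learning_rate`, `threshold` and `max_steps` are hypothetical.

```python
# Toy sketch of the training loop of FIG. 8; not the disclosed network.

def train(pairs, theta=0.0, learning_rate=0.1, threshold=1e-6, max_steps=10_000):
    """pairs: list of (low_res, high_res) scalars standing in for image pairs."""
    loss = float("inf")
    for step in range(max_steps):
        # Convert each low-resolution input and measure the MSE loss
        # against its high-resolution reference.
        loss = sum((theta * low - high) ** 2 for low, high in pairs) / len(pairs)
        # S840: end training once the loss value falls below the threshold.
        if loss < threshold:
            return theta, loss, step
        # S835: move the parameter in the direction that reduces the loss.
        grad = sum(2 * (theta * low - high) * low for low, high in pairs) / len(pairs)
        theta -= learning_rate * grad
    return theta, loss, max_steps
```

For example, `train([(1.0, 2.0), (2.0, 4.0)])` converges toward `theta = 2.0`, since every reference value is twice its input.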

FIG. 9 is a diagram for describing an image conversion model according to an embodiment of the present disclosure.

Referring to FIG. 9, the electronic device 100 may include an image conversion model 101, a style identification model 102, and a loss value calculation model 103. Here, the image conversion model 101 may obtain the input image 10 and convert it into the output image 20. Here, the style identification model 102 may determine style information corresponding to the input image 10 and transmit the determined style information to the image conversion model 101. The image conversion model 101 may convert the input image 10 into the output image 20 based on the style information transmitted from the style identification model 102.

Here, the loss value calculation model 103 may obtain a loss value based on the output image 20 and the reference image 30. Here, the reference image 30 may be a high-resolution image of the input image 10. The loss value calculation model 103 may include a loss function 901. Here, the loss value calculation model 103 may calculate the loss value between the output image 20 and the reference image 30 based on the loss function 901.

Here, the loss value may include at least one of a mean squared error (MSE), a perceptual loss, or an adversarial loss.

Here, the electronic device 100 may obtain the perceptual loss by using layer outputs of a pre-trained classification network as a feature space. The electronic device 100 may then optimize the image conversion model 101 by reducing the error (distance) in the feature space. Each layer included in the image conversion model 101 may be arranged differently according to the restoration style of the image or the shape of the generated texture.

Also, the loss value calculation model 103 may compute differently weighted sums of the various types of loss values based on the style information received from the style identification model 102. The electronic device 100 may thereby train a single image conversion model 101 so that it can use various restoration styles in the super-resolution operation.

FIG. 10 is a diagram for describing a calculation used in the image conversion operation.

Referring to FIG. 10, the image conversion model 101 may convert an image based on Equation 1005. Specifically, $\hat{I}^{HR}$ may denote the output image 20, $G_\theta$ may denote an image conversion function having $\theta$ as a parameter, $I^{LR}$ may denote the input image 10, and $T$ may denote the style information. Consequently, the output image $\hat{I}^{HR}$ may be an image generated by the image conversion function $G_\theta$, and the image conversion function $G_\theta$ may convert the input image $I^{LR}$ based on the style information $T$.

Here, Equation 1010 may be an equation for determining the parameter $\theta$ of the image conversion model 101. $\hat{\theta}$ may denote the finally determined parameter. $\hat{I}_i^{HR}$ may denote the i-th output image 20, $I_i^{HR}$ may denote the i-th reference image 30, and $T_i$ may denote the i-th style information.

Here, $\arg\min(f(\theta))$ may mean an operation that determines the $\theta$ minimizing $f(\theta)$. $L(a, b)$ may mean a function that calculates a loss value between $a$ and $b$. Accordingly, $L(\hat{I}_i^{HR}, I_i^{HR} \mid T_i)$ may mean a function that calculates the loss value between the output image $\hat{I}_i^{HR}$ and the reference image $I_i^{HR}$ in consideration of the style information $T_i$.
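For reference, Equations 1005 and 1010 appear only in FIG. 10 and are not reproduced in the text. Based solely on the symbol definitions given above, they may be written as follows; the expectation over the index $i$ is an assumption about how the per-sample losses are aggregated.

```latex
% Equation 1005 (reconstructed from the definitions in the text)
\hat{I}^{HR} = G_\theta\left(I^{LR} \mid T\right)

% Equation 1010 (reconstructed; aggregation over i is assumed to be an average)
\hat{\theta} = \arg\min_{\theta} \; \mathbb{E}_i\left[ L\left(\hat{I}_i^{HR},\, I_i^{HR} \mid T_i\right) \right]
```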

FIG. 11 is a diagram for describing a calculation used in a training operation according to an embodiment of the present disclosure.

Referring to FIG. 11, Equation 1100 may be used to calculate the loss value in the training operation that determines the parameters of the image conversion model 101. The electronic device 100 may obtain the loss value between the output image 20 and the reference image 30 using Equation 1100. Here, the electronic device 100 may obtain the loss value using the loss value calculation model 103.

Referring to Equation 1100, $O$ may denote the loss value function. $O$ may be $\lambda_{rec} L_{rec} + \lambda_{per} L_{per} + \lambda_{adv} L_{adv}$.

Here, $L_{rec}$ may denote a reconstruction loss, $L_{per}$ may denote a perceptual loss, and $L_{adv}$ may denote an adversarial loss. $\lambda_{rec}$ may denote the weight of the reconstruction loss, $\lambda_{per}$ the weight of the perceptual loss, and $\lambda_{adv}$ the weight of the adversarial loss. $\lambda_{rec}$, $\lambda_{per}$ and $\lambda_{adv}$ may have different values. Each of $\lambda_{rec}$, $\lambda_{per}$ and $\lambda_{adv}$ may have a value between 0 and 1, and their sum may be 1.
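As a non-limiting illustration, the weighted sum of Equation 1100 can be sketched as follows. The function name `total_loss` is hypothetical, and the strict validation of the weights is an assumption made for the sketch: the text only says the weights may lie between 0 and 1 and may sum to 1.

```python
def total_loss(l_rec, l_per, l_adv, w_rec, w_per, w_adv):
    """Weighted sum of reconstruction, perceptual and adversarial losses."""
    weights = (w_rec, w_per, w_adv)
    # Assumption for this sketch: enforce the optional constraints from the text.
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must lie between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_rec * l_rec + w_per * l_per + w_adv * l_adv
```

Choosing a larger `w_rec` favours pixel-wise fidelity, while larger `w_per` and `w_adv` favour perceptually richer texture; this trade-off is the reason the weights are varied with the style information.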

Here, the reconstruction loss $L_{rec}$ may be calculated based on Equation 1111. Here, $E$ may denote a function that calculates an average. $\hat{I}_i^{HR}$ may denote the i-th output image, and $I_i^{HR}$ may denote the i-th reference image. Here, $\|\hat{I}_i^{HR} - I_i^{HR}\|_1$ may be the sum of the absolute values of the differences between the i-th output image and the i-th reference image, that is, the L1 norm of their difference. $E[\|\hat{I}_i^{HR} - I_i^{HR}\|_1]$ may then mean the average of these difference values between the i-th output images and the i-th reference images.
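As a non-limiting illustration, the reconstruction loss described for Equation 1111 can be sketched as the average L1 distance over image pairs. Images are assumed to be equally sized 2D lists of pixel values, and the function names are hypothetical.

```python
def l1_distance(image_a, image_b):
    """Sum of absolute pixel-wise differences between two equally sized 2D images."""
    return sum(abs(a - b)
               for row_a, row_b in zip(image_a, image_b)
               for a, b in zip(row_a, row_b))

def reconstruction_loss(outputs, references):
    """Average of the L1 distances over all (output, reference) pairs."""
    distances = [l1_distance(o, r) for o, r in zip(outputs, references)]
    return sum(distances) / len(distances)
```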

Here, the perceptual loss $L_{per}$ may be calculated based on Equation 1121. Here, $l$ may denote a style. $\lambda_l$ may be a weight related to the function that applies the style information. $w_l(T)$ may be a weight related to the style information itself. $w_l(T)$ may change based on the style information (or style map), which is a random variable during training.

Accordingly, $\lambda_l$ may mean a weight corresponding to a specific function among a plurality of functions that perform the image conversion operation based on the style information. $w_l(T)$ may mean a weight corresponding to a specific style among a plurality of styles.

Here, the perceptual loss may be a weighted sum of several feature-space losses computed at different levels of the feature space.

Here, $L_l$ may denote the loss value of style $l$ and may be calculated based on Equation 1122. Here, $\phi_l$ may denote a function that converts an image based on style $l$. $\phi_l(\hat{I}_i^{HR})$ may mean the function converting the i-th output image $\hat{I}_i^{HR}$ into the feature space corresponding to style $l$, and $\phi_l(I_i^{HR})$ may mean the function converting the i-th reference image $I_i^{HR}$ into the feature space corresponding to style $l$. $\|\hat{I}_i^{HR} - I_i^{HR}\|_2$ may be the L2 norm of the difference between the i-th output image $\hat{I}_i^{HR}$ and the i-th reference image $I_i^{HR}$, that is, the square root of the sum of the squared differences. $E[\|\hat{I}_i^{HR} - I_i^{HR}\|_2]$ may mean the average of these values.

Here, Equation 1111 may be a calculation using the L1 norm, and Equation 1122 may be a calculation using the L2 norm.
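For reference, Equations 1111, 1121 and 1122 appear only in FIG. 11. Under the definitions given above, they may be written as follows. Two points are assumptions rather than statements of the disclosure: that $\lambda_l$ and $w_l(T)$ enter Equation 1121 as a product, and that the L2 norm in Equation 1122 is taken between the feature-space maps $\phi_l(\cdot)$, which the text defines immediately before the norm but does not write inside it.

```latex
% Equation 1111: reconstruction loss (L1 norm)
L_{rec} = \mathbb{E}\left[ \left\| \hat{I}_i^{HR} - I_i^{HR} \right\|_1 \right]

% Equation 1121: perceptual loss as a weighted sum over l (product of weights assumed)
L_{per} = \sum_{l} \lambda_l \, w_l(T) \, L_l

% Equation 1122: per-l loss (L2 norm, assumed to be taken in the feature space of \phi_l)
L_l = \mathbb{E}\left[ \left\| \phi_l\left(\hat{I}_i^{HR}\right) - \phi_l\left(I_i^{HR}\right) \right\|_2 \right]
```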

Here, the adversarial loss $L_{adv}$ may be calculated based on Equation 1131. Here, $E_{\hat{I}^{HR}}(f(x))$ may denote the average of $f(x)$ over the output images, where $f(x)$ may be $\log(\tilde{D}(\hat{I}^{HR}))$. $E_{I^{HR}}(f(x))$ may denote the average of $f(x)$ over the reference images, where $f(x)$ may be $\log(\tilde{D}(I^{HR}))$.

Also, $\tilde{D}(I^{HR})$ may be calculated based on Equation 1132, and $\tilde{D}(I^{HR})$ may mean the probability of identifying a real image (reference image) as real. $\tilde{D}(\hat{I}^{HR})$ may be calculated based on Equation 1133, and $\tilde{D}(\hat{I}^{HR})$ may mean the probability of identifying a fake image (output image) as fake. Here, $C(\hat{I}^{HR})$ may mean a value related to the output image obtained through the discriminator, and $C(I^{HR})$ may mean a value related to the reference image obtained through the discriminator.

FIG. 12 is a diagram for describing in detail an image conversion model according to another embodiment of the present disclosure.

Referring to FIG. 12, the electronic device 100 may include an image conversion model 101, a style identification model 102, and a loss value calculation model 103.

In an ordinary image conversion operation, the image conversion model 101 and the style identification model 102 may be used. In the training operation, the image conversion model 101, the style identification model 102, and the loss value calculation model 103 may all be used.

The image conversion model 101 may include a low-resolution processing unit 1201, an upscaling unit 1202, and a high-resolution processing unit 1203.

Here, the low-resolution processing unit 1201 may correspond to a super-resolution (SR) branch and may include at least one convolutional layer and a plurality of basic blocks. The low-resolution processing unit 1201 may convert the image in its low-resolution state, before the input image 10 is upscaled. The upscaling unit 1202 may upscale the converted low-resolution image. The high-resolution processing unit 1203 may convert the upscaled image based on at least one convolutional layer and a ReLU layer. Here, the low-resolution processing unit 1201 may be a Residual in Residual Dense Block (RRDB) that includes a Spatial Feature Transform (SFT) layer. The SFT layer may be used to learn a mapping function that outputs modulation parameters based on the style information $T$. Accordingly, the SFT layer may be an essential layer for optimizing the objective.

Here, the style identification model 102 may include a user control unit 1204, a style identification network 1205, a style map generation unit 1206, and a condition network 1207. The user control unit 1204 may receive, directly from the user, a selection of the style to be used for converting the input image 10. The style identification network 1205 is a network for generating a style map and may identify a style suitable for the input image 10. The style map generation unit 1206 may generate a style map corresponding to the input image 10 based on the style determined through at least one of the user control unit 1204 or the style identification network 1205. The condition network 1207 may transmit the generated style map to the image conversion model 101. Specifically, the condition network 1207 may pass the generated style map to the basic blocks included in the image conversion model 101. More specifically, the condition network 1207 may transmit the style map to a feature modulation layer of the image conversion model 101 (for example, a Spatial Feature Transform (SFT) layer). Meanwhile, all convolutional layers included in the condition network 1207 may use a 1×1 kernel to avoid interference from other regions.

Here, the style map may include information that distinguishes objects, backgrounds, and boundaries, and may include style information to be applied to the objects, backgrounds, boundaries, and the like. Meanwhile, according to another embodiment, the electronic device 100 may concatenate additional information to the input image in order to reflect the style information in the image conversion model 101.

Here, the style map may be used, through the Spatial Feature Transform (SFT) layer, to control the edge and texture restoration style of the plurality of regions.
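As a non-limiting illustration, the way an SFT layer can apply a style map may be sketched as a per-pixel affine modulation, `gamma * feature + beta`, which is the usual definition of a Spatial Feature Transform. The sketch assumes a single-channel feature map and a single-channel style map, so that a 1×1 convolution reduces to a scalar weight and bias per modulation parameter; the parameter names `gamma_w`, `gamma_b`, `beta_w` and `beta_b` are hypothetical.

```python
def sft_modulate(features, style_map, gamma_w, gamma_b, beta_w, beta_b):
    """Per-pixel affine modulation of a feature map conditioned on a style map.

    features and style_map are equally sized 2D lists. Because the mapping
    from style value to (gamma, beta) uses only the value at the same pixel,
    it behaves like a 1x1 convolution: neighbouring regions do not interfere.
    """
    out = []
    for feature_row, style_row in zip(features, style_map):
        out_row = []
        for f, t in zip(feature_row, style_row):
            gamma = gamma_w * t + gamma_b  # scale predicted from the style value
            beta = beta_w * t + beta_b     # shift predicted from the style value
            out_row.append(gamma * f + beta)
        out.append(out_row)
    return out
```

Because the style value can differ from pixel to pixel, a single set of learned weights can restore one region with strong texture and another with almost none.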

Accordingly, the image conversion model 101 may generate the output image 20 using the style map generated by the style identification model 102. The operations described above may be those performed in an ordinary image conversion operation.

Meanwhile, a training operation may be required to determine the various parameters used in the image conversion model 101. Here, the loss value calculation model 103 may be used for the training operation.

Here, the loss value calculation model 103 may include a loss function 1208, a discriminator 1209, and a VGG network 1210. The loss function 1208 may obtain the loss value between the output image 20 and the reference image 30. The discriminator 1209 may obtain a discriminator loss corresponding to the output image 20 and the reference image 30. The VGG network may mean style application functions corresponding to a plurality of styles. Here, the style identification model 102 may pass the generated style map to the loss value calculation model 103, and the loss value calculation model 103 may calculate the loss value based on the style map obtained from the style identification model 102.

Meanwhile, the loss value calculation model 103 may change the calculation of the loss function 901 based on the style information (or style map) transmitted from the style identification model 102. Also, based on that style information (or style map), the loss value calculation model 103 may obtain the loss value by combining at least one of a mean squared error (MSE) or perceptual losses computed in feature spaces of various levels.

FIG. 13 is a diagram for describing a basic block used in the image conversion model according to an embodiment of the present disclosure.

Referring to FIG. 13, the basic block included in the low-resolution processing unit 1201 may be a Residual in Residual Dense Block (RRDB) including a plurality of unit blocks 1301, 1302, 1303, 1304 and a convolutional layer 1305. Here, each of the unit blocks 1301, 1302, 1303, 1304 may include a Spatial Feature Transform (SFT) layer, a convolutional layer, and a Leaky ReLU (LReLU) layer.

Here, each SFT layer may receive the style information (or style map) generated by the style identification model 102.

FIG. 14 is a diagram for describing a style identification model according to an embodiment of the present disclosure.

Referring to FIG. 14, the style map generation unit 1206 may generate a style map 1405 based on an input image 1401. Specifically, the style map generation unit 1206 may include convolutional layers 1402 and 1404 and a plurality of basic blocks 1403-1, 1403-2, ..., 1403-n. The style map 1405 may be expressed in colors between white and black depending on the degree to which a style is applied. For example, a white pixel in the style map 1405 may be a pixel to which the texture style is not applied at all. A black pixel in the style map 1405 may be a pixel to which the texture style is applied to the maximum. A gray pixel in the style map 1405 may be a pixel to which the texture style is applied to an intermediate degree.

Meanwhile, although the style map 1405 has been described in terms of colors, the style map 1405 may be expressed as matrix data having values between 0 and 1.
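As a non-limiting illustration, the relationship between the matrix form and the color form of the style map can be sketched as follows. The mapping direction is an assumption: the sketch takes a value of 0 to mean no texture (rendered white, gray level 255) and a value of 1 to mean maximum texture (rendered black, gray level 0), and the function name `style_map_to_gray` is hypothetical.

```python
def style_map_to_gray(style_map):
    """Render a style map with values in [0, 1] as 8-bit gray levels.

    Assumed convention: 0.0 (no texture) -> 255 (white),
    1.0 (maximum texture) -> 0 (black).
    """
    gray = []
    for row in style_map:
        for t in row:
            if not 0.0 <= t <= 1.0:
                raise ValueError("style values must lie between 0 and 1")
        # Adding 0.5 before truncation rounds to the nearest gray level.
        gray.append([int(255 * (1.0 - t) + 0.5) for t in row])
    return gray
```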

FIG. 15 is a diagram for describing a calculation used in a training operation according to another embodiment of the present disclosure.

Referring to FIG. 15, the discriminator loss $L_{dis}$ may be calculated based on Equation 1501. Here, $E_{I^{HR}}(f(x))$ may denote the average of $f(x)$ over the reference images, where $f(x)$ may be $\log(\tilde{D}(I^{HR}))$. $E_{\hat{I}^{HR}}(f(x))$ may denote the average of $f(x)$ over the output images, where $f(x)$ may be $\log(\tilde{D}(\hat{I}^{HR}))$.

Equations 1502 and 1503 may correspond to Equations 1132 and 1133 of FIG. 11. Here, $\tilde{D}(I^{HR})$ may be calculated based on Equation 1502, and $\tilde{D}(I^{HR})$ may mean the probability of identifying a real image (reference image) as real. $\tilde{D}(\hat{I}^{HR})$ may be calculated based on Equation 1503, and $\tilde{D}(\hat{I}^{HR})$ may mean the probability of identifying a fake image (output image) as fake. Here, $C(\hat{I}^{HR})$ may mean a value related to the output image obtained through the discriminator, and $C(I^{HR})$ may mean a value related to the reference image obtained through the discriminator.

FIG. 16 is a diagram for explaining the effect of an image conversion operation of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 16, in the graph 1605, the x-axis may represent Peak Signal-to-Noise Ratio (PSNR) and the y-axis may represent Learned Perceptual Image Patch Similarity (LPIPS). PSNR may be a value indicating the pixel-wise difference between images, and a higher PSNR may mean a lower perceptual loss. LPIPS may indicate perceptual similarity, and a lower value may mean better performance.
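For reference, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared pixel difference and MAX is the largest possible pixel value. The sketch below computes it for two images of the same shape; the function name `psnr` and the default `max_value` of 255 (8-bit images) are assumptions for illustration. LPIPS is omitted because it requires a trained network.

```python
import numpy as np

def psnr(reference, output, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    output = np.asarray(output, dtype=np.float64)
    mse = np.mean((reference - output) ** 2)  # mean squared pixel error
    if mse == 0.0:
        return float("inf")  # identical images have no measurable noise
    return 10.0 * np.log10(max_value ** 2 / mse)
```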

SRResNET, NatSR, SRGAN, SFTGAN, ESRGAN, and SPSR may refer to various super-resolution models. Here, FxSR-D may be an embodiment in which the image conversion model performs super-resolution based on first style information (e.g., a style that minimally reflects the texture effect), and FxSR-P may be an embodiment in which the image conversion model performs super-resolution based on second style information (e.g., a style that maximally reflects the texture effect).

In the graph 1605, FxSR-D and FxSR-P are located toward the left and the bottom, and may therefore be evaluated as performing better than the other super-resolution models.

FIG. 17 is a diagram for comparing image conversion operations of an electronic device according to an embodiment.

Referring to FIG. 17, various image conversion methods may be applied to input images 1710, 1720, and 1730. The images 1711, 1721, and 1731 corresponding to HR may be reference images. The images 1712, 1722, and 1732 corresponding to FxSR-D may be images on which super-resolution has been performed in a style (T=0) that minimally reflects the texture effect. The images 1713, 1723, and 1733 corresponding to FxSR-E may be images on which super-resolution has been performed in a style (T=0.5) that reflects the texture effect to an intermediate degree. The images 1714, 1724, and 1734 corresponding to FxSR-P may be images on which super-resolution has been performed in a style (T=1) that maximally reflects the texture effect.

Here, the electronic device 100 may identify a specific region 1710-1 in the input image 1710 and perform super-resolution on an image corresponding to the specific region 1710-1 (e.g., an image including a building object). The image 1711 may be a reference image corresponding to the specific region 1710-1. The images 1712, 1713, and 1714 may be images on which super-resolution has been performed in styles that reflect the texture effect to different degrees. The image 1714 may be the most similar to the reference image 1711.

Likewise, the electronic device 100 may identify a specific region 1720-1 in the input image 1720 and perform super-resolution on an image corresponding to the specific region 1720-1 (e.g., an image including a scratched brick object). The image 1721 may be a reference image corresponding to the specific region 1720-1. The images 1722, 1723, and 1724 may be images on which super-resolution has been performed in styles that reflect the texture effect to different degrees. The image 1724 may be the most similar to the reference image 1721.

Likewise, the electronic device 100 may identify a specific region 1730-1 in the input image 1730 and perform super-resolution on an image corresponding to the specific region 1730-1 (e.g., an image including a plant object). The image 1731 may be a reference image corresponding to the specific region 1730-1. The images 1732, 1733, and 1734 may be images on which super-resolution has been performed in styles that reflect the texture effect to different degrees. The image 1734 may be the most similar to the reference image 1731.

FIG. 18 is a diagram for comparing image conversion operations of an electronic device according to another embodiment.

Referring to FIG. 18, the image 1800 may be a reference image, the image 1800-2 may be an image corresponding to a first region 1801, and the image 1800-3 may be an image corresponding to a second region 1802.

The style maps 1810-1, 1820-1, 1830-1, 1840-1, and 1850-1 may be data indicating the degree to which a style effect (e.g., a texture effect) is to be reflected. Here, black may mean data that reflects the texture effect minimally, white may mean data that reflects the texture effect maximally, and gray may mean data that reflects the texture effect to an intermediate degree. Accordingly, the degree to which the texture effect is reflected may differ depending on the shade of gray.
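The relationship between a style map stored as values between 0 and 1 and its grayscale rendering can be sketched as follows, assuming the FIG. 18 convention in which black corresponds to the minimum texture effect (T=0) and white to the maximum (T=1). The function names and the 8-bit grayscale range are assumptions for illustration.

```python
import numpy as np

def style_map_to_gray(style_map):
    """Render texture intensities in [0, 1] as 8-bit gray (0 = black, 255 = white)."""
    t = np.clip(np.asarray(style_map, dtype=np.float64), 0.0, 1.0)
    return np.rint(t * 255.0).astype(np.uint8)

def gray_to_style_map(gray):
    """Recover texture intensities in [0, 1] from an 8-bit grayscale style map."""
    return np.asarray(gray, dtype=np.float64) / 255.0
```

Under this convention, a uniform map of zeros corresponds to T=0 over the entire area and a uniform map of ones to T=1.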

The image 1810 may be an image on which super-resolution has been performed in a style (T=0) that minimally reflects the texture effect over the entire area. The image 1810-1 may be its style map. The image 1810-2 may be the part of the image 1810 corresponding to the first region 1801, and the image 1810-3 may be the part of the image 1810 corresponding to the second region 1802.

The image 1820 may be an image on which super-resolution has been performed in a style (T=1) that maximally reflects the texture effect over the entire area. The image 1820-1 may be its style map. The image 1820-2 may be the part of the image 1820 corresponding to the first region 1801, and the image 1820-3 may be the part of the image 1820 corresponding to the second region 1802.

The image 1830 may be an image on which super-resolution has been performed in a style that reflects the texture effect differently for each region. The image 1830-1 may be its style map. The image 1830-2 may be the part of the image 1830 corresponding to the first region 1801, and the image 1830-3 may be the part of the image 1830 corresponding to the second region 1802.

The image 1840 may be an image on which, according to an embodiment, super-resolution has been performed in a style in which the user manually designates regions so that the texture effect is reflected differently. The image 1840-1 may be its style map. The image 1840-2 may be the part of the image 1840 corresponding to the first region 1801, and the image 1840-3 may be the part of the image 1840 corresponding to the second region 1802.

The image 1850 may be an image on which, according to another embodiment, super-resolution has been performed in a style in which the user manually designates regions so that the texture effect is reflected differently. The image 1850-1 may be its style map. The image 1850-2 may be the part of the image 1850 corresponding to the first region 1801, and the image 1850-3 may be the part of the image 1850 corresponding to the second region 1802.

The image 1820, which maximally reflects the texture effect over the entire area, and the image 1830, which is generated by automatically determining the style map through the style identification network (1205 of FIG. 12), may be the most similar to the reference image 1800. However, when the naturalness of the image is considered, the image 1830 may be more similar to the reference image 1800 than the image 1820 is.

FIG. 19 is a diagram for comparing image conversion operations of an electronic device according to yet another embodiment.

Referring to FIG. 19, the images 1910 and 1920 may be input images and may include specific regions 1910-1 and 1920-1. The images 1910-2 and 1920-2 may be style maps generated according to an embodiment in which the style map is determined automatically through the style identification network (1205 of FIG. 12). The images 1910-3 and 1920-3 may be style maps generated according to an embodiment in which the user directly determines the style map through the user control unit (1204 of FIG. 12).

Here, the images 1911, 1912, 1913, 1914, 1915, and 1916 may be images corresponding to the specific region 1910-1 within the entire area of the input image 1910. The images 1921, 1922, 1923, 1924, 1925, and 1926 may be images corresponding to the specific region 1920-1 within the entire area of the input image 1920.

The images 1911 and 1921 may be reference images corresponding to the input images 1910 and 1920. The images 1912 and 1922 may be images on which super-resolution has been performed in a style (T=1) that maximally reflects the texture effect over the entire area. The images 1913 and 1923 may be images on which super-resolution has been performed in a style that reflects the texture effect differently for each region. The images 1914 and 1924 may be images on which super-resolution has been performed in a style in which the user manually designates regions so that the texture effect is reflected differently. The images 1915 and 1925 may be the parts of the images 1910-2 and 1920-2 corresponding to the specific regions 1910-1 and 1920-1. The images 1916 and 1926 may be the parts of the images 1910-3 and 1920-3 corresponding to the specific regions 1910-1 and 1920-1.

Here, the images 1910-2, 1910-3, 1915, 1916, 1920-2, 1920-3, 1925, and 1926 may be style map images.

The images 1912 and 1922, which maximally reflect the texture effect over the entire area, and the images 1913 and 1923, which reflect the texture effect using a style map determined automatically through the style identification network (1205 of FIG. 12), may be the most similar to the reference images 1911 and 1921. However, when the naturalness of the image is considered, the images 1913 and 1923 may be closer to the reference images 1911 and 1921 than the images 1912 and 1922 are.

In addition, the style map images 1915 and 1925, generated by automatically determining the style map through the style identification network (1205 of FIG. 12), may be more detailed than the style map images 1916 and 1926 created directly by the user.

FIG. 20 is a diagram for explaining a method of controlling an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 20, a method of controlling an electronic device that stores an artificial intelligence model trained to obtain a high-resolution output image at or above a threshold resolution from a low-resolution input image below the threshold resolution may include: identifying, in the low-resolution input image, a first region including first characteristic information and a second region including second characteristic information (S2005); obtaining a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region (S2010); upscaling the first image and the second image, respectively (S2015); and obtaining a high-resolution image by combining the upscaled first and second images (S2020).
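The order of steps S2005 to S2020 can be sketched as follows. This is only an illustration of the data flow: the brightness threshold used to split regions, the contrast-based texture processing, and the nearest-neighbor upscaling are hypothetical placeholders, whereas the disclosure describes trained artificial intelligence models for these operations. All function names and parameters are assumptions.

```python
import numpy as np

def identify_regions(image, threshold=0.5):
    """S2005 (placeholder): a brightness threshold stands in for region identification."""
    first_mask = image >= threshold
    return first_mask, ~first_mask

def texture_process(image, mask, intensity):
    """S2010 (placeholder): amplify deviation from the region mean by `intensity`."""
    out = np.zeros_like(image)
    if mask.any():
        mean = image[mask].mean()
        out[mask] = mean + (1.0 + intensity) * (image[mask] - mean)
    return out

def upscale(image, factor=2):
    """S2015 (placeholder): nearest-neighbor upscaling."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

def control_method(low_res, first_intensity=1.0, second_intensity=0.0, factor=2):
    low_res = np.asarray(low_res, dtype=np.float64)
    first_mask, second_mask = identify_regions(low_res)                   # S2005
    first_image = texture_process(low_res, first_mask, first_intensity)    # S2010
    second_image = texture_process(low_res, second_mask, second_intensity)
    first_up = upscale(first_image, factor)                                # S2015
    second_up = upscale(second_image, factor)
    # S2020: the two regions are disjoint, so summing combines them.
    return first_up + second_up
```

With `second_intensity=0.0`, pixels of the second region pass through unchanged, while pixels of the first region are processed more strongly.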

The first characteristic information may include object region information and the second characteristic information may include background region information. In the step of identifying the first region and the second region (S2005), a region including the object region information may be identified as the first region and a region including the background region information may be identified as the second region.

In the step of obtaining the first image and the second image (S2010), if the type of the object included in the first region is a first type, texture processing of an intensity corresponding to the first type may be performed on the first region, and if the type of the object included in the first region is a second type, texture processing of an intensity corresponding to the second type may be performed on the first region.

The control method may further include inputting the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, the first intensity, identification information for the second region, and the second intensity. The first artificial intelligence model may be trained to output, when an image is input, identification information for regions having different characteristics in the input image and a texture intensity for each identified region.

The identification information for the first region may include coordinate information of the first region within the low-resolution input image, and the identification information for the second region may include coordinate information of the second region within the low-resolution input image.
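One way to picture how per-region coordinate information and intensities relate to a style map is sketched below. The text only states that the identification information includes coordinate information; the rectangular bounding-box representation, the `RegionInfo` type, and `build_style_map` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class RegionInfo:
    """Hypothetical record: bounding-box coordinates plus a texture intensity."""
    top: int
    left: int
    bottom: int       # exclusive row bound
    right: int        # exclusive column bound
    intensity: float  # texture intensity in [0, 1]

def build_style_map(height, width, regions, default_intensity=0.0):
    """Rasterize region records into an H x W matrix of texture intensities."""
    style_map = np.full((height, width), default_intensity, dtype=np.float64)
    for region in regions:
        style_map[region.top:region.bottom, region.left:region.right] = region.intensity
    return style_map
```

For example, `build_style_map(4, 4, [RegionInfo(0, 0, 2, 2, 1.0)])` assigns the first intensity to the top-left quadrant and leaves the rest at the default.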

In the step of obtaining the first image and the second image (S2010) and the step of upscaling the first image and the second image (S2015), an image including the identification information for the first region and the second region, the first intensity, and the second intensity may be input into a second artificial intelligence model to obtain the first and second images on which texture processing and upscaling have been performed.

The second artificial intelligence model may be implemented to include a Residual-in-Residual Dense Block (RRDB) that performs Spatial Feature Transform (SFT).
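As background, a spatial feature transform modulates a feature map with a per-pixel scale and shift derived from a condition, that is, output = γ ⊙ F + β. The sketch below applies this with the style map as the condition. In practice γ and β are typically produced by small learned convolutional layers; here simple per-channel weights `w_gamma` and `w_beta` stand in for them, and all names are assumptions rather than the disclosed implementation.

```python
import numpy as np

def sft_layer(features, style_map, w_gamma, w_beta):
    """Apply gamma * F + beta, with gamma and beta derived from the style map.

    features:  array of shape (C, H, W)
    style_map: array of shape (H, W) with values in [0, 1]
    w_gamma, w_beta: arrays of shape (C,), placeholders for learned layers
    """
    features = np.asarray(features, dtype=np.float64)
    t = np.asarray(style_map, dtype=np.float64)[np.newaxis, :, :]
    w_gamma = np.asarray(w_gamma, dtype=np.float64)[:, np.newaxis, np.newaxis]
    w_beta = np.asarray(w_beta, dtype=np.float64)[:, np.newaxis, np.newaxis]
    gamma = 1.0 + w_gamma * t  # per-pixel scale, identity where the map is 0
    beta = w_beta * t          # per-pixel shift, zero where the map is 0
    return gamma * features + beta
```

With this parameterization, pixels whose style map value is 0 pass through unchanged, so the modulation strength follows the texture intensity at each location.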

The second artificial intelligence model may include a layer having a plurality of parameters. A loss value may be calculated by comparing, with a reference image, a plurality of images obtained based on an image identified as a plurality of regions having different characteristics and on the texture intensities corresponding to the plurality of regions, and the plurality of parameters may be trained based on the calculated loss value.

The control method of the electronic device 100 shown in FIG. 20 may be executed on an electronic device 100 having the configuration of FIG. 3 or FIG. 4, and may also be executed on an electronic device 100 having another configuration.

The methods according to the various embodiments of the present disclosure described above may be implemented in the form of an application installable on an existing electronic device.

The methods according to the various embodiments of the present disclosure described above may also be implemented solely through a software upgrade or a hardware upgrade of an existing electronic device.

The various embodiments of the present disclosure described above may also be performed through an embedded server provided in the electronic device, or through an external server of at least one of the electronic device and a display device.

According to an embodiment of the present disclosure, the various embodiments described above may be implemented as software including instructions stored in a machine-readable storage medium (e.g., a medium readable by a computer). The machine is a device capable of calling a stored instruction from the storage medium and operating according to the called instruction, and may include the electronic device according to the disclosed embodiments. When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly, or by using other components under the control of the processor. An instruction may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium does not include a signal and is tangible; it does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium.

According to an embodiment of the present disclosure, the methods according to the various embodiments described above may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)) or online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored in, or temporarily generated in, a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

Each of the components (e.g., a module or a program) according to the various embodiments described above may consist of a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs, identically or similarly, the functions performed by each corresponding component before integration. Operations performed by a module, a program, or another component according to the various embodiments may be executed sequentially, in parallel, repetitively, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. Various modifications may of course be made by those of ordinary skill in the technical field to which the present disclosure belongs without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present disclosure.

100: electronic device
110: memory
120: processor
130: communication interface
140: display

Claims

1. An electronic device comprising:
a display;
a memory storing a trained artificial intelligence model; and
a processor configured to obtain, using the trained artificial intelligence model, a high-resolution output image at or above a threshold resolution from a low-resolution input image below the threshold resolution, and to control the display to display the obtained high-resolution image,
wherein the processor is configured to:
identify a first region including first characteristic information and a second region including second characteristic information in the low-resolution input image;
obtain a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region;
upscale the first image and the second image, respectively; and
obtain the high-resolution image by combining the upscaled first and second images.

2. The electronic device of claim 1,
wherein the first characteristic information includes object region information and the second characteristic information includes background region information, and
wherein the processor is configured to
identify a region including the object region information as the first region and identify a region including the background region information as the second region.

3. The electronic device of claim 2,
wherein the processor is configured to:
if the type of the object included in the first region is a first type, perform texture processing of an intensity corresponding to the first type on the first region; and
if the type of the object included in the first region is a second type, perform texture processing of an intensity corresponding to the second type on the first region.

4. The electronic device of claim 1,
wherein the processor is configured to
input the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, the first intensity, identification information for the second region, and the second intensity, and
wherein the first artificial intelligence model is
trained to output, when an image is input, identification information for regions having different characteristics in the input image and a texture intensity for each identified region.

5. The electronic device of claim 4,
wherein the identification information for the first region
includes coordinate information of the first region within the low-resolution input image, and
wherein the identification information for the second region
includes coordinate information of the second region within the low-resolution input image.

6. The electronic device of claim 4,
wherein the processor is configured to
obtain the first and second images, on which texture processing and upscaling have been performed, by inputting an image including the identification information for the first region and the second region, the first intensity, and the second intensity into a second artificial intelligence model.

7. The electronic device of claim 6,
wherein the second artificial intelligence model is
implemented to include a Residual-in-Residual Dense Block (RRDB) that performs Spatial Feature Transform (SFT).

8. The electronic device of claim 6,
wherein the second artificial intelligence model
includes a layer having a plurality of parameters,
calculates a loss value by comparing, with a reference image, a plurality of images obtained based on an image identified as a plurality of regions having different characteristics and on texture intensities corresponding to the plurality of regions, and
learns the plurality of parameters based on the calculated loss value.

9. The electronic device of claim 8,
wherein the second artificial intelligence model
calculates the loss value based on at least one of a reconstruction loss, an adversarial loss, or a perceptual loss.

10. The electronic device of claim 8,
wherein the second artificial intelligence model is
implemented as a Generative Adversarial Network (GAN).

11. A method for controlling an electronic device storing an artificial intelligence model trained to obtain a high-resolution output image at or above a threshold resolution from a low-resolution input image below the threshold resolution, the method comprising:
identifying a first region including first characteristic information and a second region including second characteristic information in the low-resolution input image;
obtaining a first image in which texture processing of a first intensity is performed on the first region and a second image in which texture processing of a second intensity different from the first intensity is performed on the second region;
upscaling the first image and the second image, respectively; and
obtaining the high-resolution image by combining the upscaled first and second images.

12. The method of claim 11,
wherein the first characteristic information includes object region information and the second characteristic information includes background region information, and
wherein the identifying of the first region and the second region comprises
identifying a region including the object region information as the first region and identifying a region including the background region information as the second region.

13. The method of claim 12,
wherein the obtaining of the first image and the second image comprises:
if the type of the object included in the first region is a first type, performing texture processing of an intensity corresponding to the first type on the first region; and
if the type of the object included in the first region is a second type, performing texture processing of an intensity corresponding to the second type on the first region.

14. The method of claim 11,
further comprising inputting the low-resolution input image into a first artificial intelligence model to obtain identification information for the first region, the first intensity, identification information for the second region, and the second intensity,
wherein the first artificial intelligence model is
trained to output, when an image is input, identification information for regions having different characteristics in the input image and a texture intensity for each identified region.

15. The method of claim 14,
wherein the identification information for the first region
includes coordinate information of the first region within the low-resolution input image, and
wherein the identification information for the second region
includes coordinate information of the second region within the low-resolution input image.

16. The method of claim 14,
wherein the obtaining of the first image and the second image and the upscaling of the first image and the second image comprise
obtaining the first and second images, on which texture processing and upscaling have been performed, by inputting an image including the identification information for the first region and the second region, the first intensity, and the second intensity into a second artificial intelligence model.

17. The method of claim 16,
wherein the second artificial intelligence model is
implemented to include a Residual-in-Residual Dense Block (RRDB) that performs Spatial Feature Transform (SFT).

18. The method of claim 16,
wherein the second artificial intelligence model
includes a layer having a plurality of parameters,
calculates a loss value by comparing, with a reference image, a plurality of images obtained based on an image identified as a plurality of regions having different characteristics and on texture intensities corresponding to the plurality of regions, and
learns the plurality of parameters based on the calculated loss value.

19. The method of claim 18,
wherein the second artificial intelligence model
calculates the loss value based on at least one of a reconstruction loss, an adversarial loss, or a perceptual loss.

20. The method of claim 18,
wherein the second artificial intelligence model is
implemented as a Generative Adversarial Network (GAN).