KR20240054133A

KR20240054133A - Method and electronic device for generating training data for training of artificial intelligence model

Info

Publication number: KR20240054133A
Application number: KR1020220178691A
Authority: KR
Inventors: 이상훈 (Lee Sang-hoon)
Original assignee: 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 2022-10-18
Filing date: 2022-12-19
Publication date: 2024-04-25

Abstract

A method and an apparatus for generating training data are provided. The method acquires a first image containing an object that corresponds to a learning target. Based on a second image captured from the same viewpoint as the first image, it obtains a background image that does not contain the object. It identifies a region of interest containing the object in the first image and acquires, from the background image, a region-of-interest image corresponding to that region. From previously generated original training data, it selects first target training data that matches a context of the object, where the context is determined from the width-to-height ratio of the region-of-interest image. Finally, it generates synthetic training data containing at least part of the background image by modifying a first training image in the first target training data, based on the region-of-interest image.

Description

Method and electronic device for generating training data for training an artificial intelligence model {METHOD AND ELECTRONIC DEVICE FOR GENERATING TRAINING DATA FOR TRAINING OF ARTIFICIAL INTELLIGENCE MODEL}

The present disclosure relates to a method and an electronic device for generating training data for training an artificial intelligence model. More specifically, it relates to a method and an electronic device that use a background image to generate synthetic training data from previously generated original training data.

As artificial intelligence technology advances, it is being used in many fields. An artificial intelligence model must be trained on a large amount of training data, and creating that data directly costs considerable time and money. To reduce this cost, models are often trained on training data published on the Internet. Public data sets, however, are built for general-purpose use, so a model trained on them may make errors in a particular environment. A method may therefore be needed that uses existing training data to generate new training data reflecting the specific environment in which the model is deployed, so that the model can be trained to operate more accurately in that environment.

According to an aspect of the present disclosure, a method of generating training data for training an artificial intelligence model is provided. The method may include acquiring a first image including an object corresponding to a learning target. The method may include obtaining, based on a second image captured from the same viewpoint as the first image, a background image that does not include the object. The method may include identifying a region of interest containing the object within the first image. The method may include acquiring, from the background image, a region-of-interest image corresponding to the region of interest. The method may include determining, from among previously generated original training data, first target training data corresponding to a context of the object, the context being determined based on the ratio of the width to the height of the region-of-interest image. The method may include generating synthetic training data including at least part of the background image by modifying, based on the region-of-interest image, a first training image included in the first target training data.

According to an aspect of the present disclosure, one or more computer-readable recording media storing a program for performing the above-described method are provided.

According to an aspect of the present disclosure, an electronic device 1400 for generating synthetic training data is provided. The electronic device 1400 may include a memory 1410 and a processor 1420. By executing one or more instructions stored in the memory 1410, the processor 1420 may: acquire a first image including an object corresponding to a learning target; obtain, based on a second image captured from the same viewpoint as the first image, a background image that does not include the object; identify a region of interest containing the object within the first image; acquire, from the background image, a region-of-interest image corresponding to the region of interest; determine, from among previously generated original training data, first target training data corresponding to a context of the object determined based on the ratio of the width to the height of the region-of-interest image; and generate synthetic training data including at least part of the background image by modifying, based on the region-of-interest image, a first training image included in the first target training data.

FIG. 1 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 2A is a diagram for explaining a method of generating a background image according to an embodiment of the present disclosure.
FIG. 2B is a diagram for explaining a method of generating a background image according to an embodiment of the present disclosure.
FIG. 3 is a diagram for explaining a method of identifying a region of interest according to an embodiment of the present disclosure.
FIG. 4 is a diagram for explaining a method of generating a region-of-interest image according to an embodiment of the present disclosure.
FIG. 5A is a diagram for explaining training data according to an embodiment of the present disclosure.
FIG. 5B is a diagram for explaining training data according to an embodiment of the present disclosure.
FIG. 6 is a diagram for explaining a method of training an artificial intelligence model according to an embodiment of the present disclosure.
FIG. 7 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 8 is a diagram for explaining a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.
FIG. 9A is a flowchart for explaining a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.
FIG. 9B is a flowchart for explaining a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.
FIG. 10 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 11 is a diagram for explaining a method of generating training data using a context according to an embodiment of the present disclosure.
FIG. 12 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 13 is a flowchart for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 14 is a block diagram for explaining the configuration of an electronic device according to an embodiment of the present disclosure.
FIG. 15 is a flowchart for explaining a method of generating training data according to an embodiment of the present disclosure.
FIG. 16 is a diagram for explaining a process of generating training data and using an artificial intelligence model according to an embodiment of the present disclosure.
FIG. 17 is a sequence diagram for explaining a method of generating training data using an electronic device and a server according to an embodiment of the present disclosure.
FIG. 18 is a block diagram for explaining the configuration of a server according to an embodiment of the present disclosure.

In the present disclosure, the expression "at least one of a, b, or c" may refer to "a", "b", "c", "a and b", "a and c", "b and c", "all of a, b, and c", or variations thereof.

The terms used in the present disclosure have been chosen, as far as possible, from general terms currently in wide use, in consideration of their functions in the present disclosure. Their meanings may nevertheless vary according to the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases, the applicant has selected a term arbitrarily, and its meaning is then described in detail in the corresponding part of the description. The terms used in the present disclosure should therefore be defined based on their meaning and on the content of the disclosure as a whole, rather than on their names alone.

A singular expression may include the plural unless the context clearly indicates otherwise. Terms used herein, including technical and scientific terms, may have the same meaning as generally understood by a person of ordinary skill in the technical field described herein. Terms including ordinal numbers, such as "first" and "second", may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

Throughout the specification, when a part is said to "include" an element, this means that the part may further include other elements rather than excluding them, unless specifically stated otherwise. Terms such as "unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Because the present disclosure admits various modifications and embodiments, specific embodiments are illustrated in the drawings and described in detail below. This is not intended to limit the present disclosure to particular embodiments; the present disclosure should be understood to include all modifications, equivalents, and substitutes within the spirit and technical scope of the various embodiments.

In describing the embodiments, detailed descriptions of related known technologies are omitted where they would unnecessarily obscure the gist of the present disclosure. Numbers used in the description (for example, first, second, and so on) are merely identifiers that distinguish one component from another.

In this specification, when a component is described as being "coupled" or "connected" to another component, it may be directly coupled or connected to the other component; unless specifically stated otherwise, it should also be understood that the two may be coupled or connected through an intervening component.

In this specification, for components expressed as a "unit" or a "module", two or more components may be combined into one component, or one component may be divided into two or more components by more specific function. Each component described below may, in addition to its own main function, perform some or all of the functions of other components, and some of a component's main functions may instead be performed exclusively by another component.

Functions related to artificial intelligence according to the present disclosure are performed by a processor and a memory. The processor may consist of one or more processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a DSP (Digital Signal Processor); a graphics-dedicated processor such as a GPU or a VPU (Vision Processing Unit); or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control input data to be processed according to a predefined operation rule or an artificial intelligence model stored in the memory. When the one or more processors are artificial-intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rule or artificial intelligence model is created through learning. Here, "created through learning" means that a basic artificial intelligence model is trained on a large amount of training data by a learning algorithm, so that a predefined operation rule or artificial intelligence model configured to perform a desired characteristic (or purpose) is created. Such learning may be performed on the device that runs the artificial intelligence according to the present disclosure, or through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these examples.

An artificial intelligence model may be composed of a plurality of neural network layers. Each layer has a plurality of weight values and performs its neural network operation on the output of the previous layer using those weights. The weights of the layers may be optimized by training the artificial intelligence model; for example, they may be updated during training so that a loss or cost value obtained from the model is reduced or minimized. Artificial neural networks may include deep neural networks (DNNs), such as a Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), or Deep Q-Network, but are not limited to these examples.
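As one common illustration of "updating the weights so that the loss is reduced" (the disclosure does not name an optimizer, so plain gradient descent here is an assumption), where $w_t$ are the weights after step $t$, $\eta$ is a learning rate, and $\mathcal{L}$ is the loss:

$$
w_{t+1} = w_t - \eta \, \nabla_{w} \mathcal{L}(w_t)
$$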

In this specification, the "background" may refer to the area or scene of an image excluding the object corresponding to the learning target. For example, if an image includes a person who is the learning target, the background may be the remaining area excluding that person. If the image contains no person corresponding to the learning target, the background may be the entire image area. The background of the same image may therefore differ depending on the learning target.

According to an embodiment of the present disclosure, the background may include objects that do not correspond to the learning target. For example, if the learning target is a human, the background may include objects in the image that are not the learning target (for example, a dog). If the learning target is a specific person (for example, the mother of a family), the background may include other people (for example, the father). If the learning target is an object satisfying a specific condition (for example, gender, age, or skin color), the background may include objects that do not satisfy that condition.

According to an embodiment of the present disclosure, the background may exclude moving objects corresponding to the learning target and include fixed objects that do not correspond to the learning target. For example, the background may exclude objects that move over time, such as people or dogs, and include objects that do not move over time, such as furniture or walls.
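One simple way to tell fixed objects from moving ones, sketched here as an assumption rather than the disclosed method, is to look at how much each pixel changes across frames from a fixed camera. Frames are modelled as grayscale nested lists, and the `max_range` threshold is a hypothetical parameter.

```python
from typing import List

Frame = List[List[int]]  # grayscale image: rows of pixel intensities


def static_mask(frames: List[Frame], max_range: int = 10) -> List[List[bool]]:
    """Mark pixels whose intensity barely changes across frames.

    A pixel is treated as belonging to a fixed object (e.g. furniture or a
    wall) when the spread between its brightest and darkest value over all
    frames is at most `max_range`; otherwise it is treated as having been
    covered by a moving object (e.g. a person or a dog) in some frame.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= max_range)
        mask.append(row)
    return mask


# The second pixel jumps to 200 in one frame, as if someone walked past.
frames = [[[50, 50]], [[52, 200]], [[51, 50]]]
print(static_mask(frames))  # [[True, False]]
```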

In this specification, a "background image" may refer to an image that includes the background. According to an embodiment of the present disclosure, the background image may be obtained using one or more images captured from the same viewpoint; for example, it may be obtained by averaging a plurality of images captured with a camera whose field of view is fixed. According to an embodiment, the background image may be an image that does not include an object corresponding to the learning target; for example, it may be selected, from among one or more captured images, as an image in which no such object appears. According to an embodiment, the background image may be generated by removing an object from an image containing it; for example, the object area of the image may be removed and then filled in (compensated) based on the remaining area. According to an embodiment, the background image may be an image containing the objects common to a plurality of images; for example, it may include fixed objects whose positions stay the same across the images while excluding moving objects whose positions change across them.
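The averaging approach can be sketched as a per-pixel reduction over frames from a fixed-view camera. This is a minimal illustration assuming grayscale nested lists. The "mean" mode follows the text. The "median" mode is an assumed alternative, not stated in the disclosure, that is less affected by an object passing through a pixel in only a few frames.

```python
import statistics
from typing import List

Frame = List[List[int]]  # grayscale image: rows of pixel intensities


def estimate_background(frames: List[Frame], method: str = "mean") -> Frame:
    """Estimate a background image from frames taken from the same viewpoint.

    "mean" averages each pixel over time, as described in the text.
    "median" is an assumed, more outlier-robust alternative.
    """
    if method not in ("mean", "median"):
        raise ValueError(f"unknown method: {method}")
    combine = statistics.mean if method == "mean" else statistics.median
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [round(combine([frame[y][x] for frame in frames])) for x in range(width)]
        for y in range(height)
    ]


frames = [[[10, 100]], [[20, 100]], [[30, 250]]]  # pixel 2 is briefly occluded
print(estimate_background(frames))            # [[20, 150]]
print(estimate_background(frames, "median"))  # [[20, 100]]
```

With only a handful of frames, the mean still carries a trace of the passing object (150 instead of 100). The median discards it, which is why a median is a common choice in practice.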

In this specification, a "region of interest" may refer to the area of an image that includes the object corresponding to the learning target. For example, if the learning target is a human, the region of interest may be the area of the image containing the human. According to an embodiment of the present disclosure, the region of interest may be the smallest rectangular area that contains the object corresponding to the learning target; for example, if the learning target is a human, it may be the smallest rectangle in the image that includes the human.
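The minimum-size rectangular region of interest can be computed from a per-pixel object mask. How the object is localized is not specified here, so the mask (for example, from a segmentation or person-detection step) is an assumption. The box convention below, with exclusive right and bottom edges, is likewise chosen for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def min_bounding_box(mask: List[List[bool]]) -> Optional[Box]:
    """Smallest axis-aligned rectangle containing every True pixel of `mask`.

    Returns None when the mask contains no object pixels.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, is_object in enumerate(row) if is_object]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


mask = [
    [False, False, False, False],
    [False, True, False, False],
    [False, True, True, False],
]
print(min_bounding_box(mask))  # (1, 1, 3, 3)
```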

In this specification, a "region-of-interest image" may refer to the portion of the background image that corresponds to the region of interest. According to an embodiment of the present disclosure, the region-of-interest image may be acquired using the background image. According to an embodiment, the electronic device may use the background image to generate training data that includes at least part of the background, and may train an artificial intelligence model using that training data.
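The region of interest is located in the first image, which contains the object. Its pixels, however, are taken from the object-free background image. This works because both images are captured from the same viewpoint, so the same coordinates refer to the same part of the scene. A minimal sketch, assuming grayscale nested lists and the hypothetical `(left, top, right, bottom)` box convention with exclusive right and bottom edges:

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop_roi(background: Image, box: Box) -> Image:
    """Cut the region of interest out of the object-free background image."""
    left, top, right, bottom = box
    return [row[left:right] for row in background[top:bottom]]


def aspect_ratio(image: Image) -> float:
    """Width divided by height of a non-empty image."""
    return len(image[0]) / len(image)


background = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
roi_image = crop_roi(background, (1, 1, 3, 3))
print(roi_image)                # [[5, 6], [9, 10]]
print(aspect_ratio(roi_image))  # 1.0
```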

In this specification, "training data" may refer to data for training an artificial intelligence model. According to an embodiment of the present disclosure, training data may include an image containing an object and labeling information about the object. For example, if the artificial intelligence model performs pose estimation, the training data may include an image containing an object and information about the object's pose. If the model performs object detection, the training data may include an image containing an object, location information for the object, and class information indicating the object's type.
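The two labeling examples above can be pictured as one record type holding an image plus task-specific labels. The class and field names below are hypothetical and do not reflect the schema of any particular public data set.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel intensities


@dataclass
class Keypoint:
    name: str  # body joint, e.g. "left_wrist" (pose estimation label)
    x: float
    y: float


@dataclass
class DetectionLabel:
    box: Tuple[int, int, int, int]  # object location: (left, top, right, bottom)
    class_name: str                 # object type, e.g. "person"


@dataclass
class TrainingSample:
    image: Image
    keypoints: List[Keypoint] = field(default_factory=list)         # for pose estimation
    detections: List[DetectionLabel] = field(default_factory=list)  # for object detection


pose_sample = TrainingSample(image=[[0]], keypoints=[Keypoint("nose", 0.0, 0.0)])
print(pose_sample.detections)  # []
```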

In this specification, "original training data" may refer to training data for training an artificial intelligence model that does not take the deployment background into account. For example, the original training data may be training data published for training artificial intelligence models. Its background may be arbitrary, rather than the background captured by the camera of the electronic device that uses the model. The original training data may include training data about the object to be learned.

In this specification, "target training data" may refer to training data selected from the original training data to be combined with the background image to generate synthetic training data. According to an embodiment of the present disclosure, the target training data may be selected from the original training data based on the width and height of the region-of-interest image. For example, the target training data may be training data, among the original training data, whose width-to-height ratio is equal or similar to that of the region of interest.
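Selecting target training data by aspect ratio can be sketched as a nearest-ratio filter over candidate image sizes. The disclosure links the ratio to a "context" of the object without detailing the mapping here, so this sketch compares ratios directly. The log-ratio distance and the `tolerance` value are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Size = Tuple[int, int]  # (width, height)


def select_target_samples(
    roi_size: Size, candidates: Sequence[Size], tolerance: float = 0.1
) -> List[int]:
    """Indices of candidate training images whose width/height ratio is equal
    or close to that of the region-of-interest image, closest first.

    Distance is |log(r_candidate / r_roi)|, so a candidate twice as wide and
    one twice as tall (relative to the ROI) count as equally far away.
    """
    roi_ratio = roi_size[0] / roi_size[1]
    scored = []
    for index, (width, height) in enumerate(candidates):
        distance = abs(math.log((width / height) / roi_ratio))
        if distance <= tolerance:
            scored.append((distance, index))
    return [index for _, index in sorted(scored)]


# A tall ROI (e.g. a standing person) matches tall training images only.
roi = (100, 200)
candidates = [(50, 100), (200, 100), (55, 100), (60, 100)]
print(select_target_samples(roi, candidates))  # [0, 2]
```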

In this specification, "synthetic training data" may refer to training data generated by combining target training data with a region-of-interest image. According to an embodiment of the present disclosure, synthetic training data may be generated by combining the region-of-interest image with the object in the target training data that the artificial intelligence model is to learn.
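One way to combine the two, sketched under stated assumptions rather than as the disclosed procedure, is mask-based compositing. The sketch assumes the target training sample comes with a per-pixel object mask, which is not stated in the text. It keeps the object's pixels and takes every other pixel from the region-of-interest image, resized with nearest-neighbour sampling to the training image's size.

```python
from typing import List

Image = List[List[int]]
Mask = List[List[bool]]


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Nearest-neighbour resize of a grayscale image to width x height."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def composite(train_image: Image, object_mask: Mask, roi_image: Image) -> Image:
    """Replace the training image's original background with the ROI image.

    Pixels inside `object_mask` keep the training object; all other pixels
    come from the ROI image resized to the training image's size.
    """
    height, width = len(train_image), len(train_image[0])
    background = resize_nearest(roi_image, width, height)
    return [
        [
            train_image[y][x] if object_mask[y][x] else background[y][x]
            for x in range(width)
        ]
        for y in range(height)
    ]


train = [[9, 9], [9, 9]]
mask = [[True, False], [False, False]]
print(composite(train, mask, [[1, 2], [3, 4]]))  # [[9, 2], [3, 4]]
```

Resizing the background, rather than the training image, leaves the training image's geometry unchanged. The existing keypoint or box labels therefore remain valid for the synthetic sample without any adjustment.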

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily implement them. The present disclosure may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and like reference numerals denote like parts throughout the specification.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 1, the electronic device 1400 may acquire an image including a background 110. According to an embodiment of the present disclosure, the electronic device 1400 may acquire the image including the background 110 using a camera 1430. For example, when the electronic device 1400 is a TV, the TV may use the camera 1430 to capture an image of an indoor space. The background 110 may be the area within the field of view of the camera 1430.

According to an embodiment of the present disclosure, the electronic device 1400 may acquire a background image 120 based on the image including the background 110. According to an embodiment of the present disclosure, the electronic device 1400 may acquire the background image 120 using one or more images captured from the same viewpoint. For example, the electronic device 1400 may obtain the background image by averaging a plurality of images captured with the camera 1430 while its field of view is fixed. According to an embodiment of the present disclosure, the electronic device 1400 may determine, as the background image 120, an image that does not include an object corresponding to the learning target. For example, the electronic device 1400 may select, from among one or more captured images, an image that does not include an object corresponding to the learning target as the background image 120. According to an embodiment of the present disclosure, the electronic device 1400 may generate the background image by removing an object from an image that includes it. For example, the electronic device 1400 may generate the background image 120 by removing the object area from the image including the object and filling in the object area based on the remaining area. According to an embodiment of the present disclosure, the electronic device 1400 may determine, as the background image 120, an image including objects that are common to a plurality of images. For example, the electronic device 1400 may exclude moving objects whose positions change across the plurality of images and determine, as the background image, an image including stationary objects whose positions are fixed across the plurality of images.

According to an embodiment of the present disclosure, the electronic device 1400 may acquire the background image 120 every time an image is captured or acquired. For example, the electronic device 1400 may acquire the background image 120 whenever a region of interest 135 is identified. According to an embodiment of the present disclosure, the electronic device 1400 may acquire the background image 120 when a predetermined condition is satisfied. For example, the electronic device 1400 may acquire the background image 120 at regular intervals and, until a new background image is acquired, identify regions of interest using the previously acquired background image 120. Alternatively, the electronic device 1400 may generate the background image 120 when performing a specific task, for example, when running an exercise application. Methods of obtaining a background image according to various embodiments of the present disclosure are described in more detail with reference to FIGS. 2A and 2B.

According to an embodiment of the present disclosure, the electronic device 1400 may identify a region of interest 135 in an image 130 including an object. The region of interest 135 may refer to the area of the image 130 that contains the object corresponding to the learning target. For example, if the artificial intelligence model is a pose estimation model that recognizes a person's pose, the region of interest 135 may be the area containing the person. According to an embodiment of the present disclosure, the region of interest 135 may be determined as the smallest rectangle enclosing the object, but is not limited thereto and may take various shapes. A method of identifying a region of interest according to an embodiment of the present disclosure is described in more detail with reference to FIG. 3.

According to an embodiment of the present disclosure, the electronic device 1400 may acquire a region of interest image 140. For example, the region of interest image 140 may be the portion of the background image 120 corresponding to the region of interest 135. Alternatively, the region of interest image 140 may be acquired from a captured image other than the captured image 130 from which the region of interest 135 was identified. The region of interest image 140 may therefore also include the area that was hidden behind the object in the image 130 and thus not captured there. A method of acquiring a region of interest image according to an embodiment of the present disclosure is described in more detail with reference to FIG. 4.

According to an embodiment of the present disclosure, the electronic device 1400 may acquire original training data 150. The original training data 150 may be a pre-generated training data set. The original training data may contain arbitrary backgrounds unrelated to the area captured by the camera of the electronic device; for example, it may consist of training data captured against various backgrounds. When the electronic device 1400 uses an artificial intelligence model trained on the original training data, accuracy may be low for certain backgrounds. For example, an artificial intelligence model for inferring a person's pose may mistake a window frame in an image for a person's arm. Original training data according to an embodiment of the present disclosure is described in more detail with reference to FIGS. 5A and 5B.

According to an embodiment of the present disclosure, the electronic device 1400 may generate synthetic training data 160 based on the original training data 150 and the region of interest image 140. The synthetic training data 160 may be generated by compositing data about the object that the artificial intelligence model is to learn, taken from the original training data 150, with the region of interest image 140. For example, the electronic device 1400 may composite the person included in the original training data 150 with the region of interest image 140, which is at least part of the background 110, thereby generating synthetic training data 160 that contains both the background 110 and the object to be learned.
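The compositing step can be pictured with a minimal NumPy sketch. It assumes, hypothetically, that each original training sample comes with a binary person mask (the text does not say how the object is separated from its original background), that the region of interest image is simply rescaled to the training image size with nearest-neighbour sampling, and that masked pixels are pasted without any blending. The names `resize_nearest` and `composite` are illustrative only.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize, kept dependency-free for this sketch."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]


def composite(train_img: np.ndarray, person_mask: np.ndarray,
              roi_img: np.ndarray) -> np.ndarray:
    """Paste the masked person from a training image onto the ROI background.

    Assumption: `person_mask` is an H x W boolean mask of the learning target
    in `train_img`; it is not something the source text specifies.
    """
    h, w = train_img.shape[:2]
    background = resize_nearest(roi_img, h, w)  # fit the ROI image to the sample
    mask = person_mask.astype(bool)
    out = background.copy()
    out[mask] = train_img[mask]  # hard paste: person pixels replace background
    return out
```

Because the person's pixels keep their original coordinates, any pose or box annotation of the training sample still applies to the composited image in this sketch.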

The electronic device 1400 according to an embodiment of the present disclosure may determine target training data based on the original training data 150 and the region of interest image 140. According to an embodiment of the present disclosure, the target training data may be selected from the original training data 150 based on the size of the region of interest image 140, specifically as the data whose width-to-height ratio corresponds to that of the region of interest image 140. For example, if the width-to-height ratio of the region of interest image 140 is 1:2, the target training data may be the original training data whose width-to-height ratio is 1:2 or close to it. The electronic device 1400 may generate the synthetic training data 160 using the target training data and the region of interest image 140.
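Selecting target training data by aspect ratio could look like the following sketch. The `Sample` record, the 10% relative tolerance and the closest-first ordering are assumptions made for illustration; the text only requires the ratio to be the same or similar.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical record for one item of original training data."""
    name: str
    width: int
    height: int


def select_target_samples(samples: List[Sample], roi_w: int, roi_h: int,
                          tolerance: float = 0.1) -> List[Sample]:
    """Return samples whose width/height ratio lies within `tolerance`
    (relative to the ROI ratio) of the ROI ratio, closest match first."""
    roi_ratio = roi_w / roi_h

    def ratio_gap(s: Sample) -> float:
        return abs(s.width / s.height - roi_ratio)

    matches = [s for s in samples if ratio_gap(s) <= tolerance * roi_ratio]
    return sorted(matches, key=ratio_gap)
```

For a 1:2 region of interest (say 50 x 100 pixels), a 100 x 200 sample matches exactly, a 105 x 200 sample is kept as "similar", and a square sample is rejected.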

The electronic device 1400 may train an artificial intelligence model using the synthetic training data 160 generated based on the background 110. An artificial intelligence model trained on the synthetic training data 160 can perform its task with higher accuracy on input images captured against the background 110.

For convenience of explanation, unless otherwise stated, the following description assumes that the artificial intelligence model is a pose estimation model and that the learning target is a person. However, the disclosure is not limited thereto, and the learning target may vary depending on the type of artificial intelligence model.

FIG. 2A is a diagram for explaining a method of generating a background image according to an embodiment of the present disclosure.

Referring to FIG. 2A, the electronic device 1400 may acquire a background image 240 based on a plurality of captured images 210, 220, and 230. In this example, the first captured image 210 and the second captured image 220 include the object at different positions, and the third captured image 230 does not include the object. For convenience of explanation, the learning target is assumed to be a single moving object (e.g., a person) and each captured image contains at most one moving object. However, the disclosure is not limited thereto: the learning target may be determined according to the training data, and the same operations apply to captured images containing multiple objects.

According to an embodiment of the present disclosure, the electronic device 1400 may generate the background image so that it includes stationary objects whose positions are fixed across the captured images 210, 220, and 230 and excludes moving objects whose positions change across them. The electronic device 1400 may classify the objects in the images 210, 220, and 230 into moving objects and stationary objects, and may acquire the background image 240 based on the image regions corresponding to the stationary objects. For example, the electronic device 1400 may identify the object that appears at different positions in the first captured image 210 and the second captured image 220 as a moving object and identify the remaining objects as stationary objects. The electronic device 1400 may identify the objects included in the third captured image 230 as stationary objects. The electronic device 1400 may then generate the background image 240 based on the stationary-object regions of the first captured image 210, the second captured image 220, and the third captured image 230.
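One way to approximate the moving/stationary split at pixel level is sketched below. Instead of classifying whole objects as the text describes, this hypothetical `background_from_fixed_pixels` treats a pixel as belonging to a moving object in a given frame when it deviates from the per-pixel temporal median by more than an assumed threshold, and averages only the remaining (stationary) observations.

```python
import numpy as np


def background_from_fixed_pixels(frames, threshold: float = 25.0) -> np.ndarray:
    """Average each pixel over only the frames in which it looks stationary.

    Assumptions: frames share one shape, come from a fixed camera, and a
    deviation above `threshold` from the temporal median marks a moving object.
    """
    stack = np.stack([f.astype(np.float64) for f in frames])  # (T, H, W[, C])
    median = np.median(stack, axis=0)
    fixed = np.abs(stack - median) <= threshold  # True where the pixel is stationary
    counts = fixed.sum(axis=0)
    sums = np.where(fixed, stack, 0.0).sum(axis=0)
    # Fall back to the median where no frame was classified as stationary.
    return np.where(counts > 0, sums / np.maximum(counts, 1), median)
```

With three frames in which a person stands at a different spot in two of them and is absent from the third, the person's pixels are excluded from the average and the result is the empty scene.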

The electronic device 1400 according to an embodiment of the present disclosure may obtain the background image 240 by averaging the plurality of captured images 210, 220, and 230. The plurality of captured images may include images that contain the object, such as the first captured image 210 or the second captured image 220. Nevertheless, by averaging a large number of images, the electronic device 1400 can generate a background image 240 from which the object is effectively removed, since a moving object covers any given pixel in only a small fraction of the images and its contribution to the average is correspondingly small.
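Plain temporal averaging is a one-liner with NumPy. This sketch assumes frames from a fixed camera with identical shapes; the function name is illustrative.

```python
import numpy as np


def average_background(frames) -> np.ndarray:
    """Per-pixel mean of frames from a fixed camera (H x W or H x W x C)."""
    stack = np.stack([f.astype(np.float64) for f in frames])  # (T, H, W[, C])
    return stack.mean(axis=0)
```

If a person covers a pixel in one of ten frames, that pixel's average is still pulled 10% toward the person's value, so averaging works best with many frames. A per-pixel median (`np.median(stack, axis=0)`) is a common, more outlier-robust alternative, though it is not what the text describes.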

According to an embodiment of the present disclosure, the electronic device 1400 may determine an image that does not include an object corresponding to the learning target as the background image 240. For example, the electronic device 1400 may select, from among the plurality of captured images 210, 220, and 230, the third captured image 230, which does not include an object corresponding to the learning target, as the background image 240.

A process in which the electronic device 1400 generates a background image from a captured image that includes an object, according to an embodiment of the present disclosure, is described with reference to FIG. 2B.

FIG. 2B is a diagram for explaining a method of generating a background image according to an embodiment of the present disclosure.

Referring to FIG. 2B, the electronic device 1400 may obtain a background image 270 by removing an object included in a captured image 250. According to an embodiment of the present disclosure, the electronic device 1400 may obtain an object-removed image 260 by removing, from the captured image 250, the area of the object corresponding to the learning target. Objects in the captured image 250 that do not correspond to the learning target may remain in the object-removed image 260. For example, when the captured image 250 includes a person, who is the learning target, and a dog, which is not, the electronic device 1400 may obtain an object-removed image 260 in which the person has been removed but the dog has not.

According to an embodiment of the present disclosure, the electronic device 1400 may generate the background image 270 by filling in the object area of the object-removed image 260 based on the areas outside it. For example, the electronic device 1400 may restore the pixel values of the removed area based on the pixel values of the non-removed area of the object-removed image 260. The electronic device 1400 may use any of various inpainting techniques to obtain a background image from an image including an object.
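The text leaves the inpainting method open. As one simple, swapped-in example, the sketch below fills the removed area of a grayscale image by repeatedly replacing each hole pixel with the mean of its four neighbours (a basic diffusion, or harmonic, fill). A production system might instead use, for instance, OpenCV's `cv2.inpaint` or a learned inpainting model; the iteration count here is an assumption.

```python
import numpy as np


def diffusion_inpaint(image: np.ndarray, mask: np.ndarray,
                      iterations: int = 200) -> np.ndarray:
    """Fill masked pixels of a 2-D grayscale image from their surroundings.

    `mask` is True where the object was removed; pixels outside it are kept.
    """
    out = image.astype(np.float64).copy()
    hole = mask.astype(bool)
    out[hole] = out[~hole].mean()  # initial guess: mean of the known pixels
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]      # up + down
                      + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0  # left + right
        out[hole] = neighbours[hole]  # only hole pixels are updated
    return out
```

Diffusion filling produces smooth patches, which is adequate for flat walls or floors but blurs textured backgrounds; that trade-off is one reason more elaborate inpainting methods exist.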

FIG. 3 is a diagram for explaining a method of identifying a region of interest according to an embodiment of the present disclosure.

Referring to FIG. 3, a background image 320 may correspond to the background image 120 of FIG. 1, the background image 240 of FIG. 2A, or the background image 270 of FIG. 2B, but is not limited thereto. A captured image 310 may correspond to one of the captured images 210, 220, and 230 of FIG. 2A or the captured image 250 of FIG. 2B, but is not limited thereto and may be a separately captured image.

The electronic device 1400 according to an embodiment of the present disclosure may generate a residual image 330 based on the captured image 310 and the background image 320, for example, as the difference between them. The residual image 330 may therefore include the object to be learned. The electronic device 1400 may determine a region of interest 340 based on the residual image 330. According to an embodiment of the present disclosure, the region of interest 340 may be the smallest rectangle enclosing the object in the residual image 330.

The residual image 330 according to an embodiment of the present disclosure may include at least one of a residual image 350 of the object, a residual image 360 of a shadow, and a residual image 370 of noise. According to an embodiment of the present disclosure, the region of interest 340 may be the smallest rectangle enclosing the residual image 350 of the object.
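A minimal sketch of residual-based region of interest detection follows, assuming a per-pixel absolute difference and a fixed threshold (both assumptions, as is the `(x1, y1, x2, y2)` return convention). The threshold suppresses weak noise residuals, but strong shadow residuals would survive it; separating those would need an extra step (for example, keeping only the largest connected component), which is not shown.

```python
from typing import Optional, Tuple

import numpy as np


def roi_from_residual(frame: np.ndarray, background: np.ndarray,
                      threshold: float = 30.0) -> Optional[Tuple[int, int, int, int]]:
    """Smallest axis-aligned box around pixels that differ from the background.

    Returns (x1, y1, x2, y2) with exclusive x2/y2, or None if nothing differs.
    """
    residual = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    if residual.ndim == 3:  # colour input: use the largest channel difference
        residual = residual.max(axis=2)
    foreground = residual > threshold
    if not foreground.any():
        return None
    rows = np.flatnonzero(foreground.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(foreground.any(axis=0))  # columns containing foreground
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1
```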

According to an embodiment of the present disclosure, when the electronic device 1400 captures images continuously, it may determine the region of interest 340 based on the plurality of captured images 310. For example, the electronic device 1400 may determine the region of interest based on a residual image computed between images of the plurality of images 310.

According to an embodiment of the present disclosure, the electronic device 1400 may identify a region of interest at predetermined time intervals. For example, the electronic device 1400 may attempt to identify a region of interest every second and generate synthetic training data when a region of interest is found.

According to an embodiment of the present disclosure, the electronic device 1400 may identify a region of interest when a predetermined condition is satisfied. For example, the electronic device 1400 may identify a region of interest when a specific application (e.g., an exercise application) is running.

According to an embodiment of the present disclosure, the electronic device 1400 may identify a region of interest when a sensor detects movement. For example, when the electronic device 1400 detects the movement of an object using an infrared sensor, it may perform the operation of identifying a region of interest.
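The three triggers above are described as separate embodiments. The hypothetical `RoiTrigger` below simply combines them with a logical OR for illustration: it fires when the interval has elapsed, when the relevant application is active, or when a motion sensor reports movement. The class, its fields and the OR combination are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoiTrigger:
    """Hypothetical policy deciding when to run region of interest detection."""
    interval_s: float = 1.0
    last_run_s: Optional[float] = None

    def should_run(self, now_s: float, app_active: bool = False,
                   motion_detected: bool = False) -> bool:
        due = self.last_run_s is None or now_s - self.last_run_s >= self.interval_s
        if due or app_active or motion_detected:
            self.last_run_s = now_s  # restart the interval after each run
            return True
        return False
```

Restarting the interval after an event-driven run is one design choice among several; a device could equally keep the periodic schedule independent of sensor or application events.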

FIG. 4 is a diagram for explaining a method of generating a region of interest image according to an embodiment of the present disclosure.

Referring to FIG. 4, regions of interest 410a, 410b, 410c, and 410d and a background image 420 may correspond to the region of interest 135 and the background image 120 of FIG. 1, respectively, but are not limited thereto.

According to an embodiment of the present disclosure, the electronic device 1400 may obtain information about the plurality of regions of interest 410a, 410b, 410c, and 410d and the background image 420. According to an embodiment of the present disclosure, the electronic device 1400 may acquire, from the background image 420, region of interest images 430a, 430b, 430c, and 430d corresponding to the regions of interest 410a, 410b, 410c, and 410d. In this way, the electronic device 1400 may obtain, from the background image 420, region of interest images 430a, 430b, 430c, and 430d showing the background of each area in which the object was identified.
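Cropping region of interest images out of the background image amounts to array slicing. This sketch assumes regions are given as `(x1, y1, x2, y2)` pixel boxes with exclusive right and bottom edges and clamps them to the image bounds; the box convention and function name are assumptions.

```python
import numpy as np


def crop_roi(background: np.ndarray, roi) -> np.ndarray:
    """Return the part of `background` inside roi = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = roi
    h, w = background.shape[:2]
    x1, x2 = max(0, x1), min(w, x2)  # clamp horizontally
    y1, y2 = max(0, y1), min(h, y2)  # clamp vertically
    return background[y1:y2, x1:x2].copy()  # copy so later edits do not alias
```

Several regions, such as 410a to 410d, can then be cropped with a comprehension like `[crop_roi(background, r) for r in rois]`.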

According to an embodiment of the present disclosure, the background image 420 may be generated anew each time the region of interest images 430a, 430b, 430c, and 430d are acquired. According to another embodiment of the present disclosure, the background image 420 may be a background image generated at a specific point in time.

FIG. 5A is a diagram for explaining training data according to an embodiment of the present disclosure.

Referring to FIG. 5A, the training data may include a training image 510a and training data 520a, which carries the annotation for the training image. According to an embodiment of the present disclosure, the training data may be used to train an artificial intelligence model for pose estimation of an object.

The training image 510a may be an image including an object whose pose is to be estimated. The training image 510a may include an arbitrary background, which may differ from the background of the environment in which the artificial intelligence model is actually deployed.

The training data 520a may include information for identifying a pose. For example, the training data 520a may include information about a person's joints and the connections between the joints. For convenience of explanation, the training data 520a is illustrated as the training image 510a overlaid with the joint information and the connections between joints, but it is not limited thereto and may represent the training information in various forms.
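A pose label of the kind described, joint positions plus joint-to-joint connections, might be held in a structure like the hypothetical `PoseAnnotation` below. The joint names and the dictionary layout are illustrative assumptions, not a format defined in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class PoseAnnotation:
    """Hypothetical label: named joint coordinates plus joint-to-joint links."""
    keypoints: Dict[str, Point]
    skeleton: List[Tuple[str, str]] = field(default_factory=list)

    def limbs(self) -> List[Tuple[Point, Point]]:
        """Coordinate pairs for every link whose two joints are both labelled."""
        return [(self.keypoints[a], self.keypoints[b])
                for a, b in self.skeleton
                if a in self.keypoints and b in self.keypoints]
```

Skipping links with an unlabelled endpoint lets the same skeleton definition serve images in which some joints are occluded.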

An artificial intelligence model for pose estimation may be trained using the training image 510a and the training data 520a containing pose information. For example, the model may be trained so that the error between the output obtained by inputting the training image 510a into the model and the training data 520a decreases.

FIG. 5B is a diagram for explaining training data according to an embodiment of the present disclosure.

Referring to FIG. 5B, the training data may include a training image 510b and training data 520b. According to an embodiment of the present disclosure, the training data may be used to train an artificial intelligence model for object detection.

The training image 510b may be an image including an object to be detected. The training image 510b may include an arbitrary background, which may differ from the background of the environment in which the artificial intelligence model is actually deployed.

The training data 520b may include information about the object to be detected. For example, the training data 520b may include the region in which the object is present (e.g., a bounding box) 530b and the type of the object (e.g., person). For convenience of explanation, the training data 520b is illustrated as the training image 510b overlaid with the object region 530b and the object type, but it is not limited thereto and may represent the training information in various forms. For example, the training data 520b may be stored as text that expresses the coordinates of the object region 530b (e.g., the coordinates of the upper-left and lower-right corners of the bounding box) and the object type.
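As a sketch of the text-form storage mentioned above, one annotation per line could be written as `label x1 y1 x2 y2`, giving the upper-left and lower-right corners of the bounding box. This line format and the helper names are assumptions; real datasets use a variety of formats.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def box_to_text(label: str, box: Box) -> str:
    """Serialize one annotation as 'label x1 y1 x2 y2' (upper-left, lower-right)."""
    x1, y1, x2, y2 = box
    return f"{label} {x1} {y1} {x2} {y2}"


def box_from_text(line: str) -> Tuple[str, Box]:
    """Parse a line written by box_to_text back into (label, box)."""
    label, *coords = line.split()
    x1, y1, x2, y2 = (int(c) for c in coords)
    return label, (x1, y1, x2, y2)
```

This whitespace-separated format would break for labels that contain spaces, so such labels would need quoting or an identifier scheme.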

An artificial intelligence model for object detection may be trained using the training image 510b and the training data 520b, which contains the object region and object type. For example, the model may be trained so that the error between the output obtained by inputting the training image 510b into the model and the training data 520b decreases.

FIG. 6 is a diagram for explaining a method of training an artificial intelligence model according to an embodiment of the present disclosure.

Referring to FIG. 6, a method of training an artificial intelligence model 620 using training data 610 is shown. The training data 610 may be the training data of FIG. 5A or FIG. 5B, but is not limited thereto and may be any training data suited to the type of artificial intelligence model. Although the following describes the electronic device 1400 training the model according to an embodiment of the present disclosure, the disclosure is not limited thereto, and the model may also be trained on a server.

The training data 610 may include a training image, which is the input to the artificial intelligence model 620, and training information. The electronic device 1400 may input the training image of the training data 610 into the artificial intelligence model 620 to obtain prediction data 630. The electronic device 1400 may measure a prediction error 640 based on the training information of the training data 610 and the prediction data 630.

The electronic device 1400 may update parameters 650 of the artificial intelligence model 620 based on the prediction error 640.
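The forward pass, error measurement, and parameter update described for FIG. 6 can be illustrated with a minimal gradient-descent loop. This is a sketch under stated assumptions, not the disclosed implementation. The model is a hypothetical linear predictor `w * x + c`, the prediction error is taken to be mean squared error, and `train` and `lr` are names introduced only for this example.

```python
def train(samples, epochs=200, lr=0.05):
    """Fit a hypothetical linear model y = w * x + c by gradient descent.

    samples: list of (x, y) pairs standing in for (training image, training information).
    Returns the learned parameters and the per-epoch loss history.
    """
    w, c = 0.0, 0.0  # model parameters (cf. parameters 650)
    history = []
    n = len(samples)
    for _ in range(epochs):
        # Forward pass: prediction data (cf. 630) for every sample.
        preds = [w * x + c for x, _ in samples]
        # Prediction error (cf. 640): mean squared error against the labels.
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, samples)) / n
        history.append(loss)
        # Gradients of the mean squared error with respect to w and c.
        grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, samples)) / n
        grad_c = sum(2 * (p - y) for p, (_, y) in zip(preds, samples)) / n
        # Parameter update in the direction that reduces the error.
        w -= lr * grad_w
        c -= lr * grad_c
    return (w, c), history
```

A real detection or pose model would replace the linear predictor with a neural network and a task-specific loss. The loop structure (predict, measure the error, update the parameters) stays the same.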

According to an embodiment of the present disclosure, as the electronic device 1400 trains the artificial intelligence model 620, the model becomes fitted to the training data 610. When the electronic device trains the artificial intelligence model using training data that includes at least a portion of the background image, prediction accuracy can be higher than that of an artificial intelligence model trained using training data with arbitrary backgrounds.

FIG. 7 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 7, synthetic training data 750 may be generated based on original training data 710 and a region-of-interest image 740. According to an embodiment of the present disclosure, the original training data 710, the region-of-interest image 740, and the synthetic training data 750 may correspond to the original training data 150, the region-of-interest image 140, and the synthetic training data 160 of FIG. 1, respectively. For convenience of explanation, descriptions of these elements that overlap with FIG. 1 are omitted.

The electronic device according to an embodiment of the present disclosure may determine target training data 720 from the original training data 710. The electronic device may determine the target training data 720 based on the width-to-height ratio of the region-of-interest image 740. Alternatively, the electronic device may determine the target training data 720 based on the width-to-height ratio of the region of interest described with reference to FIG. 1. The target training data 720 may be selected from among the original training data 710.
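Selecting target training data by the width-to-height ratio of the region-of-interest image can be sketched as a nearest-ratio search. This is a minimal sketch under assumptions. The `TrainingSample` record, the use of absolute ratio difference as the similarity measure, and the function names are all hypothetical, because the disclosure does not specify how similarity is measured.

```python
from dataclasses import dataclass


@dataclass
class TrainingSample:
    """Hypothetical record for one item of original training data."""
    name: str
    width: int
    height: int


def aspect_ratio(width, height):
    """Width-to-height ratio; height must be positive."""
    if height <= 0:
        raise ValueError("height must be positive")
    return width / height


def select_by_ratio(roi_width, roi_height, originals):
    """Return the original sample whose aspect ratio is closest to the ROI's (assumed metric)."""
    target = aspect_ratio(roi_width, roi_height)
    return min(originals, key=lambda s: abs(aspect_ratio(s.width, s.height) - target))
```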

The electronic device may obtain, from the target training data 720, data about an object 730 corresponding to the learning target. The electronic device may segment the object image included in the target training data 720. For example, the electronic device may perform image segmentation on the object 730 included in the target training data 720. The object 730 is what the artificial intelligence model is meant to recognize. For example, if the artificial intelligence model recognizes people, items other than people, such as furniture, may not be segmented. The disclosure is not limited thereto, however, and various types of objects 730 may be determined depending on the type of artificial intelligence model.

The electronic device may segment the object 730 from the target training data 720 using an artificial intelligence model trained to segment the learning target. Alternatively, the electronic device may segment the object 730 using the color difference between the object 730 and the background in the target training data 720. The electronic device is not limited to these methods and may segment the object 730 from the target training data 720 in various other ways.
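The color-difference approach can be sketched as follows. The sketch assumes that the training image was captured against a single, roughly uniform background color (for example, a chroma-key backdrop). The function name, the Euclidean RGB distance, and the threshold value are assumptions for illustration, not the disclosed method.

```python
import math


def segment_by_color(image, background_rgb, threshold=60.0):
    """Return a binary mask (list of lists of bool): True where the pixel is object.

    image: list of rows, each row a list of (R, G, B) tuples in 0..255.
    background_rgb: assumed single background color of the training image.
    threshold: hypothetical Euclidean RGB distance above which a pixel counts as object.
    """
    mask = []
    for row in image:
        # A pixel far from the background color is treated as part of the object.
        mask.append([math.dist(px, background_rgb) > threshold for px in row])
    return mask
```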

The electronic device may generate the synthetic training data 750 by compositing the object 730 onto the region-of-interest image 740.
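A hard-mask paste is one simple way to composite the segmented object 730 onto the region-of-interest image 740. The sketch assumes that the object image, its binary mask, and the ROI image are already the same size. Resizing, alpha blending, and edge feathering are left out, and the disclosure does not say which compositing technique is used.

```python
def composite(object_image, mask, roi_image):
    """Paste object pixels onto the ROI image wherever mask is True.

    All three inputs are assumed to have identical height and width
    (e.g., after the object crop has been resized to the ROI image).
    Returns a new image; the inputs are not modified.
    """
    if not (len(object_image) == len(mask) == len(roi_image)):
        raise ValueError("height mismatch")
    out = []
    for obj_row, mask_row, bg_row in zip(object_image, mask, roi_image):
        if not (len(obj_row) == len(mask_row) == len(bg_row)):
            raise ValueError("width mismatch")
        # Keep the object pixel where the mask is set, otherwise the background pixel.
        out.append([o if m else b for o, m, b in zip(obj_row, mask_row, bg_row)])
    return out
```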

FIG. 8 is a diagram for explaining a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.

Referring to FIG. 8, the electronic device may determine (840) the accuracy of an artificial intelligence model 830 using a plurality of test images 810 and 820.

According to an embodiment of the present disclosure, a first test image 810 may be generated based on a predetermined background and the target training data. For example, the first test image 810 may be generated by compositing the object corresponding to the learning target of the target training data with the predetermined background. The predetermined background may be a neutral background that does not affect the training data, for example, a solid gray color.

According to an embodiment of the present disclosure, a second test image 820 may be generated based on the background image and the target training data. For example, the second test image 820 may be generated by compositing the object corresponding to the learning target of the target training data with at least a portion of the background image. The second test image 820 may, for example, be the synthetic training data itself.

The electronic device according to an embodiment of the present disclosure may determine accuracy by separately inputting the first test image 810 and the second test image 820 into the artificial intelligence model 830. For example, the electronic device may determine (840) the accuracy by comparing the data that the artificial intelligence model 830 outputs for each of the first test image 810 and the second test image 820 with the corresponding ground-truth data.
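For the object-detection case, the comparison with ground-truth data (840) could look like the sketch below. The one-object-per-image format, the intersection-over-union (IoU) matching rule, and the 0.5 IoU threshold are common conventions assumed here for illustration. The disclosure does not define the accuracy metric.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_accuracy(predictions, ground_truth, iou_threshold=0.5):
    """Fraction of test images whose prediction matches the ground truth.

    predictions / ground_truth: lists of (label, box) pairs, one per test image
    (an assumed format). A prediction counts as correct when the label matches
    and the IoU is at least iou_threshold.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("length mismatch")
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for (p_label, p_box), (g_label, g_box) in zip(predictions, ground_truth)
        if p_label == g_label and iou(p_box, g_box) >= iou_threshold
    )
    return correct / len(ground_truth)
```

Under these assumptions, the same function would be called once for the first test images and once for the second test images, which yields the two accuracies used in FIGS. 9A and 9B.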

The electronic device may determine (850) whether to generate additional synthetic training data based on the accuracy for the first test image 810 and the accuracy for the second test image 820. The method of making this determination (850) using the two accuracies, according to an embodiment of the present disclosure, is described in more detail with reference to FIGS. 9A and 9B.

FIG. 9A is a flowchart illustrating a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.

Referring to FIG. 9A, a method 900a by which the electronic device 1400 determines whether to generate additional synthetic training data, according to an embodiment of the present disclosure, begins at step 910a. The electronic device 1400 may make this determination using an artificial intelligence model trained with training data that includes at least a portion of the background image.

In step 910a, the electronic device 1400 may generate a test image. According to an embodiment of the present disclosure, the electronic device 1400 may use a training image included in the synthetic training data as the test image, for example, the second test image 820 of FIG. 8. According to an embodiment of the present disclosure, the electronic device 1400 may instead use, as the test image, an image obtained by compositing a training image included in the synthetic training data with a predetermined background, for example, the first test image 810 of FIG. 8.

In step 920a, the electronic device 1400 may determine prediction accuracy based on the output obtained by inputting the test image into the artificial intelligence model.

In step 930a, the electronic device 1400 identifies whether the prediction accuracy for the test image is greater than or equal to a threshold. If it is, the method ends. If it is less than the threshold, the method proceeds to step 940a.

In step 940a, the electronic device 1400 generates additional synthetic training data. According to an embodiment of the present disclosure, the electronic device 1400 may generate additional synthetic training data by changing the background while keeping the same object as in the synthetic training data. According to an embodiment of the present disclosure, the electronic device 1400 may generate additional synthetic training data by changing the object while keeping the same background as in the synthetic training data.
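The branch in steps 930a and 940a can be sketched as a function that plans which new (object, background) combinations to synthesize. Objects and backgrounds are represented by hypothetical identifiers. Generating both kinds of variation at once is also an assumption; the embodiment describes each variation as a separate option.

```python
def plan_additional_data_9a(accuracy, threshold, obj, bg, objects, backgrounds):
    """Sketch of steps 930a-940a of FIG. 9A.

    Returns an empty list if accuracy >= threshold (the method ends). Otherwise
    returns (object, background) pairs for new synthetic samples: the same object
    on other backgrounds, followed by other objects on the same background.
    """
    if accuracy >= threshold:
        return []
    same_object = [(obj, b) for b in backgrounds if b != bg]
    same_background = [(o, bg) for o in objects if o != obj]
    return same_object + same_background
```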

FIG. 9B is a flowchart illustrating a method of determining whether to generate additional synthetic training data according to an embodiment of the present disclosure.

Referring to FIG. 9B, a method 900b by which the electronic device 1400 determines whether to generate additional synthetic training data, according to an embodiment of the present disclosure, begins at step 910b. The electronic device 1400 may make this determination using an artificial intelligence model trained with training data that includes at least a portion of the background image.

In step 910b, the electronic device 1400 may generate a first test image and a second test image. According to an embodiment of the present disclosure, these may correspond to the first test image 810 and the second test image 820 of FIG. 8, respectively.

In step 920b, the electronic device 1400 may determine a prediction accuracy for each of the first and second test images, based on the outputs obtained by inputting each of them into the artificial intelligence model.

In step 930b, the electronic device 1400 identifies whether the prediction accuracy for the first test image is greater than or equal to a threshold. If it is, the method proceeds to step 940b. If it is less than the threshold, the method proceeds to step 950b.

In step 940b, the electronic device 1400 identifies whether the prediction accuracy for the second test image is greater than or equal to a threshold. If it is, the method ends. If it is less than the threshold, the method proceeds to step 960b. According to an embodiment of the present disclosure, the threshold used in step 940b and the threshold used in step 930b may be set differently depending on the situation.

In step 950b, the electronic device 1400 generates additional synthetic training data. According to an embodiment of the present disclosure, the electronic device 1400 may generate the additional synthetic training data by changing the background while keeping the same object as in the synthetic training data. Low prediction accuracy for the first test image can be interpreted to mean that predictions for data containing that object are inaccurate. Generating additional synthetic training data for the same object can therefore increase the artificial intelligence model's prediction accuracy for that object.

In step 960b, the electronic device 1400 generates additional synthetic training data. According to an embodiment of the present disclosure, the electronic device 1400 may generate the additional synthetic training data by changing the object while keeping the same background as in the synthetic training data. Low prediction accuracy for the second test image can be interpreted to mean that predictions for data containing that background are inaccurate. Generating additional synthetic training data with the same background can therefore increase the artificial intelligence model's prediction accuracy for data containing that background.
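The branching of FIG. 9B can be summarized in a few lines. The returned strings are hypothetical labels for the two kinds of additional data. The two thresholds are separate parameters because, as noted above, they may be set differently.

```python
def plan_additional_data_9b(acc_first, acc_second, threshold_first, threshold_second):
    """Sketch of the branching in FIG. 9B (steps 930b-960b).

    acc_first: accuracy on the first test image (object on a neutral background).
    acc_second: accuracy on the second test image (object on the real background).
    Returns which kind of additional synthetic data to generate, or None to end.
    """
    if acc_first < threshold_first:
        # Step 950b: the object itself is poorly recognized, so vary the background.
        return "same_object_new_background"
    if acc_second < threshold_second:
        # Step 960b: the background hurts recognition, so vary the object.
        return "same_background_new_object"
    return None  # both accuracies meet their thresholds: the method ends
```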

FIG. 10 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 10, the electronic device may generate synthetic training data 1040 based on color information and brightness information of a region-of-interest image 1010. According to an embodiment of the present disclosure, the target training data 1020 may be the object corresponding to the learning target that is extracted from the target training data, like the object 730 of FIG. 7.

An electronic device according to an embodiment of the present disclosure may obtain at least one of color information or brightness information of the region-of-interest image 1010. The electronic device may calculate this information from the pixel values of the region-of-interest image 1010. For example, the electronic device may calculate HSL (Hue, Saturation, Luminance) values for each pixel from its color information (e.g., RGB). The color of the region-of-interest image 1010 may be determined as the average of the H values of its pixels, and its brightness may be determined as the average of the L values of its pixels.
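The average hue and lightness of the region-of-interest image 1010 can be computed with Python's standard `colorsys` module, as in the sketch below. The pixel format and the function name are assumptions. Note that `colorsys` calls the third component lightness and returns the values in (h, l, s) order.

```python
import colorsys


def mean_hue_lightness(image):
    """Return (mean H, mean L) of an RGB image, with H in degrees and L in [0, 1].

    image: list of rows of (R, G, B) tuples in 0..255.
    Uses colorsys.rgb_to_hls, whose return order is (h, l, s).
    Note: this takes a plain arithmetic mean of hue, as in the text; hue is an
    angle, so a circular mean would be more robust near the 0/360 wrap-around.
    """
    hues, lights = [], []
    for row in image:
        for r, g, b in row:
            h, l, _s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
            hues.append(h * 360.0)
            lights.append(l)
    n = len(hues)
    if n == 0:
        raise ValueError("empty image")
    return sum(hues) / n, sum(lights) / n
```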

The electronic device according to an embodiment of the present disclosure may modify at least one of the color or brightness of the target training data 1020 based on at least one of the color information or brightness information of the region-of-interest image 1010.

The electronic device may correct the H values of the training data 1020 based on the average H value of the region-of-interest image 1010 (hereinafter $\bar{H}_{1010}$) and the average H value of the training data 1020 (hereinafter $\bar{H}_{1020}$). For example, the H value of each pixel $(x, y)$ of the training data 1020 may be corrected as follows, where $a$ is a coefficient that sets the degree of correction:

$$H_{1020}(x, y) \leftarrow H_{1020}(x, y) + a\,(\bar{H}_{1010} - \bar{H}_{1020})$$

Similarly, the electronic device may correct the L values of the training data 1020 based on the average L value of the region-of-interest image 1010 (hereinafter $\bar{L}_{1010}$) and the average L value of the training data 1020 (hereinafter $\bar{L}_{1020}$). For example, the L value of each pixel $(x, y)$ may be corrected as follows, where $b$ is a coefficient that sets the degree of correction:

$$L_{1020}(x, y) \leftarrow L_{1020}(x, y) + b\,(\bar{L}_{1010} - \bar{L}_{1020})$$
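The two correction formulas can be applied per pixel as sketched below, again using `colorsys`. The coefficients `a` and `b` correspond to the correction-degree values in the text. The default of 0.5, the modulo-360 hue wrap, and the clamping of lightness to [0, 1] are assumptions added so that the results stay valid colors.

```python
import colorsys


def shift_hue_lightness(image, roi_mean_h, obj_mean_h, roi_mean_l, obj_mean_l, a=0.5, b=0.5):
    """Apply H(x,y) += a*(H_roi - H_obj) and L(x,y) += b*(L_roi - L_obj) per pixel.

    image: list of rows of (R, G, B) tuples (0..255) for the target training data.
    Hue means are in degrees, lightness means in [0, 1].
    Hue is wrapped modulo 360 and lightness is clamped to [0, 1]; both are
    assumptions added to keep the values valid.
    """
    dh = a * (roi_mean_h - obj_mean_h)
    dl = b * (roi_mean_l - obj_mean_l)
    out = []
    for row in image:
        new_row = []
        for r, g, bl in row:  # "bl" is the blue channel; "b" is the coefficient
            h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, bl / 255)
            h = ((h * 360.0 + dh) % 360.0) / 360.0
            l = min(1.0, max(0.0, l + dl))
            nr, ng, nb = colorsys.hls_to_rgb(h, l, s)
            new_row.append((round(nr * 255), round(ng * 255), round(nb * 255)))
        out.append(new_row)
    return out
```

With `a = b = 0` the image is returned essentially unchanged. With `a = b = 1` the object's average hue and lightness are moved all the way to those of the region-of-interest image, apart from the effect of clamping.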

The electronic device may generate the synthetic training data 1040 by compositing the target training data 1020, whose color and/or brightness has been modified, with the region-of-interest image 1010.

By adjusting the color and/or brightness of the target training data 1020 to resemble that of the region-of-interest image 1010 with which it will be composited, the electronic device can create synthetic training data that looks like an actually captured image. That is, as in a real photograph, the electronic device can obtain training data in which the background and the object are both bright or both dark, as if they had been captured under the same lighting conditions.

A process in which the electronic device changes the brightness of the target training data 1020 using brightness information of the region-of-interest image 1010 has been described according to an embodiment of the present disclosure. The disclosure is not limited thereto, however, and the brightness of the target training data 1020 may instead be changed using brightness information of the background image. For example, the electronic device may obtain brightness information of the background image and change the brightness of the target training data 1020 based on that information, regardless of the region of interest.

FIG. 11 is a diagram illustrating a method of generating training data using context according to an embodiment of the present disclosure.

Referring to FIG. 11, a region-of-interest image 1110 and synthetic training data 1130 may correspond to the region-of-interest image 140 and the synthetic training data 160 of FIG. 1, respectively, but are not limited thereto. Target training data 1120 may correspond to the target training data 720 of FIG. 7, but is not limited thereto.

For convenience of explanation, a method of generating synthetic training data for an artificial intelligence model for human pose estimation is described as an example, and the learning target is assumed to be a user. The disclosure is not limited thereto, however, and may be applied to other artificial intelligence models as well.

The electronic device 1400 may determine the target training data 1120 based on the region-of-interest image 1110. According to an embodiment of the present disclosure, the electronic device 1400 may determine the target training data 1120 based on the width-to-height ratio of the region-of-interest image 1110. The electronic device 1400 according to an embodiment of the present disclosure may select, from among the original training data, target training data 1120 that includes a training image whose width-to-height ratio is the same as or similar to that of the region-of-interest image 1110.

According to an embodiment of the present disclosure, the electronic device 1400 may determine an object context for the object included in the region-of-interest image 1110 based on the width-to-height ratio of the region-of-interest image 1110. For example, the object context may include information about whether the user is sitting, standing, or exercising.

According to an embodiment of the present disclosure, the electronic device 1400 may determine the object context by identifying a first context regarding the region of interest based on the region-of-interest image 1110. For example, the electronic device 1400 may identify a first context indicating that the region-of-interest image 1110 includes a sofa. Based on this first context, the electronic device 1400 may determine the object context to be that the user is sitting or lying down. As another example, based on a region-of-interest image 1110a identified as not containing furniture on which the user can sit (e.g., a chair), the electronic device 1400 may determine the object context to be that the user is standing.

According to an embodiment of the present disclosure, the electronic device 1400 may determine the object context based on second context information 1115 regarding the electronic device 1400. In the present disclosure, the second context information 1115 may refer to context information about the operation of the electronic device that generates the training data or of an external electronic device. For example, the second context information 1115 may include information 1115a indicating that video playback on the electronic device 1400 is paused, information 1115b indicating that a video is playing on the electronic device, or information 1115c indicating that a specific app (e.g., a fitness app) is running on the electronic device. For example, when the second context information indicates that a specific app (e.g., a fitness app) is running on the electronic device (1115c), the electronic device 1400 may determine the object context to be that the user is exercising.

According to an embodiment of the present disclosure, the electronic device 1400 may determine the object context based on at least one of the width-to-height ratio of the region-of-interest image 1110, the first context, or the second context 1113. As in the examples above, the electronic device 1400 may determine the object context from any one of these, and also from a combination of them.

According to an embodiment of the present disclosure, the electronic device 1400 may determine the object context using a neural network or a table. For example, the electronic device 1400 may determine the object context using an artificial intelligence model trained on a dataset whose inputs are at least one of the width-to-height ratio of the region-of-interest image, the first context, or the second context, and whose outputs are object contexts. As another example, the electronic device 1400 may determine the object context using a table that relates at least one of the width-to-height ratio of the region-of-interest image 1110, the first context, or the second context to an object context.
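The table-based option can be sketched as an ordered rule table. Every rule, label string, and the ratio cut-off below are hypothetical. They are loosely modeled on the examples in this description (sofa, no seating, fitness app, a ratio above 4 for standing), and a real table would be built for the deployment at hand.

```python
def object_context(ratio, first_context=None, second_context=None):
    """Hypothetical rule table: (ROI ratio, first context, second context) -> object context.

    ratio: width-to-height ratio of the ROI image, following the document's convention.
    first_context: e.g. "sofa" or "no_seating" (what the ROI contains).
    second_context: e.g. "fitness_app_running", "video_paused", "video_playing".
    Rules are checked in order and the first match wins; None means undetermined.
    """
    rules = [
        (lambda r, c1, c2: c2 == "fitness_app_running", "exercising"),
        (lambda r, c1, c2: c1 == "sofa", "sitting_or_lying"),
        (lambda r, c1, c2: c1 == "no_seating", "standing"),
        (lambda r, c1, c2: r > 4, "standing"),
    ]
    for condition, context in rules:
        if condition(ratio, first_context, second_context):
            return context
    return None
```

The ordering encodes a priority choice (device state first, then scene content, then shape). A trained classifier, the neural-network option above, would replace this hand-written ordering with a learned one.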

The electronic device 1400 may determine, from among the original training data, target training data 1120 corresponding to the object context. For example, when the object context indicates that the user is standing, the electronic device 1400 may determine training data corresponding to a standing state, from among the original training data, as the target training data 1120b. According to an embodiment of the present disclosure, the electronic device 1400 may determine the target training data 1120 based on both the object context and the width-to-height ratio of the region-of-interest image 1110. That is, the electronic device 1400 may determine, as the target training data 1120, training data that corresponds to the object context and whose width-to-height ratio is the same as or similar to that of the region-of-interest image 1110.

For example, when the width-to-height ratio of the region-of-interest image 1110a is greater than 4 and the object context is that the user is standing, the electronic device 1400 may select target training data 1120b that includes a training image in which the user is standing and whose width-to-height ratio is 4 or close to 4. As another example, when the width-to-height ratio of the region-of-interest image 1110b is 2 and the object context is that the user is sitting, the electronic device 1400 may select target training data 1120a that includes a training image in which the user is sitting and whose width-to-height ratio is 2. As yet another example, when the width-to-height ratio of the region-of-interest image 1110c is 3 and the object context is that the user is exercising, the electronic device 1400 may select target training data 1120c that includes a training image showing an exercise motion and whose width-to-height ratio is 3.
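Combining the object context with the ratio criterion can be sketched as a filter followed by a nearest-ratio choice. The `(name, context_label, ratio)` index of the original training data and the absolute-difference similarity measure are assumptions made for illustration.

```python
def select_target(originals, context, roi_ratio):
    """Pick target training data matching the object context with the closest ratio.

    originals: list of (name, context_label, ratio) tuples, a hypothetical index
    of the original training data. Returns the chosen name, or None if no
    original sample has the requested context.
    """
    # Keep only samples whose context label matches the object context.
    candidates = [item for item in originals if item[1] == context]
    if not candidates:
        return None
    # Among those, choose the sample whose ratio is closest to the ROI's ratio.
    best = min(candidates, key=lambda item: abs(item[2] - roi_ratio))
    return best[0]
```

For example, with entries `("1120a", "sitting", 2.0)`, `("1120b", "standing", 4.0)`, and `("1120c", "exercising", 3.0)`, a standing context and an ROI ratio of 4.3 would select `"1120b"`.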

FIG. 12 is a diagram for explaining a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 12, the electronic device 1400 according to an embodiment of the present disclosure may generate synthetic training data based on a first captured image 1210 and a second captured image 1230. The first captured image 1210 may be an image, among the images captured by the electronic device 1400, for which the prediction confidence determined by an artificial intelligence model is greater than or equal to a predetermined value. For example, the electronic device 1400 may exclude the captured images whose prediction confidence is less than the predetermined value and determine an image whose prediction confidence is greater than or equal to the predetermined value as the first captured image 1210. The second captured image 1230 may be an image that includes the same object as the first captured image 1210 and whose prediction confidence, as determined by the artificial intelligence model, is less than the predetermined value.
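The confidence-based split of captured images can be sketched as follows. The `(frame_id, confidence)` input format is an assumption. The sketch only partitions frames by the predetermined value; checking that a low-confidence frame shows the same object as a high-confidence one is left out.

```python
def split_by_confidence(frames, threshold):
    """Split captured frames into high- and low-confidence groups.

    frames: list of (frame_id, confidence) pairs, where confidence is the model's
    prediction confidence for that frame (a hypothetical input format).
    Frames with confidence >= threshold are candidates for the first captured
    image (1210); the rest are candidates for the second captured image (1230).
    """
    high = [fid for fid, conf in frames if conf >= threshold]
    low = [fid for fid, conf in frames if conf < threshold]
    return high, low
```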

According to an embodiment of the present disclosure, the electronic device 1400 may acquire the first captured image 1210 and use the artificial intelligence model to obtain an object image 1220 from it. For example, the electronic device 1400 may take the smallest rectangular image that contains the object in the first captured image 1210 as the object image 1220. The electronic device 1400 may also obtain prediction data 1225 using the artificial intelligence model. For example, the prediction data 1225 may be the output obtained by inputting the first captured image 1210 into the artificial intelligence model. When the model is an object detection model, this output may include the region in which the object is located and the type of the object.
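One way to obtain the smallest rectangle containing the object is to take the bounding box of the object's pixels. This sketch assumes that a binary object mask is available, for example from a segmentation output. The text does not specify this. Images are plain nested lists, and `bounding_box` and `crop` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def bounding_box(mask: Image) -> Optional[Box]:
    """Smallest axis-aligned rectangle containing every nonzero mask pixel."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no object pixels
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
box = bounding_box(mask)
print(box, crop(mask, box))  # (1, 1, 3, 3) [[1, 1], [0, 1]]
```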

According to an embodiment of the present disclosure, the electronic device 1400 may acquire the second captured image 1230 and use the artificial intelligence model to obtain a region of interest 1240 from it. For example, the electronic device 1400 may obtain the region of interest 1240 based on a background image. The electronic device 1400 may then obtain a region-of-interest image 1250 corresponding to the region of interest in the background image.

According to an embodiment of the present disclosure, the electronic device 1400 may generate a synthetic training image 1260 based on the object image 1220 and the region-of-interest image 1250. For example, the electronic device 1400 may extract the object area from the object image 1220 and composite it with the region-of-interest image 1250 to produce the synthetic training image 1260. The electronic device 1400 may then generate ground-truth synthetic training data for the object, consisting of the synthetic training image 1260 and prediction data 1265.
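The compositing step can be illustrated with a mask-based paste. Object pixels are kept, and every other pixel is taken from the region-of-interest image. This is a minimal sketch that assumes the object image, its mask and the region-of-interest image all have the same size. The names `composite` and `make_training_record`, and the dictionary layout of the record, are hypothetical.

```python
from typing import List

Image = List[List[int]]


def composite(object_img: Image, object_mask: Image, roi_img: Image) -> Image:
    """Paste object pixels (where the mask is nonzero) over a same-sized background ROI image."""
    if len(object_img) != len(roi_img) or len(object_img[0]) != len(roi_img[0]):
        raise ValueError("object image and ROI image must have the same size")
    return [
        [obj if m else bg for obj, m, bg in zip(obj_row, mask_row, bg_row)]
        for obj_row, mask_row, bg_row in zip(object_img, object_mask, roi_img)
    ]


def make_training_record(object_img: Image, object_mask: Image, roi_img: Image,
                         prediction: dict) -> dict:
    """Pair the synthetic image with the model's prediction, used here as its label."""
    return {"image": composite(object_img, object_mask, roi_img), "label": prediction}
```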

The electronic device 1400 generates training data from object images taken from input images for which the artificial intelligence model has high confidence. This lets it produce training data about the objects that actually appear in the first captured image 1210 and the second captured image 1230. Training data built from objects in images captured by the electronic device 1400 can teach the artificial intelligence model to be more accurate in its actual usage environment than training data generated from arbitrary objects.

When an image captured by the electronic device 1400 contains a plurality of objects, training data may be generated only for the objects for which the artificial intelligence model has low confidence. For example, when object A and object B appear in one or more images captured by the electronic device 1400, the electronic device 1400 may generate training data only for object B, the object for which the model's confidence is low.
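One possible way to pick the objects that need training data is to aggregate each object's confidence over the captured images and keep the objects that fall below a threshold. The averaging, the 0.5 threshold and the `low_confidence_objects` name are assumptions made for illustration only.

```python
from statistics import fmean
from typing import Dict, List


def low_confidence_objects(confidences: Dict[str, List[float]],
                           threshold: float = 0.5) -> List[str]:
    """Return objects whose mean confidence across captured images is below `threshold`."""
    return sorted(name for name, scores in confidences.items()
                  if scores and fmean(scores) < threshold)


print(low_confidence_objects({"A": [0.9, 0.95], "B": [0.3, 0.4]}))  # ['B']
```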

Figure 13 is a flowchart illustrating a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 13, a method 1300 by which the electronic device 1400 according to an embodiment of the present disclosure generates training data for training an artificial intelligence model begins at step 1310.

In step 1310, the electronic device 1400 may acquire a first image containing an object that is a learning target. For example, if the artificial intelligence model is an object detection model, the object may be any object that the model is able to detect.

In step 1320, the electronic device 1400 may obtain a background image that does not contain the object, based on a second image captured from the same viewpoint as the first image. For example, the first image and the second image may be captured by a single camera with the same view, such as a camera of the electronic device 1400 that is fixed in place.
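The background generation method itself is described with reference to figures outside this section. As a stand-in, a common technique for a fixed camera is a per-pixel temporal median over several frames. The sketch below uses that technique and should not be read as the disclosed method. Images are grayscale nested lists, and `median_background` is a hypothetical name.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def median_background(frames: List[Image]) -> Image:
    """Per-pixel temporal median of equally sized frames from a fixed camera.

    A person moving through the scene covers a given pixel in only a minority of
    frames, so the median tends to recover the static background at that pixel.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]


print(median_background([[[10, 10]], [[10, 200]], [[10, 10]]]))  # [[10, 10]]
```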

In step 1330, the electronic device 1400 may identify a region of interest containing the object in the first image. According to an embodiment of the present disclosure, the electronic device 1400 may identify the region of interest based on the first image and the background image. For example, it may use a residual image that represents the difference between the first image and the background image.
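The residual-image approach of step 1330 can be sketched in two steps. First, threshold the absolute difference between the first image and the background image. Then take the bounding box of the pixels that changed. Grayscale nested-list images, the threshold of 30 and the `residual_roi` name are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def residual_roi(image: Image, background: Image,
                 threshold: int = 30) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of pixels that differ from the background."""
    residual = [[abs(a - b) for a, b in zip(row_i, row_b)]
                for row_i, row_b in zip(image, background)]
    changed = [[d > threshold for d in row] for row in residual]
    rows = [y for y, row in enumerate(changed) if any(row)]
    if not rows:
        return None  # nothing differs enough from the background
    cols = [x for x in range(len(changed[0])) if any(row[x] for row in changed)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1
```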

In step 1340, the electronic device 1400 may obtain, from the background image, a region-of-interest image corresponding to the region of interest. According to an embodiment of the present disclosure, the electronic device 1400 may do so by extracting the region of interest from the background image.

In step 1350, the electronic device 1400 may select first target training data from among the previously generated original training data. The first target training data corresponds to the context of the object, and that context is determined from the ratio between the width and height of the region-of-interest image. For example, the first target training data may include training images and training information corresponding to the context of the object, such as training images and training information representing a standing state.
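The text leaves the mapping from aspect ratio to context open. Purely as an illustration, the hypothetical rule below mirrors the example ratios of 4 (standing), 3 (exercising) and 2 (sitting) given earlier. It treats the ratio as height divided by width, which is an assumption, and the cut-off values are invented.

```python
def context_from_ratio(width: int, height: int) -> str:
    """Map an ROI's height-to-width ratio to a coarse posture context (hypothetical thresholds)."""
    ratio = height / width
    if ratio >= 3.5:
        return "standing"
    if ratio >= 2.5:
        return "exercising"
    return "sitting"


print(context_from_ratio(50, 200))  # standing
```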

In step 1360, the electronic device 1400 may generate synthetic training data that contains at least part of the background image. It does so by modifying the first training image included in the first target training data, based on the region-of-interest image. According to an embodiment of the present disclosure, the electronic device 1400 may generate a training image by compositing the first training image with the region-of-interest image. The synthetic training data may include the composited training image and the associated training information.
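A training image and a region-of-interest image generally differ in size. One simple way to modify the first training image is to resize the region-of-interest image to the training image's size and use it as the new background behind the object. This sketch assumes nearest-neighbour resizing and an available object mask. The `resize_nearest` and `synthesize` names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize of a 2-D pixel grid."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def synthesize(training_img: Image, object_mask: Image, roi_img: Image) -> Image:
    """Replace the non-object pixels of a training image with the resized ROI background."""
    h, w = len(training_img), len(training_img[0])
    background = resize_nearest(roi_img, h, w)
    return [[t if m else b for t, m, b in zip(t_row, m_row, b_row)]
            for t_row, m_row, b_row in zip(training_img, object_mask, background)]


print(resize_nearest([[1, 2], [3, 4]], 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```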

Figure 14 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 14, the electronic device 1400 may include a memory 1410, a processor 1420, and a camera 1430. It is not limited thereto, and other general-purpose components may be added.

The memory 1410 according to an embodiment may store programs for processing and control by the processor 1420. It may also store data input to or output from the electronic device 1400. The memory 1410 may store instructions, data structures, and program code readable by the processor 1420. In the disclosed embodiments, the operations performed by the processor 1420 may be implemented by executing the instructions or code of a program stored in the memory 1410.

The memory 1410 according to an embodiment may include flash memory, a hard disk, multimedia card micro type memory, or card-type memory (for example, SD or XD memory). It may include non-volatile memory comprising at least one of read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. It may also include volatile memory such as random access memory (RAM) or static random access memory (SRAM).

The memory 1410 according to an embodiment may store one or more instructions and/or programs that control the electronic device 1400 to perform the task of generating training data. For example, the memory 1410 may store a region-of-interest identification unit 1411, a region-of-interest image acquisition unit 1412, a target training data determination unit 1413, and a synthetic training data generation unit 1414.

The processor 1420 according to an embodiment may execute instructions or programmed software modules stored in the memory 1410 to control the operations or functions through which the electronic device 1400 performs a task. The processor 1420 may consist of hardware components that perform arithmetic, logic, and input/output operations as well as signal processing. By executing one or more instructions stored in the memory 1410, the processor 1420 may control the overall operations by which the electronic device 1400 generates synthetic training data.

The processor 1420 according to an embodiment may consist of at least one of the following, without being limited thereto:
- a central processing unit (CPU);
- a microprocessor;
- a graphics processing unit (GPU);
- application-specific integrated circuits (ASICs);
- digital signal processors (DSPs);
- digital signal processing devices (DSPDs);
- programmable logic devices (PLDs);
- field-programmable gate arrays (FPGAs);
- an application processor (AP);
- a neural processing unit (NPU);
- a dedicated artificial intelligence processor whose hardware architecture is specialized for processing artificial intelligence models.

Each processor constituting the processor 1420 may be a dedicated processor for performing a predetermined function.

An artificial intelligence (AI) processor according to an embodiment may use an AI model to perform the computation and control needed for the tasks that the electronic device 1400 is configured to perform. The AI processor may be manufactured as a dedicated hardware chip for AI. Alternatively, it may be manufactured as part of a general-purpose processor (for example, a CPU or application processor) or a graphics-dedicated processor (for example, a GPU) and mounted in the electronic device 1400.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to:
- acquire a first image containing an object that is a learning target;
- obtain a background image that does not contain the object, based on a second image captured from the same viewpoint as the first image;
- identify a region of interest containing the object in the first image;
- obtain, from the background image, a region-of-interest image corresponding to the region of interest;
- select, from among the previously generated original training data, first target training data corresponding to the context of the object, where the context is determined from the ratio between the width and height of the region-of-interest image;
- generate synthetic training data containing at least part of the background image by modifying, based on the region-of-interest image, a first training image included in the first target training data.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to determine the prediction accuracy of the artificial intelligence model using two test images. The first test image is generated from an image containing a predetermined background and the first target training data. The second test image is generated from the background image and the first target training data. The processor 1420 may then generate additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to select second target training data from among the original training data when the first prediction accuracy is greater than or equal to a predetermined value and the second prediction accuracy is less than the predetermined value. The processor 1420 may then generate additional synthetic training data by modifying, based on the region-of-interest image, a second training image included in the second target training data.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to obtain an additional region-of-interest image from the background image when both the first prediction accuracy and the second prediction accuracy are less than the predetermined value. The additional region-of-interest image has the same height and width as the region-of-interest image. The processor 1420 may then generate additional synthetic training data by modifying the first training image based on the additional region-of-interest image.
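The decision logic for the two test-image accuracies can be summarized as a small function. The `plan_additional_data` and `accuracy` names, the shared 0.8 threshold and the returned labels are hypothetical. The fallback for combinations that the text does not cover is also an assumption.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of test images whose prediction matches the label."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def plan_additional_data(acc_reference: float, acc_actual: float,
                         threshold: float = 0.8) -> str:
    """Decide how to generate additional synthetic data.

    acc_reference: accuracy on test images built on a predetermined background.
    acc_actual: accuracy on test images built on the captured background image.
    """
    if acc_reference >= threshold and acc_actual < threshold:
        # Good on the predetermined background, poor on the captured one:
        # select second target training data and composite it with the same ROI image.
        return "second_target_data_with_same_roi"
    if acc_reference < threshold and acc_actual < threshold:
        # Poor on both: keep the first training image, use another same-sized ROI image.
        return "first_training_image_with_additional_roi"
    return "no_additional_data"  # combinations the text does not specify (assumption)
```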

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to obtain at least one of color information or brightness information of the region-of-interest image. Based on that information, it may modify at least one of the color or brightness of the first training image. It may then generate synthetic training data by further modifying this corrected first training image based on the region-of-interest image.
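Color or brightness correction could be done in many ways. One common choice is to shift and scale the training image's intensities so that their mean and standard deviation match those of the region-of-interest image. The sketch below applies that technique to a single grayscale channel; a color image would be processed channel by channel. It is an illustrative assumption, not the disclosed correction, and `match_brightness` is a hypothetical name.

```python
from statistics import fmean, pstdev
from typing import List

Image = List[List[int]]


def match_brightness(train_img: Image, roi_img: Image) -> Image:
    """Match the training image's mean and standard deviation to those of the ROI image."""
    src = [p for row in train_img for p in row]
    ref = [p for row in roi_img for p in row]
    mu_s, mu_r = fmean(src), fmean(ref)
    sd_s, sd_r = pstdev(src), pstdev(ref)
    scale = sd_r / sd_s if sd_s > 0 else 1.0  # avoid dividing by zero on flat images
    return [[min(255, max(0, round((p - mu_s) * scale + mu_r))) for p in row]
            for row in train_img]


print(match_brightness([[0, 100]], [[100, 200]]))  # [[100, 200]]
```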

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to obtain first context information about the region of interest based on the region-of-interest image. It may also obtain second context information about the state of the electronic device. The processor 1420 may determine the context of the object based on three inputs: the first context information, the second context information, and the ratio between the width and height of the background image. It may then select, from among the original training data, first target training data corresponding to that context.
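The text does not specify how the three cues are combined. The sketch below assumes a simple priority order: device state first, then the region-of-interest context, then the ratio as a fallback. The function name, the priority order and the 3.5 cut-off are all hypothetical.

```python
from typing import Optional


def determine_object_context(roi_context: Optional[str], device_context: Optional[str],
                             ratio: float) -> str:
    """Combine the three cues with a hypothetical priority order.

    device_context: context implied by the device state (e.g. a running program).
    roi_context: context inferred from the region-of-interest image itself.
    ratio: height-to-width ratio, used only as a fallback cue.
    """
    if device_context is not None:
        return device_context
    if roi_context is not None:
        return roi_context
    return "standing" if ratio >= 3.5 else "sitting"
```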

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to obtain a residual image between the first image and the background image and to identify the region of interest based on that residual image.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to extract an object image corresponding to an object in the first training image. It may then generate synthetic training data containing a synthetic training image produced by compositing the region-of-interest image with the object image.

According to an embodiment of the present disclosure, the processor 1420 may execute one or more instructions stored in the memory 1410 to generate a ground-truth synthetic training image based on an object image and the background image. It may input a third image into the artificial intelligence model to obtain prediction data about an object in the third image. It may then generate ground-truth synthetic training data for that object from the ground-truth synthetic training image and the prediction data.
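Pairing a ground-truth synthetic image with the model's prediction on a real image can be sketched as follows. The sketch assumes that the model returns labelled boxes with confidences. It also assumes that the object occupies the same position in the synthetic image and in the third image, so the boxes can be reused. `Detection`, `SyntheticSample`, `make_ground_truth_sample` and the 0.8 confidence filter are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]


@dataclass
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # (top, left, bottom, right)
    confidence: float


@dataclass
class SyntheticSample:
    image: Image
    annotations: List[Detection]


def make_ground_truth_sample(synthetic_img: Image, third_img: Image,
                             model: Callable[[Image], List[Detection]],
                             min_confidence: float = 0.8) -> SyntheticSample:
    """Label a synthetic image with the model's confident predictions on a real image."""
    predictions = [d for d in model(third_img) if d.confidence >= min_confidence]
    return SyntheticSample(image=synthetic_img, annotations=predictions)
```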

The camera 1430 according to an embodiment may acquire video and/or images by photographing an object. According to an embodiment of the present disclosure, the camera 1430 may be fixed at a predetermined position while it captures. The camera 1430 may include, for example, an RGB camera, a telephoto camera, a wide-angle camera, or an ultra-wide-angle camera, but is not limited thereto. The camera 1430 may acquire video comprising a plurality of frames. The specific types and detailed functions of the camera 1430 are readily understood by a person skilled in the art, so their description is omitted.

Meanwhile, embodiments of the present disclosure may also be implemented as a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that a computer can access. It includes both volatile and non-volatile media and both removable and non-removable media. Computer-readable media may also include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically carry computer-readable instructions, data structures, program modules, or other data in a modulated data signal.

Figure 15 is a flowchart illustrating a method of generating training data according to an embodiment of the present disclosure.

Referring to FIG. 15, the electronic device 1400 according to an embodiment of the present disclosure may be a TV. It is not limited thereto and may be any of various electronic devices.

According to an embodiment of the present disclosure, the electronic device 1400 may photograph an indoor space using the camera 1430. An image captured by the camera 1430 may include a sofa 1530 and a person 1540 who is exercising.

According to an embodiment of the present disclosure, the electronic device 1400 may enable or disable the function of photographing the indoor space with the camera 1430. The electronic device 1400 may capture images while it is performing a specific task. For example, it may capture images only while a fitness application is running. When the electronic device 1400 generates training data from images captured during a specific task, it uses the region-of-interest images identified during that task, so the prediction accuracy of the artificial intelligence model may be high.

The electronic device 1400 according to an embodiment of the present disclosure may use an artificial intelligence model for human pose estimation. The electronic device 1400 may use different artificial intelligence models depending on the task it is performing. For example, it may use a human pose estimation model while a fitness application is running and an object detection model while on standby. The type of training data that the electronic device 1400 generates may differ depending on the model in use. For example, if the model is for human pose estimation, the generated training data may be training data for learning human pose estimation.

The electronic device 1400 may obtain a background image from the captured images using the background generation method described with reference to FIGS. 1, 2A, 2B, and 13. According to an embodiment of the present disclosure, the electronic device 1400 may generate a background image at predetermined time intervals (for example, every 30 minutes or every 3 hours) that are independent of the capture cycle. The electronic device 1400 may build the background image from all or some of the images captured within a predetermined time interval. For example, it may select some of the images captured within the interval and generate the background image from the selected images. According to another embodiment of the present disclosure, the electronic device 1400 may generate a background image each time an image is captured.
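Regenerating the background at a fixed interval, independently of the capture cycle, could be organized as a small buffer that rebuilds the background once the interval has elapsed. This sketch uses a per-pixel median as an assumed background technique. The `BackgroundUpdater` class, the 1800-second (30-minute) default and the `stride` parameter (which keeps only some of the buffered frames) are hypothetical.

```python
from statistics import median
from typing import List, Optional

Image = List[List[int]]


class BackgroundUpdater:
    """Buffers frames and rebuilds the background once per interval (hypothetical design)."""

    def __init__(self, interval_s: float = 1800.0, stride: int = 1):
        self.interval_s = interval_s   # e.g. 1800 s = 30 minutes, separate from the capture cycle
        self.stride = stride           # use every `stride`-th buffered frame
        self.frames: List[Image] = []
        self.last_update: Optional[float] = None
        self.background: Optional[Image] = None

    def add_frame(self, frame: Image, timestamp: float) -> Optional[Image]:
        """Buffer a frame; rebuild the background when the interval has elapsed."""
        self.frames.append(frame)
        if self.last_update is None:
            self.last_update = timestamp
        if timestamp - self.last_update >= self.interval_s:
            subset = self.frames[::self.stride]
            h, w = len(subset[0]), len(subset[0][0])
            self.background = [[median(f[y][x] for f in subset) for x in range(w)]
                               for y in range(h)]
            self.frames.clear()
            self.last_update = timestamp
        return self.background
```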

The electronic device 1400 may obtain a region-of-interest image from the captured image and the background image, using the region-of-interest image generation method described with reference to FIGS. 1, 3, 4, and 13.

The electronic device 1400 may determine target training data based on the region-of-interest image. According to an embodiment of the present disclosure, it may take the context into account when doing so. For example, when the electronic device 1400 is running an exercise program, it may determine an object context indicating that the user is exercising. When it is running a yoga program, one of the exercise programs, it may determine an object context indicating that the user is doing yoga. The electronic device 1400 may then determine target training data based on the object context that the user is exercising or doing yoga.
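Deriving the object context from the running program can be sketched as a lookup over program identifiers, where a more specific program (yoga) takes precedence over a general one (exercise). The path-like identifiers and the context strings are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical program identifiers; more specific paths override general ones.
PROGRAM_CONTEXTS: Dict[str, str] = {
    "exercise": "exercising",
    "exercise/yoga": "doing_yoga",
}


def context_for_program(program: str) -> Optional[str]:
    """Return the context of the most specific registered prefix of `program`."""
    parts = program.split("/")
    for i in range(len(parts), 0, -1):
        key = "/".join(parts[:i])
        if key in PROGRAM_CONTEXTS:
            return PROGRAM_CONTEXTS[key]
    return None
```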

The electronic device 1400 may generate synthetic training data from the region of interest image and the target training data, using the synthetic training data generation method described with reference to FIGS. 1 and 7.

According to an embodiment of the present disclosure, the electronic device 1400 may generate synthetic training data only under specific conditions. For example, even while capturing images with the camera 1430, the electronic device 1400 may refrain from generating synthetic training data unless a specific condition is satisfied. For example, the electronic device 1400 may generate synthetic training data when the user performs a specific operation (e.g., registering user information or gesture control). In this example, the electronic device 1400 may identify a region of interest or generate a background image only when the specific condition is satisfied, regardless of whether images are being captured.

According to an embodiment of the present disclosure, the electronic device 1400 may generate synthetic data when the prediction confidence of the artificial intelligence model is low. For example, even if the electronic device 1400 periodically acquires captured images, it does not generate synthetic training data while the prediction confidence of the model is identified as high, and starts generating synthetic training data once the prediction confidence is identified as low. Prediction confidence may refer to a score or probability expressing how confident the model is in the prediction it makes for an input image. For example, for an object classification model, the prediction confidence for each object class the model can classify may be the probability or score with which the model predicts that class as the output for the input image. For example, the prediction confidence for "person" may be the probability or score with which the model determines that the input image corresponds to "person."
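Prediction confidence as defined above can be read directly off a classifier's output distribution. A minimal sketch, assuming the model exposes raw per-class scores (logits) and that a softmax turns them into per-class probabilities; the 0.6 threshold, the function names, and the class labels are hypothetical and not taken from the disclosure.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_confidence(logits: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Per-class confidence, e.g. the probability that the input is a "person"."""
    return dict(zip(labels, softmax(logits)))


def should_generate_synthetic(logits: Sequence[float], threshold: float = 0.6) -> bool:
    """Start generating synthetic data only when the top-class confidence is low."""
    return max(softmax(logits)) < threshold
```

With logits `[4.0, 0.0, 0.0]` the top class holds roughly 96% of the probability mass, so `should_generate_synthetic` returns `False`; with nearly flat logits such as `[0.1, 0.0, 0.0]` it returns `True` and generation would begin.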

FIG. 16 is a diagram illustrating a process of generating training data and utilizing an artificial intelligence model according to an embodiment of the present disclosure.

Referring to FIG. 16, the method 1600 by which the electronic device 1400 generates synthetic training data according to an embodiment of the present disclosure begins at step 1610.

In step 1610, the electronic device 1400 may identify object characteristics. For example, the electronic device 1400 may identify characteristics (e.g., type, age, gender) of an object included in a captured image. According to an embodiment of the present disclosure, the electronic device 1400 may identify the characteristics of the object using object analysis. For example, the electronic device 1400 may use an artificial intelligence model that takes a captured image as input and outputs the characteristics of the object included in that image.

According to an embodiment of the present disclosure, the electronic device 1400 may identify characteristics of an object using object recognition or object classification. For example, the electronic device 1400 may identify an object in a captured image through object recognition or object classification and then determine the characteristics of the identified object.

In step 1620, the electronic device 1400 may acquire training data according to the characteristics of the object. For example, if the object is a middle-aged Black man, the electronic device 1400 may obtain original training data corresponding to that skin type, age, and gender.
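Step 1620 amounts to filtering the pool of original training data by the characteristics identified in step 1610. The sketch below assumes, hypothetically, that each original sample carries attribute tags; the `TrainingSample` type, the tag keys, and the function name are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingSample:
    image_path: str             # hypothetical: location of the original image
    attributes: Dict[str, str]  # hypothetical tags, e.g. {"age_group": "middle_aged"}


def select_by_attributes(pool: List[TrainingSample],
                         target: Dict[str, str]) -> List[TrainingSample]:
    """Keep only samples whose tags match every identified object characteristic."""
    return [
        sample for sample in pool
        if all(sample.attributes.get(key) == value for key, value in target.items())
    ]
```

Requiring every tag to match is the strictest policy; a softer variant could rank samples by the number of matching tags when exact matches are scarce.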

In step 1630, the electronic device 1400 may generate synthetic training data using the original training data obtained according to the object characteristics. The method by which the electronic device 1400 generates synthetic training data from original training data has been described above and is omitted here. By acquiring original training data similar to the user or target object, the electronic device 1400 may generate synthetic training data that resembles that object.

FIG. 17 is a sequence diagram illustrating a method of generating training data using an electronic device and a server according to an embodiment of the present disclosure.

Referring to FIG. 17, the server 1800 may generate synthetic training data for training a model based on information acquired through the electronic device 1400.

In step 1710, the electronic device 1400 may transmit images for generating synthetic training data to the server 1800. These images may be at least one of a captured image taken by the electronic device 1400, a background image, or a region of interest image.

In step 1720, the server 1800 may generate synthetic training data. For example, when the server 1800 receives a region of interest image, it may generate synthetic training data by compositing the region of interest image with the original training data.

In step 1730, the server 1800 may train an artificial intelligence model based on the synthetic training data. In step 1740, the server 1800 may transmit the trained artificial intelligence model to the electronic device 1400.

According to an embodiment, the server 1800 may instead transmit the synthetic training data to the electronic device 1400, and the electronic device 1400 may train the artificial intelligence model using the received synthetic training data.

FIG. 18 is a block diagram illustrating the configuration of a server according to an embodiment of the present disclosure.

Referring to FIG. 18, the server 1800 may include a memory 1810, a processor 1820, and a communication interface 1830. However, the server 1800 is not limited thereto, and other general-purpose components may be added.

The memory 1810 according to an embodiment may store programs for processing and control by the processor 1820, and may store data input to or output from the server 1800. The memory 1810 may store instructions, data structures, and program code readable by the processor 1820. In the disclosed embodiments, operations performed by the processor 1820 may be implemented by executing instructions or code of a program stored in the memory 1810.

The memory 1810 according to an embodiment may include flash memory type, hard disk type, multimedia card micro type, or card-type memory (e.g., SD or XD memory); non-volatile memory including at least one of read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk; and volatile memory such as random access memory (RAM) or static random access memory (SRAM).

The memory 1810 according to an embodiment may store one or more instructions and/or programs that control the server 1800 to perform the task of generating training data. For example, the memory 1810 may store a region of interest identification unit 1811, a region of interest image acquisition unit 1812, a target training data determination unit 1813, and a synthetic training data generation unit 1814.

The processor 1820 according to an embodiment may control the operations or functions of the server 1800 so that it performs tasks, by executing instructions or programmed software modules stored in the memory 1810. The processor 1820 may be composed of hardware components that perform arithmetic, logic, and input/output operations and signal processing. By executing one or more instructions stored in the memory 1810, the processor 1820 may control the overall operations by which the server 1800 performs the task of generating synthetic training data.

The processor 1820 according to an embodiment may be composed of at least one of, for example, a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), an application processor, a neural processing unit (NPU), or a dedicated artificial intelligence processor designed with a hardware structure specialized for processing artificial intelligence models, but is not limited thereto. Each processor constituting the processor 1820 may be a dedicated processor for performing a certain function.

An artificial intelligence (AI) processor according to an embodiment may use an AI model to perform computation and control for processing the tasks that the server 1800 is configured to perform. The AI processor may be manufactured as a dedicated hardware chip for AI, or as part of a general-purpose processor (e.g., a CPU or application processor) or a graphics-dedicated processor (e.g., a GPU), and mounted in the server 1800.

According to an embodiment of the present disclosure, the processor 1820 may perform the following operations by executing one or more instructions stored in the memory 1810. The processor 1820 may acquire a first image including an object corresponding to a learning target. The processor 1820 may obtain a background image that does not include the object, based on a second image captured from the same viewpoint as the first image. The processor 1820 may identify a region of interest including the object in the first image, and may acquire, from the background image, a region of interest image corresponding to the region of interest. From among previously generated original training data, the processor 1820 may determine first target training data corresponding to a context of the object, the context being determined based on the width-to-height ratio of the region of interest image. The processor 1820 may then generate synthetic training data that includes at least part of the background image, by modifying a first training image included in the first target training data based on the region of interest image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may determine the prediction accuracy of the artificial intelligence model using a first test image, generated based on an image including a predetermined background and the first target training data, and a second test image, generated based on the background image and the first target training data. The processor 1820 may generate additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may determine second target training data from among the original training data, based on the first prediction accuracy being greater than or equal to a predetermined value and the second prediction accuracy being less than the predetermined value. The processor 1820 may generate additional synthetic training data by modifying a second training image included in the second target training data based on the region of interest image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may acquire, from the background image, an additional region of interest image having the same height and width as the region of interest image, based on both the first prediction accuracy and the second prediction accuracy being less than the predetermined value. The processor 1820 may generate additional synthetic training data by modifying the first training image based on the additional region of interest image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may obtain at least one of color information or brightness information of the region of interest image. The processor 1820 may modify at least one of the color or brightness of the first training image based on that information. The processor 1820 may then generate synthetic training data by further modifying the adjusted first training image based on the region of interest image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may obtain first context information about the region of interest based on the region of interest image, and may obtain second context information about the state of the electronic device. The processor 1820 may determine the context of the object based on the first context information, the second context information, and the width-to-height ratio of the background image. The processor 1820 may then determine, from among the original training data, first target training data corresponding to the context of the object.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may obtain a residual image between the first image and the background image, and may identify the region of interest based on the residual image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may extract an object image corresponding to the object included in the first training image, and may generate synthetic training data including a synthetic training image produced by compositing the region of interest image and the object image.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may train an artificial intelligence model using the synthetic training data, and may transmit the trained artificial intelligence model to an external electronic device through the communication interface 1830.

According to an embodiment of the present disclosure, by executing one or more instructions stored in the memory 1810, the processor 1820 may use the artificial intelligence model to generate ground-truth training data about the object from the first image. The processor 1820 may generate a ground-truth synthetic training image based on the object image and the background image. The processor 1820 may generate ground-truth synthetic training data based on the ground-truth training data and the background image. By inputting a third image into the artificial intelligence model, the processor 1820 may obtain prediction data about the object included in the third image. The processor 1820 may then generate ground-truth synthetic training data for the object based on the ground-truth synthetic training image and the prediction data.

The communication interface 1830 may include a communication circuit. The communication interface 1830 may include a communication circuit capable of performing data communication between the server 1800 and other devices using at least one data communication method, including, for example, wired LAN, wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct (WFD), Infrared Data Association (IrDA), Bluetooth Low Energy (BLE), Near Field Communication (NFC), Wireless Broadband Internet (WiBro), Worldwide Interoperability for Microwave Access (WiMAX), Shared Wireless Access Protocol (SWAP), Wireless Gigabit Alliance (WiGig), and RF communication.

The communication interface 1830 may exchange, with an external electronic device, data used by the server 1800 to generate training data. For example, the communication interface 1830 may transmit and receive the artificial intelligence models used by the server 1800, or may exchange artificial intelligence models or region of interest images with external electronic devices.

A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' only means that the medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently and data being stored temporarily in the medium. For example, a 'non-transitory storage medium' may include a buffer in which data is temporarily stored.

According to an embodiment, methods according to the various embodiments disclosed in this document may be included in and provided as a computer program product. A computer program product may be traded as a commodity between a seller and a buyer. A computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created on, a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

According to an embodiment of the present disclosure, the training data generation method may include determining the prediction accuracy of the artificial intelligence model using a first test image, generated based on an image including a predetermined background and the first target training data, and a second test image, generated based on the background image and the first target training data. The method may include generating additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.

According to an embodiment of the present disclosure, generating the additional synthetic training data may include determining second target training data from among the original training data, based on the first prediction accuracy being greater than or equal to a predetermined value and the second prediction accuracy being less than the predetermined value. Generating the additional synthetic training data may further include modifying a second training image included in the second target training data based on the region of interest image.

According to an embodiment of the present disclosure, generating the additional synthetic training data may include acquiring, from the background image, an additional region of interest image having the same height and width as the region of interest image, based on both the first prediction accuracy and the second prediction accuracy being less than the predetermined value. Generating the additional synthetic training data may further include modifying the first training image based on the additional region of interest image.
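The two accuracy-based branches described above can be summarized as a small decision function. This is a sketch under stated assumptions: accuracy is taken to be plain top-1 accuracy, a single hypothetical threshold of 0.8 stands in for the "predetermined value", and the returned strings are hypothetical action names. The disclosure does not say what happens in the remaining cases, so the sketch simply returns `"no_additional_data"` for them.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Top-1 accuracy: fraction of test images whose prediction matches the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def plan_additional_data(acc_first: float, acc_second: float,
                         threshold: float = 0.8) -> str:
    """Pick how to produce additional synthetic training data.

    acc_first:  accuracy on test images built on a predetermined background
    acc_second: accuracy on test images built on the captured background image
    """
    if acc_first >= threshold and acc_second < threshold:
        # Branch from the text: choose second target training data from the
        # original pool and composite it onto the region of interest image.
        return "select_second_target_data"
    if acc_first < threshold and acc_second < threshold:
        # Branch from the text: cut an additional same-sized ROI image from
        # the background and recomposite the first training image onto it.
        return "crop_additional_roi"
    # Not specified in the source; assumed to need no extra data.
    return "no_additional_data"
```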

According to an embodiment of the present disclosure, generating the synthetic training data may include obtaining at least one of color information or brightness information of the region of interest image; modifying at least one of the color or brightness of the first training image based on that information; and generating the synthetic training data by further modifying the adjusted first training image based on the region of interest image.
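One simple way to carry out the color or brightness adjustment is to match the first-order statistics of the training image to those of the region of interest image. The sketch below uses mean and standard-deviation matching on a single grayscale channel; this is a swapped-in technique chosen for illustration, not necessarily the one the disclosure uses, and for color images it would be applied per channel. The function name and image representation are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence

Image = List[List[float]]  # one channel, values in 0-255


def match_brightness(train_img: Sequence[Sequence[float]],
                     roi_img: Sequence[Sequence[float]]) -> Image:
    """Shift and scale train_img so its mean and std match those of roi_img."""
    src = [v for row in train_img for v in row]
    ref = [v for row in roi_img for v in row]
    src_mean, src_std = mean(src), pstdev(src)
    ref_mean, ref_std = mean(ref), pstdev(ref)
    # A flat training image has no contrast to rescale; only shift its mean.
    scale = ref_std / src_std if src_std > 0 else 1.0
    return [
        [min(255.0, max(0.0, (v - src_mean) * scale + ref_mean)) for v in row]
        for row in train_img
    ]
```

Clamping to [0, 255] keeps the result in the 8-bit range; compositing the adjusted image onto the region of interest image is a separate, subsequent step.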

According to an embodiment of the present disclosure, determining the first target training data may include obtaining first context information about the region of interest based on the region of interest image. Determining the first target training data may include obtaining second context information about the state of the electronic device. Determining the first target training data may include determining the context of the object based on the first context information, the second context information, and the ratio of the width to the height of the background image. Determining the first target training data may include determining, from among the original training data, the first target training data corresponding to the context of the object.
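A rough sketch of the context-based selection follows. Every concrete detail in it is an assumption made for illustration:

- the `Sample` record;
- the posture classes inferred from the width-to-height ratio, and the ratio cut-offs;
- a scene label serving as the first context information;
- a camera-view label standing in for the device-state (second) context information.

`posture_from_ratio` takes a generic width and height. It can therefore be fed either the region of interest image (as in the main method) or the background image (as in this embodiment).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    """Hypothetical metadata for one pre-generated original training sample."""
    sample_id: str
    posture: str      # e.g. "standing", "sitting", "lying"
    scene: str        # e.g. "indoor", "outdoor" (first context information)
    camera_view: str  # e.g. "front", "overhead" (stand-in for device state)


def posture_from_ratio(width: int, height: int,
                       tall: float = 0.8, wide: float = 1.25) -> str:
    """Map a width-to-height ratio to a coarse posture (cut-offs are assumed)."""
    ratio = width / height
    if ratio <= tall:
        return "standing"   # noticeably taller than wide
    if ratio >= wide:
        return "lying"      # noticeably wider than tall
    return "sitting"


def select_target_data(samples: List[Sample], width: int, height: int,
                       scene: str, camera_view: str) -> List[Sample]:
    """Keep the original samples whose metadata matches the inferred context."""
    posture = posture_from_ratio(width, height)
    return [s for s in samples
            if s.posture == posture and s.scene == scene
            and s.camera_view == camera_view]
```

Exact-match filtering is the simplest option. A real system might instead rank samples by a similarity score when no sample matches every field.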

According to an embodiment of the present disclosure, identifying the region of interest may include obtaining a residual image between the first image and the background image. Identifying the region of interest may include identifying the region of interest based on the residual image.
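The residual-image step can be illustrated with plain grayscale arrays. This sketch assumes the region of interest is the smallest axis-aligned box covering every pixel whose absolute difference from the background exceeds a threshold. The threshold value, the `(x, y, width, height)` box format, and the function names are illustrative choices, not taken from the disclosure. A practical version would typically also denoise the residual (for example, with morphological opening) before computing the box.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def residual_image(first: Image, background: Image) -> Image:
    """Per-pixel absolute difference between the first image and the background."""
    if len(first) != len(background) or any(
            len(a) != len(b) for a, b in zip(first, background)):
        raise ValueError("images must have the same size")
    return [[abs(a - b) for a, b in zip(row_f, row_b)]
            for row_f, row_b in zip(first, background)]


def roi_from_residual(residual: Image, threshold: int) -> Optional[Box]:
    """Bounding box of all pixels whose residual exceeds the threshold."""
    points = [(x, y) for y, row in enumerate(residual)
              for x, value in enumerate(row) if value > threshold]
    if not points:
        return None  # nothing differs from the background
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)
```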

According to an embodiment of the present disclosure, generating the synthetic training data may include extracting an object image corresponding to an object included in the first training image. Generating the synthetic training data may include generating synthetic training data that includes a synthetic training image generated by combining the region of interest image and the object image.
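Here is a minimal sketch of the extraction and compositing steps, under three assumptions:

- The object image is defined by a per-pixel mask over the first training image (for example, from a segmentation annotation).
- The training image has already been resized to the size of the region of interest image.
- Mask values in [0, 1] act as alpha, so soft mask edges blend into the background.

`extract_object` and `composite` are hypothetical helpers.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities
Mask = List[List[float]]   # per-pixel alpha in [0, 1]; 1 = object


def extract_object(training_image: Image, mask: Mask) -> Image:
    """Keep object pixels (weighted by the mask) and zero out the rest."""
    return [[m * v for v, m in zip(row_v, row_m)]
            for row_v, row_m in zip(training_image, mask)]


def composite(roi_image: Image, object_image: Image, mask: Mask) -> Image:
    """Alpha-blend the extracted object onto the ROI image.

    object_image is premultiplied by the mask (as returned by extract_object),
    so out = object + (1 - alpha) * background.
    """
    return [[o + (1 - m) * r for r, o, m in zip(row_r, row_o, row_m)]
            for row_r, row_o, row_m in zip(roi_image, object_image, mask)]
```

Any annotations carried by the first training image (keypoints, boxes) stay valid only if the object keeps its position and scale during compositing. Otherwise they have to be transformed in the same way as the pixels.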

According to an embodiment of the present disclosure, the method of generating training data may include generating a ground truth synthetic training image based on an object image and the background image. The method may include obtaining prediction data about an object included in a third image by inputting the third image into the artificial intelligence model. The method may include generating ground truth synthetic training data for the object based on the ground truth synthetic training image and the prediction data.
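The ground truth step can be sketched as follows, under several assumptions not stated in the text:

- The object image is cut from the third image with a binary mask.
- The object image is pasted at the same pixel position onto a background image of the same size.
- `model` is any callable, a hypothetical stand-in for the pose estimation, detection, or classification model.
- The model's prediction on the real third image is reused as the label of the synthetic image.

Reusing the prediction as a label is only reasonable if pasting preserves the object's position and scale, which is why the sketch keeps the object in place.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Image = List[List[float]]
Mask = List[List[int]]  # 1 where the object is, 0 elsewhere


@dataclass
class LabeledSample:
    image: Image
    label: Any


def make_ground_truth_sample(third_image: Image, mask: Mask, background: Image,
                             model: Callable[[Image], Any]) -> LabeledSample:
    """Paste the object from the third image onto the background and label it
    with the model's prediction on the third image."""
    synthetic = [[t if m else b for t, m, b in zip(row_t, row_m, row_b)]
                 for row_t, row_m, row_b in zip(third_image, mask, background)]
    prediction = model(third_image)  # pseudo-label from the real image
    return LabeledSample(image=synthetic, label=prediction)
```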

According to an embodiment of the present disclosure, the artificial intelligence model may include at least one of a pose estimation model, an object detection model, or an object classification model. The object may include at least one of a person, an animal, or a thing.

According to an aspect of the present disclosure, an electronic device 1400 for generating synthetic training data is provided. The electronic device 1400 may include a memory 1410 and at least one processor 1420. The at least one processor 1420 performs the following operations by executing one or more instructions stored in the memory 1410:

- obtaining a first image including an object corresponding to a learning target;
- obtaining a background image that does not include the object, based on a second image captured from the same viewpoint as the first image;
- identifying a region of interest including the object within the first image;
- obtaining, from the background image, a region of interest image corresponding to the region of interest;
- determining first target training data from among previously generated original training data, where the first target training data corresponds to a context of the object determined based on the ratio of the width to the height of the region of interest image;
- generating synthetic training data including at least a portion of the background image, by modifying a first training image included in the first target training data based on the region of interest image.

According to an embodiment of the present disclosure, the at least one processor 1420 may determine the prediction accuracy of the artificial intelligence model by executing one or more instructions stored in the memory 1410. It uses two test images:

- a first test image generated based on an image including a predetermined background and the first target training data;
- a second test image generated based on the background image and the first target training data.

The at least one processor 1420 may generate additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.
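The disclosure does not fix a metric for the prediction accuracy. For a pose estimation model, one widely used option, assumed here only for illustration, is the percentage of correct keypoints (PCK). PCK is the fraction of predicted keypoints that fall within a distance threshold of their ground truth positions. The threshold below is in absolute pixels for simplicity; PCK variants usually normalize it by a body or box size. The function would be applied once to the model's output on the first test image and once on the second, giving the first and second prediction accuracies.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(predicted: Sequence[Point], ground_truth: Sequence[Point],
        threshold: float) -> float:
    """Fraction of keypoints predicted within `threshold` pixels of ground truth."""
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("need equally sized, non-empty keypoint lists")
    hits = sum(
        1 for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return hits / len(ground_truth)
```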

According to an embodiment of the present disclosure, the at least one processor 1420 may determine second target training data from among the original training data by executing one or more instructions stored in the memory 1410. This determination is based on the first prediction accuracy being greater than or equal to a predetermined value and the second prediction accuracy being less than the predetermined value. The at least one processor 1420 may generate additional synthetic training data by modifying a second training image included in the second target training data based on the region of interest image.

According to an embodiment of the present disclosure, the at least one processor 1420 may obtain, from the background image, an additional region of interest image having the same height and width as the region of interest image, by executing one or more instructions stored in the memory 1410. This is based on the first prediction accuracy and the second prediction accuracy both being less than a predetermined value. The at least one processor 1420 may generate additional synthetic training data by modifying the first training image based on the additional region of interest image.
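Obtaining an additional region of interest image of the same height and width amounts to cutting a second, equally sized window out of the background image. The sketch below assumes the window is placed at a random offset that differs from the original region of interest. The random placement, the retry limit, and the function names are illustrative assumptions, not details from the disclosure.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    x, y, w, h = box
    if x < 0 or y < 0 or y + h > len(image) or x + w > len(image[0]):
        raise ValueError("box lies outside the image")
    return [row[x:x + w] for row in image[y:y + h]]


def additional_roi(background: Image, roi: Box, rng: random.Random,
                   max_tries: int = 100) -> Optional[Tuple[Box, Image]]:
    """Pick another window with the ROI's width and height at a new offset."""
    _, _, w, h = roi
    rows, cols = len(background), len(background[0])
    if w > cols or h > rows:
        return None
    for _ in range(max_tries):
        # randint is inclusive on both ends, so the window always fits.
        box = (rng.randint(0, cols - w), rng.randint(0, rows - h), w, h)
        if box != roi:
            return box, crop(background, box)
    return None  # e.g. the ROI already spans the whole background
```

Passing an explicit `random.Random` instance keeps the choice reproducible when a seed is supplied.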

According to an embodiment of the present disclosure, the at least one processor 1420 may obtain at least one of color information or brightness information of the region of interest image by executing one or more instructions stored in the memory 1410. The at least one processor 1420 may modify at least one of the color or the brightness of the first training image, based on at least one of the color information or the brightness information of the region of interest image. The at least one processor 1420 may generate the synthetic training data by further modifying the adjusted first training image based on the region of interest image.

According to an embodiment of the present disclosure, the at least one processor 1420 may obtain first context information about the region of interest, based on the region of interest image, by executing one or more instructions stored in the memory 1410. The at least one processor 1420 may obtain second context information about the state of the electronic device. The at least one processor 1420 may determine the context of the object based on the first context information, the second context information, and the ratio of the width to the height of the background image. The at least one processor 1420 may determine, from among the original training data, the first target training data corresponding to the context of the object.

According to an embodiment of the present disclosure, the at least one processor 1420 may obtain a residual image between the first image and the background image by executing one or more instructions stored in the memory 1410. The at least one processor 1420 may identify the region of interest based on the residual image.

According to an embodiment of the present disclosure, the at least one processor 1420 may extract an object image corresponding to an object included in the first training image by executing one or more instructions stored in the memory 1410. The at least one processor 1420 may generate synthetic training data including a synthetic training image generated by combining the region of interest image and the object image.

According to an embodiment of the present disclosure, the at least one processor 1420 may generate a ground truth synthetic training image based on an object image and the background image by executing one or more instructions stored in the memory 1410. The at least one processor 1420 may obtain prediction data about an object included in a third image by inputting the third image into the artificial intelligence model. The at least one processor 1420 may generate ground truth synthetic training data for the object based on the ground truth synthetic training image and the prediction data.

According to an embodiment of the present disclosure, the artificial intelligence model may include at least one of a pose estimation model, an object detection model, or an object classification model. The object may include at least one of a person, an animal, or a thing.

Claims

A method of generating training data for training an artificial intelligence model, the method comprising:
obtaining a first image including an object corresponding to a learning target;
obtaining, based on a second image captured from the same viewpoint as the first image, a background image that does not include the object;
identifying a region of interest including the object within the first image;
obtaining, from the background image, a region of interest image corresponding to the region of interest;
determining, from among previously generated original training data, first target training data corresponding to a context of the object determined based on a ratio of a width to a height of the region of interest image; and
generating synthetic training data including at least a portion of the background image by modifying a first training image included in the first target training data based on the region of interest image.

The method of claim 1, further comprising:
determining a prediction accuracy of the artificial intelligence model using a first test image generated based on an image including a predetermined background and the first target training data, and a second test image generated based on the background image and the first target training data; and
generating additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.

The method of claim 2, wherein generating the additional synthetic training data comprises:
determining second target training data from among the original training data based on the first prediction accuracy being greater than or equal to a predetermined value and the second prediction accuracy being less than the predetermined value; and
generating the additional synthetic training data by modifying a second training image included in the second target training data based on the region of interest image.

The method of any one of claims 2 to 3, wherein generating the additional synthetic training data comprises:
obtaining, from the background image, an additional region of interest image having the same height and width as the region of interest image, based on the first prediction accuracy and the second prediction accuracy being less than a predetermined value; and
generating the additional synthetic training data by modifying the first training image based on the additional region of interest image.

The method of any one of claims 1 to 4, wherein generating the synthetic training data comprises:
obtaining at least one of color information or brightness information of the region of interest image;
modifying at least one of a color or a brightness of the first training image based on the at least one of the color information or the brightness information of the region of interest image; and
generating the synthetic training data by modifying the modified first training image based on the region of interest image.

The method of any one of claims 1 to 5, wherein determining the first target training data comprises:
obtaining first context information about the region of interest based on the region of interest image;
obtaining second context information about a state of an electronic device;
determining the context of the object based on the first context information, the second context information, and a ratio of a width to a height of the background image; and
determining, from among the original training data, the first target training data corresponding to the context of the object.

The method of any one of claims 1 to 6, wherein identifying the region of interest comprises:
obtaining a residual image between the first image and the background image; and
identifying the region of interest based on the residual image.

The method of any one of claims 1 to 7, wherein generating the synthetic training data comprises:
extracting an object image corresponding to the object included in the first training image; and
generating the synthetic training data including a synthetic training image generated by combining the region of interest image and the object image.

The method of any one of claims 1 to 8, further comprising:
obtaining a third image including the object;
obtaining an object image of the object from the third image;
generating a ground truth synthetic training image based on the object image and the background image;
obtaining prediction data about the object included in the third image by inputting the third image into the artificial intelligence model; and
generating ground truth synthetic training data for the object based on the ground truth synthetic training image and the prediction data.

The method of any one of claims 1 to 9, wherein the artificial intelligence model includes at least one of a pose estimation model, an object detection model, or an object classification model, and
the object includes at least one of a person, an animal, or a thing.

An electronic device for generating training data for training an artificial intelligence model, the electronic device comprising:
at least one processor 1420; and
a memory 1410 storing at least one instruction,
wherein the at least one processor 1420 is configured to execute the at least one instruction to:
obtain a first image including an object corresponding to a learning target,
obtain, based on a second image captured from the same viewpoint as the first image, a background image that does not include the object,
identify a region of interest including the object within the first image,
obtain, from the background image, a region of interest image corresponding to the region of interest,
determine, from among previously generated original training data, first target training data corresponding to a context of the object determined based on a ratio of a width to a height of the region of interest image, and
generate synthetic training data including at least a portion of the background image by modifying a first training image included in the first target training data based on the region of interest image.

The electronic device of claim 11, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
determine a prediction accuracy of the artificial intelligence model using a first test image generated based on an image including a predetermined background and the first target training data, and a second test image generated based on the background image and the first target training data, and
generate additional synthetic training data based on a first prediction accuracy for the first test image and a second prediction accuracy for the second test image.

The electronic device of claim 12, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
determine second target training data from among the original training data based on the first prediction accuracy being greater than or equal to a predetermined value and the second prediction accuracy being less than the predetermined value, and
generate the additional synthetic training data by modifying a second training image included in the second target training data based on the region of interest image.

The electronic device of any one of claims 12 to 13, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
obtain, from the background image, an additional region of interest image having the same height and width as the region of interest image, based on the first prediction accuracy and the second prediction accuracy being less than a predetermined value, and
generate the additional synthetic training data by modifying the first training image based on the additional region of interest image.

The electronic device of any one of claims 11 to 14, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
obtain at least one of color information or brightness information of the region of interest image,
modify at least one of a color or a brightness of the first training image based on the at least one of the color information or the brightness information of the region of interest image, and
generate the synthetic training data by modifying the modified first training image based on the region of interest image.

The electronic device of any one of claims 11 to 15, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
obtain first context information about the region of interest based on the region of interest image,
obtain second context information about a state of the electronic device,
determine the context of the object based on the first context information, the second context information, and a ratio of a width to a height of the background image, and
determine, from among the original training data, the first target training data corresponding to the context of the object.

The electronic device of any one of claims 11 to 16, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
obtain a residual image between the first image and the background image, and
identify the region of interest based on the residual image.

The electronic device of any one of claims 11 to 17, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
extract an object image corresponding to the object included in the first training image, and
generate the synthetic training data including a synthetic training image generated by combining the region of interest image and the object image.

The electronic device of any one of claims 11 to 18, wherein the at least one processor 1420 is further configured to execute the at least one instruction to:
obtain a third image including the object,
obtain an object image of the object from the third image,
generate a ground truth synthetic training image based on the object image and the background image,
obtain prediction data about the object included in the third image by inputting the third image into the artificial intelligence model, and
generate ground truth synthetic training data for the object based on the ground truth synthetic training image and the prediction data.

One or more computer-readable recording media storing a program for performing the method of any one of claims 1 to 10.