KR20230157228A

KR20230157228A - Electronic apparatus augmenting learning data and method for controlling thereof

Info

Publication number: KR20230157228A
Application number: KR1020220178747A
Authority: KR
Inventors: 김지만; 김예훈; 서찬원
Original assignee: 삼성전자주식회사
Priority date: 2022-05-09
Filing date: 2022-12-19
Publication date: 2023-11-16

Abstract

The present disclosure provides an electronic device and a method for controlling the same. An electronic device according to an embodiment of the present disclosure includes a memory storing a first training data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data, and one or more processors that train a first neural network model to estimate a 3D pose based on the first training data set. The one or more processors acquire an augmented data set by augmenting the first training data set; select at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set, based on at least one of a similarity and a reliability of the 3D pose augmented data included in the acquired augmented data set; acquire a second training data set that includes, as pairs, the selected 3D pose augmented data and the 2D pose augmented data in the augmented data set that corresponds to the selected 3D pose augmented data; and retrain the first neural network model based on the acquired second training data set.

Description

Electronic device for augmenting learning data and control method thereof {ELECTRONIC APPARATUS AUGMENTING LEARNING DATA AND METHOD FOR CONTROLLING THEREOF}

The present disclosure relates to an electronic device that trains a neural network model and a method for controlling the same. More specifically, it relates to an electronic device, and a control method thereof, that augments the training data used to train a neural network model in order to acquire retraining data that can be used to retrain the neural network model.

With the recent development of electronic technology, deep learning models are being used in a wide variety of devices. For example, a speaker recognizes a user's voice based on a deep learning model and either outputs a response corresponding to the user's voice or transmits a control command corresponding to the user's voice to nearby IoT electronic devices. As another example, a robot recognizes objects around it based on a deep learning model and travels without colliding with them.

The performance of such a deep learning model may vary depending on its training data. In particular, the type of training data used to train the deep learning model may determine the output value that the model is intended to produce. In addition, the larger the amount of training data and the better its quality, the higher the accuracy of the output value that the deep learning model estimates and outputs. It is therefore necessary to prepare training data appropriate to how the deep learning model will be used, and to secure a large amount of high-quality training data.

An electronic device according to an embodiment of the present disclosure includes a memory and one or more processors. The memory stores a first training data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data. The one or more processors train a first neural network model to estimate a 3D pose based on the first training data set. The one or more processors also augment the first training data set to acquire an augmented data set. The one or more processors also select at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set, based on at least one of a similarity and a reliability of the 3D pose augmented data included in the acquired augmented data set. The one or more processors also acquire a second training data set that includes, as pairs, the selected 3D pose augmented data and the 2D pose augmented data in the augmented data set that corresponds to the selected 3D pose augmented data. The one or more processors also retrain the first neural network model based on the acquired second training data set.
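
The sequence of operations described above (train, augment, select, pair, retrain) can be outlined as a minimal Python sketch. Everything named here is an assumption made for illustration: the `train`, `augment`, and `score` callables, the single `threshold`, and the representation of a pose as a list of joint coordinates are hypothetical and are not specified by the disclosure, which only states that selection is based on at least one of similarity and reliability.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representations: a pose is a list of joint coordinates.
Pose2D = List[Tuple[float, float]]
Pose3D = List[Tuple[float, float, float]]
Pair = Tuple[Pose2D, Pose3D]


def build_second_dataset(
    augmented: Sequence[Pair],
    score: Callable[[Pose3D], float],
    threshold: float,
) -> List[Pair]:
    """Keep each augmented (2D, 3D) pair whose 3D pose scores >= threshold.

    `score` stands in for a similarity and/or reliability measure; its
    definition is an assumption and is supplied by the caller.
    """
    return [(p2, p3) for p2, p3 in augmented if score(p3) >= threshold]


def run_pipeline(first_dataset, train, augment, score, threshold):
    """Train, augment, select, then retrain on the selected pairs."""
    model = train(None, first_dataset)          # initial training
    augmented = augment(first_dataset)          # augmented data set
    second = build_second_dataset(augmented, score, threshold)
    model = train(model, second)                # retraining
    return model, second
```

Because the 2D pose travels with its 3D pose through the filter, each selected 3D pose augmented data stays paired with its corresponding 2D pose augmented data in the second training data set.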

A method for controlling an electronic device according to an embodiment of the present disclosure includes training a first neural network model to estimate a 3D pose based on a first training data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data. The control method also includes augmenting the first training data set to acquire an augmented data set. The control method includes selecting at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set, based on at least one of a similarity and a reliability of the 3D pose augmented data included in the acquired augmented data set. The control method includes acquiring a second training data set that includes, as pairs, the selected 3D pose augmented data and the 2D pose augmented data in the augmented data set that corresponds to the selected 3D pose augmented data. The control method includes retraining the first neural network model based on the acquired second training data set.

Meanwhile, computer instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the control method according to an embodiment of the present disclosure may be stored in a non-transitory computer-readable recording medium. Here, the control method includes training a first neural network model to estimate a 3D pose based on a first training data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data. The control method also includes augmenting the first training data set to acquire an augmented data set. The control method includes selecting at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set, based on at least one of a similarity and a reliability of the 3D pose augmented data included in the acquired augmented data set. The control method includes acquiring a second training data set that includes, as pairs, the selected 3D pose augmented data and the 2D pose augmented data in the augmented data set that corresponds to the selected 3D pose augmented data. The control method includes retraining the first neural network model based on the acquired second training data set.

FIG. 1 is an exemplary diagram of an electronic device that selects training data to be used for retraining from an augmented data set obtained by augmenting a training data set, according to an embodiment of the present disclosure.
FIG. 2 is a schematic configuration diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 3 is an exemplary diagram showing a training data set according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram showing a method of augmenting a training data set according to an embodiment of the present disclosure.
FIG. 5 is an exemplary diagram showing a method of selecting 3D pose augmented data from an augmented data set based on a distribution function of the 3D pose data included in the first training data set, according to an embodiment of the present disclosure.
FIG. 6 is an exemplary diagram showing a method of selecting 3D pose augmented data from an augmented data set based on the first neural network model, according to an embodiment of the present disclosure.
FIG. 7 is an exemplary diagram showing a method of estimating the pose of an object included in an image according to an embodiment of the present disclosure.
FIG. 8 is a detailed configuration diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 9 is a flowchart schematically showing a method of controlling an electronic device according to an embodiment of the present disclosure.

The embodiments may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended, however, to limit the scope to particular embodiments, and the disclosure should be understood to include various modifications, equivalents, and/or alternatives of its embodiments. In the description of the drawings, similar reference numerals may be used for similar components.

In describing the present disclosure, a detailed description of a related known function or configuration is omitted where it is determined that it would unnecessarily obscure the gist of the present disclosure.

In addition, the following embodiments may be modified into various other forms, and the scope of the technical idea of the present disclosure is not limited to the following embodiments. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the technical idea of the present disclosure to those skilled in the art.

The terms used in the present disclosure are used only to describe specific embodiments and are not intended to limit the scope of rights. A singular expression includes the plural unless the context clearly indicates otherwise.

In the present disclosure, expressions such as "have," "may have," "include," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

In the present disclosure, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the following cases: (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "1st," "2nd," "first," or "second" used in the present disclosure may modify various components regardless of order and/or importance; they are used only to distinguish one component from another and do not limit the components.

When a component (e.g., a first component) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component).

On the other hand, when a component (e.g., a first component) is referred to as being "directly coupled" or "directly connected" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between the component and the other component.

The expression "configured to (or set to)" used in the present disclosure may, depending on the situation, be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of." The term "configured to (or set to)" does not necessarily mean only "specifically designed to" in hardware.

Instead, in some contexts, the expression "a device configured to" may mean that the device is "capable of" doing something together with other devices or components. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a CPU or an application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

In the embodiments, a "module" or "unit" performs at least one function or operation, and may be implemented as hardware, as software, or as a combination of hardware and software. In addition, a plurality of "modules" or a plurality of "units" may be integrated into at least one module and implemented with at least one processor, except for a "module" or "unit" that needs to be implemented with specific hardware.

Meanwhile, the various elements and regions in the drawings are drawn schematically. Accordingly, the technical idea of the present invention is not limited by the relative sizes or spacing drawn in the accompanying drawings.

Hereinafter, embodiments according to the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can easily implement them.

FIG. 1 is an exemplary diagram of an electronic device that selects training data to be used for retraining from an augmented data set obtained by augmenting a training data set, according to an embodiment of the present disclosure.

A neural network model (which may also be referred to as a deep learning model or the like) is trained on training data. Specifically, a user inputs the input data included in the training data into the neural network model and, based on the output data included in the training data, applies a backpropagation algorithm to the plurality of hidden layers included in the neural network model, thereby obtaining a weight value for each of the hidden layers. As a result, a neural network model may output different result values depending on the training data used to train it.
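
As a rough illustration of how backpropagation yields weight values from input and output data, the following sketch trains a deliberately tiny network with one hidden unit. The network shape, the tanh activation, the squared-error loss, the initial weights, and the learning rate are all assumptions chosen for brevity; the disclosure does not specify any of them.

```python
import math


def train_step(w1, w2, x, target, lr):
    """One backpropagation update for the model y = w2 * tanh(w1 * x)."""
    # Forward pass: hidden activation, output, and squared-error loss.
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: apply the chain rule from the output back to each weight.
    dy = y - target                  # dLoss/dy
    dw2 = dy * h                     # dLoss/dw2
    dh = dy * w2                     # dLoss/dh
    dw1 = dh * (1.0 - h * h) * x     # dLoss/dw1, using tanh'(z) = 1 - tanh(z)^2

    # Gradient-descent update of both weights.
    return w1 - lr * dw1, w2 - lr * dw2, loss


def train(x, target, steps=200, lr=0.1):
    """Repeat the update from fixed (hypothetical) initial weights."""
    w1, w2 = 0.5, 0.5
    losses = []
    for _ in range(steps):
        w1, w2, loss = train_step(w1, w2, x, target, lr)
        losses.append(loss)
    return w1, w2, losses
```

Here `x` plays the role of the input data and `target` the role of the output data in one training pair; a practical model would repeat the same forward and backward passes over many pairs and many weights.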

However, if the amount of training data is insufficient, the performance of the neural network model may degrade. This is because, when there is little training data, the training process stops before an appropriate weight value has been obtained for each of the hidden layers included in the neural network model. In such a case, the training data is augmented to obtain new training data. For example, if the training data is an image, new training data may be obtained from the existing training data by rotating, cropping, enlarging, or reducing the image.
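
The image operations mentioned above can be sketched in plain Python on an image stored as a list of pixel rows. This is only an illustrative sketch: the function names, the fixed 90-degree rotation, the top-left crop, and the nearest-neighbour enlargement are assumptions, not operations specified by the disclosure.

```python
from typing import List

Image = List[List[int]]  # hypothetical representation: rows of pixel values


def rotate90(img: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height-by-width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def upscale(img: Image, factor: int) -> Image:
    """Enlarge by an integer factor, repeating each pixel (nearest neighbour)."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def augment(img: Image) -> List[Image]:
    """Derive three new samples from one existing image."""
    half_h = max(1, len(img) // 2)
    half_w = max(1, len(img[0]) // 2)
    return [rotate90(img), crop(img, 0, 0, half_h, half_w), upscale(img, 2)]
```

Each call to `augment` turns one existing sample into three new ones, which is how augmentation increases the amount of training data without collecting new images.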

However, when training data is augmented in this way, low-quality training data may be obtained along with it. Low-quality training data may include data that has little relevance to the training objective of the neural network model, incorrectly processed data, or general raw data. A method is therefore needed that increases the amount of training data through augmentation while also obtaining training data of good quality.

To solve this problem, the electronic device 100 according to an embodiment of the present disclosure augments the existing training data 10 to acquire new training data 20, but selects only the high-quality training data from the acquired new training data 20. That is, rather than using all of the new training data 20 obtained by augmenting the existing training data 10 to train the neural network model 40, the electronic device 100 selects from the new training data 20 only the training data suitable for training the neural network model 40. In this way, the electronic device 100 secures optimal training data for training the neural network model 40.

Embodiments of the present disclosure related to this are described below.

FIG. 2 is a schematic configuration diagram of the electronic device 100 according to an embodiment of the present disclosure.

The electronic device 100 according to an embodiment of the present disclosure includes a memory 110 and one or more processors 120.

The memory 110 stores one or more neural network models and a training data set. The one or more neural network models may include a neural network model pre-trained to estimate 3D pose data from 2D (two-dimensional) pose data. When 2D pose data is input, this pre-trained neural network model may output 3D pose data corresponding to the input 2D pose data. Hereinafter, for convenience of description, the neural network model pre-trained to estimate 3D pose data from 2D pose data is referred to as the first neural network model.

The first neural network model used in the present disclosure may be any of various networks, such as a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), or a deep Q-network (DQN).

The training data set may be the training data set that was used to train the first neural network model. Specifically, the training data set may include a plurality of training data items, each of which pairs 2D pose data with the 3D pose data corresponding to that 2D pose data. The training data set is described in detail with reference to FIG. 3.
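
One way to picture such a paired training data set is the sketch below. The `PosePair` class, its field names, and the representation of a pose as a tuple of joint coordinates (with the same number of joints in the 2D and 3D poses) are assumptions made for illustration; the actual layout of the data is the subject of FIG. 3 and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Joint2D = Tuple[float, float]
Joint3D = Tuple[float, float, float]


@dataclass(frozen=True)
class PosePair:
    """One training item: 2D pose data and its corresponding 3D pose data."""
    pose_2d: Tuple[Joint2D, ...]
    pose_3d: Tuple[Joint3D, ...]

    def __post_init__(self):
        # Assumption: both poses describe the same joints, in the same order.
        if len(self.pose_2d) != len(self.pose_3d):
            raise ValueError("2D and 3D poses must have the same joint count")


def split_inputs_targets(
    dataset: Sequence[PosePair],
) -> Tuple[List[Tuple[Joint2D, ...]], List[Tuple[Joint3D, ...]]]:
    """Separate the model inputs (2D poses) from the targets (3D poses)."""
    inputs = [pair.pose_2d for pair in dataset]
    targets = [pair.pose_3d for pair in dataset]
    return inputs, targets
```

Keeping each pair in a single immutable record makes it harder for a 2D pose to become separated from its 3D counterpart when the data set is later augmented or filtered.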

In addition, the memory 110 may store data required for various embodiments of the electronic device 100 of the present disclosure, or various data used to drive the electronic device 100. Depending on the purpose of the stored data, the memory 110 may be implemented as memory embedded in the electronic device 100 or as memory that is attachable to and detachable from the electronic device 100. For example, data for driving the electronic device 100 may be stored in memory embedded in the electronic device 100, and data for extended functions of the electronic device 100 may be stored in memory that is attachable to and detachable from the electronic device 100.

Memory embedded in the electronic device 100 may be implemented as at least one of volatile memory (e.g., dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM)) or non-volatile memory (e.g., one-time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, flash memory (e.g., NAND flash or NOR flash), a hard drive, or a solid state drive (SSD)).

Memory that is attachable to and detachable from the electronic device 100 may be implemented in a form such as a memory card (e.g., compact flash (CF), secure digital (SD), micro secure digital (Micro-SD), mini secure digital (Mini-SD), extreme digital (xD), or multi-media card (MMC)) or external memory connectable to a USB port (e.g., USB memory).

The one or more processors 120 according to an embodiment of the present disclosure control the overall operation and functions of the electronic device 100.

The one or more processors 120 may include one or more of a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a many integrated core (MIC) processor, a digital signal processor (DSP), a neural processing unit (NPU), a hardware accelerator, or a machine learning accelerator. The one or more processors 120 may control one or any combination of the other components of the electronic device 100, and may perform communication-related operations or data processing. The one or more processors 120 may execute one or more programs or instructions stored in the memory 110. For example, the one or more processors 120 may perform a method according to an embodiment of the present disclosure by executing one or more instructions stored in the memory 110.

When a method according to an embodiment of the present disclosure includes a plurality of operations, the operations may be performed by a single processor or by a plurality of processors. For example, when a first operation, a second operation, and a third operation are performed by a method according to an embodiment, all three operations may be performed by a first processor, or the first and second operations may be performed by the first processor (e.g., a general-purpose processor) and the third operation by a second processor (e.g., a processor dedicated to artificial intelligence).
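
The two allocations described above can be mimicked in software, purely as an analogy. In the sketch below each single-worker thread pool stands in for one processor; the three placeholder operations and all function names are hypothetical, and a thread pool is of course not a hardware processor.

```python
from concurrent.futures import ThreadPoolExecutor


def first_operation(x: int) -> int:
    return x + 1


def second_operation(x: int) -> int:
    return x * 2


def third_operation(x: int) -> int:
    return x - 3


def run_on_one_processor(x: int) -> int:
    """All three operations run on the same (simulated) processor."""
    with ThreadPoolExecutor(max_workers=1) as first_processor:
        a = first_processor.submit(first_operation, x).result()
        b = first_processor.submit(second_operation, a).result()
        return first_processor.submit(third_operation, b).result()


def run_on_two_processors(x: int) -> int:
    """Operations 1 and 2 run on one worker, operation 3 on another."""
    with ThreadPoolExecutor(max_workers=1) as first_processor, \
            ThreadPoolExecutor(max_workers=1) as second_processor:
        a = first_processor.submit(first_operation, x).result()
        b = first_processor.submit(second_operation, a).result()
        return second_processor.submit(third_operation, b).result()
```

Both functions return the same value for the same input; only the worker that executes the third operation differs, which mirrors the point that the method does not depend on how its operations are divided among processors.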

The one or more processors 120 may be implemented as a single-core processor that includes one core, or as one or more multi-core processors that include a plurality of cores (e.g., homogeneous multi-core or heterogeneous multi-core). When the one or more processors 120 are implemented as a multi-core processor, each of the cores included in the multi-core processor may include processor-internal memory such as cache memory or on-chip memory, and a common cache shared by the cores may be included in the multi-core processor. In addition, each of the cores (or some of the cores) included in the multi-core processor may independently read and execute program instructions for implementing a method according to an embodiment of the present disclosure, or all (or some) of the cores may operate in conjunction to read and execute program instructions for implementing a method according to an embodiment of the present disclosure.

When a method according to an embodiment of the present disclosure includes a plurality of operations, the operations may be performed by one core among the plurality of cores included in a multi-core processor, or by a plurality of those cores. For example, when a first operation, a second operation, and a third operation are performed by the method according to an embodiment, all three operations may be performed by a first core included in the multi-core processor, or the first and second operations may be performed by the first core and the third operation by a second core included in the multi-core processor.

In embodiments of the present disclosure, a processor may refer to a system on chip (SoC) in which one or more processors and other electronic components are integrated, a single-core processor 120, a multi-core processor, or a core included in a single-core or multi-core processor. Here, the core may be implemented as a CPU, GPU, APU, MIC, DSP, NPU, hardware accelerator, or machine learning accelerator, but embodiments of the present disclosure are not limited thereto.

Hereinafter, for convenience of description, the one or more processors 120 are referred to as the processor 120.

FIG. 3 is an exemplary diagram illustrating a training data set according to an embodiment of the present disclosure, and FIGS. 4A and 4B are exemplary diagrams illustrating a method of obtaining augmented data by augmenting the training data set according to an embodiment of the present disclosure.

Referring to FIG. 3, the 2D pose data 210 included in the training data set 11 may include two-dimensional coordinate information about the joints of an object. Here, the object is an object contained in an image, and the 2D pose data 210 may include two-dimensional coordinate information about the plurality of joints that make up that object. The two-dimensional coordinate information may be, for example, a coordinate value or a vector value of each of the joints in a preset two-dimensional coordinate space.

For example, if the object has nine main joints, the 2D pose data 210 may include coordinate information corresponding to the nine joints. Among the plurality of 2D pose data 210, the second 2D pose data 210-2 may include nine pieces of coordinate information, one for each of the nine joints (i.e., the first to ninth joints), from the vector value of the first joint through the vector value of the ninth joint.

Meanwhile, the 3D pose data 220 included in the training data set 11 may include three-dimensional coordinate information about the joints of the object. Here, the object is an object contained in an image, and the 3D pose data 220 may include three-dimensional coordinate information about the plurality of joints that make up that object. The three-dimensional coordinate information may be, for example, a coordinate value or a vector value of each of the joints in a preset three-dimensional coordinate space.

For example, referring to FIG. 3, if the object has nine main joints, the 3D pose data 220 may include coordinate information corresponding to the nine joints. Among the plurality of 3D pose data 220, the second 3D pose data 220-2 may include nine pieces of coordinate information, one for each of the nine joints (i.e., the first to ninth joints), from the vector value of the first joint through the vector value of the ninth joint.

Although FIG. 3 shows an object with nine joints, the number of joints of the object may vary depending on the embodiment.

The plurality of 2D pose data 210 and the plurality of 3D pose data 220 included in the training data set 11 may form pairs for the same pose. For example, referring to FIG. 3, when the object is standing, the 2D pose data 210 containing two-dimensional coordinate information about the object's joints in the standing pose and the 3D pose data 220 containing three-dimensional coordinate information about the object's joints in the same standing pose may form a pair. In this way, the training data set 11 may include 2D pose data 210 and 3D pose data 220 paired for the same object and the same pose of that object.

For example, the second 2D pose data 210-2 among the plurality of 2D pose data 210 and the second 3D pose data 220-2 among the plurality of 3D pose data 220 relate to the same object and the same pose of that object (a standing pose), so the second 2D pose data 210-2 and the second 3D pose data 220-2 may be matched. That is, they may be included in the training data set as a pair. The pieces of coordinate information included in the second 2D pose data 210-2 and the second 3D pose data 220-2 may likewise be matched joint by joint. That is, the coordinate information for the first joint among the plurality of joints (i.e., nine joints) in the second 2D pose data 210-2 may be matched with the coordinate information for the first joint among the plurality of joints (i.e., nine joints) in the second 3D pose data 220-2. In this way, each piece of coordinate information corresponding to the nine joints in the second 2D pose data 210-2 may be matched with the coordinate information for the same joint in the second 3D pose data 220-2.
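The pairing described above can be pictured with a small data structure. The following is only an illustrative sketch: the names `PosePair`, `NUM_JOINTS`, and `make_standing_pair`, and the coordinate values, are hypothetical and do not appear in the disclosure. It assumes each pose is stored as a list in which index j always refers to the same joint, so that 2D and 3D coordinates are matched joint by joint.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 9  # matches the nine-joint example; other counts are possible

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PosePair:
    """One training sample: 2D and 3D coordinates of the same pose."""
    pose_2d: List[Vec2]  # index j holds the 2D coordinates of joint j + 1
    pose_3d: List[Vec3]  # index j holds the 3D coordinates of the same joint

    def __post_init__(self) -> None:
        # Joint-wise matching requires both lists to cover the same joints.
        if len(self.pose_2d) != len(self.pose_3d):
            raise ValueError("2D and 3D pose data must list the same joints")


def make_standing_pair() -> PosePair:
    # Hypothetical coordinates for a standing object, for illustration only.
    pose_3d = [(0.0, float(j), 2.0) for j in range(NUM_JOINTS)]
    pose_2d = [(x, y) for (x, y, _z) in pose_3d]
    return PosePair(pose_2d=pose_2d, pose_3d=pose_3d)


pair = make_standing_pair()
```

Keeping both lists in one object makes it harder for a 2D sample to be separated from its 3D counterpart when the data set is later filtered or augmented.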

According to an embodiment of the present disclosure, the processor 120 trains the first neural network model 41 to estimate a 3D pose based on the training data set 11.

The processor 120 may train the first neural network model 41 using 2D pose data 210 and 3D pose data 220 for the same pose. Specifically, the processor 120 may input the 2D pose data 210 to the first neural network model 41 and train the model using, as the target output value, the 3D pose data 220 that corresponds to (or is paired with) the input 2D pose data 210. Hereinafter, for convenience of description, the training data set used to initially train the first neural network model 41 is referred to as the first training data set 11.

The processor 120 then obtains an augmented data set 21 by augmenting the first training data set 11. Specifically, the processor 120 may augment the first training data set 11 by exchanging the three-dimensional coordinate information of at least one same joint between the plurality of 3D pose data 220 included in the first training data set 11.

For example, referring to FIG. 4A, the processor 120 may extract first 3D pose data 220-1 and second 3D pose data 220-2 from among the plurality of 3D pose data 220 included in the first training data set 11. The processor 120 may then swap the coordinate information of at least one same joint among the three-dimensional coordinate information for the plurality of joints included in the extracted first 3D pose data 220-1 and second 3D pose data 220-2. Specifically, the processor 120 may obtain first 3D pose augmented data 320-1 by replacing the coordinate information for the calf (i.e., the ninth joint) included in the first 3D pose data 220-1 with the coordinate information for the calf (i.e., the ninth joint) included in the second 3D pose data 220-2.

In this case, the first 3D pose augmented data 320-1 may include coordinate information (specifically, the vector value of the first joint through the vector value of the ninth joint) for a new pose (i.e., a third pose) that differs from both the pose corresponding to the first 3D pose data 220-1 (i.e., the first pose) and the pose corresponding to the second 3D pose data 220-2 (i.e., the second pose). That is, the processor 120 may obtain data for a new pose (i.e., 3D pose augmented data 320) by exchanging the coordinate information of at least one joint between the plurality of 3D pose data included in the first training data set 11.

Meanwhile, the processor 120 may obtain second 3D pose augmented data 320-2 by replacing the coordinate information for the calf (i.e., the ninth joint) included in the second 3D pose data 220-2 with the coordinate information for the calf (i.e., the ninth joint) included in the first 3D pose data 220-1.

In this case, the second 3D pose augmented data 320-2 may also include coordinate information (specifically, the vector value of the first joint through the vector value of the ninth joint) for a new pose (i.e., a fourth pose) that differs from both the pose corresponding to the first 3D pose data 220-1 (i.e., the first pose) and the pose corresponding to the second 3D pose data 220-2 (i.e., the second pose). The pose of the second 3D pose augmented data 320-2 (i.e., the fourth pose) may also differ from the pose of the first 3D pose augmented data 320-1 (i.e., the third pose).
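The exchange of one joint between two poses can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `swap_joint`, the zero-based index `CALF = 8` for the ninth joint, and the toy coordinates are all hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pose3D = List[Vec3]


def swap_joint(pose_a: Pose3D, pose_b: Pose3D, joint: int) -> Tuple[Pose3D, Pose3D]:
    """Exchange one joint's coordinates between two poses.

    Returns two new poses; the input poses are left unchanged.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("both poses must describe the same set of joints")
    aug_a = list(pose_a)
    aug_b = list(pose_b)
    aug_a[joint], aug_b[joint] = pose_b[joint], pose_a[joint]
    return aug_a, aug_b


CALF = 8  # zero-based index of the ninth joint (assumed layout)
pose_1 = [(float(j), 0.0, 0.0) for j in range(9)]  # stands in for 220-1
pose_2 = [(0.0, float(j), 1.0) for j in range(9)]  # stands in for 220-2
aug_1, aug_2 = swap_joint(pose_1, pose_2, CALF)    # stand in for 320-1, 320-2
```

Each call yields two augmented poses, one per source pose, which mirrors how the third and fourth poses arise from the first and second poses in the description.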

To exchange the three-dimensional coordinate information of the calf, the processor 120 may use the three-dimensional coordinate information of a joint connected to the calf (e.g., the thigh or the foot). Specifically, the processor 120 may align the three-dimensional coordinate information of the joint connected to the calf with the three-dimensional coordinate information of the calf obtained from other 3D pose data 220. For example, referring again to FIG. 4A, the processor 120 may extract the three-dimensional coordinate information of the calf included in the first 3D pose data 220-1 and align it with the three-dimensional coordinate information of the thigh, which connects to the calf, included in the second 3D pose data 220-2, thereby obtaining new 3D pose data (i.e., the first 3D pose augmented data 320-1). Likewise, new 3D pose data (i.e., the second 3D pose augmented data 320-2) may be obtained from the second 3D pose data 220-2. In this way, the processor 120 may obtain new 3D pose data 320 from the 3D pose data 220 included in the existing training data set. Hereinafter, for convenience of description, the new 3D pose data 320 is referred to as 3D pose augmented data 320.
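One plausible reading of this alignment step is that the transplanted calf keeps its offset from the thigh, so that the limb stays attached after the exchange. The sketch below illustrates that reading only; the disclosure does not specify the alignment rule. The function `attach_joint`, the two-joint toy skeleton, and the coordinates are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def attach_joint(donor: List[Vec3], recipient: List[Vec3],
                 joint: int, parent: int) -> List[Vec3]:
    """Copy `joint` from the donor pose into the recipient pose.

    The joint keeps the offset it had from `parent` in the donor pose and is
    re-anchored at the recipient's `parent` (an assumed alignment rule).
    """
    # Offset of the joint from its parent in the donor pose.
    dx, dy, dz = (c - p for c, p in zip(donor[joint], donor[parent]))
    px, py, pz = recipient[parent]
    result = list(recipient)
    result[joint] = (px + dx, py + dy, pz + dz)
    return result


THIGH, CALF = 0, 1  # hypothetical indices in a two-joint toy skeleton
donor = [(0.0, 1.0, 0.0), (0.0, 0.5, 0.25)]
recipient = [(1.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
augmented = attach_joint(donor, recipient, CALF, THIGH)
```

Copying raw coordinates instead would leave the calf detached whenever the two source poses place the thigh at different positions, which is why an offset-preserving rule is assumed here.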

Meanwhile, the processor 120 may project the obtained 3D pose augmented data 320 onto a two-dimensional coordinate space to obtain 2D pose augmented data 310 that corresponds to (or is paired with) the obtained 3D pose augmented data 320. To continue the example above, the processor 120 may project the three-dimensional coordinate information for the plurality of joints (specifically, nine joints) included in the 3D pose augmented data 320 obtained from the first 3D pose data 220-1 and the second 3D pose data 220-2 (e.g., the first 3D pose augmented data 320-1 and the second 3D pose augmented data 320-2) onto a preset two-dimensional coordinate space, thereby obtaining two-dimensional coordinate information for each of the joints. The processor 120 may then obtain 2D pose augmented data containing the obtained two-dimensional coordinate information for the plurality of joints (e.g., first 2D pose augmented data 310-1 and second 2D pose augmented data, respectively). The processor 120 may then match the 3D pose augmented data 320 with the 2D pose augmented data 310 and include them in the augmented data set 21.

Specifically, referring to FIG. 4B, the processor 120 may project the three-dimensional coordinate information corresponding to each of the nine joints included in the first 3D pose augmented data 320-1 (the vector value of the first joint through the vector value of the ninth joint) onto the preset two-dimensional coordinate space, and obtain coordinate information corresponding to each of the nine joints (the two-dimensional vector value of the first joint through the two-dimensional vector value of the ninth joint).
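The disclosure does not state which projection is used. As a hedged illustration, the sketch below assumes a simple pinhole (perspective) projection with a hypothetical `focal_length` parameter and a camera looking along the positive z axis; an orthographic projection that drops z would fit the description equally well.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project_pose(pose_3d: List[Vec3], focal_length: float = 1.0) -> List[Vec2]:
    """Project every joint onto the image plane with a pinhole model (assumed)."""
    pose_2d = []
    for x, y, z in pose_3d:
        if z <= 0:
            raise ValueError("joint must lie in front of the camera (z > 0)")
        pose_2d.append((focal_length * x / z, focal_length * y / z))
    return pose_2d


# Toy example: two joints of a hypothetical augmented 3D pose.
pose_3d_aug = [(1.0, 2.0, 4.0), (0.0, 1.0, 2.0)]
pose_2d_aug = project_pose(pose_3d_aug, focal_length=2.0)
```

Because the output list preserves joint order, the resulting 2D pose remains matched joint by joint with the 3D pose it was projected from.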

In this way, the processor 120 may obtain an augmented data set 21 that contains new 2D pose data (more specifically, 2D pose augmented data 310) and new 3D pose data (more specifically, 3D pose augmented data 320), both obtained by augmenting the 3D pose data 220 that was used to train the first neural network model 41.

The processor 120 then selects at least one 3D pose augmented data from among the plurality of 3D pose augmented data 320 included in the augmented data set 21, based on at least one of the similarity and the reliability of the 3D pose augmented data 320 included in the obtained augmented data set 21.

Specifically, from among the plurality of training data included in the augmented data set 21 (more specifically, training data each consisting of a pair of 2D pose augmented data 310 and 3D pose augmented data 320), the processor 120 selects the training data to be used to retrain the first neural network model 41.

To this end, the processor 120 identifies the similarity and the reliability of each of the plurality of 3D pose augmented data 320 included in the augmented data set 21 and, based on the identified similarity and reliability, selects from among the plurality of 3D pose augmented data 320 the 3D pose augmented data to be used to retrain the first neural network model 41. The processor 120 also selects the 2D pose augmented data corresponding to the selected 3D pose augmented data.

According to an embodiment of the present disclosure, the processor 120 may select, from among the plurality of 3D pose augmented data 320 included in the augmented data set 21, 3D pose augmented data that has low similarity to the plurality of 3D pose data 220 included in the existing first training data set 11. By selecting 3D pose augmented data for poses that differ from (i.e., have low similarity to) the poses used to train the first neural network model 41, the processor 120 can train the first neural network model 41 to identify a wider variety of poses.

In addition, according to an embodiment of the present disclosure, the processor 120 may select highly reliable 3D pose augmented data from among the plurality of 3D pose augmented data 320 included in the augmented data set 21. Training the first neural network model 41 on the selected, highly reliable 3D pose augmented data 320 allows the first neural network model 41 to output 3D pose data 220 for the pose of an object more accurately. That is, it increases the reliability of the first neural network model 41.

Meanwhile, the processor 120 retrains the first neural network model 41 based on the 3D pose augmented data 320 and the 2D pose augmented data 310 selected from the augmented data set 21 based on at least one of similarity and reliability. Specifically, the processor 120 may generate training data by matching the 3D pose augmented data 320 selected from the augmented data set 21 with the 2D pose augmented data 310 corresponding to the selected 3D pose augmented data 320, and then obtain a new training data set containing the generated training data. The processor 120 may then retrain the previously trained first neural network model 41 based on the obtained new training data set. Hereinafter, for convenience of description, the new training data set is referred to as the second training data set 12.
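Assembling the second training data set amounts to filtering the augmented pairs while keeping each 2D sample together with its 3D counterpart. A minimal sketch follows, assuming the selection result is available as one boolean flag per pair; `build_second_set` and the string placeholders are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

P2 = TypeVar("P2")  # type of a 2D pose sample
P3 = TypeVar("P3")  # type of a 3D pose sample


def build_second_set(augmented_pairs: Sequence[Tuple[P2, P3]],
                     keep_flags: Sequence[bool]) -> List[Tuple[P2, P3]]:
    """Keep each (2D, 3D) pair whose flag is True, so pairs are never split."""
    if len(augmented_pairs) != len(keep_flags):
        raise ValueError("need exactly one flag per augmented pair")
    return [pair for pair, keep in zip(augmented_pairs, keep_flags) if keep]


# Placeholders stand in for actual pose data.
augmented_set = [("2d_a", "3d_a"), ("2d_b", "3d_b"), ("2d_c", "3d_c")]
second_set = build_second_set(augmented_set, [True, False, True])
```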

FIG. 5 is an exemplary diagram illustrating a method of selecting 3D pose augmented data from the augmented data set 21 based on a distribution function of the 3D pose data 220 included in the first training data set 11, according to an embodiment of the present disclosure. FIG. 6 is an exemplary diagram illustrating a method of selecting 3D pose augmented data from the augmented data set 21 based on the first neural network model 41, according to an embodiment of the present disclosure.

Hereinafter, a method of selecting, from the augmented data set 21, the training data to be used for retraining is described.

As an example, the processor 120 may obtain a distribution function of the plurality of 3D pose data 220 included in the first training data set 11 and, based on the obtained distribution function, obtain a distribution probability value, with respect to the training data set, for each of the plurality of 3D pose augmented data 320 included in the augmented data set 21. The processor 120 may identify a higher similarity to the first training data set 11 as the distribution probability value increases.

Specifically, the processor 120 may obtain a distribution function for the plurality of 3D pose data 220 included in the first training data set 11. As an example, the distribution function for the 3D pose data 220 may be a probability density function of the poses included in the training data set. Accordingly, the output value of the distribution function may be a probability value (or distribution probability value) for each pose.

For example, the distribution function for the 3D pose data 220 may be a function representing how the three-dimensional coordinate information for the plurality of joints, included in the 3D pose data 220 corresponding to each pose, is distributed. That is, the distribution function for the 3D pose data 220 may be a probability density function of the three-dimensional coordinate information for the plurality of joints corresponding to each pose. If the 3D pose data 220 includes three-dimensional coordinate information for nine joints of the object, the distribution function may be a 27-dimensional function over the x, y, and z coordinate values of each of the nine joints. The processor 120 may reduce the 27-dimensional function to a two-dimensional or three-dimensional function using a Principal Component Analysis (PCA) algorithm or a t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. In FIG. 5, for convenience of description, the distribution function for the plurality of 3D pose data 220 is shown as two-dimensional.
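To make the 27-dimensional view concrete, the sketch below flattens each nine-joint pose into a 27-element vector and scores a sample against the training poses with an unnormalized Gaussian kernel. This is a swapped-in stand-in chosen for brevity, not the distribution function of the disclosure: the score lies in (0, 1] rather than being a true probability density, the PCA or t-SNE reduction step is omitted, and `flatten`, `kernel_score`, and `bandwidth` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def flatten(pose_3d: Sequence[Vec3]) -> List[float]:
    """Turn nine (x, y, z) joints into one 27-element vector."""
    return [coord for joint in pose_3d for coord in joint]


def kernel_score(sample: Sequence[float],
                 training: Sequence[Sequence[float]],
                 bandwidth: float = 1.0) -> float:
    """Average Gaussian kernel between `sample` and every training vector.

    Higher values mean the sample lies closer to the training poses.
    """
    total = 0.0
    for ref in training:
        sq_dist = sum((a - b) ** 2 for a, b in zip(sample, ref))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(training)


# Toy training set with a single pose whose joints all sit at the origin.
train_vectors = [flatten([(0.0, 0.0, 0.0)] * 9)]
near = flatten([(0.1, 0.0, 0.0)] * 9)
far = flatten([(1.0, 0.0, 0.0)] * 9)
near_score = kernel_score(near, train_vectors)
far_score = kernel_score(far, train_vectors)
```

A pose near the training data receives a higher score than a distant one, which is the property the similarity judgment relies on.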

The processor 120 may obtain a probability value for the pose corresponding to each of the 3D pose augmented data 320 based on the obtained distribution function. Specifically, referring to FIG. 5, the processor 120 may input each of the plurality of 3D pose augmented data 320 included in the augmented data set 21 into the distribution function obtained from the first training data set 11, thereby obtaining a probability value for each of the 3D pose augmented data 320.

The processor 120 may then identify the similarity of each of the 3D pose augmented data 320 to the first training data set 11 based on the obtained probability value. Specifically, the smaller the probability value of the 3D pose augmented data 320, the lower the processor 120 may identify its similarity to the plurality of 3D pose data 220 included in the first training data set 11. Conversely, the larger the probability value of the 3D pose augmented data 320, the higher the processor 120 may identify its similarity to the plurality of 3D pose data 220 included in the first training data set 11.

To select 3D pose augmented data for a wider variety of poses that differ from the poses included in the existing first training data set 11, the processor 120 may select only the 3D pose augmented data whose distribution probability value is less than a preset first value from among the plurality of 3D pose augmented data 320. Referring again to FIG. 5, assuming that the preset first value is 0.3, the processor 120 may select only the 3D pose augmented data whose probability value is less than 0.3 from among the plurality of 3D pose augmented data 320 included in the augmented data set 21. As an example, because the probability value of the first 3D pose augmented data 320-1 is 0.2, which is less than 0.3, the processor 120 may select the first 3D pose augmented data 320-1, whereas because the probability value of the second 3D pose augmented data 320-2 is 0.7, which is not less than 0.3, the processor 120 may not select the second 3D pose augmented data 320-2.
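The threshold rule in this example can be written in a few lines. The sketch uses the values given in the text (a first value of 0.3 and probability values of 0.2 and 0.7); the function name `select_dissimilar` and the use of reference numerals as dictionary keys are illustrative assumptions.

```python
from typing import Dict, List


def select_dissimilar(probabilities: Dict[str, float],
                      first_value: float = 0.3) -> List[str]:
    """Return the samples whose probability value is strictly below the threshold."""
    return [name for name, p in probabilities.items() if p < first_value]


# Probability values taken from the FIG. 5 example in the text.
probabilities = {"320-1": 0.2, "320-2": 0.7}
selected = select_dissimilar(probabilities)
```

With these inputs only the entry for 320-1 is selected, since 0.7 is not below 0.3. A value exactly equal to the threshold is not selected, consistent with the "less than" wording.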

The processor 120 may then identify and select the 2D pose augmented data 310 corresponding to the selected 3D pose augmented data 320, and generate new training data by matching the selected 3D pose augmented data 320 with the selected 2D pose augmented data 310.

Meanwhile, the processor 120 may input each of the plurality of 2D pose augmented data 310 included in the augmented data set 21 into the first neural network model 41 to obtain a plurality of 3D pose output data 420, one for each of the 2D pose augmented data 310, and, based on the obtained plurality of 3D pose output data 420, identify the reliability of the 3D pose augmented data 320 corresponding to each of the plurality of 2D pose augmented data 310.

Specifically, the processor 120 may input the plurality of 2D pose augmented data 310 included in the augmented data set 21 into the previously trained first neural network model 41 and obtain an output value corresponding to each of the 2D pose augmented data 310. The output value obtained by the processor 120 may be 3D pose data 220 that the first neural network model 41 infers from each input 2D pose augmented data 310. Specifically, the processor 120 may obtain three-dimensional coordinate information for each of the plurality of joints of the object, inferred from the two-dimensional coordinate information for each of those joints included in the 2D pose augmented data 310.

Meanwhile, after obtaining the 3D pose output data 420 by inputting each 2D pose augmented data 310 into the first neural network model 41, the processor 120 may compare each obtained 3D pose output data 420 with the 3D pose augmented data 320 that corresponds, in the augmented data set 21, to the same 2D pose augmented data 310, and may thereby identify the reliability of each 3D pose augmented data 320.

For example, referring to FIG. 6, the processor 120 may input 2D pose first augmented data 310-1, among the plurality of 2D pose augmented data 310 included in the augmented data set 21, into the first neural network model 41 to obtain 3D pose first output data 420-1, which is the output value for the 2D pose first augmented data 310-1. The processor 120 may then obtain, from the augmented data set 21, the 3D pose first augmented data 320-1 that matches (or is paired with) the 2D pose first augmented data 310-1, and may compare the obtained 3D pose first augmented data 320-1 with the 3D pose first output data 420-1. If the comparison shows that the 3D pose first augmented data 320-1 and the 3D pose first output data 420-1 differ to a large degree, the processor 120 may identify the 3D pose first augmented data 320-1, which was obtained by augmenting the first learning data set 11, as having low reliability. Thereafter, when generating the second learning data set 12 for training the first neural network model 41, the processor 120 may not select the 3D pose first augmented data 320-1 having low reliability. That is, the 3D pose first augmented data 320-1 may be excluded from the second learning data set 12.

As an example, the processor 120 may identify an error between the 3D pose augmented data 320 and the 3D pose output data 420, and may identify the reliability of the 3D pose augmented data 320 based on the identified error. Specifically, the processor 120 may identify the error between the 3D pose output data 420 and the 3D pose augmented data 320 that correspond to the same 2D pose augmented data 310.

For example, the processor 120 may extract features from the 3D pose output data 420 and from the 3D pose augmented data 320 that correspond to the same 2D pose augmented data 310, and may embed each extracted feature as a three-dimensional vector. The processor 120 may obtain the three-dimensional vectors corresponding to the features of the 3D pose output data 420 and the 3D pose augmented data 320 based on the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm. The processor 120 may then locate the three-dimensional vectors in a three-dimensional space and identify the Euclidean distance between the vector corresponding to the 3D pose output data 420 and the vector corresponding to the 3D pose augmented data 320. The processor 120 may identify this Euclidean distance as the error value of the 3D pose augmented data 320 with respect to the 3D pose output data 420.

Meanwhile, the processor 120 may identify that the smaller the identified error, the higher the reliability of each 3D pose augmented data 320. That is, in terms of the example above, the smaller the Euclidean distance between the vector corresponding to the 3D pose output data 420 and the vector corresponding to the 3D pose augmented data 320, the higher the reliability that the processor 120 identifies for the 3D pose augmented data 320.

Then, the processor 120 may select, from among the plurality of 3D pose augmented data 320, the 3D pose augmented data whose error is less than a preset second value. That is, the processor 120 may identify 3D pose augmented data having an error less than the preset second value as having high reliability.
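The reliability check described above can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: it swaps the t-SNE embedding step for a plain mean per-joint Euclidean distance, and the names `pose_error`, `select_reliable`, `toy_lifter`, and `max_error` are hypothetical. The `lifter` argument stands in for the trained first neural network model 41, and `max_error` plays the role of the preset second value.

```python
import math

def pose_error(output_pose, augmented_pose):
    """Mean Euclidean distance between corresponding joints of two 3D poses.

    Simplification: the disclosure embeds features with t-SNE first;
    here the raw joint coordinates are compared directly.
    """
    if len(output_pose) != len(augmented_pose):
        raise ValueError("poses must have the same number of joints")
    total = sum(math.dist(a, b) for a, b in zip(output_pose, augmented_pose))
    return total / len(output_pose)

def select_reliable(pairs, lifter, max_error):
    """Keep the (2D, 3D) augmented pairs whose error is below max_error."""
    kept = []
    for pose_2d, pose_3d in pairs:
        predicted = lifter(pose_2d)  # 3D pose output data for this 2D input
        if pose_error(predicted, pose_3d) < max_error:
            kept.append((pose_2d, pose_3d))
    return kept

# Hypothetical stand-in for the trained model: lifts every joint to z = 0.
def toy_lifter(pose_2d):
    return [(x, y, 0.0) for x, y in pose_2d]

pairs = [
    ([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]),  # near the prediction
    ([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]),  # far from the prediction
]
reliable = select_reliable(pairs, toy_lifter, max_error=0.5)
```

With these toy values only the first pair survives the threshold, which mirrors the exclusion of low-reliability augmented data from the second learning data set 12.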

Meanwhile, the processor 120 may obtain the 3D pose data 220 to be included in the second learning data set 12 based on both the similarity and the reliability. Specifically, the processor 120 may calculate a score corresponding to the similarity and the reliability, and may select 3D pose augmented data based on the calculated score.

In particular, the processor 120 may calculate the score corresponding to each 3D pose augmented data 320 based on the distribution probability value corresponding to the similarity and the error corresponding to the reliability. For example, the processor 120 may calculate the score corresponding to each 3D pose augmented data 320 by summing the distribution probability value and the error value. However, the disclosure is not limited thereto, and the processor 120 may calculate the score corresponding to each 3D pose augmented data 320 using, for example, a linear combination or an inner product of vectors corresponding to the probability value and the error, respectively.

Meanwhile, when the score corresponding to each 3D pose augmented data 320 is calculated by summing the distribution probability value and the error value, the processor 120 may sort the plurality of 3D pose augmented data 320 in ascending order of score and select, from the sorted plurality of 3D pose augmented data 320, the 3D pose augmented data falling within a preset range. For example, the processor 120 may sort the plurality of 3D pose augmented data 320 from the lowest score upward and then select only the 3D pose augmented data 320 in the top 10% of the sorted list.
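The score-based selection can be illustrated with a short sketch. It assumes the simplest variant mentioned in the text, where the score is the sum of the distribution probability value and the error value, and it reads "top 10%" as the first 10% of the list once sorted from the lowest score upward. The function name `select_by_score` and the sample values are hypothetical.

```python
def select_by_score(probabilities, errors, fraction=0.10):
    """Return indices of the lowest-scoring items, where score = probability + error."""
    if len(probabilities) != len(errors):
        raise ValueError("probabilities and errors must have the same length")
    scores = [p + e for p, e in zip(probabilities, errors)]
    # Sort indices in ascending order of score (lowest score first).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Keep the leading fraction of the sorted list, at least one item if any exist.
    keep = max(1, int(len(order) * fraction)) if order else 0
    return order[:keep]

# Illustrative values for ten augmented 3D poses.
probs = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.6, 0.4, 0.95]
errs = [0.2, 0.05, 0.4, 0.3, 0.1, 0.5, 0.2, 0.3, 0.6, 0.1]
selected = select_by_score(probs, errs)
```

A weighted linear combination could replace the plain sum by changing the single line that builds `scores`.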

Meanwhile, the second learning data set 12 according to an embodiment of the present disclosure may be the first learning data set 11 augmented to 1.29 times its scale.

Specifically, the number of pairs of 2D pose augmented data 310 and 3D pose augmented data 320 included in the second learning data set 12 may be 1.29 times the number of pairs of 2D pose augmented data 310 and 3D pose augmented data 320 included in the first learning data set 11. For example, if the first learning data set 11 includes 1000 pairs of 2D pose augmented data 310 and 3D pose augmented data 320, the second learning data set 12 may include 1290 pairs of 2D pose augmented data 310 and 3D pose augmented data 320.

To this end, the processor 120 may select 2D pose augmented data and 3D pose augmented data from the augmented data set 21 such that the second learning data set 12 is 1.29 times the scale of the first learning data set 11.

In particular, the processor 120 may augment again the 3D pose augmented data 320 included in the augmented data set 21, which was obtained by augmenting the first learning data set 11, to obtain another augmented data set. That is, if the augmented data set 21 obtained by augmenting the first learning data set 11 is referred to as a primary augmented data set, the processor 120 may exchange, among the 3D pose augmented data 320 included in the primary augmented data set, the 3D coordinate information for at least one joint out of the 3D coordinate information for the plurality of joints of the object, and thereby obtain further 3D pose augmented data. The processor 120 may then project the 3D coordinate information for the plurality of joints included in the obtained 3D pose augmented data onto a preset 2D coordinate space to obtain the 2D pose augmented data corresponding to each 3D pose augmented data. In this way, the processor 120 may obtain a secondary augmented data set that includes the plurality of further 3D pose augmented data obtained from the 3D pose augmented data in the primary augmented data set, together with the plurality of 2D pose augmented data corresponding to each of them.
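The joint-exchange augmentation and the subsequent projection can be sketched as below. This is only an illustration under stated assumptions: the disclosure does not specify the projection, so an orthographic projection that drops the z coordinate is assumed here, and the names `swap_joint` and `project_to_2d` are hypothetical.

```python
def swap_joint(pose_a, pose_b, joint_index):
    """Exchange the 3D coordinates of one joint between two poses.

    Returns two new poses and leaves the inputs unmodified.
    """
    new_a = list(pose_a)
    new_b = list(pose_b)
    new_a[joint_index] = pose_b[joint_index]
    new_b[joint_index] = pose_a[joint_index]
    return new_a, new_b

def project_to_2d(pose_3d):
    """Assumed orthographic projection: keep (x, y) and drop z for every joint."""
    return [(x, y) for x, y, _z in pose_3d]

# Two toy poses with two joints each.
pose_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
pose_b = [(5.0, 5.0, 5.0), (6.0, 6.0, 6.0)]

aug_a, aug_b = swap_joint(pose_a, pose_b, joint_index=1)
# Each augmented 3D pose is paired with its own 2D projection.
augmented_pairs = [(project_to_2d(p), p) for p in (aug_a, aug_b)]
```

A perspective projection with camera parameters could be substituted for `project_to_2d` without changing the pairing logic.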

The processor 120 may repeatedly obtain the augmented data set 21 until the second learning data set reaches 1.29 times the scale of the first learning data set. That is, if the size of the second learning data, which includes the 3D pose augmented data 320 selected from the primary augmented data set based on at least one of the reliability and the similarity, does not reach 1.29 times the size of the first learning data set 11, the processor 120 may augment the primary augmented data set to generate a secondary augmented data set. The processor 120 may then select, from the generated secondary augmented data set and based on at least one of the reliability and the similarity, the 3D pose augmented data to be included in the second learning data.
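The repeated augment-and-select loop might look like the following sketch. It is an interpretation rather than the disclosed procedure: it assumes the 1.29 target counts only the selected pairs, caps the number of rounds with a hypothetical `max_rounds` parameter, and takes the augmentation and selection steps as caller-supplied functions (`augment`, `select`), which are placeholders.

```python
def build_second_set(first_set, augment, select, scale=1.29, max_rounds=10):
    """Augment and select repeatedly until the selected pairs reach scale * len(first_set)."""
    target = round(len(first_set) * scale)
    selected = []
    source = list(first_set)
    for _ in range(max_rounds):
        augmented = augment(source)          # primary, secondary, ... augmented data set
        selected.extend(select(augmented))   # filter by similarity and/or reliability
        if len(selected) >= target:
            return selected[:target]
        source = augmented                   # the next round augments the previous result
    return selected                          # target not reached within max_rounds

# Toy stand-ins: items are integers instead of pose pairs.
first_set = list(range(10))

def toy_augment(items):
    return [x + 100 for x in items] + [x + 200 for x in items]

def toy_select(items):
    return [x for x in items if x % 2 == 0]

second_set = build_second_set(first_set, toy_augment, toy_select)
```

In this toy run the first round yields 10 selected items, which is below the target of 13, so a second round augments the first augmented set before the loop stops.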

FIG. 7 is an exemplary diagram illustrating a method of estimating the pose of an object included in an image according to an embodiment of the present disclosure.

Meanwhile, after retraining the first neural network model 41 based on the second learning data set 12, the processor 120 may obtain an image 30 including an object, and may input the obtained image 30 into a second neural network model 42, which is trained to estimate 2D pose data 210 of an object in an image, to obtain the 2D pose data 210 corresponding to the image 30.

Here, the second neural network model 42 may be a neural network model trained to recognize an object in a two-dimensional image 30 and to estimate and output 2D pose data 210 for the recognized object. To this end, the processor 120 may train the second neural network model 42 based on a third learning data set 13 that includes, in pairs, a plurality of images 30 each including an object and the 2D pose data 210 corresponding to the object included in each of the plurality of images 30. The 2D pose data 210 may include 2D coordinate information for a plurality of joints constituting the object included in each image.

Accordingly, referring to FIG. 7, when the processor 120 inputs an image including an object into the pre-trained second neural network model 42, the second neural network model 42 may recognize the object included in the image, identify a plurality of joints of the recognized object, and estimate and output the 2D coordinate information of each of the identified joints.

Various networks may be used as the second neural network model 42 in the present disclosure, such as a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), or a deep Q-network (DQN).

Then, the processor 120 may input the obtained 2D pose data 210 into the retrained first neural network model 41 to obtain 3D pose data 220 corresponding to the 2D pose data 210, and may identify the pose of the object in the image based on the obtained 3D pose data 220. Specifically, the processor 120 may input the 2D pose data for the object in the image, obtained as the output value of the second neural network model 42, into the first neural network model 41 to obtain the 3D pose data 220. The processor 120 may then identify the pose of the object based on the 3D coordinate information for the plurality of joints included in the obtained 3D pose data 220. For example, referring to FIG. 7, the processor 120 may identify, based on the 3D coordinate information, that the object in the image 30 is in a running pose.
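The two-stage inference flow can be summarized in a small sketch that shows only how data passes between the stages. The three callables are hypothetical placeholders: `estimate_2d` plays the role of the second neural network model 42, `lift_to_3d` the role of the retrained first neural network model 41, and `classify` is an assumed rule for naming the pose, which the disclosure does not detail.

```python
def identify_pose(image, estimate_2d, lift_to_3d, classify):
    """Run the pipeline: image -> 2D joints -> 3D joints -> pose label."""
    pose_2d = estimate_2d(image)    # 2D coordinates of each joint
    pose_3d = lift_to_3d(pose_2d)   # 3D coordinates of each joint
    return classify(pose_3d), pose_3d

# Toy stand-ins so the sketch runs without any trained model.
def fake_estimate_2d(image):
    return [(0.0, 0.0), (1.0, 2.0)]

def fake_lift_to_3d(pose_2d):
    return [(x, y, 0.5) for x, y in pose_2d]

def fake_classify(pose_3d):
    # Hypothetical rule: label the pose by the vertical spread of its joints.
    ys = [y for _x, y, _z in pose_3d]
    return "running" if max(ys) - min(ys) > 1.0 else "standing"

label, pose_3d = identify_pose(
    "image-placeholder", fake_estimate_2d, fake_lift_to_3d, fake_classify
)
```

Keeping the stages as separate callables reflects that the two models are trained on different data sets and can be retrained independently.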

FIG. 8 is a detailed configuration diagram of the electronic device 100 according to an embodiment of the present disclosure.

Referring to FIG. 8, the electronic device 100 includes a memory 110, a communication interface 130, a camera 140, an input interface 150, and one or more processors 120. Among the components of the electronic device 100 shown in FIG. 8, detailed descriptions of those that overlap with the components of the electronic device 100 shown in FIG. 2 are omitted.

The communication interface 130 may communicate with an external device and an external server through various communication methods. A communication connection between the communication interface 130 and an external device or external server may include communication via a third device (e.g., a repeater, a hub, an access point, or a gateway). For example, the external device may be implemented as another electronic device, a server, cloud storage, a network, or the like. In particular, the processor 120 may receive the first learning data set 11 and the third learning data set, through the communication interface 130, from an external server that interoperates with the electronic device 100.

The communication interface 130 may include various communication modules for communicating with an external device. As an example, the communication interface 130 may include a wireless communication module, for example a cellular communication module using at least one of 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), LTE-A (LTE-Advanced), CDMA (code division multiple access), WCDMA (wideband CDMA), UMTS (universal mobile telecommunications system), WiBro (Wireless Broadband), or GSM (Global System for Mobile Communications). As another example, the wireless communication module may include at least one of WiFi (wireless fidelity), Bluetooth, Bluetooth Low Energy (BLE), or Zigbee.

The camera 140 may photograph an object around the electronic device 100 to obtain an image of the object. The image of the object obtained in this way may be included in the third learning data set and used to train the second neural network model 42.

The camera 140 may be implemented with an imaging element such as a CMOS image sensor (CIS) or a charge-coupled device (CCD). However, the disclosure is not limited thereto, and the camera 140 may be implemented as a camera module of any of various resolutions capable of photographing an object. Meanwhile, the camera 140 may be implemented as a depth camera (e.g., an IR depth camera), a stereo camera, an RGB camera, or the like.

The input interface 150 includes circuitry, and the processor 120 may receive, through the input interface 150, a user command for controlling the operation of the electronic device 100. Specifically, the input interface 150 may include a display configured as a touch screen, but this is merely one embodiment, and the input interface 150 may include components such as a button, a microphone, and a remote control signal receiver (not shown).

FIG. 9 is a flowchart schematically illustrating a method of controlling the electronic device 100 according to an embodiment of the present disclosure.

To perform the method of controlling the electronic device 100 according to an embodiment of the present disclosure, the processor 120 first trains the first neural network model 41 to estimate a 3D pose based on the first learning data set 11, which includes, in pairs, a plurality of 2D pose data 210 and a plurality of 3D pose data 220 respectively corresponding to the plurality of 2D pose data 210 (S910).

Here, the 2D pose data 210 may include 2D coordinate information for each of a plurality of joints constituting an object, and the 3D pose data 220 may include 3D coordinate information for each of the plurality of joints constituting the object.

Then, the processor 120 augments the first learning data set 11 to obtain the augmented data set 21 (S920). Specifically, the processor 120 may augment the first learning data set 11 by exchanging, between the plurality of 3D pose data 220, the 3D coordinate information for at least one identical joint.

The processor 120 selects at least one 3D pose augmented data from among the plurality of 3D pose augmented data 320 included in the augmented data set 21, based on at least one of the similarity and the reliability of the 3D pose augmented data 320 included in the obtained augmented data set 21 (S930).

As an example, the processor 120 may obtain a distribution function of the plurality of 3D pose data 220 included in the first learning data set 11 and, based on the obtained distribution function, obtain a distribution probability value, with respect to the first learning data set 11, for each of the plurality of 3D pose augmented data 320 included in the augmented data set 21. Here, the higher the distribution probability value, the higher the similarity may be identified to be.

The processor 120 may select, from among the plurality of 3D pose augmented data 320, only the 3D pose augmented data 320 whose distribution probability value is less than a preset value.
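The distribution-based similarity step can be illustrated with a rough sketch. The disclosure does not name the distribution function, so this example assumes an independent Gaussian per coordinate fitted to the original 3D poses, and it uses the log-density as a stand-in for the distribution probability value. All names (`fit_diagonal_gaussian`, `log_density`, `select_below`, `threshold`) are hypothetical, and each pose is represented as a flat list of coordinates.

```python
import math
import statistics

def fit_diagonal_gaussian(poses):
    """Fit a per-coordinate mean and standard deviation to flattened 3D poses."""
    columns = list(zip(*poses))
    means = [statistics.fmean(c) for c in columns]
    # A small floor keeps the density defined when a coordinate never varies.
    stdevs = [max(statistics.pstdev(c), 1e-6) for c in columns]
    return means, stdevs

def log_density(pose, means, stdevs):
    """Log of the assumed Gaussian density, summed over coordinates."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for x, m, s in zip(pose, means, stdevs)
    )

def select_below(augmented, means, stdevs, threshold):
    """Keep the augmented poses whose log-density is below the preset threshold."""
    return [p for p in augmented if log_density(p, means, stdevs) < threshold]

# Toy data: one-joint poses flattened to [x, y, z].
original = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
means, stdevs = fit_diagonal_gaussian(original)

augmented = [[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]]
chosen = select_below(augmented, means, stdevs, threshold=-5.0)
```

Here the pose at the centre of the fitted distribution has the higher density, so only the outlying pose falls below the threshold and is kept, consistent with selecting data whose distribution probability value is less than the preset value.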

As an example, the processor 120 may input each of the plurality of 2D pose augmented data 310 included in the augmented data set 21 into the first neural network model 41 to obtain a plurality of 3D pose output data 420, each corresponding to one of the 2D pose augmented data 310, and may identify, based on the obtained plurality of 3D pose output data 420, the reliability of the 3D pose augmented data 320 corresponding to each of the plurality of 2D pose augmented data 310.

In particular, the processor 120 may identify the error between the 3D pose output data 420 and the 3D pose augmented data 320 that correspond to the same 2D pose augmented data 310, and may identify that the smaller the identified error, the higher the reliability of each 3D pose augmented data 320.

Meanwhile, the processor 120 may select, from among the plurality of 3D pose augmented data 320, only the 3D pose augmented data 320 whose error is less than a preset second value.

After selecting the 3D pose augmented data, the processor 120 obtains the second learning data set 12, which includes, in pairs, the selected 3D pose augmented data 320 and the 2D pose augmented data 310 that corresponds, in the augmented data set 21, to the selected 3D pose augmented data 320 (S940). Here, the second learning data set 12 may be the first learning data set 11 augmented to 1.29 times its scale.

Then, the processor 120 retrains the first neural network model 41 based on the obtained second learning data set 12 (S950).

Thereafter, the processor 120 may obtain an image including an object and input the image into the second neural network model 42, which is trained to estimate 2D pose data 210 of an object in an image, to obtain the 2D pose data 210 corresponding to the image.

Then, the processor 120 may input the obtained 2D pose data 210 into the retrained first neural network model 41 to obtain the 3D pose data 220 corresponding to the 2D pose data 210, and may identify the pose of the object in the image based on the obtained 3D pose data 220.

In the above description, steps S910 to S950 may be further divided into additional steps or combined into fewer steps, depending on the embodiment of the present disclosure. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

Meanwhile, the methods according to the various embodiments of the present disclosure described above may be implemented in the form of an application installable on an existing electronic device. Alternatively, the methods may be performed using a deep-learning-based trained neural network (or deep-trained neural network), that is, a learning network model. The methods may also be implemented solely through a software upgrade or a hardware upgrade of an existing electronic device. In addition, the various embodiments of the present disclosure described above may be performed through an embedded server provided in the electronic device or through a server external to the electronic device.

Meanwhile, according to an embodiment of the present disclosure, the various embodiments described above may be implemented as software including instructions stored in a machine-readable (e.g., computer-readable) storage medium. The machine is a device capable of calling the stored instructions from the storage medium and operating according to the called instructions, and may include the electronic device according to the disclosed embodiments. When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly or by using other components under the control of the processor. The instruction may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium does not include a signal and is tangible; it does not distinguish between data being stored in the storage medium semi-permanently and data being stored temporarily.

In addition, according to an embodiment, the methods according to the various embodiments described above may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)) or online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored, or temporarily created, in a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

In addition, each of the components (e.g., a module or a program) according to the various embodiments described above may consist of a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs, identically or similarly, the functions performed by each of the corresponding components before integration. Operations performed by a module, a program, or another component according to the various embodiments may be executed sequentially, in parallel, iteratively, or heuristically; at least some of the operations may be executed in a different order or omitted; or other operations may be added.

Although preferred embodiments of the present disclosure have been shown and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may of course be made by a person having ordinary skill in the technical field to which the disclosure pertains, without departing from the gist of the disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present disclosure.

100: electronic device
110: memory
120: one or more processors

Claims

An electronic device comprising:
a memory storing a first learning data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data; and
one or more processors configured to train a first neural network model to estimate a 3D pose based on the first learning data set,
wherein the one or more processors are configured to: augment the first learning data set to obtain an augmented data set; select, based on at least one of a similarity and a reliability of 3D pose augmented data included in the obtained augmented data set, at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set; obtain a second learning data set that includes, as a pair, the selected 3D pose augmented data and 2D pose augmented data that is included in the augmented data set and corresponds to the selected 3D pose augmented data; and retrain the first neural network model based on the obtained second learning data set.

The electronic device of claim 1,
wherein the one or more processors are configured to:
obtain a distribution function of the plurality of 3D pose data included in the first learning data set; obtain, based on the obtained distribution function, a distribution probability value, with respect to the first learning data set, of each of the plurality of 3D pose augmented data included in the augmented data set; and identify the similarity as being higher as the distribution probability value is higher.

The electronic device of claim 2,
wherein the one or more processors are configured to
select, from among the plurality of 3D pose augmented data, 3D pose augmented data whose distribution probability value is less than a preset first value.
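
The similarity scoring and selection described in the two claims above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not specify the form of the distribution function, so an independent Gaussian per joint coordinate is assumed here, and the log of the density is used in place of the raw probability value for numerical stability. The names `fit_distribution`, `log_density`, `select_by_similarity`, and the threshold `first_value` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def flatten(pose):
    """Turn a pose given as a list of (x, y, z) joints into one flat list."""
    return [c for joint in pose for c in joint]


def fit_distribution(poses, min_std=1e-6):
    """Fit an independent Gaussian per coordinate to the original 3D poses.

    Assumption: this stands in for the 'distribution function' of the
    first learning data set; the source does not name a specific model.
    """
    columns = list(zip(*(flatten(p) for p in poses)))
    means = [fmean(col) for col in columns]
    stds = [max(pstdev(col), min_std) for col in columns]
    return means, stds


def log_density(pose, dist):
    """Log of the distribution probability value of one augmented pose.

    A higher value means the pose is more similar to the original poses.
    """
    means, stds = dist
    total = 0.0
    for x, mu, sd in zip(flatten(pose), means, stds):
        total += -0.5 * ((x - mu) / sd) ** 2
        total -= math.log(sd * math.sqrt(2.0 * math.pi))
    return total


def select_by_similarity(augmented_poses, dist, first_value):
    """Keep augmented poses whose log-density is below the preset value,
    i.e. poses that differ from what the original set already covers."""
    return [p for p in augmented_poses if log_density(p, dist) < first_value]
```

Selecting values below the threshold follows the wording of the claim: a low distribution probability value corresponds to low similarity to the first learning data set.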

The electronic device of claim 1,
wherein the one or more processors are configured to:
input each of a plurality of 2D pose augmented data included in the augmented data set into the first neural network model to obtain a plurality of 3D pose output data respectively corresponding to the plurality of 2D pose augmented data; and identify, based on the obtained plurality of 3D pose output data, a reliability of the 3D pose augmented data corresponding to each of the plurality of 2D pose augmented data.

The electronic device of claim 4,
wherein the one or more processors are configured to:
identify an error between the 3D pose output data and the 3D pose augmented data that correspond to the same 2D pose augmented data; identify the reliability of each 3D pose augmented data as being higher as the identified error is smaller; and select, from among the plurality of 3D pose augmented data, 3D pose augmented data whose error is less than a preset second value.
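
The reliability check in the two claims above can be sketched as follows. The sketch assumes the error is the mean Euclidean distance between corresponding joints, which the claims do not specify, and represents the first neural network model as any callable that maps a 2D pose to a 3D pose. The names `mean_joint_error`, `select_by_reliability`, and `second_value` are hypothetical.

```python
import math


def mean_joint_error(pose_a, pose_b):
    """Mean Euclidean distance between corresponding 3D joints.

    Assumption: the source only says 'error'; this metric is illustrative.
    """
    distances = [math.dist(a, b) for a, b in zip(pose_a, pose_b)]
    return sum(distances) / len(distances)


def select_by_reliability(augmented_pairs, model, second_value):
    """Keep augmented (2D pose, 3D pose) pairs that the model agrees with.

    augmented_pairs: list of (pose_2d, pose_3d) tuples.
    model: callable mapping a 2D pose to a predicted 3D pose, standing in
           for the first neural network model.
    second_value: preset error threshold; a smaller error means a higher
                  reliability of the augmented 3D pose.
    """
    selected = []
    for pose_2d, pose_3d in augmented_pairs:
        error = mean_joint_error(model(pose_2d), pose_3d)
        if error < second_value:
            selected.append((pose_2d, pose_3d))
    return selected
```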

The electronic device of claim 1,
wherein the 2D pose data includes 2D coordinate information for each of a plurality of joints constituting an object, and the 3D pose data includes 3D coordinate information for each of the plurality of joints constituting the object, and
wherein the one or more processors are configured to augment the first learning data set by exchanging, between the plurality of 3D pose data, the 3D coordinate information of at least one same joint.
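
The joint-exchange augmentation in the claim above can be sketched as follows. The sketch covers only the exchange of 3D coordinates between two poses; how the corresponding 2D pose augmented data is derived is not stated in the claim and is left out. Random choice of pose pairs and joints is an assumption, and `swap_joints` and `augment_by_swapping` are hypothetical names.

```python
import random


def swap_joints(pose_a, pose_b, joint_indices):
    """Return two new poses with the 3D coordinates of the given joints
    exchanged between pose_a and pose_b. The inputs are not modified."""
    new_a, new_b = list(pose_a), list(pose_b)
    for j in joint_indices:
        new_a[j], new_b[j] = pose_b[j], pose_a[j]
    return new_a, new_b


def augment_by_swapping(poses, num_pairs, joints_per_swap=1, seed=0):
    """Build augmented 3D poses by repeatedly picking two poses and
    exchanging the same joint(s) between them.

    Assumption: pairs and joints are drawn at random; the source does not
    say how they are chosen.
    """
    rng = random.Random(seed)
    num_joints = len(poses[0])
    augmented = []
    for _ in range(num_pairs):
        pose_a, pose_b = rng.sample(poses, 2)
        joints = rng.sample(range(num_joints), joints_per_swap)
        augmented.extend(swap_joints(pose_a, pose_b, joints))
    return augmented
```

Because an exchanged joint can produce an implausible skeleton, augmented poses produced this way are natural candidates for the similarity and reliability filtering recited in the other claims.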

The electronic device of claim 1,
wherein the one or more processors are configured to:
obtain an image including an object; input the image into a second neural network model trained to estimate 2D pose data of an object in an image, to obtain 2D pose data corresponding to the image; input the obtained 2D pose data into the retrained first neural network model to obtain 3D pose data corresponding to the 2D pose data; and identify a pose of the object in the image based on the obtained 3D pose data.
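
The two-stage inference in the claim above amounts to chaining the two models. The following is a minimal sketch in which both models are plain callables; `identify_pose`, `model_2d`, and `model_3d` are hypothetical names, and nothing here reflects the actual network architectures.

```python
def identify_pose(image, model_2d, model_3d):
    """Estimate the 3D pose of the object in an image in two stages.

    model_2d: callable mapping an image to 2D joint coordinates
              (stand-in for the second neural network model).
    model_3d: callable mapping 2D joint coordinates to 3D joint
              coordinates (stand-in for the retrained first model).
    """
    pose_2d = model_2d(image)    # stage 1: image -> 2D pose data
    pose_3d = model_3d(pose_2d)  # stage 2: 2D pose data -> 3D pose data
    return pose_3d
```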

The electronic device of claim 1,
wherein the second learning data set is the first learning data set augmented to a scale of 1.29 times.

A method of controlling an electronic device, the method comprising:
training a first neural network model to estimate a 3D pose based on a first learning data set that includes, as pairs, a plurality of 2D pose data and a plurality of 3D pose data respectively corresponding to the plurality of 2D pose data;
obtaining an augmented data set by augmenting the first learning data set;
selecting, based on at least one of a similarity and a reliability of 3D pose augmented data included in the obtained augmented data set, at least one 3D pose augmented data from among a plurality of 3D pose augmented data included in the augmented data set;
obtaining a second learning data set that includes, as a pair, the selected 3D pose augmented data and 2D pose augmented data that is included in the augmented data set and corresponds to the selected 3D pose augmented data; and
retraining the first neural network model based on the obtained second learning data set.
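
The order of the steps in the method claim above can be sketched as a single function. Every step is passed in as a callable, so the sketch fixes only the data flow and makes no claim about how training, augmentation, or selection are implemented. The names `run_augmentation_round`, `train`, `augment`, and `keep` are hypothetical, and retraining on the selected pairs alone is an assumption: the claim does not say whether the original pairs are merged into the second learning data set.

```python
def run_augmentation_round(first_set, train, augment, keep):
    """One round of train -> augment -> select -> retrain.

    first_set: list of (pose_2d, pose_3d) pairs.
    train(pairs, model): returns a model trained on pairs; model is None
        for the initial training and the current model for retraining.
    augment(pairs): returns a list of augmented (pose_2d, pose_3d) pairs.
    keep(model, pose_2d, pose_3d): returns True if the augmented pair
        passes the similarity and/or reliability check.
    """
    # Train the first model on the original pairs.
    model = train(first_set, None)

    # Augment the first learning data set.
    augmented_set = augment(first_set)

    # Select augmented pairs to form the second learning data set.
    second_set = [pair for pair in augmented_set if keep(model, *pair)]

    # Retrain the same model on the second learning data set.
    model = train(second_set, model)
    return model, second_set
```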

The control method of claim 9, further comprising:
obtaining a distribution function of the plurality of 3D pose data included in the first learning data set; and obtaining, based on the obtained distribution function, a distribution probability value, with respect to the first learning data set, of each of the plurality of 3D pose augmented data included in the augmented data set,
wherein the similarity is identified as being higher as the distribution probability value is higher.

The control method of claim 10,
wherein the selecting comprises
selecting, from among the plurality of 3D pose augmented data, 3D pose augmented data whose distribution probability value is less than a preset value.

The control method of claim 9, further comprising:
inputting each of a plurality of 2D pose augmented data included in the augmented data set into the first neural network model to obtain a plurality of 3D pose output data respectively corresponding to the plurality of 2D pose augmented data; and
identifying, based on the obtained plurality of 3D pose output data, a reliability of the 3D pose augmented data corresponding to each of the plurality of 2D pose augmented data.

The control method of claim 12,
wherein the identifying of the reliability comprises
identifying an error between the 3D pose output data and the 3D pose augmented data that correspond to the same 2D pose augmented data, and identifying the reliability of each 3D pose augmented data as being higher as the identified error is smaller, and
wherein the selecting comprises
selecting, from among the plurality of 3D pose augmented data, 3D pose augmented data whose error is less than a preset second value.

The control method of claim 9,
wherein the 2D pose data includes 2D coordinate information for each of a plurality of joints constituting an object, and the 3D pose data includes 3D coordinate information for each of the plurality of joints constituting the object, and
wherein the obtaining of the augmented data set comprises
augmenting the first learning data set by exchanging, between the plurality of 3D pose data, the 3D coordinate information of at least one same joint.

The control method of claim 9, further comprising:
obtaining an image including an object;
obtaining 2D pose data corresponding to the image by inputting the image into a second neural network model trained to estimate 2D pose data of an object in an image;
inputting the obtained 2D pose data into the retrained first neural network model to obtain 3D pose data corresponding to the 2D pose data; and
identifying a pose of the object in the image based on the obtained 3D pose data.

The control method of claim 9,
wherein the second learning data set is the first learning data set augmented to a scale of 1.29 times.