KR102638369B1

KR102638369B1 - Method and apparatus for modifying motion clip

Info

Publication number: KR102638369B1
Application number: KR1020220132563A
Authority: KR
Inventors: 노준용; 김혜민; 조경민
Original assignee: Korea Advanced Institute of Science and Technology (KAIST, 한국과학기술원)
Priority date: 2022-10-14
Filing date: 2022-10-14
Publication date: 2024-02-21

Abstract

A method and apparatus for correcting a motion clip are disclosed. A motion clip correction method according to an embodiment may include: applying an input motion clip, which includes a series of frames corresponding to poses of an object, to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip; applying the feature vectors to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and applying the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip has been corrected.

Description

Method and apparatus for correcting a motion clip {METHOD AND APPARATUS FOR MODIFYING MOTION CLIP}

The following embodiments relate to a method and apparatus for correcting a motion clip.

Techniques for capturing and processing the motion of real-world objects (e.g., people) are widely used to create realistic character animation. Because the motion of an object can be viewed as a high-dimensional, time-varying signal, signal processing and geometric techniques can be generalized and applied to various motion synthesis techniques. Capturing the entire motion of an object for an animation consumes considerable time and resources, so there is a demand for motion synthesis techniques, such as retargeting, warping, blending and editing of input motion, that generate natural motion by combining and/or altering captured motion clips.

The embodiments below may provide a technique for correcting a motion clip whose transitions are unnatural so that it contains natural motion.

The embodiments below may secure the data needed to train a model for correcting motion clips by generating training data from ground-truth data.

However, the technical problems are not limited to those described above, and other technical problems may exist.

A motion clip correction method according to one aspect includes: applying an input motion clip, which includes a series of frames corresponding to poses of an object, to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip; applying the feature vectors to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and applying the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip has been corrected.

A model for correcting a motion clip, which includes the encoder, the BGRU and the decoder, includes a neural network trained to output a ground-truth motion clip, based on a predefined loss function, from training data generated to include motion modified from the ground-truth motion clip.

The training data may include a neighbor motion clip, obtained by a nearest neighbor search in a space corresponding to a set of motion clips using some frames of the ground-truth motion clip as a query, and the remaining part of the ground-truth motion clip.

The neighbor motion clip may include at least one element extracted from the set of motion clips based on a distance between the query and each element of the set, the distance being calculated by assigning different weights to the pose in the frame of a first index and the pose in the frame of a second index.

The loss function may include at least one of a first loss function, determined based on a difference between the poses corresponding to the frames of the ground-truth motion clip and the poses corresponding to the frames of the output motion clip, and a second loss function, determined based on a difference between the pose corresponding to the first frame of the training data and the pose corresponding to the first frame of the output motion clip.

Obtaining the output motion clip may include: applying a vector generated by concatenating the first hidden vector and the second hidden vector to the decoder to obtain a corrected pose of the object for each of the frames included in the input motion clip; and obtaining the output motion clip, which includes the corrected poses corresponding to the frames included in the input motion clip.

The motion clip correction method may further include extracting the input motion clip, which includes a predetermined number of frames, from a motion clip to be corrected.

The motion clip correction method may further include: replacing the input motion clip in the motion clip to be corrected with the output motion clip; and extracting a new input motion clip, which includes the predetermined number of frames, from the changed motion clip to be corrected.

A method of training a neural network included in a model for correcting a motion clip according to one aspect includes: obtaining a neighbor motion clip corresponding to a query, based on a nearest neighbor search in a space corresponding to a set of motion clips, the query being some frames of a ground-truth motion clip; generating training data that includes the neighbor motion clip and the remaining frames of the ground-truth motion clip that were not used as the query; and training the neural network for correcting a motion clip to output the ground-truth motion clip from the training data, based on a first loss function defined with respect to the ground-truth motion clip.

The first loss function may include a loss function determined based on a difference between the poses corresponding to the frames of the ground-truth motion clip and the poses corresponding to the frames of the output data of the neural network.

Training the neural network may include training the neural network to output a motion clip that includes the first pose of the training data, further based on a second loss function defined with respect to the first frame of the training data and the first frame of the output data of the neural network corresponding to the training data.

The second loss function may include a loss function determined based on a difference between the pose corresponding to the first frame of the training data and the pose corresponding to the first frame of the output data.
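The two loss terms described above can be illustrated with a short sketch. The mean squared difference, the weighting factor `first_frame_weight`, and the representation of a pose as a flat list of floats are assumptions made for illustration; the source states only that each loss is determined from a difference between poses.

```python
from typing import List

Pose = List[float]   # one frame, flattened (hypothetical layout)
Clip = List[Pose]    # a sequence of frames


def pose_sq_error(a: Pose, b: Pose) -> float:
    """Mean squared difference between two poses of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_loss(ground_truth: Clip, output: Clip) -> float:
    """First loss: pose difference averaged over every frame of the clip."""
    return sum(pose_sq_error(g, o) for g, o in zip(ground_truth, output)) / len(ground_truth)


def first_frame_loss(training_input: Clip, output: Clip) -> float:
    """Second loss: pose difference on the first frame only."""
    return pose_sq_error(training_input[0], output[0])


def total_loss(ground_truth: Clip, training_input: Clip, output: Clip,
               first_frame_weight: float = 1.0) -> float:
    """Weighted sum of both terms (the weighting is an assumption)."""
    return (reconstruction_loss(ground_truth, output)
            + first_frame_weight * first_frame_loss(training_input, output))


ground_truth = [[0.0, 0.0], [1.0, 1.0]]
training_input = [[0.0, 0.0], [5.0, 5.0]]   # second frame was swapped for a neighbor
output = [[1.0, 1.0], [1.0, 1.0]]           # a made-up network output
loss = total_loss(ground_truth, training_input, output)
```

The second term penalizes drift of the first output pose away from the first input pose, which matches the stated goal of outputting a clip that retains the first pose of the training data.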

Obtaining the neighbor motion clip may include: calculating a distance between the query and each element of the set of motion clips by assigning different weights to the pose in the frame of a first index and the pose in the frame of a second index; and determining, based on the calculated distances, at least one element of the set of motion clips as the neighbor motion clip.

Calculating the distance to the query may further include, when the first index precedes the second index, assigning a higher weight to the frame of the second index than to the frame of the first index.

An apparatus according to one aspect includes at least one processor that: applies an input motion clip, which includes a series of frames corresponding to poses of an object, to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip; applies the feature vectors to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and applies the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip has been corrected.

The training data may include a neighbor motion clip, obtained by a nearest neighbor search in a space corresponding to a set of motion clips using some frames of the ground-truth motion clip as a query, and the remaining part of the ground-truth motion clip.

In obtaining the output motion clip, the processor may apply a vector generated by concatenating the first hidden vector and the second hidden vector to the decoder to obtain a corrected pose of the object for each of the frames included in the input motion clip, and may obtain the output motion clip, which includes the corrected poses corresponding to the frames included in the input motion clip.

The input motion clip may include a predetermined number of frames extracted from a motion clip to be corrected.

The processor may replace the input motion clip in the motion clip to be corrected with the output motion clip, and may extract a new input motion clip, which includes the predetermined number of frames, from the changed motion clip to be corrected.

FIG. 1 is a diagram illustrating an overview of a model for correcting a motion clip according to an embodiment.
FIG. 2 is a diagram illustrating the structure of a model for correcting a motion clip according to an embodiment.
FIG. 3 is a diagram for describing a method of sequentially correcting a motion clip to be corrected, based on a model for correcting a motion clip according to an embodiment.
FIG. 4 is a flowchart of a method of training a model for correcting a motion clip according to an embodiment.
FIG. 5 is a diagram for describing a method of generating training data according to an embodiment.
FIG. 6 is a diagram for describing a loss function for training a model for correcting a motion clip according to an embodiment.
FIG. 7 is a diagram illustrating an example configuration of an apparatus according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes changes, equivalents and substitutes that fall within the technical idea described by the embodiments.

Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When a component is described as being "connected" to another component, it should be understood that it may be directly connected or coupled to the other component, but that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless expressly defined in this specification, should not be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

FIG. 1 is a diagram illustrating an overview of a model for correcting a motion clip according to an embodiment.

Referring to FIG. 1, a model 110 for correcting a motion clip according to an embodiment may be a model that outputs a corrected motion clip corresponding to an input motion clip. Hereinafter, the "model for correcting a motion clip" may be referred to simply as the "model".

A motion clip may include a series of frames corresponding to poses of an object. In other words, each frame of the motion clip may contain a pose of the object. An object is a body whose shape or position changes and may include, for example, a person, an animal, a robot, or a character. The object may include at least one joint. A joint of the object is a point within the space occupied by the object and may serve as a reference that determines a unit of movement.

The pose of an object is information indicating the posture of the object and/or a change in that posture, and may include, for example, information about the positions of the joints constituting the object and/or changes in those positions.

As an example, where t is the index of a frame, the pose x_t of the object corresponding to the t-th frame may be defined, as in Equation 1, in terms of the components r_t, p_t, q_t, v_t and c_t described below.

In Equation 1, r_t, included in the pose x_t of the object, may be a vector indicating the displacement of the root of the object relative to the previous frame (e.g., the (t-1)-th frame). The root of the object is a reference point for specifying the pose of the object and may be, for example, the point obtained by projecting the hip joint of the object onto the floor plane (e.g., the xy plane of an xyz coordinate system). p_t may be a vector indicating the coordinates of each joint of the object, with the position of the root as the origin. q_t may be a vector indicating the angle of each joint of the object relative to its parent joint. v_t may be a vector indicating the velocity of each joint of the object relative to the previous frame. c_t may be a vector indicating whether certain joints of the object are in contact with the floor. For example, c_t may indicate whether each of the right toe joint, right heel joint, left toe joint and left heel joint of the object is in contact with the floor.
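As an illustration only, the pose components described above can be gathered in a small container. The class name, the field names, the use of flat Python lists, the per-joint dimensions and the helper `as_vector` are assumptions made for this sketch; the source specifies the meaning of r_t, p_t, q_t, v_t and c_t but not their dimensions or encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    """Pose x_t of one frame (field layout is hypothetical)."""
    root_displacement: List[float]  # r_t: root displacement relative to frame t-1
    joint_positions: List[float]    # p_t: joint coordinates with the root as origin
    joint_rotations: List[float]    # q_t: joint angles relative to each parent joint
    joint_velocities: List[float]   # v_t: joint velocities relative to frame t-1
    foot_contacts: List[float]      # c_t: 1.0 if a foot joint touches the floor, else 0.0

    def as_vector(self) -> List[float]:
        """Concatenate all components into one flat list for a network input."""
        return (self.root_displacement + self.joint_positions + self.joint_rotations
                + self.joint_velocities + self.foot_contacts)


# A toy pose with a single joint and four contact flags
# (right toe, right heel, left toe, left heel).
pose = Pose(
    root_displacement=[0.1, 0.0],
    joint_positions=[0.0, 0.9, 0.0],
    joint_rotations=[0.0, 0.0, 0.0],
    joint_velocities=[0.1, 0.0, 0.0],
    foot_contacts=[1.0, 0.0, 1.0, 1.0],
)
vector = pose.as_vector()
```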

The input motion clip 101, which is the motion clip input to the model 110, may include T frames (e.g., 60 frames) extracted from the motion clip 102 to be corrected. The T frames of the input motion clip 101 may be a series of temporally consecutive frames. The number of frames in the input motion clip 101 is a predetermined value and may be determined, for example, by the number of frames of the data input during training of the model 110.

As an example, the input motion clip 101 may be defined, as in Equation 2, as the set of T object poses {x_1, ..., x_T} corresponding to the T frames.

The output motion clip 103, which is the output data of the model 110, may be a motion clip obtained by correcting the input motion clip 101. The output motion clip 103 may contain the same number of frames as the input motion clip 101. The pose corresponding to each frame of the output motion clip 103 may be a pose obtained by correcting the pose corresponding to the respective frame of the input motion clip 101.

The model 110 for correcting a motion clip according to an embodiment may include a neural network trained to output a ground-truth motion clip, based on a predefined loss function, from training data generated to include motion modified from the ground-truth motion clip. The ground-truth motion clip may include a series of frames whose poses connect naturally. The ground-truth motion clip may include a sequence of poses of the object that resembles the actual movement of the object.

The motion clip 102 to be corrected may include frames corresponding to a sequence of poses that do not connect naturally. Some frames of the motion clip 102 to be corrected may be extracted as the input motion clip 101 and input to the trained model 110. The trained model 110 may output a motion clip in which the input motion clip 101 has been corrected to contain a sequence of poses resembling the movement of a real object. The specific structure and training method of the model 110 are described in detail below.

FIG. 2 is a diagram illustrating the structure of a model for correcting a motion clip according to an embodiment.

Referring to FIG. 2, a model for correcting a motion clip according to an embodiment may include an encoder 210, a refiner 220 and a decoder 230. The refiner 220 may include, for example, a bidirectional gated recurrent unit (BGRU).

The model may process the input motion clip 101 frame by frame. FIG. 2 does not show a model that physically contains as many encoders 210, refiners 220 and decoders 230 as there are frames. Rather, it shows the logical structure of a model in which the pose of each frame of the input motion clip 101 (e.g., x_1, x_t or x_T) is applied to the encoder 210, the refiner 220 and the decoder 230 to obtain a corrected pose for each frame, and in which the output motion clip 103 containing the corrected poses is generated.

A model-based motion clip correction method according to an embodiment may include applying the input motion clip 101, which includes a series of frames corresponding to poses of an object, to the encoder 210 to obtain a feature vector corresponding to each of the frames of the input motion clip 101. The feature vector may contain an encoding of the pose in each frame (e.g., x_1, x_t or x_T).

A model-based motion clip correction method according to an embodiment may include applying the feature vectors to the refiner 220 (e.g., a bidirectional gated recurrent unit (BGRU)) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction. As an example, the first hidden vector corresponding to the first direction may include a forward hidden vector 221 obtained by reflecting information from earlier frames, following the temporal order of the frames. As an example, the second hidden vector corresponding to the second direction may include a backward hidden vector 222 obtained by reflecting information from later frames, following the reverse temporal order of the frames.

According to an embodiment, obtaining the output motion clip 103 may include applying a vector generated by concatenating the first hidden vector and the second hidden vector to the decoder 230 to obtain a corrected pose of the object for each of the frames of the input motion clip 101, and obtaining the output motion clip 103, which includes the corrected poses corresponding to the frames of the input motion clip 101.

The encoder 210, the refiner 220 and the decoder 230 included in the model according to an embodiment may be neural networks. In other words, the encoder 210, the refiner 220 and the decoder 230 may include one or more layers whose weights are determined by training. The training method of the model is described in detail below.
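The encoder, BGRU refiner and decoder pipeline can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented network: the single dense layer used for the encoder and for the decoder, the layer sizes, the random untrained weights, and all class and function names are hypothetical. Only the data flow (per-frame encoding, forward and backward recurrent passes, concatenation, per-frame decoding) follows the description, and the cell uses the standard GRU gate equations.

```python
import math
import random


def init_linear(rng, n_out, n_in):
    """Random dense layer returned as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class GRUCell:
    """Standard GRU cell with update gate z, reset gate r and candidate n."""

    def __init__(self, rng, n_in, n_hidden):
        self.n_hidden = n_hidden
        # One dense layer per gate, each acting on the input concatenated with the state.
        self.gates = {name: init_linear(rng, n_hidden, n_in + n_hidden) for name in "zrn"}

    def step(self, x, h):
        z = [sigmoid(v) for v in linear(*self.gates["z"], x + h)]
        r = [sigmoid(v) for v in linear(*self.gates["r"], x + h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in linear(*self.gates["n"], x + reset_h)]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, sequence):
        h = [0.0] * self.n_hidden
        states = []
        for x in sequence:
            h = self.step(x, h)
            states.append(h)
        return states


class ClipRefinerSketch:
    def __init__(self, pose_dim, feat_dim=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.encoder = init_linear(rng, feat_dim, pose_dim)
        self.forward_gru = GRUCell(rng, feat_dim, hidden_dim)
        self.backward_gru = GRUCell(rng, feat_dim, hidden_dim)
        self.decoder = init_linear(rng, pose_dim, 2 * hidden_dim)

    def __call__(self, clip):
        # Encoder: one feature vector per frame.
        feats = [[math.tanh(v) for v in linear(*self.encoder, pose)] for pose in clip]
        # Forward pass in time order; backward pass in reverse order, then re-aligned.
        fwd = self.forward_gru.run(feats)
        bwd = self.backward_gru.run(feats[::-1])[::-1]
        # Decoder: concatenated hidden vectors -> one output pose per frame.
        return [linear(*self.decoder, f + b) for f, b in zip(fwd, bwd)]


model = ClipRefinerSketch(pose_dim=4)
clip = [[0.1 * t, 0.0, 1.0, -0.5] for t in range(6)]
refined = model(clip)  # same number of frames and pose dimension as the input
```

Because the backward pass starts from the last frame, the output for an early frame depends on later frames as well as earlier ones, which is the property the two hidden vectors are meant to provide.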

FIG. 3 is a diagram for describing a method of sequentially correcting a motion clip to be corrected, based on a model for correcting a motion clip according to an embodiment.

Referring to FIG. 3, a motion clip correction method according to an embodiment may include extracting an input motion clip 301-1, which includes a predetermined number of frames, from a motion clip 302-1 to be corrected. By applying the extracted input motion clip 301-1 to the model 110, an output motion clip 303-1 in which the poses of the input motion clip have been corrected may be obtained.

A motion clip correction method according to an embodiment may include replacing the input motion clip 301-1 extracted from the motion clip 302-1 to be corrected with the output motion clip 303-1. In other words, the motion clip to be corrected may be changed by replacing the input motion clip 301-1 contained in the motion clip 302-1 to be corrected with the output motion clip 303-1 obtained as the output of the model.

A motion clip correction method according to an embodiment may include extracting a new input motion clip 301-2, which includes the predetermined number of frames, from the changed motion clip 302-2 to be corrected. The newly extracted input motion clip 301-2 may be a motion clip offset by a certain number of frames from the previously extracted input motion clip 301-1. For example, the first frame of the newly extracted input motion clip 301-2 may be n frames (n being an arbitrary natural number) after the first frame of the previously extracted input motion clip 301-1. The newly extracted input motion clip 301-2 may contain the same number of frames as the previously extracted input motion clip 301-1.

The input motion clip 301-2 in the motion clip 302-2 to be corrected is replaced with the output motion clip 303-2 obtained by inputting the newly extracted input motion clip 301-2 to the model. The operation of extracting a new input motion clip from the changed motion clip to be corrected and obtaining a correction result from the model may then be repeated until the last frame of the motion clip to be corrected has been input to the model.

For example, the T frames from the first frame to the T-th frame of the motion clip to be corrected may be extracted as an input motion clip to obtain an output motion clip corrected by the model. The motion clip to be corrected may be changed by replacing its first through T-th frames with the output motion clip. The T frames from the (1+n)-th frame to the (n+T)-th frame of the changed motion clip to be corrected may then be extracted as a new input motion clip, and the motion clip to be corrected may be changed again using the output motion clip corrected by the model. The operation of extracting an input motion clip from the changed motion clip to be corrected, correcting it with the model, and replacing it with the corrected output motion clip may be repeated until the last frame of the motion clip to be corrected has been input to the model.
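The chained correction described above amounts to a sliding window that writes each model output back into the clip before the next window is extracted. The sketch below assumes a `model` callable that maps a list of `window` frames to a list of the same length; the function name, the parameter names `window` (T) and `stride` (n), and the clamping of the final window to the end of the clip are assumptions, since the source does not say how a leftover tail shorter than n frames is handled.

```python
from typing import Callable, List, Sequence


def refine_sequentially(target: Sequence, model: Callable[[List], List],
                        window: int, stride: int) -> List:
    """Correct a long clip window by window, reusing already corrected frames."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    frames = list(target)
    if len(frames) < window:
        raise ValueError("target clip is shorter than the model window")
    start = 0
    while True:
        end = start + window
        # Replace the extracted input clip with the model output in place.
        frames[start:end] = model(frames[start:end])
        if end >= len(frames):
            break
        # Assumption: clamp the last window so that it ends on the final frame.
        start = min(start + stride, len(frames) - window)
    return frames


# Toy "model" that adds 1 to every frame, which makes the number of times
# each frame was processed visible in the result.
visits = refine_sequentially([0] * 10, lambda clip: [v + 1 for v in clip],
                             window=4, stride=2)
```

With `stride` smaller than `window`, consecutive windows overlap, so frames corrected in one pass provide context for the next pass.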

FIG. 4 is a flowchart of a method of training a model for correcting a motion clip according to an embodiment.

Referring to FIG. 4, the method of training a model for correcting a motion clip may include a step 410 of obtaining a nearby motion clip corresponding to a query, based on a nearest neighbor search in a space corresponding to a set of motion clips, where some frames of a ground-truth motion clip are used as the query. For example, referring to FIG. 5, the query 501 may be determined as the frames in the latter half of the ground-truth motion clip 503.

The nearby motion clip may be determined as an element, among the elements included in the set of motion clips, that is judged to be similar to the query. Which elements of the set are similar to the query may be determined based on a nearest neighbor search algorithm. For example, the nearest neighbor search may include a k-nearest neighbor search. As an example, the element closest to the query in the space corresponding to the set of motion clips may be determined as the nearby motion clip.

Obtaining the nearby motion clip according to an embodiment may include calculating a distance between the query and each element included in the set of motion clips by applying different weights to a pose included in a frame of a first index and a pose included in a frame of a second index, and determining, based on the calculated distance, at least one element included in the set of motion clips as the nearby motion clip. For example, when the first index precedes the second index, calculating the distance to the query may include applying a higher weight to the frame of the second index than to the frame of the first index.

For example, referring to FIG. 5, the distance between the query 501 and each element of the set of motion clips, in the space corresponding to the set, may be calculated by applying a low weight (e.g., 0.1) to the first frame 504 and a high weight (e.g., 1) to the last frame 505. As an example, the distance between the query 501 and each element of the set of motion clips may be calculated based on a k-nearest neighbor search algorithm 510. Based on the calculated distance, the element of the set judged to be closest to the query may be determined as the nearby motion clip 521. A motion clip whose later poses are more similar to the query 501 than its earlier poses may thus be determined as the nearby motion clip 521 corresponding to the query.
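The frame-weighted search can be illustrated with a brute-force sketch. Several details here are assumptions, since the source gives only the example weights 0.1 and 1 for the first and last frames: the weights for intermediate frames are interpolated linearly, the per-frame pose distance is Euclidean, and the names `frame_weights`, `weighted_distance`, and `nearest_clip` are hypothetical.

```python
import math
from typing import List, Sequence

Clip = List[List[float]]  # frames x pose dimensions (hypothetical layout)


def frame_weights(num_frames: int, first: float = 0.1, last: float = 1.0) -> List[float]:
    """Ramp weights from `first` to `last`; linear interpolation is an assumption."""
    if num_frames == 1:
        return [last]
    step = (last - first) / (num_frames - 1)
    return [first + i * step for i in range(num_frames)]


def weighted_distance(query: Clip, candidate: Clip, weights: Sequence[float]) -> float:
    """Sum of per-frame Euclidean pose distances, each scaled by its frame weight."""
    return sum(w * math.dist(q, c) for w, q, c in zip(weights, query, candidate))


def nearest_clip(query: Clip, clip_set: Sequence[Clip]) -> int:
    """Return the index of the element of `clip_set` closest to `query`."""
    weights = frame_weights(len(query))
    distances = [weighted_distance(query, c, weights) for c in clip_set]
    return min(range(len(clip_set)), key=distances.__getitem__)
```

With these weights, a candidate that matches the query's last frame but differs in its first frame scores as closer than one that matches the first frame but differs in the last, which is the behavior described for the nearby motion clip 521.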

Referring again to FIG. 4, the method of training the model according to an embodiment may include a step 420 of generating training data that includes the nearby motion clip and the remaining frames of the ground-truth motion clip that were not used to form the query. For example, referring to FIG. 5, the training data 523 may be generated as a motion clip whose first half consists of the first-half frames 502 of the ground-truth motion clip that were not used to form the query 501, and whose second half consists of the nearby motion clip 521.
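Assembling one training sample from a ground-truth clip might look like the following sketch. It assumes the clip is split at its midpoint and that the nearby clip has the same number of frames as the query; `make_training_clip` and the `find_nearest` callback (any function returning the index of the closest element) are hypothetical names.

```python
from typing import Callable, List, Sequence

Clip = List[List[float]]  # frames x pose dimensions (hypothetical layout)


def make_training_clip(
    ground_truth: Clip,
    clip_set: Sequence[Clip],
    find_nearest: Callable[[Clip, Sequence[Clip]], int],
) -> Clip:
    """Keep the first half of the ground truth and swap in a nearby clip for the second half."""
    half = len(ground_truth) // 2            # midpoint split is an assumption
    first_half = ground_truth[:half]         # frames not used to form the query
    query = ground_truth[half:]              # latter-half frames serve as the query
    neighbor = clip_set[find_nearest(query, clip_set)]
    return first_half + neighbor             # training input; the target stays `ground_truth`
```

The resulting clip starts like the ground truth but ends with similar, slightly different motion, so a model trained to map it back to the ground truth learns to repair such deviations.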

The method of training the model according to an embodiment may include a step 430 of training a neural network for correcting a motion clip based on a loss function. For example, referring to FIG. 6, training the neural network may include training the neural network of the model 603 for correcting a motion clip to output the ground-truth motion clip 601 from the training data 602, based on a first loss function 610 defined with respect to the ground-truth motion clip 601.

The first loss function according to an embodiment may include a loss function determined based on the difference between the poses corresponding to the frames included in the ground-truth motion clip and the poses corresponding to the frames included in the output data of the neural network.

For example, referring to FIG. 6, training the neural network may include training the neural network of the model 603 to output a motion clip that includes the first pose of the training data 602, based on a second loss function 620 defined with respect to the first frame of the output data 604 of the neural network corresponding to the training data 602 and the first frame of the training data 602.

The second loss function according to an embodiment may include a loss function determined based on the difference between the pose corresponding to the first frame of the training data and the pose corresponding to the first frame of the output data.

According to an embodiment, the loss function (L) for training the neural network of the model may be defined as in Equation 3 below.

The loss function (L) may include L_pose and L_foot, which correspond to the first loss function 610. The loss function (L) may include L_first, which corresponds to the second loss function 620.

As an example, L_pose may be defined as in Equation 4 below.

In Equation 4, the four output-side terms denote the pose of the object corresponding to the t-th frame included in the output data 604 of the model 603, and may correspond to the pose of the object described above in Equation 1. The four ground-truth-side terms denote the pose of the object corresponding to the t-th frame included in the ground-truth motion clip 601, and may likewise correspond to the pose of the object described above in Equation 1. More specifically, for the output data 604 of the model 603 and for the ground-truth motion clip 601 respectively, these terms may correspond to vectors indicating the displacement of the root of the object, the coordinates of each joint of the object, the angle of each joint of the object relative to its parent joint, and the velocity of each joint of the object.

As an example, L_foot may be defined as in Equation 5 below.

In Equation 5, the output-side term denotes the pose of the object corresponding to the t-th frame included in the output data 604 of the model 603, and may correspond to the pose of the object described above in Equation 1. The ground-truth-side term denotes the pose of the object corresponding to the t-th frame included in the ground-truth motion clip 601, and may likewise correspond to the pose of the object described above in Equation 1. More specifically, for the output data 604 of the model 603 and for the ground-truth motion clip 601 respectively, these terms may correspond to a vector indicating whether some joints of the object are in contact with the floor.

As an example, L_first may be defined as in Equation 6 below.

In Equation 6, the four output-side terms denote the pose of the object corresponding to the first frame included in the output data 604 of the model 603, and may correspond to the pose of the object described above in Equation 1. The four training-data-side terms denote the pose of the object corresponding to the first frame included in the training data 602, and may likewise correspond to the pose of the object described above in Equation 1. More specifically, for the first frame of the output data 604 of the model 603 and for the first frame of the training data 602 respectively, these terms may correspond to vectors indicating the displacement of the root of the object, the coordinates of each joint of the object, the angle of each joint of the object relative to its parent joint, and the velocity of each joint of the object.
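The structure of the three loss terms can be sketched as follows. The exact form of Equations 3 to 6 is not reproduced in this text, so the squared-error norm and the unweighted sum are assumptions made only for illustration; the dictionary keys (`root`, `pos`, `rot`, `vel`, `contact`) and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical pose layout: root displacement, joint coordinates, joint angles
# relative to the parent joint, joint velocities, and foot-contact flags.
Pose = Dict[str, List[float]]
POSE_KEYS = ("root", "pos", "rot", "vel")


def sq_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared error between two vectors (the norm itself is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pose_loss(output: List[Pose], truth: List[Pose]) -> float:
    """L_pose: pose difference over every frame of the output and the ground truth."""
    return sum(sq_diff(o[k], g[k]) for o, g in zip(output, truth) for k in POSE_KEYS)


def foot_loss(output: List[Pose], truth: List[Pose]) -> float:
    """L_foot: difference of the floor-contact vectors over every frame."""
    return sum(sq_diff(o["contact"], g["contact"]) for o, g in zip(output, truth))


def first_frame_loss(output: List[Pose], training: List[Pose]) -> float:
    """L_first: pose difference between the first output frame and the first training frame."""
    return sum(sq_diff(output[0][k], training[0][k]) for k in POSE_KEYS)


def total_loss(output: List[Pose], truth: List[Pose], training: List[Pose]) -> float:
    """Unweighted sum of the three terms; any weighting in Equation 3 is not modeled."""
    return pose_loss(output, truth) + foot_loss(output, truth) + first_frame_loss(output, training)
```

Note that L_pose and L_foot compare the output with the ground-truth clip, while L_first compares it with the training input, which anchors the corrected clip to the pose it starts from.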

FIG. 7 is a diagram illustrating an example configuration of a device according to an embodiment.

Referring to FIG. 7, the device 700 includes a processor 701, a memory 703, and an input/output device 705. The device 700 according to an embodiment may include a device that performs the motion clip correction method described above with reference to FIGS. 1 to 6. The device 700 according to an embodiment may include a device that performs the method of training a model for correcting a motion clip described above with reference to FIGS. 1 to 6.

The processor 701 according to an embodiment may perform at least one of the operations described above with reference to FIGS. 1 to 6. For example, the processor 701 may perform at least one of: applying an input motion clip including a series of frames corresponding to poses of an object to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip; applying the feature vector to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and applying the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip has been corrected. As another example, the processor 701 may perform operations for training the model for correcting a motion clip. The processor 701 may perform at least one of: obtaining a nearby motion clip corresponding to a query, based on a nearest neighbor search in a space corresponding to a set of motion clips, where some frames of a ground-truth motion clip are used as the query; generating training data including the nearby motion clip and the remaining frames of the ground-truth motion clip that were not used to form the query; and training the neural network of the model for correcting a motion clip to output the ground-truth motion clip from the training data, based on a first loss function defined with respect to the ground-truth motion clip.
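The BGRU stage of the inference pipeline can be illustrated with a toy, scalar bidirectional GRU. This sketch covers only the recurrent stage: the encoder and decoder are omitted, each feature is reduced to a single float, bias terms are left out, and the weight dictionaries are hypothetical untrained values. It shows how a first (forward) and second (backward) hidden value are obtained per frame and paired for a decoder.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h: float, p: Dict[str, float]) -> float:
    """One scalar GRU update without biases; `p` holds hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)            # reset gate
    cand = math.tanh(p["wh"] * x + p["uh"] * r * h)   # candidate hidden state
    return (1.0 - z) * h + z * cand                   # one common gating convention


def bidirectional_gru(
    features: List[float], fwd: Dict[str, float], bwd: Dict[str, float]
) -> List[Tuple[float, float]]:
    """Return, per frame, the pair (forward hidden, backward hidden)."""
    n = len(features)
    h_fwd, h_bwd = [0.0] * n, [0.0] * n
    h = 0.0
    for t in range(n):                # first direction: first frame to last frame
        h = gru_step(features[t], h, fwd)
        h_fwd[t] = h
    h = 0.0
    for t in reversed(range(n)):      # second direction: last frame to first frame
        h = gru_step(features[t], h, bwd)
        h_bwd[t] = h
    return list(zip(h_fwd, h_bwd))    # paired (concatenated) input for a decoder
```

Each frame's pair therefore summarizes both the frames before it and the frames after it, which is what lets a decoder correct a pose using context from both directions.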

The memory 703 according to an embodiment may be a volatile memory or a non-volatile memory, and may store data related to the motion clip correction method described above with reference to FIGS. 1 to 6 and/or the method of training the neural network included in the model for correcting a motion clip. As an example, the memory 703 may store data generated while performing the motion clip correction method, or data required to perform it. As an example, the memory 703 may store data generated while performing the method of training the neural network included in the model for correcting a motion clip, or data required to perform that method. For example, the memory 703 may store training data for training the neural network included in the model for correcting a motion clip, and may store the weight(s) between the layers included in the neural network of the trained model.

The device 700 according to an embodiment may be connected to an external device (for example, a user terminal, a server, or a network) through the input/output device 705 and may exchange data with it. For example, a motion clip to be corrected may be input through an input device, and the motion clip corrected by the model may be output through an output device.

According to an embodiment, the memory 703 may store a program that implements the motion clip correction method described above with reference to FIGS. 1 to 6. The memory 703 may store a program that implements the method of training a model for correcting a motion clip described above with reference to FIGS. 1 to 6. The processor 701 may execute a program stored in the memory 703 and control the device 700. The code of the program executed by the processor 701 may be stored in the memory 703.

The device 700 according to an embodiment may further include other components that are not shown. For example, the device 700 may include a communication module that allows the device 700 to communicate with other electronic devices or other servers over a network. As another example, the device 700 may further include other components such as a transceiver, various sensors, and a database.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct a processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium, or device, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may also be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A method of correcting a motion clip, the method comprising:
applying an input motion clip including a series of frames corresponding to poses of an object to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip;
applying the feature vector to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and
applying the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip is corrected,
wherein a model for correcting a motion clip, including the encoder, the BGRU, and the decoder, includes a neural network trained, based on a predefined loss function, to output a ground-truth motion clip from training data generated to include motion modified from the ground-truth motion clip, and
wherein the loss function includes:
a first loss function defined with respect to the ground-truth motion clip; and
a second loss function defined with respect to a first frame of the output motion clip and a first frame of the training data.

The method of claim 1,
wherein the training data includes a remaining part of the ground-truth motion clip and a nearby motion clip obtained based on a nearest neighbor search, in a space corresponding to a set of motion clips, that uses some frames of the ground-truth motion clip as a query.

The method of claim 2,
wherein the nearby motion clip includes at least one element extracted from the set of motion clips based on a distance between the query and each element included in the set, the distance being calculated by applying different weights to a pose included in a frame of a first index and a pose included in a frame of a second index.

The method of claim 1,
wherein the first loss function includes a loss function determined based on a difference between poses corresponding to frames included in the ground-truth motion clip and poses corresponding to frames included in the output motion clip, and
wherein the second loss function includes a loss function determined based on a difference between a pose corresponding to the first frame of the training data and a pose corresponding to the first frame of the output motion clip.

The method of claim 1,
wherein obtaining the output motion clip includes:
applying a vector generated by concatenating the first hidden vector and the second hidden vector to the decoder to obtain a corrected pose of the object corresponding to each of the frames included in the input motion clip; and
obtaining the output motion clip including the corrected poses corresponding to the frames included in the input motion clip.

The method of claim 1, further comprising:
extracting the input motion clip, including a predetermined number of frames, from a motion clip to be corrected.

The method of claim 6, further comprising:
replacing the input motion clip within the motion clip to be corrected with the output motion clip; and
extracting a new input motion clip, including the predetermined number of frames, from the changed motion clip to be corrected.

A method of training a neural network included in a model for correcting a motion clip, the method comprising:
obtaining a nearby motion clip corresponding to a query, based on a nearest neighbor search in a space corresponding to a set of motion clips, where some frames of a ground-truth motion clip are used as the query;
generating training data including the nearby motion clip and remaining frames of the ground-truth motion clip that were not used to form the query; and
training, based on a loss function, the neural network for correcting a motion clip to output the ground-truth motion clip from the training data,
wherein the loss function includes:
a first loss function defined with respect to the ground-truth motion clip; and
a second loss function defined with respect to a first frame of output data of the neural network corresponding to the training data and a first frame of the training data.

The method of claim 8,
wherein the first loss function includes a loss function determined based on a difference between poses corresponding to frames included in the ground-truth motion clip and poses corresponding to frames included in the output data of the neural network.

(Deleted)

The method of claim 8,
wherein the second loss function includes a loss function determined based on a difference between a pose corresponding to the first frame of the training data and a pose corresponding to the first frame of the output data.

The method of claim 8,
wherein obtaining the nearby motion clip includes:
calculating a distance between the query and each element included in the set of motion clips by applying different weights to a pose included in a frame of a first index and a pose included in a frame of a second index; and
determining, based on the calculated distance, at least one element included in the set of motion clips as the nearby motion clip.

The method of claim 12,
wherein calculating the distance to the query further includes, when the first index precedes the second index, applying a higher weight to the frame of the second index than to the frame of the first index.

A computer program stored in a medium and combined with hardware to execute the method of any one of claims 1 to 9 and 11 to 13.

A device comprising at least one processor configured to:
apply an input motion clip including a series of frames corresponding to poses of an object to an encoder to obtain a feature vector corresponding to each of the frames included in the input motion clip;
apply the feature vector to a bidirectional gated recurrent unit (BGRU) to obtain a first hidden vector corresponding to a first direction and a second hidden vector corresponding to a second direction; and
apply the first hidden vector and the second hidden vector to a decoder to obtain an output motion clip in which the input motion clip is corrected,
wherein a model for correcting a motion clip, including the encoder, the BGRU, and the decoder, includes a neural network trained, based on a predefined loss function, to output a ground-truth motion clip from training data generated to include motion modified from the ground-truth motion clip, and
wherein the loss function includes:
a first loss function defined with respect to the ground-truth motion clip; and
a second loss function defined with respect to a first frame of the output motion clip and a first frame of the training data.

The device of claim 15,
wherein the training data includes a remaining part of the ground-truth motion clip and a nearby motion clip obtained based on a nearest neighbor search, in a space corresponding to a set of motion clips, that uses some frames of the ground-truth motion clip as a query.

The device of claim 16,
wherein the nearby motion clip includes at least one element extracted from the set of motion clips based on a distance between the query and each element included in the set, the distance being calculated by applying different weights to a pose included in a frame of a first index and a pose included in a frame of a second index.

The device of claim 15,
wherein the first loss function includes a loss function determined based on a difference between poses corresponding to frames included in the ground-truth motion clip and poses corresponding to frames included in the output motion clip, and
wherein the second loss function includes a loss function determined based on a difference between a pose corresponding to the first frame of the training data and a pose corresponding to the first frame of the output motion clip.

The device of claim 15,
wherein the processor is configured to, in obtaining the output motion clip:
apply a vector generated by concatenating the first hidden vector and the second hidden vector to the decoder to obtain a corrected pose of the object corresponding to each of the frames included in the input motion clip; and
obtain the output motion clip including the corrected poses corresponding to the frames included in the input motion clip.

The device of claim 15,
wherein the input motion clip includes a predetermined number of frames extracted from a motion clip to be corrected, and
wherein the processor is configured to:
replace the input motion clip within the motion clip to be corrected with the output motion clip; and
extract a new input motion clip, including the predetermined number of frames, from the changed motion clip to be corrected.