KR102463778B1 - Method and Apparatus for Reconstruction of Holistic 3D Body from a Blurred Single Image

Info

Publication number: KR102463778B1
Application number: KR1020220032133A
Authority: KR
Inventors: 박인규; 조슈아
Original assignee: 인하대학교 산학협력단
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-11-11

Abstract

Disclosed are a method and apparatus for reconstructing a three-dimensional (3D) body model from a blurred image, which generate a new large-scale dataset to reconstruct a 3D human body model from a blurry image. According to the present invention, the method for reconstructing a 3D body model from a blurred image comprises the steps of: generating, by a dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction; using the dataset, by a motion deblurring module, to generate a motion deblurred image and using a loss function to learn the motion deblurred image; using, by a structure recognition module, the learned motion deblurred image to provide structural information for integrated human body reconstruction; and using, by a human body reconstruction module, the deblurred image with the structural information to reconstruct a 3D human body model.

Description

Method and apparatus for restoring 3D human body model from image with blur {Method and Apparatus for Reconstruction of Holistic 3D Body from a Blurred Single Image}

The present invention relates to a method and apparatus for reconstructing a 3D human body model from a blurred image.

Human body pose and shape reconstruction has drawn attention from the research community because it is widely used in applications such as human action detection, augmented/virtual reality, motion capture, and online games. Beyond the recent success of human body mesh reconstruction, research has shifted toward integrated (holistic) body mesh reconstruction from a single sharp image, which yields impressive results. Unlike a non-integrated model, the integrated body model uses more parameters and includes detailed facial expressions and finger-level hand poses.

FIG. 1 is a diagram illustrating an example of integrated 3D human body reconstruction according to the prior art.

From left to right: a human motion-blurred image, the ground-truth 3D integrated body mesh, the 3D integrated body mesh from ExPose, the 3D integrated body mesh from 'Deblur + ExPose', and the result of the proposed joint framework. The proposed method reconstructs the overall shape, as well as the head and hand poses, more accurately than the current state of the art (ExPose).

Recent work on human body mesh reconstruction, and even on integrated body mesh reconstruction, assumes an inherently sharp image as input. However, images captured in the wild are often blurred by the motion of the person and the camera. As a result, such methods fail to reconstruct an accurate body mesh, as shown in FIG. 1.

Although state-of-the-art image deblurring methods have achieved remarkable performance, they cannot simply be used as a preprocessing step before body mesh reconstruction. This is because human motion often involves irregular movement, which increases the uncertainty of the motion blur.

To solve the aforementioned problems, a joint and unified framework for deblurring and integrated 3D human body reconstruction, called D2R, is introduced to solve both problems simultaneously. Within this single unified framework, the deblurring module uses the human reconstruction module, through a human reconstruction loss, to improve its performance. As deblurring improves, reconstruction improves as well, because sharper images lead to more accurate reconstruction. Even so, it remains difficult to regress body mesh details directly from the deblurred image, because information about specific body parts is insufficient. A structure recognition module is therefore needed to emphasize body details in the form of a structure map.

The technical problem addressed by the present invention is twofold: existing deep 3D integrated human body pose and shape reconstruction methods take a sharp image as input and therefore produce an inaccurate body mesh when given a blurry image, and image deblurring methods do not yield satisfactory results in a simple cascaded approach. To address these problems, D2R (Deblurring-to-Reconstruction), a new joint framework for human motion deblurring and 3D integrated human body reconstruction, is proposed. A further object is to provide a method and apparatus for reconstructing a 3D human body model from a blurred image by generating a new large-scale dataset containing sharp/blurry image pairs and the corresponding 3D human body pose and shape.

In one aspect, the method for reconstructing a 3D human body model from a blurred image proposed in the present invention includes: generating, through a dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction; generating a motion-deblurred image through a motion deblurring module using the dataset, and learning the motion-deblurred image using a loss function; providing structural information for integrated human body reconstruction through a structure recognition module using the learned motion-deblurred image; and reconstructing a 3D human body model through a human body reconstruction module using the deblurred image together with the structural information.

In the step of generating the synthetic dataset for motion deblurring and 3D human body model reconstruction through the dataset generation module, video frame interpolation is applied to increase the frame rate before the average image is computed; a human body mask is generated by segmenting the human body region during the average image computation, in order to reduce the blur effect on the background region; pseudo ground truth 2D keypoints are generated to obtain the integrated 3D parameters; and pseudo ground truth parameters and 3D keypoints are generated and the data are filtered, thereby producing the synthetic dataset for motion deblurring and 3D human body model reconstruction.

In the step of generating a motion-deblurred image through the motion deblurring module using the dataset and learning the motion-deblurred image using a loss function, the motion-deblurred image is generated using a pixel-wise mean error loss function, a perceptual loss function, an adversarial loss function for both global and local discriminators, and a human body reconstruction loss function.

In the step of reconstructing the 3D human body model through the human body reconstruction module using the deblurred image together with the structural information, a projection matrix and integrated 3D parameters are estimated from the deblurred image; structure features and image features are concatenated as input to regress the integrated 3D parameters; the image features are generated from the deblurred image by a convolution block; and a feature fusion block is used to dynamically fuse the concatenated structure and image features.

In another aspect, the apparatus for reconstructing a 3D human body model from a blurred image proposed in the present invention includes: a dataset generation module that generates a synthetic dataset for motion deblurring and 3D human body model reconstruction; a motion deblurring module that generates a motion-deblurred image using the dataset and learns the motion-deblurred image using a loss function; a structure recognition module that provides structural information for integrated human body reconstruction using the learned motion-deblurred image; and a human body reconstruction module that reconstructs a 3D human body model using the deblurred image together with the structural information.

With D2R (Deblurring-to-Reconstruction), a new joint framework for human motion deblurring and 3D integrated human body reconstruction according to embodiments of the present invention, it is possible to overcome two problems: existing deep 3D integrated body pose and shape reconstruction methods take a sharp image as input and produce an inaccurate body mesh when given a blurry image, and image deblurring methods do not yield satisfactory results in a simple cascaded approach. In addition, a 3D human body model can be reconstructed from a blurred image by generating a new large-scale dataset containing sharp/blurry image pairs and the corresponding 3D human body pose and shape.

FIG. 1 is a diagram illustrating an example of integrated 3D human body reconstruction according to the prior art.
FIG. 2 is a flowchart illustrating a method of reconstructing a 3D human body model from a blurred image according to an embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of an apparatus for reconstructing a 3D human body model from a blurred image according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the pipeline of the network according to an embodiment of the present invention.
FIG. 5 is an algorithm showing how alternating training is performed according to an embodiment of the present invention.

Integrated human body pose and shape reconstruction is attracting great interest because it recovers detailed body pose and shape, including facial expressions and finger-level hand shapes. Existing deep 3D integrated body pose and shape reconstruction methods take a sharp image as input, and therefore produce an inaccurate body mesh when given a blurry image. Various image deblurring methods exist, but a simple cascaded approach does not produce satisfactory results.

To solve both problems simultaneously, the present invention proposes D2R (Deblurring-to-Reconstruction), a new joint framework for human motion deblurring and 3D integrated human body reconstruction. A new large-scale dataset containing sharp/blurry image pairs and the corresponding 3D human body pose and shape is also generated.

The present invention proposes a method and apparatus for reconstructing a 3D human body model that introduce a structure recognition module for emphasizing body details in the form of a structure map. According to an embodiment of the present invention, the proposed joint framework, which uses the additional structure recognition module, is trained in an alternating manner to improve the performance of each module.

Although the network is designed as a single unified framework, solving the joint problem end to end is not effective because the two problems belong to different domains. The proposed D2R is therefore trained by optimizing each module in turn. In addition, because suitable datasets are lacking, a new dataset for integrated 3D body reconstruction is proposed, containing large-scale blurry human images and the corresponding ground-truth 3D body parameters. Experimental results show that the proposed method outperforms existing integrated human body reconstruction both quantitatively and qualitatively while also deblurring the input image. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.
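As a minimal sketch of what training each module in turn could look like, the following Python snippet cycles through the modules and marks exactly one as trainable in each phase. The module names, their order, the number of rounds, and the `alternating_schedule` helper are hypothetical and introduced only for illustration; the actual procedure is the algorithm of FIG. 5.

```python
from typing import Dict, Iterator, List, Tuple


def alternating_schedule(
    modules: List[str], rounds: int
) -> Iterator[Tuple[int, str, Dict[str, bool]]]:
    """Yield (round index, active module, trainable flags) for alternating training."""
    for r in range(rounds):
        for active in modules:
            # Only the active module is updated in this phase; the others stay frozen.
            flags = {name: (name == active) for name in modules}
            yield r, active, flags


# Hypothetical module names and order, used only to show the call.
phases = list(
    alternating_schedule(["deblurring", "structure", "reconstruction"], rounds=2)
)
```

Each yielded `flags` dictionary could be used to decide which module's parameters receive gradient updates in that phase.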

By exploiting deep neural networks, several recent techniques have made significant progress in 3D human body pose and shape reconstruction from monocular images. Most prior work used a parametric human body model (SMPL [9]) to reconstruct the body mesh.

Another prior approach regressed the SMPL parameters directly from the input image. In contrast, instead of relying on highly nonlinear image features, some works used additional representations as input, such as human masks or dense correspondences. While the approaches above focus only on sharp input images, another prior work introduced a new method for reconstructing a body mesh from low-resolution images.

Integrated human body reconstruction is a more difficult task than conventional body pose and shape estimation, because face and hand details such as accurate head pose, facial expression, and finger-level hand pose must also be reconstructed.

Another prior work adopted an optimization approach, fitting a body model to 2D keypoints to obtain an integrated body mesh. This approach achieved notable results, but its inference is slow and susceptible to noise.

Another prior work proposed ExPose, which predicts SMPL-X parameters directly from a single RGB image, with faster inference and improved performance. The D2R framework proposed in the present invention reconstructs an integrated body mesh from a blurry image. The proposed method also uses information from the human body reconstruction network to restore the deblurred image.

Many general image deblurring techniques use deep learning to handle the deblurring task.

One prior work proposed a recurrent neural network architecture to handle non-uniform motion blur. Another introduced a hierarchical structure. These approaches mostly focused on dynamic scene blur and had difficulty handling foreground motion blur. To address this, a multi-head decoder architecture was introduced that handles dynamic scene blur and object motion blur simultaneously, with improved results.

Meanwhile, the success of GAN-based architectures in other image restoration tasks has also inspired image deblurring. One prior work proposed a coarse-to-fine approach with adversarial learning. Another proposed a GAN-based method with perceptual and adversarial losses. More recent work designed an approach that deblurs images by learning how to generate blur.

A major factor in the success of these prior works is the availability of large-scale datasets. For deblurring, a dynamic blur dataset known as the GoPro dataset was created by averaging several consecutive frames captured with a high-frame-rate video camera. Another prior work followed a similar strategy to create the HIDE dataset, which contains human pedestrian activity.

For human body reconstruction, one prior work generated sharp in-the-wild images paired with 2D human keypoint annotations. Another provided millions of sharp images captured in a constrained environment with 3D and 2D human keypoint annotations. More recently, an extension of the MS-COCO human dataset with additional annotations for face and hand details was proposed.

To bridge the gap between human body reconstruction and deblurring, a dedicated, carefully constructed dataset is required. However, the available datasets are not suitable for the purposes of the present invention. Although a human motion blur dataset has been proposed in the prior art, it remains insufficient because it lacks human body annotations, covers a limited range of human activities, and contains blur that is mostly caused by camera shake. A new dataset is therefore needed that covers the various aspects of human motion blur, including blurry images, sharp ground-truth images, and 3D body, face, and hand annotations.

FIG. 2 is a flowchart illustrating a method of reconstructing a 3D human body model from a blurred image according to an embodiment of the present invention.

The proposed method of reconstructing a 3D human body model from a blurred image includes: generating, through a dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction (210); generating a motion-deblurred image through a motion deblurring module using the dataset, and learning the motion-deblurred image using a loss function (220); providing structural information for integrated human body reconstruction through a structure recognition module using the learned motion-deblurred image (230); and reconstructing a 3D human body model through a human body reconstruction module using the deblurred image together with the structural information (240).

In step 210, a synthetic dataset for motion deblurring and 3D human body model reconstruction is generated through the dataset generation module.

Video frame interpolation is applied to increase the frame rate before the average image is computed, and a human body mask is generated by segmenting the human body region during the average image computation in order to reduce the blur effect on the background region.

Then, pseudo ground truth 2D keypoints are generated to obtain the integrated 3D parameters, and pseudo ground truth parameters and 3D keypoints are generated and the data are filtered, thereby producing the synthetic dataset for motion deblurring and 3D human body model reconstruction, namely the SINBlur (Synthesized INstavariety Blur) dataset according to an embodiment of the present invention.

In step 220, a motion-deblurred image is generated through the motion deblurring module using the dataset, and the motion-deblurred image is learned using a loss function. Here, the motion-deblurred image is generated using a pixel-wise mean error loss function, a perceptual loss function, an adversarial loss function for both global and local discriminators, and a human body reconstruction loss function.

In step 230, structural information for integrated human body reconstruction is provided through the structure recognition module using the learned motion-deblurred image.

In step 240, a 3D human body model is reconstructed through the human body reconstruction module using the deblurred image together with the structural information.

The human body reconstruction module estimates a projection matrix and integrated 3D parameters from the deblurred image, and concatenates structure features and image features as input to regress the integrated 3D parameters.

The image features are generated from the deblurred image by a convolution block, and a feature fusion block is used to dynamically fuse the concatenated structure and image features.
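The text does not describe the internals of the feature fusion block. Purely as an illustration of concatenating the two feature maps and then reweighting them dynamically, the sketch below uses a channel-wise sigmoid gate computed from globally pooled features. This gating mechanism, the function name `fuse_features`, and the `gate_weights` matrix are assumptions, not the disclosed design.

```python
import numpy as np


def fuse_features(
    image_feat: np.ndarray, structure_feat: np.ndarray, gate_weights: np.ndarray
) -> np.ndarray:
    """Concatenate (C1, H, W) and (C2, H, W) features and gate each channel.

    gate_weights has shape (C1 + C2, C1 + C2) and stands in for learned weights.
    """
    concat = np.concatenate([image_feat, structure_feat], axis=0)  # (C, H, W)
    pooled = concat.mean(axis=(1, 2))                              # (C,) global average
    gate = 1.0 / (1.0 + np.exp(-(gate_weights @ pooled)))          # sigmoid in (0, 1)
    # The gate depends on the input, so the channel weighting changes per image.
    return concat * gate[:, None, None]
```

Because the gate is computed from the input features themselves, the relative weight given to structure and image channels varies from image to image, which is one possible reading of "dynamic" fusion.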

FIG. 3 is a diagram showing the configuration of an apparatus for reconstructing a 3D human body model from a blurred image according to an embodiment of the present invention.

The proposed apparatus for reconstructing a 3D human body model from a blurred image includes a dataset generation module 310, a motion deblurring module 320, a structure recognition module 330, and a human body reconstruction module 340.

The dataset generation module 310 according to an embodiment of the present invention generates a synthetic dataset for motion deblurring and 3D human body model reconstruction.

The integrated human body mesh according to an embodiment of the present invention is represented by the SMPL-X [1] model, which captures details of the body, face, and hands simultaneously. The SMPL-X parameters consist of shape parameters $\beta \in \mathbb{R}^{10}$, pose parameters $\theta \in \mathbb{R}^{J \times D}$, and expression parameters $\psi \in \mathbb{R}^{10}$. These parameters can be used to generate a posed and shaped 3D human body mesh $M \in \mathbb{R}^{N \times 3}$ with N = 10,475 vertices. Specifically, the pose parameters $\theta$ represent the joints of the body, face, and hands. There are J = 53 main joints: 22 body joints, 1 jaw joint, and 15 finger joints per hand. D is the rotation representation adapted from [2], where D = 6. The shape parameters $\beta$ and the expression parameters $\psi$ are each represented by 10 coefficients of the corresponding PCA space.
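For illustration, the parameter dimensions stated above can be written out in NumPy, together with a conversion from a 6D rotation vector to a rotation matrix. The conversion assumes that the 6D representation is the common Gram-Schmidt construction from two 3D vectors; the text only states D = 6, so this is an assumption. The names `NUM_JOINTS`, `rot6d_to_matrix`, and the placeholder arrays are hypothetical.

```python
import numpy as np

NUM_JOINTS = 22 + 1 + 2 * 15  # body joints + jaw + finger joints of both hands = 53
ROT_DIM = 6                   # D = 6
NUM_VERTICES = 10475          # N


def rot6d_to_matrix(x: np.ndarray) -> np.ndarray:
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt (assumed convention)."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1   # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)           # third axis completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)


# Identity rotation for every joint, shape (53, 6).
pose = np.tile(np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]), (NUM_JOINTS, 1))
shape = np.zeros(10)                     # 10 PCA shape coefficients
expression = np.zeros(10)                # 10 PCA expression coefficients
vertices = np.zeros((NUM_VERTICES, 3))   # placeholder for the mesh a body model would output
```

With this convention the pose is a 53 x 6 array, that is, 318 values, and each row maps to a valid rotation matrix without any additional orthogonality constraint.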

According to the prior art, no dataset is available for integrated 3D human body reconstruction from blurry images with the necessary annotations, such as ground-truth 3D bodies, sharp images, and 2D and 3D keypoints.

For dataset generation according to an embodiment of the present invention, the SINBlur (Synthesized INstavariety Blur) dataset is proposed, the first and only dataset for the joint problem of deblurring and integrated 3D human body reconstruction. Table 1 summarizes and compares the existing datasets and the proposed dataset.

<Table 1>

In particular, it is difficult to obtain real-world images that contain human motion blur together with ground-truth integrated 3D body parameters. Therefore, a blurry image is commonly synthesized as the average of a series of video frames.

The present invention uses video clips from the InstaVariety dataset [3], which contains videos of a wide variety of human activities. However, simply averaging consecutive frames produces unrealistic ghosting artifacts because of the low frame rate (for example, 30 fps).

The present invention solves this problem by applying video frame interpolation [4] to increase the frame rate from 30 fps to 240 fps before the average image is computed. In addition, a human body mask [5] is generated by segmenting the human body region during the average image computation, in order to reduce the blur effect on the background region.
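The blur synthesis described above can be sketched as follows, assuming the interpolated frames and per-frame human masks are already available as NumPy arrays. How the mask is combined with the average is not specified in the text, so the compositing rule below (averaged pixels inside the union of the human masks, the sharp centre frame elsewhere) and the function name `synthesize_blur` are assumptions. Going from 30 fps to 240 fps corresponds to an 8x interpolation factor.

```python
import numpy as np

INTERP_FACTOR = 240 // 30  # 8x more frames after interpolation


def synthesize_blur(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Average frames over time, keeping the background sharp.

    frames: (T, H, W, 3) floats in [0, 1]; masks: (T, H, W) with 1 on the person.
    """
    mean_img = frames.mean(axis=0)          # temporal average produces motion blur
    sharp_ref = frames[len(frames) // 2]    # centre frame used as the sharp reference
    union = masks.max(axis=0)[..., None]    # pixels the person covers in any frame
    # Blur only where the person moves; copy the sharp frame everywhere else.
    return union * mean_img + (1.0 - union) * sharp_ref
```

Using the union of the masks rather than a single frame's mask keeps the whole motion trail of the person blurred, while static background pixels are taken unchanged from the sharp reference.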

To obtain the integrated 3D parameters, pseudo ground truth 2D keypoints are generated using OpenPose [6], and pseudo ground truth SMPL-X parameters and 3D keypoints are generated using SMPLify-X [7]. Note that the sharp images are used to generate the ground truth corresponding to the blurry images. The pseudo ground truth is also checked from multiple viewpoints, and poorly annotated data are filtered out.

Finally, according to an embodiment of the present invention, 20,000 images are carefully selected and divided into 18,911 training images and 1,089 test images. The selection requirements are as follows: the person region must occupy 30% of the image, details (face and hands) must be visible in the image, and the occluded region must be less than 10% of the image.
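The selection requirements can be expressed as a simple predicate. This is a sketch only: the `Sample` fields are hypothetical, and the 30% person-area requirement is interpreted here as a minimum, which the text does not state explicitly.

```python
from dataclasses import dataclass

TRAIN_IMAGES, TEST_IMAGES = 18_911, 1_089  # split sizes stated in the text


@dataclass
class Sample:
    person_area_ratio: float  # person area divided by image area
    face_visible: bool
    hands_visible: bool
    occlusion_ratio: float    # occluded area divided by image area


def is_selected(s: Sample) -> bool:
    """Apply the three selection requirements (30% read as a lower bound)."""
    return (
        s.person_area_ratio >= 0.30
        and s.face_visible
        and s.hands_visible
        and s.occlusion_ratio < 0.10
    )
```

For example, `is_selected(Sample(0.35, True, True, 0.05))` accepts the image, while an image whose hands are not visible is rejected regardless of the other values.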

The pseudo-labeling used in an embodiment of the present invention originates from the field of semi-supervised learning [8]. In brief, a network is first trained on a labeled dataset of limited size. The trained network is then used to predict pseudo-labels for unlabeled data. Finally, the network is trained on the pseudo-labeled dataset together with the labeled dataset. This procedure is repeated until convergence. Pseudo-labeling has been shown to improve performance on evaluation datasets that have real ground truth.
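The pseudo-labeling loop can be illustrated with a toy example. In the sketch below a one-dimensional threshold classifier stands in for the network; the classifier, the function names, and the convergence tolerance are assumptions chosen only to keep the example self-contained, and they are not part of the described method.

```python
import numpy as np


def fit_threshold(x: np.ndarray, y: np.ndarray) -> float:
    """Toy 1D classifier: threshold halfway between the two class means."""
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())


def pseudo_label_training(x_lab, y_lab, x_unlab, max_iters=10, tol=1e-6):
    thr = fit_threshold(x_lab, y_lab)               # 1) train on labeled data only
    for _ in range(max_iters):
        y_pseudo = (x_unlab > thr).astype(int)      # 2) predict pseudo-labels
        x_all = np.concatenate([x_lab, x_unlab])
        y_all = np.concatenate([y_lab, y_pseudo])
        new_thr = fit_threshold(x_all, y_all)       # 3) retrain on labeled + pseudo-labeled
        if abs(new_thr - thr) < tol:                # 4) stop once the model no longer changes
            return new_thr
        thr = new_thr
    return thr
```

The labeled set must contain both classes so that the first fit is defined; after that, the unlabeled points only refine the decision boundary.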

Referring back to FIG. 3, the motion deblurring module 320 according to an embodiment of the present invention generates a motion-deblurred image using the dataset and learns the motion-deblurred image using a loss function. Here, the motion-deblurred image is generated using a per-pixel mean error loss function, a perceptual loss function, an adversarial loss function for both global and local discriminators, and a human body reconstruction loss function.

FIG. 4 is a diagram for explaining the pipeline concept of a network according to an embodiment of the present invention.

The present invention introduces a new D2R framework that addresses both motion deblurring and holistic 3D reconstruction. The proposed framework consists of three core modules: a deblurring module, a structure recognition module, and a reconstruction module. A blurry human body image is fed to the deblurring module to produce a deblurred result. The reconstruction module then generates the camera and SMPL-X parameters from the deblurred image and the structure map. FIG. 4 shows the structure of the framework for reconstructing a 3D human body model from a blurred image according to an embodiment of the present invention.

The motion deblurring module according to an embodiment of the present invention generates a motion-deblurred image, where the two image dimensions are both set to 256. The approach according to an embodiment of the present invention applies patch-based learning that crops the image around the human body region, so that training focuses on refining that region. Randomly cropped patch images are also used for data augmentation.

According to an embodiment of the present invention, training of the motion deblurring module starts from a conventional deblurring loss function consisting of a per-pixel mean error loss L_P, a perceptual loss L_X, and an adversarial loss L_Adv for both the global and local discriminators, as follows:

(1)

However, the conventional loss function is insufficient for deblurring the human body region. Therefore, a new human reconstruction loss is proposed for the motion deblurring module. This loss is obtained by feeding the deblurred image to the human body reconstruction network. In summary, the embodiment of the present invention uses four loss functions, as follows:

(2)

Here, the weight coefficients are set to 0.3, 25, and 0.5, respectively. The human reconstruction loss is formulated more precisely below.

Referring back to FIG. 3, the structure recognition module 330 according to an embodiment of the present invention uses the learned motion-deblurred image to provide structure information for holistic human body reconstruction.

The structure recognition module assists the holistic human body reconstruction module by providing structure information from the deblurred image. Its goal is to extract a low-dimensional feature F_S from the structure map S. To obtain S, the structure dissimilarity function SD is applied to compute the structural difference between the blurred human body and the restored human body as follows:

(3)

(4)

Here, the two mean terms are the mean values and the two variance terms are the variances of the blurred human body and the restored human body, respectively. The covariance of the blurred and restored bodies also appears in the formula. C₁ and C₂ are stabilizing constants that avoid division by zero. Equation (4) resembles the SSIM metric, with the SD function computing its complement (1 - SSIM). For this reason, S can be regarded as an intermediate representation of the human body region that ignores the background. The structure recognition module then extracts the feature F_S from S by means of a convolution block, as follows:

(5)

Here, the convolution block consists of 2D batch normalization, ReLU, and max pooling layers.
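The relationship SD = 1 - SSIM described for Equation (4) can be sketched as follows. This is an illustration under stated assumptions, not the patented formulation: it computes one scalar over a whole patch rather than a per-pixel structure map (which would require applying the same computation in local windows), and the default constants are the values commonly used with SSIM for intensities in [0, 1], not values taken from this document.

```python
def structure_dissimilarity(blurred, restored, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return 1 - SSIM for two equally sized flat lists of intensities.

    c1 and c2 are the stabilizing constants that prevent division by zero;
    the defaults are an assumption (common SSIM settings for [0, 1] data).
    """
    n = len(blurred)
    if n == 0 or n != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mu_b = sum(blurred) / n
    mu_r = sum(restored) / n
    var_b = sum((b - mu_b) ** 2 for b in blurred) / n
    var_r = sum((r - mu_r) ** 2 for r in restored) / n
    cov = sum((b - mu_b) * (r - mu_r) for b, r in zip(blurred, restored)) / n
    ssim = ((2 * mu_b * mu_r + c1) * (2 * cov + c2)) / (
        (mu_b ** 2 + mu_r ** 2 + c1) * (var_b + var_r + c2)
    )
    return 1.0 - ssim
```

Identical inputs give a dissimilarity of approximately zero, while a flat patch compared with a high-contrast patch of the same mean gives a value close to one, which is why regions changed by deblurring stand out in the resulting map.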

Referring back to FIG. 3, the human body reconstruction module 340 according to an embodiment of the present invention reconstructs a 3D human body model using the deblurred image together with the structure information.

The holistic human body reconstruction module according to an embodiment of the present invention estimates a projection matrix and SMPL-X parameters from the deblurred image. To regress the parameters, the module concatenates the structure feature and the image feature as its input, expressed as follows:

(6)

(7)

Here, the image feature is generated from the deblurred image by a convolution block that has the same number of layers as the convolution block used for the structure feature. A feature fusion block is introduced to dynamically fuse the concatenated structure and image features. The output of the feature fusion block is then fed to the subsequent regression stage to estimate the camera and SMPL-X parameters. To train the module, the present invention uses a 3D keypoint loss L_3D, a 2D keypoint loss L_2D, and an SMPL-X parameter loss L_SMPL-X:

(8)

Here, the weight terms are the weights of the respective loss functions. In the implementation according to an embodiment of the present invention, all terms are weighted equally.
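The weighted combination described for Equation (8) can be sketched as a plain weighted sum. The function names, the use of a mean squared distance for the keypoint terms, and the default weight of 1.0 for every term are assumptions for illustration; the text only states that three loss terms are combined and weighted equally.

```python
def mean_squared_keypoint_error(predicted, target):
    """Mean squared distance between matching keypoints (assumed norm)."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("keypoint lists must be non-empty and equal length")
    total = 0.0
    for p, t in zip(predicted, target):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(predicted)


def reconstruction_loss(l_3d, l_2d, l_smplx, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the 3D keypoint, 2D keypoint, and SMPL-X losses."""
    w_3d, w_2d, w_smplx = weights
    return w_3d * l_3d + w_2d * l_2d + w_smplx * l_smplx
```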

FIG. 5 shows an algorithm illustrating how alternating training is performed according to an embodiment of the present invention.

According to an embodiment of the present invention, instead of training the framework end to end, alternating training is applied, in which the information from each module is used by the other modules. Algorithm 1 shows how alternating training is performed. There are two gradients, one for deblurring and one for holistic human body reconstruction, and the gradient of the structure recognition module is included in one of these two gradients.
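The idea of alternating training, as opposed to a single end-to-end update, can be illustrated with a toy problem. This sketch is not Algorithm 1: two scalars stand in for the parameters of the deblurring and reconstruction modules, the quadratic losses are invented for the example, and grouping any structure-module parameters with the reconstruction update is an assumption.

```python
def alternating_training(steps=300, lr=0.1):
    """Alternate gradient steps on two parameter groups with toy losses.

    d stands in for the deblurring parameters, r for the reconstruction
    parameters (assumed here to include the structure recognition module).
    Toy losses, invented for illustration:
      deblurring loss     = (d - 1)^2 + (d + r - 3)^2
      reconstruction loss = (d + r - 3)^2
    The shared term mimics a reconstruction loss that also guides deblurring.
    """
    d, r = 0.0, 0.0
    for _ in range(steps):
        # Phase 1: update deblurring parameters; reconstruction is frozen.
        grad_d = 2.0 * (d - 1.0) + 2.0 * (d + r - 3.0)
        d -= lr * grad_d
        # Phase 2: update reconstruction parameters; deblurring is frozen
        # and its freshly updated output is used as the input.
        grad_r = 2.0 * (d + r - 3.0)
        r -= lr * grad_r
    return d, r
```

In this toy setting both losses reach zero at d = 1 and r = 2, and the alternating updates converge to that point because each phase uses the other group's latest values.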

The present invention is the first technique to simultaneously address deblurring, holistic 3D human body reconstruction, and a human motion blur dataset with 3D human body parameters. According to an embodiment of the present invention, the new SINBlur dataset provides a synthetic InstaVariety blur dataset that is useful for solving holistic 3D human body reconstruction from blurred images. In addition, a new joint framework for holistic 3D human body reconstruction and human motion deblurring is proposed. Through the refinement provided by the structure recognition module for the reconstruction network and the human reconstruction loss function for the deblurring network, a significant performance improvement over state-of-the-art methods can be achieved.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but those of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

<References>

[1] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In CVPR, pages 10975-10985, 2019.

[2] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In CVPR, pages 5745-5753, 2019.

[3] Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, and Jitendra Malik. Learning 3D human dynamics from video. In CVPR, pages 5614-5623, 2019.

[4] Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik G. Learned-Miller, and Jan Kautz. Super SloMo: High quality estimation of multiple intermediate frames for video interpolation. In CVPR, pages 9000-9008, 2018.

[5] Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, and Ming-Ting Sun. Cross-domain complementary learning using pose for multi-person part segmentation. IEEE TCSVT, 31(3):1066-1078, 2021.

[6] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE TPAMI, 43(1):172-186, 2019.

[7] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In CVPR, pages 10975-10985, 2019.

[8] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In ICML Workshop, 2013.

[9] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM TOG, 34(6):248:1-248:16, 2015.

Claims

A method for restoring a 3D human body model, the method comprising:
generating, through a dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction;
generating a motion-deblurred image through a motion deblurring module using the dataset, and learning the motion-deblurred image using a loss function;
providing structure information for holistic human body reconstruction through a structure recognition module using the learned motion-deblurred image; and
reconstructing a 3D human body model through a human body reconstruction module using the deblurred image together with the structure information,
wherein the step of generating a motion-deblurred image through the motion deblurring module using the dataset and learning the motion-deblurred image using a loss function comprises
generating the motion-deblurred image using a per-pixel mean error loss function, a perceptual loss function, an adversarial loss function for both global and local discriminators, and a human body reconstruction loss function.

The method of claim 1,
wherein the step of generating, through the dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction comprises:
applying video frame interpolation to increase the frame rate before the average image is computed;
segmenting the human body region during average-image computation to generate a human body mask for reducing the blur effect on the background region; and
generating the synthetic dataset for motion deblurring and 3D human body model reconstruction by generating pseudo ground truth 2D keypoints to obtain holistic 3D parameters, generating pseudo ground truth parameters and 3D keypoints, and filtering the data.

(Deleted)

A method for restoring a 3D human body model, the method comprising:
generating, through a dataset generation module, a synthetic dataset for motion deblurring and 3D human body model reconstruction;
generating a motion-deblurred image through a motion deblurring module using the dataset, and learning the motion-deblurred image using a loss function;
providing structure information for holistic human body reconstruction through a structure recognition module using the learned motion-deblurred image; and
reconstructing a 3D human body model through a human body reconstruction module using the deblurred image together with the structure information,
wherein the step of reconstructing a 3D human body model through the human body reconstruction module using the deblurred image together with the structure information comprises
estimating a projection matrix and holistic 3D parameters from the deblurred image, with structure features and image features concatenated as input to regress the holistic 3D parameters, and
wherein the image features are generated from the deblurred image by a convolution block, and a feature fusion block is used to dynamically fuse the concatenated structure features and image features.

An apparatus for restoring a 3D human body model, the apparatus comprising:
a dataset generation module that generates a synthetic dataset for motion deblurring and 3D human body model reconstruction;
a motion deblurring module that generates a motion-deblurred image using the dataset and learns the motion-deblurred image using a loss function;
a structure recognition module that provides structure information for holistic human body reconstruction using the learned motion-deblurred image; and
a human body reconstruction module that reconstructs a 3D human body model using the deblurred image together with the structure information,
wherein the motion deblurring module
generates the motion-deblurred image using a per-pixel mean error loss function, a perceptual loss function, an adversarial loss function for both global and local discriminators, and a human body reconstruction loss function.

The apparatus of claim 5,
wherein the dataset generation module:
applies video frame interpolation to increase the frame rate before the average image is computed;
segments the human body region during average-image computation to generate a human body mask for reducing the blur effect on the background region; and
generates the synthetic dataset for motion deblurring and 3D human body model reconstruction by generating pseudo ground truth 2D keypoints to obtain holistic 3D parameters, generating pseudo ground truth parameters and 3D keypoints, and filtering the data.

(Deleted)

An apparatus for restoring a 3D human body model, the apparatus comprising:
a dataset generation module that generates a synthetic dataset for motion deblurring and 3D human body model reconstruction;
a motion deblurring module that generates a motion-deblurred image using the dataset and learns the motion-deblurred image using a loss function;
a structure recognition module that provides structure information for holistic human body reconstruction using the learned motion-deblurred image; and
a human body reconstruction module that reconstructs a 3D human body model using the deblurred image together with the structure information,
wherein the human body reconstruction module
estimates a projection matrix and holistic 3D parameters from the deblurred image, with structure features and image features concatenated as input to regress the holistic 3D parameters, and
wherein the image features are generated from the deblurred image by a convolution block, and a feature fusion block is used to dynamically fuse the concatenated structure features and image features.