KR20220050458A

KR20220050458A - Lightweight Deep Learning Network Method and Device for 3D Human Model Reconstruction

Info

Publication number: KR20220050458A
Application number: KR1020200134164A
Authority: KR
Inventors: 김성제; 김제우; 윤주홍; 박민규; 정진우
Original assignee: 한국전자기술연구원
Priority date: 2020-10-16
Filing date: 2020-10-16
Publication date: 2022-04-25

Abstract

Provided is a method for determining an optimal quantization parameter, based on both reconstructed image quality and compression performance, when lightening and compressing a deep learning network for reconstructing a human model. According to an embodiment of the present invention, a system for reconstructing a three-dimensional model includes: an image encoder that refines image feature vectors from an input image through a deep learning network; a volume encoder that refines volume feature vectors from an input depth image through a deep learning network; and a volume decoder that reconstructs the three-dimensional model from the image feature vectors refined in the image encoder and the volume feature vectors refined in the volume encoder. In the deep learning networks of the image encoder, the volume encoder, and the volume decoder, the weights may be quantized by adaptively applying multiple quantization methods.

Description

Lightweight Deep Learning Network Method and Device for 3D Human Model Reconstruction

The present invention relates to deep learning-based 3D human model reconstruction, and more particularly to a network lightweighting technique that alleviates the high power consumption of existing deep learning inference engines while preserving the human reconstruction performance of the original network.

1) Definition of a deep learning network and the need for lightweighting

A deep learning network, also called kernel data, consists of multiple layers, and each layer consists of weights and biases.

Because the number of weights and biases in a layer depends on the layer type (Input / Output / Convolutional / Residual / Fully-Connected / Batch Normalization / Recurrent, etc.), the size of the kernel data varies with the layer configuration.

Although the number of operations differs by layer type, the longer a layer is, the more computation it requires, so shorter layers are generally advantageous in terms of computation speed.

In addition, most deep learning computation uses 32-bit floating-point precision. Computing at lower precision (for example, 8-bit or 16-bit) degrades performance somewhat but is advantageous in terms of power consumption.

2) Conventional deep learning network lightweighting techniques

Attempts have also been proposed to make models lighter by compressing the network at the cost of a slight loss in accuracy. One such technique compresses the VGG-16 network by up to about 49 times through a Pruning-Quantization-Huffman Encoding pipeline.

The pruning technique judges that sufficiently small weights in the model do not contribute significantly to performance, severs the corresponding connections between neurons (setting those weights to 0), and thereby removes unnecessary weights while maintaining performance through retraining.
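As an illustration of the thresholding step only, a minimal Python sketch of magnitude pruning follows. The function names and the threshold value are assumptions made for this example; the retraining step described above is not shown.

```python
def prune(weights, threshold):
    """Set every weight whose magnitude is below `threshold` to 0.

    `threshold` is a hypothetical tuning value, not one given in the text.
    """
    return [0.0 if abs(w) < threshold else w for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


layer = [0.5, -0.01, 0.02, -0.8]
pruned = prune(layer, threshold=0.05)
print(pruned, sparsity(pruned))
```

In practice the network would be retrained after this step so that the remaining weights compensate for the removed connections.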

The quantization technique derives, during training, an optimal quantization codebook for quantizing the weights within a layer.

Finally, Huffman encoding converts the quantized weights and the codebook (CodeBook, Index) into a bitstream, which yields additional compression of about 20 to 30%.

However, the prior art neither applies quantization techniques adaptively, which could further improve compression performance, nor proposes an entropy coding method suited to the distribution of the input data.

The present invention has been devised to solve the above problems. Its object is to provide a method for determining optimal quantization parameters that considers reconstruction quality and compression performance together when lightening and compressing a deep learning network for human model reconstruction.

According to an embodiment of the present invention for achieving the above object, a three-dimensional model reconstruction system includes: an image encoder that refines image feature vectors from an input image using a deep learning network; a volume encoder that refines volume feature vectors from an input depth image using a deep learning network; and a volume decoder that uses a deep learning network to reconstruct a 3D model from the image feature vectors refined in the image encoder and the volume feature vectors refined in the volume encoder. In the deep learning networks of the image encoder, the volume encoder, and the volume decoder, the weights may be quantized by adaptively applying a plurality of quantization methods.

The deep learning networks of the image encoder, the volume encoder, and the volume decoder may apply a quantization method adaptively for each layer.

The quantization methods may include a Symmetric + Signed (SS) quantization method and an Asymmetric + unsigned (AU) quantization method.

The quantization method may be selected based on the mean, skewness, and kurtosis of the weight distribution.

If the skewness is within a specific range, the SS quantization method is applied; if the skewness is outside that range, the AU quantization method is applied; and the number of quantization bits may be determined based on the magnitude of the kurtosis.

The three-dimensional model reconstruction system according to the present invention may further include: an image feature vector compressor that compresses the image feature vectors at the rear end of the image encoder; a volume feature vector compressor that compresses the volume feature vectors at the rear end of the volume encoder; and an expander that decompresses the compressed image feature vectors and volume feature vectors at the front end of the volume decoder.

The image feature vector compressor may compress the image feature vectors using either Huffman Coding (HFC) or Arithmetic Coding (ARC), and the volume feature vector compressor may compress the volume feature vectors using run-length coding (RLC).

According to another aspect of the present invention, there is provided a three-dimensional model reconstruction method including: refining image feature vectors from an input image using a deep learning network; refining volume feature vectors from an input depth image using a deep learning network; and reconstructing a three-dimensional model, using a deep learning network, from the image feature vectors refined in the image encoder and the volume feature vectors refined in the volume encoder, wherein the weights of the deep learning networks of the image encoder, the volume encoder, and the volume decoder are quantized by adaptively applying a plurality of quantization methods.

As described above, according to embodiments of the present invention, power consumption can be reduced by providing a lightweighting technique suited to human model reconstruction, and the compression of the deep learning network can be improved both by applying an optimal quantization technique and by using entropy coding techniques selectively.

FIG. 1 is a diagram showing a deep learning network for 3D human model reconstruction to which the present invention is applicable;
FIG. 2 is a diagram showing range linear quantization methods;
FIG. 3 is a diagram showing a deep learning network to which the quantization technique proposed in an embodiment of the present invention is applied;
FIG. 4 is a diagram illustrating the additional application of an entropy coding technique to a 3D human model reconstruction network to which the quantization technique presented in an embodiment of the present invention is applied;
FIG. 5 is a flowchart provided to explain a 3D human model reconstruction method according to another embodiment of the present invention; and
FIG. 6 shows the hardware structure of a 3D human model reconstruction apparatus according to another embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the drawings.

Deep learning techniques are showing encouraging performance that surpasses traditional methods in various application fields such as image recognition, speech signal processing, and natural language processing.

A deep learning technique trains a network composed of weights and biases to suit a given application, and then applies the trained network to that application to improve performance.

However, as deep learning technology advances, deep learning networks keep growing in size, and the high power consumption of deep learning inference engines has become a problem for mobile applications, where sustained operating time is important.

Accordingly, an embodiment of the present invention presents a deep learning network for 3D human model reconstruction that alleviates the high power consumption of existing deep learning inference engines while maintaining the human reconstruction performance of the original network.

Specifically, an embodiment of the present invention provides a lightweighting method suited to a network for human model reconstruction, applies an adaptive quantization technique per layer during quantization, and applies a different entropy coding technique to each network.

FIG. 1 is a diagram illustrating a deep learning network for 3D human model reconstruction to which the present invention is applicable. As shown, the deep learning network for 3D human model reconstruction includes an image encoder 110, a volume encoder 120, and a volume decoder 130.

The image encoder 110 is a deep learning network that takes an RGB image as input and passes it through an autoencoder network to refine (compress) it into feature vectors that are smaller than the input data.

The volume encoder 120 is a deep learning network that takes a Signed Distance Function (SDF) volume, which is 3D spatial information, as input and refines (compresses) it into a small number of feature vectors. The SDF volume is generated by projecting a depth image into 3D space.

The volume decoder 130 is a deep learning network that takes the feature vectors output by the image encoder 110 and the volume encoder 120 as input and reconstructs (decodes) a 3D human model.

Reconstructing the 3D human model consists of an encoding step, in which the image encoder 110 and the volume encoder 120 discard unnecessary data from the input image and depth image and derive only the essential feature vectors, and a decoding step, in which the volume decoder 130 reconstructs 3D data from those feature vectors.

The image encoder 110 and the volume encoder/decoder 120/130 operate with 32-bit floating-point arithmetic.

In an embodiment of the present invention, the weights of the image encoder 110 and the volume encoder/decoder 120/130 are quantized using the range linear quantization technique of Equation (1) below. Here s_w is the scale value, X_f is the input weight (32-bit), z_w is the zero point, and X_q is the output (quantized) weight (8-/16-bit).

X_q = s_w X_f - z_w    (1)

Depending on whether z_w is 0 or not, the quantization process is divided into two methods, as shown in FIG. 2.

The Symmetric + Signed (SS) quantization method maps the 32-bit X_f values, based on their maximum value, onto an 8-bit X_q in the range -128 to 127, so z_w is 0. The Asymmetric + unsigned (AU) quantization method, by contrast, considers both the minimum and the maximum of X_f and maps them onto an 8-bit X_q in the range 0 to 255, so a nonzero z_w is present.
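The two mappings can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the rounding and clamping, and the exact way s_w and z_w are derived from the weight range (s_w = 127 / max|X_f| for SS; s_w = 255 / (max - min) and z_w = round(s_w * min) for AU) are choices made here so that the code agrees with Equation (1), X_q = s_w X_f - z_w.

```python
def quantize_ss(weights, bits=8):
    """Symmetric + signed: z_w = 0, scale set by the largest magnitude."""
    q_max = 2 ** (bits - 1) - 1           # 127 for 8 bits
    q_min = -(2 ** (bits - 1))            # -128 for 8 bits
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid division by zero
    s_w = q_max / max_abs
    x_q = [min(q_max, max(q_min, round(s_w * w))) for w in weights]
    return x_q, s_w, 0


def quantize_au(weights, bits=8):
    """Asymmetric + unsigned: [min, max] mapped onto 0 .. 2^bits - 1."""
    q_max = 2 ** bits - 1                 # 255 for 8 bits
    lo, hi = min(weights), max(weights)
    s_w = q_max / ((hi - lo) or 1.0)      # avoid division by zero
    z_w = round(s_w * lo)                 # zero point (assumed integer)
    x_q = [min(q_max, max(0, round(s_w * w - z_w))) for w in weights]
    return x_q, s_w, z_w


def dequantize(x_q, s_w, z_w):
    """Invert Equation (1): X_f is approximately (X_q + z_w) / s_w."""
    return [(q + z_w) / s_w for q in x_q]


weights = [-1.0, 0.0, 1.0]
print(quantize_ss(weights))
print(quantize_au(weights))
```

The SS variant wastes part of the integer range when the weights are not centered on zero, while the AU variant uses the full range at the cost of carrying z_w through later arithmetic.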

Because the SS quantization method also maps X_f values that never actually occur onto X_q, its quantization performance is lower than that of the AU quantization method. In the AU quantization method, on the other hand, the range of X_q + z_w during dequantization cannot be guaranteed to fit in 8 bits, depending on the value of z_w, so the convolution operation must use more bits than with the SS quantization method (and power consumption is relatively higher).
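A small worked example makes the bit-width argument concrete. The sketch below assumes the same derivation of s_w and z_w as is commonly used for asymmetric range mapping (s_w = 255 / (max - min), z_w = round(s_w * min)); the helper names and the sample weight range are hypothetical.

```python
def bits_needed(lo, hi):
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def au_offset_range(w_min, w_max, bits=8):
    """Range of X_q + z_w when X_q spans 0 .. 2^bits - 1 (assumed AU mapping)."""
    q_max = (1 << bits) - 1
    s_w = q_max / (w_max - w_min)
    z_w = round(s_w * w_min)
    return z_w, q_max + z_w


# Weights between 10.0 and 20.0: z_w = 255, so X_q + z_w spans 255 .. 510.
lo, hi = au_offset_range(10.0, 20.0)
print(lo, hi, bits_needed(lo, hi))
# The SS range -128 .. 127 always fits in 8 bits.
print(bits_needed(-128, 127))
```

With these sample values the offset sum needs a 10-bit signed integer, whereas the SS values stay within 8 bits, which is the trade-off the paragraph above describes.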

An embodiment of the present invention proposes quantizing the network by using the SS and AU quantization methods adaptively. The choice between the SS and AU quantization methods is made by examining the distribution of the input X_f (mean; skewness, the degree of asymmetry of the distribution; kurtosis, the peakedness of the distribution).

If the distribution is skewed (skewness > 0: the bulk of the values leans to the left, with a longer right tail; skewness < 0: the bulk leans to the right), the asymmetric technique is used; if the skewness is close to 0, the symmetric technique is used.

In addition, when the distribution is sharply peaked (the kurtosis is large), quantization is performed with fewer than 8 bits, and when the distribution is flat, quantization is adaptively performed with more bits.
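The selection rule can be sketched as follows. The text specifies only "a specific range" for skewness and "large" kurtosis, so every threshold and bit width below (`skew_limit`, `kurt_high`, `kurt_low`, and the 4/8/16-bit choices) is a hypothetical value chosen for illustration. The sketch uses excess kurtosis (0 for a normal distribution), which is also an assumption, and it computes the mean without using it in the decision.

```python
import math


def moments(weights):
    """Return (mean, skewness, excess kurtosis) of a list of weights."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    if var == 0:
        return mean, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum(((w - mean) / std) ** 3 for w in weights) / n
    kurt = sum(((w - mean) / std) ** 4 for w in weights) / n - 3.0
    return mean, skew, kurt


def choose_scheme(weights, skew_limit=0.5, kurt_high=1.0, kurt_low=-1.0):
    """Pick a quantization method and bit width for one layer.

    All thresholds and bit widths are hypothetical illustration values.
    """
    _, skew, kurt = moments(weights)
    method = "SS" if abs(skew) <= skew_limit else "AU"
    if kurt > kurt_high:        # sharply peaked: fewer bits
        bits = 4
    elif kurt < kurt_low:       # flat: more bits
        bits = 16
    else:
        bits = 8
    return method, bits


print(choose_scheme([-2.0, -1.0, 0.0, 1.0, 2.0]))   # symmetric, flat
print(choose_scheme([0.0] * 9 + [10.0]))            # skewed, peaked
```

Calling `choose_scheme` once per layer gives the per-layer assignment of method and bit width that the embodiment describes.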

Applying the quantization method presented in an embodiment of the present invention gives the network shown in FIG. 3. In an embodiment of the present invention, a different quantization method may be performed for each layer.

In an embodiment of the present invention, after the quantization technique is applied as shown in FIG. 3, an entropy coding technique may additionally be applied to compress the outputs of the image encoder 110 and the volume encoder 120.

FIG. 4 shows the additional application of an entropy coding technique to the 3D human model reconstruction network to which the quantization technique presented in an embodiment of the present invention is applied.

Because the SDF that is input to the volume encoder 120 is volume data composed of 0s and 1s, run-length coding (RLC) is applied to the output of the volume encoder 120 for additional compression. RLC counts and records the length of each run of 0s, so it compresses poorly when 0s do not occur consecutively, but it runs fast.
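A minimal Python sketch of zero-run-length coding follows. The output layout, a list of (zero run length, nonzero value) pairs plus a count of trailing zeros, is an assumption made for this example; the text does not specify a bitstream format.

```python
def rlc_encode(values):
    """Encode a sequence as (zero_run_length, nonzero_value) pairs.

    Returns the pairs and the number of zeros left after the last nonzero.
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def rlc_decode(pairs, trailing_zeros):
    """Rebuild the original sequence from the encoded pairs."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out


data = [0, 0, 0, 5, 0, 0, 1, 1, 0]
pairs, tail = rlc_encode(data)
print(pairs, tail)
print(rlc_decode(pairs, tail) == data)
```

Long zero runs collapse into a single pair, while adjacent nonzero values each cost a pair with run length 0, which is why the method compresses poorly when zeros are not consecutive.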

Because the output of the image encoder 110 contains the main features of the RGB image (boundaries, shape information, and so on), compression performance is secured by applying Huffman Coding (HFC), Arithmetic Coding (ARC), or a similar technique, which is slower than RLC but compresses well even when 0 values are not consecutive.
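For comparison with RLC, a minimal Python sketch of Huffman coding over quantized feature values is given below. It is a textbook construction using the standard-library `heapq` and `collections.Counter`; the function names and the use of a bit string rather than packed bytes are simplifications assumed for this example.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a prefix-code table {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)     # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:              # prefix-free, so first match is correct
            out.append(inverse[current])
            current = ""
    return out


features = [3, 3, 3, 3, 7, 7, 1, 9]
table = huffman_table(features)
bits = huffman_encode(features, table)
print(len(bits), huffman_decode(bits, table) == features)
```

Frequent values receive short codes regardless of where they occur in the sequence, so the gain does not depend on zeros being consecutive.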

In this way, the outputs of the image encoder 110 and the volume encoder 120 can be further compressed.

Because the volume decoder 130 may run either on a server or on a mobile terminal, the additional compression of the outputs of the image encoder 110 and the volume encoder 120 translates into better power performance on the terminal side. (Accessing external memory to fetch data consumes more than 10 times as much power as decoding the compressed data.) An HFC / ARC / RLC decoder must precede the volume decoder 130 to decode the compressed feature vectors.

FIG. 5 is a flowchart provided to explain a 3D human model reconstruction method according to another embodiment of the present invention.

As shown, the image encoder 110 first refines feature vectors from the input RGB image (S210), and the compressor provided at the rear end of the image encoder 110 compresses the image feature vectors refined in step S210 (S220).

The weights of the deep learning network implemented in the image encoder 110 have been quantized by adaptively applying the SS and AU quantization methods based on the mean, skewness, and kurtosis of the weight distribution. The same holds for the volume encoder 120 and the volume decoder 130 described below. The compression in step S220 uses the HFC/ARC technique.

The volume encoder 120 receives the SDF volume containing the depth image information and refines feature vectors from it (S230), and the compressor provided at the rear end of the volume encoder 120 compresses the volume feature vectors refined in step S230 (S240). The compression in step S240 uses the RLC technique.

The expander provided at the front end of the volume decoder 130 decompresses the image feature vectors compressed in step S220 and the volume feature vectors compressed in step S240 (S250).

The volume decoder 130 takes the image feature vectors and volume feature vectors decompressed in step S250 as input and reconstructs the 3D human model (S260).

FIG. 6 shows the hardware structure of a 3D human model reconstruction apparatus according to another embodiment of the present invention.

As shown, the 3D human model reconstruction apparatus according to an embodiment of the present invention can be implemented as a computing system including a communication unit 310, a processor 320, and a storage unit 330.

The communication unit 310 is a means for communicating with external devices and networks, and the processor 320 is a set of CPUs and GPUs that run the deep learning networks for 3D human model reconstruction shown in FIGS. 1, 3, and 4. The storage unit 330 provides the storage space that the processor 320 needs in order to operate.

So far, a lightweight deep learning network method and apparatus for 3D human model reconstruction have been described in detail with reference to preferred embodiments.

In the above embodiments, weight quantization techniques are applied adaptively to lighten the deep learning network for human model reconstruction, and an appropriate compression technique is applied according to the characteristics of the feature vectors.

Lightening and compressing a deep learning network in a way suited to human model reconstruction can be expected to reduce power consumption and improve compression.

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, the computer-readable code or program stored in the computer-readable recording medium may be transmitted over a network connecting computers.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or outlook of the present invention.

110: Image Encoder
120: Volume Encoder
130: Volume Decoder

Claims

1. A three-dimensional model reconstruction system comprising:
an image encoder that refines image feature vectors from an input image using a deep learning network;
a volume encoder that refines volume feature vectors from an input depth image using a deep learning network; and
a volume decoder that uses a deep learning network to reconstruct a 3D model from the image feature vectors refined in the image encoder and the volume feature vectors refined in the volume encoder,
wherein the weights of the deep learning networks of the image encoder, the volume encoder, and the volume decoder are quantized by adaptively applying a plurality of quantization methods.

2. The system according to claim 1,
wherein the deep learning networks of the image encoder, the volume encoder, and the volume decoder
adaptively apply a quantization method for each layer.

3. The system according to claim 2,
wherein the quantization methods
comprise a Symmetric + Signed (SS) quantization method and an Asymmetric + unsigned (AU) quantization method.

4. The system according to claim 3,
wherein the quantization method is applied based on the mean, skewness, and kurtosis of the weight distribution.

5. The system according to claim 4,
wherein the SS quantization method is applied if the skewness is within a specific range, and the AU quantization method is applied if the skewness is outside the specific range, and
the number of quantization bits is determined based on the magnitude of the kurtosis.

6. The system according to claim 1, further comprising:
an image feature vector compressor that compresses the image feature vectors at the rear end of the image encoder;
a volume feature vector compressor that compresses the volume feature vectors at the rear end of the volume encoder; and
an expander that decompresses the compressed image feature vectors and volume feature vectors at the front end of the volume decoder.

7. The system according to claim 6,
wherein the image feature vector compressor
compresses the image feature vectors using either Huffman Coding (HFC) or Arithmetic Coding (ARC), and
the volume feature vector compressor
compresses the volume feature vectors using a run-length coding (RLC) technique.

8. A three-dimensional model reconstruction method comprising:
refining image feature vectors from an input image using a deep learning network;
refining volume feature vectors from an input depth image using a deep learning network; and
reconstructing, using a deep learning network, a three-dimensional model from the image feature vectors refined in an image encoder and the volume feature vectors refined in a volume encoder,
wherein the weights of the deep learning networks of the image encoder, the volume encoder, and a volume decoder are quantized by adaptively applying a plurality of quantization methods.