KR20180097949A

KR20180097949A - The estimation and refinement of pose of joints in human picture using cascade stages of multiple convolutional neural networks

Info

Publication number: KR20180097949A
Application number: KR1020170024781A
Authority: KR
Inventors: 오치민
Original assignee: 오치민
Priority date: 2017-02-24
Filing date: 2017-02-24
Publication date: 2018-09-03

Abstract

The present invention relates to a method for predicting joint posture in an image and, more particularly, to a method that uses sequential multiple convolutional neural networks: the training data and the convolutional neural network are optimized through the sequential networks, after which human joint coordinates in an image can be predicted precisely.

Description

Method for predicting joint posture in an image using sequential multiple convolutional neural networks

The present invention relates to a method for predicting joint posture in an image and, more particularly, to a method using sequential multiple convolutional neural networks in which the training data and the convolutional neural network are optimized through the sequential networks so that the joint coordinates of a person in an image can then be predicted precisely.

Neural networks are used in many fields of artificial intelligence, including pattern recognition, computer vision, natural language processing (NLP), and robotics.

Among deep learning algorithms, the convolutional neural network (CNN) performs well in image and speech recognition, readily accepts two-dimensional data as input, and is easy to train, so it has been the subject of active research in recent years.

Joint posture prediction in images using convolutional neural networks is applied in fields such as surveillance, security, and sports, and efforts are being made to improve its accuracy.

1. Korean Patent No. 10-1657495, 'Modularization System for Deep Learning Analysis and Image Recognition Method Using the Same'
2. Korean Patent No. 10-0442835, 'Face Recognition Method and Apparatus Using Artificial Neural Network'

The present invention was devised to meet the need described above. Its object is to provide a method for predicting joint posture in an image using sequential multiple convolutional neural networks that can accurately predict the positions of a person's joints in an image of that person.

To achieve this object, the present invention provides a method for predicting joint posture in an image using sequential multiple convolutional neural networks, the method comprising:
(a) generating training data (Xn, Yn) whose elements are a plurality of images (Xn) and the joint coordinates (Yn) in each image;
(b) inputting an arbitrary training sample (Xi, Yi) from the training data into a convolutional neural network (CNN) and predicting joint coordinates;
(c) calculating an error value between the predicted joint coordinates (Yi') and the joint coordinates (Yi) of the arbitrary training sample, and comparing the error value with a preset reference value;
(d) when the error value is larger than the reference value, calculating a weight correction value using the error back-propagation algorithm and adding the weight correction value to the weights of the convolutional neural network to generate a new, weight-corrected convolutional neural network (CNN');
(e) adding noise to the predicted joint coordinates (Yi') to generate a plurality of predicted joint coordinates (Yi"), and adding noise to the original joint coordinates (Yn) of the training data (Xn, Yn) to generate a plurality of original joint coordinates (Yn');
(f) cropping, from the original images (Xn) of the training data (Xn, Yn), a region of interest that contains the predicted joint coordinates (Yi") to generate partial images (Xn');
(g) generating new training data (Xn', Yn') whose elements are the partial images (Xn') and the noise-added original joint coordinates (Yn');
(h) repeating steps (b) through (g), replacing the training data (Xn, Yn) with the new training data (Xn', Yn'), until the error value in step (c) becomes smaller than the reference value;
(i) setting the training data (Xn", Yn") and the convolutional neural network (CNN") generated when the error value in step (c) becomes smaller than the reference value as the final training data and the final convolutional neural network; and
(j) inputting a target image whose joint posture is to be predicted into the final convolutional neural network to predict the joint posture of the target image.

In a preferred embodiment, the step in which the convolutional neural network (CNN) predicts joint coordinates comprises: passing the input training data through convolution filters to generate a plurality of convolution layers; converting, through a rectified linear unit (ReLU), values of '0' or less in each convolution layer to '0'; reducing each convolution layer by a max operation or a mean operation to generate pooling layers; fully connecting the pooling layers to generate a one-dimensional feature vector; and multiplying the one-dimensional feature vector by weights, while varying the weights, to output as many predicted joint coordinates as there are joint coordinates.

In a preferred embodiment, there are 14 joint coordinates in total, each having an x coordinate and a y coordinate, and they include the coordinates of the top of the head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left pelvis, right pelvis, left knee, right knee, left foot, and right foot.

The present invention further provides a computer program, stored on a recording medium, that causes a computer to execute the method for predicting joint posture in an image using sequential multiple convolutional neural networks.

The present invention further provides a computer on which the computer program is installed and which performs the method for predicting joint posture in an image using sequential multiple convolutional neural networks.

The present invention has the following advantageous effects.

The joint posture prediction method according to an embodiment of the present invention sequentially generates new convolutional neural networks and new training data until the prediction error falls to or below the reference value, and can therefore predict joint coordinates in an image accurately.

In addition, because cropping regenerates the training data so that it contains, as far as possible, only the image of the person, the method can predict joint posture more accurately in real images of people.

FIG. 1 is a flowchart of a method for predicting joint posture in an image using sequential multiple convolutional neural networks according to an embodiment of the present invention.
FIG. 2 illustrates the process of generating new training data through the sequential multiple convolutional neural networks in the method according to an embodiment of the present invention.
FIG. 3 illustrates the process by which a convolutional neural network predicts joint coordinates in the method according to an embodiment of the present invention.
FIG. 4 illustrates joint coordinate prediction results obtained by the method according to an embodiment of the present invention.

The terms used in the present invention are, as far as possible, general terms in wide current use. In specific cases, however, terms have been chosen arbitrarily by the applicant; such terms should be understood according to the meaning described or used in the detailed description rather than by their names alone.

The technical configuration of the present invention is described in detail below with reference to the preferred embodiments shown in the accompanying drawings.

The present invention is not, however, limited to the embodiments described here and may be embodied in other forms. Like reference numerals designate like elements throughout the specification.

The method for predicting joint posture in an image using sequential multiple convolutional neural networks according to an embodiment of the present invention (hereinafter, the 'joint posture prediction method') can accurately predict the positions of a person's joints in an input image through a plurality of newly generated convolutional neural networks.

Here, predicting the joint posture means the same as predicting the joint positions in the image.

The joint posture prediction method according to an embodiment of the present invention is performed by a computer, which stores a computer program that causes the computer to perform the method.

The computer is a computer in the broad sense: not only a general personal computer but also a server computer accessible over a communication network, a cloud system, a smart device such as a smartphone or tablet PC, and an embedded system built into a security camera or the like.

The computer program may be provided stored on a separate recording medium, which may be one specially designed and configured for the present invention or one known and available to those of ordinary skill in the computer software field.

For example, the recording medium may be a hardware device specially configured to store and execute program instructions, alone or in combination: magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CDs and DVDs; magneto-optical recording media capable of both magnetic and optical recording; ROM; RAM; flash memory; and the like.

The computer program may consist of program instructions, local data files, local data structures, and the like, alone or in combination, and may be written not only in machine code such as that produced by a compiler but also in high-level language code that a computer can execute using an interpreter or the like.

The joint posture prediction method according to an embodiment of the present invention is described in detail below with reference to FIG. 1.

Referring to FIG. 1, the joint posture prediction method according to an embodiment of the present invention broadly comprises a process (S1000) of sequentially generating new training data and new convolutional neural networks, and a process (S2000) of predicting the joint posture in an actual image after the final training data and the final convolutional neural network have been generated.

The process (S1000) of sequentially generating new training data and new convolutional neural networks begins by generating training data (S1100).

The training data (Xn, Yn) is the initial training data and consists of images (Xn) and the joint coordinates (Yn) of the person in each image.

Each image is mapped to one set of joint coordinates, and the data is generated for thousands to tens of thousands of images.

For example, a total of 14 joint coordinates are mapped to one image, and each joint coordinate consists of an x coordinate and a y coordinate in the two-dimensional image.

That is, the joint coordinates (Yn) contain 28 coordinate values.

The 14 coordinates may be those of the person's top of head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left pelvis, right pelvis, left knee, right knee, left foot, and right foot.

However, the number of coordinates and the joint positions may be changed by the designer.

In other words, the training data (Xn, Yn) is a data set consisting of a plurality of images (Xn) and the joint coordinates (Yn) mapped to each image.
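The label layout described above (14 joints, each with an x and a y value, giving 28 numbers per image) can be sketched as follows. The joint identifiers and the interleaved ordering `[x0, y0, x1, y1, ...]` are assumptions made for illustration; the patent lists the joints but does not fix names or a storage order.

```python
# Joint order follows the list in the description; the identifiers are illustrative.
JOINT_NAMES = [
    "head_top", "neck",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_pelvis", "right_pelvis",
    "left_knee", "right_knee",
    "left_foot", "right_foot",
]


def flatten_joints(joints):
    """Turn {name: (x, y)} into a flat 28-value list [x0, y0, x1, y1, ...]."""
    flat = []
    for name in JOINT_NAMES:
        x, y = joints[name]
        flat.extend([float(x), float(y)])
    return flat


def unflatten_joints(flat):
    """Inverse of flatten_joints: rebuild {name: (x, y)} from the flat list."""
    return {name: (flat[2 * k], flat[2 * k + 1]) for k, name in enumerate(JOINT_NAMES)}


# A made-up label for one image, used only to show the round trip.
example = {name: (10 * k, 5 * k) for k, name in enumerate(JOINT_NAMES)}
y_i = flatten_joints(example)
```

A training set would then be a list of `(image, y_i)` pairs, one per image.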

Next, an arbitrary training sample (Xi, Yi) from the training data (Xn, Yn) is input into a convolutional neural network (CNN), and the values output by the CNN are taken as the predicted joint coordinates (Yi') for that sample.

The convolutional neural network (CNN) is a deep learning pattern recognition algorithm in wide recent use and is known to achieve very high recognition rates in image and speech recognition.

In more detail, referring to FIG. 3, the CNN first passes the arbitrary training data (100) through convolution filters having predetermined weights to generate a plurality of convolution layers (200), which are feature images.
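As a rough illustration of the convolution step, the sketch below slides a small filter over a single-channel image with no padding and a stride of 1, producing one feature image per filter. The function name `conv2d_valid`, the filter values, and the single-channel input are illustrative assumptions; the patent does not specify filter sizes, padding, or stride.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# A tiny 4x4 "image" and two hypothetical filters.
image = np.arange(16, dtype=float).reshape(4, 4)
kernels = [
    np.array([[1.0, 0.0], [0.0, -1.0]]),   # diagonal difference
    np.ones((2, 2)) / 4.0,                 # 2x2 average
]
feature_maps = [conv2d_valid(image, k) for k in kernels]   # one map per filter
```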

Next, a rectified linear unit (ReLU) converts the pixel values of each convolution layer (200) that are '0' or less, which carry no meaning in the neural network, to '0', generating rectified linear convolution layers (300).
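The rectification step amounts to an element-wise maximum with zero. A minimal sketch, with `relu` as an illustrative function name:

```python
import numpy as np


def relu(feature_map):
    """Replace every value at or below zero with zero; keep positive values."""
    return np.maximum(feature_map, 0.0)


rectified = relu(np.array([[-2.0, 0.0], [3.5, -0.1]]))
```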

Next, the size of each convolution layer is reduced by a max operation or a mean operation to generate a pooling layer (400).

In the present invention, the maximum value among the pixels in each 2×2 window is selected, halving the number of pixels and thereby producing a max pooling layer of half the size.
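A 2×2 max pooling step can be sketched as below. The sketch assumes non-overlapping windows (a stride of 2), which halves both the height and the width; the patent does not state the stride, and `max_pool_2x2` is an illustrative name.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]          # drop an odd last row/column
    # Axis 1 and axis 3 index the position inside each 2x2 window.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))


pooled = max_pool_2x2(np.arange(16, dtype=float).reshape(4, 4))
```

Under this assumption a 4×4 input becomes a 2×2 output, so the side length is halved.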

Next, the process of generating convolution layers with each pooling layer (400) as the input image, rectifying them, and reducing them again is repeated (400') to obtain pooling layers (500) of the desired size and number.

Next, the pooling layers (500) are fully connected to generate a single one-dimensional feature vector (600).

The one-dimensional feature vector (600) is also called a fully connected layer.

Next, the inner product of the one-dimensional feature vector (600) and the weights (700) is taken to calculate one real number (800), and this calculation is repeated (800) while the weights (700) are changed.

Each real number (800) is the x coordinate or y coordinate of a joint coordinate predicted in the image (Xi) of the arbitrary training sample (Xi, Yi).

In the present invention there are 14 joint coordinates in total, each with an x coordinate and a y coordinate, so the calculation is repeated until a total of 28 real numbers (800) are obtained.

That is, the CNN outputs the 28 real numbers (800) as the predicted joint coordinates (Yi').
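The final step described above, flattening the pooling layers into one feature vector and taking one inner product per output value, can be sketched as follows. The number and size of the pooled maps, the random weights, and the name `predict_coordinates` are assumptions for illustration only.

```python
import numpy as np

NUM_OUTPUTS = 28  # 14 joints x (x, y)


def predict_coordinates(pooled_maps, weights):
    """Flatten the pooled maps and take one inner product per output value."""
    feature_vector = np.concatenate([m.ravel() for m in pooled_maps])
    # One weight row per output: repeating the inner product with different weights.
    return np.array([np.dot(w_row, feature_vector) for w_row in weights])


rng = np.random.default_rng(0)
pooled_maps = [rng.random((3, 3)) for _ in range(4)]   # 4 maps of 3x3 -> 36 features
weights = rng.normal(size=(NUM_OUTPUTS, 36))           # untrained, arbitrary values
y_pred = predict_coordinates(pooled_maps, weights)     # 28 predicted values
```

Stacking the 28 weight vectors into one matrix makes the loop equivalent to a single matrix-vector product.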

The weights used in the CNN's computation, both when calculating the convolution layers (200) and when taking the inner product with the one-dimensional feature vector (600), are initially either arbitrary values that are not separately specified or preset default values.

Referring again to FIG. 1, after the CNN has predicted the joint coordinates, an error value, which is the difference (Yi-Yi') between the joint coordinates (Yi) of the arbitrary training sample (Xi, Yi) and the predicted joint coordinates (Yi'), is calculated (S1300). If the calculated error value is larger than the reference value, a weight correction value (ΔW) is calculated through the error back-propagation algorithm (S1500).
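The patent defines the error as the difference Yi-Yi' but does not say how the 28 differences are reduced to a single number for comparison with the reference value. The sketch below assumes the mean Euclidean distance per joint; `error_value` and `REFERENCE_VALUE` are hypothetical names and the threshold is made up.

```python
import math


def error_value(y_true, y_pred):
    """Mean Euclidean distance per joint (an assumed reduction of Yi - Yi')."""
    num_joints = len(y_true) // 2
    total = 0.0
    for k in range(num_joints):
        dx = y_true[2 * k] - y_pred[2 * k]
        dy = y_true[2 * k + 1] - y_pred[2 * k + 1]
        total += math.hypot(dx, dy)
    return total / num_joints


REFERENCE_VALUE = 2.0  # hypothetical threshold, in pixels

# Two joints only, to keep the arithmetic easy to follow.
y_true = [10.0, 10.0, 20.0, 20.0]
y_pred = [13.0, 14.0, 20.0, 20.0]
err = error_value(y_true, y_pred)          # distances 5 and 0, mean 2.5
needs_update = err > REFERENCE_VALUE       # True: continue to back-propagation
```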

Here, the error back-propagation algorithm is a known algorithm for training feed-forward neural networks.

If, however, the error value is smaller than the reference value, the training data (Xn, Yn) and the convolutional neural network (CNN) are set as the final training data (Xn", Yn") and the final convolutional neural network (CNN").

The final training data (Xn", Yn") and the final convolutional neural network (CNN") are used to predict the joint posture in the target image whose actual joint posture is to be predicted.

In practice, however, the initially generated training data (Xn, Yn) and convolutional neural network (CNN) are almost never set as the final training data (Xn", Yn") and the final convolutional neural network (CNN").

Next, the weight correction value (ΔW) is added to the weights used in the convolutional neural network (CNN) to generate a new, weight-corrected convolutional neural network (CNN') (S1600).
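The update W' = W + ΔW can be illustrated for the last (linear) layer alone. The sketch assumes a squared-error loss and plain gradient descent with a hypothetical learning rate; the patent only says that ΔW comes from error back-propagation, and a full network would propagate the gradient through the earlier layers as well.

```python
import numpy as np


def weight_correction(weights, feature_vector, y_true, learning_rate=0.01):
    """Delta-W for a linear output layer under the loss 0.5 * ||W f - y||^2."""
    y_pred = weights @ feature_vector
    grad = np.outer(y_pred - y_true, feature_vector)   # dLoss/dW
    return -learning_rate * grad


feature_vector = np.array([1.0, 2.0])
weights = np.zeros((2, 2))
y_true = np.array([1.0, -1.0])

delta_w = weight_correction(weights, feature_vector, y_true, learning_rate=0.1)
new_weights = weights + delta_w            # weights of the corrected network CNN'

error_before = np.linalg.norm(weights @ feature_vector - y_true)
error_after = np.linalg.norm(new_weights @ feature_vector - y_true)
```

In this toy case one update moves the prediction from (0, 0) to (0.5, -0.5), so the error shrinks.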

Next, new training data (Xn', Yn') is generated from the initially generated training data (Xn, Yn).

This process is the core of the present invention: by making the training data contain, as far as possible, only the image of the person, with that image carrying the joint coordinates, it improves prediction accuracy when joint posture is later predicted from actual images of people.

Describing this in more detail with reference to FIG. 2, noise is first added to the joint coordinates (Yi') '(B)' predicted by the convolutional neural network (CNN) to generate a plurality of noise-added predicted joint coordinates (Yi") '(C)'.

Separately, noise is added to the original joint coordinates (Yn), which are the joint coordinates of the training data (Xn, Yn) '(A)', to generate noise-added original joint coordinates (Yn') '(D)' (S1700).

The noise-added original joint coordinates (Yn') are generated for every original joint coordinate (Yn) of the training data (Xn, Yn).
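Generating several noisy copies of one coordinate vector can be sketched as below. Zero-mean Gaussian noise, the standard deviation, and the number of copies are assumptions; the patent does not specify the noise distribution or its magnitude.

```python
import numpy as np


def add_noise(coords, num_copies, sigma, rng):
    """Return `num_copies` noisy versions of a flat coordinate vector."""
    coords = np.asarray(coords, dtype=float)
    noise = rng.normal(0.0, sigma, size=(num_copies, coords.size))
    return coords + noise                    # broadcasts coords over every copy


rng = np.random.default_rng(0)
y_pred = np.linspace(0.0, 27.0, 28)          # stand-in for predicted coordinates Yi'
noisy_predictions = add_noise(y_pred, num_copies=5, sigma=2.0, rng=rng)
```

The same helper could be applied to the original labels Yn to obtain Yn'.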

Next, a region of interest containing the plurality of noise-added predicted joint coordinates (Yi") is cropped from the original image (Xn) of the training data (Xn, Yn) to generate a partial image (Xn') (S1800).

For example, the partial image (Xn') may be extracted as the rectangular region formed by connecting the outermost coordinates (top, bottom, left, and right) among the plurality of noise-added predicted joint coordinates (Yi"), or as a larger rectangular region that contains this rectangle and extends beyond it by a predetermined size.
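The cropping rectangle described in this example can be computed as the bounding box of all noisy predicted coordinates, optionally widened by a margin. In the sketch below the margin value, the clamping to the image borders, and the name `crop_box` are illustrative assumptions.

```python
import math


def crop_box(noisy_predictions, margin, image_width, image_height):
    """Bounding box (left, top, right, bottom) around all noisy coordinates.

    Each prediction is a flat list [x0, y0, x1, y1, ...]. The box is widened by
    `margin` pixels on every side and clamped to the image borders.
    """
    xs = [v for pred in noisy_predictions for v in pred[0::2]]
    ys = [v for pred in noisy_predictions for v in pred[1::2]]
    left = max(0, math.floor(min(xs)) - margin)
    top = max(0, math.floor(min(ys)) - margin)
    right = min(image_width, math.ceil(max(xs)) + margin)
    bottom = min(image_height, math.ceil(max(ys)) + margin)
    return left, top, right, bottom


# Two noisy predictions with two joints each, in a 100x100 image.
predictions = [
    [10.2, 20.7, 30.0, 40.0],
    [12.0, 18.5, 28.0, 44.9],
]
box = crop_box(predictions, margin=5, image_width=100, image_height=100)
```

With an image stored as a NumPy array, the partial image would then be `image[top:bottom, left:right]`.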

The partial image is generated for every original image (Xn) of the training data (Xn, Yn).

Next, new training data (Xn', Yn') '(E)' is generated whose elements are the partial images (Xn') and the noise-added original joint coordinates (Yn') (S1900).

Because the images in the new training data (Xn', Yn') contain only the region of the person, accuracy improves when predicting joint posture in actual images of people.

Because the size (H'×W') of the partial image (Xn') is smaller than the size (H×W) of the original image (Xn), the partial image is normalized to the size of the original image, and the coordinate values of the noise-added original joint coordinates (Yn') are corrected by the same scale as the normalization so that the two map to each other, thereby generating the new training data (Xn', Yn').
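The coordinate correction that accompanies the resizing can be sketched as a shift by the crop origin followed by a scale to the output size. The function name `remap_coordinates` and the example numbers are illustrative; resizing the pixel data itself is left to whatever image library is in use.

```python
def remap_coordinates(coords, box, out_width, out_height):
    """Map coordinates from the original image into the resized crop.

    `coords` is a flat list [x0, y0, x1, y1, ...] in original-image pixels and
    `box` is (left, top, right, bottom) of the cropped region.
    """
    left, top, right, bottom = box
    scale_x = out_width / (right - left)     # W / W'
    scale_y = out_height / (bottom - top)    # H / H'
    remapped = []
    for k in range(0, len(coords), 2):
        remapped.append((coords[k] - left) * scale_x)
        remapped.append((coords[k + 1] - top) * scale_y)
    return remapped


# A 50x50 crop scaled back up to a 100x100 image doubles every offset.
new_coords = remap_coordinates(
    [10.0, 20.0, 35.0, 45.0, 60.0, 70.0],
    box=(10, 20, 60, 70),
    out_width=100,
    out_height=100,
)
```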

Next, the process returns to the step (S1200) of inputting the arbitrary training sample (Xi, Yi) into the convolutional neural network (CNN) to predict joint coordinates, and the generation of new training data and new convolutional neural networks is repeated until the error value of the predicted joint coordinates (Yi') becomes smaller than the reference value.

Here, the arbitrary training sample (Xi, Yi) and the convolutional neural network (CNN) are taken from the most recently generated training data (Xn', Yn') and convolutional neural network (CNN').

When the error value of the predicted joint coordinates (Yi') becomes smaller than the reference value, the most recently generated training data and convolutional neural network are set as the final training data (Xn", Yn") and the final convolutional neural network (CNN") (S1400).
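The overall loop of steps S1200 to S1900 can be outlined as below. Every callable is a hypothetical stand-in supplied by the caller, the toy "model" is a single number, and the `max_stages` safety stop is an addition for the sketch rather than something the patent describes.

```python
def train_cascade(data, model, predict, error_value, update_model,
                  regenerate_data, reference_value, max_stages=20):
    """Repeat predict -> compare -> update -> regenerate until the error is small.

    Returns (final_data, final_model, stages_run).
    """
    for stage in range(1, max_stages + 1):
        x_i, y_i = data[0]                       # one sample; the choice is arbitrary
        y_pred = predict(model, x_i)             # S1200
        if error_value(y_i, y_pred) < reference_value:    # S1300
            return data, model, stage            # S1400: keep current data and model
        model = update_model(model, y_i, y_pred)          # S1500 to S1600
        data = regenerate_data(data, y_pred)              # S1700 to S1900
    return data, model, max_stages               # safety stop, not part of the patent


# Toy stand-ins: the "model" is one number that should approach the label 10.
final_data, final_model, stages = train_cascade(
    data=[("image-0", 10.0)],
    model=0.0,
    predict=lambda m, x: m,
    error_value=lambda y, y_pred: abs(y - y_pred),
    update_model=lambda m, y, y_pred: m + 0.5 * (y - y_pred),
    regenerate_data=lambda d, y_pred: d,         # real code would crop and re-label
    reference_value=1.0,
)
```

In this toy run the prediction moves 0, 5, 7.5, 8.75, 9.375, and the loop stops at the fifth stage, when the error first falls below the reference value.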

FIG. 4 shows the results of predicting joint coordinates as the training data (Xi, Yi) and the convolutional neural network (CNN) are newly generated.

Initial stage 1 is the result of predicting joint coordinates with the initial convolutional neural network (CNN): the red points are the joint coordinates (Yn) of the initial training data (Xn, Yn), and the green points are the predicted joint coordinates (Yi').

Stage 2 is the result of predicting joint coordinates with the convolutional neural network (CNN') after it has been regenerated once: the red points are the joint coordinates (Yn') of the newly generated training data (Xn', Yn'), and the green points are the predicted joint coordinates (Yi').

Stage 3 is the result of predicting joint coordinates with the convolutional neural network (CNN') after it has been regenerated once more: the red points are the joint coordinates (Yn') of the newly generated training data (Xn', Yn'), and the green points are the predicted joint coordinates (Yi').

As FIG. 4 shows, as the convolutional neural networks are sequentially regenerated and prediction is repeated, the predicted joint coordinates (Yi') approach the joint coordinates (Yn') of the target training data (Xn', Yn').

Referring again to FIG. 1, once the final training data (Xn", Yn") and the final convolutional neural network (CNN") have been determined, the target image whose joint posture is to be predicted is received (S2100).

Next, the target image is input into the final convolutional neural network (CNN"), the output joint coordinates are taken as the predicted joint posture (S2200), and the process ends.

Accordingly, the joint posture prediction method according to an embodiment of the present invention has the advantage of minimizing the error of joint posture prediction by using sequentially regenerated multiple convolutional neural networks.

Although the present invention has been shown and described with reference to preferred embodiments, it is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains may make various changes and modifications without departing from the spirit of the invention.

100: training data
200: convolution layer
300: rectified linear convolution layer
400: pooling layer
600: one-dimensional feature vector

Claims

1. A method for predicting joint posture in an image using sequential multiple convolutional neural networks, the method comprising:
(a) generating training data (Xn, Yn) whose elements are a plurality of images (Xn) and the joint coordinates (Yn) in each image;
(b) inputting an arbitrary training sample (Xi, Yi) from the training data into a convolutional neural network (CNN) and predicting joint coordinates;
(c) calculating an error value between the predicted joint coordinates (Yi') and the joint coordinates (Yi) of the arbitrary training sample, and comparing the error value with a preset reference value;
(d) when the error value is larger than the reference value, calculating a weight correction value using an error back-propagation algorithm and adding the weight correction value to the weights of the convolutional neural network to generate a new, weight-corrected convolutional neural network (CNN');
(e) adding noise to the predicted joint coordinates (Yi') to generate a plurality of predicted joint coordinates (Yi"), and adding noise to the original joint coordinates (Yn) of the training data (Xn, Yn) to generate a plurality of original joint coordinates (Yn');
(f) cropping, from the original images (Xn) of the training data (Xn, Yn), a region of interest that contains the predicted joint coordinates (Yi") to generate partial images (Xn');
(g) generating new training data (Xn', Yn') whose elements are the partial images (Xn') and the noise-added original joint coordinates (Yn');
(h) repeating steps (b) through (g), replacing the training data (Xn, Yn) with the new training data (Xn', Yn'), until the error value in step (c) becomes smaller than the reference value;
(i) setting the training data (Xn", Yn") and the convolutional neural network (CNN") generated when the error value in step (c) becomes smaller than the reference value as the final training data and the final convolutional neural network; and
(j) inputting a target image whose joint posture is to be predicted into the final convolutional neural network to predict the joint posture of the target image.

2. The method according to claim 1,
wherein the step in which the convolutional neural network (CNN) predicts joint coordinates comprises:
passing input training data through convolution filters to generate a plurality of convolution layers;
converting, through a rectified linear unit (ReLU), values of '0' or less in each convolution layer to '0';
reducing each convolution layer by a max operation or a mean operation to generate pooling layers;
fully connecting the pooling layers to generate a one-dimensional feature vector; and
multiplying the one-dimensional feature vector by weights, while varying the weights, to output as many predicted joint coordinates as there are joint coordinates.

3. The method of claim 2,
wherein there are 14 joint coordinates in total, each having an x coordinate and a y coordinate, and the joint coordinates include the coordinates of the top of the head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left pelvis, right pelvis, left knee, right knee, left foot, and right foot.

4. A computer program stored on a recording medium that causes a computer to execute the method for predicting joint posture in an image using sequential multiple convolutional neural networks according to any one of claims 1 to 3.

5. A computer on which the computer program of claim 4 is installed and which performs the method for predicting joint posture in an image using sequential multiple convolutional neural networks.