WO2022255642A1 - Weight-reduced hand joint prediction method and device for implementation of real-time hand motion interface of augmented reality glass device - Google Patents

Weight-reduced hand joint prediction method and device for implementation of real-time hand motion interface of augmented reality glass device

Info

Publication number
WO2022255642A1
Authority
WO
WIPO (PCT)
Prior art keywords
hand
joint
augmented reality
real
prediction
Prior art date
Application number
PCT/KR2022/005823
Other languages
French (fr)
Korean (ko)
Inventor
최치원
조성동
김정환
백지엽
민경진
이강휘
Original Assignee
주식회사 피앤씨솔루션
Priority date
Filing date
Publication date
Application filed by 주식회사 피앤씨솔루션
Publication of WO2022255642A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01 Head-up displays
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01 Head-up displays
    • G02B 27/017 Head mounted
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/002 Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G06F 3/005 Input arrangements through a video camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • The present invention relates to a hand joint prediction method and device, and more particularly, to a lightweight hand joint prediction method and device for implementing a real-time hand motion interface in an augmented reality glasses device.
  • A head mounted display (HMD), a kind of wearable device, refers to a device that can be worn on a user's head to receive multimedia content and the like.
  • The head mounted display (HMD) is worn on the user's body and provides images to the user in various environments as the user moves.
  • Head-mounted displays (HMDs) are classified into see-through and see-closed types.
  • The see-through type is mainly used for augmented reality (AR), and the closed type is mainly used for virtual reality (VR).
  • For an HMD for augmented reality (hereinafter, an augmented reality glasses device), gesture (hand motion) recognition is important in order to interact with the wearer easily and conveniently without a separate input device.
  • To implement a hand gesture interface that controls the augmented reality glasses device with hand gestures, the hand gestures must first be detected accurately.
  • As prior art related to the present invention, Korean Patent Registration No. 10-2102309 (Title of Invention: Object Recognition Method for 3D Virtual Space of Head-Worn Display Device; Registration Date: April 13, 2020) has been disclosed.
  • The present invention is proposed to solve the above problems of the previously proposed methods: it detects candidate keypoints of hand joints in the entire input image without a hand detection process, and then predicts at least one hand joint included in the input image based on the correlations between the candidate keypoints using a joint evaluation model.
  • In this way, a plurality of hand joints can be predicted from a single round of candidate keypoint detection, without separate hand region detection and without a joint prediction process in each detected hand region.
  • The hand joint prediction process can therefore be simplified and lightened, and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment. Its purpose is to provide a lightweight hand joint prediction method and device for implementing a real-time hand motion interface of an augmented reality glasses device with these properties.
  • A lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device, according to a feature of the present invention for achieving the above object, is
  • a hand joint prediction method in which each step is performed in the augmented reality glasses device to implement the real-time hand motion interface, comprising: (1) storing a joint evaluation model that has learned correlations between hand joints based on an artificial neural network; (2) detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device; (3) evaluating the correlations between the candidate keypoints detected in step (2) using the joint evaluation model stored in step (1), and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and
  • (4) predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of step (3), wherein at least one hand joint included in the input image is predicted by performing steps (2) to (4) once.
  • Preferably, the joint evaluation model of step (1) may be configured by learning a hand joint point map and a joint relationship map based on an artificial neural network.
  • Preferably, in step (3), using the correlations between the candidate keypoints, the joint points may be selected by classifying them for each hand included in the input image, and the connection relationships between the joint points may be determined within each classified group.
  • Preferably, in step (2), the candidate keypoints are detected from a two-dimensional input image captured by the augmented reality glasses device,
  • and in step (4), three-dimensional hand joints may be predicted.
  • A lightweight hand joint prediction device for implementing a real-time hand motion interface of an augmented reality glasses device, according to a feature of the present invention for achieving the above object, is
  • a hand joint prediction device mounted on the augmented reality glasses device to implement the real-time hand motion interface, comprising:
  • a model storage unit for storing a joint evaluation model that has learned correlations between hand joints based on an artificial neural network; and a prediction unit for predicting at least one hand joint from an input image captured by the augmented reality glasses device.
  • The prediction unit includes:
  • a detection module for detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device;
  • a determination module for evaluating, using the joint evaluation model stored in the model storage unit, the correlations between the candidate keypoints detected by the detection module, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and
  • a prediction module for predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of the determination module.
  • The prediction unit is characterized in that it predicts at least one hand joint included in the input image by operating the detection module, the determination module, and the prediction module sequentially, once.
  • Preferably, the model storage unit
  • may store the joint evaluation model configured by learning a hand joint point map and a joint relationship map based on an artificial neural network.
  • Preferably, the determination module,
  • using the correlations between the candidate keypoints, may select the joint points by classifying them for each hand included in the input image, and may determine the connection relationships between the joint points within each classified group.
  • Preferably, the detection module detects the candidate keypoints from a two-dimensional input image captured by the augmented reality glasses device,
  • and the prediction module may predict three-dimensional hand joints.
  • According to the proposed method and device, candidate keypoints of hand joints are detected in the entire input image without a hand detection process, and then at least one hand joint included in the input image is predicted based on the correlations between the candidate keypoints using the joint evaluation model.
  • FIG. 1 is a diagram showing the configuration of an augmented reality glasses device equipped with a lightweight hand joint prediction device for implementing a real-time hand motion interface of the augmented reality glasses device according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the configuration of a lightweight hand joint prediction device for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating a prediction process according to a conventional hand joint prediction method.
  • FIG. 5 is a diagram showing a simplified flow of a conventional hand joint prediction method.
  • FIG. 6 is a diagram showing a simplified flow of a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating a prediction process according to a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
  • FIG. 1 is a diagram showing the configuration of an augmented reality glasses device 10 equipped with a lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention.
  • As shown in FIG. 1, the lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention may be mounted on the augmented reality glasses device 10.
  • That is, the augmented reality glasses device 10 may include the hand joint prediction device 100 to implement a real-time hand motion interface. More specifically, the hand joint prediction device 100 predicts hand joints from an input image captured by the camera 200 of the augmented reality glasses device 10 and transmits the predicted hand joints to the controller 300, so that the controller 300 can process the hand motion interface corresponding to the predicted hand joints.
  • Here, a hand joint refers to a skeleton-like frame connecting the joint points constituting the hand,
  • so predicting a hand joint may mean predicting the plurality of joint points constituting the hand and the connection relationships between those joint points.
  • Hand joint prediction estimates the skeletal shape of the hand so that a hand motion can be constructed from the estimated skeleton and used for the hand motion interface.
  • As shown in FIG. 2, the lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention is a hand joint prediction device 100 mounted on the augmented reality glasses device 10, and may comprise a model storage unit 110 for storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network, and a prediction unit 120 for predicting at least one hand joint from an input image captured by the augmented reality glasses device 10.
  • The prediction unit 120 includes a detection module 121 for detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device 10; a determination module 122 for evaluating, using the joint evaluation model 111 stored in the model storage unit 110, the correlations between the candidate keypoints detected by the detection module 121, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and a prediction module 123 for predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of the determination module 122. The prediction unit 120 may predict at least one hand joint included in the input image by operating the detection module 121, the determination module 122, and the prediction module 123 sequentially, once (a code-level sketch of this module structure follows below).
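  • For illustration only, the module structure above can be sketched in code. The following minimal Python outline is a hedged stand-in rather than the patented implementation: the class names, the dummy candidate keypoints, and the distance-based pair score are all assumptions introduced here to show how the detection module 121, determination module 122, and prediction module 123 could run sequentially, once per frame.

```python
import numpy as np

class JointEvaluationModel:
    """Stand-in for the trained joint evaluation model (111). Instead of a
    learned network, this dummy scores keypoint pairs by image distance."""
    def pairwise_score(self, p, q):
        return 1.0 / (1.0 + np.linalg.norm(np.asarray(p) - np.asarray(q)))

class HandJointPredictor:
    """Mirrors the described structure: a model storage unit (110) holding
    the model (111) and a prediction unit (120) with three modules."""
    def __init__(self, model):
        self.model = model  # model storage unit (110)

    def detect(self, frame):
        # Detection module (121): candidate keypoints (x, y, joint type)
        # over the WHOLE frame; fixed dummy values stand in for a detector.
        return [(10, 12, 0), (14, 18, 1), (90, 40, 0), (95, 47, 1)]

    def determine(self, candidates):
        # Determination module (122): keep pairs whose correlation score
        # clears a threshold; each kept pair is one joint connection.
        links = []
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                p, q = candidates[i], candidates[j]
                if self.model.pairwise_score(p[:2], q[:2]) > 0.05:
                    links.append((i, j))
        return links

    def predict(self, frame):
        # The three modules run sequentially, exactly once per frame,
        # regardless of how many hands appear in the image.
        candidates = self.detect(frame)
        links = self.determine(candidates)
        return candidates, links  # prediction module (123) connects these

predictor = HandJointPredictor(JointEvaluationModel())
print(predictor.predict(np.zeros((480, 640))))
```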
  • FIG. 3 is a flow diagram illustrating a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention.
  • As shown in FIG. 3, a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention is a hand joint prediction method in which each step is performed in the augmented reality glasses device 10, and
  • may be implemented including: storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network (S100); detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device 10 (S200); evaluating the correlations between the candidate keypoints using the joint evaluation model 111, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points (S300); and predicting at least one hand joint by connecting the joint points according to the connection relationships (S400).
  • Through a single pass of steps S200 to S400, at least one hand joint included in the input image can be predicted. More specifically, without a hand detection process that detects hand regions in the input image and without predicting hand joints separately in each individual hand region, candidate keypoints for hand joints are detected in the entire input image, and then at least one hand joint included in the input image is predicted at once based on the correlations between the candidate keypoints using the joint evaluation model 111.
  • The lightweight hand joint prediction method for implementing the real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention can predict a plurality of hand joints by performing steps S200 to S400 only once. Therefore, the hand joint prediction process can be simplified and lightened, and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment.
  • FIG. 4 is a diagram showing a prediction process according to a conventional hand joint prediction method, and FIG. 5 is a diagram showing a simplified flow of the conventional hand joint prediction method.
  • As shown in FIGS. 4 and 5, conventional hand joint prediction estimates joints in a top-down manner. That is, candidate regions that may contain a human hand are found in the input image (FIG. 4(a)), and each hand region is detected in the form of a bounding box by determining whether the detected candidate region is a real hand (FIG. 4(b)).
  • A hand image is then obtained by cropping the bounding box of each detected hand region, and hand joints are estimated by applying a pose estimation algorithm to each hand image (FIGS. 4(c) and 4(d)).
  • As such, in the conventional hand joint prediction method, when multiple hand candidate regions exist in one input image, both hand region detection and joint estimation in each hand region must be performed, so the hand detection resource usage and the time and computation required for joint estimation increase in proportion to the number of hand candidate regions. In a small embedded environment such as the augmented reality glasses device 10, computational resources therefore run short and unstable CPU resource symptoms such as momentary FPS drops appear, which makes the method unsuitable for real-time operation of the augmented reality glasses device 10.
  • FIG. 6 is a diagram showing a simplified flow of a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention, and FIG. 7 is a diagram showing a prediction process according to that method.
  • As shown in FIGS. 6 and 7, the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention detects candidate keypoints of hand joints in the entire input image (FIG. 7(a)) and predicts hand joints by evaluating the correlations between the candidate keypoints (FIG. 7(b)). In contrast to the conventional top-down method, this can be called a bottom-up method.
  • According to this bottom-up method, hand joints can be predicted without the candidate-region detection of the conventional top-down method shown in FIG. 5 and without repeating the procedure for every candidate region, so the method uses few resources and can achieve fast, real-time estimation speed.
  • Since the conventional top-down method obtains joint points from hand images cropped to the detected hand regions, the accuracy of its estimation result may be higher than that of the bottom-up method of the present invention.
  • However, the hand shapes used for a hand gesture interface are limited, and considering the small resource usage and high speed of the bottom-up method of the present invention, the lightweight hand joint prediction method for implementing the real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention
  • may be more efficient than the conventional method for real-time application in the embedded environment of the augmented reality glasses device 10.
  • The present invention relates to a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device 10, and may be configured as software running on hardware including a memory and a processor.
  • The lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 of the present invention may be stored in and implemented by the augmented reality glasses device 10.
  • In the following description, mention of the subject performing each step may be omitted.
  • In step S100, the joint evaluation model 111 obtained by learning the correlations between hand joints based on an artificial neural network may be stored. More specifically, the joint evaluation model 111 is stored in the model storage unit 110 of the hand joint prediction device 100, and the prediction unit 120 uses it to predict hand joints in the embedded environment of the augmented reality glasses device 10. In particular, the learning process, which requires substantial computing resources, may be handled on a server computer or the like, and the joint evaluation model 111 generated through learning may then be stored in and used by the augmented reality glasses device 10.
  • The joint evaluation model 111 of step S100 may be configured by learning a hand joint point map and a joint relationship map based on an artificial neural network. More specifically, a hand joint point map and a joint relationship map are created from images of various hand gestures taken at various angles and under various lighting, these maps are configured as training data, and the joint evaluation model 111 may be created using a deep learning algorithm.
  • The algorithm used to generate the joint evaluation model 111 may be an artificial neural network model; a CNN, an RNN, or the like may be used.
  • In particular, a graph neural network such as a graph convolutional network (GCN) may be used for effective learning of the joint relationship map (a minimal network sketch follows below).
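  • As a rough illustration of such a model, the sketch below defines a small convolutional network that outputs per-joint keypoint heatmaps (a hand joint point map) and paired relationship channels (a joint relationship map), in the spirit of part-affinity-field approaches. The architecture, the 21-joint and 20-bone counts, and the layer choices are assumptions made here for illustration; the patent does not pin down a concrete network, and a GCN variant would instead operate on graph-structured joint maps.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 21  # assumed: 21 keypoints per hand, as in common hand models
NUM_BONES = 20   # assumed: one relationship field per skeletal bone

class JointEvaluationNet(nn.Module):
    """Toy CNN emitting a joint point map and a joint relationship map."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # One heatmap channel per joint type.
        self.point_head = nn.Conv2d(64, NUM_JOINTS, 1)
        # Two channels (an x/y vector field) per bone, encoding which
        # joint candidates belong together.
        self.relation_head = nn.Conv2d(64, 2 * NUM_BONES, 1)

    def forward(self, image):
        feats = self.backbone(image)
        return self.point_head(feats), self.relation_head(feats)

net = JointEvaluationNet()
points, relations = net(torch.randn(1, 3, 256, 256))
print(points.shape, relations.shape)  # (1, 21, 256, 256) (1, 40, 256, 256)
```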
  • In step S200, candidate keypoints serving as hand joint candidates may be detected in real time from the entire input image captured by the augmented reality glasses device 10. That is, in step S200, candidate keypoints of hand joints can be detected from the entire captured input image, without detecting hand regions or detecting and cropping bounding boxes in the input image captured by the camera 200 (see FIG. 7(a)).
  • Here, the candidate keypoints may be detected from a two-dimensional input image captured by the augmented reality glasses device 10.
  • That is, the camera 200 of the augmented reality glasses device 10 may be an ordinary camera that captures two-dimensional images in the direction of the wearer's gaze (a keypoint-extraction sketch follows below).
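  • One common way to turn such a two-dimensional image into candidate keypoints is to read local maxima out of predicted per-joint heatmaps. The NumPy sketch below is an illustrative assumption, not the patent's stated detector: it extracts peak coordinates above a confidence threshold, and two peaks of the same joint type may later turn out to belong to two different hands.

```python
import numpy as np

def extract_candidates(heatmap, threshold=0.3):
    """Return (row, col, score) peaks of one joint-type heatmap. A pixel
    is a peak if it beats the threshold and its four neighbours."""
    h, w = heatmap.shape
    peaks = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            v = heatmap[r, c]
            if (v > threshold and v >= heatmap[r - 1, c]
                    and v >= heatmap[r + 1, c]
                    and v >= heatmap[r, c - 1]
                    and v >= heatmap[r, c + 1]):
                peaks.append((r, c, float(v)))
    return peaks

# Toy heatmap with two blobs: two candidates of the same joint type.
hm = np.zeros((32, 32))
hm[8, 8] = 0.9
hm[20, 25] = 0.8
print(extract_candidates(hm))  # [(8, 8, 0.9), (20, 25, 0.8)]
```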
  • In step S300, the correlations between the candidate keypoints detected in step S200 are evaluated using the joint evaluation model 111 stored in step S100, and the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points are determined.
  • More specifically, the candidate keypoints are matched with one another to evaluate their correlations through the joint evaluation model 111, and based on the evaluated correlations it can be determined whether joint points constitute a hand and which joint points are connected to each other, that is, which joint points belong to one and the same hand.
  • In step S300, using the correlations between the candidate keypoints, the joint points may be selected by classifying them for each hand included in the input image, and the connection relationships between the joint points may be determined within each classified group. That is, as shown in FIG. 7, if a plurality of hands are included in the input image, the joint points constituting each hand may be grouped by classifying the joint points hand by hand: the correlations between the candidate keypoints detected in FIG. 7(a) are evaluated, the joint points corresponding to the left hand and those corresponding to the right hand are classified and selected, and the connection relationships among the left-hand joint points and among the right-hand joint points may be determined (a grouping sketch follows below).
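  • To make the grouping step concrete: once pairwise correlations have been accepted or rejected, the candidate keypoints can be partitioned into per-hand groups, for example with a union-find over the accepted connections. This is an illustrative sketch under assumed data shapes, not the patent's specific algorithm.

```python
def group_joint_points(num_points, accepted_links):
    """Union-find: joint points connected by accepted links end up in the
    same group, i.e. the same hand (e.g. left hand vs. right hand)."""
    parent = list(range(num_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in accepted_links:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(num_points):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Points 0-2 linked together and points 3-5 linked together: two hands.
print(group_joint_points(6, [(0, 1), (1, 2), (3, 4), (4, 5)]))
# [[0, 1, 2], [3, 4, 5]]
```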
  • In step S400, at least one hand joint may be predicted by connecting the joint points according to the connection relationships based on the determination result of step S300. More specifically, hand joints can be predicted and hand motions estimated using the joint points of each group and their connection relationships. That is, as shown in FIG. 7(b), a left hand joint may be predicted from the connection relationships between the left-hand joint points, and a right hand joint may be predicted from the connection relationships between the right-hand joint points.
  • In step S400, three-dimensional hand joints may be predicted. That is, since the hand joints can be constructed in three dimensions according to the relative positions of the joint points, a three-dimensional hand motion can be estimated from a two-dimensional input image (a depth-recovery sketch follows below).
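  • Recovering 3D structure from a single 2D image requires extra constraints; one textbook trick is to assume known bone lengths and read the relative depth of each bone from its foreshortening in the image. The sketch below illustrates only that geometric idea with made-up numbers; it is not the procedure claimed in the patent, which does not specify how the 3D lifting is performed.

```python
import math

def relative_depth(p2d_a, p2d_b, bone_length):
    """Depth offset between two connected joint points, assuming the true
    bone length is known: the projected length can only shrink, and the
    deficit corresponds to slant along the camera axis."""
    dx = p2d_b[0] - p2d_a[0]
    dy = p2d_b[1] - p2d_a[1]
    proj = math.hypot(dx, dy)
    return math.sqrt(max(bone_length ** 2 - proj ** 2, 0.0))

# A bone of (assumed) length 5 that projects to length 3 slants in depth.
print(relative_depth((0, 0), (3, 0), 5.0))  # 4.0
```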
  • As described above, according to the lightweight hand joint prediction method and device for implementing a real-time hand motion interface of the augmented reality glasses device 10 proposed in the present invention, candidate keypoints of hand joints are detected from the entire input image without a hand detection process,
  • and the joint evaluation model 111 is then used to predict at least one hand joint included in the input image based on the correlations between the candidate keypoints. Since a plurality of hand joints can thus be predicted from a single round of candidate keypoint detection, without separate hand region detection and without joint prediction in each detected hand region, the hand joint prediction process can be simplified and lightened; and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment.
  • Meanwhile, the present invention may include a computer-readable medium containing program instructions for performing operations implemented in various communication terminals.
  • For example, the computer-readable medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Such computer-readable media may contain program instructions, data files, data structures, and the like, alone or in combination.
  • The program instructions recorded on the computer-readable medium may be specially designed and configured to implement the present invention, or may be known to and usable by those skilled in computer software.
  • They may include not only machine language code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter.

Abstract

According to the weight-reduced (lightweight) hand joint prediction method and device for implementing a real-time hand motion interface of an augmented reality glasses device proposed in the present invention, candidate keypoints of hand joints are detected from the whole input image without a hand detection process, and then at least one hand joint included in the input image is predicted on the basis of the correlations between the candidate keypoints using a joint evaluation model. The invention therefore allows a plurality of hand joints to be predicted from a single round of candidate keypoint detection, without separate hand region detection and joint prediction in each detected hand region; this simplifies and lightens the hand joint prediction process, keeps the joint prediction time and computation from growing in proportion to the number of hands in the input image, and thus enables rapid real-time hand joint prediction in an embedded environment.

Description

Lightweight hand joint prediction method and device for implementing a real-time hand motion interface of an augmented reality glasses device
The present invention relates to a hand joint prediction method and device, and more particularly, to a lightweight hand joint prediction method and device for implementing a real-time hand motion interface in an augmented reality glasses device.
Various wearable devices are being developed in line with the trend toward lighter and smaller digital devices. A head mounted display (HMD), a kind of wearable device, refers to a device that can be worn on a user's head to receive multimedia content and the like. The HMD is worn on the user's body and provides images to the user in various environments as the user moves. HMDs are classified into see-through and see-closed types; the see-through type is mainly used for augmented reality (AR), and the closed type is mainly used for virtual reality (VR).
Meanwhile, for an HMD for augmented reality (hereinafter, an augmented reality glasses device), gesture (hand motion) recognition is important in order to interact with the wearer easily and conveniently without a separate input device. To implement a hand gesture interface that controls the augmented reality glasses device with hand gestures, the hand gestures must first be detected accurately.
For accurate detection of hand gestures, existing computer vision techniques for detecting the position and orientation of an object can be used. Recently, artificial intelligence has been applied to computer vision and actively used to detect and estimate object positions, and such techniques can also be applied to hand joint estimation.
Meanwhile, since an augmented reality glasses device worn on the head must minimize its size and weight, it is difficult for it to provide high computing power. Existing artificial-intelligence-based hand joint estimation, however, requires high computing power, so it is difficult to process in real time in an embedded environment. In particular, as the number of hands whose joints must be estimated increases, the amount of computation grows accordingly, so in a small embedded environment momentary FPS drops and unstable CPU resource shortages can appear. A solution to this problem is therefore needed.
Meanwhile, as prior art related to the present invention, Korean Patent Registration No. 10-2102309 (Title of Invention: Object Recognition Method for 3D Virtual Space of Head-Worn Display Device; Registration Date: April 13, 2020) has been disclosed.
The present invention is proposed to solve the above problems of the previously proposed methods. It detects candidate keypoints of hand joints in the entire input image without a hand detection process, and then predicts at least one hand joint included in the input image based on the correlations between the candidate keypoints using a joint evaluation model, so that a plurality of hand joints can be predicted from a single round of candidate keypoint detection, without separate hand region detection and without a joint prediction process in each detected hand region. The hand joint prediction process can thus be simplified and lightened, and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment. Its purpose is to provide a lightweight hand joint prediction method and device for implementing a real-time hand motion interface of an augmented reality glasses device with these properties.
A lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to a feature of the present invention for achieving the above object is
a hand joint prediction method in which each step is performed in the augmented reality glasses device to implement the real-time hand motion interface, comprising:
(1) storing a joint evaluation model that has learned correlations between hand joints based on an artificial neural network;
(2) detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device;
(3) evaluating the correlations between the candidate keypoints detected in step (2) using the joint evaluation model stored in step (1), and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and
(4) predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of step (3),
and it is characterized in that at least one hand joint included in the input image is predicted by performing steps (2) to (4) once.
Preferably, the joint evaluation model of step (1)
may be configured by learning a hand joint point map and a joint relationship map based on an artificial neural network.
Preferably, in step (3),
using the correlations between the candidate keypoints, the joint points may be selected by classifying them for each hand included in the input image, and the connection relationships between the joint points may be determined within each classified group.
Preferably,
in step (2), the candidate keypoints are detected from a two-dimensional input image captured by the augmented reality glasses device, and
in step (4), three-dimensional hand joints may be predicted.
A lightweight hand joint prediction device for implementing a real-time hand motion interface of an augmented reality glasses device according to a feature of the present invention for achieving the above object is
a hand joint prediction device mounted on the augmented reality glasses device to implement the real-time hand motion interface, comprising:
a model storage unit for storing a joint evaluation model that has learned correlations between hand joints based on an artificial neural network; and
a prediction unit for predicting at least one hand joint from an input image captured by the augmented reality glasses device,
wherein the prediction unit includes:
a detection module for detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device;
a determination module for evaluating, using the joint evaluation model stored in the model storage unit, the correlations between the candidate keypoints detected by the detection module, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and
a prediction module for predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of the determination module,
and wherein the prediction unit
is characterized in that it predicts at least one hand joint included in the input image by operating the detection module, the determination module, and the prediction module sequentially, once.
Preferably, the model storage unit
may store the joint evaluation model configured by learning a hand joint point map and a joint relationship map based on an artificial neural network.
Preferably, the determination module,
using the correlations between the candidate keypoints, may select the joint points by classifying them for each hand included in the input image, and may determine the connection relationships between the joint points within each classified group.
Preferably,
the detection module detects the candidate keypoints from a two-dimensional input image captured by the augmented reality glasses device, and
the prediction module may predict three-dimensional hand joints.
According to the lightweight hand joint prediction method and device for implementing a real-time hand motion interface of an augmented reality glasses device proposed in the present invention, candidate keypoints of hand joints are detected in the entire input image without a hand detection process, and then at least one hand joint included in the input image is predicted based on the correlations between the candidate keypoints using a joint evaluation model. A plurality of hand joints can therefore be predicted from a single round of candidate keypoint detection, without separate hand region detection and without joint prediction in each detected hand region, so the hand joint prediction process can be simplified and lightened; and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment.
FIG. 1 is a diagram showing the configuration of an augmented reality glasses device equipped with a lightweight hand joint prediction device for implementing a real-time hand motion interface of the augmented reality glasses device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a lightweight hand joint prediction device for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
FIG. 3 is a diagram showing the flow of a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
FIG. 4 is a diagram showing a prediction process according to a conventional hand joint prediction method.
FIG. 5 is a diagram showing a simplified flow of a conventional hand joint prediction method.
FIG. 6 is a diagram showing a simplified flow of a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
FIG. 7 is a diagram showing a prediction process according to a lightweight hand joint prediction method for implementing a real-time hand motion interface of an augmented reality glasses device according to an embodiment of the present invention.
<Description of reference signs>
10: augmented reality glasses device
100: hand joint prediction device
110: model storage unit
111: joint evaluation model
120: prediction unit
121: detection module
122: determination module
123: prediction module
200: camera
300: controller
S100: Storing a joint evaluation model that has learned correlations between hand joints based on an artificial neural network
S200: Detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device
S300: Evaluating the correlations between candidate keypoints using the joint evaluation model, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points
S400: Predicting at least one hand joint by connecting joint points according to their connection relationships
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice the invention. In describing the preferred embodiments in detail, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention. The same reference numerals are used throughout the drawings for parts with similar functions and actions.
In addition, throughout the specification, when a part is said to be 'connected' to another part, this includes not only the case where it is 'directly connected' but also the case where it is 'indirectly connected' with another element in between. Also, 'including' a certain component does not exclude other components but means that other components may be further included, unless specifically stated otherwise.
FIG. 1 is a diagram showing the configuration of an augmented reality glasses device 10 equipped with a lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention. As shown in FIG. 1, the lightweight hand joint prediction device 100 may be mounted on the augmented reality glasses device 10.
That is, the augmented reality glasses device 10 may include the hand joint prediction device 100 to implement a real-time hand motion interface. More specifically, the hand joint prediction device 100 predicts hand joints from an input image captured by the camera 200 of the augmented reality glasses device 10 and transmits the predicted hand joints to the controller 300, so that the controller 300 can process the hand motion interface corresponding to the predicted hand joints.
Here, a hand joint refers to a skeleton-like frame connecting the joint points constituting the hand; predicting a hand joint may mean predicting the plurality of joint points constituting the hand and the connection relationships between those joint points. Hand joint prediction estimates the skeletal shape of the hand so that a hand motion can be constructed from the estimated skeleton and used for the hand motion interface.
FIG. 2 is a diagram showing the configuration of the lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention. As shown in FIG. 2, the lightweight hand joint prediction device 100 is mounted on the augmented reality glasses device 10 and may comprise a model storage unit 110 for storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network, and a prediction unit 120 for predicting at least one hand joint from an input image captured by the augmented reality glasses device 10.
The prediction unit 120 includes a detection module 121 for detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device 10; a determination module 122 for evaluating, using the joint evaluation model 111 stored in the model storage unit 110, the correlations between the candidate keypoints detected by the detection module 121, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points; and a prediction module 123 for predicting at least one hand joint by connecting the joint points according to the connection relationships based on the determination result of the determination module 122. The prediction unit 120 may predict at least one hand joint included in the input image by operating the detection module 121, the determination module 122, and the prediction module 123 sequentially, once.
FIG. 3 is a diagram showing the flow of a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention. As shown in FIG. 3, the method is a hand joint prediction method in which each step is performed in the augmented reality glasses device 10, and may be implemented including: storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network (S100); detecting, in real time, candidate keypoints that become hand joint candidates in the entire input image captured by the augmented reality glasses device 10 (S200); evaluating the correlations between the candidate keypoints using the joint evaluation model 111, and determining the joint points corresponding to hand joints among the candidate keypoints and the connection relationships between the joint points (S300); and predicting at least one hand joint by connecting the joint points according to the connection relationships (S400).
In addition, the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention can predict at least one hand joint included in the input image through a single pass of steps S200 to S400. More specifically, without a hand detection process that detects hand regions in the input image and without predicting hand joints separately in each individual hand region, candidate keypoints for hand joints are detected in the entire input image, and then at least one hand joint included in the input image is predicted at once based on the correlations between the candidate keypoints using the joint evaluation model 111.
That is, in the conventional hand joint prediction method, after hand regions are detected, keypoint detection and hand joint prediction are performed for each individual hand region, so these processes are repeated as many times as the number of detected hand regions. By contrast, the lightweight hand joint prediction method according to an embodiment of the present invention can predict a plurality of hand joints by performing steps S200 to S400 only once. Therefore, the hand joint prediction process can be simplified and lightened, and since the joint prediction time and computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly in real time in an embedded environment.
Hereinafter, the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention will be described in detail in comparison with the conventional hand joint prediction method.
도 4는 종래의 손 관절 예측 방법에 따른 예측 과정을 도시한 도면이고, 도 5는 종래의 손 관절 예측 방법의 흐름을 간략화하여 도시한 도면이다. 도 4 및 도 5에 도시된 바와 같이, 종래의 손 관절 예측 시에는 top-down 방식으로 관절을 추정하였다. 즉, 입력 영상에서 사람의 손이 있는 후보 영역을 찾고(도 4의 (a)), 검출된 후보 영역이 실제 손인지를 판단하여 손 영역을 바운딩 박스(bounding box) 형태로 검출한다(도 4의 (b)). 검출한 손 영역의 바운딩 박스를 잘라서 손 영상을 획득하며, 개별 손 영상에 대해 포즈 추정 알고리즘을 적용해 손 관절을 추정하게 된다(도 4의 (c) 및 (d)).4 is a diagram showing a prediction process according to a conventional hand joint prediction method, and FIG. 5 is a diagram showing a simplified flow of the conventional hand joint prediction method. As shown in FIGS. 4 and 5 , when predicting a conventional hand joint, a top-down method was used to estimate the joint. That is, a candidate region with a human hand is found in the input image (FIG. 4(a)), and the hand region is detected in the form of a bounding box by determining whether the detected candidate region is a real hand (FIG. 4(a)). of (b)). A hand image is acquired by cutting the bounding box of the detected hand region, and a hand joint is estimated by applying a pose estimation algorithm to each hand image (Fig. 4(c) and (d)).
As described above, when a single input image contains multiple hand candidate regions, the conventional hand joint prediction method must perform both hand region detection and joint estimation for every region, so the hand detection resource usage and the time and computation required for joint estimation grow in proportion to the number of hand candidate regions. In a small embedded environment such as the augmented reality glasses device 10, computational resources therefore run short and unstable CPU behavior such as momentary FPS drops appears, making the conventional method unsuitable for real-time operation of the augmented reality glasses device 10.
FIG. 6 shows a simplified flow of the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention, and FIG. 7 illustrates the prediction process of that method. As shown in FIGS. 6 and 7, the method detects candidate keypoints for hand joints across the entire input image (FIG. 7(a)) and predicts the hand joints by evaluating the correlations between the candidate keypoints (FIG. 7(b)). In contrast to the conventional top-down approach, this can be called a bottom-up approach. Because the bottom-up approach requires neither the candidate-region detection of the conventional top-down prediction shown in FIG. 5 nor the repeated procedure over all candidate regions, it predicts hand joints with low resource usage and a fast, real-time estimation speed.
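To make the contrast with FIG. 5 concrete, the single-pass bottom-up flow can be sketched in Python as follows. All function bodies are toy stand-ins for the trained networks described later, not the disclosed implementation; only the control flow is the point: one detection pass, one correlation pass, and no per-hand loop.

import numpy as np

def detect_candidate_keypoints(frame):
    # Step S200 stand-in: run once over the whole frame,
    # with no hand-region detection or bounding-box cropping.
    return np.argwhere(frame > 0.95)[:10].astype(float)

def evaluate_correlations(keypoints):
    # Step S300 stand-in for the joint evaluation model 111:
    # this toy scores keypoint pairs by proximity.
    d = np.linalg.norm(keypoints[:, None] - keypoints[None, :], axis=-1)
    return 1.0 / (1.0 + d)

def predict_hands(frame, threshold=0.02):
    kps = detect_candidate_keypoints(frame)
    corr = evaluate_correlations(kps)
    hands, assigned = [], set()
    for i in range(len(kps)):
        if i in assigned:
            continue
        group = [j for j in range(len(kps))
                 if j not in assigned and corr[i, j] > threshold]
        assigned.update(group)
        hands.append(kps[group])   # step S400 connects each group
    return hands

hands = predict_hands(np.random.rand(240, 320))
print(f"{len(hands)} hand group(s) predicted in a single pass")

Note that the cost of this flow depends on the number of candidate keypoints in the frame, not on how many hands were detected, which is the source of the efficiency claim above.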
Since the conventional top-down approach obtains joint points from hand images cropped from the detected hand regions, the accuracy of its estimation results may be higher than that of the bottom-up approach of the present invention. However, the hand shapes used in a hand motion interface are limited, and considering the low resource usage and high speed of the bottom-up approach of the present invention, the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention may be more efficient than the conventional method for real-time application in the embedded environment of the augmented reality glasses device 10.
Hereinafter, each step of the lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10 according to an embodiment of the present invention will be described in detail.
The present invention relates to a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10, and may be implemented as software executed on hardware including a memory and a processor. For example, the lightweight hand joint prediction method of the present invention may be stored in and executed by the augmented reality glasses device 10. In the following description, the subject performing each step may be omitted for convenience.
In step S100, a joint evaluation model 111 that has learned the correlations between hand joints based on an artificial neural network may be stored. More specifically, the model storage unit 110 of the hand joint prediction device 100 stores the joint evaluation model 111, and the prediction unit 120 uses it to predict hand joints in the embedded environment of the augmented reality glasses device 10. In particular, the training process, which requires substantial computing resources, may be handled on a server computer or the like, and the joint evaluation model 111 produced by that training may be stored in the augmented reality glasses device 10 for use.
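For illustration, this division of labor in step S100, training on a server and deploying a frozen model to the glasses, might look like the following PyTorch sketch. The placeholder architecture and the file name joint_evaluation_model.pt are assumptions for illustration only and are not part of the disclosure.

import torch
import torch.nn as nn

# Server side: train the joint evaluation model 111, then export a
# frozen copy for the embedded device. This architecture is a
# placeholder, not the network disclosed in the patent.
model = nn.Sequential(nn.Linear(42, 128), nn.ReLU(), nn.Linear(128, 21))
# ... a training loop on server-class hardware would run here ...
scripted = torch.jit.script(model)
scripted.save("joint_evaluation_model.pt")   # hypothetical file name

# Device side (model storage unit 110): load the frozen model once
# and run inference only, with no training-time resource cost.
deployed = torch.jit.load("joint_evaluation_model.pt")
deployed.eval()
with torch.no_grad():
    scores = deployed(torch.randn(1, 42))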
Here, the joint evaluation model 111 of step S100 may be constructed by training on hand joint point maps and joint relation maps based on an artificial neural network. More specifically, hand joint point maps and joint relation maps are generated from images of various hand gestures captured at various angles and under various lighting conditions, and these maps are organized as training data to generate the joint evaluation model 111 using a deep learning algorithm.
The algorithm used to generate the joint evaluation model 111 may be an artificial neural network model such as a CNN or an RNN. In addition, a Graph Neural Network (GNN) or a Graph Convolutional Network (GCN) may be used for effective learning of the joint relation map.
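As an illustration only, one graph-convolution layer over an assumed 21-joint hand skeleton, in the standard form of Kipf and Welling, could be sketched as below. The patent names GCNs as one option and does not fix this layer or this joint topology; both are assumptions here.

import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One GCN layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W).

    A is the joint relation map, here the adjacency of the hand
    skeleton's joints. The layer form follows Kipf & Welling (2017).
    """
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        self.register_buffer("norm_adj", d_inv_sqrt @ a_hat @ d_inv_sqrt)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (num_joints, in_dim) per-joint features, e.g. 2D positions
        return torch.relu(self.linear(self.norm_adj @ x))

# Toy joint relation map: 21 joints, wrist plus five 4-joint fingers
# (assumed topology for illustration only).
adj = torch.zeros(21, 21)
edges = [(0, b) for b in (1, 5, 9, 13, 17)]
edges += [(i, i + 1) for b in (1, 5, 9, 13, 17) for i in range(b, b + 3)]
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

layer = GraphConv(2, 16, adj)
features = layer(torch.randn(21, 2))   # -> (21, 16)

Encoding the skeleton as a graph lets the model share evidence between physically connected joints, which is why graph networks suit the joint relation map better than a plain CNN over coordinates.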
In step S200, candidate keypoints that are hand joint candidates may be detected in real time across the entire input image captured by the augmented reality glasses device 10. That is, in step S200, candidate keypoints for hand joints are detected from the whole captured input image, without detecting a hand region in the image from the camera 200 and without detecting and cropping a bounding box (FIG. 7(a)).
More specifically, in step S200, the candidate keypoints may be detected from a two-dimensional input image captured by the augmented reality glasses device 10. That is, the camera 200 of the augmented reality glasses device 10 is an ordinary camera that captures two-dimensional images, and can capture images in the direction of the wearer's gaze.
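Step S200 is commonly realized with per-joint confidence heatmaps computed over the whole frame; the patent does not fix the detector, so the heatmap and non-maximum-suppression reading below is an assumption for illustration.

import numpy as np
from scipy.ndimage import maximum_filter

def candidate_keypoints(heatmaps, thresh=0.3):
    """Extract candidate keypoints from per-joint heatmaps.

    heatmaps: (J, H, W) confidence maps a keypoint network produced
    for the whole frame, with no hand crop. Local maxima above
    `thresh` become candidates; several hands simply yield several
    peaks per joint type.
    """
    candidates = []
    for j, hm in enumerate(heatmaps):
        peaks = (hm == maximum_filter(hm, size=5)) & (hm > thresh)
        for y, x in np.argwhere(peaks):
            candidates.append((j, float(x), float(y), float(hm[y, x])))
    return candidates   # (joint type, x, y, confidence)

# Toy input: 21 joint types over a QVGA frame.
cands = candidate_keypoints(np.random.rand(21, 240, 320))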
In step S300, the joint evaluation model 111 stored in step S100 is used to evaluate the correlations between the candidate keypoints detected in step S200, and to determine, among the candidate keypoints, the joint points corresponding to hand joints and the connection relations between those joint points. That is, the candidate keypoints are matched against one another and their correlations are evaluated through the joint evaluation model 111, and based on the evaluated correlations it is determined whether a candidate is a joint point constituting a hand and whether joint points are connected to each other, that is, whether they are joint points of the same hand.
Also, in step S300, the correlations between the candidate keypoints may be used to classify the joint points by hand, selecting the joint points for each hand included in the input image and determining the connection relations between the joint points within each classified group. That is, as shown in FIG. 7, when the input image contains a plurality of hands, the joint points are classified by hand so that the joint points constituting each hand are grouped together. In other words, the correlations of the candidate keypoints detected in FIG. 7(a) are evaluated, the joint points corresponding to the left hand and those corresponding to the right hand are classified and selected, and the connection relations among the left-hand joint points and among the right-hand joint points are determined.
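As a non-authoritative sketch of this grouping, the classification could be done greedily as below. Here pair_score stands in for the joint evaluation model 111, and the greedy strategy itself is an assumption; the patent only requires that the correlations drive the grouping.

import numpy as np

def group_by_hand(candidates, pair_score, min_score=0.5):
    """Greedily group candidate keypoints into per-hand sets.

    pair_score(a, b) plays the role of the joint evaluation model 111:
    a high score means a and b are judged to be joints of the same hand.
    """
    hands = []
    for cand in candidates:
        best, best_s = None, min_score
        for hand in hands:
            s = min(pair_score(cand, member) for member in hand)
            if s > best_s:
                best, best_s = hand, s
        if best is None:
            hands.append([cand])   # start a new hand group
        else:
            best.append(cand)      # join the best-matching hand
    return hands

# Toy score: keypoints within 60 px count as correlated (same hand).
score = lambda a, b: 1.0 if np.hypot(a[0] - b[0], a[1] - b[1]) < 60 else 0.0
left = [(10, 20), (25, 30), (40, 35)]
right = [(200, 80), (215, 95)]
print(len(group_by_hand(left + right, score)))   # -> 2 groups

The work here scales with the number of candidate keypoints, so the evaluation model itself never has to be re-run per hand, consistent with the single-pass claim of step S300.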
In step S400, at least one hand joint may be predicted by connecting the joint points according to the connection relations determined in step S300. More specifically, the joint points of each group and their connection relations are used to predict the hand joints and estimate the hand motion. That is, as shown in FIG. 7(b), the left-hand joints can be predicted from the connection relations of the left-hand joint points, and the right-hand joints from the connection relations of the right-hand joint points.
In this way, a plurality of hand joints can be predicted at once from the determination result of step S300, without running the joint evaluation model 111 multiple times, so resource usage is minimized and hand motions can be estimated quickly in real time.
Also, in step S400, three-dimensional hand joints may be predicted. That is, since the hand joints can be constructed in three dimensions according to the relative positions of the joint points, a three-dimensional hand motion can be estimated from the two-dimensional input image.
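The patent states only that the relative positions of the joint points permit a three-dimensional reconstruction. One plausible, assumed realization is a small 2D-to-3D lifting network over the connected joint points of one hand, sketched below in PyTorch; its architecture and the 21-joint layout are illustrative, and in practice the weights would be trained on paired 2D/3D hand data.

import torch
import torch.nn as nn

# Assumed lifting network: maps the 21 connected 2D joint points of
# one hand to 3D coordinates.
lifter = nn.Sequential(
    nn.Linear(21 * 2, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 21 * 3),
)

joints_2d = torch.randn(1, 21 * 2)             # flattened (x, y) per joint
joints_3d = lifter(joints_2d).view(1, 21, 3)   # (x, y, z) per joint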
As described above, according to the lightweight hand joint prediction method and device for implementing a real-time hand motion interface of the augmented reality glasses device 10 proposed in the present invention, candidate keypoints for hand joints are detected across the entire input image without a hand detection process, and the joint evaluation model 111 then predicts at least one hand joint included in the input image based on the correlations between the candidate keypoints. Since a plurality of hand joints can thus be predicted from a single round of candidate keypoint detection, without separate hand region detection and joint prediction for each detected hand region, the hand joint prediction process is simplified and lightweight, and because the joint prediction time and the amount of computation do not increase in proportion to the number of hands in the input image, hand joints can be predicted quickly and in real time in an embedded environment.
Meanwhile, the present invention may include a computer-readable medium containing program instructions for performing operations implemented on various communication terminals. For example, the computer-readable medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
Such a computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known to and usable by those skilled in computer software. For example, they may include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter.
The present invention described above may be variously modified or applied by those of ordinary skill in the art to which it pertains, and the scope of the technical idea according to the present invention should be defined by the claims below.

Claims (8)

  1. A hand joint prediction method in which each step is performed in an augmented reality glasses device 10 to implement a real-time hand motion interface of the augmented reality glasses device 10, the method comprising:
    (1) storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network;
    (2) detecting, in real time, candidate keypoints that are hand joint candidates across the entire input image captured by the augmented reality glasses device 10;
    (3) using the joint evaluation model 111 stored in step (1), evaluating correlations between the candidate keypoints detected in step (2), and determining joint points corresponding to hand joints among the candidate keypoints and connection relations between the joint points; and
    (4) predicting at least one hand joint by connecting the joint points according to the connection relations based on the determination result of step (3),
    wherein at least one hand joint included in the input image is predicted through a single execution of steps (2) to (4), the method being a lightweight hand joint prediction method for implementing a real-time hand motion interface of the augmented reality glasses device 10.
  2. The lightweight hand joint prediction method of claim 1, wherein the joint evaluation model 111 of step (1) is constructed by learning a hand joint point map and a joint relation map based on an artificial neural network.
  3. The lightweight hand joint prediction method of claim 1, wherein, in step (3), the correlations between the candidate keypoints are used to classify the joint points by hand included in the input image, to select the joint points, and to determine the connection relations between the joint points within each classified group.
  4. The lightweight hand joint prediction method of claim 1,
    wherein, in step (2), the candidate keypoints are detected from a two-dimensional input image captured by the augmented reality glasses device 10, and
    wherein, in step (4), three-dimensional hand joints are predicted.
  5. A hand joint prediction device 100 mounted on an augmented reality glasses device 10 to implement a real-time hand motion interface of the augmented reality glasses device 10, the device comprising:
    a model storage unit 110 storing a joint evaluation model 111 that has learned correlations between hand joints based on an artificial neural network; and
    a prediction unit 120 predicting at least one hand joint from an input image captured by the augmented reality glasses device 10,
    wherein the prediction unit 120 includes:
    a detection module 121 detecting, in real time, candidate keypoints that are hand joint candidates across the entire input image captured by the augmented reality glasses device 10;
    a determination module 122 evaluating, using the joint evaluation model 111 stored in the model storage unit 110, correlations between the candidate keypoints detected by the detection module 121, and determining joint points corresponding to hand joints among the candidate keypoints and connection relations between the joint points; and
    a prediction module 123 predicting at least one hand joint by connecting the joint points according to the connection relations based on the determination result of the determination module 122,
    and wherein the prediction unit 120 predicts at least one hand joint included in the input image by operating the detection module 121, the determination module 122, and the prediction module 123 sequentially and only once, the device being a lightweight hand joint prediction device 100 for implementing a real-time hand motion interface of the augmented reality glasses device 10.
  6. The lightweight hand joint prediction device 100 of claim 5, wherein the model storage unit 110 stores the joint evaluation model 111 constructed by learning a hand joint point map and a joint relation map based on an artificial neural network.
  7. The lightweight hand joint prediction device 100 of claim 5, wherein the determination module 122 uses the correlations between the candidate keypoints to classify the joint points by hand included in the input image, to select the joint points, and to determine the connection relations between the joint points within each classified group.
  8. The lightweight hand joint prediction device 100 of claim 5,
    wherein the detection module 121 detects the candidate keypoints from a two-dimensional input image captured by the augmented reality glasses device 10, and
    wherein the prediction module 123 predicts three-dimensional hand joints.
PCT/KR2022/005823 2021-06-04 2022-04-24 Weight-reduced hand joint prediction method and device for implementation of real-time hand motion interface of augmented reality glass device WO2022255642A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210073068A KR102548208B1 (en) 2021-06-04 2021-06-04 Lightweight hand joint prediction method and apparatus for real-time hand motion interface implementation of ar glasses device
KR10-2021-0073068 2021-06-04

Publications (1)

Publication Number Publication Date
WO2022255642A1 true WO2022255642A1 (en) 2022-12-08

Family

ID=84324265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/005823 WO2022255642A1 (en) 2021-06-04 2022-04-24 Weight-reduced hand joint prediction method and device for implementation of real-time hand motion interface of augmented reality glass device

Country Status (2)

Country Link
KR (1) KR102548208B1 (en)
WO (1) WO2022255642A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180024641A1 (en) * 2016-07-20 2018-01-25 Usens, Inc. Method and system for 3d hand skeleton tracking
KR20200032990A (en) * 2018-09-19 2020-03-27 재단법인 실감교류인체감응솔루션연구단 Method for modelling virtual hand on real hand and apparatus therefor
WO2021051131A1 (en) * 2019-09-09 2021-03-18 Snap Inc. Hand pose estimation from stereo cameras
US20210089162A1 (en) * 2019-09-19 2021-03-25 Finch Technologies Ltd. Calibration of inertial measurement units in alignment with a skeleton model to control a computer system based on determination of orientation of an inertial measurement unit from an image of a portion of a user
US20210166486A1 (en) * 2019-12-03 2021-06-03 Samsung Electronics Co., Ltd. Electronic apparatus and method for controlling thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102102309B1 (en) * 2019-03-12 2020-04-21 주식회사 피앤씨솔루션 Object recognition method for 3d virtual space of head mounted display apparatus

Also Published As

Publication number Publication date
KR102548208B1 (en) 2023-06-28
KR20220164376A (en) 2022-12-13

Similar Documents

Publication Publication Date Title
WO2019050360A1 (en) Electronic device and method for automatic human segmentation in image
WO2018048000A1 (en) Device and method for three-dimensional imagery interpretation based on single camera, and computer-readable medium recorded with program for three-dimensional imagery interpretation
WO2016107231A1 (en) System and method for inputting gestures in 3d scene
WO2020180134A1 (en) Image correction system and image correction method thereof
WO2017082539A1 (en) Augmented reality providing apparatus and method for user styling
WO2019093599A1 (en) Apparatus for generating user interest information and method therefor
EP3776469A1 (en) System and method for 3d association of detected objects
WO2019208851A1 (en) Virtual reality interface method and apparatus allowing merging with real space
WO2022197136A1 (en) System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training
CN109086725A (en) Hand tracking and machine readable storage medium
WO2021221490A1 (en) System and method for robust image-query understanding based on contextual features
WO2022255642A1 (en) Weight-reduced hand joint prediction method and device for implementation of real-time hand motion interface of augmented reality glass device
EP3469790A1 (en) Display apparatus and control method thereof
WO2019240330A1 (en) Image-based strength prediction system and method therefor
WO2020141907A1 (en) Image generation apparatus for generating image on basis of keyword and image generation method
WO2022255641A1 (en) Method and apparatus for enhancing hand gesture and voice command recognition performance, for input interface of augmented reality glass device
WO2022092762A1 (en) Stereo matching method and image processing device performing same
WO2020230921A1 (en) Method for extracting features from image using laser pattern, and identification device and robot using same
WO2020116685A1 (en) Device for processing face feature point estimation image on basis of standard face model, and physical computer-readable recording medium in which program for processing face feature point estimation image on basis of standard face model is recorded
WO2016036049A1 (en) Search service providing apparatus, system, method, and computer program
WO2019198900A1 (en) Electronic apparatus and control method thereof
WO2023219254A1 (en) Hand distance estimation method and device for augmented reality glasses
WO2022139327A1 (en) Method and apparatus for detecting unsupported utterances in natural language understanding
WO2019124602A1 (en) Object tracking method and devices for performing same
WO2022050742A1 (en) Method for detecting hand motion of wearable augmented reality device by using depth image and wearable augmented reality device capable of detecting hand motion by using depth image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22816301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE