TWI755950B - Action recognition method and system thereof - Google Patents

Action recognition method and system thereof

Info

Publication number
TWI755950B
TWI755950B TW109142075A
Authority
TW
Taiwan
Prior art keywords
image
human skeleton
dimensional
skeleton point
depth
Prior art date
Application number
TW109142075A
Other languages
Chinese (zh)
Other versions
TW202223729A (en)
Inventor
郭景明
黃柏程
林鼎
王志鴻
魏禹雯
林謚翔
Original Assignee
艾陽科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 艾陽科技股份有限公司 filed Critical 艾陽科技股份有限公司
Priority to TW109142075A priority Critical patent/TWI755950B/en
Application granted granted Critical
Publication of TWI755950B publication Critical patent/TWI755950B/en
Publication of TW202223729A publication Critical patent/TW202223729A/en


Abstract

The present invention provides an action recognition method and a system thereof. The action recognition method comprises: capturing a 2D image and a depth image at the same time; extracting 2D information of human skeleton points from the 2D image and correcting it; mapping the 2D information of the human skeleton points onto the depth image to obtain the corresponding depth information; combining the corrected 2D information of the human skeleton points with the depth information to obtain 3D information of the human skeleton points; and finally recognizing an action from a set of 3D information of the human skeleton points over a period of time by a matching model.

Description

動作識別方法及其系統 Action recognition method and system thereof

本發明關於一種動作識別方法及其系統,特別關於一種多模態影像整合及模擬之動作識別方法及其系統。 The present invention relates to an action recognition method and system, in particular to a multimodal image integration and simulation action recognition method and system.

人類活動識別(Human Action Recognition,HAR)是近年很熱門的研究題材，在情境感知領域、運動監測領域、老人照護領域都已研發出相當多的方法及技術。其中，二維影像中的人體骨架點定位技術已趨成熟，可利用即時(real-time)之二維RGB影像(紅綠藍影像)或IR影像(紅外線影像)辨識並定位出頭部、軀幹、上肢及下肢，進而判斷人類的活動狀態。然而，在某些人類活動識別，僅運用二維的骨架點資訊常常無法進行區分，例如有些動作的骨架點在平面上的投影有多處重疊，因此無法進行辨識與區別。 Human Action Recognition (HAR) has been a popular research topic in recent years, and many methods and techniques have been developed for context awareness, exercise monitoring, and elderly care. In particular, human skeleton point localization in 2D images has matured: real-time 2D RGB (red-green-blue) or IR (infrared) images can be used to identify and locate the head, torso, and upper and lower limbs, and thus judge a person's activity state. However, some human activities cannot be distinguished from 2D skeleton point information alone; for example, the planar projections of the skeleton points of certain actions overlap in many places, making the actions impossible to tell apart.

因此，如圖1所示，更高準確度的人類活動識別還是常常仰賴人體的三維點雲(3D point cloud)座標資訊。使用三維感應器取得之三維點雲座標資訊，資訊量極為龐大，因此三維感應器的解析度若太高，需耗費太多資源及時間計算才能得到人體骨架點定位圖，而解析度太低又可能因背景雜訊而無法識別出正確的骨架點，進而使動作識別的正確性降低。因此，亟需一種即時且高正確度之動作識別方法及系統。 Therefore, as shown in FIG. 1, higher-accuracy human activity recognition still often relies on 3D point cloud coordinate information of the human body. The 3D point cloud data acquired by a 3D sensor is extremely voluminous: if the sensor resolution is too high, computing the human skeleton point localization map consumes too many resources and too much time, whereas if the resolution is too low, background noise may prevent the correct skeleton points from being recognized, lowering the accuracy of action recognition. A real-time, highly accurate action recognition method and system is therefore urgently needed.

本發明提供一種動作識別方法，包含：擷取一時間點之一二維色彩影像或一二維紅外線影像及一相對應之深度影像；萃取該二維色彩影像或該二維紅外線影像中之二維人體骨架點資訊；映射該二維人體骨架點資訊至該深度影像以取得該二維人體骨架點資訊所對應之一深度資訊；使用一尺寸-深度參數及一畸變模型校正該二維人體骨架點資訊；結合經校正之該二維人體骨架點資訊與該深度資訊以得到一三維人體骨架點資訊；以及使用一匹配模型針對一段時間之一系列該三維人體骨架點資訊識別一動作。 The present invention provides an action recognition method, comprising: capturing, at a time point, a 2D color image or a 2D infrared image together with a corresponding depth image; extracting 2D human skeleton point information from the 2D color image or the 2D infrared image; mapping the 2D human skeleton point information onto the depth image to obtain depth information corresponding to the 2D human skeleton point information; correcting the 2D human skeleton point information using a size-depth parameter and a distortion model; combining the corrected 2D human skeleton point information with the depth information to obtain 3D human skeleton point information; and recognizing an action from a series of the 3D human skeleton point information over a period of time using a matching model.

本發明又提供一種動作識別系統，包含：一影像擷取裝置，用以擷取一時間點之一二維色彩影像或一二維紅外線影像；一深度影像擷取裝置，用以擷取該時間點之一相對應之深度影像；一記憶體，用以儲存一尺寸-深度參數、一畸變模型及一匹配模型；以及一處理器，電訊連接該影像擷取裝置、該深度影像擷取裝置及該記憶體，該處理器包含：一輸入模組，用以接收該二維色彩影像或該二維紅外線影像及相對應之該深度影像；一儲存模組，將該二維色彩影像或該二維紅外線影像、相對應之該深度影像儲存至該記憶體；一骨架點計算模組，用以萃取該二維色彩影像或該二維紅外線影像中之二維人體骨架點資訊，使用該尺寸-深度參數及該畸變模型校正該二維人體骨架點資訊；一映射模組，用以映射該二維人體骨架點資訊至該深度影像以得到該二維人體骨架點資訊所對應之一深度資訊，以及結合經校正之該二維人體骨架點資訊與該深度資訊以得到一三維人體骨架點資訊；以及一動作識別模組，使用一匹配模型針對一段時間之一系列該三維人體骨架點資訊識別一動作。 The present invention further provides an action recognition system, comprising: an image capture device for capturing a 2D color image or a 2D infrared image at a time point; a depth image capture device for capturing a corresponding depth image at the time point; a memory for storing a size-depth parameter, a distortion model, and a matching model; and a processor communicatively connected to the image capture device, the depth image capture device, and the memory. The processor comprises: an input module for receiving the 2D color image or the 2D infrared image and the corresponding depth image; a storage module for storing the 2D color image or the 2D infrared image and the corresponding depth image in the memory; a skeleton point calculation module for extracting 2D human skeleton point information from the 2D color image or the 2D infrared image and correcting it using the size-depth parameter and the distortion model; a mapping module for mapping the 2D human skeleton point information onto the depth image to obtain depth information corresponding to the 2D human skeleton point information, and for combining the corrected 2D human skeleton point information with the depth information to obtain 3D human skeleton point information; and an action recognition module that uses a matching model to recognize an action from a series of the 3D human skeleton point information over a period of time.

於某些具體實施例中，進一步包含一輸出模組，在識別該動作時發出一提示訊號。 In some embodiments, the system further comprises an output module that issues a prompt signal when the action is recognized.

於某些具體實施例中，該匹配模型係以類神經網路的深度學習架構建立之分類模型參數。 In some embodiments, the matching model comprises classification model parameters established with a neural-network deep learning architecture.

於某些具體實施例中,該畸變模型係用以校正該二維人體骨架點之像素座標位置與影像畸變中心的距離。 In some embodiments, the distortion model is used to correct the distance between the pixel coordinate position of the two-dimensional human skeleton point and the image distortion center.

於某些具體實施例中,其中該記憶體進一步儲存一組位移量參數,該深度影像係先以該組位移量參數進行校正。 In some embodiments, the memory further stores a set of displacement parameters, and the depth image is first calibrated using the set of displacement parameters.

本發明所提供之動作識別方法及系統可以解決人體三維骨架點計算費時與易受設備解析度或雜訊影響的問題，提出一種多模態(multi-modality)影像整合，並能快速且準確模擬三維骨架點資訊的方法及系統，可以應用於各種即時(real-time)人類活動識別情境，例如跌倒情境偵測。 The action recognition method and system provided by the present invention solve the problems that computing 3D human skeleton points is time-consuming and easily affected by device resolution or noise. A multi-modality image integration method and system that quickly and accurately simulates 3D skeleton point information is proposed, applicable to various real-time human activity recognition scenarios, such as fall detection.

除非另有定義,本文使用的所有技術和科學術語具有與本發明所屬領域中的技術人員所通常理解相同的含義。 Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

如本文所用，冠詞「一」、「一個」以及「任何」是指一個或多於一個(即至少一個)的物品的文法物品。例如，「一個元件」意指一個元件或多於一個元件。 As used herein, the articles "a", "an", and "any" refer to one or more than one (i.e., at least one) of the grammatical object of the article. For example, "an element" means one element or more than one element.

本文所使用的「約」、「大約」或「近乎」一詞實質上代表所述之數值或範圍位於20%以內，較佳為於10%以內，以及更佳者為於5%以內。本文所提供之數字化的量為近似值，意旨若術語「約」、「大約」或「近乎」沒有被使用時亦可被推得。 As used herein, the terms "about", "approximately", or "nearly" mean that the stated value or range is substantially within 20%, preferably within 10%, and more preferably within 5%. Numerical quantities provided herein are approximations, meaning that "about", "approximately", or "nearly" can be inferred even when the term is not expressly used.

10:動作識別系統 10: Motion Recognition System

11:影像擷取裝置 11: Image capture device

12:深度影像擷取裝置 12: Depth image capture device

13:記憶體 13: Memory

14:處理器 14: Processor

141:輸入模組 141: Input module

142:儲存模組 142: Storage Module

143:骨架點計算模組 143: Skeleton point calculation module

144:映射模組 144: Mapping Module

145:動作識別模組 145: Motion Recognition Module

146:輸出模組 146: Output module

S10:步驟10 S10: Step 10

S20:步驟20 S20: Step 20

S30:步驟30 S30: Step 30

S40:步驟40 S40: Step 40

S50:步驟50 S50: Step 50

S60:步驟60 S60: Step 60

圖1為使用三維感應器擷取人體動作計算出之人體骨架點定位圖。 FIG. 1 is a localization diagram of human skeleton points calculated by capturing human motions using a 3D sensor.

圖2為本發明實施例之動作識別系統方塊圖。 FIG. 2 is a block diagram of a motion recognition system according to an embodiment of the present invention.

圖3為本發明實施例之動作識別方法流程圖。 FIG. 3 is a flowchart of a motion recognition method according to an embodiment of the present invention.

圖4A為本發明實施例之非跌倒色彩影像骨架點之灰階示意圖。 FIG. 4A is a schematic diagram of a grayscale of a skeleton point of a non-falling color image according to an embodiment of the present invention.

圖4B為本發明實施例之跌倒動態過程的色彩影像骨架點之灰階示意圖。 FIG. 4B is a schematic diagram of a grayscale of a skeleton point of a color image of a fall dynamic process according to an embodiment of the present invention.

圖5A為本發明實施例之非跌倒深度影像骨架點之灰階示意圖。 FIG. 5A is a grayscale schematic diagram of a skeleton point in a non-falling depth image according to an embodiment of the present invention.

圖5B為本發明實施例之跌倒動態過程的深度影像骨架點之灰階示意圖。 FIG. 5B is a grayscale schematic diagram of a skeleton point of a depth image of a dynamic fall process according to an embodiment of the present invention.

圖6A為本發明實施例之近距離骨架點座標映射之灰階示意圖。 FIG. 6A is a grayscale schematic diagram of coordinate mapping of close-range skeleton points according to an embodiment of the present invention.

圖6B為本發明實施例之遠距離骨架點座標映射之灰階示意圖。 FIG. 6B is a grayscale schematic diagram of a long-distance skeleton point coordinate mapping according to an embodiment of the present invention.

圖7為本發明實施例之動作識別之灰階示意圖。 FIG. 7 is a grayscale schematic diagram of motion recognition according to an embodiment of the present invention.

有關於本發明其他技術內容、特點與功效,在以下配合參考圖式之較佳實施例的詳細說明中,將可清楚的呈現。 Other technical contents, features and effects of the present invention will be clearly presented in the following detailed description of the preferred embodiments with reference to the drawings.

如圖2所示,本發明實施例提供一種動作識別系統10,包含:一影像擷取裝置11、一深度影像擷取裝置12、一記憶體13以及一處理器14。處理器14包含一輸入模組141、一儲存模組142、一骨架點計算模組143、一映射模組144、以及一動作識別模組145。動作識別系統10可以進一步包含一輸出模組146。 As shown in FIG. 2 , an embodiment of the present invention provides a motion recognition system 10 , including: an image capturing device 11 , a depth image capturing device 12 , a memory 13 and a processor 14 . The processor 14 includes an input module 141 , a storage module 142 , a skeleton point calculation module 143 , a mapping module 144 , and a motion recognition module 145 . The motion recognition system 10 may further include an output module 146 .

如圖3所示，本發明實施例提供一種動作識別方法，包含：擷取一時間點之一二維色彩影像或一二維紅外線影像及一相對應之深度影像(步驟S10)；萃取該二維色彩影像或該二維紅外線影像中之二維人體骨架點資訊(步驟S20)；映射該二維人體骨架點資訊至該深度影像以取得該二維人體骨架點資訊所對應之一深度資訊(步驟S30)；使用一尺寸-深度參數及一畸變模型校正該二維人體骨架點資訊(步驟S40)；結合經校正之該二維人體骨架點資訊與該深度資訊以得到一三維人體骨架點資訊(步驟S50)；以及使用一匹配模型針對一段時間之一系列該三維人體骨架點資訊識別一動作(步驟S60)。 As shown in FIG. 3, an embodiment of the present invention provides an action recognition method, comprising: capturing, at a time point, a 2D color image or a 2D infrared image and a corresponding depth image (step S10); extracting 2D human skeleton point information from the 2D color image or the 2D infrared image (step S20); mapping the 2D human skeleton point information onto the depth image to obtain depth information corresponding to the 2D human skeleton point information (step S30); correcting the 2D human skeleton point information using a size-depth parameter and a distortion model (step S40); combining the corrected 2D human skeleton point information with the depth information to obtain 3D human skeleton point information (step S50); and recognizing an action from a series of the 3D human skeleton point information over a period of time using a matching model (step S60).
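The six steps above can be sketched as a minimal Python pipeline. All helper functions below (`extract_skeleton_2d`, `correct_points`, `classify`) are illustrative stand-ins, not the patented implementations; in particular, the depth threshold inside `classify` merely substitutes for the trained matching model of step S60.

```python
import numpy as np

def extract_skeleton_2d(image):
    """Stand-in for the 2D pose estimator (step S20): returns a (2, n_joints)
    array of pixel coordinates (x row, y row). A real system would run a
    network such as the parallel convolutional architecture in the text."""
    return np.array([[10, 12], [20, 40]])  # two dummy joints

def correct_points(pts2d, depth_mm):
    """Placeholder for step S40 (size-depth and distortion correction)."""
    return pts2d

def classify(seq4d):
    """Trivial stand-in for the trained matching model (step S60)."""
    return "fall" if seq4d[..., 2].mean() < 500 else "not_fall"

def recognize_action(frames):
    """frames: iterable of (color_image, depth_image) pairs (step S10)."""
    seq = []
    for color_img, depth_img in frames:
        pts = extract_skeleton_2d(color_img)          # S20
        z = depth_img[pts[1], pts[0]].astype(float)   # S30: depth lookup per joint
        pts = correct_points(pts, z)                  # S40
        seq.append(np.vstack([pts, z]).T)             # S50: (n_joints, 3)
    return classify(np.stack(seq))                    # S60 on the (T, n, 3) series
```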

請同時參考圖2及圖3來理解本發明實施例，影像擷取裝置11係用以擷取一時間點之一二維色彩影像或一二維紅外線影像。二維色彩影像可以為平面的RGB色彩影像，例如常見的照像機擷取的照片或是錄影機擷取的影片之一幀畫面。該二維色彩影像中的每個像素記載了一色彩資訊，其可以為紅色、綠色與藍色之含量矩陣資訊。二維紅外線影像可以為近紅外線照明下的平面灰階影像，常見於夜間識別或偵測，在照明不足的情況下仍可以擷取出解析度不錯的灰階影像。該二維紅外線影像中的每個像素代表紅外線感應器偵測到的紅外線強度。 Please refer to FIG. 2 and FIG. 3 together to understand this embodiment. The image capture device 11 captures a 2D color image or a 2D infrared image at a time point. The 2D color image may be a planar RGB color image, such as a photo captured by an ordinary camera or one frame of a video captured by a video recorder. Each pixel of the 2D color image records color information, which may be a matrix of red, green, and blue intensities. The 2D infrared image may be a planar grayscale image under near-infrared illumination, commonly used for night-time recognition or detection; a grayscale image of good resolution can still be captured under insufficient lighting. Each pixel of the 2D infrared image represents the infrared intensity detected by the infrared sensor.

深度影像擷取裝置12係用以擷取該時間點之一相對應之深度影像，其可以為飛時測距(time of flight,TOF)感測器或景深攝影機(例如：Intel RealSense)。為了能相互映射，因此需在擷取該二維色彩影像或二維紅外線影像時，同時取得相對應的深度影像。深度影像亦為二維影像，只是該二維影像中每個像素是代表該像素所擷取之物體所在平面與感測器間之距離。 The depth image capture device 12 captures the corresponding depth image at that time point; it may be a time-of-flight (TOF) sensor or a depth camera (e.g., Intel RealSense). To allow the two images to be mapped onto each other, the corresponding depth image must be acquired at the same time as the 2D color image or 2D infrared image. The depth image is also a 2D image, except that each of its pixels represents the distance between the sensor and the plane of the object captured at that pixel.

記憶體13用以儲存一匹配模型以用來識別不同的動作。在本發明實施例中，以識別跌倒動作為例，感應裝置高度為2公尺，其可以為本發明實施例之影像擷取裝置及深度影像擷取裝置，共拍攝60000張解析度620*350的連續幀圖片，跌倒動態的連續幀圖片與非跌倒動態的連續幀圖片大約各半。採樣跌倒與非跌倒的動態連續幀序列(sequence)，對序列中的每一幀圖片進行二維的人體骨架點計算，並結合對應的深度影像計算出模擬的三維骨架點座標。結合整個序列中時序上每幀的三維骨架點座標得到一四維動態特徵序列作為動作識別的輸入特徵。該三維人體骨架點在時序上的動態座標點序列為動作識別的重要特徵，可使用類神經網路的深度學習架構，例如長短期記憶模型(Long Short-Term Memory,LSTM)或卷積神經網路(CNN)進行深度學習，以建構出能識別出受測者不同的動態活動的匹配模型。 The memory 13 stores a matching model used to recognize different actions. In this embodiment, taking fall recognition as an example, the sensing devices (which may be the image capture device and depth image capture device of this embodiment) were mounted at a height of 2 meters, and a total of 60,000 consecutive frames at a resolution of 620*350 were captured, roughly half fall sequences and half non-fall sequences. Fall and non-fall dynamic frame sequences are sampled; 2D human skeleton points are computed for each frame in a sequence, and simulated 3D skeleton point coordinates are computed by combining the corresponding depth images. Combining the 3D skeleton point coordinates of every frame of the sequence over time yields a 4D dynamic feature sequence as the input feature for action recognition. This temporal sequence of dynamic 3D human skeleton point coordinates is an important feature for action recognition; a neural-network deep learning architecture, such as a Long Short-Term Memory (LSTM) model or a convolutional neural network (CNN), can be used to build a matching model that recognizes the subject's different dynamic activities.
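The shape of the 4D dynamic feature sequence described above can be sketched as follows. The 30 fps frame rate, and hence the 45-frame window (about 1.5 s), are assumed values for illustration; the patent itself only specifies the window length in seconds.

```python
import numpy as np

# Per frame: an (18, 3) array of simulated 3D skeleton points.
# Stacked over T frames this gives the 4D dynamic feature sequence
# fed to the LSTM/CNN matching model. 45 frames ~= 1.5 s at an
# assumed 30 fps.
T, N_JOINTS = 45, 18
frames = [np.random.rand(N_JOINTS, 3) for _ in range(T)]
feature = np.stack(frames)            # (T, 18, 3) dynamic feature tensor
lstm_input = feature.reshape(T, -1)   # (T, 54): one flattened vector per timestep
```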

處理器14，電訊連接該影像擷取裝置11、該深度影像擷取裝置12及該記憶體13。影像擷取裝置11與深度影像擷取裝置12擷取一時間點之二維色彩影像或二維紅外線影像及相對應之深度影像(步驟S10)後，以有線或無線傳輸給處理器14。輸入模組141係用以接收該二維色彩影像或該二維紅外線影像及相對應之該深度影像。為了方便後續應用，儲存模組142可以將該二維色彩影像或該二維紅外線影像、相對應之該深度影像儲存至該記憶體13以便隨時提取使用。 The processor 14 is communicatively connected to the image capture device 11, the depth image capture device 12, and the memory 13. After the image capture device 11 and the depth image capture device 12 capture, at a time point, a 2D color image or 2D infrared image and the corresponding depth image (step S10), the images are transmitted to the processor 14 by wire or wirelessly. The input module 141 receives the 2D color image or the 2D infrared image and the corresponding depth image. For convenient later use, the storage module 142 may store them in the memory 13 so that they can be retrieved at any time.

請同時參考圖4A及圖4B，其為二維色彩影像在非跌倒與跌倒動態過程狀況下的骨架點之灰階示意圖，雖然本發明實施例使用二維色彩影像做為範例並以灰階示意圖呈現，但本發明實施例之系統及方法不限於使用二維色彩影像，二維紅外線影像為灰階影像亦可有相同效果。骨架點計算模組143係用以萃取該二維色彩影像或該二維紅外線影像中之二維人體骨架點資訊(步驟S20)。在二維色彩影像或二維紅外線影像中辨識二維人體骨架點資訊可採用平行卷積網路的架構偵測出關節點位置的置信圖(confidence map)以及得到關節仿射場(Part Affinity Fields)以描述各關節之間的連線程度，再整合兩種特徵對每個肢段進行預測最後得出二維人體骨架點資訊。 Please refer to FIGS. 4A and 4B together, which are grayscale schematic diagrams of skeleton points in a 2D color image under non-fall and fall dynamic conditions. Although this embodiment uses a 2D color image as the example, presented as grayscale diagrams, the system and method are not limited to 2D color images; a 2D infrared image, being a grayscale image, achieves the same effect. The skeleton point calculation module 143 extracts the 2D human skeleton point information from the 2D color image or the 2D infrared image (step S20). To identify 2D human skeleton points, a parallel convolutional network architecture can be used to detect confidence maps of joint positions and to obtain Part Affinity Fields describing the degree of connection between joints; the two kinds of features are then integrated to predict each limb segment, finally yielding the 2D human skeleton point information.
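A deliberately simplified sketch of turning per-joint confidence maps into 2D joint positions: it takes only the peak of each map and omits the Part Affinity Field limb-association step entirely, so it is a reading aid rather than the described method.

```python
import numpy as np

def joints_from_confidence_maps(cmaps):
    """Reduce per-joint confidence maps of shape (n_joints, H, W) to 2D
    joint positions by taking each map's peak pixel. Simplified stand-in:
    the full approach also matches joints into limbs via Part Affinity
    Fields, which is omitted here."""
    n, h, w = cmaps.shape
    flat_idx = cmaps.reshape(n, -1).argmax(axis=1)   # peak index per joint
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys])  # (2, n_joints): x row, y row in pixels
```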

二維人體骨架點資訊為一包含二維座標的數據列表，可以指示出真實人體骨架點對應在二維色彩影像或二維紅外線影像上的像素位置，其為真實平面映射到二維色彩影像上之相對位置，常見的態樣可以是18個骨架點之像素位置，也就是一個2x18之矩陣。例如圖4A之非跌倒影像中頭部上的中心點代表鼻子在二維色彩影像中所在的像素位置為(361,88)。 The 2D human skeleton point information is a list of 2D coordinates indicating the pixel positions in the 2D color image or 2D infrared image that correspond to the real human skeleton points, i.e., the relative positions of the real plane projected onto the image. A common form is the pixel positions of 18 skeleton points, i.e., a 2x18 matrix. For example, the center point on the head in the non-fall image of FIG. 4A indicates that the nose is located at pixel position (361, 88) in the 2D color image.

請同時參考圖5A與圖5B，其為深度影像在非跌倒與跌倒動態過程狀況下的骨架點之灰階示意圖。本發明實施例之重點是快速取得三維人體骨架點資訊，採用二維色彩影像或二維紅外線影像先取得平面人體骨架點資訊，再搭配深度影像來組合成三維人體骨架點資訊。因此，先要將二維色彩影像或二維紅外線影像與深度影像進行對應，並在對應的深度影像中取得深度資訊。映射模組144即是用以映射該二維人體骨架點資訊至該深度影像以取得該二維人體骨架點資訊所對應之一深度資訊(步驟S30)。映射二維色彩影像或二維紅外線影像中之人體骨架點資訊至深度影像時，可以在深度影像上相對應於二維色彩影像或二維紅外線影像中人體骨架點之同一像素位置，取得相對應之數值，該數值為該像素所擷取之人體骨架點所在平面與感測器間之距離，也就是深度資訊。 Please refer to FIGS. 5A and 5B together, which are grayscale schematic diagrams of skeleton points in depth images under non-fall and fall dynamic conditions. The focus of this embodiment is to obtain 3D human skeleton point information quickly: planar human skeleton point information is first obtained from the 2D color image or 2D infrared image and then combined with the depth image to form the 3D human skeleton point information. To do so, the 2D color image or 2D infrared image must first be put into correspondence with the depth image, from which the depth information is read. The mapping module 144 maps the 2D human skeleton point information onto the depth image to obtain the corresponding depth information (step S30). When mapping, the value at the same pixel position in the depth image as each human skeleton point in the 2D color image or 2D infrared image is read; this value is the distance between the sensor and the plane of the skeleton point captured at that pixel, i.e., the depth information.
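The per-joint depth lookup of step S30 amounts to indexing the depth image at each skeleton point's pixel position. A minimal sketch; the millimeter unit is an assumption, as the patent does not fix a depth unit.

```python
import numpy as np

def lookup_depth(skeleton_2d, depth_image):
    """Step S30: read the depth value at each joint's pixel position.
    skeleton_2d is a (2, n) array (x row, y row); depth units (here mm)
    are an assumption."""
    xs, ys = skeleton_2d
    return depth_image[ys, xs].astype(float)

# 620x350 frame as in the embodiment, everything 2 m from the sensor
depth = np.full((350, 620), 2000.0)
pts = np.array([[361, 300], [88, 200]])  # joints at pixels (361, 88) and (300, 200)
print(lookup_depth(pts, depth))          # -> [2000. 2000.]
```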

二維色彩影像或二維紅外線影像與深度影像雖是同時擷取，但因兩台影像擷取裝置之間會有些許的距離差，亦或是取像上有不同的視野大小，為了提高映射時的精確度，可以在影像擷取裝置使用前進行簡單的配準校正，以建構一組位移量參數供後續校正該深度影像，使該深度影像的視野大小及影像擷取位置與二維色彩影像或二維紅外線影像相同。使用校正板或一受測物，比對其二維色彩影像或二維紅外線影像與深度影像中的對應位置座標，藉由影像形變(mesh warping)與反向映射(reverse mapping)產生配準校正後的深度影像，使同一個特徵在深度影像之像素位置與該特徵在二維色彩影像或二維紅外線影像中之像素位置一致。這個配準校正後的深度影像的一組位移量參數即可應用於後續之深度影像校正，並可以儲存在記憶體13。這組位移量參數範例可以為幾個重要校正點之位移量，其餘的座標以內插方式調整，以節省運算時間。 Although the 2D color image or 2D infrared image and the depth image are captured simultaneously, there is a slight distance between the two capture devices, and their fields of view may differ. To improve mapping accuracy, a simple registration calibration can be performed before the devices are used, constructing a set of displacement parameters for subsequently correcting the depth image so that its field of view and capture position match those of the 2D color image or 2D infrared image. Using a calibration board or a test object, the corresponding position coordinates in the 2D color image or 2D infrared image and in the depth image are compared, and a registered depth image is produced by mesh warping and reverse mapping, so that the pixel position of a feature in the depth image coincides with its pixel position in the 2D color image or 2D infrared image. The set of displacement parameters obtained from this registration can then be applied to subsequent depth image correction and stored in the memory 13. For example, the displacement parameters may be the displacements of a few important calibration points, with the remaining coordinates adjusted by interpolation to save computation time.
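One possible realization of applying stored displacement parameters with interpolation, under a strong simplifying assumption: displacements are purely horizontal and interpolated per column between calibration control points. The real registration described above warps in both axes (mesh warping).

```python
import numpy as np

def shift_depth_rows(depth, ctrl_cols, ctrl_dx):
    """Correct a depth image with a stored set of displacement parameters:
    displacements are known at a few calibration control points and
    interpolated for every column, then applied by reverse mapping.
    Sketch only: assumes horizontal displacement; a full registration
    would warp in both axes."""
    h, w = depth.shape
    cols = np.arange(w)
    dx = np.interp(cols, ctrl_cols, ctrl_dx)                   # interpolate between control points
    src = np.clip(np.round(cols - dx).astype(int), 0, w - 1)   # reverse mapping
    return depth[:, src]
```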

如圖6A及圖6B所示，同一個受測者與影像擷取裝置，當受測者在不同的距離下，其投射在二維色彩影像上有不同的尺寸。離影像擷取裝置越近則拍出來的人物越大(圖6A)，離影像擷取裝置越遠則拍出來的人物越小(圖6B)。即使是同一受測者，因為投射尺寸的大小不一致而使人體骨架點間的距離不一致，會導致後續的動作識別誤差，因此需要將在二維色彩影像或二維紅外線影像中所偵測到的二維人體骨架點依據其對應的深度資訊來還原出一致的比例尺度的座標空間，以利模擬重建人體骨架點之三維卡式座標(Cartesian coordinate system)位置。由於這樣的還原只需要對已經萃取出的二維人體骨架點資訊進行，因此可以節省大量的時間及資源。 As shown in FIGS. 6A and 6B, for the same subject and image capture device, the subject's projection onto the 2D color image has different sizes at different distances: the closer the subject is to the device, the larger the captured figure (FIG. 6A); the farther away, the smaller the figure (FIG. 6B). Even for the same subject, inconsistent projection sizes make the distances between human skeleton points inconsistent, causing subsequent action recognition errors. The 2D human skeleton points detected in the 2D color image or 2D infrared image therefore need to be restored, according to their corresponding depth information, into a coordinate space of consistent scale, so that the 3D Cartesian coordinate positions of the human skeleton points can be simulated and reconstructed. Since this restoration only needs to be performed on the already extracted 2D human skeleton point information, a great deal of time and resources can be saved.

藉由測量校正板或測量同一物件在不同位置下的對應影像尺寸，再藉由線性內插的方式計算出校正板或受測物在不同距離下的對應比例尺度，得到一尺寸-深度參數。尺寸-深度參數可以儲存在記憶體13，骨架點計算模組143可以使用尺寸-深度參數校正二維人體骨架點資訊(步驟S40)，也就是先取得二維色彩影像或二維紅外線影像中的二維人體骨架點資訊相對應的深度資訊，並以尺寸-深度參數計算對應的比例尺度進行二維人體骨架點資訊的校正還原，以將不同深度之人體骨架大小調整至同一尺度。 By measuring a calibration board, or the corresponding image sizes of the same object at different positions, and computing by linear interpolation the corresponding scale of the board or object at different distances, a size-depth parameter is obtained. The size-depth parameter can be stored in the memory 13, and the skeleton point calculation module 143 can use it to correct the 2D human skeleton point information (step S40): the depth information corresponding to the 2D human skeleton points in the 2D color image or 2D infrared image is obtained first, and the corresponding scale is computed from the size-depth parameter to restore the 2D human skeleton point information, so that human skeletons at different depths are adjusted to the same scale.
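The size-depth correction can be sketched as rescaling joint coordinates about the skeleton's centroid in proportion to depth. The 2 m reference depth and the single per-skeleton depth value are simplifying assumptions; the embodiment uses per-point depths and an interpolated size-depth parameter.

```python
import numpy as np

def scale_correct(pts2d, depth_mm, ref_depth_mm=2000.0):
    """Size-depth correction: apparent size shrinks roughly linearly with
    distance, so joint coordinates are rescaled about the skeleton centroid
    by depth / reference depth. ref_depth_mm = 2000 (2 m) is an assumed
    illustrative value."""
    center = pts2d.mean(axis=1, keepdims=True)   # centroid of the skeleton
    scale = depth_mm / ref_depth_mm              # closer subject -> scale < 1
    return center + (pts2d - center) * scale
```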

然而，由於每個影像擷取裝置的鏡頭其鏡面曲率會造成影像上的畸變失真。即使在二維影像中的失真效果並不明顯，但在對應深度的比例尺度還原上會放大影像的畸變失真，造成人體的三維骨架點在尺度還原後會有肢體不對稱的問題，尤其在距離影像擷取裝置越遠或越偏離拍攝中心點時，在座標還原後的畸變失真會越嚴重。 However, the curvature of each image capture device's lens causes distortion in the image. Even if the distortion is not obvious in the 2D image itself, it is amplified during the depth-based scale restoration, so the restored 3D human skeleton points exhibit limb asymmetry; the farther the subject is from the image capture device or from the image center, the more severe the distortion after coordinate restoration.

為解決此問題，需要針對不同的拍攝裝置進行影像畸變校正還原。使用校正板擷取多張二維色彩影像或二維紅外線影像，計算出鏡頭的內部曲率參數k，藉由division畸變模型L(r)採用反向映射作校正還原：

x̂ = x_c + (x − x_c) / L(r)，ŷ = y_c + (y − y_c) / L(r)

其中(x̂, ŷ)為校正後的點座標，(x, y)為原始影像點座標，(x_c, y_c)為畸變中心點。L(r)為畸變模型，L(r) = 1 + k·r²，r為原始座標距離畸變中心點的距離。對二維色彩影像或二維紅外線影像進行畸變校正還原。畸變模型可以儲存在記憶體13，骨架點計算模組143可以使用畸變模型校正二維人體骨架點資訊(步驟S40)。接著，映射模組144結合經校正之該二維人體骨架點資訊與該深度資訊以計算得到一三維人體骨架點資訊(步驟S50)，此三維人體骨架點資訊也就非常接近真實的骨架點空間位置。 To solve this problem, image distortion correction must be performed for each capture device. Multiple 2D color images or 2D infrared images of a calibration board are captured to compute the lens's internal curvature parameter k, and the division distortion model L(r) is applied with reverse mapping for correction:

x̂ = x_c + (x − x_c) / L(r), ŷ = y_c + (y − y_c) / L(r)

where (x̂, ŷ) are the corrected point coordinates, (x, y) are the original image point coordinates, and (x_c, y_c) is the distortion center. L(r) is the distortion model, L(r) = 1 + k·r², where r is the distance from the original coordinate to the distortion center. The 2D color image or 2D infrared image is thus corrected for distortion. The distortion model can be stored in the memory 13, and the skeleton point calculation module 143 can use it to correct the 2D human skeleton point information (step S40). Next, the mapping module 144 combines the corrected 2D human skeleton point information with the depth information to compute 3D human skeleton point information (step S50), which closely approximates the true spatial positions of the skeleton points.
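The division-model correction can be written directly from L(r); the reconstruction of L(r) = 1 + k·r² as the standard single-parameter division model is an assumption consistent with the text's definitions of k and r.

```python
import numpy as np

def undistort(pts, center, k):
    """Division-model correction by reverse mapping:
    x_hat = x_c + (x - x_c) / L(r), with L(r) = 1 + k * r**2 and r the
    distance of the original point from the distortion center (x_c, y_c).
    k is the lens's internal curvature parameter from calibration.
    pts, center: (2, n) and (2, 1) arrays of pixel coordinates."""
    d = pts - center                  # offsets from the distortion center
    r2 = (d ** 2).sum(axis=0)        # r^2 per point
    return center + d / (1.0 + k * r2)
```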

如圖7所示，本發明實施例可以應用在跌倒偵測領域，但不限於跌倒領域，亦可用在運動訓練領域等。動作識別模組145係使用一匹配模型針對一段時間之一系列該三維人體骨架點資訊識別一動作(步驟S60)。一段時間之一系列該三維人體骨架點資訊可以是一個四維矩陣，也就是一段連續時間之三維人體骨架點資訊，常用的時間長度可以為1至2秒，更佳為1.5秒，以達到即時的動作識別。將動作識別示意圖標記在深度影像中時，可以使用假色(pseudo color)代表不同的深度資訊，例如紅色代表距離影像擷取裝置近，藍色代表距離影像擷取裝置遠。該匹配模型係以深度學習架構建立之行為分類模型參數，用於計算當下受測者的動態動作與模型中的何者動作較為匹配，以判斷識別出一動作，例如跌倒。 As shown in FIG. 7, this embodiment can be applied to fall detection, but is not limited to falls; it can also be used in sports training and other fields. The action recognition module 145 uses a matching model to recognize an action from a series of the 3D human skeleton point information over a period of time (step S60). Such a series can be a 4D matrix, i.e., 3D human skeleton point information over a continuous time span; a commonly used length is 1 to 2 seconds, preferably 1.5 seconds, to achieve real-time action recognition. When action recognition diagrams are marked on the depth image, pseudo colors can represent different depths, e.g., red for near the image capture device and blue for far. The matching model consists of behavior classification model parameters built with a deep learning architecture and is used to determine which action in the model best matches the subject's current dynamic motion, so as to recognize an action such as a fall.
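Real-time recognition over the 1.5 s window can be sketched as a sliding buffer. The 30 fps rate (hence 45 frames) and the `matcher` callable are assumptions; `matcher` stands in for the trained LSTM/CNN matching model.

```python
import numpy as np
from collections import deque

WINDOW = 45  # 1.5 s at an assumed 30 fps; the text only fixes 1-2 s

buffer = deque(maxlen=WINDOW)

def on_new_frame(skeleton_3d, matcher):
    """Push each frame's (18, 3) 3D skeleton into a sliding window and run
    the supplied matching model once the window is full; returns None
    until then."""
    buffer.append(skeleton_3d)
    if len(buffer) == WINDOW:
        return matcher(np.stack(buffer))  # (45, 18, 3) sequence
    return None
```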

本發明實施例之動作識別系統10更進一步包含一輸出模組146在識別該動作時發出一提示訊號。在跌倒偵測領域，提示訊號可以觸發一警鈴或一電話撥出動作，以通知家人或警察單位。圖7左欄顯示不同跌倒偵測區域，右上欄顯示有跌倒提示訊號，並於右下欄顯示偵測到的跌倒畫面。 The action recognition system 10 of this embodiment further comprises an output module 146 that issues a prompt signal when the action is recognized. In fall detection, the prompt signal can trigger an alarm bell or an outgoing phone call to notify family members or the police. The left column of FIG. 7 shows different fall detection zones, the upper-right column shows the fall prompt signal, and the lower-right column shows the detected fall frame.

本發明實施例採用RGB二維色彩影像或二維紅外線影像萃取出二維人體骨架點資訊，且結合深度資訊以快速模擬出一段時間之一系列三維人體骨架點座標作為行為識別的輸入特徵，不僅相較於二維人體骨架點精準，更比三維感測器測出的三維人體骨架點節省資源與計算時間。若做為年長者在長照即時看護上的跌倒偵測系統應用，可以解決許多平面骨架點在動作/行為上因骨架點在平面上的多處重疊而無法準確辨識的問題。 This embodiment extracts 2D human skeleton point information from RGB 2D color images or 2D infrared images and combines it with depth information to quickly simulate a series of 3D human skeleton point coordinates over a period of time as the input features for behavior recognition. This is not only more accurate than using 2D human skeleton points, but also saves resources and computation time compared with 3D human skeleton points measured by a 3D sensor. Applied as a fall detection system for real-time long-term elderly care, it can solve the problem that many actions/behaviors cannot be accurately identified from planar skeleton points because those points overlap in many places on the plane.

S10: Step 10

S20: Step 20

S30: Step 30

S40: Step 40

S50: Step 50

S60: Step 60

Claims (8)

1. An action recognition method, comprising: capturing, at a time point, a two-dimensional color image or a two-dimensional infrared image and a corresponding depth image; extracting two-dimensional human skeleton point information from the two-dimensional color image or the two-dimensional infrared image; mapping the two-dimensional human skeleton point information onto the depth image to obtain depth information corresponding to the two-dimensional human skeleton point information; correcting the two-dimensional human skeleton point information using a size-depth parameter and a distortion model, wherein the distortion model corrects for the distance between the pixel coordinate position of each two-dimensional human skeleton point and the image distortion center; combining the corrected two-dimensional human skeleton point information with the depth information to obtain three-dimensional human skeleton point information; and using a matching model to recognize an action from a series of the three-dimensional human skeleton point information over a period of time.

2. The action recognition method of claim 1, further comprising issuing an alert signal when the action is recognized.

3. The action recognition method of claim 1, wherein the matching model comprises classification model parameters established by a neural-network-based deep learning framework.

4. The action recognition method of claim 1, wherein the depth image is first corrected with a set of displacement parameters.

5. An action recognition system, comprising: an image capture device for capturing a two-dimensional color image or a two-dimensional infrared image at a time point; a depth image capture device for capturing a corresponding depth image at the time point; a memory for storing a size-depth parameter, a distortion model and a matching model; and a processor communicatively connected to the image capture device, the depth image capture device and the memory, the processor comprising: an input module for receiving the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image; a storage module for storing the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image in the memory; a skeleton point calculation module for extracting two-dimensional human skeleton point information from the two-dimensional color image or the two-dimensional infrared image and correcting the two-dimensional human skeleton point information using the size-depth parameter and the distortion model, wherein the distortion model corrects for the distance between the pixel coordinate position of each two-dimensional human skeleton point and the image distortion center; a mapping module for mapping the two-dimensional human skeleton point information onto the depth image to obtain depth information corresponding to the two-dimensional human skeleton point information, and for combining the corrected two-dimensional human skeleton point information with the depth information to obtain three-dimensional human skeleton point information; and an action recognition module using a matching model to recognize an action from a series of the three-dimensional human skeleton point information over a period of time.

6. The action recognition system of claim 5, further comprising an output module that issues an alert signal when the action is recognized.

7. The action recognition system of claim 5, wherein the matching model comprises classification model parameters established by a neural-network-based deep learning framework.

8. The action recognition system of claim 5, wherein the memory further stores a set of displacement parameters, and the depth image is first corrected with the set of displacement parameters.
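The distortion correction recited in the claims — adjusting each skeleton point's pixel coordinates as a function of its distance from the image distortion center — is not spelled out numerically in the specification. A minimal sketch using a standard radial distortion model (the coefficients `K1`, `K2`, the center, and the focal lengths are illustrative assumptions, not values from the patent) could look like:

```python
import numpy as np

# Illustrative radial distortion coefficients and distortion center;
# real values would come from camera calibration.
K1, K2 = -0.12, 0.01
CX, CY = 320.0, 240.0

def correct_points(points, fx=600.0, fy=600.0):
    """Correct the pixel coordinates of 2D skeleton points based on
    their distance from the image distortion center (radial model)."""
    corrected = []
    for u, v in points:
        # Normalized offsets from the distortion center.
        xn, yn = (u - CX) / fx, (v - CY) / fy
        r2 = xn * xn + yn * yn
        scale = 1.0 + K1 * r2 + K2 * r2 * r2
        corrected.append((CX + (u - CX) * scale,
                          CY + (v - CY) * scale))
    return np.array(corrected)

pts = correct_points([(320.0, 240.0), (620.0, 240.0)])
print(pts)  # center point unchanged; far point pulled toward center
```

With a negative `K1` (barrel distortion), points far from the distortion center are pulled inward while the center itself is unaffected, which is the behavior the claimed distortion model describes.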
TW109142075A 2020-11-30 2020-11-30 Action recognition method and system thereof TWI755950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109142075A TWI755950B (en) 2020-11-30 2020-11-30 Action recognition method and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109142075A TWI755950B (en) 2020-11-30 2020-11-30 Action recognition method and system thereof

Publications (2)

Publication Number Publication Date
TWI755950B true TWI755950B (en) 2022-02-21
TW202223729A TW202223729A (en) 2022-06-16

Family

ID=81329227

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109142075A TWI755950B (en) 2020-11-30 2020-11-30 Action recognition method and system thereof

Country Status (1)

Country Link
TW (1) TWI755950B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI824550B (en) * 2022-06-07 2023-12-01 鴻海精密工業股份有限公司 Method for generating distorted image, electronic device and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Das, Srijan, et al. "Action recognition based on a mixture of RGB and depth based skeleton." 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017. *
Zhang, Chenyang, and Yingli Tian. "RGB-D camera-based daily living activity recognition." Journal of Computer Vision and Image Processing 2.4 (2012): 12. *


Also Published As

Publication number Publication date
TW202223729A (en) 2022-06-16

Similar Documents

Publication Publication Date Title
US11328535B1 (en) Motion identification method and system
CN111104816B (en) Object gesture recognition method and device and camera
WO2021196294A1 (en) Cross-video person location tracking method and system, and device
CN109887040B (en) Moving target active sensing method and system for video monitoring
WO2020042419A1 (en) Gait-based identity recognition method and apparatus, and electronic device
JP6793151B2 (en) Object tracking device, object tracking method and object tracking program
CN105243664B (en) A kind of wheeled mobile robot fast-moving target tracking method of view-based access control model
WO2018101247A1 (en) Image recognition imaging apparatus
CN106384106A (en) Anti-fraud face recognition system based on 3D scanning
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
JPWO2019035155A1 (en) Image processing system, image processing method, and program
JP2001283216A (en) Image collating device, image collating method and recording medium in which its program is recorded
TWM610371U (en) Action recognition system
CN110969045B (en) Behavior detection method and device, electronic equipment and storage medium
JP2018156408A (en) Image recognizing and capturing apparatus
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
TWI755950B (en) Action recognition method and system thereof
CN111444837B (en) Temperature measurement method and temperature measurement system for improving face detection usability in extreme environment
WO2018088035A1 (en) Image recognition processing method, image recognition processing program, data providing method, data providing system, data providing program, recording medium, processor, and electronic device
CN112132110A (en) Method for intelligently judging human body posture and nursing equipment
CN109544594A (en) Target tracking method and system under multiple nonlinear distorted lenses
CN114639168B (en) Method and system for recognizing running gesture
Hadi et al. Fusion of thermal and depth images for occlusion handling for human detection from mobile robot
CN114067267A (en) Fighting behavior detection method based on geographic video

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees