TWI755950B - Action recognition method and system thereof - Google Patents
Action recognition method and system thereof
- Publication number
- TWI755950B (application TW109142075A)
- Authority
- TW
- Taiwan
- Prior art keywords
- image
- human skeleton
- dimensional
- skeleton point
- depth
- Prior art date
Images
Abstract
Description
The present invention relates to an action recognition method and a system thereof, and in particular to an action recognition method and system based on multimodal image integration and simulation.
Human Action Recognition (HAR) has been a popular research topic in recent years, and many methods and technologies have been developed for context awareness, exercise monitoring, and elderly care. Among them, human-skeleton-point localization in two-dimensional images has matured: real-time two-dimensional RGB images (red-green-blue images) or IR images (infrared images) can be used to identify and locate the head, torso, upper limbs, and lower limbs, and thereby judge a person's activity state. However, some human activities cannot be distinguished using two-dimensional skeleton-point information alone; for example, the planar projections of the skeleton points of certain actions overlap in many places, making those actions impossible to identify and tell apart.
Therefore, as shown in FIG. 1, higher-accuracy human activity recognition still often relies on three-dimensional (3D) point-cloud coordinate information of the human body. The amount of 3D point-cloud coordinate data obtained from a 3D sensor is extremely large: if the sensor's resolution is too high, computing a human skeleton-point localization map consumes too many resources and too much time; if the resolution is too low, background noise may prevent the correct skeleton points from being recognized, reducing the accuracy of action recognition. There is therefore an urgent need for a real-time and highly accurate action recognition method and system.
The present invention provides an action recognition method, comprising: capturing a two-dimensional color image or a two-dimensional infrared image and a corresponding depth image at a time point; extracting two-dimensional human skeleton-point information from the two-dimensional color image or the two-dimensional infrared image; mapping the two-dimensional human skeleton-point information to the depth image to obtain depth information corresponding to the two-dimensional human skeleton-point information; correcting the two-dimensional human skeleton-point information using a size-depth parameter and a distortion model; combining the corrected two-dimensional human skeleton-point information with the depth information to obtain three-dimensional human skeleton-point information; and using a matching model to identify an action from a series of the three-dimensional human skeleton-point information over a period of time.
The present invention further provides an action recognition system, comprising: an image capture device for capturing a two-dimensional color image or a two-dimensional infrared image at a time point; a depth image capture device for capturing a corresponding depth image at the time point; a memory for storing a size-depth parameter, a distortion model, and a matching model; and a processor electrically connected to the image capture device, the depth image capture device, and the memory. The processor comprises: an input module for receiving the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image; a storage module for storing the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image in the memory; a skeleton-point calculation module for extracting two-dimensional human skeleton-point information from the two-dimensional color image or the two-dimensional infrared image and correcting the two-dimensional human skeleton-point information using the size-depth parameter and the distortion model; a mapping module for mapping the two-dimensional human skeleton-point information to the depth image to obtain depth information corresponding to the two-dimensional human skeleton-point information, and for combining the corrected two-dimensional human skeleton-point information with the depth information to obtain three-dimensional human skeleton-point information; and an action recognition module that uses the matching model to identify an action from a series of the three-dimensional human skeleton-point information over a period of time.
In some embodiments, the system further comprises an output module that issues a prompt signal when the action is identified.
In some embodiments, the matching model comprises classification-model parameters established with a neural-network-based deep-learning architecture.
In some embodiments, the distortion model is used to correct the pixel coordinates of the two-dimensional human skeleton points according to their distance from the image distortion center.
In some embodiments, the memory further stores a set of displacement parameters, and the depth image is first corrected using the set of displacement parameters.
The action recognition method and system provided by the present invention solve the problems that computing three-dimensional human skeleton points is time-consuming and easily affected by device resolution or noise. They provide a multi-modality image-integration approach that quickly and accurately simulates three-dimensional skeleton-point information, and can be applied to various real-time human activity recognition scenarios, such as fall detection.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
As used herein, the articles "a," "an," and "any" refer to one or more than one (i.e., at least one) of the grammatical object of the article. For example, "an element" means one element or more than one element.
As used herein, the terms "about," "approximately," or "nearly" mean that the stated value or range is within 20%, preferably within 10%, and more preferably within 5%. Numerical quantities provided herein are approximations, meaning that the terms "about," "approximately," or "nearly" can be inferred even when not expressly used.
10: action recognition system
11: image capture device
12: depth image capture device
13: memory
14: processor
141: input module
142: storage module
143: skeleton-point calculation module
144: mapping module
145: action recognition module
146: output module
S10: step 10
S20: step 20
S30: step 30
S40: step 40
S50: step 50
S60: step 60
FIG. 1 is a human skeleton-point localization map computed from human motion captured with a 3D sensor.
FIG. 2 is a block diagram of an action recognition system according to an embodiment of the present invention.
FIG. 3 is a flowchart of an action recognition method according to an embodiment of the present invention.
FIG. 4A is a grayscale diagram of skeleton points in a non-fall color image according to an embodiment of the present invention.
FIG. 4B is a grayscale diagram of skeleton points in color images of a dynamic fall process according to an embodiment of the present invention.
FIG. 5A is a grayscale diagram of skeleton points in a non-fall depth image according to an embodiment of the present invention.
FIG. 5B is a grayscale diagram of skeleton points in depth images of a dynamic fall process according to an embodiment of the present invention.
FIG. 6A is a grayscale diagram of close-range skeleton-point coordinate mapping according to an embodiment of the present invention.
FIG. 6B is a grayscale diagram of long-range skeleton-point coordinate mapping according to an embodiment of the present invention.
FIG. 7 is a grayscale diagram of action recognition according to an embodiment of the present invention.
Other technical contents, features, and effects of the present invention will become clear in the following detailed description of preferred embodiments with reference to the drawings.
As shown in FIG. 2, an embodiment of the present invention provides an action recognition system 10, comprising an image capture device 11, a depth image capture device 12, a memory 13, and a processor 14. The processor 14 comprises an input module 141, a storage module 142, a skeleton-point calculation module 143, a mapping module 144, and an action recognition module 145. The action recognition system 10 may further comprise an output module 146.
As shown in FIG. 3, an embodiment of the present invention provides an action recognition method, comprising: capturing a two-dimensional color image or a two-dimensional infrared image and a corresponding depth image at a time point (step S10); extracting two-dimensional human skeleton-point information from the two-dimensional color image or the two-dimensional infrared image (step S20); mapping the two-dimensional human skeleton-point information to the depth image to obtain depth information corresponding to the two-dimensional human skeleton-point information (step S30); correcting the two-dimensional human skeleton-point information using a size-depth parameter and a distortion model (step S40); combining the corrected two-dimensional human skeleton-point information with the depth information to obtain three-dimensional human skeleton-point information (step S50); and using a matching model to identify an action from a series of the three-dimensional human skeleton-point information over a period of time (step S60).
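The six steps above can be sketched as one processing pipeline. Every function body below is a toy stand-in (an assumption for illustration only, not the patent's implementation), kept solely to show how steps S10-S60 chain together:

```python
import numpy as np

def extract_skeleton_2d(image):
    """S20 stand-in: return 18 dummy 2D skeleton points at the image center."""
    h, w = image.shape[:2]
    return np.array([[w // 2, h // 2]] * 18, dtype=float)

def correct_points(pts, size_depth_param):
    """S40 stand-in: pretend scale/distortion correction is one multiply."""
    return pts * size_depth_param

def recognize(frames, depth_frames, size_depth_param=1.0):
    sequence = []
    for img, depth in zip(frames, depth_frames):               # S10: paired capture
        pts = extract_skeleton_2d(img)                         # S20: 2D skeleton
        z = depth[pts[:, 1].astype(int), pts[:, 0].astype(int)]  # S30: depth lookup
        pts = correct_points(pts, size_depth_param)            # S40: correction
        sequence.append(np.column_stack([pts, z]))             # S50: (x, y, z)
    sequence = np.stack(sequence)                              # (T, 18, 3) series
    # S60 stand-in "matching model": a trivial mean-depth threshold.
    return "fall" if sequence[:, :, 2].mean() < 1000 else "no_fall"

frames = [np.zeros((350, 620)) for _ in range(3)]
depths = [np.full((350, 620), 2000.0) for _ in range(3)]
print(recognize(frames, depths))
```

A real system would replace each stand-in with the modules described below (skeleton-point calculation, mapping, correction, and a trained classifier).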
Please refer to FIG. 2 and FIG. 3 together to understand this embodiment of the present invention. The image capture device 11 is used to capture a two-dimensional color image or a two-dimensional infrared image at a time point. The two-dimensional color image may be a planar RGB color image, such as a photograph taken by an ordinary camera or one frame of a video taken by a video recorder. Each pixel of the two-dimensional color image records color information, which may be a matrix of red, green, and blue intensities. The two-dimensional infrared image may be a planar grayscale image captured under near-infrared illumination, commonly used for nighttime recognition or detection; a grayscale image of good resolution can still be captured under insufficient lighting. Each pixel of the two-dimensional infrared image represents the infrared intensity detected by the infrared sensor.
The depth image capture device 12 is used to capture a corresponding depth image at the same time point, and may be a time-of-flight (TOF) sensor or a depth camera (for example, Intel RealSense). To allow the images to be mapped to each other, the corresponding depth image must be acquired at the same time as the two-dimensional color image or two-dimensional infrared image. The depth image is also a two-dimensional image, except that each of its pixels represents the distance between the sensor and the plane of the object captured at that pixel.
The memory 13 is used to store a matching model for identifying different actions. In this embodiment, taking fall recognition as an example, a sensing device (which may be the image capture device and depth image capture device of this embodiment) was mounted at a height of 2 meters, and a total of 60,000 consecutive frames with a resolution of 620×350 were captured, roughly half showing fall motion and half showing non-fall motion. Dynamic sequences of consecutive fall and non-fall frames were sampled; for each frame in a sequence, two-dimensional human skeleton points were computed and combined with the corresponding depth image to calculate simulated three-dimensional skeleton-point coordinates. Combining the three-dimensional skeleton-point coordinates of every frame in the sequence over time yields a four-dimensional dynamic feature sequence used as the input feature for action recognition. This dynamic sequence of three-dimensional skeleton-point coordinates over time is an important feature for action recognition; a neural-network deep-learning architecture, such as a Long Short-Term Memory (LSTM) model or a convolutional neural network (CNN), can be trained on it to build a matching model that identifies a subject's different dynamic activities.
The processor 14 is electrically connected to the image capture device 11, the depth image capture device 12, and the memory 13. After the image capture device 11 and the depth image capture device 12 capture a two-dimensional color image or a two-dimensional infrared image and the corresponding depth image at a time point (step S10), the images are transmitted to the processor 14 by wire or wirelessly. The input module 141 receives the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image. For convenience of later use, the storage module 142 may store the two-dimensional color image or the two-dimensional infrared image and the corresponding depth image in the memory 13 so that they can be retrieved at any time.
Please refer to FIG. 4A and FIG. 4B together, which are grayscale diagrams of skeleton points in two-dimensional color images under non-fall and dynamic fall conditions. Although this embodiment uses two-dimensional color images as an example and presents them as grayscale diagrams, the system and method of the embodiments are not limited to two-dimensional color images; two-dimensional infrared images, which are grayscale images, achieve the same effect. The skeleton-point calculation module 143 extracts the two-dimensional human skeleton-point information from the two-dimensional color image or the two-dimensional infrared image (step S20). To identify two-dimensional human skeleton points in such an image, a parallel convolutional network architecture can be used to detect confidence maps of joint positions and obtain Part Affinity Fields describing the degree of connection between joints; the two kinds of features are then integrated to predict each limb segment, finally yielding the two-dimensional human skeleton-point information.
The two-dimensional human skeleton-point information is a data list of two-dimensional coordinates indicating the pixel positions in the two-dimensional color image or two-dimensional infrared image that correspond to the real human skeleton points, i.e., the mapping of the real plane onto the image. A common form is the pixel positions of 18 skeleton points, that is, a 2×18 matrix. For example, the center point on the head in the non-fall image of FIG. 4A indicates that the nose is located at pixel position (361, 88) in the two-dimensional color image.
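The 2×18 representation above can be shown concretely. Only the nose value (361, 88) comes from the text; the joint index and the remaining zero-filled points are placeholders assumed for illustration:

```python
import numpy as np

# A 2x18 matrix: row 0 holds x pixel coordinates, row 1 holds y,
# one column per skeleton point.
skeleton_2d = np.zeros((2, 18), dtype=int)
NOSE = 0                             # assumed joint index for the nose
skeleton_2d[:, NOSE] = (361, 88)     # nose pixel position from FIG. 4A
print(skeleton_2d.shape, skeleton_2d[:, NOSE])
```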
Please refer to FIG. 5A and FIG. 5B together, which are grayscale diagrams of skeleton points in depth images under non-fall and dynamic fall conditions. The key point of this embodiment is to obtain three-dimensional human skeleton-point information quickly: planar human skeleton-point information is first obtained from the two-dimensional color image or two-dimensional infrared image and then combined with the depth image to form three-dimensional human skeleton-point information. Therefore, the two-dimensional color image or two-dimensional infrared image must first be put into correspondence with the depth image, and the depth information obtained from the corresponding depth image. The mapping module 144 maps the two-dimensional human skeleton-point information to the depth image to obtain the depth information corresponding to the two-dimensional human skeleton-point information (step S30). When mapping the human skeleton points of the two-dimensional color image or two-dimensional infrared image to the depth image, the value at the same pixel position in the depth image as each skeleton point can be read; that value is the distance between the sensor and the plane of the human skeleton point captured at that pixel, i.e., the depth information.
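Step S30 as just described reduces to a per-point pixel lookup. The image size and depth values below are assumptions for demonstration:

```python
import numpy as np

def map_to_depth(points_2d, depth_image):
    """points_2d: (N, 2) integer (x, y) pixel coordinates of skeleton points.
    Returns the (N,) depth value (sensor-to-plane distance) for each point."""
    x, y = points_2d[:, 0], points_2d[:, 1]
    return depth_image[y, x]          # NumPy indexes rows (y) before columns (x)

depth_image = np.full((350, 620), 1800)        # toy depth map, 1.8 m everywhere
points = np.array([[361, 88], [300, 200]])     # nose pixel from FIG. 4A + one more
print(map_to_depth(points, depth_image))
```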
Although the two-dimensional color image or two-dimensional infrared image and the depth image are captured at the same time, there is a slight distance between the two image capture devices, and their fields of view may differ. To improve mapping accuracy, a simple registration correction can be performed before the image capture devices are used, constructing a set of displacement parameters for subsequent correction of the depth image so that its field of view and capture position match those of the two-dimensional color image or two-dimensional infrared image. Using a calibration board or a test object, the corresponding position coordinates in the two-dimensional color image or two-dimensional infrared image and in the depth image are compared, and a registration-corrected depth image is produced by mesh warping and reverse mapping, so that the pixel position of a feature in the depth image coincides with the pixel position of the same feature in the two-dimensional color image or two-dimensional infrared image. The set of displacement parameters of this registration-corrected depth image can then be applied to subsequent depth-image correction and stored in the memory 13. As an example, this set of displacement parameters may contain the displacements of a few important correction points, with the remaining coordinates adjusted by interpolation to save computation time.
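The "few correction points plus interpolation" idea can be sketched in one dimension. Real registration uses a 2-D mesh warp; this toy version, and every number in it, is an assumption for illustration:

```python
import numpy as np

# Store the measured shift only at a few key correction points along x,
# then fill in every other column by linear interpolation.
key_columns = np.array([0, 310, 619])     # assumed calibration columns
key_shifts = np.array([4.0, 2.5, 1.0])    # assumed measured shifts, in pixels

all_columns = np.arange(620)
shift_map = np.interp(all_columns, key_columns, key_shifts)

# Applying the correction to one skeleton point's x coordinate:
x = 361
x_corrected = x - shift_map[x]
print(round(shift_map[310], 2), round(x_corrected, 2))
```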
As shown in FIG. 6A and FIG. 6B, for the same subject and image capture device, the subject's projection on the two-dimensional color image has different sizes at different distances. The closer the subject is to the image capture device, the larger the captured figure (FIG. 6A); the farther away, the smaller the figure (FIG. 6B). Even for the same subject, inconsistent projection sizes make the distances between human skeleton points inconsistent, which causes errors in subsequent action recognition. The two-dimensional human skeleton points detected in the two-dimensional color image or two-dimensional infrared image therefore need to be restored, according to their corresponding depth information, into a coordinate space of consistent scale, so that the three-dimensional Cartesian coordinates of the human skeleton points can be simulated and reconstructed. Because this restoration only needs to be performed on the already-extracted two-dimensional human skeleton points, a great deal of time and resources can be saved.
By measuring a calibration board, or measuring the image size of the same object at different positions, and then computing by linear interpolation the corresponding scale of the calibration board or test object at different distances, a size-depth parameter is obtained. The size-depth parameter can be stored in the memory 13, and the skeleton-point calculation module 143 can use it to correct the two-dimensional human skeleton-point information (step S40): the depth information corresponding to the two-dimensional human skeleton points is obtained first, and the corresponding scale is computed from the size-depth parameter to correct and restore the two-dimensional human skeleton-point information, adjusting human skeletons at different depths to the same scale.
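A minimal sketch of the size-depth correction, assuming an illustrative calibration table (all numbers and the image-center value are assumptions, not values from the patent):

```python
import numpy as np

# Assumed calibration: how large 1 mm appears (in pixels) at known distances.
calib_depth_mm = np.array([1000, 2000, 4000])
calib_px_per_mm = np.array([1.20, 0.60, 0.30])

def rescale(points_2d, depth_mm, center=(310, 175)):
    """Convert pixel offsets from the image center into millimetres, using a
    scale factor linearly interpolated from the size-depth calibration."""
    px_per_mm = np.interp(depth_mm, calib_depth_mm, calib_px_per_mm)
    return (np.asarray(points_2d, dtype=float) - center) / px_per_mm

# The same physical shoulder offset appears as 60 px at 1 m and 30 px at 2 m;
# after rescaling, both map to the same metric offset.
near = rescale([[370, 175]], 1000)
far = rescale([[340, 175]], 2000)
print(near[0][0], far[0][0])
```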
However, the mirror curvature of each image capture device's lens causes distortion away from the image center. Even if the distortion is not obvious in the two-dimensional image, it is amplified when the scale is restored according to depth, causing limb asymmetry in the restored three-dimensional skeleton points; the farther a point is from the image capture device or from the center of the shot, the more severe the distortion after coordinate restoration.
To solve this problem, image distortion correction must be performed for each capture device. Multiple two-dimensional color images or two-dimensional infrared images of a calibration board are captured to compute the lens's internal curvature parameter k, and the images are corrected by reverse mapping with the division distortion model L(r): x' = xc + (x − xc)·L(r) and y' = yc + (y − yc)·L(r), where (x', y') are the corrected point coordinates, (x, y) are the original image point coordinates, and (xc, yc) is the distortion center. L(r) is the distortion model, L(r) = 1/(1 + k·r²), where r is the distance from the original coordinates to the distortion center. Distortion correction and restoration are thus applied to the two-dimensional color image or two-dimensional infrared image. The distortion model can be stored in the memory 13, and the skeleton-point calculation module 143 can use it to correct the two-dimensional human skeleton-point information (step S40). The mapping module 144 then combines the corrected two-dimensional human skeleton-point information with the depth information to compute three-dimensional human skeleton-point information (step S50), which is very close to the real spatial positions of the skeleton points.
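A runnable sketch of this correction, using the common division-model form L(r) = 1/(1 + k·r²). The exact expression is reconstructed from the model's standard definition, since the patent's formula images are not reproduced here, and the parameter values are assumptions:

```python
def undistort(x, y, xc, yc, k):
    """Correct one point toward/away from the distortion center (xc, yc)
    according to its distance r from that center (division model)."""
    r2 = (x - xc) ** 2 + (y - yc) ** 2
    L = 1.0 / (1.0 + k * r2)
    return xc + (x - xc) * L, yc + (y - yc) * L

# A point at the distortion center is unchanged; off-center points move more
# the farther they are from the center.
print(undistort(310.0, 175.0, 310.0, 175.0, 1e-7))
print(undistort(410.0, 175.0, 310.0, 175.0, 1e-7))
```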
As shown in FIG. 7, the embodiments of the present invention can be applied to fall detection, but are not limited to falls; they can also be used in sports training and other fields. The action recognition module 145 uses a matching model to identify an action from a series of the three-dimensional human skeleton-point information over a period of time (step S60). Such a series can be a four-dimensional matrix, that is, three-dimensional human skeleton-point information over a stretch of continuous time; a commonly used duration is 1 to 2 seconds, preferably 1.5 seconds, to achieve real-time action recognition. When recognition results are marked on the depth image, pseudo colors can represent different depth information, for example red for close to the image capture device and blue for far away. The matching model consists of behavior-classification model parameters built with a deep-learning architecture, and is used to compute which action in the model best matches the subject's current dynamic motion, so as to identify an action such as a fall.
The action recognition system 10 of the embodiments of the present invention further comprises an output module 146 that issues a prompt signal when the action is identified. In fall detection, the prompt signal can trigger an alarm bell or an outgoing phone call to notify family members or the police. The left column of FIG. 7 shows different fall-detection areas, the upper-right panel shows a fall prompt signal, and the lower-right panel shows the detected fall frame.
The embodiments of the present invention extract two-dimensional human skeleton-point information from RGB two-dimensional color images or two-dimensional infrared images and combine it with depth information to quickly simulate a series of three-dimensional human skeleton-point coordinates over a period of time as input features for behavior recognition. This is not only more accurate than two-dimensional human skeleton points, but also saves resources and computation time compared with three-dimensional human skeleton points measured by a 3D sensor. Applied as a fall-detection system for real-time long-term care of the elderly, it can solve the problem that many actions/behaviors cannot be accurately identified from planar skeleton points because those points overlap at multiple places in the plane.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109142075A TWI755950B (en) | 2020-11-30 | 2020-11-30 | Action recognition method and system thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109142075A TWI755950B (en) | 2020-11-30 | 2020-11-30 | Action recognition method and system thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI755950B true TWI755950B (en) | 2022-02-21 |
TW202223729A TW202223729A (en) | 2022-06-16 |
Family
ID=81329227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109142075A TWI755950B (en) | 2020-11-30 | 2020-11-30 | Action recognition method and system thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI755950B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI824550B (en) * | 2022-06-07 | 2023-12-01 | 鴻海精密工業股份有限公司 | Method for generating distorted image, electronic device and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104598890A (en) * | 2015-01-30 | 2015-05-06 | 南京邮电大学 | Human body behavior recognizing method based on RGB-D video |
- 2020-11-30 TW TW109142075A patent/TWI755950B/en not_active IP Right Cessation
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104598890A (en) * | 2015-01-30 | 2015-05-06 | 南京邮电大学 | Human body behavior recognizing method based on RGB-D video |
Non-Patent Citations (2)
Title |
---|
Das, Srijan, et al. "Action recognition based on a mixture of RGB and depth based skeleton." 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017. * |
Zhang, Chenyang, and Yingli Tian. "RGB-D camera-based daily living activity recognition." Journal of Computer Vision and Image Processing 2.4 (2012): 12. * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI824550B (en) * | 2022-06-07 | 2023-12-01 | 鴻海精密工業股份有限公司 | Method for generating distorted image, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
TW202223729A (en) | 2022-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11328535B1 (en) | Motion identification method and system | |
CN111104816B (en) | Object gesture recognition method and device and camera | |
WO2021196294A1 (en) | Cross-video person location tracking method and system, and device | |
CN109887040B (en) | Moving target active sensing method and system for video monitoring | |
WO2020042419A1 (en) | Gait-based identity recognition method and apparatus, and electronic device | |
JP6793151B2 (en) | Object tracking device, object tracking method and object tracking program | |
CN105243664B (en) | A kind of wheeled mobile robot fast-moving target tracking method of view-based access control model | |
WO2018101247A1 (en) | Image recognition imaging apparatus | |
CN106384106A (en) | Anti-fraud face recognition system based on 3D scanning | |
US20220180534A1 (en) | Pedestrian tracking method, computing device, pedestrian tracking system and storage medium | |
CN111144207B (en) | Human body detection and tracking method based on multi-mode information perception | |
JPWO2019035155A1 (en) | Image processing system, image processing method, and program | |
JP2001283216A (en) | Image collating device, image collating method and recording medium in which its program is recorded | |
TWM610371U (en) | Action recognition system | |
CN110969045B (en) | Behavior detection method and device, electronic equipment and storage medium | |
JP2018156408A (en) | Image recognizing and capturing apparatus | |
CN113378649A (en) | Identity, position and action recognition method, system, electronic equipment and storage medium | |
TWI755950B (en) | Action recognition method and system thereof | |
CN111444837B (en) | Temperature measurement method and temperature measurement system for improving face detection usability in extreme environment | |
WO2018088035A1 (en) | Image recognition processing method, image recognition processing program, data providing method, data providing system, data providing program, recording medium, processor, and electronic device | |
CN112132110A (en) | Method for intelligently judging human body posture and nursing equipment | |
CN109544594A (en) | Target tracking method and system under multiple nonlinear distorted lenses | |
CN114639168B (en) | Method and system for recognizing running gesture | |
Hadi et al. | Fusion of thermal and depth images for occlusion handling for human detection from mobile robot | |
CN114067267A (en) | Fighting behavior detection method based on geographic video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |