TW200832237A - Human activity recognition method by combining template posture matching and fuzzy rule reasoning - Google Patents

Human activity recognition method by combining template posture matching and fuzzy rule reasoning Download PDF

Info

Publication number
TW200832237A
TW200832237A (application TW96102113A)
Authority
TW
Taiwan
Prior art keywords
image
human motion
motion recognition
recognition method
sequence
Prior art date
Application number
TW96102113A
Other languages
Chinese (zh)
Other versions
TWI322963B (en)
Inventor
Zhi-Yong Zhang
Zhi-Tao Lv
Jian-Wen Zhuo
Chin-Teng Lin
He-Zhang Pu
De-Zheng Liu
Original Assignee
Univ Nat Chiao Tung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Chiao Tung filed Critical Univ Nat Chiao Tung
Priority to TW96102113A priority Critical patent/TW200832237A/en
Publication of TW200832237A publication Critical patent/TW200832237A/en
Application granted granted Critical
Publication of TWI322963B publication Critical patent/TWI322963B/zh

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a human activity recognition method, particularly to a method which uses template posture matching and fuzzy rule reasoning to recognize an action. First, a foreground subject is extracted and converted to a binary image by a statistical background model based on frame ratios, which is robust to illumination changes. For better efficiency and separability, the binary image is then transformed to a new space by eigenspace and canonical space transformations, and recognition is done in the canonical space. A sequence of at least two image frames is converted to a posture sequence by template matching, and the posture sequence is classified into an action by fuzzy rule inference. The fuzzy rule approach can not only incorporate temporal sequence information for recognition but also tolerate variations in how different people perform an action. Therefore, the present invention achieves higher recognition accuracy than the prior art.

Description

200832237

IX. Description of the Invention

[Technical Field of the Invention]

The main purpose of the present invention is to provide a human action recognition method, in particular a method that combines temporal posture matching with fuzzy rule inference to recognize human actions.

[Prior Art]

Human action recognition plays a major role in automatic surveillance, human-machine interfaces, home-care monitoring, and intelligent home environments. Many human action recognition approaches use a single image frame to recognize the action. In the temporal domain, however, the transition relations between posture states are important information for distinguishing human actions. Many human action recognition methods have been proposed recently; most can be divided into two classes according to the features they use. The first class uses motion features. In "IEEE Trans. Pattern Anal., vol. 23, no. 3, 2001", Bobick and Davis recognize human actions by comparing the motion-energy and motion-history images of template sequences. In "Proc. Conf. Comput. Vision Pattern Recog., vol. 4, pp. 38-45, 2003", R. Hamid et al. adopt spatio-temporal features, such as the relative distance and relative velocity between the two hands, and additionally use dynamic Bayesian networks to accomplish human action recognition. The other class uses two-dimensional and three-dimensional shape features to recognize human actions. In "IEEE Comput. Soc. Workshop Models versus Exemplars in Comput. Vision, pp. 263-270, 2003", data obtained with a Canny edge detector represent the contour shape of an action, and a key frame is defined for every action. In "IEEE Int. Workshop on Anal. Modeling of Faces and Gestures, pp. 74-81, 2003", the authors propose classifying and recognizing human actions with an SVM and a view-insensitive two-dimensional shape description.

If only motion-based and shape-based features are used to accomplish action recognition, the temporal information is not taken into account, so many actions still cannot be clearly distinguished. This motivates us to design a stable, robust method that exploits the temporal transition information inherent in human actions to achieve more accurate recognition. One candidate model is the Hidden Markov Model (HMM).

The Hidden Markov Model can handle sequential data and is insensitive to changes of time scale, but its price is computational efficiency: a large amount of data must be collected, and much time is spent estimating the parameters of each HMM.

The present invention extracts features from images by eigenspace transformation and canonical space transformation. Using the spatially transformed vectors, a temporal image sequence can be converted into a posture sequence, represented by a combination of index values of the template posture classes. Because the change in posture within a very short time interval is small, and because human motion is limited by an inherent natural frequency, the posture frequency is not very high; we therefore recognize from down-sampled frames instead of using every frame. In addition, the present invention proposes a fuzzy rule inference method, which not only combines the posture sequence information of a human action but also covers the slight differences in posture when different people perform the same action, improving the stability of recognition.

[Summary of the Invention]

The main purpose of the present invention is to provide a human action recognition method, in particular one that combines temporal posture matching with fuzzy rule inference. To achieve this purpose, the method comprises the following steps. First, a camera is set up at a fixed point to capture the original frames of that point, and a learned background frame is built. The camera then captures images at a fixed rate; after the sampling rate is reduced, they form a sequence of input frames, and each input frame in the sequence is compared with the background frame to extract the foreground image. A spatial transformation is applied to each extracted foreground image, and at least two consecutive transformed foreground images are converted into a temporal posture sequence. Finally, this temporal posture sequence is classified into an action category by fuzzy-rule-based human action recognition. The method combines the posture sequence information of human actions, and the fuzzy logic covers the slight posture differences of different people performing the same action; the fuzzy rules can also learn the hidden patterns of posture transitions.

The purpose, technical content, features and effects of the present invention will be more easily understood through the following detailed description of specific embodiments together with the accompanying drawings.

[Embodiments]

Figure 1 is the system architecture diagram of the present invention; the architecture can be divided into three flows. The first flow S110 is the extraction of the foreground person. The second flow S120 projects the image, through spatial transformations, onto a space of smaller dimension in which postures are easier to distinguish. The third flow S130 is the posture classification of single frames and the recognition of an unknown action from its posture sequence.

The first step of the human action recognition system is to extract the foreground person. In the first flow S110, step S111 builds a background model for foreground extraction. The present invention describes a statistical background model by the ratio of connected frames: by computing the statistical maximum and minimum gray-scale values and the maximum ratio between the gray-scale values of connected frames, a background model is obtained.

The proposed frame-ratio method has been shown to be less sensitive to illumination changes than frame-difference methods. Suppose the image intensity captured by the camera can be expressed as

I_l(x, y) = S_l(x, y) · r_l(x, y)  (1)

where S_l represents the illumination at a pixel position, r_l represents the reflectance at that pixel, and l is the index within the image sequence. If only the background is filmed and the camera is kept steady, the influence of the reflectance persists; but if the ratio of two frames is used, the reflectance cancels out. Dividing two frames, their pixel intensities satisfy

log( I_l(x, y) / I_{l+1}(x, y) ) = log( S_l(x, y) ) − log( S_{l+1}(x, y) )  (2)

The background model is therefore built from frame ratios. Each pixel of the background image is represented by three statistics: the minimum gray-scale intensity m(x, y), the maximum gray-scale intensity n(x, y), and the maximum ratio d(x, y) between the gray-scale values of connected frames. These three statistics are computed as

n(x, y) = max_l { I_l(x, y) }
m(x, y) = min_l { I_l(x, y) }
d(x, y) = max_l { max( I_l(x, y) / I_{l−1}(x, y), I_{l−1}(x, y) / I_l(x, y) ) }  (3)

After the background model is built, steps S112 to S113 separate the foreground person from each frame. Whether a pixel is classified as foreground or background depends mainly on

B(x, y) = 0 (a background pixel), if I_l(x, y) / m(x, y) < k · d(x, y) or I_l(x, y) / n(x, y) < k · d(x, y)
B(x, y) = 1 (a foreground pixel), otherwise  (4)

where B represents the binary image of the separated foreground person and k is an adjustable parameter, generally k = 1.4. The bounding region of the foreground person is obtained by projecting the binary image onto the X and Y axes, thresholding the projection statistics, and cutting out the foreground at the resulting boundary positions; every cropped image is resized to the same size.

If two frames are captured within a short interval, the difference between the postures in them is not large. Moreover, the human body is roughly a rigid body and thus has a natural frequency; in other words, when performing an action there is a natural limit on its speed. In our method we therefore take one frame at every fixed interval, called a basic template image. Figure 2 shows an example of selected template images: five sets of basic left-to-right walking templates taken at a large fixed interval. These basic template images are projected onto a new space by the eigenspace and canonical space transformations, as shown in the second flow S120, and the whole recognition procedure is carried out in the canonical space.

In video and image processing, the dimensionality of image data is usually very large. Because these images contain much redundancy, the usual remedy is to project them, through a spatial transformation, onto a new space that represents the original image with fewer dimensions. In the second flow S120, the transformation of the present invention combines the eigenspace transformation of step S121 with the canonical space transformation of step S122.
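The frame-ratio background model and binarization can be sketched as below. This is a minimal illustration rather than the patented implementation: the inequality in Equation 4 is partly garbled in the source, so the symmetric ratio test used here is one plausible reading, and all function names are our own.

```python
import numpy as np

def build_background_model(frames, eps=1.0):
    """Per-pixel statistics over a background-only training sequence (Eq. 3):
    m = min intensity, n = max intensity, d = max ratio of connected frames."""
    stack = np.stack([f.astype(np.float64) + eps for f in frames])  # eps avoids /0
    m = stack.min(axis=0)
    n = stack.max(axis=0)
    ratios = stack[1:] / stack[:-1]
    d = np.maximum(ratios, 1.0 / ratios).max(axis=0)  # whichever ratio is >= 1
    return m, n, d

def extract_foreground(frame, model, k=1.4, eps=1.0):
    """Binarize a frame (Eq. 4, one plausible reading): a pixel is background
    when its ratio to the min or max background intensity stays within k*d."""
    m, n, d = model
    f = frame.astype(np.float64) + eps
    r_min = np.maximum(f / m, m / f)
    r_max = np.maximum(f / n, n / f)
    background = (r_min < k * d) | (r_max < k * d)
    return (~background).astype(np.uint8)  # 1 = foreground, 0 = background
```

Because the test compares ratios rather than differences, a global illumination change scales numerator and denominator alike and largely cancels, which is the robustness property the text claims.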
The feature space transformation has been used effectively in systems such as automatic face recognition and gait recognition. After the eigenspace transformation, the canonical space transformation further reduces the data dimension and optimizes the separability between classes, improving the performance of the recognition system. Recognition is carried out in the canonical space.

Suppose there are c classes of training data. Each class represents the posture type of one particular action in the training images. Let x_{i,j} denote the j-th image of the i-th class; the total number of training images is N_T = N_1 + N_2 + … + N_c, and the training set can be written as {x_{1,1}, …, x_{1,N_1}, …, x_{c,N_c}}, where each x_{i,j} is a vector of n pixels.

First, the intensity of each image is normalized:

x̂_{i,j} = x_{i,j} / ||x_{i,j}||  (5)

The mean image of the whole training set is obtained from Equation 5 and expressed as

m_x = (1/N_T) Σ_{i=1}^{c} Σ_{j=1}^{N_i} x̂_{i,j}  (6)

Subtracting the mean from every image, the training set can be rewritten as an n × N_T matrix

X = [ x̂_{1,1} − m_x, …, x̂_{1,N_1} − m_x, …, x̂_{c,N_c} − m_x ]  (7)

If the rank of X X^T is K, then X X^T has K nonzero eigenvalues λ_1, …, λ_K and corresponding eigenvectors e_1, …, e_K satisfying

X X^T e_i = λ_i e_i,  i = 1, 2, …, K  (8)

In Equation 8, R = X X^T is an n × n matrix, where n, the number of pixels per image, is generally very large, so computing its eigen-decomposition directly is very expensive. Based on singular value decomposition theory, the eigenvalues and eigenvectors can instead be obtained from the much smaller matrix

R̃ = X^T X  (9)

whose dimension is N_T × N_T, much smaller than that of R. This matrix has the same K nonzero eigenvalues λ̃_1, …, λ̃_K with eigenvectors ẽ_1, …, ẽ_K, from which the eigenvalues and eigenvectors of R are recovered as

λ_i = λ̃_i,  e_i = X ẽ_i / √λ̃_i  (10)

where the K eigenvectors e_i are mutually orthogonal.

Based on principal component analysis theory, the k leading eigenvectors span a new space, and every original image can be projected onto it by

y_{i,j} = [e_1, e_2, …, e_k]^T x_{i,j},  i = 1, 2, …, c; j = 1, 2, …, N_i  (11)

We call the matrix [e_1, …, e_k]^T the eigenspace transformation matrix. The overall mean vector in the new space is computed with

m_y = (1/N_T) Σ_{i=1}^{c} Σ_{j=1}^{N_i} y_{i,j}  (12)

and the mean vector of each class with

m_i = (1/N_i) Σ_{j=1}^{N_i} y_{i,j}  (13)

With these mean vectors, three quantities are defined: S_t represents the total scatter matrix, S_w represents the within-class scatter matrix, and S_b represents the between-class scatter matrix. These three variables can be calculated by the following equations.
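The eigenspace step (Equations 5 through 11) is the standard "snapshot" PCA trick: diagonalize the small N_T × N_T matrix X^T X instead of the huge pixel-space matrix, then map its eigenvectors back. A hedged sketch with NumPy, using our own function name and a row-per-image layout:

```python
import numpy as np

def eigenspace_transform(images, k):
    """Snapshot PCA. `images` is (N, n_pixels) with N << n_pixels.
    Works with the small N x N Gram matrix (Eq. 9) and recovers the
    pixel-space eigenvectors via Eq. 10."""
    X = images.astype(np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # Eq. 5 normalization
    X = X - X.mean(axis=0)                            # Eqs. 6-7: remove the mean
    gram = X @ X.T                                    # N x N, cheap to decompose
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:k]                # keep k largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    basis = (X.T @ vecs) / np.sqrt(vals)              # Eq. 10: back to pixel space
    return basis, X @ basis                           # (n_pixels, k), (N, k)
```

The recovered basis columns are orthonormal, so `X @ basis` gives the Equation 11 projection directly.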

S_t = (1/N_T) Σ_{i=1}^{c} Σ_{j=1}^{N_i} (y_{i,j} − m_y)(y_{i,j} − m_y)^T

S_w = (1/N_T) Σ_{i=1}^{c} Σ_{j=1}^{N_i} (y_{i,j} − m_i)(y_{i,j} − m_i)^T

S_b = (1/c) Σ_{i=1}^{c} (m_i − m_y)(m_i − m_y)^T

The main purpose of the canonical space transformation is to minimize the within-class scatter while maximizing the between-class scatter at the same time. This result is obtained by maximizing

J(W) = (W^T S_b W) / (W^T S_w W)  (14)

The solution is to choose a maximizing W* satisfying

W* = arg max_W J(W)  (15)

Suppose W* = [w_1*, …, w_{c−1}*] is the optimal solution, where w_i* is the eigenvector associated with the i-th largest eigenvalue. Following standard discriminant-analysis theory (the 1990 textbook cited in the original), Equation 15 can be rewritten as the generalized eigenproblem

S_b w_i* = λ_i S_w w_i*  (16)

Solving it yields c − 1 nonzero eigenvalues and their corresponding eigenvectors [v_1, …, v_{c−1}]. With this basis, a point in the eigenspace is projected to a point in the canonical space:

z_{i,j} = [v_1, v_2, …, v_{c−1}]^T y_{i,j}  (17)

We call [v_1, …, v_{c−1}]^T the canonical space transformation matrix. Combining Equations 11 and 17, every frame can be projected into a new (c − 1)-dimensional space by

z_{i,j} = H x_{i,j},  where H = [v_1, …, v_{c−1}]^T [e_1, …, e_k]^T  (18)
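The canonical space transformation (Equations 12 through 16) is a Fisher-style discriminant: maximize between-class scatter against within-class scatter via a generalized eigenproblem. A minimal sketch under that reading; the 1/N_T and 1/c normalization factors are omitted since they do not change the eigenvector directions, and the small ridge added to S_w is our own numerical safeguard:

```python
import numpy as np

def canonical_space_transform(features, labels):
    """Solve Sb w = lambda * Sw w (Eq. 16) and return the c-1 canonical
    directions as columns of a matrix W."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    n_dim = features.shape[1]
    Sw = np.zeros((n_dim, n_dim))  # within-class scatter
    Sb = np.zeros((n_dim, n_dim))  # between-class scatter
    for c in classes:
        Xc = features[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # generalized eigenproblem via Sw^{-1} Sb, with a tiny ridge on Sw
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(n_dim), Sb))
    order = np.argsort(vals.real)[::-1][: len(classes) - 1]
    return vecs[:, order].real
```

Projecting the eigenspace features through the returned W gives the canonical-space vectors of Equation 17, in which the class means are pushed apart relative to the within-class spread.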

The third flow S130 is mainly the recognition flow. In a temporal image sequence, the transition relations between different postures are very important information for recognizing human actions. If only a single frame is used as the basis for recognition, classification errors occur easily, because very similar posture images may appear in different actions. A single posture of a human action is often ambiguous, so in the steps of this flow we propose fuzzy rule inference for human action recognition, which not only combines information over the time series but also tolerates the differences between different people performing the same action. Related work on fuzzy theory includes the following: in "IEEE Trans. Sys., Man Cybern., vol. 22, no. 6", learning to generate fuzzy rules from examples is proposed; in "IEEE Trans. Sys., Man Cybern. A, vol. 30, no. 2", Su proposes fuzzy-rule-based discrimination of postures over time.

Let a be the canonical-space vector of a template image. A Gaussian-type membership function expresses the possibility that a frame belongs to each posture class:

r_i(a) = exp( −(1/2) (a − m_i)^T Σ^{−1} (a − m_i) )  (19)

where Σ represents the covariance matrix of a and m_i represents the mean vector of the i-th class. If the dimensions of a and m_i are mutually independent, the equation can be rewritten as

r_i(a) = exp( −(1/2) Σ_m ( (a_m − m_{i,m}) / σ_m )^2 )  (20)

where the subscript m indexes the dimensions, σ_m is the standard deviation of the m-th dimension obtained using all the mean vectors, and r_i is the membership of the frame with respect to the i-th posture class. Finally, the posture class to which the frame belongs is

i* = arg max_i r_i  (21)

Each frame is then represented by a symbol P_i, where i is the index value of its posture class.

To include temporal information, we combine three connected down-sampled frames into one group, e.g. (I1, I2, I3). If a group used too many frames, an action with a fast cycle would easily be affected by neighboring actions; if it used too few, the temporal information would be insufficient. The three frames are first projected into the canonical space and then converted into a posture sequence according to the membership values. The posture sequence together with its corresponding action category, (I1, I2, I3, D), where D is the action category, forms an input-output pair for the fuzzy rule system during learning. From these input-output pairs the system generates a corresponding set of rules of the form

IF antecedent conditions hold, THEN consequent conditions.
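Under the independence assumption, Equations 20 and 21 reduce to a diagonal Gaussian membership followed by an arg-max. A small sketch, with the per-dimension standard deviation shared across classes as the text suggests, and our own function names:

```python
import numpy as np

def membership(z, mean, sigma):
    """Gaussian membership of canonical-space vector z in one posture
    class (Eq. 20), with mutually independent dimensions."""
    return float(np.exp(-0.5 * np.sum(((z - mean) / sigma) ** 2)))

def classify_posture(z, class_means, sigma):
    """Eq. 21: the posture label is the class with maximal membership."""
    scores = [membership(z, m, sigma) for m in class_means]
    return int(np.argmax(scores)), scores
```

Because the membership is a soft score in (0, 1], near-ambiguous frames still carry graded information for the downstream rule matching instead of a hard, possibly wrong, label alone.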

For example, as shown in Figure 2, if the posture of the first frame belongs to P18, the second to P19, and the third to P20, and the corresponding action is walking from left to right, the training pair is (I1, I2, I3; D) = (P18, P19, P20; W_LR), which generates the following rule:

IF the activity's I1 is P18 AND its I2 is P19 AND its I3 is P20, THEN the activity is W_LR.

Figure 3 shows the structure of the system classification algorithm. First, one frame is captured at every fixed interval (S300). The frame is transformed into the canonical space and its corresponding membership values are computed (S305 to S310). Then, as shown in steps S315 to S325, the membership values of every three frames are grouped, and the most similar posture sequence is searched for in the fuzzy rule database obtained from training; the frame sequence is classified into the action category recorded in the most similar rule.

Our experimental environment is a classroom. The light source is stable fluorescent lighting, and the background is simple, containing only one table. The camera is fixed at one location and films the same scene without moving, capturing 30 frames of 640 × 480 pixels per second. There are six actors, each performing the same six actions: walk from left to right, walk from right to left, jump, crouch down, climb up, and climb down. The videos of five actors are used for training and the remaining one for testing; each actor's videos are used for testing in turn. Figure 4a is a frame captured by the camera, and Figure 4b is the result of separating the foreground person and converting it to a binary image.

Six basic posture template classes are assigned to each of "walk from left to right", "walk from right to left" and "climb up", five classes to "climb down", three to "crouch down", and two to "jump", 28 classes in total. After all the training videos have been learned as above, a threshold must be set; this threshold is used to discard rules whose posture sequences occur relatively rarely, and its value affects the number of rules. Figure 5 compares the recognition results for different threshold values. In our experiments we adopt three as the threshold, because a threshold that is too low produces too many rules; when mutually contradictory rules are generated, we keep the rule that occurs more often during training.

Table 1 shows the recognition rate of our system. The action recognition system is currently tested offline. Because of the 5:1 down-sampling step, during testing we read frames starting from each of the different starting positions, i.e., positions 1 to 5, and treat each as a recognition case; this matches the way the fuzzy rules were trained and is also close to what happens in on-line recognition. For example, recognition may start from the first, second, third, fourth or fifth frame of a video.

In the paper "IEEE CVPR, pp. 379-385, 1992", Yamato and Ohya propose using the Hidden Markov Model for human action recognition. The HMM is a probabilistic model of state transitions, commonly used for sequential data analysis. In the experiments we compare the recognition rate of the fuzzy rule method with that of the HMM method.

Table 1. Action recognition rates of all actors.
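The rule learning and classification flow above — grouping posture triples, discarding rules below the occurrence threshold, keeping the majority action for conflicting rules, and matching a test sequence to the most similar antecedent — can be sketched as follows. The similarity measure here (number of matching positions) is our own assumption, since the source only says "most similar posture sequence":

```python
from collections import Counter, defaultdict

def learn_rules(training_pairs, threshold=3):
    """Build the fuzzy rule base. Each training item is ((i1, i2, i3), action).
    Posture sequences occurring fewer than `threshold` times are discarded;
    conflicting rules keep the action seen most often during training."""
    counts = defaultdict(Counter)
    for seq, action in training_pairs:
        counts[seq][action] += 1
    rules = {}
    for seq, actions in counts.items():
        if sum(actions.values()) >= threshold:
            rules[seq] = actions.most_common(1)[0][0]
    return rules

def classify_sequence(seq, rules):
    """Return the action of the rule whose antecedent posture sequence is
    most similar (here: most positions in common) to the observed one."""
    best = max(rules, key=lambda r: sum(a == b for a, b in zip(r, seq)))
    return rules[best]
```

An unseen triple that differs from a learned antecedent in one position still falls back to the nearest rule, which is one way the approach tolerates per-person posture variation.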

Each HMM must be trained to produce the posture-transition probability parameters that best represent one class of action. In the experiments, the Baum-Welch algorithm is used to estimate the HMM parameters; a forward-chaining (left-to-right) topology is adopted, the number of states is set to 28, and the length of the observation sequence is set to three.

After training, six HMMs are obtained, each representing one class of action. To recognize the observation sequence of an unknown action, the unknown action is classified into the HMM class that produces the maximum probability among the six HMMs, i.e., the most similar HMM class. We use the forward algorithm to compute this probability.

A comparison of the recognition rates between the HMM algorithm and the fuzzy-rule-based algorithm is shown in Table 2. The fuzzy rule algorithm achieves a recognition rate about 2.4% higher than the HMM, showing a better recognition performance on human actions.
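The forward algorithm used to score an observation sequence against each of the six HMMs can be sketched as below, with rescaling at every step to avoid numerical underflow. The shapes are generic discrete-HMM parameters, not the experiment's exact values:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Forward algorithm: log P(observations | HMM) for a discrete HMM with
    initial probabilities pi (S,), transition matrix A (S, S) and emission
    matrix B (S, V). `obs` is a list of symbol indices."""
    alpha = pi * B[:, obs[0]]          # initial step
    scale = alpha.sum()
    alpha /= scale
    log_like = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate then emit
        scale = alpha.sum()            # rescale to keep alpha well-conditioned
        alpha /= scale
        log_like += np.log(scale)
    return log_like
```

Classification then picks the action whose HMM yields the largest log-likelihood, e.g. `max(models, key=lambda m: forward_log_likelihood(obs, *models[m]))` over the six trained models.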

The experimental results show that, without reference to the position of the body, its movement path, or its movement speed, the total recognition rate over the six actions reaches 91.78%. Compared with the HMM method, the fuzzy rule algorithm improves the accuracy by about 2.4%.

However, the above describes only preferred embodiments of the present invention and is not intended to limit the scope of its implementation. All equivalent changes or modifications of the shape, structure, features and spirit described in the claims of the present invention shall be included within the scope of the patent of the present invention.

[Brief Description of the Drawings]

Figure 1 is a schematic diagram of the system architecture of the present invention.
Figure 2 is a schematic diagram of an example of selecting template images.
Figure 3 is a schematic diagram of the structure of the system classification algorithm.
Figure 4 is a schematic diagram of an example of foreground person extraction.
Figure 5 is a schematic comparison of different threshold values and recognition results.

[Description of Main Component Symbols]

None


Claims (1)

Claims:
1. A human motion recognition method, comprising:
capturing, with an imaging device placed at a fixed point, original frames of the fixed point and building a learned background image;
acquiring images from the imaging device at time intervals of a fixed frequency and, after lowering the sampling frequency, forming a sequence of input frames, each input frame in the sequence being compared with the background image to extract a foreground image;
applying a spatial transformation to each extracted foreground image, and converting at least two consecutive spatially transformed foreground images into a temporal posture sequence; and
classifying the temporal posture sequence into an action category by human action recognition based on fuzzy-rule inference.
2. The human motion recognition method of claim 1, wherein the spatial transformation comprises an eigenspace transformation and a canonical space transformation.
3. The human motion recognition method of claim 1, wherein the extracted foreground image is a binarized image.
4. The human motion recognition method of claim 1, wherein the fuzzy rules take a lower sampling frequency of at least one image sequence as input and are human-action-recognition fuzzy rules developed by combining at least one preceding and succeeding basic human posture.
5. The human motion recognition method of claim 1, wherein the fuzzy rules can incorporate temporal sequence information and can tolerate variations when different people perform the same action.
6. The human motion recognition method of claim 1, wherein the fuzzy rules comprise hidden human action patterns from which posture-transition relations can be learned.
7. The human motion recognition method of claim 1, wherein the human action recognition is performed in the canonical space.
8. The human motion recognition method of claim 1, wherein the background model uses division of consecutive frames (a frame-ratio method) to describe a statistical background model.
9. The human motion recognition method of claim 8, wherein the statistical background model obtains a background model by computing the statistical maximum and minimum gray-level values and the maximum ratio obtained by dividing the gray-level values of consecutive frames.
10. The human motion recognition method of claim 8, wherein each pixel of the background image in the statistical background model is represented by three statistical values: the minimum gray-level intensity, the maximum gray-level intensity, and the maximum ratio of consecutive-frame gray-level values.
11. The human motion recognition method of claim 1, wherein the human action recognition uses Gaussian membership functions to represent the likelihood of each image with respect to each posture class.
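Claims 8 to 10 describe a statistical background model built from per-pixel minimum and maximum gray levels plus the maximum ratio between gray levels of consecutive frames. A rough sketch of how such a model might be trained and used to extract a binary foreground mask (the exact thresholding rule and the `+1` offset to avoid division by zero are assumptions, not the patent's formula):

```python
import numpy as np

def train_background(frames):
    """Per-pixel statistics over a training sequence of grayscale frames:
    minimum intensity, maximum intensity, and the largest ratio between
    gray levels of consecutive frames (one plausible reading of the
    'frame ratio' statistic)."""
    stack = np.stack(frames).astype(np.float64) + 1.0  # avoid divide-by-zero
    mn = stack.min(axis=0)
    mx = stack.max(axis=0)
    ratios = stack[1:] / stack[:-1]
    r = np.maximum(ratios, 1.0 / ratios).max(axis=0)   # largest inter-frame ratio
    return mn, mx, r

def extract_foreground(frame, mn, mx, r):
    """Binary foreground mask: a pixel counts as background only while its
    intensity stays within the learned range widened by the tolerated ratio."""
    f = frame.astype(np.float64) + 1.0
    background = (f <= mx * r) & (f >= mn / r)
    return ~background
```

Because the test is a ratio rather than a fixed difference, the tolerance scales with pixel brightness, which is one way such a model can stay robust to gradual illumination changes, as the abstract claims.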
TW96102113A 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning TW200832237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Publications (2)

Publication Number Publication Date
TW200832237A true TW200832237A (en) 2008-08-01
TWI322963B TWI322963B (en) 2010-04-01

Family

ID=44818839

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Country Status (1)

Country Link
TW (1) TW200832237A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI419058B (en) * 2009-10-23 2013-12-11 Univ Nat Chiao Tung Image recognition model and the image recognition method using the image recognition model
TWI459310B (en) * 2011-12-30 2014-11-01 Altek Corp Image capturing device able to simplify characteristic value sets of captured images and control method thereof
TWI463421B (en) * 2008-11-29 2014-12-01 Toshiba Global Commerce Solutions Holdings Corp Analyzing repetitive sequential events
TWI490790B (en) * 2012-11-14 2015-07-01 Far Eastern Memorial Hospital Dynamic cardiac imaging analysis and cardiac function assessment system
US9192299B2 (en) 2012-12-19 2015-11-24 Industrial Technology Research Institute Health check path evaluation indicator building system, method thereof, device therewith, and computer program product therein
TWI595416B (en) * 2014-06-12 2017-08-11 國立交通大學 Bayesian sequential partition system in multi-dimensional data space and counting engine thereof
US9807316B2 (en) 2014-09-04 2017-10-31 Htc Corporation Method for image segmentation
CN112990137A (en) * 2021-04-29 2021-06-18 长沙鹏阳信息技术有限公司 Classroom student sitting posture analysis method based on template matching

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI438702B (en) 2011-05-04 2014-05-21 Univ Nat Chiao Tung Method for setting dynamic enviromental image borders and method for instantly determining the content of staff member activities
TWI489090B (en) * 2012-10-31 2015-06-21 Pixart Imaging Inc Detection system
CN103808305B (en) * 2012-11-07 2017-11-07 原相科技股份有限公司 Detecting system
US9852519B2 (en) 2013-06-25 2017-12-26 Pixart Imaging Inc. Detection system


Also Published As

Publication number Publication date
TWI322963B (en) 2010-04-01

Similar Documents

Publication Publication Date Title
TW200832237A (en) Human activity recognition method by combining temple posture matching and fuzzy rule reasoning
Cippitelli et al. A human activity recognition system using skeleton data from RGBD sensors
Wang et al. Learning actionlet ensemble for 3D human action recognition
Ji et al. Interactive body part contrast mining for human interaction recognition
Ali et al. Depth-based human activity recognition: A comparative perspective study on feature extraction
Oreifej et al. Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences
Porikli et al. Object detection and tracking
JP4743823B2 (en) Image processing apparatus, imaging apparatus, and image processing method
Nicolle et al. Facial action unit intensity prediction via hard multi-task metric learning for kernel regression
Ji et al. Learning contrastive feature distribution model for interaction recognition
US20050094879A1 (en) Method for visual-based recognition of an object
Davis et al. Minimal-latency human action recognition using reliable-inference
JP2013178816A (en) Image processing apparatus, imaging apparatus and image processing method
Pagano et al. Detector ensembles for face recognition in video surveillance
Li et al. Human action recognition via skeletal and depth based feature fusion
Farooq et al. A survey of human action recognition approaches that use an RGB-D sensor
Ye et al. Temporal order-preserving dynamic quantization for human action recognition from multimodal sensor streams
Qi et al. Learning complex spatio-temporal configurations of body joints for online activity recognition
Wang et al. Human action recognition with depth cameras
Iosifidis et al. Neural representation and learning for multi-view human action recognition
Kishore et al. Selfie sign language recognition with convolutional neural networks
Lajevardi et al. Facial expression recognition from image sequences using optimized feature selection
Neili et al. Human posture recognition approach based on ConvNets and SVM classifier
Baek et al. Kinematic-layout-aware random forests for depth-based action recognition
Zerrouki et al. Automatic classification of human body postures based on the truncated SVD