TW200910221A - Method of determining motion-related features and method of performing motion classification - Google Patents

Method of determining motion-related features and method of performing motion classification

Info

Publication number
TW200910221A
TW200910221A TW097117446A TW97117446A
Authority
TW
Taiwan
Prior art keywords
image
action
order
feature
features
Prior art date
Application number
TW097117446A
Other languages
Chinese (zh)
Inventor
Olivier Pietquin
Vasanth Philomin
Original Assignee
Koninkl Philips Electronics Nv
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninkl Philips Electronics Nv filed Critical Koninkl Philips Electronics Nv
Publication of TW200910221A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method of determining motion-related features (Fm) pertaining to the motion of an object (1), which method comprises obtaining a sequence of images (f1, f2, . . . , fn) of the object (1), processing at least parts of the images (f1, f2, . . . , fn) to extract a number of first-order features from the images (f1, f2, . . . , fn), computing a number of statistical values pertaining to the first-order features, combining the statistical values into a number of histograms (HLC, HG), and determining the motion-related features (Fm) for the object (1) based on the histograms (HLC, HG). Furthermore, the invention relates to a method for performing motion classification for the motion of an object (1) captured in a sequence of images (f1, f2, . . . , fn). Such a motion classification method comprises determining motion-related features (Fm) for the images (f1, f2, . . . , fn) using the described method of determining motion-related features (Fm), and using the motion-related features (Fm) to classify the motion of the object (1). The invention also relates to a system (3) for determining motion-related features (Fm) pertaining to an object (1) in an image (f1), and to a system (4) for performing motion classification for the motion of an object (1).

Description

IX. Description of the Invention

[Technical Field]

The invention relates to a method of determining motion-related features pertaining to the motion of an object, and to a method of performing motion classification for the motion of an object. The invention further relates to a system for determining motion-related features pertaining to an object, and to a system for performing motion classification.

[Background]

One of the most natural means of communication between people is movement of the hands or head, also known as gesturing, either as a complement to spoken communication or as a means of communication in its own right. A simple augmentation of speech is, for example, moving one's hand and arm to point in a particular direction while giving another person directions. An example of communication by movement alone is sign language, which is based entirely on gestures. Since gesturing is second nature to people, gestures would be a natural and easy means of communication between a human and a computer, for example in a home dialogue system, in automatic sign-language interpretation, or in a dialogue system for an office environment. Given the developments of recent years, it can be expected that the use of such dialogue systems will become ever more widespread, and that the classification of a user's movements, that is, gesture recognition, will play an increasingly important role.

Several gesture recognition systems have been proposed in the state of the art.
In general, these proposed systems rely on image analysis to locate a user in each image of a sequence and to analyse successive postures in order to determine the gesture being made. Some systems require the user to wear a coloured suit or position sensors, which would be a most undesirable requirement for the typical user of, say, a home dialogue system. Other systems, such as that of US 6,256,033 B1, propose extracting from an image a number of coordinates corresponding to points on a user's body and comparing these with the coordinates of known gestures. An obvious drawback of this approach is that the computational effort required to analyse the images is very high, and the camera used to obtain the images must be good enough to ensure sharp images. Yet other systems attempt to solve the gesture recognition problem by operating on stereo image data obtained with two or more cameras. For example, WO 03/…218 proposes using two cameras to obtain a stereo image of a gesturing user, removing the background from the image, and tracking the user's upper body using a statistical framework applied to the segments of the image that correspond to the user's body. Again, the computational effort required for gesture recognition with such a system is very high, so that recognition is slow. A further drawback of such systems is the hardware involved, namely a stereo camera or multiple cameras. In all of these known systems, the user must perform a gesture slowly enough to ensure that a certain image quality is maintained and the images do not blur.

In short, the solutions proposed in the state of the art for gesture recognition are too expensive, too slow, or too cumbersome. In particular, a non-intrusive way of determining information about a user's movements, which is easy for the user to learn and use, is needed.
Frequently, the effort required simply to determine information suitable for use in a motion classification procedure is far too costly in the known systems.

[Summary of the Invention]

It is therefore an object of the invention to provide an easy and economical way of determining motion-related features of an object in an image and of classifying the motion of such an object, and to provide suitable systems for these tasks.

To this end, the invention provides a method of determining motion-related features pertaining to the motion of an object, which method comprises obtaining an image sequence of the object and processing at least parts of the images in order to extract a number of first-order features from the images.

Here, the term "image sequence" can mean all of the images captured, for example by a camera, or only a selection of them. For a camera with a fast shutter speed, every ninth image, say, may suffice, whereas for a low-cost camera it may be preferable to use all of the images. An image of the object whose motion-related features are to be determined can be processed in its entirety, or only a certain region of the image may be processed, for example when only that region is of interest. The first-order features extracted from the image can be blur-related features, gradient values, or any other real values describing, for instance, the colour of a pixel. The term "first-order" feature denotes a feature that can be obtained from the image information essentially in a first stage of computation: pixel colour values can be taken directly from the image information, and gradient values can be obtained by direct processing of the image data.

A number of statistical values pertaining to these first-order features are then computed and combined into one or more histograms.
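As a rough illustration of this pipeline (not the patent's actual implementation; the choice of feature, the function names, and the bin width are all assumptions made here), a simple first-order feature can be extracted from a row of grey values and its statistics tallied into a histogram as follows:

```python
from collections import Counter

def first_order_features(row):
    """Absolute differences between neighbouring grey values in one pixel row:
    a crude, gradient-like first-order feature obtainable in one pass."""
    return [abs(b - a) for a, b in zip(row, row[1:])]

def feature_histogram(values, bin_width=8):
    """Round each feature value down to a discrete bin and count occurrences,
    as the statistical values are simply tallied into a histogram."""
    return dict(Counter(v // bin_width for v in values))

# A sharp edge (0 -> 200) and a smooth ramp yield very different statistics.
sharp = [0, 0, 0, 200, 200, 200]
smooth = [0, 40, 80, 120, 160, 200]

h_sharp = feature_histogram(first_order_features(sharp))
h_smooth = feature_histogram(first_order_features(smooth))
```

The sharp edge concentrates its counts in one large-difference bin, while the smooth ramp fills a single mid-range bin, which is the kind of distinction the histograms are meant to capture.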
The motion-related features of the object are then determined on the basis of the histograms. These second-order motion-related features can be information given by, or extracted from, the histograms which identifies regions of the image associated with the motion of the object, for example blurred regions caused by a moving object. In particular, the histograms themselves can be regarded as a kind of second-order motion-related feature. The term "object" as used here is to be interpreted to mean a user, a user's body, or a part of the user's body, but can equally be any moving object capable of moving in a predefined, meaningful way, for example a pendulum clock or a machine of the robot type. In the following, the terms "object" and "user" are used interchangeably.

The invention further provides a method of performing motion classification for the motion of an object captured in an image sequence, which method comprises determining motion-related features for the object captured in the images in the manner described above, and using these motion-related features to classify the motion of the object.

An obvious advantage of the invention is that motion or gesture classification can be achieved without the complex image analysis required by state-of-the-art gesture recognition systems. The algorithms used in the method according to the invention to extract first-order features from an image are readily available and well known to the person skilled in the art, for example established algorithms used in image compression. Since the statistical information about the features is simply tallied or counted into histograms, it can be analysed easily and very quickly to obtain the motion-related features, so the proposed method offers a clear advantage over the analysis methods used in the state of the art. The method according to the invention does not require modelling of the object in the image (for example, a person), nor normalisation of parts of the body.
A further advantage is that a low-resolution camera, such as a webcam, is quite sufficient for producing the images, allowing a system according to the invention to be realised conveniently, effectively, and economically. Moreover, since the actual motion of the object need not be tracked directly, a low frame rate is sufficient for the camera, so an inexpensive camera is entirely adequate.

A corresponding system for determining motion-related features pertaining to an object comprises an image source for providing an image sequence of the object; a processing unit for processing at least a part of the images in order to extract a number of first-order features from the images; a computation unit for computing a number of statistical values pertaining to the first-order features; and a combination unit for combining the statistical values to give a number of histograms and for determining the motion-related features of the object on the basis of these histograms. Furthermore, a corresponding system for performing motion classification comprises such a system for determining the motion-related features of the object, together with a classification unit which uses the motion-related features to classify the motion of the object.

Particularly advantageous embodiments and features are disclosed in the dependent claims and the following description.

The images provided by an image source such as a camera contain various types of characterising information, such as grey values, contours, or edge sharpness, which can be extracted as the first-order features described above. For the purpose of classifying the motion of an object captured in an image, as proposed by the invention, colour or grey values may be of interest, for example to track the colour of the object as it appears to move across the image.
However, information about the level of blur in a part of the image is particularly beneficial. If a region of an image is blurred, it is possible that this region corresponds to the hand or arm of a user making a gesture or other movement in front of the camera. Therefore, in a particularly preferred embodiment of the invention, the first-order features extracted from the image are blur-related features. For image processing techniques such as object or pattern recognition, the information given by contours or edges in an image is particularly useful. Some edges in an image may be sharp (in focus) and others blurred (out of focus). For a camera with sufficient depth of field, a blurred part of an image region indicates that an object in that part of the image was moving when the image was produced. Several techniques are available for identifying edges in an image, and in recent years wavelet transform techniques have been developed for applications such as efficient image compression. The person skilled in the art will be familiar with these techniques, so they are not explained in detail here. In short, wavelet transform techniques can be used to identify edges in an image quickly and reliably, and to obtain information about the degree of sharpness of those edges.

Therefore, in a preferred embodiment of the invention, blur-related features are extracted from an image by performing a wavelet transform on the image data with respect to a number of scales, in order to determine a series of wavelet coefficients for each point of the image. Since the value of a wavelet coefficient depends on the difference between the values of neighbouring pixels, and the magnitude of that difference in turn depends on the level of blur in the image, the wavelet coefficients are blur-related features which ultimately provide information about the motion in the image.
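A minimal stand-in for such a multi-scale analysis, using Haar-like differences of window averages at dyadic scales rather than a full DWT or CWT (the function names and the choice of scales are assumptions made for this sketch), could look like this:

```python
def haar_detail(row, scale):
    """Haar-like detail coefficient at one scale: the difference between the
    mean grey value of two adjacent windows of length `scale`."""
    out = []
    for i in range(scale, len(row) - scale + 1):
        left = sum(row[i - scale:i]) / scale
        right = sum(row[i:i + scale]) / scale
        out.append(right - left)
    return out

def multiscale_details(row, levels=3):
    """One coefficient series per dyadic scale (1, 2, 4, ...), so that every
    point of the row has a series of coefficients across the scales."""
    return {2 ** k: haar_detail(row, 2 ** k) for k in range(levels)}

row = [0, 0, 0, 0, 255, 255, 255, 255]   # one sharp step edge
coeffs = multiscale_details(row)
peak = {s: max(abs(c) for c in cs) for s, cs in coeffs.items()}
```

For the step edge above, the coefficient modulus peaks at the edge position in every scale, which is the behaviour the text attributes to edge pixels.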
This approach of using blur-related features as first-order features exploits the important fact that a moving hand, head, or other object introduces blur into an image, and puts this fact to positive use. Blur is no longer an undesirable effect; instead, it provides valuable information that can serve as the basis for motion classification, for example gesture recognition.

The type of wavelet transform used can depend, to a certain extent, on the size of the image. For example, the image data of a larger image with sufficient resolution can be used as input to a discrete wavelet transform (DWT), whereas a continuous wavelet transform (CWT) can be used for images of smaller size with only a relatively low resolution, such as those produced by a typical webcam. The wavelet coefficients thus obtained can then be processed to obtain the blur-related features of the image. Each pixel or point of the image corresponds to one wavelet coefficient in each scale of the transform, and each scale has the same number of points as the image. Thus, after performing a ten-scale transform, there will be ten coefficients for each pixel of the image. For a point or pixel lying on an edge in the image, the modulus of the coefficient associated with that pixel will be a maximum in each scale. The evolution across the scales of the ten coefficients associated with an edge pixel provides information about the differentiability of the image at that point. Using the wavelet coefficients of a pixel, an image gradient can be computed for that pixel. If the pixel in question belongs to a blurred edge of the moving object, the direction and the strength or magnitude of the image gradient provide information about the direction of the blur and the speed of the object. Since the direction of the gradient is orthogonal to the edge or contour, it also provides information about the shape of the object.
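Given horizontal and vertical detail responses at a point, the gradient magnitude and direction can be obtained in the usual way. A brief sketch (the responses below are made-up sample values, not output of any transform in this document):

```python
import math

def gradient(gx, gy):
    """Magnitude and direction (degrees) of the image gradient at one point,
    from the horizontal and vertical detail responses at that point."""
    return math.hypot(gx, gy), math.degrees(math.atan2(gy, gx))

# Horizontal-only change: the gradient points along x, across a vertical edge.
mag, ang = gradient(30.0, 0.0)
# Mixed response: a slanted edge, with the gradient normal to it.
mag2, ang2 = gradient(30.0, 40.0)
```

The direction being normal to the contour is what lets the gradient carry shape information, and the magnitude stands in for the strength (and hence speed-related blur) of the edge.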
Similarly, low gradient values for the pixels of a row or column imply diffuse or smooth edges, while high gradient values imply sharp or distinct edges. Histograms of the strength and direction of the image gradient therefore provide information about the shape and the motion field in the regions of the images under consideration.

The slope of the evolution of the coefficients across the scales provides an estimate of the so-called Lipschitz coefficient, or exponent, for the pixel. This exponent is a measure of the differentiability of the image at the pixel under consideration, and thus a measure of the degree of continuity. If the Lipschitz exponent is positive, the pixel lies on a smooth or blurred edge. Equally, if the Lipschitz exponent is zero or negative, the pixel lies on a sharp edge. Performing the wavelet transform over a number of scales effectively provides a third dimension in addition to the two dimensions of the image.
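The exponent-from-slope idea can be sketched as a least-squares fit of log-coefficient modulus against log-scale. This is only an illustrative estimator with invented sample moduli; rigorous Lipschitz regularity estimation from wavelet modulus maxima is considerably subtler:

```python
import math

def lipschitz_estimate(moduli):
    """Least-squares slope of log2|coefficient| against log2(scale), for
    dyadic scales 1, 2, 4, ...; the slope estimates the Lipschitz exponent."""
    xs = [float(k) for k in range(len(moduli))]   # log2 of the dyadic scale
    ys = [math.log2(m) for m in moduli]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Sharp step edge: modulus roughly constant across scales -> exponent near 0.
alpha_sharp = lipschitz_estimate([200.0, 200.0, 200.0, 200.0])
# Blurred edge: modulus grows with scale -> clearly positive exponent.
alpha_blur = lipschitz_estimate([50.0, 100.0, 200.0, 400.0])
```

A near-zero or negative slope thus flags a sharp edge, while a positive slope flags the smooth, blurred transitions that the method associates with motion.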

The evolution of these coefficients is observed in this third dimension. In general, a wavelet transform is performed on a series of pixels, for example the pixels of a certain row or column of a set of image data. Where applicable, the Lipschitz exponents computed from the resulting wavelet coefficients, together with the strength and direction of the image gradient, provide information about the degree of blur of the neighbouring pixels in the row or column concerned, and about the direction of that blur. For example, a set of predominantly negative Lipschitz exponents for a row of pixels indicates that this row contains well-defined edges.
On the other hand, predominantly positive Lipschitz exponents indicate smooth transitions between the pixels of the row, implying a lack of sharp edges, so that this row probably belongs to a blurred region of the image. The wavelet transform can be performed across successive rows of pixels and successive columns of pixels. Whether the wavelet transform is carried out for an entire image or for a part of the image, first-order features such as the exponents and wavelet-coefficient gradients described above are computed for each wavelet transform operation, that is, for each row and column processed.

An efficient way of collecting the information provided by the first-order features is to round the values of the first-order features to certain discrete values and simply to count the number of occurrences of each value. Therefore, according to the invention, statistical values pertaining to a certain kind of first-order feature extracted from at least a part of an image, for example the number of occurrences of a particular coefficient value, are combined into a histogram for that first-order feature. In a system for obtaining motion-related features for an object in an image, a counter can be assigned to each discrete coefficient value and incremented by one whenever that value occurs. The values accumulated in this way are tallied or collected in the histogram, which can be visualised as a simple bar chart.

To determine a movement or gesture made by a user, a number of images (also called frames) can be captured, for example by a camera, following the user's movement from start to finish. Such a movement is generally referred to here as a gesture, and the various stages or positions within the movement are generally referred to as postures, without limiting the scope of the invention in any way. An overall gesture is thus given by a sequence of postures, and each posture is in turn captured in a sequence of images.
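Since the transform operates on successive rows and on successive columns, a per-row feature extractor can simply be run over both orientations of a small image. The sketch below uses plain neighbour differences as the per-row feature; the names and the toy image are assumptions for illustration:

```python
def row_features(row):
    """Signed differences of neighbouring values: a minimal per-row feature."""
    return [b - a for a, b in zip(row, row[1:])]

def image_features(img):
    """Run the row-wise extractor over every row and every column, as the
    transform is described as operating on successive rows and columns."""
    rows = [row_features(r) for r in img]
    cols = [row_features([img[y][x] for y in range(len(img))])
            for x in range(len(img[0]))]
    return rows, cols

img = [
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
]
rows, cols = image_features(img)
```

The vertical edge in this image shows up only in the row-wise features, which is why both orientations are processed.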
For a gesture or movement in which the user moves his right hand and arm, the images will show the user together with a blurred region towards the left of the image, and this blurred region effectively "moves" across the successive images over time. If the images or frames of a sequence are imagined stacked one behind the other, the resulting three-dimensional block is a virtual volume given by the succession of images over time. First-order features can be extracted for all the images in this volume and combined to give motion-related information for the sequence as a whole. Therefore, in a preferred embodiment of the invention, first-order features are extracted for the virtual volume, for example by performing a wavelet transform on each of the images in the sequence, and the statistical values of these features are combined into one or more histograms.

However, as already mentioned, only a part of the image may actually be of interest. To reduce unnecessary computation, it is therefore preferable to process only those regions of the images that are actually of interest. For example, as described above, if the user gestures with only one hand and arm, and the region of the image containing this hand and arm is only a part of the overall image, this particular region can be identified in one image, and only the corresponding particular region of the successive images in the sequence need be processed. This exploits the fact that a user making a gesture in front of a camera will generally remain more or less in the same place. If the images or frames are imagined stacked one behind the other, the particular region of each successive frame forms a connected stack through the sequence.
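The stacking of the same region of interest across all frames into a three-dimensional block can be sketched directly on nested lists; the region coordinates and toy frames below are assumptions made for illustration:

```python
def sub_volume(frames, region):
    """Stack the same rectangular region from every frame: the virtual
    sub-volume spanned by a region of interest over time."""
    x0, y0, x1, y1 = region
    return [[row[x0:x1] for row in frame[y0:y1]] for frame in frames]

# Two tiny 3x3 frames with distinct values so the slicing is easy to check.
frames = [[[f * 10 + y * 3 + x for x in range(3)] for y in range(3)]
          for f in range(2)]
vol = sub_volume(frames, (1, 1, 3, 3))   # bottom-right 2x2 region, all frames
```

Feature extraction can then be confined to `vol`, which is the computational saving the text describes.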

The result is a virtual sub-volume. Therefore, in a particularly preferred embodiment of the invention, a sequence of particular regions is identified in an image sequence, and first-order features are extracted for the sub-volume given by the succession of particular regions over time.

The particular region of interest in an image is easily identified by means of the first-order features of the image, for example by analysing the number of occurrences of a certain first-order feature in various regions of one of the images of the sequence. For instance, the overall image can be divided or subdivided into a number of segments or tiles, preferably in such a way that the segments or tiles overlap. For each segment or tile, a sub-image histogram of a certain first-order feature can be compiled.
For example, a sub-image histogram showing predominantly zero or positive Lipschitz values for a region indicates that this region of the image contains blur. In this way, all regions of the image containing blur can be identified quickly. A selection comprising these regions can be formed, which can be visualised as an imaginary rectangle drawn around, say, a blurred moving arm in one of the images, and the image coordinates of this rectangle can then be used to locate the same particular region in all the images of the sequence. Equally, for a moving object whose colour differs from that of the background in the images, each tile or segment can be analysed on the basis of pixel colour values: only tiles containing pixels of a certain colour are of interest, so the remaining tiles can be disregarded.

The step of identifying the particular region of interest can be carried out at regular intervals, so that the particular region tracks the moving object in the images. In this way, even a movement or gesture that takes the user across the entire image can be analysed. Any movement or gesture can be decomposed into a series of essentially distinct successive stages which, taken together, give the overall movement or gesture. In the following, therefore, the term "position" refers to any stage in the motion of an object or user captured in an image sequence.

A histogram of the statistical values of one or more first-order features can be compiled for each particular region of an image sequence. Taken together, the information in these histograms is therefore characteristic of the sequence of particular regions. Moreover, since an image sequence can be used to determine a position being held or a posture being made, the information in the histograms of a sequence of particular regions is characteristic of the position held or the posture made.
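The tiling and per-tile scoring can be sketched as follows. As an assumption for this example, a simple mean-absolute-difference statistic stands in for the blur-related sub-image histograms, with low values playing the role of "smooth or blurred" content and high values of "sharp" content:

```python
def tiles(width, height, size, step):
    """Top-left corners of overlapping square tiles covering an image."""
    return [(x, y) for y in range(0, height - size + 1, step)
                   for x in range(0, width - size + 1, step)]

def tile_score(img, x, y, size):
    """Mean absolute horizontal neighbour difference inside one tile: a crude
    per-tile statistic standing in for a sub-image histogram summary."""
    vals = [abs(img[j][i + 1] - img[j][i])
            for j in range(y, y + size) for i in range(x, x + size - 1)]
    return sum(vals) / len(vals)

# Left half flat, right half rapidly flipping values.
img = [[0] * 4 + [100, 0] * 2 for _ in range(8)]
corners = tiles(8, 8, size=4, step=2)
scores = {c: tile_score(img, c[0], c[1], 4) for c in corners}
sharpest = max(scores, key=scores.get)
```

Selecting tiles by such a score, and merging the selected tiles into one bounding rectangle, yields the particular region to track through the sequence.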
Therefore, in a preferred embodiment of the invention, the statistical values pertaining to one type of first-order feature extracted from the sequence of particular regions of an image sequence are combined into a volume histogram for that feature. Such a volume histogram, describing the position of an object, a posture, or a part of a gesture, is used in the motion classification procedure described in more detail below.

Although the region of the image defined by the particular region contains useful information about the type of movement or gesture being made, the remainder of the image, the "complementary" region, can also be used. The complementary region essentially comprises those pixels of the image that are not contained in the particular region.
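Accumulating per-frame region histograms over the whole sequence gives one histogram for the sub-volume. A hedged sketch, again using neighbour differences and an assumed bin width in place of the blur-related features:

```python
from collections import Counter

def frame_histogram(region):
    """Histogram of discretised first-order features (here: horizontal
    neighbour differences) for one frame's region of interest."""
    return Counter(abs(b - a) // 10
                   for row in region for a, b in zip(row, row[1:]))

def volume_histogram(regions):
    """Accumulate the per-frame histograms over the sequence: one histogram
    characterising the sub-volume, i.e. the position or posture held."""
    total = Counter()
    for region in regions:
        total += frame_histogram(region)
    return total

seq = [[[0, 50], [0, 50]],    # frame 1 region
       [[0, 55], [0, 45]]]    # frame 2 region
vh = volume_histogram(seq)
```

The same accumulation applied to the complementary regions would give the complementary volume histogram.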

Thus, in a further preferred embodiment of the invention, a particular region is identified in an image, and its complementary region is identified as well; the statistical values pertaining to one type of first-order feature, extracted from the sequence of complementary regions, are combined in a complementary volume histogram of that first-order feature. Preferably, the particular region and/or the complementary region of an image is identified on the basis of the first-order features determined for that image, as explained above. Various algorithms suitable for identifying the regions of interest, for example incremental algorithms, are available and known to the person skilled in the art.

The motion-related features obtained for an image sequence can be used to classify the motion captured in that image sequence. For example, a motion-related feature can be realised as a distance between two histograms. Such a distance can be computed between the histogram of a particular region of an image and that of the corresponding complementary region, or between the histograms of successive images of the sequence. It can be computed using standard statistical-distribution techniques, by comparing the principal properties of the histograms (in the case of blur-related first-order features, for instance, the gradient direction or the dominant Lipschitz coefficient), or on the basis of any other property of the histograms. Ultimately, such distances are an indication of the degree to which the motion-related features of the image regions change.

As described above, a sequence of particular regions can correspond to a particular position or stage in the motion of a tracked object, or to a particular pose assumed by a user in making a gesture. According to the invention, therefore, the sub-volume histograms and complementary volume histograms generated as described above using the first-order features of the images (collectively referred to in the following as "volume histograms") can describe or characterise the positions of an object in motion. In a preferred method according to the invention, such volume histograms are therefore analysed to obtain position-characteristic features, which can be used to classify the motion. For example, a position-characteristic feature such as the ratio of positive to negative histogram values in a volume histogram of an image sequence can give an indication of the position and direction of the motion of the object. Another type of position-characteristic information can be a distance between the volume histograms of two different image sequences.

One way of using a volume histogram is to compare it (or a derivative of it) with previously generated data for a variety of motion sequences or gestures. For example, a volume histogram obtained for a pose tracked through an image sequence can be compared with a number of "prototype poses" from a collection of gesture models, for example a collection of state-transition models of a number of gestures. In this way, one or more candidate gestures can be determined which include a pose corresponding to the volume histogram. By identifying prototypes for the successive volume histograms of a sequence of poses, the number of candidate gestures can be narrowed down until the gesture most likely to correspond to the pose sequence has been identified.

Thus, in a preferred embodiment of the invention, the position-characteristic features of a motion, obtained from the analysis of an image sequence, are used to identify a state of a generative model based on the positions or poses of a motion or gesture, for example a state-transition model of the motion. Such a state corresponds to a particular position in the motion, for instance to a certain pose of a user, or to a certain stage in the motion of an object. A preferred state-transition model, well known to the person skilled in the art, is the hidden Markov model (HMM), which can be used to determine the probability of one position in a gesture or wave being followed by another.

Simpler features can also be derived from the histograms, for example the distance between a volume histogram and its complementary volume histogram, as described above. Any suitable method of comparing statistical distributions can be applied to obtain a real number from a series of volume histograms; such numbers can in turn be used as input to a suitable algorithm, for example an incremental algorithm, in order to build a discriminator. Thus, in another particularly preferred embodiment of the invention, the distance between a volume histogram generated, as described above, using a sequence of particular regions of an image sequence and the complementary volume histogram generated using the corresponding sequence of complementary regions is computed to give a position-characteristic feature, which can then be used in the motion classification procedure. Since such histograms are second-order motion-related features, quantities derived from them can be regarded as a kind of third-order motion-related feature. If prototype positions are characterised by features of this type, such simple third-order features can be used to identify the motion model corresponding to a position.

The steps of the methods described above for obtaining motion-related features and for performing motion classification can be realised in the form of software modules running on a processor of a system for obtaining motion-related features or of a system for performing motion classification, respectively. Alternatively, certain image-processing steps, such as the wavelet transform used in extracting the blur-related features, can be realised in hardware, for example in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention.

【Embodiments】

Fig. 1 shows a user 1 making a gesture in front of a camera 2, for example a webcam 2. Here, the user 1 moves his arm in the direction of a motion M, such as a gesture or part of a gesture. The webcam generates images of the user 1.
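As a concrete illustration of the comparison step described above — a sketch only, assuming histograms stored as value-to-count mappings and using hypothetical prototype names — an L1 distance between two normalised histograms can be converted into a probability between 0 and 1 and matched against a collection of prototype pose histograms:

```python
def normalise(hist):
    """Scale the counts of a histogram so that they sum to 1."""
    total = float(sum(hist.values())) or 1.0
    return {k: v / total for k, v in hist.items()}

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 = identical, 2 = disjoint)."""
    p, q = normalise(h1), normalise(h2)
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def match_probability(h1, h2):
    """Map the distance onto a number between 0 and 1, as the text requires."""
    return 1.0 - histogram_distance(h1, h2) / 2.0

def best_prototype(observed, prototypes):
    """Name of the prototype pose histogram closest to the observed histogram."""
    return max(prototypes,
               key=lambda name: match_probability(observed, prototypes[name]))
```

Repeating this match for each successive volume histogram of a pose sequence, and discarding gesture models whose next state does not fit, is one simple way of narrowing down the candidate gestures as described.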

These images f1 are forwarded to a suitable processing unit (not shown in the figure), for example a personal computer. Any suitable type of camera can be used, and even a low-cost webcam 2 with a typical resolution of 320 × 240 pixels is sufficient for this purpose.

The user 1 can make gestures associated with the commands that commonly occur in a dialogue between the user and a dialogue system, for example "stop interacting", "wait", "go back", "continue", "help", and so on. This does away with the need for spoken interaction, for example in noisy environments, or when the user 1 communicates by sign language. The user can also supply additional gesture information to supplement a spoken command, for example by pointing in a particular direction while verbally directing the camera to look that way, say by saying "look over here".

The webcam 2 can generate images of the user 1 at more or less regular intervals, giving an image sequence f1, f2, ..., fn as shown in Fig. 2a. In the following example, which illustrates the proposed method, the first-order features are blur-related features, chosen on account of the advantages described in detail above. Without limiting the invention in any way, a method of gesture classification is assumed and described.

As illustrated in Fig. 2a, various snapshots of the user 1 have been captured in the image sequence f1, f2, ..., fn. Each of these snapshots of the user corresponds to a position or pose which, in combination, make up an overall gesture. Edge detection is then performed on an image f1 by applying a wavelet transform to the rows and columns of pixels of the image f1. Depending on the size of the image f1 and the resolution of the camera 2, it may be sufficient to use only every ninth pixel, or it may be necessary to use every pixel. In this way, a set of wavelet coefficients is obtained for each pixel considered.
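A minimal sketch of such an edge-detection step — assuming, purely for illustration, a single-level Haar transform applied first along each row and then along the columns of the row averages of a small greyscale image — might look as follows; the detail (difference) coefficients are large at sharp edges and small in smooth or blurred areas:

```python
def haar_pairs(values):
    """Single-level Haar step: (average, difference) for each pair of samples."""
    approx = [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values) - 1, 2)]
    detail = [(values[i] - values[i + 1]) / 2.0 for i in range(0, len(values) - 1, 2)]
    return approx, detail

def row_column_details(image):
    """Apply the Haar step to the rows, then to the columns of the row
    averages, returning the horizontal and vertical detail coefficients."""
    row_detail = [haar_pairs(row)[1] for row in image]
    row_approx = [haar_pairs(row)[0] for row in image]
    columns = list(zip(*row_approx))
    col_detail = [haar_pairs(list(col))[1] for col in columns]
    return row_detail, col_detail
```

A real implementation would of course use more wavelet levels and a smoother wavelet basis; the point here is only that the transform yields one set of coefficients per pixel pair, carrying the edge-sharpness information exploited below.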

These coefficients contain information about the differences between neighbouring pixels — in other words, the motion information given by the level of blur. For each set of wavelet coefficients, discrete blur-related features are derived, in this case the Lipschitz exponent and the image gradient. The occurrence of each value of a first-order feature — for example the number of times a Lipschitz coefficient of 1.5 or an image-gradient value of 0.5 occurs — is counted and collected in a histogram HLC, HG for that first-order feature. The resulting histograms HLC, HG therefore contain blur-related information about the overall image f1, such as the proportion of the image f1 that is blurred, or the level of sharpness of the edges in the image f1. This procedure can be repeated for each of the remaining images f2, ..., fn of the sequence.

Performing the edge analysis on the entire image is, however, a waste of resources, and tends to be roundabout. For instance, a kind of "sliding window" can be moved virtually over the image data, giving smaller sub-views on which the wavelet transform is carried out; the dimensions of the sliding window can also be varied over time. Fig. 2b shows another technique, which reduces the computational effort and lends itself more readily to visualisation and classification: an image f1 is virtually divided into smaller segments, blocks, or sub-images. The edge detection can then be carried out for each of the sub-images of the overall image f1. The figure shows one such sub-image 20 together with its first-order feature histograms SHLC, SHG. Since the procedure is carried out for every sub-image of the image f1, histograms of this kind are derived for each of the sub-images. These can then be analysed to decide which of the sub-images are actually of interest. Since the only moving elements of a user making a gesture are his moving limbs, such as an arm or a hand, and since this movement is accompanied by a certain level of blur, the "interesting" parts of the image f1 can easily be located by examining the first-order feature histograms to determine which sub-images are blurred. As mentioned above, this part of the image can also be located with the aid of a "sliding window" applied to the image data, effectively minimising the computational effort.

In Fig. 2b, two such sub-images 21, 22 have been located in the image f1. Evidently, the image could be subdivided into more sub-images than shown here, and the sub-images might also overlap; for the sake of simplicity, however, only the two sub-images 21, 22 of interest are shown. These sub-images 21, 22 are combined to give a particular region A1, indicated in the figure by a thick rectangle.

A user standing in front of a camera or webcam and making gestures will usually remain more or less in the same place, moving only one or both of his hands and arms. Therefore, the particular region identified in one image or frame can simply be propagated from one image to the next, so as to define a particular region in each of the succeeding images or frames of the sequence. This saves computing resources, because over time only the image content of the particular regions changes appreciably. The rest of the image — the "complementary" region — remains essentially the same and can be regarded as static.
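The block-selection step just described can be sketched as follows — a simplified illustration in which, by assumption, a block's "blur score" is the fraction of its feature values indicating blur, and the interesting blocks are merged into one bounding rectangle whose coordinates are then reused for every later frame:

```python
def blur_score(block):
    """Fraction of feature values in a block that indicate blur (here: > 0)."""
    values = [v for row in block for v in row]
    return sum(1 for v in values if v > 0) / float(len(values))

def interesting_blocks(feature_map, block, threshold=0.5):
    """(row, col) origins of blocks whose blur score exceeds the threshold."""
    hits = []
    for r in range(0, len(feature_map) - block + 1, block):
        for c in range(0, len(feature_map[0]) - block + 1, block):
            sub = [row[c:c + block] for row in feature_map[r:r + block]]
            if blur_score(sub) > threshold:
                hits.append((r, c))
    return hits

def particular_region(hits, block):
    """Bounding rectangle (top, left, bottom, right) around the interesting
    blocks; the same coordinates can be propagated to all later frames."""
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows) + block, max(cols) + block
```

Propagating the rectangle rather than re-scanning every frame is exactly the resource saving argued for in the text.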

This is illustrated in Fig. 3a, which shows an image sequence f1, f2, ..., fn stacked vertically, one frame behind the other. If imaginary lines are drawn to connect the corners of these frames f1, f2, ..., fn, a virtual volume V is obtained. The particular region A1 shown in the first frame f1 is propagated through all succeeding frames f2, ..., fn, so that each of these frames f2, ..., fn has its particular region A2, ..., An at the same relative position. Again, if imaginary lines are drawn to connect the corners of these particular regions A1, A2, ..., An, a virtual sub-volume Vs is established, which excludes the complementary regions A1', A2', ..., An' of the images f1, f2, ..., fn.

As already described under Figs. 2a and 2b, first-order feature histograms are computed for each of the sub-images in the particular regions A1, A2, ..., An of the frames f1, ..., fn. By combining these histograms into corresponding volume histograms VHLC, VHG, the information they contain is extended to cover time as well. In other words, the changes in the level of blur within the particular regions of the images are also tracked over time. This is illustrated graphically in Fig. 3b, in which the image data of the frame sequence f1, f2, ..., fn have been superimposed in an image 30; all of the "interesting" information generated by the user in making a gesture is found in a region 31 corresponding to the particular regions A1, A2, ..., An of the image sequence f1, ..., fn.

Fig. 4 shows a state-transition model G of a motion or gesture, in this case a discrete hidden Markov model (HMM) with a finite number of states S1, S2, S3, S4 corresponding to a number of prototype positions, or poses, of a motion. A model of this type is used to classify a motion, for example a gesture made by the user. The transitions are weighted by the probabilities of passing from one state to another — as shown, for example, by the probabilities p(S1→S4), p(S1→S3), and p(S4→S4), which give a measure of the likelihood of stepping from state S1 to state S4, from state S1 to state S3, and from state S4 back to state S4, respectively.

For example, a volume histogram of a particular pose made by the user and captured in an image sequence can be compared with a collection of prototype poses associated with various gestures. The figure shows the four prototype pose histograms PH1, PH2, PH3, PH4 of the states S1, S2, S3, S4 of the state-transition model G, but any number of such state-transition models may be available, each associated with its own prototype poses. These prototype pose histograms PH1, PH2, PH3, PH4 can be generated beforehand using a suitable learning algorithm.

When the volume histogram of a pose is compared with the prototype poses, the most similar prototype pose can be found — for example the prototype pose PH1 associated with the state S1 of the state-transition model G. The comparison simply involves computing a distance, as already described above, between the volume histogram and the prototype histogram, and converting this distance into a probability, i.e. a number between 0 and 1. The successive volume histograms collected over time may then be found to correspond to the successive states S2, S3, S4, so that the gesture being made by the user will most probably correspond to the gesture modelled by this state-transition model G. In the event that a later volume histogram does not correspond to the next state S2, the classification procedure can conclude that the transition model G does not correspond to a candidate gesture.

Fig. 5 shows a block diagram of a system 4 for performing gesture classification, without limiting the scope of the invention in any way. The system comprises a camera 2 for obtaining an image sequence f1, f2, ..., fn of a user 1 (in the figure, only a single image f1 is shown). Each image is first processed in a processing unit 5 to obtain a number of blur-related features 12. In this embodiment, the processing involves performing a wavelet transform on the image data to obtain Lipschitz exponents and wavelet-coefficient gradients. These blur-related first-order features 12 are then forwarded to a computation unit 6, in which statistical values 14 are computed for the first-order features 12. In this computation unit 6, the values of the first-order features are, where appropriate, first rounded up or down to the closest of a set of predefined discrete values (for example -0.5, 0.0, 0.5, 1.0, etc.) before the occurrences of each discrete value are counted, to give a number of counts 14. In a combination unit 15, these counts 14 are combined to give a number of histograms HLC, HG, one for each type of blur-related feature — here, one for the Lipschitz exponents and one for the wavelet-coefficient gradients — each of which can be regarded as a motion-related feature of the image. The histograms can in turn be used to derive further motion-related features, for example the ratio of the total number of positive values in a histogram HLC, HG to the total number of zero or negative values.
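The rounding and counting performed in the computation unit 6, and the derived positive-to-nonpositive ratio, can be sketched as follows — an illustrative sketch using the example grid −0.5, 0.0, 0.5, 1.0 mentioned in the text:

```python
def discretise(value, grid=(-0.5, 0.0, 0.5, 1.0)):
    """Round a first-order feature value to the closest predefined discrete value."""
    return min(grid, key=lambda g: abs(g - value))

def count_values(feature_values):
    """Count the occurrences of each discrete value (the histogram HLC or HG)."""
    counts = {}
    for v in feature_values:
        d = discretise(v)
        counts[d] = counts.get(d, 0) + 1
    return counts

def positive_ratio(histogram):
    """Derived feature: total positive values vs. total zero-or-negative values."""
    positive = sum(n for value, n in histogram.items() if value > 0)
    non_positive = sum(n for value, n in histogram.items() if value <= 0)
    return positive / float(non_positive) if non_positive else float("inf")
```

Either the histogram itself or a scalar such as this ratio can then be handed on as a motion-related feature.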

Such a ratio can be supplied as a motion-related feature Fm of the histograms HLC, HG. Equally, a histogram HLC, HG itself can be used as a motion-related feature Fm. The combination unit 15 may comprise a separate analysis unit for carrying out any required histogram analysis, but it is shown here as a single unit for the sake of clarity. The units and blocks described so far — the camera 2, the processing unit 5, the computation unit 6, and the combination unit 15 — together constitute a system 3 for determining motion-related features, which acts as a front end of the system 4 for performing gesture classification.

The histograms HLC, HG are forwarded to a second combination unit 8, in which they are collected and combined with the corresponding histograms derived from the image data of the preceding images of the image sequence. The output of this second combination unit 8 is a set of volume histograms VHLC, VHG, VHLC', VHG', in which the volume histograms VHLC, VHG correspond to the particular regions of interest identified in the image sequence generated by the camera 2, while the complementary volume histograms VHLC', VHG' correspond to the complementary regions of that image sequence. In the second combination unit 8, the volume histograms VHLC, VHG, VHLC', VHG' corresponding to the various poses assumed by the user 1 in performing the gesture are processed to obtain position-characteristic features Fp, for example the distances between the volume histograms VHLC, VHG, or the distance between a volume histogram VHLC, VHG and its complementary volume histogram VHLC', VHG'.

The volume histograms VHLC, VHG, VHLC', VHG' and the position-characteristic features Fp are forwarded to a classification unit 7, in which they are analysed in order to classify the gesture made by the user. To this end, pose information 13 — the prototype pose histograms PH1, PH2, PH3, PH4 of candidate gestures — can be retrieved from a database 9 of candidate gestures. A volume histogram VHLC, VHG, VHLC', VHG' is compared in some suitable manner with a corresponding prototype histogram PH1, PH2, PH3, PH4 in order to determine a number of candidate gestures. By comparing the successive volume histograms describing all the poses assumed by the user over time in making the gesture, the number of candidate gestures from the database is narrowed down until the most probable gesture has been identified. The result of the classification can be supplied as a suitable signal 10 to a further processing block 11, for example a gesture-interpretation module.

Although the second combination unit 8 is shown here as part of the system 4 for performing motion classification, it is equally conceivable to integrate this second combination unit 8 in the system 3 for determining motion-related features.

Fig. 6 shows a block diagram of the main steps of the method of motion classification according to the invention, for an embodiment based on the analysis of blur-related first-order features. In a first step 500, an image 600 is obtained from an image source, for example a camera. Edge detection is performed on the image 600 in an edge-detection block 501, for example by carrying out a wavelet transform on the image data, to give blur-related information 601 such as a set of wavelet coefficients 601. These are processed in a feature-extraction block 502 to give, for each set of wavelet coefficients 601, a set of first-order features 602. In a histogram-compilation block 503, statistical values of the first-order features 602 are computed, and a number of histograms or volume histograms 603 are compiled for the first-order features of an image, of a particular region of an image, of an image sequence, or of a sequence of particular regions of the images. These histograms 603 are input to a histogram-analysis block 504, in which motion-related features or position-characteristic features 604 are derived from the histograms 603. These features 604 are then input to a motion-classification block 505, in which they are used to determine the motion to which the corresponding positions belong. Once the classification procedure is complete, an output signal 605 can indicate the identified motion — for example a gesture made by a user — or indicate that the motion classification has failed.

Although the present invention has been disclosed in the form of preferred embodiments and variations thereon, it will be understood that numerous additional modifications could be made thereto without departing from the scope of the invention. The invention can be used in any system in which a suitable camera and sufficient computing resources are available. For example, the gesture-recognition method and system described could be used to recognise sign-language gestures and convert these into text. Another implementation could be a surveillance system for classifying and recognising different types of moving objects, such as persons, vehicles, and so on. A further possible implementation might be found in automated production processes, in which the system and method according to the invention could be used to track moving items and to control the different stages of production.

For the sake of clarity, it is to be understood that the use of "a" or "an" throughout this application does not exclude a plurality, and "comprising" does not exclude other steps or elements. A "unit" or "module" can comprise a number of units or modules, unless otherwise stated.

【Brief Description of the Drawings】

Fig. 1 is a schematic representation of a user, captured by a camera, making a gesture;
Fig. 2a is a schematic representation of an image sequence of a person, with associated motion information and blur-related features;
Fig. 2b is a schematic representation of an image subdivided into sub-images according to the invention;
Fig. 2c is a schematic representation of a particular region in an image according to the invention;
Fig. 3a is a schematic representation of the sub-volume given by a sequence of particular regions from an image sequence, and its associated volume histograms;
Fig. 3b is a representation of a view of a virtual volume, showing the superimposed particular regions of an image sequence;
Fig. 4 shows a state diagram of a state-transition model of a gesture, and a number of volume histograms;
Fig. 5 is a block diagram of a system for gesture recognition according to the invention;
Fig. 6 shows a flow chart of the steps of a method of performing gesture recognition according to the invention.

In the drawings, like numbers refer to like objects throughout.

【Description of Reference Numerals】

1 user
2 camera
3 system
4 system
5 processing unit
6 computation unit
7 classification unit
8 combination unit
9 database
10 signal
11 processing block
12 blur-related features
13 pose information
14 statistical values
15 combination unit
20 image
21, 22 sub-images
30 image
31 region of an image

A1, A2, ..., An particular regions
A1', A2', ..., An' complementary regions
f1, f2, ..., fn images
G state-transition model


Claims

X. Scope of the patent claims:

1. A method of determining motion-related features (FM) relating to a motion of an object (1), the method comprising:
- obtaining an image sequence (f1, f2, ..., fn) of the object (1);
- processing at least part of the images (f1, f2, ..., fn) so as to extract a number of first-order features from the images (f1, f2, ..., fn);
- computing a number of statistical values pertaining to the first-order features;
- combining the statistical values to give a number of histograms (HLC, HG);
- and determining the motion-related features (FM) of the object (1) on the basis of the histograms (HLC, HG).

2. The method of claim 1, wherein the first-order features comprise blur-related features.

3. The method of claim 2, wherein extracting blur-related features from the images (f1, f2, ..., fn) comprises performing a wavelet transform on the image data to determine a number of wavelet coefficients, and analysing the wavelet coefficients to give the blur-related features.

4. The method of any of the preceding claims, wherein the first-order features are extracted for a volume (V) given by the image sequence (f1, f2, ..., fn) over elapsed time.

5. The method of any of the preceding claims, wherein a specific region sequence (A1, A2, ..., An) is identified in the image sequence (f1, f2, ..., fn), and the first-order features are extracted for a sub-volume given by the specific region sequence (A1, A2, ..., An) over elapsed time.

6. The method of claim 5, wherein statistical values of a certain type of first-order feature, taken from a specific region sequence (A1, A2, ..., An) of an image sequence (f1, f2, ..., fn), are combined into a volume histogram (VHLC, VHG) for that first-order feature.

7. The method of claim 6, wherein a complementary region sequence (A1', A2', ..., An') is identified in the image sequence (f1, f2, ..., fn), wherein a complementary region (A1', A2', ..., An') in an image (f1, f2, ..., fn) is complementary to the specific region (A1, A2, ..., An) in that image, and wherein statistical values of the same type of first-order feature, taken from the complementary region sequence (A1', A2', ..., An'), are combined into a complementary volume histogram (VHLC', VHG') for that first-order feature.

8. The method of claim 5, wherein the specific region (A1, A2, ..., An) and/or the complementary region (A1', A2', ..., An') of an image (f1, f2, ..., fn) is identified on the basis of the first-order features determined for that image (f1, f2, ..., fn).

9. A method of performing motion classification of a motion of an object (1) captured in an image sequence (f1, f2, ..., fn), the method comprising determining motion-related features (FM) of the object (1) in the image sequence (f1, f2, ..., fn) using the method of any of claims 1 to 8, and classifying the motion of the object (1) using the motion-related features (FM).

10. The method of claim 9, wherein volume histograms (VHLC, VHG, VHLC', VHG') are generated using the method of claim 6 or 7 so as to obtain position characteristic features (FP), and these position characteristic features (FP) are analysed in order to classify the motion of the object (1).

11. The method of claim 10, wherein a distance between a volume histogram (VHLC, VHG) and a complementary volume histogram (VHLC', VHG') is calculated so as to obtain the position characteristic features (FP), and these position characteristic features (FP) are analysed in order to classify the motion of the object (1).

12. The method of claim 10 or 11, wherein the position characteristic features are used to identify a state (s1, s2, s3, s4) in a state transition model of a motion, which state (s1, s2, s3, s4) corresponds to a particular stage of the motion.

13. A system (3) for determining motion-related features (FM) relating to a motion of an object (1), comprising:
- a source (7) for providing an image sequence (f1, f2, ..., fn) of the object (1);
- a processing unit (5) for processing at least part of the images (f1, f2, ..., fn) so as to extract a number of first-order features from the images (f1, f2, ..., fn);
- a calculation unit (6) for computing a number of statistical values pertaining to the first-order features;
- a combining unit for combining the statistical values into a number of histograms (HLC, HG) and for determining the motion-related features (FM) on the basis of the histograms (HLC, HG).

14. A system (4) for performing motion classification of a motion of an object (1), comprising a system (3) as claimed in claim 13 for determining the motion-related features (FM), and a classification unit (7) for classifying the motion of the object using the motion-related features (FM).

15. A computer-readable medium storing a computer program which can be loaded directly into the memory of a programmable device, for use in a system for determining motion-related features relating to a motion of an object and/or in a system for performing motion classification of a motion of an object, the program comprising software code portions for performing the steps of the method of claim 1 when the computer-readable medium is run on the device.
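Claims 1 to 11 outline a concrete computational pipeline: wavelet-based blur-related first-order features, statistics pooled over a specific region sequence and its complement, volume histograms, and a histogram distance as the position characteristic feature FP. The Python sketch below illustrates that pipeline on synthetic data. It is a minimal illustration under stated assumptions, not the patented implementation: the single-level Haar transform (one common wavelet choice for claims 2 and 3), the rectangular moving "specific region", the histogram bin settings, and the total-variation distance used for FP are all choices made here that the claims leave open.

```python
import numpy as np

def detail_map(img):
    """Single-level 2-D Haar transform of a grayscale image, returning a
    half-resolution map of summed absolute detail (high-frequency)
    coefficients. Sharp texture gives large values, blur gives small ones:
    a simple stand-in for the blur-related first-order features of
    claims 2 and 3."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(float)
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return np.abs(lh) + np.abs(hl) + np.abs(hh)

def volume_histogram(values, bins=32, value_range=(0.0, 256.0)):
    """Combine pooled feature statistics over a (sub-)volume of the
    sequence into a normalised 'volume histogram' (claims 6 and 7)."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)

# Synthetic 10-frame sequence: a textured 16x16 object (the specific
# region A_i) moves downwards over a flat, blur-like background.
frames, boxes = [], []
for t in range(10):
    f = np.full((64, 64), 32.0)
    r0 = 8 + 2 * t
    f[r0:r0 + 16, 24:40] = rng.uniform(0.0, 255.0, (16, 16))
    frames.append(f)
    boxes.append((r0, r0 + 16, 24, 40))

spec_vals, comp_vals = [], []
for f, (r0, r1, c0, c1) in zip(frames, boxes):
    m = detail_map(f)                              # first-order features
    in_box = np.zeros(m.shape, dtype=bool)         # half-resolution mask
    in_box[r0 // 2:r1 // 2, c0 // 2:c1 // 2] = True
    spec_vals.append(m[in_box])                    # specific region A_i
    comp_vals.append(m[~in_box])                   # complementary region A_i'

vh = volume_histogram(np.concatenate(spec_vals))   # VH  over the sub-volume
vh_c = volume_histogram(np.concatenate(comp_vals)) # VH' over its complement

# Position characteristic feature F_P (claim 11): distance between the
# volume histogram and the complementary volume histogram.
f_p = 0.5 * np.abs(vh - vh_c).sum()                # total-variation distance
print(round(float(f_p), 3))
```

Because sharp texture is concentrated in the specific region while its complement stays flat, the two volume histograms differ strongly and FP comes out close to 1. A sequence of such features could then, in the spirit of claim 12, be mapped onto the states of a state transition model in order to classify the motion.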
TW097117446A 2007-05-15 2008-05-12 Method of determining motion-related features and method of performing motion classification TW200910221A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP07108267 2007-05-15

Publications (1)

Publication Number Publication Date
TW200910221A true TW200910221A (en) 2009-03-01

Family

ID=40002714

Family Applications (1)

Application Number Title Priority Date Filing Date
TW097117446A TW200910221A (en) 2007-05-15 2008-05-12 Method of determining motion-related features and method of performing motion classification

Country Status (2)

Country Link
TW (1) TW200910221A (en)
WO (1) WO2008139399A2 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576412B (en) 2009-01-13 2014-11-05 华为技术有限公司 Method and system for image processing to classify an object in an image
CN102356398B (en) * 2009-02-02 2016-11-23 视力移动技术有限公司 Object identifying in video flowing and the system and method for tracking
US8890803B2 (en) 2010-09-13 2014-11-18 Samsung Electronics Co., Ltd. Gesture control system
EP2428870A1 (en) * 2010-09-13 2012-03-14 Samsung Electronics Co., Ltd. Device and method for controlling gesture for mobile device
JP5782061B2 (en) * 2013-03-11 2015-09-24 レノボ・シンガポール・プライベート・リミテッド Method for recognizing movement of moving object and portable computer
US10198813B2 (en) * 2014-05-13 2019-02-05 Omron Corporation Posture estimation device, posture estimation system, posture estimation method, posture estimation program, and computer-readable recording medium on which posture estimation program is recorded
CN107735813A (en) * 2015-06-10 2018-02-23 柯尼卡美能达株式会社 Image processing system, image processing apparatus, image processing method and image processing program
WO2016199748A1 (en) * 2015-06-11 2016-12-15 コニカミノルタ株式会社 Motion detection system, motion detection device, motion detection method, and motion detection program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007004100A1 (en) * 2005-06-30 2007-01-11 Philips Intellectual Property & Standards Gmbh A method of recognizing a motion pattern of an obejct

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI415032B (en) * 2009-10-30 2013-11-11 Univ Nat Chiao Tung Object tracking method
TWI492166B (en) * 2012-01-12 2015-07-11 Kofax Inc Systems and methods for mobile image capture and processing
US11386714B1 (en) 2021-01-08 2022-07-12 Institute For Information Industry Motion recognition apparatus and method
TWI779454B (en) * 2021-01-08 2022-10-01 財團法人資訊工業策進會 Motion recognition apparatus and method thereof

Also Published As

Publication number Publication date
WO2008139399A3 (en) 2009-04-30
WO2008139399A2 (en) 2008-11-20

Similar Documents

Publication Publication Date Title
TW200910221A (en) Method of determining motion-related features and method of performing motion classification
Jegham et al. Vision-based human action recognition: An overview and real world challenges
Rui et al. Segmenting visual actions based on spatio-temporal motion patterns
Zhang et al. Random Gabor based templates for facial expression recognition in images with facial occlusion
Bobick et al. The recognition of human movement using temporal templates
US6792144B1 (en) System and method for locating an object in an image using models
Ji et al. Learning contrastive feature distribution model for interaction recognition
Murtaza et al. Analysis of face recognition under varying facial expression: a survey.
WO2006059419A1 (en) Tracing device, and tracing method
CN110458235B (en) Motion posture similarity comparison method in video
Liu et al. Micro-expression recognition using advanced genetic algorithm
CN112541434B (en) Face recognition method based on central point tracking model
Świtoński et al. Human identification based on gait paths
Zhao et al. Real-time sign language recognition based on video stream
Xia et al. Human motion recovery jointly utilizing statistical and kinematic information
CN112329663B (en) Micro-expression time detection method and device based on face image sequence
Wu et al. Recognition of Student Classroom Behaviors Based on Moving Target Detection.
Xu et al. Action recognition by saliency-based dense sampling
Chai et al. Hierarchical and multi-featured fusion for effective gait recognition under variable scenarios
Chan et al. Human motion classification using 2D stick-model matching regression coefficients
CN115424209A (en) Crowd counting method based on spatial pyramid attention network
CN118314618A (en) Eye movement tracking method, device, equipment and storage medium integrating iris segmentation
Zeng et al. Video‐driven state‐aware facial animation
CN111626197A (en) Human behavior recognition network model and recognition method
Shiraishi et al. Optical flow based lip reading using non rectangular ROI and head motion reduction