TW201020935A

TW201020935A - Recognition and constitution method and system for video-based two-dimensional objects

Info

Publication number: TW201020935A
Application number: TW97144887A
Authority: TW
Inventors: Chen-Lan Yen; Chih-Chang Chen; Chu-Chuan Huang; Po-Lung Chen
Original assignee: Ind Tech Res Inst
Priority date: 2008-11-20
Filing date: 2008-11-20
Publication date: 2010-06-01

Abstract

Recognition and constitution method and system for video-based two-dimensional objects is disclosed. An image frame is retrieved from a video extraction device and a pre-processing operation is implemented thereto. At least one foreground object of the image frame is retrieved according to difference of pixel colors of the image frame and a feature set of an outline of the foreground object and plural line segments connecting plural feature points in the feature set are then retrieved. Position variations of the feature points of the image frame and a previous frame are compared and angle variations of the feature points of the image frame and the previous frame are compared. Comparison results of the position variations and angle variations are recorded and an input operation of an input device corresponding to the comparison results is performed.

Description

201020935 九、發明說明：【發明所屬之技術領域】本發明係有關於一種影像辨識方法，且特別有關於一種基於視訊之二維物件的辨識與構成方法與系統。【先前技術】利用視訊輸入作為人機互動介面，在介面輸入與使用上是相當直覺且便利的方式。在—些習知作法上，為了有效地辨識物件在二維視訊影像中之位置或形狀，必須穿戴特殊裝置或服裝於物件上方可達成。“，此穿戴式介面將降低使性。_ ’若能簡訊輸人作為人機介面’並且發展連_像中_輸人語意之技術，即可簡單化人機介面之操作’因而創造廣泛之市場應用。在視訊的圖形辨識研究領域中，對於影像中物件之辨，、追縱與行為分析，已是相當傳統之研究。然而’由於環境光線變化、背景_度㈣素應:二若要達到穩定且快速的辨 ^代:二右在連續影像中辨識物件行為的技術來機介面輸入’則可藉由於動作輸入之特殊特算::移㈣向資訊，如此可= =準確度^但仍可得射接受的敎人機介面控制在習知技術中，中華民國第〇隨勘號專利揭露了 6 201020935 「手勢滑鼠的構成方法」，其只能運用於單手操作，故可、组合狀態、受限於偵測區域内單手移動與旋轉所能控制的語意°此外’該手勢滑鼠的旋轉控制方法計算手部最高點與參考點所形成的直線，而使得其手勢辨識效果受限於手勢方式、方向與角度。中華民國第092133383號專利揭露了「辨識雙擊手勢之方法及控制器」，其係以手勢特徵出現的時間定量作為雙擊訊號的偵測依據。美國第 5454043 號專利揭露了「Dynamic and Static β Hand Gesture Recognition Through Low-Level Image Analysis」，其手勢辨識方法受限於手勢動作需符合系統訓練的樣式，故會受限於時間、手勢方式、方向與角度。美國第 7289645 號專利揭露了「Hand Pattern Switch Device」，其中偵測模式限定包含移動距離（Moved Distance)與停止時間（Stop Time)項目。上述習知技術僅揭露了類似的辨識方法以取代傳統的輸入裝置’但其與本案發明所欲達到的目的、功效與實施 ❿手段皆不相同。【發明内容】本發明的目的在於提供一種基於視訊之二維物件的辨識與構成方法與系統，其偵測連續的輸入影像中之物件輪廓的特徵點夾角與物件的運動方向資訊，以產生直覺式之人機介面互動輸入，用以替代鍵盤、滑鼠等傳統人機輸入控制介面。基於上述目的，本發明實施例揭露了一種基於視訊之 7 201020935 二維物件的辨識與像晝面，並且。自一視訊擷取裝置取得一影像晝面中之像象晝面執行一前處理操作。根據該影 -前景物件，繼^，異數差以取得該影像晝面中之至少 *連接該特徵點集合的特徵點集合以該影像畫面料— 姐特徵闕的複麟段。比較且比較該目前影像之該等特徵點的位置變化，並角變化。記錄該等‘二一影像晝面之該等特徵點的夾本發明實施例更揭露冑置之-輸入操作。識與構成系統，包括於視訊之二維物件的辨訊掘取裝置用以取得一发象畫、置與—電腦裝置。該視訊操取模組、一义老〜像晝面。該電腦裝置更包括一視 ° 、刚處理模組、一視訊物件八劏Γ t201020935 IX. Description of the Invention: [Technical Field] The present invention relates to an image recognition method, and in particular to a method and system for recognizing and constructing a two-dimensional object based on video. [Prior Art] Using video input as a human-machine interactive interface is a relatively intuitive and convenient way to input and use the interface. In some conventional practices, in order to effectively identify the position or shape of an object in a two-dimensional video image, special equipment or clothing must be worn over the object. "This wearable interface will reduce the sex. _ 'If the newsletter loses people as a human-machine interface' and develops the technology of the human-machine interface, it can create a wide range of applications. Market application. In the field of video identification research, it is quite traditional research on the identification, tracking and behavior analysis of objects in images. However, due to environmental light changes, background _ degrees (4) should be: Achieving a stable and fast identification: the technique of identifying the behavior of the object in the continuous image to the machine interface input can be made by the special calculation of the motion input:: shift (four) to the information, so ===accuracy^ The human-machine interface control that can still be accepted by the receiver is in the prior art. The Republic of China No. 3 is published with the survey patent. The 2010 2010 935 "Method of Constructing a Gesture Mouse" can only be used for one-hand operation, so The combined state is limited by the semantics that can be controlled by one-hand movement and rotation in the detection area. In addition, the rotation control method of the gesture mouse calculates the straight line formed by the highest point of the hand and the reference point, so that the hand The potential recognition effect is limited by the gesture mode, direction and angle. The Republic of China No. 092133383 discloses a "method and controller for recognizing a double-click gesture", which quantifies the time at which a gesture feature appears as a basis for detecting a double-click signal. US Patent No. 5,540,043 discloses "Dynamic and Static β Hand Gesture Recognition Through Low-Level Image Analysis". The gesture recognition method is limited by the gestures that need to conform to the system training style, so it is limited by time, gesture mode, direction. With angles. U.S. Patent No. 7,289,645 discloses "Hand Pattern Switch Device" in which the detection mode defines items including a Moved Distance and a Stop Time. The above-mentioned prior art only discloses a similar identification method to replace the conventional input device', but it is different from the object, the function and the implementation method of the present invention. SUMMARY OF THE INVENTION An object of the present invention is to provide a method and system for recognizing and constructing a two-dimensional object based on video, which detects the angle of a feature point of an object contour in a continuous input image and the direction of motion of the object to generate an intuition. The human-machine interface interactive input is used to replace the traditional human-machine input control interface such as keyboard and mouse. Based on the above object, an embodiment of the present invention discloses a recognition and image plane of a two-dimensional object based on video 7 201020935. A pre-processing operation is performed from a video capture device to obtain an image in the image plane. According to the shadow-foreground object, the difference is obtained to obtain at least the feature point set of the feature point set, and the feature picture set of the feature picture set is the composite picture segment of the image feature. Comparing and comparing the position changes of the feature points of the current image, and changing the angle. The clips of the feature points of the "two-one image" are recorded. The embodiment of the invention further discloses the input operation. The recognition and composition system, including the two-dimensional object of the video, is used to obtain a picture-drawing, setting-and-computer device. The video operation module, the old and the old ~ like the face. The computer device further includes a viewing unit, a processing module, and a video object.

Segmentation )模組、一控制刀割（Object 取模組自該視訊掏取裝置取得該景^=模組。該視訊擷對該影像晝面執行—前處理操作。^視訊物^處理模組 ❿據該影像晝面中之像素色彩變異^取件分割模組根之至少-前齡該控制模組==影像晝面中特徵點集合以及連接該特徵點集合中京物件之輪廓的數線段。該比較模組比較該影像畫面與^數特徵點間的複徵點的位置變化’比較該目前影像晝面c面之特該等特徵點的炎角變化。該控制模組記^衫像畫面之置變化與夾角變化的比較結果，並且j 、〜等特徵點的位應一輸入裝置之一輸入操作。艮據比較結果執行對 8 201020935 【實施方式】為了讓本發明之目的、特徵、及優點能更明顯易悝，下文特舉較佳實施例，並配合所附圖式第1圖至第3圖，做詳細之說明。本發明說明書提供不同的實施例來說明本發明不同實施方式的技術特徵。其中，實施例中的各元件之配置係為說明之用，並非用以限制本發明。且實施例中圖式標號之部分重複，係為了簡化說明，並非意指不同實施例之間的關聯性。Segmentation) module, a control knife cut (Object capture module obtains the scene ^= module from the video capture device. The video camera performs the pre-processing operation on the image surface. ^Video device ^ processing module❿ According to the pixel color variation in the image plane, at least the former module of the control module == the image of the feature point in the image plane and the number line segment connecting the contour of the object in the feature point set. The comparison module compares the position change of the revering point between the image frame and the number of feature points to compare the change of the inflammatory angle of the feature points of the c-plane of the current image. The control module records the image of the shirt image. The result of the comparison of the change and the change of the included angle, and the bit of the feature point such as j, ~, etc. should be input to one of the input devices. 执行 According to the comparison result, the pair is executed. 201020935 [Embodiment] For the purpose, features, and advantages of the present invention The present invention will be described in detail with reference to the accompanying drawings, Figures 1 through 3, which illustrate various embodiments to illustrate various embodiments of the present invention. The components of the embodiments are for illustrative purposes, and are not intended to limit the invention. The portions of the drawings in the embodiments are repeated for simplicity of description and are not meant to be between different embodiments. Relevance.

本發明實施例揭露了一種基於視訊之二維物件的辨識與構成方法與系統。本發明實施例之基於視訊之二維物件的辨識與構成方去係為由輸人的連績影像中分析物件的姿勢與運動資訊，二辨識物件操作sf意之構成方法。該方法的主要特色是以機操取視訊作為人機互動介面輸入，且分析與辨勢’進而構成-電縣置之控制介面。哥妥 ^發明實施例之基於視訊之二轉件_識與 =為梅取-攝影機之視訊輸人，並且偵測視訊2 微二=::::::觀。計算該物件之輪二 ::r 件之動作輸入（又稱為輸入語意）時’若輪廓特早 9 201020935 二=)向:：:::測多物件之動作輪入(又稱為而產生更多樣化之人機介面控制指令。動作°。意組合），本發明可被使用於二維連續影像之輸 _物件之位置與輪廓的動作語：二：：位，包括心=手：:，測人_ 制之輸入指令。雙手之動作语意’以產生人機介面控示意^ ί圖係顯示本發明實施例之判斷手部特徵點語意的用以示之偵測手部動作僅係為一實施例，其並非土限制本發明。影像晝面（例如，手部影像）經過背哥資訊，再以端點(例如，右手）之輪靡 Ρ ρ端點（即，特徵點，包括pa、pb、pc、pd、Pe、方气λ 接下來，（以順（逆）時針方向或任意 ρ式）取得該物件之輪靡的特徵點集合，"a、Pb、Pc、厂匕、5 4。排除首末兩個特徵點（1^、1^)後。資夾r舆相對應線段的長度， :本實施例中，（pb、pc、pd)與(pg、pa、pb)的夾爽角件上’故不予以考慮’而（Pa、pb、pc)的 a 4於該影像晝面上’故為有效夾角。考慮特 201020935 徵點 Pa、Pb、Pc 的重心、r· r»、h ^ (pw)與二個特徵點的相對位置，以決定，A、B間的夹角大小。複數特徵點的重心位置意指在早-物件上之所有特徵點在二維空間上之垂直與水平分量的平均值。第2圖係顯示本發明實施例之基於視訊之二維物件的辨識與構成方法的步驟流程圖。乂The embodiment of the invention discloses a method and system for identifying and constructing a two-dimensional object based on video. The identification and composition of the two-dimensional object based on video in the embodiment of the present invention is to analyze the posture and motion information of the object in the succession image of the input, and to identify the method of constructing the object. The main feature of this method is that the machine takes the video as the human-computer interaction interface input, and the analysis and the identification' constitute the control interface of the electric county. Koto ^Inventive embodiment of the video-based two-transfer _ _ and = for the mate - camera video input, and detection video 2 micro two =:::::: view. Calculate the action input (also known as input semantics) of the wheel 2::r piece of the object. 'If the contour is early 9 201020935 2=) To::::: Measure the action of multiple objects (also known as More diverse human-machine interface control commands. Actions. Intentional combination), the present invention can be used in two-dimensional continuous image transmission _ object position and contour action words: two:: bits, including heart = hand: :, test the input command of the system. The gesture of the two-handed action is used to generate a human-machine interface control. The figure shows that the gesture of the hand feature point of the embodiment of the present invention is used to detect the hand motion, which is merely an embodiment, which is not a soil limitation. this invention. The image plane (for example, the hand image) passes through the back information, and then the end point (for example, the right hand) rim ρ end point (ie, feature points, including pa, pb, pc, pd, Pe, square gas λ Next, (in the cis (reverse) hour hand direction or any ρ type), obtain the feature point set of the rim of the object, "a, Pb, Pc, 匕, 5 4. Exclude the first and last feature points ( After 1^, 1^). The length of the corresponding line segment of the folder r舆: in this embodiment, (pb, pc, pd) and (pg, pa, pb) are clipped on the corner piece, so it is not considered 'And (a, Pa, pb, pc) a 4 on the image surface 'is an effective angle. Consider the focus of the 201020935 sign Pa, Pb, Pc, r · r», h ^ (pw) and two The relative position of the feature points determines the angle between the angles A and B. The position of the center of gravity of the complex feature points means the average of the vertical and horizontal components of the two feature points in the two-dimensional space on the early-object. 2 is a flow chart showing the steps of the method for recognizing and constructing a two-dimensional object based on video according to an embodiment of the present invention.

首先’自一視訊梅取裝i (例如，網路攝影機）200 取得一影像晝面（步驟S2()i)，並且對該影像畫面執行— f處理操作（步驟伽），即調整該影像晝面中的白平衡糸數，使得該影像畫面之像素點（pixel)的色彩不會環境光源的變化而產生色彩驟變的情形。接著，根據像晝面中之像素色彩變異數差以取得該影像畫面中之至少一别景物件（例如，手），並且對該影像晝面進行影像件分離（步驟S203)。接著，判斷該前景物件僅包含單一物件（例如，左手、右手、或其它人體部位）或包含至少二個物件（例如，左手與右手）（步驟S204)。若包含至少二個物件，則將該等物件分別分離為單一區塊物件以分別處理（步驟 S2=)。在得到每一區塊物件後，再依照其中一區塊物件與背景之f彡像邊界的交接位置，根據預先較義判斷該區塊物件是否為正向影像（步驟S2〇6)。以手部物件來說，右手為正向影像。又，以身體物件來說，身體背面為反向影像。若該區塊物件非為正向影像，則反轉該區塊物件為正向影像（步驟S207)。 201020935 在步驟S204中，若該前景物件僅包含單一物件，則亦應根據預先的定義判斷該前景物件是否為正向影像 (步驟 S ) 以及當非為正向影像時反轉該前景物件為正向影像（步驟S207)。接著，（以順（逆）時針方向或任意方式）取得該前景物件（區塊物件）之輪廓的特徵點集合以及連接特徵點間的線段（步驟S208)。例如，如第1圖所示，取得特徵點Pa、Pb、Pc、Pd、&、Pf、Pg以及連接特徵點間的線段（例如，線段A、B )。需注意到，若目前之影參像畫面為第一影像晝面，則本流程回到步驟S201取得下-影像晝面。右該目前影像晝面非為第一影像晝面，則比較該目前影像畫面、與則一影像晝面之特徵點位置變化（步驟 S209)，並且比較該目前影像晝面與前一影像晝面之特徵點=角變化（步驟幻10)。舉例來說，如第1圖所示，冲/連續三個特徵點之夾角資訊與相對應線段的長度，以取得特徵點之有效爽角資訊，並且參考特徵點的重心 • (Pw)與特徵點的相對位置。又，比較目前影像畫面之特徵與前一影像畫面之特徵點h的位置變化，以及比較目4像畫面之夾角《與前—影像畫面之夾角α的角度變化。一第1圖所示排除首末兩個特徵點後，將特徵點依順』到’右連續一特徵點所構成的夾角開口朝向物件外側’特徵點重心不位於此開σ方向，線段Β大或等於線段 Α，以及角度大於臨界值八，^4所構成的三角形内 201020935 切圓界值B、c間，則判斷為虎口張η。 t W、己錄特徵點的位置變化與：結果（步驟S2U)，並且根變化的比較置(例如’滑鼠、鍵盤或其===應1入裝 (步驟㈣）。例如，若辨識 =輪入操作 -滑鼠按壓操作，而若辨識為按壓鍵盤:::作，則觸發鍵盤按壓姆。執妓對應之則觸發-取得下-影像晝面。〈輸人操作後即回到步驟S201First, an image capture surface (step S2()i) is taken from a video capture device i (for example, a webcam) 200, and a -f processing operation (step gamma) is performed on the image frame, that is, the image is adjusted. The number of white balances in the face makes the color of the pixel of the image image not change due to changes in the ambient light source. Then, at least one other object (for example, a hand) in the image frame is obtained based on the difference in pixel color variation in the image plane, and the image is separated by the image (step S203). Next, it is determined that the foreground object contains only a single object (e.g., left hand, right hand, or other body part) or contains at least two items (e.g., left hand and right hand) (step S204). If at least two objects are included, the objects are separated into a single block object for processing separately (step S2 =). After each block object is obtained, according to the intersection position of the block object and the background image boundary of the background, whether the block object is a forward image is determined according to the prior comparison (step S2〇6). In the case of a hand object, the right hand is a positive image. Also, in the case of body objects, the back of the body is a reverse image. If the block object is not a forward image, the block object is inverted as a forward image (step S207). 201020935 In step S204, if the foreground object only contains a single object, it should also be judged according to the previous definition whether the foreground object is a forward image (step S) and when the non-forward image is inverted, the foreground object is positive. The image is directed (step S207). Next, the feature point set of the outline of the foreground object (block object) and the line segment between the feature points are obtained (in the cis (reverse) hour hand direction or in any manner) (step S208). For example, as shown in Fig. 1, feature points Pa, Pb, Pc, Pd, & Pf, Pg and line segments connecting the feature points (e.g., line segments A, B) are acquired. It should be noted that if the current image reference image is the first image, the process returns to step S201 to obtain the lower image. If the current image plane is not the first image plane, the feature point position of the current image frame and the image plane is compared (step S209), and the current image plane and the previous image plane are compared. Feature point = angular change (step 10). For example, as shown in Fig. 1, the angle information of the three feature points of the punch/continue and the length of the corresponding line segment are used to obtain the effective refresh angle information of the feature points, and the center of gravity (Pw) and characteristics of the reference feature points are referred to. The relative position of the point. Further, the difference between the feature of the current image frame and the position of the feature point h of the previous image frame, and the angle of the angle "the angle α with the front image image" are compared. After the first and last feature points are excluded as shown in Fig. 1, the feature points are compliant to the right angle of the feature point formed by the right one toward the outer side of the object. The center of gravity of the feature point is not located in the σ direction, and the line segment is large. Or equal to the line segment Α, and the angle is greater than the critical value of eight, ^4 formed within the triangle 201020935 cut circle boundary value B, c, it is judged as the tiger mouth η. t W, the position change of the recorded feature point and: result (step S2U), and the comparison of the root change (for example, 'mouse, keyboard or its === should be loaded (step (4)). For example, if identification = The round-in operation-squeeze pressing operation, and if it is recognized as pressing the keyboard:::, the keyboard is triggered to press the mouse. If the corresponding is triggered, the trigger-acquisition is performed to obtain the lower-image surface. After the input operation, the process returns to step S201.

需庄意到，在步驟82()9與S21 畫面後，根據擷取出之影偾从Λ 田取侍夕個影像 $之^像物件的位置來決定影像物件的 :說：欲件次序組合以產生對應的輸入控制。舉例取萸詈塗點選滑鼠按鍵的輸入控制時，經由視訊擷 =1Γ手指由舉起到按下按鍵的影像畫面，經: 辨識後產^點選滑鼠按鍵的輸入控制。 -需/主忍到’該輸入控制可以為鍵盤或滑鼠上的單鍵行為或多重按鍵組合行為。需/主意到，該輸入控制可以為滑鼠上之多重按鍵與移動所產生的組合行為。、需主意到’該輸入控制可以為鍵盤的多重按鍵加上滑氣的夕重按鍵與滑鼠移動所產生雜合行為。 π需注意到，該物件次序可依據相對座標或絕對座標所取得的座標，並利用可排序的函數運算而得。需左意到，特徵點位置變化可為兩個特徵點間或多個特徵點間所連成的線段長度變化。 13 201020935 需注意到’辨識所得的動作與對應之滑鼠與鍵盤操作係為系統預先定義，而本發明主要係在辨識人體的全身或局部以及完整的四肢或局部肢體之運動，以在不需要滑鼠、鍵盤等傳統輸入裝置的情況下執行輸入操作，故不著重在如何定義動作與對應之滑鼠與鍵盤操作。第3圖係顯示本發明實施例之基於視訊之二維物件的辨識與構成系統的架構示意圖。本發明實施例之基於視訊之二維物件的辨識與構成系統包括一視訊擷取裝置300與一電腦裝置40(N電腦装置 400又包括一視訊揭取模組410、一前處理模組420、一視訊物件分割模組430、一判斷模組440、一控制模組45〇以及一比較模組460。電腦裝置4〇〇係為個人電腦或可式電子裝置。影像擷取模組410自視訊擷取裝置3〇〇取得一影像畫面，並且前處理模組420對該影像晝面執行一前處/ 作’即調整該影像晝面中的白平衡係數，使得該影像畫面之像素點的色彩不會因為環境光源的變化而 ===物件分割模組_根據該影像畫= 之像素色㈣異數差以取得鄉像晝面+之件（例如’手）’並且對該影像晝月]景物判斷模組440判斷該前千刀離如，左手、右手、或其夕人L 包含單一物件（例 " 體部位）或包含至少-個铷杜 (例如，左手與右手）。若包含 _ 至夕一個物件理模組430將該等物件分離兔„夕一個物件，則視訊處為早一區塊物件以分別處理。 201020935 ^得到每-區塊物件後，判_組_ 物件與背景之影像邊界的交接位置，根區塊像，=處理模組43〇反轉該區塊物件二^ 根據預先的定義判齡兮H W斷模組440亦應非Λ 别景物件是否為正向影像，以及告非為正向影像時視訊處及w ❿ 謇影像。接著，控制模Γ4η反轉該區塊物件為正向得該前景物件(區塊物件)之輪廓的特二為第線段。需注意到，若目前之影像畫：第象畫面，則影像擷取模組自取得下1像畫面。谓目現取裝置300 比較3 第一影像畫面，則比較模_ 並且比較Ϊ":像書剛一影像晝面之特徵點位置變化，變化。舉:;來二如^前一影像畫面之特徵點的夹角之第圖騎，計算每連續三個特徽點資訊與相對應線段的長度，以取得特徵點之有‘ 貝訊，並且參考特徵點的重（）戥置。又，比較目前影像畫面之特徵(點對位，a的位置變化，以及:二^ 別-影像畫面之夹角α的角度變化。面之失“與 :第1圖所不’排除首末兩個特徵點後舆序排列，若連續三特徵寺徵點依側，特徵點番、、不你於角開口朝向物件外特徵點“不位於此開口方向，線段Β大或等於線段 15 201020935 度大於臨界值A’Pa、Pb'所構成的三角形内刀圓+么介於臨界值B、c間，則判斷為虎口張開。比較完後，控制模組450記錄特徵 :變=比較結果，並且根據比較結果執行對應= 若辨識為按壓滑鼠的動作，則觸發一滑鼠按壓刼作’而若辨識為按壓鍵盤的動作，則觸發一鍵盤按壓操相較於中華民國第撕觀39號專利，本發明方法可運用於雙手或多肢體動作偵測，並可即時達成滑鼠雙擊的功能。此外’本發明的旋轉控制量可參考任三個連續特徵點所形成的特徵區域，故能運用手勢產生更豐富的控制選項。相較於中華民國第092133383號專利，本發明方法不需要根據時間間隔，即能依特徵點所構成的手勢變化直接驅動雙擊，達成更即時的手勢操作。相較於美國第5454043號專利，本發明方法無需事先訓練儲存手勢樣式，故不受限於規定的手勢樣式。相較於美國第7289645號專利，本發明方法在手勢語意認定上無需參考時間資訊。本發明更提供一種記錄媒體（例如光碟片、磁碟片與抽取式硬碟等等），其係記錄一電腦可讀取之電腦程式，以便執行上述之基於視訊之二維物件的辨識與構成方法。在此，儲存於記錄媒體上之電腦程式，基本上是由多數個程式碼片段所組成的（例如建立組織圖程式碼片段、簽核 16 201020935 表單程式碼片段、設定程式碼片段、以及部署程式碼片段），並且這些程式碼片段的功能係對應到上述方法的步驟與上述系統的功能方塊圖。雖然本發明已以較佳實施例揭露如上，然其並非用以限定本發明，任何熟習此技藝者，在不脫離本發明之精神和範圍内，當可作各種之更動與潤飾，因此本發明之保護範圍當視後附之申請專利範圍所界定者為準。 201020935 【圖式簡單說明】第1圖係顯示意圖。示本發明實施例之判斷手部特徵點語意的第2圖係顯示本發明實施例之基於視訊之二維物件的辨識與構成方法的步驟流程圖。第3圖係顯示本發明實施例之基於視訊之二維物件的辨識與構成系統的架構示意圖。It is necessary to decide, after steps 82()9 and S21, to determine the image object from the position of the image of the object from the image of the object. Generate corresponding input controls. For example, when the input control of the mouse button is selected, the image of the button is pressed by the video 撷 =1 Γ finger, and the input control of the mouse button is selected after the identification. - Need/Master tolerate' This input control can be a single-button behavior or a multi-key combination behavior on a keyboard or mouse. It is necessary/intentional that the input control can be a combined behavior of multiple buttons and movements on the mouse. It is necessary to know that the input control can be used for the multi-keys of the keyboard plus the slipping of the slick key and the movement of the mouse. π It should be noted that the order of the objects can be based on coordinates obtained from relative coordinates or absolute coordinates, and is computed using a sortable function. It should be noted that the change of the feature point position may be a change in the length of the line segment formed between two feature points or between a plurality of feature points. 13 201020935 It should be noted that the 'identification of the action and the corresponding mouse and keyboard operation are pre-defined for the system, and the present invention is mainly for identifying the whole body or part of the human body as well as the movement of the whole limb or part of the limb to avoid The input operation is performed in the case of a conventional input device such as a mouse or a keyboard, so that the operation and the corresponding mouse and keyboard operations are not emphasized. Figure 3 is a block diagram showing the architecture of the identification and composition system of the two-dimensional object based on video in the embodiment of the present invention. The system for identifying and configuring a two-dimensional object based on the video of the present invention includes a video capture device 300 and a computer device 40. The N computer device 400 further includes a video removal module 410 and a pre-processing module 420. A video object segmentation module 430, a determination module 440, a control module 45A, and a comparison module 460. The computer device 4 is a personal computer or a portable electronic device. The image capture module 410 is self-viewing. The capture device 3 obtains an image frame, and the pre-processing module 420 performs a front end on the image surface to adjust the white balance coefficient in the image plane, so that the color of the pixel of the image frame It is not because of the change of the ambient light source === object segmentation module _ according to the pixel color (four) of the image difference = to get the image of the township face + (such as 'hand) 'and the image for the month] The scene determination module 440 determines that the front hand is away from the left hand, the right hand, or the evening person L includes a single object (for example, a body part) or contains at least one 铷 Du (eg, left hand and right hand). An object management module 430 separates the objects from the rabbits, and the video is processed as an early block object. 201020935 ^After obtaining each block object, the position of the image boundary of the object and the background is determined. Block image, = processing module 43 〇 invert the block object 2 ^ According to the pre-defined definition, the HW break module 440 should also be non-Λ whether the object is a positive image, and the non-forward image At the time of the video and the image of the image, then the control module 4n inverts the block object to obtain the contour of the foreground object (block object) as the first line segment. It should be noted that if the current image is drawn : The first image, the image capture module automatically obtains the next image. The comparison device 300 compares the first image, compares the mode _ and compares the 特征": the feature point position of the image just after the image Change, change. Lift:; to the second picture such as the angle of the feature point of the previous image frame, calculate the length of each consecutive three special point information and the corresponding line segment to obtain the feature point Message, and refer to the weight point of the feature point () In addition, the characteristics of the current image frame (point alignment, position change of a, and: angle change of the angle α of the image frame) are compared. The face loss is not the same as the first picture. After the feature points are arranged in order, if the three consecutive feature temple points are on the side, the feature points are not, and the feature points outside the object are not in the direction of the opening. The line segment is larger or equal to the line segment 15 201020935 degrees is greater than When the triangular inner cutter circle formed by the critical values A'Pa and Pb' is between the critical values B and c, it is determined that the tiger's mouth is open. After the comparison, the control module 450 records the feature: change = comparison result, and Corresponding to the comparison result = If it is recognized as the action of pressing the mouse, triggering a mouse press to make 'and if it is recognized as the action of pressing the keyboard, triggering a keyboard pressing operation is compared with the No. 39 patent of the Republic of China The method of the invention can be applied to the detection of two-hand or multi-limb motion, and the function of double-clicking the mouse can be realized in real time. Further, the amount of rotation control of the present invention can refer to the feature area formed by any three consecutive feature points, so that gestures can be used to generate richer control options. Compared with the Patent No. 092133383 of the Republic of China, the method of the present invention does not need to directly drive a double click according to the time interval, that is, according to the gesture change formed by the feature points, to achieve a more immediate gesture operation. Compared with U.S. Patent No. 5,540,043, the method of the present invention does not require prior training to store gesture patterns and is therefore not limited to the prescribed gesture style. Compared with U.S. Patent No. 7,829,645, the method of the present invention does not require reference time information in the determination of gesture semantics. The invention further provides a recording medium (such as a CD, a floppy disk and a removable hard disk, etc.), which records a computer readable computer program for performing the above-mentioned video-based two-dimensional object identification and composition. method. Here, the computer program stored on the recording medium is basically composed of a plurality of code segments (for example, creating an organization chart code segment, signing a 16 201020935 form code segment, setting a code segment, and deploying a program. The code segment), and the function of these code segments corresponds to the steps of the above method and the functional block diagram of the above system. While the present invention has been described above by way of a preferred embodiment, it is not intended to limit the invention, and the present invention may be modified and modified without departing from the spirit and scope of the invention. The scope of protection is subject to the definition of the scope of the patent application. 201020935 [Simple description of the diagram] Figure 1 is a schematic diagram. Fig. 2 is a flow chart showing the steps of the method for recognizing and constituting a two-dimensional object based on video according to an embodiment of the present invention. Figure 3 is a block diagram showing the architecture of the identification and composition system of the two-dimensional object based on video in the embodiment of the present invention.

【主要元件符號說明】 200、300〜視訊擷取裝置 400〜電腦裝置 410〜視訊擷取模組 420〜前處理模組 430〜視訊物件分割模組 440〜判斷模組 % 〜控制模組 460〜比較模組 S201"S212〜流程步驟[Main component symbol description] 200, 300 to video capture device 400 to computer device 410 to video capture module 420 to pre-processing module 430 to video object segmentation module 440 to determination module % to control module 460~ Compare module S201 "S212~ process steps

Claims

201020935 X. Patent application scope: The following 1 step is based on the two-dimensional object recognition and composition method of video, and an image is obtained from a video capture device; a pre-processing operation is performed on the image frame. According to the image of the township image (4)# at least one foreground object in the variation difference picture; in the collection, the special microscopic change angle between the image image and the feature points of the previous image frame=the current image surface And comparing the position change of the feature points with the change of the angle of the feature points with the feature points of the front-image pupil; and performing an input operation corresponding to the input device according to the comparison result. 2. The video-based two-dimensional book: the identification and composition method described in the item 1 of the patent, wherein the pre-processing operation includes adjusting the white balance coefficient in the image. 3. The method for identifying and constructing a two-dimensional object based on video according to claim 1 of the patent specification, further comprising the following steps: when the foreground object in the image frame is obtained, the image is displayed. Separating the image object; determining whether the foreground object contains a plurality of objects; 19 201020935

The child is positive. w(4), obtaining the sign point of the foreground object and connecting the plurality of line segments between the feature points; and when the scene object is not the forward image, inverting the foreground object is

After obtaining each block object, according to the intersection position of one of the block objects and the image boundary of the background of the image plane, and determining whether the block object is a forward image according to the pre-determination; and if the block If the object is not a forward image, the block object is inverted to be the forward image. 5. The method for identifying and constructing a two-dimensional object based on video according to claim 1, wherein the feature point set of the foreground object or the contour of the block object is obtained in a forward/counterclockwise direction or in an arbitrary manner. And connecting line segments between feature points. 6. The method for identifying and constructing a two-dimensional object based on video according to claim 1, wherein calculating an angle information of each of the three feature points and a length of the corresponding line segment In order to obtain the effective angle information corresponding to the feature points. 7. The video-based two-dimensional object according to claim 1, wherein the input device is a mouse, a sewing disk or a method for recognizing and constituting the same, wherein it is a hardware input device. 8. The method for identifying and constituting a two-dimensional object based on video according to claim 1, wherein after obtaining a plurality of image planes, the position of the plurality of image objects taken out is determined by the position of the plurality of image objects. The underscores of the objects are combined in order of objects to produce the corresponding input operation. 9. A video-based two-dimensional object as described in claim 8 of the patent scope

The identification and composition method', wherein the order of the objects can be calculated according to the coordinates obtained by the relative coordinates or the absolute coordinates, and is calculated by a sortable function. 10. The method for identifying and constructing a two-dimensional object based on video according to claim 1, wherein the positional change of the feature points is a change in the length of the line segment formed between the two feature points. 11. The method for recognizing and constituting a two-dimensional object based on video according to claim 1, wherein the position of the feature points changes to a plurality of • the length of the line segment connected between the feature points. 12. The method for recognizing and constructing a two-dimensional object based on video according to claim 1, wherein the input operation is a single button behavior or a multiple button combination behavior on a keyboard or a mouse. 13. The method for identifying and constructing a two-dimensional object based on video according to claim 1, wherein the input operation is a combined behavior of multiple buttons and movements on a mouse. 14. The method for recognizing and constructing a two-dimensional object based on video according to claim 1, wherein the input operation is a keyboard multiple press 21 201020935 key plus a mouse multiple button and movement generated The combined behavior. 15. The method of identifying and constructing a two-dimensional object based on video according to claim 1, wherein the image plane comprises a whole body or a part of a human body and a complete W limb or a local limb. 16. A system for recognizing and constituting a two-dimensional object based on video, comprising: a video capture device for acquiring an image plane; and a computer device, further comprising: ❿ a determination module; a video The capture module obtains the image frame from the video capture device; a pre-processing module performs a pre-processing operation on the image frame; and a video object segmentation module, according to pixel color variation in the image frame The difference is obtained to obtain at least one foreground object in the image frame; a control module that acquires a feature point set of the contour of the foreground object and a plurality of line segments connecting the plurality of feature points in the feature point set; Comparing the module, comparing the positional changes of the feature points of the image plane and the previous image plane, and comparing the angles between the current image plane and the feature points of the previous image plane; wherein the control module A comparison result of the positional change of the feature points and the change in the included angle is recorded, and an input operation corresponding to one of the input devices is performed in accordance with the comparison result. 17. The system for identifying and constituting a video-based two-dimensional object 22 201020935 , according to claim 16 wherein the pre-processing module adjusts a white balance coefficient in the image frame. 18. The system for identifying and constituting a two-dimensional object based on video according to claim 16 wherein: the video processing module images the image frame when the foreground object in the image plane is obtained. Block separation; =& the determination module determines whether the foreground object contains a plurality of objects, and when the foreground object does not include a plurality of objects, determines whether the foreground object is a forward image according to a prior definition; When the object is the forward image, the control module obtains a plurality of feature points of the contour of the front member and a plurality of links between the feature points and when the foreground object is not the forward image, The foreground object is transferred to the forward image. 19. The system for identifying and constituting a two-dimensional object based on video according to claim 18, wherein: ❿################################################################################ The single block object is processed separately; after obtaining each block object, the judging module judges the block according to the pre-defined definition of the intersection position of one of the blocks = the image boundary with the background of the image plane Whether the object is a forward image; and C=: is a forward image', and the control module reverses the ϋ of the area. The video-based two-dimensional object recognition and composition system described in claim 16 , wherein the foreground object or the set of feature points of the rim of the block object and the line segment connecting the feature points are obtained in a forward/counterclockwise direction or in any manner 23 201020935. 21. The system for identifying and constituting a two-dimensional object based on video according to claim 16, wherein the control module calculates angle information and corresponding line segments of each of the three feature points of the feature points. The length of the clip to obtain valid information about the corresponding feature points. 22. A system for identifying and constituting a two-dimensional video-based object, as described in the scope of the patent application, wherein the input device is a mouse, a keyboard or other hardware input device. 23. The system for recognizing and constituting a two-dimensional object based on video according to the scope of claim π, wherein, after obtaining a plurality of image frames, determining the image objects based on the positions of the plurality of image objects extracted The order ' is combined in order of objects to produce the corresponding input operation. 24. The system for identifying and constituting a two-dimensional object based on video according to claim 23, wherein the order of the objects can be based on coordinates obtained by relative coordinates or absolute coordinates, and using a sortable The operation and operation of the two-dimensional object-based identification and composition system according to claim 16, wherein the position of the feature points changes the length of the line segment formed between the solid feature points. The piece 26 is a video-based two-dimensional object identification and composition system, wherein the positional change of the feature points is a multi-feature point-to-four (four) segment length change. 27. The identification and composition system of a video-based two-dimensional object 24 201020935, as described in claim 16, wherein the input operation is a single button behavior or a multi-key combination behavior on a keyboard or mouse. 28. The system for recognizing and constituting a two-dimensional video-based object according to claim 16, wherein the input operation is a combined behavior of a plurality of keys and movements on a mouse. 29. The system for recognizing and constructing a two-dimensional object based on video according to claim 16, wherein the input operation is a combination of multiple keys of a keyboard plus multiple buttons of a mouse and movement. 10: The system for identifying and constituting a two-dimensional object based on video according to claim 16, wherein the image plane comprises a whole body or a part of a human body and a complete limb or a local limb. 25