TW561423B

TW561423B - Video-based image control system

Info

Publication number: TW561423B
Application number: TW90118059A
Authority: TW
Inventors: Evan Hildreth; Francis Macdougall
Original assignee: Jestertek Inc
Priority date: 2000-07-24
Filing date: 2001-07-24
Publication date: 2003-11-11

Abstract

A method of using stereo vision to interface with a computer is provided. The method includes capturing a stereo image, and processing the stereo image to determine position information of an object in the stereo image. The object is controlled by a user. The method also includes communicating the position information to the computer to allow the user to interact with a computer application.

Description

561423 A7 B7五、發明説明（1 ) ^ 相關應用對照本申請專利案係出於西元2000年7月24日所提出之美國臨時申請案第60/220,223號，其標題爲VIDEO-BASED IMAGE CONTROL SYSTEM，並以引用的方式併入本文中。技術範圍本發明係關於一種影像處理系統，更特別的是與一種用 : 於處理立體影像資料之以視頻爲主的影像控制系統有關。-- 背景目前有許多可用於與一種電腦系統互動或控制電腦系統的操作系統。多數這類操作系統使用運用公認的圖形使用者介面（GUI)功能與控制技術的標準化介面。由於此功能與控制技術常見於GUI間，使不熟悉電腦平台與/或應用的使用者仍可輕易控制不同的電腦平台與使用者應用。一種常用的控制技術爲使用滑鼠或軌跡球式的點選裝置， - 移動營幕物件上的游標。一種點選物件（一次或兩次）的姿態執行一 GUI功能。然而，選擇GUI功能將阻礙不熟悉電腦滑鼠操作人士聯繫電腦系統。另外也有一種無法使用電腦滑鼠或軌跡球的情形，如在城市街道上百貨公司的展示櫥窗前或使用者本身爲傷殘者。概要在一個通用觀點中揭露一種使用立體視覺與電腦聯繫的 ^ 方法。此方法包括抓取一立體影像及處理此立體影像以決定此立體影像中物體的位置資訊。此物體可由一使用者控 v 裝訂561423 A7 B7 V. Description of the invention (1) ^ Related application comparison The patent application of this application is based on US Provisional Application No. 60 / 220,223 filed on July 24, 2000, and its title is VIDEO-BASED IMAGE CONTROL SYSTEM , And incorporated herein by reference. TECHNICAL FIELD The present invention relates to an image processing system, and more particularly relates to a video-based image control system for processing stereoscopic image data. -Background There are many operating systems that can be used to interact with or control a computer system. Most of these operating systems use standardized interfaces that use well-known graphical user interface (GUI) functions and control technologies. Since this function and control technology are common between GUIs, users who are not familiar with computer platforms and / or applications can still easily control different computer platforms and user applications. A common control technique is to use a mouse or trackball-type pointing device,-to move the cursor on the object of the camp screen. A gesture of clicking an object (once or twice) performs a GUI function. However, choosing a GUI function will prevent unfamiliar computer mouse operators from contacting the computer system. There are also situations where a computer mouse or trackball cannot be used, such as in front of a department store display window on a city street or the user is disabled. Summary In a general perspective, a method of using stereo vision to connect with a computer is revealed. The method includes capturing a stereo image and processing the stereo image to determine position information of an object in the stereo image. This object can be controlled by a user

k -4- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) 561423 A7 B7 五、發明説明（2 ) 制。此方法也包括徒用此位置資訊以允許使用者與一種電腦應用互動。抓取立體影像之步驟包括利用一立體照相機抓取立體影像。此方法也包括利用分析此物體在位置資訊的改變識別一種相關於此物體的姿態，並根據此已識別之姿態控制電腦應用。此方法也包括決定一種電腦應用的應用狀態並使用此應用狀態識別此姿態。此物體可爲使用者。名另一個 -範例中.，此物·體爲使用者的一部份。本方法尚包括提供反--饋給關於此電腦應用的使用者。在上述實施例中，處理此立體影像以決定此物體之位置資訊的步驟包括映射從相關於此物體之位置座標到相關於此電腦應用之螢幕座標的位置資訊。處理此立體影像的步驟也包括處理此立體影像以識別特徵資訊以及由此特徵資訊產生一景象描述。 , 處理立體影像之步驟也包括分析此景象描述以識別一物體的位置雙化與映射此物體的位置變化。處理立體影像以產生景象描述之步驟也包括處理此立體影像以識別在此立體影像的匹配特徵對，並計算每個匹配特徵對的不一致與位置以產生一景象描述。此方法也包括分析一景象分析方法中的景象描述以決定此物體的位置資訊。抓取此立體影像包括抓取一參考照相機的參考影像以及二一比較照相機的比較影像，且處理此立體影像也包括處理此參考影像與比較影像以產生特徵對。 __ -5- 本紙張尺度適用中國國家標準(CNS) Α4規格(210 X 297公釐) 561423k -4- The size of this paper applies to Chinese National Standard (CNS) A4 (210 X 297 mm) 561423 A7 B7 V. Description of Invention (2). This method also includes using this location information to allow the user to interact with a computer application. The step of capturing a stereo image includes capturing a stereo image using a stereo camera. The method also includes analyzing a change in position information of the object to identify a gesture related to the object, and controlling a computer application based on the recognized gesture. The method also includes determining an application state of a computer application and using the application state to recognize the gesture. This object can be a user. Name another-In the example, this object is part of the user. The method also includes providing feedback to a user about the computer application. In the above embodiment, the step of processing the stereo image to determine the position information of the object includes mapping the position information related to the position coordinates of the object to the screen coordinates of the computer application. The steps of processing the stereo image also include processing the stereo image to identify feature information and generate a scene description from the feature information. The step of processing the stereo image also includes analyzing the scene description to identify the location of an object and the mapping of the location change of the object. The step of processing the stereo image to generate a scene description also includes processing the stereo image to identify matching feature pairs in the stereo image, and calculating the inconsistency and position of each matching feature pair to generate a scene description. This method also includes analyzing the scene description in a scene analysis method to determine the position information of the object. Capturing the stereo image includes capturing a reference image of a reference camera and a comparison image of a two comparison camera, and processing the stereo image also includes processing the reference image and the comparison image to generate a feature pair. __ -5- This paper size applies to China National Standard (CNS) Α4 size (210 X 297 mm) 561423

處理此立體影像以識別此立體影像中的匹配對也包括識别此參考影像中的特徵，爲每個參考影像中的特徵產生一 :比較影像中的候選匹配特徵’以及從此候選匹配特徵組中馬每個參考影像中的特徵選擇一最佳匹配特徵以產生一 =徵對。處理此立體影像也包㈣波此參考影像與比較影像0 配特徵計算一匹配分的候選匹配特徵以產- 處理此特徵對也包括爲各個候選匹數與棑名”並選擇具有最高匹配分數生此特徵對。 =參考影像中的每個特徵產生—組候選匹配特徵包括由比板影像之縣決Μ圍中選擇候選匹配特徵。根據候選匹配特徵的匹配分數消除特徵對。若最高排名 ::配特徵之匹配分數低於預先決定臨界値，也可消除最高排名候選匹時徵的匹配分數在較低排名候k匹配特徵之匹配分數的特徵對。， 1預先决疋臨界値内’則消除此代!算PC配刀數的步驟包括識別所有鄭近的特徵對，將特 =配分數與不同的鄰近候選匹配特徵匹配分數調整 ::’以及選擇具有最高調整匹配分數的候選匹配特欲以產生特徵對。像可藉由應用比較影像做爲參考影像與參考影 ” ^ 像以產生第二組特徵，以及消^該等沒有與弟7: 且：徵對映射之特徵對的原始特徵對組内的特徵對。立万’包括為此景象描述中的各個特徵對，利用轉換與Processing the stereo image to identify matching pairs in the stereo image also includes identifying the features in the reference image, and generating one for each feature in the reference image: comparing candidate matching features in the image, and from the candidate matching feature group. Each feature in the reference image selects a best-matching feature to generate a = sign pair. Processing this stereo image also includes the reference image and the comparison image. 0 Matching features to calculate a candidate matching feature to produce a matching score-processing this feature pair also includes for each candidate number and name "and select the student with the highest matching score. This feature pair. = Each feature in the reference image is generated—a group of candidate matching features includes selecting candidate matching features from the county decision area of the board image. Eliminating feature pairs based on the matching score of the candidate matching features. If the highest ranking :: match The matching score of the feature is lower than the pre-determined critical threshold, and it can also eliminate the feature pairs whose matching scores of the highest ranking candidate are in the matching score of the lower-ranking k-k matching features., 1 The step of calculating the number of PC matching knives includes identifying all Zheng Jin's feature pairs, adjusting the special score with different neighboring candidate matching feature matching score adjustments :: ', and selecting the candidate matching special desire with the highest adjusted matching score to generate Feature pairs. Images can be compared by using the comparison image as a reference image and a reference image. "^ Image to generate a second set of features, and 7 and brother have: and: intrinsic features of the original feature mapping of feature pairs in the group pair. Liwan ’includes the feature pairs in the description of this scene.

561423 A7561423 A7

每個立體影像之眞實世以計算眞實世界座標。選擇徵對的不-致與位置影像與比較影像劃㈣區塊中4 體影像的參考像素的明視度樣式所描述。劃：包含於區塊中之有固定大小的像素區塊中將，影像劃分到具塊。此像素區塊爲8x8的像素區分析此景象描述以決定此物景象描述以·排除視域中所想區建立所想區域的邊界。月足之位置資訊也包括修剪此域外的特徵資訊。修剪包括_ 分析此景象描述以決定此物體的位置資訊包括，在—預先決定的範圍内利用與鄰近特徵資訊的比較將有興趣區域 (特徵資訊分群爲有特徵集合的群，並計算各群的位置。分析此景象描述也包括消除所有具有低於預先決㈣徵臨界値的群。，分析此意象描述也包括，選擇匹配一預先決定標準之群的位置，記綠匹配此預先決定標準之群的位置爲物體位置座標，及輸出此物體位置座標。此方法也包括利用檢查表示偵測區域内的特徵決定此群使用者的存在。計算各群位置可排除一物體偵測區域外的群的特徵。此方法包括根據此物體位置座標定義一動態物體偵測區域。此外，此動態物體偵測區域可定義成關於一使用者身體。此方法包括根據此物體位置座標定義一身體位置偵測區域。定義此身體位置偵測區域也包括偵測此使用者的頭部本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) A7 B7 561423 五、發明説明（5 位置。此方法也包括平滑此物體位置座標之運動以消除連續影像訊框間的跳動。此方法包括計算來自此物體位置座標的手導向資訊。輸出此物體位置座標包括輸出此手導向資訊。計算手導向資訊也包括平滑此手導向資訊内的改變。定我動怨物體偵測區域也包括識別特徵集合之軀幹分割平面的位置，以及決定垂直於軀幹分割平面座標軸之關於軀幹分割平面的手偵測區域位置。定義動態物體偵測區域包括從此特徵集合識別一身體中央位置與一身體邊界位置，使用此特徵對群與軀幹分割平面的交點從此特徵集合識別一指示此使用者手臂部分的位置，使用相關於此身體位置的手臂位置來識別此手臂爲左臂或右臂。此方法也包括由身體中央位置，身體邊界位置，軀幹分割平面，冬左臂或右臂識別建立一肩膀位置。定義此動態偵測區域包括決定與肩膀位置相關之手偵測區域的位置資料。此技術包括平滑手偵測區域的位置資料。此外，此技術包括決定垂直於此軀幹分割平面的座標軸之與軀幹分割平面相關的動態物體偵測區域的位置，在與肩膀位置相關的水平座標軸決定動態物體偵測區域的位置，以及在與使用此身體邊界位置之使用者全邵鬲度的垂直座標軸決定此動態物體偵測區域的位置。定義此動態物體偵測區域包括之步驟如下，除非最高特 -8 - 本紙張尺度適用中國國家標準(CNS) A4規格(210X 297公釐)The real world of each stereo image is used to calculate the real world coordinates. Select the non-correspondence and position of the pair of images. Describes the lightness pattern of the reference pixels of the 4-volume image in the block of the image and comparison image. Stroke: A fixed-size pixel block included in a block divides the image into blocks. This pixel block is an 8x8 pixel area. Analyze the scene description to determine the object. The scene description excludes the desired area in the view and establishes the boundary of the desired area. The position information of the moon foot also includes trimming feature information outside this domain. Trimming includes_ Analyzing this scene description to determine the position information of this object includes, using a comparison with neighboring feature information within a predetermined range to compare areas of interest (feature information is grouped into groups with feature sets, and the Position. Analyzing this scene description also includes eliminating all groups with a threshold below the pre-determined threshold. Analyzing this image description also includes selecting the location of a group that matches a predetermined criterion, and remembering that the group that matches this predetermined criterion is green. Is the position of the object, and outputs the position of the object. This method also includes checking the characteristics of the detection area to determine the existence of users in this group. Calculating the position of each group can exclude the group outside the object detection area. Features. This method includes defining a dynamic object detection area based on the object position coordinates. In addition, the dynamic object detection area can be defined as a user's body. This method includes defining a body position detection area based on the object position coordinates. .Define this body position detection area also includes detecting this user's head and paper The scale applies to the Chinese National Standard (CNS) A4 specification (210 X 297 mm) A7 B7 561423 5. Description of the invention (5 position. This method also includes smoothing the movement of the position coordinates of the object to eliminate the jump between consecutive image frames. This The method includes calculating hand guidance information from the position coordinates of the object. Outputting the position coordinates of the object includes outputting the hand guidance information. Calculating the hand guidance information also includes smoothing the changes in the hand guidance information. Definitely the object detection area also includes Identify the position of the torso segmentation plane of the feature set and determine the position of the hand detection area on the torso segmentation plane perpendicular to the coordinate axis of the torso segmentation plane. Defining a dynamic object detection area includes identifying a central position of the body and a body boundary position from this feature set Use this feature to identify the intersection of the group and the torso segmentation plane from this feature set to identify a position indicating the user's arm part, and use the arm position related to this body position to identify this arm as the left or right arm. This method also includes Plane division by body center position, body boundary position, torso The winter left or right arm recognition establishes a shoulder position. Defining this dynamic detection area includes determining the position data of the hand detection area related to the shoulder position. This technology includes smoothing the position data of the hand detection area. In addition, this technology Including determining the position of the dynamic object detection area related to the torso segmentation plane perpendicular to the coordinate axis of the torso segmentation plane, determining the position of the dynamic object detection region at the horizontal coordinate axis related to the shoulder position, and using the body boundary position The vertical coordinate axis of the user ’s full scale determines the position of this dynamic object detection area. The steps to define this dynamic object detection area are as follows, unless the highest special -8-This paper size applies the Chinese National Standard (CNS) A4 specification (210X (297 mm)

裝訂Binding

561423 A7 B7 五、發明説明（6 欲對在邊界，否則便使用特徵集合之最南特徵對以建立使用者頭頂位置，以及決定與此使用頭頂相關之手偵測區域位置。另一個觀點中揭露一種使用立體視覺與電腦聯繫的方法。此方法包括利用一立體照相機抓取一立體影像，以及處理此立體影像以決定此立體影像中物體的位置資訊，其中此物體由一使用者控制。此方法尚包括處理此立體影像以識別.特徵資訊，產生此特徵資訊的景象描述，與識別此-三體影像中的匹配特徵對。此方法也包括計算各個匹配特徵對的不一致與位置以產生此景象描述，以及分析一景象分析方法中的景象描述以決定此物體的位置資訊。此方法包括利用預先決定之範圍中與鄰近特徵資訊的比較，將所想區域之特徵資訊分群爲具有特徵集合的群，計算各群的位置，並使用此位置資訊以允許使用者與一種電腦應用互動0 此外，此技術包括映射由照相機座標到螢幕座標與此電腦應用相關之特徵資訊的物體位置，並使用此映射位置與此電腦應用互動。此方法包括利用分析此景象描述中物體位置資訊的改變以識別物體相關的姿態，並結合位置資訊與姿態以便與此電腦應用互動。抓取此立體影像的步驟包括使用一立體照相機抓取此立體影像。另一個觀點中揭露一種用於與電腦所執行之應用程式互動的立體視頻系統。第一與第二視頻照相機係配置於鄰近 _ __-9- 本紙張尺度適用中國國家標準(CNS) Α4規格(210X297公复) 423 A7 B7 五、發明説明（7 ) 結構並可操作產生一系列立體視頻影像。一處理器可操作用於接收此系列之立體視頻影像與偵測在此照相機視野之父又範圍所不的物體。此處理器執行一程序以定義與第一與第二視頻照相機位置相關之三維座標的物體偵測區域，選擇物體偵測區域内所示的控制物體，並在此控制物體移動到此物體偵測區域内時，映射此控制物體的位置座標到相關於此應用程式的位置指示器。此程·序選-擇一控制物體爲一出現在最接近此視頻照相機以及在此物體偵測區域之内的已偵測物體。此控制物體爲人類的手。一種與視頻照相機相關之控制物體的水平位置映射到此位置指示器的X軸螢幕座標。一種與視頻照相機相關之控制物體的垂直位置映射到此位置指示器的y軸螢幕座標。安裝此處理器以映射相關於此視頻照相機之控制物體的水平位置U此位置指示器的X軸螢幕座標，映射相關於此視頻照相磯之控制物體的垂直位置到此位置指示器的y軸螢幕座標’並模擬一使用提供給此應用程式之已結合X卓由與y軸螢幕座標的滑鼠功能。安裝此處理器以模擬利用由物體位置行動所得姿態的滑鼠按鈕。安裝此處理器以模擬對於一預先決定時間週期基於在此物體偵測區域之内的任何位置之控制物體的持續位置之滑鼠按叙。在另一個範例中安裝此處理器以模擬對於預先決定時間週期基於在一相互顯示區域的邊界之内持 ’ ’ ’貝、的位置指示器之位置的滑鼠按鈕。安裝此處理器以映射 -10- 561423 A7 ______ B7 五、發明説明（8 ) 相關於此視頻照相機之控制物體的z軸深度位置到此位置指示器的Z軸螢幕座標。安裝此處理器以映射相關於此視頻照相機之控制物體的 X軸位置到此位置指示器的x軸螢幕座標，映射相關於此視頻照相機之控制物體的y軸位置到此位置指示器的y軸螢幕座“，及映射相關於此視頻照相機之控制物體的z軸深度位置到此位置指示器的Z軸螢幕座標。一相互醑示區域邊界内之位置指示器的位置觸發應用程式内的行動。在一預先決定時間週期之内沿著涵蓋一預先 /夬足距離之z軸深度位置的控制物體移動觸發此應用程式内的選擇行動。用於一預先決定時間週期之物體偵測區域内任一位置所持續的控制物體位置觸發此應用程式内的選擇行動。另一個觀點中揭露一種用於與電腦所執行之應用程式互動的立體頻系統。第一與第二視頻照相機係配置於鄰近結構並可操作產生一系列立體視頻影像。一處理器可操作用於接收此系列之立體視頻影像與偵測在此照相機視野之 X叉範圍所示的物體。此處理執行一程序以定義在相關於第一與第二視頻照相機位置的三維座標的物體偵測區域，選擇一控制物體爲一出現在最接近此視頻照相機以及在此物體偵、測區域之内的已偵測物體，定義此物體偵測區域内的次區域，識別此控制物體佔據的次區域，當此控制物體佔據此次區域時，啓動與此次區域相關的一個行動，應用此行動以便與一電腦應用互動。 -11- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) 五、發明説明（9 再定義與此次區域相關的行動爲鱼鍵的啓動模擬。用於-預先決定時間週：:：相關之按續的控制物體位置觸發此行動。。、久區域所持 =另-個觀點中揭露-種用於與電腦所執行之應用程式動的互體視頻系統。第一與第二視頻照相機係配近結構並可操作產生-系列立體視頻影像。一處理哭可: 作用於接收此系列之立體視頻影像與偵測在此照相機視= 足叉又範野所示的物體。此處理執行_程序以識別理解爲出現在照相機視野的交又視野與位於—預先決定深度範圍 <最大物體的物體，選擇此物體爲目標物標，決定代表此所想位置之位置座標，並使用此位置座標爲一物體控制點以控制此應用程式。此程序也導致此處理器決定並儲存一中立控制點位置，映射相關於此中立控制點位置之物體控制點的座標，與使用此映射％體控制點座標以控制此應用程式。此程序也導致此處理器根據此中立控制點位置之位置定義具有一位置的區域，在此區域之内映射相關於其位置的物體控制點使用此映射的物體控制點座標以控制此應用裎式。此程序也導致此處理器轉換此已映射物體控制點到一速度函數，決定相關於此應用程式之虛擬環境的視角，並使用此速度函數在此虛擬環境之内移動此視角。此程序導致此處理器在此應用程式内峡射此物體控制點的座標以控制一指示器位置。此實行方法中的指示器爲— 化身（avatar)。 -12-561423 A7 B7 V. Description of the invention (6 If you want to be at the boundary, otherwise use the southernmost feature pair of the feature set to establish the user's head position and determine the position of the hand detection area related to the use of the head. This is disclosed in another point of view A method for communicating with a computer using stereo vision. The method includes capturing a stereo image using a stereo camera, and processing the stereo image to determine position information of an object in the stereo image, wherein the object is controlled by a user. This method It also includes processing the stereo image to identify the feature information, generate a scene description of this feature information, and identify the matching feature pairs in this three-body image. This method also includes calculating the inconsistencies and positions of each matching feature pair to generate this scene Description, and analysis of a scene description in a scene analysis method to determine the location information of the object. This method includes comparing feature information of a desired area into a group with a feature set by using a predetermined range of comparison with neighboring feature information. , Calculate the location of each group, and use this location information to allow The user interacts with a computer application. In addition, this technology includes mapping object positions from camera coordinates to screen coordinates and feature information related to this computer application, and using this mapped position to interact with this computer application. This method includes analyzing this scene description using Change the position information of the object in order to recognize the posture related to the object, and combine the position information and posture to interact with this computer application. The step of capturing the stereo image includes capturing the stereo image using a stereo camera. Another aspect discloses a A stereo video system for interacting with applications executed by the computer. The first and second video cameras are located near _ __9- This paper size applies to the Chinese National Standard (CNS) Α4 specification (210X297 public copy) 423 A7 B7 V. Description of the Invention (7) The structure can be operated to generate a series of stereo video images. A processor can be used to receive the stereo video images of this series and detect objects that are not within the range of the father of the camera field of view. The processor executes a program to define a three-dimensional relationship with the first and second video camera positions Target object detection area, select the control object shown in the object detection area, and when the control object moves into the object detection area, map the position coordinates of the control object to the position indicator related to this application This procedure · Sequential Selection-Select a control object as a detected object that appears closest to the video camera and within the object detection area. This control object is a human hand. A control related to the video camera The horizontal position of the object is mapped to the X-axis screen coordinate of this position indicator. A vertical position of the control object related to the video camera is mapped to the y-axis screen coordinate of this position indicator. Install this processor to map the video camera related Control the horizontal position of the object U The X-axis screen coordinate of this position indicator, map the vertical position of the control object related to this video camera to the y-axis screen coordinate of this position indicator and simulate a use provided to this application It has combined mouse functions of X Zhuoyou and y-axis screen coordinates. This processor is installed to simulate a mouse button using gestures derived from the position of an object. This processor is installed to simulate a mouse click for a predetermined time period based on the continuous position of the control object at any position within the object detection area. In another example, the processor is installed to simulate a mouse button for a predetermined period of time based on the position of a position indicator that holds a '' '' in a mutual display area. Install this processor to map -10- 561423 A7 ______ B7 V. Description of the invention (8) The z-axis depth position of the control object related to this video camera to the Z-axis screen coordinate of the indicator. Install this processor to map the X-axis position of the control object related to this video camera to the x-axis screen coordinates of this position indicator, and map the y-axis position of the control object related to this video camera to the y-axis of this position indicator Screen mount ", and map the z-axis depth position of the control object related to this video camera to the Z-axis screen coordinate of this position indicator. A position of the position indicator within the boundaries of the mutual display area triggers actions in the application. Movement of a control object along a z-axis depth position covering a predetermined / stomp distance within a predetermined time period triggers a selection action in this application. Any one of the object detection areas for a predetermined time period The continuous control of the position of the object triggers the selection action in this application. Another aspect discloses a stereo system for interacting with an application executed by a computer. The first and second video cameras are arranged in adjacent structures and Operable to generate a series of stereo video images. A processor is operable to receive this series of stereo video images And detecting an object shown in the X-cross range of the camera field of view. This process executes a program to define an object detection area at a three-dimensional coordinate relative to the first and second video camera positions, and selects a control object as an output Now it is closest to the video camera and the detected objects within the object detection and measurement area. Define the sub-area within this object detection area, identify the sub-area occupied by this control object, and when this control object occupies this area At this time, start an action related to this area, and apply this action to interact with a computer application. -11- This paper size applies the Chinese National Standard (CNS) A4 specification (210 X 297 mm) 5. Description of the invention (9 Redefine the action related to this area as the start simulation of the fish key. It is used for-predetermining the time period: ::: the relevant continuous control of the position of the object to trigger this action., Held by the long area = disclosed in another point of view -An interactive video system for interacting with applications executed by the computer. The first and second video cameras are equipped with near structure and can be operated to produce a series of stereo video images. A process can cry: It is used to receive the stereo video images of this series and detect the objects shown in this camera = foot fork and fan Ye. This process executes the program to identify the intersection and field of vision that are understood to appear in the camera's field of view. Located at-Predetermine the object of the depth range < the largest object, select this object as the target object, determine the position coordinate representing the desired position, and use this position coordinate as an object control point to control the application. This program also Causes the processor to determine and store a neutral control point position, maps the coordinates of the object control points related to the neutral control point position, and uses this mapping to control the coordinates of the body control point to control the application. This process also causes the processor An area having a position is defined according to the position of the neutral control point position, and an object control point related to its position is mapped within this area. The object control point coordinates of the map are used to control the application mode. This procedure also causes the processor to convert the mapped object control point to a velocity function, determine the perspective of the virtual environment related to the application, and use this velocity function to move this perspective within the virtual environment. This procedure causes the processor to shoot the coordinates of the control point of the object within the application to control a pointer position. The indicator in this implementation method is-avatar. -12-

561423 A7 B7561423 A7 B7

此程序導致此處理器在此應用程式之内映射映射此物，控制點的座標以控制一指示器的出現。此實行方法中的浐示器爲-化身。此目標物標爲一視野交又範圍内所出心人類。另一個觀點中揭露一種用於與電腦所執行之應用程式互動的立體視頻系統。第一與第二視頻照相機係配置於鄰近結構並可操作產生一系列立體視頻影像。一處理器可操作 _ 用於接收此系列之立體視頻影像與偵測在此照相機視野之—' 交叉範圍所示的物體。此處理執行一程序以識別理解爲出現在照相機視野的交叉視野與位於一預先決定深度範圍之最大物體的物體，選擇此物體爲目標物標，定義在此照相機與感興趣物體之間的控制區域，此控制區域位於一預先決定位置並有一相關於目標物標之大小與位置的預先決定大小，尋找用於相關於最接近此照相機並在此控制區域内目標物標智控制區域，如果相關於此感興趣物體的點在此控制區域_之内，選擇相關於此感興趣物體的點爲一控制點，並在此控制點移動到此控制區域之内，映射此控制點的位置座標到相關於此應用程式的位置指示器。可操作此處理器以映射相關於此視頻照相機之控制物體的水平位置到此位置指示器的X轴螢幕座標，映射相關於此視頻照相機之控制物體的垂直位置到此位置指示器的y 軸螢幕座標，以及模擬一使用已結合X軸與y軸螢幕座標的滑鼠功能。另一個選擇是，可操作此處理器以映射相關於此視頻照 -13- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐）This procedure causes the processor to map the object within the application, controlling the coordinates of the control points to control the appearance of an indicator. The indicator in this implementation method is-avatar. This target is a human being in the field of vision. Another aspect discloses a stereo video system for interacting with applications executed by a computer. The first and second video cameras are disposed adjacent structures and are operable to generate a series of stereo video images. A processor is operable _ for receiving stereo video images of this series and detecting objects shown in the-'cross range of the camera's field of view. This process executes a program to identify an object that is understood to appear in the cross-field of view of the camera and the largest object in a predetermined depth range, selects this object as the target object, and defines the control area between the camera and the object of interest This control area is located at a predetermined position and has a predetermined size related to the size and position of the target object. Find the control area for the target object intelligent control area that is closest to this camera and within this control area. The point of this object of interest is within this control area_, select the point related to this object of interest as a control point, and move this control point into this control area, mapping the position coordinates of this control point to the relevant Position indicator in this application. The processor can be operated to map the horizontal position of the control object related to the video camera to the X-axis screen coordinate of the position indicator, and map the vertical position of the control object related to the video camera to the y-axis screen of the position indicator Coordinates, and simulate a mouse function that uses X- and y-axis screen coordinates. Another option is to operate this processor to map the photos related to this video. -13- This paper size applies to China National Standard (CNS) A4 (210 X 297 mm)

装訂Binding

561423 A7 _____B7 五、發明説明（11 ) 相機之控制物體的X軸位置到此位置指示器的X軸螢幕座標，映射相關於此視頻照相機之控制物體的y軸位置到此位置指示器的y軸螢幕座標，以及映射相關於此視頻照相機之控制物體的z軸深度位置到此位置指示器的z軸螢幕座標。乂體視頻系統中此目標物標爲一視野交叉範圍内所出現的人類。此外，此控制點相關於在此控制區域内·所出現之人類的·手。> — 另一個觀點中揭露一種用於與電腦所執行之應用程式互動的立體視頻系統。第一與第二視頻照相機係配置於鄰近結構並可操作產生一系列立體視頻影像。一處理器可操作用於接收此系列之立體視頻影像與偵測在此照相機視野之交叉範圍所示的物體。此處理執行一程序以定義在相關於第一與第二視頻照相機位置的三維座標的物體偵測區域，選擇來自每此物體偵測區域之内的視野之交叉視野出現之物體的兩手物體，並在此手物體在此物體偵測區域内移動時，映射此手物體的位置座標到相關於此應用程式提供之化身的虛擬手位置。此程序選擇來自在最接近此視頻照相機並在此物體偵測區域之内的視野之交叉視野出現之物體的兩手物體。此化身取得一類似人類身體的格式。此外，此化身提供於形成此應用程式邵分的虛擬環境並與之互動。此處理器執行一私序以比較在此虛擬環境之内相關於此化身的虛擬手位置與虛擬物體的位置讓使用者能夠在一虛擬環境之内與此虛 _ -14- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) 561423 A7 B7 五、發明説明（12 ) 擬物體互動。此處理器也執行一程序以偵測視野之交叉範圍内的使用者位置座標，以及映射此使用者的位置座標到此應用程式所提供之化身的虚擬軀幹。若沒有選擇一映射的手物體，此程序移動至少一個與此化身相關之至少一個虛擬手到一中立位置。此處理器也執行一程序以偵測視野之交又範圍内的使用者位置·座標^並映射此使用者的位置座標到應用到此化身-的速度函數’使此化身能漫遊於此應用程式提供的虛擬環境之中。此速度函數包括一表示此化身之零速度的中立位置。此處理器也執行一程序以映射映射相關於此中立位置的使用者位置座標到相關於此化身的軀幹座標以便此化身出現傾斜。此處理器也執行一程序以以比較相關於此化身的虛擬手位置與暫辱虛擬環境之内的虛擬物體位置讓此使用者能夠在漫遊於ib虛擬環境之中時與此虛擬物體互動。如互體視頻系統其中一部份所述，由此應用程式得到與此化身相關之虛擬膝部位置並用於改進此化身的景象。另一個選擇是’由此應用程式得到與此化身相關之虛擬肘部位置並用於改進此化身的景象。下列圖式與描述提出一個或更多實行方法之細節。由描述與圖式及申請專利範圍使其他特徵與優點變得顯而易圖式說明 ___-_____ -1 5. 本紙張尺歧财@ ®家鮮 )61423 A7561423 A7 _____B7 V. Description of the invention (11) The X-axis position of the camera control object to the X-axis screen coordinate of this position indicator, mapping the y-axis position of the control object related to this video camera to the y-axis of this position indicator Screen coordinates, and map the z-axis depth position of the control object related to this video camera to the z-axis screen coordinates of this position indicator. The target in the carcass video system is a human appearing within a field of vision. In addition, this control point is related to the human hand that appears in this control area. > — Another aspect discloses a stereo video system for interacting with applications executed by a computer. The first and second video cameras are disposed adjacent structures and are operable to generate a series of stereo video images. A processor is operable to receive the stereo video images of the series and detect objects shown at the intersection of the camera's field of view. This process executes a program to define an object detection area at three-dimensional coordinates related to the positions of the first and second video cameras, selects a two-handed object that appears from a cross-field of view of the field of view within each object detection area, and When the hand object moves within the object detection area, the position coordinates of the hand object are mapped to the virtual hand position related to the avatar provided by this application. This program selects two-handed objects from objects that appear in the cross-field of view closest to the video camera and within the field of view of the object detection area. This incarnation takes on a format similar to the human body. In addition, this avatar is provided to form and interact with the virtual environment of this application. The processor executes a private sequence to compare the position of the virtual hand and the position of the virtual object related to the avatar within the virtual environment, so that the user can make this virtual within a virtual environment. -14- This paper size applies to China National Standard (CNS) A4 specification (210 X 297 mm) 561423 A7 B7 V. Description of invention (12) Intended object interaction. The processor also executes a procedure to detect the user's position coordinates within the intersection of the field of view and map the user's position coordinates to the virtual torso of the avatar provided by this application. If a mapped hand object is not selected, the program moves at least one virtual hand associated with the avatar to a neutral position. This processor also executes a program to detect the user's position and coordinates within the intersection of the field of view and coordinates ^ and map this user's position coordinate to the speed function applied to this avatar-enabling the avatar to roam the application Provided virtual environment. The speed function includes a neutral position representing the zero speed of the avatar. The processor also executes a program to map user position coordinates related to the neutral position to torso coordinates related to the avatar so that the avatar is tilted. The processor also executes a program to compare the position of the virtual hand related to the avatar with the position of the virtual object within the temporarily degraded virtual environment so that the user can interact with the virtual object while roaming in the ib virtual environment. As described in part of the interbody video system, this application obtains the virtual knee position associated with this avatar and uses it to improve the avatar scene. Another option is, 'This app gets the virtual elbow position associated with this avatar and uses it to improve the look of this avatar. The following figures and descriptions provide details of one or more methods of implementation. Other features and advantages are made easy by the description and drawings and the scope of patent application. Schematic description ___-_____ -1 5. This paper ruler Qi Cai @ ® 家鲜) 61423 A7

圖1描述一種以i目Λ 、喱乂視頻馬王之影像控制系統之一法的硬體成分與環境。貫订万圖2爲概要况明圖！系統所使用之處理技術的流程圖。、圖圖式係描述與圖i之以視頻爲主影像控制系統相關 '^各照相機的視野範圍。圖4描述一立體昭相趟逛r 、且…、相機裝置產生足一組立體影像所顯的所想的共通點與上極線。圖5之流㈣係描述—種由立體影像產生景象描述資訊之立體處理例行程序。圖6之流程圖係一種將景象描述資訊轉換爲位置與方向資料的程序。圖7之曲線圖描述關於位置如距離函數D的削減S的程度。圖8描述影像控制系、统的實行方法”其中一物體或手偵測區域係直#建立在電腦監視器螢幕前。圖9之流:程圖係描述動態定義與一使用者身體相關之手偵測區域的非必要程序。圖1 0A-1 0C之範例係描述圖9用於動態定義與使用者身體相關之手偵測區域的程序。圖11A描述一種與以視頻爲主之影像控制系統聯繫之範例使用者介面與顯示區域。圖11B描述一種用於映射手或指示器位置到與圖11A使用者介面聯繫之顯TF區域的技術。圖12A描述一種虛擬眞實環境中的範例三維使用者介 -1 6 - 本紙張尺度逋用中國國家標準(CNS) A4規格(210X 297^17 561423 五、發明説明（η ) 面。圖12B描述爲視野消除圖12A之三維使用者介面中檔案夾内容。的虛擬圖13 A描述一種用於操控虛擬三維空間之三維使用者仑面的範例表示方式。 1 圖13B之曲線圖表示影像控制系統中的座標區域爲死區域，其中沒有任何虛擬位置必要的改變。圖14描述-二種視頻遊戲介面的範例實行方法，其中行動與姿態可解釋爲用於飛行於虛擬三維都市風景的控制桿形式操控功能。 V 圖15 A之圖式係描述一劃分爲偵測平面之範例頭部偵測區域。圖1 5B之圖式係描述一劃分爲偵測匣之範例頭部偵測區圖15C與45D係描述一劃分爲兩組方向偵測匣的範例頭部偵測區域，並進一步描述一個定義鄰近方向偵測匣間的缺 α 〇不同圖式内相同參考符號指示相同元件。詳細説明圖1描述一種以視頻爲主之影像控制系統1 〇〇的實行方法。一個人物（或許多人物）101將自身置於，或將手伸入，一所想區域102。將所想區域102定位在相關於影像偵測器 1 〇 3的位置，以位於影像偵測器的總體視野範圍1 〇 4内。户斤想區域102包含一手偵測區域1〇5，其中若出現並偵測到人亡裝訂線 -17· 本紙張尺度適用中國國家標準(CNS) Α4規格(210X 297公釐） ------ 561423 A7 B7 五、發明説明（15 ) 身體的部份，則確定並測量人物的位置與姿態。區域，位置與測量皆描述於一個三維X，y，z座標或世界座標，該座標不需對準影像偵測器103。由影像彳貞測器^所產生的視頻於像系列則由可在視頻顯示1 〇 8上顯示視頻影像的計管裝置1 07處理，如個人電腦。如下列較詳細説明所述，計算裝置1〇7處理視頻影像系列，藉以分柝一物體的位置與姿態，例如使用者的手。接著峡射此緣果位置與姿態資訊到一應用程式，諸如一圖形使用者介面（GUI)或一視頻遊戲。使用者手部的位置與姿態顯示於視頻顯示108且允許執行與/或控制GU]^t視頻遊戲中的功能。一範例功能移動螢幕按鈕上的游標並接收一輕壓按姐的姿態以選擇螢幕按鈕。接著由計算裝置1〇7執行按鈕的相關功能。視頻偵測器103係描述於下列詳細說明中。系統100可實行於許多種配置中，例如一桌上電腦配置，其t影像偵測器103架設在用於觀看所想區域1〇2的視頻顯示108表面上；或例如二j員頂照相機配置，其中影像偵測器103架設在支援結構上並定位於觀看所想區域的視頻顯示10 8之上。圖2描述視頻影像分析程序200，可利用與系統1〇〇之典型實行方法相關之電腦軟體或電腦硬體實行之。影像偵測^ 或視頻照相機103取得所想區域102之立體影像2〇1與周圍景象。傳送此立體影像201至計算裝置107(此裝置可選擇性地引用於影像偵測器1〇3中），其在立體影像2〇1上執行_ 立體分析程序202以產生一景象描述2〇3。計算裝置戋 -18- 本紙張尺度適财® ®家標準(CNS) A4規格(21GX 297公爱)Figure 1 depicts the hardware components and environment of one of the video control systems of the video horse king based on i-mesh Λ. Figure 2 is a summary picture! A flowchart of the processing technology used by the system. The diagram is related to the video-based image control system in Fig. I. ^ The field of view of each camera. Fig. 4 depicts a desired common point and upper polar line shown by a stereoscopic image of a stereoscopic camera r and a camera device generating a set of stereoscopic images. The current description in Fig. 5 is a stereo processing routine that generates scene description information from stereo images. The flowchart of Fig. 6 is a procedure for converting scene description information into position and orientation data. The graph of Fig. 7 describes the degree of reduction S with respect to a position such as the distance function D. Figure 8 describes the implementation of the image control system and system. "An object or hand detection area is set straight in front of the screen of a computer monitor. Figure 9 Flow: The process map describes the dynamic definition of hands related to a user's body Non-essential procedures for the detection area. Figure 1 0A-1 0C is an example depicting the procedure for dynamically defining the hand detection area related to the user's body. Figure 11A depicts a video-based image control system An example user interface and display area for contact. Figure 11B depicts a technique for mapping the position of a hand or pointer to a displayed TF area associated with the user interface of Figure 11A. Figure 12A depicts an example three-dimensional user in a virtual reality environment介 -1 6-This paper uses the Chinese National Standard (CNS) A4 specification (210X 297 ^ 17 561423) 5. The description of the invention (η) surface. Figure 12B depicts the view to eliminate the contents of the folder in the three-dimensional user interface of Figure 12A Figure 13A depicts an example representation of a three-dimensional user plane for manipulating a virtual three-dimensional space. 1 The graph in Figure 13B shows that the coordinate area in the image control system is dead. Domain without any necessary changes to the virtual position. Figure 14 depicts two examples of video game interface implementation methods, where actions and attitudes can be interpreted as joystick-type control functions for flying over a virtual three-dimensional cityscape. V Figure 15 The diagram of A describes an example head detection area divided into detection planes. Fig. 1 5B of the diagram describes an example head detection area divided into detection boxes. Figs. 15C and 45D describe one division into Two sets of directional detection box examples of head detection areas, and further describes a definition of the lack of α between adjacent directional detection boxes. 〇 The same reference symbols in different drawings indicate the same components. Detailed description Figure 1 describes a video-based Implementation method of the image control system 100. A person (or many characters) 101 places himself or puts his hand into a desired area 102. The desired area 102 is positioned in relation to the image detector 1 〇3 position, so as to be located within the overall visual field of view of the image sensor 104. The household area 102 includes a one-hand detection area 105, and if a dead line appears and is detected- 17 · This paper size applies the Chinese National Standard (CNS) A4 specification (210X 297 mm) ------ 561423 A7 B7 V. Description of the invention (15) For the body part, determine and measure the position and posture of the person The area, location, and measurement are described in a three-dimensional X, y, z coordinate or world coordinate, which does not need to be aligned with the image detector 103. The video generated by the image sensor ^ is used by the image series The metering device 107 for displaying video images on the video display 108 is processed, such as a personal computer. As described in more detail below, the computing device 107 processes the video image series so as to distinguish the position and attitude of an object, Such as the user ’s hand. The location and attitude information of the root cause is then projected to an application, such as a graphical user interface (GUI) or a video game. The position and posture of the user's hand is displayed on the video display 108 and allows execution and / or control of functions in the video game. An example function moves the cursor on the screen button and receives a light press on the gesture of the sister to select the screen button. The computing device 107 then performs the functions of the buttons. The video detector 103 is described in the following detailed description. The system 100 can be implemented in many configurations, such as a desktop computer configuration, and its image detector 103 is mounted on the surface of a video display 108 for viewing the desired area 102; or, for example, a two-head camera configuration Among them, the image detector 103 is erected on the supporting structure and positioned above the video display 10 8 of the desired area for viewing. FIG. 2 illustrates a video image analysis program 200, which can be implemented using computer software or computer hardware related to a typical implementation method of the system 100. The image detection ^ or the video camera 103 obtains the stereo image 201 and the surrounding scene of the desired area 102. Send this stereo image 201 to the computing device 107 (this device can be selectively referenced in the image detector 103), which is executed on the stereo image 201_ stereo analysis program 202 to generate a scene description 201 . Computing Device 戋 -18- This paper is suitable for Standard Paper ® ® Standard (CNS) A4 (21GX 297)

裝訂Binding

561423 A7 ________B7 五、發明説明（16 ) 一不同的計算裝置由此景象描述203中使用一景象分析程序204計算並輸出人物手部的手/物體位置資訊2〇5或其他合適點選裝置與人物其他特徵的位置或測量。手/物體位置資訊205爲一組三維座標並提供給位置映射程序2〇7，此位置映射程序207映射或轉換此三維座標到一組螢幕座標。由位置映射程序207產生之螢幕座標可作爲應用程式2〇8所使用的螢幕座標位置資訊，該應用程式208執行於計算裝二· 置107上並提供使用者反馈206。 - 也可偵測手部的特定姿態，亦即被偵測爲手/物體位置資訊205所顯示之手與/或其他特徵位置的改變，並由姿態分析與偵測程序209解釋爲姿態資訊或姿態211。接著將位置映射程序207之螢幕座標位置資訊與姿態資訊211 —起傳遞並用於控制應用程式208。若姿態偵測的内容敏感，姿態偵測程序209可使用應用狀況210，且^姿態之標準與意義則由應用程式208選擇。應用 · 狀況2 1 0乏範例爲，依照視頻螢幕1 〇8所示之位置改變游標之所示。因此，若使用者將游標由一螢幕物體移到另一個螢幕物體時，表示游標之圖像將由指示物圖像改變爲手圖像。一般而言，使用者在視頻顯示108所示影像改變時將接收反饋206。反饋206 —般由應用程式208提供且與視頻顯示108之應用手位置與狀態有關。爲組成景象之所有或某些物體之次集合或部份物體，影二像偵測器103與計算裝置107產生景象描述資訊203，其中包括一三維位置或三維位置所包含的資訊。若物體位置在 -19- 本紙張尺度適用中國國家標準(CNS) A4規格(210X 297公釐) 裝訂561423 A7 ________B7 V. Description of the invention (16) A different computing device uses this scene description 203 to calculate and output the hand / object position information of the person ’s hand 205 or other suitable pointing device and character in this scene description 203 Location or measurement of other features. The hand / object position information 205 is a set of three-dimensional coordinates and is provided to a position mapping program 207. This position mapping program 207 maps or converts the three-dimensional coordinates to a set of screen coordinates. The screen coordinates generated by the position mapping program 207 can be used as the screen coordinate position information used by the application program 208. The application program 208 is executed on the computing device 107 and provides user feedback 206. -It is also possible to detect specific gestures of the hand, that is to be detected as changes in the hand and / or other characteristic positions displayed by the hand / object position information 205, and interpreted by the posture analysis and detection program 209 as posture information or Posture 211. Then, the screen coordinate position information and attitude information 211 of the position mapping program 207 are transmitted together and used to control the application program 208. If the content of the gesture detection is sensitive, the gesture detection program 209 can use the application condition 210, and the criteria and meaning of the gesture are selected by the application program 208. Application · Condition 2 10 An example is to change the display of the cursor according to the position shown on the video screen 108. Therefore, if the user moves the cursor from one screen object to another screen object, the image representing the cursor will be changed from the pointer image to the hand image. Generally, the user will receive feedback 206 when the image shown in the video display 108 changes. Feedback 206-generally provided by the application 208 and related to the position and status of the application hand on the video display 108. In order to form a sub-set or a part of all or some objects of the scene, the image detector 103 and the computing device 107 generate scene description information 203, which includes a three-dimensional position or information contained in the three-dimensional position. If the position of the object is -19- This paper size applies to China National Standard (CNS) A4 (210X 297 mm) binding

線 561423 A7Line 561423 A7

體具有與符合系統_-般使用之人物文勢期望不符的外形或並器103内立體照相機所偵測的 :、，便排除影像偵測環境的-些限制。此環境中還他此：加對系統操作人。此對於與其他搜尋系統相:二系統互動的成使用者的部份影像’此影像爲靜止與/或被模仿的此外’對-使用者與手的缓：著A 车辟… U的裝置也增加限制，這是因爲人與 Γ〇〇:=三維形式是用於識科。使用者1。1操作系統寺甚至*要穿载手套。相較於其他利用 :Γ:Γ尋系統’上述説明亦爲系統⑽所特有的觀 ....由於身體與手的外觀依不同人物而有所不同，因此系 ^00較依賴使用者與手外觀之方法更有保障。然而請注思可與系統100並用之立體分析程序202的某些實行方法也可利用上喽之表示。一般由1體照相機產生景象描述資訊203。影像偵測器 1 一03二於：類系統中組成兩個或更多個單獨的照相機並作爲上to照相機端。照相機可以是黑白視頻照相機或彩色視頻照相機。各個獨立的照相機需要獨特視角的景象影像並產生一系列視頻影像。使用各個照相機影像之部份景象的相關位置，計算裝置107可推算景象描述203所想之物件與影像偵測器103間的距離。以下詳細説明係描述本系統所使用之立體照相機影像偵剠姦103的一種實行方法。另有其他立體照相機系統與演 -20- 561423 A7 B7 五、發明説明异法可產生適用於本系統的景象描述，應注意本發明並不限於本文中所使用的特殊立體系統。圖3中影像偵測器或立體照相機端103之照相機30 1，3 02 偵測並產生照相機視野範圍304，305(各別）内的景象影像。視野104的整體範圍係定義爲所有視野3〇4，305各自範圍的交又範圍。視野1〇4之整體範圍内的物體3〇7有被所有照相機301與302整體或部份偵測的可能性。由於景象描 4 : 述203允許f含所想區域1〇2之外的物體或物體特徵，故物-體307不需在所想區域1〇2内。關於圖3，應注意手偵測區域1 05爲所想區域1 〇2的次集合區域。關於圖4，影像對201之影像4〇1與4〇2由照相機對ι〇3所偵測。影像401有一線組，各組的線4〇3有在其他影像4〇2内的映射線404。此外，定位於線403之景象的任一共同點 405也可定位於第二照相機影像402内的映射線4〇4上，故該點可定龟於視野1 〇4的整體範圍内並可見於照相機3〇 1與 -302 (如未被景象中其他物體阻擔）。線403與404稱爲上極線。各個上極線對的點位置差異稱爲不一致。不一致與距離成反向常數，並提供產生景象描述2 〇3之所想資訊。上極線對係依靠照相機30 1與3 02間的照相機影像失眞與幾何關係。透過分類之事先處理程序決定並選擇性分析上述屬性。系統須説明大多數照相機使用之鏡片所採用的放射失眞。一種用於解決放射失眞之照相機特徵的技術係描If the body has a shape that does not meet the expectations of the characters used in the system, or is detected by the stereo camera in the parallelizer 103, some limitations of the image detection environment are excluded. Also in this environment: add to the system operator. This is part of the image of the user that interacts with other search systems: the two systems are 'the image is still and / or imitated. In addition'-the user and the hand are slow: the A device is located ... the U device is also The limitation is increased because people and Γ〇〇: = three-dimensional form is used for cognition. User 1.1 operating system Temple even * has to wear gloves. Compared to other uses: Γ: Γ seeking the system 'The above description is also unique to the system ⑽ .... Because the appearance of the body and hands varies with different characters, the system ^ 00 is more dependent on the user and the hand The appearance method is more secure. However, please note that some implementation methods of the stereo analysis program 202 that can be used with the system 100 can also use the above expressions. Scene description information 203 is generally generated by a one-body camera. Image Detector 1 032: In this kind of system, two or more separate cameras are composed and used as the camera side. The camera can be a black and white video camera or a color video camera. Each independent camera requires a unique perspective image and produces a series of video images. Using the relevant position of a part of the scene of each camera image, the computing device 107 can calculate the distance between the object of the scene description 203 and the image detector 103. The following detailed description describes an implementation method of the stereo camera image detection rape 103 used by the system. There are other stereo camera systems and performances. -20- 561423 A7 B7 V. Description of the invention A different method can produce a scene description suitable for this system. It should be noted that the invention is not limited to the special stereo system used in this article. In FIG. 3, the cameras 30 1, 3 02 of the image detector or the stereo camera end 103 detect and generate scene images in the camera fields of view 304, 305 (respectively). The overall range of the field of view 104 is defined as the intersection of the respective ranges of all fields of view 304,305. An object 307 in the entire range of the field of view 104 may be detected in whole or in part by all the cameras 301 and 302. Because scene description 4: 203 allows f to include objects or object features outside the desired area 102, the object-body 307 need not be within the desired area 102. Regarding Fig. 3, it should be noted that the hand detection area 105 is a sub-collection area of the desired area 102. With regard to Fig. 4, images 401 and 402 of image pair 201 are detected by the camera pair ι03. The image 401 has a line group, and each line 403 has a mapping line 404 in the other image 402. In addition, any common point 405 located in the scene of line 403 can also be located on the mapping line 400 in the second camera image 402, so the point can be located in the entire range of the field of view 104 and can be seen in Cameras 301 and -302 (if not obstructed by other objects in the scene). The lines 403 and 404 are called upper polar lines. The difference in the position of the points of each upper polar line pair is called inconsistency. Inconsistency and distance are inverse constant, and provide the desired information to generate the scene description 203. The upper polar line pair depends on the camera image loss and geometric relationship between the cameras 301 and 302. These attributes are determined and selectively analyzed through prior processing of the classification. The system must account for the loss of radiation used by the lenses used by most cameras. A description of a technical feature of a camera for solving radiation loss

述於Microsoft Research，由Z. Zhang所著之標題爲AIn Microsoft Research, titled A by Z. Zhang

Flexible New Technique for Camera Calibration 中， -21- 本紙張尺度適用中國國家標準(CNS) A4規格(210X 297公釐) 561423 A7 B7 五、發明説明（19 http: "research .micros oft .com/〜zhang，並以引用的方式併入本文中，作爲分類的第一步驟。此技術並未發現上極線，但將造成直線，而更易於找到此線。次集合的方法係描述於 Z. Zhang 之 Determining the Epipolar Geometry and its Uncertainty: A Review, The International Journal ofIn Flexible New Technique for Camera Calibration, -21- This paper size applies to Chinese National Standard (CNS) A4 (210X 297 mm) 561423 A7 B7 V. Description of the invention (19 http: " research.micros oft.com/~ zhang, which is incorporated herein by reference as the first step in classification. This technique does not find the epipolar line, but will result in a straight line, making it easier to find this line. The method of the sub-set is described in Z. Zhang Determining the Epipolar Geometry and its Uncertainty: A Review, The International Journal of

Computer Vision 1997與 Z. Zhang之 Determining the EpipolarComputer Vision 1997 and Z. Zhang by Determining the Epipolar

Geometry and its Uncertainty: A Review, Technical Report 二： 2927，INRIA Sophia Antipolis，France，July 1996，以上皆以、引用的方式併入本文中，並可用於解決上極線，此爲分類的第二步驟。圖5描述一種用於產生景象描述2〇3之立體分析程序2〇2 的實行方法。影像對201包括一參考影像401與比較影像 402。影像濾波器503濾波單獨影像4〇1與402並闖入區塊 504的特徵中。各特徵爲一 8χ8的像嗉區塊。然而應注意特徵可定義爲大於或小於8x8的像素區塊並處理。 · 匹配處理505爲參考影像中各特徵尋找匹配。爲此目的，特徵比較處理506比較參考影像中的特徵與第二或比較影像402内映射上極線沿線之事先定義範圍内的所有特徵。在此特殊實施例中，定義特徵爲影像4〇1或4〇2之8χ8的像素區塊，其中期望此區塊包含部份的景象物體，表示區塊内的像素強度樣式（因由影像濾波器5〇3所濾波，故無法直接表π明視度）。各特徵對匹配的可能性由不一致記綠並二標不。若最佳特徵對匹配之可能性很小（相較於事先決定之臨界値）或許多特徵對皆可能有最佳匹配（若其中的差異 561423 五、發明説明（20 ) 可能性在事先定義之臨界値内，特徵則被視爲相似）時，特徵對濾波器507便消除參考影像4〇1内的區塊。對剩餘的參考特徵，鄰近區支援處理5〇8將所有特徵對的可能性按比例數調整爲有利於具有相同不一致特徵對的鄰近參考特徵可能性。爲各個參考特徵，可由特徵對選擇程序509選擇具有最佳可能性的特徵對，爲各參考特徵提供不一致（以及距離）。 ' 參考特徵T由程序504產生）因閉塞而不表現於第二或比_ 較影像402中且所示之最可能匹配的特徵將是錯誤的。因此在兩種照相機系統中，比較影像4〇2所選之特徵將由相似的程序檢查（使用第二並聯匹配程序5 1〇内的程序5〇6， 5 07，508與509)以產生參考影像4〇1中最佳的匹配特徵，取消影像4〇1與402先前的作用。在一種具有三個照相機的系統中（即使用照相機3 〇 1與3 0 2之外的第三個照相機），第二照相機影卷取代比較影像402，原始參考影像4〇1繼續作爲參考影像，透過相同程序（使用第二並聯匹配程序5 1〇内的程序506，507，508與509)決定第三視頻的最佳匹配特徵。若有三個以上的可用照相機，可將此程序重覆於其他的照相機影像中。任一參考特徵之最佳匹配對特徵具有參考影像中相同匹配特徵時則消除於比較處理5丨丨中。因此，許多錯誤匹配，則消除由閉塞所產生的錯誤距離。上述程序之結果爲深度描述映射512，其描述與影像4〇1 和402相關特徵之位置與不一致。由座標系統轉換程序5 i 3 將位置與不一致利用下述方程式1，方程式2與方程式3轉 ___ -23- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) 561423Geometry and its Uncertainty: A Review, Technical Report 2: 2927, INRIA Sophia Antipolis, France, July 1996, all of which are incorporated herein by reference, and can be used to solve the upper polar line, this is the second step of classification . FIG. 5 illustrates an implementation method of a stereo analysis program 202 for generating a scene description 203. The image pair 201 includes a reference image 401 and a comparison image 402. The image filter 503 filters the individual images 401 and 402 and breaks into the features of the block 504. Each feature is an 8 × 8 image block. It should be noted, however, that features can be defined and processed as pixel blocks larger or smaller than 8x8. Matching process 505 finds a match for each feature in the reference image. To this end, the feature comparison process 506 compares the features in the reference image with all the features in a predefined range along the epipolar line mapped on the second or comparative image 402. In this special embodiment, the feature is defined as an 8 × 8 pixel block of the image 401 or 402, where it is expected that this block contains a part of the scene object, which represents the pixel intensity pattern within the block (because of the image filter 503 filter, so can not directly express π bright vision). The possibility of matching each feature pair is recorded by inconsistency and marked by two. If the best feature pair is less likely to be matched (compared to the critical threshold determined in advance) or many feature pairs are likely to have the best match (if there is a difference 561423 V. Description of the invention (20) The possibility is defined in advance Within a critical threshold, features are considered similar), the feature pair filter 507 eliminates blocks in the reference image 401. For the remaining reference features, the neighboring area support processing 508 adjusts the probability of all feature pairs by a proportional number to favor the possibility of neighboring reference features with the same inconsistent feature pair. For each reference feature, the feature pair selection program 509 can select the feature pair with the best possibility, providing inconsistency (and distance) for each reference feature. 'The reference feature T is generated by the program 504) The feature that is not shown in the second or comparison image 402 due to occlusion and the most likely matching feature shown will be wrong. Therefore, in the two camera systems, the features selected in the comparison image 40 will be checked by similar procedures (using the second parallel matching procedure 5 10, procedures 5 06, 5 07, 508, and 509) to generate a reference image. The best matching feature in 401 cancels the previous role of images 401 and 402. In a system with three cameras (ie using a third camera other than cameras 301 and 302), the second camera shadow volume replaces the comparison image 402, and the original reference image 401 continues to be used as the reference image. Through the same procedure (using the procedures 506, 507, 508, and 509 in the second parallel matching procedure 5 10), the best matching feature of the third video is determined. If there are more than three cameras available, this procedure can be repeated on other camera images. When the best matching pair of any reference feature has the same matching feature in the reference image, it is eliminated in the comparison process 5 丨丨. Therefore, many erroneous matches eliminate the erroneous distance caused by occlusion. The result of the above procedure is the depth description map 512, which describes the positions and inconsistencies of the features related to the images 401 and 402. Coordinate system conversion program 5 i 3 Use the following Equation 1, Equation 2 and Equation 3 to change the position and inconsistency. ___ -23- This paper size applies to China National Standard (CNS) A4 (210 X 297 mm) 561423

換到反覆三維世界座標系統(x，y，z座標系統)(圖κ 1〇6)。由於與距離並非直線關係，故很難與不一致共同運作。因此，一般在此時會使用以上三個方程式，使景象描述203的座標可描述成與世界座標系統1〇6相關之線性距離重新分配特徵座標將影響區域内的特徵密度，使分群特徵處理變得更爲困難（於稍後步驟中執行）。^此，通常可維持以視頻爲主的座標與已轉換座標。轉換·程序士13所產生的轉換深度描述映射係景象描述203 (圖2中）。由景象分析程序204使此資訊變得有意義並從中得到有效的資料。一般而言，景象分析程序2〇4需依賴本系統所使用的特定方式。圖6之流程圖概括景象分析程序2〇4的實行方法。在景象分析程序204中，景象描述203内的特徵由特徵排除模組 60 1濾波以排除指示特徵不屬於使珂者或位於所想區域丨〇2 之外的位^特徵。模組601也消除背景和其他相等的牽引 (如使用者-的遠方有其他人）。一般將所想區域102定義爲對準世界座標系統i 06的邊界匣。在這個情形下，模組60 1可輕易地檢查各特徵座標是否位於邊界E中。部份背景可偵測在所想區域102内，或所想匣形區域無法將使用者由背景中區隔出來（特別是在密閉空間中的時候）。若所想區域102中沒有使用者，可隨機取樣景象描述二 203並由背景取樣模組602修改之，以產生背景參考603。背景參考603爲景象外形的説明，其不改變對景象外觀的 -24- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐） 561423 A7 B7 五、發明説明（22 改變（如亮度的改變）。因此僅在建立系統1〇〇時可有效取樣景象，只要景象維持不變，則參考將保持其有效性。爲保證已觀察背景維持在背景參考6〇3所定義的外形内，背景取樣模組602可短時間觀察景象描述2〇3並爲所有位置記錄最接近的照相機103。此外，由事先決定的距離進一步擴張特徵所定義的値（一般距離符合特徵距離上不一致的像素k化）。一旦芫成取樣，可比較此背景參考6〇3與景象描述203，且街特徵排除模組6〇丨消除背景參考上或後之景象_ 描述2 0 3内的任一特徵。排除特欲後，下個步驟爲利用特徵分群程序6〇4將剩餘的特徵分群到一個或更多的特徵集合中。比較事先定義範圍内各特徵與其鄰近特徵。特徵在其影像座標中較在其他轉換座標中更易於均勾分配’故一般使用影像座標測量都近距離。事先定義最大可接收範圍’並依賴所使用之特定立體分析程I ’如立體分析程序2〇2。上述立體分析程序2〇2 產生相關密度與平均分配特徵’此方法較其他立體處理技術更易於分群。符合標準的特徵對可視爲鄰近特徵對，對、事先定我的範圍檢查其依賴不一致的轴心距離（X轴中照相機位於所想區域之前’或y軸中的照相機位於所想區域 =若有穿過結合特徵之特徵對的路徑，此路徑上的特啟，付合標準，則群包括不符合標準的特徵對。。ΐ續Λ實Λ方法，利用群遽波程序605遽波群以確保群的 ::貝…表現於所想區域102内的物體，而非立體處理歹'仃心序中錯誤定義之位置（或不—致）的特徵結果。部份 -25- 297公釐) 本紙張尺歧 561423 A7 _______B7_ 五、發明説明（23 ) 的群滤波程序605也將包含過少特徵並提供尺寸大小，外觀或位置之確認測量的群消除。測量群的區域，邊界大小與特徵數並與描述測量最小品質之事先定義的臨界値相比較。不符合標準的群與其特徵將從其他考量中刪除。由本實行方法中的表示偵測模組606決定人物是否出現。表示偵測模組606爲選擇性的，其因不是所有系統皆要求成分提供此資訊。在其最簡易的方式中，表示偵測模二組606僅需f查事先定義之表示偵測區域6〇7邊界内的特徵· J (並非之前所消除的）是否出現。表示偵測區域6〇7爲任一使用者101部份可能佔用的區域，若沒有使用者則此區域不可能被任何物體佔用。一般表示偵測區域6〇7符合所想區域102。然而在本系統之特定裝置中，定義此表示偵測區域607以避免景象内不動的物體。在本成分所使用之實行方法中，若發現使用者1 〇 1則可省略，進一步的處理。在所述4統100的實行方法中，定義手偵測區域1〇5。利 _ 用此方法定義的區域1〇5(透過程序609)係依賴下述中系統所使用的結構。該裎序可選擇性分析使用者身體並返回包括身體位置/測量資訊610等其他資訊，例如人的頭部位置。手偵測區域105不需有任何内容或只需包含人的手部或合適的指示物。任一尚未被濾波消除或具有手偵測區域 105特徵的群可視爲或包括手部或指示物。爲各群計算位二置（透過程序611)，若該位置位於手偵測區域1〇5内，則（在 &己憶體中）記錄爲手位置座標612。測量此位置作爲權重裝 ____-26- 本紙張尺度適用中國國家標準(CNS) Α4規格(210 X 297公釐)Switch to the iterative three-dimensional world coordinate system (x, y, z coordinate system) (Figure κ 106). Since distance is not linear, it is difficult to work with inconsistencies. Therefore, the above three equations are generally used at this time, so that the coordinates of the scene description 203 can be described as a linear distance related to the world coordinate system 106. Reassigning the feature coordinates will affect the feature density in the area and make the clustering feature processing change. It is more difficult (implemented in a later step). ^ This usually maintains video-based and transformed coordinates. The conversion depth description mapping system generated by the conversion programmer 13 is the scene description 203 (in FIG. 2). The scene analysis program 204 makes this information meaningful and obtains valid information from it. In general, the scene analysis program 204 depends on the specific method used by the system. The flowchart of FIG. 6 summarizes the execution method of the scene analysis program 204. In the scene analysis program 204, the features in the scene description 203 are filtered by the feature exclusion module 601 to exclude features indicating that the features do not belong to the user or are located outside the desired area. Module 601 also eliminates background and other equivalent tractions (such as the user-there are others in the distance). The desired area 102 is generally defined as a bounding box aligned with the world coordinate system i 06. In this case, the module 601 can easily check whether each feature coordinate is located in the boundary E. Part of the background can be detected in the desired area 102, or the desired box-shaped area cannot distinguish the user from the background (especially when in a confined space). If there are no users in the desired area 102, the scene description two 203 may be randomly sampled and modified by the background sampling module 602 to generate a background reference 603. The background reference 603 is a description of the appearance of the scene, which does not change the appearance of the scene. -24- This paper size applies the Chinese National Standard (CNS) A4 specification (210 X 297 mm) 561423 A7 B7 V. Description of the invention (22 Changes (such as Changes in brightness). Therefore, the scene can be effectively sampled only when the system is established at 100. As long as the scene remains unchanged, the reference will remain valid. To ensure that the observed background remains within the shape defined by background reference 603 The background sampling module 602 can observe the scene description 203 for a short time and record the closest camera 103 for all positions. In addition, the distance defined by the feature is further extended by the predetermined distance (generally the distance meets the inconsistent pixels on the feature distance) k)). Once the sampling is completed, the background reference 603 can be compared with the scene description 203, and the street feature exclusion module 60 can remove the scene above or behind the background reference _ description any feature in 203. After eliminating the desire, the next step is to use the feature grouping program 604 to group the remaining features into one or more feature sets. Compare each feature in the previously defined range with Proximity features. Features are easier to distribute in their image coordinates than in other transformation coordinates. Therefore, image coordinates are usually used to measure close distances. The maximum receivable range is defined in advance and depends on the specific stereoscopic analysis process I 'used, such as Stereo analysis program 202. The above stereo analysis program 202 generates correlation density and average distribution features. This method is easier to group than other stereo processing technologies. Feature pairs that meet the standard can be considered as neighboring feature pairs. The range check depends on the inconsistent axial distance (the camera in the X axis is located before the desired area 'or the camera in the y axis is located in the desired area = if there is a path through the feature pair of the combined feature, the special on this path, If the standard is met, the group includes feature pairs that do not meet the standard. Continued Λ Real Λ method, using the group 遽 wave program 605 遽 wave group to ensure the group's :: be ... objects in the desired area 102, and Non-stereoscopic processing 歹 '仃 Character sequence at the wrongly defined position (or not) in the heart order. Part -25- 297 mm) This paper rule 561423 A7 _______B7_ 5 The group filtering program 605 of the invention description (23) also eliminates a group that contains too few features and provides confirmation measurements of size, appearance, or position. The area, boundary size, and number of features of the measurement group are compared with a previously defined description that describes the minimum quality of the measurement. The critical threshold is compared. Groups that do not meet the criteria and their characteristics will be deleted from other considerations. The representation detection module 606 in this implementation method determines whether a person appears. The indication detection module 606 is selective, because not all The system requires the component to provide this information. In its simplest way, the detection detection module two group 606 only needs to check the previously defined characteristics within the detection area 607 boundary. J (not previously eliminated) Whether it appears. It indicates that the detection area 607 is an area that may be occupied by any user 101. If there is no user, this area cannot be occupied by any object. It generally indicates that the detection area 607 corresponds to the desired area 102. However, in certain devices of this system, this means that the detection area 607 is defined to avoid moving objects in the scene. In the implementation method used in this component, if the user is found to be 101, it can be omitted and further processed. In the implementation method of the above-mentioned system 100, a hand detection area 105 is defined. Benefit _ The area 105 (through program 609) defined by this method depends on the structure used by the system described below. This sequence can selectively analyze the user's body and return other information including body position / measurement information 610, such as the position of the person's head. The hand detection area 105 need not have any content or need only include a human hand or a suitable pointer. Any group that has not been filtered out or has the characteristics of the hand detection area 105 can be regarded as or include a hand or a pointer. Two positions are calculated for each group (through program 611). If the position is within the hand detection area 105, it is recorded (in & memory) as the hand position coordinate 612. Measure this position as a weight. ____- 26- This paper size applies to China National Standard (CNS) Α4 (210 X 297 mm)

裝訂Binding

561423 A7 B7 五發明説明（ 24 置。識別距離手偵測位置區域入口（範例中之1 002)最遠的群特徵（由圖10所述之範例1005識別），且根據表示手指或指示物頂端的可能假設給其位置1權重。其餘群特徵權重係基於返回其特徵的距離，利用下述方程式4的公式。若此應用僅需一個手位置且多項群具有手偵測區域1〇5内的特徵，提供距離入口 1002最遠的位置作爲手位置612並捨棄其他位置。因此，使用可最遠觸及手偵測區域1〇5的二：手。否則，一若有兩個以上具有手偵測區域丨〇5内特徵的群，-二距離入口 1 〇〇 2最遠的距離以及次遠距離的位置皆可作爲手垃置6 12，並捨棄其他位置。當上述規則導致群被包含並用於代替不同群時，則此被包含的群在手位置資料6丨2中被標記。在照相機的方向可偵測人手臂的架構中，將此方向表示爲手臂或指示物之手方向座標613”並可被手方向計算模組614所1t算。若照相機1〇3的高度與手偵測區域105同 · 南’其中包括照相機103高於手偵測區域1 〇5的架構。可由群的標準座標軸表示此方向，其由群的時刻中所計算而出0 以下是在未平均分配特徵時，同樣可產生好結果的方法。手臂可進入手偵測區域1 〇 5的位置，其中群被手偵測區域105邊界所形成的平面分割。該位置與手位置座標“^ 間的向量提供手方向座標613。二動態平滑程序615可用於手位置座標612，手方向座標 613(若已解決）與任一其他身體位置或測量61〇。平滑爲結 ____-27- 本紙張尺度適财國因冢標準(CNS) A4規格(210X297公爱) 561423 A7 ____B7 五、發明説明（25 ) 合結果與之前所解決之結果，使姿態在訊框中保持不變的程序。特殊座標値之一特殊平滑，其中各座標成分X，y，Z 皆被個別並動態平滑。下述方程式5計算削減S的程度，其中依照位置的改變動態且自動調整S。圖7所示之距離臨界値DA與Db定義姿態的三種範圍。對於小於DA的位置改變，由S a將姿態在區域7 〇 1重重地削減，藉以減少兩個鄰近値間的値往後和往前轉換的傾向（一種影像抽象取樣的邊際效用）。大於Ib的位置改變則由SB在區域702中做輕微的削-減，或不削減。這個步驟減少或消除其他平化程序所示之延遲與含糊不清的地方。對於DA與DB間的行動，削減程度有所不同，此區域爲703，故較不易注意到輕微與重大削減間的變化。下述方程式6係用於解答常數a，其係用於方程式7 (如下所述）以便修改座標。動態平滑程序61 5之結果爲圖2之手/物體位置資訊205。當程序611標記和之前位置屬於不同群功位置時，由於目前與之前的位置不同，故不使用平滑步驟609利用所述方法決定手偵測區域1〇5係依照使用影像控制系統100的方式。兩種方式皆描述於本文中。最簡易的手4貞測區域1 〇 5爲事先決定之固定區域，其不包含任何物體或僅包含人的手或指示物。如圖8所述，此定義所使用之方式爲用於控制個人電腦之使用者介面的系統 100使用方法，其中手偵測區域105係電腦顯示監視器之前以及電腦鍵盤802上的區域。在電腦傳統的使用中，使用者手或其他物體通常不進入此區域中。因此任一在手偵測 2 8 - 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐)一 A7 B7561423 A7 B7 Five invention description (24 positions. Identify the group feature (recognized by the example 1005 described in Figure 10) that is farthest from the entrance of the hand detection position area (1 002 in the example), and according to the top of the finger or pointer It is possible to assume that its position is given a weight of 1. The remaining group feature weights are based on the distance returned to its features, using the formula of Equation 4 below. If this application requires only one hand position and multiple groups have a hand detection area 105 Feature, providing the position farthest from the entrance 1002 as the hand position 612 and discarding other positions. Therefore, use the two that can reach the hand detection area 105 as far as: hand. Otherwise, if more than two have hand detection The group of features in the area 丨〇5, the second longest distance and the second longest distance from the entrance 1 002 can be used as the hand position 6 12 and other locations are discarded. When the above rules cause the group to be included and used When replacing different groups, this included group is marked in the hand position data 6 丨 2. In the structure where the direction of the camera can detect the human arm, this direction is expressed as the direction of the hand of the arm or pointer The mark 613 "can be calculated by the 1t of the hand direction calculation module 614. If the height of the camera 103 is the same as the south of the hand detection area 105, including the structure of the camera 103 higher than the hand detection area 105. The standard coordinate axis of the group indicates this direction, which is calculated from the moment of the group. The following is a method that can also produce good results when the features are not evenly distributed. The arm can enter the position of the hand detection area 1 0, where The group is divided by the plane formed by the boundary of the hand detection area 105. The vector between this position and the hand position coordinate "^ provides the hand direction coordinate 613. Two dynamic smoothing programs 615 can be used for the hand position coordinate 612, and the hand direction coordinate 613 (if the (Solved) with any other body position or measurement 61. Smoothness is knot ____- 27- This paper size is suitable for the country's national standard (CNS) A4 specification (210X297 public love) 561423 A7 ____B7 V. Description of the invention (25) The procedure of combining the result and the previous solution to keep the attitude unchanged in the frame. One of the special coordinates 値 is specially smoothed, where each coordinate component X, y, Z is individually and dynamically smoothed. Calculated by Equation 5 below reduce The degree of S, which dynamically and automatically adjusts S according to the change of position. The distance critical 値 DA and Db shown in Figure 7 define three ranges of attitude. For position changes less than DA, the attitude will be in the area 701 by Sa. Ground reduction in order to reduce the tendency of backward and forward transitions between two adjacent borders (a marginal utility of image abstract sampling). Position changes greater than Ib are slightly trimmed-reduced by SB in area 702, or No reduction. This step reduces or eliminates the delays and ambiguities shown in other flattening procedures. For DA and DB actions, the degree of reduction is different. This area is 703, so it is less easy to notice slight and major Change between cuts. Equation 6 below is used to solve the constant a, which is used in Equation 7 (described below) to modify the coordinates. The result of the dynamic smoothing program 61 5 is the hand / object position information 205 in FIG. 2. When the program 611 mark and the previous position belong to different group power positions, since the current position is different from the previous position, the smoothing is not used. Step 609 uses the method to determine the hand detection area 105 according to the way the image control system 100 is used. Both approaches are described herein. The simplest hand measurement area 105 is a fixed area determined in advance, which does not contain any objects or only human hands or pointers. As shown in FIG. 8, the method used for this definition is the method of using the system 100 for controlling the user interface of a personal computer, in which the hand detection area 105 is the area before the computer display monitor and on the computer keyboard 802. In traditional computer use, the user's hand or other objects usually do not enter this area. Therefore, any detection in hand 2 8-This paper size applies to China National Standard (CNS) A4 (210 X 297 mm)-A7 B7

示Show

裝 561423 五、發明説明（26 區105内移動足物體可解釋爲使用者1〇1執行使用手戈浐物的成果，其中指示物可爲任一適合於執行指示，^人姿態的物體’例如鉛筆或其他適合的指示裝二、應：：乙體分析程序2G2的特定t行方法可利用作爲^物之= 類型或外觀的限制。此外，可定義上述選擇表示偵測區：為區域8G1以包含此方式中的使用者頭部。影像偵測哭103 可置於監視器108之上。 ° 一些方式一年的手偵測區域105可動態定義相關於使用者— 身體且不包含任何物體或僅包含人物的手或指示物。動態區域的使用移除事先決定位置上之使用者的限制。圖述使用本實行方法之方式。圖9詳細説明選擇動態手偵測區域位置程序6〇9的實行方法。在此程序中已解答出各座標軸上之手偵測區域、ι〇5的位置，而此手偵測區域105的大小與方向則是由事先定義之説明所復定。圖l〇A_loc之範例係用於描述此程序。使用群資料901(圖6群濾波程序605之輸出），區塊9〇2所述之程序包含尋找平面1001的位置（這類軀幹分割平面描述於圖ioc所述之側視野中），其方向與收偵測區域1〇5之邊界1002平行且使用者1〇1可經此達到此位置。若想將特徵平均分配在原始影像上（如同實行上述立體分析程序2〇2 之方法時），則多數剩餘特徵將屬於使用者軀幹而非使用者的手。在這個情形下可定位平台1001，以便將特徵分割爲二數量相同的兩個群組。若不想平均分配特徵（如同實行上述立體分析程序202之其他方法時），便無法成立上述的假 -29- 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 訂Equipment 561423 V. Description of the invention (the moving foot object in the 26 area 105 can be interpreted as the result of the user's implementation of the use of hand gestures, and the pointer can be any object suitable for performing instructions, such as a person's posture. Pencil or other suitable indicator. 2. Should: The specific t-line method of B-body analysis program 2G2 can be used as a restriction of type or appearance. In addition, the above selection can be defined to indicate the detection area: for area 8G1 to Including the user's head in this way. The image detection cry 103 can be placed on the monitor 108. ° In some ways, the hand detection area 105 for one year can be dynamically defined in relation to the user — the body and does not contain any objects or Contains only people's hands or pointers. The use of dynamic areas removes the restriction of users at predetermined locations. The method of using this implementation method is illustrated. Figure 9 details the process of selecting the position of the dynamic hand detection area 609. Implementation method. In this program, the positions of the hand detection area and ι05 on each coordinate axis have been solved, and the size and direction of the hand detection area 105 are reset by a previously defined description. The example in Figure 10A_loc is used to describe this procedure. Using group data 901 (the output of the group filtering program 605 in Figure 6), the procedure described in block 902 includes finding the location of plane 1001 (this type of torso segmentation plane) (Depicted in the side field of view described in Figure 10c), its direction is parallel to the boundary 1002 of the detection area 105 and the user 101 can reach this position. If you want to distribute the features evenly on the original image ( As when implementing the method of the above-mentioned three-dimensional analysis program 202), most of the remaining features will belong to the user's torso rather than the user's hand. In this case, the platform 1001 can be positioned to divide the features into two equal numbers Group. If you don't want to distribute the features evenly (as when implementing the other methods of the three-dimensional analysis program 202 above), the above false-29 cannot be established. This paper size applies the Chinese National Standard (CNS) A4 specification (210X297 mm).

561423 五、發明説明（ A7 B7 27 。，客 Try 丄人 - ^ I市成群之外部邊界的特徵仍將期望屬於軀 $在k個h形下可定位平台1 〇〇丨，以便將最外圍的特徵、^!爲數量相同的兩個群組。在另一個情形下，區塊9〇2 (躯幹分割程序定位平台1001，以通過使用者軀幹。序區塊903決定座標軸上的手偵測區域1〇5位置，一般 ^ 仏置爲上述中的平台1001。定義手偵測區域105爲二口 1001則的事先決定距離1004，並定位於使用者身體刖。如圖1中^的情形，距離1004決定z軸上的手偵測區域105-位置。右使用者頭部完全位於所想區域102内，則期望群的最高特徵位置表示使用者的頭頂（因此也包含使用者之高度）並建亙於此實行方法的程序區塊9〇4中。在程序區塊9〇5中，根據此頭邵位置定位手偵測區域1〇5的位置，一事先定義距離是位於使用者頭頂下。在圖丨的情形中，事先定義的距離決定4軸上的手偵測區域位置。若無法測量使用者高度’或群達到所想區域102的邊界（意指此人物延伸到所想區域102之外），便在事先定義的高度定位手偵測區域丨…。在許多模式中，可決定使用者左手臂或右手臂是否聯繫圖6之位置計算區塊611所偵測的手。在程序區塊9〇6中決定手臂又又平面1001前之事先定義平面的位置。一般而言，平面符合1002所指示的手偵測區域邊界。若無接近此平面之特徵，但於此平面前發現若干特徵，則該等特徵阻擔與平面的交叉點，並假設交又位置位於阻擋的特徵之後。利用區塊特徵間最短的鄰近距離，各交又點可聯繫手裝訂561423 V. Description of the invention (A7 B7 27), the characteristics of the outer boundary of the city group will still be expected to belong to the body. Under the k h-shaped positions, the platform 1 〇〇丨 can be located The characteristics of ^! Are two groups with the same number. In another case, block 902 (the torso segmentation program locates platform 1001 to pass the user's torso. Sequence block 903 determines the hand detection on the coordinate axis. The position of the area 105 is generally set to the platform 1001 described above. The hand detection area 105 is defined as the second mouth 1001 and the predetermined distance 1004 is determined and positioned at the user's body. As shown by ^ in FIG. 1, The distance 1004 determines the 105-position of the hand detection area on the z-axis. The head of the right user is completely located in the desired area 102. The highest characteristic position of the expected group is the top of the user's head (and therefore also the user's height). Built in the program block 904 where the method is implemented. In program block 905, the position of the hand detection area 105 is located according to this head position. A predefined distance is located under the user's head. In the case of Figure 丨, the Determine the position of the hand detection area on the 4 axis. If the user's height cannot be measured or the group reaches the boundary of the desired area 102 (meaning that the character extends beyond the desired area 102), it will be positioned at a predefined height Hand detection area 丨 ... In many modes, it can be determined whether the user's left or right arm is connected to the hand detected by the position calculation block 611 in FIG. 6. In program block 906, it is determined that the arm is flat again. The location of the plane is defined before 1001. In general, the plane meets the boundary of the hand detection area indicated by 1002. If there are no features close to this plane, but some features are found in front of this plane, these features will block the plane The intersection point is assumed to be behind the blocking feature. Using the shortest proximity distance between block features, each intersection point can be contacted for hand binding

-30--30-

561423 A7561423 A7

點0 也可於程序區塊9〇7發現使用者身體中部與使用者身骨# 邊界的位置。-般而言’被均勻分佈特徵的位置爲= 有特徵的主要位置。若不期望均勾分配特徵，則使用另一個群邊界間位置中點的測量方法。在程序區塊908中，將由程序區塊9〇6發現之屬於手臂的位置與程序區塊907所發現的身體中心位置相比。若手臂位置充分地一抵銷身體中心位置的左邊或右邊，則表示手; 的來源爲使用者101的左肩膀或右肩膀。若發現兩手但僅有一手被確定標記爲左手或右手，則也包含另一手之標記。因此，根據群的結構將手標記爲左手或右手，以確: 許多形式中的適當標記，其中可發現兩手且左手位置在右手位置的右方。Point 0 can also find the position of the boundary between the middle of the user's body and the user's bone # in the program block 907. -In general, the position where features are uniformly distributed is = the main position with features. If you do not expect even-hook distribution, use another method of measuring the midpoint of the position between group boundaries. In program block 908, the position belonging to the arm found in program block 906 is compared with the body center position found in program block 907. If the position of the arm sufficiently offsets the left or right of the center position of the body, it indicates the hand; the source is the left or right shoulder of the user 101. If two hands are found but only one is identified as left or right, the other hand is also included. Therefore, the hand is marked as left- or right-handed according to the structure of the group to determine: proper markings in many forms where both hands can be found with the left-hand position to the right of the right-hand position.

若由程序區塊908識別到手，則可定位（由程序區塊9〇9) 手偵測區婆10 5位置，使此手偵測區域1 〇 5可位於期望之結合使用者手的行動範圍中。剩餘座標軸上的手偵測區域 105的位置可偏向方程式8所定義之手臂的手臂（如下所述）。若程序區塊908無法識別此手臂，或另有其他辦法，則剩餘座標軸上手偵測區域1 〇 5的位置可定位於程序區塊 907所發現之使用者身體的中央。在必須追蹤兩手的形式中，手偵測區域105可定位於使用者身體的中央位置。程序區塊903，906與909各自解答一座標軸内的手偵測區域1 05的位置，並一起定義此手偵測區域} 〇5在三維空間内的位置。一動態平滑程序910利用成分615所使用的相同方 _— _____-31- 本紙&尺度適财S S家純CNS) Α4規格(21GX 297公Θ 561423 A7 B7 五、發明説明（29 ) 法（利用方程式5，方程式6與方程式7)平滑該位置。然而，則減的較高層係使用於程序91 〇中。由動恐、平滑程序9 10輸出之平滑位置資訊，以及事先定義之大小與方向資訊9 11完整地定義手偵測區域丨〇5的邊界。解答手偵測區域105之位置的過程中，程序區塊905，907與 9〇8發現多種使用者之其他身體位置測量方法913(圖6之程序 6 1 0) 〇總之，由-圖6所描述之使用所有包含圖9中隨機成分的實行方法產生景象中人物的描述（如圖2之手/物體位置資訊 205所示），其中所包含之資訊如下： -使用者存在/不存在或數量 -對各個出現的使用者：〇身體或軀幹的左/右邊界〇身體或軀幹的中心點 · 〇頭(若頭部位於所想區域内）〇對出現的手：手偵測區域標記左/右的（若可偵測）指尖位置手或前臂的方向由景象描述203分析之已知改良方法，本文所述之實行方法詳細説咏此使用者（例如識別肘部位置）。手/物體位置資訊205係一種次集合資訊或矸包含於上述資訊的其他資訊，可允許使用者與多種應用稃式208互動 -32- 本紙張尺度適财國國家料(CNS) A4規格(210 X 297公董_) 561423If the hand is identified by program block 908, the position of hand detection area 105 can be located (from program block 109), so that the hand detection area 1 05 can be located in the desired range of action combined with the user's hand in. The position of the hand detection area 105 on the remaining coordinate axis can be biased toward the arm of the arm defined by Equation 8 (as described below). If the program block 908 cannot identify the arm, or if there is another method, the position of the hand detection area 105 on the remaining coordinate axis can be located at the center of the user's body found in the program block 907. In a form in which both hands must be tracked, the hand detection area 105 can be positioned at the center of the user's body. Program blocks 903, 906, and 909 each solve the position of the hand detection area 105 in an axis, and define this hand detection area} 〇5 position in three-dimensional space. A dynamic smoothing program 910 uses the same method used by component 615 __ _____- 31- paper & scale suitable financial SS home pure CNS Α4 specification (21GX 297 public Θ 561423 A7 B7 V. Description of the invention (29) method (utilization (Equation 5, Equation 6 and Equation 7) smooth the position. However, the higher layers that are subtracted are used in the program 91 0. The smooth position information output by the dynamic and smoothing programs 9 10, and the size and direction information defined in advance. 9 11 completely defines the boundary of the hand detection area 丨 05. In the process of solving the position of the hand detection area 105, program blocks 905, 907, and 908 find other methods for measuring the body position of various users 913 (Figure The procedure of 6 6 1 0) 〇 In short, the description of the characters in the scene is generated from the implementation method described in FIG. 6 using all the implementation methods containing the random components in FIG. 9 (as shown in the hand / object position information 205 in FIG. 2), where The information contained is as follows:-user presence / absence or quantity-for each user present: o left / right border of the body or torso o center point of the body or torso · if the head is in the desired area Inside) 〇 pair Appearing hand: Hand detection area mark left / right (if detectable) Fingertip position The direction of the hand or forearm is a known and improved method analyzed by scene description 203. The implementation method described in this article details this user (Such as identifying the position of the elbow). Hand / object position information 205 is a sub-collection or other information included in the above information, which allows users to interact with a variety of applications 208-32- (CNS) A4 size (210 X 297 public director_) 561423

與/或控制應用程式208。以下詳細說明三種應用的控制方法。透過處理上述資訊可偵測多種人類的姿態，以上處理不受限於應用208與下列所述之特殊控制分析。這類姿態的範例爲未處理空中的軌道或將手擦向另一邊。一般而言，由姿態分析與偵測程序209所偵測之姿態類型係使用手/物體位置資訊205。And / or control application 208. The control methods of the three applications are explained in detail below. A variety of human poses can be detected by processing the above information. The above processing is not limited to the application 208 and the special control analysis described below. Examples of such gestures are untreated air orbits or rubbing hands to the other side. Generally, the type of posture detected by the posture analysis and detection program 209 uses hand / object position information 205.

可使，探技術偵測姿態的狀態。偵，測程序2〇9維持手-與身體位置的所有改變過程。其中—種偵測姿態的方法爲檢查位置是否明確通過規則限定。舉例説明，若達成下列姿態偵測規則，則可識別將手擦去另一邊的姿態： L 段少於事先定義之限定時間内的水平位置變化大於事先定義的距離。 2.水平位置在一段時間内保持一貫的改變。 3 · #又日t間内之頂點位置的改變小於事先定義的距離。 4·時間結束的位置比時間開始的位置更接近（或正好位於）手偵測區域的邊界。一些安態要求許多規則限定需按照明確的順序達成，藉使此結果能導致此系統改變成使用不同規則限定的狀態。此系統不能偵測細微的姿態，在此情形下便可使用Hidden Markov模組，由於此模組仍允許偵測一系列的特定行動，同時也考慮$亥行動疋否完全符合一姿態的整體可能性。本系統之實行方法提供一種使用者互動的方法，使用者可藉此方法導致一指示器之表示在影像内移動（使用者反 -33- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐) 561423 A7 _ —_ B7__ 五、發明説明（31 ) 馈206)，上述内容係表現於視頻顯示1〇8上。此指示器依照反應使用者手部姿態的方式移動。在一個使用者介面形式的變化中，指示器顯示於其他圖像之前，並映射其姿態到視頻顯示螢幕108之介面所定義的二維空間。此控制形式與桌上型電腦一般使用滑鼠的形式相似。圖11A描述使用此類型控制之應用程式208的反饋影像2〇6範例。 ^ 在位置映Γ射程序207中，透過以下所述方法，將先前所述 1 之景象分析程序204所偵測的手位置205映射到景象指示物或游標1101覆蓋到視頻顯示、1〇8所示之螢幕影像206上的位置。當偵測到一隻手並發現其位於手偵測區域105内時，於傳遞此手位置205到應用程式208之前，將相關於手偵測區域1〇5的手位置205映射到訊顯示器1〇8。一種映射座標的方法係透過方程式9 (如下所述）的應用得到X座標與y座標。如圖iJB所述，由一整個包含於手偵測區域11〇4(近似 . 於手偵对區域105)内的次區域1103標示整個顯示區域 1102。次區域11 〇3内的位置（如手位置11 〇5)直線映射到顯示區域1102内的位置（如1106)。位於次區域Π03之外，但仍位於手偵測區域1104内的位置（如11〇7)映射到顯示區域 1102邊界上最接近的位置（如11〇8)。這降低使用者試圖移動游標110 1到接近顯示器邊界的位置時，不小心將手從次區域1103消除的可能性。若在手偵測區域1 〇5内偵測出使 -用者的兩手，便於位置映射程序207中選擇一手。一般而言，將選擇能伸最遠到手偵測區域105的手。由於此手具 -34- 本紙張尺度適财S S家標準(CNS) A4規格(210 X 297公I) - 561423 A7 B7 五、發明説明（32 ) 有最大或最小的X，y，z座標値（視此系統之配置與世界座標系統106之定義而定），故能偵測得到。使用此互動形式之應用通常呈現資料或控制的圖式（如按鈕1 109)。使用者期望將導致指示器1 101被定位於其中_ 個物體上。藉由比較重新映射指示器位置1 1 06與物體圖式邊界（如1110)，可偵測以上條件，若指示器位置在物體邊界内，則成立此條件。使用者隨機接收反饋，其指示群係定位於一物聽上。反饋可有許多種形式，其中包括音頻信號與/或群與/或物體的圖形改變。接著使用可活化，操作或移動群下方的物體。使用者期望將指示其目的，透過姿態的執行以活化，操作或移動物體。在本文所示之系統實行方法中，姿態分析程序209識別由景象分析程序204與/或位置映射程序207所提供之手位置或其他位置與測量變化中的姿態形式。舉例説明’使甩者可指示一目的以活化位於群下方的物體，該物體將使群保持在物體上方的時間長於事先定義的期間。此姿態偵測需要應用的狀態210，尤其是物體的邊界與/或狀悲需反饋於姿態分析程序209。由於已存在之技術可隱密地監視應用狀態210並可利用位置映射程序207提供之座標’仿效其他如電腦滑鼠的介面裝置，故無需爲系統特別建立應用。在一些形式中，無法得到並監視此應用狀態資訊2丨〇。在此狀況下，指示活化群下方物體的姿態中包括保持手的姿 *態不、·變（超出或將手快速向前或向後伸）。本紙張尺度相中g g家標準(CNS) A视格㈣X撕公爱) 裝訂Enables, detection technology to detect the state of the attitude. Detection, measurement procedures 209 maintain all hand- and body-position changes. One way to detect attitude is to check whether the position is clearly defined by rules. For example, if the following gesture detection rules are achieved, the gesture of wiping the other side can be identified: The horizontal position change of segment L less than a predefined time within a defined time is greater than a predefined distance. 2. The horizontal position changes consistently over time. 3 · # The change in the position of the vertex within day t is less than a predefined distance. 4. The position at the end of time is closer to (or just at) the boundary of the hand detection area than the position at the beginning of time. Some security states require that many rule qualifications be achieved in a clear order, so that this result can cause the system to change to a state qualified using different rules. This system cannot detect subtle gestures. In this case, the Hidden Markov module can be used, because this module still allows a series of specific actions to be detected, and also considers whether the operation of $ HAI fully meets the overall possibility of a gesture Sex. The implementation method of this system provides a method for user interaction. The user can use this method to cause the indication of an indicator to move within the image (user's anti-33- This paper size applies to the Chinese National Standard (CNS) A4 specification (210 X 297 mm) 561423 A7 _ —_ B7__ V. Description of the Invention (31) Feed 206), the above content is shown on the video display 108. This indicator moves in a way that reflects the posture of the user's hand. In a change in the form of a user interface, an indicator is displayed before other images, and its gesture is mapped to a two-dimensional space defined by the interface of the video display screen 108. This form of control is similar to the way a desktop computer typically uses a mouse. FIG. 11A depicts an example feedback image 206 using an application 208 of this type of control. ^ In the position mapping program 207, the hand position 205 detected by the scene analysis program 204 previously described 1 is mapped to the scene indicator or cursor 1101 through the method described below to cover the video display and position 108. The position on the screen image 206 is displayed. When a hand is detected and found to be within the hand detection area 105, the hand position 205 related to the hand detection area 105 is mapped to the information display 1 before the hand position 205 is transmitted to the application 208 〇8. One method of mapping coordinates is to obtain the X and y coordinates through the application of Equation 9 (described below). As shown in iJB, the entire display area 1102 is marked by a whole sub-area 1103 included in the hand detection area 1104 (approximately. In the hand detection area 105). The position in the sub-region 11 〇3 (such as the hand position 11 〇5) is linearly mapped to the position in the display region 1102 (such as 1106). A location (such as 1107) located outside the sub-region Π03 but still within the hand detection area 1104 is mapped to the closest location (such as 1108) on the boundary of the display area 1102. This reduces the possibility of the user accidentally removing his hand from the sub-region 1103 when he tries to move the cursor 110 1 to a position close to the border of the display. If both hands of the user are detected in the hand detection area 105, it is easy to select one hand in the position mapping program 207. In general, the hand that can reach as far as the hand detection area 105 will be selected. Because this hand tool -34- this paper is suitable for SS home standard (CNS) A4 specification (210 X 297 male I)-561423 A7 B7 V. Description of the invention (32) There are maximum or minimum X, y, z coordinates 値(Depending on the configuration of this system and the definition of the world coordinate system 106), it can be detected. Applications that use this interactive form often present data or control schemes (eg button 1 109). The user expects that the indicator 1 101 will be positioned on one of these objects. By comparing and remapping the pointer position 1 1 06 with the object pattern boundary (such as 1110), the above conditions can be detected. If the pointer position is within the object boundary, this condition is established. The user randomly receives feedback, which indicates that the ancestor is positioned on an object. Feedback can take many forms, including audio signals and / or graphical changes to groups and / or objects. Then use objects that can be activated, manipulated or moved under the group. The user is expected to indicate his purpose by activating, manipulating or moving the object through the performance of the gesture. In the system implementation method shown herein, the posture analysis program 209 recognizes the hand position or other position and measurement forms provided by the scene analysis program 204 and / or the position mapping program 207. Illustrate 'so that the shaker can indicate a purpose to activate an object located below the group, which will keep the group above the object for longer than a predefined period. This attitude detection needs the applied state 210, especially the boundary and / or state of the object to be fed back to the attitude analysis program 209. Since the existing technology can secretly monitor the application state 210 and use the coordinates provided by the position mapping program 207 to imitate other interface devices such as a computer mouse, there is no need to specifically create an application for the system. In some forms, this application status information 2 cannot be obtained and monitored. In this case, the attitude of the object below the activation group includes the attitude of keeping the hand. The posture is not changed or changed (beyond or extending the hand quickly forward or backward). Standards in this paper (Chinese Standard) (CNS) A Vision Grid X Tearing Love) Binding

k ^01423k ^ 01423

其中=已偵測超出的方法係經由保留手位置的變化過程，間的包ί所有手位置的記錄與結束最新取樣之事先定義時狀態。該時間表示使用者必㈣持手姿態不變的最短的:。也:於此變化過程中發現各個三維座標（x，y，ζ)内最取小與最大位置。若手出現在變化過程的所有取樣中且 -、瑕大位置間的距離在各個三維座標的事先定義臨界、2，便回報此超出的姿態。距離臨界値表示手允許移動、瑕大量，以及最大變化（或itter期望由系統的各種成分引 ^ 的位置）。回報姿態之一般模仿滑鼠的方法將模仿滑鼠接鍵。同時偵測表示滑鼠其餘操作的姿態，按兩下按艇，並模仿這些操作。二此外’可隨機偵測不限於與物體相關之指示器位置的姿 ^ 且由應用給予與應用狀態有關或無關的意義。一般使用此互動形式之應用並不明確使用•或顯示使用者的手或其他位置。！只依照此系統所做的位置解釋，完全或概要地控制這些應用。由於此系統所做之解釋可用於模仿傳統使用者輸出裝置可執行之姿態，如鍵盤或控制桿，故也無需替系統特別建立應用。寺多有用的解釋係直接依靠手偵測區域1 〇 5内之手的絕對位置。一種產生解釋的方法爲定義匣，平面或其他外形。若在第一匣内發現手位置，且尚未被置於立即優先觀察内（由於手位置在手偵測區域105之内或尚未被偵測到），則啓動一個狀態。維持此狀態直到未在第二匣内（或在第二平面所定義邊界的遠處）發現手位置爲止，並在此時關閉 -36- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐)Where = The detected excess method is the process of retaining the position of the hand, including the recording of all hand positions and the state of the latest definition at the end of the latest sampling. This time indicates the shortest time that the user must hold the same hand posture. Also: During this change, the smallest and largest positions within each three-dimensional coordinate (x, y, ζ) were found. If the hand appears in all the samples of the change process and the distance between the-and the position of the flaw is at the predefined threshold of each three-dimensional coordinate, 2, then the excess attitude is reported. The critical distance 値 indicates the hand is allowed to move, the number of flaws, and the maximum change (or the position where the itter expects to be induced by various components of the system). The general method of mimicking a mouse in return gesture will mimic the mouse connection. Simultaneously detect gestures that indicate the rest of the mouse's operations, double-click the boat, and mimic these operations. In addition, the position and position of the pointer position related to the object may be randomly detected ^ and the meaning given or related to the application state is given by the application. Applications that use this form of interaction generally do not explicitly use or display the user's hand or other location. !! Only in accordance with the position explanations made by this system, control these applications completely or in outline. Since the explanations made by this system can be used to imitate the gestures that traditional user output devices can perform, such as keyboards or joysticks, there is no need to create special applications for the system. Tera's useful explanation is directly relying on the absolute position of the hand within the 105 detection area. One way to generate an interpretation is to define a box, plane, or other shape. If the hand position is found in the first box and has not been placed in the immediate priority observation (because the hand position is within the hand detection area 105 or has not been detected), a state is activated. Keep this state until the hand position is not found in the second box (or far away from the boundary defined by the second plane), and it will be closed at this time -36- This paper size applies the Chinese National Standard (CNS) A4 specification (210 X 297 mm)

裝訂Binding

561423561423

此狀態。第二E必須將整個第一£包含在内，同時尺寸較大。使用較大E可減少手位置接近g邊界時，不小心啓動或關閉裝態的情形發生。一般而言，此狀態所使用之其中一種解釋方法係依賴姿態期望的使用目的。在一個法中，姿態直接反應啓動裝置開與關的狀態。模仿鍵盤鍵或控制桿發射按鈕時，啓動狀態時按下按鈕，關閉裝態時則鬆開按鈕。在其他一般方法中，僅利用狀態從關閉到開啓义狀態的轉變便可引起姿態。雖不將時間與關閉狀態回報給應用，但仍維持此狀態，使姿態於狀態關閉前將不會重覆，因此各個姿態範例皆需要使用者清楚定義的目的。一般所使用的第三種方法是利用從關閉到開啓之狀態的轉變引起姿態，並依與狀態開啓時間長度相同之事先定義的時間間隔定期重新引起姿態。此模仿方式保持鍵盤按鍵向下將導致特徵在某些應用中重覆。. 一種在兔偵測區域1 〇5内定義用於上述技術的匣或平面的方法描述如下。藉由定義分割手偵測區域1〇5爲發射區域1053與中立區域1504區域的第一平面（圖15八中的15〇1)與第一平面1502 (當手在平面間的區域ι5〇5時回報的姿態是依賴上述之手的先前位置），使上述技術可偵測手向前移動’其爲模仿控制桿上發射按鈕或導致應用響應與按下控制桿按奴相關的方式（例如在視頻遊戲中發射武器）。另一種在手偵測區域105内定義用於上述技術的匣或平面的方法描述如下。圖15B描述與角落重疊並被定義將手偵測區域1 05分割爲左，右，上及底部的第一類型平 561423 A7 B7 五、發明説明（35 ) 15 06，1507，1508，1509。第二類型平面被標記爲151〇， 1 5 1 1，1 5 1 2，1 5 13。分別處理第一與第二平面對。此平面組合模仿四個方向的游標鍵，其中角落的手可啓動兩個按鍵，許多應用一般解釋爲四個第二45度（對角線）方向。圖15C描述另一種用於模仿應用之抽象方向與應用的方法’其期望將明確地表示四個4 5度方向狀態。定義各個四個主要（水平和垂直）方向的匣1514，1515，1516，1517與定義各個第二>45度（對角線）方向的匣1518，1519，1520 , 1 52 1。爲清楚説明起見，僅描述第一類型的匣。在g之間定位一缺口。圖15D描述定義鄰近匣的方法。位於第一類型II 1522與1 523間的缺口確保使用者不小心進入匣，而缺口 1524由第二類型匣1 525與1526所填滿，因此系統將回報可一個姿態，直到使用者清楚地想要進入鄰近匣内。此按鈕組合可用於模仿一個八個方向的控制桿發射臺。另一種缉型的姿態依賴行動而非位置，或同時依賴行動與位置。此妥悲之範例爲將手擦往左方。此姿態可用於傳遞應用，此應用將返回前一頁或之前的狀態。透過模仿鍵盤與滑鼠’此姿態產生表示軟體，特別是p〇werp〇int，以和動到表示序列之則_個滑動。透過模仿鍵盤與滑鼠，此姿f產生瀏覽Web的使用者介面以執行與其aek按姐相關的安態。同制，將手擦往右方的㈣爲可用於傳遞一使用者須七動到下-頁或狀態之應用的姿態。舉例説明，此姿態導致表示軟體以移動到表示序列的下一個滑動，並導致瀏覽器軟體移動到下一頁。 "11 - 3 8 - 本紙張尺度適財g @家料-- 561423 A7This state. The second E must include the entire first £ and be larger in size. Using a larger E can reduce the situation of accidentally starting or closing the device when the hand position is close to the g boundary. In general, one of the interpretation methods used in this state depends on the intended purpose of the posture. In one method, the attitude directly reflects the state of the starting device on and off. When mimicking a keyboard key or joystick launch button, press the button when it is on, and release the button when it is off. In other general methods, only the transition from the closed state to the open state can be used to cause a posture. Although the time and the closed state are not reported to the application, the state is still maintained so that the posture will not be repeated until the state is closed. Therefore, each posture example requires a clearly defined purpose by the user. The third method generally used is to use attitude transitions from the closed to the open state to cause the attitude, and to re-initiate the attitude periodically at a pre-defined time interval of the same length as the state on time. Keeping keyboard keys down in this mimicry will cause features to repeat in some applications. A method of defining a cassette or plane for the above technique in the rabbit detection area 105 is described below. By defining the split hand detection area 105 as the first plane of the emission area 1053 and the neutral area 1504 (1501 in Figure 15) and the first plane 1502 (when the hand is in the area between the planes ι505) The gesture of return is dependent on the previous position of the above hand), so that the above technology can detect the hand's forward movement, which is to mimic the launch button on the joystick or cause the application to respond in a manner related to pressing the joystick (such as in the Fire weapons in video games). Another method of defining a cassette or plane for the above technique in the hand detection area 105 is described below. FIG. 15B depicts the first type of flat overlapped with corners and defined to divide the hand detection area 105 into left, right, top, and bottom 561423 A7 B7 V. Description of the invention (35) 15 06, 1507, 1508, 1509. The second type of plane is labeled 1510, 1 5 1 1, 1 5 1 2, 1 5 13. The first and second plane pairs are processed separately. This plane combination mimics the cursor keys in four directions, where the corner hand can activate two keys, and many applications are generally interpreted as four second 45 degree (diagonal) directions. Fig. 15C depicts another method for mimicking the abstract direction and application of an application ', which is expected to clearly represent the four 45 degree direction states. Define each of the four main (horizontal and vertical) boxes 1514, 1515, 1516, 1517 and define each second > 45-degree (diagonal) box 1518, 1519, 1520, 1 52 1. For clarity, only the first type of cassette is described. Locate a gap between g. FIG. 15D describes a method of defining a proximity bin. The gap between the first type II 1522 and 1 523 ensures that the user accidentally enters the box, and the gap 1524 is filled by the second type box 1 525 and 1526, so the system will report a gesture until the user clearly wants to To enter the adjacent box. This button combination can be used to simulate an eight-way joystick launch pad. Another type of stance depends on action rather than position, or on both action and position. This sad example is to rub your hand to the left. This gesture can be used to pass an application, which will return to the previous or previous state. By imitating the keyboard and mouse, this gesture generates the presentation software, especially pοwerp〇int, and moves to and from the sequence to show a slide. By imitating a keyboard and a mouse, this pose f generates a user interface for browsing the Web to perform security related to its aek press sister. With the same system, rubbing the hand to the right is a gesture that can be used to convey an application where the user must move to the next page or state seven times. For example, this pose causes the presentation software to move to the next slide of the presentation sequence and causes the browser software to move to the next page. " 11-3 8-This paper is suitable for g @ 家料-561423 A7

"以下描述一種比先前所述方法更簡易，用於偵測將手擦往左方姿態的方法，其係利用將手偵測區域105分割成透過平面分割的區域之方法。將手偵測區域i 〇5最左部份上的細條紋定義爲左邊區域。手位置的表現方式如以下三種狀態： 1·出現手但手不在左邊區域内 2·出現手且手在左邊區域内 3 ·手偵測ΐ域内沒有出現手上述從狀態1到狀態2的轉換導致偵測程序209進入一狀怨中’並藉此啓動一計時器與等候下一次的轉換。若在一事先決定的時間内觀察到狀態3的轉換，便已發生已回報之將手擦往左方的姿態。此技術一般重覆對右邊，上邊與較低邊，且因爲是在三維中發現手位置，所以也將重覆偵測ulling the hand back。可利用手或軀幹的位置偵測上述的所有姿態。在本系統另一個變化中，使用者導致一個指示器或兩個指示器的表示三維虛擬環境的表示（使用者反饋206)内移動。立體裝置可提供反饋並藉此使各使用者可用肉眼觀看建立深度幻覺的獨特影像，即使無法在許多形式中實踐此系統類型，因此只可隨機使用。然而，也可能包含利用投影轉換執行虛擬環境之物體的深度。圖12A，12B與13 A提供執行此類型之使用方法的範例。參考圖12A，以下描述一種方法，將由之前所述之景象分析程序204所偵測之位置映射程序207内的手位置205映 -39- 本紙張尺度適用中國國家標準(CNS) A4规格(210X297公釐) 561423 A7 B7 五、發明説明（37 射到虛擬環境中定位指示器1 2〇1的位置。相關於手偵測區域105的手位置205，在轉換到應用程式208前，由位置映射程序207映射到相關於視頻顯示！〇8的座標。一種映射座標的方法透過方程式9的應用，得到X座標與y與z座標的等式。除增加第三維之外，此方法與上述方法相似。若使用者有運用所有三維内之指示器1201位置的能力，則使用者101可導致指示器接觸如同眞實環境之虛擬環境的物體（如1202)。此爲使用者與虛擬環境互動的方法。比較表示爲立方體或球形之指示器與物體的邊界（如12〇3與 1204)。兩種邊界交叉的情形指示此指示器接觸到物體。若有被安排好的物體，使用者可能可以導致指示器移動到接觸物體的位置，其中指示器路徑可避免接觸其他任意物體。因此，一個ouch信號表示活化，操作或移動物體的使用者意圖。因此，不同於二維控制”指示器120 1的三維控制消除開j台實施其中一項行動之明確姿態的需要。同樣的，不同鉢二維控制，可在不同深度（如圖12a的檔案匣）安排物體以提供一介面，其更類似使用者於眞實世界執行之熟悉的行動。此外，不受限於相關於一物體之指示器丨2〇 j 位置的姿態可被隨機偵測以指示執行行動的意圖。使用者利用本系統而可能航行於一虛擬環境中。相較於一度出現於使用者反饋206中，透過讓使用者導致所示之物體或資訊的可用選擇，航行允許使用者存取吏多的物體或資訊。使用者101利用隨機形式的航行，漫遊於一虛擬環境’且物體次集合與可用物體係依賴虛擬環境中的使用 -40- 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 561423 A7" The following describes a method that is simpler than the previously described method for detecting the gesture of wiping the hand to the left, which uses the method of dividing the hand detection area 105 into areas divided by planes. The thin stripe on the leftmost part of the hand detection area i 〇5 is defined as the left area. The expression of the hand position is shown in the following three states: 1. The hand is present but the hand is not in the left area 2. The hand is present and the hand is in the left area 3. The hand does not appear in the hand detection area. The above transition from state 1 to state 2 results in The detection program 209 enters into a complaint 'and thereby starts a timer and waits for the next transition. If a state 3 transition is observed within a predetermined time, a gesture of wiping the hand to the left has already occurred. This technique generally repeats the right, upper, and lower edges, and because the hand position is found in 3D, it will also repeatedly detect the ulling the hand back. All the above gestures can be detected using the position of the hand or torso. In another variation of the system, the user causes one or two indicators to move within a representation (user feedback 206) of the three-dimensional virtual environment. Stereoscopic devices provide feedback and thereby allow each user to view the unique image of the illusion of depth created with the naked eye, even if this type of system cannot be practiced in many forms, so it can only be used randomly. However, it may also include the depth of objects that perform virtual environments using projection transformations. Figures 12A, 12B, and 13 A provide examples of how to perform this type of usage. Referring to FIG. 12A, the following describes a method that maps the hand position 205 in the position mapping program 207 detected by the scene analysis program 204 described above. -39- This paper size applies the Chinese National Standard (CNS) A4 specification (210X297). (Centi) 561423 A7 B7 V. Description of the invention (37 Shoot the position of the positioning indicator 1 2101 in the virtual environment. The hand position 205 related to the hand detection area 105 is converted by the position mapping program before being transferred to the application 208 207 is mapped to the coordinates related to the video display! 〇8. A method of mapping coordinates is obtained by applying Equation 9 to obtain the coordinates of the X coordinate and the y and z coordinates. This method is similar to the above method except that the third dimension is added. If the user has the ability to use the position of the pointer 1201 in all three dimensions, the user 101 can cause the pointer to contact an object (such as 1202) in a virtual environment like a real environment. This is a method for the user to interact with the virtual environment. Compare Represents the boundary of a cube or sphere pointer with an object (such as 1203 and 1204). The intersection of two types of boundaries indicates that the pointer is in contact with the object. If arranged The user may cause the pointer to move to a position where it contacts the object, where the pointer path avoids contact with any other object. Therefore, an ouch signal indicates the user's intention to activate, operate or move the object. Therefore, unlike the second The three-dimensional control of the “dimensional control” indicator 120 1 eliminates the need for a clear posture for one of the operations. Similarly, two-dimensional control of different bowls can arrange objects at different depths (such as the file box in Figure 12a) to provide An interface that is more similar to the familiar actions that users perform in the real world. In addition, the gestures that are not limited to pointers related to an object 丨 20j position can be randomly detected to indicate the intention to perform the action. Use Using this system, the user may navigate in a virtual environment. Compared to what once appeared in the user feedback 206, by allowing the user to cause the available selection of the objects or information shown, navigation allows the user to access many objects Or information. User 101 uses a random form of navigation to roam in a virtual environment 'and sub-collections of objects and a system of available things Depends on use in a virtual environment -40- This paper size applies to China National Standard (CNS) A4 (210X297 mm) 561423 A7

Πΐ觸Γ3,表示—範例’其中使用者可漫遊於虛擬空間以接觸表不爲儲存空間之任一物體集合。接考“ ι4 -種使用者漫遊於_虛擬環境的方法 =目=之虛擬環境的方式執行虛擬環境，藉此: 虛挺照相機視野範圍内部，且不被任何虛擬的物體表現給使用纟。在-個稱爲imp⑽n的選擇中j 相機的位置代表虛擬環境中的使用者位置。在另一個選擇中二另一指示器代表虛擬環境中的使用者位置。此指示器可犯爲個代表使用者1〇1的化身（顯示於視頻顯示108)。發生跟奴扣不益之虛擬照相機位置，使此指示器與所有相關於目前使用者位置之使用者的物體可位於虛擬照相機視野範圍内。使用者手，身體或頭部位置皆影響使用者漫遊時的虛擬位置。一表示使用者軀幹中央或頭頂的位置係由本系統的一些實行方法中發現的，特別是在如圖9所述之完全執行隨機姿態分析程序609的實行方法中。使用其中任一位置可使使用者10 1單獨執行手位置的漫遊姿態，允許使用手在漫遊時接觸虛擬物體。請注意這些可接觸物體可固定在相關於虛擬環境的位置，或固定於虛擬照相機的位置，使使用者隨時可用。若沒有可用的位置則需使用使用者手位置控制漫遊。在這個情形下，系統在使用者已漫遊接近可接觸的虛擬物體或以執行一事先定義的姿態時，可自動轉換爲接觸内容。爲提供一表示無對於虛擬位置改變的區域，稱爲死亡區 -41 - 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐)Πΐ touches Γ3, which means-example ', in which the user can roam in the virtual space to contact any set of objects that is not the storage space. Exam "ι4-A way for users to roam in _virtual environment = virtual environment to implement the virtual environment, by which: the camera's field of vision is within the scope of the camera, and is not represented by any virtual objects. -In a choice called impjn, the position of the camera represents the location of the user in the virtual environment. In another option, another indicator represents the location of the user in the virtual environment. This indicator can be regarded as a representative user The avatar of 010 (shown in video display 108). The position of the virtual camera that caused unscrupulous clasps, so that this indicator and all user objects related to the current user position can be located within the field of vision of the virtual camera. Use The position of the user ’s hand, body, or head all affects the virtual position of the user when roaming. One indicates that the position of the center or top of the user ’s torso was discovered by some implementation methods of the system, especially when it is fully implemented as shown in FIG. 9 In the implementation method of the random attitude analysis program 609. Using any of these positions allows the user 101 to perform the roaming gesture of the hand position independently, allowing use Touch virtual objects while roaming. Please note that these accessible objects can be fixed at the location related to the virtual environment or fixed to the position of the virtual camera, so that the user is available at any time. If there is no available location, use the user's hand position control Roaming. In this case, the system can automatically convert to contact content when the user has roamed close to an accessible virtual object or performed a predefined gesture. To provide an area that indicates no change to the virtual position, it is called Dead zone-41-This paper size applies to Chinese National Standard (CNS) A4 (210X297 mm)

裝訂Binding

561423 -42- A7 -、發明説明（39 域，可利用方程式1 〇的虛m γ π ~ 勺應用（及传到y與z座標的相似方程式）重新映射位置，此方法導致圖13B所述之關係。請注意邊界與中土位置需符合手偵測區域105與其中心，或JL他已動態調整以適應使用者的區域。使用軀幹或頭部時，方程式10所使用之邊界與中立位置可依照以下説明調整以適應使用者。首先，方程式10所i 用的中立位置Xc’yc，zc符合使用者身體的中立位置。使用此系..·先後，所有使用者不會處於絕對相同的位置。在使用者已給時間以進入所想區域102後，取樣使用者驅幹或頭部位置並作爲中立位置。定義使用者期望自在移動（咬代表各厘標軸）之距離的行動最大範圍。爲確保使用者移動到位置盡頭時仍能維持於所想區域102内，利用上述姿態 (半個最大範圍的最小範圍’外加各x，y，與z維中典型身組大小的半邵’將中立位置\的邊界限制在所想區域102 内:邊界—b,與\定位於與中立位置相關的位置，各距離中 JL位置半個行動之最大範圍。上述姿態係依照頭部或身體軀幹的位置與/或行動。在這個情形T，使用這些邊界所定義的區$而非手偵 105 〇使用者之水平行動（在圖丨範例中標示爲χ的座標軸上）使，擬環境的視野移向左或向右。方程式1〇轉換之水平位置是作爲虛擬垂直座標軸相關旋轉的速度函數，導致指示器或照相機偏離。也可能使用者的垂直姿態使虛擬視; 私向上或向下（在圖1範例中標示爲y的座標軸上）。方程式本紙張尺度適用巾® S家標準(CNS) A4規格(21GX297公董）裝訂線 561423 五、發明説明 A7 B7561423 -42- A7-Description of the invention (domain 39, the virtual m γ π ~ spoon application of equation 1 0 (and similar equations passed to the y and z coordinates) is used to remap the position. This method leads to the problem described in FIG. 13B Relationship. Please note that the boundary and middle-earth position must conform to the hand detection area 105 and its center, or the area that has been dynamically adjusted to suit the user. When using the torso or head, the boundary and neutral position used in Equation 10 can be determined according to The following description adjusts to suit the user. First, the neutral position Xc'yc, zc used in Equation 10 i matches the neutral position of the user's body. With this system, all users will not be in absolutely the same position. After the user has given time to enter the desired area 102, sample the user's drive or head position and use it as a neutral position. Define the maximum range of action that the user expects to move freely (bite represents each centrifugal axis). To ensure that the user can still stay within the desired area 102 when moving to the end of the position, use the above posture (half of the maximum range's minimum range 'plus each x, y, and typical body groups in the z dimension The size of Ban Shao 'limits the boundary of the neutral position \ to the desired area 102: boundary-b, which is positioned at a position related to the neutral position, and the maximum range of half the action of the JL position in each distance. The above attitude is based on The position and / or action of the head or body torso. In this case T, use the area defined by these boundaries instead of the hand detection 105. The horizontal movement of the user (on the coordinate axis labeled χ in the example in Figure 丨) makes The visual field of the pseudo-environment moves to the left or right. The horizontal position of the transformation of Equation 10 is a function of the speed of the virtual vertical coordinate axis related rotation, causing the pointer or camera to deviate. It is also possible that the user's vertical posture makes the virtual view; private upward Or downward (on the coordinate axis labeled y in the example in Figure 1). Formula This paper size is suitable for towels® S Standard (CNS) A4 size (21GX297 male director) Binding line 561423 V. Description of invention A7 B7

1 〇轉換的垂直位置可直接解釋爲關於水平座標軸的旋轉傾斜角，以定位指示器與/或照相機。所顯示的使用者101行動或出自顯示的使用者行動（在圖1範例中標示爲z的座標幸由上）導致虛擬位置移向前或向後。一類型的行動類似行走，其中指示器與/或照相機保持在虛擬地板上的事先定義高度，並沿著地板的輪廓進行（例如往樓梯上升）。此轉換位置可作爲向量的速度函數，其爲地板所定義之指示器與/ 或照相機方向到平面的推測。另一種行動類型近似於謊報。若有此必要，轉換位置可作爲指示器與/或照相機方向所定義之向量的速度函數。圖14描述利用所述控制謊報方法以航行於一虛擬環境的範例。本範例中使用由先前描述方法與利用方程式10之映射與如先前所述之適合的中立位置所發現的使用者軀幹位置。不論是否爲使用者用於控制或漫遊虛擬環境的方法，使用於虛擬環境中的指示器可採用化身的形式。一化身通常採用如圖]4之1401的人體形式。本系統所發現的位置充分提供活化虛擬人體形式的資訊。本系統在使用者的手位於手偵測區域10 5内時，發現使用者的兩手。將這些位置重新映:射到化身軀幹前的映射位置，使此化身的手可接觸如使用者接觸的相同位置。當手不在手偵測區域105内時，將不發現或選擇使用者的手。在這個情形下，可移動化身的映射虛擬手到化身身體上的中立位置。在本方法使用漫遊的實行方法中，發現與中立位置相關 -43- 本紙張尺度適用中國國家標準(CNS) A4規格(210X 297公釐)10 The vertical position of the transformation can be directly interpreted as the tilt angle of rotation about the horizontal coordinate axis to locate the indicator and / or the camera. The displayed user 101 moves or results from the displayed user action (coordinates labeled z in the example in FIG. 1 fortunately from above) cause the virtual position to move forward or backward. One type of action is similar to walking, where indicators and / or cameras are held at a predefined height on the virtual floor and follow the contour of the floor (for example, going up a staircase). This translation position can be used as a velocity function of the vector, which is a speculation of the pointer and / or camera orientation to the plane defined by the floor. Another type of action is similar to a lie. If necessary, the transition position can be a function of the speed of a vector defined by the pointer and / or camera orientation. FIG. 14 illustrates an example of using the control false report method to navigate a virtual environment. This example uses the user's torso position found by the previously described method and using the mapping of Equation 10 and a suitable neutral position as previously described. Regardless of whether the method is used by the user to control or navigate the virtual environment, the pointer used in the virtual environment may take the form of an avatar. An avatar usually takes the form of a human body as shown in Figure 4 of 1401. The locations found by this system provide sufficient information to activate the virtual human form. When the user's hand is within the hand detection area 105, the system finds both hands of the user. Remap these locations: projected locations in front of the avatar's torso so that the hands of this avatar can reach the same location as the user touches. When the hand is not within the hand detection area 105, the user's hand will not be found or selected. In this case, the avatar's mapping virtual hand is neutral to the avatar's body. In the implementation of this method using roaming, it is found that it is related to neutral position -43- This paper size applies the Chinese National Standard (CNS) A4 specification (210X 297 mm)

裝訂Binding

k 561423 A7 B7 五、發明説明（41 ) ' 的控制位置。在這些實行方法中，化身的腳可維持在固定的位置並可直接利用相關控制位置以決定固定腳上之化身軀幹的位置（姿勢）。圖14描述以此方式所控制的化身。在未使用漫遊的實行方法中，可利用如選擇成分6〇9中表示使用者軀幹中心的位置或相關於頭頂的位置直接地決定化身軀幹位置。可透過反力學技術發現其他第二結合位置的細節。特別是相關於則臂之方向資料6 13可用於限制反力學解釋以定位肘到鄰近區域，其中的前臂出自於手偵測區域1〇5内。方向資料613限制到平面的肘。將平面上的肘位置定爲弧形交叉，具有表示化身長度上臂或下臂數據段，一個集中於化身手位置（位於虛擬環境中），而其他則集中於相關於表示肩膀的化身軀幹。應用程式同樣也決定化身膝蓋位置。透過在固定位置定位化身的腳亟確保化身踝部不扭曲的方式二_固定f曲膝蓋的平面，並利用如肘部的相同交叉計算決定膝蓋位置。此外，使用固定腳位置，可計算化身位置，其中化身往所需方向傾斜。利用以上計算可= 到足以活化此化身之化身軀幹，手，肘，腳與膝蓋的位置。方程式方程式1k 561423 A7 B7 V. Control position of invention description (41). In these implementation methods, the feet of the avatar can be maintained in a fixed position and the relevant control position can be directly used to determine the position (posture) of the avatar torso on the fixed foot. FIG. 14 depicts an avatar controlled in this manner. In the practice method without using roaming, the position of the center of the user's torso or the position related to the top of the head can be used to directly determine the position of the torso of the avatar using the selection component 609. Details of other second bonding positions can be found through inverse mechanics techniques. In particular, the directional data 6 13 related to the arm can be used to limit the inverse mechanics interpretation to locate the elbow to the adjacent area, of which the forearm comes from the hand detection area 105. The orientation data 613 is limited to a flat elbow. The elbow position on the plane is defined as an arc cross with data segments representing the length of the avatar upper or lower arm, one focused on the position of the avatar hand (located in the virtual environment), and the other focused on the avatar torso related to the shoulder. The app also determines the knee position of the avatar. By positioning the feet of the avatar in a fixed position, it is urgent to ensure that the ankle of the avatar is not distorted. Secondly, fix the plane of the f-curved knee, and use the same cross calculation as the elbow to determine the knee position. In addition, using the fixed foot position, the avatar position can be calculated, where the avatar is tilted in the desired direction. Using the above calculations = the position of the avatar torso, hands, elbows, feet and knees sufficient to activate this avatar. Equation 1

X上 D 其中I爲照相機之間的距離 D爲不一致D on X where I is the distance between the cameras and D is inconsistent

-44--44-

561423 A7 B7 五、發明説明（42 ) X爲影像位置 X爲世界座標位置方程式2 {d + dh -d0) 0 ..(sFI sin a) + Uycosa) i =-561423 A7 B7 V. Description of the invention (42) X is the image position X is the world coordinate position Equation 2 (d + dh -d0) 0 .. (sFI sin a) + Uycosa) i =-

D 其中I爲照相機之間的距離 D爲不一致 F爲平均焦距 s爲應用於此焦距的單位轉換因子 α爲照相機與世界座標z軸之間的傾斜角 y爲影像位置 Y爲世界座標位置方程式3 7_ (sFIcosa) + (fysina) L· =-D where I is the distance between the cameras D is inconsistent F is the average focal length s is the unit conversion factor applied to this focal length α is the tilt angle between the camera and the world coordinate z axis y is the image position Y is the world coordinate position Equation 3 7_ (sFIcosa) + (fysina) L ·-

D 其中I爲照相機之間的距離 < D爲不_ 一致 F爲丰均焦距 s爲應用於此焦距的單位轉換因子以爲照相機與世界座標z軸之間的傾斜角 z爲影像位置 Z爲世界座標位置方程式4 -dh) otherwise -45- 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐)D where I is the distance between the cameras < D is not consistent; F is the average focal length s is the unit conversion factor applied to this focal length; is the tilt angle between the camera and the world coordinate z axis; z is the image position; Z is the world Coordinate position equation 4 -dh) otherwise -45- This paper size applies to China National Standard (CNS) A4 (210X297 mm)

裝訂Binding

561423 A7 B7 五、發明説明（43 ) 其中w爲權重，量由0到1 d爲此特徵到手偵測區域的距離 d〇爲最遠特徵到手偵測區域的距離 dh爲代表此手之期望大小的預先決定距離方程式5 S = ^ aSB + (1 -a)SA where a = d-da db —Da if{D<DA) if(DA<D<DB) if(D>DB) 其中 D = |r(t)-s(t-l)| s(t)爲在時間t的平滑値 r(t)爲在時間t的未處理値 Da與Db爲臨界値 SA與SB定義削減度方程式6 α = |其中a爲邊界使得OSaSl 其中S爲利用方程式8找到的消減 e爲從先前取樣至今的消逝時間 a爲一常數方程式7 其中s(t)爲在時間t的平滑値 r(t)爲在時間t的未處理値 a爲一常數，其中OSaS 1 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐） 561423 A7561423 A7 B7 V. Description of the invention (43) Where w is the weight and the quantity is from 0 to 1 d The distance from this feature to the hand detection area d is the distance from the furthest feature to the hand detection area dh is the expected size representing the hand The predetermined distance equation 5 S = ^ aSB + (1 -a) SA where a = d-da db —Da if (D < DA) if (DA < D < DB) if (D > DB) where D = | r (t) -s (tl) | s (t) is the smoothing at time t; r (t) is the unprocessed at time t; Da and Db are critical; SA and SB define the degree of reduction equation 6 α = | Where a is the boundary such that OSaSl where S is the subtraction found using Equation 8 e is the elapsed time since the previous sampling a is a constant Equation 7 where s (t) is a smoothing at time t 値 r (t) is at time t The untreated 値 a is a constant, where OSaS 1 This paper size applies to China National Standard (CNS) A4 (210 X 297 mm) 561423 A7

方程式8 Λ： ^ βφι -bc) if ι6β _arm ^ β(br —bc) if right - arm ^ be if unknown 其中X爲此手偵測區域的位置 bc爲身體中央的位置 bl與br爲身體左與右邊界的位置· 々爲代表此手偵測區域位置偏向左或右邊之數量的常數方程式9 夂=Equation 8 Λ: ^ βφι -bc) if ι6β _arm ^ β (br —bc) if right-arm ^ be if unknown where X is the position of the hand detection area bc is the position of the center of the body bl and br are the left and right of the body The position of the right border · 々 is a constant equation representing the number of positions of the hand detection area to the left or right. 9 夂 =

if xh <bt if 其中Xh爲在此世界座標系統的手部位置if xh < bt if where Xh is the hand position in this world coordinate system

Xc爲在此螢幕上的游標位置，映射〇到1 b 1與b r爲有關此世界座標系統在此手偵測區域之内的次區域之左與右邊界的位置方程式10 - Xm Χ/ι - (Xc -夸) b丨a) ΟXc is the cursor position on this screen, mapping 0 to 1 b 1 and br are the equations for the position of the left and right borders of the sub-regions of this world coordinate system within the detection area of this hand 10-Xm χ / ι- (Xc -quad) b 丨 a) 〇

Xh ~ (Xc + ^j) KX + 夸) Xm if ^ ^bt ' z/ 6/ < χΛ < (xc — ^-) lf (xc + ff)<xh <br if xh -47- 本紙張尺度適用中國國家標準(CNS) A4規格(210 x 297公釐) 561423 A7 B7 五、發明説明（45 ) 其中xv爲應用於此虛擬座標系統的速度乂^^爲可以應用於此虛擬座標系統的速度最大量 Xh爲在此世界座標系統的位置 xc爲在此世界座標系統的中立位置 xd爲在此世界座標系統之”死亡區域π的寬度 b!與br爲有關此世界座標系統在此手偵測區域之内的次區域之左與右邊界的位置已經·描述一些履行。不過，可以做出不同修改將是可以理解的。因此，其他履行市在下述申請專利範圍的領域之内0 -48- 本紙張尺度適用中國國家標準(CNS) A4規格(210 X 297公釐)Xh ~ (Xc + ^ j) KX + Qua) Xm if ^ ^ bt 'z / 6 / < χΛ < (xc — ^-) lf (xc + ff) < xh < br if xh -47- This paper size applies the Chinese National Standard (CNS) A4 specification (210 x 297 mm) 561423 A7 B7 V. Description of the invention (45) where xv is the speed applied to this virtual coordinate system 乂 ^^ is applicable to this virtual coordinate The maximum speed of the system Xh is the position in this world coordinate system xc is the neutral position in this world coordinate system xd is the width of the "dead area π" in this world coordinate system b! And br are related to this world coordinate system here The positions of the left and right borders of the sub-areas within the hand detection area have already described some implementations. However, it will be understood that different modifications can be made. Therefore, other implementation cities are within the scope of the patent application below. -48- This paper size applies to China National Standard (CNS) A4 (210 X 297 mm)

Claims

561423 A B c D 1. A method of communicating with a computer using stereo vision. The method includes: capturing a stereo image; processing the stereo image to determine position information of an object in the stereo image, the object being controlled by a user; and using the position information to allow the user to interact with a computer application. 2. The method according to item 1 of the patent application scope, wherein the step of capturing the stereoscopic image further comprises capturing the stereoscopic image using a stereo camera. 3. The method according to item 1 of the scope of patent application, further comprising analyzing a change in position information of the object to identify a posture related to the object, and controlling a computer application based on the identified posture. 4. The method of claim 3, further comprising: determining an application state of a computer application; and using the application state when identifying the gesture. 5. If the method of the first scope of the patent application, the object is the user. 6. If applying for the method of item 1 of the Lurie range, where the object is part of the user. -7 · The method according to item 1 of the patent application scope further includes providing feedback to users of the computer application. 8. The method according to item 1 of the scope of patent application, wherein processing the stereoscopic image to determine the position information of the object further includes mapping the position information related to the position coordinates of the object to the screen coordinates of the computer application. 9. The method according to item 1 of the patent application scope, wherein processing the stereoscopic image further includes processing the stereoscopic image to identify feature information and generating a scene description from the feature information. -49- This paper size applies to China National Standard (CNS) A4 (210 X 297 mm)

561423 Scope of applying for a patent 10. The method according to item 9 of the scope of applying for a patent, further comprising analyzing the scene description in the scene analysis process to determine the position information of the object. 11. The method according to item 9 of the scope of patent application, wherein processing the stereoscopic image further includes: analyzing the scene description to identify—the position change of the object; and mapping the position change of the object. 12. The method according to item 9 of the patent application, wherein processing the stereo image to generate the scene description further includes: processing the JL volume image to identify matching feature pairs in the stereo image; and calculating # for each matching feature pair. & y, li ,, female, J day 1 are inconsistent with the position to produce a scene description. 13. For example, the method of claim 12 of the patent scope, wherein: capturing the stereo image further includes capturing a reference image from a reference camera and a comparison image from a comparison camera; and processing the stereo image further includes processing This reference image is compared with the comparison image to generate a feature pair. 14. For example, the ancient, Tuguhebeiwan method of the scope of application for patent, wherein processing the stereo image to identify matching feature pairs in the stereo image further includes: identifying the features in the reference image; for each reference The features in the image generate a set of candidate matching features in the comparison image; and for each feature in the reference image, a feature pair is generated by selecting a best matching feature from the set of candidate matching features.

-50 · 561423 A8 B8 C8 -------- _ D8 VI. Application scope of patent 15 · The method of item 13 of the scope of patent application, wherein processing the stereo image further includes filtering the reference image and comparing the image. 16. The method of claim 14 in the patent scope, wherein generating the feature pair further includes: calculating a matching score and ranking for each candidate matching feature; and selecting the candidate matching feature having the highest matching score to generate the feature pair. 17. The method according to item 14 of the scope of the patent application, wherein generating a set of candidate matching features for each feature in the reference image further includes selecting candidate matching features from a predetermined range in the comparison image. 18. The method according to item 6 of the patent application scope, wherein the feature pairs are eliminated according to the matching scores of the candidate matching features. 19. The method according to item 18 of the scope of patent application, wherein if the matching score of the highest ranking candidate matching feature is lower than the predetermined threshold, the feature pair is eliminated. A 20. The method as described in Application No. 18 of the Patent Scope, wherein the feature pair is eliminated if the matching score of the highest ranking candidate matching feature is within a predetermined threshold of the matching score of the lower ranking candidate matching feature. 21. The method of item 16 of the patent application, wherein the calculation of the matching score further includes: identifying those adjacent feature pairs; adjusting the feature pair matching score according to the proportion of matching scores of similar candidate features that are inconsistent nearby, and selecting The candidate matching feature with the highest adjusted matching score to create special -51- This paper size applies to China National Standard (CNS) A4 specifications (210 X 297 mm) 561423 A8 B8

Levy pair 22: If the scope of the patent application is No. 6 and the application comparison image is used as the reference image: and; generate a second set of feature pairs; and test ~ compare images like horses to eliminate those feature pairs of the original feature pairs. -、, and there is no phase in the competition pair 23. If the method of the scope of patent application No. 12, the further steps include ... Two each feature pair described in the t! Image, by: Each feature that runs through the world cliff is inconsistent with the inconsistency and position of the signature pair to calculate the true world coordinates. For example, the method of claim 14 in the scope of patent application, where the feature selection further includes the reference image of the interbody image and The comparison image is divided into blocks. 仏 For example, the method of applying for the scope of patent application No. 24, wherein this feature is described by the luminescence pattern of the pixels included in the block. The division further includes dividing this image into pixel blocks with a fixed size. 27. For the method of applying for item 26 of the patent scope, where the pixel block is an 8 × 8 pixel block. 28. If applying for a patent The method of scope item 10, wherein analyzing the scene description to determine the position information of the object further includes trimming the scene description to exclude feature information outside the area of interest in the horizon. 2 9. The method of the range of item 28, wherein trimming further includes establishing the boundary of the area of interest. 30. For the method of the range of application for the patent, the analysis of this scene description is based on -52- This paper scale applies Chinese national standards ( CNS) Α4 size (210 X 297 mm) A8 Βδ

The determination of the position information of the object further includes the feature information in the region of interest within a predetermined range, and the position of each group is calculated. = The knife group is a group with a feature set by comparison with the near feature information; and 31. If the method of the 30th scope of the patent application, the group below a predetermined feature threshold 32. The method of item 30, selecting-selecting a group that matches a predetermined criterion further includes eliminating these positions that further include: records that match this predetermined target; and that the position of the group of quasi-groups is the object position to output the object position coordinate. It further includes checking the presence of the user. Among them, the position of each group can be calculated. 33. For example, the method in the 30th area of the patent application, the characteristics of the previous detection area determine the group. The characteristic of the group M application-the method of the patent scope item 32, further includes defining a dynamic object detection area according to the position coordinates of the object. 'Declares the method of patent scope item 35, wherein the dynamic object detection area can be defined as related to a user's body. 37. If the method according to item 32 of the scope of patent application is applied, the method further includes defining a body position detection area based on the position coordinates of the object. 38. The method of item 37 in the patent application of claim 4, wherein defining the body position detection area further includes detecting the head position of the user. 39. The method of claim 32 of the scope of patent application, further including smoothing the object 53- ^ 61423

The movement of the position coordinates to eliminate the jitter between consecutive image frames. 40. The method of claim 32, further comprising calculating hand-oriented information from the position coordinates of the object. 41 The method according to claim 40 of the patent scope, wherein outputting the position coordinates of the object further includes outputting the hand-guided information. 42. The method of claim 40, further comprising smoothing changes in the hand-guided information. 43. The method according to item 36 of the patent application, wherein defining the dynamic object detection area includes: identifying a position of a torso segmentation plane of a feature set; and determining a torso segmentation with respect to a coordinate axis perpendicular to the torso segmentation plane here Plane related hand detection area position. 44. The method according to item 43 of the patent application scope, further comprising: identifying a central position of the body and a boundary position of the body from the feature set;-using the feature to identify the intersection of the group and the torso split plane; identifying from the feature set-indicating the user The position of the arm portion; and using the position of the arm relative to the position of the body to identify whether the arm is a left arm or a right arm. 45. The method according to item 44 of the scope of patent application, further comprising identifying a shoulder position from the position of the center of the body, the position of the boundary of the body, the plane of the trunk division, and the left or right arm. 46. The method of claim 45, wherein defining the motion detection area includes determining position data of a hand detection area related to the shoulder position. -54- This paper size applies to China National Standard (CNS) A4 specification (21〇X 297 公 #) 561423 A8 B8 C8 The scope of patent application π is as described in the 46th item of the scope of application benefits, and further includes this The position information of the hand-monitored area is smooth. 48. The method according to item 45 of the scope of patent application, further comprising: determining the position of the dynamic object detection area related to the torso segmentation plane perpendicular to the coordinate axis of the torso segmentation plane; determining the dynamics on the horizontal coordinate axis related to the shoulder position The position of the object detection area; and the vertical coordinate axis with the overall height of the user who uses the body boundary position determines the position of the dynamic object detection area. 49. If the party in the 36th scope of the application for patents & defines & this dynamic object detection area includes the following steps: Unless the highest feature pair is at the boundary, the highest feature pair of the feature set is used to create the user Head position; and determine the position of the hand detection area associated with the use of the head. 50. — Seeds! Method for contacting stereo vision with a computer, the method includes: capturing a stereo image using two stereo cameras; processing the JL volume image to determine position information of an object in the stereo image, the object being controlled by a user; processing the interbody The image is used to identify feature information, to generate a scene description of the feature information, and to identify matching feature pairs in the stereo image; to calculate the inconsistencies and positions of each matching feature pair to create this scene description; to analyze the scene description in a scene analysis program to Determine the location information of this object; -55- This paper size applies Chinese National Standard (CNS) A4 specification (21〇X 297 public director) 561423

The feature information of the desired area is grouped into a group with a feature set by using a comparison with neighboring feature information in a predetermined range; and the position of each group is calculated; and the position information is used to allow the user to interact with a computer application. 51. The method as claimed in claim 50 of the patent scope, further comprising: mapping the position of the object from the camera coordinates to the screen coordinates of the feature information related to this computer application; and using this mapping: shooting position with this computer application Interaction ^ 5 2 · The method according to item 50 of the patent application scope, further comprising: analyzing the change of the position information of the object described in the scene to identify the posture related to the object; and interacting with the computer application by combining the position information and the posture . 53. The method of claim 50, wherein the step of capturing the stereoscopic image further comprises capturing the stereoscopic image using a stereo camera. 54 · —A stereo video system that interacts with an application program executed by a computer. The stereo video system includes: "" first and second video cameras configured in a neighboring structure and operable to generate a series of stereo video images; and A processor that can be used to receive a series of stereo video images and detect objects shown in the intersection of the camera's field of view. This process executes a program to define the three-dimensional mounts related to the first and second video camera positions > 椤Object detection area; v select the control object shown in the object detection area; and when this control object moves into this object detection area, map this 栌

Equipment -56-

561423 A8 B8 C8 —— ___D8____ VI. Patent application scope Control the position coordinates of the object to the position indicator related to this application. 5: > If the stereoscopic video system according to item 54 of the patent application scope, the program selects a control object as a detected object that appears closest to the video camera and within the object detection area. 56. The stereo video system according to item 54 of the patent application, wherein the control object is a human hand. 57. The stereo video system according to item 54 of the patent application, wherein the horizontal position of the control object related to the video camera is mapped to the X-axis screen coordinates of the position indicator. 58. A stereo video system as claimed in claim 54 in which the vertical position of the control object associated with the video camera is mapped to the y-axis screen coordinates of the position indicator. 59. The stereo video system according to item 54 of the patent application, wherein the processor is installed to:, map the horizontal position of the tantalum about the control object of the video camera to the X-axis screen coordinate of the position indicator; map related to this video The camera controls the vertical position of the object to the y-axis screen coordinate of this position indicator; and simulates a mouse function that uses the combined X-axis and y-axis screen coordinates provided to this application. 60. The stereo video system according to item 59 of the patent application scope, wherein the processor is further installed to simulate a mouse button using the posture obtained from the position of the object. 61. If you apply for a stereo video system in the 59th scope of the patent application, which is further installed -57- This paper size applies the Chinese National Standard (CNS) A4 specification (210 X 297 mm) 561423 A8 B8 C8 D8

6. Scope of Patent Application This processor is installed to simulate a mouse button that controls the continuous position of an object for a predetermined time period based on any position within the detection area of the object. 62. The stereo video system according to item 59 of the patent application, wherein the processor is further installed to simulate a mouse button for a predetermined time period based on the position of a position indicator that continues within the boundary of a mutual display area . 63. The stereo video system according to item 59 of the patent application, wherein the processor is further installed to map the z-axis depth position of the control object related to the video camera to the z-axis screen coordinate of the position indicator. 64. For example, the stereoscopic video system according to the scope of patent application No. 54, wherein the processor is further installed to: map the x-axis position of the control object related to the video camera to the x-axis screen coordinates of the position indicator,, mapping cabinet The y-axis position of the control object of this video camera to this position indicates the y-axis screen coordinate of II; and the z-axis depth position of the control object related to this video camera is mapped to the z-axis screen coordinate of this position indicator. 65. If the stereoscopic video system of the scope of patent application No. 64, the position of a position indicator within the boundary of the mutual display area triggers an action in the application. 66. If the stereoscopic video system of the scope of patent application No. 54, where A control object moving along a z-axis depth position covering a predetermined distance within a predetermined time period triggers a selection action in this application. -58- This paper size applies to Chinese National Standard (CNS) A4 (210 X 297 mm) 561423 A8 B8 C8

67. If the stereoscopic video of the 54th scope of the patent application, first determine any one of the object detection areas in the time period, and use it to pre-position the position of the object to trigger the selection control in this application. -An application program for computer execution, this stereo video system includes: a stereo video series of stereo video images arranged in adjacent structures and operable to produce a series of first and second video cameras; and Images and detections in this photograph of Riello, this process performs a one-to-one procedure to receive objects appearing in the cross-field of view of the stereo video camera series of this series: ^ x,…, the object detection area of the three-dimensional coordinates of the camera position; select-control the object as-the detected object that appears closest to the video camera and within the object detection area; what body detection is here Define a sub-area within the area; identify the sub-area occupied by this control object; when this control object occupies this area, a start action related to this area; And use this action to interact with a computer application. 69. For example, the stereoscopic video system with the scope of patent application No. 68, which further defines the action related to this area as the simulation of the activation of the keys of a computer keyboard. 70. Such as The stereoscopic video system with the scope of patent application No. 68, which is used to trigger the position of an object continuously in any sub-region in a predetermined time period. (Centi) 561423 AB c D 6. The scope of patent application. 71 ·-A stereo video system for interacting with applications running on a computer. This stereo video system includes: a nearby structure and operable to generate a series of First and second video cameras for stereo video images; and a processor that operates to receive this series of stereo video images and detect objects that appear in the cross-field of view of this camera field. This process executes a program to: Appears in the intersection of the camera's field of view and the largest object in a predetermined depth range An object; selecting this object as a target object; determining a position coordinate representing the desired position; and using this position coordinate as an object control point to control the application. 72_ If the system of the 71st aspect of the patent application, where This procedure results in this processor: — determining and storing a neutral control point position; mapping the coordinates of an object control point related to this neutral control point position; and using the mapped object control point coordinates to control the application. 73. If you apply The system of claim 72, wherein the program results in the processor: defining an area having a position based on the position of this neutral control point position; mapping an object control point related to its position within this area; and -60- This paper size applies to China National Standard (CNS) A4 specifications (210 X 297 mm) 561423 A8 B8 C8 ________ D8 VI. Patent application scope Use this mapped object control point coordinates to control this application. 74. The system of claim 72, wherein the program causes the processor to: convert the mapped object control point to a velocity function; determine the perspective of the virtual environment related to the application; and use the velocity function in Move this perspective within this virtual environment. 75. If the system of the scope of patent application No. 7 丨, where this program causes the processor to map the coordinates of the control point of the object within this application to control the position of a finger that is not consulted. 76. Such as the scope of patent application No. 75 The item system, wherein the indicator is an avatar. 77. The system of claim 71, which causes the processor to map the coordinates of the control point of the object within the application to control the appearance of a pointer. 78. If the system of item 77 of the scope of the present invention is applied, the indicator is a unitary 79. The system of item 71 of the scope of the patent application is applied, wherein the target object is a human appearing within the range of the field of vision. 80. A stereo video system for interacting with an application program executed on a computer, the stereo video system includes: a first and a second video camera configured in a neighboring structure and operable to generate a series of stereo video images; and A processor that operates to receive this series of stereoscopic video images and detect objects that appear in the intersection of the camera's field of view. This process executes a -61- This paper size applies to the Chinese National Standard (CNS) A4 specification (210 X 297 561423 AB c D 6. The scope of the patent application procedure is: Λ identifies the object that is understood to appear in the cross field of view of the camera and the largest object in a predetermined depth range; select this object as a target object; defined in The control area between the camera and the object of interest. The control area is located at a predetermined position and has a predetermined size related to the size and position of the target object. The control area of the target object within this control area; if the point related to this target object is controlled here Within the area, select the relevant point of the target object thereto a control point; and when this control within the control point is moved to this area, the mapping position coordinates to the control point to the position indicator associated thereto app. 81. The system of claim 80, wherein the processor is operable to:-map the horizontal position of the control object related to the video camera to the X-axis screen coordinates of the position indicator; map related to the video camera It controls the vertical position of the object to the y-axis screen coordinate of this position indicator; and simulates a mouse function that uses the X-axis and y-axis screen coordinates. 82. The system according to item 80 of the scope of patent application, wherein the process can be operated to: map the x-axis position of the control object related to the video camera to the x-axis screen coordinate of the position indicator; -62- Applicable to China National Standard (CNS) A4 specification (210X 297 mm) 561423 AB c D 6. The scope of patent application maps the y-axis position of the control object related to this video camera to the y-axis screen coordinate of this position indicator; and mapping Corresponds to the Z-axis screen coordinate of the Z-axis depth position of the control object of this video camera to this position indicator. 83. The system according to item 80 of the patent application scope, wherein the target object is a human appearing within a field of vision. 84. The system of claim 80, wherein the control point relates to a human hand appearing in the control area. 85 · —A stereo video system for interacting with an application program executed on a computer, the stereo video system includes: a first and a second video camera configured in a neighboring structure and operable to generate a series of stereo video images; and A processor that operates to receive this series of stereoscopic video images and detect objects that appear in the cross-field of view of this camera. A processor executes a program to define 3D coordinates in 2 relative to the first and second video camera positions. Object detection area; select up to two hand objects from objects that appear at the intersection of the field of view in this object detection area and appear in the range; and map this hand object when this hand object moves within this object detection area Position coordinates to the position of the virtual hand associated with the avatar provided by this application. 86 · If the system of the scope of patent application is No. 85, the program selects at most two from the closest to the video camera and looks in the object detection area -63- This paper size applies to China National Standard (CNS) A4 specifications (210X297 mm) " " '' 1 ~ 561423

The hand of the object that appears at the turn of the wild. 8 7. The system according to item 85 of the scope of the patent application, in which the avatar is obtained-similar to the format of a human body. 88. The system of claim 85, wherein the avatar is provided in and interacts with the virtual environment that forms part of the application. 89. The system of claim 88, wherein the processor further executes a program to compare the position of the virtual hand and the position of the virtual object related to the avatar within the virtual environment so that the user can Interact with this virtual object. 90. The system of claim 85, wherein the processor further executes a program to: detect the position coordinates of the user within the intersection of the field of view; and map the position coordinates of the user to the location of the application. The virtual torso of the avatar provided. 91. If the system of item 85 of the Quarry range is applied, if a phase-mapped hand object is not selected, this program moves at least one virtual hand associated with this avatar to at least one neutral hand. 92. The system according to item 85 of the patent application range, wherein the processor further executes a program to: detect the position coordinates of the user within the crossing range of the field of view; and map the position coordinates of the user to the speed applied to the avatar Function to enable this avatar to roam in the virtual environment provided by this application. 93. If the system of item 92 of the scope of patent application is applied, in which the speed function includes the following: -64- This paper size applies to China National Standard (CNS) Α4 specification (210X 297 mm)

Binding

6 5 423

ABCD represents the neutral position of this avatar at zero speed. 94. The system of claim 93, wherein the processor further executes a program to map the user's position coordinates related to the neutral position to the torso coordinates related to the avatar so that the avatar appears tilted. 95. The system of claim 92, in which the processor further executes a program to compare the position of the virtual hand related to the avatar with the position of the virtual object temporarily in the virtual environment so that the user can roam in This virtual environment is constantly interacting with this virtual object. 96. The system of claim 85, in which the virtual knee position related to this avatar is obtained by this application and used to improve the avatar scene. · 97. The system according to item 85 of the patent application, wherein the virtual elbow position associated with this avatar is obtained by this application and used to improve the image of this avatar. 98. The system according to item 85 of the scope of patent application, further comprising a third camera disposed adjacent to the first and second video cameras and operable to generate a series of stereoscopic video images. / -65- This paper size applies to China National Standard (CNS) A4 (210 X 297 mm)