TW201025110A - Method and apparatus for generation, distribution and display of interactive video content - Google Patents

Method and apparatus for generation, distribution and display of interactive video content

Info

Publication number
TW201025110A
TW201025110A
Authority
TW
Taiwan
Prior art keywords
video
electronic signature
image
data
request
Prior art date
Application number
TW97149129A
Other languages
Chinese (zh)
Inventor
Shlomo Selim Rakib
Alexander Bronstein
Michael Bronstein
Gilles Bruno Marie Devictor
Original Assignee
Novafora Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novafora Inc filed Critical Novafora Inc
Priority to TW97149129A priority Critical patent/TW201025110A/en
Publication of TW201025110A publication Critical patent/TW201025110A/en

Links

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Proposed is a model for generation and use of metadata for interactive video navigation and video content identification.

Description

[Technical Field of the Invention]

The invention relates to a method and apparatus for triggering actions based on a dual mechanism: the detection, at both the client and the server, of features at spatio-temporal locations in a video, and the computation of the signatures associated with those locations.

[Prior Art]

Hypervideo, or hyperlinked video, is the generic name for video content that contains embedded, user-clickable anchors allowing navigation between the video and other hypermedia elements. Hypervideo is analogous to hypertext, widely used on the World Wide Web, in which clicking a word in one document follows a link to another document from which information is retrieved.

The concept dates back to the late 1980s and developed very slowly, but with the spread of broadband Internet access and the rise of services such as YouTube, the consumption of video content is changing rapidly: Internet video distribution, whether by download or by streaming, is expected to displace traditional media distribution such as the digital versatile disc (DVD) or television (TV).

As the amount of video content on the Internet grows and becomes readily available, the need for video content navigation becomes more important, and hypervideo is regarded as a convenient answer that may transform the consumption of video much as hypertext once transformed text media.

Finally, the most important value of hypervideo lies in commercial advertising. To date, designing a commercially effective form of video advertising has proven very difficult. The traditional advertising method, the television-style interstitial commercial, is considered highly unsuitable by online communities. Hypervideo provides an alternative way to sell through video: it allows the production of video clips whose objects link to advertisements, to e-commerce sites, or to further information about a specific product. This new advertising model is far less intrusive: advertising information is displayed only "on demand", when the user clicks an object in the video, and because it is the user who requests the product information, this type of advertising is more targeted and more effective.

BRIEF SUMMARY OF THE INVENTION

The invention discloses a method and apparatus for triggering actions based on a dual mechanism: the detection, at both the client and the server, of features at spatio-temporal locations in a video, and the computation of the associated signatures.
In one aspect, the invention discloses a method of operating on a video stream. Its steps include at least: in a user processing system, receiving a segment of a video stream, the video stream comprising video information representing consecutive frames, the video information including video data essentially used only to render the images of the consecutive frames; receiving a request for an action, the action corresponding to a spatio-temporal location in the images displayed in the segment of the video stream; automatically detecting features in spatio-temporal regions of the segment near that location, the automatic detection operating on the same video data used to render the images of the consecutive frames; obtaining, from the detected features, a representation map of regions within the video stream; deriving an electronic signature from the request and the representation map; and using the electronic signature to trigger execution of the action.

In another aspect, the invention discloses a method of operating on a video stream comprising consecutive frames. Its steps include at least: in a first computing location, receiving the video stream, the video stream comprising video information representing the consecutive frames, the video information including video data essentially used only to render the images of the consecutive frames; obtaining a representation map of regions within the video stream, including automatically detecting features within the video stream, the automatic detection operating on the video data used to render the frame images; computing, from the detected features, an electronic signature for each of the different regions; and associating actions with the electronic signatures. The method further comprises: in a second computing location, the same as or different from the first, receiving a client electronic signature; matching the client electronic signature against the representation map to determine a particular signature; determining the action associated with the particular signature; and triggering the action.

In a further aspect, the invention discloses a method of operating on a video stream whose steps include at least, in each of a user processing system and a server processing system that are physically separate: receiving the video stream, the video stream comprising video information representing the consecutive frames, the video information including video data essentially used only to render the frame images; obtaining a representation map of regions within the video stream, including automatically detecting features within segments of the video stream, the automatic detection operating on the video data used to render the frame images; and computing, from the detected features, an electronic signature for each region. The method further comprises: sending one of the electronic signatures computed in the user processing system to the server processing system; comparing that electronic signature with the electronic signatures computed by the server processing system; determining a match between it and a particular electronic signature computed by the server processing system; and, based on the match, triggering an action in the server processing system.

In the above aspects, obtaining the representation map may yield implicit objects as regions derived from the detected features, or video elements as regions derived from the detected features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Hypervideo representation

Video elements

Broadly speaking, taking hypertext as an example, a hypertext document is a linear sequence of symbols in which characters (10, 14) of "document 1" are connected to metadata (12, 16) pointing to other media content (a hyperlink, as shown in Figure 1).

Applying the same idea to video, the video is a three-dimensional spatio-temporal array in which the pixels representing the visual information change continuously, such that neighboring regions of the array correspond to visually meaningful objects. Video objects associated with metadata (such as 22(1), 22(2) and 22(3) in Figure 2) produce hyperobjects. In general, an "object" in a video is not necessarily semantically meaningful: a region of pixels need not have a visual interpretation. To avoid the semantic connotation, we therefore call a pixel region a video element (or explicit object), such as the video elements 20 in Figure 2, specifically video elements 20(1), 20(2) and 20(3). Video elements are described in U.S. Patent Application No. 11/778,633, "Method and Apparatus for Video Digest Generation", filed 2007/7/16, incorporated herein by reference.

Metadata

Metadata is represented differently in hypertext and in hypervideo. In the Hypertext Markup Language (HTML), the syntax describing a hyperlink attaches the metadata (hyperlinks 30) to the document itself: the metadata points to an object and describes the actions the object has. For example, as shown in Figure 3, an action can open another document (as shown by metadata link 30(1)) or a media file (as shown by metadata link 30(2), a hyper-reference), or, more generally, perform any operation at the client.

As shown in Figure 4, one possible way to represent the metadata is as a list of actions associated with the corresponding video elements. The list may contain an element identifier (element ID), a keyword description, a flag indicating whether the element is clickable, and the associated actions. More generally, the metadata may contain a list of possible actions associated with each video element, from which one specific action is selected, for example using personal information from a user profile.

In the proposed hypervideo representation, the metadata is stored separately and points into the video map, a hierarchical data structure described in a later section. The video map contains descriptions of the spatio-temporal locations of the video elements in the video; the actual syntax of these descriptions depends on the compactness and accuracy required of the representation. For example, the easiest way to represent video elements (such as 40(1) through 40(5) in Figure 4) is by specifying the corners of spatio-temporal bounding boxes; the corner positions appear in the "video map" portion of Figure 4, where the video elements 40 are described as bounding boxes.

It is important to note that the video map serves to identify the clickable objects and their extent, while the metadata describes the actions associated with the clicked objects. In the hypervideo representation proposed here, these two data structures are separate.

Video features

At the level of granularity below the video elements, the video map holds video features, hereafter simply "features". In computer vision, "feature" is a generic name often used to describe an information vector associated with a spatio-temporal subset of the video; examples of features are spatio-temporal edges (their 3D orientation), the local direction of the motion field, color distribution, and so on.

As shown in Figure 5, we distinguish local features 50 from global features 54. Local features 50 are associated with spatio-temporal locations (at the finest granularity, a single pixel 52), while global features 54 are associated with larger spatio-temporal units (a frame, shot, or scene 56). Typically, local features provide a description of an object, while global features provide its context: for example, an apple object in a computer advertisement and an apple object in a shot of fruit have the same local features describing the object, but their overall context is different.

Roughly speaking, one can say that local features refer to sub-frame-level structure, while global features refer to sequence-level structure.

Audio features

When referring to "video", it is preferable, although not strictly necessary, to consider that in most cases an audio component (soundtrack) is part of the video. The audio data also carries important information for identifying parts of the objects of interest.

Since audio data is one-dimensional, it can provide information at the sequence level; audio features are therefore considered global features.
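To make the notion of local and global features concrete, here is a minimal, hypothetical sketch in Python using NumPy: a per-block histogram of spatial gradient orientations as a local feature, and a clip-level intensity histogram as a global feature. These particular descriptors are illustrative assumptions; the patent does not prescribe a specific feature extractor.

```python
import numpy as np

def local_features(frames, block=8, bins=8):
    """Toy local feature: per-block histogram of spatial gradient
    orientations, pooled over time. frames: (T, H, W) grayscale array."""
    _, h, w = frames.shape
    gy, gx = np.gradient(frames.astype(float), axis=(1, 2))
    angle = np.arctan2(gy, gx)  # orientation in (-pi, pi]
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    feats = np.empty((h // block, w // block, bins))
    for i in range(h // block):
        for j in range(w // block):
            patch = angle[:, i*block:(i+1)*block, j*block:(j+1)*block]
            hist, _ = np.histogram(patch, bins=edges)
            feats[i, j] = hist / max(hist.sum(), 1)  # normalize
    return feats

def global_feature(frames, bins=16):
    """Toy global (sequence-level) feature: intensity histogram of the clip."""
    hist, _ = np.histogram(frames, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)
```

In this sketch the local descriptor lives at sub-frame granularity (one vector per pixel block) while the global descriptor summarizes the whole clip, mirroring the sub-frame-level versus sequence-level distinction made above.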

Signatures

Because a video object is composed of a group of pixels, each associated with different types of local features, aggregating the local features 50 of a video element 40 and its context into a single vector yields what is called here a signature 58. The signature 58 is a description of the video element 40 that represents both its characteristics and its context, and is described in further detail below.

Audio features can also be part of the signature 58.

Video map

The video map is the hierarchical data structure used to represent the information in the video.

The finest level of granularity contains a representation of the smallest data units considered (single pixels or small pixel blocks) and the associated local features.

The next level of granularity contains a representation of the video elements and the associated signatures.

The next level of granularity contains higher-level data units such as shots and scenes, together with the associated global features, selected audio features, and other information (for example, subtitles).
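The aggregation of local features and context into one signature vector can be sketched minimally as follows, under the assumption that mean-pooling the element's local descriptors and concatenating a global context vector is an acceptable stand-in for the unspecified aggregation; the function name and the unit-length normalization are illustrative choices, not taken from the patent.

```python
import numpy as np

def signature(local_feats, global_feat):
    """Toy signature: mean-pool the element's local feature vectors and
    concatenate the global (context) feature.
    local_feats: (N, d) descriptors sampled inside the element;
    global_feat: (g,) context vector."""
    pooled = local_feats.mean(axis=0)           # order-invariant aggregation
    sig = np.concatenate([pooled, global_feat])
    return sig / (np.linalg.norm(sig) + 1e-12)  # unit length for matching
```

Because the pooling is order-invariant, the same element sampled at a different resolution or frame rate tends toward a similar vector, which is the property the matching steps described later rely on.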
The coarsest level of granularity contains information about the complete video, such as its title, release year, cast and director, and so on.

The video map provides something like a unique identifier of the video (a "barcode" or "fingerprint"). Of particular importance are the spatio-temporal structure of the video elements and their signatures: another copy of the same video, even one that has undergone some degree of editing or modification, will have a similar distribution of elements and a similar fingerprint.

The video map can be regarded as a sequence of generalized "symbols", where each symbol is a numeric vector.

Implicit objects

The video map may contain only incomplete information, meaning that parts of the data structure described above are missing. The case named implicit objects is an important special case to consider: no video elements 40 are present, either because they are missing or because they were never generated, and the video map contains only features (local features 50 and global features 54), with no explicit indication of the regions of the video corresponding to "objects".

In this case, a signature 58 can still be produced by aggregating the local features 50 in the spatio-temporal neighborhood of a video point, together with the global features describing the context. It is therefore possible to associate a signature with every pixel, computed from the features aggregated around it.

Figure 11 illustrates the difference between implicit objects 42(b) and explicit objects 42(a).

Hypervideo content generation and distribution

A conventional hypervideo content distribution system consists of two main components, as shown in Figure 6: a content provider 60 and a front-end user application (the hypervideo client 64).

In the conventional hypervideo distribution model, the content provider 60 supplies not only the video but also the metadata, which is part of the video stream 67. The drawback of this model is that the content provider 60 must be hypervideo-aware: all content must be appropriately processed and converted into a format compatible with the hypervideo client 64. In this example, the content provider 60 streams the video, the video map, and the metadata to the hypervideo client 64.

According to the invention, the model described here performs hypervideo distribution without relying on the content provider, as shown in Figure 7. In this model, only video information containing the video data and audio data flows from the content provider 70 to the hypervideo client 74, using legacy content that carries no metadata or any additional information; the content provider 70 is therefore agnostic of hypervideo. The content provider 70 can be, for example, today's YouTube. The key idea is that the hypervideo client 74 and the metadata server 72 apply the same processing to the same video content to independently generate two video maps. The possible situations are:

- The content on the hypervideo client 74 side and on the metadata server 72 side is identical; as a result, the video maps are identical.
- The content on the two sides is similar but not identical, due to editing; the video maps are similar.
- The content on the hypervideo client 74 side is a subset of the content on the metadata server 72 side; the client's video map is similar to part of the video map of the metadata server 72.

The metadata server 72 then streams the metadata, which is combined with the video coming from the content provider 70 and with the video map generated at the hypervideo client 74 to form the hypervideo content.
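The hierarchy of the video map lends itself to a nested data structure. The following Python sketch is a hypothetical illustration of such a map, with the finest pixel-block level omitted for brevity; all class and field names are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoElement:
    """Spatio-temporal bounding box of a video element plus its signature."""
    t0: int
    t1: int
    x0: int
    y0: int
    x1: int
    y1: int
    signature: List[float]

    def contains(self, t: int, x: int, y: int) -> bool:
        return (self.t0 <= t <= self.t1
                and self.x0 <= x <= self.x1
                and self.y0 <= y <= self.y1)

@dataclass
class Shot:
    """Higher-level unit (shot or scene) with its global features."""
    t0: int
    t1: int
    global_features: List[float]
    elements: List[VideoElement] = field(default_factory=list)

@dataclass
class VideoMap:
    """Coarsest level: whole-video information plus the shot hierarchy."""
    title: str
    shots: List[Shot] = field(default_factory=list)

    def element_at(self, t: int, x: int, y: int) -> Optional[VideoElement]:
        """Look up the video element covering a clicked location, if any."""
        for shot in self.shots:
            if shot.t0 <= t <= shot.t1:
                for el in shot.elements:
                    if el.contains(t, x, y):
                        return el
        return None
```

The `element_at` lookup mirrors the common usage scenario in which the spatio-temporal coordinates of a click are resolved against the map to find the corresponding video element.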
Hypervideo distribution

In one embodiment of hypervideo distribution, each video has a unique identifier sent by the content provider 70 together with the video (the YouTube case, for example). The hypervideo client 74 sends the identifier to the metadata server 72 to retrieve the correct metadata corresponding to the video. Alternatively, the same video content can be streamed simultaneously to the hypervideo client 74 and to the metadata server 72, with both sides generating the video map on the fly.

A different embodiment is shown in Figure 8: based on the signatures 58, the hypervideo client 74 can identify the video content streamed by the content provider 70, and by passing the signatures 58 to the metadata server 72 it can receive the corresponding metadata stream. This also helps in dealing with illegally copied content: even a copy is still correctly identified (even if it has been modified or transcoded) and the hypervideo functionality remains usable. For example, even in an illegal copy of the video this makes it possible to force the user to watch advertisements.

Metadata generation

Figure 9 illustrates the process by which the metadata server 72 generates metadata in the typical offline mode. The content is obtained from the content provider and the video map is generated from it, as in step 90; each video element 40 in the video map is then annotated to link it to metadata, as in step 92. The annotation process is hierarchical and may be automatic, manual, or a combination of automatic and manual annotation, and the metadata associated with the video is stored in the metadata server's database.

The metadata can also be partially generated and augmented during viewing according to personal profiles. For example, some hyperlinks may be disabled (if the user has no interest in cars, video elements 1 and 3 in the example of Figure 4 will not be clickable or lead to another favored target).

As another example, consider a personalized mode of metadata generation, in which the metadata generated according to the user profile can be either incomplete or complete. For example, the video element showing a BMW car would not link to
the global BMW website, but rather, based on the user's location, to a local dealer.

It is also possible to build dictionaries of associations, for example by majority voting: if most users who click on a car end up at the James Bond website, the car becomes directly associated with James Bond.

In a fully automatic mode, metadata can be generated between video elements using similarity criteria. In this setting, a video element can be linked to another element that is the most similar (the most alike in appearance) to the current one, in the same video or in other videos.

As mentioned earlier, an additional process performed together with video map generation is the generation of the signatures 58, described further below.

For any video map, more than one metadata item can be associated with each video element. In the BMW example, individual car dealers may establish their own BMW metadata; these metadata sets are part of the overall metadata collection, and which set is used to execute the action is decided at the time the action is requested, based on factors such as the location of the request, the date of the request, and its purpose.

The content used by the metadata producer need not be an exact copy of the content from which the video map and its signatures are generated at the metadata server; it can be a scaled or edited version, for example in the case of movie content. When metadata is generated for the video map, the original version of the movie may be used at the content provider 70 to produce the map, while the metadata producer can use a DVD or any other version of the movie.

Hypervideo client

On the hypervideo client 74, the video map is generated on the fly while the video is being viewed. The same (or a similar) video map is available at the metadata server 72, whether from storage or generated in real time. In the common usage scenario, the user clicks on a location of interest in the displayed video; the user interface provides the spatio-temporal coordinates of the click, which are used to query the video map and identify the video element 40 corresponding to those coordinates. The signature 58 of the video element 40 is used to retrieve the associated action from the metadata server: because the metadata server 72 holds the same video map as the one generated by the hypervideo client 74, it can supply just the metadata, and the hypervideo client 74 then executes the requested action, as shown in Figure 10.

Unlike the video map, which is deliberately redundant information, the metadata can be described compactly; in the simplest case it consists of hyperlinks associated with the video elements 40.

If implicit objects are used, the video map produced at the hypervideo client 74 contains no boundaries of the video elements 40, so the signature 58 is associated with the clicked spatio-temporal location rather than with an object.

The content used by the hypervideo client 74 to generate its video map need not be an exact copy of the content used at the metadata server 72; it can be a scaled or edited version.

Taking movie content as an example: at the metadata server 72, the original version of the movie is used to generate the video map, while at the hypervideo client 74 the map may be generated from a broadcast, DVD, or any other version of the movie. In such a case, where other content such as advertisements has been combined with the original movie, the algorithms at the metadata server 72 can separate out the signatures of the inserted advertisements and perform the match on the movie signatures. Likewise, if some content has been deleted from the movie (for censorship purposes, for example), the metadata server 72 can perform the match on the remaining parts of the movie signature.

User interface

The user interface of the hypervideo client provides three main functions: displaying the video, letting the user select points of interest in the video, and executing the actions associated with the selection. Similar to hypertext, the hypervideo client user interface resembles browsing the web.

A diagram of the main components of the user interface is shown in Figure 12. Selection of a point of interest in the video is performed by means of a pointing device 1210, which allows the position of the point of interest to be input and the desired action 1220 to be triggered. In the following we speak (by analogy with web browsing) of a click as the event that triggers the action. Actions include, for example, video playback 1230 and video overlay 1240, either of which can be used to display the result 1250.

The spatial coordinates indicated by the pointing device, together with the time instant of the click, constitute the spatio-temporal location provided by the pointing device.

Specifically, depending on the target application of the hypervideo client, the following pointing devices can be distinguished:

Personal computer pointing devices: devices used in personal computer applications, such as a mouse, trackball, or touchpad. With these devices, the current spatial coordinates of a cursor are displayed on the screen; moving the mouse (or trackball, or touchpad) moves the cursor to the desired position, and the click is performed by pressing a button, as shown in Figure 13a.

Touch screen: a touch-sensitive display device; the location of interest is input by touching a point on the screen. This kind of pointing device is very useful on mobile devices, as shown in Figure 13b.

Gesture recognition: the location of interest is determined from hand movements, and the click is performed by means of a predefined gesture. This kind of pointing device is useful in television-like settings; variants of gesture recognition can point at objects in sequence, like an interactive device with buttons (for example, a television remote control).

Actions

The action associated with a clicked location is executed at the client; the specific action is decided by a process running at the client, using the metadata supplied by the server.

In general, the following kinds of actions can be distinguished:

Link: similar to browsing hyperlinked text, this consists of displaying the hyper-reference associated with the video element at the corresponding location and then executing it. The referent is usually a media object such as text, a web page, audio, or video. Once the link is invoked, the currently displayed video is terminated or paused and the media object is displayed; alternatively, the associated content is displayed as an overlay that does not interrupt video playback. For example, clicking a location may display information describing the associated video element; in another case, clicking a location may start playback at an occurrence of the same or the most similar video element in the video.

Navigation: a way of exploring the current video in which, based on the clicked location, the video is displayed starting from the locations of related video elements. If the pointing device has multiple buttons (forward, backward, remote control), the action depends not only on the location but also on the button pressed. For example, pointing at an actor's face in the currently displayed video and pressing the FORWARD button on the pointing device makes the video jump ahead to the next scene featuring the selected actor; likewise, pressing the BACKWARD button returns the video to the previous scene in which the selected actor appears. If the pointing device supports only a single click (a touch screen), the type of action is chosen through a menu.

User profile based action selection: at the metadata server 72, more than one metadata item is associated with each signature 58, and different decision methods can determine which metadata set governs the executed action. One suggested method is to use the user profile to decide which action to take. The user profile elements can be transmitted by the client together with the signature 58, or stored at the metadata server 72.

For example, the user profile may contain the following elements:
- the user's physical location
- the time of the request
- the user's age group
- the user's wealth group
- the user's language
- the history of previous requests
- web cookies on the personal computer

For example, a BMW dealer in San Jose, California runs a special Labor Day sale, and a purchase action is associated with the BMW video elements, but only for users from that region and only during the event. Only users whose profile places them in San Jose receive the purchase action when they click a location associated with a BMW video element; other users receive no information, or a different action.

Interactive mode: clicking a location invokes an interaction between the user and the client. An example application is electronic commerce: clicking a location invokes an interactive interface that allows the user to purchase the item associated with the video element at that location.

Video map generation

The core algorithm of both the hypervideo client and the metadata server is the computation of the video map on which the metadata retrieval process is based.

An important requirement on the video map is a hierarchical structure that allows partial spatio-temporal comparison. Consider the scenario in which two versions of the same video are used to build video maps at the metadata server 72 and at the hypervideo client 74. Because the versions have undergone some changes, the videos may differ, including:

- spatial transformations: resolution and aspect-ratio conversion, cropping or overlays, color transformations, noise;
- temporal transformations: frame-rate conversion, insertion or removal of content.

At the metadata retrieval stage, the corresponding parts of the video maps are compared; the video map structure and the comparison process must therefore be invariant, or insensitive, to the transformations above.
For example, the 'user profile' may contain the following elements: - User entity location 201025110 Requested time - User age group - User property group - User language * Previously requesting a history of network cookies on a personal computer, for example BMW dealers in San Jose, Calif., have special sales and purchases of links to BMW video elements on Labor Day, but only from this region and prior to this incident, only those with San Jose user profiles The user receives the purchase behavior after selecting the BMW video element in the selected location, and other users set the age method to receive information or receive different behaviors. Interactive mode: Clicking the location will interact with the user and the client. The application example is e-commerce. Clicking on the location invokes the interactive interface to allow the user to purchase the video # element associated with the location. project. Video map generation (Video map generation> Hyper-image link client and certificate release data server main calculation core is based on the process of searching and mapping data to calculate the video image. The important request feature of the video image is a hierarchical structure, allowing Partial time-space comparison, assuming that the plot is to consider two versions of the same video that constitute the video image on the interpretation data server 72 and the hyper-image link client 74, since they may undergo some changes, the video may be different, including: - spatial transformation: parsing Degree and aspect ratio conversion, pixel clipping or overlay 20 201025110 cover, color conversion, noise. _ time conversion: frame rate conversion, insertion or removal of content. In the retrieval data retrieval phase, compare some corresponding video images, The video image structure and comparison process implicit in the above conversion must be constant or non-feeling.
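One way to realize the user-profile-based action selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names ("region", "valid_until") and the first-match rule are assumptions.

```python
# Hypothetical sketch of profile-based action selection: each candidate action
# carries targeting constraints; the first action whose constraints match the
# user profile is chosen, with an informational action as the fallback.

def select_action(actions, profile, now):
    """Return the name of the first action whose constraints match the profile."""
    for action in actions:
        constraints = action.get("constraints", {})
        if "region" in constraints and constraints["region"] != profile.get("region"):
            continue  # geographic targeting (e.g., San Jose only)
        if "valid_until" in constraints and now > constraints["valid_until"]:
            continue  # time-limited offer has expired
        return action["name"]
    return "show_info"  # default: plain informational metadata

actions = [
    {"name": "purchase_offer",
     "constraints": {"region": "San Jose", "valid_until": 20090907}},
]
```

A San Jose user selecting the element before the deadline would receive the purchase action, while any other user would fall through to the informational default.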

One implementation difficulty is that the video arrives at the hypervideo client as a stream in a compressed format (e.g., using MPEG compression); the decoded video stream yields frames, represented as a three-dimensional array of pixels, which constitute the video data. Auxiliary coding parameters, as well as audio data, are associated with each frame obtained from the decoded video stream.

From the pixels and coding parameters, a map of local features 50 is produced. A local feature 50 is associated with each spatio-temporal position in the three-dimensional pixel array; the granularity may be individual pixels or, preferably, larger data units such as small spatio-temporal blocks of pixels. We refer to such blocks as spatio-temporal basic data units.

Many methods are used in computer vision for feature detection and representation. For example, the local features 50 may include:
- the Harris corner detector and its variants, described in C. Harris and M. Stephens, "A combined corner and edge detector", Proceedings of the 4th Alvey Vision Conference, 1988;
- the scale invariant feature transform (SIFT), described in D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 2004;
- motion vectors obtained from the decoded video stream;
- directions of spatio-temporal edges;
- color distributions;
- texture descriptors;
- decomposition coefficients of the pixels in some known dictionaries, e.g., wavelet transforms, curvelet transforms, etc.;
- known priors for specific objects.

The next step is locating the video elements 40. Video elements 40 can be found explicitly from the pixel values by performing a video segmentation process, i.e., a separation process that uses the already computed features, or the local features 50 computed in the previous step, to discover the video elements. The feature vectors associated with the spatio-temporal basic data units undergo vector quantization or clustering to determine spatio-temporal regions with similar features; any existing vector quantization technique can be used, the preferred embodiment being the Max-Lloyd method.

The three-dimensional pixel array is thus divided into regions. These regions undergo a selection process whose purpose is to delete insignificant regions (e.g., regions that are too small), merge similar regions, and select regions according to predetermined criteria. Each selected region is labeled as a video element 40, and its position is stored in a specific data structure that, given a spatio-temporal position in the pixel array, determines the video element 40 it belongs to. In a preferred embodiment, the video element positions are described as a three-dimensional map of integer values, where the value x at the position corresponding to a basic data unit means that this unit belongs to the video element numbered x; another embodiment describes a video element 40 by the corner coordinates of a bounding box.
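The integer label map just described can be sketched as a lookup structure; the block size, helper names and toy map below are illustrative assumptions, not taken from the specification.

```python
# Minimal sketch of the region label map: a 3-D array of integers in which the
# value stored at a block position is the id of the video element that the
# block belongs to. A pixel-level click is first reduced to block coordinates.

BLOCK_W, BLOCK_H, BLOCK_T = 16, 16, 3   # assumed spatio-temporal block size

def block_index(x, y, t):
    """Map a pixel-level spatio-temporal position to block coordinates."""
    return x // BLOCK_W, y // BLOCK_H, t // BLOCK_T

def element_at(label_map, x, y, t):
    """Look up the video element id for a click at pixel (x, y) in frame t."""
    i, j, k = block_index(x, y, t)
    return label_map[k][j][i]     # label_map indexed [time][row][column]

# Toy map, one temporal block group, 2x2 spatial blocks: element 7 occupies
# the left column of blocks, element 3 the right column.
label_map = [[[7, 3],
              [7, 3]]]
```

Given a click position, the same structure answers both "which element was selected" and, via the stored element id, which signature 58 to transmit.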
The feature vectors of the data units belonging to each video element 40 are aggregated. One way of performing such aggregation is to take the coordinate-wise mean of the vectors as the aggregated feature, the signature vector. In addition, descriptors of the video element, such as its duration or size, can be appended to the signature vector. The signature vector 58 is stored in the data structure and associated with the corresponding video element 40. Representing the signatures 58 of local portions of the video in this way is insensitive to spatial transformations and is very efficient: the signatures 58 of two versions of the same video will be similar or identical.

The structure of features, video elements and signatures together constitutes the video map.

During operation of the hypervideo client 74, the video map is generated in real time. In a preferred embodiment, the decoded video data (and, in a further embodiment, the audio data associated with the video data) is stored in a temporary buffer under the video map generation algorithm, and display is delayed by a few frames, which is sufficient to generate the video map for the preceding frames. Conventional hardware and software can serve as the hypervideo client, for example: a general purpose computer with application software encoded to perform the described functions, an embedded data processor with application software encoded to perform the described functions, or ASICs with hard-coded programming or a combination of hard-coded programming and software applications. Usable data processors include general purpose microprocessors, microcontrollers, digital signal processors (DSPs) and application specific integrated circuit processors.
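The coordinate-wise mean aggregation with appended element descriptors can be sketched as below; the choice of descriptors (duration and block count) is an illustrative assumption.

```python
# Sketch of signature aggregation: the signature of a region is the
# coordinate-wise mean of the feature vectors of its blocks, with element
# descriptors (here: duration and number of blocks) appended at the end.

def region_signature(feature_vectors, duration):
    """feature_vectors: list of equal-length lists; duration: element length (s)."""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    mean = [sum(v[d] for v in feature_vectors) / n for d in range(dim)]
    return mean + [duration, n]   # appended descriptors extend the vector

sig = region_signature([[1.0, 2.0], [3.0, 6.0]], duration=0.5)
```

Because the mean is taken over whichever blocks survive scaling or cropping, two differently scaled copies of the same element yield similar signature vectors, which is the insensitivity property the text relies on.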
When the pointing device selects a location in the video, the spatial coordinates of the pointing device (or cursor), together with the instant at which the click is performed, are used to query the corresponding video element 40 in the video map. If a video element 40 exists at the requested position, its signature 58 is retrieved from the video map (in a preferred embodiment, generated on request) and transmitted to the metadata server 72.

At the metadata server 72, the video map is generated by the same process. The signature 58 is compared against the database of signatures, the closest one is selected, and the associated metadata is transmitted to the hypervideo client 74, determining the action triggered by the click.

As for implicit objects, the video element generation stage is absent; instead, the video map contains only feature vectors. When the pointing device selects a location in the video, the spatial coordinates of the pointing device (or cursor), together with the instant at which the click is performed, are used to query the features in a spatio-temporal neighborhood (e.g., of predefined size) of the spatio-temporal position; these features are aggregated into a signature and transmitted to the metadata server 72.

Example video map generation algorithm
The following is a specific video map generation algorithm used in a preferred embodiment of the invention:
1. Input a set of subsequent frames from the video stream. Each frame represents a two-dimensional array of pixels, and the set of frames represents a three-dimensional (spatio-temporal) array of pixels. If the input stream is compressed, the frames are obtained as the result of a video decoding process.
2. Divide the three-dimensional pixel array into small, possibly overlapping three-dimensional blocks of pixels (e.g., 16x16x3). Denote the blocks by b_ijk, where i and j are the two spatial coordinates and k is the temporal coordinate (frame number); the blocks are represented as a three-dimensional array in which each entry corresponds to one block.
3. For each pixel block, compute m local features. In an embodiment of the invention the features include:
- the mean luminance intensity of the pixels in the block;
- the mean chrominance intensity of the pixels in the block;
- the spatio-temporal gradient of the pixels in the block, representing the direction and strength of local spatio-temporal edges;
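The per-block features of step 3 can be sketched as below. The formulas are illustrative stand-ins (a crude sum of absolute horizontal differences in place of a true spatio-temporal gradient), not the specification's exact definitions.

```python
# Sketch of step 3: compute simple per-block features over a block of pixel
# intensities, stored as a nested list indexed [t][y][x].

def block_features(block):
    """Return (mean intensity, gradient-magnitude proxy) for one pixel block."""
    values = [v for frame in block for row in frame for v in row]
    mean_intensity = sum(values) / len(values)
    grad = 0.0
    for frame in block:
        for row in frame:
            for a, b in zip(row, row[1:]):   # horizontal forward differences
                grad += abs(b - a)           # proxy for edge strength
    return mean_intensity, grad

features = block_features([[[1, 3], [1, 3]]])   # one tiny 2x2x1 "block"
```

In a full implementation the same loop structure would also accumulate vertical and temporal differences, and chrominance planes would be handled like the luminance plane shown here.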

- if the input is compressed, a preferred additional input is the mean motion vector of the block, the motion vectors being parameter information already present in the compressed stream.
Each of the above features of block b_ijk is represented as a vector f^l_ijk, where l = 1, ..., m and m is the number of features. All features are concatenated into a single vector f_ijk = (f^1_ijk, ..., f^m_ijk); the feature vectors are represented as a three-dimensional array in which each entry corresponds to the features of the associated pixel block.
4. The feature vectors undergo clustering by a vector quantization algorithm. The Max-Lloyd algorithm used in an embodiment of the invention is described in A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Vector quantization divides the three-dimensional pixel array into regions with identical or similar feature vectors; a typical representation of the regions is a block-wise map in which each entry is the region number of the corresponding block.
5. The map of spatio-temporal regions undergoes a morphological process: regions whose spatial and temporal size is below preset thresholds are deleted, and disconnected regions carrying the same region number are merged.
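Step 4's Max-Lloyd quantization alternates nearest-centroid assignment with centroid recomputation. A minimal Lloyd-style sketch follows; initializing the codebook from the first k vectors is an assumption made for illustration.

```python
# Minimal Lloyd-style vector quantization (the Max-Lloyd method named in the
# text): repeat (a) assign each feature vector to its nearest centroid under
# squared Euclidean distance, (b) move each centroid to the mean of its members.

def lloyd_quantize(vectors, k, iterations=10):
    centroids = [list(v) for v in vectors[:k]]   # assumed initialization
    labels = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):          # (a) nearest-centroid assignment
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        for j in range(k):                       # (b) centroid update
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = lloyd_quantize([[0.0], [0.1], [5.0], [5.1]], k=2)
```

Applied to the block feature vectors, the returned labels are exactly the block-wise region-number map described in step 4.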
In an embodiment of the invention, a preset number of regions is given, and regions are merged or deleted in a way that makes the total number of remaining regions equal to the preset number.
6. From the feature vectors of the blocks belonging to each region produced by the clustering of stage 5, a signature vector is generated for each region. Context can be captured by global features, and audio can be added to the signature vector.
In an embodiment of the invention, the region signature 58 has the dimension of the feature vectors and is produced as the coordinate-wise mean of the feature vectors of the blocks belonging to the region.
In another embodiment, the aggregation is performed by computing a histogram of the feature vectors; the number of histogram bins is preset.
In another embodiment, the feature vectors of the blocks belonging to the region undergo principal component analysis (PCA) to produce a vector of preset size; PCA captures the components of maximal variance, as described in K.
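The histogram-based aggregation variant can be sketched as follows; for simplicity the feature values are scalars, and the value range is an assumed parameter rather than something the specification fixes.

```python
# Sketch of histogram aggregation: each signature coordinate counts how many
# of the region's feature values fall into one of a preset number of bins
# spanning an assumed fixed range [lo, hi).

def histogram_signature(values, bins, lo, hi):
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)   # clamp hi edge into last bin
        counts[idx] += 1
    return counts

sig = histogram_signature([0.1, 0.2, 0.9, 1.0], bins=2, lo=0.0, hi=1.0)
```

Unlike the mean, a histogram preserves the distribution of features within the region, at the cost of a signature whose length is the preset bin count per feature dimension.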

Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.

Video map matching (Video map matching)
The metadata server 72 is illustrated and described here as a single computing location, although it should be understood that the metadata server 72 can in any case be distributed. In particular embodiments, depending on the size of the generated video map, video map generation occurs at one or more locations and the different parts are merged; further, after the video map is initially generated, the generated video map is distributed to various other locations. These locations can help determine a match between a signature 58c received from a client and the video maps generated in advance and stored at the metadata server locations. At the metadata server 72, newly generated video maps can be produced at weekly or other intervals.

In addition, the functions used to generate the video maps can be updated; it should be understood that such updates need to be coordinated between the metadata server 72 and the hypervideo client 74 locations.

Regarding signature matching: ideally, if the video content at the metadata server 72 and at the hypervideo client 74 is identical, the corresponding video maps will be identical, and the result is therefore an exact match.
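A minimal sketch of matching a client signature 58c against stored server signatures follows, assuming the sum-of-squared-differences (Euclidean) metric used in this section; the record layout and field names are illustrative, not taken from the specification.

```python
# Sketch of probe-to-server signature matching: the server signature with the
# smallest sum-of-squared-differences distance to the client probe wins.

def match_signature(probe, server_signatures):
    """Return (element id, distance) of the closest stored signature."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(server_signatures, key=lambda item: dist(probe, item["signature"]))
    return best["element_id"], dist(probe, best["signature"])

server_signatures = [
    {"element_id": "bmw_car",    "signature": [1.0, 2.0, 3.0]},
    {"element_id": "actor_face", "signature": [9.0, 1.0, 4.0]},
]
element, d = match_signature([1.1, 2.0, 2.9], server_signatures)
```

Because the winner is simply the minimum-distance entry, the same routine covers both the exact-match case (distance zero) and the approximate matching needed when client and server maps come from different versions of the video.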

In practice, however, the versions of the video maps at the metadata server 72 and the hypervideo client 74 may arise from different instances of the same video (for example, the video map at the metadata server 72 is generated from a DVD master at 24 frames per second, while the video map at the hypervideo client 74 is generated from an HDTV broadcast with advertisement insertion played at 30 frames per second), among other differences; in such settings, the expected match is approximate.

Further, the video at the hypervideo client 74 may be new, such that no corresponding video map exists on the metadata server 72; in this case, the closest approximate match is used to retrieve metadata.

Matching between a signature 58c issued by the hypervideo client 74 (the probe) and the video map signatures 58s on the metadata server 72 is performed using some predefined metric comparing the vectors 58c and 58s. In the simplest case, the vectors are compared using the Euclidean metric, the sum of squared coordinate-wise differences between the vectors; the approximate match is determined by comparing the probe signature with the server signatures and selecting the one with the smallest metric.

While the embodiments of the invention are disclosed as above, they are not intended to directly limit the scope of patent protection of the invention. A person of ordinary skill in the art to which the invention pertains may make various changes in form and detail without departing from the spirit and scope disclosed herein; the scope of patent protection of the invention remains as defined by the appended claims.

Brief Description of the Drawings
Figure 1 is a schematic visualization of the hypertext link concept in a typical web application.
Figure 2 is a schematic visualization of the hypervideo link concept.
Figure 3 is a schematic diagram of an embodiment of a hyper-document in HTML format with metadata embedded in the content.
Figure 4 is a schematic diagram of an embodiment of hypervideo according to the invention with metadata separated from the content.
Figure 5 is a schematic diagram of an embodiment of the data hierarchy of a video map according to the invention.
Figure 6 is a system block diagram of conventional hypervideo content distribution.
Figure 7 is a system block diagram of hypervideo content distribution according to the invention.
Figure 8 is a system block diagram of hypervideo distribution with content identification according to the invention.
Figure 9 is a schematic diagram of the metadata generation stage at the metadata server according to the invention.
Figure 10 is a schematic diagram of the content viewing stage at the metadata server and the client according to the invention.
Figure 11 is a schematic diagram of explicit objects and implicit objects as examples according to the invention.
Figure 12 is a block diagram of an embodiment of the user interface of the hypervideo client according to the invention.
Figures 13a to 13c are schematic diagrams of the hypervideo client using different pointing devices according to the invention.
Figure 14 is a schematic diagram of different types of actions as embodiments according to the invention.

Description of Reference Numerals

10 text word
12 metadata
14 text word
16 metadata
20(1) video element
20(2) video element
20(3) video element
22(1) metadata
22(2) metadata
22(3) metadata
30(1) metadata link
30(2) metadata link
40(1) video element
40(2) video element
40(3) video element
40(4) video element
40(5) video element
42(a) explicit object
42(b) implicit object
50 local feature
52 single pixel
54 global feature
56 frame
58 signature
60 content provider
64 hypervideo client
70 content provider
72 metadata server
74 hypervideo client
1210 pointing device
1220 action
1230 video playback
1240 video overlay
1250 display
step 90 video map generation
step 92 metadata generation process

Claims (1)

1. A method of operating on a video stream, comprising at least the following steps:
in a user processing system:
receiving a segment of the video stream, the video stream including video information representing sequential frames, the video information including video data used essentially only for rendering the sequential frame images;
receiving a request for an action, the action corresponding to a spatial-temporal location in an image displayed in the segment of the video stream;
automatically detecting at least one feature of neighboring spatio-temporal regions in the segment of the video stream, the step of automatically detecting the features within the segment operating on the video data used for rendering the sequential frame images;
obtaining a representation map of different regions in the video stream from the detected features;
computing an electronic signature from the request and the representation map; and
using the electronic signature to initiate performance of the action.
2. The method of claim 1, wherein the request for the action is an information request, and the step of using the electronic signature further comprises:
transmitting the electronic signature to an external server;
receiving information in response to the transmitted electronic signature; and
displaying the information.
3. The method of claim 1, wherein the request for the action is an information request, and the step of using the electronic signature further comprises:
performing a local request to retrieve, using the electronic signature, information stored entirely within the user processing system;
receiving the information in response to performing the local request; and
displaying the information.
4. The method of claim 1, wherein the request for the action is a content action request, and wherein the step of using the electronic signature comprises performing a process, initiated on the segment of the video stream, on content associated with the electronic signature.
5. The method of claim 4, wherein the content action request is a request to skip content corresponding to the electronic signature.
6. The method of claim 4, wherein the content action request is a request to swap or overlay content corresponding to the electronic signature.
7. The method of claim 4, wherein the content action request is a request for an interactive dialog corresponding to the electronic signature, and wherein the step of computing the electronic signature requires an additional request.
8. The method of claim 1, wherein the step of receiving the request comprises at least one of receiving input from a user controlled controller and receiving a text message from a user controlled device.
9. The method of claim 1, wherein the video data operated on in the detecting step is uncompressed video data.
10. The method of claim 9, wherein the uncompressed video data is pixel data.
11. The method of claim 9, wherein the step of obtaining the representation map obtains, from the detected features, at least one implicit object as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the implicit object in the representation map associated with the request.
12. The method of claim 9, wherein the step of obtaining the representation map obtains, from the detected features, at least one video element as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the video element in the representation map associated with the request.
13. The method of claim 9, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
14. The method of claim 13, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
15. The method of claim 1, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
16. The method of claim 1, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
17. The method of claim 1, wherein the step of obtaining the representation map obtains, from the detected features, at least one implicit object as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the implicit object in the representation map associated with the request.
18. The method of claim 1, wherein the step of obtaining the representation map obtains, from the detected features, at least one video element as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the video element in the representation map associated with the request.
19. The method of claim 1, wherein the video data operated on in the detecting step is compressed video data.
20. The method of claim 19, wherein the compressed video data is motion vector data.
21. The method of claim 19, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
22. The method of claim 21, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
23. The method of claim 19, wherein the step of obtaining the representation map obtains, from the detected features, at least one implicit object as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the implicit object in the representation map associated with the request.
24. The method of claim 19, wherein the step of obtaining the representation map obtains, from the detected features, at least one video element as the regions, and wherein the step of computing the electronic signature computes the electronic signature from the video element in the representation map associated with the request.
25. The method of claim 19, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
26. The method of claim 1, wherein the step of receiving the segment of the video stream receives the segment from one of a digital video disc (DVD) processed within the user processing system and broadcast content received from an external source.
27. The method of claim 1, wherein the user processing system comprises one of a microprocessor, a microcontroller, a digital signal processor (DSP) and an application specific integrated circuit processor.
28. A method of operating on a video stream comprising sequential frames, comprising at least the following steps:
in a first computing location:
receiving the video stream, the video stream including video information representing sequential frames, the video information including video data used essentially only for rendering the sequential frame images; and
obtaining a representation map of regions represented within the video stream, including:
automatically detecting at least one feature within the video stream, the step of automatically detecting the features operating on the video data used for rendering the sequential frame images;
individually computing at least one electronic signature for each of the different regions from the detected features; and
identifying an action and associating it with each electronic signature.
29. The method of claim 28, further comprising the following steps:
in a second computing location, which may be the same as or different from the first computing location:
receiving a client electronic signature;
matching the client electronic signature against the representation map to obtain a particular signature;
identifying the action associated with the particular signature; and
initiating the action.
30. The method of claim 29, wherein the step of identifying the action comprises associating a user profile with the particular signature, so as to identify the action from the user profile and the particular signature.
31. The method of claim 29, wherein the step of identifying the action comprises associating a monetary amount with the particular signature, so as to identify the action from the monetary amount and the particular signature.
32. The method of claim 29, wherein the particular signature in the matching step is the electronic signature within the representation map that is closest to the client electronic signature.
33. The method of claim 32, wherein the closest electronic signature within the representation map is an identical signature.
34. The method of claim 28, wherein the video data operated on in the detecting step is uncompressed video data.
35. The method of claim 34, wherein the uncompressed video data is pixel data.
36. The method of claim 34, wherein the step of obtaining the representation map obtains, from the detected features, at least one implicit object as the regions, and wherein the step of computing the electronic signatures computes the electronic signature from each implicit object.
37. The method of claim 34, wherein the step of obtaining the representation map obtains, from the detected features, at least one video element as the regions, and wherein the step of computing the electronic signatures computes the electronic signature from each video element.
38. The method of claim 34, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
39. The method of claim 38, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
40. The method of claim 28, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
41. The method of claim 28, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.
42. The method of claim 28, wherein the step of obtaining the representation map obtains, from the detected features, at least one implicit object as the regions, and wherein the step of computing the electronic signatures computes the electronic signature from each implicit object.
43. The method of claim 28, wherein the step of obtaining the representation map obtains, from the detected features, at least one video element as the regions, and wherein the step of computing the electronic signatures computes the electronic signature from each video element.
44. The method of claim 28, wherein the video data operated on in the detecting step is compressed video data.
45. The method of claim 44, wherein the compressed video data is motion vector data.
46. The method of claim 44, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.
47. The method of claim 28, wherein the step of receiving the video stream receives the video stream from one of a digital video disc (DVD) processed within the user processing system and broadcast content received from an external source.
48. A method of operating on a video stream, comprising at least the following steps:
in each of a user processing system and a server processing system that are physically separate:
receiving the video stream, the video stream including video information representing sequential frames, the video information including video data used essentially only for rendering the sequential frame images;
obtaining a representation map of regions represented within the video stream, including:
automatically detecting at least one feature within the video stream, the step of automatically detecting the features operating on the video data used for rendering the sequential frame images; and
computing at least one electronic signature for each of the different regions from the detected features;
and wherein:
one of the electronic signatures computed in the user processing system is transmitted to the server processing system;
the transmitted electronic signature is compared against the electronic signatures computed at the server processing system to determine, as a match, the particular electronic signature of the server processing system corresponding to the transmitted electronic signature; and
based on the match, the server processing system initiates an action.
49. The method of claim 48, wherein the electronic signatures obtained by each of the user processing system and the server processing system are essentially identical.
A method according to video stream operation, which comprises at least the following steps: In a user processing system: receiving a segment of the video stream, the segment The video stream includes video information representing one of the sequential frames, and the video information includes essentially one video data for rendering the continuous frame image; receiving a behavior (action) requesting, the behavior is a spatial-temporal location corresponding to the image displayed in the segment of the stream; automatically detecting at least the adjacent space-time region of the segment of the video stream a feature, the step of automatically detecting the features in the segment of the video stream is performed according to the video data used to convert the continuous frame image; obtaining the features according to the detected features a representation map of different regions in the video stream; calculating an electronic signature based on the request and the characterization image; Use the electronic signature to initiate the performance of the action. 2. The method according to claim 1, wherein the request for the behavior is an information request, and the step of using the power 31 201025110 sub-signature further comprises the following steps: The electronic signature is sent to an external server; receiving a message in response to the transmitted electronic signature; and displaying the information. 3. The method according to claim 1, wherein the request for the behavior is an information request, and the step of using the electronic signature further comprises the following steps: performing a partial request Used to use the electronic signature to obtain a piece of information that is completely stored in the user processing system; receive the information that is responsive to the execution of the local request; and display the information. 4. 
The method of claim 1, wherein the request for the action is a content action request, and wherein the step of using the electronic signature comprises the step of initiating the content action on the segment of the video stream using the electronic signature.

5. The method of claim 4, wherein the content action request is a request to skip the content corresponding to the electronic signature.

6. The method of claim 4, wherein the content action request is a request to swap or overlay the content corresponding to the electronic signature.

7. The method of claim 4, wherein the content action request is a request to interact with the content corresponding to the electronic signature, wherein the step of calculating the electronic signature requires an additional request.

8. The method of claim 1, wherein the step of receiving the request comprises at least one of the steps of receiving an input from a user controlled pointer and receiving a text message from a user controlled device.

9. The method of claim 1, wherein the video data operated on in the detecting step is uncompressed video data.

10. The method of claim 9, wherein the uncompressed video data is pixel data.

11. The method of claim 9, wherein the step of obtaining the representation map obtains, according to the detected features, at least one implicit object as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the implicit object in the representation map associated with the request.

12.
The method of claim 9, wherein the step of obtaining the representation map obtains, according to the detected features, at least one video element as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the video element in the representation map associated with the request.

13. The method of claim 9, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

14. The method of claim 13, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

15. The method of claim 1, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

16. The method of claim 1, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

17. The method of claim 1, wherein the step of obtaining the representation map obtains, according to the detected features, at least one implicit object as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the implicit object in the representation map associated with the request.

18.
The method of claim 1, wherein the step of obtaining the representation map obtains, according to the detected features, at least one video element as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the video element in the representation map associated with the request.

19. The method of claim 1, wherein the video data operated on in the detecting step is compressed video data.

20. The method of claim 19, wherein the compressed video data is motion vector data.

21. The method of claim 19, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

22. The method of claim 21, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

23. The method of claim 19, wherein the step of obtaining the representation map obtains, according to the detected features, at least one implicit object as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the implicit object in the representation map associated with the request.

24. The method of claim 19, wherein the step of obtaining the representation map obtains, according to the detected features, at least one video element as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on the video element in the representation map associated with the request.

25.
The method of claim 19, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

26. The method of claim 1, wherein the step of receiving the segment of the video stream receives it from an external source, the segment being one of digital video disc (DVD) content and broadcast content processed in the user processing system.

27. The method of claim 1, wherein the user processing system comprises at least one of a microcontroller, a microprocessor, a digital signal processor (DSP), and an application specific integrated circuit (ASIC) processor.

28. A method of operating on a video stream comprising sequential frames, comprising at least the following steps, in a first computing location:
receiving the video stream, the video stream including video information representing the sequential frames, the video information including video data used essentially only for rendering the sequential frame images;
obtaining a representation map of regions represented in the video stream, including: automatically detecting at least one feature in the video stream, the step of automatically detecting the features operating on the video data used to render the sequential frame images; and calculating at least one electronic signature for each of the different regions according to the detected features; and
identifying an action and associating it with each of the electronic signatures.

29.
The method of claim 28, further comprising the following steps, in a second computing location at the same or a different location:
receiving a client electronic signature;
matching the client electronic signature against the representation map to obtain a particular signature;
identifying the action associated with the particular signature; and
initiating the action.

30. The method of claim 29, wherein the step of identifying the action comprises the step of associating a user profile with the particular signature, the action being identified according to the user profile and the particular signature.

31. The method of claim 29, wherein the step of identifying the action comprises the step of associating a purchase amount with the particular signature, the action being identified according to the purchase amount and the particular signature.

32. The method of claim 29, wherein the particular signature in the matching step is the electronic signature in the representation map most similar to the client electronic signature.

33. The method of claim 32, wherein the most similar electronic signature in the representation map is the same signature as the client electronic signature.

34. The method of claim 28, wherein the video data operated on in the detecting step is uncompressed video data.

35. The method of claim 34, wherein the uncompressed video data is pixel data.

36.
The method of claim 34, wherein the step of obtaining the representation map obtains, according to the detected features, at least one implicit object as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on each implicit object.

37. The method of claim 34, wherein the step of obtaining the representation map obtains, according to the detected features, at least one video element as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on each video element.

38. The method of claim 34, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

39. The method of claim 38, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

40. The method of claim 28, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

41. The method of claim 28, wherein the video information further includes audio data, and wherein the detecting step also operates on the audio data.

42. The method of claim 28, wherein the step of obtaining the representation map obtains, according to the detected features, at least one implicit object as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on each implicit object.

43. The method of claim 28, wherein the step of obtaining the representation map obtains, according to the detected features, at least one video element as the region, and wherein the step of calculating the electronic signature calculates the electronic signature based on each video element.

44. The method of claim 28, wherein the video data operated on in the detecting step is compressed video data.

45. The method of claim 44, wherein the compressed video data is motion vector data.

46. The method of claim 44, wherein the video information further includes auxiliary video parameter information separate from the video data, and wherein the detecting step also operates on the auxiliary video parameter information.

47. The method of claim 28, wherein the step of receiving the segment of the video stream receives it from an external source, the segment being one of digital video disc (DVD) content and broadcast content processed in the user processing system.

48. A method of operating on a video stream, comprising at least the following steps, in each of a user processing system and a server processing system that are physically separate:
receiving the video stream, the video stream including video information representing sequential frames, the video information including video data used essentially only for rendering the sequential frame images;
obtaining a representation map of regions represented in the video stream, including:
automatically detecting at least one feature in the video stream, the step of automatically detecting the features in the video stream operating on the video data used to render the sequential frame images; and
calculating at least one electronic signature for each of the different regions according to the detected features;
and wherein:
one of the electronic signatures calculated in the user processing system is transmitted to the server processing system;
that electronic signature is compared against the electronic signatures calculated by the server processing system to determine a particular electronic signature calculated by the server processing system as a match with it; and
according to the match, the server processing system initiates an action.

49. The method of claim 48, wherein the electronic signatures obtained by each of the user processing system and the server processing system are essentially the same.
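Claims 28 through 49 describe a dual client/server scheme: both sides compute electronic signatures from features detected in the same video stream, the client sends one signature to the server, the server matches it against its representation map by nearest similarity, and the action associated with the matched signature is initiated. The matching step can be sketched as follows; the signature layout, the Euclidean distance metric, and the action names are illustrative assumptions, not anything the patent specifies.

```python
import math

def compute_signature(features):
    # Toy signature: fold per-region feature values into a fixed-length,
    # L2-normalised vector. A real system would derive these values from
    # spatio-temporal features detected in the video data.
    sig = [0.0] * 4
    for i, value in enumerate(features):
        sig[i % 4] += value
    norm = math.sqrt(sum(v * v for v in sig)) or 1.0
    return [v / norm for v in sig]

def match_signature(client_sig, representation_map):
    # Server side: find the stored signature most similar to the client's
    # and return the action associated with it.
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _best_sig, action = min(representation_map.items(),
                            key=lambda item: distance(client_sig, item[0]))
    return action

# Hypothetical server-side representation map: signature -> associated action.
representation_map = {
    (1.0, 0.0, 0.0, 0.0): "show_product_info",
    (0.0, 1.0, 0.0, 0.0): "skip_content",
}

client_sig = compute_signature([0.9, 0.05, 0.02, 0.01])
print(match_signature(client_sig, representation_map))  # show_product_info
```

Claim 33's exact-match case falls out of the same nearest-neighbour search when the client and server signatures are identical, which is why claim 49 can require the two independently computed signatures to be essentially the same.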
TW97149129A 2008-12-17 2008-12-17 Method and apparatus for generation, distribution and display of interactive video content TW201025110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97149129A TW201025110A (en) 2008-12-17 2008-12-17 Method and apparatus for generation, distribution and display of interactive video content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97149129A TW201025110A (en) 2008-12-17 2008-12-17 Method and apparatus for generation, distribution and display of interactive video content

Publications (1)

Publication Number Publication Date
TW201025110A true TW201025110A (en) 2010-07-01

Family

ID=44852451

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97149129A TW201025110A (en) 2008-12-17 2008-12-17 Method and apparatus for generation, distribution and display of interactive video content

Country Status (1)

Country Link
TW (1) TW201025110A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI634482B (en) * 2015-11-04 2018-09-01 優比特思有限公司 Interactive applications implemented in video streams


Similar Documents

Publication Publication Date Title
US8170392B2 (en) Method and apparatus for generation, distribution and display of interactive video content
US11184676B2 (en) Automated process for ranking segmented video files
CN102244807B (en) Adaptive video zoom
KR102114701B1 (en) System and method for recognition of items in media data and delivery of information related thereto
US8893173B2 (en) Interactive product placement system and method therefor
KR101816113B1 (en) Estimating and displaying social interest in time-based media
CN107846561B (en) Method and system for determining and displaying contextually targeted content
US20160050465A1 (en) Dynamically targeted ad augmentation in video
US10699487B2 (en) Interaction analysis systems and methods
CN104219559A (en) Placing unobtrusive overlays in video content
AU2010256367A1 (en) Ecosystem for smart content tagging and interaction
US20180013977A1 (en) Deep product placement
US9449231B2 (en) Computerized systems and methods for generating models for identifying thumbnail images to promote videos
CN114339360B (en) Video processing method, related device and equipment
CN113766296A (en) Live broadcast picture display method and device
US10042516B2 (en) Lithe clip survey facilitation systems and methods
US20170013309A1 (en) System and method for product placement
TW201025110A (en) Method and apparatus for generation, distribution and display of interactive video content
Yu et al. Interactive broadcast services for live soccer video based on instant semantics acquisition
WO2024001677A1 (en) Page display method and apparatus, computer device, storage medium and program product
US11979645B1 (en) Dynamic code integration within network-delivered media
WO2024131222A1 (en) Information processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
Hasan et al. Applications of Computer Vision in Entertainment and Media Industry
Gibbon et al. Large-Scale Analysis for Interactive Media Consumption
CN117155997A (en) Multi-screen interactive content recommendation method, device, equipment and storage medium