TW201342935A - Video compression repository and model reuse

Video compression repository and model reuse

Info

Publication number
TW201342935A
Authority
TW
Taiwan
Prior art keywords
feature
video
model
library
repository
Prior art date
Application number
TW102108202A
Other languages
Chinese (zh)
Inventor
Charles P Pace
Darin Deforest
Nigel Lee
Renato Pizzorni
Richard Y Wingard
Original Assignee
Euclid Discoveries Llc
Priority date
Filing date
Publication date
Priority claimed from US 13/772,230 (US 8,902,971 B2)
Application filed by Euclid Discoveries Llc
Publication of TW201342935A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/97 - Matching pursuit coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems and methods of improving video encoding/decoding efficiency may be provided. A feature-based processing stream is applied to video data having a series of video frames. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks, and each track is given a representative, characteristic feature. Similar characteristic features are clustered and then stored in a model library for reuse in the compression of other videos. A model-based compression framework makes use of the preserved model data by detecting features in a new video to be encoded, relating those features to specific blocks of data, and accessing similar model information from the model library. The formation of model libraries can be specialized to include personal "smart" model libraries, differential libraries, and predictive libraries. Predictive model libraries can be modified to handle a variety of demand scenarios.

Description

Video compression repository and model reuse

The present invention relates to a video compression repository and to model reuse.

Related Applications

This application claims the benefit of U.S. Provisional Application No. 61/650,363, filed May 22, 2012, and U.S. Provisional Application No. 61/616,334, filed March 27, 2012. This application also claims priority to U.S. Patent Application No. 13/772,230, filed February 20, 2013. U.S. Patent Application No. 13/772,230, filed February 20, 2013, is in turn a continuation-in-part of U.S. Patent Application No. 13/121,904, filed October 6, 2009, which is the U.S. national stage of International Patent Application No. PCT/US2009/059653, filed October 6, 2009, designating the United States and published in English, which claims the benefit of U.S. Provisional Application No. 61/103,362, filed October 7, 2008. Application No. 13/121,904 is also a continuation-in-part of U.S. Patent Application No. 12/522,322, filed January 4, 2008, which is the U.S. national stage of International Patent Application No. PCT/US2008/000090, filed January 4, 2008, designating the United States and published in English, which claims the benefit of U.S. Provisional Application No. 60/881,966, filed January 23, 2007, is related to U.S. Provisional Application No. 60/811,890, filed June 8, 2006, and is a continuation-in-part of U.S. Application No. 11/396,010, filed March 31, 2006 (issued as U.S. Patent No. 7,457,472 on November 25, 2008), which is a continuation-in-part of U.S. Application No. 11/336,366, filed January 20, 2006 (issued as U.S. Patent No. 7,436,981 on October 14, 2008), which is a continuation-in-part of U.S. Application No. 11/280,625, filed November 16, 2005 (issued as U.S. Patent No. 7,457,435 on November 25, 2008), which claims the benefit of U.S. Provisional Application No. 60/628,819, filed November 17, 2004, and U.S. Provisional Application No. 60/628,861, filed November 17, 2004, and which is a continuation-in-part of U.S. Application No. 11/230,686, filed September 20, 2005 (issued as U.S. Patent No. 7,426,285 on September 16, 2008), which is a continuation-in-part of U.S. Application No. 11/191,562, filed July 28, 2005 (issued as U.S. Patent No. 7,158,680 on January 2, 2007), which claims the benefit of U.S. Provisional Application No. 60/598,085, filed July 30, 2004. U.S. Application No. 11/396,010 also claims priority to U.S. Provisional Application No. 60/667,532, filed March 31, 2005, and U.S. Provisional Application No. 60/670,951, filed April 13, 2005.

This application is also related to U.S. Patent Application No. 13/725,940, filed December 21, 2012, which claims the benefit of U.S. Provisional Application No. 61/707,650, filed September 28, 2012, and U.S. Provisional Application No. 61/615,795, filed March 26, 2012.

The entire teachings of the above applications are incorporated herein by reference.

Video compression can be considered the process of representing digital video data in a form that uses fewer bits when stored or transmitted. Video compression algorithms achieve compression by exploiting redundancies and irrelevancies in the video data, whether spatial, temporal, or spectral (color-space). Video compression algorithms typically partition the video data into portions, such as groups of frames and groups of pels (picture elements), to identify areas of redundancy within the video that can be represented with fewer bits than required by the original video data. When these redundancies in the data are reduced, greater compression is achieved. An encoder is used to convert the video data into an encoded format, and a decoder converts encoded video back into a form comparable to the original video data. The implementation of the encoder/decoder is referred to as a codec.

A standard encoder divides a given video frame into non-overlapping coding units or macroblocks (rectangular regions of contiguous pels) for encoding. The macroblocks are typically processed in a traversal order of left to right and top to bottom within the frame. Compression is achieved when macroblocks are predicted and encoded from previously coded data. The process of encoding a macroblock using spatially neighboring samples of previously coded macroblocks within the same frame is referred to as intra-prediction; intra-prediction attempts to exploit spatial redundancy in the data. Encoding a macroblock using similar regions from previously coded frames, together with a motion estimation model, is referred to as inter-prediction; inter-prediction attempts to exploit temporal redundancy in the data.

The encoder may generate a residual by measuring the difference between the data to be encoded and its prediction; the residual can provide the difference between the predicted macroblock and the original macroblock. The encoder also generates motion vector information that, for example, specifies the position of a macroblock in a reference frame relative to the macroblock that is being encoded or decoded. The predictions, motion vectors (for inter-prediction), residuals, and related data are combined with other processes, such as a spatial transform, a quantizer, an entropy encoder, and an in-loop filter, to create an efficient encoding of the video data. The residual that has been quantized and transformed is processed and added back to the prediction, assembled into a decoded frame, and stored in a framestore. Those skilled in the art are familiar with the details of such video encoding techniques.
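To make the residual arithmetic above concrete, the following is a minimal sketch of the encode/reconstruct round trip for a single macroblock. It is not the H.264 pipeline: the quantize/dequantize helpers below are hypothetical stand-ins for the transform and quantization stages, using a simple uniform step size assumed for illustration.

```python
import numpy as np

def quantize(residual, step=8):
    # Stand-in for transform + quantization: a uniform quantizer on pel differences.
    return np.round(residual / step).astype(np.int32)

def dequantize(quantized, step=8):
    return quantized * step

def encode_macroblock(original, prediction, step=8):
    # Residual = difference between the block to be encoded and its prediction.
    residual = original.astype(np.int32) - prediction.astype(np.int32)
    return quantize(residual, step)

def reconstruct_macroblock(prediction, quantized_residual, step=8):
    # Decoder side: add the dequantized residual back to the prediction.
    recon = prediction.astype(np.int32) + dequantize(quantized_residual, step)
    return np.clip(recon, 0, 255).astype(np.uint8)

# Example round trip for one 16x16 macroblock.
original = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
prediction = np.clip(original.astype(np.int32) + np.random.randint(-6, 7, (16, 16)),
                     0, 255).astype(np.uint8)
decoded = reconstruct_macroblock(prediction, encode_macroblock(original, prediction))
```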

H.264/MPEG-4 Part 10 AVC (Advanced Video Coding), hereafter referred to as H.264, is a codec standard for video compression that utilizes block-based motion estimation and compensation and achieves high-quality video representation at relatively low bitrates. This standard is used for Blu-ray disc creation and is one of the encoding options used within the major video distribution channels, including video streaming on the Internet, video conferencing, cable television, and direct-broadcast satellite television. The basic coding unit for H.264 is the 16x16 macroblock. H.264 is the most recent widely accepted standard in video compression.

The basic MPEG standard defines three types of frames (or pictures), depending on how the macroblocks in the frame are encoded. An I-frame (intra-coded picture) is encoded using only data present in the frame itself. Generally, when the encoder receives video signal data, the encoder first creates an I-frame and partitions the frame data into macroblocks that are each encoded using intra-prediction. Thus, an I-frame consists only of intra-predicted macroblocks (or "intra macroblocks"). I-frames are costly to encode because the encoding does not benefit from information in previously decoded frames. A P-frame (predicted picture) is encoded via forward prediction, using data from previously decoded I-frames or P-frames (also known as reference frames); P-frames may contain either intra macroblocks or (forward-)predicted macroblocks. A B-frame (bi-predicted picture) is encoded via bi-directional prediction, using data from both previous and subsequent frames; B-frames may contain intra macroblocks, (forward-)predicted macroblocks, or bi-predicted macroblocks.

As noted above, conventional inter-prediction is based on block-based motion estimation and compensation (BBMEC). The BBMEC process searches for the best match between the target macroblock (the current macroblock being encoded) and similarly sized regions within previously decoded reference frames. When a best match is found, the encoder may transmit a motion vector. The motion vector may include a pointer to the position of the best match in its reference frame along with information regarding the difference between the best match and the corresponding target macroblock. One could conceivably perform an exhaustive search of this kind throughout the entire video "datacube" (height x width x number of frames) to find the best possible match for each macroblock, but exhaustive search is usually computationally prohibitive. As a result, the BBMEC search process is limited, both temporally in terms of the reference frames searched and spatially in terms of the neighboring regions searched. This means that the "best possible" matches are not always found, especially with rapidly changing data.
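The block-matching search described above can be sketched as follows, assuming grayscale frames held as numpy arrays and an exhaustive search over a small window around the target position. This is only an illustration of the principle (real encoders restrict the search further and use faster error metrics); the function name and the window radius are illustrative choices, not part of any standard.

```python
import numpy as np

def bbmec_search(target_frame, ref_frame, bx, by, block=16, radius=16):
    """Find the best match for the block-sized target macroblock at (bx, by)
    within a +/- radius window of the reference frame; return the motion
    vector, the residual, and the match error (MSE)."""
    target = target_frame[by:by + block, bx:bx + block].astype(np.float64)
    height, width = ref_frame.shape
    best_mv, best_residual, best_mse = None, None, np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > width or y + block > height:
                continue                        # candidate falls outside the frame
            candidate = ref_frame[y:y + block, x:x + block].astype(np.float64)
            residual = target - candidate
            mse = np.mean(residual ** 2)
            if mse < best_mse:
                best_mv, best_residual, best_mse = (dx, dy), residual, mse
    return best_mv, best_residual, best_mse
```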

There is a special set of reference frames termed the Group of Pictures (GOP). The GOP contains only the decoded pels within each reference frame and does not include information as to how the macroblocks or frames themselves were originally encoded (I-frame, B-frame, or P-frame). Earlier video compression standards, such as MPEG-2, used one reference frame (the previous frame) to predict P-frames and two reference frames (one past, one future) to predict B-frames. The H.264 standard, by contrast, allows the use of multiple reference frames for P-frame and B-frame prediction. While the reference frames are typically temporally adjacent to the current frame, the standard also accommodates the specification of reference frames from outside the set of temporally adjacent frames.

Conventional compression allows the blending of multiple matches from multiple frames to predict regions of the current frame. The blending is often a linear or log-scale linear combination of the matches. One example of when this bi-prediction method is effective is when one image fades into another over time. The process of fading is a linear blending of the two images, and the process can sometimes be modeled effectively using bi-prediction. Some older standard encoders, such as the MPEG-2 interpolative mode, allow the interpolation of linear parameters to synthesize the bi-prediction model over many frames.
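The blending itself reduces to a weighted combination of the two matches, as in the short sketch below. The weight alpha is an assumed illustration parameter; for a cross-fade it would track the position of the current frame between the two reference frames.

```python
import numpy as np

def bipredict(match_past, match_future, alpha=0.5):
    # Linear blend of a match from a past reference frame and a match from a
    # future reference frame; alpha = 0.5 gives a simple average of the two.
    blended = (alpha * match_past.astype(np.float64)
               + (1.0 - alpha) * match_future.astype(np.float64))
    return np.clip(np.round(blended), 0, 255).astype(np.uint8)
```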

The H.264 standard also introduces additional encoding flexibility by dividing frames into spatially distinct regions, called slices, each composed of one or more contiguous macroblocks. Each slice in a frame is encoded (and can thus be decoded) independently of the other slices. I-slices, P-slices, and B-slices are then defined in a manner analogous to the frame types described above, and a frame can consist of multiple slice types. Additionally, there is typically flexibility in how the encoder orders the processed slices, so a decoder can process slices in an arbitrary order as they arrive at the decoder.

Historically, model-based compression schemes have been proposed to avoid the limitations of BBMEC prediction. These model-based compression schemes (the most well known of which is perhaps the MPEG-4 Part 2 standard) rely on the detection and tracking of objects or features in the video and on encoding those features/objects separately from the rest of the video frame. Such model-based compression schemes, however, suffer from the difficulty of segmenting video frames into object and non-object (feature and non-feature) regions. First, because objects can be of arbitrary size, their shapes need to be encoded in addition to their textures (color content). Second, the tracking of multiple moving objects can be difficult, and inaccurate tracking causes incorrect segmentation, usually resulting in poor compression performance. A third challenge is that not all videos are composed of objects or features, so fallback encoding techniques are needed when objects/features are not present.

While the H.264 standard allows codecs to deliver better-quality video at lower file sizes than previous standards such as MPEG-2 and MPEG-4 ASP (Advanced Simple Profile), "conventional" compression codecs implementing the H.264 standard typically struggle to keep up with the demand for greater video quality and resolution on memory-constrained devices, such as smartphones and other mobile devices, operating on limited-bandwidth networks. Video quality and resolution are often sacrificed to achieve merely adequate playback on these devices. Furthermore, as video resolution increases, file sizes increase, making storage of videos on and off these devices a potential concern.

Applicant's co-pending U.S. Application No. 13/725,940 (referred to herein as the "'940 application") presents a model-based compression scheme that avoids the segmentation problem noted above. The model-based compression framework (MBCF) of Applicant's co-pending '940 application also detects and tracks objects/features to identify the important regions of the video frame to encode, but it does not attempt to encode those objects/features explicitly. Rather, the objects/features are related to nearby macroblocks, and it is the macroblocks that are encoded, as in "conventional" codecs. This implicit use of modeling information mitigates the segmentation problem in two ways: it keeps a fixed coding-unit (macroblock) size (so that object/feature shapes need not be encoded), and it lessens the impact of inaccurate tracking (because the tracking aids, but does not dictate, the motion estimation step). Additionally, the MBCF of the co-pending '940 application models the video data at multiple fidelities, including a fallback to conventional compression when objects/features are not present; this hybrid encoding scheme ensures that modeling information is used only where needed and is not applied incorrectly where it is not needed.

U.S. Patent No. 6,088,484 to Mead presents an extension of standard model-based compression in which objects detected in one video are stored and then reused to aid in the compression of similar objects in another video. However, the model-reuse compression scheme in the Mead patent involves explicit or direct encoding of the objects/features in the new video, and it thus encounters the same segmentation problem noted above (namely, the difficulty of accurately segmenting objects/features from non-objects/non-features). The present invention presents a model-reuse compression scheme within the framework of the co-pending '940 application, whose implicit use of object/feature models to indicate the important macroblocks to encode avoids the segmentation problem while retaining most of the benefits of modeling for improving encoder prediction.

The present invention recognizes fundamental limitations in the inter-prediction process of conventional video codecs and applies higher-level modeling to overcome those limitations and provide improved inter-prediction, while maintaining the same general processing flow and framework as conventional encoders.

The present invention builds on the model-based compression approach presented in the co-pending '940 application, in which features in a video are detected, modeled, and tracked, and the feature information is used to improve the prediction and encoding of later data in the same video. The "online" feature-based prediction of the '940 application, in which feature information is generated and used to help encode later segments of the same video, is extended in the present invention to "offline" feature-based prediction, in which feature information from one video is retained or persisted into a model library for reuse in identifying target macroblocks and thereby helping to encode data in another video. This is accomplished without requiring feature segmentation in the target video. Whereas standard compression techniques and the online prediction of the '940 application attempt to exploit temporal redundancy within a single video, the offline prediction presented here attempts to exploit redundancy across multiple videos.

The four major components of the offline feature-based compression of the present invention are as follows: (i) generating feature models and related information from one or more input videos and saving that feature information; (ii) reusing the saved feature information to improve the compression of another video (different from, or unrelated to, the input videos) while avoiding feature segmentation in that video; (iii) forming feature model libraries from large collections of input videos; and (iv) using the feature model libraries to decode the unrelated or target videos. The formation of model libraries can be specialized to include personal "smart" model libraries, differential libraries, and predictive libraries. Predictive model libraries can be modified to handle a variety of demand scenarios.

10-1~10-n‧‧‧detected feature instances
20-1~20-n‧‧‧frames
30-1~30-n‧‧‧regions
40‧‧‧ensemble matrix
50‧‧‧previously detected features
60-1~60-n‧‧‧features
70‧‧‧feature tracker
80‧‧‧feature detector
90‧‧‧current frame
170‧‧‧network
300‧‧‧flow diagram
310‧‧‧step
312‧‧‧step
314‧‧‧step
316‧‧‧step
318‧‧‧step
320‧‧‧step
322‧‧‧step
324‧‧‧step
326‧‧‧step
328‧‧‧step
414‧‧‧number of members
416‧‧‧result after clustering distinct features based on the features' spectral colormaps
418‧‧‧cluster members
420‧‧‧result after clustering distinct features based on the features' SURF descriptors
422‧‧‧cluster members
500‧‧‧step
510‧‧‧step
520‧‧‧step
530‧‧‧step
610‧‧‧step
612‧‧‧step
614‧‧‧step
616‧‧‧step
618‧‧‧step
620‧‧‧step
632‧‧‧step
634‧‧‧step
702‧‧‧video datacube
704A~704F‧‧‧frames
706A~706F‧‧‧columns
708A~708F‧‧‧rows
802‧‧‧step
804‧‧‧step
806‧‧‧step
808‧‧‧step
810‧‧‧step
812‧‧‧step
814‧‧‧step
816‧‧‧step
818‧‧‧step
820‧‧‧step
822‧‧‧flow diagram
824‧‧‧step
826‧‧‧step
828‧‧‧step
830‧‧‧step
832‧‧‧step
834‧‧‧step
836‧‧‧step
838‧‧‧step
840‧‧‧step
850‧‧‧flow diagram
852‧‧‧step
854‧‧‧step
856‧‧‧step
858‧‧‧step
860‧‧‧step
902‧‧‧video repository
904‧‧‧encoded video set
906A‧‧‧first video set
906B‧‧‧second video set
906C‧‧‧third video set
906D‧‧‧Nth video set
908‧‧‧client device
910‧‧‧stream decoding module
911‧‧‧decoded video
912A‧‧‧memory
912B‧‧‧display
912C‧‧‧storage module
914‧‧‧request generation module
916‧‧‧video request
918‧‧‧request receiving module
920‧‧‧requested-video query
922‧‧‧requested video
924‧‧‧stream generation module
926‧‧‧generated library
928‧‧‧requested encoded video
952‧‧‧query for the client version of the requested video's library
954‧‧‧version design module
956‧‧‧client version of the requested video's library
958‧‧‧differential library generation module
960‧‧‧video encoding module
962‧‧‧differential library
964‧‧‧library storage module
966‧‧‧library configuration module
970‧‧‧combined library
980‧‧‧feature model
982‧‧‧feature model
1002‧‧‧predictive library generation module
1004‧‧‧user behavior profile module
1006‧‧‧user behavior profile
1008‧‧‧predictively encoded video request
1010‧‧‧video for encoding
1012‧‧‧predictively generated library
1110‧‧‧client computer/device
1112‧‧‧cloud
1114‧‧‧computer program propagated signal product
1116‧‧‧communications network
1118‧‧‧I/O device interface
1120‧‧‧central processing unit
1122‧‧‧network interface
1124‧‧‧computer software instructions
1126‧‧‧OS program
1128‧‧‧data
1130‧‧‧memory
1132‧‧‧disk storage
1134‧‧‧system bus

The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings, in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.

FIG. 1 is a block diagram of feature modeling according to an embodiment of the invention.

FIG. 2 is a block diagram of feature tracking according to an embodiment of the invention.

FIG. 3 is a flow diagram of a method of model extraction and feature model generation employed by an example embodiment of the repository.

FIG. 4A is a screenshot of a feature-based compression tool according to an example implementation.

FIG. 4B is a screenshot of a feature-based compression tool according to an example implementation.

FIG. 5 is a block diagram of processing elements for modeling macroblocks as features aligned to macroblock boundaries.

FIG. 6A is a flow diagram of a method of generating an index employed by the repository.

FIG. 6B is a flow diagram of a method of using an index employed by the repository.

FIG. 7 is a block diagram of a normalized cube; the normalized cube is a collection of association tables.

FIG. 8A is a flow diagram of an example embodiment of a method for generating an index.

FIG. 8B is a flow diagram of an example embodiment of a method of using an index to query for features.

FIG. 8C is a flow diagram of another method employed by an example embodiment of the repository.

FIG. 8D is a flow diagram of a method employed by an example embodiment of the repository.

FIG. 9A is a block diagram of an example embodiment of a repository operatively connected with a client device over a network.

FIG. 9B is a block diagram of another example embodiment of a repository configured to communicate with a client device over a network.

FIG. 10 is a block diagram of an example embodiment of a repository operatively connected to a client device over a network.

FIG. 11A is a schematic diagram of a computer network environment in which embodiments of the present invention are deployed.

FIG. 11B is a block diagram of a computer node in the network of FIG. 11A.

The teachings of all patents, published applications, and references cited herein are incorporated by reference in their entirety. A description of example embodiments of the invention follows.

The present invention can be applied to various standard encodings and coding units. In the following, unless otherwise noted, the terms "conventional" and "standard" (sometimes used together with "compression", "codec", "encoding", or "encoder") refer to H.264, and "macroblock" refers generically to the basic H.264 coding unit.

Generating and preserving feature models

Definition of features

Example elements of the present invention include video compression and decompression processes that can optimally represent digital video data when stored or transmitted. The processes may include or interface with one or more video compression/encoding algorithms in order to exploit spatial, temporal, or spectral redundancies and irrelevancies in the video data. This exploitation is accomplished through the use and retention of feature-based models/parameters. In the following, the terms "feature" and "object" are used interchangeably; objects can generally be defined as "large features". Both features and objects can be used to model the data.

Features are groups of pels in close proximity that exhibit data complexity. Data complexity can be detected via various criteria, as detailed below, but from a compression standpoint the ultimate characterization of data complexity is that the data is "costly to encode": encoding such pels with conventional video compression exceeds the threshold of what would be considered an "efficient encoding". When a conventional encoder allocates a disproportionate amount of bandwidth to particular regions (because conventional inter-frame search cannot find good matches for them within conventional reference frames), those regions are likely to be "feature-rich", and it is in such regions that feature-model-based compression can noticeably improve compression.

Feature detection

Feature instances 10-1, 10-2, ..., 10-n shown in FIG. 1 have been detected in one or more frames 20-1, 20-2, ..., 20-n of the video. In general, such features can be detected using several criteria based on both structural information derived from the pels and complexity criteria indicating that conventional compression uses a disproportionate amount of bandwidth to encode the feature regions. Each feature instance is further identified spatially in its frame 20-1, 20-2, ..., 20-n by a corresponding spatial extent or perimeter, shown as "regions" 30-1, 30-2, ..., 30-n in FIG. 1. These feature regions 30-1, 30-2, ..., 30-n can be extracted, for instance, as simple rectangular regions of pel data. In one embodiment of the present invention, the feature regions are of size 16x16, the same size as H.264 macroblocks.

在以該等圖像點本身的結構為基礎來檢測特徵的文獻中已 經提出許多演算法,包含適用於圖像點資料之不同轉換的非參數性特徵檢測演算法類別。舉例來說,尺度不變特徵轉換(Scale Invariant Feature Transform,SIFT)[Lowe,David在2004年於Int.J.of Computer Vision,60(2):91-110中所發表之「從尺度不變重要關鍵點衍生的獨特影像特徵(Distinctive image features from scale-invariant keypoints)」]使用影像之高斯差函數的捲積(convolution)來檢測類班點(blob-like)特徵。加速強健特徵(Speeded-Up Robust Features,SURF)演算法[Bay,Herbert等人在2008年於Computer Vision and Image Understanding,110(3):346-359中所發表之「SURF:加速強健特徵」]則使用海森(Hessian)運算子的行列式,同樣用以檢測類班點特徵。於本發明的其中一實施例中使用SURF演算法來檢測特徵。 In the literature that detects features based on the structure of the image points themselves, A number of algorithms have been proposed containing non-parametric feature detection algorithm classes suitable for different transformations of image point data. For example, Scale Invariant Feature Transform (SIFT) [Lowe, David, published in Int. J. of Computer Vision, 60(2): 91-110, 2004. Distinctive image features from scale-invariant keypoints] use convolution of the Gaussian difference function of the image to detect blob-like features. Speeded-Up Robust Features (SURF) algorithm [Bay, Herbert et al., 2008, Computer Vision and Image Understanding, 110(3): 346-359, "SURF: Accelerated Robust Features"] The Hessian operator's determinant is also used to detect the class characteristics. The SURF algorithm is used to detect features in one of the embodiments of the present invention.

Other feature detection algorithms are designed to find specific types of features, such as faces. In another embodiment of the present invention, Haar-like features are detected as part of frontal and profile face detection [Viola, Paul and Jones, Michael, "Rapid object detection using a boosted cascade of simple features," 2001 IEEE Conf. on Computer Vision and Pattern Recognition, 1:511-518, 2001].

In another embodiment, discussed in Applicant's co-pending U.S. Application No. 13/121,904, filed October 6, 2009, which is incorporated herein by reference in its entirety, features can be detected based on the encoding complexity encountered by a conventional encoder. Encoding complexity, for example, can be determined by analyzing the bandwidth (number of bits) required by conventional compression (for example, H.264) to encode the regions in which features appear. Restated, the various detection algorithms operate differently, but each is applied, in embodiments, to the entire sequence of video frames in the video data. In a non-limiting example, a first encoding pass is performed with an H.264 encoder to produce a "bandwidth map", which in turn defines or determines where in each frame the H.264 encoding is most costly.

In general, a conventional encoder (for example, H.264) partitions video frames into multiple uniform tiles (for example, 16x16 macroblocks and their subtiles) arranged in a non-overlapping pattern. In one embodiment, each tile can be analyzed as a potential feature based on the relative bandwidth required by H.264 to encode the tile. For example, the bandwidth required to encode a tile via H.264 can be compared to a fixed threshold, and the tile can be declared a "feature" if the bandwidth exceeds the threshold. The threshold may be a preset value, and the preset value may be stored in a database for easy access during feature detection. The threshold may also be a value set to the average number of bits allocated to previously encoded features, or a value set to the median number of bits allocated to previously encoded features. Alternatively, a cumulative distribution function of the tile bandwidths across an entire frame (or across the entire video) can be computed, and any tile whose bandwidth falls in the top percentile of all tile bandwidths can be declared a "feature".
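The thresholding logic described above can be sketched as follows. The sketch assumes that the per-tile bit counts from a first encoding pass are already available in a 2-D array bandwidth_map; the fixed threshold and the top-percentile cutoff are illustrative values rather than values prescribed by the text.

```python
import numpy as np

def detect_features_fixed(bandwidth_map, threshold):
    # Declare any tile a 'feature' if its encoding cost exceeds a fixed threshold.
    # Returns (row, col) tile indices of the declared features.
    return np.argwhere(bandwidth_map > threshold)

def detect_features_percentile(bandwidth_map, top_percent=10.0):
    # Declare tiles in the top percentile of encoding cost to be 'features',
    # using the empirical distribution of tile bandwidths over the frame (or video).
    cutoff = np.percentile(bandwidth_map, 100.0 - top_percent)
    return np.argwhere(bandwidth_map >= cutoff)

# Example: a frame tiled into 45 x 80 macroblocks with hypothetical bit counts.
bandwidth_map = np.random.gamma(shape=2.0, scale=200.0, size=(45, 80))
feature_tiles = detect_features_percentile(bandwidth_map, top_percent=10.0)
```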

In another embodiment, video frames can be partitioned into overlapping tiles. The overlapping sampling may be offset so that the centers of the overlapping tiles occur at the intersections of the corners of every four underlying tiles. This overcomplete partitioning is meant to increase the likelihood that an initial sampling position will yield a detected feature. Other, possibly more complex, topological partitioning methods may also be used.

Small spatial regions detected as features can be analyzed to determine whether they can be combined, based on certain coherency criteria, into larger spatial regions. Spatial regions can vary in size from small groups of pels to larger areas that may correspond to actual objects or parts of objects. It is important to note, however, that the detected features need not correspond to unique and separable entities such as objects and sub-objects. A single feature may contain elements of two or more objects or no object elements at all. For the present invention, the critical characteristic of a feature is that the set of pels composing the feature can be compressed efficiently (relative to conventional methods) by feature-model-based compression techniques.

Coherency criteria for combining small regions into larger regions may include similarity of motion, similarity of appearance after motion compensation, and similarity of encoding complexity. Coherent motion may be discovered through higher-order motion models. In one embodiment, the translational motion of each individual small region is integrated into an affine motion model that can approximate the motion model of each of the small regions. If the motions of a set of small regions can be integrated into an aggregate model on a consistent basis, this implies a dependency among the regions, which may indicate that the coherency among those small regions can be exploited via an aggregate feature model.
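One possible reading of the aggregation step above is a least-squares fit of a single affine model to the per-region translational motions, with the fitting residual used as the consistency check. The sketch below follows that reading only as an illustration; the residual threshold tol is an assumed value.

```python
import numpy as np

def fit_affine_motion(centers, translations):
    """Fit an affine motion model v = A @ [x, y, 1] to per-region translation
    vectors measured at the region centers; return the 2x3 affine matrix and
    the RMS fitting residual (in pels)."""
    centers = np.asarray(centers, dtype=np.float64)             # N x 2 positions
    translations = np.asarray(translations, dtype=np.float64)   # N x 2 motion vectors
    design = np.hstack([centers, np.ones((len(centers), 1))])   # N x 3
    coeffs, _, _, _ = np.linalg.lstsq(design, translations, rcond=None)  # 3 x 2
    residual = np.sqrt(np.mean((design @ coeffs - translations) ** 2))
    return coeffs.T, residual

def regions_move_coherently(centers, translations, tol=0.5):
    # Treat the regions as coherent if one affine model explains their individual
    # translations to within tol pels (assumed threshold).
    _, residual = fit_affine_motion(centers, translations)
    return residual < tol
```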

Feature model formation

After features have been detected in multiple frames of a video, it is important that multiple instances of the same feature be related together. This process is referred to as feature association, and it is the basis for feature tracking (determining the location of a particular feature over time), described below. To be effective, however, the feature association process must first define a feature model that can be used to discriminate similar feature instances from dissimilar ones.

In one embodiment, the feature pels themselves are used to model a feature. The two-dimensional region of feature pels is vectorized, and similar features are identified by minimizing the mean squared error (MSE) or maximizing the inner product between different feature pel vectors. The problem with this approach is that feature pel vectors are sensitive to small changes in the feature, such as translation, rotation, scaling, and changes in the illumination of the feature. Features often change in these ways throughout a video, so using the feature pel vectors themselves to model and associate features requires accounting for such changes. In one embodiment, the present invention accounts for these feature changes in the simplest way, by applying the standard motion estimation and compensation algorithms found in conventional codecs (for example, H.264), which account for translational motion of the features. In other embodiments, more complex techniques can be used to account for rotations, scalings, and illumination changes of features from frame to frame.

In alternative embodiments, the feature model is a compact representation of the feature itself ("compact" meaning of lower dimension than the original feature pel vector) that is invariant to small rotations, translations, scalings, and possibly illumination changes of the feature; that is, if the feature changes slightly from frame to frame, the feature model remains relatively constant. A compact feature model of this type is often termed a "descriptor". In one embodiment of the present invention, for example, the SURF feature descriptor has length 64 (compared to the length-256 feature pel vector) and is based on sums of Haar wavelet transform responses. In another embodiment, a color histogram with five bins is constructed from a colormap of the feature pels, and this five-component color histogram serves as the feature descriptor. In an alternative embodiment, the feature region is transformed via a two-dimensional discrete cosine transform (DCT). The 2-D DCT coefficients are then summed over the upper triangular and lower triangular portions of the coefficient matrix. These sums then constitute an edge feature space and serve as the feature descriptor.
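The two non-SURF descriptors mentioned above can be sketched directly, as below. The sketch assumes a 16x16 feature region held as a numpy array; the five-bin histogram is computed over raw pel values here as a stand-in for the colormap-based binning, and whether the diagonal belongs to the upper or lower triangular sum is an assumption made for illustration.

```python
import numpy as np
from scipy.fft import dctn

def color_histogram_descriptor(region, bins=5):
    # Five-bin histogram of the region's pel values, normalized to sum to one.
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    return hist.astype(np.float64) / max(hist.sum(), 1)

def dct_edge_descriptor(region):
    # 2-D DCT of the region; sum the coefficients over the upper and lower
    # triangular portions of the coefficient matrix (diagonal excluded here).
    coeffs = dctn(region.astype(np.float64), norm='ortho')
    upper = np.sum(np.triu(coeffs, k=1))
    lower = np.sum(np.tril(coeffs, k=-1))
    return np.array([upper, lower])
```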

When feature descriptors are used to model features, similar features can be identified by minimizing the MSE or maximizing the inner product between the feature descriptors (rather than between the feature pel vectors).
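Descriptor matching then reduces to a nearest-neighbor search under MSE (or a maximum inner product search), as in this small sketch, which assumes each feature is represented by a single fixed-length descriptor vector.

```python
import numpy as np

def best_descriptor_match(query, candidates):
    # Return the index of the candidate descriptor with the smallest MSE
    # relative to the query descriptor, along with that MSE.
    diffs = np.asarray(candidates, dtype=np.float64) - np.asarray(query, dtype=np.float64)
    mse = np.mean(diffs ** 2, axis=1)
    return int(np.argmin(mse)), float(np.min(mse))
```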

Feature association and tracking

Once features have been detected and modeled, the next step is to associate similar features across multiple frames. Each instance of a feature that appears in multiple frames is a sample of the appearance of that feature, and multiple feature instances associated across frames are considered to "belong" to the same feature. Once associated, multiple feature instances belonging to the same feature can be aggregated to form a feature track or gathered into an ensemble matrix 40 (FIG. 1).

A feature track is defined as the (x, y) location of a feature as a function of the frames in the video. One embodiment associates newly detected feature instances with previously tracked features (or, in the case of the first frame of the video, with previously detected features) as the basis for determining which feature instances in the current frame are extensions of previously established feature tracks. The identification of a feature's instance in the current frame by means of its previously established track (or, in the case of the first video frame, by means of previously detected features) constitutes the tracking of that feature.

FIG. 2 illustrates the use of a feature tracker 70 to track features 60-1, 60-2, ..., 60-n. A feature detector 80 (for example, SIFT or SURF) is used to identify features in the current frame. Feature instances detected in the current frame 90 are matched against previously detected (or tracked) features 50. In one embodiment, prior to the association step, the set of candidate feature detections in the current frame is ranked using an auto-correlation analysis (ACA) metric, which measures feature strength based on the feature's autocorrelation matrix, with Gaussian-derivative filters used to compute the image gradients in the autocorrelation matrix, as found in the Harris-Stephens corner detection algorithm [Harris, Chris and Mike Stephens, "A combined corner and edge detector," Proc. of the 4th Alvey Vision Conference, 1988, pp. 147-151]. Feature instances with high ACA values are given priority as candidate feature instances for track extension. In one embodiment, feature instances lower in the ACA ranking are removed from the candidate feature set if they fall within a certain distance (for example, one pel) of a higher-ranked feature instance.
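A sketch of the ACA-based ranking and pruning is given below. It uses Gaussian-derivative filters to form the image gradients of the autocorrelation (structure) matrix and the Harris-Stephens corner measure as the strength score; the smoothing scale sigma and the constant k are assumed illustration values, while the one-pel suppression distance follows the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.04):
    image = image.astype(np.float64)
    # Image gradients from Gaussian-derivative filters.
    ix = gaussian_filter(image, sigma, order=(0, 1))
    iy = gaussian_filter(image, sigma, order=(1, 0))
    # Entries of the autocorrelation (structure) matrix, smoothed over a local window.
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2          # Harris-Stephens corner measure

def rank_candidates(image, candidates, min_dist=1.0):
    """Sort candidate feature positions [(x, y), ...] by descending strength and
    drop lower-ranked candidates within min_dist pels of an already kept one."""
    response = harris_response(image)
    ranked = sorted(candidates, key=lambda p: response[p[1], p[0]], reverse=True)
    kept = []
    for x, y in ranked:
        if all((x - kx) ** 2 + (y - ky) ** 2 > min_dist ** 2 for kx, ky in kept):
            kept.append((x, y))
    return kept
```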

In different embodiments, either feature descriptors (for example, SURF descriptors) or the feature pels themselves can serve as the feature models for determining track extensions. In one embodiment, the previously tracked features, shown as regions 60-1, 60-2, ..., 60-n in FIG. 2, are examined one at a time for track extensions among the newly detected features in the current frame 90. In one embodiment, the most recent feature instance of each feature track serves as the focal point (or "target feature") when searching for track extensions in the current frame. All candidate feature detections in the current frame that fall within a certain distance (for example, 16 pels) of the target feature's location are tested, and the candidate feature having the smallest MSE relative to the target feature (either in pel space or in descriptor space) is selected as the extension of that feature track. In another embodiment, a candidate feature is disqualified as a track extension if its MSE relative to the target feature is greater than a certain threshold.
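The track-extension test can be sketched as follows: candidates within 16 pels of the target feature's location are compared against the target by MSE (here in descriptor space), with an optional disqualification threshold. The data layout and the return convention are assumptions made for illustration.

```python
import numpy as np

def extend_track(target_pos, target_descriptor, candidates,
                 max_dist=16.0, mse_threshold=None):
    """Pick the candidate feature that extends a track.

    candidates is a list of ((x, y), descriptor) pairs detected in the current
    frame; the target is the most recent feature instance of the track.
    Returns the index of the chosen candidate, or None if no candidate qualifies
    (in which case an MCP/MEC search could be attempted instead).
    """
    tx, ty = target_pos
    target = np.asarray(target_descriptor, dtype=np.float64)
    best_idx, best_mse = None, np.inf
    for i, ((cx, cy), descriptor) in enumerate(candidates):
        if (cx - tx) ** 2 + (cy - ty) ** 2 > max_dist ** 2:
            continue                      # outside the search neighborhood
        mse = float(np.mean((np.asarray(descriptor, dtype=np.float64) - target) ** 2))
        if mse < best_mse:
            best_idx, best_mse = i, mse
    if mse_threshold is not None and best_mse > mse_threshold:
        return None                       # disqualified by the MSE threshold
    return best_idx
```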

In a further embodiment, if no candidate feature detection in the current frame qualifies as the extension of a given feature track, a limited search for a matching region in the current frame can be conducted using either the motion compensated prediction (MCP) algorithm within H.264 or a general motion estimation and compensation (MEC) algorithm. Both MCP and MEC conduct a gradient-descent search for a matching region in the current frame that minimizes the MSE (and satisfies an MSE threshold) relative to the target feature in the previous frame. If no match for the target feature is found in the current frame (either among the candidate feature detections or through the MCP/MEC search process), the corresponding feature track is declared "dead" or "terminated".

In a further embodiment, if the feature instances of two or more feature tracks coincide in the current frame by more than a certain threshold (for example, 70% overlap), only one feature track is retained and all the others are dropped or not considered further. The pruning process retains the feature track having the longest history and the greatest total ACA, summed over all of its feature instances.

In another embodiment, midpoint normalization is applied to the feature tracks: a "smoothed" set of track locations is first computed, and the locations of features that are "far away" from the normalized midpoints are then adjusted (a process referred to as center adjustment).

To summarize the above, the following steps are common to many embodiments of the present invention: feature detection (SURF or face); feature modeling (SURF descriptors, spectral histograms); ACA-based ranking of candidate features; and feature association and tracking among the candidate features via MSE minimization, supplemented by MCP/MEC searches for track extensions and by center adjustment of the tracks. If SURF descriptors are used for the tracking, the processing stream is referred to as the SURF tracker. If color histograms are used for the tracking, the processing stream is referred to as the spectral tracker.

FIG. 3 is a flowchart 300 of the basic steps of the feature-based processing stream (FPS) described above. Given a particular video, the FPS begins by detecting features in the video 310. The FPS then associates (correlates) instances of the detected features 320, possibly using compact feature models rather than the feature pixels themselves. The FPS then tracks the detected and associated features 312. The FPS also determines similarities among the associated instances of the features 322 and applies midpoint normalization to similar associated instances 324. The FPS adjusts the centers of the associated features based on the normalized midpoints 326 and then feeds the results back into the tracker 312. The FPS aggregates feature sets composed of associated feature instances 314 based on at least one of the tracked features 312, the associated feature instances 320, and the associated instances with adjusted centers 326. In further embodiments, the FPS may split or merge feature sets based on various criteria 316. The FPS also sets the state of a model 318 (for example, to isolated). In one embodiment, an isolated model is a model that should no longer be edited because it has been finalized. In another embodiment, the repository also analyzes the complexity of the video during model generation 328. Those skilled in the art will appreciate that the steps described above can be performed in any order and need not be performed in the order described.

Characteristic features and feature clusters

The preceding paragraphs outline how features are detected, modeled, associated, and tracked across the frames of a video (referred to herein, for clarity, as the "input" video). The present invention seeks to preserve, or persist, all of the feature information in the input video that can be used to improve compression of another "target" video (defined as the video to be encoded). In one embodiment, the feature information is stored in files. In other embodiments, the feature information may be stored in a relational database, an object database, a NoSQL database, or another data structure. Further details regarding the storage of feature information are given below. To be useful for effectively improving compression of another video, however, the feature information from the input video must capture the feature content of the input video broadly yet compactly.

After the feature detection, modeling, association, and tracking steps, the feature information of the input video is consolidated into a set of feature tracks. To reduce this information to a suitably compact yet representative form, the first step is to select a representative, or characteristic, feature for each feature track. In one embodiment, the characteristic feature of a given feature track is the first (earliest) feature instance in the track. In another embodiment, the characteristic feature of a given feature track is the arithmetic mean of all feature instances in the track. Selecting a characteristic feature for each feature track reduces the feature information of the input video from a set of feature tracks to a set of characteristic features.

The next step in reducing the feature information of the input video to a suitably compact yet representative form is to cluster similar characteristic features together. Characteristic features are grouped, or clustered, using techniques well known in the art. In the embodiment where the tracker is the spectral tracker detailed above, clustering is based on the spectral color maps of the characteristic features. The "U" and "V" (chrominance) components of the YUV color space data are treated as a two-component vector, and different values of the U/V components correspond to different colors in a spectral color map. A histogram is generated from the color map and may contain any number k of bins summarizing the full range of U/V component values; in one example embodiment, k = 5. In another embodiment, where the tracker is the SURF tracker detailed above, clustering is based on the length-64 SURF descriptor vectors of the characteristic features. Once the feature model domain used for clustering has been established (for example, the color histograms or the SURF descriptors in the examples above), any standard clustering algorithm can be applied. In a preferred embodiment, clustering is performed with the k-means clustering algorithm, which assigns every characteristic feature in the input video to one of m clusters; in one example embodiment, m = 5. For each cluster, the k-means algorithm computes a centroid representing the arithmetic mean of the cluster members.
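A minimal sketch of this clustering step, assuming each characteristic feature is summarized by a k-bin U/V color histogram (k = 5) and that scikit-learn is available; the bin edges, channel layout, and function names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def uv_histogram(feature_pixels_yuv, k=5):
    """Build a k-bin histogram over the U and V chroma components
    (assumes the last axis of the patch is ordered Y, U, V)."""
    u = feature_pixels_yuv[..., 1].ravel()
    v = feature_pixels_yuv[..., 2].ravel()
    hist_u, _ = np.histogram(u, bins=k, range=(0, 255), density=True)
    hist_v, _ = np.histogram(v, bins=k, range=(0, 255), density=True)
    return np.concatenate([hist_u, hist_v])

def cluster_characteristic_features(features_yuv, m=5):
    """Assign each characteristic feature to one of m clusters and return
    the labels plus the m centroids (means of the member histograms)."""
    descriptors = np.stack([uv_histogram(f) for f in features_yuv])
    km = KMeans(n_clusters=m, n_init=10, random_state=0).fit(descriptors)
    return km.labels_, km.cluster_centers_
```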

FIGS. 4A and 4B are screenshots of a feature-based display tool for example implementations of the spectral tracker (FIG. 4A) and the SURF tracker (FIG. 4B). The upper-left display in each figure shows the result of clustering characteristic features based on either the features' spectral color maps (416 in FIG. 4A) or the features' SURF descriptors (420 in FIG. 4B). Each characteristic feature represents one or more feature members (24 in 414). Each cluster (ten in 416 and twelve in 420) is represented by the pixels of the cluster centroid together with the centroid's corresponding feature model (the spectral color map in 416 or the SURF descriptor in 420). Each cluster has a particular number of characteristic-feature members: the example color-spectrum cluster 418 in FIG. 4A has eight characteristic-feature members, while the example SURF cluster 420 in FIG. 4B has more than twenty characteristic-feature members 422.

It should be noted that if the m clusters contain too many members, a second level of sub-clustering may be applied. In one example embodiment, each of the m color-spectrum clusters is divided into l sub-clusters, where m is 5 and l ranges from 2 to 4.

After the initial set of m clusters has been formed, the final step in reducing the feature information of the input video to a suitably compact yet representative form is to select a subset of n cluster members to represent each cluster. This step is needed because a cluster may have many members, while the number n of representative cluster elements must be very small to be used effectively in compression; in one example embodiment, n = 5. Selection of the representative cluster elements is typically based on the cluster centroid. In one embodiment, the Orthogonal Matching Pursuit (OMP) algorithm is used to select, with minimal redundancy, the n cluster members that best approximate the cluster centroid. In another embodiment, the n cluster members having the largest inner product with the cluster centroid are selected; cluster members chosen in this way are more redundant than those chosen with OMP.
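A sketch of the representative-member selection, using a greedy OMP-style loop to pick the n members that best approximate the cluster centroid, alongside the simpler maximum-inner-product alternative; the vectorized inputs and function boundaries are illustrative assumptions.

```python
import numpy as np

def select_by_omp(members, centroid, n=5):
    """Greedily pick n members whose span best approximates the centroid."""
    members = np.asarray(members, dtype=np.float64)    # shape (M, D)
    centroid = np.asarray(centroid, dtype=np.float64)  # shape (D,)
    residual = centroid.copy()
    chosen = []
    for _ in range(min(n, len(members))):
        scores = np.abs(members @ residual)
        scores[chosen] = -1.0                          # do not re-pick members
        idx = int(np.argmax(scores))
        chosen.append(idx)
        basis = members[chosen].T                      # D x |chosen|
        coeffs, *_ = np.linalg.lstsq(basis, centroid, rcond=None)
        residual = centroid - basis @ coeffs           # orthogonalized residual
    return chosen

def select_by_inner_product(members, centroid, n=5):
    """Pick the n members with the largest inner product with the centroid
    (more redundant than the OMP-based selection)."""
    scores = np.asarray(members, dtype=np.float64) @ np.asarray(centroid)
    return list(np.argsort(scores)[::-1][:n])
```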

Once the most representative members of each cluster have been selected, the feature information of the input video is ready to be preserved. The preserved feature information consists of m clusters of n cluster members each; each cluster member is the characteristic feature of a particular feature track, and each characteristic feature has an associated set of pixels from its corresponding feature region (16x16 in one embodiment of the invention). Each cluster also has a centroid (whose pixels are preserved) and an associated feature model (for example, a color histogram or a SURF descriptor). In addition, because of the way the invention uses the preserved feature information for encoding (for "offset processing", described in further detail below), the preserved feature information also includes the pixels of the region surrounding each cluster member's feature region. In one embodiment, the "surrounding region" is defined as the area within one 16x16 macroblock in any direction, so that a feature region together with its surrounding region forms a 48x48 super-region. The preserved feature information therefore consists of the pixels of m clusters of n super-regions each, plus the pixels and feature models of the m cluster centroids.
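A small sketch of assembling a 48x48 super-region around a 16x16 feature region, clamping at the frame borders; the single-channel (luma) frame layout and the edge-replication padding are illustrative assumptions.

```python
import numpy as np

MB = 16  # macroblock / feature-region size

def super_region(frame, top, left):
    """Return the 48x48 super-region around the 16x16 feature region whose
    top-left corner is (top, left); pixels that would fall outside the
    frame are filled by edge replication."""
    h, w = frame.shape[:2]
    r0, r1 = top - MB, top + 2 * MB
    c0, c1 = left - MB, left + 2 * MB
    pad_top, pad_left = max(0, -r0), max(0, -c0)
    pad_bottom, pad_right = max(0, r1 - h), max(0, c1 - w)
    region = frame[max(0, r0):min(h, r1), max(0, c0):min(w, c1)]
    return np.pad(region,
                  ((pad_top, pad_bottom), (pad_left, pad_right)),
                  mode="edge")
```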

The feature-based processing stream outlined above (feature detection, modeling, association, tracking, characteristic-feature selection, clustering, cluster-member selection, and preservation of feature information) can be extended from a single input video to multiple input videos. In the case of more than one input video, the clusters are created using the characteristic features representing the feature tracks of all of the input videos.

Reusing feature models for offline feature-based compression

Model-based compression framework

Once the feature-based processing stream outlined above (300 in FIG. 3) has been applied to an input video (or to multiple input videos), the preserved feature information is reused to improve compression of a "target" video (the video to be encoded, which is likely different from the input video(s)). Reuse of feature information for compression takes place within the model-based compression framework (MBCF) outlined in the co-pending '940 application; the relevant elements of that application are included below (and are generally indicated by 924).

The MBCF begins with steps similar to those of the feature-based processing stream outlined above: features are detected, modeled, and associated, but with respect to the target video. In a preferred embodiment, the features are detected with the SURF detection algorithm and are modeled and associated using SURF descriptors.

Next, the MBCF uses the feature tracks to relate features to macroblocks, as shown in FIG. 5. A given feature track indicates the positions of a feature across multiple frames, together with the feature's associated motion across those frames. Using the feature's positions in the two most recent frames preceding the current frame, the feature's position can be projected into the current frame. The projected feature position then has an associated nearest macroblock, defined as the macroblock having the greatest overlap with the projected feature position. That macroblock (now the target macroblock being encoded) is thereby associated with a particular feature track whose projected position in the current frame is adjacent to the macroblock (500 in FIG. 5). A single macroblock may be associated with multiple features, so one embodiment of the MBCF selects the feature having the greatest overlap with the macroblock as the macroblock's associated feature.

Next, the MBCF computes the offset 510 between the target macroblock and the projected feature position in the current frame. When the MBCF operates in online mode (generating predictions entirely from previously decoded pixels of the same video), this offset is used to generate predictions of the target macroblock from earlier feature instances in the associated feature's track. Locating, in the reference frames, the regions displaced from those earlier feature instances by the same offset that separates the target macroblock from the projected feature position in the current frame yields the online predictions of the target macroblock (520, 530).
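The sketch below illustrates this online-mode offset computation: the feature position is projected into the current frame from its two most recent instances, the offset to the target macroblock is computed, and a prediction is read out of a reference frame at the same offset from an earlier feature instance. Positions are (x, y) pairs and frames are indexed as frame[y, x]; these conventions and the function names are illustrative assumptions.

```python
MB = 16

def project_position(prev_pos, prev_prev_pos):
    """Linearly project the feature position into the current frame from
    its positions in the two most recent frames."""
    (x1, y1), (x2, y2) = prev_pos, prev_prev_pos
    return (2 * x1 - x2, 2 * y1 - y2)

def online_prediction(target_mb_pos, projected_pos,
                      earlier_instance_pos, reference_frame):
    """Predict the target macroblock from a reference frame, reusing the
    offset between the target macroblock and the projected feature."""
    dx = target_mb_pos[0] - projected_pos[0]
    dy = target_mb_pos[1] - projected_pos[1]
    x = int(round(earlier_instance_pos[0] + dx))
    y = int(round(earlier_instance_pos[1] + dy))
    return reference_frame[y:y + MB, x:x + MB]
```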

Given a target macroblock (the current macroblock being encoded), its associated feature, and that feature's track, the MBCF generates a primary, or key, prediction of the target macroblock. The data (pixels) for the key prediction come from the most recent frame prior to the current frame in which the feature appears, referred to below as the key frame. The key prediction is generated after a motion model and a pixel sampling scheme have been selected. In one embodiment of the MBCF, the motion model may be "0th order" (the feature is assumed to be stationary between the key frame and the current frame) or "1st order" (feature motion is assumed to be linear across the second-most-recent reference frame, the key frame, and the current frame). In either case, the feature's motion is applied (in the backward temporal direction) to the macroblock associated with it in the current frame, to obtain the prediction of that macroblock from the key frame. In one embodiment of the MBCF, the pixel sampling scheme may be "direct" (the motion vector is rounded to the nearest integer and the pixels of the key prediction are taken directly from the key frame) or "indirect" (interpolation techniques from conventional compression, e.g. H.264, are used to derive a motion-compensated key prediction). Thus, depending on the motion model (0th or 1st order) and the sampling scheme (direct or indirect), the MBCF has four different types of key predictions.
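A sketch of key-prediction generation under the two motion models with direct sampling; the way the 1st-order displacement is estimated from the two previous feature positions (assuming equal frame spacing) and the function boundaries are illustrative assumptions, not the specification's exact formulation.

```python
MB = 16

def key_prediction(mb_pos, feat_key, feat_prev, key_frame, order=0):
    """Direct-sampling key prediction of the 16x16 macroblock at mb_pos.

    order=0: the feature is assumed stationary between the key frame and
             the current frame, so the macroblock is read at the same
             position in the key frame.
    order=1: feature motion is assumed linear across the second-most-recent
             reference frame, the key frame, and the current frame, so the
             key-to-current displacement is estimated from the previous two
             feature positions and applied backward in time.
    """
    if order == 0:
        dx, dy = 0.0, 0.0
    else:
        dx = feat_key[0] - feat_prev[0]
        dy = feat_key[1] - feat_prev[1]
    # "direct" sampling: round the motion vector and copy pixels directly
    x = int(round(mb_pos[0] - dx))
    y = int(round(mb_pos[1] - dy))
    return key_frame[y:y + MB, x:x + MB]
```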

The MBCF also generates refined key predictions by modeling local deformations through a subtiling process. During subtiling, different motion vectors are computed for different local portions of the macroblock. In one embodiment of the MBCF, subtiling is achieved by partitioning the 16x16 macroblock into four 8x8 quadrants and computing a separate prediction for each. In another embodiment, subtiling is performed in the Y/U/V color space domain by computing separate predictions for the Y, U, and V color channels.

In addition to the primary/key prediction of the target macroblock, the MBCF also generates secondary predictions based on the positions of the associated feature in reference frames prior to the key frame. In one embodiment, the offset from the target macroblock to the (projected) position of the associated feature in the current frame serves as a motion vector that can be used to locate secondary predictions from the feature's positions in past reference frames. In this way, a large number of secondary predictions may be generated for a given target macroblock having an associated feature (one secondary prediction for each frame in which the feature previously appeared). In one embodiment, the number of secondary predictions is limited by restricting the search to some reasonable number (for example, 25) of past reference frames.

Once the primary (key) prediction and the secondary predictions of a target macroblock have been generated, an overall reconstruction of the target macroblock can be computed from them. In one embodiment of the MBCF, following conventional encoders, the reconstruction is based only on the key prediction and is therefore termed a key-only (KO) reconstruction.

In another embodiment of the MBCF, the reconstruction is based on a composite prediction that sums the key prediction and a weighted version of one of the secondary predictions. This algorithm, referred to below as PCA-Lite (PCA-L), comprises the following steps (a code sketch follows the list): 1. Generate vectorized (1-D) versions of the target macroblock and the key prediction; denote these by the target vector t and the key vector k.

2. Subtract the key vector from the target vector to compute the residual vector r.

3. Vectorize the set of secondary predictions to form vectors s_i (in general, these secondary vectors are assumed to have unit norm). Then subtract the key vector from each secondary vector to form the key-subtracted set s_i - k. The approximate effect is that of projecting the secondary vectors onto the key vector.

4. For each secondary vector, compute the weight c = r^T (s_i - k).

5. For each secondary vector, compute the composite prediction t_hat = k + c * (s_i - k).
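A minimal NumPy sketch of the PCA-Lite steps listed above: vectorize, form the residual, subtract the key from each secondary prediction, compute the weight, and form the composite. The choice to return the composite with the lowest error against the target is an illustrative selection criterion, not mandated by the text.

```python
import numpy as np

def pca_lite(target_block, key_block, secondary_blocks):
    """Form composite predictions t_hat = k + c * (s_i - k) and return the
    one closest to the target (illustrative selection criterion)."""
    t = target_block.astype(np.float64).ravel()        # step 1: vectorize
    k = key_block.astype(np.float64).ravel()
    r = t - k                                          # step 2: residual
    best, best_err = None, np.inf
    for s_block in secondary_blocks:
        s = s_block.astype(np.float64).ravel()
        s = s / (np.linalg.norm(s) or 1.0)             # step 3: unit norm
        d = s - k                                      # key-subtracted vector
        c = float(r @ d)                               # step 4: weight
        t_hat = k + c * d                              # step 5: composite
        err = float(np.mean((t - t_hat) ** 2))
        if err < best_err:
            best, best_err = t_hat, err
    if best is None:                                   # no secondary predictions
        return key_block, float(np.mean(r ** 2))
    return best.reshape(target_block.shape), best_err
```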

In general, the steps of the PCA-Lite algorithm approximate the operations of the well-known Orthogonal Matching Pursuit algorithm [Pati, Y.C., et al., "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," Proc. 27th Asilomar Conference, 1993, pp. 40-44], in the sense that the composite prediction combines non-redundant contributions from the primary and secondary predictions. In another embodiment, the PCA-Lite algorithm described above is modified so that the key vector in steps 3 through 5 is replaced by the mean of the key vector and the secondary vector. This modified algorithm is referred to below as PCA-Lite-Mean.

The PCA-Lite algorithm provides a different type of composite prediction than the bi-prediction algorithms found in some standard codecs. Standard bi-prediction blends multiple predictions based on the temporal distance between each prediction's reference frame and the current frame. PCA-Lite, by contrast, blends multiple predictions into a composite prediction based on the content of the individual predictions.

Note that forming composite predictions as described above does not require feature-based modeling; a composite prediction can be formed from any set of multiple predictions for a given target macroblock. Feature-based modeling, however, provides a naturally associated set of multiple predictions for a given target macroblock, and composite predictions provide an effective way to combine the information from those multiple predictions.

Model reuse for the offline stream in the model-based compression framework

The model-based compression framework (MBCF) can also operate in an offline mode, making use of the feature information generated and stored by the feature-based processing stream as outlined above.

In one embodiment, the offline-mode MBCF detects features in the target video with the SURF detection algorithm, models the detected features with SURF descriptors, and generates key predictions under a 0th-order motion model (the feature is assumed to be stationary between the key frame and the current frame) with direct pixel sampling.

Next, the offline-mode MBCF reads in the appropriate feature information from the input video(s) previously stored by the feature-based processing stream. (Recall that the preserved feature information consists of the pixels of m clusters of n super-regions each, plus the pixels and feature models of the m cluster centroids.) In one embodiment, the MBCF reads in the cluster elements from the cluster whose centroid SURF descriptor is closest to the SURF descriptor of the feature associated with the target macroblock (the current macroblock being encoded).

Once the particular cluster has been read in, the offline-mode MBCF generates secondary predictions by extracting pixels from each super-region in the cluster at an offset from the super-region's center equal to the offset of the target macroblock from its associated feature in the target video. In this way, n secondary predictions are generated, one for each cluster member.

In one embodiment, the secondary predictions generated by the offline-mode MBCF are then combined with the key prediction using the PCA-Lite or PCA-Lite-Mean algorithm described above.

In another embodiment, the secondary predictions may be treated as primary predictions, replacing the intra-video key prediction if they yield lower error or lower encoding cost. In this embodiment, where the primary prediction may come from an offline source outside the target video, a normalization step (for example, under an assumed affine motion model) may be applied to the offline predictions to ensure a closer match to the target macroblock.

In summary, the offline-mode MBCF reuses feature models for compression through the following steps: (1) detect features in each frame of the target video; (2) model the detected features; (3) associate features across frames to form feature tracks; (4) use the feature tracks to predict feature positions in the "target" frame being encoded; (5) associate macroblocks in the current frame that lie near the predicted feature positions; (6) generate key predictions for the macroblocks of step 5 based on the features' positions in the most recently encoded key frame; (7) read in the feature information generated from the input video(s) by determining the cluster whose centroid descriptor is closest to the descriptor of the target video's feature; (8) generate secondary predictions from the feature information read in step 7.

Forming feature model libraries

Simple model libraries: direct preservation of raw feature information

As noted above, a basic set of feature information can be generated from an input video and then preserved. This feature information can then be reused within the model-based compression framework (MBCF) to improve compression of another "target" video to be encoded. Saving this feature information directly in a file, database, or data store represents the simplest form of a feature model library that organizes and catalogs feature information from one or more input videos.

In one embodiment, the information from the feature detection and feature tracking steps is saved in a file, database, or data store. This information may include, but is not limited to: the name of the input video in which the features were detected; a list of feature tracks, each with an associated feature ID; each feature track's track "length" (equal to the number of feature instances in the track) and its total bandwidth (defined as the total number of bits a conventional compression method, for example H.264, needs to encode all feature instances in the track); for each feature instance in a track, the detection type (for example, SURF or face), the frame in which the detection occurred, the x/y coordinates of the feature's center, and the feature's bandwidth; and, for each feature track, the pixels of the track's characteristic (representative) feature.

It is important to note that although the information from the input video's feature detection and tracking steps is not used directly to compress the target video in the model-based compression framework, the detection and tracking information must still be preserved if the feature model library is to accumulate feature information from more than one input video, because the composition of the feature clusters used for compression changes when the tracks of multiple videos are combined.

In one embodiment, the information from the feature clustering step is saved in a file, database, or data store, separately from the feature detection and tracking information. The feature cluster information may include, but is not limited to: a list of clusters, each with an associated index; the number of members in each cluster and the feature model associated with the cluster centroid; for each cluster member (itself the characteristic feature representing a feature track), the pixels of the "super-region" surrounding the feature, together with the associated feature model; and the parameters governing how the clustering was performed (for example, the tolerance and number of iterations for k-means clustering).

When the feature model library needs to accumulate feature information from multiple input videos, several approaches are possible. In one embodiment, the feature tracks of all input videos are simply aggregated and the aggregated set of feature tracks is re-clustered. This approach becomes problematic as the total number of feature tracks grows, however, because either the resulting feature clusters become too large (making them less informative) or the number of feature clusters increases (raising the cost of encoding indices into those clusters).

In another embodiment in which the feature model library contains multiple input videos, the feature tracks are prioritized before clustering. That is, the aggregated set of feature tracks is pruned before clustering so that only the "most important" feature tracks are retained. In one embodiment, feature tracks are prioritized by their track bandwidth (defined as the total number of bits a conventional compression method, for example H.264, needs to encode all feature instances in the track); features that conventional compression finds difficult to encode are identified as high priority. In another embodiment, feature tracks are prioritized by redundancy, loosely defined as the number of repetitions of a feature within a track (with little variability). Feature track redundancy can be measured by computing statistics (rank, condition number) of an ensemble matrix formed from the track's different feature instances; highly redundant features recur frequently in the input video and are therefore identified as important for compression. In another embodiment, feature tracks are prioritized by their similarity to particular types of important features (for example, faces), and features belonging to those feature types are identified as important. In further embodiments, the particular feature types may be specialized according to semantic content (for example, a particular sports team, a particular television program, and so on).
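A sketch of track prioritization before clustering, assuming each track carries a precomputed bandwidth (bits needed by a conventional encoder for its instances) and a list of vectorized instances; the condition-number redundancy proxy, the score weighting, and the cutoff are illustrative assumptions.

```python
import numpy as np

def track_redundancy(instances):
    """Redundancy proxy: a near-singular ensemble matrix (large condition
    number) indicates that the track's instances barely vary."""
    ensemble = np.stack([np.ravel(i).astype(np.float64) for i in instances])
    cond = np.linalg.cond(ensemble)
    return cond if np.isfinite(cond) else 1e12

def prioritize_tracks(tracks, keep=1000,
                      bandwidth_weight=1.0, redundancy_weight=1e-6):
    """Keep only the highest-priority feature tracks for clustering.

    tracks: list of dicts with 'bandwidth' (total bits under conventional
            compression) and 'instances' (list of pixel patches).
    """
    scored = []
    for tr in tracks:
        score = (bandwidth_weight * tr["bandwidth"]
                 + redundancy_weight * track_redundancy(tr["instances"]))
        scored.append((score, tr))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tr for _, tr in scored[:keep]]
```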

FIG. 6A summarizes the general steps for storing and subsequently accessing feature information in the feature model library described above; these steps follow the feature-based processing stream (FPS) shown in FIG. 3. After feature detection, modeling, association, and tracking, the FPS generates and stores the characteristic features of the feature tracks 610. The FPS also generates and stores spatial (SURF) descriptors 612 and spectral (color histogram) descriptors 620 for those characteristic features. The FPS clusters the spatial and spectral descriptors 614 and computes the cluster centroids. From the clusters, the FPS generates a feature index 616 whose elements are the descriptors of the cluster centroids. The repository then generates a classifier 618 based on the feature index, which can be used to access the features in the clusters. In FIG. 6B, when a new target video is encoded with the model-based compression framework (MBCF) described above, the FPS accesses the index 632 in response to a feature detected in the target video and retrieves an associated result set 634 composed of multiple cluster members. Those cluster members are then used within the MBCF, as described above, to help compress the corresponding feature regions in the target video. Those skilled in the art will appreciate that the steps described above can be performed in any order and need not be performed in the order described.

Advanced model libraries: hash-based indexing of a video repository

Instead of the simplest form of feature model library, in which feature information is saved explicitly and directly in the files, databases, or data stores outlined above, a more advanced feature model library can be formed by using hash-based indexing to access data from a video repository. In addition to the feature models 980 (FIGS. 9A, 9B) generated by the feature-based processing stream (FPS), a video repository also contains the data pixels of one or more input videos that have been processed by the FPS. This differs from the simple feature model libraries described in the preceding paragraphs, which contain only the feature pixels and their associated models rather than all of the pixels of the entire input video. Hash-based indexing provides an efficient way to access feature information from the video repository. Feature-based processing can be viewed as a sparse sampling of the video datacube 702 of FIG. 7, whose dimensions are the number of frames (704A-704F) times the number of rows (708A-708F) times the number of columns (706A-706F); feature instances typically occupy only a small percentage of the regions in a given video datacube.

FIG. 8A is a flowchart of an example embodiment of a method for generating a hash-based index. The feature-based processing stream (FPS) of FIG. 8A begins by detecting features in a frame of the input video 802. The FPS then applies a hash tag to each of the detected features 804. The hash tag uses a one-way hash function to convert the information identifying the detected feature (its x/y position, frame, extent, and associated model 980) into a hash value, so that the feature can later be accessed simply from the video repository. The FPS then adds each hashed feature and its corresponding hash value to an index 806, which is stored in the video repository along with the encoded video itself. The FPS then determines whether all frames of the input video have been analyzed 808. If all frames have been analyzed, the FPS stops generating the index 810. If not all frames of the input video have been analyzed, the FPS proceeds to detect features in the next frame 802.
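A minimal sketch of the hash-tagging and indexing of FIG. 8A, using SHA-1 as the one-way hash over the feature's identifying information; the record layout and the in-memory dictionary standing in for the repository index are illustrative assumptions.

```python
import hashlib

def feature_hash(x, y, frame_idx, extent, model_id):
    """One-way hash over the information identifying a detected feature."""
    key = f"{x},{y},{frame_idx},{extent},{model_id}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()

def build_feature_index(detected_features):
    """Map each feature's hash value to the record needed to retrieve the
    feature (and its associated model) from the video repository later."""
    index = {}
    for feat in detected_features:
        h = feature_hash(feat["x"], feat["y"], feat["frame"],
                         feat["extent"], feat["model_id"])
        index[h] = {"frame": feat["frame"],
                    "x": feat["x"], "y": feat["y"],
                    "extent": feat["extent"],
                    "model_id": feat["model_id"]}
    return index
```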

FIG. 8B is a flowchart of an example embodiment of a method for accessing features using the hash-based index. The FPS analyzes a frame of the input video to detect a feature 812. The FPS then applies the hash function to the detected feature 814 to produce its hash value. The FPS then uses the detected feature's hash value to search the index in order to find and extract the corresponding feature (and its associated feature model 980) from the video repository 816, 818. In another embodiment, the FPS extracts multiple feature models for a given feature from the video datacube. The FPS then compresses the detected feature 820 based on the extracted feature model 980 or the associated feature information.

Those skilled in the art will appreciate that the compression method described above can be applied to multiple frames and that compression need not proceed frame by frame. The method shown in FIG. 8B merely illustrates the general principle of using an index to access feature information from the sparsely populated video datacubes contained in a repository of previously encoded videos; the accessed feature information is then used to help compress a new target video.

Note that the basic feature-based processing stream (FPS) shown in FIGS. 8A-8B differs from the FPS outlined in the paragraphs above (and shown in FIGS. 3 and 6A), which includes feature tracking and clustering in addition to feature detection. More notably, the FPS of FIGS. 8A-8B accesses feature information from a video repository (which contains all of the pixels of its input videos) rather than from a set of files containing only feature pixels and feature information (as the FPS of FIGS. 3 through 6A does).

FIG. 8C is a flowchart 822 illustrating the general steps of the basic FPS of FIGS. 8A-8B according to an example embodiment of the invention. The FPS processes an input video 824 and detects features from the video 832, as described in this application. The FPS then generates hash-based indices of the features in the video 834. Using the extracted features 832 and the generated indices 834, the FPS transcodes the video 826 according to compression techniques known in the art and/or the compression techniques described in this application. The FPS manages the distribution of the data 836 based on the generated indices 834; for example, the encoded video may be stored on a particular server cluster within the video repository. The video may optionally be stored in a storage structure, database, or other data structure organized for large video collections 828. When access to a video is requested, the repository loads and accesses the requested video 830 and streams the requested video 840. After managing the data distribution 836, the FPS distributes modules from the repository 838 to help stream the video 840. In one embodiment, the FPS distributes modules 838 from the repository to a client device to help stream the video 840 on that device. Those skilled in the art will appreciate that the steps described above can be performed in any order and need not be performed in the order described.

Using a video repository with a generalized compression processing flow

A video repository need not be used together with a feature-based processing stream. FIG. 8D illustrates the general use of a repository with compression techniques that do not necessarily involve model-based processing. The generalized processing flow (GPF) 850 of FIG. 8D first accepts a target video to be stored in the repository 852. The repository transcodes the video 854 according to compression techniques known in the art (not necessarily model-based) and/or the compression techniques described in this application. The GPF optionally stores the video in a storage structure, database, or other data structure organized for large video collections 856. When access to a video is requested, the GPF loads and accesses the requested video from the repository 858 and then streams the requested video 860. Those skilled in the art will appreciate that the steps described above can be performed in any order and need not be performed in the order described. For example, in one embodiment the GPF may transcode the video 854 after it accesses the video 858 but before it streams the video 860. In another embodiment, the GPF may transcode the video 854 after the video is input 852 and then apply an additional transcoding after accessing the video 858 but before streaming the video 860.

Applications of model libraries

Basic operation: global model libraries and (personal) smart model libraries

Aspects of the invention may include feature model libraries stored on a server or in the cloud. By storing model libraries in the cloud and accessing the feature information in those libraries when needed, the invention can stream high-definition video at a lower bandwidth than conventional codecs, with little or no loss of visual quality. The models 980 can be reused not only within a single video (the "online" mode of the model-based compression framework [MBCF] described above) but also across multiple different, heterogeneous videos (the MBCF's "offline" mode). The system can identify, validate, and reuse models from one high-definition video to process and render video imagery in another high-definition video. Reuse of the models 980 reduces the file size of the library, allowing a device to use less bandwidth when streaming video data.

The feature model libraries can reside in a cloud deployment (public or private) and, preferably, are downloaded to a user's mobile device only when needed. In a manner technically similar to the way today's Amazon Kindle(TM) and Apple iPad(TM) device applications manage content between the cloud and the user's device, the invention can store model libraries offline and transmit the relevant models 980 to the user's device when needed to assist video compression/decompression.

FIG. 9A is a block diagram of an example embodiment of a video repository 902 operatively connected to a client device 908 over a network 170. The repository 902 contains a collection of encoded videos 904. The video collection 904 includes a first video set 906A, a second video set 906B, a third video set 906C, and an Nth video set 906D. Those skilled in the art will appreciate that the video collection 904 may contain any number of videos or video sets. The video sets 906A-906D within the collection 904 may each be internally related. For example, the first video set 906A may be a full season of episodes of a first particular television series, and the second video set 906B may be a full season of episodes of a second particular television series. Likewise, the third video set 906C and the Nth video set 906D may contain other seasons or other television series. Those skilled in the art will further appreciate that each of the video sets 906A-906D may contain episodes of a television series, related movies (sequels or a trilogy), sports broadcasts, or any other related videos.

The repository 902 is operatively connected to the client device 908 over the network 170. The client device includes a request generation module 914, which sends a video request 916 to the repository 902 over the network 170. The repository 902 receives the video request 916 at a request receiving module 918. The video request 916 is a request for a video contained in the video collection 904. Having issued the video request 916, the client device 908 expects to receive the requested video and, as appropriate, prepares a suitable codec to decode the incoming bitstream in response.

To send the requested video to the client device 908, the repository 902 has the request receiving module 918 issue a requested-video query 920 to the video collection 904. The requested-video query 920 may be a request that invokes the query functionality of the video collection 904 data structure. The requested-video query 920 may also be a request against a generated index that can efficiently locate the requested video within the video collection 904. In response to the requested-video query 920, the video collection 904 produces the requested video 922 for a stream generation module 924.

The stream generation module 924 produces a generated library 926 associated with the requested video, along with an encoding 928 of the requested video. The generated library 926 (also called a smart model library) contains the feature models needed to decode the requested encoded video 928. In one embodiment, the models in the generated smart model library 926 are derived from the video repository and from a hash-based index that references the feature models 980 of the videos contained in the repository.

In another embodiment, the models in the generated smart model library 926 are derived from a global model library 980 containing a set of reusable models (for example, feature models). Models in the global model library can be reused not only within a single video but also across multiple different, heterogeneous videos.

Altogether, the video repository 902 stores the encoded videos 904 and either the global model library 980 or a hash-based index referencing the models of those videos.

The generated library 926 and the encoded video 928 are transmitted over the network 170 to the client device 908. The library 926 can be transmitted to any device, including mobile devices such as an iPad, a smartphone, or a tablet computer. The client device 908 receives the generated library 926 and the encoded video 928 at a stream decoding module 910. The stream decoding module 910 decodes the encoded video 928 using the information in the generated library 926 and, as appropriate, other codecs known to the stream decoding module 910. The stream decoding module 910 outputs a decoded video 911, which is delivered to at least one of a memory 912A, a display 912B, or a storage module 912C.

Versioned (personal) model libraries

FIG. 9B is a block diagram of another example embodiment of a video repository 902 configured to communicate with a client device 908 over the network 170. The repository 902 and the client device 908 operate much as described with reference to FIG. 9A; however, FIG. 9B illustrates additional modules, methods, and features. Those skilled in the art will appreciate that modules such as the repository 902 and the client device 908 can be used interchangeably among the embodiments described in this application.

In one embodiment, the repository 902 and the client device 908 are configured to version the libraries used to decode videos. The client device 908 includes the request generation module 914. As described above, the request generation module 914 sends a video request 916 to the request receiving module 918, and the request receiving module 918 sends the requested-video query 920 to the video collection 904. In this embodiment, however, the request receiving module 918 also sends a client-version query 952 for the requested video's library to a versioning module 954, which determines, based on the query 952, the client version 956 of the requested video's library. In many cases, a client device may request and download related videos along with their associated codecs or libraries. One example of related videos is successive episodes of the same television program: because the actors and sets used in the program are shared, the successive episodes contain similar frames. Another example is a sporting event, where there is commonality across frames among the field, the venue, the players, logos, or sports equipment. Thus, if the client device 908 has previously downloaded a related video and its library, it may already hold many or even all of the models needed to decode the encoded video 928. In that case, the client seeking to decode the encoded video 928 may only need to update its library. Sending only the update rather than the complete library conserves the bandwidth used to transmit data to the client device 908 and, because of the smaller download size, lets the user of the client device 908 begin watching the requested video sooner.

In one embodiment, the stream generation module 924 includes a differential library generation module 958 and a video encoding module 960. The stream generation module 924 receives the requested video 922 from the video collection 904. The differential library generation module 958 receives the requested video and the client version 956 of the requested video's library. In one embodiment, the differential library generation module 958 determines, based on the requested video 922, the models 980, and the client version 956 of the requested video's library, the updates that the client device 908 needs in order to decode the video within the model-based compression framework.

In another embodiment, the differential library generation module 958 determines the updates that the client device 908 needs in order to decode the video within the model-based compression framework based on the requested video 922, the hash-based index, and the client version 956 of the requested video's library.

The differential library generation module 958 produces a differential library 962 that contains only the updates (additional feature models) needed by the library already stored at the library storage module 964 of the client device 908. The video encoding module 960 generates the encoded video 928 based on the differential library 962 and the client version 956 of the requested video's library. Using client-specific library versions lets video distributors provide different levels of viewing experience depending on the models received at the client; for example, the models in one client's library can be used to help improve the quality of the video being watched.
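A sketch of differential-library generation: the models required to decode the requested encoding are compared against the model IDs the client already holds, and only the missing models are packaged; the client then merges the delta into its stored library before decoding. Representing a library version as a mapping of model IDs to model data is an illustrative assumption.

```python
def build_differential_library(required_models, client_model_ids):
    """Return only the feature models the client is missing.

    required_models:  dict mapping model_id -> model data needed to decode
                      the requested encoded video.
    client_model_ids: set of model_ids already present in the client's
                      version of the library.
    """
    missing_ids = set(required_models) - set(client_model_ids)
    return {mid: required_models[mid] for mid in missing_ids}

def combine_libraries(client_library, differential_library):
    """Client-side merge of the stored library with the differential
    library before decoding (the combined library 970)."""
    combined = dict(client_library)
    combined.update(differential_library)
    return combined
```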

In another embodiment, the video encoding module 960 simply uses whichever models provide the best compression to generate the encoded video. The differential library generation module 958 then generates the differential library 962 based on the models that were used to encode the video and on its knowledge of the client version of the library residing on the client device. In this embodiment, only the additional models, if any, are included in the differential library.
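
In this reading, the differential library reduces to a set difference between the models the encoder actually used and the models the client is known to hold. The following sketch is purely illustrative; the structure and field names (for example, model_id) are assumptions and not part of the described embodiment.

    # Illustrative sketch of differential-library generation (module 958).
    # Assumption: each feature model carries a stable identifier (model_id).
    def build_differential_library(models_used_by_encoder, client_model_ids):
        """Return only the additional models the client does not already hold."""
        client_ids = set(client_model_ids)
        return [m for m in models_used_by_encoder if m["model_id"] not in client_ids]

    # Example: three models were used for encoding; the client already holds two,
    # so the differential library (962) contains a single additional model.
    models_used = [{"model_id": "face_017"}, {"model_id": "logo_003"}, {"model_id": "set_101"}]
    diff_library = build_differential_library(models_used, ["face_017", "set_101"])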

The client device 908 receives the differential library 962 and the encoded video 928. The client device 908 receives the differential library 962 at the library configuration module 966. The library configuration module 966 loads the client version 956 of the requested video's library from the library storage module 964 and combines the differential library 962 with the client version 956 of the requested video's library into a combined library 970. The stream decoding module 910 then uses the combined library 970 to decode the encoded video 928 and produce decoded video 911, which is delivered to at least one of memory 912A, display 912B, and storage module 912C. The system can identify, validate, and reuse models from one high-definition video to process and render video images in another high-definition video. Model reuse reduces the total file size of the libraries needed to decode multiple videos on the client device 908, because the same models are reused to decode multiple videos.
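
On the client side, the combination step amounts to merging the stored library with the received updates before decoding. The sketch below assumes dictionary-like libraries keyed by model identifier; the names (stored_library, decode_stream, and so on) are hypothetical and only illustrate the idea.

    # Illustrative sketch of the library configuration module (966) on the client.
    def combine_libraries(stored_library, differential_library):
        """Merge the stored client library with the received updates (combined library 970)."""
        combined = dict(stored_library)          # client version 956, keyed by model_id
        combined.update(differential_library)    # differential library 962 adds/overrides models
        return combined

    # The stream decoding module (910) would then decode using the combined library, e.g.:
    #   decoded_video = decode_stream(encoded_video_928, combined_library_970)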

Predictive model library

FIG. 10 is a block diagram of another example embodiment of the video repository 902 operatively connected over the network 170 to the client device 908. For example, during peak usage periods of the network 170, predictively generating and distributing libraries benefits users of the client device 908. When the network is carrying heavy traffic, if the repository does not need to transmit a library because that library was generated earlier and has already been delivered to the client device 908, the network 170 can use less bandwidth during the high-usage period.

The video repository 902 of FIG. 10 includes a stream generation module 924 that contains a predictive library generation module 1002. The predictive library generation module 1002 receives a user behavior profile 1006 generated by a user behavior profile module 1004. The user behavior profile module 1004 stores user information such as demographic information, geographic information, social network connections, or sports and sports-team fan groups. The user behavior profile module 1004 may also contain individual preferences for the kinds of video the user is likely to watch. One person may like baseball, NASCAR, and Family Guy, while another may like Mad Men and reality TV. User preferences can be inferred from video-on-demand (VOD) data (for example, a list of videos previously downloaded from the repository 902), from user subscriptions (for example, a season pass), from the user's video queue, from the user's pre-release purchases, or from collaborative filtering over any combination of these data sources. User viewing preferences and behavior can be used to refine the feature model library delivered to an individual device; further refinement can be achieved by combining user preference data with broadcast schedules.
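
One simple way to picture this profile-driven refinement is to score candidate videos against the signals listed above (VOD history, subscriptions, queue contents). The field names, tags, and weighting scheme below are purely illustrative assumptions, not part of the embodiment.

    # Illustrative sketch: rank candidate videos against a user behavior profile (1006).
    def rank_candidates(profile, candidates):
        """Score candidate videos by overlap with the user's inferred interests."""
        interests = set(profile.get("vod_history_tags", [])) \
                  | set(profile.get("season_pass_tags", [])) \
                  | set(profile.get("queue_tags", []))
        scored = [(len(interests & set(c["tags"])), c["title"]) for c in candidates]
        return [title for score, title in sorted(scored, reverse=True) if score > 0]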

The predictive library generation module 1002 generates a predictive encoded video request 1008 based on the user behavior profile 1006 in order to produce a model library 102. For example, the predictive library generation module 1002 may predictively generate a library for fans of a particular television program, as indicated in the user behavior profile 1006.
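
Conceptually, the predictive generation step collects, ahead of any request, the feature models most likely to be needed for the titles a given profile favors. A hedged sketch follows, assuming the rank_candidates helper sketched above and a hypothetical mapping from titles to model records.

    # Illustrative sketch of predictive library generation (module 1002).
    def generate_predictive_library(profile, candidates, models_by_title, limit=100):
        """Collect feature models for the titles the profile is most likely to request."""
        library = {}
        for title in rank_candidates(profile, candidates):
            for model in models_by_title.get(title, []):
                library.setdefault(model["model_id"], model)
            if len(library) >= limit:      # cap the prefetched library size
                break
        return library   # delivered ahead of demand as the predictively generated library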

Predictive distribution and caching of repositories can improve video access, indexing, and archival. Anticipated demand scenarios can help predict how to distribute and cache repositories and the libraries associated with their videos.

In one embodiment, demand scenarios are based on dependencies, predicted advance delivery of VOD, or scheduled broadcasts. Demand scenarios may include: long-tail VOD (that is, requests for videos that are not selected often), demographic profiles, broadcast schedules, sports or sports-team fan groups, social networks, collaborative filtering, queues, season passes, or pre-release purchases. Each scenario involves optimizing storage requirements and video distribution.

In one embodiment, the demand scenario is a long-tail VOD scenario. Long-tail VOD involves a user selecting a video (possibly an unpopular one) from a video set to be streamed to that user. The video selection process is kept balanced so that any video data in the set can be accessed equally. In the long-tail VOD scenario, long-tail videos (those not selected often) are encoded using feature models from high-demand videos, which increases the likelihood that the model data is already available at the client device and makes the residual video data easier to distribute (because, ideally, less residual video data remains after the higher-demand data has been distributed).

In another embodiment, the demand scenario is a recommendation system. A recommendation system analyzes an individual user's historical video preferences and steers the user toward downloading video data likely to match those preferences. Feature models are organized on the basis of the user's historical video preferences in order to support this distribution scenario. Feature models associated with anticipated user demand are delivered in advance to avoid high-network-demand situations.

In another embodiment, the demand scenario is regional preference (for example, derived from demographic profile information). Conventional preferences can be inferred from demographic profile information, so the repository can deliver corresponding content to regional users. The content provider can anticipate the resource cost of delivering this content to those users.

In another embodiment, the demand scenario is a broadcast schedule. Models are delivered according to a broadcast schedule (for example, a planned network schedule) or are inferred from the broadcast schedule. Models are created from recordings of one channel and reused to encode a program on another channel, or another program on the same channel. Models can also be inferred from video data available from DVDs, cable, and so on. In one embodiment, the delivered models may include enhancement information that improves the quality and/or resolution of the video data. The repository thereby provides an inferred "quality" service that supplements the existing broadcast models.

In another embodiment, the demand scenario is a sport or sports-team fan group. Models based on a user's sport or team exhibit consistent video data (for example, the same players' faces, team logos and uniforms, the team's venue, and so on) and can be distributed with geographic targeting. Such models may be based on repeated viewing, replays, high temporal resolution, and real-time demand. Distribution of these models may be tiered.

In another embodiment, the demand scenario is social network connections and/or collaborative filtering. A social network anticipates demand by determining the video needs of a user's peers, relatives, and friends. A collaborative filter indirectly predicts a user's demand based on that of similar users. Models are inferred, on the basis of the social network or the collaborative filter, from the videos the user is expected to watch.

In another embodiment, the demand scenario is a queue of videos. A queue may represent a user-defined prioritization of anticipated demand, expressed through the user's selection of video data to be queued or time-shifted. Models are distributed so as to optimize model usage relative to the contents of the queue.
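
"Optimizing model usage relative to the contents of the queue" can be read, at its simplest, as delivering first the models that the largest number of queued titles would reuse. The sketch below is an assumption-laden illustration of that reading, not the embodiment's actual policy.

    # Illustrative sketch: rank models by how many queued videos would reuse them.
    from collections import Counter

    def prioritize_models_for_queue(queue):
        """queue: list of videos, each listing the model_ids its encoding relies on."""
        usage = Counter(mid for video in queue for mid in video["model_ids"])
        # Models shared across the most queued titles are delivered first.
        return [mid for mid, _count in usage.most_common()]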

In another embodiment, the demand scenario is a season pass. When the monetization or exclusivity of the requested content increases and is tied more directly to the content itself, models may be offered on a surcharge basis, where the surcharged content is not limited to one-time use. In this demand scenario, a higher threshold is applied to retaining the distributed content and to the necessary content delivery. In addition, the distributed content, much like sports video data, exhibits a high degree of self-similarity (for example, the same actors, sets, or imagery across a set of episodes).

In another embodiment, the demand scenario is pre-release purchases. Pre-release purchases include pre-release video data, trailers, clips, or sample "webisodes". Videos are distributed together with a model library on the basis of the pre-release purchases that have been delivered.

The usage scenarios used to determine how repository data is organized may be predetermined or not. Predetermined usage scenarios focus processing on the general representation of the video data; non-predetermined usage scenarios focus on specific representations of the video data.

In one embodiment, the video set 904 of FIG. 10 responds to the predictive encoded video request 1008 by producing the video 1010 for encoding and by providing the hash-based index values of the associated video models.

In another embodiment, the predictive library generation module 1002 also retrieves models 982 from a model library and uses those models in place of the hash-based index values. The predictive library generation module 1002 then produces a predictively generated library 1012. The predictively generated library 1012 is transmitted over the network 170 to the client device 908, which stores it in the library storage module 964. Those skilled in the art will appreciate that the client device 908 stores the predictively generated library 1012 and applies it when it receives corresponding encoded video to decode. Those skilled in the art will also appreciate that the other embodiments of the repository 902 and client device 908 can be combined with this predictive library generation embodiment. For example, a library may be predictively generated and then delivered to the client device 908 in a different manner, as described in connection with FIG. 9.
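
The substitution of hash-based index values with actual models can be pictured as a simple lookup into the model library. The sketch below assumes a content hash of the serialized model as the index; the particular hashing scheme and function names are assumptions, since the embodiment does not fix them.

    # Illustrative sketch: hash-based indexing and retrieval of feature models.
    import hashlib

    def hash_index(model_bytes):
        """Derive a hash-based index value for a serialized feature model."""
        return hashlib.sha1(model_bytes).hexdigest()

    def resolve_models(hash_indices, model_library):
        """Replace hash-based index values with the corresponding stored models."""
        return {h: model_library[h] for h in hash_indices if h in model_library}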

The embodiments of the invention described above can be used with or without one another to form additional embodiments of the invention.

While the invention has been particularly shown and described with reference to example embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention encompassed by the appended claims. For example, although various system components, such as codecs, encoders, and decoders, have been referred to herein, those skilled in the art will understand that any other suitable hardware or software digital processing may be used to carry out the video processing techniques described herein. For example, the invention may be implemented on a wide variety of computer architectures. The computer networks of FIGS. 11A and 11B are for purposes of illustration only and are not intended to limit the invention.

The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by, or in connection with, a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, an instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. A data processing system suitable for storing and/or executing program code includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that temporarily store at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems, remote printers, or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

In one embodiment, FIG. 11A illustrates one such environment. Client computer(s)/devices 1110 and a cloud 1112 (or server computers or a cluster thereof) provide processing, storage, and input/output devices for executing application programs and the like. The client computer(s)/devices 1110 can also be linked through a communications network 1116 to other computing devices, including other client devices/processors 1110 and server computer(s) 1112. The communications network 1116 can be part of a remote access network, a global network (for example, the Internet), a worldwide collection of computers, a local area or wide area network, and gateways that currently use respective protocols (TCP/IP, Bluetooth, and so on) to communicate with one another. Other electronic device/computer network architectures are also suitable.

FIG. 11B is a diagram of the internal structure of a computer/computing node (for example, client processor/device 1110 or server computer 1112) in the processing environment of FIG. 11A. Each computer 1110, 1112 contains a system bus 1134, where a bus is a set of actual or virtual hardware lines used for data transfer among the components of a computer or processing system. The bus 1134 is essentially a shared conduit that connects the different elements of a computer system (for example, processor, disk storage, memory, input/output ports, and so on) and enables the transfer of information among those elements. Attached to the system bus 1134 is an I/O device interface 1118 for connecting various input and output devices (for example, keyboard, mouse, displays, printers, speakers, and so on) to the computer 1110, 1112. A network interface 1122 allows the computer to connect to various other devices attached to a network (for example, the network shown at 1116 of FIG. 11A). Memory 1130 provides volatile storage for computer software instructions 1124 and data 1128 used to implement embodiments of the present invention (for example, the codecs, video encoders/decoders, feature models, model libraries, and supporting code described throughout FIGS. 1 through 10). Disk storage 1132 provides non-volatile storage for computer software instructions 1124 (equivalent to "OS program 1126") and data 1128 used to implement embodiments of the present invention; it can also serve as long-term storage for models or for video stored in compressed format. A central processor unit 1120 is also attached to the system bus 1134 and provides for the execution of computer instructions. Note that throughout this text, "computer software instructions" and "OS program" are equivalent.

In one embodiment, the processor routines 1124 and data 1128 are a computer program product (generally referenced as 1124), including a computer-readable medium capable of being stored on the storage device 1128, which provides at least a portion of the software instructions for the system of the invention. The computer program product 1124 can be installed by any suitable software installation procedure, as is well known in the art. In another embodiment, at least a portion of the software instructions may also be downloaded over a cable connection, a communication connection, and/or a wireless connection. In other embodiments, the invention programs are a computer program propagated signal product 1114 (FIG. 11A) embodied on a propagated signal on a propagation medium (for example, a radio wave, an infrared wave, a laser wave, a sound wave, or an electrical wave propagated over a global network such as the Internet, or over another network). Such carrier media or carrier signals provide at least a portion of the software instructions for the routines/programs 1124, 1126 of the invention.

In alternative embodiments, the propagated signal is an analog carrier wave or a digital signal carried on the propagation medium. For example, the propagated signal may be a digitized signal propagated over a global network (for example, the Internet), a telecommunications network, or another network. In one embodiment, the propagated signal is transmitted over the propagation medium over a period of time, such as the instructions for a software application sent in packets over a network over a period of milliseconds, seconds, minutes, or longer. In another embodiment, the computer-readable medium of the computer program product 1124 is a propagation medium that the computer system 1110 may receive and read, for example by receiving the propagation medium and identifying a propagated signal embodied in that propagation medium, as described above for the computer program propagated signal product.

It should be noted that although the figures described herein illustrate example data/execution paths and components, those skilled in the art will understand that the operations on, arrangement of, and flow of data to and from these individual components may vary depending on the implementation and the type of video data being compressed. Accordingly, any arrangement of data modules/data paths may be used.

While the invention has been particularly shown and described with reference to example embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention encompassed by the appended claims.

10-1~10-n‧‧‧Detected feature instances

20-1~20-n‧‧‧Frames

30-1~30-n‧‧‧Feature regions

40‧‧‧Overall matrix

Claims (23)

1. A method of providing video data, comprising: encoding a target video stream by a feature-based compression method that uses feature models from a global feature model library, the encoding implicitly using the feature models to represent macroblocks in the target video for encoding, resulting in encoded video data; and transmitting the encoded video data to a requesting device upon receipt of a command, the feature models from the global feature model library being accessible to the requesting device and enabling decoding of the encoded video data at the requesting device; wherein the global feature model library is formed by: receiving one or more input videos, each input video being different from the target video stream; and generating, for each of the input videos, feature information and a respective feature model. 2. The method of claim 1, wherein the feature-based compression method applies feature-based prediction across a plurality of different video sources based on the feature models. 3. The method of claim 1, wherein the global feature model library is further formed by storing the feature models generated from the input videos in a data store or in cloud storage, the data store or cloud storage providing the pertinent feature models to the feature-based compression method and to the requesting device. 4. The method of claim 1, wherein the global feature model library is further formed by: for each input video, identifying and indexing features in that input video, the indexed features forming the respective feature model, the indexing including, for each identified feature, a representation of the position of the identified feature in the input video; and wherein decoding the encoded video data at the requesting device uses the indexed features and the corresponding feature positions in the input videos to retrieve the feature models and decode the encoded video data. 5. The method of claim 4, wherein the indexing of features is hash-based. 6. The method of claim 1, wherein the feature models from the global feature model library form a working-model subset library specialized per requesting device or per target video stream. 7. The method of claim 1, wherein the feature models from the global feature model library form a working-model subset library that is a differential library relative to the state of any library on the requesting device.
8. The method of claim 1, wherein the feature models from the global feature model library form a working-model subset library that is a predictive model library storing feature models as a function of a behavior profile of an end user of the requesting device. 9. The method of claim 8, wherein the predictive model library has modifiable (configurable) parameters for adaptation to a variety of demand scenarios. 10. A video data system, comprising: a repository storing video data and serving as a streaming video source; and a codec operatively coupled to the repository, the codec being executed by a processor, in response to a request for a particular video, to (i) encode stored video data in the repository corresponding to the requested particular video, and (ii) stream the encoded video data from the repository, wherein the codec applies feature-based prediction using feature models from a global feature model library, the global feature model library being formed by: receiving one or more input videos, each input video being different from the stored video data in the repository corresponding to the requested particular video; and generating, for each of the input videos, feature information and a respective feature model; such that the codec applies feature-based prediction across a plurality of different video data with reference to the stored video data in the repository corresponding to the requested particular video, the plurality of different video data including the input videos of the global feature model library. 11. The video data system of claim 10, wherein the codec: encodes the stored video data in the repository by feature-based compression applying feature-based prediction based on the features, and transmits the encoded video data to a requesting device, the encoded video data being the streamed video data from the repository. 12. The video data system of claim 11, wherein the feature models from the global feature model library are accessible to the requesting device and enable decoding of the encoded video data at the requesting device.
13. The video data system of claim 11, wherein the global feature model library is further formed by: for each input video, identifying and indexing features in that input video, the indexed features forming the respective feature model, the indexing including, for each identified feature, a representation of the position of the identified feature in the input video; and wherein decoding the encoded video data at the requesting device uses the indexed features and the corresponding feature positions in the input videos to retrieve the feature models and decode the encoded video data. 14. The video data system of claim 13, wherein the indexing of features is hash-based. 15. The video data system of claim 10, wherein the feature models are stored in the repository. 16. The video data system of claim 10, wherein the feature models from the global feature model library form a working-model subset library specialized per requesting device or per streamed encoded video data. 17. The video data system of claim 10, wherein the feature models from the global feature model library form a working-model subset library that is a differential library relative to the state of any library on the requesting device. 18. The video data system of claim 10, wherein the feature models from the global feature model library form a working-model subset library that is a predictive model library storing feature models as a function of an end user's behavior profile. 19. The video data system of claim 18, wherein the predictive model library has modifiable parameters for adaptation to a variety of demand scenarios. 20. A computer program product comprising program code means which, when loaded into a computer, control the computer to carry out any of the systems specified in the preceding claims. 21. A computer program product comprising program code means which, when loaded into a computer, control the computer to execute instructions to help implement the system of claim 1. 22. A computer program product comprising program code means which, when loaded into a computer, control the computer to execute instructions to help implement the system of claim 10.
23. A video data system, comprising: repository means for storing video data and serving as a streaming video source; and codec means operatively coupled to the repository means, the codec means being executed by processor means, in response to a request for a particular video, to (i) encode stored video data in the repository means corresponding to the requested particular video, and (ii) stream the encoded video data from the repository means, wherein the codec means applies feature-based prediction using feature models from a global feature model library, the global feature model library being formed by: receiving one or more input videos, each input video being different from the stored video data in the repository means corresponding to the requested particular video; and generating, for each of the input videos, feature information and a respective feature model; such that the codec means applies feature-based prediction across a plurality of different video data with reference to the stored video data in the repository means corresponding to the requested particular video, the plurality of different video data including the input videos of the global feature model library.
TW102108202A 2012-03-27 2013-03-08 Video compression repository and model reuse TW201342935A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261616334P 2012-03-27 2012-03-27
US201261650363P 2012-05-22 2012-05-22
US13/772,230 US8902971B2 (en) 2004-07-30 2013-02-20 Video compression repository and model reuse

Publications (1)

Publication Number Publication Date
TW201342935A true TW201342935A (en) 2013-10-16

Family

ID=47902364

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102108202A TW201342935A (en) 2012-03-27 2013-03-08 Video compression repository and model reuse

Country Status (5)

Country Link
EP (1) EP2815573A1 (en)
JP (1) JP6193972B2 (en)
CA (1) CA2868784A1 (en)
TW (1) TW201342935A (en)
WO (1) WO2013148091A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI616763B (en) * 2015-09-25 2018-03-01 財團法人工業技術研究院 Method for video indexing and device using the same
US10283166B2 (en) 2016-11-10 2019-05-07 Industrial Technology Research Institute Video indexing method and device using the same
TWI766286B (en) * 2020-02-03 2022-06-01 大陸商北京市商湯科技開發有限公司 Image processing method and image processing device, electronic device and computer-readable storage medium

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9743078B2 (en) * 2004-07-30 2017-08-22 Euclid Discoveries, Llc Standards-compliant model-based video encoding and decoding
US9532069B2 (en) 2004-07-30 2016-12-27 Euclid Discoveries, Llc Video compression repository and model reuse
US9578345B2 (en) 2005-03-31 2017-02-21 Euclid Discoveries, Llc Model-based video encoding and decoding
WO2015138008A1 (en) 2014-03-10 2015-09-17 Euclid Discoveries, Llc Continuous block tracking for temporal prediction in video encoding
US10097851B2 (en) 2014-03-10 2018-10-09 Euclid Discoveries, Llc Perceptual optimization for model-based video encoding
US10091507B2 (en) 2014-03-10 2018-10-02 Euclid Discoveries, Llc Perceptual optimization for model-based video encoding
CN106713920A (en) * 2017-02-22 2017-05-24 珠海全志科技股份有限公司 Mobile detection method and device based on video encoder
CN110163029B (en) * 2018-02-11 2021-03-30 中兴飞流信息科技有限公司 Image recognition method, electronic equipment and computer readable storage medium
CN112989116B (en) * 2021-05-10 2021-10-26 广州筷子信息科技有限公司 Video recommendation method, system and device
CN117037158B (en) * 2023-10-09 2024-01-09 之江实验室 Urban brain cloud edge cooperative computing method and device based on video semantic driving

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6088484A (en) 1996-11-08 2000-07-11 Hughes Electronics Corporation Downloading of personalization layers for symbolically compressed objects
JP2000209555A (en) * 1999-01-20 2000-07-28 Telecommunication Advancement Organization Of Japan Moving picture reproduction terminal
US7003039B2 (en) * 2001-07-18 2006-02-21 Avideh Zakhor Dictionary generation method for video and image compression
JP2010517426A (en) * 2007-01-23 2010-05-20 ユークリッド・ディスカバリーズ・エルエルシー Object archiving system and method
CA2758262C (en) * 2009-04-08 2016-09-13 Watchitoo, Inc. System and method for image compression
WO2011156250A1 (en) * 2010-06-07 2011-12-15 Thomson Licensing Learned transform and compressive sensing for video coding
US9338477B2 (en) * 2010-09-10 2016-05-10 Thomson Licensing Recovering a pruned version of a picture in a video sequence for example-based data pruning using intra-frame patch similarity

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI616763B (en) * 2015-09-25 2018-03-01 財團法人工業技術研究院 Method for video indexing and device using the same
US10283166B2 (en) 2016-11-10 2019-05-07 Industrial Technology Research Institute Video indexing method and device using the same
TWI766286B (en) * 2020-02-03 2022-06-01 大陸商北京市商湯科技開發有限公司 Image processing method and image processing device, electronic device and computer-readable storage medium

Also Published As

Publication number Publication date
CA2868784A1 (en) 2013-10-03
EP2815573A1 (en) 2014-12-24
JP2015515807A (en) 2015-05-28
WO2013148091A1 (en) 2013-10-03
JP6193972B2 (en) 2017-09-06

Similar Documents

Publication Publication Date Title
TW201342935A (en) Video compression repository and model reuse
US8902971B2 (en) Video compression repository and model reuse
US9532069B2 (en) Video compression repository and model reuse
CN108632625B (en) Video coding method, video decoding method and related equipment
CN106803992B (en) Video editing method and device
US11653065B2 (en) Content based stream splitting of video data
Mavlankar et al. An interactive region-of-interest video streaming system for online lecture viewing
TW201342926A (en) Model-based video encoding and decoding
US11093752B2 (en) Object tracking in multi-view video
JP2015536092A (en) Standard-based model-based video encoding and decoding
JP6636615B2 (en) Motion vector field encoding method, decoding method, encoding device, and decoding device
KR102050780B1 (en) Method and Server Apparatus for Delivering Content Based on Content-aware Using Neural Network
CN115866356A (en) Video watermark adding method, device, equipment and storage medium
KR20180030565A (en) Detection of Common Media Segments
JP2022529688A (en) Video coding using conversion index
KR101087194B1 (en) Encoding System and Method of Moving Picture
Xiao et al. Exploiting global redundancy in big surveillance video data for efficient coding
TW202234883A (en) Intra block copy scratch frame buffer
JP2017123503A (en) Video distribution apparatus, video distribution method and computer program
Laghari et al. The state of art and review on video streaming
Megala State-of-the-art in video processing: compression, optimization and retrieval
Yamghani et al. Compressed domain video abstraction based on I-Frame of HEVC coded videos
CN108028947B (en) System and method for improving workload management in an ACR television monitoring system
Xu et al. Mixed tiling scheme for adaptive VR streaming
US11831887B1 (en) Scalable video coding for machine