TW200934218A - Conveying auxiliary information in a multiplexed stream - Google Patents

Conveying auxiliary information in a multiplexed stream

Info

Publication number
TW200934218A
TW200934218A TW097132961A TW97132961A
Authority
TW
Taiwan
Prior art keywords
audio
stream
elementary stream
information
video
Prior art date
Application number
TW097132961A
Other languages
Chinese (zh)
Inventor
Philip Steven Newton
Wilhelmus Hendrikus Alfonsus Bruls
Original Assignee
Koninkl Philips Electronics Nv
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninkl Philips Electronics Nv filed Critical Koninkl Philips Electronics Nv
Publication of TW200934218A publication Critical patent/TW200934218A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194 Transmission of image signals

Abstract

A system (650) receives a multiplexed stream. The system comprises a demultiplexer (654) for extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream. The system further comprises a decoder (656) for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream. The auxiliary information comprises depth information relating to at least one video frame encoded in the video elementary stream.
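The pipeline in the abstract can be sketched in a few lines. This is a toy model for illustration only: the stream layout (a dictionary of elementary streams), the choice of the Lb/Rb channels as carriers, and all field names are assumptions, not the format defined by this publication.

```python
# Toy model of system (650): demultiplex a multiplexed stream into a video
# elementary stream and an audio elementary stream, then extract auxiliary
# (depth) information from designated audio channels.

def demultiplex(mux_stream):
    """Split a multiplexed stream (modeled as a dict) into elementary streams."""
    return mux_stream["video_es"], mux_stream["audio_es"]

def extract_auxiliary(audio_es, aux_channels=("Lb", "Rb")):
    """Pull auxiliary samples out of the given channels; also return a second
    audio elementary stream with those channels removed."""
    aux = {ch: audio_es[ch] for ch in aux_channels if ch in audio_es}
    clean = {ch: s for ch, s in audio_es.items() if ch not in aux_channels}
    return aux, clean

mux = {
    "video_es": [b"frame0", b"frame1"],
    "audio_es": {"L": [0.1], "R": [0.2], "C": [0.0],
                 "Ls": [0.1], "Rs": [0.2], "LFE": [0.0],
                 "Lb": [7, 7], "Rb": [9, 9]},   # depth data in otherwise unused channels
}
video, audio = demultiplex(mux)
aux, clean_audio = extract_auxiliary(audio)
assert set(aux) == {"Lb", "Rb"}    # depth "audio" extracted for the display
assert "Lb" not in clean_audio     # remaining 5.1 stream can go to an amplifier
```

The second return value of `extract_auxiliary` models the "second audio elementary stream without auxiliary information" that the description introduces for forwarding to a conventional audio renderer.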

Description

200934218 IX. Description of the Invention

[Technical Field to Which the Invention Pertains]

The present invention relates to handling a multiplexed data stream. More particularly, the invention relates to conveying auxiliary information in a multiplexed data stream. The invention also relates to conveying depth information.

[Prior Art]

3D video has been furthered by recent advances in autostereoscopic displays and by improvements to existing technology, such as beamers and monitors with high refresh rates. Such displays can render a video scene with a sense of depth. By ensuring that the viewer's two eyes perceive two slightly different images, some of the rendered objects are perceived as closer to the viewer and others as farther away. This enhances the viewing experience.

One aspect that must be addressed in the design of a 3D video system is how the depth information needed to create the sense of depth is conveyed to the display. Such depth information may take the form of additional data that supplements a conventional 2D video scene. In this way a conventional 2D display can use the video data by simply rendering the 2D video scene, whereas a 3D display can process the 2D video scene on the basis of the depth information to create a sense of depth.

However, many devices operate according to standards that define how 2D video is to be handled, and consequently not all existing hardware and software is well prepared to handle depth information.

[Summary of the Invention]

It would be advantageous to have an improved way of conveying auxiliary information in a multiplexed data stream. To better address this concern, in a first aspect of the invention a system is proposed that comprises:

an input for receiving the multiplexed data stream;
a demultiplexer for extracting at least a video elementary stream and an audio elementary stream from the multiplexed data stream; and
a decoder for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

Such a system can receive a multiplexed signal carrying auxiliary information via an interface, a device, or a program that can handle video and audio elementary streams, even if that interface, device, or program does not explicitly support handling the particular kind of auxiliary information. None of the bandwidth reserved for video is consumed, which allows the video frames to retain full resolution.

In an embodiment, the auxiliary information comprises depth information. The depth information may relate to at least one video frame encoded in the video elementary stream. The depth information may comprise occlusion information and/or disparity information.

An embodiment comprises:
removing means for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream without auxiliary information; and
an output for providing the second audio elementary stream.

This allows the system to forward pure audio information to a device that handles audio data, for example an audio renderer such as an amplifier or a surround system. It allows a multiplexed data stream that includes auxiliary information to be used together with an amplifier or surround system that has not been adapted to handle an audio stream containing the auxiliary information.

In an embodiment, the depth information comprises at least part of a depth elementary stream, the depth elementary stream comprising depth and/or disparity values that supplement the video elementary stream.

In an embodiment,
the audio elementary stream comprises at least seven audio channels, corresponding to at least six surround channels and at least one bass channel; and

the decoder is configured to extract the auxiliary information from a number of the surround channels, wherein that number is at most equal to the number of surround channels minus five.

This advantageously exploits the fact that some media standards can accommodate more than five surround audio channels, for example seven (and often one bass audio channel), whereas most households do not use more than five surround speakers (and the bass speaker). In such a case at least one, for example two, audio channels are unused. This embodiment uses that unused capacity to convey the auxiliary information without any compromise in the rendering of the video and audio.

In an embodiment, the input comprises a standardized digital audio/video interface. Some standardized digital audio/video interfaces, for example HDMI or DVI, provide a standardized way of exchanging 2D video and audio data between devices, but do not provide a standardized way of exchanging certain specific kinds of auxiliary information. This embodiment provides a way of transporting auxiliary information via the audio/video interface without making any modification to the audio/video interface or to the corresponding standard.

In an embodiment, the input comprises a reader for reading a media carrier to provide the multiplexed data stream. This embodiment allows the auxiliary information to be stored on a media carrier, for example a Blu-ray Disc or an HD-DVD, as at least part of an audio track.

An embodiment comprises means for establishing whether auxiliary information has been included in the audio elementary stream and for activating the decoder in dependence on the detection of auxiliary information. This allows flexible handling of multiplexed data streams with and without auxiliary information in the audio elementary stream.

An embodiment comprises a system for providing a multiplexed data stream, the system comprising:

an encoder for encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream;
a multiplexer for combining at least the video elementary stream and the audio elementary stream into a multiplexed data stream; and
an output for providing the multiplexed data stream.

This allows a combined audio/video/auxiliary stream to be prepared for transmission or storage, so that it can be used or transferred by devices, interfaces, or programs that need not be adapted to accommodate the auxiliary information.

An embodiment comprises a reader for obtaining the video elementary stream, the auxiliary information, and the audio data from a content data carrier, wherein the auxiliary information is encoded on the content data carrier in a format different from a basic audio stream. A reader and a television set may both be prepared for 3D while the interface connecting them is not. This embodiment allows the reader to read the auxiliary information from a media carrier and encode it in an audio channel for transmission over the unprepared interface.

An embodiment comprises a content data carrier comprising a multiplexed data stream, the multiplexed data stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.

This allows the content data carrier to be used with a reader that makes no provision for auxiliary information. The reader may simply forward the audio/video stream to a display, where the display can extract the auxiliary information from the audio stream and use it in rendering the video and audio.

In an embodiment, the auxiliary information is stored in a first audio data stream and the audio information is stored in a second audio data stream, and a playlist is provided comprising an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream.

By storing the auxiliary information and the audio information in separate streams, different playlists can be defined for rendering with and without the enhancement provided by the auxiliary information. This embodiment involves a playlist that mixes the audio information and the depth information into a single audio output, which can be rendered with a 3D sense of depth. The playlist may be associated with a user-selectable menu item. Another menu item may be provided for rendering audio-only information, without auxiliary information included in the audio elementary stream.

An embodiment comprises a signal representing a multiplexed data stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream. The signal may be transmitted, for example, via DVB broadcast or via the Internet.

An embodiment comprises a method of handling a multiplexed data stream, the method comprising:

receiving the multiplexed data stream;
extracting at least a video elementary stream and an audio elementary stream from the multiplexed data stream; and
extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

An embodiment comprises a method of providing a multiplexed data stream, the method comprising:

encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream;
combining at least the video elementary stream and the audio elementary stream into a multiplexed data stream; and
providing the multiplexed data stream.

An embodiment comprises a computer program product comprising instructions for causing a processor to perform any of the methods described.

These and other aspects of the invention will be further elucidated and described with reference to the drawings.

[Embodiments]

Owing to the introduction of novel autostereoscopic displays and improvements to existing technology, such as beamers and monitors with high refresh rates, 3D video is experiencing a revival. Introducing 3D video and 3D graphics requires changes to the MPEG standards for video. Work is under way in MPEG to standardize different 3D video formats, one of which is based on 2D plus depth. This approach relies on 3D display technology that can compute multiple views from a 2D image and an additional image called a depth map. The depth map conveys information about the depth of objects in the 2D image: a gray value in the depth map indicates the depth of the associated pixel in the 2D image. A stereoscopic display can compute the additional views required for stereoscopy by using the depth values from the depth map and calculating the necessary pixel transformations. Techniques for achieving this are well known in the art; see, for example, the PhD thesis of L. McMillan, "An image-based approach to three dimensional computer graphics", UNC Computer Science, TR97-013, 1997.

Figure 1 shows an example of a 2D image and an associated depth map. One problem that arises when multiple views are generated from 2D-plus-depth input is that some objects in the video become visible in a given view that in 2D were occluded by other objects in the transmitted 2D image. This is known as the de-occlusion problem. It is currently addressed by hole-filling techniques that cover the missing parts; for objects with a high level of depth, however, this causes visible artifacts. One solution is to send additional data to the display that can be used to fill in the de-occluded areas.

Current autostereoscopic displays sacrifice resolution in order to render multiple views on the screen. In the Philips 9-view lenticular display, the resolution of the stereoscopic image is 960x540. This reduction in resolution makes it possible to send all the necessary data in a single 1920x1080 frame.

Figure 2 shows a 1920x1080 frame with four 960x540 quadrants. The top-left quadrant carries the 2D image data. The top-right quadrant carries the depth information. The bottom-left and bottom-right quadrants carry occlusion data. In this text, "depth information" refers to any information that is necessary or helpful for converting a 2D image into a 3D image; depth information may therefore include or exclude occlusion information.
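The quadrant packing of Figure 2 can be sketched as follows. This is a simplified model, not the mandated frame format: single 8-bit grayscale planes stand in for full video data, and the quadrant contents are placeholders.

```python
import numpy as np

H, W = 1080, 1920          # full frame
q_h, q_w = 540, 960        # one quadrant

# Hypothetical inputs, each already scaled to quadrant size.
image_2d  = np.random.randint(0, 256, (q_h, q_w), dtype=np.uint8)
depth_map = np.random.randint(0, 256, (q_h, q_w), dtype=np.uint8)
occl_a    = np.zeros((q_h, q_w), dtype=np.uint8)   # occlusion data (placeholder)
occl_b    = np.zeros((q_h, q_w), dtype=np.uint8)   # occlusion data (placeholder)

frame = np.empty((H, W), dtype=np.uint8)
frame[:q_h, :q_w] = image_2d    # top-left: 2D image data
frame[:q_h, q_w:] = depth_map   # top-right: depth information
frame[q_h:, :q_w] = occl_a      # bottom-left: occlusion data
frame[q_h:, q_w:] = occl_b      # bottom-right: occlusion data

assert frame.shape == (1080, 1920)
```

A display that accepts this layout recovers each plane by slicing the corresponding quadrant back out of the received frame.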

Introducing stereoscopic video on the basis of 2D + depth + occlusion data has the problem that adding the additional data for stereoscopic video significantly increases the required bandwidth. This concerns not only the read bandwidth from a disc: the decoding bandwidth and the amount of data that must be sent from the player to the display across the interface increase as well. A solution has therefore been sought to reduce the size of the additional data to be sent. One solution that has been found is to reduce the resolution of the depth map. The resolution may be reduced to 1/8 of that of the 2D content (1/8 of the vertical and of the horizontal resolution). For HD-resolution 2D video (1920x1080) this means that the depth map can be as small as 240x144. Current 3D displays may sacrifice resolution (pixels) to render multiple views to a user. Such displays take, for example, a 1920x1080 frame as input format, in which the 2D video occupies an area of 960x540. With the resolution reduced to 1/8 of this 2D screen size, the depth map can be as small as 120x67.5 while still maintaining the same perceived 3D video quality for the user.

One question is how existing video publishing formats such as DVD, Blu-ray Disc, and HD-DVD can be used to carry the additional depth information. Because of the low bit rate of the depth map, and because the depth map uses only a limited set of depth values, the depth map information can be encoded as an audio stream even in the case of high-resolution video such as HDTV. This "audio" stream can be sent to a 3D display together with the "real" audio stream, where the "real" audio stream contains the audio waveforms to be rendered through the loudspeakers. The 3D display can then separately decode the audio stream containing the depth map and use it together with the 2D video stream to create the 3D sense. This allows 3D video to use existing standards such as Blu-ray Disc and HD-DVD, as well as existing interfaces such as HDMI.

Figure 3 shows an audio mixing model supported by Blu-ray Disc; a similar audio mixing model is supported by HD-DVD. This model allows a secondary audio stream 322 and/or an interactive audio stream 323 to be mixed with a primary stream 321. The secondary stream 322 may consist of a lower bit-rate audio stream.

率之音訊流在混合器3 12被與該主要音訊同步並混合。其 © 可由例如一 DTS-HD或杜比數位(加)(DOLBY DIGITAL (Plus))流組成。互動式音訊流323可由LPCM音訊組成,該 LPCM音訊係藉由磁碟上之一應用程式而啓動並在主要流 321與次要流322已被混合之後在混合器3 14與主要流321混 合。其典型地用於該應用程式之相關事件所關聯的動態聲 音。B 1、B2及B3代表音訊緩衝器用以緩衝所接收之流 321、322、323,以保證待處理之資料的連續可用性。主 _ 要流321與次要流322之音量可分別用增益D1及D2而調 適。增益D2將次要音訊混合元資料302發送至一開關,該 開關具有元資料開啓及元資料關閉之位置。在元資料開啓 - 之情況下,次要音訊混合元資料被轉送至轉換器XI,該轉 換器XI取決於BD-J增益控制306對XI混合矩陣進行轉換。 BD-J代表藍光磁碟Java,其係藍光磁碟規格之互動式部 分。一 BD-J應用程式可使用適當的BD-J控制來控制增益值 及平移(panning)值。在元資料關閉之情況下,BD-J平移控 制信號304被轉送至轉換器XI。開關係藉由BD-J元資料 133819.doc -14- 200934218 API 3 1 8而控制。轉換器χι將χι混合矩陣提供至混合器 312。轉換器χ2取決於bd_j平移控制3〇8及bd-J增益控制 310而將X2混合矩陣提供至混合器314。混合器314取決於 BD-J平移控制3〇8及BD_;增益控制3丨〇而將互動式音訊323 與混合器312之輸出混合。 互動式音訊通道323及次要流322可被用以混入一承載深 度圖之音訊流中。藉由調整混合器312及混合器314之混合 參數,含有深度圖之音訊可被混合使得其在輸出316之音 5ΪΙ通道中結束。深度圖較佳地被混合至在3 D模式中未被使 用或是空閒的一個或多個通道中。此等空閒或未用之通道 被混合使得其等含有深度圖"音訊"流並通過例如HdMI介 面而發送至3D顯示器。此等混合參數可藉由一 Java應用程 式而設定’或可作為元資料被定位於次要音訊流中。在二 者之間選擇的開關係藉由BD-J元資料API 3 1 8而操作。 舉例而言,輸出音訊基本流316可包含多達7.丨個通道: 7個環繞通道及一個低音通道。圖4顯示此種通道設置的一 實例,其在一顯示器後面有一中央揚聲器C,及在左邊有 揚聲器L,右邊R ’左後Lb,右後Rb,左側Ls,右側RS, 及低音揚聲器(未顯示)。在許多音訊系統中,缺少左後Lb 及右後Rb揚聲器。 在此通道設置中’ Lb及Rb通道可被2個承載深度圖資訊 音訊流之通道所替換。此替換可藉由混合一承載深度圖資 訊音訊流之次要音訊流322而完成。次要音訊流322之混合 元資料302經由轉換器X1指示混合器3 1 2將次要音訊流322 I33819.doc 200934218 與主要音訊流321混合’如圖3中指示的。混合元資料302 指定自主要音訊流之Lb及Rb通道的增益被設定為靜音,及 次要流中之該等相同通道之增益被設定為1。這基本上將 主要音訊流中之Lb及Rb通道替換為來自次要流中之Lb及 Rb通道。在此實例中之全部7.1個音訊流或是被重編碼(其 可引起深度圖流中的一些失真)並通過HDMI介面被發送, 或是作為LPCM被留下並發出’如圖5中顯示的。3D TV操 取深度圖"音訊"通道並將其他通道轉送至一音訊接收器。 圖5顯示藍光磁碟播放器BD將混合之71個音訊流502發送 至一 3D TV。3D TV自Lb及Rb通道擷取深度資訊並將剩餘 5.1個音訊流504轉送至一環繞系統5〇6。 混合深度圖資訊音訊流與主要音訊流之其他方法亦係可 能的。舉例而言’在一7.〗杜比數位+(D〇lby叫加丨卟雜 之情況下,深度圖資訊可被嵌入流之延伸封包中。杜比數 位+使用此延伸機制以提供一與51解碼器相容之機制。位 元流含有原始之5·!核心封包及提供附加通道的延伸封 包。如此,7.1杜比數位+流可被直接發送至一 51環繞系統 而不經過3D TV,因為5.丨環繞系統不處理延伸封包。這係 有利的,因為通常磁碟播放器係被直接連接至環繞系統。 其他通道可用以代替Lb&Rb通道。又,互動式音訊通 道323可用於代替次要通道322以承載深度資訊。音訊矩陣 化亦可如在杜比定向邏輯(D〇lby Pr〇_L〇gic)中定義的被使 用。然而,此可在一定程度上降低音訊品質。 當以正常2D模式播放時,無須犧牲任何音訊通道。因為 133819.doc -16- 200934218 全7.1流可照原樣被使用而不混入深度圖流中。對於3〇模 式,内容作者可將次要音訊流包括在磁碟上並通過一播放 清單加以參照(播放清單被定義在藍光磁碟規格中。一類 
似之概念被定義在HD-DVD規格中),該播放清單指示此次 要流應與主要流一起存在。此播放清單可被使用者選擇作 為磁碟上選單中指示這將以3D模式播放影片的一附加標 題。如之前所解釋的,此將混合次要與主要流使得7.丨流 之一個通道被深度圖"音訊"流所替換。一不利點係,當以 3D模式時,將損失一些音訊通道。即,僅而不是7.^系 可用的。 對於深度資訊僅使用一個或多個音訊通道之頻寬的一部 分亦係可能的,因此所有音訊通道對音訊系統均係可用 的’儘管可能以一減少之品質。在另一組態中,分離之基 本音訊流被混合以適於播放器之"3D TV"輸出及適於播放 器之"環繞系統"輸出。深度資訊僅對於被提供至播放器之 3 D TV"輸出的流才被混入。完整、原始音訊流被提供至 播放器之”環繞系統"輸出。 音訊流與深度圖音訊”流的混合係在LPCM位準下完 成,且可在被發出之前被重編碼。LPCM樣本可具有一多 達192 kHz(48、96及192)之採樣頻率且每樣本可有多達24 位元。一 120x67.5之深度圖需要一 ι·6 Mb/s之位元率。故 使用16位元樣本及一 96 kHz之採樣頻率幾乎是足夠的(15 Mb/s)。經由稍微減少用於代表深度值之位元的數目,則 這將係足夠的。當然’多重變動係可能的。可使用一更高 133819.doc -17· 200934218 之採樣頻率(192 kHz)及/或一更大之樣本大小(多達24位 "")另 選擇為可使用多重音訊通道。舉例而言,深度 圖之大小的一半可被承載於一左通道中及另一半在一右通 道中。 將深度圖轉換為音訊流可逐位元及逐列完成。一新框可 藉由一標魏,諸如一個或多個零位元組而指示。又最高 » 有效位元被交替使得所形成之信號更似一真正的音訊信 ❹ 號。此有助於防止損壞音訊接收器,當信號無意中在此種 音訊接收器中結束時。舉例而言,藉由做出錯誤之設定或 電纜連接。 使用所述之實施例,可使用現有之格式諸如藍光磁碟及 HD-DVD提供3D視訊。較佳地,應考慮到受操控之音訊係 通過HDMI介面被發出至顯示器且不會至一外部音訊解碼 器,因為在此種情況下用於承載深度資訊之某些通道將含 有雜訊。較佳地,一分離之音訊輸出具有"乾淨的”非受操 〇 控之音訊。一替代解決方案係在3D顯示器上提供一數位音 訊輸出並通知使用者一外部音訊解碼器諸如一接收器應被 連接至此輸出而非直接至藍光磁碟播放器上的輸出。3d顯 . 示器過濾出深度資訊音訊流並將剩餘通道轉送至外部解碼 • 11。在又—實施例中,音訊解碼器經配置用以檢測深度資 訊的存在且隨後忽略其。 此處所揭示之概念可延伸至除深度資訊外之其他類型的 資訊。舉例而言,任何輔助資訊而不是深度資訊可被包括 在-音訊流中。該資訊可與沈浸體驗控制資料相關,例 133819.doc -18· 200934218 如,用於控制室内光源之控制命令可被提供於輔助資訊 中。該等光源可產生顏色及/或亮度可被控制之光。舉例 而言,情境光係藉由顯示器側上的光源提供。在音訊流中 代替深度資訊或是除了深度資訊之外包括該等控制命令, 這將提供與音訊流中包括深度資訊一樣的類似優點。 圖6繪示本發明之一實施例。資料,例如其在一廣播公 司或一内容提供者上係可用的,被提供至一第一系統The rate of audio stream is synchronized and mixed with the primary audio at mixer 3. Its © may consist of, for example, a DTS-HD or a DOLBY DIGITAL (Plus) stream. The interactive audio stream 323 can be comprised of LPCM audio that is initiated by an application on the disk and mixed with the primary stream 321 at the mixer 314 after the primary stream 321 and the secondary stream 322 have been mixed. It is typically used for dynamic sounds associated with events associated with the application. 
B 1, B2 and B3 represent audio buffers for buffering the received streams 321, 322, 323 to ensure continuous availability of the data to be processed. The volume of the main _stream 321 and the secondary stream 322 can be adjusted with the gains D1 and D2, respectively. Gain D2 sends the secondary audio mix metadata 302 to a switch having a location where the metadata is turned on and the metadata is turned off. In the case where the metadata is turned on, the secondary audio mix metadata is forwarded to the converter XI, which converts the XI mixed matrix depending on the BD-J gain control 306. BD-J stands for Blu-ray Disc Java, which is an interactive part of the Blu-ray Disc specification. A BD-J application can control gain values and panning values using appropriate BD-J controls. In the event that the metadata is turned off, the BD-J shift control signal 304 is forwarded to the converter XI. The open relationship is controlled by the BD-J meta data 133819.doc -14- 200934218 API 3 1 8. The converter 提供ι supplies the χι mixing matrix to the mixer 312. Converter χ2 provides the X2 mixing matrix to mixer 314 depending on bd_j translation control 3〇8 and bd-J gain control 310. The mixer 314 mixes the interactive audio 323 with the output of the mixer 312 depending on the BD-J translation control 3〇8 and BD_; gain control 3丨〇. The interactive audio channel 323 and the secondary stream 322 can be used to mix into an audio stream carrying a depth map. By adjusting the mixing parameters of mixer 312 and mixer 314, the audio containing the depth map can be mixed such that it ends in the output 316 channel. The depth map is preferably blended into one or more channels that are not used or are idle in the 3D mode. These free or unused channels are mixed such that they contain a depth map "audio" stream and are sent to the 3D display via, for example, the HdMI interface. 
These blending parameters can be set by a Java application or can be located as metadata in the secondary audio stream. The open relationship selected between the two is operated by the BD-J metadata API 3 1 8 . For example, the output audio elementary stream 316 can include up to 7. channels: 7 surround channels and one bass channel. Figure 4 shows an example of such a channel setup with a central speaker C behind a display and a speaker L on the left, R' left rear Lb, right rear Rb, left Ls, right RS, and woofer (not display). In many audio systems, the left rear Lb and right rear Rb speakers are missing. In this channel setup, the 'Lb and Rb channels can be replaced by two channels that carry the depth map information stream. This replacement can be accomplished by mixing a secondary audio stream 322 that carries the depth map information stream. The mixed metadata 302 of the secondary audio stream 322 indicates via the converter X1 that the mixer 3 1 2 mixes the secondary audio stream 322 I33819.doc 200934218 with the primary audio stream 321 as indicated in FIG. The mixed metadata 302 specifies that the gains of the Lb and Rb channels from the primary audio stream are set to silence, and the gains of the same channels in the secondary stream are set to one. This basically replaces the Lb and Rb channels in the primary audio stream with the Lb and Rb channels from the secondary stream. All 7.1 audio streams in this example are either re-encoded (which can cause some distortion in the depth map stream) and are sent through the HDMI interface, or left as LPCM and emit 'as shown in Figure 5 . The 3D TV operates the depth map "audio" channel and forwards the other channels to an audio receiver. Figure 5 shows a Blu-ray Disc player BD transmitting a mix of 71 audio streams 502 to a 3D TV. The 3D TV retrieves depth information from the Lb and Rb channels and forwards the remaining 5.1 audio streams 504 to a surround system 5〇6. 
Other methods of mixing depth map information audio streams with primary audio streams are also possible. For example, in the case of a 7. Dolby digit + (D〇lby is called noisy, the depth map information can be embedded in the extended packet of the stream. Dolby Digital + use this extension mechanism to provide a 51 decoder compatible mechanism. The bit stream contains the original 5·! core packet and an extended packet that provides additional channels. Thus, the 7.1 Dolby Digital+ stream can be sent directly to a 51 surround system without going through 3D TV. Because the 5. surround system does not process the extended packet. This is advantageous because usually the disk player is directly connected to the surround system. Other channels can be used instead of the Lb&Rb channel. Again, the interactive audio channel 323 can be used instead. The secondary channel 322 carries depth information. The audio matrix can also be used as defined in Dolby Pro Logic (D〇lby Pr〇_L〇gic). However, this can reduce the audio quality to some extent. When playing in normal 2D mode, there is no need to sacrifice any audio channels. Because 133819.doc -16- 200934218 all 7.1 streams can be used as they are without being mixed into the depth map stream. For 3〇 mode, content authors can stream secondary streams. Included on the disk and referenced by a playlist (playlist is defined in the Blu-ray disk specification. A similar concept is defined in the HD-DVD specification), the playlist indicates the current stream and the main stream This playlist can be selected by the user as an additional title in the menu on the disk indicating that the movie will be played in 3D mode. As explained before, this will mix the secondary and primary streams so that 7. One channel is replaced by the depth map "audio" stream. One disadvantage is that when in 3D mode, some audio channels will be lost. That is, only instead of 7.^ is available. 
Using only a portion of the bandwidth of one or more audio channels for the depth information is also possible, so that all audio channels remain available to the audio system, although perhaps with a reduced quality. In another configuration, separate basic audio streams are mixed: one suited for the "3D TV" output of the player and one for the "surround system" output of the player. The depth information is only mixed into the stream that is supplied to the "3D TV" output of the player; the complete, original audio stream is provided to the "surround system" output of the player. The mixing of the audio stream and the depth map audio stream is done at the LPCM level and can be re-encoded before being sent. LPCM samples can have a sampling frequency of up to 192 kHz (48, 96 and 192 kHz) and up to 24 bits per sample. A depth map of 120x67.5 requires a bit rate of 1.6 Mb/s; 16-bit samples and a sampling frequency of 96 kHz are almost sufficient (1.5 Mb/s). Slightly reducing the number of bits used to represent the depth values makes this sufficient. Of course, many variations are possible. A higher sampling frequency (192 kHz) and/or a larger sample size (up to 24 bits) can be used. For example, half of the depth map can be carried in a left channel and the other half in a right channel. Converting a depth map to an audio stream can be done bit by bit and column by column. A new frame can be indicated by a marker, such as one or more zeros. The most significant bits are alternated so that the resulting signal resembles a true audio signal more closely. This helps prevent damage to an audio receiver when the signal inadvertently ends up in such a receiver, for example because of a wrong setting or cable connection. Using the described embodiments, 3D video can be provided using existing formats such as Blu-ray Disc and HD-DVD.
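The bandwidth claim above can be checked with back-of-the-envelope arithmetic. The 8 bits per depth value and 24 frames per second used here are assumptions (the text only states the resulting rates), but they reproduce the quoted figures:

```python
# Required rate for the 120 x 67.5 depth map versus the capacity of one
# 16-bit LPCM channel at 96 kHz.
depth_w, depth_h = 120, 67.5
fps, bits_per_depth = 24, 8                      # assumed, not stated

required = depth_w * depth_h * bits_per_depth * fps   # bits per second
channel = 96_000 * 16                                 # one 16-bit, 96 kHz channel

assert round(required / 1e6, 2) == 1.56   # ~1.6 Mb/s, as stated in the text
assert channel == 1_536_000               # ~1.5 Mb/s, "almost sufficient"

# Slightly reducing the depth precision (here to 7 bits) makes one channel enough:
assert depth_w * depth_h * 7 * fps <= channel
```

This also shows why moving to 24-bit samples or 192 kHz, or splitting the map across two channels, gives comfortable headroom.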
Preferably, it should be taken into account that the manipulated audio stream is sent to the display via the HDMI interface and not to an external audio decoder, since in some cases the channels used to carry the depth information will contain noise. Preferably, a separate audio output provides the "clean", non-manipulated audio. An alternative solution is to provide a digital audio output on the 3D display and to notify the user that an external audio decoder, such as a receiver, should be connected to this output instead of directly to the output on the Blu-ray Disc player. The 3D display filters out the depth information audio stream and forwards the remaining channels to the external decoder. In yet another embodiment, the audio decoder is configured to detect the presence of depth information and then ignore it. The concepts disclosed herein can be extended to other types of information besides depth information. For example, any auxiliary information, rather than depth information, can be included in the audio stream. This information can relate to immersive-experience control data; for example, control commands for controlling indoor light sources can be provided in the auxiliary information. The light sources can produce light whose color and/or brightness is controlled. For example, ambient light is provided by light sources at the sides of the display. Including such control commands in the audio stream, instead of or in addition to depth information, provides advantages similar to those of including depth information in the audio stream. Figure 6 illustrates an embodiment of the invention. Original data, for example made available by a broadcast company or a content provider, is provided to a first system 600.

The first system establishes a signal representing a multiplexed data stream. The multiplexed data stream comprises at least a video elementary stream and an audio elementary stream, where the audio elementary stream contains auxiliary information. The auxiliary information comprises information for enhancing the visual experience of a rendering of the video elementary stream. For example, the auxiliary information comprises depth information: a depth map and/or a disparity map and/or occlusion information. The depth information can be encoded as a depth elementary stream. The first system 600 transforms the original data into the multiplexed data stream 640. The first system 600 comprises an encoder 602 for encoding the auxiliary information and the audio data together into an audio elementary stream; for example, one or more of the audio channels are filled with auxiliary information. The first system 600 comprises a multiplexer 604 for combining the video elementary stream and the audio elementary stream into the multiplexed data stream 640.
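The encoder/multiplexer split of the first system can be sketched as follows. The PES-style stream IDs and the frame data structures are illustrative assumptions; the patent does not prescribe a packet syntax.

```python
# First system 600: encoder 602 pairs audio frames with auxiliary (depth)
# data; multiplexer 604 interleaves the two elementary streams with IDs.
VIDEO_ID, AUDIO_ID = 0xE0, 0xC0   # illustrative PES-style stream IDs

def encode_audio_with_aux(audio_frames, aux_frames):
    """Encoder 602: fill the audio elementary stream with auxiliary data."""
    return [{"audio": a, "aux": x} for a, x in zip(audio_frames, aux_frames)]

def multiplex(video_es, audio_es):
    """Multiplexer 604: interleave the elementary streams into stream 640."""
    out = []
    for v, a in zip(video_es, audio_es):
        out.append((VIDEO_ID, v))
        out.append((AUDIO_ID, a))
    return out

audio_es = encode_audio_with_aux(["a0", "a1"], ["d0", "d1"])
mux = multiplex(["v0", "v1"], audio_es)
assert mux[0] == (0xE0, "v0")
assert mux[1][1]["aux"] == "d0"
```

A real implementation would of course carry compressed frames and timing information, but the shape of the pipeline is the same: auxiliary data enters through the audio encoder, not as a separate elementary stream.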

置。舉例而言,該載體可包括一儲存媒體,諸如— r〇m, 2如-CD R0M或—半導體R0M,或—磁記錄媒體,例如 p軟碟或硬碟。此外該載體可係一可傳輸之載體諸如—電 或光錢,其可經由電《光纜或藉由無線電或其他機構而 料。當程式係以此種信號體現時,冑體可藉由此種境或 丨他裝置或機構而組成。另一選擇為,載體可係一積體電 路,程式被嵌入其中,該積體電路經調適用以執行相關方 法’或用在相關方法的執行中。 虛、 應注意上述實施例係說明而非限制本發明,且熟習此項 技術者在不脫離附加請求項之範疇下可設計許多替代實施 例。在請求項中,放置於括號之間的任何參考符號不應視 為限制該請求項。動詞"包含"及其詞形變化的使用並不排 除除了在一請求項中所述之元件或步驟外還存在其他元件 或步驟。在一元件之前的冠詞"一”或"一個"並不排除存在 複數個此種TL件。本發明可藉由包含若干相異之元件的硬 φ 體,及藉由一適當程式化之電腦而實施。在列舉若干機構 之裝置明求項中,此等機構之若干可藉由同一項硬體而體 現。某些措施係被敘述於互不相同之附屬請求項中,這一 單純事實並不指示此等措施之一組合不可被有利地使用。 【圖式簡單說明】 圖1顯示一具有一深度圖之實例影像; 圖2顯不一具有一深度圖之實例影像及遮蔽資訊; 圖3靖·示一實施例; 圖4繪示一環繞系統; 133819.doc -23- 200934218 圖5繪示一實施例;及 圖6續' 示一實施例。 【主要元件符號說明】 302 次要音訊混合元資料 304 BD-J平移控制信號 306 BD-J增益控制 ' 308 BD-J平移控制 310 BD-J增益控制 ❹ 312 混合器 314 混合器 316 輸出音訊基本流 318 BD-J元資料API 321 主要流 322 次要流 323 互動式音訊流 502 音訊流 504 音訊流 506 環繞系統 ’ . 600 第一系統 602 編碼Is 604 多工器 606 輸出 608 讀取器 640 多工之資料流 133819.doc -24- 200934218 650 第二系統 652 輸入 654 解多工器 656 解碼器 658 移除機構 660 讀取器 • B1 音訊緩衝器 B2 音訊緩衝器 B3 音訊緩衝器 BD 藍光磁碟播放器 C 中央揚聲器 D1 增益 D2 增益 L 揚聲器 Lb 揚聲器 φ Ls 揚聲器 R 揚聲器 Rb 揚聲器 - Rs 揚聲器 XI 轉換器 X2 轉換器 133819.doc -2564〇. The output 6G6 provides a multiplexed data stream to the __ receiver, such as an L@m media carrier or via-broadcast or video-on-demand (illustrated schematically by dashed arrow 640). In an embodiment suitable for implementation in a video player, the system deletion may also include a reader 608 for use with a content material carrier such as 133819.doc • 19-200934218 Blu-ray disk or HD-DVD. Obtaining video basic stream, auxiliary information and audio data, wherein the auxiliary information has been encoded on the content data carrier in a format different from a basic audio stream. One advantage of this is that a standard interface can be used between the player and the television. The second system 650 converts the multiplexed data stream into a real video, auxiliary, and/or audio signal. 
It can be part of a 3D television set, or for example be implemented as a separate set-top box that receives an input from a disc player and provides outputs to a television set and a video system. The second system comprises an input 652 for receiving the multiplexed data stream. For example, it receives the stream from another device, from a broadcast transmission, or from a reader 660. The second system comprises a demultiplexer 654 for extracting the video elementary stream and the audio elementary stream from the multiplexed data stream. The second system 650 comprises a decoder 656 for extracting the auxiliary information from the audio elementary stream. The second system 650 comprises a removal mechanism 658 for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream without the auxiliary information. This second audio elementary stream is provided via an output. The auxiliary information may comprise a depth elementary stream containing depth and/or disparity values that supplement the video elementary stream. In one embodiment, the audio elementary stream comprises at least 7 audio channels, corresponding to at least 6 (typically 7) surround channels and at least one bass channel. For surround sound, up to six channels are used (i.e., five surround channels and one bass channel). The decoder 656 is configured to extract the auxiliary information from the remaining channels, i.e., from a number of the surround channels, where that number is at most equal to the number of surround channels minus five. In one embodiment, the input 652 comprises a standardized digital audio/video interface. In another embodiment, the input 652 comprises a reader 660 for reading a media carrier. A content data carrier can be inserted into the reader 660.
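The receiving side can be sketched as a mirror of the encoding side. A simple interleaved (stream_id, payload) layout is assumed here for illustration; the names follow the reference numerals of Figure 6.

```python
# Second system 650: demultiplexer 654 splits the streams; decoder 656 and
# removal mechanism 658 recover the auxiliary data and a clean audio stream.
VIDEO_ID, AUDIO_ID = 0xE0, 0xC0

def demultiplex(mux):
    """Demultiplexer 654: separate the video and audio elementary streams."""
    video = [p for sid, p in mux if sid == VIDEO_ID]
    audio = [p for sid, p in mux if sid == AUDIO_ID]
    return video, audio

def extract_aux(audio_es):
    """Decoder 656 + removal mechanism 658: pull out the auxiliary data and
    return a second audio elementary stream without it."""
    aux = [frame["aux"] for frame in audio_es]
    clean = [{"audio": frame["audio"]} for frame in audio_es]
    return aux, clean

mux = [(0xE0, "v0"), (0xC0, {"audio": "a0", "aux": "d0"}),
       (0xE0, "v1"), (0xC0, {"audio": "a1", "aux": "d1"})]
video, audio = demultiplex(mux)
aux, clean = extract_aux(audio)
assert video == ["v0", "v1"]
assert aux == ["d0", "d1"]
assert all("aux" not in f for f in clean)
```

The `clean` stream corresponds to the second audio elementary stream that the removal mechanism 658 provides on a separate output, so that downstream audio equipment never sees the depth payload.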
The content data carrier contains a multiplexed data stream comprising at least one video elementary stream and one audio elementary stream, where the audio elementary stream contains auxiliary information, and where the auxiliary information comprises information for enhancing the visual experience of a rendering of the video elementary stream. The auxiliary information can be stored in a first audio data stream and the audio information in a second audio data stream, together with a playlist containing an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream. An embodiment comprises means for determining whether auxiliary information has been included in the audio elementary stream and enabling the decoder 656 depending on the detection of the auxiliary information. According to a method for handling a multiplexed data stream 640, the following steps are performed: receiving the multiplexed data stream; extracting at least a video elementary stream and an audio elementary stream from the multiplexed data stream; and extracting auxiliary information from the audio elementary stream, where the auxiliary information comprises information for enhancing a visual experience. According to a method for providing a multiplexed data stream 640, the following steps are performed: encoding auxiliary information and audio data together into an audio elementary stream, where the auxiliary information comprises information for enhancing a visual experience; combining at least a video elementary stream and the audio elementary stream into a multiplexed data stream; and providing the multiplexed data stream. It should be understood that the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice.
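The playlist mechanism described earlier can be sketched as follows. The dictionary layout and file names are hypothetical; they only illustrate how a 3D title's playlist can instruct the player to feed both audio streams to the mixer while a 2D title uses the primary stream alone.

```python
# Two titles on the same disc: one plays the movie in 2D, the other in 3D.
playlist_3d = {"title": "Movie (3D)",
               "audio": {"primary": "audio_main.pcm",
                         "secondary": "audio_depth.pcm",   # carries depth data
                         "mix": True}}
playlist_2d = {"title": "Movie",
               "audio": {"primary": "audio_main.pcm", "mix": False}}

def select_audio(playlist):
    """Return the audio streams the player should decode for this title."""
    a = playlist["audio"]
    if a.get("mix"):
        return [a["primary"], a["secondary"]]   # both streams feed the mixer
    return [a["primary"]]

assert select_audio(playlist_3d) == ["audio_main.pcm", "audio_depth.pcm"]
assert select_audio(playlist_2d) == ["audio_main.pcm"]
```

Selecting the 3D title in the disc menu thus switches on the mixing without any change to the stored streams themselves.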
The program may be in the form of source code, object code, a code intermediate between source and object code such as a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. It should also be understood that such a program may have many different architectural designs. For example, a program code implementing the functionality of the method or system according to the invention may be subdivided into one or more subroutines. Many different ways of distributing the functionality among these subroutines will be apparent to the skilled person. The subroutines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer-executable instructions, for example processor instructions and/or interpreter instructions. Alternatively, one or more or all of the subroutines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time. The main program contains at least one call to at least one of the subroutines. The subroutines may also comprise function calls to each other. An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each of the processing steps of at least one of the methods set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically. Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each of the means of at least one of the systems and/or products set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically. The carrier of a computer program may be any entity or device capable of carrying the program.
For example, the carrier may include a storage medium, such as a ROM, for example a CD-ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Furthermore, the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such a cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted to perform, or to be used in the performance of, the relevant method. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Brief Description of the Drawings: Figure 1 shows an example image with a depth map; Figure 2 shows an example image with a depth map and occlusion information; Figure 3 illustrates an embodiment; Figure 4 illustrates a surround system; Figure 5 illustrates an embodiment; and Figure 6 illustrates an embodiment.
List of Reference Numerals:
302 secondary audio mixing metadata
304 BD-J panning control signal
306 BD-J gain control
308 BD-J panning control
310 BD-J gain control
312 mixer
314 mixer
316 output audio elementary stream
318 BD-J metadata API
321 primary stream
322 secondary stream
323 interactive audio stream
502 audio stream
504 audio stream
506 surround system
600 first system
602 encoder
604 multiplexer
606 output
608 reader
640 multiplexed data stream
650 second system
652 input
654 demultiplexer
656 decoder
658 removal mechanism
660 reader
B1 audio buffer
B2 audio buffer
B3 audio buffer
BD Blu-ray Disc player
C central speaker
D1 gain
D2 gain
L speaker
Lb speaker
Ls speaker
R speaker
Rb speaker
Rs speaker
X1 converter
X2 converter

Claims (1)

1. A system (650) for handling a multiplexed data stream, the system comprising: an input (652) for receiving the multiplexed data stream; a demultiplexer (654) for extracting at least a video elementary stream and an audio elementary stream from the multiplexed data stream; and a decoder (656) for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.
2. The system of claim 1, wherein the auxiliary information comprises depth information relating to at least one video frame encoded in the video elementary stream.
3. The system of claim 1, further comprising: a removal mechanism (658) for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream without auxiliary information; and an output for providing the second audio elementary stream.
4. The system of claim 2, wherein the depth information comprises at least part of a depth elementary stream containing depth and/or disparity values supplementing the video elementary stream.
5. The system of claim 1, wherein the audio elementary stream comprises at least 7 audio channels corresponding to at least 6 surround channels and at least one bass channel; and the decoder is configured to extract the auxiliary information from a number of the surround channels, the number being at most equal to the number of surround channels minus five.
6. The system of claim 1, wherein the input comprises a standardized digital audio/video interface.
7. The system of claim 1, wherein the input comprises a reader (660) for reading a media carrier to provide the multiplexed data stream.
8.
The system of claim 1, further comprising means for determining whether auxiliary information has been included in the audio elementary stream and activating the decoder depending on the detection of the auxiliary information.
9. A system (600) for providing a multiplexed data stream, the system comprising: an encoder (602) for encoding auxiliary information and audio data together into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; a multiplexer (604) for combining at least the video elementary stream and the audio elementary stream into a multiplexed data stream; and an output (606) for providing the multiplexed data stream.
10. The system of claim 9, further comprising a reader (608) for obtaining the video elementary stream, the auxiliary information and the audio data from a content data carrier, wherein the auxiliary information has been encoded on the content data carrier in a format different from a basic audio stream.
11. A content data carrier comprising a multiplexed data stream, the multiplexed data stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.
12. The content data carrier of claim 11, wherein the auxiliary information is stored in a first audio data stream and the audio information is stored in a second audio data stream, further comprising: a playlist containing an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream.
13.
A method for handling a multiplexed data stream (640), the method comprising: receiving the multiplexed data stream; extracting at least a video elementary stream and an audio elementary stream from the multiplexed data stream; and extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.
14. A method for providing a multiplexed data stream (640), the method comprising: encoding auxiliary information and audio data together into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; combining at least the video elementary stream and the audio elementary stream into a multiplexed data stream; and providing the multiplexed data stream.
15. A computer program product comprising instructions for causing a processor to perform the method of claim 13 or 14.
TW097132961A 2007-08-31 2008-08-28 Conveying auxiliary information in a multiplexed stream TW200934218A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP07115370 2007-08-31

Publications (1)

Publication Number Publication Date
TW200934218A true TW200934218A (en) 2009-08-01

Family

ID=40243588

Family Applications (1)

Application Number Title Priority Date Filing Date
TW097132961A TW200934218A (en) 2007-08-31 2008-08-28 Conveying auxiliary information in a multiplexed stream

Country Status (2)

Country Link
TW (1) TW200934218A (en)
WO (1) WO2009027923A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI420413B (en) * 2010-07-15 2013-12-21 Chunghwa Picture Tubes Ltd Depth map enhancing method and computer-readable medium therefor

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2371125A1 (en) * 2008-11-28 2011-10-05 Koninklijke Philips Electronics N.V. A display system, control unit, method, and computer program product for providing ambient light with 3d sensation
EP2775724A3 (en) * 2009-03-19 2014-10-15 LG Electronics, Inc. Method for processing three dimensional (3D) video signal and digital broadcast receiver for performing the processing method
CN102484729B (en) 2009-04-07 2016-08-24 Lg电子株式会社 Broadcasting transmitter, radio receiver and 3D video data handling procedure thereof
EP2334088A1 (en) * 2009-12-14 2011-06-15 Koninklijke Philips Electronics N.V. Generating a 3D video signal
US9426441B2 (en) 2010-03-08 2016-08-23 Dolby Laboratories Licensing Corporation Methods for carrying and transmitting 3D z-norm attributes in digital TV closed captioning
US9171549B2 (en) 2011-04-08 2015-10-27 Dolby Laboratories Licensing Corporation Automatic configuration of metadata for use in mixing audio programs from two encoded bitstreams
EP2697975A1 (en) 2011-04-15 2014-02-19 Dolby Laboratories Licensing Corporation Systems and methods for rendering 3d images independent of display size and viewing distance
JP6992511B2 (en) * 2016-01-13 2022-01-13 ソニーグループ株式会社 Information processing equipment and information processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU738692B2 (en) * 1997-12-05 2001-09-27 Dynamic Digital Depth Research Pty Ltd Improved image conversion and encoding techniques
EP1519582A4 (en) * 2002-06-28 2007-01-31 Sharp Kk Image data delivery system, image data transmitting device thereof, and image data receiving device thereof

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI420413B (en) * 2010-07-15 2013-12-21 Chunghwa Picture Tubes Ltd Depth map enhancing method and computer-readable medium therefor

Also Published As

Publication number Publication date
WO2009027923A1 (en) 2009-03-05

Similar Documents

Publication Publication Date Title
US10158841B2 (en) Method and device for overlaying 3D graphics over 3D video
TW200934218A (en) Conveying auxiliary information in a multiplexed stream
CN110675840B (en) Display device and display method
US9148646B2 (en) Apparatus and method for processing video content
JP5321694B2 (en) System and method for providing closed captioning to 3D images
TWI573425B (en) Generating a 3d video signal
EP2594079B1 (en) Auxiliary data in 3d video broadcast
US20120133748A1 (en) Signal processing method and apparatus therefor using screen size of display device
EP2717254A1 (en) Content processing apparatus for processing high resolution content and content processing method thereof
KR20110113186A (en) Method and system for transmitting over a video interface and for compositing 3d video and 3d overlays
TW201143362A (en) Method and system for pulldown processing for 3D video
RU2613729C2 (en) 3d interlaced video
EP2676446B1 (en) Apparatus and method for generating a disparity map in a receiving device
US20120098944A1 (en) 3-dimensional image display apparatus and image display method thereof
Zink Blu-Ray 3D™
JP2012182589A (en) Device and method for displaying 3d content as preview in 2d
JP2015039057A (en) Video signal processing device and video signal processing method