TW201722520A - System and method for delivering media over network - Google Patents

System and method for delivering media over network

Info

Publication number
TW201722520A
TW201722520A TW105117600A TW105117600A TW201722520A TW 201722520 A TW201722520 A TW 201722520A TW 105117600 A TW105117600 A TW 105117600A TW 105117600 A TW105117600 A TW 105117600A TW 201722520 A TW201722520 A TW 201722520A
Authority
TW
Taiwan
Prior art keywords
server
user device
models
model
frame
Prior art date
Application number
TW105117600A
Other languages
Chinese (zh)
Other versions
TWI637772B (en)
Inventor
郭榮昌
楊昇龍
鄧安倫
Original Assignee
優必達公司
Priority date
Filing date
Publication date
Priority claimed from US 14/976,239 (US9370718B2)
Application filed by 優必達公司
Publication of TW201722520A
Application granted
Publication of TWI637772B


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/61Network physical structure; Signal processing
    • H04N21/6106Network physical structure; Signal processing specially adapted to the downstream path of the transmission network
    • H04N21/6125Network physical structure; Signal processing specially adapted to the downstream path of the transmission network involving transmission via Internet
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A method for delivering media from a server to a client device over a network is disclosed. A Virtual-Reality (VR) scene application running on the server generates a virtual VR 3D environment containing 3D models. The server checks the status of each 3D model in a predetermined order; only those 3D models which are not pre-stored in the client device are rendered by the server into a left-eye frame and a right-eye frame of a 2D video stream. The server then sends the frames, together with the metadata of the 3D models which are pre-stored in the client device, to the client device over the network. The client device uses a combined VR frame built from these frames as a background for rendering the 3D models which are pre-stored in the client device, so as to generate a mixed VR frame of the output video stream.

Description

System and method for delivering media over a network

The present invention relates to a system and method for delivering media, such as images and sound, over a network, and more particularly to a method for rendering 3D objects of a Virtual-Reality (VR) image on a user device, in which the user device renders the 3D objects and combines them with a 2D video stream of the VR scene provided by a server.

Over the past few years, online gaming has become a worldwide trend. With the development of cloud-computing systems and technologies, techniques that use a server to stream game content as a service have also emerged.

In a traditional approach to providing cloud gaming services, the server performs nearly all of the computation. That is, to provide the service, the server must generate a virtual 3D environment containing multiple 3D objects that can be moved or controlled by participants. In the known art, these 3D objects may also have associated sound effects. In response to the control actions of a participant (player), the server combines the virtual 3D environment with the 3D objects and renders the result onto a 2D game screen with stereo sound. The server then transmits the rendered images and stereo audio to the player's device as a 2D video stream with sound; upon receipt, the player's device merely decodes and displays the stream, without performing any additional 3D rendering computation. However, performing rendering for many players on the same server overloads the server that carries out the 3D rendering computations. Moreover, because everything the player sees is transmitted as a lossily compressed 2D video stream, both the image and sound quality fall short of the quality of the original 3D objects, and the large network bandwidth required between the server and the player devices is also a major problem.

Virtual-Reality (VR) technology has recently become popular. To give the human eye a VR visual experience, a virtual VR scene must contain one image intended for the viewer's left eye and another image intended for the viewer's right eye. The present invention provides a system and method for delivering media such as images and sound over a network, in which the user device renders 3D objects and combines them with the 2D video stream of the VR scene provided by the server.

Accordingly, the main object of the present invention is to provide a system and method for delivering media such as images and sound over a network that reduces server load, improves the quality of the images and sound displayed on the user device, and saves communication bandwidth between the server and the user device. The method of the present invention is characterized in that the user device renders 3D objects (also called 3D models) and combines them with the 2D video stream of the VR scene provided by the server, thereby achieving the rendering of 3D objects of a Virtual-Reality (VR) image on the user device.

To achieve the above object, the present invention provides a system and method for delivering media, the media comprising a plurality of images, over a network. The system comprises a server and a user device, and the method comprises the following steps. Step (A): a Virtual-Reality (VR) application is executed on a server to generate a virtual VR 3D environment containing a plurality of 3D models, each 3D model being associated with a status indicating whether that 3D model is pre-stored in a user device. Step (B): the server checks the statuses of the 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream; the 3D models that are not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame. Step (C): the server transmits at least the left-eye frame and the right-eye frame of the 2D video stream to the user device over the network; the server also transmits, in a predetermined order, the 3D models not pre-stored in the user device. When the user device receives these 3D models, it stores them and sends a message to the server to change their statuses, indicating that those 3D models are now pre-stored in the user device. Step (D): the user device decodes the left-eye frame and the right-eye frame received from the server and uses them as a background for rendering the 3D models that are pre-stored in the user device but not contained in the left-eye frame and the right-eye frame, thereby generating a mixed VR frame of an output video stream containing the VR scene.
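The per-frame flow of steps (A) through (D) can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation; the function names (`server_prepare_frame`, `client_compose_frame`) and the data shapes are assumptions made for illustration.

```python
# Minimal sketch of steps (A)-(D). All names and data shapes are
# illustrative assumptions, not the patent's actual implementation.

def server_prepare_frame(models):
    """Steps (B)/(C): split the scene by per-model status.

    `models` is a list of dicts with keys 'name', 'status' and 'meta'.
    Models not pre-stored on the client ('Not Ready' or 'Loading') are
    rendered by the server into the left/right-eye frames; models the
    client already holds are sent as metadata only.
    """
    rendered = [m['name'] for m in models if m['status'] != 'Ready for Client']
    metadata = {m['name']: m['meta'] for m in models
                if m['status'] == 'Ready for Client'}
    # Stand-ins for the encoded left-eye and right-eye frames.
    left_eye = ('L', tuple(rendered))
    right_eye = ('R', tuple(rendered))
    return left_eye, right_eye, metadata

def client_compose_frame(left_eye, right_eye, metadata, local_models):
    """Step (D): decode the two frames, use them as background, then
    render on top the pre-stored models described by the metadata."""
    background = (left_eye, right_eye)
    overlay = [name for name in metadata if name in local_models]
    return {'background': background, 'overlay': overlay}
```

Per step (C), a model's status flips to "Ready for Client" once the client reports it stored, after which the server stops rendering it and sends only its metadata.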

In one embodiment, in step (D), after the user device decodes the left-eye frame and the right-eye frame received from the server, the user device further merges the left-eye frame and the right-eye frame into a combined VR frame, and then uses the combined VR frame as the background to render the 3D models that are pre-stored in the user device but not contained in the left-eye frame and the right-eye frame, thereby generating the mixed VR frame of the output video stream containing the VR scene.
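A frame combiner of the kind described here can be sketched as a side-by-side concatenation of the two eye frames. The side-by-side packing is an assumption for illustration; the patent does not fix a particular stereo layout.

```python
def combine_vr_frame(left_eye, right_eye):
    """Merge a left-eye and a right-eye frame (given as rows of pixel
    values) into one side-by-side VR frame, a common stereo packing.
    The side-by-side layout is an illustrative assumption."""
    if len(left_eye) != len(right_eye):
        raise ValueError('eye frames must have the same height')
    # Concatenate each left-eye row with the matching right-eye row.
    return [l_row + r_row for l_row, r_row in zip(left_eye, right_eye)]
```

The combined frame then serves as the single background picture over which the client renders its locally stored 3D models.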

In one embodiment, the server further comprises: a VR scene transmitter, which is a library compiled into the VR application or dynamically linked to it at run time, wherein the VR scene transmitter keeps a list containing all the 3D models and the status of each 3D model, the status indicating whether the 3D model is "Not Ready", "Loading", or "Ready for Client"; and a VR scene server, which is a server program executed on the server together with the VR application, wherein the VR scene server acts as a relay for messages between the VR scene transmitter and the user device, and also acts as a download server from which the user device downloads the necessary 3D models.

In one embodiment, the user device further comprises: a VR scene client, which is a program running on the user device for generating the output video stream and communicating with the server over the network; a frame combiner for merging the left-eye frame and the right-eye frame into the combined VR frame; and a VR scene cache for storing at least one 3D model previously downloaded from the server.

1‧‧‧Server

3‧‧‧Network (base station)

4‧‧‧Network

21‧‧‧User device (smartphone)

22‧‧‧User device (notebook computer)

23‧‧‧User device (desktop computer)

51‧‧‧User's eyes

52‧‧‧Projection plane

70‧‧‧Server

71, 71a‧‧‧Person

72, 72a‧‧‧House

73, 73a‧‧‧Frame

74‧‧‧User device

75‧‧‧Video stream frame

81, 81a‧‧‧Sound

82, 82a‧‧‧Sound

83, 83a, 1711, 1712, 1713‧‧‧Frames

85‧‧‧Video stream frame

100, 1100‧‧‧Application

110, 1110‧‧‧Scene transmitter (library)

120, 1120‧‧‧Scene server

121‧‧‧Data packer

170, 1170‧‧‧Scene client (program)

1111, 1171‧‧‧Frame combiner

190, 1190‧‧‧Scene cache

60~67, 661, 60a~67a, 661a‧‧‧Steps

101~114, 122, 124, 172, 174, 176, 192, 194, 1101, 1112~1115, 1122, 1124, 1172, 1174, 1176, 1192, 1194‧‧‧Paths

Fig. 1 is a schematic diagram of a standard embodiment of the system for delivering media over a network according to the present invention.

Fig. 2 is a schematic diagram of an embodiment of the system architecture of the present invention.

Fig. 3A is a flowchart of an embodiment of the method for delivering media over a network according to the present invention.

Fig. 3B is a flowchart of another embodiment of the method for delivering media over a network according to the present invention.

Figs. 4A, 4B and 4C are schematic diagrams of an embodiment showing how the method of the present invention delivers a video stream and 3D models.

Figs. 5A, 5B and 5C are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D models must be encoded into a frame.

Figs. 6A, 6B and 6C are schematic diagrams of an embodiment showing how the method of the present invention delivers a video stream with sound and 3D sounds.

Figs. 7A, 7B and 7C are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D sounds must be encoded into a video stream frame with sound.

Fig. 8 is a schematic diagram of an embodiment of the system architecture of the virtual-reality (VR) scene system of the present invention.

Fig. 9 is a schematic diagram of an embodiment illustrating the function of the frame combiner of the virtual-reality (VR) scene system of the present invention.

Fig. 10 is a schematic diagram of a second embodiment of the system architecture of the VR scene system of the present invention.

Fig. 11 is a schematic diagram of a third embodiment of the system architecture of the VR scene system of the present invention.

To describe more clearly the system and method for delivering media over a network proposed by the present invention, a detailed description is given below in conjunction with the drawings.

One application of the present invention is online gaming, in which a player uses a user device to play a game on a server over a network. The server acts on the player's commands and produces video for the user device. For example, when a player takes an action on the user device, that action is transmitted to the server, an image is computed there, and the image is sent back to the user device. In many online games, the 2D images produced by the server include 3D renderings of the other objects within the line of sight.

In the present invention, the server supplies the 3D models and 3D sounds required by the user device, so that the 3D rendering of objects within the line of sight is divided between the server and the user device. For example, the server provides some or all of the 3D models and 3D sounds to the user device, together with the metadata associated with each 3D model or 3D sound, such as position, orientation and status data.

For example, at the start of a game, all game-related images on the user device (including the related 3D renderings) are produced by the server and delivered over the network as a 2D video stream with stereo sound. The system of the present invention then pushes, over the network, the media within the line of sight, such as 3D models and 3D sounds, together with their rendering information, to the user device, with objects nearer to the eyes pushed first. The system renders 3D models and 3D sounds on the user device whenever possible, and falls back to rendering them (the 3D models or 3D sounds) on the server otherwise.

Once a 3D model or a 3D sound is stored on the user device, the server only needs to provide the metadata of that object (3D model or 3D sound) to the user device; the user device can then render those objects itself and composite the results on top of any stereo 2D video provided by the server. Unless the user device requests it, the server will no longer render that 3D model or 3D sound. This arrangement saves GPU computation on the server, and the server can maintain a dynamic database of the 3D models and 3D sounds to improve the efficiency of communication with users.
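Once an object is cached on the client, the per-frame traffic for it shrinks to a small metadata record of the kind described above. A hedged sketch of such a record (the field names and JSON encoding are assumptions, not the patent's wire format):

```python
import json

def metadata_update(obj_name, position, orientation, state):
    """Build the per-frame metadata message for an object the client
    already stores; no geometry or audio data is resent.
    Field names and JSON encoding are illustrative assumptions."""
    return json.dumps({
        'name': obj_name,
        'position': position,        # e.g. [x, y, z]
        'orientation': orientation,  # e.g. Euler angles
        'state': state,              # application-specific status data
    })
```

A message like this is typically a few dozen bytes, versus kilobytes to megabytes for the mesh or audio asset itself, which is where the bandwidth saving comes from.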

In the present invention, what the user device displays is a combination of: (a) a 3D scene rendered on the server, delivered to the client in the form of a 2D video stream with stereo sound and played by the user device; and (b) the result of the user device itself rendering the 3D models and 3D sounds downloaded from the server and stored on the user device. This mix of the stereo 2D video stream with the 3D models and 3D sounds rendered on the user device creates a rich 3D scene and immersive surround sound while reducing bandwidth usage.

In one embodiment, the stereo 2D video stream transmitted to the user device carries the metadata of the 3D models and 3D sounds. The user device checks whether it already has each 3D model and 3D sound; if not, it downloads the required 3D model or 3D sound from the server, stores it, and builds an inventory for later scene reconstruction. In this way, the problems of video-stream latency and heavy bandwidth demand are alleviated, and because the user device renders the objects itself, better image quality is obtained (since no video compression is involved).

The aforementioned metadata allows the user device to correctly mix its own rendering of 3D models and 3D sounds with the stereo 2D video stream provided by the server, without omitting or duplicating any 3D model or 3D sound. As described above, once the user device has stored all the required 3D models and 3D sounds, it can reconstruct the complete 3D scene and sound; from then on, the server no longer needs to perform any rendering until a newly introduced 3D model or 3D sound appears that the user device has not stored. When a new 3D model is encountered, the server renders that new 3D model and all objects behind it, until the user device can render the new 3D model itself. Likewise, if a new 3D sound is encountered, the server renders that 3D sound until it becomes usable on the user device.

The user device stores (caches) downloaded 3D models and 3D sounds on its own storage whenever possible, thereby avoiding repeated downloads in later sessions; the network bandwidth cost is thus further reduced. If they cannot be stored, downloading and rendering are performed at run time.
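A scene cache of the kind just described, including the fallback when persistent storage is unavailable, might look like the following sketch. The class and method names are assumptions; this is an in-memory stand-in for the patent's 3D scene cache (190/1190), not its implementation.

```python
class SceneCache:
    """Client-side cache for downloaded 3D models and 3D sounds.

    When persistent storage is available, assets survive across
    sessions; otherwise they are lost and must be re-downloaded at
    run time. Purely illustrative sketch."""

    def __init__(self, persistent=True):
        self.persistent = persistent
        self._store = {}

    def has(self, name):
        return name in self._store

    def put(self, name, asset):
        self._store[name] = asset

    def end_session(self):
        # Without persistent storage, cached assets are discarded and
        # will be downloaded and rendered again at the next execution.
        if not self.persistent:
            self._store.clear()
```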

As shown in Fig. 1, which is a schematic diagram of a standard embodiment of the system for delivering media over a network according to the present invention, the server 1 executes an application that provides a service, which may be, but is not limited to, a cloud online gaming service. A plurality of user devices 21, 22, 23 can connect (log in) to the server 1 through a network 4 to use the service provided by the application running on the server 1. In this embodiment, the network 4 is the Internet, and the user devices 21, 22, 23 are any electronic devices that can connect to the Internet, such as (but not limited to) a smartphone 21, a tablet, a notebook computer 22, a desktop computer 23, a video game console, or a smart TV. Some user devices 21, 22 connect to the network 4 wirelessly through a mobile base station, while other user devices connect to the network 4 by wire through a router. The application running on the server 1 generates a virtual 3D environment containing a plurality of 3D models and 3D sounds, each 3D model or 3D sound being associated with a status indicating whether it is pre-stored in a user device 21, 22, 23. In a preferred embodiment of the present invention, each user device has its own corresponding independent application; that is, one application serves only one user device, but multiple applications can run on the same server simultaneously to serve many user devices. As shown, the user devices 21, 22, 23 connect to the server 1 through the network 4 to obtain the media generated by the application, which contains at least some of the 3D models and 3D sounds. This system architecture and its features are detailed in Fig. 2 and the related description.

Fig. 2 is a schematic diagram of an embodiment of the system architecture of the present invention.

In the present invention, the application 100 runs on a server 1 to produce rendered 3D images and 3D sounds, and is typically a 3D game. The 3D scene transmitter 110 is a library that is statically linked to the application 100 at compile time, or dynamically linked to it while the application 100 is running. The 3D scene client (program) 170 is a program executed on the user devices 21, 22, 23 for generating and outputting the 3D image and 3D sound rendering results produced by the application 100. In this embodiment, each user device 21, 22, 23 has its own independent application 100 and scene transmitter 110.

In the present invention, the 3D scene client 170 and the 3D scene cache 190 constitute the client-side program and execution method that exploit the user device's own computing power to render 3D models and 3D sounds.

The 3D scene server 120 is a server program executed on the server 1 together with the application 100. It acts as a relay for messages between the 3D scene transmitter 110 on the server 1 and the 3D scene clients 170 on the user devices 21, 22, 23. It also acts as a file download server from which the clients 170 on the user devices 21, 22, 23 download the necessary 3D models and 3D sounds from the server 1. The 3D scene transmitter 110 keeps a list of all 3D models and 3D sounds, together with the status of each model or sound; this status indicates whether each 3D model or 3D sound is (1) "Not Ready", (2) "Loading", or (3) "Ready for Client".
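The three statuses kept by the 3D scene transmitter form a simple state machine, "Not Ready" to "Loading" to "Ready for Client". The sketch below assumes the obvious transition triggers (download started, client reported the asset stored); the patent text implies but does not spell out these exact hooks.

```python
from enum import Enum

class ModelStatus(Enum):
    NOT_READY = 'Not Ready'
    LOADING = 'Loading'
    READY_FOR_CLIENT = 'Ready for Client'

class StatusList:
    """The transmitter's list of every 3D model / 3D sound and its
    status. Illustrative sketch; method names are assumptions."""

    def __init__(self, names):
        self.status = {n: ModelStatus.NOT_READY for n in names}

    def start_download(self, name):
        # The client began fetching this asset from the scene server.
        self.status[name] = ModelStatus.LOADING

    def client_stored(self, name):
        # The client's message reports the asset is now stored locally.
        self.status[name] = ModelStatus.READY_FOR_CLIENT
```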

The main program of the application 100 sends the 3D scene information to the 3D scene transmitter 110 by calling the library through its API (path 101 in Fig. 2). This 3D scene information includes names, positions, velocities, attributes, orientations, and all other data required for rendering the 3D models and 3D sounds. After the 3D scene transmitter 110 receives such data, it performs the following procedure.

Step (a): For 3D models, sort all 3D models that need to be rendered; the sorting may be by distance from a virtual position (such as the 3D projection plane or the user's eyes), from near to far.

For 3D sounds, sort all 3D sounds that need to be rendered; the sorting may likewise be by distance from a virtual position (such as the 3D projection plane or the user's eyes), from near to far.

In some cases, one 3D model A in the 3D scene encloses or overlaps another 3D model B; for example, model A may be a house and model B a table inside the house. In such a situation, which model is closer to the virtual position is ambiguous, so model A and model B are treated as a single 3D model, which may be called 3D model (A+B).

Some known information about the scene can be used to assist the sorting. For example, the ground in a game can be regarded as one large, flat 3D model lying under all other 3D objects. Since the user's eyes are usually above the ground, the ground's 3D model needs special handling in the sorting to prevent it from being placed in front of other 3D models.
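Step (a), including the two special cases just described (grouping enclosing models, and forcing the ground to sort last), can be sketched as follows. The distance metric, field names and flags are assumptions for illustration; the patent fixes no particular data layout.

```python
import math

def sort_for_rendering(models, eye=(0.0, 0.0, 0.0)):
    """Sort models near-to-far from the eye. Each model is a dict with
    'name', 'pos' and optional flags: 'group' (models that enclose or
    overlap one another share a group id and sort as one unit, like
    the house+table example) and 'is_ground' (always pushed to the far
    end). Illustrative sketch only."""
    def distance(m):
        return math.dist(eye, m['pos'])

    # Bucket models by group id; ungrouped models form their own bucket.
    grouped = {}
    for m in models:
        grouped.setdefault(m.get('group', m['name']), []).append(m)

    def group_key(item):
        _, members = item
        if any(m.get('is_ground') for m in members):
            return (1, 0.0)  # ground sorts after everything else
        return (0, min(distance(m) for m in members))

    ordered = []
    for _, members in sorted(grouped.items(), key=group_key):
        ordered.extend(members)
    return ordered
```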

Step (b): For 3D models, search from the nearest point (closest to the eyes) for the first 3D model "M" that does not have the "Ready for Client" status; in other words, the first 3D model "M" whose status is "Not Ready" (hereafter, the "Not Ready" status is abbreviated as the NR status). Of course, no such 3D model may exist (for example, when every 3D model to be displayed is marked "Ready for Client").

For 3D sounds, search from the nearest point (closest to the eyes) for the first 3D sound "S" that does not have the "Ready for Client" status; in other words, the first 3D sound "S" whose status is "Not Ready" (hereafter abbreviated as the NR status). Of course, no such 3D sound may exist (for example, when every 3D sound to be played is marked "Ready for Client").
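Step (b) then reduces to a linear scan of the sorted list, the same scan for models and for sounds. A sketch, with statuses kept as plain strings for brevity:

```python
def first_not_ready(sorted_items):
    """Return the first item (nearest the eyes) whose status is not
    'Ready for Client', or None when every item is ready and the
    server has nothing left to render. Each item is a (name, status)
    pair; an illustrative sketch."""
    for name, status in sorted_items:
        if status != 'Ready for Client':
            return name
    return None
```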

Step (c): For 3D models, the server renders the 3D model M and every 3D model behind it, i.e., all 3D models farther from the eye than M (if there is no 3D model M, a black screen is rendered). The rendered result is then encoded as one frame of a 2D video stream.

For 3D sounds, the server 1 renders (plays) every 3D sound that does not have the "ready for client" status (if there is no such 3D sound, silence is produced), and encodes the result as the stereo audio accompanying the 2D video stream frame of step (c). Note that, unlike the 3D models in step (c), a 3D sound behind the 3D sound S is rendered only when its own status is not "Ready for Client".
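The asymmetry between models and sounds in steps (b)–(c) can be made concrete with a small sketch: from model M onward every model is rendered server-side regardless of its own status, whereas each sound is rendered only when it individually lacks the ready status. All names and list layouts below are assumptions for illustration.

```python
# Sketch of the server-side render decision in steps (b)-(c). Models from M
# onward are rendered regardless of status; sounds are filtered individually.
def server_render_sets(models, sounds):
    """models/sounds are near-to-far lists of (name, status) tuples."""
    idx = next((i for i, (_, s) in enumerate(models)
                if s != "Ready for Client"), None)
    render_models = [] if idx is None else [n for n, _ in models[idx:]]
    render_sounds = [n for n, s in sounds if s != "Ready for Client"]
    return render_models, render_sounds

models = [("person", "Ready for Client"), ("house", "Not Ready"),
          ("tree", "Ready for Client")]
sounds = [("footsteps", "Not Ready"), ("music", "Ready for Client")]
rm, rs = server_render_sets(models, sounds)
# rm == ["house", "tree"]: "tree" is behind M, so it is rendered anyway
# rs == ["footsteps"]:     only sounds that are individually not ready
```

This reflects the note above: images occlude what is behind them, so everything behind M must be in the server's frame, while sounds simply mix.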

Step (d): The following six pieces of information are transmitted to the 3D scene server 120 (path 112): [Info 112-A], [Info 112-B], [Info 112-C], [Info 112-D], [Info 112-E], and [Info 112-F]. The 3D scene server 120 then forwards this information to the 3D scene client 170 (path 122).

[Info 112-A] is the status information (or metadata) of all 3D models in front of the 3D model M; note that no such model may exist. These models all have the "Ready for Client" status, meaning they have already been preloaded onto the user device, so the 3D scene client (program) 170 on the user devices 21, 22, 23 can render them by itself. To reduce the transmission bandwidth, the 3D scene transmitter 110 need not send the complete status information; it is sufficient to send only the differences between the current rendering and the previous one.

[Info 112-B]: If the server finds the 3D model M and its status on the user device is "Not Ready", the server changes that status to "Loading" and sends a download instruction asking the client to download the 3D model M. If the status is already "Loading", no instruction is sent, because the download instruction has already been sent.

[Info 112-C] is the encoded video stream frame of step (c).

[Info 112-D] is the status information (or metadata) of all 3D sounds having the "ready for client" status (again, no such sound may exist). These sounds have already been preloaded onto the user device, so the 3D scene client (program) 170 on the user devices 21, 22, 23 can render (play) them by itself. As with [Info 112-A], to reduce the transmission bandwidth the 3D scene transmitter 110 need only send the differences between the current rendering and the previous one.

[Info 112-E]: If the server finds the 3D sound S and its status on the user device is "Not Ready", the server changes that status to "Loading" and sends a download instruction asking the client to download the 3D sound S. If the status is already "Loading", no instruction is sent, because the download instruction has already been sent.

[Info 112-F] is the encoded stereo audio of step (c).
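The six payloads of step (d) can be grouped into one update message per rendering pass. The dictionary keys and parameter names below are illustrative assumptions; the patent does not prescribe a wire format.

```python
# Hypothetical assembly of the six payloads [Info 112-A]..[Info 112-F]
# described above; keys and structure are assumptions, not from the patent.
def build_update(meta_ready_models, download_model, frame_bytes,
                 meta_ready_sounds, download_sound, stereo_bytes):
    return {
        "112-A": meta_ready_models,   # metadata for models in front of M
        "112-B": download_model,      # download instruction for model M, or None
        "112-C": frame_bytes,         # encoded 2D video stream frame
        "112-D": meta_ready_sounds,   # metadata for preloaded, ready sounds
        "112-E": download_sound,      # download instruction for sound S, or None
        "112-F": stereo_bytes,        # encoded stereo audio
    }

update = build_update([{"name": "person"}], "house", b"\x00frame",
                      [{"name": "music"}], None, b"\x00audio")
```

A `None` in slot 112-B or 112-E models the case where no download instruction is sent (no NR asset found, or its download is already in progress).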

Steps (a)–(d) are repeated each time the main program of the application 100 updates new 3D scene data to the 3D scene transmitter 110; normally, the main program of the application 100 updates such data on every rendering pass.

Once the 3D scene client 170 receives the above data, it performs the rendering procedure described below.

Step (i): Decode the video frame of [Info 112-C] and use it as the background for the subsequent 3D model rendering; likewise, decode the stereo audio of [Info 112-F] and use it as the background sound for the subsequent 3D sound rendering.

Step (ii): Render all 3D models of [Info 112-A] on top of the video frame decoded in step (i). To reduce network bandwidth, the 3D scene client 170 stores this [Info 112-A] information in memory, so that next time the 3D scene transmitter 110 can send only the [Info 112-A] differences between the next rendering and the current one, rather than the complete status information. Similarly, all 3D sounds of [Info 112-D] are rendered and mixed with the stereo audio decoded in step (i); to reduce network bandwidth, the 3D scene client 170 also stores the [Info 112-D] information in memory, so that next time only the [Info 112-D] differences need to be transmitted.

Step (iii): The stereo video frame received from the server is combined in step (ii) with the 3D models and 3D sounds rendered locally by the 3D scene client 170, and the mixed result is output as a video stream with audio (path 176).
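Client steps (i)–(iii) amount to layered compositing: the decoded server frame is the bottom layer, locally rendered models are drawn on top, and local sounds are simply summed with the decoded stereo. The toy sketch below uses lists as stand-in layers; all structures are assumptions for illustration.

```python
# Toy sketch of client steps (i)-(iii): the decoded server frame is the
# background, locally rendered models are composited on top, and local
# sounds are mixed with the decoded stereo. All structures are illustrative.
def composite(server_frame, local_models, server_stereo, local_sounds):
    video_out = [server_frame] + local_models   # background first, then overlays
    audio_out = [server_stereo] + local_sounds  # audio mixes; no occlusion
    return video_out, audio_out

video, audio = composite("frame#42", ["person"], "stereo#42", ["footsteps"])
```

Ordering matters only for the video list (the background must be drawn first); for audio the order of the mix is irrelevant, consistent with the non-occluding nature of sound noted later in the text.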

If the status in [Info 112-B] is provided, the 3D scene client 170 processes the 3D model M by the following procedure.

Step (I): Search the 3D scene cache 190 (path 174); the 3D scene cache 190 contains the database of 3D models previously downloaded and stored on the user devices 21, 22, 23.

Step (II): If the 3D model M is already in the 3D scene cache 190, go to step (V).

Step (III): If the 3D model M is not in the 3D scene cache 190, the 3D scene client 170 sends a download request to the 3D scene server 120 (path 172), and the 3D scene server 120 returns the data of the 3D model M to the 3D scene client 170 (path 124).

Step (IV): Once the 3D model is completely downloaded, the 3D scene client 170 stores it in the 3D scene cache 190 (path 194), so that the next time it is needed it does not have to be downloaded again.

Step (V): The 3D scene client 170 retrieves the 3D model M from the 3D scene cache 190 (path 192).

Step (VI): Once the download is complete (or the model was downloaded earlier), the 3D scene client 170 can retrieve the 3D model M. The 3D scene client 170 then sends a "3D model is ready on client" message to the 3D scene server 120 (path 113), and the 3D scene server 120 forwards this message to the 3D scene transmitter 110 (path 114).

Step (VII): Upon receiving this message, the 3D scene transmitter 110 changes the status of the 3D model M from "Loading" to "Ready for Client".

Step (VIII): In the next rendering pass, the 3D scene transmitter 110 knows that the 3D model M is now preloaded on the user device and asks the 3D scene client 170 to render it by itself; the server 1 therefore no longer needs to render the 3D model M.
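Steps (I)–(VIII) describe a three-state handshake per asset: "Not Ready" → "Loading" → "Ready for Client". The server sends each download instruction at most once, and promotes the asset only after the client's ready message. The class below is a minimal sketch; the three status strings come from the text, everything else is an assumption.

```python
# Minimal sketch of the per-asset status handshake of steps (I)-(VIII).
# The three states come from the text; the tracker class is an assumption.
class AssetStatus:
    NOT_READY = "Not Ready"
    LOADING = "Loading"
    READY = "Ready for Client"

class ServerAssetTracker:
    def __init__(self, names):
        self.status = {n: AssetStatus.NOT_READY for n in names}

    def request_download(self, name):
        """Send a download instruction at most once (see [Info 112-B]/[112-E])."""
        if self.status[name] == AssetStatus.NOT_READY:
            self.status[name] = AssetStatus.LOADING
            return True        # instruction sent
        return False           # already Loading or Ready: nothing to send

    def on_client_ready(self, name):
        """Client reported '3D model is ready on client' (steps (VI)-(VII))."""
        self.status[name] = AssetStatus.READY

tracker = ServerAssetTracker(["house"])
sent_first = tracker.request_download("house")   # instruction sent
sent_again = tracker.request_download("house")   # suppressed: already Loading
tracker.on_client_ready("house")                 # now "Ready for Client"
```

The suppressed second request mirrors the rule in [Info 112-B] and [Info 112-E] that no instruction is resent while the user status is already "Loading".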

If the status in [Info 112-E] is provided, the 3D scene client 170 prepares the 3D sound S by the following procedure (similar to the one described above for [Info 112-B]).

Step (I): Search the 3D scene cache 190 (path 174); the 3D scene cache 190 contains the database of 3D sounds previously downloaded and stored on the user devices 21, 22, 23.

Step (II): If the 3D sound is already in the 3D scene cache 190, go to step (V).

Step (III): If the 3D sound is not in the 3D scene cache 190, the 3D scene client 170 sends a download request to the 3D scene server 120 (path 172), and the 3D scene server 120 returns the data of the 3D sound to the 3D scene client 170 (path 124).

Step (IV): Once the 3D sound is completely downloaded, the 3D scene client 170 stores it in the 3D scene cache 190 (path 194), so that the next time it is needed it does not have to be downloaded again.

Step (V): The 3D scene client 170 retrieves the 3D sound S from the 3D scene cache 190 (path 192).

Step (VI): Once the download is complete (or the sound was downloaded earlier), the 3D scene client 170 can retrieve the 3D sound S. The 3D scene client 170 then sends a "3D sound is ready on client" message to the 3D scene server 120 (path 113), and the 3D scene server 120 forwards this message to the 3D scene transmitter 110 (path 114).

Step (VII): Upon receiving this message, the 3D scene transmitter 110 changes the status of the 3D sound S from "Loading" to "Ready for Client".

Step (VIII): In the next rendering pass, the 3D scene transmitter 110 knows that the 3D sound S is now preloaded on the user device and asks the 3D scene client 170 to render (play) it by itself; the server 1 therefore no longer needs to render the 3D sound S.

At the very beginning, the user devices 21, 22, 23 hold no 3D models or 3D sounds, so the 3D scene transmitter 110 renders all 3D models and 3D sounds and encodes the result as a 2D video stream with stereo audio. The 3D scene transmitter 110 issues the download instructions for 3D models ([Info 112-B]) and for 3D sounds ([Info 112-E]) in order, from the point nearest the 3D projection plane (or the user's eyes) outward, and the 3D scene client 170 downloads each 3D model or 3D sound from the 3D scene server 120, or retrieves it from the 3D scene cache 190, one by one. As more 3D models and 3D sounds become available to the 3D scene client 170, the 3D scene transmitter 110 automatically notifies the 3D scene client 170 to render these models and sounds itself and reduces the number of 3D models and 3D sounds that the 3D scene transmitter 110 renders. The encoded 2D video stream thus contains fewer and fewer 3D models and 3D sounds, until eventually all of them are available on the 3D scene client 170. At that stage only a silent black screen remains; in other words, the server 1 no longer needs to transmit the 2D video stream to the user devices 21, 22, 23, and the communication bandwidth occupied between the server 1 and the user devices 21, 22, 23 is greatly reduced.

In the present invention, whenever a new 3D model N appears in the scene, the 3D scene transmitter 110 (1) notifies the 3D scene client 170 to render only the 3D models located in front of the new 3D model N (relative to the user's eyes), (2) notifies the 3D scene client 170 to download the new 3D model N, and (3) renders the new 3D model N together with all the models behind it, encodes the result as a 2D video stream with audio, and transmits that stream to the 3D scene client 170. The 3D scene client 170 can therefore keep reproducing the rendered 3D images and sounds of the application 100 while the 3D model N is being prepared on the user device.

When a new 3D sound T appears in the scene, the 3D scene transmitter 110 (1) notifies the 3D scene client 170 to download the new 3D sound T, and (2) renders the new 3D sound T, encodes the result as stereo audio, and transmits this audio together with the 2D video stream to the 3D scene client 170; the 3D scene client 170 can therefore keep reproducing the rendered 3D images and sounds of the application 100 while the 3D sound T is being prepared on the user device. In this procedure only the new 3D sound T is rendered; the 3D scene transmitter 110 does not need to render the 3D sounds behind T. This is because sound differs from imagery in nature: an image blocks the display of the images behind it, but a sound does not.

Background music can be treated as a 3D sound with a predetermined 3D position; to allow the background music to be downloaded as early as possible, the predetermined 3D position should be defined as close to the user's eyes as possible.

To reduce the server load, or to avoid noise caused by unstable network data delivery, the server may skip encoding any 3D sound into the video. In that case a 3D sound is played only on the user device, after it has been downloaded and stored there.

For 3D sounds, the server 1 checks the status of each 3D sound to decide which ones must be encoded into the 2D video stream with stereo audio: the 3D sounds not yet stored on the user device are encoded into the video frame. When a 3D sound is encoded as stereo in the video frame, the volumes of its left and right channels are determined by its position and its velocity relative to the user's ears; background music can be defined as a 3D sound at a predetermined position.
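The channel-volume rule above can be illustrated with a simple panning function. The patent states only that the left/right volumes depend on the sound's position and velocity relative to the ears; the constant-power law and the lateral-coordinate convention below are assumptions for illustration.

```python
# Illustrative stereo panning for a 3D sound encoded into the stream:
# channel volumes derived from the source's lateral position relative to
# the listener. The constant-power law is an assumption; the patent only
# says the volumes depend on position and velocity.
import math

def pan_volumes(x, max_x=10.0):
    """x < 0 is to the listener's left, x > 0 to the right."""
    t = (max(-max_x, min(max_x, x)) / max_x + 1.0) / 2.0  # 0..1, left..right
    left = math.cos(t * math.pi / 2.0)
    right = math.sin(t * math.pi / 2.0)
    return left, right

l, r = pan_volumes(0.0)   # a centered source drives both channels equally
```

A velocity-dependent term (e.g. Doppler-style gain) could be layered on top of this position-only sketch.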

FIG. 3A is a flowchart of an embodiment of the method of the present invention for delivering media over a network. When image delivery over the network begins (step 60), an application is executed on a server to generate a virtual 3D environment containing a plurality of 3D models (step 61); each 3D model is associated with a status indicating whether that 3D model is already stored on the user device.

The server then checks the statuses of the 3D models (step 62) to decide which ones must be encoded into a 2D video stream frame: the 3D models not yet stored on the user device are encoded into the frame. Taking a virtual position (usually the 3D projection plane or the user's eyes) as the reference, the server checks the status of each 3D model one by one, from near to far. When the first 3D model not yet stored on the user device is found, it is marked with the NR status; then that 3D model M and all the 3D models behind it are encoded into the frame, regardless of whether those later models are stored on the user device (step 63). Whenever the position of any 3D model changes, or the virtual position used as the sorting reference changes, the check is performed again, and the latest result determines whether a 3D model must be encoded into the video frame.

Step 64: After the 2D video stream frame is encoded, the server transmits it, together with the 3D models not yet stored on the user device (i.e., the 3D model with the NR status and all 3D models behind it), to the user device in a predetermined order, namely from the point nearest the 3D projection plane (or the user's eyes) to the farthest point. Once the user device receives the 2D video stream frame (step 65), it decodes the frame and uses it as the background for rendering the 3D models that are stored on the user device but not included in the frame, thereby producing a mixed frame of the output video stream with audio (step 66). When the user device receives a 3D model from the server, it stores the model and then sends a message notifying the server to change the model's status to "now stored on the user device"; thereafter the user device mixes the video stream from the server with its own rendering result into the new video output.

In step 62, when a new 3D model appears in the 3D environment, the new 3D model and all 3D models behind it are encoded into the frame, regardless of whether the models behind it are stored on the user device.

In step 64, the server also transmits to the user device the status information (or metadata) of the 3D models not encoded into the video stream frame. On receiving and checking the status information, the user device proceeds as follows: if any 3D model in the received status information is not yet stored on the user device, the user device sends a request to the server to download that 3D model (step 661). The status information includes one metadata entry for each model not encoded into the frame; each metadata entry includes the 3D model's name, position, velocity, orientation, and attribute, together with the status of each 3D model.
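One metadata entry from step 64 can be sketched as a small record type. The field names mirror the list in the text (name, position, velocity, orientation, attribute, status); the concrete types are assumptions.

```python
# Sketch of one metadata ("interpretation data") record from step 64; field
# names mirror the list in the text, concrete types are assumptions.
from dataclasses import dataclass

@dataclass
class ModelMetadata:
    name: str
    position: tuple      # (x, y, z) in the virtual 3D environment
    velocity: tuple
    orientation: tuple
    attribute: str
    status: str          # e.g. "Ready for Client", "Loading", "Not Ready"

meta = ModelMetadata("house", (0.0, 0.0, 9.0), (0.0, 0.0, 0.0),
                     (0.0, 1.0, 0.0), "static", "Not Ready")
# Client-side check of step 661: request a download for any model whose
# metadata shows it is not yet stored on the user device.
needs_download = meta.status != "Ready for Client"
```

The same record shape applies to the 3D-sound metadata of step 64a, which lists identical fields.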

FIG. 3B is a flowchart of another embodiment of the method of the present invention for delivering media over a network. When sound delivery over the network begins (step 60a), an application is executed on a server to generate a virtual 3D environment containing a plurality of 3D sounds (step 61a); each 3D sound is associated with a status indicating whether that 3D sound is already stored on the user device.

The server then checks the statuses of the 3D sounds (step 62a) to decide which ones must be encoded into the 2D video stream frame: the 3D sounds not yet stored on the user device are encoded into the frame. Taking a virtual position (usually the 3D projection plane or the user's eyes) as the reference, the server checks the status of each 3D sound one by one, from near to far; when the first 3D sound not yet stored on the user device is found, it is marked with the NR status.

Step 64a: After the video stream frame containing sound is encoded, the server transmits the 2D video stream frame with audio, together with the 3D sounds not yet stored on the user device (i.e., the 3D sounds with the NR status), to the user device in a predetermined order, namely from the point nearest the 3D projection plane (or the user's eyes) to the farthest point. Once the user device receives the video stream frame containing sound (step 65a), it decodes the audio contained in the video stream and uses it as the background for rendering the 3D sounds that are stored on the user device but not included in the video stream frame, thereby producing mixed audio (step 66a). When the user device receives a 3D sound from the server, it stores the sound and then sends a message notifying the server to change the sound's status to "now stored on the user device"; thereafter the user device mixes the audio of the video stream from the server with the result of rendering (playing) the 3D sounds itself into the new audio output.

In step 62a, when a new 3D sound appears in the 3D environment, the new 3D sound is encoded into the 2D video stream frame with audio; however, the new 3D sound does not affect whether the other 3D sounds are rendered, which differs from the handling of 3D models in step 62.

In step 64a, the server also transmits to the user device the status information of the 3D sounds not encoded into the frame. On receiving and checking the status information, the user device proceeds as follows: if any 3D sound in the received status information is not yet stored on the user device, the user device sends a request to the server to download that 3D sound (step 661a). The status information includes one metadata entry for each sound not encoded into the frame; each metadata entry includes the 3D sound's name, position, velocity, orientation, and attribute, together with the status of each 3D sound.

FIGS. 4A, 4B and 4C are schematic diagrams showing an embodiment of how the method of the present invention delivers a video stream and 3D models.

As shown in FIG. 4A, when the user device 74 first logs in to the application 70 running on the server, no 3D model is stored on the user device yet. The server therefore renders all the 3D models that should appear on the user device's screen (including a person 71 and a house 72 behind it), encodes the result as a 2D video stream frame 73, and transmits the frame 73 to the user device 74. At this stage the frame 73 contains both the person 71 and the house 72, and the user device 74 simply outputs the frame 73 without rendering anything else.

Next, as shown in FIG. 4B, the server 70 starts transmitting the 3D models to the user device, beginning with the model nearest the 3D projection plane of the user device's screen. In this embodiment the person 71 is closer to the 3D projection plane (or the user's eyes) than the house 72, so the 3D model of the person 71 is transmitted first. Once the 3D model of the person 71 has been transmitted and stored on the user device 74, the user device 74 sends a message informing the server 70 that the 3D model of the person 71 is now stored on the user device 74. Thereafter the server 70 renders only the house 72, encodes the result as a 2D video stream frame 73a, and transmits the frame 73a together with the metadata of the person 71a to the user device 74; the user device 74 then automatically renders the person from the metadata and combines that rendering with the frame 73a (containing the house) to obtain the same output. This procedure (the server transmitting the 3D models to the user device 74 one at a time) repeats until all the 3D models the client needs to display have been transmitted and stored on the user device 74.

As shown in FIG. 4C, once the user device 74 holds all the 3D models (including those of the person and the house), the server no longer performs any rendering and no longer transmits a video stream frame (element 75); it only needs to transmit the metadata of the 3D models (including the person 71a and the house 72a) to the user device 74. The user device can then render all the 3D models itself to obtain the same output.

FIGS. 6A, 6B and 6C are schematic diagrams showing an embodiment of how the method of the present invention delivers a video stream with audio and 3D sounds.

As shown in FIG. 6A, when the user device 74 first logs in to the application 70 running on the server, no 3D sound is stored on the user device yet. The server therefore renders all the 3D sounds that should be presented through the user device's speaker (including a sound 81 and a sound 82 behind it), encodes the result as a video stream frame 83 with audio, and transmits this frame 83 to the user device 74. At this stage the frame 83 with audio contains both the sound 81 and the sound 82, and the user device 74 simply outputs the frame 83 without rendering (playing) any other sound.

Next, as shown in FIG. 6B, the server 70 starts transmitting the 3D sounds to the user device, beginning with the sound nearest the 3D projection plane of the user device's screen. In this embodiment the sound 81 is closer to the 3D projection plane (or the user's eyes) than the sound 82, so the sound 81 is transmitted first. Once the 3D sound 81 has been transmitted and stored on the user device 74, the user device 74 sends a message informing the server 70 that the sound 81 is now stored on the user device 74. Thereafter the server 70 renders only the sound 82, encodes the result as a 2D video stream frame 83a with audio, and transmits the frame 83a together with the metadata of the sound 81 to the user device 74; the user device 74 then automatically renders (plays) the sound from the metadata and combines that rendering with the frame 83a (containing sound) to obtain the same output. This procedure (the server transmitting the 3D sounds to the user device 74 one at a time) repeats until all the 3D sounds that need to be played through the user device's speaker have been transmitted and stored on the user device 74.

As shown in FIG. 6C, once the user device 74 holds all the 3D sounds (including the sound 81 and the sound 82), the server no longer performs any rendering, so the video stream frame (element 85) contains only images and no audio; the server only needs to transmit the metadata of the 3D sounds (including the sound 81a and the sound 82a) to the user device 74. The user device can then render (play) all the 3D sounds itself to obtain the same output.

Please refer to FIGS. 5A, 5B and 5C, which are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D models must be encoded into a frame.

In the present invention, the server sorts all 3D models to be rendered in a predetermined order: from near to far relative to a virtual position (such as the 3D projection surface 52 of the user device's screen, or the user's eyes 51). As shown in FIG. 5A, four objects A, B, C and D are to be displayed on the user device's screen, with object A closest to the projection surface 52, followed in order by objects B, C and D. When the user device first logs in to the application running on the server, no 3D models are pre-stored on the user device. The server therefore renders all of objects A, B, C and D, encodes the rendering result as a video stream frame, and transmits the frame to the user device. Meanwhile, the server begins transmitting the 3D models of objects A, B, C and D one by one in the predetermined order; that is, the 3D model of object A is transmitted first, followed in turn by objects B, C and D, until all 3D models displayed on the user device have been transmitted.
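The near-to-far ordering described above can be sketched as follows; this is a minimal illustration (the `sort_models_near_to_far` helper, the coordinate values and the dictionary layout are hypothetical, not the patent's implementation):

```python
# Sort 3D models by squared distance from a virtual viewpoint (e.g. the
# 3D projection surface or the user's eyes), nearest first.
def sort_models_near_to_far(models, viewpoint):
    def dist_sq(m):
        return sum((p - v) ** 2 for p, v in zip(m["position"], viewpoint))
    return sorted(models, key=dist_sq)

models = [
    {"name": "D", "position": (0.0, 0.0, 9.0)},
    {"name": "A", "position": (0.0, 0.0, 1.0)},
    {"name": "C", "position": (0.0, 0.0, 6.0)},
    {"name": "B", "position": (0.0, 0.0, 3.0)},
]
ordered = sort_models_near_to_far(models, viewpoint=(0.0, 0.0, 0.0))
print([m["name"] for m in ordered])  # ['A', 'B', 'C', 'D']
```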

As shown in FIG. 5B, after the 3D models of objects A and B have been pre-stored on the user device, when the server checks the states of the 3D models in the predetermined near-to-far order, it finds that object C is the first object not pre-stored on the user device. The server therefore renders object C and all other objects behind it (such as object D), regardless of whether the 3D model of object D is pre-stored on the user device. At this point the server does not render the 3D models of objects A and B, because objects A and B are both pre-stored on the user device and located in front of object C.

As shown in FIG. 5C, when a new object E appears in the virtual 3D environment created by the application, object E and all objects behind it are rendered by the server, regardless of whether those objects are pre-stored on the user device. For example, as shown in FIG. 5C, the new object E is closer to the 3D projection surface 52 than objects B, C and D. Although the 3D model of object B is already pre-stored on the user device, object B lies behind the new object E, so the server renders all of objects E, C, B and D, even though object B may be only partially covered by the objects in front of it.
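The selection rule illustrated by FIGS. 5B and 5C — the server renders the first model that is not pre-stored plus everything behind it, while the client renders only the pre-stored models in front of it — can be sketched as follows (the function name, data shapes and the exact FIG. 5C ordering are hypothetical):

```python
def split_render_work(ordered_models, ready_on_client):
    """ordered_models: model names sorted near-to-far; ready_on_client:
    set of names whose 3D models are pre-stored on the user device.
    Returns (client-rendered models, server-rendered models)."""
    for i, name in enumerate(ordered_models):
        if name not in ready_on_client:
            # First model not pre-stored: the server renders it and every
            # model behind it, regardless of the later models' states.
            return ordered_models[:i], ordered_models[i:]
    return ordered_models, []  # everything can be rendered client-side

# FIG. 5B: A and B are pre-stored; C is the first missing model.
print(split_render_work(["A", "B", "C", "D"], {"A", "B"}))
# → (['A', 'B'], ['C', 'D'])

# FIG. 5C: a new object E appears in front of B; B is pre-stored but lies
# behind E, so the server renders E, C, B and D anyway.
print(split_render_work(["A", "E", "C", "B", "D"], {"A", "B"}))
# → (['A'], ['E', 'C', 'B', 'D'])
```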

Please refer to FIGS. 7A, 7B and 7C, which are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D sounds must be encoded into a video stream frame with sound.

In the present invention, the server sorts all 3D sounds to be rendered in a predetermined order: from near to far relative to a virtual position (such as the 3D projection surface 52 of the user device's screen, or the user's eyes 51). As shown in FIG. 7A, four 3D sounds A, B, C and D are to be played on the user device's speakers, with sound A closest to the projection surface 52, followed in order by sounds B, C and D. When the user device first logs in to the application running on the server, no 3D sounds are pre-stored on the user device. The server therefore renders all of sounds A, B, C and D, encodes the rendering result into a video stream frame with sound, and transmits the frame to the user device. Meanwhile, the server begins transmitting the data of sounds A, B, C and D one by one in the predetermined order; that is, the 3D sound of sound A is transmitted first, followed in turn by sounds B, C and D, until all 3D sounds have been stored on the user device.

As shown in FIG. 7B, after the 3D sounds of sounds A and B have been pre-stored on the user device, when the server checks the states of the 3D sounds in the predetermined near-to-far order, it finds that sound C is the first sound not pre-stored on the user device. The server therefore renders sound C and all other sounds behind it (such as sound D), but does not render the 3D sounds of sounds A and B, since at this stage sounds A and B are already pre-stored on the user device.

As shown in FIG. 7C, when a new sound E is added to the virtual 3D environment created by the application, sound E is rendered by the server, but this rendering does not affect the rendering of the other sounds; this differs from the handling of the 3D models in FIG. 5C. As shown in FIG. 7C, the new sound E is closer to the 3D projection surface 52 than sounds B, C and D. Unlike the 3D models of FIG. 5C, sounds that are pre-stored on the user device (such as sounds A and B) are still rendered by the user device, while sounds that are not pre-stored on the user device (such as sounds E, C and D) are rendered by the server.
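One reading consistent with FIGS. 7B and 7C is that, unlike 3D models, each sound is assigned purely by whether its data is pre-stored on the user device, independent of the near-to-far order. The following sketch is an interpretation for illustration only (the function name and data shapes are hypothetical):

```python
def split_sound_work(sounds_near_to_far, stored_on_client):
    """Unlike 3D models, a sound's render location depends only on
    whether its data is pre-stored on the user device, not on its
    position in the near-to-far order."""
    client = [s for s in sounds_near_to_far if s in stored_on_client]
    server = [s for s in sounds_near_to_far if s not in stored_on_client]
    return client, server

# FIG. 7C: the new sound E is nearer than B, yet the pre-stored sounds
# A and B are still rendered by the user device.
print(split_sound_work(["A", "E", "B", "C", "D"], {"A", "B"}))
# → (['A', 'B'], ['E', 'C', 'D'])
```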

The above techniques of the present invention can also be applied to a virtual-reality (VR) scene system, to deliver the 3D models and the VR video stream generated by a VR scene application executed on a server to a user device over a network, as described in detail below.

To give the human eye a VR visual experience, a virtual VR scene must contain one image intended for the viewer's left eye and another image intended for the viewer's right eye. Please refer to FIG. 8, which is a schematic diagram of a first embodiment of the system architecture of the VR scene system of the present invention.

In the present invention, the VR scene server 1120 is server computer software executed on a server 1 that runs a VR scene application 1100 (also referred to as the VR application or simply the application), which generates a virtual VR 3D environment containing a plurality of 3D models. The VR scene application 1100, typically a VR game, also runs on the server 1. The VR scene server 1120 is a server program executed on the server 1 together with the application 1100; it acts as a relay for messages between the VR scene transmitter 1110 on the server 1 and the VR scene clients 1170 on the user devices 21, 22, 23. The VR scene server 1120 also serves as a file download server from which the VR scene clients 1170 of the user devices 21, 22, 23 download the necessary 3D models from the server 1. The VR scene transmitter 1110 is a library that is either statically linked to the VR scene application 1100 at compile time or dynamically linked to it while the VR scene application 1100 is running. The VR scene client (program) 1170 is a program executed on the user devices 21, 22, 23 that reproduces and outputs, on the user device, the 3D rendering results generated by the VR scene application 1100. In this embodiment, each user device 21, 22, 23 has its own independent VR scene application 1100 and VR scene transmitter 1110. The VR scene transmitter 1110 maintains a list of all 3D models and of whether each 3D model has been stored on the user device; this state indicates, for each 3D model, one of (1) "Not Ready", (2) "Loading", and (3) "Ready for Client".
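A minimal sketch of such a per-client state list, assuming only the three states named above; the class and method names are hypothetical, not the patent's implementation:

```python
from enum import Enum

class ModelState(Enum):
    NOT_READY = "Not Ready"
    LOADING = "Loading"
    READY_FOR_CLIENT = "Ready for Client"

class ModelList:
    """Per-client list kept by the transmitter: each 3D model starts as
    NOT_READY, becomes LOADING when a download instruction is sent, and
    READY_FOR_CLIENT when the client reports the model is stored."""
    def __init__(self, names):
        self.state = {n: ModelState.NOT_READY for n in names}

    def download_requested(self, name):
        if self.state[name] is ModelState.NOT_READY:
            self.state[name] = ModelState.LOADING
            return True   # send the download instruction exactly once
        return False      # already loading or ready: send nothing

    def ready_on_client(self, name):
        self.state[name] = ModelState.READY_FOR_CLIENT

models = ModelList(["M"])
assert models.download_requested("M") is True
assert models.download_requested("M") is False  # instruction already sent
models.ready_on_client("M")
assert models.state["M"] is ModelState.READY_FOR_CLIENT
```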

The server 1 checks the states of these 3D models to determine which 3D models must be encoded into a left-eye frame of a 2D video stream and which must be encoded into a right-eye frame of that 2D video stream; in the present invention, the 3D models that are not pre-stored on the user devices 21, 22, 23 are encoded into both the left-eye and right-eye frames. To achieve this, the main program of the VR scene application 1100 passes the VR scene information to the VR scene transmitter 1110 by calling the library through its API (path 1101 in FIG. 8); the VR scene information includes the name, position, velocity, attributes, orientation and all other data required to render the 3D models. After the VR scene transmitter 1110 receives such data, it performs the following procedure.

Step (a): For all 3D models, sort every 3D model that needs to be rendered in the left-eye frame, from near to far relative to a virtual position (such as the 3D projection surface or the user's left eye).

Step (b): Starting from the nearest point (closest to the user's left eye), find the first 3D model "M" that does not have the "Ready for Client" state; in other words, the first 3D model "M" whose state is "Not Ready" (hereafter, the "Not Ready" state is abbreviated as the NR state). Of course, no such 3D model may exist (for example, when all 3D models to be displayed are marked "Ready for Client").

Step (c): The server 1 renders the 3D model M and all 3D models behind it, that is, all 3D models farther from the left eye than M (if there is no 3D model M, a black screen is produced), and encodes the rendering result as a left-eye frame of a 2D video stream, to be viewed by the user's left eye.

Step (d): Repeat the above steps (a) to (c) for the right-eye frame, that is, with every left-eye operation described in steps (a) to (c) replaced by the corresponding right-eye operation, thereby producing another frame of the 2D video stream, the right-eye frame, to be viewed by the user's right eye.

Step (e): For the left-eye frame, transmit the following three pieces of information to the VR scene server 1120 (path 1112): [Info 1112-A], [Info 1112-B] and [Info 1112-C]; and for the right-eye frame, transmit the following three pieces of information to the VR scene server 1120 (path 1113): [Info 1113-A], [Info 1113-B] and [Info 1113-C].

Step (f): The data packer 121 in the VR scene server 1120 packs the left-eye and right-eye information ([Info 1112-A], [Info 1112-B], [Info 1112-C], [Info 1113-A], [Info 1113-B] and [Info 1113-C]) into one information packet.

Step (g): The VR scene server 1120 transmits the information packet generated in step (f) to the VR scene clients 1170 on the user devices 21, 22, 23 (path 1122).

[Info 1112-A] is the state information (or metadata) of all 3D models in front of the 3D model M. Note that no such models may exist. These models all have the "Ready for Client" state, meaning they are already preloaded on the user device, and the VR scene client (program) 1170 on the user devices 21, 22, 23 can already render them itself. To reduce the data transmission bandwidth, the VR scene transmitter 1110 need not transmit all of the state information; it suffices to transmit only the differences in the state information between this rendering and the previous rendering.

[Info 1112-B]: If the server finds the 3D model M and its pre-stored state on the user device is "Not Ready", the server changes its client state to "Loading" and sends a download instruction for the 3D model M, asking the user device to download it; if the client state is already "Loading", no instruction is sent, because the download instruction has already been sent.

[Info 1112-C] is the encoded video stream frame for the left eye from step (c), that is, the left-eye frame.

[Info 1113-A], [Info 1113-B] and [Info 1113-C] are substantially identical to [Info 1112-A], [Info 1112-B] and [Info 1112-C], respectively, except that [Info 1113-A], [Info 1113-B] and [Info 1113-C] pertain to the right-eye frame.

Steps (a) to (g) are repeated each time the main program of the VR scene application 1100 updates new VR scene data to the VR scene transmitter 1110; typically, the main program of the VR scene application 1100 updates such data in every rendering cycle.
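Steps (a) to (c) above can be condensed into the following per-eye sketch (step (d) simply repeats it for the right eye); the function, the returned triple and the state strings are illustrative assumptions, and "Loading" handling is omitted for brevity:

```python
def build_eye_frame(ordered_names, states):
    """Steps (b)-(c) for one eye. ordered_names is the near-to-far sort
    from step (a); states maps each model name to its client state.
    Returns a triple roughly matching [Info A], [Info B], [Info C]:
    (models the client renders, model M to download, models the server
    renders and encodes into this eye's frame)."""
    for i, name in enumerate(ordered_names):
        if states.get(name) != "Ready for Client":
            return (ordered_names[:i],   # metadata only: client renders these
                    name,                # model M: ask the client to download
                    ordered_names[i:])   # server renders M and all behind it
    return ordered_names, None, []       # all ready: black screen is encoded

left = build_eye_frame(["A", "B", "C", "D"], {"A": "Ready for Client"})
print(left)  # → (['A'], 'B', ['B', 'C', 'D'])
```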

Once the VR scene client 1170 receives the aforementioned data, it performs the rendering procedure described below.

Step (i): Decode the video frames in [Info 1112-C] and [Info 1113-C] (both the left-eye frame and the right-eye frame) and pass the two frames to the frame combiner 1171.

Step (ii): The frame combiner 1171 merges the two frames (the left-eye frame 1711 and the right-eye frame 1712) into one combined VR frame 1713 (see FIG. 9), which serves as the background for the subsequent 3D model rendering.
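A side-by-side merge of this kind can be sketched as follows, treating each frame as a list of pixel rows (a simplification for illustration; the actual combiner operates on decoded video frames):

```python
def combine_frames(left_frame, right_frame):
    """Place the left-eye and right-eye frames side by side: each row of
    the combined VR frame is the left-eye row followed by the right-eye
    row (frames are lists of rows of pixels)."""
    return [lrow + rrow for lrow, rrow in zip(left_frame, right_frame)]

left  = [["L00", "L01"], ["L10", "L11"]]
right = [["R00", "R01"], ["R10", "R11"]]
combined = combine_frames(left, right)
print(combined[0])  # → ['L00', 'L01', 'R00', 'R01']
```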

Step (iii): Render all 3D models listed in [Info 1112-A] and [Info 1113-A] onto the combined VR frame produced in step (ii). To reduce network bandwidth usage, the VR scene client 1170 stores this [Info 1112-A] and [Info 1113-A] information in memory, so that next time the VR scene transmitter 1110 can transmit only the differences in the [Info 1112-A] and [Info 1113-A] states between the next rendering and this rendering, without transmitting all of the state information.

Step (iv): Output the rendering result of step (iii) as a rendered mixed VR frame in the output video stream containing the VR scene, that is, the video stream that is ultimately output (path 1176). In this embodiment, the user device is a goggles- or helmet-shaped electronic device that includes two display screens located in front of the user's left and right eyes respectively; the left screen displays the images (frames) for the user's left eye, and the right screen displays the images (frames) for the user's right eye. The mixed VR frames of the output video stream are played on the two screens of the user device as follows: the pixels in the left half of each row of the mixed VR frame are displayed on the left-eye screen, and the pixels in the right half of each row are displayed on the right-eye screen, providing the user with a virtual-reality (VR) visual experience.
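The display rule just described — the left half of each row goes to the left-eye screen, the right half to the right-eye screen — can be sketched as the inverse of a side-by-side merge (again treating a frame as a list of pixel rows, for illustration only):

```python
def split_mixed_frame(mixed_frame):
    """Split a side-by-side mixed VR frame: the left half of every row is
    shown on the left-eye screen, the right half on the right-eye screen."""
    half = len(mixed_frame[0]) // 2
    left_screen  = [row[:half] for row in mixed_frame]
    right_screen = [row[half:] for row in mixed_frame]
    return left_screen, right_screen

mixed = [["L00", "L01", "R00", "R01"],
         ["L10", "L11", "R10", "R11"]]
left, right = split_mixed_frame(mixed)
print(left)   # → [['L00', 'L01'], ['L10', 'L11']]
print(right)  # → [['R00', 'R01'], ['R10', 'R11']]
```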

If the [Info 1112-B] and [Info 1113-B] states are provided, a 3D model M needs to be prepared by the VR scene client 1170; in that case, the VR scene client 1170 handles the 3D model M according to the following procedure.

Step (I): Search the VR scene cache 1190 (path 1174); the VR scene cache 1190 contains a database of 3D models previously downloaded and stored on the user devices 21, 22, 23.

Step (II): If the 3D model M is already in the VR scene cache 1190, go directly to step (V).

Step (III): If the 3D model M is not in the VR scene cache 1190, the VR scene client 1170 sends a download request to the VR scene server 1120 (path 1172), and the VR scene server 1120 returns the data of the 3D model M to the VR scene client 1170 (path 1124).

Step (IV): Once the 3D model is fully downloaded, the VR scene client 1170 stores it in the VR scene cache 1190 (path 1194), so that the next time a similar request arises it does not need to be downloaded again.

Step (V): The VR scene client 1170 fetches the 3D model M from the VR scene cache 1190 (path 1192).

Step (VI): Once the download is complete (or the model was already downloaded earlier), the VR scene client 1170 can fetch the 3D model M; the VR scene client 1170 then sends a "3D model is ready on client" message to the VR scene server 1120 (path 1115), and the VR scene server 1120 forwards this message to the VR scene transmitter 1110 (path 1114).

Step (VII): Upon receiving this message, the VR scene transmitter 1110 changes the state of the 3D model M from "Loading" to "Ready for Client".

Step (VIII): In the next rendering, the VR scene transmitter 1110 knows that the 3D model M is now preloaded on the user device, so it asks the VR scene client 1170 to render it itself; the server 1 therefore no longer needs to render this 3D model M.
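Steps (I) to (VI) on the client side amount to a cache-or-download lookup followed by a readiness notification; a minimal sketch, with `download_from_server` and `notify_ready` as hypothetical stand-ins for the network paths (1172/1124 and 1115):

```python
def prepare_model(name, cache, download_from_server, notify_ready):
    """Steps (I)-(VI) on the client: check the VR scene cache, download
    on a miss, store the result for reuse, then tell the server the
    model is ready."""
    if name not in cache:                  # steps (I)-(III): cache miss
        cache[name] = download_from_server(name)
    model = cache[name]                    # steps (IV)-(V): store and fetch
    notify_ready(name)                     # step (VI): "3D model is ready on client"
    return model

downloads, messages = [], []
cache = {}
fetch = lambda n: (downloads.append(n) or f"<data:{n}>")
prepare_model("M", cache, fetch, messages.append)
prepare_model("M", cache, fetch, messages.append)  # cache hit: no re-download
print(downloads)  # → ['M']  (downloaded only once)
print(messages)   # → ['M', 'M']  (readiness reported each time)
```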

Initially, the user devices 21, 22, 23 hold no 3D models at all, so the VR scene transmitter 1110 renders all 3D models and encodes the results into a 2D video stream comprising left-eye and right-eye frames. The VR scene transmitter 1110 issues the 3D-model download instructions [Info 1112-B] and [Info 1113-B] starting with the models closest to the 3D projection surface (or the user's left or right eye). The VR scene client 1170 downloads each 3D model from the VR scene server 1120, or fetches it from the VR scene cache 1190, one by one. As more 3D models become available to the VR scene client 1170, the VR scene transmitter 1110 automatically notifies the VR scene client 1170 to render these models and sounds itself, and reduces the number of 3D models rendered by the VR scene transmitter 1110. As a result, fewer and fewer 3D models are encoded into the 2D video stream of left-eye and right-eye frames, until finally all 3D models are available on the VR scene client 1170. At that stage, only a black screen remains to be encoded by the server 1; in other words, the server 1 no longer needs to transmit the 2D video stream to the user devices 21, 22, 23, and the communication bandwidth between the server 1 and the user devices 21, 22, 23 can be greatly reduced.

When a new 3D model N appears in the VR scene, the VR scene transmitter 1110 (1) notifies the VR scene client 1170 to render only the 3D models located in front of the new 3D model N (relative to the user's left or right eye), (2) notifies the VR scene client 1170 to download the new 3D model N, and (3) renders the new 3D model N and all models behind it, encoding the result into a 2D video stream containing left-eye and right-eye frames. This 2D video stream containing the left-eye and right-eye frames is then transmitted to the VR scene client 1170. The VR scene client 1170 can thus continue to reproduce the 3D rendering results of the VR scene application 1100 even before the 3D model N is ready on the user device.

FIG. 10 is a schematic diagram of a second embodiment of the system architecture of the VR scene system of the present invention. In the second embodiment shown in FIG. 10, most elements and functions are substantially the same as or similar to those of the first embodiment disclosed in FIG. 8, except that the frame combiner 1111 is located in the VR scene transmitter 1110 rather than in the VR scene client 1170; accordingly, identical or similar elements in FIG. 10 are given the same reference numerals as in FIG. 8 and their details are not repeated.

As shown in FIG. 10, the main program of the VR scene application 1100 passes the VR scene information to the VR scene transmitter 1110 by calling the library through its API; the VR scene information includes the name, position, velocity, attributes, orientation and all other data required to render the 3D models. After the VR scene transmitter 1110 receives such data, it performs the following procedure.

Step (a): For all 3D models, sort every 3D model that needs to be rendered in the left-eye frame, from near to far relative to a virtual position (such as the 3D projection surface or the user's left eye).

Step (b): Starting from the nearest point (closest to the user's left eye), find the first 3D model "M" that does not have the "Ready for Client" state; in other words, the first 3D model "M" whose state is "Not Ready" (hereafter, the "Not Ready" state is abbreviated as the NR state). Of course, no such 3D model may exist.

Step (c): Render the 3D model "M" and all subsequent 3D models on the server 1 (if no 3D model "M" exists, directly produce a black screen) and store the result in memory.

Step (d): Repeat the above steps (a) to (c) for the right-eye frame, that is, with every left-eye operation described in steps (a) to (c) replaced by the corresponding right-eye operation, thereby producing a right-eye frame to be viewed by the user's right eye.

Step (e): The frame combiner 1111 merges the rendered left-eye and right-eye frames into one combined VR frame of a 2D video stream.

Step (f): For the left-eye and right-eye frames, transmit the following three pieces of information to the VR scene server 1120 (path 1112): [Info 1112-A], [Info 1112-B] and [Info 1112-C]; the VR scene server 1120 then forwards them to the VR scene clients 1170 on the user devices 21, 22, 23 (path 1122).

[Info 1112-A] is the state information (or metadata) of all 3D models in front of the 3D model M. Note that no such models may exist. These models all have the "Ready for Client" state, meaning they are already preloaded on the user device, and the VR scene client (program) 1170 on the user devices 21, 22, 23 can already render them itself. To reduce the data transmission bandwidth, the VR scene transmitter 1110 need not transmit all of the state information; it suffices to transmit only the differences in the state information between this rendering and the previous rendering.

[Info 1112-B]: If the server finds the 3D model M and its pre-stored state on the user device is "Not Ready", the server changes its client state to "Loading" and sends a download instruction for the 3D model M, asking the user device to download it; if the client state is already "Loading", no instruction is sent, because the download instruction has already been sent.

[Info 1112-C] is the combined VR frame of the video stream from step (e), already rendered and containing both the left-eye frame and the right-eye frame.

Once the VR scene client 1170 receives the aforementioned data, it performs the rendering procedure described below.

Step (i): Decode the combined VR frame in [Info 1112-C] and use it as the background for the subsequent 3D model rendering.

Step (ii): Render all 3D models listed in [Info 1112-A] onto the combined VR frame. To reduce network bandwidth usage, the VR scene client 1170 stores this [Info 1112-A] information in memory, so that next time the VR scene transmitter 1110 can transmit only the differences in the [Info 1112-A] state between the next rendering and this rendering, without transmitting all of the state information.

Step (iii): Output the rendering result of step (ii) as a rendered mixed VR frame in the output video stream containing the VR scene, that is, the video stream that is ultimately output (path 1176).

FIG. 11 is a schematic diagram of a third embodiment of the system architecture of the VR scene system of the present invention. In the third embodiment shown in FIG. 11, most elements and functions are substantially the same as or similar to those of the first embodiment disclosed in FIG. 8, except that this third embodiment no longer has a frame combiner; accordingly, identical or similar elements in FIG. 11 are given the same reference numerals as in FIG. 8 and their details are not repeated.

As shown in FIG. 11, the VR scene server 1120 is server computer software executed on a server 1 running a VR scene application 1100, which generates a virtual VR 3D environment containing a plurality of 3D models. The VR scene server 1120 is a server program executed on the server 1 together with the application 1100; it acts as a relay for messages between the VR scene transmitter 1110 on the server 1 and the VR scene clients 1170 on the user devices 21, 22, 23. The VR scene server 1120 also serves as a file download server from which the VR scene clients 1170 of the user devices 21, 22, 23 download the necessary 3D models from the server 1. The VR scene transmitter 1110 maintains a list of all 3D models and of whether each 3D model has been stored on the user device; this state indicates, for each 3D model, one of (1) "Not Ready", (2) "Loading", and (3) "Ready for Client".

Server 1 checks the statuses of these 3D models to determine which 3D models need to be encoded into a left-eye frame of a 2D video stream and which need to be encoded into a right-eye frame of the 2D video stream. In the present invention, the 3D models that are not pre-stored in the user devices 21, 22, and 23 are encoded into the left-eye frame and the right-eye frame. To achieve this, the main program of the VR scene application 1100 transmits VR scene information to the VR scene transmitter 1110 by calling the library through the API (path 1101 in FIG. 11); the VR scene information includes the name, position, speed, attributes, orientation, and all other data required for rendering the 3D models. After the VR scene transmitter 1110 receives such data, the following procedure is performed.

Step (a): For all 3D models, sort all 3D models to be rendered in the left-eye frame; the ordering may be from near to far relative to a virtual position (such as a 3D projection surface or the user's left eye).

Step (b): Starting from the nearest point (the one closest to the user's left eye), find the first 3D model "M" that does not have the "Ready for Client" status; in other words, the status of this first 3D model "M" is "Not Ready" (hereinafter, the "Not Ready" status is abbreviated as the NR status). Of course, no such 3D model may exist (for example, when all 3D models to be displayed are marked with the "Ready for Client" status).

Step (c): Server 1 renders the 3D model M and all 3D models behind it, that is, all 3D models farther from the left eye than M (if there is no 3D model M, a black screen is rendered). The rendered result is encoded as a left-eye frame of a 2D video stream, to be viewed by the user's left eye.

Step (d): Repeat the above steps (a) to (c) for the right-eye frame, that is, replace the left-eye operations described in steps (a) to (c) with right-eye operations, thereby producing another frame of another 2D video stream, namely the right-eye frame, to be viewed by the user's right eye.
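Steps (a) through (c) above amount to sorting the draw list near-to-far from the eye and splitting it at the first model that is not ready on the client. The sketch below is a minimal illustration under the assumptions that models carry simple 3D positions and that readiness can be queried per model name; all names and the tuple format are invented for this example.

```python
def split_render_work(models, eye_pos, is_ready):
    """Sort models near-to-far from the eye and split at the first model
    that is not 'Ready for Client' (the model 'M' of step (b)).

    models   : list of (name, (x, y, z)) tuples (illustrative format)
    eye_pos  : (x, y, z) of the virtual position (e.g. the user's left eye)
    is_ready : callable(name) -> True if the model is pre-stored on the device

    Returns (client_models, server_models): the client renders the near,
    already-downloaded models itself; the server renders model M and
    everything behind it into the video frame (step (c)).
    """
    def dist(item):
        (x, y, z), (ex, ey, ez) = item[1], eye_pos
        return ((x - ex) ** 2 + (y - ey) ** 2 + (z - ez) ** 2) ** 0.5

    ordered = sorted(models, key=dist)          # step (a): near to far
    for i, (name, _) in enumerate(ordered):
        if not is_ready(name):                  # step (b): first NR model M
            return ordered[:i], ordered[i:]     # step (c): M and all behind it
    return ordered, []                          # no model M: server sends nothing extra

models = [("far_hill", (0, 0, 9)), ("near_tree", (0, 0, 1)), ("mid_house", (0, 0, 4))]
client, server = split_render_work(models, (0, 0, 0), lambda n: n != "mid_house")
print([n for n, _ in client], [n for n, _ in server])
# → ['near_tree'] ['mid_house', 'far_hill']
```

Step (d) would simply call the same routine again with the right-eye position, since the two eyes may see the models in a different depth order.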

Step (e): For the left-eye frame, transmit the following three pieces of information to the VR scene server 1120 (path 1112): [Info 1112-A], [Info 1112-B], and [Info 1112-C]; and for the right-eye frame, transmit the following three pieces of information to the VR scene server 1120 (path 1113): [Info 1113-A], [Info 1113-B], and [Info 1113-C].

Step (f): The data packer 121 in the VR scene server 1120 packages the left-eye and right-eye information ([Info 1112-A], [Info 1112-B], [Info 1112-C], [Info 1113-A], [Info 1113-B], and [Info 1113-C]) into one information packet.

Step (g): The VR scene server 1120 transmits the information packet generated in step (f) to the VR scene clients 1170 in the user devices 21, 22, and 23 (path 1122).
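Steps (e) through (g) can be sketched as serializing the six pieces of information into a single packet. The patent does not specify a wire format, so the JSON-over-bytes framing below is only one possible encoding, and every field name is invented for this illustration.

```python
import json, base64

def pack_vr_info(left, right):
    """Package the left-eye info ([Info 1112-A/B/C]) and right-eye info
    ([Info 1113-A/B/C]) into one information packet (step (f)).
    `left` and `right` are dicts: {"status": ..., "download": ..., "frame": bytes}.
    """
    def encode(side):
        return {
            "status": side["status"],       # Info *-A: metadata of pre-stored models
            "download": side["download"],   # Info *-B: download instruction, if any
            "frame": base64.b64encode(side["frame"]).decode("ascii"),  # Info *-C
        }
    return json.dumps({"left": encode(left), "right": encode(right)}).encode("utf-8")

def unpack_vr_info(packet):
    """Client-side counterpart used after step (g) delivers the packet."""
    data = json.loads(packet.decode("utf-8"))
    for side in data.values():
        side["frame"] = base64.b64decode(side["frame"])
    return data["left"], data["right"]

left = {"status": [{"name": "near_tree"}], "download": "mid_house", "frame": b"\x00L"}
right = {"status": [{"name": "near_tree"}], "download": None, "frame": b"\x00R"}
l2, r2 = unpack_vr_info(pack_vr_info(left, right))
print(l2["download"], l2["frame"] == b"\x00L", r2["frame"] == b"\x00R")
# → mid_house True True
```

A production system would more likely carry the encoded video frames in a binary container rather than base64-encoded JSON; the point here is only that both eyes' information travels in one packet.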

[Info 1112-A] is the status information (or metadata) of all 3D models in front of the 3D model M. Note that no such models may exist. These models all have the "Ready for Client" status, meaning that they have already been preloaded on the user device, and the VR scene client (program) 1170 on the user devices 21, 22, and 23 can render them on its own. To reduce the data transmission bandwidth, the VR scene transmitter 1110 need not transmit all of the status information; it suffices to transmit only the differences in the status information between the current rendering and the previous rendering.

[Info 1112-B]: If the server finds the 3D model M and its pre-stored status in the user device is "Not Ready", the server changes its client status to "Loading" and sends a download instruction for the 3D model M, requesting the user device to download it; if the client status is already "Loading", no instruction is sent, because the download instruction has already been issued.
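The [Info 1112-B] rule is a small state transition: a "Not Ready" model triggers exactly one download instruction and moves to "Loading", and the client's later acknowledgement moves it to "Ready for Client". A minimal sketch, assuming statuses are plain strings (the function name is invented for this example):

```python
def issue_download_instruction(status_table, model_m):
    """Apply the [Info 1112-B] rule for the found model M.
    Returns the model name to place in the download instruction,
    or None when no instruction should be sent."""
    state = status_table[model_m]
    if state == "Not Ready":
        status_table[model_m] = "Loading"   # change the client status to "Loading"
        return model_m                      # send the download instruction once
    # "Loading": instruction already sent; "Ready for Client": nothing to do
    return None

table = {"mid_house": "Not Ready"}
first = issue_download_instruction(table, "mid_house")   # instruction sent
second = issue_download_instruction(table, "mid_house")  # suppressed: already Loading
table["mid_house"] = "Ready for Client"                  # client reports model stored
print(first, second, table["mid_house"])
# → mid_house None Ready for Client
```

The final assignment stands in for the message the user device sends back to the server after storing the downloaded model, as described elsewhere in the text.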

[Info 1112-C] is the encoded left-eye video stream frame from step (c), that is, the left-eye frame.

[Info 1113-A], [Info 1113-B], and [Info 1113-C] are substantially the same as [Info 1112-A], [Info 1112-B], and [Info 1112-C], respectively, except that they relate to the right-eye frame.

Steps (a) to (g) are repeated each time the main program of the VR scene application 1100 updates new VR scene data to the VR scene transmitter 1110; usually, the main program of the VR scene application 1100 updates such data in every rendering cycle.

Once the VR scene client 1170 receives the aforementioned data, it performs the rendering procedure described below.

Step (i): Decode the video frames (both the left-eye frame and the right-eye frame) in [Info 1112-C] and [Info 1113-C], and store the two frames in separate memory spaces.

Step (ii): Render all 3D models included in [Info 1112-A] and [Info 1113-A] (if any) onto the decoded left-eye frame and right-eye frame, respectively. To reduce the network bandwidth occupied, the VR scene client 1170 stores this [Info 1112-A] and [Info 1113-A] information in memory, so that the VR scene transmitter 1110 can next time transmit only the differences in the [Info 1112-A] and [Info 1113-A] statuses between the next rendering and the current rendering, without transmitting all of the status information.

Step (iii): Output the rendering result of step (ii) as a rendered mixed left-eye frame and a rendered mixed right-eye frame of an output video stream containing the VR scene, that is, the final output video stream (path 1176). The mixed left-eye frame and the mixed right-eye frame may together be referred to as a mixed VR frame, as mentioned previously.
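Client steps (i) through (iii) can be sketched as: decode the two frames, then draw each pre-stored model over its frame. Real video decoding and 3D rasterization are out of scope here, so the sketch substitutes trivial placeholders for both; every function name is illustrative, not part of the patent.

```python
def compose_mixed_frames(packet, decode_frame, render_model):
    """Steps (i)-(iii): produce the mixed left-eye and right-eye frames.

    packet       : {"left": {...}, "right": {...}} as unpacked from the server
    decode_frame : callable(bytes) -> frame object (step (i); stands in for
                   the real video decoder)
    render_model : callable(frame, model_info) -> frame with the model drawn
                   on top (step (ii); stands in for the real 3D renderer)
    """
    mixed = {}
    for eye in ("left", "right"):
        frame = decode_frame(packet[eye]["frame"])      # step (i): decode
        for model in packet[eye]["status"]:             # step (ii): overlay the
            frame = render_model(frame, model)          # pre-stored models
        mixed[eye] = frame                              # step (iii): output
    return mixed  # the mixed left-eye / right-eye frames of the output stream

packet = {
    "left": {"frame": b"L", "status": [{"name": "near_tree"}]},
    "right": {"frame": b"R", "status": []},
}
mixed = compose_mixed_frames(
    packet,
    decode_frame=lambda raw: raw.decode(),
    render_model=lambda f, m: f + "+" + m["name"],
)
print(mixed["left"], mixed["right"])
# → L+near_tree R
```

Because the server rendered model M and everything behind it into the frame, the decoded frame already serves as the background, and the client only composites the near, pre-stored models on top.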

In this embodiment, the user device is a glasses- or helmet-shaped electronic device that includes two display screens located in front of the user's left eye and right eye, respectively; the left screen displays the images (frames) viewed by the user's left eye, and the right screen displays the images (frames) viewed by the user's right eye. The mixed VR frames in the output video stream are played on these two screens of the user device as follows: each mixed left-eye frame of the mixed VR frames is displayed on the left-eye screen, and each mixed right-eye frame of the mixed VR frames is displayed on the right-eye screen, thereby providing the user with a virtual reality (VR) visual experience.

In another embodiment, the video stream output on a single screen of the user device displays the mixed left-eye frames and the mixed right-eye frames alternately in sequence on the same screen. The user may wear a glasses-shaped electronic device that, in correspondence with the mixed left-eye frame and the mixed right-eye frame displayed on the screen, alternately opens and closes its left-eye window and right-eye window, thereby providing the user with a virtual reality (VR) visual experience.

The embodiments described above are not intended to limit the applicable scope of the present invention; the scope of protection of the present invention shall be defined by the technical spirit of the claims of the present invention and all equivalent variations thereof. All equivalent changes and modifications made within the scope of the claims of the present invention remain within the essence of the present invention and do not depart from its spirit and scope, and shall therefore be regarded as further embodiments of the present invention.

1‧‧‧Server

21, 22, 23‧‧‧User devices

1100‧‧‧Application

1110‧‧‧Scene transmitter

1120‧‧‧Scene server

121‧‧‧Data packer

1170‧‧‧Scene client

1171‧‧‧Frame combiner

1190‧‧‧Scene cache

1101, 1112, 1113, 1114, 1122, 1172, 1124, 1115, 1174, 1192, 1194, 1176‧‧‧Paths

Claims (15)

1. A method of transmitting media over a network, the media comprising a plurality of images, the method comprising: step (A): executing a virtual reality (VR) application on a server to generate a virtual VR 3D environment comprising a plurality of 3D models, each of the 3D models being associated with a status indicating whether the 3D model is pre-stored in a user device; step (B): the server checking the statuses of the 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models that are not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame; step (C): the server transmitting at least the left-eye frame and the right-eye frame of the 2D video stream to the user device over the network; wherein the server also transmits the 3D models that are not pre-stored in the user device to the user device in a predetermined order, and when the user device receives the 3D models transmitted from the server, the user device stores the 3D models and sends a message to the server to change the statuses of the 3D models, indicating that the 3D models are now pre-stored in the user device; and step (D): the user device decoding the left-eye frame and the right-eye frame received from the server, and using the left-eye frame and the right-eye frame as a background for rendering the 3D models that are pre-stored in the user device but not included in the left-eye frame and the right-eye frame, thereby generating a mixed VR frame of an output video stream containing a VR scene.
2. The method of transmitting media over a network of claim 1, wherein in step (B), the statuses of the 3D models are checked by the server in order from a point closest to a virtual position to a point farthest from the virtual position; and when the first 3D model not pre-stored in the user device is found during the check, all remaining 3D models including the found 3D model are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models behind it are pre-stored in the user device.
3. The method of transmitting media over a network of claim 2, wherein when a new 3D model appears in the VR 3D environment, all subsequent 3D models including the new 3D model are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models behind it are pre-stored in the user device.
4. The method of transmitting media over a network of claim 2, wherein the virtual position is a 3D projection surface; and in step (D), after the user device decodes the left-eye frame and the right-eye frame received from the server, the user device further merges the left-eye frame and the right-eye frame into a merged VR frame, and then uses the merged VR frame as the background to render the 3D models that are pre-stored in the user device but not included in the left-eye frame and the right-eye frame, thereby generating the mixed VR frame of the output video stream containing the VR scene.
5. The method of transmitting media over a network of claim 1, wherein: in step (C), the predetermined order in which the server transmits the 3D models not pre-stored in the user device to the user device is from a point closest to the virtual position to a point farthest from the virtual position; in step (C), the server transmits to the user device status information of the 3D models that are not encoded into the left-eye frame and the right-eye frame, and upon receiving and checking the status information, the user device proceeds as follows: if any 3D model in the received status information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the status information comprises at least one piece of metadata for each 3D model not encoded into the left-eye frame and the right-eye frame of the 2D video stream, and the metadata of each 3D model comprises a name, a position, a speed, an orientation, and an attribute of the 3D model.
6. A system for transmitting media over a network, comprising: a server configured to execute a virtual reality (VR) application to generate a virtual VR 3D environment comprising a plurality of 3D models, each of the 3D models being associated with a status indicating whether the 3D model is pre-stored in a user device; and the user device, connected to the server through a network, configured to obtain the media containing at least some of the 3D models generated by the VR application; wherein the media comprises a plurality of images, transmitted as follows: step (B): the server checks the statuses of the 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models that are not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame; step (C): the server transmits at least the left-eye frame and the right-eye frame of the 2D video stream to the user device through the network; wherein the server also transmits the 3D models that are not pre-stored in the user device to the user device in a predetermined order, and when the user device receives the 3D models transmitted from the server, the user device stores the 3D models and sends a message to the server to change the statuses of the 3D models, indicating that the 3D models are now pre-stored in the user device; step (D): the user device decodes the left-eye frame and the right-eye frame received from the server, merges the left-eye frame and the right-eye frame into a merged VR frame, and then uses the merged VR frame as a background for rendering the 3D models that are pre-stored in the user device but not included in the merged VR frame, thereby generating a mixed VR frame of an output video stream containing a VR scene; and step (E): the user device outputs the mixed VR frame of the output video stream containing the VR scene.
7. The system for transmitting media over a network of claim 6, wherein in step (B), the statuses of the 3D models are checked by the server in order from a point closest to a virtual position to a point farthest from the virtual position; and when the first 3D model not pre-stored in the user device is found during the check, all remaining 3D models including the found 3D model are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models behind it are pre-stored in the user device.
8. The system for transmitting media over a network of claim 7, wherein when a new 3D model appears in the VR 3D environment, all remaining 3D models including the new 3D model are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models behind it are pre-stored in the user device.
9. The system for transmitting media over a network of claim 6, wherein: in step (C), the predetermined order in which the server transmits the 3D models not pre-stored in the user device to the user device is from the 3D model closest to the virtual position to the 3D model farthest from the virtual position; in step (C), the server also transmits to the user device status information of the 3D models that are not encoded into the left-eye frame and the right-eye frame; and upon receiving and checking the status information, the user device proceeds as follows: if any 3D model in the received status information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the status information comprises a piece of metadata for each 3D model not encoded into the left-eye frame and the right-eye frame, and the metadata of each 3D model comprises a name, a position, a speed, an orientation, and an attribute of the 3D model.
10. The system for transmitting media over a network of claim 6, wherein the server further comprises: a VR scene transmitter, which is a library compiled into the VR application or dynamically linked to the VR application at run time; wherein the VR scene transmitter maintains a list of all the 3D models and the status of each of the 3D models, the status indicating that the state of the 3D model is one of "Not Ready", "Loading", and "Ready for Client"; and a VR scene server, which is a server program executed on the server together with the VR application; wherein the VR scene server serves as a relay station for message transfer between the VR scene transmitter and the user device, and the VR scene server also serves as a download server program from which the user device downloads the necessary 3D models from the server.
11. The system for transmitting media over a network of claim 10, wherein the user device further comprises: a VR scene client, which is a program running on the user device for generating the output video stream and communicating with the server through the network; a frame combiner for merging the left-eye frame and the right-eye frame into the merged VR frame; and a VR scene cache for storing at least one 3D model previously downloaded from the server.
12. A method of transmitting media over a network, the media comprising a plurality of images, the method comprising: step (A): executing a virtual reality (VR) application on a server to generate a virtual VR 3D environment comprising a plurality of 3D models, each of the 3D models being associated with a status indicating whether the 3D model is pre-stored in a user device; step (B): the server checking the statuses of the 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models that are not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame of the 2D video stream; the server then merging the left-eye frame and the right-eye frame into a merged VR frame of the 2D video stream; step (C): the server transmitting at least the merged VR frame of the 2D video stream to the user device through a network; wherein the server also transmits the 3D models that are not pre-stored in the user device to the user device in a predetermined order, and when the user device receives the 3D models transmitted from the server, the user device stores the 3D models and sends a message to the server to change the statuses of the 3D models, indicating that the 3D models are now pre-stored in the user device; and step (D): the user device decoding the merged VR frame of the 2D video stream received from the server, and using the merged VR frame as a background for rendering the 3D models that are pre-stored in the user device but not included in the merged VR frame, thereby generating a mixed VR frame of an output video stream containing a VR scene.
13. The method of transmitting media over a network of claim 12, wherein: in step (B), the statuses of the 3D models are checked by the server in order from a point closest to a virtual position to a point farthest from the virtual position, and when the first 3D model not pre-stored in the user device is found during the check, all remaining 3D models including the found 3D model are encoded into the left-eye frame and the right-eye frame of the 2D video stream, regardless of whether the 3D models behind it are pre-stored in the user device; and in step (C), the server also transmits the 3D models not pre-stored in the user device to the user device in a predetermined order from a point closest to the virtual position to a point farthest from the virtual position, and when the user device receives the 3D models transmitted from the server, the user device stores the 3D models and sends a message to the server to change the statuses of the 3D models, indicating that the 3D models are now pre-stored in the user device.
14. The method of transmitting media over a network of claim 13, wherein when a new 3D model appears in the VR 3D environment, all subsequent 3D models including the new 3D model are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models behind it are pre-stored in the user device; and wherein the virtual position is a 3D projection surface.
15. The method of transmitting media over a network of claim 12, wherein in step (C), the server also transmits to the user device status information of the 3D models that are not encoded into the left-eye frame and the right-eye frame, and upon receiving and checking the status information, the user device proceeds as follows: if any 3D model in the received status information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the status information comprises a piece of metadata for each 3D model not encoded into the left-eye frame and the right-eye frame of the 2D video stream, the metadata comprising a name, a position, a speed, an orientation, and an attribute of the 3D model.
TW105117600A 2015-12-21 2016-06-03 System and method for delivering media over network TWI637772B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/976,239 US9370718B2 (en) 2014-01-02 2015-12-21 System and method for delivering media over network
US14/976,239 2015-12-21

Publications (2)

Publication Number Publication Date
TW201722520A true TW201722520A (en) 2017-07-01
TWI637772B TWI637772B (en) 2018-10-11

Family

ID=59191073

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105117600A TWI637772B (en) 2015-12-21 2016-06-03 System and method for delivering media over network

Country Status (3)

Country Link
JP (1) JP6306089B2 (en)
CN (1) CN106899860B (en)
TW (1) TWI637772B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108271042B (en) * 2018-02-09 2020-04-24 大连天途有线电视网络股份有限公司 Video data transmission method based on cable television network, cloud VR system implementation method and cloud VR system
US11500455B2 (en) 2018-10-16 2022-11-15 Nolo Co., Ltd. Video streaming system, video streaming method and apparatus
CN111064981B (en) * 2018-10-16 2021-07-16 北京凌宇智控科技有限公司 System and method for video streaming
CN110728743B (en) * 2019-10-11 2022-09-06 长春理工大学 VR three-dimensional scene three-dimensional picture generation method combining cloud global illumination rendering
CN111757083A (en) * 2020-06-24 2020-10-09 南京东禾智汇信息技术有限公司 Automatic control data communication mode based on three-dimensional visualization

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377257B1 (en) * 1999-10-04 2002-04-23 International Business Machines Corporation Methods and apparatus for delivering 3D graphics in a networked environment
US6384821B1 (en) * 1999-10-04 2002-05-07 International Business Machines Corporation Method and apparatus for delivering 3D graphics in a networked environment using transparent video
US7290221B2 (en) * 2003-04-16 2007-10-30 Hewlett-Packard Development Company, L.P. User interface, method and apparatus for providing three-dimensional object fabrication status
US7675520B2 (en) * 2005-12-09 2010-03-09 Digital Steamworks, Llc System, method and computer program for creating two dimensional (2D) or three dimensional (3D) computer animation from video
CN102186544B (en) * 2008-01-17 2014-05-14 维沃克斯公司 Scalable techniques for providing real-lime per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments
JP2011138354A (en) * 2009-12-28 2011-07-14 Sony Corp Information processing apparatus and information processing method
US20120016856A1 (en) * 2010-07-15 2012-01-19 Google Inc Content extractor
US9282321B2 (en) * 2011-02-17 2016-03-08 Legend3D, Inc. 3D model multi-reviewer system
JP2012186746A (en) * 2011-03-08 2012-09-27 Sony Corp Video transmitter, method for controlling video transmitter, video receiver and method for controlling video receiver
US9286711B2 (en) * 2011-09-30 2016-03-15 Microsoft Technology Licensing, Llc Representing a location at a previous time period using an augmented reality display
US8860720B1 (en) * 2014-01-02 2014-10-14 Ubitus Inc. System and method for delivering graphics over network
CN103888714B (en) * 2014-03-21 2017-04-26 国家电网公司 3D scene network video conference system based on virtual reality

Also Published As

Publication number Publication date
CN106899860A (en) 2017-06-27
CN106899860B (en) 2019-10-11
JP6306089B2 (en) 2018-04-04
JP2017117431A (en) 2017-06-29
TWI637772B (en) 2018-10-11

Similar Documents

Publication Publication Date Title
US9370718B2 (en) System and method for delivering media over network
TWI528794B (en) System and method for delivering media over network
US11623141B2 (en) Cloud game streaming with client side asset integration
JP6310073B2 (en) Drawing system, control method, and storage medium
US10543426B2 (en) Gaming system
TWI637772B (en) System and method for delivering media over network
US9233308B2 (en) System and method for delivering media over network
CN113209632B (en) Cloud game processing method, device, equipment and storage medium
GB2517102A (en) Qualified video delivery
KR20120119504A (en) System for servicing game streaming according to game client device and method
JP6379107B2 (en) Information processing apparatus, control method therefor, and program
CN112673651B (en) Multi-view multi-user audio user experience
WO2024061243A1 (en) Live stream interactive method, device, apparatus and storage medium
JP2016527576A (en) Information processing apparatus, control method, program, and recording medium
JP2016509486A (en) Method and system for generating and encoding video game screen images for transmission over a network
KR20230155615A (en) Adaptation of 2d video for streaming to heterogenous client end-points
CN114071170A (en) Network live broadcast interaction method and equipment
WO2018178748A1 (en) Terminal-to-mobile-device system, where a terminal is controlled through a mobile device, and terminal remote control method
Seligmann SmmmmS Client for Remote Rendered Virtual Reality