CN106899860A

CN106899860A - System and method for transmitting media over a network

Info

Publication number: CN106899860A
Application number: CN201610377813.1A
Authority: CN
Inventors: 郭荣昌; 杨昇龙; 邓安伦
Original assignee: Ubitus Inc
Current assignee: Yobeta Co.,Ltd.
Priority date: 2015-12-21
Filing date: 2016-05-31
Publication date: 2017-06-27
Anticipated expiration: 2036-05-31
Also published as: JP2017117431A; TWI637772B; CN106899860B; JP6306089B2; TW201722520A

Abstract

The invention discloses a system and method for transmitting media from a server to a user device over a network. A virtual reality (VR) application executed on the server produces a virtual VR 3D environment comprising multiple 3D models. The server checks the state of each 3D model in a predetermined order. The server then renders only those 3D models not yet pre-stored on the user device into a left-eye frame and a right-eye frame of a 2D video stream and sends them to the user device. The server does not render the remaining 3D models, which are already pre-stored on the user device; it only sends their metadata in sync. After receiving the left-eye and right-eye frames and the metadata, the user device uses the two frames as a background picture and, according to the metadata, renders its own pre-stored 3D models as a foreground picture. Finally, it mixes the foreground and background to produce a mixed VR frame of an output video stream containing a VR scene, and outputs it.

Description

System and method for transmitting media over a network

Technical field

The present invention relates to a system and method for transmitting media such as images and sound over a network, and in particular to a method for rendering the 3D objects of a virtual reality (VR) image on a user device, in which the 3D objects rendered on the user device are combined with a 2D video stream of the VR scene provided by a server.

Background technology

In the past few years, online gaming has become a worldwide trend. With the development of cloud computing systems and technologies, techniques in which a server streams game content to provide a service have also emerged.

In one conventional way of providing a cloud gaming service, the server is responsible for almost all of the computation. That is, to provide the service, the server must produce a virtual 3D environment containing multiple 3D objects that participants can move or control. In the known art, these 3D objects may include sound effects. According to the control actions of the participant (player), the server combines the virtual 3D environment with the 3D objects and renders them into a 2D game picture with stereo sound to be shown on the player's game machine. The server then transmits the rendered image and stereo sound to the player's device over the network as a 2D video stream with sound. After receiving it, the player's device only needs to decode and display the 2D video stream, without performing any additional 3D rendering computation. However, performing the rendering computation for many players on the same server overloads the server that performs the 3D rendering. In addition, because everything the player sees is transmitted as a lossy-compressed 2D video stream, the quality of both image and sound falls short of that of the original 3D objects, and the large network bandwidth consumed between the server and the player's device also becomes a major problem.

Virtual reality (VR) technology has recently become very popular. To give the human eye a VR visual experience, a virtual VR scene must contain one image intended for the left eye and another intended for the right eye. The present invention provides a system and method for transmitting media such as images and sound over a network, in which 3D objects rendered on the user device are combined with a 2D video stream of the VR scene provided by the server.

Summary of the invention

A primary object of the present invention is to provide a system and method for transmitting media such as images and sound over a network that reduces the load on the server, improves the quality of the images and sound presented on the user device, and saves communication bandwidth between the server and the user device. The method is characterized in that the user device renders 3D objects (also called 3D models) and combines them with the 2D video stream of the VR scene provided by the server, thereby rendering the 3D objects of a virtual reality (VR) image on the user device.

To achieve the above object, the invention provides a system and method for transmitting media over a network, the media comprising multiple images. The system includes a server and a user device, and the method includes the following steps:

Step (A): A virtual reality (VR) application is executed on a server to produce a virtual VR 3D environment comprising multiple 3D models, each 3D model being associated with a state indicating whether the 3D model is pre-stored on a user device;

Step (B): The server checks the states of the 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, the 3D models not pre-stored on the user device being encoded into the left-eye frame and the right-eye frame;

Step (C): The server sends at least the left-eye frame and the right-eye frame of the 2D video stream to the user device over the network. The server also sends the 3D models not pre-stored on the user device to the user device in a predetermined order. When the user device receives 3D models sent by the server, it stores them and sends a message to the server to change the state of those 3D models, indicating that they are now pre-stored on the user device; and

Step (D): The user device decodes the left-eye frame and the right-eye frame received from the server and uses them as a background picture on which it renders the 3D models that are pre-stored on the user device but not included in the left-eye frame and the right-eye frame, thereby producing a mixed VR frame of an output video stream containing a VR scene.

In one embodiment, in step (D), after decoding the left-eye frame and the right-eye frame received from the server, the user device further merges them into one merged VR frame and then uses the merged VR frame as the background picture on which it renders the 3D models that are pre-stored on the user device but not included in the left-eye frame and the right-eye frame, thereby producing the mixed VR frame of the output video stream containing the VR scene.

In one embodiment, the server further includes:

A VR scene transmitter, which is a library either statically linked into the VR application at compile time or dynamically linked to the VR application at run time; the VR scene transmitter contains a list of all the 3D models and the state of each 3D model, the state being one of "Not Ready", "Loading" (being downloaded), and "Ready for Client" (already downloaded by the user device); and

A VR scene server, which is a server program executed on the server together with the VR application; the VR scene server acts as a relay for messages between the VR scene transmitter and the user device, and also acts as a download server program from which the user device downloads the necessary 3D models from the server.
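By way of illustration only, the three-state list kept by the VR scene transmitter might be sketched as follows in Python. The class and method names (`ModelState`, `SceneTransmitterList`, `request_download`, `mark_ready`) are hypothetical and do not appear in the specification; the transitions follow the behaviour the specification describes for a download instruction and for the client's ready report.

```python
from enum import Enum


class ModelState(Enum):
    NOT_READY = "Not Ready"
    LOADING = "Loading"
    READY_FOR_CLIENT = "Ready for Client"


class SceneTransmitterList:
    """Hypothetical per-client list of 3D model states."""

    def __init__(self, names):
        # Nothing is stored on the user device at the start.
        self.states = {name: ModelState.NOT_READY for name in names}

    def request_download(self, name):
        """Return True if a download instruction should be sent now."""
        if self.states[name] is ModelState.NOT_READY:
            self.states[name] = ModelState.LOADING
            return True
        # Already Loading or Ready for Client: do not send the instruction again.
        return False

    def mark_ready(self, name):
        """Called when the user device reports that the model is stored locally."""
        self.states[name] = ModelState.READY_FOR_CLIENT


models = SceneTransmitterList(["house", "table"])
first = models.request_download("house")   # instruction is sent
second = models.request_download("house")  # suppressed, state is already Loading
models.mark_ready("house")
```

Keeping the state per user device matches the arrangement in which each user device has its own application and scene transmitter.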

In one embodiment, the user device further includes:

A VR scene client, which is a program running on the user device that produces the output video stream and communicates with the server over the network;

A frame combiner, which merges the left-eye frame and the right-eye frame into the merged VR frame; and

A VR scene cache, which stores at least the 3D models previously downloaded from the server.
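The frame combiner can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the merged VR frame places the left-eye frame and the right-eye frame side by side; this section does not fix the layout, and the function name `combine_frames` is hypothetical. Frames are modelled as lists of pixel rows.

```python
def combine_frames(left, right):
    """Merge two equally tall frames (lists of pixel rows) side by side.

    The side-by-side layout is an assumption made for this sketch.
    """
    if len(left) != len(right):
        raise ValueError("left and right frames must have the same height")
    return [left_row + right_row for left_row, right_row in zip(left, right)]


left_eye = [["L"] * 2 for _ in range(2)]
right_eye = [["R"] * 2 for _ in range(2)]
merged = combine_frames(left_eye, right_eye)
```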

Brief description of the drawings

Fig. 1 is a schematic diagram of a typical embodiment of the system of the present invention for transmitting media over a network;

Fig. 2 is a schematic diagram of an embodiment of the system architecture of the present invention;

Fig. 3A is a flow chart of an embodiment of the method of the present invention for transmitting media over a network;

Fig. 3B is a flow chart of another embodiment of the method of the present invention for transmitting media over a network;

Figs. 4A, 4B and 4C are schematic diagrams of an embodiment of how the method of the present invention transmits the video stream and 3D models;

Figs. 5A, 5B and 5C are schematic diagrams of an embodiment of how the method of the present invention determines which 3D models must be encoded into a frame;

Figs. 6A, 6B and 6C are schematic diagrams of an embodiment of how the method of the present invention transmits the video stream with sound and 3D sounds;

Figs. 7A, 7B and 7C are schematic diagrams of an embodiment of how the method of the present invention determines which 3D sounds must be encoded into the frames of the video stream with sound;

Fig. 8 is a schematic diagram of an embodiment of the system architecture of the virtual reality (VR) scene system of the present invention;

Fig. 9 is a schematic diagram of an embodiment illustrating the function of the frame combiner of the virtual reality (VR) scene system of the present invention;

Fig. 10 is a schematic diagram of a second embodiment of the system architecture of the VR scene system of the present invention;

Fig. 11 is a schematic diagram of a third embodiment of the system architecture of the VR scene system of the present invention.

Description of reference numerals: 1: server; 3: network (base station); 4: network; 21: user device (smartphone); 22: user device (notebook computer); 23: user device (desktop computer); 51: user's eyes; 52: projection plane; 70: server; 71, 71a: person; 72, 72a: house; 73, 73a: frame; 74: user device; 75: video stream frame; 81, 81a: sound; 82, 82a: sound; 83, 83a, 1711, 1712, 1713: frame; 85: video stream frame; 100, 1100: application; 110, 1110: scene transmitter (library); 120, 1120: scene server; 121: data packet device; 170, 1170: scene client (program); 1111, 1171: frame combiner; 190, 1190: scene cache; 60 to 67, 661, 60a to 67a, 661a: steps; 101 to 114, 122, 124, 172, 174, 176, 192, 194, 1101, 1112 to 1115, 1122, 1124, 1172, 1174, 1176, 1192, 1194: paths.

Detailed description of the embodiments

To describe the system and method for transmitting media over a network proposed by the present invention more clearly, a detailed description is given below with reference to the drawings.

One application of the present invention is online gaming, in which a player uses a user device to play a game on a server over a network, and the server produces video on the user device in response to the player's commands. For example, when a player performs an action on the user device, the action is sent to the server, which computes an image and sends it back to the user device. In many online games, the 2D image produced by the server includes the 3D rendering of other objects within the field of view.

In the present invention, the server provides the 3D models and 3D sounds required by the user device, so that the 3D rendering of objects within the field of view is divided between the server and the user device. For example, the server provides some or all of the 3D models and 3D sounds to the user device, together with the metadata associated with each 3D model or 3D sound, such as position, orientation, and state data.

For example, at the start of a game, all game-related images on the user device (including the related 3D rendering) are produced by the server and delivered over the network as a 2D video stream with stereo sound. The system of the present invention pushes media such as the 3D models and 3D sounds within the field of view, together with their rendering information, to the user device over the network, pushing nearer objects (closer to the eyes) first. The system renders 3D models and 3D sounds on the user device whenever possible and falls back to rendering them on the server only when that is not possible.

Once a 3D model or 3D sound is stored on the user device, the server only needs to provide the metadata of the object (3D model or 3D sound) to the user device. The user device can then render the object itself and present the result on top of any 2D video with stereo sound provided by the server. Unless the user device requests it, the server does not render that 3D model or 3D sound. This arrangement saves GPU computation on the server, and the server can maintain a dynamic database of 3D models and 3D sounds to make communication with the user more efficient.

In the present invention, what the user device presents is a combination of (a) the 3D scene rendered on the server, delivered to the client as a 2D video stream with stereo sound and played by the user device, and (b) the 3D models and 3D sounds downloaded from the server, stored on the user device, and rendered by the user device itself. Mixing the 2D video stream with stereo sound with the 3D models and 3D sounds rendered on the user device creates a vivid 3D scene and surround sound while reducing bandwidth usage.
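The combination of (a) and (b) can be illustrated with a minimal Python sketch in which the decoded server frame is the background and the locally rendered models form the foreground. The function name `composite` and the use of `None` for a transparent foreground pixel are assumptions of this sketch, not details taken from the specification.

```python
def composite(background, foreground):
    """Overlay foreground pixels on the background frame.

    A foreground pixel of None is treated as transparent, so the
    server-rendered background shows through at that position.
    """
    return [
        [bg if fg is None else fg for bg, fg in zip(bg_row, fg_row)]
        for bg_row, fg_row in zip(background, foreground)
    ]


# Background: decoded 2D video frame from the server.
background = [["sky", "sky"], ["room", "room"]]
# Foreground: a model rendered by the user device itself.
foreground = [[None, "person"], [None, None]]
mixed = composite(background, foreground)
```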

In one embodiment, the 2D video stream with stereo sound sent to the user device carries the metadata of the 3D models and 3D sounds. The user device checks whether it already has each 3D model and 3D sound; if not, it downloads the required 3D models and 3D sounds from the server. After downloading, the user device stores them and builds a list for use when reconstructing the scene later. In this way, the latency and high bandwidth demands of video streaming are alleviated, and because rendering is done on the user device, better image quality is obtained (there is no video compression).

The metadata allows the user device to correctly mix its own rendering of 3D models and 3D sounds with the 2D video stream with stereo sound provided by the server, without omitting or duplicating any 3D model or 3D sound. As stated above, once the user device has stored all the required 3D models and 3D sounds, it can reconstruct the complete 3D scene and sound, and the server no longer needs to render anything until a new 3D model or 3D sound appears that is not stored on the user device. When a new 3D model is encountered, the server renders the new 3D model and all objects behind it until the user device can render the new 3D model itself. Likewise, when a new 3D sound is encountered, the server renders that 3D sound until it becomes available on the user device.

The user device stores (caches) downloaded 3D models and 3D sounds on its own storage device as far as possible, to avoid downloading them again in later runs, which further reduces network bandwidth cost. If they cannot be stored, downloading and rendering are performed at run time.

Fig. 1 is a schematic diagram of a typical embodiment of the system of the present invention for transmitting media over a network. Server 1 executes an application that provides a service, which may be (but is not limited to) a cloud online gaming service. Multiple user devices 21, 22, 23 can connect (log in) to server 1 through a network 4 to use the service provided by the application running on server 1. In this embodiment, network 4 is the Internet, and user devices 21, 22, 23 are any electronic devices that can connect to the Internet, such as (but not limited to) a smartphone 21, a tablet, a notebook computer 22, a desktop computer 23, a video game console, or a smart TV. Some user devices 21, 22 connect to network 4 wirelessly through a mobile base station, while other user devices connect to network 4 by wire through a router. The application running on the server produces a virtual 3D environment containing multiple 3D models and 3D sounds, each 3D model or 3D sound being associated with a state indicating whether it is pre-stored on a user device 21, 22, 23. In a preferred embodiment of the invention, each user device has its own corresponding application; that is, one application serves only one user device, but multiple applications can run on the same server at the same time to serve multiple user devices. As shown, user devices 21, 22, 23 connect to server 1 through network 4 to obtain media produced by the application and containing at least some 3D models and 3D sounds. The system architecture and its features are described in detail in Fig. 2 and the related description.

Fig. 2 is a schematic diagram of an embodiment of the system architecture of the present invention.

In the present invention, application 100 runs on a server 1 and produces the rendering results of 3D images and 3D sounds; it is typically a 3D game. The 3D scene transmitter 110 is a library that is statically linked with application 100 at compile time or dynamically linked with it at run time. The 3D scene client (program) 170 is a program executed on the user devices 21, 22, 23 that produces and outputs the 3D image and 3D sound rendering results generated by application 100. In this embodiment, each user device 21, 22, 23 has its own separate application 100 and scene transmitter 110.

In the present invention, the 3D scene client 170 and the 3D scene cache 190 constitute the client-side program and method, which make use of the user device's own computing power to render 3D models and 3D sounds.

The 3D scene server 120 is a server program executed on server 1 together with application 100. It acts as a relay for messages between the 3D scene transmitter 110 on server 1 and the 3D scene client 170 on the user devices 21, 22, 23. It also acts as a file download server from which the 3D scene client 170 on the user devices 21, 22, 23 downloads the necessary 3D models and 3D sounds from server 1. The 3D scene transmitter 110 keeps a list of all 3D models and 3D sounds and the state of each; the state of each 3D model or 3D sound is one of (1) "Not Ready", (2) "Loading" (being downloaded), and (3) "Ready for Client" (already downloaded by the user device).

The main program of application 100 sends 3D scene information to the 3D scene transmitter 110 by calling the library's API (path 101 in Fig. 2). This 3D scene information includes the name, position, velocity, attributes, orientation, and all other data required to render the 3D models and 3D sounds. After receiving this data, the 3D scene transmitter 110 performs the following procedure.

Step (a): For 3D models, sort all 3D models that need to be rendered. They may be sorted from near to far relative to a virtual position (such as the 3D projection plane or the user's eyes).

For 3D sounds, sort all 3D sounds that need to be rendered. They may likewise be sorted from near to far relative to a virtual position (such as the 3D projection plane or the user's eyes).

In some cases, a 3D model A in the 3D scene may enclose or overlap another 3D model B. For example, model A may be a house and model B a table inside the house. In that case, which model is closer to the virtual position is ambiguous, so model A and model B are treated as a single 3D model, which may be called 3D model (A+B).

Known information about the scene can be used to assist the sorting. For example, the ground in a game can be regarded as a large, flat 3D model beneath the other 3D objects. The user's eyes are usually above the ground, so the ground 3D model needs special treatment in the sorting to prevent it from being displayed in front of other 3D models.

Step (b): For 3D models, starting from the nearest (closest to the eyes), find the first 3D model "M" that does not have the "Ready for Client" state; in other words, the state of the first 3D model "M" is "Not Ready" (hereinafter, the "Not Ready" state is abbreviated as the NR state). It is also possible that no such 3D model exists (for example, when all 3D models to be displayed are marked "Ready for Client").

For 3D sounds, starting from the nearest (closest to the eyes), find the first 3D sound "S" that does not have the "Ready for Client" state; in other words, the state of the first 3D sound "S" is "Not Ready" (the NR state). It is also possible that no such 3D sound exists (for example, when all 3D sounds to be played are marked "Ready for Client").

Step (c): For 3D models, the server renders 3D model M and all 3D models behind it, that is, all 3D models farther from the eyes than M. (If there is no 3D model M, a black screen is shown instead.) The rendered result is encoded as a frame of the 2D video stream.

For 3D sounds, the server renders (plays) all 3D sounds that do not have the "Ready for Client" state (if there is no such 3D sound, silence is produced), and the rendered result is encoded as the stereo sound accompanying the 2D video stream frame of step (c). Note: 3D sounds following 3D sound S are rendered only if their state is not "Ready for Client", which differs from the treatment of 3D models in step (c).

Step (d): Send the following six pieces of information to the 3D scene server 120 (path 112): [Info 112-A], [Info 112-B], [Info 112-C], [Info 112-D], [Info 112-E], and [Info 112-F]. The 3D scene server 120 forwards this information to the 3D scene client 170 (path 122).

[Info 112-A] is the state information (or metadata) of all 3D models in front of 3D model M. Note that there may be no such models. These models all have the "Ready for Client" state, meaning that they are already preloaded on the client device and the 3D scene client (program) 170 on client device 21, 22, 23 can render them itself. To reduce transmission bandwidth, the 3D scene transmitter 110 need not send the full state information; it only needs to send the differences between the state for this rendering and that for the previous rendering.

[Info 112-B]: If the server has found a 3D model M whose state for the user device is "Not Ready", the server changes that state to "Loading" and issues a download instruction for 3D model M, asking the client to download it. If the state is already "Loading", no instruction is issued, because the download instruction has already been sent.

[Info 112-C] is the encoded video stream frame from step (c).

[Info 112-D] is the state information (or metadata) of all 3D sounds whose state is "Ready for Client" (there may be no such 3D sounds). These sounds are already preloaded on the client device, and the 3D scene client (program) 170 on client device 21, 22, 23 can render (play) them itself. To reduce transmission bandwidth, the 3D scene transmitter 110 need not send the full state information; it only needs to send the differences between the state for this rendering and that for the previous rendering.

[Info 112-E]: If the server has found a 3D sound S whose state for the user device is "Not Ready", the server changes that state to "Loading" and issues a download instruction for 3D sound S, asking the client to download it. If the state is already "Loading", no instruction is issued, because the download instruction has already been sent.

[Info 112-F] is the encoded stereo sound from step (c).

Steps (a) to (d) are repeated whenever the main program of application 100 updates the 3D scene transmitter 110 with new 3D scene data. Typically, the main program of application 100 updates this data on every rendering.
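Steps (a) to (c) for 3D models can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each model is reduced to a hypothetical `(name, position, state)` tuple, distance is the Euclidean distance to the viewpoint, and the merging of overlapping models and the special handling of the ground are omitted. The function name `plan_frame` is not from the specification.

```python
from math import dist

READY = "Ready for Client"


def plan_frame(models, viewpoint):
    """Split models between client and server for one rendering.

    models: list of (name, position, state) tuples.
    Returns (client_side, first_missing, server_side), nearest first.
    """
    # Step (a): sort from near to far relative to the viewpoint.
    ordered = sorted(models, key=lambda m: dist(m[1], viewpoint))
    # Step (b): find the first model M that is not Ready for Client.
    for i, (name, _position, state) in enumerate(ordered):
        if state != READY:
            # Step (c): the server renders M and everything behind it,
            # even models behind M that are already on the client.
            return ([m[0] for m in ordered[:i]], name, [m[0] for m in ordered[i:]])
    # No model M exists: the client renders everything itself.
    return ([m[0] for m in ordered], None, [])


scene = [
    ("house", (0, 0, 9), "Ready for Client"),
    ("person", (0, 0, 2), "Ready for Client"),
    ("table", (0, 0, 5), "Not Ready"),
]
client_side, first_missing, server_side = plan_frame(scene, (0, 0, 0))
```

In this example the house is already stored on the client but is still rendered by the server, because it lies behind the missing table and would otherwise be drawn in front of the video background in the wrong order.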

Once the 3D scene client 170 receives the above data, it performs the rendering procedure described below.

Step (i): Decode the video frame of [Info 112-C] and use this frame as the background for the subsequent 3D model rendering. At the same time, decode the stereo sound of [Info 112-F] that accompanies the video and use it as the background sound for the subsequent 3D sound rendering.

Step (ii): Render all the 3D models in [Info 112-A] on the video frame decoded in step (i). To reduce network bandwidth usage, the 3D scene client 170 stores the [Info 112-A] information in memory, so that next time the 3D scene transmitter 110 only needs to send the differences in [Info 112-A] state between the next rendering and this rendering, rather than the full state information. Likewise, render all the 3D sounds belonging to [Info 112-D] and mix them with the stereo sound decoded in step (i). To reduce network bandwidth usage, the 3D scene client 170 stores the [Info 112-D] information in memory, so that next time the 3D scene transmitter 110 only needs to send the differences in [Info 112-D] state between the next rendering and this rendering, rather than the full state information.

Step (iii): Mix the video frame with stereo sound sent by the server with the 3D models and 3D sounds rendered by the 3D scene client 170 in step (ii), and output the mixed result as an output video stream with sound (path 176).
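The difference-only transmission of [Info 112-A] and [Info 112-D] can be sketched in Python as follows. The specification does not define a wire format, so the dictionary layout with `changed` and `removed` keys, and the function names `diff_state` and `apply_diff`, are assumptions made for illustration.

```python
def diff_state(previous, current):
    """Transmitter side: keep only what changed since the last rendering."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return {"changed": changed, "removed": removed}


def apply_diff(stored, delta):
    """Client side: rebuild the full state from the stored copy and a delta."""
    merged = dict(stored)
    merged.update(delta["changed"])
    for name in delta["removed"]:
        merged.pop(name, None)
    return merged


# State stored by the client after the previous rendering.
last = {"person": {"pos": (0, 0, 2)}, "table": {"pos": (0, 0, 5)}}
# State for this rendering: only the person has moved.
now = {"person": {"pos": (1, 0, 2)}, "table": {"pos": (0, 0, 5)}}
delta = diff_state(last, now)
rebuilt = apply_diff(last, delta)
```

Only the entry for the moved object is carried in `delta`; the client recovers the unchanged entries from the copy it kept in memory.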

If [Info 112-B] is present, the 3D scene client 170 processes 3D model M according to the following procedure.

Step (I): Search the 3D scene cache 190 (path 174). The 3D scene cache 190 contains a database of 3D models previously downloaded and stored on the user device 21, 22, 23.

Step (II): If 3D model M is already in the 3D scene cache 190, go to step (V).

Step (III): If 3D model M is not in the 3D scene cache 190, the 3D scene client 170 sends a download request to the 3D scene server 120 (path 172), and the 3D scene server 120 returns the data of 3D model M to the 3D scene client 170 (path 124).

Step (IV): Once a 3D model has been completely downloaded, the 3D scene client 170 stores it in the 3D scene cache 190 (path 194), so that it does not need to be downloaded again the next time it is required.

Step (V): The 3D scene client 170 retrieves 3D model M from the 3D scene cache 190 (path 192).

Step (VI): As soon as the download is complete (or if the model was already downloaded earlier), the 3D scene client 170 can retrieve 3D model M. The 3D scene client 170 sends a "3D model is ready on client" message to the 3D scene server 120 (path 113), and the 3D scene server 120 forwards this message to the 3D scene transmitter 110 (path 114).

Step (VII): Once the 3D scene transmitter 110 receives this message, it changes the state of 3D model M from "Loading" to "Ready for Client".

Step (VIII): At the next rendering, the 3D scene transmitter 110 knows that 3D model M is now preloaded on the user device and therefore asks the 3D scene client 170 to render it itself, so server 1 no longer needs to render 3D model M.
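The client-side portion of this procedure, steps (I) to (VI), can be sketched in Python as follows. The class `SceneClient` and its callbacks `download` and `notify_server` are hypothetical stand-ins for the scene cache lookup, the download request on path 172, and the ready message on path 113; the network transport itself is not modelled.

```python
class SceneClient:
    """Hypothetical sketch of the cache-then-download logic of the scene client."""

    def __init__(self, download, notify_server):
        self.cache = {}                     # stands in for the 3D scene cache
        self.download = download            # fetches asset data from the scene server
        self.notify_server = notify_server  # sends the "ready on client" message

    def prepare(self, name):
        # Steps (I)-(III): search the cache, download only on a miss.
        if name not in self.cache:
            # Step (IV): store the downloaded asset for later reuse.
            self.cache[name] = self.download(name)
        # Step (V): retrieve the asset from the cache.
        asset = self.cache[name]
        # Step (VI): report that the asset is ready on the client.
        self.notify_server(name)
        return asset


downloads, ready_reports = [], []


def fake_download(name):
    downloads.append(name)
    return f"<data for {name}>"


client = SceneClient(fake_download, ready_reports.append)
client.prepare("table")  # cache miss: downloaded once
client.prepare("table")  # cache hit: no second download
```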

If [Info 112-E] is present, the 3D scene client 170 prepares 3D sound S according to the following procedure (similar to that described above for [Info 112-B]).

Step (I): Search the 3D scene cache 190 (path 174). The 3D scene cache 190 contains a database of 3D sounds previously downloaded and stored on the user device 21, 22, 23.

Step (II): If the 3D sound is already in the 3D scene cache 190, go to step (V).

Step (III): If the 3D sound is not in the 3D scene cache 190, the 3D scene client 170 sends a download request to the 3D scene server 120 (path 172), and the 3D scene server 120 returns the data of the 3D sound to the 3D scene client 170 (path 124).

Step (IV): Once a 3D sound has been completely downloaded, the 3D scene client 170 stores it in the 3D scene cache 190 (path 194), so that it does not need to be downloaded again the next time it is required.

Step (V): The 3D scene client 170 retrieves 3D sound S from the 3D scene cache 190 (path 192).

Step (VI): As soon as the download is complete (or if the sound was already downloaded earlier), the 3D scene client 170 can retrieve 3D sound S. The 3D scene client 170 sends a "3D sound is ready on client" message to the 3D scene server 120 (path 113), and the 3D scene server 120 forwards this message to the 3D scene transmitter 110 (path 114).

Step (VII): Once the 3D scene transmitter 110 receives this message, it changes the state of 3D sound S from "Loading" to "Ready for Client".

Step (VIII): At the next rendering, the 3D scene transmitter 110 knows that 3D sound S is now preloaded on the user device and therefore asks the 3D scene client 170 to render (play) it itself, so server 1 no longer needs to render 3D sound S.

At the very beginning, user devices 21, 22, 23 hold no 3D models or 3D sounds, so the 3D scene transmitter 110 renders all 3D models and 3D sounds and encodes the result as a 2D video stream with stereo sound. The 3D scene transmitter 110 issues download instructions for 3D models [Info 112-B] and for 3D sounds [Info 112-E], starting from those nearest the 3D projection plane (or the user's eyes), and the 3D scene client 170 downloads each 3D model or 3D sound from the 3D scene server 120 or retrieves them one by one from the 3D scene cache 190. As more 3D models and 3D sounds become available to the 3D scene client 170, the 3D scene transmitter 110 automatically tells the 3D scene client 170 to render those models and sounds itself and reduces the number of 3D models and 3D sounds that it renders. The encoded 2D video stream therefore contains fewer and fewer 3D models and 3D sounds, until finally all 3D models and 3D sounds are available on the 3D scene client 170. At that stage only a black screen without sound remains; in other words, server 1 no longer needs to send the 2D video stream to user devices 21, 22, 23, and the communication bandwidth used between server 1 and user devices 21, 22, 23 is greatly reduced.
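The gradual hand-over from server rendering to client rendering can be illustrated with a small Python simulation. It assumes, for simplicity, that the models are already sorted from near to far and that exactly one download completes between renderings; the function name `simulate_handoff` is hypothetical.

```python
def simulate_handoff(names_near_to_far):
    """Return how many models the server renders at each successive rendering."""
    ready = set()   # models already stored on the user device
    history = []
    while True:
        # Index of the first model that is not yet on the user device.
        missing_idx = next(
            (i for i, name in enumerate(names_near_to_far) if name not in ready),
            None,
        )
        # The server renders that model and everything behind it.
        server_side = [] if missing_idx is None else names_near_to_far[missing_idx:]
        history.append(len(server_side))
        if missing_idx is None:
            return history
        # Assumption: the nearest missing model finishes downloading next.
        ready.add(names_near_to_far[missing_idx])


history = simulate_handoff(["person", "table", "house"])
```

The server-side count falls by one at each rendering and reaches zero, at which point the server no longer needs to encode any model into the video stream.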

In the present invention, when a new 3D model N appears in the scene, the 3D scene transmitter 110 (1) notifies the 3D scene client 170 to render only the 3D models located in front of the new 3D model N (relative to the user's eyes), (2) notifies the 3D scene client 170 to download the new 3D model N, and (3) renders the new 3D model N and all models behind it and encodes the result as a 2D video stream with sound. This 2D video stream with sound is then transmitted to the 3D scene client 170, so that the 3D scene client 170 can keep reproducing the 3D image and sound rendering result of the application 100 until the 3D model N is ready on the user device.

When a new 3D sound T appears in the scene, the 3D scene transmitter 110 (1) notifies the 3D scene client 170 to download the new 3D sound T, and (2) renders the new 3D sound T and encodes the result as stereo sound. This stereo sound is then transmitted with the 2D video stream to the 3D scene client 170, so that the 3D scene client 170 can keep reproducing the 3D image and sound rendering result of the application 100 until the 3D sound T is ready on the user device. In this procedure only the new 3D sound T is rendered; the 3D scene transmitter 110 does not need to render again the other 3D sounds behind 3D sound T. The reason is that sound differs in nature from images: an image blocks the display of the images behind it, but a sound does not.

Background music can be regarded as a 3D sound with a predetermined 3D position. So that the background music is downloaded as early as possible, this predetermined 3D position should be defined as close to the user's eyes as possible.

To reduce the server load, or to avoid noise caused by unstable network data transmission, the server may forgo encoding any 3D sound into the video. In this case, a 3D sound is played on the user device only after it has been downloaded and pre-stored in the user device.

For 3D sounds, the server 1 checks the state of each 3D sound to determine which 3D sounds need to be encoded into a 2D video stream with stereo sound; the 3D sounds not pre-stored in the user device are encoded into the video frames. When a 3D sound is encoded as stereo sound in the video frames, the volumes of its left and right channels are determined by its position and velocity relative to the user's ears. Background music may be defined as a 3D sound effect at a predetermined position.
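The description states only that the left and right channel volumes depend on the position and velocity of the sound relative to the user's ears; it gives no formula. The sketch below is one possible position-only illustration using inverse-distance attenuation per ear. The function name `stereo_gains`, the ear coordinates, the `min_dist` parameter and the attenuation law are all assumptions, and the velocity term is deliberately left out.

```python
import math


def stereo_gains(sound_pos, left_ear, right_ear, min_dist=1.0):
    """Return (left_gain, right_gain), each in the range (0, 1].

    Assumed model: gain falls off as min_dist / distance, clamped so that
    sources closer than min_dist play at full volume.
    """
    def gain(ear):
        d = math.dist(sound_pos, ear)
        return min_dist / max(d, min_dist)

    return gain(left_ear), gain(right_ear)
```

With this model a sound located to the user's left is louder in the left channel, and a sound straight ahead is equally loud in both channels.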

Fig. 3A is a flowchart of an embodiment of the method of the present invention for transmitting media over a network. When transmission of images over the network begins (step 60), an application is executed on a server to generate a virtual 3D environment containing multiple 3D models (step 61). Each 3D model is associated with a state that indicates whether the 3D model is pre-stored in the user device.

The server then checks the states of the 3D models (step 62) to determine which 3D models need to be encoded into a frame of the 2D video stream; the 3D models not pre-stored in the user device are encoded into the frame. Taking a virtual position (typically the 3D projection plane or the user's eyes) as the reference, the server checks the state of each 3D model one by one, from near to far. During this check, when the first 3D model not pre-stored in the user device is found, that 3D model M is marked with an NR state; this 3D model M and all 3D models behind it are then encoded into the frame, regardless of whether the models behind it are pre-stored in the user device (step 63). Whenever the position of any 3D model changes, or the virtual position used as the sorting reference changes, the check is performed again, and whether a 3D model must be encoded into the video frame is decided according to the latest check result.

Step 64: After the 2D video stream frame has been encoded, the server transmits the frame and the 3D models not pre-stored in the user device (that is, the 3D model with the NR state and all 3D models behind it) to the user device in a predetermined order, namely from the point nearest to the 3D projection plane (or the user's eyes) to the point farthest from it. Once the user device receives the 2D video stream frame (step 65), it decodes the frame sent by the server and uses it as the background for rendering the 3D models that are pre-stored in the user device but not included in the frame, thereby producing a mixed frame of an output video stream with sound (step 66). When the user device receives a 3D model sent by the server, it stores the 3D model and then sends a message to the server, notifying the server to change the state of the 3D model to "now pre-stored in the user device". Thereafter, the user device mixes the video stream sent by the server with its own rendering result and outputs the mix as the new video.

In step 62, when a new 3D model appears in the 3D environment, the new 3D model and all 3D models behind it are encoded into the frame, regardless of whether the 3D models behind it are pre-stored in the user device.

In step 64, the server also sends the status information (or metadata) of the 3D models not encoded into the video stream frame to the user device. On receiving this status information, the user device checks it as follows: if any 3D model in the received status information is not pre-stored in the user device, the user device sends a request to the server to download that 3D model (step 661). The status information includes the metadata of each 3D model not encoded into the frame; each metadata record includes a name, a position, a velocity, an orientation and an attribute of the 3D model, together with the state of each 3D model.
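As a rough illustration of the metadata record and of the check in step 661, the following sketch may help. The field types, the class name `ModelMeta` and the function `models_to_request` are assumptions made for the example; the description only lists which items the metadata contains.

```python
from dataclasses import dataclass


@dataclass
class ModelMeta:
    """One metadata record for a 3D model not encoded into the frame."""
    name: str
    position: tuple
    velocity: tuple
    orientation: tuple
    attribute: dict
    state: str  # "Not Ready", "Loading" or "Ready for Client"


def models_to_request(status_info, local_store):
    """Names listed in the status information that are not stored locally.

    status_info: iterable of ModelMeta received from the server
    local_store: mapping of model name -> model data held on the user device
    """
    return [m.name for m in status_info if m.name not in local_store]
```

The user device would then send one download request per returned name.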

Fig. 3B is a flowchart of another embodiment of the method of the present invention for transmitting media over a network. When transmission of sound over the network begins (step 60a), an application is executed on a server to generate a virtual 3D environment containing multiple 3D sounds (step 61a). Each 3D sound is associated with a state that indicates whether the 3D sound is pre-stored in the user device.

The server then checks the states of the 3D sounds (step 62a) to determine which 3D sounds need to be encoded into a frame of the 2D video stream; the 3D sounds not pre-stored in the user device are encoded into the frame. Taking a virtual position (typically the 3D projection plane or the user's eyes) as the reference, the server checks the state of each 3D sound one by one, from near to far. During this check, when a 3D sound not pre-stored in the user device is found, that 3D sound is marked with an NR state.

Step 64a: After the video stream frame containing sound has been encoded, the server transmits this 2D video stream frame with sound and the 3D sounds not pre-stored in the user device (that is, the 3D sounds with the NR state) to the user device in a predetermined order, namely from the point nearest to the 3D projection plane (or the user's eyes) to the point farthest from it. Once the user device receives the video stream frame containing sound (step 65a), it decodes the audio (that is, the sound) contained in the video stream and uses it as the background for rendering the 3D sounds that are pre-stored in the user device but not contained in the video stream frame, thereby producing mixed audio (step 66a). When the user device receives a 3D sound sent by the server, it stores the 3D sound and then sends a message to the server, notifying the server to change the state of the 3D sound to "now pre-stored in the user device". Thereafter, the user device mixes the audio in the video stream sent by the server with the result of rendering (playing) the 3D sounds itself, and outputs the mix as the new audio.

In step 62a, when a new 3D sound appears in the 3D environment, the new 3D sound is encoded into the 2D video stream frame with sound. However, the new 3D sound does not affect whether other 3D sounds are rendered, which differs from the handling of 3D models in step 62 above.

In step 64a, the server also sends the status information of the 3D sounds not encoded into the frame to the user device. On receiving this status information, the user device checks it as follows: if any 3D sound in the received status information is not pre-stored in the user device, the user device sends a request to the server to download that 3D sound (step 661a). The status information includes the metadata of each 3D sound not encoded into the frame; each metadata record includes a name, a position, a velocity, an orientation and an attribute of the 3D sound, together with the state of each 3D sound.

Figs. 4A, 4B and 4C are schematic diagrams of an embodiment showing how the method of the present invention transmits the video stream and the 3D models.

As shown in Fig. 4A, when the user device 74 first logs in to the application 70 running on the server, no 3D model is pre-stored in the user device. The server therefore renders all 3D models (including a person 71 and a house 72 behind the person) that should be displayed on the screen of the user device, encodes the rendering result as a 2D video stream frame 73, and then transmits this frame 73 to the user device 74. At this stage the frame 73 contains both the person 71 and the house 72, and the user device 74 simply outputs the frame 73 without rendering any other object.

Then, as shown in Fig. 4B, the server 70 starts sending the 3D models to the user device, beginning with the 3D model nearest to the 3D projection plane of the user device screen. In this embodiment the person 71 is nearer to the 3D projection plane (or the user's eyes) than the house 72, so the 3D model of the person 71 is sent to the user device 74 first. After the 3D model of the person 71 has been transmitted and stored on the user device 74, the user device 74 sends a message to the server 70 to report that the 3D model of the person 71 is now pre-stored in the user device 74. The server 70 then renders the house 72, encodes the rendering result as a 2D video stream frame 73a, and transmits this frame 73a together with the metadata of the person 71a to the user device 74. The user device 74 then automatically renders the person from the metadata and combines the rendered person with the frame 73a (which contains the house) to obtain the same output result. This procedure (for example, the server sending 3D models to the user device 74 one at a time) is repeated until all 3D models that the client needs to display have been transmitted and pre-stored in the user device 74.

As shown in Fig. 4C, once the user device 74 holds all 3D models (including the 3D models of the person and the house), the server no longer needs to perform any rendering, nor does it need to transmit video stream frames (component 75). The server only needs to transmit the metadata of the 3D models (including the person 71a and the house 72a) to the user device 74. The user device can render all 3D models itself to obtain the same output result.

Figs. 6A, 6B and 6C are schematic diagrams of an embodiment showing how the method of the present invention transmits the video stream with sound and the 3D sounds.

As shown in Fig. 6A, when the user device 74 first logs in to the application 70 running on the server, no 3D sound is pre-stored in the user device. The server therefore renders all 3D sounds (including a sound 81 and a sound 82 behind it) that should be played on the speakers of the user device, encodes the rendering result as a video stream frame 83 with sound, and then transmits this frame 83 to the user device 74. At this stage the video stream frame 83 with sound contains both the sound 81 and the sound 82, and the user device 74 simply outputs this frame 83 without rendering (playing) any other sound.

Then, as shown in Fig. 6B, the server 70 starts sending the 3D sounds to the user device, beginning with the 3D sound nearest to the 3D projection plane of the user device screen. In this embodiment the sound 81 is nearer to the 3D projection plane (or the user's eyes) than the sound 82, so the 3D sound of the sound 81 is sent to the user device 74 first. After the 3D sound of the sound 81 has been transmitted and stored on the user device 74, the user device 74 sends a message to the server 70 to report that the sound 81 is now pre-stored in the user device 74. The server 70 then renders the sound 82, encodes the rendering result as a 2D video stream frame 83a with sound, and transmits this frame 83a together with the metadata of the sound 81 to the user device 74. The user device 74 then automatically renders (plays) the sound from the metadata and combines the rendered sound with the frame 83a (which contains sound) to obtain the same output result. This procedure (for example, the server sending 3D sounds to the user device 74 one at a time) is repeated until all 3D sounds that need to be played on the speakers of the user device have been transmitted and pre-stored in the user device 74.

As shown in Fig. 6C, once the user device 74 holds all 3D sounds (including the 3D sounds of the sound 81 and the sound 82), the server no longer needs to perform any rendering, and the video stream frame (component 85) therefore contains only images and no sound. The server only needs to transmit the metadata of the 3D sounds (including the sound 81a and the sound 82a) to the user device 74. The user device can then render (play) all 3D sounds itself to obtain the same output result.

Figs. 5A, 5B and 5C are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D models must be encoded into the frame.

In the present invention, the server sorts all 3D models to be rendered in a predetermined order, namely from near to far relative to a virtual position (such as the 3D projection plane 52 of the user device screen or the user's eyes 51). As shown in Fig. 5A, four objects A, B, C and D need to be displayed on the screen of the user device; object A is nearest to the projection plane 52, followed in turn by object B, object C and object D. When the user device first logs in to the application running on the server, no 3D model is pre-stored in the user device, so the server renders all of objects A, B, C and D, encodes the rendering result as a video stream frame, and transmits this frame to the user device. At the same time, the server starts sending out the 3D models of objects A, B, C and D one by one in the predetermined order: the 3D model of object A is sent first, followed in turn by objects B, C and D, until all 3D models displayed on the user device have been transmitted.

As shown in Fig. 5B, after the 3D models of objects A and B have been pre-stored in the user device, when the server checks the states of the 3D models in the near-to-far order described above, it finds that object C is the first object not pre-stored in the user device. The server therefore renders object C and every object behind it (such as object D), regardless of whether the 3D model of object D is pre-stored in the user device. The server does not render the 3D models of objects A and B at this point, because objects A and B are both pre-stored in the user device and located in front of object C.

As shown in Fig. 5C, when a new object E appears in the virtual 3D environment created by the application, the server renders object E and all objects behind it, regardless of whether those objects are pre-stored in the user device. For example, in Fig. 5C the new object E is nearer to the 3D projection plane 52 than objects B, C and D. Although the 3D model of object B is already pre-stored in the user device, object B is located behind the new object E, so the server renders all of objects E, C, B and D, even if object B is only partly covered by the objects in front of it.
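The rule illustrated by Figs. 5B and 5C can be sketched as a split of the near-to-far list at the first model that is not ready on the client. This is a minimal illustration, not the patented implementation; the function name `split_models` is hypothetical, and the relative order of B, C and D behind E in the second call is assumed from the order in which the text lists them.

```python
def split_models(ordered_names, ready_on_client):
    """Split a near-to-far list at the first model not ready on the client.

    Returns (client_rendered, server_rendered). Every model behind the
    first missing one is server-rendered, even if the client already has it.
    """
    for i, name in enumerate(ordered_names):
        if name not in ready_on_client:
            return list(ordered_names[:i]), list(ordered_names[i:])
    return list(ordered_names), []


# Fig. 5B: A and B are pre-stored; C is the first model that is not.
fig_5b = split_models(["A", "B", "C", "D"], {"A", "B"})
# fig_5b == (["A", "B"], ["C", "D"])

# Fig. 5C: new object E lies in front of B, so B is server-rendered again.
fig_5c = split_models(["A", "E", "C", "B", "D"], {"A", "B"})
# fig_5c == (["A"], ["E", "C", "B", "D"])
```

Because the split depends on the order, it has to be recomputed whenever a model moves or the reference position changes.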

Figs. 7A, 7B and 7C are schematic diagrams of an embodiment showing how the method of the present invention determines which 3D sounds must be encoded into the video stream frame with sound.

In the present invention, the server sorts all 3D sounds to be rendered in a predetermined order, namely from near to far relative to a virtual position (such as the 3D projection plane 52 of the user device screen or the user's eyes 51). As shown in Fig. 7A, four 3D sounds A, B, C and D need to be played on the speakers of the user device; sound A is nearest to the projection plane 52, followed in turn by sound B, sound C and sound D. When the user device first logs in to the application running on the server, no 3D sound is pre-stored in the user device, so the server renders all of sounds A, B, C and D, encodes the rendering result as a video stream frame with sound, and transmits this frame to the user device. At the same time, the server starts sending out the data of sounds A, B, C and D one by one in the predetermined order: the 3D sound of sound A is sent first, followed in turn by sounds B, C and D, until all 3D sounds have been stored in the user device.

As shown in Fig. 7B, after the 3D sounds of sounds A and B have been pre-stored in the user device, when the server checks the states of the 3D sounds in the near-to-far order described above, it finds that sound C is the first sound not pre-stored in the user device. The server therefore renders sound C and every other sound behind it (such as sound D), and does not render the 3D sounds of sounds A and B, because at this stage sounds A and B are already pre-stored in the user device.

As shown in Fig. 7C, when a new sound E is added to the virtual 3D environment created by the application, sound E is rendered by the server, but this does not affect the rendering of the other sounds, which differs from the handling of 3D models in Fig. 5C. In Fig. 7C the new sound E is nearer to the 3D projection plane 52 than sounds B, C and D. Unlike the 3D models in Fig. 5C, the sounds pre-stored in the user device (such as sounds A and B) are still rendered by the user device, while the sounds not pre-stored in the user device (such as sounds E, C and D) are rendered by the server.
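Because sounds do not occlude one another, the decision for each sound in Fig. 7C is independent of its position in the list. A minimal sketch follows; the function name `split_sounds` is hypothetical, and the exact position of E in the list is an assumption that does not affect the result.

```python
def split_sounds(sound_names, ready_on_client):
    """Decide per sound: played by the client if pre-stored, else by the server.

    Returns (client_rendered, server_rendered), each in the input order.
    """
    client = [s for s in sound_names if s in ready_on_client]
    server = [s for s in sound_names if s not in ready_on_client]
    return client, server


# Fig. 7C: A and B are pre-stored; the new sound E, C and D are not.
fig_7c = split_sounds(["A", "E", "B", "C", "D"], {"A", "B"})
# fig_7c == (["A", "B"], ["E", "C", "D"])
```

Compared with the model case, sound B stays on the client even though the new sound E is nearer to the projection plane.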

The technology described above can also be applied to a virtual reality (VR) scene system, in which the 3D models and the VR video stream produced by a VR scene application executed on the server are transmitted to the user device over the network, as described below.

To give the human eye a VR visual experience, a virtual VR scene must include one image intended for the left eye and another image intended for the right eye. Fig. 8 is a schematic diagram of the system architecture of a first embodiment of the VR scene system of the present invention.

The VR scene server 1120 of the present invention is server software running on a server 1 together with a VR scene application 1100 (also called the VR application or the application), which generates a virtual VR 3D environment containing multiple 3D models. The VR scene application 1100 also runs on the server 1 and is typically a VR game. The VR scene server 1120 is a server program executed on the server 1 alongside the application 1100; it serves as a relay for messages between the VR scene transmitter 1110 on the server 1 and the VR scene client 1170 on the user device 21, 22, 23. The VR scene server 1120 also acts as a file download server from which the VR scene client 1170 of the user device 21, 22, 23 downloads the necessary 3D models. The VR scene transmitter 1110 is a link library that is either statically linked when the VR scene application 1100 is compiled or dynamically linked while the VR scene application 1100 is running. The VR scene client (program) 1170 is a program executed on the user device 21, 22, 23 that produces and outputs, on the user device, the 3D image rendering result generated by the VR scene application 1100. In this embodiment, each user device 21, 22, 23 has its own corresponding VR scene application 1100 and VR scene transmitter 1110. The VR scene transmitter 1110 stores a list of all 3D models together with a state indicating whether each 3D model is stored in the user device; the state of each 3D model is one of (1) "Not Ready", (2) "Loading" (being downloaded) and (3) "Ready for Client" (already downloaded by the user device).

The server 1 checks the states of these 3D models to determine which 3D models need to be encoded into a left-eye frame of a 2D video stream and which need to be encoded into a right-eye frame of the 2D video stream. In the present invention, the 3D models not pre-stored in the user device 21, 22, 23 are all encoded into the left-eye frame and the right-eye frame. To achieve this, the main program of the VR scene application 1100 sends VR scene information to the VR scene transmitter 1110 through API calls to the link library (path 1101 in Fig. 8). This VR scene information includes the name, position, velocity, attributes and orientation of each 3D model, and all other data needed to render it. After the VR scene transmitter 1110 receives this data, it performs the following procedure.

Step (a): Sort all 3D models that need to be rendered in the left-eye frame, for example from near to far relative to a virtual position (such as the 3D projection plane or the user's left eye).

Step (b): Starting from the nearest point (nearest to the user's left eye), find the first 3D model "M" that does not have the "Ready for Client" state; in other words, the state of this first 3D model "M" is "Not Ready" (hereinafter, the "Not Ready" state is referred to as the NR state). It is also possible that no such 3D model exists (for example, when all 3D models to be displayed are marked "Ready for Client").

Step (c): The server 1 renders the 3D model M and all 3D models behind it, that is, all 3D models farther from the left eye than M. (If there is no 3D model M, a black screen is shown instead.) The rendered result is encoded as a left-eye frame of a 2D video stream, to be viewed by the user's left eye.

Step (d): Repeat steps (a) to (c) for the right-eye frame; that is, the operations described for the left eye in steps (a) to (c) are performed for the right eye instead, producing another frame of the 2D video stream, namely the right-eye frame, to be viewed by the user's right eye.

Step (e): For the left-eye frame, send the following three pieces of information to the VR scene server 1120 (path 1112): [Info 1112-A], [Info 1112-B] and [Info 1112-C]; and for the right-eye frame, send the following three pieces of information to the VR scene server 1120 (path 1113): [Info 1113-A], [Info 1113-B] and [Info 1113-C].

Step (f): The data packer 121 in the VR scene server 1120 packs the information for both the left and right eyes ([Info 1112-A], [Info 1112-B], [Info 1112-C], [Info 1113-A], [Info 1113-B] and [Info 1113-C]) into one message packet.

Step (g): The VR scene server 1120 sends the message packet produced in step (f) to the VR scene client 1170 in the user device 21, 22, 23 (path 1122).
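Steps (a) to (d) are carried out once per eye, and because the two eyes are at different positions the near-to-far order, and therefore the set of server-rendered models, can differ between the left-eye and right-eye frames. The sketch below illustrates this under the assumption that distance is measured as the Euclidean distance from the eye; the function name `server_list_for_eye` and the coordinates are hypothetical.

```python
import math


def server_list_for_eye(models, eye_pos, ready_on_client):
    """Return the model names the server renders for one eye.

    models: mapping of name -> (x, y, z) position
    eye_pos: (x, y, z) position of the left or right eye
    ready_on_client: set of names with the "Ready for Client" state
    """
    # Step (a): sort from near to far relative to this eye.
    ordered = sorted(models, key=lambda n: math.dist(models[n], eye_pos))
    # Step (b): find the first model M that is not ready on the client.
    for i, name in enumerate(ordered):
        if name not in ready_on_client:
            # Step (c): M and everything behind it is rendered by the server.
            return ordered[i:]
    return []  # no model M: nothing left for the server to render


models = {"P": (-1.0, 0.0, 2.0), "Q": (1.0, 0.0, 2.0)}
left = server_list_for_eye(models, (-0.03, 0.0, 0.0), {"P"})   # ["Q"]
right = server_list_for_eye(models, (0.03, 0.0, 0.0), {"P"})   # ["Q", "P"]
```

In this example model P is nearer to the left eye and model Q is nearer to the right eye, so the pre-stored model P is rendered by the client for the left-eye frame but by the server for the right-eye frame.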

[Info 1112-A] is the status information (or metadata) of all 3D models located in front of 3D model M. Note that there may be no such models. These models all have the "Ready for Client" state, meaning that they have been preloaded on the user device and that the VR scene client (program) 1170 on the user device 21, 22, 23 can render them itself. To reduce the transmission bandwidth, the VR scene transmitter 1110 need not send the complete status information; it only needs to send the differences between the status information of this rendering and that of the previous rendering.

[Info 1112-B]: If the server finds a 3D model M whose pre-stored state for the user device is "Not Ready", the server changes that state to "Loading" and issues a download instruction for 3D model M, requiring the user device to download it. If the state is already "Loading", no instruction is issued, because the download instruction has already been sent.

[Info 1112-C] is the encoded left-eye video stream frame from step (c), that is, the left-eye frame.

[Info 1113-A], [Info 1113-B] and [Info 1113-C] are substantially the same as [Info 1112-A], [Info 1112-B] and [Info 1112-C], respectively, except that they relate to the right-eye frame.
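The bandwidth saving mentioned under [Info 1112-A], sending only what changed since the previous rendering, can be sketched as a dictionary difference. The description does not specify a delta format; the functions `state_delta` and `apply_delta`, and the convention of marking a removed model with `None`, are assumptions made for this example (status records themselves are assumed never to be `None`).

```python
def state_delta(previous, current):
    """Return only the entries of `current` that differ from `previous`.

    Both arguments map a model name to its status record. A model that was
    present before but is gone now is reported with the value None.
    """
    delta = {k: v for k, v in current.items() if previous.get(k) != v}
    for k in previous:
        if k not in current:
            delta[k] = None
    return delta


def apply_delta(previous, delta):
    """Rebuild the current status information on the client side."""
    merged = dict(previous)
    for k, v in delta.items():
        if v is None:
            merged.pop(k, None)
        else:
            merged[k] = v
    return merged
```

The client keeps the last status information in memory and calls `apply_delta` on each update, so an unchanged scene costs an empty dictionary per rendering cycle.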

Steps (a) to (g) are repeated each time the main program of the VR scene application 1100 updates the VR scene transmitter 1110 with new VR scene data; typically, the main program of the VR scene application 1100 updates this data in every rendering cycle.

Once the VR scene client 1170 receives the above data, it performs the rendering procedure described below.

Step (i): Decode the video frames in [Info 1112-C and Info 1113-C] (both the left-eye frame and the right-eye frame) and send these two frames to the frame combiner 1171.

Step (ii): The frame combiner 1171 merges the two frames (the left-eye frame 1711 and the right-eye frame 1712) into one combined VR frame 1713 (see Fig. 9), which serves as the background for the subsequent 3D model rendering.

Step (iii): Render all 3D models in [Info 1112-A and Info 1113-A] on the combined VR frame produced in step (ii). To reduce network bandwidth usage, the VR scene client 1170 stores this [Info 1112-A and Info 1113-A] information in memory, so that next time the VR scene transmitter 1110 only needs to transmit the differences in [Info 1112-A and Info 1113-A] between the next rendering and this one, rather than the complete status information.

Step (iv): Output the rendering result of step (iii) as a rendered mixed VR frame of an output video stream containing the VR scene, that is, the final output video stream result (path 1176). In this embodiment, the user device is an electronic device in the form of glasses or a helmet that includes two display screens located in front of the user's left eye and right eye, respectively; the left screen displays the image (frame) viewed by the user's left eye, and the right screen displays the image (frame) viewed by the user's right eye. The mixed VR frame of the output video stream is played on these two screens as follows: the pixels in the left half of each row of the mixed VR frame are displayed on the left-eye screen, and the pixels in the right half of each row are displayed on the right-eye screen, giving the user a virtual reality (VR) visual experience.
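Steps (ii) and (iv) together imply a side-by-side layout: the two eye frames are joined row by row, and at display time the left half of each row goes to the left screen and the right half to the right screen. The sketch below models a frame as a list of pixel rows; the function names are hypothetical, and real frames would normally be image buffers rather than nested lists.

```python
def combine_frames(left, right):
    """Place two equally sized frames side by side, row by row (step ii)."""
    if len(left) != len(right):
        raise ValueError("frames must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def split_for_display(combined):
    """Left half of each row to the left screen, right half to the right (step iv)."""
    half = len(combined[0]) // 2
    left = [row[:half] for row in combined]
    right = [row[half:] for row in combined]
    return left, right


left_eye = [[1, 2], [3, 4]]
right_eye = [[5, 6], [7, 8]]
combined = combine_frames(left_eye, right_eye)
# combined == [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Splitting the combined frame again returns the original left-eye and right-eye frames, which is what allows the client to render its own models onto one merged background and still drive two screens.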

If [Info 1112-B] and [Info 1113-B] are present, indicating that the 3D model M needs to be prepared by the VR scene client 1170, the VR scene client 1170 handles the 3D model M according to the following procedure.

Step (I): Search the VR scene cache 1190 (path 1174). The VR scene cache 1190 contains the database of 3D models previously downloaded and stored in the user device 21, 22, 23.

Step (II): If the 3D model M is already in the VR scene cache 1190, go directly to step (V).

Step (III): If the 3D model M is not in the VR scene cache 1190, the VR scene client 1170 sends a download request to the VR scene server 1120 (path 1172), and the VR scene server 1120 returns the data of the 3D model M to the VR scene client 1170 (path 1124).

Step (IV): Once a 3D model has been completely downloaded, the VR scene client 1170 stores it in the VR scene cache 1190 (path 1194), so that it does not need to be downloaded again the next time it is needed.

Step (V): The VR scene client 1170 extracts the 3D model M from the VR scene cache 1190 (path 1192).

Step (VI): Once the download is complete (or if the model was downloaded previously), the VR scene client 1170 can extract the 3D model M. The VR scene client 1170 then sends a "3D model is ready on client" message to the VR scene server 1120 (path 1115), and the VR scene server 1120 forwards this message to the VR scene transmitter 1110 (path 1114).

Step (VII): As soon as the VR scene transmitter 1110 receives this message, it changes the state of 3D model M from "Loading" to "Ready for Client".

Step (VIII): At the next rendering, the VR scene transmitter 1110 knows that 3D model M has been preloaded on the user device and therefore asks the VR scene client 1170 to render it locally; the server 1 no longer needs to render this 3D model M.
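On the client side, Steps (I) to (VI) amount to a cache lookup followed by a download on a miss and a "ready" notification. The following is a minimal sketch under stated assumptions: the function `prepare_model` and its callable parameters are hypothetical stand-ins for the VR scene cache 1190, the download path 1172/1124 and the message path 1115; they are not names used by the described system.

```python
def prepare_model(name, cache, download, notify_ready):
    """Return the data of model `name`, downloading only on a cache miss.

    cache: dict acting as the scene cache
    download: callable that fetches the model data from the scene server
    notify_ready: callable that sends the "ready on client" message
    """
    if name not in cache:              # Steps (I)-(III): search, then request
        cache[name] = download(name)   # Step (IV): store after download
    model = cache[name]                # Step (V): extract from the cache
    notify_ready(name)                 # Step (VI): report readiness
    return model
```

A second request for the same model is served from the cache without contacting the server again, while the readiness message is still sent so that the transmitter can update its state.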

At the very beginning, the user device 21, 22, 23 holds no 3D models, so the VR scene transmitter 1110 renders all 3D models and encodes the result as a 2D video stream comprising left-eye frames and right-eye frames. The VR scene transmitter 1110 processes the download requests for 3D models, [Info 1112-B] and [Info 1113-B], starting from the model nearest to the 3D projection plane (or the user's left or right eye). The VR scene client 1170 downloads each 3D model from the VR scene server 1120, or extracts them one by one from the VR scene cache 1190. As more 3D models become available to the VR scene client 1170, the VR scene transmitter 1110 automatically notifies the VR scene client 1170 to render these models and sounds itself, and reduces the number of 3D models rendered by the VR scene transmitter 1110. The encoded 2D video stream with left-eye and right-eye frames therefore contains fewer and fewer 3D models, until all 3D models are available on the VR scene client 1170. From that stage on, only a black screen remains to be encoded by the server 1; in other words, the server 1 no longer needs to transmit the 2D video stream to the user device 21, 22, 23, and the communication bandwidth used between the server 1 and the user device 21, 22, 23 is greatly reduced.

When a new 3D model N appears in the VR scene, the VR scene transmitter 1110 (1) notifies the VR scene client 1170 to render only the 3D models located in front of the new 3D model N (relative to the user's left or right eye), (2) notifies the VR scene client 1170 to download the new 3D model N, and (3) renders the new 3D model N and all models behind it, and encodes the result into a 2D video stream comprising a left-eye frame and a right-eye frame. This 2D video stream is then transmitted to the VR scene client 1170. The VR scene client 1170 can thus continue to reproduce the 3D rendering result of the VR scene application 1100 until 3D model N is ready on the user device.
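The division of rendering work described here can be sketched as follows. The tuple layout (name, distance, ready_on_client) and the function name split_render_work are hypothetical; the rule itself (the client renders everything in front of the first model it lacks, the server renders that model and everything behind it) follows the text.

```python
def split_render_work(models):
    """Split models into (client_rendered, server_rendered) name lists.

    models: list of (name, distance, ready_on_client) tuples, where distance
    is measured from the viewpoint (an assumed representation).
    """
    ordered = sorted(models, key=lambda m: m[1])  # nearest first
    for i, (_, _, ready) in enumerate(ordered):
        if not ready:
            # First model missing on the client: the server renders it and
            # every model behind it, whether or not those are on the client.
            return [m[0] for m in ordered[:i]], [m[0] for m in ordered[i:]]
    # Every model is on the client; the server renders nothing.
    return [m[0] for m in ordered], []
```

For a scene with A (ready), new model N (not ready) and B (ready) in order of distance, the client renders only A and the server renders N and B, even though B is already on the client.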

Figure 10 is a schematic diagram of a second embodiment of the system architecture of the VR scene system of the present invention. In the second embodiment shown in Figure 10, most components and functions are the same as or similar to those of the first embodiment in Figure 8, except that the frame combiner 1111 is located in the VR scene transmitter 1110 rather than in the VR scene client 1170. Same or similar components in Figure 10 are therefore given the same numerals as in Figure 8, and their details are not repeated.

As shown in Figure 10, the main program of the VR scene application 1100 sends VR scene information to the VR scene transmitter 1110 through API calls to a library. This VR scene information includes the name, position, velocity, attributes, orientation and all other data needed to render each 3D model. After receiving such data, the VR scene transmitter 1110 performs the following procedure.

Step (b): Among the 3D models, starting from the nearest one (nearest to the user's left eye), find the first 3D model "M" that does not have the "Ready for Client" state; in other words, the first 3D model "M" whose state is "Not Ready" (hereinafter, the "Not Ready" state is abbreviated as the NR state). It is also possible that no such 3D model exists.

Step (c): Server 1 renders 3D model "M" and all subsequent 3D models (if no such 3D model "M" exists, a blank screen is generated directly), and the result is stored in memory.

Step (d): Repeat steps (a) to (c) above for the right-eye frame; that is, the operations described for the left eye in steps (a) to (c) are performed for the right eye instead, producing a right-eye frame for viewing by the user's right eye.

Step (e): The frame combiner 1111 merges the rendered left-eye frame and right-eye frame into one merged VR frame of a 2D video stream.

Step (f): Send the following three pieces of information for the left-eye frame and the right-eye frame to the VR scene server 1120 (path 1112): [Info 1112-A], [Info 1112-B] and [Info 1112-C]. The VR scene server 1120 then forwards them to the VR scene client 1170 on the user device 21, 22, 23 (path 1122).

[Info 1112-A] is the state information (or interpretation data) of all 3D models located before 3D model M. Note that there may be no such models. These models all have the "Ready for Client" state, meaning that they have been preloaded on the user device and that the VR scene client (program) 1170 on the user device 21, 22, 23 can render them itself. To reduce data transmission bandwidth, the VR scene transmitter 1110 need not send the complete state information; it only needs to send the difference between the state information of this rendering and that of the previous rendering.

[Info 1112-C] is the merged VR frame produced in step (e), a video stream frame containing the left-eye frame and the right-eye frame.
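The merging of the two eye frames into one frame can be illustrated with a minimal sketch. The specification does not state how the frame combiner 1111 arranges the two frames; the side-by-side layout below, the representation of a frame as a list of pixel rows, and the function names are all assumptions made for illustration.

```python
def combine_side_by_side(left, right):
    """Merge two equally sized frames into one (assumed side-by-side layout).

    Each frame is a list of rows; each row is a list of pixel values.
    """
    if len(left) != len(right):
        raise ValueError("frames must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def split_side_by_side(merged):
    """Recover the left-eye and right-eye frames from a merged frame."""
    half = len(merged[0]) // 2
    return [row[:half] for row in merged], [row[half:] for row in merged]
```

A single merged frame lets both eye views travel through one ordinary 2D video encoder, and the receiver can split the frame again after decoding.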

Step (i): Decode the merged VR frame in [Info 1112-C] and use it as the background for subsequent 3D model rendering.

Step (ii): Render all 3D models in [Info 1112-A] on top of the merged VR frame. To reduce network bandwidth usage, the VR scene client 1170 stores this [Info 1112-A] information in memory, so that next time the VR scene transmitter 1110 can send only the difference in [Info 1112-A] state between the next rendering and this rendering, without sending the complete state information.

Step (iii): Output the rendering result of step (ii) as a rendered mixed VR frame of the output video stream containing the VR scene, that is, the final output video stream result (path 1176).
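The difference-only transmission of [Info 1112-A] mentioned in step (ii) can be sketched as below. The dictionary representation of the state information and the function names state_delta and apply_delta are hypothetical; the specification only says that the difference between two renderings is sent.

```python
def state_delta(previous, current):
    """Compute what changed between two renderings.

    previous, current: dict mapping model name -> state dict
    (an assumed representation of [Info 1112-A]).
    Returns (changed, removed).
    """
    changed = {n: s for n, s in current.items() if previous.get(n) != s}
    removed = [n for n in previous if n not in current]
    return changed, removed


def apply_delta(cached, changed, removed):
    """Client side: rebuild the full state from the cached copy and a delta."""
    updated = {n: s for n, s in cached.items() if n not in removed}
    updated.update(changed)
    return updated
```

Because the client keeps the previous state in memory, applying the delta reproduces the full current state without the server resending unchanged models.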

Figure 11 is a schematic diagram of a third embodiment of the system architecture of the VR scene system of the present invention. In the third embodiment shown in Figure 11, most components and functions are the same as or similar to those of the first embodiment in Figure 8, except that this third embodiment has no frame combiner. Same or similar components in Figure 11 are therefore given the same numerals as in Figure 8, and their details are not repeated.

As shown in Figure 11, the VR scene server 1120 is server software running on server 1, which hosts the VR scene application 1100 used to generate a virtual VR 3D environment comprising a plurality of 3D models. The VR scene server 1120 is a server program executed on server 1 together with the application 1100, and serves as a relay for messages between the VR scene transmitter 1110 of server 1 and the VR scene client 1170 of the user devices 21, 22, 23. The VR scene server 1120 also acts as a file download server, from which the VR scene client 1170 of the user devices 21, 22, 23 downloads the necessary 3D models from server 1. The VR scene transmitter 1110 stores a list of all 3D models together with a state indicating whether each 3D model is already stored on the user device. The state of each 3D model on the user device is one of (1) "Not Ready", (2) "Loading" (download in progress) and (3) "Ready for Client" (already downloaded by the user device).

Server 1 checks the states of these 3D models to determine which 3D models need to be encoded into a left-eye frame of a 2D video stream and which need to be encoded into a right-eye frame of the 2D video stream. In the present invention, the 3D models that are not pre-stored on the user devices 21, 22, 23 are all encoded into the left-eye frame and the right-eye frame. To achieve this, the main program of the VR scene application 1100 sends VR scene information to the VR scene transmitter 1110 through API calls to a library (path 1101 in Figure 11). This VR scene information includes the name, position, velocity, attributes, orientation and all other data needed to render each 3D model. After receiving such data, the VR scene transmitter 1110 performs the following procedure.

Step (c): Among the 3D models, server 1 renders 3D model M and all 3D models after it, that is, all 3D models farther from the left eye than M (if there is no 3D model M, a blank screen is displayed), and encodes the rendered result as a left-eye frame of a 2D video stream for viewing by the user's left eye.

Upon receiving the aforementioned data, the VR scene client 1170 performs the rendering procedure described below.

Step (i): Decode the video frames in [Info 1112-C] and [Info 1113-C] (both the left-eye frame and the right-eye frame) and store the two frames in separate memory spaces.

Step (ii): Render all 3D models included in [Info 1112-A] and [Info 1113-A] (if any such 3D models exist) on the decoded left-eye frame and right-eye frame, respectively. To reduce network bandwidth usage, the VR scene client 1170 stores the [Info 1112-A] and [Info 1113-A] information in memory, so that next time the VR scene transmitter 1110 can send only the difference in [Info 1112-A] and [Info 1113-A] state between the next rendering and this rendering, without sending the complete state information.

Step (iii): Output the rendering results of step (ii) as a rendered mixed left-eye frame and a rendered mixed right-eye frame of the output video stream containing the VR scene, that is, the final output video stream result (path 1176). Together, the mixed left-eye frame and the mixed right-eye frame constitute the mixed VR frame referred to earlier.
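The client-side mixing in steps (i) to (iii) amounts to drawing locally rendered models over the decoded server frame. The sketch below is a simplification under stated assumptions: frames are lists of pixel rows, a locally rendered foreground uses None where no client-side model covers a pixel, and the function name composite is hypothetical. A real client would render the models with a 3D API directly onto the background.

```python
def composite(background, foreground):
    """Overlay client-rendered pixels on the decoded server frame.

    background: decoded left-eye or right-eye frame from the server.
    foreground: same-sized frame of locally rendered models, with None
    marking pixels that no local model covers (an assumed convention).
    """
    return [
        [bg if fg is None else fg for bg, fg in zip(bg_row, fg_row)]
        for bg_row, fg_row in zip(background, foreground)
    ]
```

Calling composite once for the left-eye frame and once for the right-eye frame yields the mixed left-eye and mixed right-eye frames. This overlay is valid because the client-rendered models are always those nearer to the viewer than any model in the server frame.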

In this embodiment, the user device is an electronic device in the form of glasses or a helmet that includes two display screens located in front of the user's left eye and right eye, respectively. The left screen displays the image (frame) for the user's left eye, and the right screen displays the image (frame) for the user's right eye. The mixed VR frames of the output video stream are played on these two screens as follows: each mixed left-eye frame of the mixed VR frames is displayed on the left-eye screen, and each mixed right-eye frame is displayed on the right-eye screen, providing the user with a virtual reality (VR) visual experience.

In another embodiment, the output video stream is shown on a single screen of the user device, with the mixed left-eye frame and the mixed right-eye frame displayed alternately in sequence on that same screen. The user wears an electronic device in the form of glasses that opens and closes its left-eye window and right-eye window alternately in step with the mixed left-eye frame and mixed right-eye frame shown on the screen, providing the user with a virtual reality (VR) visual experience.
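The alternating display on a single screen can be sketched as a simple interleaving of the two eye frames. The function name frame_sequential and the "left"/"right" tags are hypothetical; the tag stands for the signal that tells the glasses which window to open.

```python
def frame_sequential(mixed_frames):
    """Interleave mixed eye frames for display on one screen.

    mixed_frames: iterable of (left_frame, right_frame) pairs.
    Yields (eye, frame) in display order; eye names the window of the
    glasses that should be open while that frame is shown.
    """
    for left, right in mixed_frames:
        yield "left", left
        yield "right", right
```

Each mixed VR frame thus occupies two display intervals, one per eye.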

The embodiments described above are not intended to limit the applicable scope of the present invention. The scope of protection of the present invention shall be based on the technical spirit defined by the claims of the present invention and the scope covered by its equivalent variations. Equivalent variations and modifications made in accordance with the claims of the present invention do not lose the gist of the invention, do not depart from its spirit and scope, and shall all be regarded as further embodiments of the present invention.

Claims

1. A method for transmitting media over a network, the media comprising a plurality of images, characterized in that the method comprises the following steps:

Step (A): executing a virtual reality application on a server to generate a virtual VR 3D environment comprising a plurality of 3D models, each 3D model being associated with a state indicating whether the 3D model is pre-stored in a user device;

Step (B): the server checking the states of the plurality of 3D models to determine which 3D models are to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame;

Step (C): the server sending at least the left-eye frame and the right-eye frame of the 2D video stream to the user device over the network; wherein the server also sends the 3D models not pre-stored in the user device to the user device in a predetermined order; when the user device receives the 3D models sent by the server, the user device stores the 3D models and sends a message to the server to change the states of the 3D models, indicating that the 3D models are now pre-stored in the user device; and

2. The method for transmitting media over a network according to claim 1, characterized in that, in step (B), the states of the plurality of 3D models are checked by the server in order from a point nearest a virtual position to another point farthest from the virtual position; and, during the check, when a first 3D model not pre-stored in the user device is found, the found 3D model and all remaining 3D models are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models located after it are pre-stored in the user device.

3. The method for transmitting media over a network according to claim 2, characterized in that, when a new 3D model appears in the VR 3D environment, the new 3D model and all 3D models after it are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models located after it are pre-stored in the user device.

4. The method for transmitting media over a network according to claim 2, characterized in that the virtual position is a 3D projection plane; and, in step (D), after decoding the left-eye frame and the right-eye frame received from the server, the user device further merges the left-eye frame and the right-eye frame into a merged VR frame, and then uses the merged VR frame as a background frame on which to render the 3D models that are pre-stored in the user device but not included in the left-eye frame and the right-eye frame, so as to produce a mixed VR frame of an output video stream containing a VR scene.

5. The method for transmitting media over a network according to claim 1, characterized in that:

in step (C), the predetermined order in which the server sends the 3D models not pre-stored in the user device to the user device is an order from a point nearest the virtual position to another point farthest from the virtual position;

in step (C), the server also sends to the user device state information of the 3D models not encoded into the left-eye frame and the right-eye frame, and the user device, on receiving and checking the state information, proceeds as follows: if any 3D model in the received state information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the state information includes at least one piece of interpretation data for each 3D model not encoded into the left-eye frame and the right-eye frame of the 2D video stream, the interpretation data of each 3D model including a name, a position, a velocity, an orientation and an attribute of the 3D model.

6. A system for transmitting media over a network, characterized by comprising:

a server for executing a virtual reality application to generate a virtual VR 3D environment comprising a plurality of 3D models, each 3D model being associated with a state indicating whether the 3D model is pre-stored in a user device; and

the user device, connected to the server via a network, for obtaining the media comprising at least some of the 3D models generated by the VR application;

wherein the media comprise a plurality of images, and the transmission of the plurality of images comprises the following steps:

Step (B): the server checks the states of the plurality of 3D models to determine which 3D models need to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame;

Step (C): the server sends at least the left-eye frame and the right-eye frame of the 2D video stream to the user device over the network; wherein the server also sends the 3D models not pre-stored in the user device to the user device in a predetermined order; when the user device receives the 3D models sent by the server, the user device stores the 3D models and sends a message to the server to change the states of the 3D models, indicating that the 3D models are now pre-stored in the user device;

Step (D): the user device decodes the left-eye frame and the right-eye frame received from the server, merges the left-eye frame and the right-eye frame into a merged VR frame, and then uses the merged VR frame as a background frame on which to render the 3D models that are pre-stored in the user device but not included in the merged VR frame, so as to produce a mixed VR frame of an output video stream containing a VR scene; and

Step (E): the user device outputs the mixed VR frame of the output video stream containing the VR scene.

7. The system for transmitting media over a network according to claim 6, characterized in that, in step (B), the states of the plurality of 3D models are checked by the server in order from a point nearest a virtual position to another point farthest from the virtual position; during the check, when a first 3D model not pre-stored in the user device is found, the found 3D model and all remaining 3D models are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models located after it are pre-stored in the user device.

8. The system for transmitting media over a network according to claim 7, characterized in that, when a new 3D model appears in the VR 3D environment, the new 3D model and all remaining 3D models are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models located after it are pre-stored in the user device.

9. The system for transmitting media over a network according to claim 6, characterized in that:

in step (C), the predetermined order in which the server sends the 3D models not pre-stored in the user device to the user device is an order from the 3D model nearest the virtual position to another 3D model farthest from the virtual position;

in step (C), the server also sends to the user device state information of the 3D models not encoded into the left-eye frame and the right-eye frame; the user device, on receiving and checking the state information, proceeds as follows: if any 3D model in the received state information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the state information includes interpretation data for each 3D model not encoded into the left-eye frame and the right-eye frame, the interpretation data of each 3D model including a name, a position, a velocity, an orientation and an attribute of the 3D model.

10. The system for transmitting media over a network according to claim 6, characterized in that the server further comprises:

a VR scene transmitter, being a library compiled into the VR application or dynamically linked to the VR application at run time; wherein the VR scene transmitter contains a list of all the 3D models and the state of each 3D model, the state indicating that the 3D model is one of "Not Ready", "Loading" and "Ready for Client"; and

11. The system for transmitting media over a network according to claim 10, characterized in that the user device further comprises:

12. A method for transmitting media over a network, the media comprising a plurality of images, characterized in that the method comprises the following steps:

Step (A): executing a virtual reality application on a server to generate a virtual VR 3D environment comprising a plurality of 3D models, each 3D model being associated with a state indicating whether the 3D model is pre-stored in a user device;

Step (B): the server checking the states of the plurality of 3D models to determine which 3D models need to be encoded into a left-eye frame and a right-eye frame of a 2D video stream, wherein the 3D models not pre-stored in the user device are encoded into the left-eye frame and the right-eye frame of the 2D video stream; the server then merging the left-eye frame and the right-eye frame into a merged VR frame of the 2D video stream;

Step (C): the server sending at least the merged VR frame of the 2D video stream to the user device over a network; wherein the server also sends the 3D models not pre-stored in the user device to the user device in a predetermined order; when the user device receives the 3D models sent by the server, the user device stores the 3D models and sends a message to the server to change the states of the 3D models, indicating that the 3D models are now pre-stored in the user device; and

Step (D): the user device decoding the merged VR frame of the 2D video stream received from the server, and using the merged VR frame as a background frame on which to render the 3D models that are pre-stored in the user device but not included in the merged VR frame, so as to produce a mixed VR frame of an output video stream containing a VR scene.

13. The method for transmitting media over a network according to claim 12, characterized in that:

in step (B), the states of the plurality of 3D models are checked by the server in order from a point nearest a virtual position to another point farthest from the virtual position; during the check, when a first 3D model not pre-stored in the user device is found, the found 3D model and all remaining 3D models are encoded into the left-eye frame and the right-eye frame of the 2D video stream, regardless of whether the 3D models located after it are pre-stored in the user device;

in step (C), the server also sends the 3D models not pre-stored in the user device to the user device in a predetermined order from a point nearest the virtual position to another point farthest from the virtual position; when the user device receives the 3D models sent by the server, the user device stores the 3D models and sends a message to the server to change the states of the 3D models, indicating that the 3D models are now pre-stored in the user device.

14. The method for transmitting media over a network according to claim 13, characterized in that, when a new 3D model appears in the VR 3D environment, the new 3D model and all 3D models after it are encoded into the left-eye frame and the right-eye frame, regardless of whether the 3D models located after it are pre-stored in the user device; wherein the virtual position is a 3D projection plane.

15. The method for transmitting media over a network according to claim 12, characterized in that, in step (C), the server also sends to the user device state information of the 3D models not encoded into the left-eye frame and the right-eye frame, and the user device, on receiving and checking the state information, proceeds as follows: if any 3D model in the received state information is not pre-stored in the device, the user device sends a request to the server to download that 3D model; wherein the state information includes interpretation data for each 3D model not encoded into the left-eye frame and the right-eye frame of the 2D video stream, the interpretation data including a name, a position, a velocity, an orientation and an attribute of the 3D model.