WO2018062641A1 - Providing a virtual reality service in consideration of a region of interest - Google Patents

Providing a virtual reality service in consideration of a region of interest

Info

Publication number
WO2018062641A1
WO2018062641A1 (PCT/KR2017/001087)
Authority
WO
WIPO (PCT)
Prior art keywords
video data
information
base layer
data
service
Prior art date
Application number
PCT/KR2017/001087
Other languages
English (en)
Korean (ko)
Inventor
류은석
Original Assignee
가천대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 가천대학교 산학협력단
Publication of WO2018062641A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/30 Image reproducers
    • H04N 13/332 Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/30 Image reproducers
    • H04N 13/366 Image reproducers using viewer tracking
    • H04N 13/383 Image reproducers using viewer tracking for tracking with gaze detection, i.e. detecting the lines of sight of the viewer's eyes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • H04N 7/155 Conference systems involving storage of or access to video conference sessions

Definitions

  • This specification relates to providing a virtual reality service considering a region of interest.
  • Video conferencing services are examples of services implemented on the basis of virtual reality technology.
  • a user may use a device for processing multimedia data including video information of a conference participant for a video conference.
  • the present specification provides image processing in consideration of ROI information in virtual reality.
  • the present specification provides image processing of different quality according to the gaze information of the user.
  • the present disclosure provides image processing in response to a change in the gaze of the user.
  • the present disclosure provides signaling corresponding to a change in gaze of a user.
  • An image receiving apparatus includes a communication unit configured to receive a bitstream including video data for a virtual reality service, the video data including base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; a base layer decoder for decoding the base layer video data; and an enhancement layer decoder for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • In another aspect, the image receiving apparatus includes a communication unit for receiving base layer video data for the base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; a first processor for decoding the base layer video data; and a second processor, electrically coupled with the first processor, for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • An image transmission apparatus includes a base layer encoder for generating base layer video data; an enhancement layer encoder for generating at least one enhancement layer video data based on the base layer video data; and a communication unit configured to transmit a bitstream including video data for a virtual reality service, wherein the video data includes the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer, and the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • An image receiving method includes receiving a bitstream including video data for a virtual reality service, the video data including base layer video data for the base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; decoding the base layer video data; and decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • An image transmission method includes generating the base layer video data; generating at least one enhancement layer video data based on the base layer video data; and transmitting a bitstream including video data for the virtual reality service, wherein the video data includes the base layer video data for the base layer and the at least one enhancement layer video data for the at least one enhancement layer predicted from the base layer, and the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
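  • As a rough, non-normative illustration of the receiving flow summarized above, the Python sketch below decodes the base layer first and then each ROI enhancement layer with reference to the reconstructed base picture; all class and function names are hypothetical placeholders, not part of the disclosure.

```python
# Minimal sketch of the layered receiving flow, assuming placeholder decoders
# (a real system would use an SVC/SHVC decoder for each layer).

class BaseLayerDecoder:
    def decode(self, base_layer_data):
        # Reconstruct the low-quality picture covering the whole virtual space.
        return {"quality": "base", "payload": base_layer_data}

class EnhancementLayerDecoder:
    def decode(self, enhancement_layer_data, base_picture):
        # Inter-layer prediction: refine the base picture inside the region of interest.
        return {"quality": "enhanced", "base": base_picture, "roi_payload": enhancement_layer_data}

def receive_virtual_reality_video(bitstream):
    """bitstream: dict with 'base' data and a list of 'enhancements' (one per ROI)."""
    base_picture = BaseLayerDecoder().decode(bitstream["base"])
    roi_pictures = [EnhancementLayerDecoder().decode(enh, base_picture)
                    for enh in bitstream["enhancements"]]
    return base_picture, roi_pictures

# Example: one region-of-interest enhancement layer on top of the base layer.
base, rois = receive_virtual_reality_video({"base": b"BL", "enhancements": [b"EL-ROI-1"]})
```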
  • The image processing apparatus may apply different image processing methods based on the user's gaze.
  • Accordingly, the video conferencing device (for example, an HMD) can reduce the bandwidth (BW) required for image transmission and reduce power consumption by improving image processing performance.
  • FIG. 1 is a diagram illustrating an exemplary video conferencing system.
  • FIG. 2 is a diagram illustrating an exemplary video conferencing service.
  • FIG. 3 is a diagram illustrating an example scalable video coding service.
  • FIG. 4 is a diagram illustrating an exemplary configuration of a server device.
  • FIG. 5 is a diagram illustrating an exemplary structure of an encoder.
  • FIG. 6 illustrates an example video conferencing service using scalable video coding.
  • FIG. 7 is a diagram illustrating an exemplary image transmission method.
  • FIG. 8 is a diagram illustrating an example method of signaling a region of interest.
  • FIG. 9 is a diagram illustrating an exemplary configuration of a client device.
  • FIG. 10 is a diagram illustrating an exemplary configuration of a controller.
  • FIG. 11 is a diagram illustrating an exemplary configuration of a decoder.
  • FIG. 12 is a diagram illustrating an exemplary method of generating and / or transmitting image configuration information.
  • FIG. 13 is a diagram illustrating an example method for a client device to signal image configuration information.
  • FIG. 14 is a diagram illustrating an exemplary method of transmitting a high / low level image.
  • FIG. 15 is a diagram illustrating an exemplary image decoding method.
  • FIG. 16 is a diagram illustrating an exemplary video encoding method.
  • FIG. 17 is a diagram illustrating an exemplary syntax of ROI information.
  • FIG. 18 is a diagram illustrating exemplary ROI information and an exemplary SEI message in XML format.
  • FIG. 19 illustrates an example protocol stack of a client device.
  • FIG. 20 is a diagram relating to the SLT and service layer signaling (SLS).
  • FIG. 21 is a diagram illustrating an example SLT.
  • FIG. 22 is a diagram illustrating an example code value of a serviceCategory attribute.
  • FIG. 23 illustrates an example SLS bootstrapping and example service discovery process.
  • 24 is a diagram illustrating an exemplary USBD / USD fragment for ROUTE / DASH.
  • FIG. 25 is a diagram illustrating an example S-TSID fragment for ROUTE / DASH.
  • FIG. 26 illustrates an exemplary MPD fragment.
  • FIG. 27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.
  • FIG. 28 is a diagram illustrating an exemplary configuration of a client device.
  • FIG. 29 is a diagram illustrating an exemplary configuration of a server device.
  • FIG. 30 is a diagram illustrating an exemplary operation of a client device.
  • FIG. 31 is a diagram illustrating an exemplary operation of a server device.
  • first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.
  • first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.
  • FIG. 1 is a diagram illustrating an exemplary video conferencing system.
  • the video conferencing system may provide video conferencing services to at least one user located at a remote location.
  • Video conferencing service is a service that allows people in different regions to have a meeting while looking at each other's faces on the screen without meeting each other directly.
  • the video conferencing system can be configured in two forms.
  • For example, a video conferencing system may be implemented using direct N:N communication between the client devices (e.g., HMDs) of the users.
  • the video conferencing system can provide an optimal video for each user.
  • the video conferencing system may further include a server device (or relay system) for video conferencing.
  • the server device may receive at least one video image from each client device, and collect / select at least one video image to serve each client device.
  • Video conferencing system 100 may include at least one client device 120, and / or server device 130 for at least one user 110 in a remote location.
  • The client device 120 may obtain user data from the user 110 who uses the client device 120.
  • the user data may include image data, audio data, and additional data of the user.
  • the client device 120 may include at least one of a 2D / 3D camera and an immersive camera that acquire image data of the user 110.
  • the 2D / 3D camera may capture an image having a viewing angle of 180 degrees or less.
  • Immersive cameras can capture images with a viewing angle of up to 360 degrees.
  • For example, the client device 120 may include at least one of a first client device 121 for acquiring user data of the first user 111 located in a first place (Place 1), a second client device 123 for acquiring user data of the second user 113 located in a second place (Place 2), and a third client device 125 for acquiring user data of the third user 115 located in a third place (Place 3).
  • each client device 120 may transmit the obtained user data to the server device 130 via the network.
  • the server device 130 may receive at least one user data from the client device 120.
  • the server device 130 may generate the entire image for the video conference in the virtual space based on the received user data.
  • the entire image may represent an immersive image providing an image in a 360 degree direction in the virtual space.
  • the server device 130 may generate the entire image by mapping the image data included in the user data to the virtual space.
  • the server device 130 may transmit the entire image to each user.
  • Each client device 120 may receive the entire image and render and / or display as much as the area viewed by each user in the virtual space.
  • FIG. 2 is a diagram illustrating an exemplary video conferencing service.
  • the first user 210, the second user 220, and the third user 230 may exist in the virtual space.
  • the first user 210, the second user 220, and the third user 230 may perform a conference while looking at each other in a virtual space.
  • the description will be given based on the first user 210.
  • The video conferencing system may determine the speaker who is talking in the virtual space and/or the line of sight of the first user 210.
  • the second user 220 may be a speaker, and the first user 210 may look at the second user.
  • the video conferencing system may transmit an image of the second user 220 viewed by the first user 210 to the first user 210 as a high quality video image.
  • On the other hand, the video conferencing system may transmit an image of the third user 230, who is invisible or only partially visible in the gaze direction of the first user 210, to the first user 210 as a low quality video image.
  • In this way, the video conferencing system differentiates the image processing method based on the user's gaze and, compared to the conventional method of transmitting all images as high quality video, saves the bandwidth (BW) required for video transmission and improves image processing performance.
  • FIG. 3 is a diagram illustrating an example scalable video coding service.
  • the scalable video coding service is a video compression method for providing various services in a scalable manner in terms of time, space, and picture quality in accordance with various user environments such as network conditions or terminal resolutions in various multimedia environments.
  • Scalable video coding services generally provide scalability in terms of spatial resolution, quality, and temporal resolution.
  • Spatial scalability can be serviced by encoding different resolutions for the same image for each layer. It is possible to provide image content adaptively to devices having various resolutions such as digital TVs, laptops, and smart phones by using spatial hierarchies.
  • For example, the scalable video coding service may simultaneously support TVs having one or more different characteristics from a video service provider (VSP) through a home gateway in a home.
  • the scalable video coding service may simultaneously support high-definition TV (HDTV), standard-definition TV (SDTV), and low-definition TV (LDTV) having different resolutions.
  • Temporal scalability may adaptively adjust the frame rate of an image in consideration of the network environment through which the content is transmitted or the capability of the terminal. For example, content may be provided at a high frame rate of 60 frames per second (FPS) over a local area network, and at a lower frame rate of 16 FPS over a wireless broadband network such as a 3G mobile network, so that the user can receive the video without interruption.
  • The scalable video coding service may include a base layer and one or more enhancement layers.
  • When the receiver receives only the base layer, it can provide general image quality; when it receives both the base layer and the enhancement layer, it can provide high quality. That is, given a base layer and one or more enhancement layers, the more enhancement layers (for example, enhancement layer 1, enhancement layer 2, ..., enhancement layer n) that are received together with the base layer, the better the quality of the provided image.
  • Therefore, the receiver can quickly receive the small amount of base layer data, process and play back an image of general quality, and add enhancement layer image data when necessary to improve the quality of service.
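  • As a hedged illustration of this trade-off (the layer bitrates below are assumptions, not values from the disclosure), a receiver might pick how many enhancement layers to consume based on its available bandwidth:

```python
# Illustrative only: choose how many enhancement layers to receive for a given
# bandwidth budget, always starting from the small base layer.

def select_layers(available_kbps, base_kbps=500, enhancement_kbps=(1000, 2000)):
    layers = ["base"]                      # the base layer is always received
    budget = available_kbps - base_kbps
    for i, cost in enumerate(enhancement_kbps, start=1):
        if budget >= cost:                 # add enhancement layers while the budget allows
            layers.append(f"enhancement_{i}")
            budget -= cost
    return layers

print(select_layers(800))    # ['base']  -> general quality only
print(select_layers(4000))   # ['base', 'enhancement_1', 'enhancement_2'] -> high quality
```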
  • FIG. 4 is a diagram illustrating an exemplary configuration of a server device.
  • the server device 400 may include a control unit 410 and / or a communication unit 420.
  • the controller 410 may generate an entire image for a video conference in the virtual space and encode the generated entire image. In addition, the controller 410 may control all operations of the server device 400. Details are described below.
  • the communication unit 420 may transmit and / or receive data to an external device and / or a client device.
  • the communicator 420 may receive user data and / or signaling data from at least one client device.
  • the communication unit 420 may transmit the entire image for the video conference to the client device in the virtual space.
  • The controller 410 may include at least one of a signaling data extractor 411, an image generator 413, an ROI determiner 415, a signaling data generator 417, and/or an encoder 419.
  • the signaling data extractor 411 may extract signaling data from data received from the client device.
  • the signaling data may include image configuration information.
  • For example, the image configuration information may include gaze information indicating the user's gaze direction and zoom region information indicating the user's viewing angle in the virtual space.
  • the image generator 413 may generate the entire image for the video conference in the virtual space based on the image received from the at least one client device.
  • The ROI determiner 415 may determine an ROI corresponding to the user's gaze direction within the entire area of the virtual space for the video conference service. For example, the ROI determiner 415 may determine the ROI based on the gaze information and/or the zoom region information. For example, the region of interest may be the location of tiles that the user is expected to look at in the virtual space (e.g., where a new enemy appears in a game, or the speaker's location in the virtual space), and/or where the user's gaze is currently directed. In addition, the ROI determiner 415 may determine the virtual space for the video conference service.
  • The ROI information may be generated to indicate the ROI corresponding to the direction of the user's gaze within the entire region.
  • The signaling data generator 417 may generate signaling data for processing the entire image.
  • The signaling data may carry the ROI information.
  • The signaling data may be transmitted through at least one of Supplemental Enhancement Information (SEI), video usability information (VUI), a slice header, and a file describing the video data.
  • The encoder 419 may encode the entire video based on the signaling data. For example, the encoder 419 may encode the entire image in a manner customized for each user, based on each user's gaze direction. For example, when the first user looks at the second user in the virtual space, the encoder encodes the image corresponding to the second user in high quality based on the first user's gaze in the virtual space, and encodes the image corresponding to the third user in low quality.
  • According to an embodiment, the encoder 419 may include at least one of the signaling data extractor 411, the image generator 413, the ROI determiner 415, and/or the signaling data generator 417.
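  • A minimal sketch of how an ROI determiner of this kind might map gaze information (a gaze azimuth) and zoom region information (a viewing angle) onto tile indices of a 360-degree virtual space; the tile grid, angles, and function name are assumptions made for illustration only.

```python
# Hypothetical sketch: map a gaze direction and viewing angle onto the tile
# columns of a 360-degree picture split into equal-width tiles.

def determine_roi_tiles(gaze_azimuth_deg, viewing_angle_deg, num_tiles=16):
    tile_width = 360.0 / num_tiles
    half_view = viewing_angle_deg / 2.0
    roi = []
    for tile in range(num_tiles):
        center = (tile + 0.5) * tile_width
        # Smallest angular distance between the tile centre and the gaze direction.
        diff = abs((center - gaze_azimuth_deg + 180) % 360 - 180)
        if diff <= half_view + tile_width / 2:   # the tile overlaps the viewing cone
            roi.append(tile)
    return roi

# A user looking toward azimuth 90 degrees with a 110-degree viewing angle:
print(determine_roi_tiles(90, 110))   # indices of tiles inside the region of interest
```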
  • FIG. 5 is a diagram illustrating an exemplary structure of an encoder.
  • the encoder 500 may include at least one of a base layer encoder 510, at least one enhancement layer encoder 520, and a multiplexer 530.
  • the encoder 500 may encode the entire image using a scalable video coding method.
  • the scalable video coding method may include scalable video coding (SVC) and / or scalable high efficiency video coding (SHVC).
  • the scalable video coding method is a video compression method for providing various services in a scalable manner in terms of time, space, and picture quality according to various user environments such as network conditions or terminal resolution in various multimedia environments.
  • the encoder 500 may generate a bitstream by encoding two or more different quality (or resolution, frame rate) images for the same video image.
  • the encoder 500 may use inter-layer prediction tools, which are encoding methods using inter-layer redundancy, to increase compression performance of a video image.
  • The inter-layer prediction tool improves compression efficiency in the enhancement layer by removing the redundancy of images existing between layers.
  • the enhancement layer may be encoded by referring to information of a reference layer using an inter-layer prediction tool.
  • the reference layer refers to a lower layer referenced when encoding the enhancement layer.
  • Accordingly, in order to decode an enhancement layer, the bitstreams of all the lower layers it refers to are required.
  • the bitstream of the lowest layer is a base layer and may be encoded by an encoder such as H.264 / AVC, HEVC, or the like.
  • the base layer encoder 510 may generate base layer video data (or base layer bitstream) for the base layer by encoding the entire image.
  • the base layer video data may include video data for the entire area that the user views within the virtual space.
  • the image of the base layer may be the image of the lowest quality.
  • The enhancement layer encoder 520 may generate at least one enhancement layer video data (or enhancement layer bitstream) for at least one enhancement layer predicted from the base layer, by encoding the entire picture based on the signaling data (e.g., the region of interest information) and the base layer video data.
  • the enhancement layer video data may include video data for the region of interest in the entire region.
  • the multiplexer 530 may multiplex base layer video data, at least one enhancement layer video data, and / or signaling data, and generate one bitstream corresponding to the entire image.
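  • The following Python sketch mirrors this encoder structure at a very high level (placeholder functions only; a real implementation would rely on an SVC/SHVC encoder): the base layer covers the entire picture, the enhancement layer is produced only for the ROI tiles using the base layer as reference, and the multiplexer combines both with the signaling data.

```python
# Rough sketch of the encoder structure: base layer for the whole picture,
# enhancement layer only for ROI tiles, multiplexed into one bitstream.
# All data structures are toy placeholders.

def encode_base_layer(full_picture):
    return {"layer": "base", "covers": "entire picture", "data": full_picture}

def encode_enhancement_layer(full_picture, base_bitstream, roi_tiles):
    # Inter-layer prediction from the base layer, restricted to the ROI tiles.
    return {"layer": "enhancement", "reference": base_bitstream["layer"],
            "tiles": roi_tiles, "data": {t: full_picture[t] for t in roi_tiles}}

def multiplex(base, enhancements, signaling):
    # One bitstream corresponding to the entire image.
    return {"signaling": signaling, "base": base, "enhancements": enhancements}

picture = {t: f"tile-{t}" for t in range(16)}   # toy "picture" made of 16 tiles
roi_info = {"roi_tiles": [5, 6, 7]}             # e.g. produced by the ROI determiner
bl = encode_base_layer(picture)
el = encode_enhancement_layer(picture, bl, roi_info["roi_tiles"])
bitstream = multiplex(bl, [el], roi_info)
```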
  • FIG. 6 illustrates an example video conferencing service using scalable video coding.
  • Conventionally, the client device receives the entire video as one compressed video bitstream, decodes it, and renders only the portion that the user views in the virtual space.
  • Because the prior art transmits and/or receives the entire image (e.g., a 360-degree immersive image) as a high resolution (or high quality) image, the total bandwidth of the bitstream carrying the high resolution image is very large.
  • the server device may use a scalable video coding method.
  • exemplary techniques are described in detail.
  • The virtual space area 610 may include a first user 611, a second user 613, and a third user 615.
  • the first user 611, the second user 613, and the third user 615 may have a meeting in the virtual space area 610.
  • the client device may determine the line of sight of the speaker and the user in the virtual space and generate image configuration information.
  • The client device may transmit the image configuration information to the server device and/or another client device when the image configuration information is generated for the first time or when the gaze of the user does not face the speaker.
  • the server device may receive a video image and signaling data from at least one client device, and generate an entire image of the virtual space 610.
  • the server device may then encode the at least one video image based on the signaling data.
  • The server device may encode the video image corresponding to the gaze direction (or the region of interest) and the video image not corresponding to the gaze direction with different qualities, based on the image configuration information (for example, the gaze information and the zoom region information).
  • For example, the server device may encode the video image corresponding to the user's gaze direction with high quality, and encode the video image not corresponding to the user's gaze direction with low quality.
  • The first video image 630 is a video image of the ROI corresponding to the gaze direction of the first user 611.
  • the first video image 630 needs to be provided to the first user 611 in high quality.
  • the server device may encode the first video image 630 to generate base layer video data 633, and generate at least one enhancement layer video data 635 using inter-layer prediction.
  • The second video image 650 is a video image of a non-interest region that does not correspond to the gaze direction of the first user 611.
  • the second video image 650 needs to be provided to the first user 611 in low quality.
  • the server device may encode the second video image 650 to generate only base layer video data 653.
  • the server device can then send the encoded at least one bitstream to the client device used by the first user 611.
  • For example, the server device may transmit the image of the second user 613 as base layer video data and at least one enhancement layer video data using scalable video coding.
  • the server device may transmit only the base layer video data for the image of the third user 615.
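  • A small, hypothetical sketch of this per-participant decision (the participant identifiers are made up): the participant the viewer is looking at is sent as base plus enhancement layer data, and the others as base layer data only.

```python
# Sketch of the per-user layer selection described for FIG. 6.

def layers_for_participant(viewer_gaze_target, participant):
    if participant == viewer_gaze_target:
        return ["base", "enhancement"]     # high quality for the region of interest
    return ["base"]                        # low quality elsewhere

participants = ["user_613", "user_615"]    # the second and third users
gaze_target = "user_613"                   # the first user (611) looks at user 613
plan = {p: layers_for_participant(gaze_target, p) for p in participants}
print(plan)   # {'user_613': ['base', 'enhancement'], 'user_615': ['base']}
```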
  • FIG. 7 is a diagram illustrating an exemplary image transmission method.
  • the server device may receive a video image and signaling data from at least one client device using a communication unit.
  • the server device may extract the signaling data using the signaling data extractor.
  • The signaling data may include gaze information and zoom area information.
  • the gaze information may indicate whether the first user looks at the second user or the third user.
  • the gaze information may indicate a direction from the first user to the second user.
  • the zoom area information may indicate an enlargement range and / or a reduction range of the video image corresponding to the user's gaze direction.
  • the zoom area information may indicate a viewing angle of the user.
  • the server device may then generate the entire video for the video conference in the virtual space using the video generating unit.
  • The server device may determine, using the ROI determiner, image configuration information about the viewpoint and the zoom region that each user views in the virtual space, based on the signaling data.
  • the server device may determine the ROI of the user based on the image configuration information using the ROI determiner.
  • In the video image corresponding to the gaze direction of the first user, the second user may occupy a large area, while the third user may occupy a small area or may not be included in the video image at all.
  • the ROI may be an area including the second user. The ROI may be changed according to the gaze information and the zoom area information.
  • the server device may receive new signaling data.
  • the server device may determine a new region of interest based on the new signaling data.
  • the server device may determine whether the data currently processed based on the signaling data is data corresponding to the ROI, using the control unit.
  • the server device may determine whether the data currently being processed is data corresponding to the ROI based on the new signaling data.
  • the server device may encode a video image (eg, the region of interest) corresponding to the viewpoint of the user with high quality by using an encoder (740).
  • the server device may generate base layer video data and enhancement layer video data for the corresponding video image and transmit them.
  • the server device may transmit a video image (new region of interest) corresponding to a new view as a high quality image. If the server device is transmitting a low quality image, but the signaling data is changed and the server device transmits the high quality image, the server device may further generate and / or transmit enhancement layer video data.
  • the server device may encode a video image (eg, the non-ROI) that does not correspond to the user's viewpoint with low quality (750). For example, the server device may generate only base layer video data for a video image that does not correspond to a user's viewpoint, and transmit the base layer video data.
  • a video image eg, the non-ROI
  • the server device may generate only base layer video data for a video image that does not correspond to a user's viewpoint, and transmit the base layer video data.
  • In addition, the server device may transmit a video image (a new non-interest region) that does not correspond to the new viewpoint of the user as a low quality image. If the server device was previously transmitting high quality video but the signaling data has changed so that low quality video is to be transmitted, the server device no longer generates and/or transmits the at least one enhancement layer video data, and only base layer video data may be generated and/or transmitted.
  • Enhancement layer video data may be received for a video image (e.g., a region of interest) corresponding to the gaze direction of the user.
  • the client device may provide a user with a high quality video image within a short time.
  • the exemplary method of the present specification has a great advantage over the simple pre-caching method of receiving only data of some additional area in advance, or a method of receiving only data of an area corresponding to a user's gaze direction.
  • Exemplary methods herein can lower the overall bandwidth as compared to conventional methods of sending all data in high quality.
  • the exemplary method herein may speed up video processing in response to user eye movement in real time.
  • In the conventional method, when the first user who was looking at the second user turns to the third user, the client device (for example, a sensor of an HMD) detects this movement, processes the video information for rendering the third user, and plays it on the screen. Because the conventional method cannot process the image of a new area very quickly, it resorts to the inefficient approach of receiving all data in advance.
  • The exemplary technique of the present specification performs adaptive video transmission through the above scalable video coding, and can therefore respond quickly to the user by using the base layer data that has already been received.
  • Exemplary techniques herein can reproduce video images faster than when processing full high definition data.
  • the example techniques herein can process video images in rapid response to eye movement.
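  • The switching behaviour described for FIG. 7 can be pictured with the small sketch below (hypothetical tile-level granularity; a real system may switch per video image rather than per tile): when new signaling data moves the region of interest, enhancement-layer transmission starts for the new ROI and stops for the old one.

```python
# Illustrative sketch: recompute the transmission plan whenever new signaling
# data (a new set of ROI tiles) arrives.

def transmission_plan(roi_tiles, all_tiles):
    # Base layer for every tile; enhancement layer only for the ROI tiles.
    return {t: ["base", "enhancement"] if t in roi_tiles else ["base"]
            for t in all_tiles}

tiles = range(8)
before = transmission_plan({2, 3}, tiles)   # the user was looking at tiles 2-3
after = transmission_plan({4, 5}, tiles)    # new signaling data: gaze moved to tiles 4-5
print(before[4], "->", after[4])            # ['base'] -> ['base', 'enhancement']
```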
  • FIG. 8 is a diagram illustrating an example method of signaling a region of interest.
  • FIG. 8(a) illustrates a method of signaling a region of interest in scalable video.
  • the server device may divide one video image (or picture) into several tiles having a rectangular shape.
  • For example, the video image may be partitioned on a coding tree unit (CTU) basis.
  • one CTU may include Y CTB, Cb CTB, and Cr CTB.
  • For fast user response, the server device may encode the video image of the base layer as a whole without dividing it into tiles.
  • the server device may encode a video image of one or more enhancement layers by dividing a part or the whole into several tiles as necessary.
  • the server device may divide the video image of the enhancement layer into at least one tile and encode tiles corresponding to a region of interest (ROI).
  • The region of interest 810 may be the position of the tiles where an important object that the user will look at appears in the virtual space (e.g., where a new enemy appears in a game, or the speaker's position in the virtual space), and/or where the user's gaze is directed.
  • the server device may generate the ROI information including tile information for identifying at least one tile included in the ROI.
  • the ROI information may be generated by the ROI determiner, the signaling data generator, and / or an encoder.
  • The tile information of the region of interest 810 can be compressed effectively even without listing the numbers of all the tiles.
  • For example, the tile information may be expressed not only as the numbers of all tiles corresponding to the ROI, but also as the start and end numbers of the tiles, coordinate point information, a list of coding unit (CU) numbers, or tile numbers expressed by a formula.
  • the tile information of the non-interested region may be sent to other client devices, image processing computing equipment, and / or servers after undergoing Entropy coding provided by the encoder.
  • the ROI information can be transmitted through a high-level syntax protocol that carries Session information.
  • the ROI information may be transmitted in packet units such as Supplementary Enhancement Information (SEI), video usability information (VUI), and Slice Header (Slice Header) of the video standard.
  • SEI Supplementary Enhancement Information
  • VUI video usability information
  • Slice Header Slice Header
  • the ROI information may be delivered as a separate file describing the video file (e.g. DASH MPD).
  • the video conferencing system can lower overall bandwidth and reduce video processing time by transmitting and / or receiving only necessary tiles of the enhancement layer between client devices and / or between client and server devices through signaling of region of interest information. This is important to ensure fast HMD user response time.
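  • As a hedged illustration of the compact tile information mentioned above (the exact encoding is defined by the sighted_tile_info syntax later in this document; the mode names below are informal), a contiguous ROI can be signaled with just its start and end tile numbers instead of the full list:

```python
# Hypothetical compaction of ROI tile information: send either the full list of
# tile numbers or only the start/end numbers when the tiles are contiguous.

def compact_tile_info(roi_tiles):
    tiles = sorted(roi_tiles)
    contiguous = all(b - a == 1 for a, b in zip(tiles, tiles[1:]))
    if contiguous and len(tiles) > 2:
        return {"mode": "start_end", "start": tiles[0], "end": tiles[-1]}
    return {"mode": "full_list", "tiles": tiles}

print(compact_tile_info([6, 7, 8, 9, 10, 11, 12]))  # {'mode': 'start_end', 'start': 6, 'end': 12}
print(compact_tile_info([3, 7, 12]))                 # {'mode': 'full_list', 'tiles': [3, 7, 12]}
```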
  • FIG. 8(b) shows a method of signaling a region of interest in a single-layer (single screen) video.
  • An exemplary technique of the present specification may use a technique of degrading image quality by downscaling (downsampling) a region that is not a region of interest (ROI) in a single screen image that is not scalable video.
  • In the prior art, the filter information 820 used for downscaling is not shared between the terminals using the service; either a single filter technique is agreed upon from the beginning, or only the encoder knows the filter information.
  • In contrast, the server device may transmit the filter information 820 used at the time of encoding to the client device, so that the client device (or HMD terminal) receiving the encoded image can improve the quality of the downscaled region outside the region of interest. This technique can significantly reduce image processing time and improve picture quality.
  • the server device may generate the region of interest information.
  • the ROI information may further include filter information as well as tile information.
  • For example, the filter information may include the number of the agreed-upon filter candidate and the values used in the filter.
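  • A minimal sketch of such filter signaling, assuming a small agreed-upon set of upscaling filters (the filter names and tap values below are invented for illustration and are not part of the disclosure):

```python
# Illustrative filter information: which agreed-upon filter was used for the
# downscaled non-ROI region, plus the values (taps) used in that filter.

FILTER_CANDIDATES = {
    0: ("bilinear",  [0.5, 0.5]),
    1: ("smoothing", [0.25, 0.5, 0.25]),
}

def make_filter_info(filter_index):
    name, taps = FILTER_CANDIDATES[filter_index]
    return {"num_candidates": len(FILTER_CANDIDATES),
            "filter_index": filter_index,
            "filter_name": name,
            "filter_values": taps}

filter_info = make_filter_info(1)   # sent to the client alongside the ROI information
print(filter_info)
```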
  • FIG. 9 is a diagram illustrating an exemplary configuration of a client device.
  • The client device 900 may include at least one of an image input unit 910, an audio input unit 920, a sensor unit 930, an image output unit 940, an audio output unit 950, a communication unit 960, and/or a controller 970.
  • the client device 900 may be a head mounted display (HMD).
  • the controller 970 of the client device 900 may be included in the client device 900 or may exist as a separate device.
  • the image input unit 910 may capture a video image.
  • the image input unit 910 may include at least one of a 2D / 3D camera and / or an immersive camera that acquires an image of a user.
  • the 2D / 3D camera may capture an image having a viewing angle of 180 degrees or less.
  • Immersive cameras can capture images with a viewing angle of up to 360 degrees.
  • the audio input unit 920 may record a user's voice.
  • the audio input unit 920 may include a microphone.
  • the sensor unit 930 may acquire information about the movement of the user's gaze.
  • the sensor unit 930 may include a gyro sensor for detecting a change in azimuth of an object, an acceleration sensor for measuring an acceleration or impact strength of a moving object, and an external sensor for detecting a user's gaze direction.
  • the sensor unit 930 may include an image input unit 910 and an audio input unit 920.
  • the image output unit 940 may output image data received from the communication unit 960 or stored in a memory (not shown).
  • the audio output unit 950 may output audio data received from the communication unit 960 or stored in a memory.
  • the communication unit 960 may communicate with an external client device and / or server device through a broadcast network and / or broadband.
  • the communication unit 960 may include a transmitter (not shown) for transmitting data and / or a receiver (not shown) for receiving data.
  • the controller 970 may control all operations of the client device 900.
  • the controller 970 may process video data and signaling data received from the server device. Details of the controller 970 will be described below.
  • FIG. 10 is a diagram illustrating an exemplary configuration of a controller.
  • the controller 1000 may process signaling data and / or video data.
  • The controller 1000 may include at least one of a signaling data extractor 1010, a decoder 1020, a speaker determiner 1030, a gaze determiner 1040, and/or a signaling data generator 1050.
  • the signaling data extractor 1010 may extract signaling data from data received from the server device and / or another client device.
  • the signaling data may include ROI information.
  • The decoder 1020 may decode video data based on the signaling data. For example, the decoder 1020 may decode the entire image in a manner customized for each user, based on the gaze direction of each user. For example, when the first user looks at the second user in the virtual space, the decoder 1020 of the first user may decode the image corresponding to the second user in high quality based on the first user's gaze in the virtual space, and may decode the video corresponding to the third user in low quality. According to an embodiment, the decoder 1020 may include at least one of a signaling data extractor 1010, a speaker determiner 1030, a gaze determiner 1040, and/or a signaling data generator 1050.
  • the speaker determination unit 1030 may determine who the speaker is in the virtual space based on the voice and / or the given option.
  • the gaze determiner 1040 may determine the gaze of the user in the virtual space and generate image configuration information.
  • the image configuration information may include gaze information indicating a gaze direction and / or zoom area information indicating a viewing angle of a user.
  • the signaling data generator 1050 may generate signaling data for transmission to the server device and / or another client device.
  • The signaling data may carry the image configuration information.
  • The signaling data may be transmitted through at least one of Supplemental Enhancement Information (SEI), video usability information (VUI), a slice header, and a file describing video data.
  • FIG. 11 is a diagram illustrating an exemplary configuration of a decoder.
  • Decoder 1100 may include at least one of extractor 1110, base layer decoder 1120, and / or at least one enhancement layer decoder 1130.
  • the decoder 1100 may decode a bitstream (video data) using an inverse process of the scalable video coding method.
  • the extractor 1110 may receive a bitstream (video data) including video data and signaling data and selectively extract a bitstream according to the image quality of an image to be reproduced.
  • The bitstream (video data) may include a base layer bitstream (base layer video data) for the base layer and at least one enhancement layer bitstream (enhancement layer video data) for at least one enhancement layer predicted from the base layer.
  • the base layer bitstream (base layer video data) may include video data for the entire area of the virtual space.
  • At least one enhancement layer bitstream (enhancement layer video data) may include video data for the region of interest within the entire region.
  • the signaling data may include ROI information indicating an ROI corresponding to the gaze direction of the user in the entire area of the virtual space for the video conference service.
  • the base layer decoder 1120 may decode a bitstream (or base layer video data) of a base layer for a low quality image.
  • The enhancement layer decoder 1130 may decode at least one bitstream (or enhancement layer video data) of at least one enhancement layer for high quality video, based on the signaling data and/or the bitstream (or base layer video data) of the base layer.
  • FIG. 12 is a diagram illustrating an exemplary method of generating and / or transmitting image configuration information.
  • the image configuration information may include at least one of gaze information indicating a gaze direction of a user and / or zoom area information indicating a viewing angle of the user.
  • the user's gaze refers to the direction that the user looks in the virtual space, not the real space.
  • the gaze information may include not only information indicating a direction of a gaze of the current user, but also information indicating a gaze direction of the user in the future (for example, information about a gaze point expected to receive attention).
  • the client device may sense an operation of looking at another user located in a virtual space centered on the user and process the same.
  • the client device may receive the sensing information from the sensor unit by using the controller and / or the gaze determination unit.
  • the sensing information may be an image photographed by a camera and a voice recorded by a microphone.
  • the sensing information may be data sensed by a gyro sensor, an acceleration sensor, and an external sensor.
  • the client device may identify a movement of the user's gaze based on the sensing information by using the controller and / or the gaze determination unit. For example, the client device may check the movement of the user's gaze based on the change in the value of the sensing information.
  • the client device may generate image configuration information in the virtual conference space by using the controller and / or the gaze determiner. For example, when the client device physically moves or the user's gaze moves, the client device may calculate the gaze information and / or the zoom area information of the user in the virtual conference space based on the sensing information.
  • the client device may transmit image configuration information to the server device and / or another client device using the communication unit.
  • the client device may transfer the image configuration information to its other components.
  • the present invention is not limited thereto, and the server device may receive sensing information from the client device and generate image configuration information.
  • an external computing device connected with the client device may generate the image configuration information, and the computing device may deliver the image configuration information to its client device, another client device, and / or a server device.
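  • The sketch below illustrates, under assumed sensor fields (a gyro yaw angle and a field-of-view value that are not taken from the disclosure), how sensing information could be turned into image configuration information, i.e. gaze information plus zoom area information:

```python
# Hypothetical mapping from sensing information to image configuration information.

def make_image_configuration(sensing, previous=None):
    config = {
        "gaze_azimuth_deg": sensing["gyro_yaw_deg"] % 360,  # gaze direction in the virtual space
        "viewing_angle_deg": sensing.get("fov_deg", 110),   # zoom area information (viewing angle)
    }
    changed = previous is None or config != previous
    return config, changed    # 'changed' tells the caller whether signaling is needed

cfg1, send1 = make_image_configuration({"gyro_yaw_deg": 90})
cfg2, send2 = make_image_configuration({"gyro_yaw_deg": 90}, previous=cfg1)
print(send1, send2)   # True False -> only signal when the configuration actually changes
```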
  • FIG. 13 is a diagram illustrating an example method for a client device to signal image configuration information.
  • the part of signaling image configuration information (including viewpoint information and / or zoom region information) is very important. If the signaling of the video configuration information is too frequent, it may burden the client device, the server device, and / or the entire network.
  • the client device may signal the image configuration information only when the image configuration information (or the gaze information and / or the zoom area information) of the user is changed. That is, the client device may transmit the gaze information of the user to other client devices and / or server devices only when the gaze information of the user is changed.
  • Taking advantage of the fact that the speaker is usually the focus of attention in a video conference, the gaze information may be signaled to another user's client device or to the server device only when the user's gaze direction is not toward the speaker who is talking.
  • Alternatively, the client device may obtain information about the speaker through an option on the system (for example, a setting that designates the speaker and/or lecturer, such as the second user).
  • the client device may determine who is the speaker in the virtual space area for the video conference by using the controller and / or the speaker determination unit (1310). For example, the client device may determine who is the speaker based on the sensing information. In addition, the client device may determine who is the speaker according to the given options.
  • the client device may determine the gaze of the user by using the controller and / or the gaze determination unit (1320). For example, the client device may generate image configuration information based on the gaze of the user using the controller and / or the gaze determiner.
  • the client device may determine whether the user's eyes are directed to the speaker by using the controller and / or the gaze determination unit (1330).
  • If the user's gaze is directed toward the speaker, the client device may not signal the image configuration information using the communication unit (1340). In this case, the client device may continue to receive the image of the speaker in the user's gaze direction with high quality, and may receive the images that are not in the user's gaze direction with low quality.
  • If the user's gaze is not directed toward the speaker, the client device may signal the image configuration information using the communication unit (1350). For example, if the user's gaze was first directed at the speaker but later moved elsewhere, the client device may signal image configuration information for the user's new gaze direction. That is, the client device may transmit the image configuration information for the new gaze direction to other client devices and/or server devices. In this case, the client device may receive the image corresponding to the user's new gaze with high quality, and the image no longer corresponding to the user's gaze (for example, the video corresponding to the speaker) may be received with low quality.
  • In the above description, the client device generates and/or transmits the image configuration information.
  • Alternatively, the server device may receive the sensing information from the client device, generate the image configuration information based on the sensing information, and transmit the image configuration information to at least one client device.
  • The video conferencing system may transmit the speaker's video information as scalable video data consisting of base layer data and enhancement layer data.
  • the video conferencing system may receive signaling from a user looking at a user other than the speaker, and may transmit video information of the other user as scalable video data of base layer data and enhancement layer data. Through this, the video conferencing system can provide fast and high quality video information to the user while greatly reducing the signaling on the entire system.
  • the above-mentioned signaling may be signaling between a server device, a client device, and / or an external computing device (if present).
  • the above-mentioned signaling may be signaling between a client device and / or an external computing device (if present).
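  • The signaling rule of FIG. 13 boils down to a small check, sketched below with hypothetical identifiers: image configuration information is sent when it is generated for the first time or when the user's gaze is not directed at the speaker.

```python
# Sketch of the conditional signaling decision (steps 1330-1350 in FIG. 13).

def should_signal(gaze_target, speaker, first_time=False):
    # Signal on the first configuration, or whenever the gaze leaves the speaker.
    return first_time or gaze_target != speaker

print(should_signal("user_B", speaker="user_B"))                    # False: gaze is on the speaker
print(should_signal("user_C", speaker="user_B"))                    # True: signal the new gaze direction
print(should_signal("user_B", speaker="user_B", first_time=True))   # True: first configuration
```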
  • FIG. 14 is a diagram illustrating an exemplary method of transmitting a high / low level image.
  • The method of transmitting a high/low level image based on the user's gaze information may include switching scalable codec layers (1410), a rate control method using a single bitstream and a quantization parameter (QP) in real-time encoding (1420), switching a single bitstream such as DASH in units of chunks (1430), a down-scaling/up-scaling method (1440), and/or a high-definition rendering method that uses more resources at rendering time (1450).
  • Adjusting the quantization parameter (1420) or the degree of down/up scaling (1440) may provide advantages such as lowering the overall bandwidth and responding quickly to the movement of the user's gaze.
  • For example, the exemplary technique of the present specification may switch between high level images and low level images in units of chunks (1430).
  • Although the present specification takes a video conferencing system as an example, it may be equally applicable to VR (virtual reality) and AR (augmented reality) games and the like that use an HMD. That is, the technique of providing a high level image for the area corresponding to the user's gaze, and signaling only when the user looks at an area or object other than the one the user is expected to look at, applies in the same way as in this example.
  • FIG. 15 is a diagram illustrating an exemplary image decoding method.
  • the image decoding apparatus may include at least one of an extractor, a base layer decoder, and / or an enhancement layer decoder.
  • the contents of the image decoding apparatus and / or the image decoding method may include all related contents among the above descriptions of the server device and / or the image decoding apparatus (or the decoder).
  • the image decoding apparatus may use the extractor to receive a bitstream including video data and signaling data (1510).
  • the image decoding apparatus may extract signaling data, base layer video data, and / or at least one enhancement layer video data from the video data.
  • the image decoding apparatus may decode base layer video data using a base layer decoder (1520).
  • the image decoding apparatus may decode at least one enhancement layer video data based on the signaling data and the base layer video data using the enhancement layer decoder (1530).
  • video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
  • the signaling data may include ROI information indicating an ROI corresponding to the gaze direction of the user in the entire area of the virtual space for the video conference service.
  • the base layer video data may include video data for the entire region
  • the at least one enhancement layer video data may include video data for the region of interest in the entire region.
  • the at least one enhancement layer may be divided into at least one tile having a rectangular shape for each layer, and the ROI information may include tile information for identifying at least one tile included in the ROI.
  • the ROI information is generated based on the image configuration information
  • the image configuration information may include gaze information indicating a direction of the user's gaze in a virtual space and zoom area information indicating the user's viewing angle.
  • the image configuration information may be signaled when the gaze direction of the user does not face the speaker.
  • the signaling data may be transmitted through at least one of Supplementary Enhancement Information (SEI), video usability information (VUI), Slice Header, and a file describing the video data.
  • SEI Supplementary Enhancement Information
  • VUI video usability information
  • Slice Header a file describing the video data.
  • FIG. 16 is a diagram illustrating an exemplary video encoding method.
  • the image encoding apparatus may include at least one of a base layer encoder, an enhancement layer encoder, and / or a multiplexer.
  • the contents of the image encoding apparatus and / or the image encoding method may include all related contents among the descriptions of the client device and / or the image encoding apparatus (or the encoder) described above.
  • the image encoding apparatus may generate base layer video data using the base layer encoder (1610).
  • the apparatus for encoding an image may generate at least one enhancement layer video data based on the signaling data and the base layer video data using the enhancement layer encoder.
  • the apparatus for encoding an image may generate a bitstream including video data and signaling data using a multiplexer.
  • the image encoding apparatus and / or the image encoding method may perform an inverse process of the image decoding apparatus and / or the image decoding method.
  • common features may be included for this purpose.
  • FIG. 17 is a diagram illustrating an exemplary syntax of ROI information.
  • the ROI information (sighted_tile_info) for each video picture is shown.
  • the ROI information may include at least one of info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and / or user_info_list.
  • the info_mode information may indicate a mode of information expressing a region of interest for each picture.
  • the info_mode information may be represented by 4 bits of unsigned information.
  • The info_mode information may indicate the mode of the included information. For example, when the value of the info_mode information is '0', the info_mode information may indicate that the previous information mode is used as it is. If the value of the info_mode information is '1', the info_mode information may indicate a list of all tile numbers corresponding to the ROI. If the value of the info_mode information is '2', the info_mode information may indicate the start and end numbers of consecutive tiles corresponding to the ROI.
  • If the value of the info_mode information is '3', the info_mode information may indicate the numbers of the upper-left and lower-right tiles of the ROI. If the value of the info_mode information is '4', the info_mode information may indicate the numbers of tiles corresponding to the ROI and the numbers of coding units included in the tiles.
  • the tile_id_list_size information may indicate the length of the tile number list.
  • the tile_id_list_size information may be represented by 8 bits of unsigned information.
  • the tile_id_list information may include a tile number list based on the info_mode information. Each tile number may be represented by unsigned 8 bits of information.
  • the cu_id_list_size information may indicate the length of a coding unit list.
  • the cu_id_list_size information may be represented by unsigned 16 bits of information.
  • the cu_id_list information may include a list of coding unit numbers based on the info_mode information. Each coding unit number may be represented by unsigned 16 bits of information.
  • the user_info_flag information may be a flag indicating additional user information mode.
  • the user_info_flag information may indicate whether there is tile-related information that the user and / or provider additionally want to transmit.
  • the user_info_flag information may be represented by unsigned 1 bit information. For example, if the value of the user_info_flag information is '0', it may be indicated that there is no additional user information. If the value of the user_info_flag information is '1', it may indicate that there is additional user information.
  • the user_info_size information may indicate the length of additional user information.
  • the user_info_size information may be represented by unsigned 16 bits of information.
  • The user_info_list information may include a list of additional user information. Each item of additional user information may be represented by an unsigned, variable number of bits.
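  • To make the field layout above concrete, the sketch below packs a per-picture sighted_tile_info payload with the stated bit widths (info_mode 1, a plain list of ROI tile numbers); the cu_id_list_size/cu_id_list fields used only by info_mode 4 are omitted, and the bit writer itself is a generic helper, not part of the disclosed syntax.

```python
# Rough packing of per-picture sighted_tile_info using the stated field widths.

class BitWriter:
    def __init__(self):
        self.bits = []
    def write(self, value, width):
        self.bits += [(value >> (width - 1 - i)) & 1 for i in range(width)]
    def tobytes(self):
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(int("".join(map(str, padded[i:i + 8])), 2)
                     for i in range(0, len(padded), 8))

def pack_sighted_tile_info(info_mode, tile_ids, user_info=b""):
    w = BitWriter()
    w.write(info_mode, 4)                  # u(4)  info_mode
    w.write(len(tile_ids), 8)              # u(8)  tile_id_list_size
    for tile_id in tile_ids:
        w.write(tile_id, 8)                # u(8)  each tile_id_list entry
    # cu_id_list_size / cu_id_list (info_mode 4 only) are omitted in this sketch.
    w.write(1 if user_info else 0, 1)      # u(1)  user_info_flag
    if user_info:
        w.write(len(user_info), 16)        # u(16) user_info_size
        for byte in user_info:
            w.write(byte, 8)               #       user_info_list bytes
    return w.tobytes()

payload = pack_sighted_tile_info(info_mode=1, tile_ids=[6, 7, 8, 9, 10, 11, 12])
print(payload.hex())
```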
  • the ROI information (sighted_tile_info) for each file, chunk, and video picture group is also shown.
  • the ROI information may include at least one of a version information field (version_info information), an entire data size field (file_size information), and / or at least one unit information field (unit information).
  • the version_info information may indicate a version of the ROI information (or signaling standard).
  • the version_info information may be represented by unsigned 8 bits of information.
  • the file_size information may indicate the size of the unit information.
  • the file_size information may be represented by unsigned 64-bit information.
  • the file_size information may indicate a file size, chunk size, and video picture group size.
  • the unit information may include region of interest information for each file unit, chunk unit, and / or video picture group unit.
  • the unit information may include at least one of poc_num information, info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and / or user_info_list information.
  • the poc_num information may indicate the number of a video picture.
  • the picture number field may indicate a picture order count (POC) in HEVC and a corresponding picture (frame) number in a general video codec.
  • the poc_num information may be represented by unsigned 32 bits of information.
  • since the tile_id_list_size information, the tile_id_list information, the cu_id_list_size information, the cu_id_list information, the user_info_flag information, the user_info_size information, and / or the user_info_list information are the same as described above, detailed description thereof will be omitted.
  • the ROI information may be generated at the server device (or an image transmitting apparatus) and transmitted to at least one client device (or an image receiving apparatus).
  • the ROI information may be generated in at least one client device (or image receiving apparatus) and transmitted to at least one client device (or image receiving apparatus) and / or server device (or image transmitting apparatus).
  • the client device and / or the controller of the client device may further include the above-described signaling data extractor, image generator, ROI determiner, signaling data generator, and / or encoder.
  • FIG. 18 is a diagram illustrating exemplary ROI information and an exemplary SEI message in XML format.
  • the ROI information (sighted_tile_info) may be expressed in an XML form.
  • the ROI information (sighted_tile_info) may include info_mode information ('3'), tile_id_list_size information ('6'), and / or tile_id_list information ('6, 7, 8, 9, 10, 11, 12').
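  • A minimal sketch of the XML form, assuming element names that simply mirror the field names above (the exact schema of FIG. 18 is not reproduced here); the values are those listed above.

```python
import xml.etree.ElementTree as ET

# Build the XML form of sighted_tile_info with the values listed above (FIG. 18).
root = ET.Element("sighted_tile_info")
ET.SubElement(root, "info_mode").text = "3"
ET.SubElement(root, "tile_id_list_size").text = "6"
ET.SubElement(root, "tile_id_list").text = "6, 7, 8, 9, 10, 11, 12"

print(ET.tostring(root, encoding="unicode"))
```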
  • the payload syntax of the Supplemental Enhancement Information (SEI) message in international video standards is shown.
  • the SEI message indicates additional information that is not essential in the decoding process of the video coding layer (VCL).
  • the region of interest information (sighted_tile_info, 1810) may be included in an SEI message of High Efficiency Video Coding (HEVC), MPEG-4, and / or Advanced Video Coding (AVC), and transmitted through a broadcast network and / or broadband.
  • the SEI message may be included in the compressed video data.
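  • A minimal sketch of how the ROI bytes could be wrapped in an SEI payload, using the generic payloadType / payloadSize coding; carrying the data as user_data_unregistered (payloadType 5) with a placeholder UUID is an assumption of this sketch, and the surrounding NAL unit and emulation-prevention handling are omitted.

```python
def sei_payload(payload_type: int, payload: bytes) -> bytes:
    """Wrap a payload with the generic SEI payloadType / payloadSize coding
    (0xFF extension bytes are emitted for values of 255 or more)."""
    out = bytearray()
    for value in (payload_type, len(payload)):
        while value >= 255:
            out.append(0xFF)
            value -= 255
        out.append(value)
    out.extend(payload)
    return bytes(out)

# Hypothetical carriage: the serialized ROI bytes (e.g. the output of a packer like the one
# sketched earlier) are carried as user_data_unregistered (payloadType 5), prefixed by a
# 16-byte UUID that identifies this private payload format.
ROI_UUID = bytes(range(16))                                   # placeholder UUID
roi_bytes = bytes([1, 7, 6, 7, 8, 9, 10, 11, 12, 0, 0, 0])    # placeholder ROI payload
print(sei_payload(5, ROI_UUID + roi_bytes).hex())
```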
  • FIG. 19 illustrates an example protocol stack of a client device.
  • the protocol stack of the broadcast network may be divided into a portion transmitted through a service list table (SLT) and the MPEG Media Transport Protocol (MMTP), and a portion transmitted through Real time Object delivery over Unidirectional Transport (ROUTE).
  • the SLT 1910 may be encapsulated through a User Datagram Protocol (UDP) and an Internet Protocol (IP) layer.
  • MPEG Media Transport Protocol (MMTP) may transmit data 1920 formatted in MPU (Media Processing Unit) format defined in MPEG media transport (MMT) and signaling data 1930 according to MMTP. These data can be encapsulated over the UDP and IP layers.
  • through ROUTE, signaling data 1940, non-timed data 1950 such as non-real time (NRT) data, and data 1960 formatted in the form of dynamic adaptive streaming over HTTP (DASH) segments may be transmitted. These data can also be encapsulated over the UDP and IP layers.
  • the part transmitted through SLT and MMTP and the part transmitted through ROUTE may be encapsulated again in the data link layer after being processed in the UDP and IP layers.
  • the broadcast data processed in the link layer may be multicast as a broadcast signal through a process such as encoding / interleaving in the physical layer.
  • the broadband protocol stack portion may be transmitted through the HyperText Transfer Protocol (HTTP) as described above.
  • Data 1960 formatted in the form of a DASH segment, signaling data 1980, and data 1970 such as an NRT may be transmitted through HTTP.
  • the signaling data shown here may be signaling data regarding a service.
  • This data can be processed via the Transmission Control Protocol (TCP), IP layer, and then encapsulated at the link layer. Subsequently, the processed broadband data may be unicast to broadband through processing for transmission in the physical layer.
  • a service can be a collection of media components that are shown to the user as a whole; a component can be of multiple media types; a service can be continuous or intermittent; a service can be real time or non-real time; and a real time service can be configured as a sequence of TV programs.
  • the service may include the aforementioned virtual reality service and / or augmented reality service.
  • the video data and / or audio data may be included in at least one of the data 1920 formatted in MPU format, the non-timed data 1950 such as NRT data, and / or the data 1960 formatted in DASH segment form.
  • the signaling data (eg, the first signaling data, the second signaling data) may be included in at least one of the SLT 1910, the signaling data 1930, the signaling data 1940, and / or the signaling data 1980.
  • FIG. 20 is a diagram illustrating an exemplary SLT and service layer signaling (SLS).
  • Service signaling provides service discovery and description information, and comprises two functional components: bootstrap signaling through the SLT 2010, and the SLS 2020 and 2030. For example, the SLS in MMTP may be represented by MMT signaling components 2030. These represent the information needed to discover and obtain user services. The SLT 2010 allows the receiver to build a basic list of services and to bootstrap the discovery of the SLS 2020 and 2030 for each service.
  • SLT 2010 enables very fast acquisition of basic service information.
  • SLS 2020 and 2030 allow the receiver to discover and access the service and its content components (such as video data or audio data).
  • the SLT 2010 may be transmitted through UDP / IP.
  • data corresponding to the SLT 2010 may be delivered through the most robust method for this transmission.
  • the SLT 2010 may have access information for accessing the SLS 2020 carried by the ROUTE protocol. That is, the SLT 2010 may bootstrap the SLS 2020 according to the ROUTE protocol.
  • the SLS 2020 is signaling information located in a layer above ROUTE in the above-described protocol stack and may be transmitted through ROUTE / UDP / IP. This SLS 2020 may be delivered via one of the LCT sessions included in the ROUTE session.
  • the SLS 2020 may be used to access a service component 2040 corresponding to a desired service.
  • the SLT 2010 may also have access information for accessing the SLS (MMT signaling component) 2030 carried by the MMTP.
  • the SLT 2010 may bootstrap to the SLS (MMT signaling component) 2030 according to the MMTP.
  • This SLS (MMT signaling component) 2030 may be carried by an MMTP signaling message defined in MMT.
  • the SLS (MMT signaling component) 2030 may be used to access a streaming service component (MPU) 2050 corresponding to a desired service.
  • the NRT service components 2060 are delivered via ROUTE; the SLS (MMT signaling component) 2030 according to MMTP may also include information for accessing them.
  • SLS is carried over HTTP (S) / TCP / IP.
  • the service may be included in at least one of the service components 2040, the streaming service components 2050, and / or the NRT service components 2060.
  • the signaling data (eg, the first signaling data and the second signaling data) may be included in at least one of the SLT 2010, the SLS 2020, and / or the MMT signaling components 2030.
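  • A minimal sketch of the bootstrap step, assuming a simplified view of the SLT BroadcastSignaling information; the dataclass fields and function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BroadcastSignaling:
    """Hypothetical, simplified view of the SLT BroadcastSignaling element for one service."""
    sls_protocol: str      # "ROUTE" or "MMTP"
    plp_id: int
    dst_ip: str
    dst_port: int

def bootstrap_sls(signaling: BroadcastSignaling):
    """Return the tuple a receiver would tune to in order to fetch the SLS of the service."""
    if signaling.sls_protocol == "ROUTE":
        # SLS fragments (USBD / USD, S-TSID, MPD) carried in an LCT session of a ROUTE session.
        return ("ROUTE", signaling.plp_id, signaling.dst_ip, signaling.dst_port)
    if signaling.sls_protocol == "MMTP":
        # SLS carried as MMT signaling messages.
        return ("MMTP", signaling.plp_id, signaling.dst_ip, signaling.dst_port)
    raise ValueError("unknown SLS protocol")

print(bootstrap_sls(BroadcastSignaling("ROUTE", 1, "239.255.1.1", 51001)))
```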
  • FIG. 21 is a diagram illustrating an example SLT.
  • SLT supports fast channel scan that allows the receiver to build a list of all the services it can receive by channel name, channel number, and so on.
  • the SLT also provides bootstrap information that allows the receiver to discover the SLS for each service.
  • the SLT may include at least one of @bsid, @sltCapabilities, sltInetUrl element, and / or Service element.
  • @bsid may be a unique identifier of the broadcast stream.
  • the value of @bsid can be unique at the local level.
  • @sltCapabilities indicates the capabilities required for a meaningful presentation of all the services described in the SLT.
  • the sltInetUrl element refers to a URL (Uniform Resource Locator) value from which ESG (Electronic Service Guide) data or service signaling information providing guide information for all services described in the corresponding SLT can be downloaded through a broadband network.
  • the sltInetUrl element may include @URLtype.
  • @URLtype refers to the type of file that can be downloaded through the URL indicated by the sltInetUrl element.
  • the service element may include service information.
  • the service element may include at least one of @serviceId, @sltSvcSeqNum, @protected, @majorChannelNo, @minorChannelNo, @serviceCategory, @shortServiceName, @hidden, @broadbandAccessRequired, @svcCapabilities, BroadcastSignaling element, and / or svcInetUrl element.
  • @serviceId is a unique identifier of the service.
  • @sltSvcSeqNum has a value that indicates information about whether the contents of each service defined in the SLT have changed.
  • when @protected has a value of “true”, it means that at least one of the components necessary for a meaningful presentation of the service is protected.
  • @majorChannelNo means the major channel number of the service.
  • @minorChannelNo means the minor channel number of the service.
  • @serviceCategory indicates the type of service.
  • @hidden indicates whether the service should be shown to the user when scanning the service.
  • @broadbandAccessRequired indicates whether to connect to the broadband network in order to show the service meaningfully to the user.
  • @svcCapabilities specifies the specifications that must be supported to make the service meaningful to the user.
  • the BroadcastSignaling element includes a definition of a transport protocol, a location, and identifier values of signaling transmitted to a broadcast network.
  • the BroadcastSignaling element may include at least one of @slsProtocol, @slsMajorProtocolVersion, @slsMinorProtocolVersion, @slsPlpId, @slsDestinationIpAddress, @slsDestinationUdpPort, and / or @slsSourceIpAddress.
  • @slsProtocol represents the protocol over which the SLS of the service is transmitted.
  • @slsMajorProtocolVersion represents the major version of the protocol over which the SLS of the service is transmitted.
  • @slsMinorProtocolVersion represents the minor version of the protocol over which the SLS of the service is transmitted.
  • @slsPlpId indicates the PLP identifier through which the SLS is transmitted.
  • @slsDestinationIpAddress represents the destination IP address of SLS data.
  • @slsDestinationUdpPort represents the destination Port value of SLS data.
  • @slsSourceIpAddress represents the source IP address of SLS data.
  • the svcInetUrl element indicates a URL value for downloading ESG service or signaling data related to the service.
  • the svcInetUrl element may contain @URLtype.
  • @URLtype refers to the type of file that can be downloaded through the URL indicated by the svcInetUrl element.
  • FIG. 22 is a diagram illustrating an example code value of a serviceCategory attribute.
  • if the value of the serviceCategory attribute is '0', the service may not be specified. If the value of the serviceCategory attribute is '1', the service may be a linear audio / video service. If the value of the serviceCategory attribute is '2', the service may be a linear audio service. If the value of the serviceCategory attribute is '3', the service may be an app-based service. If the value of the serviceCategory attribute is '4', the service may be an electronic service guide (ESG) service. If the value of the serviceCategory attribute is '5', the service may be an emergency alert service (EAS).
  • the corresponding service may be a virtual reality and / or augmented reality service.
  • the value of the serviceCategory attribute may be '6' (2210).
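  • A minimal sketch of an SLT entry for such a service, with serviceCategory '6' as proposed above; the attribute names follow the description, while the concrete identifier and address values are placeholders.

```python
import xml.etree.ElementTree as ET

slt = ET.fromstring("""
<SLT bsid="8086">
  <Service serviceId="1001" sltSvcSeqNum="0" majorChannelNo="5" minorChannelNo="2"
           serviceCategory="6" shortServiceName="VR-1" hidden="false"
           broadbandAccessRequired="false">
    <BroadcastSignaling slsProtocol="1" slsMajorProtocolVersion="1" slsMinorProtocolVersion="0"
                        slsPlpId="1" slsDestinationIpAddress="239.255.1.1"
                        slsDestinationUdpPort="51001" slsSourceIpAddress="10.1.1.1"/>
  </Service>
</SLT>
""")

for svc in slt.findall("Service"):
    if svc.get("serviceCategory") == "6":
        print("virtual reality service:", svc.get("serviceId"), svc.get("shortServiceName"))
```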
  • FIG. 23 illustrates an example SLS bootstrapping and example service discovery process.
  • the receiver can obtain the SLT.
  • SLT is used to bootstrap SLS acquisition, and then SLS is used to acquire service components carried in a ROUTE session or an MMTP session.
  • for the ROUTE session, the SLT provides SLS bootstrapping information such as PLPID (# 1), source IP address (sIP1), destination IP address (dIP1), and destination port number (dPort1).
  • for the MMTP session, the SLT provides SLS bootstrapping information such as PLPID (# 2), destination IP address (dIP2), and destination port number (dPort2).
  • a broadcast stream is a concept of an RF channel defined in terms of carrier frequencies concentrated within a specific band.
  • each physical layer pipe (PLP) has specific modulation and coding parameters.
  • the receiver can obtain the SLS fragments delivered to the PLP and IP / UDP / LCT sessions.
  • SLS fragments include a User Service Bundle Description / User Service Description (USBD / USD) fragment, a Service-based Transport Session Instance Description (S-TSID) fragment, and a Media Presentation Description (MPD) fragment. They are related to a service.
  • the receiver may obtain SLS fragments that are delivered in PLP and MMTP sessions. These SLS fragments may include USBD / USD fragments and MMT signaling messages. They are related to a service.
  • the receiver may obtain a video component and / or an audio component based on the SLS fragment.
  • one ROUTE or MMTP session may be delivered through a plurality of PLPs. That is, one service may be delivered through one or more PLPs. As described above, one LCT session may be delivered through one PLP. Unlike shown, components constituting one service may be delivered through different ROUTE sessions. In addition, according to an embodiment, components constituting one service may be delivered through different MMTP sessions. According to an embodiment, components constituting one service may be delivered divided between a ROUTE session and an MMTP session.
  • according to an embodiment, a component constituting one service may be delivered through broadband (hybrid delivery).
  • FIG. 24 is a diagram illustrating an exemplary USBD / USD fragment for ROUTE / DASH.
  • the USBD / USD (User Service Bundle Description / User Service Description) fragment describes the service layer characteristics and provides a Uniform Resource Identifier (URI) reference for the S-TSID fragment and a URI reference for the MPD fragment. That is, the USBD / USD fragment may refer to the S-TSID fragment and the MPD fragment, respectively.
  • the USBD / USD fragment can be expressed as a USBD fragment.
  • the USBD / USD fragment can have a bundleDescription root element.
  • the bundleDescription root element may have a userServiceDescription element.
  • the userServiceDescription element may be an instance of one service.
  • the userServiceDescription element may include at least one of @globalServiceId, @serviceId, @serviceStatus, @fullMPDUri, @sTSIDUri, name element, serviceLanguage element, deliveryMethod element, and / or serviceLinkage element.
  • @globalServiceId can indicate a globally unique URI that identifies the service.
  • @serviceId is a reference to the corresponding service entry in the SLT.
  • @serviceStatus can specify the status of the service. The value indicates whether the service is enabled or disabled.
  • @fullMPDUri may reference an MPD fragment containing a description of the content component of the service delivered over broadcast and / or broadband.
  • @sTSIDUri may refer to an S-TSID fragment that provides access-related parameters to a transport session that delivers the content of the service.
  • the name element may indicate a name of a service.
  • the name element may include @lang indicating the language of the service name.
  • the serviceLanguage element may indicate an available language of the service.
  • the deliveryMethod element may be a container of transport-related information pertaining to the content of the service over broadcast and (optionally) broadband modes of access.
  • the deliveryMethod element may include a broadcastAppService element and a unicastAppService element.
  • Each of the broadcastAppService element and the unicastAppService element may have a basePattern element as a subelement.
  • the broadcastAppService element may be a DASH Representation delivered over broadcast, in multiplexed or non-multiplexed form, containing the corresponding media components belonging to the service, over the duration of the media presentation to which it belongs. That is, each of these fields may mean DASH Representations delivered through the broadcast network.
  • the unicastAppService element may be a DASH Representation delivered over broadband, in multiplexed or non-multiplexed form, containing the constituent media content components belonging to the service, over all durations of the media presentation to which it belongs. That is, each of these fields may mean DASH Representations delivered through broadband.
  • the basePattern element may be a character pattern used by the receiver to match against any portion of the segment URL used by the DASH client to request media segments of the parent Representation within the containing period.
  • the serviceLinkage element may include service linkage information.
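  • A minimal sketch of reading a USBD / USD fragment to locate the S-TSID and MPD fragments; the element and attribute names follow the description above, and namespaces and the remaining subelements are omitted.

```python
import xml.etree.ElementTree as ET

usbd = ET.fromstring("""
<bundleDescription>
  <userServiceDescription globalServiceId="urn:example:vr:1001" serviceId="1001"
                          fullMPDUri="mpd_1001.xml" sTSIDUri="stsid_1001.xml">
    <name lang="en">VR conference</name>
  </userServiceDescription>
</bundleDescription>
""")

usd = usbd.find("userServiceDescription")
print("MPD fragment:   ", usd.get("fullMPDUri"))     # description of the media components
print("S-TSID fragment:", usd.get("sTSIDUri"))       # access parameters of the transport sessions
```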
  • FIG. 25 is a diagram illustrating an example S-TSID fragment for ROUTE / DASH.
  • the Service-based Transport Session Instance Description (S-TSID) fragment provides a transport session description for one or more ROUTE / LCT sessions to which the media content component of the service is delivered and a description of the delivery object delivered in that LCT session.
  • the receiver may obtain at least one component (eg, video component and / or audio component) included in the service based on the S-TSID fragment.
  • the S-TSID fragment may include an S-TSID root element.
  • the S-TSID root element may include @serviceId and / or at least one RS element.
  • @serviceID may be a reference corresponding to a service element in USD.
  • the RS element may have information about a ROUTE session for delivering corresponding service data.
  • the RS element may include at least one of @bsid, @sIpAddr, @dIpAddr, @dport, @PLPID and / or at least one LS element.
  • @bsid may be an identifier of a broadcast stream to which the content component of broadcastAppService is delivered.
  • @sIpAddr may indicate the source IP address.
  • the source IP address may be a source IP address of a ROUTE session for delivering a service component included in a corresponding service.
  • @dIpAddr may indicate a destination IP address.
  • the destination IP address may be a destination IP address of a ROUTE session for delivering a service component included in a corresponding service.
  • @dport can represent a destination port.
  • the destination port may be a destination port of a ROUTE session for delivering a service component included in a corresponding service.
  • @PLPID may be an ID of a PLP for a ROUTE session represented by an RS element.
  • the LS element may have information about an LCT session that carries corresponding service data.
  • the LS element may include @tsi, @PLPID, @bw, @startTime, @endTime, SrcFlow and / or RprFlow.
  • @tsi may indicate a TSI value of an LCT session in which a service component of a corresponding service is delivered.
  • @PLPID may have ID information of a PLP for a corresponding LCT session. This value may override the default ROUTE session value.
  • @bw may indicate the maximum bandwidth value.
  • @startTime can indicate the start time of the LCT session.
  • @endTime may indicate an end time of the corresponding LCT session.
  • the SrcFlow element may describe the source flow of ROUTE.
  • the RprFlow element may describe the repair flow of ROUTE.
  • the S-TSID may include ROI information.
  • the RS element and / or the LS element may include ROI information.
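  • A minimal sketch of reading the RS / LS structure of an S-TSID fragment to learn which LCT session of which ROUTE session carries each component; all values shown are placeholders.

```python
import xml.etree.ElementTree as ET

stsid = ET.fromstring("""
<S-TSID serviceId="1001">
  <RS sIpAddr="10.1.1.1" dIpAddr="239.255.1.1" dport="51002" PLPID="1">
    <LS tsi="10"/><!-- e.g. base layer video -->
    <LS tsi="11"/><!-- e.g. audio -->
  </RS>
  <RS sIpAddr="10.1.1.1" dIpAddr="239.255.1.2" dport="51003" PLPID="2">
    <LS tsi="20"/><!-- e.g. enhancement layer video -->
  </RS>
</S-TSID>
""")

sessions = []
for rs in stsid.findall("RS"):
    for ls in rs.findall("LS"):
        sessions.append({"plp": rs.get("PLPID"), "dst": rs.get("dIpAddr"),
                         "port": rs.get("dport"), "tsi": ls.get("tsi")})
print(sessions)
```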
  • FIG. 26 illustrates an exemplary MPD fragment.
  • the media presentation description (MPD) fragment may include a formal description of the DASH media presentation corresponding to the linear service of a given duration determined by the broadcaster. MPD fragments are primarily associated with linear services for the delivery of DASH segments as streaming content.
  • the MPD provides the resource identifiers for the individual media components of the linear / streaming service in the form of segment URLs, and the context of the identified resources within the media presentation. The MPD may be transmitted over broadcast and / or broadband.
  • the MPD fragment may include a period element, an adaptation set element, and a representation element.
  • Period elements contain information about periods.
  • the MPD fragment may include information about a plurality of periods.
  • a period represents a continuous time interval of media content presentation.
  • the adaptation set element includes information about the adaptation set.
  • the MPD fragment may include information about a plurality of adaptation sets.
  • An adaptation set is a collection of media components that includes one or more media content components that can be interchanged.
  • the adaptation set may include one or more representations.
  • Each adaptation set may include audio of different languages or subtitles of different languages.
  • the representation element contains information about the representation.
  • the MPD may include information about a plurality of representations.
  • a representation is a structured collection of one or more media components, where there may be a plurality of representations encoded differently for the same media content component.
  • the electronic device may switch the received representation to another representation based on updated information during media content playback. In particular, the electronic device may switch the received representation to another representation according to the bandwidth environment.
  • the representation is divided into a plurality of segments.
  • a segment is a unit of media content data.
  • the representation may be transmitted as a segment or part of a segment according to a request of the electronic device using the HTTP GET or HTTP partial GET method defined in HTTP 1.1 (RFC 2616).
  • the segment may include a plurality of sub-segments.
  • the subsegment may mean the smallest unit that can be indexed at the segment level.
  • the segment may include an Initialization Segment, a Media Segment, an Index Segment, and a BitstreamSwitching Segment.
  • the MPD fragment may include ROI information.
  • the period element, the adaptation set element, and / or the representation element may include ROI information.
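  • A minimal sketch of an MPD excerpt carrying ROI information; the description only states that the period, adaptation set, and / or representation elements may include ROI information, so embedding it as a sighted_tile_info child element of an adaptation set is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

mpd = ET.fromstring("""
<MPD>
  <Period id="1">
    <AdaptationSet id="enh-video" contentType="video">
      <sighted_tile_info info_mode="1" tile_id_list="6 7 8 9 10 11 12"/>
      <Representation id="enhancement-layer" bandwidth="6000000"/>
    </AdaptationSet>
  </Period>
</MPD>
""")

roi = mpd.find(".//sighted_tile_info")
print("ROI tiles:", [int(t) for t in roi.get("tile_id_list").split()])
```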
  • FIG. 27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.
  • the client device may receive the bitstream through the broadcast network.
  • the bit stream may include video data and second signaling data for the service.
  • the second signaling data may include an SLT 2710 and an SLS 2730.
  • the service may include a virtual reality service.
  • the service data may include base layer service data 2740 and enhancement layer service data 2750.
  • the bitstream may include at least one physical layer frame.
  • the physical layer frame may include at least one PLP.
  • the SLT 2710 may be transmitted through the PLP # 0.
  • the PLP # 1 may include a first ROUTE session ROUTE # 1.
  • the first ROUTE session ROUTE # 1 may include a first LCT session tsi-sls, a second LCT session tsi-bv, and a third LCT session tsi-a.
  • the SLS 2730 may be transmitted through the first LCT session tsi-sls, the base layer video data 2740 may be transmitted through the second LCT session tsi-bv, and audio data may be transmitted through the third LCT session tsi-a.
  • the PLP # 2 may include a second ROUTE session ROUTE # 2
  • the second ROUTE session ROUTE # 2 may include a fourth LCT session tsi-ev.
  • Enhancement layer video data (Video Segment) 2750 may be transmitted through a fourth LCT session tsi-ev.
  • the client device can then obtain the SLT 2710.
  • the SLT 2710 may include bootstrap information 2720 for obtaining the SLS 2730.
  • the client device may then obtain the SLS 2730 for the virtual reality service based on the bootstrap information 2720.
  • the SLS may include a USBD / USD fragment, an S-TSID fragment, and / or an MPD fragment.
  • At least one of the USBD / USD fragment, the S-TSID fragment, and / or the MPD fragment may include ROI information.
  • the MPD fragment includes ROI information.
  • the client device may then obtain the S-TSID fragment and / or the MPD fragment based on the USBD / USD fragment.
  • the client device may match the representation of the MPD fragment with the media component transmitted over the LCT session based on the S-TSID fragment and the MPD fragment.
  • the client device can then obtain the base layer video data 2740 and audio data based on the RS element (ROUTE # 1) of the S-TSID fragment.
  • the client device may also obtain the enhancement layer video data 2750 based on the RS element (ROUTE # 2) of the S-TSID fragment.
  • the client device can then decode the service data (eg, base layer video data, enhancement layer video data, audio data) based on the MPD fragment.
  • the client device may decode the enhancement layer video data based on the base layer video data and / or region of interest information.
  • although the enhancement layer video data is shown as being transmitted through the second ROUTE session (ROUTE # 2), the enhancement layer video data may alternatively be transmitted through an MMTP session.
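  • A toy walk-through of the acquisition order of FIG. 27, using plain dictionaries in place of the real SLT, S-TSID, and MPD fragments; the function name, dictionary layout, and values are placeholders, not part of the signaling.

```python
# Plain-dictionary stand-ins for the SLT 2710 (with bootstrap information 2720) and for the
# S-TSID fragment delivered in the SLS 2730. None of the keys or values come from the spec.
slt = {"plp": 0, "bootstrap": {"plp": 1, "tsi": "tsi-sls"}}
stsid = {"ROUTE#1": {"plp": 1, "tsi-bv": "base layer video", "tsi-a": "audio"},
         "ROUTE#2": {"plp": 2, "tsi-ev": "enhancement layer video"}}

def acquisition_order(slt, stsid):
    steps = [f"read SLT from PLP#{slt['plp']}",
             f"read SLS (USBD/USD, S-TSID, MPD) from PLP#{slt['bootstrap']['plp']}, {slt['bootstrap']['tsi']}"]
    for route, lcts in stsid.items():
        for tsi, component in lcts.items():
            if tsi.startswith("tsi-"):
                steps.append(f"read {component} from {route} ({tsi}, PLP#{lcts['plp']})")
    steps.append("decode base layer, then decode enhancement layer(s) from base layer + ROI info")
    return steps

for step in acquisition_order(slt, stsid):
    print(step)
```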
  • FIG. 28 is a diagram illustrating an exemplary configuration of a client device.
  • the client device A2800 may include at least one of an image input unit, an audio input unit, a sensor unit, an image output unit, an audio output unit, a communication unit A2810, and / or a controller A2820.
  • the details of the client device A2800 may include all the contents of the above-described client device.
  • the controller A2820 may include at least one of a signaling data extractor, a decoder, a speaker determiner, a gaze determiner, and / or a signaling data generator.
  • the details of the controller A2820 may include all of the above-described contents of the controller.
  • a client device may include a communication unit A2810 and / or a controller A2820.
  • the controller A2820 may include a base layer decoder A2821 and / or an enhancement layer decoder A2825.
  • the communication unit A2810 may receive a bitstream including video data for a virtual reality service.
  • the communication unit A2810 may receive a bitstream through a broadcast network and / or broadband.
  • the video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
  • the base layer decoder A2821 may decode the base layer video data.
  • the enhancement layer decoder A2825 may decode the at least one enhancement layer video data based on the base layer video data.
  • the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • controller A2820 may further include a signaling data generator that generates first signaling data.
  • the first signaling data may include image configuration information.
  • the image configuration information may include at least one of gaze information indicating a gaze direction of the user in the virtual space and zoom area information indicating a viewing angle of the user.
  • the controller A2820 may further include a gaze determination unit that determines whether a gaze area corresponding to the gaze information is included in the at least one ROI.
  • the communication unit A2810 may transmit the first signaling data to a server (or a server device, a transmitter, an image transmission device) and / or at least one client.
  • the server device and / or the at least one client device receiving the first signaling data may include, in the at least one ROI, the gaze area corresponding to the gaze information. That is, the region of interest may include at least one of a region including the speaker in the virtual space, a region that is predetermined by using at least one enhancement layer video data, and the gaze area corresponding to the gaze information.
  • the bitstream may further include second signaling data.
  • the communication unit A2810 may independently receive the base layer video data and the at least one enhancement layer video data based on the second signaling data through a plurality of sessions.
  • the communication unit A2810 may receive base layer video data through a first ROUTE session and receive at least one enhancement layer video data through at least one second ROUTE session.
  • the communication unit A2810 may receive base layer video data through a ROUTE session and receive at least one enhancement layer video data through at least one MMTP session.
  • the second signaling data may include at least one of service layer signaling data (or SLS) including information for acquiring the video data and a service list table (or SLT) including information for acquiring the service layer signaling data.
  • the service list table may include a service category attribute indicating a category of a service.
  • the service category attribute may indicate the virtual reality service.
  • the service layer signaling data may include the ROI information.
  • the service layer signaling data may include at least one of an S-TSID fragment including information on a session in which at least one media component for the virtual reality service is transmitted, an MPD fragment including information about the at least one media component (video data and / or audio data), and a USBD / USD fragment including URI values referencing the S-TSID fragment and the MPD fragment.
  • the MPD fragment may include ROI information indicating a location of the at least one ROI in the entire area of the virtual space.
  • the bitstream may further include region of interest information indicating a location of the at least one region of interest within the entire region of the virtual space.
  • the ROI information may be transmitted and / or received through at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.
  • the at least one enhancement layer video data may be generated (encoded) and / or decoded based on the base layer video data and the ROI information.
  • the ROI information may include at least one of an information mode field indicating a mode of information representing the ROI for each picture and a tile number list field including a number of at least one tile corresponding to the ROI.
  • the information mode field may be the above-described info_mode information
  • the tile number list field may be the above-described tile_id_list information.
  • based on the information mode field, the tile number list field may include the number of the at least one tile in one of the following manners: the numbers of all tiles corresponding to the ROI, the starting and ending numbers of consecutive tiles, or the numbers of the upper-left and lower-right tiles of the ROI.
  • the ROI information may further include a coding unit number list field indicating the ROI.
  • the coding unit number list field may be the above-described cu_id_list information.
  • the coding unit number list field may indicate the number of tiles corresponding to the ROI and the number of coding units included in the tile based on the information mode field.
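  • A minimal sketch of expanding the tile number list into the full set of ROI tiles for the listed modes, assuming raster-scan tile numbering and a known number of tiles per row; the function name and the handling of modes '0' and '4' are simplifications of this sketch.

```python
def roi_tiles(info_mode, tile_id_list, tiles_per_row=None):
    """Expand the tile number list into the full set of ROI tile numbers for the listed modes.
    Raster-scan tile numbering starting at 0 is assumed; tiles_per_row is only needed for
    the corner-tile mode and would be known to the receiver from the video parameters."""
    if info_mode == 1:                        # full list of tile numbers
        return list(tile_id_list)
    if info_mode == 2:                        # start / end numbers of consecutive tiles
        start, end = tile_id_list[0], tile_id_list[1]
        return list(range(start, end + 1))
    if info_mode == 3:                        # upper-left / lower-right tile numbers
        tl, br = tile_id_list[0], tile_id_list[1]
        rows = range(tl // tiles_per_row, br // tiles_per_row + 1)
        cols = range(tl % tiles_per_row, br % tiles_per_row + 1)
        return [r * tiles_per_row + c for r in rows for c in cols]
    raise ValueError("mode '0' reuses the previous ROI; mode '4' additionally lists coding units")

print(roi_tiles(2, [6, 12]))                    # -> [6, 7, 8, 9, 10, 11, 12]
print(roi_tiles(3, [6, 12], tiles_per_row=5))   # -> [6, 7, 11, 12] (rectangular block of tiles)
```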
  • the client device B2800 may include at least one of an image input unit, an audio input unit, a sensor unit, an image output unit, an audio output unit, a communication unit B2810, and / or a controller B2820.
  • the details of the client device B2800 may include all the contents of the client device A2800 described above.
  • controller B2820 may include at least one of the first processor B2821 and / or the second processor B2825.
  • the first processor B2821 may decode base layer video data.
  • the first processor B2821 may be a video processing unit (VPU) and / or a digital signal processor (DSP).
  • the second processor B2825 may be electrically connected to the first processor to decode the at least one enhancement layer video data based on the base layer video data.
  • the second processor B2825 may be a central processing unit (CPU) and / or a graphics processing unit (GPU).
  • FIG. 29 is a diagram illustrating an exemplary configuration of a server device.
  • At least one client device may perform all operations of the server device (or image transmitting apparatus).
  • the server device A2900 (or a transmitter, an image transmission device) may include a controller A2910 and / or a communicator A2920.
  • the controller A2910 may include at least one of a signaling data extractor, an image generator, an ROI determiner, a signaling data generator, and / or an encoder. Details of the server device A2900 may include all the contents of the server device described above.
  • the controller A2910 of the server device A2900 may include a base layer encoder A2911 and / or an enhancement layer encoder A2915.
  • the base layer encoder A2911 may generate base layer video data.
  • the enhancement layer encoder A2915 may generate at least one enhancement layer video data based on the base layer video data.
  • the communicator A2920 may transmit a bitstream including video data for a virtual reality service.
  • the communication unit A2920 may transmit a bitstream through a broadcast network and / or broadband.
  • the video data may also include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
  • the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • the communication unit A2920 may further receive the first signaling data.
  • the first signaling data may include image configuration information.
  • the ROI determiner of the controller A2910 may include the gaze area corresponding to the gaze information in the at least one ROI.
  • the signaling data generator of the controller A2910 may generate second signaling data.
  • the communication unit A2920 may independently transmit the base layer video data and the at least one enhancement layer video data through a plurality of sessions based on the second signaling data.
  • the second signaling data and / or the ROI information may include all of the above contents.
  • the server device B2900 (or a transmitter, an image transmission device) may include at least one of the controller B2910 and / or the communicator B2920.
  • the controller B2910 may include at least one of a signaling data extractor, an image generator, an ROI determiner, a signaling data generator, and / or an encoder. Details of the server device B2900 may include all the contents of the server device described above.
  • the controller B2910 of the server device B2900 may include a first processor B2911 and / or a second processor B2915.
  • the first processor B2911 may include a base layer encoder that generates base layer video data.
  • the second processor B2915 may be electrically connected to the first processor to generate (or encode) the at least one enhancement layer video data based on the base layer video data.
  • FIG. 30 is a diagram illustrating an exemplary operation of a client device.
  • the client device may include a communication unit and / or a control unit.
  • the control unit may include a base layer decoder and / or an enhancement layer decoder.
  • the controller may include a first processor and / or a second processor.
  • the client device may use the communication unit to receive a bitstream including video data for the virtual reality service (3010).
  • the video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
  • the client device may then decode (3020) the base layer video data using a base layer decoder and / or a first processor.
  • the client device may then decode (3030) the at least one enhancement layer video data based on the base layer video data using an enhancement layer decoder and / or a second processor.
  • the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • the contents related to the operation of the client device may include all the contents of the client device described above.
  • FIG. 31 is a diagram illustrating an exemplary operation of a server device.
  • the server device may include a control unit and / or a communication unit.
  • the control unit may include a base layer encoder and / or an enhancement layer encoder.
  • the controller may include a first processor and / or a second processor.
  • the server device may generate base layer video data using the base layer encoder and / or the first processor (3110).
  • the server device may then use the enhancement layer encoder and / or the second processor to generate at least one enhancement layer video data based on the base layer video data (3120).
  • the server device may then use the communication unit to transmit the bitstream containing the video data for the virtual reality service.
  • the video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
  • the at least one enhancement layer video data may be video data for at least one region of interest in a virtual space.
  • the contents related to the operation of the server device may include all the contents of the server device described above.
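  • A toy sketch of the encoding order described above: the whole virtual-space frame is encoded once as the base layer, and only the ROI tiles are additionally encoded as an enhancement layer predicted from the base layer; the function records the order of operations only and does not perform real encoding.

```python
def encode_for_vr(num_tiles, roi_tiles):
    """Record the order of operations only: base layer for the whole frame first, then
    enhancement-layer encoding of the ROI tiles predicted from the base layer."""
    base_layer = [("base", tile) for tile in range(num_tiles)]                  # step 3110
    enhancement = [("enh", tile, "predicted from base") for tile in roi_tiles]  # step 3120
    return {"base": base_layer, "enhancement": enhancement}                     # then transmit

bitstream = encode_for_vr(num_tiles=20, roi_tiles=[6, 7, 8, 9, 10, 11, 12])
print(len(bitstream["base"]), "base-layer tiles,",
      len(bitstream["enhancement"]), "enhancement-layer ROI tiles")
```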
  • the above-described method may be implemented as code that can be read by a processor in a medium in which a program is recorded.
  • examples of the processor-readable medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and the medium may also be implemented in the form of a downloadable file.
  • the electronic device described above is not limited to the configurations and methods of the above-described embodiments; all or some of the embodiments may be selectively combined so that various modifications may be made.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed is a method for receiving a video, comprising the steps of: receiving a bitstream comprising video data for a virtual reality service, the video data comprising base layer video data for a base layer and at least one item of enhancement layer video data for at least one enhancement layer predicted from the base layer; decoding the base layer video data; and decoding the at least one item of enhancement layer video data on the basis of the base layer video data, the at least one item of enhancement layer video data being video data for at least one region of interest in the virtual space.
PCT/KR2017/001087 2016-09-28 2017-02-01 Fourniture d'un service de réalité virtuelle en tenant compte de la zone d'intérêt WO2018062641A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2016-0125145 2016-09-28
KR1020160125145A KR101861929B1 (ko) 2016-09-28 2016-09-28 관심 영역을 고려한 가상 현실 서비스 제공

Publications (1)

Publication Number Publication Date
WO2018062641A1 true WO2018062641A1 (fr) 2018-04-05

Family

ID=61760922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2017/001087 WO2018062641A1 (fr) 2016-09-28 2017-02-01 Fourniture d'un service de réalité virtuelle en tenant compte de la zone d'intérêt

Country Status (2)

Country Link
KR (1) KR101861929B1 (fr)
WO (1) WO2018062641A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019199025A1 (fr) 2018-04-09 2019-10-17 에스케이텔레콤 주식회사 Procédé et dispositif de codage/décodage d'image
US11509937B2 (en) 2018-04-09 2022-11-22 Sk Telecom Co., Ltd. Method and apparatus for encoding/decoding video
KR102183895B1 (ko) * 2018-12-19 2020-11-27 가천대학교 산학협력단 가상 현실 비디오 스트리밍에서의 관심영역 타일 인덱싱
KR102278748B1 (ko) * 2019-03-19 2021-07-19 한국전자기술연구원 360 vr 인터랙티브 중계를 위한 사용자 인터페이스 및 방법
KR102261739B1 (ko) * 2019-06-19 2021-06-08 주식회사 엘지유플러스 증강 현실 미디어 콘텐츠의 적응적 스트리밍 시스템 및 적응적 스트리밍 방법

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120041769A (ko) * 2009-07-28 2012-05-02 소니 컴퓨터 엔터테인먼트 인코포레이티드 화상파일 생성장치, 화상처리장치, 화상파일 생성방법, 및 화상처리방법
KR101540113B1 (ko) * 2014-06-18 2015-07-30 재단법인 실감교류인체감응솔루션연구단 실감 영상을 위한 영상 데이터를 생성하는 방법, 장치 및 이 방법을 실행하기 위한 컴퓨터 판독 가능한 기록 매체
KR20150122781A (ko) * 2013-04-08 2015-11-02 소니 주식회사 Shvc를 이용한 관심 영역 확장성

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120041769A (ko) * 2009-07-28 2012-05-02 소니 컴퓨터 엔터테인먼트 인코포레이티드 화상파일 생성장치, 화상처리장치, 화상파일 생성방법, 및 화상처리방법
KR20150122781A (ko) * 2013-04-08 2015-11-02 소니 주식회사 Shvc를 이용한 관심 영역 확장성
KR101540113B1 (ko) * 2014-06-18 2015-07-30 재단법인 실감교류인체감응솔루션연구단 실감 영상을 위한 영상 데이터를 생성하는 방법, 장치 및 이 방법을 실행하기 위한 컴퓨터 판독 가능한 기록 매체

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LEE, J-H ET AL: "Multi-channel Adaptive SVC Video Streaming with ROI", JOURNAL OF BROADCAST ENGINEERING, vol. 13, no. 1, 30 January 2008 (2008-01-30), pages 34 - 42, XP055603045, DOI: 10.5909/BE_2008.13.1.34 *
YAGO SANCHEZ ET AL: "Compressed Domain Video Processing for Tile Based Panoramic Streaming using SHVC", PROCEEDINGS OF THE 3RD INTERNATIONAL WORKSHOP ON IMMERSIVE MEDIA EXPERIENCES (IMMERSIVEME '15), 30 October 2015 (2015-10-30), Brisbane, Australia, pages 13 - 18, XP058074928, DOI: 10.1145/2814347.2814353 *

Also Published As

Publication number Publication date
KR20180035089A (ko) 2018-04-05
KR101861929B1 (ko) 2018-05-28

Similar Documents

Publication Publication Date Title
WO2017188714A1 (fr) Procédé de transmission d'une vidéo à 360 degrés, procédé de réception d'une vidéo à 360 degrés, appareil de transmission d'une vidéo à 360 degrés, appareil de réception d'une vidéo à 360 degrés
WO2015126144A1 (fr) Procédé et appareil destinés à l'émission-réception combinée de signal de radiodiffusion destiné au service panoramique
WO2018038520A1 (fr) Procédé destiné à transmettre une vidéo omnidirectionnelle, procédé destiné à recevoir une vidéo omnidirectionnelle, appareil destiné transmettre une vidéo omnidirectionnelle et appareil destiné à recevoir une vidéo omnidirectionnelle
WO2018174387A1 (fr) Procédé d'envoi de vidéo à 360 degrés, procédé de réception de vidéo à 360 degrés, dispositif d'envoi de vidéo à 360 degrés et dispositif de réception de vidéo à 360 degrés
WO2012023789A2 (fr) Appareil et procédé de réception d'un signal de radiodiffusion numérique
WO2012036532A2 (fr) Procédé et appareil pour traiter un signal de télédiffusion pour un service de diffusion 3d (en 3 dimensions)
WO2018062641A1 (fr) Fourniture d'un service de réalité virtuelle en tenant compte de la zone d'intérêt
WO2016182371A1 (fr) Dispositif d'émission de signal de radiodiffusion, dispositif de réception de signal de radiodiffusion, procédé d'émission de signal de radiodiffusion, et procédé de réception de signal de radiodiffusion
WO2009151265A2 (fr) Procédé et système pour recevoir des signaux de radiodiffusion
WO2010021525A2 (fr) Procédé de traitement d'un service web dans un service en temps non réel et un récepteur de diffusion
WO2015034306A1 (fr) Procédé et dispositif pour transmettre et recevoir un contenu de diffusion uhd perfectionné dans un système de diffusion numérique
WO2014109594A1 (fr) Procédé et dispositif pour coder une vidéo entre couches pour compenser une différence de luminance, procédé et dispositif pour décoder une vidéo
WO2012030158A2 (fr) Procédé et appareil adaptés pour traiter et pour recevoir un signal de diffusion numérique pour un affichage en trois dimensions
WO2015080414A1 (fr) Procédé et dispositif d'émission et de réception d'un signal de diffusion pour assurer un service de lecture spéciale
WO2016171518A2 (fr) Émetteur de signal de radiodiffusion, récepteur de signal de radiodiffusion, procédé d'émission d'un signal de radiodiffusion et procédé de réception d'un signal de radiodiffusion
WO2015199468A1 (fr) Procédé et dispositif d'émission/réception d'un signal de diffusion
WO2012030176A2 (fr) Procédé et dispositif de traitement de signal de diffusion pour un service de diffusion en trois dimensions (3d)
WO2015133770A1 (fr) Appareil et procédés pour émettre/recevoir un signal de diffusion
WO2012030177A2 (fr) Récepteur numérique et procédé destiné à traiter un contenu 3d dans le récepteur numérique
WO2017061796A1 (fr) Dispositif d'émission de signal de radiodiffusion, dispositif de réception de signal de radiodiffusion, procédé d'émission de signal de radiodiffusion, et procédé de réception de signal de radiodiffusion
WO2017135673A1 (fr) Dispositif d'émission de signal de diffusion, dispositif de réception de signal de diffusion, procédé d'émission de signal de diffusion et procédé de réception de signal de diffusion
WO2016064150A1 (fr) Dispositif et procédé d'émission d'un signal de diffusion, dispositif et procédé de réception d'un signal de diffusion
WO2011132879A2 (fr) Procédé pour l'émission/réception d'un contenu sur internet et émetteur/récepteur l'utilisant
WO2021242066A1 (fr) Appareil et procédé de réalisation d'un codage par intelligence artificielle et d'un décodage par intelligence artificielle sur une image
WO2016171528A1 (fr) Appareil de transmission d'un signal de radiodiffusion, appareil de réception d'un signal de radiodiffusion, procédé de transmission d'un signal de radiodiffusion, et procédé de réception d'un signal de radiodiffusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17856527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17856527

Country of ref document: EP

Kind code of ref document: A1