KR20180035089A - Providing virtual reality service considering region of interest - Google Patents
- Publication number
- KR20180035089A (Application KR1020160125145A)
- Authority
- KR
- South Korea
- Prior art keywords
- video data
- information
- base layer
- interest
- data
- Prior art date
Classifications
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
- H04N13/383—Image reproducers using viewer tracking for tracking with gaze detection, i.e. detecting the lines of sight of the viewer's eyes
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N7/15—Conference systems
- H04N7/155—Conference systems involving storage of or access to video conference sessions
Abstract
The present disclosure relates to a method and apparatus for receiving video data for a virtual reality service. The method includes: receiving a bitstream comprising video data for the virtual reality service, the video data comprising base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; decoding the base layer video data; and decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data includes video data for at least one region of interest.
Description
This specification relates to providing a virtual reality service in consideration of a region of interest.
Recently, various services have been realized as the technology and equipment of virtual reality (VR) have developed. Video conferencing services are an example of services implemented on the basis of virtual reality technology. For video conferencing, a user may use a device that processes multimedia data, including video information of the conference participants.
The present specification provides image processing that considers region of interest information within a virtual reality.
In addition, the present specification provides image processing of different quality according to the user's gaze information.
The present specification also provides image processing responsive to variations in the user's gaze.
In addition, the present specification provides signaling corresponding to a user's gaze variation.
According to one aspect of the present disclosure, there is provided an image receiving apparatus including: a communication unit for receiving a bitstream including video data for a virtual reality service, the video data including base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; a base layer decoder for decoding the base layer video data; and an enhancement layer decoder for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data is video data for at least one region of interest.
According to another aspect, there is provided an image receiving apparatus including: a communication unit for receiving base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; a first processor for decoding the base layer video data; and a second processor, electrically coupled to the first processor, for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
According to another aspect, there is provided an image transmission apparatus including: a base layer encoder for generating base layer video data; an enhancement layer encoder for generating at least one enhancement layer video data based on the base layer video data; and a communication unit for transmitting a bitstream including video data for a virtual reality service, the video data comprising the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer, wherein the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
In addition, an image receiving method according to another embodiment disclosed herein includes: receiving a bitstream including video data for a virtual reality service, the video data including base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; decoding the base layer video data; and decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
According to another aspect, there is provided an image transmission method including: generating base layer video data; generating at least one enhancement layer video data based on the base layer video data; and transmitting a bitstream comprising video data for a virtual reality service, wherein the video data comprises the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer, and the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
According to the techniques disclosed in this specification, the image processing apparatus can apply different image processing methods based on the user's gaze. An image processing method that considers the user's gaze information minimizes the change in image quality perceived by the wearer of the video conferencing device (for example, an HMD), saves bandwidth (BW), and reduces power consumption by improving image processing performance.
FIG. 1 is a diagram illustrating an exemplary video conferencing system.
FIG. 2 is a diagram illustrating an exemplary video conferencing service.
FIG. 3 is a diagram illustrating an exemplary scalable video coding service.
FIG. 4 is a diagram showing an exemplary configuration of a server device.
FIG. 5 is a diagram showing an exemplary structure of an encoder.
FIG. 6 is a diagram illustrating an exemplary video conferencing service using scalable video coding.
FIG. 7 is a diagram illustrating an exemplary image transmission method.
FIG. 8 is a diagram illustrating an exemplary method of signaling a region of interest.
FIG. 9 is a diagram showing an exemplary configuration of a client device.
FIG. 10 is a diagram showing an exemplary configuration of the control unit.
FIG. 11 is a diagram showing an exemplary configuration of a decoder.
FIG. 12 is a diagram illustrating an exemplary method of generating and/or transmitting image configuration information.
FIG. 13 is a diagram illustrating an exemplary method by which a client device signals image configuration information.
FIG. 14 is a diagram illustrating an exemplary method of transmitting high/low level images.
FIG. 15 is a diagram illustrating an exemplary image decoding method.
FIG. 16 is a diagram illustrating an exemplary image encoding method.
FIG. 17 is a diagram showing an exemplary syntax of the region of interest information.
FIG. 18 is a diagram illustrating exemplary ROI information in an XML format and an exemplary SEI message.
FIG. 19 is a diagram illustrating an exemplary protocol stack of a client device.
FIG. 20 is an illustration showing an exemplary relationship between the SLT and SLS (service layer signaling).
FIG. 21 is a diagram showing an exemplary SLT.
FIG. 22 is a diagram illustrating an exemplary code value of the serviceCategory attribute.
FIG. 23 is a diagram illustrating an exemplary SLS bootstrapping and service discovery process.
FIG. 24 is a diagram illustrating an exemplary USBD/USD fragment for ROUTE/DASH.
FIG. 25 is a diagram illustrating an exemplary S-TSID fragment for ROUTE/DASH.
FIG. 26 is a diagram illustrating an exemplary MPD fragment.
FIG. 27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.
FIG. 28 is a diagram showing an exemplary configuration of a client device.
FIG. 29 is a diagram showing an exemplary configuration of a server device.
FIG. 30 is a diagram illustrating an exemplary operation of a client device.
FIG. 31 is a diagram showing an exemplary operation of the server device.
It is noted that the technical terms used herein are used only to describe specific embodiments and are not intended to limit the scope of the technology disclosed herein. Unless defined otherwise in this specification, the technical terms used herein should be interpreted as they are generally understood by those skilled in the art to which the presently disclosed subject matter belongs, and should not be construed in an excessively broad or excessively narrow sense. Where a technical term used herein does not accurately express the spirit of the technology disclosed herein, it should be understood as replaced by a technical term that those skilled in the art can correctly understand. In addition, the general terms used in this specification should be interpreted according to their dictionary definitions or in context, and should not be construed in an excessively reduced sense.
As used herein, terms including ordinals, such as first, second, etc., may be used to describe various elements, but the elements should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the description of the technology, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, wherein like reference numerals denote like or similar elements, and redundant description thereof will be omitted.
Further, in the description of the technology disclosed in this specification, a detailed description of related arts will be omitted if it is determined that the gist of the technology disclosed in this specification may be obscured. It is to be noted that the attached drawings are only for the purpose of easily understanding the concept of the technology disclosed in the present specification, and should not be construed as limiting the spirit of the technology by the attached drawings.
FIG. 1 is a diagram illustrating an exemplary video conferencing system.
The video conferencing system can provide a video conferencing service to at least one user located at a remote location. Video conferencing is a service that lets people in different regions meet face-to-face without having to travel to the same place.
The video conferencing system can be configured in two ways. First, the video conferencing system can be realized using direct N:N communication between the client devices (e.g., HMDs) of the users. In this case, since signaling and image transmission are performed separately for each pair of devices, the total bandwidth is large, but the video conferencing system can provide an optimal image to each user.
Second, the video conferencing system may further include a server device (or relay system) for video conferencing. In this case, the server device may receive at least one video image from each client device, and may collect / select at least one video image to service each client device.
The exemplary techniques described herein can be applied to both of the above two video conferencing systems, and the following description will focus on the second embodiment.
FIG. 2 is a diagram illustrating an exemplary video conferencing service.
Referring to the drawing, users located at remote sites can participate in a video conference within a virtual space.
The video conferencing system can determine the line of sight of the speaker and/or of each user within the virtual space. In this case, the video conferencing system can transmit the image corresponding to each user's gaze direction as a high-quality image, and the remaining images as low-quality images.
As a result, compared with the conventional method of transmitting all the images as high-quality video images, the video conferencing system differentiates the image processing method based on the user's gaze, saving the bandwidth (BW) used for image transmission and improving the image processing performance.
FIG. 3 is a diagram illustrating an exemplary scalable video coding service.
The scalable video coding service is an image compression method for providing various services in a scalable manner in terms of time, space, and image quality, according to various user environments such as the network situation or terminal resolution in various multimedia environments. Scalable video coding services generally provide scalability in the spatial, quality, and temporal dimensions.
Spatial scalability can be provided by encoding the same image at a different resolution for each layer. Using the spatial hierarchy, image contents can be adaptively provided to devices having various resolutions, such as a digital TV, a notebook computer, and a smartphone.
Referring to the drawings, the scalable video coding service can support, from a video service provider (VSP), one or more TVs having different characteristics through a home gateway in the home. For example, the scalable video coding service can simultaneously support HDTV (High-Definition TV), SDTV (Standard-Definition TV), and LDTV (Low-Definition TV) having different resolutions.
Temporal scalability can adaptively adjust the frame rate of an image in consideration of the network environment over which the content is transmitted or the performance of the terminal. For example, when a local area network is used, the service can be provided at a high frame rate of 60 frames per second (FPS); when a wireless broadband communication network such as a 3G mobile network is used, the content can be provided at a low frame rate of 16 FPS, so that the user can receive the video without interruption.
Quality scalability allows contents of various image qualities to be provided according to the network environment or the performance of the terminal, so that the user can stably reproduce the image contents.
The scalable video coding service may include a base layer and one or more enhancement layer(s). The receiver provides normal image quality when receiving only the base layer, and can provide high image quality when the base layer and the enhancement layer(s) are received together. In other words, when there is a base layer and one or more enhancement layers, the more enhancement layers that are received on top of the base layer, the better the quality of the provided image.
Thus, since the scalable video coding service is composed of a plurality of layers, the receiver can quickly receive the small-capacity base layer data, process it, and reproduce an image of general quality, and can then raise the service quality by additionally receiving enhancement layer data as needed.
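To make the layered structure concrete, the following is a minimal sketch (Python with NumPy) of the base/enhancement split: the base layer is modeled as a downsampled frame and the enhancement layer as the residual that restores full resolution. Real inter-layer prediction in SVC/SHVC is far more elaborate; this only illustrates why the base layer alone yields normal quality while base plus enhancement yields high quality.

```python
import numpy as np

def encode_layers(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy spatial scalability: the base layer is a 2x-downsampled frame,
    the enhancement layer is the residual against the upsampled base."""
    base = frame[::2, ::2]                          # base layer (quarter size)
    upsampled = np.kron(base, np.ones((2, 2)))      # crude inter-layer prediction
    residual = frame - upsampled                    # enhancement layer data
    return base, residual

def decode(base: np.ndarray, residual: np.ndarray | None) -> np.ndarray:
    upsampled = np.kron(base, np.ones((2, 2)))
    if residual is None:                            # only the base layer received:
        return upsampled                            # normal quality
    return upsampled + residual                     # base + enhancement: high quality

frame = np.random.rand(8, 8)
base, residual = encode_layers(frame)
assert np.allclose(decode(base, residual), frame)   # full quality when both arrive
```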
FIG. 4 is a diagram showing an exemplary configuration of a server device.
The server device may include at least one of a communication unit, a signaling data extraction unit, an image generation unit, a region-of-interest determination unit, a signaling data generation unit, and/or an encoder, as described in the following figures. The communication unit can receive video images and signaling data from at least one client device and transmit encoded video data to the client devices. The signaling data extraction unit can extract the signaling data (for example, the image configuration information) from the received data, and the image generation unit can generate the entire image for the video conference in the virtual space.
The region-of-interest determination unit can generate region of interest information indicating the region of interest corresponding to the direction of the user's gaze within the entire area.
The signaling data generation unit can generate the signaling data (for example, the region of interest information) to be transmitted to the client devices, and the encoder can encode the video images based on the signaling data.
FIG. 5 is a diagram showing an exemplary structure of an encoder.
The encoder can generate a base layer and one or more enhancement layers using a scalable video coding method. The scalable video coding method is an image compression method for providing various services in a scalable manner in terms of time, space, and image quality, according to various user environments such as the network situation or terminal resolution in various multimedia environments.
The enhancement layer can be encoded by referring to information of a reference layer using an inter-layer prediction tool. The reference layer is the lower layer referred to in enhancement layer encoding. Here, since there is a dependency between layers when inter-layer tools are used, decoding the image of the highest layer requires the bitstreams of all the lower layers it refers to. For a middle layer, decoding can be performed by acquiring only the bitstream of the layer to be decoded and its lower layers. The bitstream of the lowest layer is the base layer, and can be encoded by an encoder such as H.264/AVC or HEVC.
FIG. 6 is a diagram illustrating an exemplary video conferencing service using scalable video coding.
Conventionally, the client device receives the entire image as one compressed image bitstream, decodes it, and renders as much of the image as the user views in the virtual space. Since the conventional technique transmits and/or receives the whole image (for example, a 360-degree immersive image) as a high-resolution (or high-quality) image, the total bandwidth of the bitstream is very large.
Instead, the server device may use a scalable video coding method. Hereinafter, an exemplary technique will be described in detail.
The client device (not shown) can determine the lines of sight of the speaker and the user in the virtual space, and can generate image configuration information. The client device may transmit the image configuration information to the server device and/or other client devices when the image configuration information is first created or when the user's gaze is not facing the speaker.
A server device (not shown) may receive video and signaling data from at least one client device, and may generate a full image of the virtual space for the video conference.
The server device may then encode at least one video image based on the signaling data. Based on the image configuration information (for example, the gaze information and the zoom region information), the server device can encode the video image (or region of interest) corresponding to the viewing direction and the video image that does not correspond to the viewing direction at different qualities. For example, the server device can encode a video image corresponding to the user's gaze direction at high quality, and a video image not corresponding to the user's gaze direction at low quality.
The server device may then transmit the encoded bitstreams to the client device used by each user.
As a result, each user can be provided with a high-quality image for the region of interest while the overall bandwidth is reduced.
FIG. 7 is a diagram illustrating an exemplary image transmission method.
The server device can receive video image and signaling data from at least one client device using the communication unit. Further, the server device can extract the signaling data using the signaling data extracting unit. For example, the signaling data may include viewpoint information and zoom region information.
The gaze information may indicate whether the first user views the second user or the third user. If the first user views the direction of the second user in the virtual space, the gaze information may indicate the direction from the first user to the second user.
The zoom area information may indicate an enlarged range and / or a reduced range of the video image corresponding to the user's gaze direction. In addition, the zoom area information can indicate the viewing angle of the user. If the video image is enlarged based on the value of the zoom area information, the first user can view only the second user. If the video image is reduced based on the value of the zoom area information, the first user can see part and / or entirety of the third user as well as the second user.
Then, the server device can generate the entire image for the video conference in the virtual space using the image generating unit.
Then, the server device can determine the image configuration information for the viewpoint and the zoom region of each user in the virtual space based on the signaling data, using the region-of-interest determination unit.
Then, the server device can determine the region of interest of the user based on the image configuration information using the region-of-interest determination unit (720).
When the first user views the second user, in the video image corresponding to the viewing direction of the first user, the second user may occupy a large area and the third user may occupy a small area. In this case, the region of interest may be the region including the second user. The region of interest may change according to the gaze information and the zoom area information.
When the signaling data (for example, at least one of the view information and the zoom area information) is changed, the server device can receive new signaling data. In this case, the server device can determine a new region of interest based on the new signaling data.
Then, the server device can use the control unit to determine whether the data currently processed based on the signaling data is data corresponding to the region of interest.
When the signaling data is changed, the server device can determine whether or not the data currently processed based on the new signaling data is data corresponding to the region of interest.
In case of data corresponding to a region of interest, the server device may encode a video image (for example, a region of interest) corresponding to a user's viewpoint at a high quality using an encoder (740). For example, the server device can generate base layer video data and enhancement layer video data for the video image and transmit them.
When the signaling data is changed, the server device can transmit a video image (a new region of interest) corresponding to the new viewpoint as a high-quality image. If the server device was transmitting a low-quality image but the signaling data changes so that a high-quality image should be transmitted, the server device can additionally generate and/or transmit enhancement layer video data.
If the data does not correspond to a region of interest, the server device may encode a video image (for example, a non-interest region) that does not correspond to the user's viewpoint at a low quality (750). For example, the server device may generate and transmit only base layer video data for video images not corresponding to the user's viewpoint.
When the signaling data is changed, the server device can transmit a video image (a new non-interest region) not corresponding to the new user's viewpoint as a low-quality image. In the case where the server device was transmitting a high-quality image but the signaling data changes so that a low-quality image should be transmitted, the server device no longer generates and/or transmits the at least one enhancement layer video data, and generates and/or transmits only the base layer video data.
That is, since the image quality of the video image when only the base layer video data is received is lower than when the enhancement layer video data is also received, the client device receives enhancement layer video data for the video image (for example, the region of interest) corresponding to the user's viewing direction at the moment the user's gaze information is obtained. The client device can thus provide the user with a high-quality video image in a short time.
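The per-region decision described in steps 740 and 750 can be summarized in a short sketch. The Region type and the layer labels below are illustrative assumptions, not the patent's data structures:

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    in_roi: bool  # does this region match the user's current gaze?

def encode_for_user(regions: list[Region]) -> dict[str, list[str]]:
    """Return, per region, which layers the server generates and transmits."""
    layers: dict[str, list[str]] = {}
    for region in regions:
        if region.in_roi:
            # Region of interest: base plus enhancement layer(s) -> high quality (740).
            layers[region.name] = ["base", "enhancement"]
        else:
            # Non-interest region: base layer only -> low quality, saves bandwidth (750).
            layers[region.name] = ["base"]
    return layers

print(encode_for_user([Region("speaker", True), Region("audience", False)]))
# {'speaker': ['base', 'enhancement'], 'audience': ['base']}
```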
The exemplary method of the present invention has a great advantage over a simple pre-caching method in which only a part of additional area data is transmitted in advance, or a method of receiving only data in an area corresponding to the direction of the user's sight line.
The exemplary method herein can reduce the overall bandwidth compared to conventional methods of sending all data at high quality.
In addition, the exemplary method herein can increase the speed of video processing by reacting in real time to user gaze movements.
In the conventional method, when a first user who is looking at a second user turns his or her head toward a third user, the client device (for example, via a sensor of the HMD) identifies the motion, and the video information for the new area must be received, processed, and played on the screen. It is very difficult for the conventional method to process the image of the new area quickly, so it falls back on the inefficient approach of receiving all the data in advance.
However, since the exemplary technique herein transmits video adaptively through scalable video coding, when the first user turns his or her head toward the third user, the client device can immediately render the new area using the base layer data it already has. The exemplary techniques herein can therefore reproduce video images faster than when processing the entire high-definition data, and can rapidly process video images in response to eye movements.
FIG. 8 is a diagram illustrating an exemplary method of signaling a region of interest.
Referring to (a) of the figure, a method of signaling a region of interest in scalable video is shown.
A server device (or an encoder) can divide one video image (or picture) into a plurality of tiles having a rectangular shape. For example, the video image can be partitioned into Coding Tree Unit (CTU) units. For example, one CTU may include a Y CTB, a Cb CTB, and a Cr CTB.
The server device can encode the video image of the base layer as a whole, without dividing it into tiles, for fast user response. In addition, the server device may encode the video image of one or more enhancement layers by dividing part or all of it into a plurality of tiles as needed.
That is, the server device may divide the video image of the enhancement layer into at least one tile and encode the tiles corresponding to the region of interest (ROI) 810.
The server device may also generate region of interest information including tile information identifying at least one tile included in the region of interest. For example, the region of interest information may be generated by the region-of-interest determination unit, the signaling data generation unit, and/or the encoder.
The tile information may be sent to another client device, an image processing computing device, and/or a server after the entropy coding provided by the encoder.
The region of interest information can be delivered through a high-level syntax protocol that carries session information. The region of interest information may also be transmitted in packet units such as SEI (Supplemental Enhancement Information), VUI (video usability information), or the slice header of a video standard. In addition, the region of interest information may be delivered in a separate file describing the video (e.g., the MPD of DASH).
The video conferencing system can reduce the overall bandwidth and video processing time by transmitting and/or receiving only the required tiles of the enhancement layer between client devices and/or between a client device and the server device through signaling of the region of interest information. This is important to ensure a fast HMD user response time.
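As an illustration of the tile signaling above, the sketch below computes which tiles of a uniformly tiled enhancement-layer picture intersect an ROI rectangle; the resulting tile-number list is the kind of payload a sighted_tile_info-style message would carry. The picture and tile dimensions are assumptions for the example:

```python
def roi_tiles(pic_w: int, pic_h: int, tile_w: int, tile_h: int,
              roi: tuple[int, int, int, int]) -> list[int]:
    """roi = (x0, y0, x1, y1) in pixels; tiles are numbered in raster order."""
    cols = (pic_w + tile_w - 1) // tile_w
    x0, y0, x1, y1 = roi
    tiles = []
    for ty in range(y0 // tile_h, (y1 - 1) // tile_h + 1):
        for tx in range(x0 // tile_w, (x1 - 1) // tile_w + 1):
            tiles.append(ty * cols + tx)
    return tiles

# A 3840x1920 picture in 640x640 tiles (6 columns); an ROI around the speaker:
print(roi_tiles(3840, 1920, 640, 640, (700, 500, 1900, 1100)))
# -> [1, 2, 7, 8] : only these enhancement-layer tiles need to be sent.
```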
Referring to (b) of the figure, a method of signaling a region of interest in single-layer video is shown.
Rather than scalable video, the exemplary technique herein can reduce the image quality of a single-layer image by downscaling (downsampling) the areas other than the region of interest. The prior art does not share the information used in such processing, whereas here the server device can transmit the region of interest information, including the downscaling filter information, to the client device.
As described above, the server device may generate the region of interest information. For example, the region of interest information may further include filter information as well as tile information. For example, the filter information may include the number of pre-agreed filter candidates and the values used in the filter.
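A toy sketch of this single-layer alternative follows: ROI blocks are kept intact while non-ROI blocks are blurred with a 2x2 mean filter. The filter choice is an assumption standing in for the signaled filter candidates:

```python
import numpy as np

def downscale_non_roi(frame: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """roi_mask is a boolean array of the same shape; True marks ROI pixels."""
    out = frame.copy()
    h, w = frame.shape
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            if not roi_mask[y:y+2, x:x+2].any():          # block fully outside ROI:
                out[y:y+2, x:x+2] = frame[y:y+2, x:x+2].mean()  # blur = fewer bits
    return out

frame = np.arange(64, dtype=float).reshape(8, 8)
mask = np.zeros((8, 8), dtype=bool); mask[2:6, 2:6] = True   # ROI in the middle
print(downscale_non_roi(frame, mask))
```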
FIG. 9 is a diagram showing an exemplary configuration of a client device.
The client device may include, among other components, a communication unit, a sensor unit, a control unit, and a decoder. The sensor unit can collect sensing information (for example, video shot by a camera, voice recorded by a microphone, or data sensed by a gyro sensor, an acceleration sensor, or an external sensor), and the communication unit can exchange video data and signaling data with the server device and/or other client devices. The control unit and the decoder are described in more detail with reference to FIGS. 10 and 11.
FIG. 10 is a diagram showing an exemplary configuration of the control unit.
The control unit may include at least one of a signaling data extraction unit, a gaze determination unit, a speaker determination unit, and/or a signaling data generation unit. For example, the gaze determination unit can check the movement of the user's gaze based on the sensing information and generate image configuration information, the speaker determination unit can determine who the speaker is within the virtual space, and the signaling data generation unit can generate the signaling data (for example, the image configuration information) to be transmitted to the server device and/or other client devices.
FIG. 11 is a diagram showing an exemplary configuration of a decoder.
The decoder may include at least one of an extractor, a base layer decoder, and/or an enhancement layer decoder. The extractor can extract the signaling data, the base layer video data, and/or the at least one enhancement layer video data from the received bitstream; the base layer decoder can decode the base layer video data; and the enhancement layer decoder can decode the at least one enhancement layer video data based on the signaling data and the base layer video data.
The signaling data may include region of interest information indicating a region of interest corresponding to the direction of the user's gaze within the entire region of the virtual space for the video conferencing service.
FIG. 12 is a diagram illustrating an exemplary method of generating and/or transmitting image configuration information.
Hereinafter, a method of generating image configuration information for responding to the movement of the user's gaze in real time will be described.
The image configuration information may include at least one of gaze information indicating a gaze direction of a user and / or zoom area information indicating a viewing angle of a user. The user's gaze is the direction that the user looks in the virtual space, not the actual space. In addition, the gaze information may include information indicating the gaze direction of the user in the future (for example, information on gaze points that are expected to receive attention), as well as information indicating the gaze direction of the current user.
The client device can sense the operation of looking at another user located in the virtual space around the user and process the operation.
The client device can receive the sensing information from the sensor unit using the control unit and / or the sight line determination unit. The sensing information may be a video shot by a camera, or a voice recorded by a microphone. In addition, the sensing information may be data sensed by a gyro sensor, an acceleration sensor, and an external sensor.
Also, the client device can check the movement of the user's gaze based on the sensing information using the control unit and / or the sight line determination unit (1210). For example, the client device can check the movement of the user's gaze based on the change of the value of the sensing information.
In addition, the client device may generate image configuration information in the virtual conference space using the control unit and / or the visual determination unit (1220). For example, when the client device physically moves or the user's gaze moves, the client device can calculate the gaze information and / or the zoom area information of the user in the virtual meeting space based on the sensing information.
Further, the client device can transmit the image configuration information to the server device and / or another client device using the communication unit (1230). In addition, the client device may forward the video configuration information to its other components.
In the foregoing, a method of generating image configuration information by a client device has been described. However, the present invention is not limited thereto, and the server device may receive the sensing information from the client device and generate the image configuration information.
In addition, an external computing device connected to the client device may generate image configuration information, and the computing device may communicate image configuration information to its client device, another client device, and / or a server device.
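The following sketch summarizes steps 1210 through 1230 under stated assumptions: gaze is represented as yaw/pitch angles plus a zoom value, a small threshold suppresses sensor jitter, and the message format is plain JSON rather than the patent's actual encoding:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ImageConfig:
    yaw_deg: float    # gaze direction in the virtual space (gaze information)
    pitch_deg: float
    zoom: float       # zoom area information ~ the user's viewing angle

def gaze_moved(prev: ImageConfig, cur: ImageConfig, thresh_deg: float = 2.0) -> bool:
    """Step 1210: treat small sensor jitter as 'no movement'."""
    return (abs(cur.yaw_deg - prev.yaw_deg) > thresh_deg or
            abs(cur.pitch_deg - prev.pitch_deg) > thresh_deg)

def make_message(cfg: ImageConfig) -> bytes:
    """Step 1220: build the image configuration information to transmit."""
    return json.dumps(asdict(cfg)).encode()

prev, cur = ImageConfig(10.0, 0.0, 1.0), ImageConfig(35.0, -3.0, 1.0)
if gaze_moved(prev, cur):
    print(make_message(cur))  # step 1230: hand the message to the communication unit
```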
FIG. 13 is a diagram illustrating an exemplary method by which a client device signals image configuration information.
Signaling the video configuration information (including viewpoint information and / or zoom area information) is very important. If the signaling of the video configuration information is too frequent, it may place a burden on the client device, the server device, and / or the entire network.
Accordingly, the client device can signal image configuration information only when the image configuration information (or gaze information and / or zoom area information) of the user is changed. That is, the client device can transmit the gaze information of the user to another client device and / or the server device only when the gaze information of the user is changed.
In one embodiment, taking advantage of the fact that attention is usually on the speaker in a video conference, the gaze information can be signaled to another user's client device or to the server device only when the direction of the user's gaze differs from the speaker.
For a user who is not speaking but is performing, or who needs attention, such as writing something on the chalkboard, the client device may handle this case with options (e.g., the speaker and/or the lecturer is set as the second user).
Referring to the drawing, the client device can determine the speaker within the virtual space area for the video conference using the control unit and / or the speaker determination unit (1310). For example, the client device can determine who the speaker is based on the sensing information. In addition, the client device can determine who is the speaker according to the given option.
Then, the client device can determine the user's gaze using the control unit and / or the visual determination unit (1320). For example, the client device can generate image configuration information based on the user's gaze using the control unit and / or the visual determination unit.
Then, the client device can determine whether the user's gaze is directed to the speaker using the control unit and / or the gaze determination unit (1330).
If the user's gaze is directed to the speaker, the client device may not signal the video configuration information using the communication unit (1340). In this case, the client device can continue to receive the image of the speaker, which lies in the direction of the user's gaze, in high quality, and the images that do not lie in the direction of the user's gaze in low quality.
If the user's line of sight does not point to the speaker, the client device can signal the video configuration information using the communication unit (1350). For example, if the user's gaze is initially directed to the speaker but later moves to another location, the client device may signal image configuration information for the user's new viewing direction. That is, the client device may transmit the image configuration information for the new viewing direction to another client device and/or the server device. In this case, the client device can receive the image corresponding to the user's new gaze direction in high quality, and can receive the images that do not correspond to the user's new gaze direction (for example, the image of the speaker) in low quality.
In the above description, the client device generates and/or transmits the image configuration information; however, the server device may instead receive the sensing information from the client device, generate the image configuration information based on the sensing information, and transmit it to at least one client device.
As described above, in a situation where the users are all looking at a speaker in a video conference in a virtual space using client devices (e.g., HMDs), the video conferencing system can transmit the speaker's video information as scalable video data consisting of base layer data and enhancement layer data. Also, when the video conferencing system receives signaling from a user looking at someone other than the speaker, it can transmit that other user's video information as scalable video data consisting of base layer data and enhancement layer data. Through this, the video conferencing system can provide fast, high-quality video information to the user while greatly reducing the signaling across the whole system.
The above-mentioned signaling may be signaling between the server device, the client devices, and/or an external computing device (if present). In addition, the above-mentioned signaling may be signaling between client devices and/or external computing devices (if present).
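The signaling rule of steps 1310 through 1350 reduces to a single comparison; the sketch below shows it with illustrative identifiers:

```python
def should_signal(gaze_target: str, speaker: str) -> bool:
    """Step 1330: signal only when the gaze is NOT directed at the speaker."""
    return gaze_target != speaker

# Everyone watching the speaker: no signaling traffic at all (step 1340).
assert not should_signal(gaze_target="speaker_A", speaker="speaker_A")
# Gaze moved to another participant: signal, so that participant's image
# switches to high quality (step 1350).
assert should_signal(gaze_target="user_C", speaker="speaker_A")
```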
FIG. 14 is a diagram illustrating an exemplary method of transmitting high/low level images.
A high/low level image can be transmitted based on the user's gaze information, as described above.
In addition, although the present specification exemplifies a video conferencing system, the same techniques can equally be applied to VR (Virtual Reality) and AR (Augmented Reality) games using an HMD. That is, the techniques of providing a high-level region corresponding to the line of sight the user is looking at, and of signaling only when the user looks at an area or an object that is not expected to be viewed, can be applied just as in the video conferencing example.
FIG. 15 is a diagram illustrating an exemplary image decoding method.
The video decoding apparatus (or decoder) may include at least one of an extractor, a base layer decoder, and/or an enhancement layer decoder. The description of the video decoding apparatus and/or the video decoding method may include all of the contents relating to the client device and/or the video decoding apparatus (or decoder) described above.
The video decoding apparatus can receive a bitstream including video data and signaling data using an extractor (1510). The video decoding apparatus may extract signaling data, base layer video data, and / or at least one enhancement layer video data from the video data.
Further, the video decoding apparatus may decode the base layer video data using a base layer decoder (1520).
In addition, the video decoding apparatus may decode at least one enhancement layer video data based on the signaling data and the base layer video data using an enhancement layer decoder (1530).
For example, the video data may comprise the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
The signaling data may also include region of interest information indicating a region of interest corresponding to the direction of the user's gaze within the entire region of the virtual space for the video conferencing service.
In addition, the base layer video data may include video data for the entire area, and at least one enhancement layer video data may include video data for the area of interest within the entire area.
Also, the at least one enhancement layer may be divided into at least one tile having a rectangular shape for each layer, and the region of interest information may include tile information identifying at least one tile included in the region of interest.
In addition, the region of interest information may be generated based on image configuration information, and the image configuration information may include gaze information indicating the gaze direction of the user and zoom region information indicating the viewing angle of the user.
Also, the image configuration information can be signaled when the direction of the user's gaze does not face the speaker.
Further, the signaling data may be transmitted through at least one of Supplemental Enhancement Information (SEI), video usability information (VUI), a slice header, and a file describing the video data.
FIG. 16 is a diagram illustrating an exemplary image encoding method.
The image encoding apparatus (or encoder) may include at least one of a base layer encoder, an enhancement layer encoder, and/or a multiplexer. The description of the video encoding apparatus and/or the video encoding method may include all of the contents relating to the server device and/or the video encoding apparatus (or encoder) described above.
The video encoding apparatus can generate base layer video data using a base layer encoder (1610).
Further, the image encoding apparatus can generate at least one enhancement layer video data based on the signaling data and the base layer video data using an enhancement layer encoder.
Further, the video encoding apparatus can generate a bitstream including video data and signaling data using a multiplexer.
The image encoding apparatus and/or the image encoding method may perform the inverse process of the image decoding apparatus and/or the image decoding method, and may include the corresponding features.
FIG. 17 is a diagram showing an exemplary syntax of the region of interest information.
Referring to (a) of the figure, region of interest information (sighted_tile_info) for each video picture is shown. For example, the region of interest information may include at least one of info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and/or user_info_list information.
The info_mode information may indicate the mode of the information representing the region of interest for each picture. The info_mode information can be represented by 4 bits of unsigned information. For example, if the value of the info_mode information is '0', the info_mode information can indicate that the previous information mode is used as it is. If the value of the info_mode information is '1', the info_mode information can indicate the list of all tile numbers corresponding to the region of interest. If the value of the info_mode information is '2', the info_mode information can indicate the start and end numbers of the consecutive tiles corresponding to the region of interest. If the value of the info_mode information is '3', the info_mode information can indicate the upper-left and lower-right tile numbers of the region of interest. If the value of the info_mode information is '4', the info_mode information can indicate the numbers of the tiles corresponding to the region of interest and the numbers of the coding units included in those tiles.
The tile_id_list_size information may indicate the length of the tile number list. The tile_id_list_size information can be represented by 8 bits of unsigned information.
The tile_id_list information may include a tile number list based on the info_mode information. Each tile number can be represented by 8 bits of unsigned information. Based on the info_mode information, the tile_id_list information carries the numbers of all tiles corresponding to the region of interest (when info_mode = 1), the start and end numbers of the consecutive tiles (when info_mode = 2), or the numbers of the upper-left and lower-right tiles (when info_mode = 3).
The cu_id_list_size information may indicate the length of a Coding Unit list. The cu_id_list_size information can be represented by 16 bits of unsigned information.
The cu_id_list information may include a list of coding unit numbers based on the info_mode information. Each coding unit number can be represented by 16 bits of unsigned information. For example, the cu_id_list information may indicate a list of coding unit numbers (for example, info_mode information = 4) corresponding to the region of interest, based on the info_mode information.
The user_info_flag information may be a flag indicating an additional user information mode. The user_info_flag information may indicate whether the user and / or the provider have tile-related information to be transmitted further. The user_info_flag information can be represented by one bit of unsigned information. For example, if the value of the user_info_flag information is '0', it can be indicated that there is no additional user information. If the value of the user_info_flag information is '1', it can be indicated that there is additional user information.
The user_info_size information may indicate the length of the additional user information. user_info_size information can be represented by 16 bits of unsigned information.
The user_info_list information may include a list of additional user information. Each additional user information may be represented by information of unsigned changeable bits.
Referring to (b) of the figure, region of interest information (sighted_tile_info) for each file, chunk, or video picture group is shown. For example, the region of interest information may include at least one of version_info information (a version information field), file_size information (an entire data size field), and/or at least one piece of unit information.
The version_info information may indicate the version of the region of interest information (or signaling specification). The version_info information can be represented by 8 bits of unsigned information.
The file_size information may indicate the size of the unit information. The file_size information can be represented by 64 bits of unsigned information. For example, the file_size information may indicate a file size, a chunk size, and a video picture group size.
The unit information may include ROI information by file unit, chunk unit, and / or video picture group unit.
The unit information may include at least one of poc_num information, info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and / or user_info_list information.
The poc_num information may indicate the video picture number. For example, a picture number field may indicate a picture order count (POC) in HEVC, and a picture (frame) number in a general video codec. The poc_num information can be represented by 32 bits of unsigned information.
The detailed contents of the info_mode information, the tile_id_list_size information, the tile_id_list information, the cu_id_list_size information, the cu_id_list information, the user_info_flag information, the user_info_size information, and / or the user_info_list information are the same as those described above, and a detailed description thereof will be omitted.
The area of interest information may be generated at a server device (or an image transmission device) and transmitted to at least one client device (or image receiving device).
In addition, the area of interest information may be generated in at least one client device (or image receiving device) and transmitted to at least one client device (or image receiving device) and / or a server device (or image transmitting device). In this case, the control unit of the client device and / or the client device may further include the signaling data extraction unit, the image generation unit, the ROI determination unit, the signaling data generation unit, and / or the encoder.
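As a rough illustration of how the per-picture fields above could be packed, the sketch below serializes an info_mode-1 message using the stated bit widths (4-bit info_mode, 8-bit tile_id_list_size, 8-bit tile numbers, 1-bit user_info_flag). The trailing byte-alignment padding is an assumption, since this excerpt does not fix a container:

```python
def pack_sighted_tile_info(info_mode: int, tile_ids: list[int],
                           user_info_flag: int = 0) -> bytes:
    bits = ""
    bits += format(info_mode, "04b")            # u(4) info_mode
    bits += format(len(tile_ids), "08b")        # u(8) tile_id_list_size
    for tid in tile_ids:                        # u(8) per tile number
        bits += format(tid, "08b")
    bits += format(user_info_flag, "01b")       # u(1) user_info_flag
    bits += "0" * (-len(bits) % 8)              # pad to a byte boundary (assumed)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

# info_mode 1: explicit list of every ROI tile number.
payload = pack_sighted_tile_info(info_mode=1, tile_ids=[1, 2, 7, 8])
print(payload.hex())  # -> '104010207080'
```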
FIG. 18 is a diagram illustrating exemplary ROI information in an XML format and an exemplary SEI message.
Referring to (a) of the figure, the region of interest information (sighted_tile_info) can be expressed in XML format. For example, the region of interest information (sighted_tile_info) includes info_mode information ('3'), tile_id_list_size information ('6'), and/or tile_id_list information ('6, 7, 8, 9, 10, 11, 12').
Referring to (b) of the figure, the payload syntax of a Supplemental Enhancement Information (SEI) message in an international video standard is shown. The SEI message carries additional information that is not essential to the decoding process of the video coding layer (VCL).
The region of interest information (sighted_tile_info) 1810 may be included in the SEI messages of High Efficiency Video Coding (HEVC), MPEG-4, and/or Advanced Video Coding (AVC), and transmitted over a broadcasting network and/or broadband. For example, the SEI message may be included in the compressed video data.
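For instance, the XML form quoted above could be produced with the standard library as sketched below; the element names mirror the syntax fields, while the exact schema of the patent's XML is assumed:

```python
import xml.etree.ElementTree as ET

root = ET.Element("sighted_tile_info")
ET.SubElement(root, "info_mode").text = "3"            # upper-left/lower-right mode
ET.SubElement(root, "tile_id_list_size").text = "6"
ET.SubElement(root, "tile_id_list").text = "6, 7, 8, 9, 10, 11, 12"
print(ET.tostring(root, encoding="unicode"))
```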
Hereinafter, a method of transmitting and / or receiving video data and / or signaling data for a virtual reality service through a broadcasting network and / or broadband will be described.
FIG. 19 is a diagram illustrating an exemplary protocol stack of a client device.
In this figure, the broadcast protocol stack portion can be divided into a portion transmitted through the service list table (SLT) and MMTP (MPEG Media Transport Protocol), and a portion transmitted through ROUTE (Real-time Object delivery over Unidirectional Transport).
The SLT can be encapsulated via the UDP and IP layers. MMTP can carry timed and/or non-timed media data, and this data can likewise be encapsulated via the UDP and IP layers.
The part transmitted through the SLT and MMTP and the part transmitted through ROUTE may be processed at the UDP and IP layers and then encapsulated again at the link layer (Data Link Layer). The broadcast data processed in the link layer can be multicast as a broadcast signal through processes such as encoding/interleaving in the physical layer.
In this figure, the broadband side protocol stack portion can be transmitted through HTTP (HyperText Transfer Protocol) as described above.
A service may be a collection of media components that are presented to the user as a whole; the components may be of several media types; a service may be continuous or intermittent; and a service may be real-time or non-real-time.
The service may include the virtual reality service and/or the augmented reality service described above. Also, the video data for the service may include the base layer video data and/or the at least one enhancement layer video data described above.
FIG. 20 is an illustration showing an exemplary relationship between the SLT and SLS (service layer signaling).
Service signaling provides service discovery and description information, and includes two functional components: bootstrap signaling through the service list table (SLT), and the service layer signaling (SLS). These represent the information necessary to discover and acquire user services. The SLT enables the receiver to build a basic service list and to bootstrap the discovery of the SLS for each service.
For services delivered by ROUTE, the SLS enables the receiver to discover and access the service and its content components. For services delivered by MMTP, the SLS (MMT signaling component) 2030 likewise includes the information for accessing the service. In broadband delivery, the SLS is delivered over HTTP(S)/TCP/IP.
FIG. 21 is a diagram showing an exemplary SLT.
The SLT supports fast channel scans, allowing the receiver to build a list of all the services it can receive, with channel names, channel numbers, and so on. The SLT also provides bootstrapping information that allows the receiver to discover the SLS for each service.
The SLT may include at least one of the @bsid, @sltCapabilities, sltInetUrl elements, and / or Service elements.
@bsid may be a unique identifier of the broadcast stream. The value of @bsid can have a unique value at the local level.
@sltCapabilities indicates the capabilities required for meaningful presentation of all the services described in the SLT.
The sltInetUrl element is a URL (Uniform Resource Locator) value for downloading ESG (Electronic Service Guide) data or service signaling information providing guide information of all services described in the SLT through a broadband network. The sltInetUrl element can contain @URLtype.
@URLtype is the type of file that can be downloaded through the URL pointed to by the sltInetUrl element.
The Service element may contain service information. The Service element may include at least one of @serviceId, @sltSvcSeqNum, @protected, @majorChannelNo, @minorChannelNo, @serviceCategory, @shortServiceName, @hidden, @broadbandAccessRequired, @svcCapabilities, a BroadcastSignaling element, and/or an svcInetUrl element.
@serviceId is the unique identifier of the service.
@sltSvcSeqNum has a value indicating whether the content of each service defined by the SLT has been changed.
If @protected has a value of "true", one or more of the components needed for meaningful presentation of the service is protected.
@majorChannelNo means the major channel number of the service.
@minorChannelNo means the service's minor channel number.
@serviceCategory indicates the type of service.
@shortServiceName indicates the name of the service.
@hidden indicates whether or not the service should be shown to the user when scanning the service.
@broadbandAccessRequired indicates whether a broadband network should be accessed to show the service to users in a meaningful way.
@svcCapabilities indicates specifications that must be supported to make the service meaningful to the user.
The BroadcastSignaling element contains definitions for the transport protocol, location, and identifier values of the signaling sent to the broadcast network. The BroadcastSignaling element may include at least one of @slsProtocol, @slsMajorProtocolVersion, @slsMinorProtocolVersion, @slsPlpId, @slsDestinationIpAddress, @slsDestinationUdpPort, and / or @slsSourceIpAddress.
@slsProtocol indicates the protocol to which the SLS of the corresponding service is transmitted.
@slsMajorProtocolVersion indicates the major version of the protocol to which the SLS of the service is transmitted.
@slsMinorProtocolVersion indicates the minor version of the protocol to which the SLS of the service is transmitted.
@slsPlpId indicates the PLP identifier to which the SLS is transmitted.
@slsDestinationIpAddress indicates the destination IP address value of the SLS data.
@slsDestinationUdpPort indicates the destination port value of the SLS data.
@slsSourceIpAddress represents the source IP address value of the SLS data.
The svcInetUrl element indicates the URL value for downloading the ESG service or the signaling data associated with the service. The svcInetUrl element can contain @URLtype.
@URLtype is the type of file that can be downloaded through the URL pointed to by the svcInetUrl element.
FIG. 22 is a diagram illustrating an exemplary code value of the serviceCategory attribute.
For example, if the value of the serviceCategory attribute is '0', the service may not be specified. If the value of the serviceCategory attribute is '1', the service may be a linear audio / video service. If the value of the serviceCategory attribute is '2', the service may be a linear audio service. If the value of the serviceCategory attribute is '3', the service may be an app-based service. If the value of the serviceCategory attribute is '4', the service may be an electronic service guide (ESG) service. If the value of the serviceCategory attribute is '5', the service may be an emergency alert service (EAS).
If the value of the serviceCategory attribute is '6', the service may be a virtual reality and / or augmented reality service.
For a video conferencing service, the value of the serviceCategory attribute may be '6' (2210).
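A receiver-side sketch of using this code point: scan the SLT's Service elements and keep those whose @serviceCategory equals 6. The sample SLT below is illustrative and omits namespaces and most attributes:

```python
import xml.etree.ElementTree as ET

SLT_XML = """
<SLT bsid="8086">
  <Service serviceId="101" serviceCategory="1" shortServiceName="NewsTV"/>
  <Service serviceId="202" serviceCategory="6" shortServiceName="VRConf"/>
</SLT>
"""

def find_vr_services(slt_xml: str) -> list[str]:
    root = ET.fromstring(slt_xml)
    return [svc.get("shortServiceName", "?")
            for svc in root.iter("Service")
            if svc.get("serviceCategory") == "6"]   # 6: VR/AR service (2210)

print(find_vr_services(SLT_XML))  # -> ['VRConf']
```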
FIG. 23 is a diagram illustrating an exemplary SLS bootstrapping and service discovery process.
The receiver can acquire the SLT. The SLT is used to bootstrap the SLS acquisition, after which the SLS is used to acquire the service component delivered in the ROUTE session or the MMTP session.
With respect to the service delivered in the ROUTE session, the SLT provides SLS bootstrapping information such as the PLPID (#1), the source IP address (sIP1), the destination IP address (dIP1), and the destination port number (dPort1). With respect to the service delivered in the MMTP session, the SLT provides SLS bootstrapping information such as the PLPID (#2), the destination IP address (dIP2), and the destination port number (dPort2).
For reference, a broadcast stream is an abstraction of an RF channel, defined in terms of a carrier frequency centered within a specific bandwidth. A physical layer pipe (PLP) corresponds to a portion of the RF channel. Each PLP has specific modulation and coding parameters.
For streaming service delivery using ROUTE, the receiver can acquire SLS fragments that are delivered in the PLP and IP/UDP/LCT sessions. These SLS fragments include a USBD/USD (User Service Bundle Description / User Service Description) fragment, a Service-based Transport Session Instance Description (S-TSID) fragment, and an MPD (Media Presentation Description) fragment. They are related to one service.
For streaming service delivery using MMTP, the receiver can obtain SLS fragments that are delivered in the PLP and MMTP sessions. These SLS fragments may include a USBD/USD fragment and MMT signaling messages. They are related to one service.
The receiver may obtain video components and / or audio components based on the SLS fragments.
Unlike the illustrated embodiment, one ROUTE or MMTP session may be delivered over a plurality of PLPs. That is, one service may be delivered via one or more PLPs. As described above, one LCT session is transmitted through one PLP. The components constituting one service may be delivered through different ROUTE sessions according to an embodiment. Also, according to an exemplary embodiment, the components constituting one service may be delivered through different MMTP sessions. According to an embodiment, the components constituting one service may be divided between a ROUTE session and an MMTP session and delivered. Although not shown, a component constituting one service may also be delivered through broadband (hybrid delivery).
In addition, service data (e.g., video components and / or audio components) and / or signaling data (e.g., SLS fragments) may be transmitted over the broadcast network and / or broadband.
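The bootstrapping decision above can be summarized with a short Python sketch. The numeric @slsProtocol code points used here are assumptions introduced for illustration; only the control flow (SLT entry to SLS fragments to components) follows the text above.

```python
# Illustrative @slsProtocol code points (assumption, not normative values).
SLS_PROTOCOL_ROUTE = 1
SLS_PROTOCOL_MMTP = 2

def expected_sls_fragments(sls_protocol: int) -> tuple:
    """Which SLS fragments the receiver should look for on the PLP / IP / port
    signalled in the SLT, per the discovery flow described above."""
    if sls_protocol == SLS_PROTOCOL_ROUTE:
        # ROUTE delivery: USBD/USD, S-TSID and MPD fragments for one service.
        return ("USBD/USD", "S-TSID", "MPD")
    if sls_protocol == SLS_PROTOCOL_MMTP:
        # MMTP delivery: USBD/USD fragment plus MMT signaling messages.
        return ("USBD/USD", "MMT signaling messages")
    raise ValueError(f"unknown @slsProtocol value: {sls_protocol}")

print(expected_sls_fragments(SLS_PROTOCOL_ROUTE))
```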
FIG. 24 is a diagram illustrating an exemplary USBD/USD fragment for ROUTE/DASH.
The USBD/USD (User Service Bundle Description / User Service Description) fragment describes the service layer characteristics and provides a URI reference for the S-TSID fragment and a URI reference for the MPD fragment. That is, the USBD/USD fragment can refer to the S-TSID fragment and the MPD fragment, respectively. A USBD/USD fragment may also be referred to simply as a USBD fragment.
USBD / USD fragments can have a bundleDescription root element. The bundleDescription root element can have a userServiceDescription element. The userServiceDescription element can be an instance of one service.
The userServiceDescription element may include at least one of @globalServiceId, @serviceId, @serviceStatus, @fullMPDUri, @sTSIDUri, a name element, a serviceLanguage element, a deliveryMethod element, and/or a serviceLinkage element.
@globalServiceId can point to a globally unique URI that identifies the service.
@serviceId is a reference to the corresponding service entry in the SLT.
@serviceStatus can specify the status of the service. The value indicates whether the service is active or inactive.
@fullMPDUri may refer to an MPD fragment that contains a description of the content component of the service delivered on broadcast and / or broadband.
@sTSIDUri can refer to an S-TSID fragment that provides access-related parameters to the transport session carrying the contents of that service.
The name element can represent the name of the service. The name element can contain @lang, which indicates the language of the service name.
The serviceLanguage element may represent the language in which the service is available.
The deliveryMethod element may be a container of transport-related information pertaining to the content of the service over broadcast and (optionally) broadband modes of access. The deliveryMethod element can contain the broadcastAppService element and the unicastAppService element. Each of these subelements can have a basePattern element as a child element.
The broadcastAppService element may be a DASH representation delivered over broadcast, in multiplexed or non-multiplexed form, containing the corresponding media components belonging to the service, across all periods of the affiliated media presentation. That is, each of these fields may refer to DASH representations transmitted over the broadcast network.
The unicastAppService element may be a DASH representation delivered over broadband, in multiplexed or non-multiplexed form, containing the constituent media content components belonging to the service, across all periods of the affiliated media presentation. That is, each of these fields may refer to DASH representations transmitted over broadband.
The basePattern element may be a character pattern used by the receiver to match against any portion of the segment URL used by the DASH client to request media segments of a parent representation under its containing period.
The serviceLinkage element may contain service linkage information.
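As an illustration of how a receiver might pull the two URI references out of a USBD/USD fragment, the following Python sketch uses a simplified, un-namespaced fragment; real fragments are namespaced per the applicable signaling schema, and the URIs and identifiers below are illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative USBD/USD fragment (attribute names follow the text above).
USBD_XML = """
<bundleDescription>
  <userServiceDescription globalServiceId="urn:example:vr-service"
                          serviceId="4001" serviceStatus="1"
                          fullMPDUri="http://example.com/vr.mpd"
                          sTSIDUri="http://example.com/vr.stsid">
    <name lang="en">VR conference</name>
  </userServiceDescription>
</bundleDescription>
"""

root = ET.fromstring(USBD_XML)
usd = root.find("userServiceDescription")
print(usd.get("fullMPDUri"))  # URI reference for the MPD fragment
print(usd.get("sTSIDUri"))    # URI reference for the S-TSID fragment
```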
FIG. 25 is a diagram illustrating an exemplary S-TSID fragment for ROUTE/DASH.
A Service-based Transport Session Instance Description (S-TSID) fragment provides transport session descriptions for the one or more ROUTE/LCT sessions in which the media content components of the service are delivered, as well as descriptions of the delivery objects carried in those LCT sessions. The receiver may obtain at least one component (e.g., a video component and/or an audio component) included in the service based on the S-TSID fragment.
The S-TSID fragment may include the S-TSID root element. The S-TSID root element may contain @serviceId and / or at least one RS element.
@serviceId can be a reference to the corresponding service element in the USD.
The RS element may have information about a ROUTE session that carries corresponding service data.
The RS element may contain at least one of @bsid, @sIpAddr, @dIpAddr, @dport, @PLPID, and / or at least one LS element.
@bsid may be the identifier of the broadcast stream to which the content component of the broadcastAppService is delivered.
@sIpAddr can indicate the source IP address. Here, the source IP address may be the source IP address of the ROUTE session that carries the service components included in the service.
@dIpAddr can represent the destination IP address. The destination IP address may be the destination IP address of the ROUTE session that carries the service components included in the service.
@dport can represent the destination port. The destination port may be the destination port of the ROUTE session that carries the service components included in the service.
@PLPID may be the ID of the PLP for the ROUTE session represented by the RS element.
The LS element may have information about an LCT session that delivers the corresponding service data.
The LS element can contain @tsi, @PLPID, @bw, @startTime, @endTime, SrcFlow and / or RprFlow.
@tsi can indicate the TSI value of the LCT session over which the service component of the service is delivered.
The @PLPID may have the ID information of the PLP for the corresponding LCT session. This value may override the default ROUTE session value.
@bw can indicate the maximum bandwidth value. @startTime can indicate the start time of the LCT session. @endTime can indicate the end time of the LCT session. The SrcFlow element can describe the source flow of ROUTE. The RprFlow element can describe the repair flow of ROUTE.
The S-TSID may include region of interest information. Specifically, the RS element and / or the LS element may include the region of interest information.
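A minimal Python sketch of walking an S-TSID fragment follows, under the same simplified-XML assumption as above; it lists each LCT session with its (possibly overridden) PLP. The addresses, TSI values, and PLP numbers are illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative S-TSID fragment (element/attribute names per the text above).
S_TSID_XML = """
<S-TSID serviceId="4001">
  <RS bsid="1" sIpAddr="10.0.0.1" dIpAddr="239.1.1.1" dport="5000" PLPID="0">
    <LS tsi="100" PLPID="0"/>  <!-- e.g., base layer video -->
    <LS tsi="200" PLPID="1"/>  <!-- e.g., enhancement layer for a region of interest -->
  </RS>
</S-TSID>
"""

root = ET.fromstring(S_TSID_XML)
for rs in root.findall("RS"):
    route = (rs.get("sIpAddr"), rs.get("dIpAddr"), rs.get("dport"))
    for ls in rs.findall("LS"):
        # @PLPID on the LS element overrides the ROUTE-session default.
        plp = ls.get("PLPID") or rs.get("PLPID")
        print(f"ROUTE session {route}: LCT session tsi={ls.get('tsi')} on PLP {plp}")
```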
FIG. 26 is a diagram illustrating an exemplary MPD fragment.
The MPD (Media Presentation Description) fragment may contain a formalized description of a DASH media presentation corresponding to a linear service of a given duration as determined by the broadcaster. The MPD fragment mainly pertains to linear services for the delivery of DASH segments as streaming content. The MPD provides resource identifiers for the individual media components of the linear/streaming service in the form of segment URLs, together with the context of the identified resources within the media presentation. The MPD may be transmitted via broadcast and/or broadband.
The MPD fragment may include a Period element, an Adaptation Set element, and a Representation element.
The period element contains information about the period. The MPD fragment may contain information about a plurality of periods. The period represents a continuous time interval of media content presentation.
An Adaptation Set element contains information about an adaptation set. The MPD fragment may contain information about a plurality of adaptation sets. An adaptation set is a set of interchangeable encoded versions of one or more media content components. An adaptation set may include one or more representations. For example, different adaptation sets may carry audio in different languages or subtitles in different languages.
The Representation element contains information about a representation. The MPD may include information about a plurality of representations. A representation is a structured collection of one or more media components; there may be a plurality of differently encoded representations for the same media content component. When bitstream switching is enabled, the electronic device can switch the representation being received to another representation during media content playback, based on updated information. In particular, the electronic device can switch to another representation depending on the bandwidth environment. A representation is divided into a plurality of segments.
A segment is a unit of media content data. A representation may be transmitted segment by segment at the request of the electronic device, using the HTTP GET or HTTP partial GET methods defined in HTTP/1.1 (RFC 2616).
Further, a segment may be configured to include a plurality of sub-segments. A sub-segment may mean the smallest unit that can be indexed at the segment level. Segments include an Initialization Segment, a Media Segment, an Index Segment, a Bitstream Switching Segment, and the like.
The MPD fragment may include region of interest information. In particular, the Period element, the Adaptation Set element, and / or the Representation element may include the region of interest information.
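The bitstream switching described above amounts to picking, per request, the representation whose declared bandwidth best fits the measured throughput. A hedged Python sketch, with illustrative (id, bandwidth) pairs:

```python
# Illustrative representations of one adaptation set: (id, declared bandwidth in bps).
representations = [
    ("rep-low", 1_000_000),
    ("rep-mid", 3_000_000),
    ("rep-high", 8_000_000),
]

def select_representation(measured_bps: float) -> str:
    """Return the id of the highest representation whose declared bandwidth
    fits the measured throughput, falling back to the lowest representation."""
    candidates = [r for r in representations if r[1] <= measured_bps]
    chosen = max(candidates, key=lambda r: r[1]) if candidates else representations[0]
    return chosen[0]

print(select_representation(4_500_000))  # -> "rep-mid"
```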
FIG. 27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.
The client device (or receiver) can receive the bitstream through the broadcast network. For example, the bitstream may comprise video data for the service and second signaling data. For example, the second signaling data may include a service list table (SLT) and/or service layer signaling (SLS) data for acquiring the video data.
The bitstream may include at least one physical layer frame. The physical layer frame may include at least one PLP. For example, the second signaling data, the base layer video data, and the at least one enhancement layer video data may be carried in the same PLP or in different PLPs. Also, the base layer video data and the at least one enhancement layer video data may be delivered through different ROUTE sessions (e.g., a first ROUTE session and a second ROUTE session).
The client device may then obtain the SLT from the bitstream.
The client device may then obtain the SLS fragments (e.g., the USBD/USD fragment) based on the SLT.
The client device may then obtain the S-TSID fragment and / or the MPD fragment based on the USBD / USD fragment. The client device may match the representation of the MPD fragment with the media component transmitted via the LCT session, based on the S-TSID fragment and the MPD fragment.
The client device may then obtain the base layer video data through the first ROUTE session (ROUTE #1) and the at least one enhancement layer video data through the second ROUTE session (ROUTE #2).
The client device may then decode the service data (e.g., base layer video data, enhancement layer video data, audio data) based on the MPD fragment.
More specifically, the client device may decode the enhancement layer video data based on the base layer video data and / or the region of interest information.
In the above description, the enhancement layer video data is transmitted through the second ROUTE session (ROUTE # 2), but the enhancement layer video data may be transmitted through the MMTP session.
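Putting the flow of FIG. 27 together, a schematic Python sketch of the client-side processing order follows. The decoder functions are stand-ins for the receiver's actual codecs, and the payload sizes and tile numbers are purely illustrative.

```python
def decode_base_layer(payload: bytes) -> str:
    # Stand-in for the base layer decoder.
    return f"base-picture({len(payload)} bytes)"

def decode_enhancement_layer(payload: bytes, base_picture: str, roi) -> str:
    # Inter-layer prediction: the enhancement layer is decoded with reference
    # to the base-layer picture, restricted to the signalled region of interest.
    return f"enhanced-picture(tiles {roi} over {base_picture})"

base_payload = bytes(100)        # received on ROUTE session #1
enhancement_payload = bytes(40)  # received on ROUTE session #2 (or an MMTP session)
roi_tiles = [5, 6, 9, 10]        # region-of-interest tile numbers from signaling

base = decode_base_layer(base_payload)
enhanced = decode_enhancement_layer(enhancement_payload, base, roi_tiles)
print(enhanced)
```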
FIG. 28 is a diagram showing an exemplary configuration of a client device.
The client device A2800 may include at least one of an image input unit, an audio input unit, a sensor unit, an image output unit, an audio output unit, a communication unit A2810, and/or a control unit A2820. For example, the specific contents of the client device A2800 may include all the contents of the client device described above.
The control unit A2820 may include at least one of a signaling data extraction unit, a decoder, a speaker determination unit, a gaze determination unit, and / or a signaling data generation unit. For example, the contents of the control unit A2820 may include all the contents of the control unit described above.
Referring to the drawings, a client device (or receiver, image receiving apparatus) may include a communication unit A2810 and / or a control unit A2820. The control unit A2820 may include a base layer decoder A2821 and / or an enhancement layer decoder A2825.
The communication unit A2810 can receive a bit stream including video data for a virtual reality service. The communication unit A2810 can receive the bit stream through the broadcasting network and / or the broadband.
The video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
The base layer decoder A2821 may decode the base layer video data.
The enhancement layer decoder A2825 may decode the at least one enhancement layer video data based on the base layer video data.
The at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
In addition, the control unit A2820 may further include a signaling data generation unit for generating the first signaling data.
The first signaling data may include image configuration information. The image configuration information may include at least one of gaze information indicating a gaze direction of a user in the virtual space and zoom area information indicating a viewing angle of the user.
In addition, the control unit A2820 may further include a gaze determination unit for determining whether a gaze region corresponding to the gaze information is included in the at least one region of interest.
If the gaze region is included in an area other than the at least one region of interest, the communication unit A2810 may transmit the first signaling data to a server (or server device, transmitter, image transmission device) and/or at least one other client device (or image receiving device). The server device and/or the at least one client device that has received the first signaling data may then add the gaze region to the at least one region of interest. That is, the region of interest may include at least one of a region including a speaker in the virtual space, a region predetermined to be expressed using at least one enhancement layer video data, and a gaze region corresponding to the gaze information.
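The gaze handling above reduces to a containment test followed by a report. A minimal Python sketch, where the tile-set representation of gaze region and regions of interest is assumed purely for illustration:

```python
def gaze_outside_rois(gaze_tiles: set, regions_of_interest: list) -> bool:
    """True when the gaze region overlaps none of the regions of interest."""
    return not any(gaze_tiles & set(roi) for roi in regions_of_interest)

regions_of_interest = [{5, 6, 9, 10}]  # e.g., the current speaker's region
gaze_tiles = {0, 1}                    # tiles covered by the user's gaze

if gaze_outside_rois(gaze_tiles, regions_of_interest):
    first_signaling_data = {"gaze_tiles": sorted(gaze_tiles)}
    # send(first_signaling_data) to the server and/or other client devices;
    # the recipient then adds the gaze region as a new region of interest.
    regions_of_interest.append(gaze_tiles)

print(regions_of_interest)
```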
In addition, the bitstream may further include second signaling data.
The communication unit A2810 can independently receive the base layer video data and the at least one enhancement layer video data through a plurality of sessions based on the second signaling data.
For example, the communication unit A2810 can receive base layer video data through a first ROUTE session and receive at least one enhancement layer video data through at least one second ROUTE session. Alternatively, the communication unit A2810 may receive base layer video data through a ROUTE session and receive at least one enhancement layer video data through at least one MMTP session.
The second signaling data may include at least one of service layer signaling data (or SLS) including information for acquiring the video data and a service list table (or SLT) including information for acquiring the service layer signaling data.
In addition, the service list table may include a service category attribute indicating a category of the service. For example, the service category attribute may indicate the virtual reality service.
Also, the service layer signaling data may include the region of interest information. Specifically, the service layer signaling data may include an S-TSID fragment including information on the sessions in which the at least one media component (video data and/or audio data) for the virtual reality service is transmitted, an MPD fragment describing the at least one media component, and a USBD/USD fragment including URI values linking the S-TSID fragment and the MPD fragment.
In addition, the MPD fragment may include region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space.
In addition, the bitstream may further comprise region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space. For example, the region of interest information may be transmitted and / or received via at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.
Also, the at least one enhancement layer video data may be generated (encoded) and / or decoded based on the base layer video data and the region of interest information.
In addition, the region of interest information may include at least one of an information mode field indicating the mode of the information representing the region of interest and a tile number list field including the number of at least one tile corresponding to the region of interest. For example, the information mode field may be the info_mode information described above, and the tile number list field may be the tile_id_list information described above.
For example, based on the information mode field, the tile number list field may include the numbers of all tiles corresponding to the region of interest, the start number and end number of consecutive tiles, or the numbers of the upper-left and lower-right tiles of the region of interest.
In addition, the ROI information may further include a coding unit number list field indicating the ROI. For example, the coding unit number list field may be cu_id_list information described above.
For example, the coding unit number list field may indicate the number of a tile corresponding to the region of interest and the number of a coding unit included in that tile, based on the information mode field.
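A hedged Python sketch of interpreting these fields follows. The numeric info_mode code points and the tile-grid width are assumptions introduced for illustration; the text above fixes only the three addressing forms, not their numeric values.

```python
def roi_tiles(info_mode: int, tile_id_list: list, grid_width: int = 4) -> set:
    """Resolve the tile number list field into a set of tile numbers,
    according to the (assumed) info_mode code points."""
    if info_mode == 0:
        # Mode 0: tile_id_list enumerates every tile of the region of interest.
        return set(tile_id_list)
    if info_mode == 1:
        # Mode 1: tile_id_list gives the start and end numbers of consecutive tiles.
        start, end = tile_id_list
        return set(range(start, end + 1))
    if info_mode == 2:
        # Mode 2: tile_id_list gives the upper-left and lower-right tile numbers
        # of a rectangle, resolved against the picture's tile grid.
        top_left, bottom_right = tile_id_list
        rows = range(top_left // grid_width, bottom_right // grid_width + 1)
        cols = range(top_left % grid_width, bottom_right % grid_width + 1)
        return {r * grid_width + c for r in rows for c in cols}
    raise ValueError(f"unknown info_mode: {info_mode}")

print(sorted(roi_tiles(2, [5, 10])))  # -> [5, 6, 9, 10] on a 4-tile-wide grid
```

A coding unit number list field (cu_id_list) could be resolved the same way one level down, mapping each listed tile to the coding units it contains.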
The client device B2800 may include at least one of an image input unit, an audio input unit, a sensor unit, a video output unit, an audio output unit, a communication unit B2810, and/or a control unit B2820. For example, the specific contents of the client device B2800 may include all the contents of the client device A2800 described above.
In addition, the control unit B2820 may include at least one of a first processor B2821 and/or a second processor B2825.
The first processor B2821 may decode the base layer video data. For example, the first processor B2821 may be a video processing unit (VPU) and / or a digital signal processor (DSP).
The second processor B2825 may be electrically coupled to the first processor to decode the at least one enhancement layer video data based on the base layer video data. For example, the second processor B2825 may be a central processing unit (CPU) and / or a graphics processing unit (GPU).
FIG. 29 is a diagram showing an exemplary configuration of a server device.
When performing communication only between client devices, at least one client device (or HMD, image receiving apparatus) may perform all operations of the server device (or image transmitting apparatus). Hereinafter, the case where a server device exists will be mainly described, but the contents of the present specification are not limited thereto.
Referring to FIG. 29(a), a server device A2900 (or transmitter, image transmission apparatus) may include a control unit A2910 and/or a communication unit A2920. The control unit A2910 may include at least one of a signaling data extracting unit, an image generating unit, a region of interest determining unit, a signaling data generating unit, and/or an encoder. The specific contents of the server device A2900 may include all the contents of the server device described above.
Referring to the drawings, a controller A2910 of the server device A2900 may include a base layer encoder A 2911 and / or an enhancement layer encoder A 2915.
The base layer encoder A 2911 can generate base layer video data.
The enhancement layer encoder A 2915 may generate at least one enhancement layer video data based on the base layer video data.
The communication unit A2920 can transmit a bit stream including video data for a virtual reality service. The communication unit A2920 can transmit the bit stream through the broadcasting network and / or the broadband.
Also, the video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
Also, the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
Further, the communication unit A2920 can also receive the first signaling data. For example, the first signaling data may include image configuration information.
The region of interest determination unit of the control unit A2910 may include the gaze region corresponding to the gaze information in the at least one region of interest.
Also, the signaling data generation unit of the control unit A2910 may generate the second signaling data.
In addition, the communication unit A2920 may independently transmit the base layer video data and the at least one enhancement layer video data through a plurality of sessions based on the second signaling data.
In addition, the second signaling data and/or the region of interest information may include all of the contents described above.
Referring to FIG. 29(b), the server device B2900 (or transmitter, image transmission apparatus) may include at least one of a control unit B2910 and/or a communication unit B2920. The control unit B2910 may include at least one of a signaling data extracting unit, an image generating unit, a region of interest determining unit, a signaling data generating unit, and/or an encoder. The specific contents of the server device B2900 may include all the contents of the server device described above.
The control unit B2910 of the server device B2900 may include a first processor B2911 and / or a second processor B2915.
The first processor B2911 may include a base layer encoder for generating base layer video data.
The second processor B2915 may be electrically coupled to the first processor to generate (or encode) the at least one enhancement layer video data based on the base layer video data.
FIG. 30 is a diagram illustrating an exemplary operation of a client device.
The client device (or receiver, video receiving apparatus) may include a communication unit and / or a control unit. The control unit may include a base layer decoder and / or an enhancement layer decoder. Further, the control unit may include a first processor and / or a second processor.
The client device can receive the bitstream including the video data for the virtual reality service using the communication unit (3010).
For example, the video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
The client device may then decode the base layer video data using a base layer decoder and / or a first processor (3020).
The client device may then decode (3030) the at least one enhancement layer video data based on the base layer video data using an enhancement layer decoder and / or a second processor.
For example, the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
The content related to the operation of the client device may include the contents of the client device described above.
FIG. 31 is a diagram showing an exemplary operation of the server device.
The server device may include a control unit and / or a communication unit. The control unit may include a base layer encoder and / or an enhancement layer encoder. Further, the control unit may include a first processor and / or a second processor.
The server device may generate 3110 base layer video data using a base layer encoder and / or a first processor.
The server device may then generate 3120 at least one enhancement layer video data based on the base layer video data using an enhancement layer encoder and / or a second processor.
Then, the server device can transmit the bit stream including the video data for the virtual reality service using the communication unit.
For example, the video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.
Also, the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.
The content related to the operation of the server device may include all the contents of the server device described above.
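Mirroring the client sketch earlier, the server-side order of operations (steps 3110 and 3120, followed by transmission) can be sketched in Python. The encoder functions are stubs standing in for an actual scalable codec, and the frame label and tile numbers are illustrative.

```python
def encode_base_layer(picture: str) -> bytes:
    # Stand-in for the base layer encoder (step 3110).
    return f"base({picture})".encode()

def encode_enhancement_layer(picture: str, base_bitstream: bytes, roi) -> bytes:
    # Stand-in for the enhancement layer encoder (step 3120): inter-layer
    # prediction against the base layer, limited to the region-of-interest tiles.
    return f"enh(tiles {roi})".encode()

picture = "360-degree frame"
base = encode_base_layer(picture)
enhancements = [encode_enhancement_layer(picture, base, roi)
                for roi in ([5, 6, 9, 10],)]  # one region of interest
bitstream = base + b"".join(enhancements)      # transmitted with the second signaling data
print(len(bitstream))
```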
Further, according to the embodiments disclosed herein, the above-described methods can be implemented as processor-readable code on a medium on which a program is recorded. Examples of processor-readable media include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.
The configurations and methods of the embodiments described above are not applied to the electronic device in a limited manner; all or some of the embodiments may be selectively combined so that various modifications can be made.
In the foregoing, preferred embodiments of the present technology have been described with reference to the accompanying drawings. The terms and words used in the present specification and claims should not be construed as limited to their ordinary or dictionary meanings, but should be construed as having meanings and concepts consistent with the technical idea of the present technology.
The scope of the present technology is not limited to the embodiments disclosed in the present specification, and the present technology may be modified, changed, or improved in various forms within that scope.
A2821: Base layer decoder A2825: Enhancement layer decoder
A2810: Communication unit A2820: Control unit
A2911: Base layer encoder A2915: Enhancement layer encoder
A2920: Communication unit
Claims (23)
A video receiving method comprising:
Receiving a bitstream including video data for a virtual reality service, wherein the video data comprises base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
Decoding the base layer video data; and
Decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.
Further comprising generating first signaling data,
Wherein the first signaling data includes gaze information indicating a gaze direction of a user in the virtual space.
Further comprising: determining whether a gaze region corresponding to the gaze information is included in the at least one region of interest; and
Transmitting the first signaling data if the gaze region is included in an area other than the at least one region of interest,
Wherein the gaze region is added to the at least one region of interest.
Wherein the bitstream comprises region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space,
Wherein the at least one enhancement layer video data is decoded based on the base layer video data and the region of interest information.
Wherein the region of interest information includes a tile number list field including the number of at least one tile corresponding to the region of interest.
Wherein the tile number list field includes the number of the at least one tile in the form of the numbers of all tiles corresponding to the region of interest, a start number and an end number of consecutive tiles, or the numbers of the upper-left and lower-right tiles of the region of interest.
Wherein the region of interest information is received through at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.
Wherein the bitstream comprises second signaling data,
Wherein the receiving of the bitstream comprises:
Receiving the base layer video data and the at least one enhancement layer video data independently through a plurality of sessions based on the second signaling data.
Wherein the second signaling data includes at least one of service layer signaling data including information for acquiring the video data and a service list table including information for acquiring the service layer signaling data.
Wherein the service layer signaling data comprises the region of interest information.
A video transmission method comprising:
Generating base layer video data;
Generating at least one enhancement layer video data based on the base layer video data; and
Transmitting a bitstream including video data for a virtual reality service,
Wherein the video data comprises the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.
Further comprising receiving first signaling data,
Wherein the first signaling data includes gaze information indicating a gaze direction of a user in the virtual space,
Wherein the first signaling data is received when a gaze region corresponding to the gaze information is included in an area other than the at least one region of interest.
Wherein the gaze region is added to the at least one region of interest.
Wherein the bitstream comprises region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space,
Wherein the at least one enhancement layer video data is encoded based on the base layer video data and the region of interest information.
Wherein the region of interest information includes a tile number list field including the number of at least one tile corresponding to the region of interest.
Wherein the tile number list field includes the number of the at least one tile in the form of the numbers of all tiles corresponding to the region of interest, a start number and an end number of consecutive tiles, or the numbers of the upper-left and lower-right tiles of the region of interest.
Wherein the region of interest information is transmitted through at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.
Further comprising generating second signaling data,
Wherein the transmitting of the bitstream comprises:
Transmitting the base layer video data and the at least one enhancement layer video data independently through a plurality of sessions based on the second signaling data.
Wherein the second signaling data comprises at least one of service layer signaling data including information for acquiring the video data and a service list table including information for acquiring the service layer signaling data.
Wherein the service layer signaling data includes the region of interest information.
A video receiving apparatus comprising:
A communication unit for receiving a bitstream including video data for a virtual reality service, wherein the video data comprises base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
A base layer decoder for decoding the base layer video data; and
An enhancement layer decoder for decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.
A video receiving apparatus comprising:
A communication unit for receiving a bitstream including video data for a virtual reality service, the video data comprising base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
A first processor for decoding the base layer video data; and
A second processor, electrically coupled to the first processor, for decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.
A video transmission apparatus comprising:
A base layer encoder for generating base layer video data;
An enhancement layer encoder for generating at least one enhancement layer video data based on the base layer video data; and
A communication unit for transmitting a bitstream including video data for a virtual reality service,
Wherein the video data comprises the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160125145A KR101861929B1 (en) | 2016-09-28 | 2016-09-28 | Providing virtual reality service considering region of interest |
PCT/KR2017/001087 WO2018062641A1 (en) | 2016-09-28 | 2017-02-01 | Provision of virtual reality service with consideration of area of interest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160125145A KR101861929B1 (en) | 2016-09-28 | 2016-09-28 | Providing virtual reality service considering region of interest |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20180035089A true KR20180035089A (en) | 2018-04-05 |
KR101861929B1 KR101861929B1 (en) | 2018-05-28 |
Family
ID=61760922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160125145A KR101861929B1 (en) | 2016-09-28 | 2016-09-28 | Providing virtual reality service considering region of interest |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101861929B1 (en) |
WO (1) | WO2018062641A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019199025A1 (en) * | 2018-04-09 | 2019-10-17 | 에스케이텔레콤 주식회사 | Method and device for encoding/decoding image |
KR20200076529A (en) * | 2018-12-19 | 2020-06-29 | 가천대학교 산학협력단 | Indexing of tiles for region of interest in virtual reality video streaming |
KR20200111408A (en) * | 2019-03-19 | 2020-09-29 | 한국전자기술연구원 | User interface and method for 360 VR interactive relay |
US11509937B2 (en) | 2018-04-09 | 2022-11-22 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102261739B1 (en) * | 2019-06-19 | 2021-06-08 | 주식회사 엘지유플러스 | System and method for adaptive streaming of augmented reality media content |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5037574B2 (en) * | 2009-07-28 | 2012-09-26 | 株式会社ソニー・コンピュータエンタテインメント | Image file generation device, image processing device, image file generation method, and image processing method |
KR101972284B1 (en) * | 2013-04-08 | 2019-04-24 | 소니 주식회사 | Region of interest scalability with shvc |
KR101540113B1 (en) * | 2014-06-18 | 2015-07-30 | 재단법인 실감교류인체감응솔루션연구단 | Method, apparatus for gernerating image data fot realistic-image and computer-readable recording medium for executing the method |
2016-09-28: Application KR1020160125145 filed in Korea; granted as KR101861929B1 (active IP Right Grant).
2017-02-01: International application PCT/KR2017/001087 filed as WO2018062641A1 (active Application Filing).
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019199025A1 (en) * | 2018-04-09 | 2019-10-17 | 에스케이텔레콤 주식회사 | Method and device for encoding/decoding image |
US11509937B2 (en) | 2018-04-09 | 2022-11-22 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
US11778238B2 (en) | 2018-04-09 | 2023-10-03 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
US11778239B2 (en) | 2018-04-09 | 2023-10-03 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
US11792436B2 (en) | 2018-04-09 | 2023-10-17 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
US11902590B2 (en) | 2018-04-09 | 2024-02-13 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding video |
KR20200076529A (en) * | 2018-12-19 | 2020-06-29 | 가천대학교 산학협력단 | Indexing of tiles for region of interest in virtual reality video streaming |
KR20200111408A (en) * | 2019-03-19 | 2020-09-29 | 한국전자기술연구원 | User interface and method for 360 VR interactive relay |
Also Published As
Publication number | Publication date |
---|---|
WO2018062641A1 (en) | 2018-04-05 |
KR101861929B1 (en) | 2018-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11184584B2 (en) | Method for image decoding, method for image encoding, apparatus for image decoding, apparatus for image encoding | |
CN110036641B (en) | Method, device and computer readable storage medium for processing video data | |
KR102342274B1 (en) | Advanced signaling of regions of most interest in images | |
US11303826B2 (en) | Method and device for transmitting/receiving metadata of image in wireless communication system | |
CN109076239B (en) | Circular fisheye video in virtual reality | |
KR102252238B1 (en) | The area of interest in the image | |
US20190104326A1 (en) | Content source description for immersive media data | |
KR101861929B1 (en) | Providing virtual reality service considering region of interest | |
CN109218734A (en) | For Video coding and decoded method, apparatus and computer program product | |
KR20190091275A (en) | Systems and Methods of Signaling of Regions of Interest | |
US10567734B2 (en) | Processing omnidirectional media with dynamic region-wise packing | |
US12035020B2 (en) | Split rendering of extended reality data over 5G networks | |
JP7035088B2 (en) | High level signaling for fisheye video data | |
KR102361314B1 (en) | Method and apparatus for providing 360 degree virtual reality broadcasting services | |
KR20200024829A (en) | Enhanced High-Level Signaling for Fisheye Virtual Reality Video in DASH | |
KR101898822B1 (en) | Virtual reality video streaming with viewport information signaling | |
KR101941789B1 (en) | Virtual reality video transmission based on viewport and tile size | |
JP2024519747A (en) | Split rendering of extended reality data over 5G networks | |
WO2020068935A1 (en) | Virtual reality viewpoint viewport center point correspondence signaling | |
WO2020068284A1 (en) | Virtual reality (vr) viewpoint grouping | |
KR102183895B1 (en) | Indexing of tiles for region of interest in virtual reality video streaming | |
Fautier | VR video ecosystem for live distribution | |
CN117256154A (en) | Split rendering of augmented reality data over 5G networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |