WO2019117628A1

WO2019117628A1 - Virtual reality video quality calibration

Info

Publication number: WO2019117628A1
Application number: PCT/KR2018/015794
Authority: WO
Inventors: 류은석; 류영일
Original assignee: 가천대학교 산학협력단
Priority date: 2017-12-12
Filing date: 2018-12-12
Publication date: 2019-06-20
Also published as: KR101981868B1

Abstract

An image quality calibration method for a wearable image display device disclosed in the present specification comprises: determining a threshold of a gaze movement speed for image quality conversion according to the characteristics of the wearable image display device; measuring the gaze movement speed of a user of the wearable image display device; and requesting quality adjustment of a video image to be transmitted according to a result of comparing the gaze movement speed with the threshold.

Description

Virtual reality video quality control

Cross-reference to related application

This application claims the benefit of priority based on Korean Patent Application No. 10-2017-0170822, filed December 12, 2017, the entire contents of which are incorporated herein by reference.

Technical field

This specification relates to controlling the quality of virtual reality video.

Recently, wearable devices such as head-mounted displays (HMDs) have been introduced along with the development of virtual reality technology and equipment. Among the many service scenarios for head-mounted display devices is real-time 360-degree image transmission. A 360-degree image transmission system acquires a 360-degree image using a plurality of cameras, encodes the acquired image, and transmits it to a head-mounted display device worn by a user. The transmitted image is decoded, mapped to a 360-degree virtual space, and provided to the user. Because the HMD reproduces the image very close to the user's eyes, an image of UHD (Ultra High Definition) quality or higher is needed to give the user a sense of immersion. This in turn calls for a method of securing bandwidth and supporting a fast response speed in the user terminal and the video transmission system.

To secure the bandwidth of the user terminal and the image transmission system for processing the increased amount of video data, and to support a fast response speed, a 360-degree image transmission system can apply tiling technology, which selectively transmits only a specific part of the image, and scalable video coding technology. However, even when tiling and scalable video coding are applied, if the user's gaze changes quickly and frequently, the amount of high-quality image information to be transmitted increases, and the bandwidth savings are reduced. Tiling and scalable video coding alone are therefore limited in their ability to secure bandwidth and support a fast response speed, and a method for solving this problem is needed.

The present specification discloses a method of controlling the image quality of a wearable image display device. The method may include: determining a threshold value of a gaze movement speed for image quality switching according to characteristics of the wearable image display device; measuring a gaze movement speed of a user of the wearable image display device; and requesting quality adjustment of the video image to be transmitted according to a result of comparing the gaze movement speed with the threshold value.
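As a minimal sketch of the measuring step, the gaze movement speed can be estimated as the angle between two gaze directions divided by the elapsed time. The sample format (timestamp in seconds, yaw and pitch in degrees) and the name `gaze_speed_deg_per_s` are assumptions made for illustration; the specification does not state how the sensor reports the gaze.

```python
import math


def _unit_vector(yaw_deg, pitch_deg):
    """Convert a yaw/pitch direction in degrees to a 3D unit vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )


def gaze_speed_deg_per_s(sample_a, sample_b):
    """Angular speed between two (time_s, yaw_deg, pitch_deg) samples.

    The sample layout is a hypothetical choice for this sketch.
    """
    t0, yaw0, pitch0 = sample_a
    t1, yaw1, pitch1 = sample_b
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    v0, v1 = _unit_vector(yaw0, pitch0), _unit_vector(yaw1, pitch1)
    dot = sum(a * b for a, b in zip(v0, v1))
    dot = max(-1.0, min(1.0, dot))  # clamp against rounding error
    angle_deg = math.degrees(math.acos(dot))
    return angle_deg / (t1 - t0)
```

Working with unit vectors rather than subtracting yaw values avoids a spurious jump when the gaze crosses the 0/360-degree boundary.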

The method and other embodiments may include the following features.

Requesting quality adjustment of the video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value may include: requesting transmission of image data of a first quality if the gaze movement speed is lower than the threshold value; and, if the gaze movement speed is equal to or greater than the threshold value, requesting transmission of image data of a second quality lower than the first quality if the gaze movement speed is increasing, and requesting transmission of the image data of the first quality if the gaze movement speed is not increasing.
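The comparison rule can be sketched as a small decision function. This is an illustration under stated assumptions, not the claimed implementation: the function name `select_quality` is hypothetical, and "increasing" is judged here simply by comparing the current speed with the previous measurement, which the specification does not prescribe.

```python
def select_quality(speed, previous_speed, threshold):
    """Return which quality to request: "first" (higher) or "second" (lower).

    Assumption: the gaze is treated as "increasing" when the current speed
    exceeds the previous measurement.
    """
    if speed < threshold:
        return "first"
    # At or above the threshold: request the lower quality only while the
    # gaze is still speeding up.
    if speed > previous_speed:
        return "second"
    return "first"
```

For example, with a threshold of 30, a speed rising from 20 to 40 yields `"second"`, while a speed that stays at 40 yields `"first"`.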

In addition, the image data of the first quality may include at least one of high-quality image data, base layer image data, and enhancement layer image data for the currently transmitted video image, and the image data of the second quality may include at least one of low-quality image data and base layer image data for the video image.

In addition, the first-quality image data and the second-quality image data may differ in image quality factors including the image quality, the number of frames, the resolution, and the scan method of the image.

In addition, requesting quality adjustment of the video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value may include requesting transmission of the image data of the first quality if, while the image data of the second quality is being transmitted, the gaze movement speed falls below the threshold value or is not increasing.

In addition, requesting transmission of the image data of the first quality when the gaze movement speed is less than the threshold value, or is not increasing, while the image data of the second quality is being transmitted may include: receiving and outputting upsampled base layer image data of the video image during the delay that occurs when the transmission quality is switched; and outputting the video image of the first quality once the image data of the first quality is received.
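The behaviour during the switching delay can be sketched as follows. The frame representation (a 2D list of pixel values), nearest-neighbour upsampling, and both function names are assumptions for illustration; the specification does not name an upsampling filter.

```python
def upsample_nearest(frame, factor):
    """Nearest-neighbour upsampling of a 2D list of pixel values."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def frame_to_display(base_frame, first_quality_frame, factor=2):
    """Pick what to output while the transmission quality is switching.

    Until first-quality data arrives (None), fall back to the upsampled
    base layer so the display never stalls.
    """
    if first_quality_frame is not None:
        return first_quality_frame
    return upsample_nearest(base_frame, factor)
```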

In addition, requesting transmission of a video image whose quality is adjusted according to the result of comparing the gaze movement speed with the threshold value may include requesting transmission of image data of a first quality if the gaze movement speed is less than the threshold value, and requesting transmission of image data of a second quality lower than the first quality if the gaze movement speed is equal to or greater than the threshold value.

Also, the image data of the first quality may include at least one of high-definition image data, base layer image data, and enhancement layer image data for the video image currently being transmitted, and the image data of the second quality may include at least one of low-resolution image data and base layer image data for the video image currently being transmitted.

The threshold value of the gaze movement speed for switching the image quality is the gaze movement speed at which the user cannot perceive a difference in quality between images of different quality while moving the gaze, and may vary depending on the characteristics of the wearable image display device.

Meanwhile, the present specification discloses a video transmission method of a video server. The method may include: receiving, from a wearable video display device, a message requesting transmission of video data of a first quality; transmitting the video data of the first quality for a video image to the wearable video display device in response to that request; receiving, from the wearable video display device, a message requesting transmission of video data of a second quality lower than the first quality; and transmitting the video data of the second quality for the video image to the wearable video display device in response to that request. When the video server receives a message requesting transmission of the video data of the first quality while transmitting the video data of the second quality to the wearable video display device, it transmits upsampled video data of the video data of the second quality to the wearable video display device and then transmits the video data of the first quality.

The method and other embodiments may include the following features.

The image data of the first quality may include base layer image data and enhancement layer image data of the currently transmitted video image, the image data of the second quality may include the base layer image data, and the upsampled image data may be upsampled image data of the base layer image data.
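The server-side ordering can be sketched as a tiny state machine. The class name `VideoServerSketch` and the string labels for the kinds of data are hypothetical; the sketch only records what would be sent and in what order, not how the data is encoded or transported.

```python
class VideoServerSketch:
    """Tracks the quality being transmitted and the response to each request."""

    def __init__(self):
        self.current = None  # quality currently being transmitted, if any

    def handle_request(self, requested):
        """Return the kinds of data sent, in order, for one request message."""
        if requested not in ("first", "second"):
            raise ValueError("requested quality must be 'first' or 'second'")
        sent = []
        # A first-quality request arriving during second-quality transmission
        # is bridged with upsampled second-quality data.
        if requested == "first" and self.current == "second":
            sent.append("upsampled_second")
        sent.append(requested)
        self.current = requested
        return sent
```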

On the other hand, the present specification proposes a method of controlling image quality. The image quality control method may include: a wearable video display device determining a threshold value of a gaze movement speed for switching the video quality according to device characteristics of the wearable video display device; the wearable video display device measuring a gaze movement speed of a user; the wearable video display device requesting a video server to transmit a quality-adjusted video image according to the result of comparing the gaze movement speed with the threshold value; and the video server transmitting the quality-adjusted video image to the wearable video display device in response to the transmission request.

The method and other embodiments may include the following features.

As a result of the comparison, the wearable video display device may request the video server to transmit video data of a first quality if the gaze movement speed is lower than the threshold value. If the gaze movement speed is equal to or greater than the threshold value, it may request the video server to transmit video data of a second quality lower than the first quality if the gaze movement speed is increasing, and may request the video data of the first quality if the gaze movement speed is not increasing.

In addition, the video server may transmit the video data of the first quality for the video image to the wearable video display device in response to a request for the video data of the first quality, and may transmit the video data of the second quality for the video image to the wearable video display device in response to a request for the video data of the second quality.

When the video server receives a message requesting transmission of the video data of the first quality while transmitting the video data of the second quality to the wearable video display device, it may transmit upsampled image data of the video data of the second quality to the wearable video display device and then transmit the image data of the first quality.

On the other hand, the present specification discloses a wearable video display device. The wearable video display device may include: a critical speed determining unit for determining a threshold value of a gaze movement speed for switching image quality according to characteristics of the wearable video display device; a gaze movement speed measuring unit for measuring a gaze movement speed of a user of the wearable video display device; a control unit for generating a message requesting adjustment of the quality of a video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value; and a communication unit for transmitting the quality adjustment request message to the outside and receiving the video image.

The apparatus and other embodiments may include the following features.

Transmission of the high-quality video data may be requested when the gaze movement speed is less than the threshold value, or when the gaze movement speed is equal to or greater than the threshold value but is not increasing.

Meanwhile, this specification presents a method of controlling the image quality of a wearable video display device. The method may include: determining a threshold value of a gaze movement speed for image quality switching according to characteristics of the wearable image display device; measuring a gaze movement speed of a user of the wearable image display device; and requesting quality adjustment of a scalable video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value, wherein transmission of the enhancement layer video data of the scalable video image may be requested when the gaze movement speed is less than the threshold value, or when the gaze movement speed is equal to or greater than the threshold value but is not increasing.

The method and other embodiments may include the following features.

The method may include, when transmission of the enhancement layer video data is requested while the base layer video data is being transmitted, receiving upsampled base layer image data of the scalable video image during the delay that occurs when the transmission quality is switched, and then receiving the enhancement layer image data.

According to the embodiments disclosed in the present specification, in a virtual reality image providing system that provides a 360-degree image, the bandwidth of the image receiving apparatus and the image transmitting system needed to process the amount of video data, which increases with the movement of the user's gaze, can be secured, and a fast response speed can be supported.

In addition, according to the embodiments disclosed in the present specification, in the virtual reality image providing system, the amount of video data to be transmitted can be efficiently controlled even with frequent eye movement and quick gaze movement of the user.

FIG. 1 illustrates an exemplary virtual reality system for providing a virtual reality image.

FIG. 2 is a diagram illustrating an exemplary scalable video coding service.

FIG. 3 is a diagram showing an exemplary configuration of a server device.

FIG. 4 is a diagram showing an exemplary structure of an encoder.

FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.

FIG. 6 is a diagram showing an exemplary configuration of a client device.

FIG. 7 is a diagram showing an exemplary configuration of the control unit.

FIG. 8 is a diagram showing an exemplary configuration of a decoder.

FIG. 9 is a diagram illustrating an exemplary method of controlling image quality in a wearable video display device.

FIG. 10 is a diagram illustrating an example in which the image quality is controlled according to a change in the gaze movement speed.

FIG. 11 is a diagram illustrating an exemplary method of error concealment when enhancement layer video data enters a retransmission mode.

FIG. 12 illustrates an exemplary method of transmitting a virtual reality image in a video server.

FIG. 13 illustrates an exemplary video quality control method in a virtual reality system.

FIG. 14 is a diagram showing an exemplary configuration of a wearable image display device capable of controlling the quality of a transmitted image according to the gaze movement speed.

FIG. 15 shows an exemplary OMAF syntax, from an international video standard, for signaling image quality control.

FIG. 16 shows an exemplary tile information syntax expressed in XML form.

The techniques disclosed herein can be applied to a virtual reality system. However, the technology disclosed in this specification is not limited thereto, and can be applied to all electronic devices and methods to which the technical idea of the above-described technology can be applied.

It is noted that the technical terms used herein are used only to describe specific embodiments and are not intended to limit the scope of the technology disclosed herein. Unless otherwise defined in this specification, the technical terms used herein should be interpreted as generally understood by those of ordinary skill in the art to which the disclosed technology belongs, and should not be interpreted in an excessively broad or excessively narrow sense. If a technical term used herein is an erroneous term that does not accurately express the spirit of the technology disclosed herein, it should be replaced with a technical term that those of ordinary skill in the art can correctly understand. In addition, general terms used in this specification should be interpreted as defined in a dictionary or according to the surrounding context, and should not be interpreted in an excessively narrow sense.

As used herein, terms including ordinals, such as first, second, etc., may be used to describe various elements, but the elements should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

Detailed description of the preferred embodiments

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, wherein like reference numerals denote like or similar elements, and redundant description thereof will be omitted.

Further, in the description of the technology disclosed in this specification, a detailed description of related arts will be omitted if it is determined that the gist of the technology disclosed in this specification may be obscured. It is to be noted that the attached drawings are only for the purpose of easily understanding the concept of the technology disclosed in the present specification, and should not be construed as limiting the spirit of the technology by the attached drawings.

The virtual reality system includes a virtual reality image generation device that generates a virtual reality image, a server device that encodes and transmits the input virtual reality image, and one or more client devices that decode the transmitted virtual reality image and output it to a user.

Referring to FIG. 1, an exemplary virtual reality system 100 includes a virtual reality image generation device 110, a server device 120, and one or more client devices 130; the number of each component is not limited to the numbers shown. The virtual reality system 100 may also be referred to as a 360-degree image providing system.

The virtual reality image generating apparatus 110 may include one or more camera modules and generate a spatial image by photographing an image of a space in which the virtual reality image generating apparatus 110 is located.

The server device 120 can generate a 360-degree image by stitching, projecting, and mapping the spatial images generated by and input from the virtual reality image generating apparatus 110, and can then encode the 360-degree image into video data of a desired quality.

Also, the server device 120 may transmit the bitstream including the video data and the signaling data for the encoded 360-degree image to the client device 130 through the network (communication network).

The client device 130 may decode the received bitstream and output a 360-degree image to a user wearing the client device 130. The client device 130 may be a near-eye display device such as a head-mounted display (HMD).

Meanwhile, the virtual reality image generating apparatus 110 may be configured as a computer system to generate an image of a virtual 360-degree space implemented by computer graphics. In addition, the virtual reality image generating apparatus 110 may be a provider of virtual reality contents such as a virtual reality game.

The client device 130 may obtain user data from a user using the client device 130. The user data may include user's image data, voice data, viewport data (sight line data), region of interest data, and additional data.

For example, the client device 130 may include at least one of a 2D/3D camera and an immersive camera for acquiring image data of a user. The 2D/3D camera can capture an image with a viewing angle of 180 degrees or less. The immersive camera can capture an image with a viewing angle of 360 degrees or less.

For example, the client device 130 may include a first client device 131 that obtains user data of a first user located at a first location, a second client device 133 that obtains user data of a second user located at a second location, and a third client device 135 that obtains user data of a third user located at a third location.

Each client device 130 may transmit user data obtained from the user to the server device 120 via the network.

Server device 120 may receive at least one user data from client device 130. The server device 120 can generate a full image of the virtual reality space based on the received user data. The entire image generated by the server device 120 may represent an immersive image providing a 360-degree image in the virtual reality space. The server device 120 may generate the entire image by mapping the image data included in the user data to the virtual reality space.

The server device 120 may transmit the generated whole image to each user.

Each client device 130 may receive the entire image and render and / or display only the area that each user views in the virtual reality space.

FIG. 2 is a diagram illustrating an exemplary scalable video coding service.

Scalable video coding service is an image compression method for providing various services in a scalable manner in terms of temporal, spatial, and image quality according to various user environments such as a network situation or a terminal resolution in various multimedia environments. Scalable video coding services generally provide scalability in terms of spatial resolution, quality, and temporal aspects.

Spatial scalability can be provided by encoding the same image with different resolution for each layer. It is possible to adaptively provide image contents to devices having various resolutions such as a digital TV, a notebook, and a smart phone using spatial hierarchy.

Referring to the drawings, a scalable video coding service can support one or more TVs having different characteristics from a video service provider (VSP) through a home gateway in the home. For example, the scalable video coding service can simultaneously support HDTV (High-Definition TV), SDTV (Standard-Definition TV), and LDTV (Low-Definition TV) having different resolutions.
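As a rough illustration of spatial scalability, the sketch below picks, for a given display, the highest-resolution layer that the display can show. The function name `pick_spatial_layer` and the layer heights used in the example are assumptions; the specification names HDTV, SDTV, and LDTV but gives no resolutions.

```python
def pick_spatial_layer(layer_heights, display_height):
    """Choose the largest layer height that does not exceed the display.

    Falls back to the smallest layer when even that one is larger than
    the display, so that something can always be shown.
    """
    if not layer_heights:
        raise ValueError("at least one layer is required")
    fitting = [h for h in layer_heights if h <= display_height]
    return max(fitting) if fitting else min(layer_heights)
```

With hypothetical layers of 240, 480, and 1080 lines, a 720-line display would be served the 480-line layer.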

Temporal scalability can adaptively adjust the frame rate of an image in consideration of the network environment in which the content is transmitted or the performance of the terminal. For example, when a local area network is used, the service can be provided at a high frame rate of 60 frames per second (FPS), and when a wireless broadband network such as a 3G mobile network is used, the content can be provided at a low frame rate of 16 FPS so that the user can receive the video without interruption.
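The frame-rate adaptation can be illustrated with a simple frame-selection sketch. It uses the 60 FPS and 16 FPS figures from the example above; the dictionary keys, the function name, and the evenly spaced selection rule are assumptions. A real scalable codec would instead drop whole temporal layers, which this sketch does not model.

```python
# Frame rates taken from the example in the text; the keys are hypothetical.
FRAME_RATE_BY_NETWORK = {"lan": 60, "3g": 16}


def select_frame_indices(source_fps, target_fps, duration_s):
    """Indices of the source frames kept when lowering the frame rate.

    Frames are picked at evenly spaced positions using integer arithmetic.
    """
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be in (0, source_fps]")
    total_out = int(target_fps * duration_s)
    return [(i * source_fps) // target_fps for i in range(total_out)]
```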

Quality scalability likewise provides content of various image qualities according to the network environment or the performance of the terminal, so that the user can stably play back the image content.

The scalable video coding service may include a base layer and one or more enhancement layers. The receiver provides normal image quality when it receives only the base layer, and can provide high image quality when it receives the base layer and the enhancement layer together. In other words, when there is a base layer and one or more enhancement layers, the more enhancement layers (for example, enhancement layer 1, enhancement layer 2, ..., enhancement layer n) are received in addition to the base layer, the better the image quality becomes.

Thus, since the scalable video coding service is composed of a plurality of layers, the receiver can quickly receive the small base layer data and quickly process and play back the basic-quality image, and can raise the service quality by additionally receiving the enhancement layer data.
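The relationship between received layers and achievable quality can be sketched as follows. The sketch assumes that layer 0 is the base layer and that each enhancement layer depends on every layer below it; that dependency structure and the name `decodable_layers` are illustrative assumptions rather than statements from the specification.

```python
def decodable_layers(received_layer_ids):
    """Count the layers that can actually be used for playback.

    Layer 0 is the base layer. An enhancement layer is only usable when
    all lower layers were received as well (assumed dependency chain).
    """
    received = set(received_layer_ids)
    count = 0
    while count in received:
        count += 1
    return count
```

A result of 1 corresponds to base-layer (normal) quality, and each additional usable layer raises the quality one step.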

FIG. 3 is a diagram showing an exemplary configuration of a server device.

The server device 300 may include a control unit 310 and/or a communication unit 320.

The control unit 310 may generate a full image of the virtual space and encode the entire image. In addition, the control unit 310 can control all the operations of the server device 300. Details will be described below.

The communication unit 320 may transmit data to and/or receive data from an external device and/or a client device. For example, the communication unit 320 may receive user data and/or signaling data from at least one client device. In addition, the communication unit 320 may transmit the entire image of the virtual space and/or the image of a partial region to the client device.

The control unit 310 may include at least one of a signaling data extraction unit 311, an image generation unit 313, a region of interest determination unit 315, a signaling data generation unit 317, and/or an encoder 319.

The signaling data extracting unit 311 can extract signaling data from the data received from the client device. For example, the signaling data may include image configuration information. The image configuration information may include gaze information indicating a gaze direction of a user and zoom area information indicating a viewing angle of a user in a virtual space. In addition, the image configuration information may include the viewport information of the user in the virtual space.
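A minimal sketch of extracting the image configuration information from a client message is shown below. The JSON wire format, the field names, and the `ImageConfiguration` type are all assumptions made for illustration; the specification does not define the message layout at this point.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageConfiguration:
    """Hypothetical container for the image configuration information."""

    gaze_yaw_deg: float       # gaze information: horizontal gaze direction
    gaze_pitch_deg: float     # gaze information: vertical gaze direction
    viewing_angle_deg: float  # zoom area information: the user's viewing angle


def extract_signaling(message: bytes) -> ImageConfiguration:
    """Parse signaling data from a client message (assumed UTF-8 JSON)."""
    payload = json.loads(message.decode("utf-8"))
    return ImageConfiguration(
        gaze_yaw_deg=float(payload["gaze_yaw_deg"]),
        gaze_pitch_deg=float(payload["gaze_pitch_deg"]),
        viewing_angle_deg=float(payload["viewing_angle_deg"]),
    )
```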

The image generating unit 313 may generate a full image of the virtual space and an image of a specific region in the virtual space.

The ROI determining unit 315 may determine a region of interest (ROI) corresponding to the user's viewing direction within the entire area of the virtual space. It may also determine the user's viewport within the entire area of the virtual space. For example, the ROI determining unit 315 may determine the ROI based on the gaze information and/or the zoom area information. The region of interest may be, for example, the location of a tile in which an important object is located in the virtual space viewed by the user (for example, a location where a new enemy appears in a game, or the position of a speaker in the virtual space) and/or the place at which the user is looking. In addition, the ROI determining unit 315 may generate ROI information indicating the ROI corresponding to the user's viewing direction and information about the user's viewport within the entire area of the virtual space.
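One way to picture ROI determination is to map the gaze direction and viewing angle onto tile columns of the 360-degree image. The sketch below considers only the horizontal direction and assumes equally wide tile columns starting at 0 degrees; the function name and this tile layout are assumptions, not the method of the specification.

```python
import math


def roi_tile_columns(gaze_yaw_deg, viewing_angle_deg, tile_columns):
    """Tile columns touched by the horizontal field of view.

    The 360-degree image is assumed to be split into `tile_columns` equal
    columns. The result wraps around the 0/360-degree seam, and a tile that
    is touched only at its edge is still included.
    """
    tile_width = 360.0 / tile_columns
    start = gaze_yaw_deg - viewing_angle_deg / 2.0
    end = gaze_yaw_deg + viewing_angle_deg / 2.0
    first = math.floor(start / tile_width)
    last = math.floor(end / tile_width)
    return sorted({col % tile_columns for col in range(first, last + 1)})
```

For a gaze at 10 degrees with a 90-degree viewing angle over 8 columns, the ROI covers columns 0, 1, and 7, showing the wrap-around at the seam.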

The signaling data generation unit 317 can generate signaling data for processing the entire image. For example, the signaling data may carry the region of interest information and/or the viewport information. The signaling data may be transmitted via at least one of Supplemental Enhancement Information (SEI), video usability information (VUI), a slice header, and a file describing the video data.

The encoder 319 may encode the entire image based on the signaling data. For example, the encoder 319 may encode the entire image in a manner customized for each user based on the viewing direction of each user. For example, when the user looks at a specific point in the virtual space, the encoder can encode the image corresponding to that point at high quality, based on the user's gaze in the virtual space, and can encode the image other than that point at low quality. The encoder 319 may include at least one of the signaling data extraction unit 311, the image generation unit 313, the region of interest determination unit 315, and/or the signaling data generation unit 317.

The control unit 310 may include not only the signaling data extraction unit 311, the image generation unit 313, the region of interest determination unit 315, the signaling data generation unit 317, and the encoder 319, but also a processor (not shown), a memory (not shown), and an input/output interface (not shown).

The processor may include one or more of a central processing unit (CPU), an application processor (AP), or a communication processor (CP). The processor may perform, for example, operations or data processing related to control and/or communication of at least one other component of the control unit 310.

In addition, the processor may be implemented as a system on chip (SoC), for example. According to one embodiment, the processor may further comprise a graphics processing unit (GPU) and / or an image signal processor.

In addition, the processor may control a plurality of hardware or software components connected to the processor, for example, by driving an operating system or an application program, and may perform various data processing and operations.

The processor may also load instructions or data received from at least one of the other components (e.g., non-volatile memory) into volatile memory, process them, and store various data in non-volatile memory.

The memory may include volatile and/or non-volatile memory. The memory may, for example, store instructions or data related to at least one other component of the control unit 310. According to one embodiment, the memory may store software and/or programs.

The input/output interface may serve as an interface through which commands or data input from, for example, a user or another external device can be transmitted to the other component(s) of the control unit 310. The input/output interface may also output commands or data received from the other component(s) of the control unit 310 to a user or another external device.

Hereinafter, an exemplary image transmission method using a region of interest will be described.

The server device can receive video data and signaling data from at least one client device using the communication unit. Further, the server device can extract the signaling data using the signaling data extraction unit. For example, the signaling data may include gaze information and zoom area information.

The gaze information can indicate which area (point) the user sees in the virtual space. When the user looks at a specific area within the virtual space, the line of sight information can indicate the direction from the user to the specific area.

The zoom area information may indicate an enlarged range and / or a reduced range of the video data corresponding to the viewing direction of the user. In addition, the zoom area information can indicate the viewing angle of the user. If the video data is enlarged based on the value of the zoom area information, the user can view only the specific area. If the video data is reduced based on the value of the zoom area information, the user can view not only the specific area but also a part and / or the entire area other than the specific area.

Then, the server device can generate the entire image of the virtual space using the image generating unit.

Then, the server device can use the region of interest determination unit to determine, based on the signaling data, the image configuration information describing the gaze point and zoom area of each user in the virtual space.

Then, the server device can determine the region of interest of the user based on the image configuration information using the region of interest determination unit.

When the signaling data (for example, at least one of the gaze information and the zoom area information) is changed, the server device can receive new signaling data. In this case, the server device can determine a new region of interest based on the new signaling data.

Then, the server device can use the control unit to determine whether the data currently processed based on the signaling data is data corresponding to the region of interest.

When the signaling data is changed, the server device can determine whether or not the data currently processed based on the new signaling data is data corresponding to the region of interest.

In the case of data corresponding to the region of interest, the server device can encode video data (for example, a region of interest) corresponding to the user's viewpoint at a high quality using an encoder. For example, the server device may generate base layer video data and enhancement layer video data for the video data and transmit them.

When the signaling data is changed, the server device can transmit the video data corresponding to the new viewpoint (the new region of interest) as a high-quality image. If the server device has been transmitting a low-quality image and the signaling data changes so that a high-quality image is to be transmitted, the server device can additionally generate and/or transmit enhancement layer video data.

In the case of data not corresponding to the area of interest, the server device can encode video data (e.g., non-interest area) that does not correspond to the user's viewpoint at a low quality. For example, the server device may generate only base layer video data for video data that does not correspond to a user's viewpoint, and may transmit them.

When the signaling data is changed, the server device can transmit the video data that does not correspond to the user's new viewpoint (the new non-interest area) as a low-quality image. If the server device has been transmitting a high-quality image and the signaling data changes so that a low-quality image is to be transmitted, the server device stops generating and/or transmitting the at least one enhancement layer video data and generates and/or transmits only the base layer video data.

That is, the image quality of the video data when only the base layer video data is received is lower than the image quality when the enhancement layer video data is also received. Therefore, at the moment the client device obtains information indicating that the user's gaze has moved, the client device can receive enhancement layer video data for the video data (e.g., a region of interest) corresponding to the user's viewing direction. The client device can then provide high-quality video data to the user within a short time.

FIG. 4 is a diagram showing an exemplary structure of an encoder.

The encoder 400 may include at least one of a base layer encoder 410, at least one enhancement layer encoder 420, and a multiplexer 430.

The encoder 400 may encode the entire image using a scalable video coding method. The scalable video coding method may include Scalable Video Coding (SVC) and / or Scalable High Efficiency Video Coding (SHVC).

The scalable video coding method is an image compression method for providing a variety of services in a scalable manner in terms of temporal, spatial, and image quality according to various user environments such as a network situation or a terminal resolution in various multimedia environments. For example, the encoder 400 may encode images of two or more different qualities (or resolution, frame rate) for the same video data to generate a bitstream.

For example, the encoder 400 may use an inter-layer prediction tool, which is an encoding method that exploits inter-layer redundancy, in order to increase the compression performance of video data. The inter-layer prediction tool is a technique for enhancing the compression efficiency in an enhancement layer (EL) by eliminating the redundancy of images existing between layers.

The enhancement layer can be encoded by referring to information of a reference layer using the inter-layer prediction tool. The reference layer is the lower layer that is referred to when the enhancement layer is encoded. Because the inter-layer prediction tool creates a dependency between layers, decoding the image of the highest layer requires the bitstreams of all lower layers that are referred to. An intermediate layer can be decoded by acquiring only the bitstream of the layer to be decoded and the bitstreams of its lower layers. The bitstream of the lowest layer is the base layer (BL), and can be encoded by an encoder such as H.264/AVC or HEVC.
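The layer dependency described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name layers_needed is hypothetical, layer 0 stands for the base layer, and every enhancement layer is assumed to refer to all layers below it.

```python
def layers_needed(target_layer: int) -> list[int]:
    """Return the layer indices whose bitstreams are required to decode target_layer.

    Layer 0 is the base layer (BL); layers 1..N are enhancement layers (EL).
    Assumption: each layer depends on every layer below it.
    """
    if target_layer < 0:
        raise ValueError("layer index must be non-negative")
    # The target layer itself plus all lower (reference) layers.
    return list(range(target_layer + 1))


print(layers_needed(0))  # base layer only
print(layers_needed(2))  # base layer plus two enhancement layers
```

Decoding an intermediate layer therefore never requires the bitstreams of the layers above it.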

The base layer encoder 410 may encode the entire image to generate base layer video data (or base layer bitstream) for the base layer. For example, the base layer video data may include video data for the entire area viewed by the user in the virtual space. The image of the base layer may be the image of the lowest image quality.

The enhancement layer encoder 420 may encode the entire image based on signaling data (e.g., region of interest information) and the base layer video data to generate at least one enhancement layer video data (or enhancement layer bitstream) for at least one enhancement layer. The enhancement layer video data may include video data for a region of interest within the entire region.

The multiplexer 430 may multiplex the base layer video data, the at least one enhancement layer video data, and / or the signaling data, and may generate one bitstream corresponding to the entire image.

FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.

Referring to FIG. 5, there is shown a method of signaling a region of interest in scalable video.

In scalable video data 500 composed of a base layer (BL) and at least one enhancement layer (EL), a server device (or an encoder) may divide one piece of video data (or picture) of an enhancement layer into a plurality of tiles 510. For example, video data can be partitioned in units of Coding Tree Units (CTUs). For example, one CTU may include a Y CTB, a Cb CTB, and a Cr CTB.

The server device can encode the video data of the base layer BL as a whole without dividing the data into tiles for fast user response.

The server device may divide and encode video data of one or more enhancement layers into a plurality of tiles, some or all, as needed. That is, the server device may divide the video data of the enhancement layer into at least one tile and encode tiles corresponding to the region of interest 520 (ROI, Region of Interest).
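As a rough sketch of this division of work, the base layer is handled as one whole picture while only the enhancement-layer tiles inside the region of interest are selected for encoding. The function name build_frame_plan, the dictionary keys, and the use of integer tile numbers are assumptions made for illustration only.

```python
def build_frame_plan(num_tiles: int, roi_tiles: set[int]) -> dict:
    """Describe what is encoded for one picture.

    The base layer is encoded as a whole (no tiling); the enhancement layer
    is encoded only for the tiles that fall inside the region of interest.
    """
    # Ignore tile numbers that do not exist in this picture.
    el_tiles = sorted(t for t in roi_tiles if 0 <= t < num_tiles)
    return {"base_layer": "whole_picture", "enhancement_tiles": el_tiles}


plan = build_frame_plan(num_tiles=16, roi_tiles={5, 6, 9, 10})
print(plan["enhancement_tiles"])
```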

In this case, the region of interest 520 may include the location of a tile in which an important object to be viewed by the user is located in the virtual reality space (for example, a location where a new enemy appears in a game), and/or the location at which the user's gaze is directed.

The server device may also generate region of interest information including tile information identifying at least one tile included in the region of interest. For example, the region of interest information may be generated by a region of interest determination unit, a signaling data generation unit, and / or an encoder included in the server device.

Since the tiles in the region of interest 520 are contiguous, the tile information can be compressed effectively without listing the numbers of all the tiles. For example, the tile information may include not only the numbers of all the tiles corresponding to the region of interest but also the starting and ending numbers of the tiles, coordinate point information, and/or a CU (Coding Unit) number list.
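One way to exploit this contiguity is to signal only the starting and ending tile numbers of each run. The sketch below is a minimal illustration; the function names and the (start, end) tuple representation are assumptions, not a format defined in this specification.

```python
def compress_tile_numbers(tiles: list[int]) -> list[tuple[int, int]]:
    """Collapse tile numbers into inclusive (start, end) runs."""
    runs: list[tuple[int, int]] = []
    for t in sorted(set(tiles)):
        if runs and t == runs[-1][1] + 1:
            # Extend the current run by one tile.
            runs[-1] = (runs[-1][0], t)
        else:
            # Start a new run.
            runs.append((t, t))
    return runs


def expand_tile_runs(runs: list[tuple[int, int]]) -> list[int]:
    """Recover the full tile number list from (start, end) runs."""
    return [t for start, end in runs for t in range(start, end + 1)]


roi = [5, 6, 7, 13, 14, 15]
runs = compress_tile_numbers(roi)
print(runs)
print(expand_tile_runs(runs) == roi)
```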

In addition, the area of interest 520 may be the current viewport of the user.

The tile information in the non-interest region may be sent to another client device, image processing computing device, and / or server after entropy coding provided by the encoder.

The region of interest information may be delivered via a High-Level Syntax Protocol carrying the session information. In addition, the region of interest information may be transmitted in packet units such as SEI (Supplemental Enhancement Information), VUI (Video Usability Information), and the slice header of a video standard. In addition, the region of interest information may be carried in a separate file describing the video file (e.g., the MPD of DASH).

Hereinafter, a method of signaling a region of interest in single-screen video is shown.

For single-screen video that is not scalable video, an exemplary technique of the present disclosure can lower the image quality by downscaling (downsampling) the area other than the region of interest (ROI). In the prior art, the filter information used for downscaling is not shared between the terminals using the service; either a single technique is agreed upon in advance, or only the encoder knows the filter information.

However, the server device according to the present disclosure may transmit the filter information used at the time of encoding to the client device, so that the client device (or the HMD terminal) that receives the encoded image can slightly improve the image quality of the downscaled area outside the region of interest. This technique can significantly reduce image processing time and can provide image quality enhancement.

As described above, the server device may generate the region of interest information. For example, the region of interest information may further include filter information in addition to the tile information. For example, the filter information may include the number (index) of a pre-agreed filter candidate and the values used in the filter.

FIG. 6 is a diagram showing an exemplary configuration of a client device.

The client device 600 may include at least one of an image input unit 610, an audio input unit 620, a sensor unit 630, an image output unit 640, an audio output unit 650, a communication unit 660, and/or a control unit 670. For example, the client device 600 may be an HMD (Head-Mounted Display). The control unit 670 of the client device 600 may be included in the client device 600 or may be a separate device.

The image input unit 610 can capture video data. The image input unit 610 may include at least one of a 2D/3D camera and/or an immersive camera for acquiring an image of the user. The 2D/3D camera can capture an image having a viewing angle of 180 degrees or less. The immersive camera can capture an image having a viewing angle of 360 degrees or less.

The audio input unit 620 can record the user's voice. For example, the audio input 620 may include a microphone.

The sensor unit 630 can acquire information on the movement of the user's gaze. For example, the sensor unit 630 may include a gyro sensor for sensing a change in the azimuth of an object, an acceleration sensor for measuring the acceleration of a moving object or the intensity of an impact, and an external sensor for sensing the direction of the user's gaze. According to an embodiment, the sensor unit 630 may include the image input unit 610 and the audio input unit 620.

The video output unit 640 can output video data received from the communication unit 660 or stored in a memory (not shown).

The audio output unit 650 can output audio data received from the communication unit 660 or stored in the memory.

The communication unit 660 can communicate with an external client device and / or a server device through a broadcasting network, a wireless communication network, and / or broadband. For example, the communication unit 660 may include a transmitting unit (not shown) for transmitting data and / or a receiving unit (not shown) for receiving data.

The control unit 670 can control all operations of the client device 600. The control unit 670 can process the video data and the signaling data received from the server device. Details of the control unit 670 will be described below.

FIG. 7 is a diagram showing an exemplary configuration of the control unit.

The control unit 700 may process the signaling data and / or the video data. The control unit 700 may include at least one of a signaling data extractor 710, a decoder 720, a line of sight determiner 730, and / or a signaling data generator 740.

The signaling data extracting unit 710 may extract signaling data from data received from the server device and / or another client device. For example, the signaling data may include region of interest information.

The decoder 720 may decode the video data based on the signaling data. For example, the decoder 720 may decode the entire image in a manner customized for each user based on the viewing direction of each user. For example, when the user looks at a specific area in the virtual space, the decoder 720 may decode the image corresponding to the specific area with high image quality based on the user's gaze in the virtual space, and may decode the image corresponding to the area other than the specific area with low image quality. According to an embodiment, the decoder 720 may include at least one of a signaling data extractor 710, a line of sight determiner 730, and/or a signaling data generator 740.

The gaze determining unit 730 can determine the user's gaze in the virtual space and generate the image configuration information. For example, the image configuration information may include gaze information indicating a gaze direction and / or zoom area information indicating a viewing angle of a user.

The signaling data generation unit 740 may generate signaling data for transmission to a server device and/or another client device. For example, the signaling data may carry the image configuration information. The signaling data may be delivered via a High-Level Syntax Protocol carrying the session information. The signaling data may be transmitted via at least one of Supplemental Enhancement Information (SEI), Video Usability Information (VUI), a slice header, and a file describing the video data.

FIG. 8 is a diagram showing an exemplary configuration of a decoder.

The decoder 800 may include at least one of an extractor 810, a base layer decoder 820, and / or at least one enhancement layer decoder 830.

The decoder 800 may decode the bitstream (video data) using an inverse process of the scalable video coding method.

The extractor 810 receives the bitstream (video data) including the video data and the signaling data, and can selectively extract the bitstream according to the image quality of the video to be reproduced. For example, the bitstream (video data) may include a base layer bitstream (base layer video data) for a base layer and at least one enhancement layer bitstream (enhancement layer video data) for at least one enhancement layer predicted from the base layer. The base layer bitstream (base layer video data) may include video data for the entire area of the virtual space. The at least one enhancement layer bitstream (enhancement layer video data) may include video data for a region of interest within the entire region.

The signaling data may also include region of interest information indicating a region of interest corresponding to the direction of the user's gaze within the entire region of the virtual space for the video conferencing service.

The base layer decoder 820 can decode a base layer bitstream (or base layer video data) for a low-quality image.

The enhancement layer decoder 830 can decode at least one enhancement layer bitstream (or enhancement layer video data) for the high-quality image based on the signaling data and/or the base layer bitstream (or base layer video data).

Hereinafter, a method of generating image configuration information for responding to the movement of the user's gaze in real time will be described.

The image configuration information may include at least one of gaze information indicating a gaze direction of a user and / or zoom area information indicating a viewing angle of a user. The user's gaze is the direction that the user looks in the virtual space, not the actual space. In addition, the gaze information may include information indicating the gaze direction of the user in the future (for example, information on gaze points that are expected to receive attention), as well as information indicating the gaze direction of the current user.

The client device can sense the operation of looking at a specific area located in the virtual space around the user and process the operation.

The client device can receive the sensing information from the sensor unit using the control unit and / or the sight line determination unit. The sensing information may be a video shot by a camera, or a voice recorded by a microphone. In addition, the sensing information may be data sensed by a gyro sensor, an acceleration sensor, and an external sensor.

Further, the client device can confirm the movement of the user's gaze based on the sensing information by using the control unit and / or the visual-line determining unit. For example, the client device can check the movement of the user's gaze based on the change of the value of the sensing information.

Further, the client device can generate image configuration information in the virtual reality space using the control unit and / or the visual determination unit. For example, when the client device physically moves or the user's gaze moves, the client device can calculate the gaze information and / or the zoom area information of the user in the virtual reality space based on the sensing information.

Further, the client device can transmit image configuration information to the server device and / or another client device using the communication unit. In addition, the client device may forward the video configuration information to its other components.

In the foregoing, a method of generating image configuration information by a client device has been described. However, the present invention is not limited thereto, and the server device may receive the sensing information from the client device and generate the image configuration information.

In addition, an external computing device connected to the client device may generate image configuration information, and the computing device may communicate image configuration information to its client device, another client device, and / or a server device.

Hereinafter, a method for the client device to signal image configuration information will be described.

Signaling the video configuration information (including viewpoint information and / or zoom area information) is very important. If the signaling of the video configuration information is too frequent, it may place a burden on the client device, the server device, and / or the entire network.

Accordingly, the client device can signal image configuration information only when the image configuration information (or gaze information and / or zoom area information) of the user is changed. That is, the client device can transmit the gaze information of the user to another client device and / or the server device only when the gaze information of the user is changed.
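A change-only signaling policy of this kind can be sketched as follows. The class and field names (ImageConfig, ChangeOnlySignaler, yaw_deg, pitch_deg, zoom) and the injected send callback are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ImageConfig:
    """Illustrative image configuration: gaze direction plus zoom area."""
    yaw_deg: float
    pitch_deg: float
    zoom: float


class ChangeOnlySignaler:
    """Forward image configuration information only when it has changed."""

    def __init__(self, send: Callable[[ImageConfig], None]) -> None:
        self._send = send
        self._last: Optional[ImageConfig] = None

    def update(self, config: ImageConfig) -> bool:
        """Return True if a signaling message was sent for this update."""
        if config == self._last:
            return False  # unchanged: no signaling, no network load
        self._send(config)
        self._last = config
        return True


sent: list[ImageConfig] = []
signaler = ChangeOnlySignaler(sent.append)
signaler.update(ImageConfig(0.0, 0.0, 1.0))   # first value: signaled
signaler.update(ImageConfig(0.0, 0.0, 1.0))   # unchanged: suppressed
signaler.update(ImageConfig(15.0, 0.0, 1.0))  # gaze moved: signaled
print(len(sent))
```

In practice a small tolerance on the gaze angles would likely be needed so that sensor noise does not count as a change; that refinement is omitted here.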

In the above description, the client device generates and/or transmits the image configuration information. However, the server device may instead receive the sensing information from the client device, generate the image configuration information based on the sensing information, and transmit the image configuration information to at least one client device.

The above-mentioned signaling may be signaling between a server device, a client device, and / or an external computing device (if present). In addition, the above-mentioned signaling may be signaling between the client device and / or an external computing device (if present).

In the following, an exemplary method of transmitting high / low level images is described.

Methods of transmitting a high/low-level image based on the user's gaze information may include a method of switching the layers of a scalable codec, a rate control method using the QP (quantization parameter) in the case of a single bitstream with real-time encoding, a method of switching in units of chunks in the case of a single bitstream such as DASH, a downscaling/upscaling method, and/or a high-quality rendering method that uses more resources at rendering time.

Although the above-described exemplary techniques describe a differential transmission scheme using scalable video, the same benefits, namely lowering the overall bandwidth and responding quickly to the user's gaze movements, can be obtained with a general single-layer video coding technique by adjusting the quantization parameter and the degree of downscaling/upscaling. In addition, when files that have been transcoded in advance into bitstreams of several bitrates are used, the exemplary technique of the present disclosure can switch between a high-level image and a low-level image on a chunk basis.

In addition, although the present specification assumes a virtual reality system, it can be applied equally to a VR (Virtual Reality) game using an HMD, an Augmented Reality (AR) game, and the like. That is, the techniques of providing a high-level image for the region corresponding to the user's line of sight, and of signaling only when the user looks at an area or an object that was not expected to be viewed, can all be applied in the same way as in the examples above.

Hereinafter, a method for controlling the quality of a virtual reality image to be displayed when the gaze movement speed changes in a wearable image display device such as an HMD will be described with reference to FIGS.

The human eye has the characteristic that, while the gaze is moving at or above a certain speed, it cannot perceive a difference in quality even when images of different quality are displayed on the screen. Using this characteristic, the threshold value is set to the gaze movement speed at which the user does not perceive the difference even if the image quality is changed while the line of sight is moving. Since this speed depends on the characteristics of the wearable image display device, the threshold value may differ for each wearable image display device.

First, the wearable video display device determines a threshold value of the visual movement speed for switching the image quality according to the characteristics of the apparatus (901).

In addition, the wearable video display device measures the user's gaze movement speed (903).

Thereafter, the wearable video display device compares the measured eye movement speed with a threshold value (905), and requests adjustment of the quality of video data to be transmitted to the wearable display device according to the comparison result.

First, the wearable video display device requests the server device to transmit high quality video data if the gaze speed is smaller than the threshold value (e.g., time t ₀ ) (907).

Meanwhile, in the case where the gaze movement speed is equal to or greater than the threshold value (1000, 1010), the wearable image display apparatus measures a change tendency of the gaze movement speed (909).

If the measurement of the change tendency shows that the gaze movement speed is increasing (1000, for example, at time t₁), the wearable video display device requests the server device to transmit low-quality tile (BL) data (911).

On the other hand, if the gaze movement speed is not increasing (1010), the wearable video display device requests the server device to transmit high-quality tile (BL + EL) data (907).
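The decision flow of steps 905 to 911 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name choose_request, the string labels, and the use of the previous speed sample to judge the change tendency are assumptions.

```python
def choose_request(speed: float, previous_speed: float, threshold: float) -> str:
    """Return the tile data to request from the server device.

    "BL+EL" stands for high-quality tile data, "BL" for low-quality tile data.
    """
    if speed < threshold:
        return "BL+EL"  # step 907: gaze is slow enough to notice quality
    if speed > previous_speed:
        return "BL"     # step 911: at or above threshold and still increasing
    return "BL+EL"      # step 907: at or above threshold but no longer increasing


print(choose_request(speed=10.0, previous_speed=5.0, threshold=30.0))
print(choose_request(speed=50.0, previous_speed=40.0, threshold=30.0))
print(choose_request(speed=50.0, previous_speed=60.0, threshold=30.0))
```

Requesting high quality as soon as the speed stops increasing gives the server device time to deliver the enhancement layer before the gaze comes to rest.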

Here, the user's gaze movement speed can be obtained by tracking the user's head movement and/or pupil movement through a sensor provided on the wearable display device or through an external sensor. Also, high-quality video data is transmitted only for the area in the virtual reality space that corresponds to the user's viewport.
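As one possible way to derive a gaze movement speed from head tracking, the sketch below computes an angular speed from two consecutive yaw samples. It is an assumption-laden illustration: only yaw is considered, angles are in degrees, and the function name yaw_speed_deg_per_s is hypothetical.

```python
def yaw_speed_deg_per_s(yaw_prev_deg: float, yaw_now_deg: float, dt_s: float) -> float:
    """Angular speed of the head yaw between two samples, in degrees per second."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    # Wrap the difference into [-180, 180) so that crossing +/-180 degrees
    # is treated as a small rotation rather than a nearly full turn.
    delta = (yaw_now_deg - yaw_prev_deg + 180.0) % 360.0 - 180.0
    return abs(delta) / dt_s


print(yaw_speed_deg_per_s(0.0, 90.0, 1.0))
print(yaw_speed_deg_per_s(170.0, -170.0, 0.5))
```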

According to another embodiment, when the gaze movement speed changes, the quality of the virtual reality image can be controlled by requesting the server device to transmit high-quality image data when the gaze movement speed is smaller than the threshold value, and by requesting the server device to transmit low-quality image data when the gaze movement speed is equal to or greater than the threshold value.

Here, the high-quality video data may be, for example, UHD (Ultra High Definition) video data, and the low-quality video data may be relatively lower-quality video data such as HD or SD.

Also, the quality of the first quality image data and the quality of the second quality image data may be different from each other due to differences in image quality factors including image quality, image frame number, image resolution, and image scanning method.

The video data may be scalable video data. In that case, the high-quality video data may include the base layer video data and the enhancement layer video data of the currently transmitted video data, and the low-quality video data may include only the base layer video data, excluding the enhancement layer video data.

In addition, the high-quality video data transmission request step is referred to as an enhancement layer (EL) request mode, and the low-quality video data transmission request step is referred to as an enhancement layer (EL) skip mode.

In addition, if, while low-quality video data is being transmitted to the wearable video display device, the user's gaze movement speed becomes smaller than the threshold value, or the gaze movement speed is unchanged or decreasing, the wearable video display device can request the server device to transmit high-quality video data again.

At this time, the wearable video display device requests the server device to transmit the enhancement layer video data of the video data.

In a virtual reality image transmission system using the scalable video technique and the tiling technique, only the tiles corresponding to the viewport currently viewed by the user provide high-quality image information. When the user's viewport moves, the tiles that must newly provide high-quality image information have to receive the image information of the enhancement layer and proceed with decoding. However, due to the constraints of the motion prediction structure, decoding may not be able to proceed immediately.

This phenomenon persists until an intra picture (I-picture) is transmitted, which is coded using only intra-picture prediction and therefore has no dependency on neighboring pictures for decoding. As a result, a delay occurs in providing the high-quality image information. During this delay time, the user can receive only the low-quality image information and may experience discomfort such as motion sickness.

In order to solve this problem, as shown in FIG. 10, an error concealment technique using an upsampled base layer of a reference picture can be used upon re-entry into the enhancement layer transmission request mode.

The error concealment technique using the upsampled base layer of the reference picture substitutes the base layer image information of the reference picture for the enhancement layer image information of the reference picture: the base layer image information is upsampled and then used for motion compensation.
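The substitution can be sketched as below. This is not the codec's actual inter-layer resampling filter: nearest-neighbor upsampling is swapped in for simplicity, pictures are modeled as plain lists of pixel rows, and both function names are hypothetical.

```python
from typing import Optional

Picture = list[list[int]]


def upsample_nearest(base: Picture, factor: int) -> Picture:
    """Upsample a base layer picture by an integer factor (nearest neighbor)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out: Picture = []
    for row in base:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def reference_for_motion_compensation(
    el_reference: Optional[Picture], bl_reference: Picture, factor: int
) -> Picture:
    """Use the enhancement layer reference if available, else the upsampled base layer."""
    if el_reference is not None:
        return el_reference
    return upsample_nearest(bl_reference, factor)


base_layer = [[1, 2], [3, 4]]
print(reference_for_motion_compensation(None, base_layer, 2))
```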

During the delay time that occurs when the transmission quality is switched from low-quality video data to high-quality video data, the wearable video display device reproduces the upsampled base layer image data, and when the enhancement layer video data is received, it outputs the high-quality video data. This alleviates the discomfort and fatigue that a rapid change in video quality causes the user.

As described above, there is a delay in providing a high-quality image until the intra picture is transmitted. During that delay time, however, this technique uses only the base layer image information of the reference picture and the enhancement layer image information of the current tile, and thereby provides the user with image information of higher quality than the low-quality image that would otherwise be provided.

This can improve the image quality of the average service, and can alleviate the discomfort / fatigue of the user due to rapid changes in the image quality.

Hereinafter, with reference to FIG. 12, a description will be given of a method of transmitting a virtual reality image while controlling the quality of a virtual reality image in a wearable image display device such as an HMD in a video server.

When the video server receives a transmission request message for high-quality video data from the wearable video display device (1201), the video server transmits high-quality video data for the virtual reality space to the wearable video display device in response to the request (1203).

In addition, when the video server receives a transmission request message for low-quality video data from the wearable video display device (1205), the video server transmits low-quality video data for the virtual reality space to the wearable video display device in response to the request (1207).

Here, when the video server receives a transmission request message for high-quality video data while transmitting low-quality video data to the wearable video display device (1209), the video server transmits upsampled image data of the low-quality video data to the wearable display device during the delay time, and then transmits the high-quality video data to the wearable display device (1211).

Here, the high-quality image data includes the base layer image data and the enhancement layer image data of the currently transmitted video data, and the low-quality image data may include only the base layer image data.

Also, the upsampled image data may be upsampled image data of the base layer image data.
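The server-side behavior of steps 1201 to 1211 can be sketched as a small per-frame state machine. The sketch makes several assumptions for illustration: requests arrive once per frame as the strings "high" or "low", the delay is modeled as a fixed number of frames (delay_frames) rather than the wait for the next intra picture, and the function name and labels are hypothetical.

```python
def server_frames(requests: list[str], delay_frames: int) -> list[str]:
    """Return what the video server transmits for each per-frame request."""
    sent: list[str] = []
    mode = None   # quality requested for the previous frame
    pending = 0   # frames left in the low-to-high switching delay
    for req in requests:
        if req == "high" and mode == "low":
            pending = delay_frames  # 1209: high quality requested during low quality
        mode = req
        if req == "low":
            pending = 0
            sent.append("BL")            # 1207: base layer only
        elif pending > 0:
            sent.append("BL_upsampled")  # 1211: upsampled base layer during the delay
            pending -= 1
        else:
            sent.append("BL+EL")         # 1203: base plus enhancement layer
    return sent


print(server_frames(["high", "low", "low", "high", "high", "high", "high"], 2))
```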

Referring to FIG. 13, a description will be made of a method for lowering the bandwidth through image quality control in a virtual reality system.

The wearable video display device 1330 determines a threshold value of the line-of-sight movement speed for switching image quality according to device characteristics (1331).

The wearable video display 1330 measures the eye movement speed of the user (1333).

The wearable display 1330 requests (1315) the video server to transmit video data whose quality has been adjusted according to the result of comparing the gaze movement speed with the threshold value, and transmits gaze information (1337).

The video server 1310 obtains the user's viewport from the received gaze information, adjusts the quality of the video corresponding to the viewport in response to the transmission request for quality-adjusted video data, and transmits the quality-adjusted video data (1339).

Next, the wearable video display device 1330 decodes and outputs the received video data (1341).

Here, if the comparison (1335a) shows that the gaze movement speed is less than the threshold value, the wearable display 1330 may request the video server to transmit the high-quality image data (1335b).

Meanwhile, if the comparison (1335a) shows that the gaze movement speed is equal to or greater than the threshold value, the wearable display device 1330 examines the change trend of the gaze movement speed (1335c).

If the gaze movement speed is increasing, the wearable display device 1330 requests the video server 1310 to transmit low-quality video data (1335d); if the gaze movement speed is not increasing, it requests the video server 1310 to transmit high-quality video data (1335b).

When the video server 1310 receives a transmission request message for high-quality video data while transmitting low-quality video data to the wearable image display device 1330, the video server 1310 transmits upsampled image data of the low-quality image data to the wearable image display device 1330 during the delay time required to transmit the high-quality image data to the wearable image display device 1330.

Therefore, with the image transmission method of the exemplary virtual reality system disclosed in the present specification, the quality of the virtual reality image corresponding to the viewport is adjusted according to the speed of the user's gaze movement, which saves the bandwidth required for transmitting the image data.

Hereinafter, with reference to FIG. 14, an exemplary wearable video display device capable of saving bandwidth by controlling the quality of a transmission image according to the visual movement speed in the virtual reality system will be described.

The wearable display 1400 may include a critical speed determiner 1410, a visual-movement speed measuring unit 1430, a controller 1450, and a communication unit 1470.

The critical speed determiner 1410 can determine the threshold value of the gaze movement speed for switching the image quality in consideration of the characteristics of the wearable image display device 1400.

The eye movement speed measuring unit 1430 may measure the gaze movement speed of the user of the wearable display 1400. Here, the user's gaze movement speed can be obtained by tracking the user's head movement and/or pupil movement through a sensor provided in the wearable display device or through an external sensor.

The control unit 1450 may generate a message requesting adjustment of the quality of the video data to be transmitted for the viewport according to the result of comparing the gaze movement speed with the threshold value. In addition, when the gaze movement speed is smaller than the threshold value, or when the gaze movement speed is equal to or greater than the threshold value but is not increasing, the control unit 1450 can adjust the quality of the video data to be transmitted by requesting transmission of the high-quality video data from among video data of different qualities.

The communication unit 1470 may transmit the quality control request message to the external video server and receive the video data from the video server.

Here, the threshold value of the gaze movement speed for switching the image quality is the gaze movement speed at which the user cannot perceive a difference in quality between images of different qualities while moving the gaze, and its magnitude may vary depending on the characteristics of the wearable image display device.

Hereinafter, a signal system for controlling image quality will be described with reference to FIG. 15 to FIG.

As described above, by transmitting the image quality adjustment signal based on the user's gaze speed, it is possible to provide the maximum quality image service in the minimum transmission bandwidth. However, transmitting the moving speed information to the server side whenever the user moves his or her gaze places a burden on the entire network, the user terminal, or the image transmission system.

Therefore, the exemplary signaling scheme transmits the quality control information only at two points in time: when the user's gaze movement becomes fast and a section in which high-quality image information is omitted begins (referred to as an enhancement layer skip section), and when a section in which high-quality image information is requested again begins (referred to as an enhancement layer request section). This reduces the burden on the entire network, the user terminal, or the video transmission system.
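
The effect of signaling only at section boundaries can be illustrated with the minimal sketch below. It takes a per-frame sequence of modes and returns only the positions at which a message would be sent. The constant names, the function name `signaling_events`, and the assumption that a session starts in the enhancement layer request section are hypothetical; the values 0 and 1 follow the EL_skip_flag semantics given in this specification.

```python
from typing import Iterable, List, Tuple

EL_REQUEST = 0  # enhancement layer request section
EL_SKIP = 1     # enhancement layer skip section


def signaling_events(modes: Iterable[int],
                     initial: int = EL_REQUEST) -> List[Tuple[int, int]]:
    """Return (frame index, new mode) pairs only where the mode changes."""
    events: List[Tuple[int, int]] = []
    current = initial
    for index, mode in enumerate(modes):
        if mode != current:
            # A section boundary: this is the only time a message is sent.
            events.append((index, mode))
            current = mode
    return events


per_frame_modes = [0, 0, 1, 1, 1, 0, 0]
print(signaling_events(per_frame_modes))  # prints [(2, 1), (5, 0)]
```

In this example seven frames produce two messages rather than seven, which is the saving the scheme aims for.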

Referring to FIG. 15, there is shown an exemplary Omnidirectional Media Application Format (OMAF) syntax in an international video standard such as H.264 AVC or H.265 HEVC.

The syntax indicated by reference numeral 1500 in the drawing is newly added in the embodiment of the present specification, and all other syntax elements are existing standard syntax.

unsigned (n) denotes an unsigned number of 'n' bits, as in ordinary programming languages.

The center_yaw syntax specifies the viewport orientation relative to the global coordinate axes and represents the yaw coordinate of the center of the viewport. The value must be in the range of -180 * 2^16 to 180 * 2^16 - 1.

The center_pitch syntax specifies the viewport orientation relative to the global coordinate axes and represents the pitch coordinate of the center of the viewport. The value must be in the range of -90 * 2^16 to 90 * 2^16 - 1.

The center_roll syntax specifies the viewport orientation relative to the global coordinate axes and represents the roll coordinate of the viewport. The value must be in the range of -180 * 2^16 to 180 * 2^16 - 1.

The hor_range syntax indicates the horizontal range on the sphere. The range is specified through the center point of the sphere and must be in the range of 0 to 720 * 2^16.

The ver_range syntax indicates the vertical range on the sphere. The range is specified through the center point of the sphere and must be in the range of 0 to 180 * 2^16.

The interpolate syntax indicates whether linear interpolation is applied. A value of 1 indicates that linear interpolation is applied.

The EL_skip_flag syntax indicates an EL request mode when its value is 0 and an EL skip mode when its value is 1.
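
The value ranges listed above can be checked mechanically, as in the following sketch. It assumes that each angular field is a fixed-point value in units of 2^-16 degrees, which is consistent with the stated ranges but is not spelled out in this specification. The names `UNIT`, `RANGES`, `degrees_to_units`, and `validate_viewport` are hypothetical.

```python
from typing import Dict

UNIT = 1 << 16  # assumption: field values count 1/65536 of a degree

# Inclusive ranges as stated in the syntax descriptions.
RANGES = {
    "center_yaw": (-180 * UNIT, 180 * UNIT - 1),
    "center_pitch": (-90 * UNIT, 90 * UNIT - 1),
    "center_roll": (-180 * UNIT, 180 * UNIT - 1),
    "hor_range": (0, 720 * UNIT),
    "ver_range": (0, 180 * UNIT),
}
FLAGS = ("interpolate", "EL_skip_flag")  # one-bit fields


def degrees_to_units(degrees: float) -> int:
    """Convert degrees to the assumed fixed-point representation."""
    return round(degrees * UNIT)


def validate_viewport(fields: Dict[str, int]) -> None:
    """Raise ValueError if any field lies outside its stated range."""
    for name, (low, high) in RANGES.items():
        if not low <= fields[name] <= high:
            raise ValueError(f"{name}={fields[name]} outside [{low}, {high}]")
    for name in FLAGS:
        if fields[name] not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")


viewport = {
    "center_yaw": degrees_to_units(45.0),
    "center_pitch": degrees_to_units(-10.0),
    "center_roll": 0,
    "hor_range": degrees_to_units(90.0),
    "ver_range": degrees_to_units(90.0),
    "interpolate": 0,
    "EL_skip_flag": 1,  # EL skip mode
}
validate_viewport(viewport)  # passes silently
```

Under this assumption, 90 degrees corresponds to the integer 5898240.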

The above defined syntax and semantics information can be expressed in XML format in HTTP based video communication such as MPEG DASH.

Figure 16 shows an exemplary tile information syntax expressed in XML form.

Referring to FIG. 16, the tile information syntax expressed in XML form may include an information mode, the total number of tiles, and, for each tile, whether EL (enhancement layer) video data is transmitted.
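
Because FIG. 16 is not reproduced here, the following sketch only illustrates how the three kinds of information named above could be serialized as XML. The element and attribute names (`TileInfo`, `Tile`, `mode`, `totalTiles`, `elTransmit`) are hypothetical and are not taken from the figure or from any standard.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_tile_info(el_skip_flag: int, el_sent: List[bool]) -> str:
    """Serialize the mode, tile count, and per-tile EL transmission flags."""
    root = ET.Element(
        "TileInfo",
        {"mode": str(el_skip_flag), "totalTiles": str(len(el_sent))},
    )
    for index, sent in enumerate(el_sent):
        # One element per tile; "1" means EL data is transmitted for it.
        ET.SubElement(
            root, "Tile", {"id": str(index), "elTransmit": "1" if sent else "0"}
        )
    return ET.tostring(root, encoding="unicode")


# Four tiles, enhancement layer transmitted only for the two viewport tiles.
print(build_tile_info(0, [False, True, True, False]))
```

A receiver could parse the string with `ET.fromstring` and read the same attributes back.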

The virtual reality system according to the embodiments disclosed herein can be implemented as computer readable code on a computer readable recording medium. A computer-readable recording medium includes all kinds of recording apparatuses in which data that can be read by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer readable codes can be stored and executed in a distributed manner. In addition, functional programs, codes, and code segments for implementing the present invention can be easily deduced by programmers skilled in the art to which the present description belongs.

In the foregoing, preferred embodiments of the present invention have been described with reference to the accompanying drawings. Here, the terms and words used in the present specification and claims should not be construed as limited to ordinary or dictionary meanings, but should be construed in a meaning and a concept consistent with the technical idea of the present invention.

The scope of the present invention is not limited to the embodiments disclosed herein, and the present invention can be modified, changed, or improved in various forms within the scope of the present invention and the claims.

Claims

1. A method of controlling image quality of a wearable video display device, the method comprising:

determining a threshold value of a gaze movement speed for switching image quality according to a characteristic of the wearable video display device;

measuring a gaze movement speed of a user of the wearable video display device; and

requesting a quality adjustment of a video image to be transmitted according to a result of comparing the gaze movement speed with the threshold value.
2. The method of claim 1,

wherein the requesting of the quality adjustment of the video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value comprises:

requesting transmission of image data of a first quality if the gaze movement speed is smaller than the threshold value; and

if the gaze movement speed is equal to or greater than the threshold value,

requesting transmission of image data of a second quality lower than the first quality if the gaze movement speed is increasing, and

requesting transmission of the image data of the first quality if the gaze movement speed is not increasing.
3. The method of claim 2,

Wherein the image data of the first quality includes at least one of high-quality image data, base-layer image data, and enhancement-layer image data for a video image currently being transmitted,

Wherein the image data of the second quality includes at least one of low-quality image data for the currently transmitted video image and the base layer image data.
4. The method of claim 3,

wherein the image data of the first quality and the image data of the second quality differ in quality due to differences in image quality factors including the image quality, the number of frames of the image, and the resolution of the image.
5. The method of claim 3,

wherein the requesting of the quality adjustment of the video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value comprises:

requesting transmission of the image data of the first quality if, while the image data of the second quality is being transmitted, the gaze movement speed is less than the threshold value or the gaze movement speed is not increasing.
6. The method of claim 5,

wherein the requesting of transmission of the image data of the first quality when the gaze movement speed is lower than the threshold value or the gaze movement speed is not increasing while the image data of the second quality is being transmitted comprises:

reproducing upsampled base layer image data of the video image during a delay time that occurs when the transmission quality is switched, and outputting the video image of the first quality when the image data of the first quality is received.
7. The method of claim 1,

wherein the requesting of the quality adjustment of the video image to be transmitted according to the result of comparing the gaze movement speed with the threshold value comprises:

requesting transmission of image data of a first quality if the gaze movement speed is smaller than the threshold value; and

requesting transmission of image data of a second quality lower than the first quality if the gaze movement speed is equal to or greater than the threshold value.
8. The method of claim 7,

Wherein the image data of the first quality includes at least one of high-quality image data, base-layer image data, and enhancement-layer image data for a video image currently being transmitted,

Wherein the image data of the second quality includes at least one of low-quality image data for the currently transmitted video image and the base layer image data.
9. The method of claim 1,

wherein the threshold value of the gaze movement speed for switching the image quality is a gaze movement speed at which the user cannot perceive a difference in quality between images of different qualities when the user moves the gaze, and varies depending on the characteristics of the wearable video display device.
10. A method comprising the steps of:

receiving a transmission request message for image data of a first quality from a wearable video display device;

transmitting, to the wearable video display device, the image data of the first quality for a video image in response to the transmission request for the image data of the first quality;

receiving a transmission request message for image data of a second quality lower than the first quality from the wearable video display device; and

transmitting, to the wearable video display device, the image data of the second quality for the video image in response to the transmission request for the image data of the second quality,

wherein, when the transmission request message for the image data of the first quality is received while the image data of the second quality is being transmitted to the wearable video display device,

upsampled image data of the image data of the second quality is transmitted to the wearable video display device during a delay time that occurs when the transmission quality is switched, and then the image data of the first quality is transmitted to the wearable video display device.
11. The method of claim 10,

Wherein the image data of the first quality includes base layer image data and enhancement layer image data of a currently transmitted video image,

wherein the image data of the second quality includes the base layer image data, and

wherein the upsampled image data is upsampled image data of the base layer image data.
12. A method of controlling image quality, the method comprising:

determining, by a wearable video display device, a threshold value of a gaze movement speed for switching image quality according to a device characteristic;

measuring, by the wearable video display device, a gaze movement speed of a user;

requesting, by the wearable video display device, a video server to transmit a quality-adjusted video image according to a result of comparing the gaze movement speed with the threshold value; and

transmitting, by the video server, the quality-adjusted video image to the wearable video display device in response to the request for transmission of the video image.
13. The method of claim 12, wherein the wearable video display device

requests the video server to transmit image data of a first quality if the gaze movement speed is smaller than the threshold value,

requests the video server to transmit image data of a second quality lower than the first quality if the gaze movement speed is equal to or greater than the threshold value and the gaze movement speed is increasing, and requests the video server to transmit the image data of the first quality if the gaze movement speed is equal to or greater than the threshold value and the gaze movement speed is not increasing.
14. The method of claim 13, wherein the video server

transmits, to the wearable video display device, the image data of the first quality for the video image in response to the transmission request for the image data of the first quality, and

transmits, to the wearable video display device, the image data of the second quality for the video image in response to the transmission request for the image data of the second quality.
15. The method of claim 14, wherein the video server,

when receiving the transmission request message for the image data of the first quality while the image data of the second quality is being transmitted to the wearable video display device,

transmits upsampled image data of the image data of the second quality to the wearable video display device during a delay time that occurs when the transmission quality is switched, and then transmits the image data of the first quality to the wearable video display device.
16. The method of claim 12,

wherein the threshold value of the gaze movement speed for switching the image quality is a gaze movement speed at which the user cannot perceive a difference in quality between images of different qualities when the user moves the gaze, and varies depending on the characteristics of the wearable video display device.
17. A wearable video display device comprising:

a critical speed determining unit for determining a threshold value of a gaze movement speed for switching image quality according to a characteristic of the wearable video display device;

a gaze movement speed measuring unit for measuring a gaze movement speed of a user of the wearable video display device;

a control unit for generating a message requesting adjustment of the quality of a video image to be transmitted according to a result of comparing the gaze movement speed with the threshold value; and

a communication unit for transmitting the quality adjustment request message to the outside and receiving the video image.
18. The apparatus of claim 17, wherein the control unit

requests transmission of high-quality video data when the gaze movement speed is less than the threshold value, or when the gaze movement speed is equal to or greater than the threshold value and the gaze movement speed is not increasing.
19. The apparatus of claim 17,

wherein the threshold value of the gaze movement speed for switching the image quality is a gaze movement speed at which the user cannot perceive a difference in quality between images of different qualities when the user moves the gaze, and varies depending on the characteristics of the wearable video display device.
20. A method of controlling image quality of a wearable video display device, the method comprising:

determining a threshold value of a gaze movement speed for switching image quality according to a characteristic of the wearable video display device;

measuring a gaze movement speed of a user of the wearable video display device; and

requesting quality adjustment for a scalable video image to be transmitted according to a result of comparing the gaze movement speed with the threshold value,

wherein transmission of the enhancement layer image data of the scalable video image is requested when the gaze movement speed is less than the threshold value, or when the gaze movement speed is equal to or greater than the threshold value and the gaze movement speed is not increasing.
21. The method of claim 20, further comprising:

if transmission of the enhancement layer image data is requested while base layer image data is being transmitted,

receiving upsampled base layer image data of the scalable video image during a delay time that occurs when the transmission quality is switched, and receiving the enhancement layer image data after the delay time.