WO2017007208A1

WO2017007208A1 - Device and method for extracting image from high-resolution image

Info

Publication number: WO2017007208A1
Application number: PCT/KR2016/007209
Authority: WO
Inventors: 임정연; 서동범
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2015-07-03
Filing date: 2016-07-04
Publication date: 2017-01-12

Abstract

A device and a method for extracting an image from multiple images are disclosed. One aspect of the present embodiment aims to provide an image selection/extraction device, service, and method for the case in which a service provider transmits, without a separate conversion step, an image larger than the screen of the user's device. The device enables any of the following: the image of a predefined area is extracted in real time in the user device and displayed to fit the device's display screen; the image of the predefined area is extracted in real time in a network gateway or a streaming server and transmitted so as to suit the performance of the terminal that requested the image; or a location that the user navigates to and selects through a user terminal is received and the image of the corresponding area is extracted, thereby providing user viewpoints from various angles, such as enlargement of a specific location.

Description

Apparatus and method for image extraction from high resolution images

The present embodiment relates to a method and an apparatus capable of extracting a partial image from a high-resolution image.

The contents described in this section merely provide background information on the present embodiment and do not constitute prior art.

Recently, the amount of ultra-high-definition (UHD) content has been increasing. In addition, as the resolution of video content increases, various attempts have been made to enlarge or view a desired part of an image, such as a UHD 360-degree panorama, using various wearable devices.

As UHD services expand, the resolution of display devices such as TVs that provide ultra-high-definition service is increasing. Mobile terminals, however, have so far been introduced only up to QHD (Quad High Definition) resolution. Although a mobile terminal may support UHD content, content with a resolution higher than Full HD shows no visible difference in image quality on a mobile terminal. In addition, because a mobile terminal has a limited screen size, it faces constraints such as a limited terminal buffer size and deterioration of image quality due to downsampling.

Content with a fixed resolution is generally provided to a mobile terminal after its bit rate or resolution has been reduced by a real-time transcoder. However, the higher the resolution of the original image, the less can be achieved by reducing the bit rate alone, and the image must be edited to adjust its resolution before transmission.

As the resolution of content increases, existing legacy terminals (TVs, smartphones, pads, monitors, etc.) that cannot reproduce the content must likewise be provided with content edited in advance using a transcoder.

In these cases, a transcoding procedure is required that decodes, edits, and re-encodes the received image. Transcoding can cause problems such as quality deterioration and delay, which limits the provision of content and increases the cost of reprocessing the image.

In addition, when the headend performs the image reprocessing, the user's ability to select a specific portion for playback becomes more limited as the resolution increases. It is therefore difficult to exploit a key feature of ultra-high-definition content, namely that the user can select and view a desired part of the image.

An object of the present embodiment is to provide an image selection/extraction apparatus, service, and method for the case in which a service provider delivers, without a separate conversion process, an image larger than the screen of the user's device. The image of a predefined area may be extracted in real time in the user device and displayed to fit the device's display screen; the image of the predefined area may be extracted in real time in a network gateway or a streaming server according to the performance of the terminal requesting the video; or the location that the user navigates to and selects through the user terminal may be received and the image of that area extracted. The user is thereby provided with viewpoints from various angles, such as enlargement of a specific location.

According to an aspect of the present embodiment, there is provided an image extraction apparatus comprising: a first receiver for receiving an entire bit stream whose header information includes information indicating that the stream is composed of one or more tiles; a second receiver for receiving image selection information according to a user's request, the network environment, or the performance of the terminal; an image information analyzer for analyzing the image code and the header information from the entire bit stream received by the first receiver; and an extraction bit stream generator for generating new header information by modifying the header information of the entire bit stream according to the image selection information received by the second receiver, and for generating an extraction bit stream including the new header information and the image code corresponding to the image selection information.

According to another aspect of the present embodiment, there is provided an image extraction method comprising: receiving image selection information and an entire bit stream whose header information includes information indicating that the stream is composed of one or more tiles; analyzing the image code and the header information from the entire bit stream; modifying the header information of the entire bit stream according to the image selection information to generate new header information; and generating an extraction bit stream including the new header information and the image code corresponding to the image selection information.

According to another aspect of the present embodiment, there is provided a bit stream generating apparatus comprising: a receiver for receiving content including an image; and an encoder for analyzing the image included in the content and, according to the analysis of the image, the size of the image, or the terminal that is to receive the image, setting the size of the tiles or the number of tiles in a picture so that the picture is composed of one or more tiles, and encoding the image.

In addition, according to another aspect of the present embodiment, there is provided an image extraction terminal apparatus comprising: a communication unit for receiving at least one of an entire bit stream, whose header information includes information indicating that the stream is composed of one or more tiles, and information about the entire bit stream; a display unit for displaying the information about the entire bit stream; a user input unit for generating image selection information, which is information about an object or region that the user wishes to select from the information about the entire bit stream displayed by the display unit; an image extracting unit for analyzing the image code and header information of the entire bit stream, modifying the header information of the entire bit stream according to the image selection information to generate new header information, and generating an extraction bit stream including the new header information and the image code of the tiles corresponding to the image selection information; and a decoder for decoding the extraction bit stream.

As described above, according to an aspect of the present embodiment, when a user navigates to and selects the portion of UHD or higher-definition content that the user wants to watch, the extracted portion can be enlarged and displayed.

In addition, according to an aspect of the present embodiment, when a specific portion desired by the user is extracted from UHD content, or when video traffic is controlled according to the network environment or the performance of the terminal, a part of the image is extracted by modifying the header without additional transcoding. This avoids the delay and image quality degradation caused by transcoding, and allows a part of the image to be selected and extracted within the home so that the user's preference is reflected immediately.

FIG. 1(a) is a schematic diagram illustrating a state in which an image extracting apparatus according to an embodiment of the present invention is connected to user terminals through a network.

FIG. 1(b) is a schematic diagram illustrating a state in which an extraction stream is delivered to user terminals through a network according to another embodiment of the present invention.

FIG. 1(c) is a schematic diagram illustrating a state in which an extraction stream is delivered to user terminals through a network according to another embodiment of the present invention.

FIG. 2 is a schematic diagram of a display image divided into a tile structure of three rows and three columns.

FIG. 3 is a block diagram illustrating a configuration of an image extraction apparatus or an image extraction unit according to an embodiment of the present invention.

FIG. 4 is a schematic diagram illustrating a structure of a NAL unit according to an embodiment of the present invention.

FIG. 5(a) is a block diagram showing the configuration of a terminal according to an embodiment of the present invention.

FIG. 5(b) is a block diagram showing the configuration of a terminal according to another embodiment of the present invention.

FIG. 5(c) is a block diagram showing the configuration of a terminal according to another embodiment of the present invention.

FIG. 6 is a flowchart illustrating a method of extracting an image according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating a structure of media presentation description information according to another embodiment of the present invention.

Hereinafter, some embodiments of the present invention will be described in detail through exemplary drawings. In adding reference numerals to the components of each drawing, it should be noted that the same reference numerals are assigned to the same components as much as possible even though they are shown in different drawings. In addition, in describing the present invention, when it is determined that the detailed description of the related well-known configuration or function may obscure the gist of the present invention, the detailed description thereof will be omitted.

In addition, in describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and the nature, sequence, or order of the components is not limited by the terms. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless otherwise stated. In addition, terms such as 'unit' and 'module' used in the specification refer to a unit that processes at least one function or operation, which may be implemented by hardware, by software, or by a combination of hardware and software.

The headend 110 encodes content from a camera, a real-time channel, or storage into a structure having a plurality of tiles, and transmits the entire encoded bit stream to the image extracting apparatus 118 using the network 116. The headend 110 may include a receiver 112 and an encoder 114.

The receiver 112 of the headend receives content from a camera, a real-time channel, or storage and passes the content to the encoder 114.

The encoder 114 of the headend encodes the entire bit stream and, at the time of encoding, sets the size of the tiles and the number of tiles in the picture. The encoder 114 may set the number, size, and position of the tiles in the picture as appropriate for the image/scene composition, the image size, the terminal to be served, and the like. For example, when the entire stream has a resolution of 7680x4320, the encoder 114 identifies, within the 7680x4320 picture, a tile region that can be viewed at a resolution of 3840x2160, so that the content can be watched smoothly on a TV with a resolution of 3840x2160. The encoder 114 may then set the tile to 3840x2160, or encode several smaller tiles in advance, so that the extracted region alone can be reproduced.
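The 7680x4320 example can be illustrated with a minimal sketch, assuming the encoder simply chooses the coarsest grid whose tiles do not exceed the target terminal resolution. The function name `tile_grid` is hypothetical, and a real HEVC encoder would additionally align tile boundaries to coding tree block boundaries.

```python
import math


def tile_grid(full_w, full_h, target_w, target_h):
    """Return (columns, rows) of the coarsest tile grid whose tiles
    are no larger than the target resolution (sizes in pixels)."""
    if min(full_w, full_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    cols = math.ceil(full_w / target_w)
    rows = math.ceil(full_h / target_h)
    return cols, rows


# An 8K picture served to a 4K TV: each tile is at most 3840x2160.
cols, rows = tile_grid(7680, 4320, 3840, 2160)
print(cols, rows)
```

Choosing a finer grid (for example, by passing a smaller target) corresponds to the alternative mentioned above of encoding several smaller tiles in advance.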

Accordingly, the header information of the entire bit stream encoded by the encoder 114 includes information indicating that each display picture is composed of a plurality of tiles. Specifically, the header information includes flag information indicating that a tile structure is used in the picture, together with information on the number and size of the tiles.

As described above, the encoder 114 of the headend sets the number of tiles or the size of the tiles according to the image / scene configuration, the image size, the terminal to be serviced, and the like for one entire bit stream.

As shown in FIG. 2, one display screen may be set to three tiles horizontally and three tiles vertically. In each tile, the number of coding tree blocks (CTBs) that are coding units is determined.

Referring to FIG. 2, the number of samples in the width direction of any one tile is the same as the number of samples in the width direction of a tile vertically adjacent thereto. In addition, the number of samples in the height direction of any one tile is the same as the number of samples in the height direction of a tile horizontally adjacent thereto.

For example, the number of samples in the horizontal direction of the tiles 210, 240, and 270 is set to be the same. In addition, the number of samples in the vertical direction of the tiles 210, 220, and 230 is set to be the same. In this way, the tile structure of H.265 High Efficiency Video Coding (HEVC) can be applied.

The encoder 114 encodes the entire bit stream and identifies information about the entire bit stream. The information about the entire bit stream is information that can represent the entire bit stream, and includes a thumbnail image of the entire bit stream, a thumbnail video, a partial image of the entire bit stream, and text describing the scenes or objects that appear in the entire bit stream.

In addition, the encoder 114 of the headend may transmit, as additional information with the encoded stream, tile position information for the region where a preset object is located. Accordingly, the terminals 120 and 122 may select and view the preset object while tracking its movement in the image. The preset object may be, for example, a specific player, manager, or cheerleader in an image of a sporting event. The encoder 114 detects the region where the preset object is located in the image and may set tiles of various sizes, for example fitting the tile to the size of the object, or keeping the tile shape while using smaller tiles, so that the preset object can be extracted cleanly from the image. The encoder 114 may mux the tile position information of the region to be extracted with the entire video stream and transmit them together to the image extracting apparatus 118. In this case, the size, number, and so on of the tiles may be preset for each resolution so that the corresponding region can be extracted according to the resolution of the terminal. For example, if the stream encoded in the headend for delivery to the image extracting apparatus has a resolution of 3840x2160 while the terminal provides a resolution of 1920x1080, the encoder 114 may deliver, as additional information together with the 3840x2160 stream, the size and number of tiles set to 1920x1080 or smaller.

The image extracting apparatus 118 receives the entire bit stream and the information about the entire bit stream from the headend 110 through the network 116, and transmits the information about the entire bit stream to the terminals 120 and 122. The image extracting apparatus 118 may be included as a function of a home gateway. The image extracting apparatus 118 receives terminal information and image selection information from the terminals 120 and 122. The image selection information is information about the location of a specific object in the image, or of a specific object whose movement is to be tracked. The image selection information may be obtained from input through a selection device such as a touch screen, a remote controller, a mouse, or a keyboard, or by an eye tracking device that identifies where the user's gaze is directed and thereby determines where the region of interest (ROI) is located. For example, the information about the object position may be a coordinate position in the image or an object ID predefined through the UI. The image extracting apparatus 118 may determine the size of the tile region that can be split off by using the terminal information, which includes the maximum resolution, and the designated coordinates. That is, when the tile corresponding to the designated coordinates has the same size as the resolution supported by the terminal, the image extracting apparatus 118 may select that one tile. When the tiles are set to be small and the terminal can accommodate a plurality of tiles, the image extracting apparatus 118 may cover the desired selection position by selecting the corresponding tile together with its neighboring tiles. When the viewer simultaneously selects a plurality of specific regions (players, actors, directors, etc.) in the entire video or across multiple channels, the image extracting apparatus 118 may select and recombine only the corresponding tiles. The image extracting apparatus 118 extracts the image tile part corresponding to the image selection information from the entire bit stream. The image extracting apparatus 118 changes the header according to the size of the extracted image, without additional transcoding, and then transmits the extracted image to the terminals 120 and 122. A detailed description is given with reference to FIG. 3.
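The tile selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual procedure: the function names, the pixel-based tile sizes, and the greedy rule for adding neighboring tiles (right or bottom neighbor first, then left or top) are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def _span(sizes, pos, limit):
    """Return (first, last) tile indices along one axis: the tile that
    contains pos, grown with neighbours while the total stays <= limit."""
    edges = list(accumulate(sizes))      # right/bottom edge of each tile
    if not 0 <= pos < edges[-1]:
        raise ValueError("position outside the picture")
    lo = hi = bisect_right(edges, pos)   # index of the tile containing pos
    total = sizes[lo]
    grown = True
    while grown:
        grown = False
        if hi + 1 < len(sizes) and total + sizes[hi + 1] <= limit:
            hi += 1
            total += sizes[hi]
            grown = True
        if lo - 1 >= 0 and total + sizes[lo - 1] <= limit:
            lo -= 1
            total += sizes[lo]
            grown = True
    return lo, hi


def select_tiles(col_widths, row_heights, x, y, max_w, max_h):
    """Return the (row, col) indices of the tiles to extract for a
    selection at pixel (x, y) and a terminal limited to max_w x max_h."""
    c0, c1 = _span(col_widths, x, max_w)
    r0, r1 = _span(row_heights, y, max_h)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# 3840x2160 picture split into a 3x3 grid of 1280x720 tiles.
cols, rows = [1280] * 3, [720] * 3
print(select_tiles(cols, rows, 100, 100, 2560, 1440))
```

With a terminal limit equal to one tile, a single tile is returned; with a larger limit, the neighboring tiles are included, matching the two cases distinguished in the description.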

The terminals 120 and 122 receive the information about the entire bit stream from the image extracting apparatus 118 and transmit the terminal information and the image selection information to the image extracting apparatus 118. The terminals 120 and 122 may be user terminals such as a personal computer (PC), a notebook computer, a tablet, a personal digital assistant (PDA), a game console, a portable multimedia player (PMP), a PlayStation Portable (PSP), a wireless communication terminal, a smartphone, a TV, a set-top box, or a media player. The terminal information indicates which of these user terminals the terminal is.

The headend 130 encodes the entire bit stream in a structure having a plurality of tiles, and transmits the information about the entire bit stream to the terminals 148 and 150 using the streaming server 136. In addition, the headend 130 receives image selection information from the terminals 148 and 150 and generates an extraction bit stream for the preset region or object, within the entire bit stream, that corresponds to the image selection information. The headend 130 may include a receiver 132, an encoder 134, and a streaming server 136.

The receiver 132 performs the same operation as the receiver 112 shown in FIG.

The encoder 134 encodes the entire bit stream and, at the time of encoding, sets the size of the tiles and the number of tiles in the picture. The encoder 134 may preset the size of the tiles and the number of tiles in the picture according to the object composition information in the scene and the important regions in the image, obtained by analyzing the image, as well as the performance of the terminal and the network environment. Accordingly, the encoder 134 may enable a portion of the entire bit stream to be reproduced in the user terminal by using location information about a preset region or object. For example, when the entire stream has a resolution of 7680x4320, the encoder 134 divides the 7680x4320 picture into tiles with a resolution of 3840x2160 or less so that the stream can be watched smoothly on a TV that provides a resolution of 3840x2160. In particular, the encoder 134 may divide the picture into tiles of 3840x2160 or less so that the region desired by the user can be extracted around the center of the screen, which has high importance, or around an important person in the scene. The encoder 134 then selects one or more tiles to be viewed according to the TV resolution and encodes them together with the tile information. In addition, the encoder 134 encodes the entire bit stream and identifies information about the entire bit stream.

The streaming server 136 transmits the information about the entire bit stream to the terminals 148 and 150, receives the terminal information and the image selection information, and generates an extraction bit stream for the preset region or object, within the entire bit stream, that corresponds to the image selection information. The streaming server 136 may include a communication unit 138, a location tracker 140, an image extractor 142, and a stream transmitter 144.

The communication unit 138 receives the encoded entire bit stream and the information about the entire bit stream from the encoder 134, and transmits the information about the entire bit stream to the terminals 148 and 150. In addition, the communication unit 138 receives terminal information and image selection information from the terminals 148 and 150.

The location tracker 140 determines the tiles to be extracted by mapping the terminal information and the image selection information received from the communication unit 138 onto the tile structure constituting the image. The location tracker 140 may determine the size of the tile region that can be split off according to the terminal information, which includes the maximum resolution, and the numerically expressed coordinates. That is, when the tile corresponding to the coordinates has the same size as the resolution supported by the terminal, that one tile may be selected. If the corresponding tile is smaller than the resolution supported by the terminal, the desired selection position may be covered by selecting the tile together with its neighboring tiles. When the viewer simultaneously selects a plurality of specific regions (players, actors, directors, etc.) in the entire video or across several channels, the location tracker 140 selects only the corresponding tiles and passes the tile information to be extracted and recombined (for example, the tile IDs) to the image extractor 142.

The image extractor 142 receives the entire stream from the communication unit 138 and the tile information to be extracted from the location tracker 140. The image extractor 142 changes the header according to the size of the extracted image, without additional transcoding, and then passes the extracted image to the stream transmitter 144. A detailed description is given with reference to FIG. 3.

The stream transmitter 144 receives the image extracted by the image extractor 142 and delivers it to the terminal. Streaming methods can be divided into a push method, in which the streaming server transmits the stream to the terminal, and a pull method, in which the terminal requests the stream from the streaming server and fetches it. In the push method, the server transmits packets using the Real-time Transport Protocol (RTP), the User Datagram Protocol (UDP), or the like, regardless of the terminal and the network environment; this method is used in existing broadcasting systems. Pull methods, in which the terminal requests a given bit stream from the server, include HLS (HTTP Live Streaming), MPEG-DASH (Dynamic Adaptive Streaming over HTTP), and MPEG MMT (MPEG Media Transport), and are mainly used in Internet broadcasting services that require traffic control. When the stream is transmitted by the pull method, the streaming server delivers media presentation description (MPD) information to the terminal, and the terminal requests the bit stream by consulting that information. The media presentation description information includes information on the bit streams that the terminal can request from the streaming server 136. A detailed structure thereof will be described with reference to FIG.

In the case of a pull service, the image selection information may be defined using the media presentation description information. To describe streams encoded at various bit rates for one piece of content, existing media presentation description information carries the ID, URL information, and so on of each stream in an adaptation set 730 or a representation 740. To express a specific object or region in the same manner, the media presentation description information may separately define an ID and a URL corresponding to the specific object or region. The location of the object in the image/scene is determined by the location tracker 140 using various tracking techniques. When the image extractor 142 extracts a specific object or region according to the tile information to be extracted, the stream transmitter 144 transmits the newly generated extraction stream as the SegmentURL in the adaptation set defined for that specific object or region.
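As an illustration of defining one adaptation set per selectable object or region, the following sketch builds a simplified MPD-like document with Python's standard `xml.etree.ElementTree` module. The element names (`MPD`, `Period`, `AdaptationSet`, `Representation`, `SegmentList`, `SegmentURL`) follow MPEG-DASH, but the document is not validated against the DASH schema, and the object names and segment URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_mpd(objects):
    """Build a simplified MPD tree with one AdaptationSet per object.

    objects maps a hypothetical object/region name to the list of
    segment URLs of its extraction stream.
    """
    mpd = ET.Element("MPD")
    period = ET.SubElement(mpd, "Period")
    for index, (name, urls) in enumerate(objects.items()):
        aset = ET.SubElement(period, "AdaptationSet", id=str(index))
        rep = ET.SubElement(aset, "Representation", id=name)
        seg_list = ET.SubElement(rep, "SegmentList")
        for url in urls:
            # Each SegmentURL points at one segment of the extraction stream.
            ET.SubElement(seg_list, "SegmentURL", media=url)
    return mpd


mpd = build_mpd({
    "player7": ["player7/seg1.bin", "player7/seg2.bin"],  # assumed URLs
    "center": ["center/seg1.bin"],
})
print(ET.tostring(mpd, encoding="unicode"))
```

A terminal that has selected an object would then request the segments listed under the matching adaptation set, in the same way it would otherwise choose among bit rates.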

Headend 160 performs the same operation as headend 110 shown in FIG.

The terminals 168 and 172 include the image extracting apparatus 118 shown in FIG. The terminals 168 and 172 receive the entire bit stream and the information about the entire bit stream from the headend 160, and receive image selection information from the user by using the information about the entire bit stream. Each of the image extracting apparatuses 170 and 174 included in the terminals 168 and 172 generates an extraction bit stream for the preset region or object, within the entire bit stream, that corresponds to the image selection information. The terminals 168 and 172 display the extracted image by using the extraction bit stream. The image extracting apparatuses 170 and 174 will be described with reference to FIG.

Referring to FIG. 3, the image extracting apparatus 118, 170, 174 or the image extracting unit 142 according to an embodiment of the present invention may include a receiver 310, an image information analyzer 320, and an extraction bit stream generator 330.

The receiver 310 receives the entire bit stream, the terminal information, and the image selection information. The receiver 310 may receive, from the headends 110 and 160, the entire bit stream together with the muxed information about the region of the preset object. The first receiver 313 may receive the image selection information from the terminals 120 and 122, the location tracker 140, or the user input unit 550 in the terminal, and the second receiver 316 may receive, from the headend 110, 160 or the communication unit 138, the entire bit stream or the information about the region of the preset object muxed with it. Conversely, the first receiver 313 may receive the entire bit stream or the muxed information about the region of the preset object, and the second receiver 316 may receive the image selection information. The receiver 310 may receive the tile information to be extracted from the location tracker 140 as the image selection information.

The image information analyzer 320 receives the entire bit stream from the receiver 310 and analyzes the image information, including the image code and the header information, from the entire bit stream. The entire bit stream is an encoded image; a unit picture of the entire bit stream may consist of one or more slices, and each slice may be generated in a structure having one or more tiles.

The slice and tile structures are rectangular structures each including a plurality of coding tree blocks (CTBs), and are data structures used for encoding and decoding in H.265 High Efficiency Video Coding (HEVC). Each tile structure is in the form of a matrix of CTBs. For example, one tile structure may be in the form of a 3 × 3 matrix of CTBs, but is not limited thereto, and may be a CTB matrix of various sizes.

The image code refers to the network abstraction layer (NAL) units that carry the actual video coding layer (VCL) data. The header information refers to the non-video-coding-layer (non-VCL) NAL units, which include information such as the number of slices, the number of tiles per slice, the size of each tile, and the number of pixel samples in the picture.

The extraction bit stream generator 330 receives image selection information from the receiver 310 and generates an extraction bit stream according to the image selection information. Alternatively, the extraction bit stream generator 330 may identify tile information corresponding to information about a region of a preset object received from the receiver 310 and generate an extraction bit stream according to the identified tile information.

More specifically, the extraction bit stream generator 330 generates extraction header information by modifying the header information of the entire bit stream, received from the image information analyzer 320, so that it corresponds to the bit stream of the region to be extracted according to the image selection information or the information about the region of the preset object.

The extraction bit stream generator 330 generates an extraction bit stream including the modified extraction header information and the image code corresponding to the image selection information or to the information about the region of the preset object.

Here, the header of the generated extraction bit stream includes information indicating the total size of the extracted image and how many tiles it contains. When the extracted image consists of one tile, the header of the extraction bit stream includes information on the size of the extracted tile and information indicating which of the slices in the display picture is the first slice. When the extracted image consists of a plurality of tiles, the header of the extraction bit stream may include information on the total number of tiles in the extracted image, information on the overall size of those tiles, and information indicating which of the slices in the display picture is the first slice.
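The header contents listed above can be illustrated with a small sketch that derives the new picture size and tile counts for a rectangular block of selected tiles. This models only the values that would change; the class and function names are hypothetical, sizes are expressed in CTBs, and an actual implementation would also have to re-serialize the parameter sets and update slice addressing, which is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedHeader:
    width_in_ctbs: int            # total width of the extracted image
    height_in_ctbs: int           # total height of the extracted image
    num_tile_columns_minus1: int
    num_tile_rows_minus1: int
    column_widths: List[int]
    row_heights: List[int]


def extracted_header(col_widths, row_heights,
                     first_col, last_col, first_row, last_row):
    """Derive header values for the tiles in the inclusive index ranges
    [first_col, last_col] x [first_row, last_row] of the original grid."""
    cols = list(col_widths[first_col:last_col + 1])
    rows = list(row_heights[first_row:last_row + 1])
    if not cols or not rows:
        raise ValueError("empty tile selection")
    return ExtractedHeader(sum(cols), sum(rows),
                           len(cols) - 1, len(rows) - 1, cols, rows)


# Original 3x3 grid: 60x34 CTBs; extract the two right-hand tiles of row 0.
header = extracted_header([20, 20, 20], [11, 11, 12], 1, 2, 0, 0)
print(header)
```

For a single extracted tile, both `*_minus1` values are 0 and the picture size equals the tile size, which corresponds to the one-tile case in the description.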

For reference, arrows in FIG. 4 indicate reference relationships. Referring to FIG. 4, the NAL unit includes a NAL header and a Raw Byte Sequence Payload (RBSP).

The entire bit stream of the encoded image is delivered in NAL units, the units of the network abstraction layer (NAL). NAL units are divided into non-video-coding-layer (non-VCL) NAL units and VCL NAL units. The non-VCL NAL units include NAL (VPS), a Video Parameter Set (VPS) NAL; NAL (SPS), a Sequence Parameter Set (SPS) NAL; NAL (PPS), a Picture Parameter Set (PPS) NAL; and NAL (SEI), a Supplemental Enhancement Information (SEI) NAL. In particular, the SPS includes ON/OFF information for the encoding tools, and the PPS includes information related to tiles.
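The distinction between VCL and non-VCL NAL units can be made from the two-byte HEVC NAL unit header, as the following sketch shows. The bit layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and the type values 32 (VPS), 33 (SPS), 34 (PPS), and 39/40 (SEI) follow the H.265 specification; the function names are illustrative only.

```python
def parse_nal_header(header: bytes):
    """Parse the 2-byte HEVC NAL unit header.

    Returns (nal_unit_type, nuh_layer_id, temporal_id).
    """
    if len(header) < 2:
        raise ValueError("HEVC NAL header is two bytes long")
    first, second = header[0], header[1]
    if first & 0x80:
        raise ValueError("forbidden_zero_bit must be 0")
    nal_unit_type = (first >> 1) & 0x3F
    nuh_layer_id = ((first & 0x01) << 5) | (second >> 3)
    temporal_id = (second & 0x07) - 1   # nuh_temporal_id_plus1 - 1
    return nal_unit_type, nuh_layer_id, temporal_id


NON_VCL_NAMES = {32: "VPS", 33: "SPS", 34: "PPS", 39: "SEI", 40: "SEI"}


def classify(header: bytes) -> str:
    """Name the parameter-set/SEI units; types 0..31 are VCL units."""
    nal_unit_type, _, _ = parse_nal_header(header)
    if nal_unit_type < 32:
        return "VCL"
    return NON_VCL_NAMES.get(nal_unit_type, "other non-VCL")


for hdr in (b"\x40\x01", b"\x42\x01", b"\x44\x01", b"\x26\x01"):
    print(hdr.hex(), classify(hdr))
```

An extractor working without transcoding would pass VCL units of the selected tiles through unchanged and rewrite only the parameter-set units identified this way.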

In order to process the image corresponding to the extracted bit stream in accordance with the specifications of the terminal, the extraction bit stream generation unit 330 modifies header information such as the SPS and PPS and converts it to the tile structure of the region to be extracted, thereby generating the extracted bit stream of the image.

The VCL NAL unit may include a plurality of frames such as a first frame (Frame 1), a second frame (Frame 2), and the like. Each frame contains three consecutive NALs (VCLs).

Information about tiles in the extracted bit stream is set in the PPS. For example, tiles_enabled_flag, which is set in the PPS, indicates whether a tile structure exists in a picture.

The size of each tile in the extracted bit stream is set by num_tile_columns_minus1, num_tile_rows_minus1, and uniform_spacing_flag.

num_tile_columns_minus1 indicates the number of tile columns in the extraction bit stream minus one, and num_tile_rows_minus1 indicates the number of tile rows in the extraction bit stream minus one. The uniform_spacing_flag is information indicating whether the tiles have the same size.

Together with num_tile_columns_minus1 and num_tile_rows_minus1, the uniform_spacing_flag makes it possible to check whether the tiles are the same size. If the sizes of the tiles are not the same, the width of each tile column is set by column_width_minus1[i], and the height of each tile row is set by row_height_minus1[i].
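As an illustration of how these PPS fields determine tile sizes, the following sketch derives the tile sizes along one axis. It assumes the HEVC convention that sizes are counted in coding tree blocks (CTBs) and that the last tile takes whatever remains; the function name and its parameters are hypothetical.

```python
from typing import List, Sequence


def tile_sizes_in_ctbs(pic_size_in_ctbs: int,
                       num_tiles_minus1: int,
                       uniform_spacing_flag: bool,
                       size_minus1: Sequence[int] = ()) -> List[int]:
    """Tile sizes along one axis (columns or rows), in CTBs.

    Pass num_tile_columns_minus1 with column_width_minus1, or
    num_tile_rows_minus1 with row_height_minus1.
    """
    n = num_tiles_minus1 + 1
    if uniform_spacing_flag:
        # Evenly spaced boundaries; integer division spreads the remainder.
        return [((i + 1) * pic_size_in_ctbs) // n - (i * pic_size_in_ctbs) // n
                for i in range(n)]
    if len(size_minus1) < num_tiles_minus1:
        raise ValueError("explicit sizes are needed for all but the last tile")
    explicit = [s + 1 for s in size_minus1[:num_tiles_minus1]]
    last = pic_size_in_ctbs - sum(explicit)  # the last tile takes the remainder
    if last <= 0:
        raise ValueError("explicit tile sizes exceed the picture size")
    return explicit + [last]
```

For example, a picture 10 CTBs wide split into three uniformly spaced columns yields widths of 3, 3, and 4 CTBs, while explicit values [1, 4] for column_width_minus1 yield 2, 5, and 3.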

On the other hand, there are some constraints on the extraction bit stream from the entire bit stream.

When encoding the entire bit stream, the header information of the entire bit stream includes information indicating that loop filtering cannot be performed across tile boundaries (e.g., loop_filter_across_tiles_enabled_flag = 0). When the extracted bit stream is generated from the entire bit stream encoded under such a constraint, loop filtering does not occur across tile boundaries, and thus deterioration of image quality at the tile boundary is prevented.

When encoding the entire bit stream, the header information of the entire bit stream includes information indicating that, when a motion vector coding mode such as merge or merge skip is performed in the prediction unit, a padding portion in which the motion vector exceeds the image range of the extracted bit stream cannot be referenced. Therefore, when the motion vector is calculated in a mode such as merge or merge skip, motion information beyond the tile boundary is prevented from being referenced to determine the motion vector candidate.

When encoding the entire bit stream, the header information of the entire bit stream includes information indicating that the range of the motion estimation cannot exceed the padding portion in estimating the motion of the prediction unit. When the extracted bit stream is decoded, motion prediction beyond the tile boundary is prevented.

When encoding the entire bit stream, the header information of the entire bit stream includes information indicating that, when the motion vector in the prediction unit is encoded, the temporal motion vector of another picture existing at the same position as the block of the prediction unit to be encoded cannot be referenced.
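The motion-related constraints above all amount to keeping the reference area of a block inside its own tile. A minimal sketch of such a check follows. It assumes integer-pel motion vectors and ignores the extra margin that sub-pel interpolation filters would need; Rect and reference_inside_tile are hypothetical names, not part of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in luma samples (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def reference_inside_tile(block: Rect, mv_x: int, mv_y: int, tile: Rect) -> bool:
    """True if the block displaced by an integer-pel motion vector stays inside the tile."""
    ref_x = block.x + mv_x
    ref_y = block.y + mv_y
    return (tile.x <= ref_x
            and tile.y <= ref_y
            and ref_x + block.w <= tile.x + tile.w
            and ref_y + block.h <= tile.y + tile.h)
```

An encoder honoring the constraints would discard any motion vector candidate for which this check fails, so that a tile extracted on its own can still be decoded.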

When the extraction bit stream generator 330 generates SPS, PPS, slice header information, etc. corresponding to the extraction bit stream, the following information is modified differently from the header information of the entire bit stream.

First, referring to the case where the extracted image is composed of only one tile, the header information of the extracted bit stream is modified as follows.

The pic_width_in_luma_samples and pic_height_in_luma_samples of the SPS are changed to the size of the single tile screen to be extracted and set to the horizontal size and the vertical size of the extracted image.

In PPS, tiles_enabled_flag indicating information on whether a tile structure exists in a picture is modified to 0 to indicate that there is no tile structure in the picture.

In the case of the first slice in the extraction header information of the extracted bit stream, first_slice_segment_in_pic_flag is set to 1, and for the remaining slices, first_slice_segment_in_pic_flag is set to 0. In addition, num_entry_point_offsets, which means an offset of a tile in a slice, is set to 0 in all slice headers.
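The single-tile modifications described above can be sketched as follows. The Sps, Pps, and SliceHeader classes are hypothetical, reduced stand-ins that hold only the fields named in the text; a real extractor would have to re-serialize the full syntax structures.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sps:
    pic_width_in_luma_samples: int
    pic_height_in_luma_samples: int


@dataclass(frozen=True)
class Pps:
    tiles_enabled_flag: int


@dataclass(frozen=True)
class SliceHeader:
    first_slice_segment_in_pic_flag: int
    num_entry_point_offsets: int


def rewrite_for_single_tile(sps: Sps, pps: Pps, slices: Sequence[SliceHeader],
                            tile_w: int, tile_h: int
                            ) -> Tuple[Sps, Pps, List[SliceHeader]]:
    """Apply the single-tile header changes; `slices` are the slices of one extracted picture."""
    new_sps = replace(sps, pic_width_in_luma_samples=tile_w,
                      pic_height_in_luma_samples=tile_h)
    new_pps = replace(pps, tiles_enabled_flag=0)  # no tile structure remains
    new_slices = [
        replace(s,
                first_slice_segment_in_pic_flag=1 if i == 0 else 0,
                num_entry_point_offsets=0)
        for i, s in enumerate(slices)
    ]
    return new_sps, new_pps, new_slices
```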

Next, referring to the case where the extracted image is composed of a plurality of tiles, the header information of the extracted bit stream is modified as follows.

The pic_width_in_luma_samples and pic_height_in_luma_samples of the SPS are changed to the size of the entire tile screen to be extracted and set to the horizontal size and the vertical size of the extracted image.

In the PPS, num_tile_columns_minus1 and num_tile_rows_minus1 are changed from the number of tile columns and tile rows in the existing entire bit stream to the number of tile columns and tile rows in the extracted bit stream.

The extraction header information of the extraction bit stream includes first_slice_segment_in_pic_flag, which indicates whether a slice is the first slice in the picture. first_slice_segment_in_pic_flag is set to 1 only in the header of the first slice of the extracted picture, and is set to 0 in the headers of the remaining slices. In addition, in every slice header, num_entry_point_offsets, which means an offset of a tile in a slice, is set to 0.
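A comparable sketch for the multi-tile case is shown below. It represents each header as a plain dictionary keyed by the syntax element names from the text, and it assumes that all extracted tiles have the same size, which the embodiment does not require; the function name and parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Header = Dict[str, int]


def rewrite_for_tile_grid(sps: Header, pps: Header, slice_headers: Sequence[Header],
                          cols: int, rows: int, tile_w: int, tile_h: int
                          ) -> Tuple[Header, Header, List[Header]]:
    """Apply the multi-tile header changes for a cols x rows grid of equal-sized tiles."""
    if cols < 1 or rows < 1 or cols * rows < 2:
        raise ValueError("the multi-tile case needs at least two tiles")
    # Picture size becomes the size of the whole extracted tile area.
    new_sps = dict(sps,
                   pic_width_in_luma_samples=cols * tile_w,
                   pic_height_in_luma_samples=rows * tile_h)
    # Tile counts now describe the extracted picture, not the original one.
    new_pps = dict(pps,
                   num_tile_columns_minus1=cols - 1,
                   num_tile_rows_minus1=rows - 1)
    new_slices = [
        dict(h,
             first_slice_segment_in_pic_flag=1 if i == 0 else 0,
             num_entry_point_offsets=0)
        for i, h in enumerate(slice_headers)
    ]
    return new_sps, new_pps, new_slices
```

The inputs are copied rather than modified in place, so the header information of the entire bit stream stays available for further extraction requests.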

Referring to FIG. 5A, the terminals 120 and 122 according to an embodiment of the present invention may be implemented to include a communication unit 510, a decoder 512, a display unit 514, and a user input unit 518.

First, the communication unit 510 receives information about the entire bit stream from the image extraction apparatus.

The display unit 514 displays information about the entire bit stream. The information about the entire bit stream may be a thumbnail image or video of the entire bit stream, may be some images or videos of the entire bit stream, and may be text representing the entire bit stream.

The user input unit 518 receives from the user an object or region to be selected within the information about the entire bit stream displayed by the display unit 514. Specifically, the user input unit 518 receives the position of one or more specific objects whose movement is to be tracked, or one or more regions to be viewed. The position tracking unit 520 then digitizes this position or region information as coordinates in the stream, and tracks the tile information corresponding to the position or region of the object according to the digitized values. The image selection information digitized by the position tracking unit 520 is transmitted to the communication unit 510. In this case, the user input unit 518 may be a predetermined selection device.
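The step of turning a selected region into tile information can be sketched as below. This assumes a uniform tile grid and a region given as a rectangle in luma-sample coordinates; the function name and parameters are hypothetical, and the embodiment does not prescribe this particular mapping.

```python
from typing import List, Tuple


def tiles_covering_region(x: int, y: int, w: int, h: int,
                          tile_w: int, tile_h: int,
                          cols: int, rows: int) -> List[Tuple[int, int]]:
    """Return (column, row) indices of every tile that overlaps the region, in raster order."""
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive width and height")
    # Clamp to the grid so a region touching the picture edge stays valid.
    first_col = max(0, x // tile_w)
    last_col = min(cols - 1, (x + w - 1) // tile_w)
    first_row = max(0, y // tile_h)
    last_row = min(rows - 1, (y + h - 1) // tile_h)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For a 3840x2160 picture divided into a 4x4 grid of 960x540 tiles, a region that straddles the first tile boundary in both directions maps to four tiles, while a region inside a single tile maps to one.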

The communication unit 510 receives image selection information from the user input unit 518 and transmits the image selection information to the image extraction apparatus. In addition, the communication unit 510 also transmits terminal information to the image extraction apparatus 118.

The communication unit 510 receives the extraction bit stream from the image extraction apparatus 118, and the decoder 512 decodes the extraction bit stream received by the communication unit 510.

The display unit 514 displays the decoded extracted bit stream. In this case, the renderer 516 included in the display unit 514 adjusts the size of some or all of the tiles included in the extracted bit stream according to the display information. When the extracted bit stream includes a plurality of tiles, the display information is information for making some tiles of the extracted bit stream larger and the remaining tiles smaller.

The display unit 514 displays the extracted bit stream scaled by the renderer 516.

Referring to FIG. 5B, the terminals 148 and 150 according to an embodiment of the present invention may be implemented to include a communication unit 530, a decoder 532, a display unit 534, and a user input unit 538.

The communication unit 530 receives information about the entire bit stream from the streaming server 136.

The user input unit 538 receives from the user an object or region to be selected within the information about the entire bit stream displayed by the display unit 534. The selection may be the location of one or more specific objects whose movement is to be tracked, or one or more areas to be viewed. Unlike the user input unit 518 illustrated in FIG. 5A, the user input unit 538 only receives the object or region, because no location tracking unit is present. The user input unit 538 creates the media expression description information of the object or region selected by the user.

The communication unit 530 receives the media expression description information as the image selection information from the user input unit 538 and transmits it to the streaming server 136. In addition, the communication unit 530 also transmits the terminal information to the streaming server 136.

The communication unit 530 accesses the buffer 168 of the streaming server 136 and obtains the extraction bit stream by requesting the stored extraction bit stream (Pull method).

The decoder 532 decodes the obtained extraction bit stream, and the display unit 534 displays the decoded extraction bit stream. In this case, the renderer 536 included in the display unit adjusts the size of some or all of the tiles included in the extracted bit stream according to the display information.

FIG. 5C is a block diagram showing the configuration of a terminal according to another embodiment of the present invention. FIG. 5C illustrates a terminal that includes the image extractors 170 and 174 as components of the terminal itself.

Referring to FIG. 5C, the terminals 168 and 172 according to an embodiment of the present invention may include a communication unit 540, a decoder 542, a display unit 544, a user input unit 548, and image extracting devices 170 and 174.

First, the communication unit 540 receives information about the entire bit stream and the entire bit stream from the headend 160.

The display unit 544 displays information about the entire bit stream. The information about the entire bit stream may be a thumbnail image or video of the entire bit stream, may be some images or videos of the entire bit stream, and may be text representing the entire bit stream.

The user input unit 548 receives from the user an object or region to be selected with respect to the decoded bit stream displayed by the display unit 544. Specifically, the user input unit 548 receives the location of one or more specific objects whose movement is to be tracked, or one or more areas to be viewed. The position tracking unit 550 then digitizes this position or region information as coordinates in the stream, and tracks the tile information corresponding to the location or area of the object according to the digitized values. The image selection information digitized by the position tracking unit 550 is transferred to the image extracting apparatuses 170 and 174. In this case, the user input unit 548 may be a predetermined selection device.

The image extractors 170 and 174 generate the extracted bit stream by receiving the image selection information from the user input unit 548 and the entire bit stream from the communication unit 540. Since this operation has already been described with reference to FIG. 3, a detailed description thereof is omitted.

The decoder 542 decodes the obtained extraction bit stream, and the display unit 544 displays the decoded extraction bit stream. In this case, the renderer 546 included in the display adjusts the size of some or all of the tiles included in the extracted bit stream according to the display information received from the image extractors 170 and 174.

The entire encoded bit stream is received, and the image selection information is received from the terminal (S610).

The image code and header information are analyzed from the entire encoded bit stream (S620).

Extraction header information is generated by modifying the encoded header information of the entire bit stream to correspond to the bit stream of the region to be extracted according to the image selection information (S630).

An extraction bit stream including the generated header information and the image code corresponding to the image selection information is generated (S640).
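Steps S610 to S640 can be sketched end to end on a toy data model. Here the entire bit stream is assumed to be a dictionary with a "header" entry and a "tile_codes" entry keyed by (column, row); these keys, the header field names, and the function name are all hypothetical, and real image code would be parsed from NAL units rather than looked up in a dictionary.

```python
from typing import Dict, List, Sequence, Tuple

Tile = Tuple[int, int]  # (column, row)


def extract_bit_stream(entire: Dict, selection: Sequence[Tile]) -> Dict:
    """Toy version of S610-S640 for a selection that fills a rectangular grid of tiles."""
    # S610: the entire bit stream and the image selection information arrive as arguments.
    # S620: analyze the image code and the header information.
    header: Dict[str, int] = entire["header"]
    tile_codes: Dict[Tile, bytes] = entire["tile_codes"]

    cols = sorted({c for c, _ in selection})
    rows = sorted({r for _, r in selection})
    if len(set(selection)) != len(cols) * len(rows):
        raise ValueError("selection must fill a rectangular grid of tiles")

    # S630: generate extraction header information for the selected region.
    extraction_header = dict(
        header,
        tile_cols=len(cols),
        tile_rows=len(rows),
        pic_width=len(cols) * header["tile_w"],
        pic_height=len(rows) * header["tile_h"],
    )

    # S640: combine the new header with the image code of the selected tiles (raster order).
    codes: List[bytes] = [tile_codes[(c, r)] for r in rows for c in cols]
    return {"header": extraction_header, "tile_codes": codes}
```

Because the image code of each tile is copied unchanged, no re-encoding takes place; only the header is regenerated.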

In FIG. 6, processes S610 to S640 are described as being executed sequentially, but this is merely illustrative of the technical idea of the exemplary embodiment of the present invention. In other words, a person of ordinary skill in the art to which an embodiment of the present invention belongs may change the order described in FIG. 6, or execute one or more of processes S610 to S640 in parallel, without departing from the essential characteristics of the embodiment of the present invention. Since the processes may be modified in these various ways, FIG. 6 is not limited to a time-series order.

Meanwhile, the processes shown in FIG. 6 may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. That is, the computer-readable recording medium may be a magnetic storage medium (for example, ROM, floppy disk, hard disk, etc.), an optical reading medium (for example, CD-ROM, DVD, etc.), or a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

Period 710 means a period for which media information is maintained.

BaseURL 720 means a common URL of the stream to request.

The adaptation set 730 is a set of variously encoded resources selected by the terminal. There must be at least one adaptation set 730 in period 710.

Representation 740 represents an encoded version of one or more media streams. The Representation 740 is composed of one or more segments, and includes a SegmentBase 750, a SegmentList 770, a SegmentURL 780, and the like of each Segment constituting the Representation.

When the terminal desires to enlarge and view a preset area or object, the terminal may make a request to the streaming server using the adaptation set 730 or the representation 740 of the media expression description information. Alternatively, when the user selects a specific object or controls traffic according to the network or the terminal, the terminal may request the streaming server 136 using the SegmentURL 780 in the media expression description information.
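How a terminal might resolve segment requests from the description information can be sketched as follows. The sketch assumes the description is an MPEG-DASH style XML document using the namespace urn:mpeg:dash:schema:mpd:2011; the sample document, the example.com URLs, the representation identifiers, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical description with one full-view and one zoomed-region Representation.
EXAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <BaseURL>http://example.com/video/</BaseURL>
    <AdaptationSet>
      <Representation id="full" bandwidth="20000000">
        <SegmentList>
          <SegmentURL media="full_1.mp4"/>
          <SegmentURL media="full_2.mp4"/>
        </SegmentList>
      </Representation>
      <Representation id="zoom_region_a" bandwidth="5000000">
        <SegmentList>
          <SegmentURL media="zoom_a_1.mp4"/>
          <SegmentURL media="zoom_a_2.mp4"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def segment_urls(mpd_text: str, representation_id: str) -> List[str]:
    """Return BaseURL + SegmentURL@media for every segment of one Representation."""
    root = ET.fromstring(mpd_text)
    urls: List[str] = []
    for period in root.findall("mpd:Period", NS):
        base = period.findtext("mpd:BaseURL", default="", namespaces=NS).strip()
        for rep in period.findall("mpd:AdaptationSet/mpd:Representation", NS):
            if rep.get("id") != representation_id:
                continue
            for seg in rep.findall("mpd:SegmentList/mpd:SegmentURL", NS):
                urls.append(base + seg.get("media", ""))
    return urls
```

Selecting a different Representation identifier switches the terminal between the full view and a zoomed region without changing the request logic.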

The above description is merely illustrative of the technical idea of the present embodiment, and those skilled in the art to which the present embodiment belongs may make various modifications and changes without departing from the essential characteristics of the present embodiment. Therefore, the present embodiments are not intended to limit the technical idea of the present embodiment but to describe the present invention, and the scope of the technical idea of the present embodiment is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of the present embodiment.

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority under 35 U.S.C. § 119(a) to Patent Application No. 10-2015-0095470, filed in Korea on July 3, 2015, and Patent Application No. 10-2016-0084443, filed in Korea on July 4, 2016, all of which are hereby incorporated by reference in this patent application. In addition, this patent application claims priority in countries other than the United States for the same reasons, all of which are incorporated herein by reference.

Claims

An image extraction apparatus comprising:

a first receiver configured to receive an entire bit stream whose header information includes information indicating that the bit stream is composed of one or more tiles;

a second receiver configured to receive image selection information according to a user's request, a network environment, or the performance of a terminal;

an image information analyzer configured to analyze image code and header information from the entire bit stream received by the first receiver; and

an extraction bit stream generator configured to generate new header information by modifying the header information of the entire bit stream according to the image selection information received by the second receiver, and to generate an extracted bit stream including the new header information and an image code corresponding to the image selection information.
The image extraction apparatus of claim 1,

wherein the image selection information is received from a device that generates the entire bit stream, or from a terminal that receives the extracted bit stream.
The image extraction apparatus of claim 2,

wherein the image selection information, when received from the device that generates the entire bit stream, is information on an area in which one or more preset objects are located.
The image extraction apparatus of claim 2,

wherein the extraction bit stream generator, in modifying the header information of the entire bit stream according to the image selection information, modifies at least one of a sequence parameter set (SPS), a picture parameter set (PPS), and slice header information in the header of the entire bit stream.
The image extraction apparatus of claim 4,

wherein the extraction bit stream generator, in modifying the SPS in the header information of the entire bit stream according to the image selection information, modifies pic_width_in_luma_samples and pic_height_in_luma_samples to the screen size of the tile to be extracted.
The image extraction apparatus of claim 4,

wherein the extraction bit stream generator, in modifying the PPS in the header information of the entire bit stream according to the image selection information,

modifies tiles_enabled_flag to 0 if the extracted image is composed of one tile, and modifies num_tile_columns_minus1 and num_tile_rows_minus1 to match the number of tiles along the horizontal and vertical axes in the extracted bit stream if the extracted image is composed of a plurality of tiles.
The image extraction apparatus of claim 2,

wherein the image selection information, when received from the terminal that receives the extracted bit stream, is information about one or more areas to be enlarged within the entire image corresponding to the entire bit stream, or information about the position of one or more specific objects whose movement is to be tracked.
The image extraction apparatus of claim 1,

wherein the new header information, when the extracted image is composed of one tile, includes at least one of information indicating the size of the tile to be extracted, whether a tile structure exists, and which slice is the first slice.
The image extraction apparatus of claim 1,

wherein the new header information, when the extracted image is composed of a plurality of tiles, includes at least one of information indicating the size of the entire set of tiles to be extracted, the total number of tiles, and which slice is the first slice.
An image extraction method comprising:

receiving image selection information and an entire bit stream whose header information includes information indicating that the bit stream is composed of one or more tiles;

analyzing image code and header information from the entire bit stream;

generating new header information by modifying the header information of the entire bit stream according to the image selection information; and

generating an extracted bit stream including the new header information and an image code corresponding to the image selection information.
A bit stream generation apparatus comprising:

a receiver configured to receive content including an image; and

an encoder configured to analyze the image included in the content and to encode the image such that a picture is composed of one or more tiles, by setting the size of the tiles or the number of tiles in the picture according to the composition of the image, the size of the image, or the terminal that is to receive the image.
An image extraction terminal device comprising:

a communication unit configured to receive at least one of an entire bit stream, whose header information includes information indicating that the bit stream is composed of one or more tiles, and information on the entire bit stream;

a display unit configured to display the information on the entire bit stream;

a user input unit configured to generate image selection information, which is information about an object or an area that a user wants to select within the information on the entire bit stream displayed by the display unit;

an image extractor configured to analyze the image code and the header information of the entire bit stream, to generate new header information by modifying the header information of the entire bit stream according to the image selection information, and to generate an extracted bit stream including the new header information and tile image information corresponding to the image selection information; and

a decoder configured to decode the extracted bit stream.