WO2021002303A1

WO2021002303A1 - Information processing device, information processing method, playback processing device, and playback processing method

Info

Publication number: WO2021002303A1
Application number: PCT/JP2020/025379
Authority: WO
Inventors: 平林　光浩; 遼平高橋; 優池田; 勇司藤本; 矢ケ崎　陽一
Original assignee: ソニー株式会社
Priority date: 2019-07-03
Filing date: 2020-06-26
Publication date: 2021-01-07

Abstract

An information processing device, an information processing method, a playback processing device, and a playback processing method, which provide a user with a high-quality audiovisual experience, are provided. An encoding unit encodes images in an image sequence to generate an encoded stream. A determination unit determines one or more decoding start images in the image sequence that can be used as images for starting decoding during gradual random access (GRA). A file generation unit inserts GRA information, which is related to the decoding start image determined by the determination unit, into a header region and inserts the encoded stream into a data region of a file format containing the header region and the data region.

Description

Information processing device, information processing method, reproduction processing device and reproduction processing method

The present invention relates to an information processing device, an information processing method, a reproduction processing device, and a reproduction processing method.

H.265/HEVC, one of the standard specifications for image coding, stipulates the following. A sequence corresponding to the entire compressed moving image contains a plurality of images, each of which is called a picture. Each picture is divided into one or more slices. A slice is the smallest decoding unit. Each slice is classified as an I slice (Intra Slice), a P slice (Predictive Slice), or a B slice (Bipredictive Slice).

An I slice is a slice that is decoded independently, without referring to other images. A P slice is a slice that is decoded by referring to a single other image. A B slice is a slice that is decoded by referring to a plurality of other images.

A picture at the beginning of a sequence that consists only of I slices is called an IDR (Instantaneous Decoding Refresh) picture. The IDR picture is identified by the value of the NAL (Network Abstraction Layer) unit type. Pictures in the same sequence that follow the IDR picture do not refer to pictures that precede the IDR picture in decoding order, and they are located after the IDR picture in decoding order or display (presentation) order.

Therefore, when attempting to randomly access a time point in the middle of the video of a certain coded stream, the video can be appropriately decoded from the IDR picture in the vicinity of the specified time point. Here, the random access is not a decoding process from the beginning of the stream, but a process of decoding and reproducing the stream from the middle of the stream.
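As a rough illustration of the random access described above, the following Python sketch picks the IDR picture from which decoding would start for a requested position. The function name `find_idr_access_point` and the list of IDR sample indices are hypothetical. The text says only that decoding starts from an IDR picture in the vicinity of the specified time point, so choosing the nearest preceding IDR picture is an assumption made here.

```python
from bisect import bisect_right


def find_idr_access_point(idr_indices, target_index):
    """Return the index of the last IDR picture at or before target_index.

    idr_indices must be sorted in decoding order. Picking the preceding
    IDR picture (rather than the following one) is an assumption.
    """
    pos = bisect_right(idr_indices, target_index)
    if pos == 0:
        raise ValueError("no IDR picture at or before the requested position")
    return idr_indices[pos - 1]


# Hypothetical stream with an IDR picture every 30 samples.
idr_indices = [0, 30, 60, 90]
start = find_idr_access_point(idr_indices, 75)
```

Decoding would then proceed from sample `start` up to the requested position, since no picture after an IDR picture refers to pictures before it.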

Further, in recent years, a method has been proposed in which, instead of using an IDR picture as the random access point, the whole picture is gradually recovered for playback by using refresh regions formed by I stripes. This method is called Gradual Random Access (GRA). Here, the decoding start image corresponding to the access point in GRA is called a GRA picture.

In the HEVC (High Efficiency Video Coding) standard, the image that serves as the start image at a random access point is treated as a sync sample in MPEG-4. Sync samples are stored in the sync sample box.

In the HEVC standard, GRA is realized as random access by the I stripe method, which requires adding some control of the coding process on the encoder side. This is intended to reduce the maximum coded size of a picture and to reduce the coding delay across encoding and transmission as a whole. Because an IDR picture has no reference, its coding efficiency is poor and it generates a large number of bits. In contrast, when the entire image is recovered using I slices, the maximum coded size can be suppressed because each picture has a reference.

On the other hand, for the VVC (Versatile Video Coding) standard, a method is being studied in which playback recovery using the refresh area formed by I stripes is handled by the standard itself rather than by an encoder-side restriction. This makes it possible to set a GRA picture as a random access point in VVC.

However, when playback is performed using GRA, the clean area, which is the area where playback recovery by the refresh area has been completed, can be correctly decoded and displayed, but it is difficult to correctly decode the dirty area, where playback recovery has not been completed. Therefore, in playback using GRA, the displayed image may be distorted. Suppressing the presentation of a distorted image to the user could be left to the device implementation, but the handling may then differ from device to device, and the appearance of the content becomes device-dependent. As a result, the quality of the user's viewing experience may be impaired.

Therefore, the present disclosure provides an information processing device, an information processing method, a reproduction processing device, and a reproduction processing method that provide a user with a high-quality viewing experience.

According to the present disclosure, the coding unit encodes an image in an image sequence to generate a coded stream. The determination unit determines one or more decoding start images in the image sequence that can be used as the image to start decoding at the time of Gradual Random Access (GRA). The file generation unit inserts GRA information regarding the decoding start image determined by the determination unit into the header area of the file format including the header area and the data area, and inserts the coded stream into the data area.

FIG. 1 is a system configuration diagram of an example of a distribution system.
FIG. 2 is a block diagram of a file generation device.
FIG. 3 is a diagram for explaining the picture display process at the time of GRA.
FIG. 4 is a diagram showing the GRA picture standard adopted in JVET-N0865.
FIG. 5 is a diagram showing an example of a sample group of GRA pictures.
FIG. 6 is a diagram showing a storage example of the GraSyncSampleGroupBox.
FIG. 7 is a diagram showing the storage state of the GraSyncSampleGroupBox according to the presence or absence of movie fragments.
FIG. 8 is a block diagram of a client device.
FIG. 9 is a flowchart of the file generation process performed by the file generation device.
FIG. 10 is a flowchart of the reproduction process executed by the client device.
FIG. 11 is a diagram showing an example of the syntax of GradualOutputStruct ().
FIG. 12 is a diagram showing an example of what each value of gradual_output_flag indicates.
FIG. 13 is a diagram showing an example of the syntax of GradualOutputInformationStruct ().
FIG. 14 is a diagram showing an example of what each value of gradual_output_type indicates.
FIG. 15 is a diagram showing a first example of GradualOutputInformationStruct () using another definition.
FIG. 16 is a diagram showing a second example of GradualOutputInformationStruct () using another definition.
FIG. 17 is a diagram showing a third example of GradualOutputInformationStruct () using another definition.
FIG. 18 is a diagram showing a fourth example of GradualOutputInformationStruct () using another definition.
FIG. 19 is a diagram showing an example of the syntax of InterpolationStruct ().
FIG. 20 is a diagram showing an example of what each value of interpolation_type indicates.
FIG. 21 is a diagram showing the format of the Matroska Media Container.
FIG. 22 is a hardware block diagram of a computer.

The embodiments of the present disclosure will be described in detail below with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, so that duplicate description will be omitted. Further, the scope disclosed in the present technology is not limited to the contents of the embodiment, but also includes the contents described in the following non-patent documents known at the time of filing.

Non-Patent Document 1: (above)
Non-Patent Document 2: ITU-T H.264. SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS. Infrastructure of audiovisual services - Coding of moving video. Advanced video coding for generic audiovisual services, 2017-04
Non-Patent Document 3: m48053, Versatile Video Coding (Draft 5), B. Bross, J. Chen, S. Liu, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Geneva, CH, 19-27 Mar. 2019
Non-Patent Document 4: m48054, Algorithm description for Versatile Video Coding and Test Model 5 (VTM 5), J. Chen, Y. Ye, S. Kim, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Geneva, CH, 19-27 Mar. 2019
Non-Patent Document 5: m47100, AHG12: Loop filter disabled across virtual boundaries, S.-Y. Lin, L. Liu, J.-L. Lin, Y.-C. Chang, C.-C. Ju (MediaTek), P. Hanhart, Y. He (InterDigital), Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Geneva, CH, 19-27 Mar. 2019
Non-Patent Document 6: m47986, Gradual Random Access, S. Deshpande (Sharp), Y.-K. Wang, Hendry (Huawei), R. Sjoberg, M. Pettersson (Ericsson), L. Chen (MediaTek), Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Geneva, CH, 19-27 Mar. 2019
Non-Patent Document 7: ISO/IEC 14496-12:2015 Information technology. Coding of audio-visual objects. Part 12: ISO base media file format, 2015-12
Non-Patent Document 8: ISO/IEC 14496-15:2017 Information technology. Coding of audio-visual objects. Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format, 2017-02

The contents described in the above-mentioned non-patent documents are also incorporated into this embodiment by reference. In other words, the contents described in the above-mentioned non-patent documents also serve as a basis for determining the support requirements. For example, even if the Quad-Tree Block Structure described in Non-Patent Document 2, the QTBT (Quad Tree Plus Binary Tree) Block Structure described in Non-Patent Document 3, and the MTT (Multi-type Tree) Block Structure described in Non-Patent Documents 4 and 5 are not directly defined in the detailed description of the invention, they are within the scope of the present disclosure and shall meet the support requirements of the claims. Similarly, technical terms such as Parsing, Syntax, and Semantics are within the scope of the present disclosure and shall meet the support requirements of the claims even if they are not directly defined in the detailed description of the invention.

In addition, the present disclosure will be described according to the order of items shown below.
1. First Embodiment
1.1 Modified example of the first embodiment (1)
1.2 Modified example of the first embodiment (2)
1.3 Modified example of the first embodiment (3)
2. Second Embodiment

[1. First Embodiment]
In HEVC, in addition to IDR pictures, CRA (Clean Random Access) pictures and BLA (Broken Link Access) pictures having a slight dependency can also be used as sync samples. A picture equivalent to IDR / CRA / BLA of HEVC can be stored in the sync sample box of the ISOBMFF file. On the other hand, the GRA picture is an image that displays a part of the entire display image, and it is not appropriate to treat it in the same manner as the IDR picture, and it is difficult to store it in the sync sample box.

It is also possible to store GRA_NUT in the sync sample group entry, which sets a sample group for each NAL unit type. However, it is difficult to store information specific to a NAL unit type in the sync sample group entry, so using such a sample group to store GRA picture information is not practical.

In addition, recovery_poc_cnt, which is the number of frames until recovery is completed, can be stored in the "roll" sample group of the ISOBMFF file and used as a roll-distance that represents the period until the complete image can be displayed. However, even with this method, how slices other than I slices are processed is implementation-dependent, so the appearance of the content becomes device-dependent. Therefore, the quality of the user's viewing experience may be impaired. As described above, with the existing structures it is difficult to store GRA picture information appropriately and to provide the user with a high-quality viewing experience through random access using GRA.

(Configuration of Distribution System According to First Embodiment)
FIG. 1 is a system configuration diagram of an example of a distribution system. The distribution system 100 includes a file generation device 1 which is an information processing device, a client device 2 which is a reproduction processing device, and a Web server 3. The file generation device 1, the client device 2, and the Web server 3 are connected to the network 4. Then, the file generation device 1, the client device 2, and the Web server 3 can communicate with each other via the network 4. Here, although each device is shown one by one in FIG. 1, the distribution system 100 may include a plurality of file generation devices 1 and a plurality of client devices 2, respectively.

The file generation device 1 generates video content which is data for providing video. The file generation device 1 uploads the generated video content to the Web server 3. Here, in the present embodiment, the configuration in which the Web server 3 provides the video content to the client device 2 will be described, but the distribution system 100 can adopt another configuration. For example, the file generation device 1 may include the functions of the Web server 3, store the generated video content in its own device, and provide it to the client device 2.

The Web server 3 holds the video content uploaded from the file generation device 1. Then, the Web server 3 provides the designated video content according to the request from the client device 2.

The client device 2 transmits a video content transmission request to the Web server 3. Then, the client device 2 acquires the video content specified in the transmission request from the Web server 3. Then, the client device 2 decodes the video content to generate a video, and displays the video on a display device such as a monitor.

(Configuration of File Generation Device According to First Embodiment)
Next, the details of the file generation device 1 will be described. FIG. 2 is a block diagram of the file generator. As shown in FIG. 2, the file generation device 1 which is an information processing device has a file generation processing unit 10, a control unit 11, and a transmission unit 12. The control unit 11 executes a process related to the control of the file generation processing unit 10. For example, the control unit 11 performs integrated control such as the operation timing of each unit of the file generation processing unit 10. The file generation processing unit 10 includes a data acquisition unit 101, an encoding unit 102, a metadata generation unit 103, a determination unit 104, and a file generation unit 105.

The data acquisition unit 101 accepts the input of the original data of the video content for displaying the video. The original data of the video content includes image data and control information for each image included in an image sequence, which is a series of images. The control information includes, for example, time information for each piece of image data. The data acquisition unit 101 outputs the image data included in the image sequence of the acquired video content to the encoding unit 102. Further, the data acquisition unit 101 outputs the control information included in the original data of the acquired video content to the metadata generation unit 103.

The encoding unit 102 receives input of the image data of each image included in the image sequence. Then, the encoding unit 102 encodes the image data of each image in the image sequence to generate a coded stream. At this time, the encoding unit 102 encodes the images so as to form the pictures 111 to 116 shown in FIG. 3, which realize playback recovery of the picture by GRA.

FIG. 3 is a diagram for explaining the picture display process at the time of GRA. The pictures 111 to 116 are images that enable playback of the picture to be recovered by using intra-stripe refresh areas. The refresh area is an area that can be reproduced without referring to other images. The clean area is an area that can be accurately reproduced by GRA. The dirty area is an area that refers to a picture before the start of GRA, and is therefore difficult to reproduce accurately after the start of GRA.

Pictures 111 to 116 have refresh areas 121 to 126, respectively. Refresh areas 121-126 include one or more slices. When all the refresh areas 121 to 126 are combined, an image that covers the entire area of one picture is obtained.

The picture 111 is the decoding start image at which decoding starts when playback is recovered by GRA, and is referred to here as a “GRA picture”. The playback start point of the picture 111 is the random access point in GRA. Further, the picture 116 is the picture at which the entire screen has been recovered by the GRA started from the picture 111, and the playback start point of the picture 116 is called a recovery point.

Picture 111 has an intra-stripe refresh area 121. In picture 111, the refresh area 121 directly corresponds to the clean area 131. The area other than the refresh area 121 is the dirty area 141.

Picture 112 has an intra-stripe refresh area 122. In the picture 112, the refresh area 121 of the picture 111 is referred to, and together with the refresh area 122, it becomes a clean area 132. The area other than the clean area 132 is the dirty area 142. The dirty area 142 is smaller than the dirty area 141 by the newly added refresh area 122.

Picture 113 has an intra-stripe refresh area 123. In the picture 113, the refresh area 121 of the picture 111 and the refresh area 122 of the picture 112 are referred to, and together with the refresh area 123, the clean area 133 is formed. The area other than the clean area 133 is the dirty area 143. The dirty area 143 is smaller than the dirty area 142 by the amount of the newly added refresh area 123.

After that, a refresh area is added every time the picture of each frame is played by GRA according to the passage of time, the clean area increases, and the dirty area decreases. Then, the picture 114 has a clean area 134 including a refresh area 124 and a dirty area 144. Further, the picture 115 has a clean area 135 including a refresh area 125 and a dirty area 145.

Then, the picture 116 serving as a recovery point has an intra-stripe refresh area 126. In the picture 116, the refresh areas 121 to 125 of the pictures 111 to 115 are referred to, and the clean area 136 is formed together with the refresh areas 126. In the picture 116, the clean area 136 is the entire screen of the picture, and there is no dirty area. As a result, the reproduction of the picture is completed in the picture 116. In this way, in GRA, the screen of the picture gradually returns to playback.
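The gradual growth of the clean area can be sketched with a small simulation. This is only an illustration under stated assumptions: the refresh areas are modeled as equally wide vertical stripes added from left to right, one per picture, and the function name `simulate_gra` and the picture width of 1920 are hypothetical.

```python
def simulate_gra(num_pictures, picture_width):
    """Return (clean_width, dirty_width) for each picture of one GRA roll.

    Assumes equally wide vertical refresh stripes, one per picture,
    added from the left edge to the right edge.
    """
    stripe = picture_width // num_pictures
    states = []
    clean = 0
    for i in range(num_pictures):
        # The last stripe absorbs any remainder so the clean area
        # covers the whole picture at the recovery point.
        clean = picture_width if i == num_pictures - 1 else clean + stripe
        states.append((clean, picture_width - clean))
    return states


# Six pictures, corresponding to pictures 111 to 116 in FIG. 3.
states = simulate_gra(num_pictures=6, picture_width=1920)
```

In this sketch the first entry corresponds to the GRA picture, whose clean area equals its own refresh area, and the last entry corresponds to the recovery point, where no dirty area remains.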

Here, the case where the clean area grows from the left end to the right end as viewed in the drawing has been described, but the arrangement of the refresh areas is not limited to this. For example, the refresh area may be an area extending in the horizontal direction of the picture as viewed in the drawing, and the clean area may grow from the bottom to the top. The shape and position of the refresh area are not particularly limited as long as it is a continuous area, and the order in which the clean area grows is not particularly limited either.

The period from the random access point, which is the playback start point of the picture 111, to the recovery point, which is the playback start point of the picture 116, is called a roll. GRA is adopted in JVET-N0865 in the form defined by the syntax shown in FIG. 4. FIG. 4 is a diagram showing the GRA picture standard adopted in JVET-N0865. The VCL (Video Coding Layer) NAL unit of the GRA picture is defined by JVET-N0865 as NalUnitType == GRA_NUT in line 151 of FIG. 4. Further, recovery_poc_cnt in line 152 of FIG. 4 is a value indicating the number of frames from the random access point until playback recovery of the picture is completed, and this value can be used as the roll.

The encoding unit 102 outputs a coded stream containing image data encoded so that GRA can be executed to the file generation unit 105. More specifically, a VCL buffer and a non-VCL buffer are provided between the encoding unit 102 and the file generation unit 105. The image data includes visual data, which is video, and audio data, which is audio. The visual data output from the encoding unit 102 is sent to the file generation unit 105 via the VCL buffer, and the audio data is sent to the file generation unit 105 via the non-VCL buffer.

The determination unit 104 checks the encoding result of the encoding unit 102. Then, the determination unit 104 identifies the GRA picture, which is the decoding start image in GRA, from among the pictures included in the coded stream. Further, the determination unit 104 identifies the random access point and the recovery point of the GRA executed from the identified GRA picture. Then, the determination unit 104 obtains the number of frames from the frame following the random access point to the frame of the recovery point as the roll. This number of frames corresponds to recovery_poc_cnt specified in JVET-N0865. After that, the determination unit 104 outputs the GRA picture information and the roll information to the file generation unit 105.
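The roll computation described above amounts to a simple frame count. The following sketch assumes that frames are numbered consecutively in decoding order; the function name `compute_roll` and the example indices are hypothetical and are not taken from JVET-N0865.

```python
def compute_roll(gra_index, recovery_index):
    """Count the frames from the frame after the random access point
    up to and including the recovery point frame.

    Assumes consecutive frame indices in decoding order.
    """
    if recovery_index < gra_index:
        raise ValueError("recovery point must not precede the GRA picture")
    # Frames gra_index + 1 .. recovery_index, inclusive.
    return recovery_index - gra_index


# Hypothetical example: GRA picture at frame 10, recovery point at frame 15.
roll = compute_roll(gra_index=10, recovery_index=15)
```

With six pictures in the roll, as in FIG. 3, the count from the second picture to the sixth is five frames.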

The metadata generation unit 103 receives input of control information from the data acquisition unit 101. Then, the metadata generation unit 103 generates metadata for image reproduction using the control information. The metadata includes control information related to image generation and reproduction such as what kind of codec is used for compression. The metadata generation unit 103 outputs the generated metadata to the file generation unit 105.

The file generation unit 105 receives, from the encoding unit 102, the input of the coded stream including the image data encoded so that GRA can be executed. Further, the file generation unit 105 receives the input of the metadata from the metadata generation unit 103. Further, the file generation unit 105 receives input of the GRA picture information and the roll information from the determination unit 104.

Then, the file generation unit 105 newly defines and generates a sample group of GRA pictures represented by the syntax of FIG. 5. FIG. 5 is a diagram showing an example of a sample group of GRA pictures.

Here, the file generation unit 105 generates GRA information representing information about the GRA picture. For example, the file generation unit 105 generates a sample group of GRA pictures as the GRA information. In that case, the file generation unit 105 generates GraSyncSampleGroupEntry (), which is a new group of VisualSampleGroup, as the sample group of GRA pictures. Then, the file generation unit 105 sets the information about GRA in GraSyncSampleGroupEntry ().

Specifically, the file generation unit 105 sets a GRA picture in GraSyncSampleGroupEntry (), and roll_distance in GraSyncSampleGroupEntry () represents the roll in GRA. For example, the file generation unit 105 sets the number of frames from the random access point of GRA to the recovery point as the value of roll_distance by using recovery_poc_cnt defined by JVET-N0865.

Further, the file generation unit 105 sets, as GradualOutputStruct () in GraSyncSampleGroupEntry (), gradual display permission information indicating whether to execute gradual display (Gradual output), in which the clean area is displayed so as to gradually expand. Whether to permit gradual display may be preset in the file generation device 1 by the user, or the file generation unit 105 may receive the input when setting the gradual display permission information.

Further, the file generation unit 105 acquires information on how the refresh area transitions in the picture by using the image data included in the acquired coded stream. Then, the file generation unit 105 generates gradual display type information representing the transition of the display of the refresh area from the acquired information. Then, the file generation unit 105 sets the display control information regarding the clean area when executing GRA as GradualOutputInformationStruct () in GraSyncSampleGroupEntry ().

Further, the file generation unit 105 sets dirty area interpolation information, which indicates how to interpolate the dirty area, as InterpolationStruct () in GraSyncSampleGroupEntry (). How to interpolate the dirty area may, for example, be preset in the file generation device 1 by the user, or the file generation unit 105 may receive the input when setting the dirty area interpolation information.
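The information gathered in GraSyncSampleGroupEntry () can be pictured as a small nested record. The sketch below is not the syntax of FIG. 5, which is not reproduced in this text. The structure and field names follow the names used in the description and the drawing list (roll_distance, gradual_output_flag, gradual_output_type, interpolation_type), while the Python attribute names, the integer types, and the example values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GradualOutputStruct:
    gradual_output_flag: int  # whether gradual display is permitted (value meaning assumed)


@dataclass
class GradualOutputInformationStruct:
    gradual_output_type: int  # how the refresh area transitions (value meaning assumed)


@dataclass
class InterpolationStruct:
    interpolation_type: int  # how the dirty area is interpolated (value meaning assumed)


@dataclass
class GraSyncSampleGroupEntry:
    roll_distance: int  # frames from the random access point to the recovery point
    gradual_output: GradualOutputStruct
    gradual_output_information: GradualOutputInformationStruct
    interpolation: InterpolationStruct


# Hypothetical entry: roll of 5 frames, gradual display permitted.
entry = GraSyncSampleGroupEntry(
    roll_distance=5,
    gradual_output=GradualOutputStruct(gradual_output_flag=1),
    gradual_output_information=GradualOutputInformationStruct(gradual_output_type=0),
    interpolation=InterpolationStruct(interpolation_type=0),
)
```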

The file generation unit 105 creates a file by storing the generated GRA picture sample group for each segment in the ISOBMFF file together with the image data and metadata included in the coded stream, and generates a segment file of the video content. Specifically, the file generation unit 105 generates an ISOBMFF file including video information (mdat) and management information (moov). mdat is a data area in the ISOBMFF file. Further, moov is a header area in ISOBMFF.
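To make the header area and data area concrete, the following sketch serializes two top-level ISOBMFF boxes. It relies only on the basic box layout of the ISO base media file format, a 32-bit big-endian size that includes the 8-byte header followed by a four-character type. The helper name `make_box` is hypothetical, and the empty moov payload is a placeholder, so the result is not a playable file.

```python
import struct


def make_box(box_type, payload=b""):
    """Serialize one ISOBMFF box: 32-bit size, 4-character type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four bytes")
    # The size field counts the 8-byte header plus the payload.
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


# Header area (management information); real files carry child boxes here.
moov = make_box(b"moov")
# Data area holding the coded stream; placeholder bytes stand in for it.
mdat = make_box(b"mdat", b"\x00\x01\x02\x03")
file_bytes = moov + mdat
```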

Then, the file generation unit 105 stores the GRA information, which is information about the GRA picture, in the moov of the ISOBMFF file. Specifically, the file generation unit 105 sets the GraSyncSampleGroupBox, which stores GraSyncSampleGroupEntry (), in the moov of the ISOBMFF file. For example, the file generation unit 105 sets the GraSyncSampleGroupBox in the BOX 161 within the moov indicated by the BOX 160, as shown in FIG. 6. FIG. 6 is a diagram showing a storage example of the GraSyncSampleGroupBox.

Here, an MPEG-4 file may either not use movie fragments, which divide one piece of content into a plurality of parts, or use them. FIG. 7 is a diagram showing the storage state of the GraSyncSampleGroupBox according to the presence or absence of movie fragments.

When movie fragments are not used, as shown in the file 170 of FIG. 7, one moov and one mdat exist for one piece of video content. In this case, the file generation unit 105 stores one GraSyncSampleGroupBox, indicated by BOX 171, in the moov.

On the other hand, when movie fragments are used, as shown in the file 180 of FIG. 7, one moov and a plurality of moof and mdat pairs exist for one piece of video content. In this case, the file generation unit 105 stores one GraSyncSampleGroupBox in each moof, as indicated by BOX 181 to 183.
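The two storage cases can be summarized in a short sketch. The function name `place_gra_boxes` and the string labels for the containers are hypothetical. Only the rule itself, one box in the moov without fragments and one box per moof with fragments, comes from the description.

```python
def place_gra_boxes(fragmented, num_fragments=0):
    """Return (container, box) pairs describing where each
    GraSyncSampleGroupBox is stored."""
    if not fragmented:
        # One moov and one mdat: a single box goes into the moov.
        return [("moov", "GraSyncSampleGroupBox")]
    # One moov plus several moof/mdat pairs: one box per moof.
    return [(f"moof[{i}]", "GraSyncSampleGroupBox") for i in range(num_fragments)]


single = place_gra_boxes(fragmented=False)
per_fragment = place_gra_boxes(fragmented=True, num_fragments=3)
```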

Returning to FIG. 2, the description continues. After that, the file generation unit 105 outputs the segment file of the video content, including the sample group of the GRA picture, to the transmission unit 12.

The transmission unit 12 receives the input of the video data segment file from the file generation unit 105. Then, the transmission unit 12 uploads the acquired video data segment file to the Web server 3.

(Configuration of client device)
FIG. 8 is a block diagram of the client device. As shown in FIG. 8, the client device 2 has a reproduction processing unit 20 and a control unit 21.

The control unit 21 controls the operation of each unit of the reproduction processing unit 20. For example, the control unit 21 comprehensively controls the operation timing of each unit of the reproduction processing unit 20. Further, the control unit 21 receives an input of a command from the operator. Then, the control unit 21 controls the reproduction processing unit 20 according to a command input from the user using an input device (not shown).

For example, the control unit 21 receives an input of a random access instruction. Then, the control unit 21 causes the reproduction processing unit 20 to execute the random access. At that time, the control unit 21 causes the file processing unit 202 to determine whether or not the random access sample is GRA, and determines whether to execute GRA as random access or normal decoding processing. In this normal decoding process, random access using the IDR picture is executed.

The reproduction processing unit 20 decodes and displays the image data. Further, when the operator instructs the random access, the reproduction processing unit 20 receives the control from the control unit 21 and executes the random access. The details of the reproduction processing unit 20 will be described below. The reproduction processing unit 20 includes a file acquisition unit 201, a file processing unit 202, a GRA information acquisition unit 203, a decoding processing unit 204, a display information generation unit 205, and a display unit 206.

The file acquisition unit 201 acquires the segment file of the video content to be reproduced from the Web server 3 according to the video reproduction instruction input from the user. Then, the file acquisition unit 201 outputs the segment file of the acquired video content to the file processing unit 202.

The file processing unit 202 receives the input of the segment file in which the data of the video content to be played is stored from the file acquisition unit 201. The file processing unit 202 parses the acquired segment file. Then, the file processing unit 202 acquires image data and metadata. After that, the file processing unit 202 outputs the image data to the decoding processing unit 204. Further, the file processing unit 202 outputs the metadata to the display information generation unit 205.

Further, when the user inputs a random access instruction, the file processing unit 202 receives an instruction from the control unit 21 to check the random access sample. Then, the file processing unit 202 checks whether a sample group of the GRA picture represented by the GraSyncSampleGroupEntryBox exists, checks whether the random access sample is GRA, and determines whether to use GRA for the random access.

When GRA is not used, the normal decoding process is executed. In this case, the file processing unit 202 identifies the IDR picture that is the random access point corresponding to the random access specified by the user. Then, the file processing unit 202 outputs the image data starting from the identified IDR picture to the decoding processing unit 204, so that random access using the IDR picture is executed.

On the other hand, when GRA is used, the file processing unit 202 outputs the GraSyncSampleGroupEntryBox, which is a sample group of GRA pictures, to the GRA information acquisition unit 203. Further, the file processing unit 202 identifies a GRA picture that is a random access point corresponding to the random access specified by the user. Then, the file processing unit 202 outputs the image data following from the specified GRA picture, and instructs the decoding processing unit 204 to execute the GRA.

The GRA information acquisition unit 203 acquires a GraSyncSampleGroupEntryBox, which is a sample group of GRA pictures, from the file processing unit 202 when a random access instruction is input by the user and GRA is used. Then, the GRA information acquisition unit 203 acquires the information of GradualOutputStruct (), the information of GradualOutputInformationStruct (), the information of IntarpolationSturct (), and the information of roll_disntance from GraSyncSampleGroupEntry () which is a sample group of GRA pictures.

Next, the GRA information acquisition unit 203 determines whether or not gradual display is permitted from the value of GradualOutputStruct (). When the gradual display is permitted, the GRA information acquisition unit 203 acquires the gradual display type information from the value of GradualOutputInformationStruct (). Further, the GRA information acquisition unit 203 acquires the dirty area interpolation information from the value of InterpolationStruct (). Further, the GRA information acquisition unit 203 acquires, from the value of roll_distance, the roll, which is the number of frames from the picture next to the GRA picture to the picture serving as the recovery point. Then, the GRA information acquisition unit 203 outputs the gradual display type information, the dirty area interpolation information, and the roll information to the decoding processing unit 204 together with the instruction for gradual display.

On the other hand, when the gradual display is not permitted, the GRA information acquisition unit 203 acquires, from the value of roll_distance, the roll, which is the number of frames from the frame next to the GRA picture to the picture serving as the recovery point. Then, the GRA information acquisition unit 203 outputs to the decoding processing unit 204, together with the roll information, an instruction for display after full-screen decoding, in which display starts only after the entire screen of the picture can be decoded as a clean area.
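As a non-limiting illustration, the branching performed by the GRA information acquisition unit 203 might be sketched in Python as follows. The names GraEntry, DecodeInstruction, and build_instruction are hypothetical and are introduced only for this sketch. The sketch also assumes the flag semantics of the example of FIG. 12, in which a gradual_output_flag value of 0 means that gradual display is permitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraEntry:
    """Hypothetical in-memory view of GraSyncSampleGroupEntry ()."""
    gradual_output_flag: int  # assumed: 0 = gradual display permitted, 1 = prohibited
    gradual_output_type: int  # transition pattern of the refresh area
    interpolation_type: int   # how the dirty area is interpolated
    roll_distance: int        # frames from the frame after the GRA picture to the recovery point


@dataclass
class DecodeInstruction:
    """Hypothetical message passed to the decoding processing unit 204."""
    mode: str                            # "gradual" or "after_full_decode"
    roll: int
    display_type: Optional[int] = None   # only set for gradual display
    interpolation: Optional[int] = None  # only set for gradual display


def build_instruction(entry: GraEntry) -> DecodeInstruction:
    # Gradual display permitted: pass type, interpolation, and roll together.
    if entry.gradual_output_flag == 0:
        return DecodeInstruction(
            mode="gradual",
            roll=entry.roll_distance,
            display_type=entry.gradual_output_type,
            interpolation=entry.interpolation_type,
        )
    # Gradual display prohibited: only the roll is needed to find the recovery point.
    return DecodeInstruction(mode="after_full_decode", roll=entry.roll_distance)
```

In this sketch the roll is always forwarded, because both display modes need it to locate the recovery point.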

The decoding processing unit 204 receives the input of image data from the file processing unit 202. Then, the decoding processing unit 204 performs a decoding process on the acquired image data. After that, the decoding processing unit 204 outputs the decoded image data to the display information generation unit 205.

Further, when an instruction to execute random access is input from the user and GRA is not used for the random access, the decoding processing unit 204 receives from the file processing unit 202 the image data starting from the IDR picture serving as the random access point. Then, the decoding processing unit 204 decodes the image data starting from the IDR picture and outputs the decoded image data to the display information generation unit 205. As a result, the image is reproduced from the IDR picture that is the random access point designated by the user, and the random access is executed.

On the other hand, when an instruction to execute random access is input from the user and GRA is used for the random access, the decoding processing unit 204 receives from the file processing unit 202 the image data starting from the GRA picture serving as the random access point.

Further, when the gradual display is permitted, the decoding processing unit 204 receives the following information from the GRA information acquisition unit 203 together with the instruction for gradual display: the gradual display type information, the dirty area interpolation information, and the roll information, which is the number of frames from the frame next to the GRA picture to the picture serving as the recovery point.

Next, the decoding processing unit 204 identifies what kind of gradual display is to be performed from the roll information and the gradual display type information. Then, the decoding processing unit 204 starts decoding from the image data of the GRA picture. After that, the decoding processing unit 204 decodes each frame so that the gradual display is executed while interpolating the dirty area according to the dirty area interpolation information until the last frame of the gradual display.

The decoding processing unit 204 sequentially outputs the image data from the decoded GRA picture to the last frame of the subsequent gradual display to the display information generation unit 205. After the output of the last frame of the gradual display is completed, the decoding processing unit 204 returns to the normal decoding processing. Then, the decoding processing unit 204 continues to output the image data obtained by normal decoding to the display information generation unit 205.

On the other hand, when the gradual display is not permitted, the decoding processing unit 204 receives the instruction for display after full-screen decoding from the GRA information acquisition unit 203 together with the roll information. Then, the decoding processing unit 204 starts decoding from the GRA picture. Next, the decoding processing unit 204 identifies the last picture in the GRA from the number of frames specified as the roll. Then, the decoding processing unit 204 executes decoding up to the identified last picture, and generates decoded image data in which the entire screen of the picture is a clean area. After that, the decoding processing unit 204 outputs the decoded image data in which the entire screen of the picture is a clean area to the display information generation unit 205.
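The difference between the two display modes can be illustrated by listing which decoded frames are handed to the display information generation unit 205 during GRA. The following Python sketch is only an illustration under stated assumptions: the function name frames_to_display is hypothetical, frames are identified by consecutive integer indices, and the recovery point is assumed to be located at the index of the GRA picture plus the roll.

```python
from typing import List


def frames_to_display(gra_index: int, roll: int, gradual: bool) -> List[int]:
    """Return the indices of frames output for display while GRA is in progress.

    Assumption for this sketch: the recovery point is at gra_index + roll,
    i.e. the roll counts frames starting with the one after the GRA picture.
    """
    recovery = gra_index + roll
    if gradual:
        # Gradual display: every frame from the GRA picture onward is output,
        # with the dirty area interpolated until the recovery point is reached.
        return list(range(gra_index, recovery + 1))
    # Display after full-screen decoding: frames before the recovery point are
    # decoded but not output; display starts with the fully clean picture.
    return [recovery]
```

For example, with a GRA picture at index 10 and a roll of 3, this sketch outputs frames 10 to 13 in the gradual case and only frame 13 otherwise. After the recovery point, normal decoding and output continue in both cases.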

The display information generation unit 205 receives the input of the decoded image data from the decoding processing unit 204. Further, the display information generation unit 205 receives the input of metadata from the file processing unit 202. Then, the display information generation unit 205 generates a display image from the image data by using the information at the time specified in the metadata. After that, the display information generation unit 205 provides the generated display image to the display unit 206 for display.

In particular, when GRA is used and gradual display is performed, the display information generation unit 205 receives input of image data following from the GRA picture that performs gradual display from the decoding processing unit 204. Then, the display information generation unit 205 generates a display image for gradual display according to the display method of the gradual display in which the clean area is designated and the processing method of the dirty area. Then, the display information generation unit 205 outputs the generated display image to the display unit 206 for display, thereby performing gradual display.

At that time, the display information generation unit 205 presents the user with information indicating that the display is gradual display. For example, the display information generation unit 205 may display information indicating that the gradual display is being displayed on the display unit 206.

Further, when GRA is used and gradual display is not performed, the display information generation unit 205 receives the input of the image data decoded by using the entire screen of the picture as a clean area from the decoding processing unit 204. Then, the display information generation unit 205 generates a display image in which the reproduction / return of the entire screen is completed. Then, the display information generation unit 205 outputs the generated display image to the display unit 206 and displays it, so that the video content is displayed from the state in which the reproduction / return of the entire screen is completed.

The display unit 206 has a display device such as a monitor. The display unit 206 receives the input of the display image generated by the display information generation unit 205. Then, the display unit 206 causes the display device to display the acquired display image.

(File generation procedure according to the first embodiment)
Next, with reference to FIG. 9, the flow of the file generation process by the file generation device 1 will be described in detail. FIG. 9 is a flowchart of a file generation process by the file generation device.

The data acquisition unit 101 acquires the original data of the video content from the Web server 3. The original data includes image data and control information of a plurality of images. Then, the data acquisition unit 101 outputs the image data included in the acquired original data to the coding unit 102. Further, the data acquisition unit 101 outputs the control information included in the acquired original data to the metadata generation unit 103. The coding unit 102 receives an input of image data from the data acquisition unit 101. Then, the coding unit 102 executes the coding of the image data so that the GRA can be executed (step S101). Then, the coding unit 102 outputs the coded image data to the file generation unit 105. Further, the metadata generation unit 103 generates metadata from the control information input from the data acquisition unit 101 and outputs the metadata to the file generation unit 105.

The determination unit 104 identifies, from the image data encoded by the coding unit 102, the GRA picture and the roll, which is the number of frames from the frame next to the GRA picture to the frame of the recovery point. After that, the determination unit 104 outputs the GRA picture information and the roll information to the file generation unit 105. The file generation unit 105 receives the input of image data from the coding unit 102. Further, the file generation unit 105 receives the input of the GRA picture information and the roll information from the determination unit 104. Then, the file generation unit 105 newly defines the GraSyncSampleGroupEntryBox as a sample group of the GRA picture. Next, the file generation unit 105 sets the roll information in the roll_distance of GraSyncSampleGroupEntry () (step S102).

Next, the file generation unit 105 acquires the transition information of the refresh area using the image data (step S103).

Then, the file generation unit 105 generates the gradual display type information from the transition information of the refresh area and sets it as the GradualOutputInformationStruct () of GraSyncSampleGroupEntry () (step S104).

Next, the file generation unit 105 sets the gradual display permission information as the GradualOutputStruct () of GraSyncSampleGroupEntry (). Further, the file generation unit 105 sets the dirty area interpolation information as the InterpolationStruct () of GraSyncSampleGroupEntry () (step S105).

After that, the file generation unit 105 sets the GraSyncSampleGroupEntryBox in the moov including other management information in the ISOBMFF file (step S106).

Next, the file generation unit 105 generates a segment file of the video content that includes mdat, which is the video information, and moov, which is the management information, or a segment file of the video content that includes mdat, which is the video information, and moov and moof, which are the management information (step S107). The transmission unit 108 uploads the segment file of the video content generated by the file generation unit 105 to the Web server 3.
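As a rough illustration of steps S102 to S105, the values set in GraSyncSampleGroupEntry () could be serialized as in the following Python sketch. The byte layout used here (three unsigned 8-bit fields followed by a signed 16-bit big-endian roll_distance) is purely an assumption for illustration; the actual field widths and order are those of the syntax in the figures. The function names pack_gra_entry and unpack_gra_entry are hypothetical.

```python
import struct

# Assumed layout for this sketch only: flag (u8), type (u8), interpolation (u8),
# roll_distance (signed 16-bit), all big-endian with no padding.
_LAYOUT = ">BBBh"


def pack_gra_entry(gradual_output_flag: int, gradual_output_type: int,
                   interpolation_type: int, roll_distance: int) -> bytes:
    """Serialize the four values set in steps S102 to S105 into a payload."""
    return struct.pack(_LAYOUT, gradual_output_flag, gradual_output_type,
                       interpolation_type, roll_distance)


def unpack_gra_entry(payload: bytes) -> dict:
    """Parse a payload produced by pack_gra_entry back into named fields."""
    flag, out_type, interp, roll = struct.unpack(_LAYOUT, payload)
    return {
        "gradual_output_flag": flag,
        "gradual_output_type": out_type,
        "interpolation_type": interp,
        "roll_distance": roll,
    }
```

A client-side parser such as the file processing unit 202 would perform the inverse operation when it reads the sample group from moov.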

(Reproduction processing procedure according to the first embodiment)
Next, with reference to FIG. 10, the flow of the reproduction process executed by the client device 2 will be described. FIG. 10 is a flowchart of the reproduction process executed by the client device.

The file acquisition unit 201 acquires the segment file of the video content to be played back from the Web server 3. The file processing unit 202 parses the segment file of the video content acquired by the file acquisition unit 201. Then, the file processing unit 202 outputs the image data to the decoding processing unit 204. Further, the file processing unit 202 outputs the metadata to the display information generation unit 205. The decoding processing unit 204 decodes the acquired image data and outputs it to the display information generation unit 205. The display information generation unit 205 generates a display image using the image data and the metadata, and outputs the display image to the display unit 206 for display. The control unit 21 determines whether or not a random access instruction has been detected (step S201). When the random access instruction is not detected (step S201: negative), the control unit 21 causes the file acquisition unit 201 to continue the process as it is. Then, the reproduction process proceeds to step S208.

On the other hand, when the random access instruction is detected (step S201: affirmative), the control unit 21 instructs the file processing unit 202 to execute the random access. The file processing unit 202 determines whether or not the random access sample is GRA (step S202).

When the random access sample is GRA (step S202: affirmative), the file processing unit 202 transmits the GraSyncSampleGroupEntryBox to the GRA information acquisition unit 203. The GRA information acquisition unit 203 acquires the information of the GraSyncSampleGroupEntryBox (step S203). Specifically, the GRA information acquisition unit 203 acquires the GRA picture information, the gradual display permission information, the gradual display type information, the dirty area interpolation information, and the roll information.

Then, the GRA information acquisition unit 203 determines whether or not the gradual display is permitted by using the gradual display permission information (step S204).

When the gradual display is permitted (step S204: affirmative), the GRA information acquisition unit 203 outputs the GRA picture information, the gradual display type information, the dirty area interpolation information, and the roll information to the decoding processing unit 204. The decoding processing unit 204 decodes the image data starting from the GRA picture so that the images are displayed in the display order indicated by the gradual display type information while the dirty area is interpolated according to the dirty area interpolation information, and outputs the decoded image data to the display information generation unit 205. Then, the display information generation unit 205 generates a display image for gradual display using the image data acquired from the decoding processing unit 204, and provides it to the display unit 206 for display. At that time, the display information generation unit 205 presents the user with information indicating that the display is gradual display (step S205).

After that, the decoding processing unit 204 determines whether or not the gradual display is completed by using the roll information and the like (step S206). If the gradual display is not completed (step S206: negative), the video reproduction process returns to step S205. On the other hand, when the gradual display is completed (step S206: affirmative), the video reproduction process returns to step S201.

On the other hand, when the gradual display is prohibited (step S204: negative), the GRA information acquisition unit 203 outputs the GRA picture information and the roll information to the decoding processing unit 204 together with the instruction for display after full-screen decoding. The decoding processing unit 204 decodes from the GRA picture in response to the instruction for display after full-screen decoding, and uses the roll information to confirm that the entire screen of the picture can be decoded as a clean area. Then, after the entire screen of the picture has been decoded as a clean area, the decoding processing unit 204 outputs the image data in which the entire screen of the picture is decoded as a clean area to the display information generation unit 205. The display information generation unit 205 generates a display image from the image data in which the entire screen of the picture is decoded as a clean area, and provides it to the display unit 206 for display (step S207). After that, the video reproduction process returns to step S201.

On the other hand, if the random access sample is not GRA (step S202: negative), the video playback process proceeds to step S208.

The file processing unit 202, the decoding processing unit 204, the display information generation unit 205, and the display unit 206 execute normal decoding and display on the input image (step S208). Here, in the case of random access, random access using an IDR picture is performed in normal decoding.

After that, the file processing unit 202, the decoding processing unit 204, and the display information generation unit 205 determine whether or not all the image data of the video content has been decoded (step S209). If the image data to be decoded remains (step S209: negative), the video reproduction process returns to step S201. On the other hand, when the decoding of all the image data of the video content is completed (step S209: affirmative), the file processing unit 202, the decoding processing unit 204, and the display information generation unit 205 end the video reproduction processing.
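The branching of the flowchart of FIG. 10 described above can be summarized by the following Python sketch, which returns the sequence of steps visited for one pass through the flow. The function name steps_for_request and its parameters are hypothetical and are used only to illustrate the control flow of steps S201 to S209; the repetition of steps S205 and S206 until the gradual display is completed is shown as a single pass.

```python
from typing import List


def steps_for_request(random_access: bool,
                      sample_is_gra: bool = False,
                      gradual_permitted: bool = False) -> List[str]:
    """Return the flowchart steps visited for one pass of the playback loop."""
    steps = ["S201"]                      # random access instruction detected?
    if not random_access:
        return steps + ["S208", "S209"]   # normal decoding and display
    steps.append("S202")                  # is the random access sample GRA?
    if not sample_is_gra:
        return steps + ["S208", "S209"]   # random access using an IDR picture
    steps += ["S203", "S204"]             # read sample group, check permission
    if gradual_permitted:
        steps += ["S205", "S206"]         # gradual display, repeated until done
    else:
        steps.append("S207")              # display after full-screen decoding
    return steps                          # both GRA branches return to S201
```

For example, steps_for_request(True, True, False) yields the path through step S207, which corresponds to display after full-screen decoding.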

As described above, the file generation device according to the present embodiment encodes the image data so that GRA can be executed, identifies the GRA picture from the encoded image data, newly defines a sample group of the GRA picture, and stores the roll information in it. As a result, the maximum coding amount is suppressed, the delay in the coding process and the transmission process is reduced, image distortion due to reproduction of the dirty area is prevented, and GRA can be executed properly with the same content reproduced on any reproduction device. That is, the file generation device according to the present embodiment can provide the user with a high-quality viewing experience.

[1.1 Modifications of the First Embodiment (1)]
Next, a modified example (1) of the first embodiment will be described. In this modification, the gradual display permission information will be described in detail. FIG. 11 is a diagram showing an example of the syntax of GradualOutputStruct ().

The file generation unit 105 generates a GradualOutputStruct () that stores the gradual display permission information in the GRA sample group by using the syntax shown in FIG. 11.

Specifically, the file generation unit 105 stores the gradual_output_flag in the GradualOutputStruct () as shown in FIG. 11. Then, the file generation unit 105 defines the value of the gradual_output_flag as shown in FIG. 12. FIG. 12 is a diagram of an example of the contents indicated by each value of gradual_output_flag. For example, the file generation unit 105 defines that the gradual display is valid when the value of gradual_output_flag is 0, and that the gradual display is invalid when the value of gradual_output_flag is 1. That the gradual display is valid means that execution of the gradual display is permitted at the time of random access. On the other hand, that the gradual display is invalid means that execution of the gradual display is prohibited at the time of random access. Then, the file generation unit 105 sets the generated GradualOutputStruct () in the GraSyncSampleGroupEntryBox to generate an ISOBMFF file.

Here, in this embodiment, a flag called gradual_output_flag is newly defined to indicate permission or prohibition of gradual display, but the method of setting the gradual display permission information is not limited to this. For example, the file generation unit 105 may set the picture serving as the recovery point in GraSyncSampleGroupEntry () and explicitly prohibit the gradual display by setting, as the roll, the number of frames between the GRA picture and the picture one frame before the picture serving as the recovery point.

As described above, the file generation device according to this embodiment stores information indicating either permission or prohibition of gradual display by using the flag set in GradualOutputStruct (). By making it possible to prohibit the gradual display in this way, execution of the interpolation processing can be avoided, and problems such as image distortion caused by decoding at the time of random access can also be avoided. GRA is premised on gradual display because it aims to reduce the delay in the coding process and the transmission process by suppressing the maximum coding amount. Nevertheless, image distortion can be suppressed in accordance with the user's request, so that a viewing experience that meets the user's needs can be provided.

[1.2 Modification example (2) of the first embodiment]
Next, a modified example (2) of the first embodiment will be described. In this modification, the gradual display type information will be described in detail. FIG. 13 is a diagram showing an example of the syntax of GradualOutputInformationStruct ().

The file generation unit 105 generates a GradualOutputInformationStruct () that stores the gradual display type information in the GRA sample group by using the syntax shown in FIG. 13.

Specifically, the file generation unit 105 stores the gradual_output_type in the GradualOutputInformationStruct () as shown in FIG. 13. Then, the file generation unit 105 defines the value of gradual_output_type as shown in FIG. 14. FIG. 14 is a diagram of an example of the contents indicated by each value of gradual_output_type.

For example, the file generation unit 105 defines that when the value of gradual_output_type is 0, it indicates that the refresh area moves from left to right or right to left on the picture display screen. In this case, the image is gradually displayed from left to right or from right to left on the screen. Further, the file generation unit 105 defines that when the value of gradual_output_type is 1, it indicates that the refresh area moves from the top to the bottom or from the bottom to the top of the picture display screen. In this case, the image is gradually displayed from the top to the bottom of the screen or from the bottom to the top. Further, the file generation unit 105 defines that when the value of gradual_output_type is 2, it means that the refresh area moves from the center to the edge of the picture display screen. In this case, the image is gradually displayed from the center of the screen toward the outer edge of the screen. Further, the file generation unit 105 defines that when the value of gradual_output_type is 3, the refresh area moves in the order of the raster scan of the picture display screen. In this case, the images are gradually displayed in the order of raster scan. Further, the file generation unit 105 defines that when the value of gradual_output_type is 4, it means that the refresh area moves randomly on the picture display screen. In this case, the images are displayed randomly and gradually. Further, the file generation unit 105 sets the value of gradual_output_type to 5 when the transition order of the refresh area is not specified.
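The value assignments described above can be written down as a simple enumeration. The following Python sketch is one possible representation on the client side; the member names and the function describe_output_type are hypothetical, and only the numeric values 0 to 5 and their meanings are taken from the description.

```python
from enum import IntEnum


class GradualOutputType(IntEnum):
    """Values of gradual_output_type; member names are illustrative only."""
    HORIZONTAL = 0    # refresh area moves left to right or right to left
    VERTICAL = 1      # refresh area moves top to bottom or bottom to top
    CENTER_OUT = 2    # refresh area moves from the center toward the edge
    RASTER_SCAN = 3   # refresh area moves in raster scan order
    RANDOM = 4        # refresh area moves randomly
    UNSPECIFIED = 5   # transition order of the refresh area is not specified


def describe_output_type(value: int) -> str:
    """Map a parsed gradual_output_type value to a readable name."""
    try:
        return GradualOutputType(value).name
    except ValueError:
        # Values outside 0 to 5 are treated as reserved in this sketch.
        return "RESERVED"
```

A client could use such a mapping to choose, before decoding, where to place a notice that gradual display is in progress.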

Here, in this embodiment, the six types of gradual display patterns shown in FIG. 14, including the unspecified one, have been described. However, the file generation unit 105 may define other patterns for gradual_output_type, as long as they can be expressed by the bits of the field.

As described above, the file generation device according to this embodiment stores the gradual display type information, which indicates how the gradual display is performed, using the value set in GradualOutputInformationStruct (). By storing the gradual display type information in the sample group of the GRA picture in this way, the client device can grasp in advance, before decoding and without analyzing parameter_set and slice_header, what kind of gradual display will be performed. This makes it easy to distinguish the accurately decoded area from the other areas, and the client device can easily identify an area suitable for displaying other information, such as information indicating that the gradual display is being executed.

Furthermore, the file generation unit 105 can also define GradualOutputInformationStruct () using information other than gradual_output_type. FIG. 15 is a diagram showing a first example of GradualOutputInformationStruct () using other definitions. Here, the case where the gradual display is gradually displayed at a constant ratio and linearly will be described.

In this case, the file generation unit 105 stores the information of the display area of the clean area that is output first in GradualOutputInformationStruct () as shown in FIG. 15. In FIG. 15, first_output_clean_region_x, first_output_clean_region_y, first_output_clean_region_width, and first_output_clean_region_height represent the x-coordinate, y-coordinate, width, and height of the reference point of the display area of the clean area that is output first, respectively.

When the gradual display is performed linearly at a constant ratio, the client device 2 can identify how the gradual display is performed from the roll information once it knows the display area of the clean area that is output first. Therefore, the file generation unit 105 can also set the gradual display type information in the GradualOutputInformationStruct () by using the syntax shown in FIG. 15.

FIG. 16 is a diagram showing a second example of GradualOutputInformationStruct () using another definition. Here, too, the case where the gradual display is gradually displayed at a constant ratio and linearly will be described.

In this case, the file generation unit 105 stores the information of the display areas of the first and last refresh areas in GradualOutputInformationStruct () as shown in FIG. 16. In FIG. 16, first_output_refresh_region_x, first_output_refresh_region_y, first_output_refresh_region_width, and first_output_refresh_region_height represent the x-coordinate, y-coordinate, width, and height of the reference point of the display area of the refresh area that is output first, respectively. Further, last_output_refresh_region_x, last_output_refresh_region_y, last_output_refresh_region_width, and last_output_refresh_region_height in FIG. 16 represent the x-coordinate, y-coordinate, width, and height of the reference point of the display area of the refresh area that is output last, respectively.

When the gradual display is performed linearly at a constant ratio, the client device 2 can identify how the gradual display is performed from the roll information once it knows the display areas of the first and last refresh areas. Therefore, the file generation unit 105 can also set the gradual display type information in the GradualOutputInformationStruct () by using the syntax shown in FIG. 16.
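One way a client could derive the intermediate refresh areas from the first and last refresh areas and the roll is linear interpolation, as in the following Python sketch. This is an illustration under assumptions, not the method defined by the figures: the function name interpolate_refresh_regions is hypothetical, the GRA picture is taken as frame 0 and carries the first refresh area, and the frame at index roll carries the last one.

```python
from typing import List, Tuple

# (x, y, width, height) of a region, in pixels.
Region = Tuple[int, int, int, int]


def interpolate_refresh_regions(first: Region, last: Region, roll: int) -> List[Region]:
    """Linearly interpolate the refresh area for frames 0..roll.

    Assumption for this sketch: frame 0 (the GRA picture) uses `first`
    and frame `roll` uses `last`, with a constant change per frame.
    """
    if roll <= 0:
        return [first]
    regions = []
    for i in range(roll + 1):
        # Each of x, y, width, height moves a constant step per frame.
        region = tuple(a + (b - a) * i // roll for a, b in zip(first, last))
        regions.append(region)
    return regions
```

For a 1920-pixel-wide picture with a 160-pixel-wide refresh column and a roll of 11, this sketch moves the column by 160 pixels per frame from the left edge to the right edge.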

FIG. 17 is a diagram showing a third example of GradualOutputInformationStruct () using other definitions. Here, a case will be described in which, in the gradual display, the amount of information for each frame increases monotonically in the broad sense (that is, it never decreases), but the amount of increase is not a constant ratio.

In this case, as shown in FIG. 17, the file generation unit 105 stores the information of all the clean areas of each frame used in GRA in the GradualOutputInformationStruct () as table information. In FIG. 17, first_output_clean_region_x, first_output_clean_region_y, first_output_clean_region_width, and first_output_clean_region_height represent the x-coordinate, y-coordinate, width, and height of the reference point of the display area of the clean area in the i-th frame, with the GRA picture as the 0th frame, respectively.

When the gradual display is performed such that the amount of information for each frame increases monotonically in the broad sense but the amount of increase is not a constant ratio, the client device 2 can identify how the gradual display is performed if it can grasp the clean area of each frame. Therefore, the file generation unit 105 can also set the gradual display type information in the GradualOutputInformationStruct () by using the syntax shown in FIG. 17. By using such a definition for GradualOutputInformationStruct (), it is possible to notify the client device 2 how the gradual display is performed even when the transition of the refresh area is complicated.
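With the table-based definition, the client only needs to look up the clean area of each frame. The following Python sketch illustrates such a lookup together with a sanity check that the clean area never shrinks from one frame to the next. Interpreting the amount of information per frame as the pixel area of the clean area is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple

# (x, y, width, height) of the clean area of one frame, in pixels.
Region = Tuple[int, int, int, int]


def clean_area_never_shrinks(table: List[Region]) -> bool:
    """Check that the clean area size is non-decreasing over the frames."""
    areas = [width * height for (_x, _y, width, height) in table]
    return all(a <= b for a, b in zip(areas, areas[1:]))


def clean_region_for_frame(table: List[Region], i: int) -> Region:
    """Return the clean area of the i-th frame (frame 0 is the GRA picture).

    Frames past the end of the table reuse the last entry, on the assumption
    that the last entry corresponds to the fully clean picture.
    """
    return table[min(i, len(table) - 1)]
```

Because the table is indexed by frame, the increase per frame does not need to be constant; it is enough that each entry is at least as large as the one before it.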

FIG. 18 is a diagram showing a fourth example of GradualOutputInformationStruct () using other definitions. Here, too, a case will be described in which, in the gradual display, the amount of information for each frame increases monotonically in the broad sense, but the amount of increase is not a constant ratio.

In this case, the file generation unit 105 stores the information of all the refresh areas of each frame used in GRA in the GradualOutputInformationStruct () as table information as shown in FIG. 18. In FIG. 18, first_output_clean_region_x, first_output_clean_region_y, first_output_clean_region_width, and first_output_clean_region_height represent the x-coordinate, y-coordinate, width, and height of the reference point of the display area of the refresh area in the i-th frame, with the GRA picture as the 0th frame, respectively.

When the gradual display is performed such that the amount of information for each frame increases monotonically in the broad sense but the amount of increase is not a constant ratio, the client device 2 can identify how the gradual display is performed if it can grasp the refresh area of each frame. Therefore, the file generation unit 105 can also set the gradual display type information in the GradualOutputInformationStruct () by using the syntax shown in FIG. 18. By using such a definition for GradualOutputInformationStruct (), it is possible to notify the client device 2 how the gradual display is performed even when the transition of the refresh area is complicated.

With any of the definition methods shown in FIGS. 15 to 18 described above, the client device 2 can distinguish the clean area from the dirty area in the display process after decoding without referring to the values of the VVC parameter_set and slice_header. Further, the client device 2 can also utilize the information that identifies the clean area and the dirty area for the interpolation process.

As described above, in the distribution system according to this modification, the client device can identify, before decoding, how the gradual display is performed at the time of GRA. The client device can use the identified information for UX (User Experience) purposes such as notifying the user at the time of random access. Further, since the client device can identify which area is gradually displayed by GRA without using parameter_set, the information can be used for the interpolation processing of the dirty area.

[1.3 Modified example of the first embodiment (3)]
Next, a modified example (3) of the first embodiment will be described. In this modification, the dirty area interpolation information will be described in detail. FIG. 19 is a diagram showing an example of the syntax of InterpolationStruct ().

The file generation unit 105 generates InterpolationStruct (), which stores the dirty area interpolation information in the GRA sample group, using the syntax shown in FIG. 19.

Specifically, the file generation unit 105 stores interpolation_type in InterpolationStruct () as information indicating how to interpolate the dirty area as shown in FIG. 19. Then, the file generation unit 105 defines the value of interpolation_type as shown in FIG. 20. FIG. 20 is a diagram of an example of the contents indicated by each value of interpolation_type.

For example, the file generation unit 105 defines that when the value of interpolation_type is 0, the dirty area is interpolated with the set color. In this case, the user determines the color with which to interpolate the dirty area. By interpolating the dirty area with an appropriate color in this way, the image is gradually displayed like a fade-in at the time of GRA random access. Further, the file generation unit 105 defines that when the value of interpolation_type is 1, the image of the frame before the start of random access is displayed as a still image in the dirty area. By interpolating the dirty area with the image before the start of the random access in this way, the image is gradually displayed like a crossfade at the time of the GRA random access. Further, the file generation unit 105 sets the value of interpolation_type to 2 when the interpolation method of the dirty area is not determined. In this case, the method of interpolating the dirty area depends on how the video reproduction function is implemented in the client device 2.
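The two defined interpolation methods can be illustrated with the following Python sketch, which fills the dirty area of a decoded frame either with a set color or with the frame shown before the random access. The function name fill_dirty_area is hypothetical, frames are modeled as plain two-dimensional lists of pixel values, and the clean area is assumed to be a single rectangle. For the value 2, where the method is left to the client, this sketch arbitrarily falls back to the set color.

```python
from typing import List, Tuple

Frame = List[List[int]]            # rows of pixel values
Region = Tuple[int, int, int, int]  # (x, y, width, height) of the clean area


def fill_dirty_area(decoded: Frame, previous: Frame, clean: Region,
                    interpolation_type: int, color: int = 0) -> Frame:
    """Return a display frame whose dirty area is interpolated.

    interpolation_type 0: fill the dirty area with `color`.
    interpolation_type 1: fill the dirty area with the pre-random-access frame.
    Any other value: client-defined; this sketch falls back to `color`.
    """
    x0, y0, width, height = clean
    out = []
    for y, row in enumerate(decoded):
        new_row = []
        for x, pixel in enumerate(row):
            inside_clean = x0 <= x < x0 + width and y0 <= y < y0 + height
            if inside_clean:
                new_row.append(pixel)           # correctly decoded pixel
            elif interpolation_type == 1:
                new_row.append(previous[y][x])  # still image before random access
            else:
                new_row.append(color)           # set color (or fallback)
        out.append(new_row)
    return out
```

Applying this frame by frame while the clean area grows gives the fade-in-like effect for type 0 and the crossfade-like effect for type 1.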

As described above, the file generation device according to this modification stores dirty area interpolation information, which indicates how to interpolate the dirty area, using the flag set in InterpolationStruct (). By storing the dirty area interpolation information in the GRA picture sample group in this way, the dirty area is interpolated by the same method regardless of the client device, so that image distortion at the time of random access is suppressed and the appearance can be unified. In addition, the content creator can set the optimum display method for the dirty area, and the gradual display can be used as a UX effect such as a fade-in or a crossfade, so that the content is played back in the same way regardless of the playback device. This makes it possible to provide a high-quality viewing experience.

[2. Second Embodiment]
In each of the above embodiments and their modifications, the case of storing the information in ISOBMFF has been described. However, the gradual display permission information, the gradual display type information, and the dirty area interpolation information can also be provided when transmission uses the Matroska Media Container (http://www.matroska.org/) shown in FIG. 21. FIG. 21 is a diagram showing the format of the Matroska Media Container. In that case, the file generation unit 105 stores the transition identification information, the transition execution area information, and the transition trigger information in an element newly defined in the Track Entry element.

[Hardware configuration]
FIG. 22 is a hardware configuration diagram of a computer. The file generation device 1 and the client device 2 can be realized by the computer 90 shown in FIG. 22. In the computer 90, the processor 91, the memory 92, the network interface 93, the non-volatile storage 94, the input / output interface 95, and the display interface 96 are connected to each other via a bus.

External devices such as an input device, an output device, a storage device, and a drive are connected to the input / output interface 95. The input device is, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, or the like. The output device is, for example, a speaker, an output terminal, or the like. The storage device is, for example, a hard disk, a RAM (Random Access Memory) disk, or the like. The drive drives removable media such as magnetic disks, optical disks, magneto-optical disks, or semiconductor memories. A display 98, which is a display device, is connected to the display interface 96.

The network interface 93 is connected to an external network. The file generation device 1 and the client device 2 are connected to each other via the network interface 93. Further, the file generation device 1 and the client device 2 are connected to the Web server 3 via the network interface 93. The non-volatile storage 94 is a built-in auxiliary storage device such as a hard disk or SSD (Solid State Drive).

In the computer 90 configured as described above, the series of processing described above is performed when the processor 91, for example, loads the program stored in the non-volatile storage 94 into the memory 92 via the bus and executes it. The memory 92 also stores, as appropriate, data and the like necessary for the processor 91 to execute various processes.

The program executed by the processor 91 can be provided by being recorded on removable media such as package media, for example. In that case, the program can be installed in the non-volatile storage 94 via the input / output interface 95 by mounting the removable media in the drive, which is the external device 97.

This program can also be provided via wired or wireless transmission media such as local area networks, the Internet, and digital satellite broadcasting. In that case, the program can be received at the network interface 93 and installed in the non-volatile storage 94.

In addition, this program can be installed in advance in the non-volatile storage 94.

Although the embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the above-described embodiments as they are, and various changes can be made without departing from the gist of the present disclosure. In addition, components of different embodiments and modifications may be combined as appropriate.

Note that the effects described in the present specification are merely examples and are not limiting, and other effects may be obtained.

Note that the present technology can also have the following configurations.

(1) An information processing device including:
an encoding unit that encodes images in an image sequence to generate an encoded stream;
a determination unit that determines one or more decoding start images in the image sequence that can be used as images at which decoding is started during gradual random access (GRA); and
a file generation unit that inserts GRA information regarding the decoding start images determined by the determination unit into a header area of a file format including the header area and a data area, and inserts the encoded stream into the data area.
(2) The information processing device according to (1), wherein the file generation unit includes, in the GRA information, gradual display permission information indicating whether gradual display is permitted.
(3) The information processing device according to (2), wherein the file generation unit includes, in the GRA information, gradual display type information indicating how the gradual display is performed.
(4) The information processing device according to (3), wherein the file generation unit sets, as the gradual display type information, position and area information of a clean area in each of the images displayed when the gradual display is executed.
(5) The information processing device according to (3), wherein the file generation unit sets, as the gradual display type information, the position of a refresh area in each of the images displayed when the gradual display is executed.
(6) The information processing device according to (2), wherein the file generation unit includes, in the GRA information, dirty area interpolation information indicating area information of a dirty area and a display method for the dirty area.
(7) An information processing method in which a computer is made to execute a process including:
encoding images in an image sequence to generate an encoded stream;
determining one or more decoding start images in the image sequence that can be used as images at which decoding is started during gradual random access (GRA); and
inserting GRA information regarding the determined decoding start images into a header area of a file format including the header area and a data area, and inserting the encoded stream into the data area.
(8) A reproduction processing device including:
a file acquisition unit that acquires a file generated according to a file format including a header area and a data area, the data area containing an encoded stream that includes data of a series of encoded images;
a GRA information acquisition unit that acquires, from the header area of the file acquired by the file acquisition unit, GRA information for identifying one or more decoding start images in the series of images that can be used as images at which decoding is started during gradual random access (GRA); and
a decoding processing unit that decodes the encoded stream based on the GRA information acquired by the GRA information acquisition unit.
(9) A reproduction processing method in which a computer is made to execute a process including:
acquiring a file generated according to a file format including a header area and a data area, the data area containing an encoded stream that includes data of a series of encoded images;
acquiring, from the header area of the acquired file, GRA information for identifying one or more decoding start images in the series of images that can be used as images at which decoding is started during gradual random access (GRA); and
decoding the encoded stream based on the acquired GRA information.
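The division of roles between the generation side and the reproduction side in these configurations can be sketched with a toy container. This is only an illustration under stated assumptions: the layout below (a 4-byte header length, a JSON header area, then a data area of concatenated samples) is hypothetical and is not ISOBMFF or Matroska, and every function and key name is invented for the example.

```python
import json
import struct
from bisect import bisect_right
from typing import List, Tuple


def build_file(samples: List[bytes], gra_start_indices: List[int]) -> bytes:
    """Generation side: GRA information goes in the header area,
    the encoded stream (here, opaque sample bytes) in the data area."""
    header = json.dumps({
        "sample_sizes": [len(s) for s in samples],
        "gra_start_indices": sorted(gra_start_indices),
    }).encode("utf-8")
    return struct.pack(">I", len(header)) + header + b"".join(samples)


def read_header(blob: bytes) -> Tuple[dict, int]:
    """Reproduction side: parse the header area, return it and the data offset."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    return header, 4 + header_len


def decoding_start_for(header: dict, target_index: int) -> int:
    """Pick the latest decoding start image at or before the seek target."""
    starts = header["gra_start_indices"]
    pos = bisect_right(starts, target_index)
    if pos == 0:
        raise ValueError("no decoding start image at or before the target")
    return starts[pos - 1]


def sample_bytes(blob: bytes, header: dict, data_offset: int, index: int) -> bytes:
    """Fetch one sample from the data area using the sizes in the header."""
    sizes = header["sample_sizes"]
    start = data_offset + sum(sizes[:index])
    return blob[start:start + sizes[index]]
```

The point of the sketch is that the reproduction side finds a decoding start image from the header area alone, without parsing the encoded stream in the data area.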

1 File generation device
2 Client device
3 Web server
10 File generation processing unit
11 Control unit
12 Transmission unit
20 Playback processing unit
21 Control unit
100 Distribution system
101 Data acquisition unit
102 Encoding unit
103 Metadata generation unit
104 Determination unit
105 File generation unit
201 File acquisition unit
202 File processing unit
203 GRA information acquisition unit
204 Decoding processing unit
205 Display information generation unit
206 Display unit

Claims

1. An information processing device comprising:
an encoding unit that encodes images in an image sequence to generate an encoded stream;
a determination unit that determines one or more decoding start images in the image sequence that can be used as images at which decoding is started during gradual random access (GRA); and
a file generation unit that inserts GRA information regarding the decoding start images determined by the determination unit into a header area of a file format including the header area and a data area, and inserts the encoded stream into the data area.
2. The information processing device according to claim 1, wherein the file generation unit includes, in the GRA information, gradual display permission information indicating whether gradual display is permitted.
3. The information processing device according to claim 2, wherein the file generation unit includes, in the GRA information, gradual display type information indicating how the gradual display is performed.
4. The information processing device according to claim 3, wherein the file generation unit sets, as the gradual display type information, position and area information of a clean area in each of the images displayed when the gradual display is executed.
5. The information processing device according to claim 3, wherein the file generation unit sets, as the gradual display type information, the position of a refresh area in each of the images displayed when the gradual display is executed.
6. The information processing device according to claim 2, wherein the file generation unit includes, in the GRA information, dirty area interpolation information indicating area information of a dirty area and a display method for the dirty area.
7. An information processing method in which a computer is made to execute a process comprising:
encoding images in an image sequence to generate an encoded stream;
determining one or more decoding start images in the image sequence that can be used as images at which decoding is started during gradual random access (GRA); and
inserting GRA information regarding the determined decoding start images into a header area of a file format including the header area and a data area, and inserting the encoded stream into the data area.
8. A reproduction processing device comprising:
a file acquisition unit that acquires a file generated according to a file format including a header area and a data area, the data area containing an encoded stream that includes data of a series of encoded images;
a GRA information acquisition unit that acquires, from the header area of the file acquired by the file acquisition unit, GRA information for identifying one or more decoding start images in the series of images that can be used as images at which decoding is started during gradual random access (GRA); and
a decoding processing unit that decodes the encoded stream based on the GRA information acquired by the GRA information acquisition unit.
9. A reproduction processing method in which a computer is made to execute a process comprising:
acquiring a file generated according to a file format including a header area and a data area, the data area containing an encoded stream that includes data of a series of encoded images;
acquiring, from the header area of the acquired file, GRA information for identifying one or more decoding start images in the series of images that can be used as images at which decoding is started during gradual random access (GRA); and
decoding the encoded stream based on the acquired GRA information.