WO2015083354A1

WO2015083354A1 - File generation method, playback method, file generation device, playback device, and recording medium

Info

Publication number: WO2015083354A1
Application number: PCT/JP2014/005963
Authority: WO
Inventors: 遠間　正真; 智輝小川; 洋矢羽田; 山本　雅哉; 村瀬　薫; 小塚　雅之
Original assignee: パナソニックＩｐマネジメント株式会社
Priority date: 2013-12-03
Filing date: 2014-11-28
Publication date: 2015-06-11

Abstract

A file generation method according to one embodiment of the present disclosure is one for generating an MP4 file and including: a step for generating one MP4 file by integrating two streams in a manner such that the two streams are played back continuously; and a step for storing, in the generated MP4 file, information indicating an interval during which the playback timing for each of the two streams overlaps. As a result, it is possible to generate an MP4 file appropriate for overlapping playback.

Description

File generation method, playback method, file generation device, playback device, and recording medium

This disclosure relates to a file generation method for generating an MP4 file.

The file format used on conventional optical discs is the MPEG2-TS (MPEG-2 Transport Stream) system defined by ISO/IEC 13818-1. Hereinafter, the MPEG2-TS system is simply referred to as MPEG2-TS. That is, a file configured by multiplexing a video stream, an audio stream, and a subtitle stream in the MPEG2-TS file format is recorded on the optical disc. Specifically, in MPEG2-TS, a video stream, an audio stream, a subtitle stream, and the like are each divided into a plurality of 188-byte TS packets, multiplexed, and recorded on the optical disc. MPEG2-TS is optimized for media, such as broadcast or optical discs, that transmit or record data that is read and processed sequentially from the beginning. Therefore, even a consumer device having a relatively small buffer capacity can efficiently read, decode, and reproduce a stream.

On the other hand, the file format increasingly used for content distribution over networks in recent years is the MP4 system defined by ISO/IEC 14496-12. Hereinafter, the MP4 system is simply referred to as MP4. MP4 employs an extremely flexible data structure on the premise that it is applied to a randomly accessible medium such as an HDD (Hard Disk Drive) or a flash memory. In the general usage form of MP4, streams such as a video stream, an audio stream, and a subtitle stream are divided into fragments of about several seconds each, and these fragments are arranged sequentially to form one file.

As a medium for distributing high-quality content such as 4K, which is expected to become popular in the future, optical discs are expected to remain in wide use because of their cost per bit. On the other hand, a smartphone or tablet does not have an optical disc drive, but it is used as a terminal for receiving and playing back content distributed over a network, taking advantage of its high portability and the recent increase in screen size and definition. For this reason, smartphones and tablets have many functions and processes that support MP4, while their support for MPEG2-TS has not progressed much.

Therefore, when copying content that is an MPEG2-TS file distributed on an optical disc to a smartphone or tablet, the file format of the content may be converted to MP4 (see, for example, Patent Document 1). Such conversion generates an MP4 file, that is, a file in the MP4 format.

Patent Document 1: JP 2012-175608 A

A file generation method according to an aspect of the present disclosure is a file generation method for generating an MP4 file, in which one MP4 file is generated by integrating two streams so that the two streams are played back continuously, and information indicating a section in which the playback timings of the two streams overlap is stored in the generated MP4 file.

Note that these comprehensive or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.

FIG. 1 is a diagram schematically showing an example of the structure of MPEG2-TS content stored on an optical disc.
FIG. 2 is a diagram for explaining a method of decrypting the aligned unit.
FIG. 3 is a diagram showing an internal structure of the aligned unit in a plain text state.
FIG. 4 is a diagram illustrating a method of creating an actual Elementary Stream from a plurality of TS Payloads.
FIG. 5 is a block diagram illustrating a configuration of the file generation device according to the first embodiment.
FIG. 6 is a diagram for explaining a method of generating an MP4 stream file from an MPEG2-TS stream file, a difference file, and a copy manifest file in the first embodiment.
FIG. 7 is a diagram for explaining a method of generating a difference file and a copy manifest file in the first embodiment.
FIG. 8 is a flowchart of the file generation method according to the first embodiment.
FIG. 9 is a diagram for explaining a file generation method according to the first modification of the first embodiment.
FIG. 10A is a diagram for explaining data encryption in the AES-CTR mode in the first modification of the first embodiment.
FIG. 10B is a diagram for explaining data decoding in the AES-CTR mode in the first modification of the first embodiment.
FIG. 11 is a diagram illustrating an example in which the MPEG-4 AAC access unit stored in the transport stream is stored in the MP4 file in the second modification of the first embodiment.
FIG. 12 is a diagram illustrating an example in which the MPEG-4 AVC access unit stored in the transport stream is stored in MP4 in the second modification of the first embodiment.
FIG. 13A is a diagram illustrating a storage example of a LATM header and a LATM payload in a TS packet in the second modification of the first embodiment.
FIG. 13B is a diagram illustrating an example of syntax of the AU_info table in the second modification of the first embodiment.
FIG. 13C is a diagram illustrating another example of syntax of the AU_info table in the second modification of the first embodiment.
FIG. 14 is a block diagram illustrating a configuration of the file generation device according to the second modification of the first embodiment.
FIG. 15A is a diagram showing a schematic structure of a NAL unit in the second modification of the first embodiment.
FIG. 15B is a diagram illustrating an example of the storage format of the NAL unit in MPEG2-TS in the second modification of the first embodiment.
FIG. 15C is a diagram illustrating an example of the storage format of the NAL unit in MP4 in the second modification of the first embodiment.
FIG. 16A is a diagram illustrating a configuration example of an access unit in a transport stream according to the second modification of the first embodiment.
FIG. 16B is a diagram illustrating an example of the syntax of the size information included in the size information NAL unit in the second modification of the first embodiment.
FIG. 16C is a diagram illustrating another example of the syntax of the size information included in the size information NAL unit in the second modification of the first embodiment.
FIG. 17 is a flowchart illustrating a processing operation in which the file generation device according to the second modification of the first embodiment generates an MP4 file.
FIG. 18 is a diagram illustrating a specific example of address designation when mode 2 is used in the third modification of the first embodiment.
FIG. 19 is a diagram illustrating an example of reading a continuous area exceeding the upper limit value of the copy size in the third modification of the first embodiment.
FIG. 20 is a diagram for describing a process of generating an MP4 file by copying data from an elementary stream in the third modification of the first embodiment.
FIG. 21 is a diagram illustrating an example of audio and video playback sections of two MP4 files that are played back continuously in the second embodiment.
FIG. 22A is a diagram for describing a method of generating one MP4 file by integrating playback sections in the second embodiment.
FIG. 22B is a block diagram of a file generation device according to Embodiment 2.
FIG. 22C is a flowchart of the file generation method in the second embodiment.
FIG. 22D is a block diagram of the playback device in the second embodiment.
FIG. 22E is a flowchart of the reproduction method in the second embodiment.
FIG. 23A is a diagram showing an example of a menu screen when an MP4 file is generated from content stored on an optical disc in the third embodiment.
FIG. 23B is a diagram for describing a method of generating an MP4 file using an optical disc and a network in the third embodiment.
FIG. 24 is a diagram illustrating an example of a copy manifest showing the size of the NAL unit, the PTS, and the DTS in the third embodiment.
FIG. 25 is a diagram illustrating an example of caption data stored at the end of the MP4 file in the third embodiment.
FIG. 26 is a diagram illustrating a case where subtitles with 2K resolution are scaled to 4K and displayed in the third embodiment.

(Knowledge that became the basis of this disclosure)
The inventor has found that the following problems occur with respect to the file generation method of Patent Document 1 described in the “Background Art” section.

In the file generation method of Patent Document 1, content multiplexed in MPEG2-TS must first be separated back into its individual streams, such as a video stream, an audio stream, and a subtitle stream, before the file format of the content can be converted to MP4. In general, commercial content distributed on an optical disc is encrypted. Therefore, at the time of conversion, it is necessary to first decrypt the content, convert the file format, and then encrypt it again. The structure of MPEG2-TS content will be described in detail below.

FIG. 1 is a diagram schematically showing an example of the structure of MPEG2-TS content stored on an optical disc. On the optical disc, a Stream File is stored as content. In the example shown in FIG. 1, only one Stream File is stored on the optical disc, but a plurality of Stream Files may be stored. Here, the Stream File is recorded with the file name XXXXXX.M2TS, where XXXXXX is a number. When a plurality of contents are stored, these contents can be managed individually by this number.

The Stream File is divided into a plurality of 6144-byte units, each called an Aligned Unit. The Aligned Unit is the unit of encryption. Note that the data amount of the Stream File is not necessarily a multiple of 6144 bytes. If it is not, it is desirable to make the data amount of the Stream File a multiple of 6144 bytes by a method such as storing NULL Data at the end of the content.
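The padding described above is a small calculation. The following is a minimal sketch in Python; the function name `null_padding_size` is a hypothetical helper introduced here for illustration and is not taken from any disc format specification.

```python
ALIGNED_UNIT_SIZE = 6144  # bytes per Aligned Unit, as stated in the text


def null_padding_size(content_size: int) -> int:
    """Return how many NULL bytes must be appended so that the Stream File
    size becomes a multiple of the Aligned Unit size."""
    if content_size < 0:
        raise ValueError("content_size must be non-negative")
    # (-n) % m gives the distance from n up to the next multiple of m.
    return (-content_size) % ALIGNED_UNIT_SIZE
```

A file whose size is already a multiple of 6144 bytes needs no padding, so the function returns 0 in that case.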

FIG. 2 is a diagram for explaining a method of decrypting the aligned unit.

The content on the optical disc is encrypted using a unit key Ku. At the time of encryption, the 6144 bytes of data included in an Aligned Unit are separated into the leading 16 bytes and the remaining 6128 bytes, and the remaining 6128 bytes are encrypted.

When decrypting the Aligned Unit, AES_E first encrypts the first 16 bytes of data using the AES (Advanced Encryption Standard) encryption method using the unit key Ku. Next, an exclusive OR operation is performed on the data obtained by this encryption and the leading 16-byte data. AES_DCBC uses the result of this exclusive OR operation as a key, and decrypts the remaining 6128 bytes of data in AES-CBC (Cipher Block Chaining) mode. The first 16 bytes of data are added to the plaintext data obtained by this decryption. As a result, 6144 bytes of plaintext corresponding to the aligned unit is obtained.

FIG. 3 is a diagram showing the internal structure of the aligned unit in a plain text state.

An Aligned Unit is composed of 32 Source Packets of 192 bytes each. Each Source Packet is composed of a TP_extra_header, which is a 4-byte header, and a 188-byte Transport Packet, which is a TS packet. Further, the 188-byte Transport Packet is composed of a 4-byte TS Header and a 184-byte TS Payload. Information indicating the attributes of the TS Payload is described in the TS Header. Specifically, the TS Header is composed of sync_byte (8 bits), transport_error_indicator (1 bit), payload_unit_start_indicator (1 bit), transport_priority (1 bit), PID (13 bits), transport_scrambling_control (2 bits), adaptation_field_control (2 bits), and continuity_counter (4 bits). Here, the PID is information for identifying the type of elementary stream stored in the TS Payload, such as video or audio. Even when there are a plurality of types of audio, the type of audio of the elementary stream can be identified by this PID.
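The layout described above can be sketched as a small parser. This is an illustrative sketch in Python, not a complete demultiplexer: the names `TsHeader`, `parse_ts_header`, and `split_aligned_unit` are hypothetical, and the code follows the simplified model in the text in which every TS packet carries a 4-byte header followed by 184 bytes of payload (it does not handle an adaptation field).

```python
from dataclasses import dataclass
from typing import List, Tuple

ALIGNED_UNIT_SIZE = 6144
SOURCE_PACKET_SIZE = 192
TP_EXTRA_HEADER_SIZE = 4
TS_HEADER_SIZE = 4


@dataclass
class TsHeader:
    sync_byte: int                     # 8 bits
    transport_error_indicator: int     # 1 bit
    payload_unit_start_indicator: int  # 1 bit
    transport_priority: int            # 1 bit
    pid: int                           # 13 bits
    transport_scrambling_control: int  # 2 bits
    adaptation_field_control: int      # 2 bits
    continuity_counter: int            # 4 bits


def parse_ts_header(header: bytes) -> TsHeader:
    """Unpack the eight fields of a 4-byte TS Header."""
    if len(header) != TS_HEADER_SIZE:
        raise ValueError("TS Header must be exactly 4 bytes")
    b0, b1, b2, b3 = header
    return TsHeader(
        sync_byte=b0,
        transport_error_indicator=(b1 >> 7) & 0x1,
        payload_unit_start_indicator=(b1 >> 6) & 0x1,
        transport_priority=(b1 >> 5) & 0x1,
        pid=((b1 & 0x1F) << 8) | b2,
        transport_scrambling_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        continuity_counter=b3 & 0x0F,
    )


def split_aligned_unit(unit: bytes) -> List[Tuple[bytes, TsHeader, bytes]]:
    """Split a plain-text Aligned Unit into 32 Source Packets and return
    (TP_extra_header, parsed TS Header, TS Payload) for each of them."""
    if len(unit) != ALIGNED_UNIT_SIZE:
        raise ValueError("Aligned Unit must be exactly 6144 bytes")
    result = []
    for offset in range(0, ALIGNED_UNIT_SIZE, SOURCE_PACKET_SIZE):
        source_packet = unit[offset:offset + SOURCE_PACKET_SIZE]
        extra_header = source_packet[:TP_EXTRA_HEADER_SIZE]
        transport_packet = source_packet[TP_EXTRA_HEADER_SIZE:]
        header = parse_ts_header(transport_packet[:TS_HEADER_SIZE])
        payload = transport_packet[TS_HEADER_SIZE:]
        result.append((extra_header, header, payload))
    return result
```

The bit widths add up to 32 bits, which matches the 4-byte TS Header, and 32 packets of 192 bytes give the 6144 bytes of one Aligned Unit.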

FIG. 4 is a diagram showing a method of creating an actual Elementary Stream from a plurality of TS Payloads. A PES_Header and an Elementary Stream are formed by concatenating a plurality of TS Payloads to which the same PID is assigned. Here, the first of the plurality of TS Payloads is configured to include the PES_Header. A PES (Packetized Elementary Stream), or PES packet, is composed of the PES_Header and at least a part of the Elementary Stream.
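The concatenation of TS Payloads that share a PID can be sketched as follows. This is a minimal Python illustration under the assumption that the caller already has `(pid, payload)` pairs in stream order; the function name `collect_payloads_by_pid` is hypothetical, and stripping the PES_Header from the result is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def collect_payloads_by_pid(
    packets: Iterable[Tuple[int, bytes]]
) -> Dict[int, bytes]:
    """Concatenate TS Payloads per PID, preserving stream order.

    Each returned byte string starts with the PES_Header carried by the
    first payload of that PID, followed by Elementary Stream data.
    """
    streams = defaultdict(bytearray)
    for pid, payload in packets:
        streams[pid] += payload
    return {pid: bytes(data) for pid, data in streams.items()}
```

For example, payloads for a video PID and an audio PID that are interleaved in the transport stream come out as two separate byte strings, one per PID.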

As described above, the MPEG2-TS file (Stream File) is encrypted for each Aligned Unit. Therefore, in order to convert the file into an MP4 file, the above-described decryption is performed, and further re-encryption is performed. In a conversion device such as a smartphone or a tablet, there is a problem that it takes time to perform the decryption and encryption. Furthermore, since plain text content is once created, there is a security problem.

In order to solve such a problem, a file generation method according to an aspect of the present disclosure is a file generation method for generating an MP4 file, in which an original file configured in a file format different from MP4 is acquired, a difference file including data not included in the original file is acquired, a procedure file indicating a procedure for generating the MP4 file is acquired, and the MP4 file is generated by combining data included in the difference file and data included in the original file according to the procedure indicated in the procedure file. For example, in the acquisition of the original file, the original file configured in the MPEG2-TS file format is acquired.

Thus, an MP4 file is generated by combining the data included in the difference file and the data included in the original file according to a predetermined procedure. Therefore, an MP4 file can be easily generated without returning an original file constituted by a file format such as MPEG2-TS to each stream such as a video stream or an audio stream. Even if the original file is encrypted, it is not necessary to decrypt and re-encrypt it. Therefore, it is possible to suppress the processing load for generating the MP4 file.

Further, in the acquisition of the original file, the difference file, and the procedure file, these files may be acquired by reading them from an optical disc.

Thereby, since all the files necessary for generating the MP4 file are acquired from one optical disk, it is possible to save the trouble of searching for these files and to generate the MP4 file more easily.

Further, in the procedure file, the ranges of a plurality of parts included in the difference file and the ranges of a plurality of parts included in the original file may be described so that a range of a part included in the difference file and a range of a part included in the original file are arranged alternately. In the generation of the MP4 file, the MP4 file may then be generated by combining the parts indicated by the ranges in the order in which the ranges are described in the procedure file.

Thus, each part included in the MP4 file can be generated in order from the beginning of the MP4 file, and the MP4 file can be generated more easily because there is no backtracking.

Each range of the plurality of parts included in the difference file may be described in the procedure file by a data size, and each range of the plurality of parts included in the original file may be described in the procedure file by the start position and the data size of the part.

This makes it possible to copy or obtain appropriate parts from the difference file and the original file and combine them based on the description of the procedure file. In addition, since the start position is not used to describe the range of the part included in the difference file, the data size of the procedure file can be suppressed.

In addition, the data size of the MP4 file may be described in the procedure file, and the file generation method may further determine, based on the data size of the MP4 file described in the procedure file, whether a medium has enough free space to record the MP4 file.

Thus, since it is determined whether the medium has enough free space for recording the MP4 file, processing such as canceling the generation of the MP4 file in advance can be performed if there is not enough free space. That is, it is possible to prevent an error from occurring.

The attribute of the MP4 file may be described in the procedure file, and the file generation method may further read the attribute described in the procedure file.

Thus, if the MP4 file attribute is read from the procedure file before the MP4 file is generated, it can be determined in advance whether or not the desired MP4 file is generated.

Further, the procedure file may describe a buffer size necessary for reproducing the MP4 file, and the file generation method may further read the buffer size described in the procedure file.

Thus, if the buffer size necessary for reproducing the MP4 file is read from the procedure file, it is possible to easily determine whether or not the MP4 file can be reproduced without analyzing the MP4 file.

In the procedure file, a first file name that is the name of the original file and a second file name that is the name of the difference file may be described, and the file generation method may further specify a file having the first file name described in the procedure file as the original file and specify a file having the second file name described in the procedure file as the difference file.

This makes it possible to appropriately acquire the original file and the difference file used for generating the MP4 file.

In the generation of the MP4 file, header information corresponding to MP4 that is data included in the difference file may be combined with data included in the original file.

This makes it possible to easily generate an MP4 file having appropriate MP4 header information.

In addition, in the acquisition of the original file, the original file in a plain text state may be acquired, and in the generation of the MP4 file, the generated MP4 file may be encrypted.

Thus, if the original file is deleted after conversion to MP4, the confidentiality of the data can be secured while leaving the data included in the original file as an MP4 file.

In addition, in the generation of the MP4 file, each time a part constituting the original file is acquired, the part of the MP4 file corresponding to that part may be generated and encrypted, and each time a part constituting the MP4 file is encrypted, the part of the original file corresponding to that part may be deleted.

This makes it possible to prevent all data contained in the plaintext original file from being temporarily stored in the storage area, and to ensure the confidentiality of the data more reliably.

Furthermore, in the file generation method disclosed in Patent Document 1, it is not considered that two streams are reproduced continuously. Furthermore, it is not considered to overlap the timing of reproduction of a part of each of the two streams. Therefore, an MP4 file suitable for overlapping reproduction cannot be generated.

In order to solve such a problem, a file generation method according to an aspect of the present disclosure is a file generation method for generating an MP4 file, in which one MP4 file is generated by integrating two streams so that the two streams are played back continuously, and information indicating a section in which the playback timings of the two streams overlap is stored in the generated MP4 file. For example, in the integration of the two streams, two streams that are each at least part of an original file configured in the MP4 file format are integrated. Further, for example, in the integration of the two streams, two streams each including audio data are integrated.

Thus, information indicating the overlap section is stored in the MP4 file. Therefore, the playback device that plays back the MP4 file can easily specify the data of the overlap section from the MP4 file using the information. As a result, the playback apparatus can appropriately play back the data by combining the data in the overlapping section. That is, an MP4 file suitable for overlapping reproduction can be generated.

In addition, in the integration of the two streams, when the section extends over a plurality of samples included in either of the two streams, the two streams may be integrated after at least one of the plurality of samples is deleted.

As a result, since the sample is deleted, the overlap section can be shortened. This reduces the burden of the special processing that the playback device performs for the overlap section.

In the storage of the information, time information indicating the time length of the section may be stored in the MP4 file as the information.

Thus, a playback device that plays back an MP4 file can easily specify the time length of the overlap section using the information. As a result, the reproducing apparatus can appropriately reproduce the data within the specified time length by combining the data of the overlapping sections.

In addition, in the storage of the information, the time information may be stored in a traf in a moof in the MP4 file.

Thereby, the playback device can appropriately acquire the stored time information.

In the file generation method, the information may be acquired from a device or an optical disk that holds the information.

This makes it possible to easily store the information in the MP4 file without generating information indicating the overlap section.

Further, a playback method according to an aspect of the present disclosure is a playback method for playing back an MP4 file, in which information indicating two sections whose playback timings overlap in the content to be played back is extracted from the MP4 file, the two sections in the content are identified based on the extracted information, and the decoding results for the data in the two sections are combined and output.

Thereby, the playback device can easily specify the data of the overlap section from the MP4 file. As a result, the playback device can appropriately play back the data in the overlap section.
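As an illustration of combining the decoding results of two overlapping sections, the following Python sketch joins two decoded audio sample sequences whose last and first `overlap` samples share the same playback time. The text does not specify how the results are combined; the linear crossfade used here, and the name `join_with_overlap`, are assumptions made only for this example.

```python
from typing import List


def join_with_overlap(first: List[float], second: List[float],
                      overlap: int) -> List[float]:
    """Join two decoded sample sequences whose playback timings overlap.

    The last `overlap` samples of `first` and the first `overlap` samples
    of `second` are combined (here: linear crossfade, an assumption) so
    that the output contains each playback instant exactly once.
    """
    if overlap < 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must fit inside both sequences")
    head = first[:len(first) - overlap]
    tail = second[overlap:]
    mixed = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)  # rises from near 0 to near 1
        a = first[len(first) - overlap + i]
        b = second[i]
        mixed.append((1.0 - weight) * a + weight * b)
    return head + mixed + tail
```

With `overlap` set to 0 the function degenerates to plain concatenation, which corresponds to two streams that are played back continuously without any overlapping section.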

A recording medium according to an aspect of the present disclosure is a recording medium on which an MP4 file is recorded, in which the MP4 file includes content that is read and played back by a computer, and information indicating two sections of the content whose playback timings overlap.

Thus, a playback device that reads and plays back the MP4 file from the recording medium can easily identify the data of the above two sections from the MP4 file using the information. As a result, the playback device can appropriately play back the data by combining the data of those sections.

Hereinafter, embodiments will be specifically described with reference to the drawings.

It should be noted that each of the embodiments described below shows a comprehensive or specific example. Numerical values, shapes, materials, components, arrangement positions and connection forms of components, steps, order of steps, and the like shown in the following embodiments are merely examples, and are not intended to limit the present disclosure. In addition, among the constituent elements in the following embodiments, constituent elements that are not described in the independent claims, which indicate the broadest concept, are described as optional constituent elements.

(Embodiment 1)
FIG. 5 is a block diagram showing a configuration of the file generation apparatus according to the present embodiment.

The file generation apparatus 10 in the present embodiment is an apparatus that generates an MP4 file, and includes an original file acquisition unit 11, a difference file acquisition unit 12, a procedure file acquisition unit 13, and a generation unit 14. The original file acquisition unit 11 acquires an original file configured in a file format different from MP4. The difference file acquisition unit 12 acquires a difference file including data that is not included in the original file. The procedure file acquisition unit 13 acquires a procedure file indicating a procedure for generating an MP4 file. The generation unit 14 generates an MP4 file by combining the data included in the difference file and the data included in the original file according to the procedure indicated in the procedure file. In other words, the generation unit 14 converts the original file into an MP4 file.

Hereinafter, the processing operation by the file generation device 10 will be described in detail.

FIG. 6 is a diagram for explaining a method of generating an MP4 stream file from an MPEG2-TS stream file, a difference file, and a copy manifest file.

In the present embodiment, for example, an optical disc (for example, a Blu-ray (registered trademark) disc) stores stream file A (XXXX.M2TS), which is the above-described original file, a copy manifest file (XXXX.CMNF), which is the above-described procedure file, and the above-described difference file (XXXX.DMP4). The copy manifest file (XXXX.CMNF) describes a conversion procedure for processing the original file and converting it into an MP4 file. The difference file (XXXX.DMP4) stores data necessary for conversion to an MP4 file.

When converting the MPEG2-TS stream file A (XXXX.M2TS), the generation unit 14 generates stream file B (XXXX.MP4), which is an MP4 file, by alternately combining the data of stream file A (XXXX.M2TS) and the data of the difference file (XXXX.DMP4) according to the description of the copy manifest file (XXXX.CMNF). In other words, the generation unit 14 converts stream file A (XXXX.M2TS) into stream file B (XXXX.MP4).

Thus, the original file can be converted without being restored to an audio or video elementary stream (for example, a HE-AAC or MPEG-4 AVC stream). If the original file is encrypted, the conversion from the original file to the MP4 file can be easily performed without breaking the encryption.

This copy manifest file (XXXX.CMNF) includes “Input File A”, “Input File B”, “Output File”, “Output File Size”, “ATTRIBUTE”, “MP4 DECODER BUFFER SIZE”, and “COPY MANIFEST”. “Input File A” and “Input File B” indicate the file names of the two input files. The conversion process starts by copying the leading part of the file indicated by “Input File A”. “Output File” indicates the file name of the output file, that is, of the generated MP4 file. “Output File Size” indicates the data size of the output MP4 file. This data size is described with byte precision. By checking this data size before the conversion process, it is possible to confirm whether the medium on which the output MP4 file is to be recorded has sufficient free space. “ATTRIBUTE” indicates the attribute of each file, specifically, what kind of elementary stream is converted and what kind of file is generated. In the example shown in FIG. 6, “ATTRIBUTE” indicates that the converted MP4 file includes 4K video compressed with HEVC (High Efficiency Video Coding) and 5.1ch English audio in AAC (Advanced Audio Coding). Thus, before the conversion process, it is possible to confirm what kind of file will be obtained by converting according to this copy manifest file. “ATTRIBUTE” may also indicate the data structure of the MP4 file, or the brand (that is, the type) of MP4 stored in “ftyp” of the MP4 file.

In addition, “MP4 DECODER BUFFER SIZE” indicates a minimum buffer size necessary for reproducing the converted stream file B (XXXX.MP4), which is an output MP4 file. This buffer size is an amount that depends on what multiplexing rule is used to multiplex video and audio in the MP4 file that is the stream file B after conversion. In addition to the buffer size, “MP4 DECODER BUFFER SIZE” may indicate what kind of decoder resources (memory amount and data transfer rate) are necessary to reproduce the stream file B after conversion.
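The checks that “Output File Size” and “MP4 DECODER BUFFER SIZE” make possible before conversion can be sketched as follows. This is a minimal Python illustration: the `ManifestHeader` class and its field names are hypothetical stand-ins for values already read from the copy manifest file (the on-disc syntax of that file is not assumed here), and `shutil.disk_usage` is the standard-library call used to query free space.

```python
import shutil
from dataclasses import dataclass


@dataclass
class ManifestHeader:
    # Hypothetical in-memory form of two copy manifest fields.
    output_file_size: int         # "Output File Size", in bytes
    mp4_decoder_buffer_size: int  # "MP4 DECODER BUFFER SIZE", in bytes


def has_enough_free_space(header: ManifestHeader, target_dir: str) -> bool:
    """Check, before conversion, that the target medium can hold the
    output MP4 file."""
    return shutil.disk_usage(target_dir).free >= header.output_file_size


def player_can_decode(header: ManifestHeader,
                      available_buffer_bytes: int) -> bool:
    """Check that a player's buffer meets the minimum needed to play the
    converted file, without analyzing the MP4 file itself."""
    return available_buffer_bytes >= header.mp4_decoder_buffer_size
```

If either check fails, a converter could cancel generation of the MP4 file up front instead of failing partway through the copy.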

“COPY MANIFEST” indicates the range of each part of the file indicated by “Input File A” and the range of each part of the file indicated by “Input File B”. Each of these parts is copied and pasted in sequence to generate the MP4 file indicated by “Output File”. The range of each part is indicated by a copy start position and a copy size, or by a copy size only. In addition, “COPY MANIFEST” lists the ranges so that a part of the file indicated by “Input File A” and a part of the file indicated by “Input File B” are copied and pasted alternately.

The difference file (XXXX.DMP4) is copied and pasted from the beginning of the difference file for each part of the specified size. Therefore, in “COPY MANIFEST”, it is not necessary to specify the copy start position in order to indicate each part of the difference file (XXXX.DMP4). That is, the range of each part of the difference file is indicated only by the copy size (data size) without using the copy start position.

On the other hand, the stream file A (XXXX.M2TS) includes data that is not necessary for the converted stream file B (XXXX.MP4). Therefore, in “COPY MANIFEST”, in order to indicate each part of the stream file A that is the original file, the range of each part is indicated by the copy start position and the copy size. The copy start position is a byte position from the beginning of the file, and the copy size is a data size in bytes.

The generation unit 14 repeats copying and pasting alternately the portion included in the original file and the portion included in the difference file indicated by “COPY MANIFEST” according to the copy manifest file. That is, the generation unit 14 repeats alternately combining the part included in the original file and the part included in the difference file, which are indicated by “COPY MANIFEST”. As a result, an MP4 file that is the stream file B after conversion is generated.
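The copy-and-paste loop described above can be sketched in Python as follows. The entry format is an assumption made for this example: `("diff", size)` for a part of the difference file, which is read sequentially from the beginning and therefore needs no start position, and `("orig", start, size)` for a part of the original file. The function name `build_mp4` is also hypothetical, and parsing of the copy manifest file itself is not shown.

```python
from typing import BinaryIO, Iterable, Tuple


def build_mp4(original: BinaryIO, difference: BinaryIO,
              entries: Iterable[Tuple], output: BinaryIO) -> int:
    """Generate the output file by copying the listed parts in order.

    Returns the number of bytes written, which can be compared with the
    "Output File Size" described in the copy manifest.
    """
    written = 0
    for entry in entries:
        if entry[0] == "diff":
            _, size = entry
            # The difference file is consumed front to back, so the
            # current read position is always the right place to copy from.
            chunk = difference.read(size)
        elif entry[0] == "orig":
            _, start, size = entry
            # The original file contains data the MP4 file does not need,
            # so each part is located by its byte offset from the start.
            original.seek(start)
            chunk = original.read(size)
        else:
            raise ValueError(f"unknown entry kind: {entry[0]!r}")
        if len(chunk) != size:
            raise ValueError("input file is shorter than the manifest says")
        output.write(chunk)
        written += size
    return written
```

Because the output is only ever appended to, the MP4 file is produced from its beginning to its end without seeking backward in the output, and the copied bytes of the original file are never interpreted, so they can stay encrypted.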

Therefore, according to the copy manifest file, the same converted stream file B (XXXX.MP4) can be generated regardless of the device used to convert the original file. That is, it becomes easy to reproduce the stream file B after conversion without any trouble on any device.

The file generation apparatus 10 may obtain the difference file (XXXX.DMP4) and the copy manifest file (XXXX.CMNF) from an optical disc such as a Blu-ray (registered trademark) disc, or may obtain both or one of these files via the Internet. In the latter case, the user can be offered various choices, such as converting the original file into a file in the latest file format or into an MP4 file including different video and audio, without requiring any special change to the file generation apparatus 10.

Further, the file generation apparatus 10 may perform the conversion while stream file A, which is the original file, remains encrypted, or may decrypt the original file, convert it into an MP4 file, and then encrypt the generated MP4 file again. Stream file A (XXXX.M2TS), which is the original file, may also be plain text. Since the difference file is composed of header information, it does not need to be encrypted. However, if stream file A (XXXX.M2TS), which is the original file, or stream file B (XXXX.MP4), which is an MP4 file, is plain text, operations such as expanding the entire file in memory or temporarily saving it on a hard disk may not be permitted for security reasons.

Therefore, the file generation apparatus 10 may delete the areas of the stream file A (XXXX.M2TS), which is the plaintext original file, in order, starting from the areas whose conversion to the MP4 file has been completed. Further, when encrypting the stream file B (XXXX.MP4), which is an MP4 file, the file generation apparatus 10 may encrypt the generated portions in sequence each time a portion such as a “Movie fragment” included in the MP4 file, or a predetermined number of MP4 samples, is generated. If an encryption method in which the data size does not change before and after encryption is used, the area of the data to be copied does not change regardless of the presence or absence of encryption.

Also, the copy manifest file may indicate whether or not the stream file A (XXXX.M2TS) that is the original file is encrypted. Alternatively, the copy manifest file may indicate whether the encoded data multiplexed in the stream file A is to be converted into the MP4 file while still encrypted, or converted into the MP4 file after the encrypted encoded data has been converted into plaintext.

FIG. 7 is a diagram for explaining a method of generating a difference file and a copy manifest file.

The difference file (XXXX.DMP4) and copy manifest file (XXXX.CMNF) can be generated as follows. At the authoring stage, the stream file A (XXXX.M2TS), which is the original file, is converted in format to generate the converted stream file B (XXXX.MP4), which is an MP4 file. Next, the stream file A, which is the original file, and the stream file B are searched for at least one pair of portions containing data that match each other. At this time, the search is performed so that the data size of each portion becomes as large as possible. Next, a difference file is generated by concatenating the remaining portions of the stream file B, other than the portions found by the search, in the order in which they appear in the stream file B. The resulting correspondence between the files is recorded in the copy manifest file as “COPY MANIFEST”.

Note that the above-described search is performed by sequentially acquiring the data included in each of the stream file A and the stream file B in the direction from the head toward the end of each file and comparing the data, and the comparison never moves back in the opposite direction. As a result, the above-described conversion process (copy process) can be performed sequentially, that is, as one continuous process, and speeding up or memory reduction can be realized.

In addition, when video and audio are multiplexed by MPEG2-TS, a picture with a given PTS (Presentation Time Stamp) in the video may be multiplexed earlier in time than the audio frame having the same PTS, so that the data of a picture having a large code amount, such as an I picture, does not underflow. On the other hand, when video and audio are multiplexed by MP4, they are commonly multiplexed so that the PTS of the first picture of the video and the PTS of the first frame of the audio in a “Movie fragment” are the same or close to each other.

As described above, if the multiplexing unit of audio, video, or text differs between the stream file A and the stream file B, it may not be possible to perform the conversion process sequentially. As a result, the conversion may proceed while the pointer for reading or writing is moved back and forth. Therefore, the copy manifest file may register whether or not the conversion process can be performed as one continuous process and, when it cannot, the maximum data size by which the pointer needs to move back.

Note that if the search for portions containing matching data, which is performed on the stream file A and the stream file B, is performed in units of small data, the data size of the copy manifest file becomes large and the conversion process becomes complicated. For this reason, a specific threshold value may be provided. For example, only portions containing 8 bytes or more of matching data are registered in the copy manifest file. Even if one of the remaining portions of the stream file B contains fewer than 8 bytes of data that match the data of the stream file A, the data included in that remaining portion is stored in the difference file. Further, this threshold value may be described in the copy manifest file as “MIN COPY SIZE: 8 bytes”.
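The authoring-side search described above can be sketched as follows. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for the forward-only search (its matching blocks advance monotonically through both files), the tuple layout of the manifest entries is hypothetical and is not the actual “COPY MANIFEST” syntax, and consecutive entries are not forced to alternate strictly.

```python
from difflib import SequenceMatcher

MIN_COPY_SIZE = 8  # bytes; mirrors "MIN COPY SIZE: 8 bytes"


def build_copy_manifest(original: bytes, converted: bytes,
                        min_copy: int = MIN_COPY_SIZE):
    """Return (difference_file, manifest) for stream file A -> stream file B.

    Hypothetical manifest entries, listed in output order:
      ("diff", size)            take the next `size` bytes of the difference file
      ("orig", position, size)  copy `size` bytes of the original from `position`
    """
    matcher = SequenceMatcher(None, original, converted, autojunk=False)
    diff = bytearray()
    manifest = []
    out_pos = 0  # number of bytes of the converted file covered so far
    for a, b, size in matcher.get_matching_blocks():
        if size < min_copy:
            # Below the threshold (this also skips the final zero-size block):
            # these bytes stay in the difference file.
            continue
        if b > out_pos:
            diff += converted[out_pos:b]
            manifest.append(("diff", b - out_pos))
        manifest.append(("orig", a, size))
        out_pos = b + size
    if out_pos < len(converted):
        diff += converted[out_pos:]
        manifest.append(("diff", len(converted) - out_pos))
    return bytes(diff), manifest
```

Because every byte of the converted file is covered either by an `"orig"` range or by the difference file, replaying the entries in order reproduces stream file B exactly. The quadratic matching cost makes this suitable only for small illustrative inputs.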

Further, instead of using the difference file, the MPEG2-TS file may be converted into the MP4 file by using a copy manifest file that indicates the multiplexing units in the MP4 file. For example, in MP4, audio, video, or text is multiplexed as separate “Movie fragments”. In this case, the DTS of the first and last audio frames or video pictures in decoding order and their byte offset values from the beginning of the MPEG2-TS file may be registered together with the media identification information. It should be noted that the registration order of each “Movie fragment” is the same as the order of appearance of the “Movie fragment” in the MP4 file.

FIG. 8 is a flowchart of the file generation method in the present embodiment.

The file generation method in the present embodiment is a method in which the file generation device 10 generates an MP4 file. In this file generation method, first, the original file acquisition unit 11 of the file generation device 10 acquires an original file configured in a file format different from MP4 (step S11). Next, the difference file acquisition unit 12 acquires a difference file including data that is not included in the original file (step S12). Next, the procedure file acquisition unit 13 acquires a procedure file indicating a procedure for generating an MP4 file (step S13). Then, the generation unit 14 generates the above-mentioned MP4 file by combining the data included in the difference file and the data included in the original file according to the procedure indicated in the procedure file (step S14). For example, in step S11, the original file acquisition unit 11 acquires an original file configured in the MPEG2-TS file format.

In steps S11 to S13, the original file acquisition unit 11, the difference file acquisition unit 12, and the procedure file acquisition unit 13 acquire the files by reading the original file, the difference file, and the procedure file from the optical disc, respectively. As a result, since all the files necessary for generating the MP4 file are acquired from one optical disc, it is possible to save the trouble of searching for these files and to generate the MP4 file more easily.

Here, the procedure file describes the ranges of the plurality of parts included in the difference file and the ranges of the plurality of parts included in the original file such that the two kinds of ranges are arranged alternately. Therefore, in step S14, the generation unit 14 generates an MP4 file by combining the parts indicated by the ranges in the order in which the ranges are described in the procedure file. Thereby, each part included in the MP4 file can be generated in order from the head of the MP4 file, and the MP4 file can be generated more easily because no backtracking is needed.

Also, the range of each of the plurality of parts included in the difference file is described in the procedure file by its data size. On the other hand, the range of each of the plurality of parts included in the original file is described in the procedure file by the start position and data size of the part. Thereby, based on the description of the procedure file, the appropriate parts can be copied or acquired from the difference file and the original file and combined. In addition, since the start position is not used to describe the range of a part included in the difference file, the data size of the procedure file can be kept small.
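A minimal sketch of step S14 under these range conventions is shown below. The entry tuples are a hypothetical in-memory representation, not the procedure file syntax itself: a difference-file range carries only a size because that file is read strictly sequentially, whereas an original-file range carries a start position and a size.

```python
import io


def generate_mp4(original, difference, entries, out):
    """Combine parts of the difference file and the original file in order.

    original, difference, out: binary file objects (original must be seekable).
    entries: hypothetical procedure-file entries, in output order:
      ("diff", size)          next `size` bytes of the difference file
      ("orig", start, size)   `size` bytes of the original file from `start`
    """
    for entry in entries:
        if entry[0] == "diff":
            _, size = entry
            chunk = difference.read(size)  # sequential read, no start position
        else:
            _, start, size = entry
            original.seek(start)
            chunk = original.read(size)
        if len(chunk) != size:
            raise ValueError(f"short read for entry {entry!r}")
        out.write(chunk)  # output is written strictly from head to tail


# Usage with in-memory files standing in for XXXX.M2TS and XXXX.DMP4.
original = io.BytesIO(b"HHpayloadAHHpayloadB")
difference = io.BytesIO(b"moofmdat")
entries = [("diff", 4), ("orig", 2, 8), ("diff", 4), ("orig", 12, 8)]
mp4 = io.BytesIO()
generate_mp4(original, difference, entries, mp4)
```

Since the output is only ever appended to, each generated part could be encrypted or flushed to the medium as soon as it is written.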

Also, the data size of the MP4 file is described in the procedure file. Therefore, in step S14, the generation unit 14 may further determine, based on the data size of the MP4 file described in the procedure file, whether the medium has enough free space for recording the MP4 file. Since it is thus determined whether or not the medium has enough free space for recording the MP4 file, processing such as canceling the generation of the MP4 file in advance can be performed when there is not enough free space. That is, it is possible to prevent an error from occurring.
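The free-space determination can be sketched in a few lines. The function name and the idea of passing the size as an integer already read from the procedure file are assumptions made for illustration; `shutil.disk_usage` is the standard-library call that reports the free bytes of the medium holding the target directory.

```python
import shutil


def has_room_for_mp4(target_dir: str, mp4_size: int) -> bool:
    """Return True if the medium holding target_dir can store mp4_size bytes.

    mp4_size is assumed to be the MP4 data size described in the procedure file.
    """
    return shutil.disk_usage(target_dir).free >= mp4_size
```

A caller would typically check this before step S14 starts writing and cancel the generation when the result is false.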

Also, the MP4 file attributes are described in the procedure file. Therefore, the file generation apparatus 10 may further read out the attribute described in the procedure file.

Also, the procedure file describes the buffer size required for playing MP4 files. Therefore, the file generation device 10 may further read the buffer size described in the procedure file. Thus, if the buffer size necessary for reproducing the MP4 file is read from the procedure file, it is possible to easily determine whether or not the MP4 file can be reproduced without analyzing the MP4 file.

In the procedure file, a first file name that is the name of the original file and a second file name that is the name of the difference file are described. Therefore, the file generation device 10 may further specify the file having the first file name described in the procedure file as the original file, and specify the file having the second file name described in the procedure file as the difference file. Thereby, the original file and difference file used for generating the MP4 file can be appropriately acquired.

In step S14, the generation unit 14 combines header information corresponding to MP4 that is data included in the difference file with data included in the original file. Thereby, an MP4 file having appropriate header information of MP4 can be easily generated.

In step S11, the original file acquisition unit 11 may acquire an original file in a plain text state, and in step S14, the generation unit 14 may encrypt the generated MP4 file. Thus, if the original file is deleted after conversion to MP4, the confidentiality of the data can be ensured while leaving the data contained in the original file as the MP4 file.

Here, in step S14, every time a part constituting the original file is acquired, the generation unit 14 may generate and encrypt the part of the MP4 file corresponding to that part, and every time a part constituting the MP4 file is encrypted, the generation unit 14 may delete the part of the original file corresponding to that part. Thereby, it is possible to prevent all the data included in the plaintext original file from being temporarily stored in the storage area, and to ensure the confidentiality of the data more reliably.

(Modification 1)
In the above embodiment, the MP4 file is generated using the difference file and the copy manifest file. However, the MP4 file may be generated without using these files. In the file generation method according to this modification, an MPEG2-TS stream file composed of a plurality of Source Packets each having a plaintext header is converted into an MP4 stream file using a counter. As a result, an MP4 stream file is generated. The MPEG2-TS stream file is an original file, and the MP4 stream file is an MP4 file.

FIG. 9 is a diagram for explaining a file generation method in the present modification.

An MPEG2-TS stream file (that is, content) is composed of a plurality of source packets as described above. In FIG. 9, the source packet is abbreviated as SP.

In each source packet included in the MPEG2-TS stream file in this modification, only the TS Payload portion of the data included in the source packet is encrypted. That is, among the data included in the source packet, TP_extra_header and TS Header are not encrypted and are plaintext.

AES-CTR (CountTeR) mode is used for encryption. In the AES-CTR mode, encryption and decryption are performed using the counter value. As shown in FIG. 9, a plurality of TS Payloads each including video data are encrypted using the value of the video counter (AES Counter for Video), and a plurality of TS Payloads each including audio data are encrypted using the value of the audio counter (AES Counter for Audio). The video counter counts only the video source packet data so that the count value increases in accordance with the arrow shown in FIG. 9. The audio counter counts only the audio source packet data so that the count value increases in accordance with the arrow shown in FIG. 9. Details of encryption in the AES-CTR mode will be described later.

The file generation device can easily convert an MPEG2-TS stream file into an MP4 stream file by taking out only the TS Payload from each of the plurality of Source Packets included in the MPEG2-TS stream file.

FIG. 10A is a diagram for explaining data encryption in the AES-CTR mode.

When encrypting, a key and an initial value IV (Initialization Vector) are used. First, the IV is encrypted using the key. The ciphertext c1 corresponding to the first 16 bytes is generated by an exclusive OR operation between the value obtained by this encryption and the first 16 bytes of the data to be encrypted (m1 shown in FIG. 10A). For the next 16 bytes of data (block) included in the data to be encrypted (m2 shown in FIG. 10A), the IV is updated as IV = IV + 1 and then the same processing as that for the first 16 bytes is performed. Thereby, the ciphertext c2 corresponding to the next 16 bytes of data is generated. The IV updated as described above is the above-described counter value, and the video counter and the audio counter shown in FIG. 9 each calculate IV = IV + 1.

By continuously performing such processing, it is possible to create a ciphertext even for long data of 16 bytes or more. If the length of the data to be encrypted is not a multiple of 16 bytes, a ciphertext is generated by performing an exclusive OR in bit units in the last block.

FIG. 10B is a diagram for explaining data decryption in the AES-CTR mode.

At the time of decryption, the same processing as that for encryption is performed on the data to be decrypted. That is, at the time of decryption, the process of encrypting the IV using the key is performed.
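The counter-mode structure of FIGS. 10A and 10B can be illustrated as follows. Note the assumption loudly: the Python standard library has no AES, so a truncated SHA-256 of the key and counter is used here as a stand-in for "encrypting the IV with the key". It is not AES and is not secure; only the counter handling and the exclusive OR mirror the description.

```python
import hashlib

BLOCK = 16  # bytes per block, as in the description


def _block_function(key: bytes, counter: int) -> bytes:
    """Stand-in for AES encryption of the counter value (NOT AES, NOT secure)."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def ctr_crypt(key: bytes, iv: int, data: bytes) -> bytes:
    """Encrypt or decrypt `data` in counter mode; both directions are identical."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (iv + i // BLOCK) % (1 << 128)  # IV = IV + 1 for each block
        keystream = _block_function(key, counter)
        block = data[i:i + BLOCK]  # the last block may be shorter than 16 bytes
        out += bytes(m ^ k for m, k in zip(block, keystream))
    return bytes(out)
```

Because the keystream depends only on the key and the counter, the ciphertext has the same length as the plaintext, and any block can be processed on its own once its counter value is known. This is the property that lets encrypted payloads be copied into the MP4 file without decryption.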

As described above, the file generation method according to the present modification is a file generation method for generating an MP4 file, in which an original file composed of a plurality of packets is acquired and, for each packet included in the original file, only the encrypted payload portion that remains after excluding the plaintext header information from the packet is acquired, and the acquired payload portions are combined to generate the MP4 file. For example, in the acquisition of the original file, an original file configured in the MPEG2-TS file format is acquired. Thereby, an MP4 file can be easily generated without restoring an original file in a file format such as MPEG2-TS to its individual streams such as a video stream or an audio stream. Also, there is no need to decrypt and re-encrypt the original file. Therefore, it is possible to suppress the processing load for generating the MP4 file.

Here, among the plurality of packets included in the original file, the payload of each of the packets including video data is encrypted using the counter value of a first counter for video, and the payload of each of the packets including audio data is encrypted using the counter value of a second counter for audio, which is different from the first counter. The first counter counts only the data included in each of the packets corresponding to video, proceeding from the beginning of the original file toward the end, and the second counter counts only the data included in each of the packets corresponding to audio, proceeding from the beginning of the original file toward the end. Since the first counter is thus used as a video-dedicated counter, the video elementary stream formed in the MP4 file by combining the payloads of the packets corresponding to video can be easily decrypted. Similarly, since the second counter is used as an audio-dedicated counter, the audio elementary stream formed in the MP4 file by combining the payloads of the packets corresponding to audio can be easily decrypted.

The original file is composed of a plurality of aligned units, and each of the plurality of aligned units is composed of a plurality of source packets. The plurality of packets included in the original file are a plurality of Source Packets included in each of the plurality of Aligned Units. The payload is TS Payload, and the header information is composed of TP_extra_header and TS Header.
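Taking out only the TS Payload from each Source Packet can be sketched as below. The sizes follow the usual layout of a 192-byte Source Packet (a 4-byte TP_extra_header followed by a 188-byte TS packet) and the standard TS header fields; grouping by PID and the function names are illustrative assumptions, and PES-level handling is deliberately omitted.

```python
from collections import defaultdict

SOURCE_PACKET_SIZE = 192   # 4-byte TP_extra_header + 188-byte TS packet
TP_EXTRA_HEADER_SIZE = 4
TS_HEADER_SIZE = 4
SYNC_BYTE = 0x47


def ts_payload(source_packet: bytes):
    """Return (pid, payload) for one Source Packet; payload may be empty."""
    if len(source_packet) != SOURCE_PACKET_SIZE:
        raise ValueError("a Source Packet must be 192 bytes")
    ts = source_packet[TP_EXTRA_HEADER_SIZE:]
    if ts[0] != SYNC_BYTE:
        raise ValueError("TS sync byte 0x47 not found")
    pid = ((ts[1] & 0x1F) << 8) | ts[2]
    adaptation_field_control = (ts[3] >> 4) & 0x3
    start = TS_HEADER_SIZE
    if adaptation_field_control & 0x2:       # adaptation field present
        start += 1 + ts[4]                   # length byte plus the field itself
    if not adaptation_field_control & 0x1:   # packet carries no payload
        return pid, b""
    return pid, ts[start:]


def payloads_by_pid(stream: bytes):
    """Concatenate the TS payloads of each PID in file order."""
    result = defaultdict(bytearray)
    for off in range(0, len(stream) - SOURCE_PACKET_SIZE + 1, SOURCE_PACKET_SIZE):
        pid, payload = ts_payload(stream[off:off + SOURCE_PACKET_SIZE])
        result[pid] += payload
    return result
```

Keeping one concatenated buffer per PID matches the idea of a separate counter per stream: the byte position within each buffer determines the counter value for that stream only.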

(Modification 2)
In the above embodiment, the MP4 file is generated using the difference file and the copy manifest file. However, the MP4 file may be generated without using these files. In the file generation method according to this modification, an MPEG2-TS stream file including auxiliary information is converted into an MP4 stream file using the auxiliary information. As a result, an MP4 stream file is generated.

MPEG2-TS and MP4 partially differ in how access unit data is multiplexed. Therefore, when storing MPEG2-TS data in an MP4 file, the access unit needs to be separated into a plurality of parts that are stored separately. Basically, an access unit in MPEG2-TS includes both the initialization information essential for decoding a video picture or audio frame and the encoded data of the picture or frame. On the other hand, an access unit in an MP4 file (referred to as a sample or MP4 sample in MP4) is composed only of the encoded data of a picture or a frame, and the initialization information necessary for decoding is stored separately from the encoded data, as header information of the MP4 file.

The following describes auxiliary information for reducing the amount of processing required when converting encoded data multiplexed by MPEG2-TS into an MP4 file, and a method of converting the multiplexing format using that auxiliary information. In particular, when the encoded data is encrypted, the amount of processing for decrypting and re-encrypting it is large. In this modification, conversion to an MP4 file can be performed only by copying data, without decrypting the encoded data.

Note that the MPEG2-TS stream file that is the original file before conversion may be another kind of TS (transport stream). In other words, the original file may be not only a TS specified by the MPEG-2 system but also a TS in which header information of a predetermined number of bytes is added to each TS packet (for example, a TS used for Blu-ray (registered trademark) discs or video distribution). In addition, the MP4 file generated by the conversion may be an MP4 file using “Movie fragment”, or an MP4 file not using “Movie fragment”. Furthermore, the format of the file generated by the conversion may be an MP4-based format such as DECE (Digital Entertainment Content Ecosystem) CFF (Common File Format) or MPEG-DASH (Dynamic Adaptive Streaming over HTTP). Hereinafter, the original file will be described as a transport stream.

FIG. 11 is a diagram showing an example in which an MPEG-4 AAC access unit stored in a transport stream is stored in an MP4 file.

The MPEG-4 AAC access unit in the transport stream is composed of three types of data: LATM (Low Overhead Audio Transport Multiplex) header, PayloadLengthInfo (), and PayloadMux (). The LATM header includes initialization information necessary for decoding MPEG-4 AAC encoded data (also referred to as AAC data) such as the number of channels and sampling frequency. More specifically, the initialization information is stored in AudioSpecificConfig () in the LATM header. PayloadLengthInfo () stores the size of PayloadMux (), and AAC data is stored in PayloadMux ().

When storing the data of this access unit in the MP4 file, AudioSpecificConfig () in the LATM header is stored in the sample entry in stsd in the moov of the MP4 file. Furthermore, PayloadLengthInfo () and PayloadMux () are stored in mdat as sample data. Note that the sample data is data stored in the sample. In addition, sample data in mdat is referred to from moov, or when “Movie fragment” is used, sample data in mdat is referred to from moof. In MPEG-2 AAC, an ADTS (Audio Data Transport Stream) header is used instead of a LATM header, and an access unit is composed of an ADTS header and AAC data (called raw_data_block ()). In this case as well, the ADTS header is separated from the access unit, and at least adts_fixed_header () among the data included in the ADTS header is stored in the sample entry. Furthermore, AAC data is stored in mdat as sample data.
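For the ADTS case, separating the header from the AAC data can be sketched as follows. The bit positions follow the commonly documented ADTS header layout (7 bytes, or 9 bytes when a CRC is present); the function name and the returned dictionary are illustrative assumptions, and the sketch assumes a single raw_data_block () per frame.

```python
def split_adts_frame(frame: bytes):
    """Split one ADTS frame into (header, aac_data, info).

    Assumes one raw_data_block() per frame; `info` holds a few fields of
    adts_fixed_header() that would be needed for the sample entry.
    """
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword not found")
    protection_absent = frame[1] & 0x01
    header_len = 7 if protection_absent else 9  # 2 extra bytes carry the CRC
    # aac_frame_length is a 13-bit field that counts the header as well.
    frame_length = ((frame[3] & 0x03) << 11) | (frame[4] << 3) | (frame[5] >> 5)
    if frame_length < header_len or frame_length > len(frame):
        raise ValueError("inconsistent aac_frame_length")
    info = {
        "profile": (frame[2] >> 6) & 0x03,
        "sampling_frequency_index": (frame[2] >> 2) & 0x0F,
        "channel_configuration": ((frame[2] & 0x01) << 2) | (frame[3] >> 6),
    }
    return frame[:header_len], frame[header_len:frame_length], info
```

The second return value corresponds to the data that would go into mdat as sample data, while the fields in `info` correspond to what the sample entry needs.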

FIG. 12 is a diagram illustrating an example in which an access unit of MPEG-4 AVC (Advanced Video Coding) stored in the transport stream is stored in MP4.

As in the case of MPEG-4 AAC, in the transport stream, initialization information necessary for decoding, such as the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), is stored as part of the access unit. On the other hand, in the MP4 file, the initialization information is stored as the header information of the MP4 file. The access unit shown in FIG. 12 constitutes an IDR (Instantaneous Decoder Refresh) picture. Each NAL (Network Abstraction Layer) unit of SPS and PPS is separated from the access unit and stored in the sample entry in stsd in the moov of the MP4 file. Other data included in the access unit is stored in mdat as sample data.

In MP4, a mode in which initialization information such as SPS and PPS can be included in MPEG-4 AVC sample data can be selected. The mode is indicated by the identification information of the sample entry. When the identification information is “avc1” or “avc2”, it is prohibited to include the initialization information in the sample data. On the other hand, when the identification information is “avc3” or “avc4”, it is permitted to include initialization information in the sample data. Therefore, when the transport stream is converted into an MP4 file and the above-described identification information in the MP4 file is set to “avc1” or “avc2”, the NAL units of SPS and PPS, and the NAL units of FillerData used for stuffing, are deleted from the access unit data in the transport stream before that data is stored in mdat. When the identification information is set to “avc3” or “avc4”, the SPS or PPS NAL unit does not have to be deleted. Therefore, whether to delete SPS and PPS may be switched according to the setting value of the identification information in the MP4 file.
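The switching described above can be sketched as a small filter over the NAL units of one access unit. The nal_unit_type values (7 for SPS, 8 for PPS, 12 for filler data) are the MPEG-4 AVC values; the function name is hypothetical, and keeping every NAL unit in the “avc3”/“avc4” case is an assumption made for this sketch, since the text only states that SPS and PPS need not be deleted.

```python
SPS, PPS, FILLER_DATA = 7, 8, 12  # MPEG-4 AVC nal_unit_type values


def nal_units_for_mdat(nal_units, sample_entry_type: str):
    """Return the NAL units of one access unit that are stored in mdat.

    nal_units: list of bytes objects, each starting with the NAL unit header.
    """
    if sample_entry_type in ("avc1", "avc2"):
        excluded = {SPS, PPS, FILLER_DATA}
    elif sample_entry_type in ("avc3", "avc4"):
        excluded = set()  # assumption: nothing is removed in this mode
    else:
        raise ValueError(f"unsupported sample entry type: {sample_entry_type}")
    # The low 5 bits of the first byte carry nal_unit_type in MPEG-4 AVC.
    return [n for n in nal_units if (n[0] & 0x1F) not in excluded]
```

For HEVC the NAL unit header is laid out differently, so a corresponding filter would need its own type extraction.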

Also, HEVC (High Efficiency Video Coding) data, which is a next-generation moving image coding system, is structured by NAL units and has initialization information such as SPS and PPS, as in MPEG-4 AVC. When the HEVC data is stored in the MP4 file, initialization information may be included in the sample data. Therefore, when converting the transport stream into an MP4 file, whether to delete the initialization information from the access unit data in the transport stream before storing that data in mdat may be determined according to the type of encoding method, and processing may be performed according to the determination result.

As described above, when storing MPEG-4 AAC encoded data in an MP4 file, the LATM header is separated from the access unit. Furthermore, only PayloadLengthInfo () and PayloadMux () are stored in mdat as sample data. Hereinafter, PayloadLengthInfo () and PayloadMux () are collectively referred to as LATM payload.

Therefore, in this modification, when storing the data of the MPEG-4 AAC access unit in a plurality of TS packets, the LATM header and the LATM payload are stored in separate TS packets. Thereby, the LATM header can be easily separated.

FIG. 13A is a diagram showing an example of storing the LATM header and the LATM payload in the TS packet. Stuffing is performed as necessary so that the data of the LATM header and the LATM payload are not mixed in the payload of the same TS packet. For example, the LATM header of access unit 1 is stored in the payload of the first TS packet. At this time, if the size of the LATM header is less than the size of the TS payload, stuffing is performed on the remaining area of the TS payload. In the example shown in FIG. 13A, the PES packet is not described, but actually, the data of the access unit is stored in the payload of the PES packet, and the data of the PES packet is stored in the payload of the TS packet.

Next, a method for identifying a TS packet storing the LATM header and a TS packet storing the LATM payload will be described. When storing one access unit as one PES packet, the payload_unit_start_indicator of the TS header is set to 1 in the TS packet including the head data of the PES packet. If it is ensured that the LATM header is included in the payload of the TS packet in which payload_unit_start_indicator is set to 1, whether or not the LATM header is included in the TS packet can be determined based on the value of payload_unit_start_indicator. When storing a plurality of access units in one PES packet, an AU_info table as auxiliary information may be arranged at the head of the payload of the PES packet. This AU_info table includes the number of access units included in the payload of the PES packet and the sizes of the LATM header and the LATM payload in each access unit.
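Checking payload_unit_start_indicator is a one-bit test on the TS header, sketched below. The function name is an assumption; the bit position (0x40 in the second header byte) follows the standard TS packet header layout.

```python
def has_payload_unit_start(ts_packet: bytes) -> bool:
    """Return True if payload_unit_start_indicator is 1 in a 188-byte TS packet."""
    if len(ts_packet) != 188 or ts_packet[0] != 0x47:
        raise ValueError("not a 188-byte TS packet starting with sync byte 0x47")
    return bool(ts_packet[1] & 0x40)
```

Under the guarantee described above, a packet for which this returns True is one whose payload holds the LATM header, and the following packets of the same PID up to the next such packet hold the LATM payload.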

FIG. 13B is a diagram illustrating an example of the syntax of the AU_info table.

The AU_info table includes AU_info_identification_code, number_of_AU indicating the number of access units, and size_of_LengthInfo indicating the sizes of the LATM header and the LATM payload in the access unit. AU_info_identification_code is a bit string specific to the AU_info table. By searching for this code, it can be determined whether or not the AU_info table exists. Alternatively, as in the MP4 Box structure, the AU_info table may have a data structure in which the Box data size and the Box type are combined. However, if the presence of the AU_info table is signaled by a descriptor in the transport stream or by auxiliary data for conversion to an MP4 file prepared separately from the transport stream, the above code may be omitted. The AU_info table may also be used when storing one access unit in one PES packet.

The method of identifying the data to be separated for conversion to an MP4 file by indicating the size or data offset of each component in the access unit may also be applied to MPEG-4 AVC encoded data. That is, when storing MPEG-4 AVC encoded data in an MP4 file, the above-described method may be applied to separate the NAL units of SPS and PPS.

FIG. 13C is a diagram illustrating an example of the syntax of the AU_info table for indicating whether or not each constituent element in the access unit needs to be excluded from the sample data of the MP4 file.

The AU_info table includes AU_info_identification_code, number_of_data_unit, size_of_data_unit, and conversion_mode. number_of_data_unit indicates the number of data units included in the access unit. size_of_data_unit indicates the size of the data unit. The conversion_mode is a conversion mode indicating how to handle the data unit at the time of conversion to the MP4 file. If the conversion_mode is 0, the conversion_mode indicates that the data unit is copied as sample data of the MP4 file. If the conversion_mode is 1, the conversion_mode indicates that the data unit is excluded from the sample data, that is, the data unit is not copied as the sample data.

When conversion_mode is 1, the conversion_mode may indicate the handling of the data unit after the data unit is excluded. For example, the conversion_mode indicates that the data unit is excluded from the sample data and then stored in the sample entry.

Also, the information on each of the plurality of data units is stored in decoding order. For example, when one access unit of MPEG-4 AVC is stored as one PES packet, the data unit corresponds to a NAL unit, and the number of NAL units constituting the access unit is indicated by number_of_data_unit. Then, conversion_mode is set to 1 for each NAL unit of SPS and PPS. Each of the LATM header and the LATM payload may be regarded as a data unit. In this case, by setting the conversion_mode to 1 for the data unit corresponding to the LATM header, this AU_info table can be applied to MPEG-4 AAC. When a plurality of access units are stored in the PES packet, number_of_data_unit indicates the total number of data units included in all the access units in the PES packet.
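How a converter might act on the AU_info table of FIG. 13C can be sketched as follows. The in-memory `DataUnit` class is a hypothetical, already-parsed form of the table (the binary field widths are not assumed here); only the meaning of size_of_data_unit and conversion_mode is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class DataUnit:
    size_of_data_unit: int  # size of the data unit in bytes
    conversion_mode: int    # 0: copy as sample data, 1: exclude from sample data


def split_access_unit(access_unit: bytes, units):
    """Return (sample_data, excluded_units) for one access unit.

    `units` lists the data units in decoding order, as in the AU_info table.
    """
    sample_data = bytearray()
    excluded = []
    pos = 0
    for unit in units:
        chunk = access_unit[pos:pos + unit.size_of_data_unit]
        if len(chunk) != unit.size_of_data_unit:
            raise ValueError("AU_info sizes exceed the access unit")
        if unit.conversion_mode == 0:
            sample_data += chunk
        elif unit.conversion_mode == 1:
            excluded.append(chunk)  # e.g. SPS/PPS bound for the sample entry
        else:
            raise ValueError(f"unknown conversion_mode {unit.conversion_mode}")
        pos += unit.size_of_data_unit
    return bytes(sample_data), excluded
```

Because the split relies only on sizes, it needs no inspection of the data units themselves, which is what allows encrypted encoded data to be handled by plain copying.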

Note that the AU_info table may be stored in the adaptation_field of the TS packet header (TS Header in FIG. 3) including the start portion of the PES packet. The AU_info table may also be stored as a part of encoded data, such as a NAL unit or SEI (Supplemental Enhancement Information) in MPEG-4 AVC or HEVC. When storing the AU_info table as part of the encoded data, the AU_info table can be stored for each access unit or for each random access unit. Also, when the AU_info table is stored in the TS packet or PES packet, the AU_info table may be stored for each PES packet, or may be stored only in the PES packet including the access unit that comes first in decoding order in the random access unit. When storing the AU_info table for each random access unit, the AU_info table stores information on all access units constituting the random access unit.

FIG. 14 is a block diagram showing a configuration of a file generation apparatus according to this modification.

The file generation apparatus 100 according to this modification generates the MP4 file by converting the transport stream including the AU_info table into the MP4 file. The file generation apparatus 100 includes an auxiliary information search unit 101, an auxiliary information analysis unit 103, and a generation unit 104. The auxiliary information search unit 101 searches the AU_info table that is auxiliary information from the transport stream. The auxiliary information analysis unit 103 determines whether or not an AU_info table exists based on the search result. Further, when the auxiliary information analysis unit 103 determines that the AU_info table exists, the auxiliary information analysis unit 103 analyzes the AU_info table. The generation unit 104 generates an MP4 file based on the analysis result from the auxiliary information analysis unit 103.

Specifically, the generation unit 104 includes a sample generation unit 104a and a sample entry generation unit 104b. The sample generation unit 104a stores LATM payload data or NAL units other than SPS and PPS as sample data in the mdat of the MP4 file. The sample entry generation unit 104b stores the LATM header data or each NAL unit of SPS and PPS in the sample entry in stsd in the moov of the MP4 file.

Such a file generation device 100 can easily convert a transport stream including the AU_info table described above into an MP4 file.

As described above, when storing encoded data of audio or video, MPEG2-TS and MP4 have different storage locations for initialization information required for decoding. Furthermore, the storage format of NAL units in MPEG-4 AVC or HEVC differs between MPEG2-TS and MP4. Therefore, conversion from a transport stream to an MP4 file requires conversion of the storage format. Hereinafter, the storage format of the NAL unit in MPEG2-TS and MP4 will be described with reference to FIGS. 15A to 15C.

FIG. 15A is a diagram showing a schematic structure of a NAL unit.

The NAL unit is composed of a header and a payload. In the header, type information indicating the type of data stored in the payload is stored.

FIG. 15B is a diagram showing an example of the storage format of the NAL unit in MPEG2-TS. In MPEG2-TS, a unique bit string called a start code is added to the NAL unit as identification information in order to identify the boundary of the NAL unit (hereinafter, the format of such identification information is called a start code format). A decoding device or the like can separate a desired NAL unit by searching for the start code and the type information stored in the header of the NAL unit.

FIG. 15C is a diagram showing an example of the storage format of the NAL unit in MP4. In MP4, in order to identify the boundary of the NAL unit, a field indicating the data size of the NAL unit as identification information is added to the NAL unit (hereinafter, the format of such identification information is referred to as a NAL size format). Here, the field length of the field indicating the data size is stored in the AVCDecoderConfigurationRecord in the sample entry. AVCDecoderConfigurationRecord is an area for storing initialization information at the time of decoding. A decoding device or the like can separate a desired NAL unit based on the data size of the NAL unit.

As described above, MPEG2-TS and MP4 also differ in that the format of identification information indicating the boundary between NAL units is a start code format or a NAL size format. Therefore, when converting a transport stream into an MP4 file, it is necessary to convert identification information indicating the boundary of the NAL unit. Since the storage format of the NAL unit is defined for each encoding method, the conversion operation to the MP4 file may be switched with reference to the audio or video encoding method.
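The conversion of the identification information described above can be illustrated by the following minimal sketch. It assumes that the start code is the byte sequence 0x000001 (optionally preceded by zero bytes) and that the field indicating the data size is 4 bytes long; the function names `split_start_code_format` and `to_nal_size_format` are hypothetical and are introduced only for this illustration.

```python
def split_start_code_format(data: bytes) -> list[bytes]:
    """Split a start code format byte string into NAL units (header + payload).

    Simplification: zero bytes immediately before a start code are treated as
    part of the next prefix (zero_byte) and are stripped from the NAL unit.
    """
    nal_units = []
    start = None  # index of the first byte of the NAL unit being collected
    i = 0
    while i + 3 <= len(data):
        if data[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                nal_units.append(data[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nal_units.append(data[start:].rstrip(b"\x00"))
    return nal_units


def to_nal_size_format(data: bytes, length_size: int = 4) -> bytes:
    """Replace each start code with a big-endian size field of length_size bytes."""
    out = bytearray()
    for nal in split_start_code_format(data):
        out += len(nal).to_bytes(length_size, "big")
        out += nal
    return bytes(out)


# Two NAL units: one with a 4-byte prefix (zero_byte + start code), one with 3 bytes.
ts_data = b"\x00\x00\x00\x01\x09\xf0" + b"\x00\x00\x01\x65\x88\x84"
mp4_data = to_nal_size_format(ts_data)
```

In this sketch the boundary of every NAL unit has to be found by scanning the whole access unit, which is the processing cost that the size information described below is intended to avoid.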

When converting to an MP4 file, the data size of each NAL unit is required in order to produce data in the NAL size format. Therefore, it is desirable that the data size of each NAL unit constituting the access unit can be acquired in advance. By doing so, it is possible to reduce the processing amount that would otherwise be needed to determine the data size of each NAL unit by searching for the start codes in the access unit of the start code format and detecting the NAL unit boundaries.

That is, size information indicating the data size of each NAL unit included in the access unit is stored at the head of the access unit in the transport stream in this modification.

FIG. 16A is a diagram illustrating a configuration example of an access unit in the transport stream according to the present modification. As shown in FIG. 16A, size information indicating the data size of each NAL unit constituting the access unit multiplexed in the transport stream is stored at the head of the access unit. For example, this size information is stored in a newly defined size information NAL unit. NAL unit types include private and user data. Therefore, one of these types is selected, and the selected type of NAL unit is used as the size information NAL unit. Since there is no need to encrypt the size information, this size information NAL unit is stored in plain text.

FIG. 16B is a diagram illustrating an example of the syntax of the size information included in the size information NAL unit. The size information includes number_of_nalu and nal_size. number_of_nalu indicates the number of NAL units constituting the access unit, and nal_size indicates the data size of each NAL unit. Since the size information NAL unit is not stored in the MP4 file, the size information may not indicate the data size of the size information NAL unit itself. In the example shown in FIG. 16A, the size information NAL unit is arranged in front of the NAL unit for signaling the head of the access unit, called the Access Unit Delimiter (AUD), but it may instead be arranged immediately after the AUD. When the size information NAL unit is arranged immediately after the AUD, the size information of the size information NAL unit indicates the size of each NAL unit after the AUD. Therefore, the size information does not indicate the data size of the AUD. However, since the data size of the AUD is fixed, the file generation device 100 may store the data size in advance. Further, similarly to the AU_info table, the size information may be arranged at the head of the payload of the PES packet.

FIG. 16C is a diagram illustrating another example of the syntax of the size information included in the size information NAL unit.

As shown in FIG. 16A, a variable length code such as zero_byte may be included in front of the NAL unit in the transport stream in addition to the start code. Therefore, as shown in FIG. 16C, the data size of the identification information including the start code of the NAL unit (prefix_size shown in FIG. 16C) may be stored in the size information. If the data size of the identification information is fixed, the data size of the identification information may be stored in the MPEG2-TS descriptor or auxiliary data at the time of conversion to the MP4 file. Further, in the NAL size format, the field length of the field indicating the data size of the NAL unit after conversion into the MP4 file may be indicated.
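How the size information can replace the start code search may be sketched as follows. The sketch assumes that the size information has already been parsed into a list of (prefix_size, nal_size) pairs, one per NAL unit, and that the size information NAL unit itself has already been removed from the access unit; the function name `convert_with_size_info` and the 4-byte size field are assumptions made for illustration, not part of the syntax shown in FIG. 16C.

```python
def convert_with_size_info(
    access_unit: bytes,
    size_info: list[tuple[int, int]],
    length_size: int = 4,
) -> bytes:
    """Convert a start code format access unit to the NAL size format
    using known sizes instead of searching for start codes.

    size_info holds one (prefix_size, nal_size) pair per NAL unit, where
    prefix_size covers zero_byte and the start code (hypothetical layout).
    """
    out = bytearray()
    pos = 0
    for prefix_size, nal_size in size_info:
        pos += prefix_size                      # skip the identification information
        nal = access_unit[pos:pos + nal_size]
        if len(nal) != nal_size:
            raise ValueError("size information exceeds the access unit length")
        out += nal_size.to_bytes(length_size, "big")
        out += nal
        pos += nal_size
    return bytes(out)


access_unit = b"\x00\x00\x00\x01\x09\xf0" + b"\x00\x00\x01\x65\x88\x84"
converted = convert_with_size_info(access_unit, [(4, 2), (3, 3)])
```

Each NAL unit is located by simple offset arithmetic, so the cost no longer depends on scanning the payload bytes.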

Also, the identification information may be regarded as a data unit, and the contents of Nal_size_info may be indicated by an AU_info table. At this time, a mode of converting the data structure of the data unit may be added in addition to the two operations of deleting the data unit and leaving it as it is by making conversion_mode multi-valued. Furthermore, identification information for identifying the type of the data unit may be added to the AU_info table. For example, the type of NAL unit such as an SPS NAL unit, a PPS NAL unit, or a slice NAL unit in MPEG-4 AVC or HEVC can be determined based on such identification information. Alternatively, it is possible to determine the MPEG-4 AAC LATM header or LATM payload. Further, a field indicating whether or not the data unit is encrypted may be provided separately.

Data units with different conversion_modes may be stored in different TS packets. When the conversion_mode is 2, the data unit stored in the sample entry as header information may be stored in the TS packet as plain text.

For encryption, only the NAL unit of the slice data may be encrypted, and the other parts may be plain text. This is because encryption is performed in units of TS packets, so plaintext and encrypted parts cannot be mixed in the payload of a TS packet; the AUD has a small data size, and if the AUD were stored as an independent packet, stuffing would increase and multiplexing efficiency would decrease. Further, stuffing is necessary when the boundary of the NAL unit data of the encrypted slice is not aligned with the end of the payload of the TS packet. When performing stuffing, there is a method of using the adaptation_field of the TS packet header, or of inserting a Filler Data NAL unit or Filler Data SEI into the encoded data. When the adaptation_field of the TS packet header is used, the adaptation_field needs to be in plain text, but switching whether or not to encrypt a variable-length area is expensive. Therefore, when performing stuffing, it is desirable that the boundary of the NAL unit data be aligned with the end of the payload of the TS packet by using the stuffing data structure in the encoded data.

Also, the user data storage SEI (User unregistered SEI) may be inserted into the adaptation field of the TS packet or the access unit, and the size information may be stored in the SEI. When converting to an MP4 file, the start code format can be converted to the NAL size format based on the size information of the NAL unit stored by any one or more of these methods. Further, information indicating whether size information is stored may be stored using a descriptor in MPEG2-TS.

FIG. 17 is a flowchart showing a processing operation in which the file generation device 100 according to this modification generates an MP4 file. Specifically, this flowchart shows an example of a processing operation for changing a transport stream into an MP4 file with reference to the AU_info table shown in FIG. 13C.

The auxiliary information search unit 101 of the file generation device 100 searches for the AU_info table arranged at the beginning of the payload of the PES packet (step S101). Next, the auxiliary information analysis unit 103 determines whether or not an AU_info table exists based on the search result (step S102). When it is determined that the AU_info table exists (“Yes” in step S102), the auxiliary information analysis unit 103 acquires the data size and the conversion mode of each data unit included in the AU_info table (step S103). On the other hand, if it is determined that the AU_info table does not exist (“No” in step S102), the sample generation unit 104a of the generation unit 104 regards the access unit separated from the PES packet as a data unit, copies the data unit as sample data, and pastes it into mdat (step S105). When one access unit is stored in one PES packet, the sample generation unit 104a regards the payload of the PES packet as data for one access unit and separates it. When multiple access units are stored in one PES packet, or when an access unit is fragmented and stored in PES packets, the sample generation unit 104a searches for the boundary of each access unit in the encoded data to separate those access units.

After step S103, the auxiliary information analysis unit 103 determines whether the conversion mode is 0 (step S104). If it is determined that the conversion mode is 0 (“Yes” in step S104), the sample generation unit 104a copies the data unit corresponding to the conversion mode as sample data and pastes it into mdat (step S105). On the other hand, if it is determined that the conversion mode is not 0 (“No” in step S104), the auxiliary information analysis unit 103 determines whether or not the conversion mode is 1 (step S106). If it is determined that the conversion mode is 1 (“Yes” in step S106), the sample generation unit 104a converts the data structure of the data unit corresponding to the conversion mode, and stores the data unit having the converted data structure in mdat as sample data (step S107). For example, the sample generation unit 104a converts the format of the identification information of the NAL unit boundary from the start code format to the NAL size format. On the other hand, when it is determined that the conversion mode is not 1 but 2 (“No” in step S106), the sample entry generation unit 104b stores the data unit corresponding to the conversion mode in the sample entry without storing it in mdat (step S108). For example, the sample entry generation unit 104b stores the NAL units of SPS and PPS in MPEG-4 AVC in the sample entry. Alternatively, the sample entry generation unit 104b separates AudioSpecificConfig () from the MPEG-4 AAC LATM header and stores it in the sample entry. Of the data of the data unit, the portion stored in the sample entry is defined in advance by the encoding method, but auxiliary data for specifying the portion to be stored in the sample entry may be indicated in the AU_info table.
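The branching of steps S102 to S108 can be summarized by the following sketch. It assumes that the AU_info table has already been parsed into a list of (data size, conversion mode) pairs and that each data unit converted in mode 1 is a single NAL unit preceded by a 3-byte or 4-byte start code; `Mp4Builder`, `place_data_unit`, and `convert_access_unit` are hypothetical names, and in mode 2 the data unit is stored as it is for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Mp4Builder:
    """Hypothetical container standing in for the mdat and the sample entry."""
    mdat: bytearray = field(default_factory=bytearray)
    sample_entry: list = field(default_factory=list)


def start_code_to_nal_size(unit: bytes, length_size: int = 4) -> bytes:
    """Convert one start-code-prefixed NAL unit to the NAL size format."""
    if unit.startswith(b"\x00\x00\x00\x01"):
        nal = unit[4:]
    elif unit.startswith(b"\x00\x00\x01"):
        nal = unit[3:]
    else:
        raise ValueError("data unit does not begin with a start code")
    return len(nal).to_bytes(length_size, "big") + nal


def place_data_unit(builder: Mp4Builder, unit: bytes, conversion_mode: int) -> None:
    if conversion_mode == 0:        # S104 "Yes": copy as it is (S105)
        builder.mdat += unit
    elif conversion_mode == 1:      # S106 "Yes": convert, then store in mdat (S107)
        builder.mdat += start_code_to_nal_size(unit)
    elif conversion_mode == 2:      # S106 "No": store in the sample entry (S108)
        builder.sample_entry.append(unit)
    else:
        raise ValueError(f"unknown conversion mode: {conversion_mode}")


def convert_access_unit(builder: Mp4Builder, access_unit: bytes, au_info) -> None:
    """au_info is None when no AU_info table was found (S102 "No")."""
    if au_info is None:
        builder.mdat += access_unit             # whole access unit as one data unit (S105)
        return
    pos = 0
    for size, mode in au_info:                  # sizes and modes acquired in S103
        place_data_unit(builder, access_unit[pos:pos + size], mode)
        pos += size


builder = Mp4Builder()
au = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x65\x88\x84\x21" + b"\xaa\xbb\xcc"
convert_access_unit(builder, au, [(6, 2), (7, 1), (3, 0)])
```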

(Modification 3)
The address indicating the copy start position of the stream file described in the copy manifest file in the above embodiment may not be an absolute value from the beginning of the stream file. For example, the address indicating the copy start position may be a difference value from the address indicating the copy start position of the immediately preceding entry.

In this case, the second entry included in the stream file A shown in FIG. 7 is described not as (copy start position address, copy size) = (577, 180) but as (copy start position address, copy size) = (367, 180). Further, for example, the following three methods (modes 1 to 3) are possible as a method for describing the address of the copy start position and the copy size.
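The difference-value description can be illustrated with a short sketch. The value 367 above corresponds to 577 minus a preceding copy start position of 210; the copy size of the first entry and the convention that the first entry is measured from address 0 are assumptions made only for this example, and the function names are hypothetical.

```python
def to_difference_entries(entries):
    """Convert (absolute address, copy size) entries to (difference, copy size)."""
    result = []
    previous = 0  # assumption: the first entry is relative to the file head
    for address, size in entries:
        result.append((address - previous, size))
        previous = address
    return result


def to_absolute_entries(entries):
    """Inverse conversion: accumulate the differences back into addresses."""
    result = []
    address = 0
    for difference, size in entries:
        address += difference
        result.append((address, size))
    return result


absolute = [(210, 180), (577, 180)]   # the size of the first entry is hypothetical
difference = to_difference_entries(absolute)
```

Because differences between neighboring entries are usually much smaller than absolute addresses, the address field can be made shorter, which is what modes 1 to 3 exploit.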

In mode 1, data is copied alternately from the stream file and the difference file. For example, the range of data to be copied is described as (address of copy start position, copy size), and the bit length of each field is (int32, int8), for example.

In mode 2, if the copy size is 0, copy is skipped. Since data can be copied continuously from the same file, the data length of the address can be shortened compared to mode 1. For example, the range of data to be copied is described as (address of copy start position, copy size), and the bit length of each field is (int16, int8), for example.

In mode 3, instead of explicitly indicating the copy size in mode 2, data up to the end of the payload of the TS packet indicated by the address is copied. The copy size field is omitted. Further, instead of specifying an address, a difference value between index numbers of TS packets in the stream file may be used. For example, the range of data to be copied is described as (packet number, copy mode), and the bit length of each field is, for example, (int7, int1). The copy mode indicates whether copying is skipped.

FIG. 18 is a diagram showing a specific example of address designation when mode 2 is used.

As shown in FIG. 18, the start position 1 and the start position 2, which are copy start positions, are the 210th byte and the 91428th byte, respectively. If the field length of the difference value of the address is 16 bits, the maximum value that the field can express is 65535. Therefore, the difference value of the address (91428 - 210 = 91218) cannot be expressed by one entry. Therefore, when jumping from the start position 1 to the start position 2, two entries (65535, 0) and (25683, 180) are used. By doing so, it is possible to acquire 180 bytes of data after jumping to the start position 2.
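The splitting of one long jump into several entries can be sketched as follows, assuming that the address field is an unsigned 16-bit value and that an entry whose copy size is 0 only advances the position. The function name `encode_jump` is hypothetical.

```python
def encode_jump(difference: int, copy_size: int, address_bits: int = 16):
    """Express one jump as mode 2 entries (address difference, copy size).

    Entries with copy size 0 skip copying and only move the position forward.
    """
    max_difference = (1 << address_bits) - 1   # 65535 for a 16-bit field
    entries = []
    while difference > max_difference:
        entries.append((max_difference, 0))
        difference -= max_difference
    entries.append((difference, copy_size))
    return entries


# Jump from start position 1 (byte 210) to start position 2 (byte 91428), then copy 180 bytes.
entries = encode_jump(91428 - 210, 180)
```

For the values in FIG. 18 this yields the two entries (65535, 0) and (25683, 180) described above.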

FIG. 19 is a diagram illustrating an example of reading a continuous area exceeding the upper limit of the copy size.

In this case, by describing as (255), (0, 0), (182), the area 1 and the area 2 in the difference file can be copied continuously.

Note that the address indicating the copy start position or the copy size may be described in units of 4 bytes instead of units of 1 byte. In addition, identification information indicating whether to copy from the difference file or from the stream file may be provided separately in each entry, so that, instead of entries for the two files being described alternately, entries for one file may be described consecutively.

Up to this point, the process of copying data from an MPEG2-TS file to generate an MP4 file has been described. However, data may instead be copied based on an elementary stream separated from the payload of a PES packet multiplexed into TS packets.

FIG. 20 is a diagram for explaining a process of creating an MP4 file by copying data from an elementary stream.

In the MPEG2-TS file, copy information is required for each TS packet, that is, for each 188-byte packet or for each 192-byte packet with a time stamp as used in BD (Blu-ray (registered trademark) Disc). Therefore, in content with a high bit rate, the number of TS packets increases and the size of the copy manifest file increases. By using an elementary stream separated from an MPEG2-TS file instead, data can be copied in units of frames or in units such as NAL units in HEVC or AVC. As a result, the number of entries included in “COPY MANIFEST” of the copy manifest file can be greatly reduced. For example, if the size of an MPEG2-TS file in which video is multiplexed is 10 GB and the size of a TS packet is 192 bytes, the number of packets is 55924053, and the same number of entries is required. On the other hand, if the playback time length of this video is 2 hours and the frame rate is 30 Hz, the total number of frames is 216000. Therefore, when an entry is generated for each frame, the number of entries can be greatly reduced as compared with a case where an entry is generated for each packet. In addition, there is a case in which an MPEG2-TS file is double-encrypted by AACS (Advanced Access Content System) and BD+, and data is interleaved and rearranged by BD+. Even in such a case, the order of the data constituting the encoded stream can be uniquely determined after decryption and separation of the encoded stream, so that the copying operation according to this method is possible.
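The entry counts in this example can be checked with the following short calculation. It assumes that 10 GB is interpreted as 10 × 2^30 bytes, which is the interpretation that reproduces the packet count given above.

```python
# Per-packet entries: one entry for each 192-byte packet in a 10 GB file.
file_size_bytes = 10 * 1024 ** 3        # assumption: 1 GB = 2**30 bytes
packet_size_bytes = 192
packet_entries = file_size_bytes // packet_size_bytes

# Per-frame entries: one entry for each frame of a 2-hour video at 30 Hz.
duration_seconds = 2 * 60 * 60
frame_rate_hz = 30
frame_entries = duration_seconds * frame_rate_hz

reduction = packet_entries / frame_entries   # roughly 259 times fewer entries
print(packet_entries, frame_entries, round(reduction))
```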

When copying data from an elementary stream, since there are a plurality of elementary streams, their identification information is required. In “COPY MANIFEST”, the following description can be made corresponding to the above-described mode 1 or mode 2. Here, the address of the copy start position is indicated by a difference value from the previous entry.

In mode 1, the entry consists of (file ID, copy start position address, copy size). The bit length of each field is, for example, (int4, int22, int22 in 1-byte units). The file ID is identification information of a file including data to be copied. For example, 0 is assigned to the difference file as the file ID, and a value of 1 or more is assigned to the file of the elementary stream. In HEVC or AVC encoded data having a resolution of 4K, if the minimum compression ratio is 4, the maximum size of one frame is about 4 Mbytes. When copying data from the encoded stream, the maximum difference value of the address at the copy start position may be the maximum size of one frame, and 4 Mbytes can be expressed by 22 bits. At this time, the copy size is 22 bits in order to express the maximum size of one frame.
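The field lengths of this mode 1 entry add up to 4 + 22 + 22 = 48 bits, so one entry fits in 6 bytes. The following sketch packs and unpacks such an entry; the bit order (file ID in the most significant bits, big-endian byte order) and the function names are assumptions made for illustration and are not specified in this description. Note that a 22-bit field expresses values up to 4194303, that is, just under 4 Mbytes when counted in 1-byte units.

```python
FIELD_22_MAX = (1 << 22) - 1   # 4194303


def pack_entry(file_id: int, address_difference: int, copy_size: int) -> bytes:
    """Pack (file ID, address difference, copy size) into 6 bytes (hypothetical layout)."""
    if not 0 <= file_id < (1 << 4):
        raise ValueError("file ID must fit in 4 bits")
    if not 0 <= address_difference <= FIELD_22_MAX:
        raise ValueError("address difference must fit in 22 bits")
    if not 0 <= copy_size <= FIELD_22_MAX:
        raise ValueError("copy size must fit in 22 bits")
    value = (file_id << 44) | (address_difference << 22) | copy_size
    return value.to_bytes(6, "big")


def unpack_entry(raw: bytes):
    """Inverse of pack_entry."""
    value = int.from_bytes(raw, "big")
    return (value >> 44) & 0xF, (value >> 22) & FIELD_22_MAX, value & FIELD_22_MAX


# File ID 0 is the difference file; 1 or more identifies an elementary stream file.
raw = pack_entry(1, 367, 180)
```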

In mode 2, the entry consists of (file ID, copy start position address, copy size). The bit length of each field is, for example, (int4, int12, int16 in 1-byte units). In AVC or HEVC, it is necessary to convert the header of the NAL unit from the start code format to the NAL size format, and it is necessary to copy data in units of payload of the NAL unit. For this reason, a field for the address of the copy start position is provided, but if the elementary stream in the TS can be copied as it is, the field of the address for the copy start position may be omitted. Alternatively, one entry may indicate the entire NAL unit, and the header of the NAL unit may be converted into a NAL size format after copying. Also at this time, since the data can be read continuously, the address of the copy start position is unnecessary.

(Embodiment 2)
According to the BD-ROM standard, a plurality of MPEG2-TS files can be continuously reproduced by referring to a playlist, and stream constraint conditions at a file boundary are also defined. For example, when a plurality of files or playback sections are seamlessly connected, the playback sections of the two audio streams to be connected may overlap.

Also in MP4, it is possible to specify a plurality of MP4 files that are continuously played back using a playlist, and it is assumed that the same constraints are added.

FIG. 21 is a diagram showing an example of audio and video playback sections in two MP4 files that are played back continuously. As shown in FIG. 21, the audio playback sections overlap. However, DTS (Decode TimeStamp) or PTS in MP4 is expressed by a relative time with the DTS of the first sample in the file as a reference (= 0), and cannot be expressed by an absolute time. Therefore, when two MP4 files (XXX1.MP4 and XXX2.MP4) whose playback sections overlap each other are continuously played back, the first sample of XXX2.MP4 is played back immediately after the final sample of XXX1.MP4. As a result, the overlapping portions are played back one after the other rather than simultaneously.

Therefore, in this embodiment, information indicating whether or not the playback sections of XXX1.MP4 and XXX2.MP4 overlap, or information indicating the overlapping playback sections, is stored in the playlist or the MP4 file. In the example shown in FIG. 21, the PTS of the first sample of audio and video in the MP4 file referred to by the play item in the playlist and the playback end time of the last sample are described. Also, the absolute time of the DTS or PTS of the first sample of the MP4 file may be stored in the moov of the MP4 file. Alternatively, the absolute time of the DTS or PTS of the first sample of a Movie fragment or the first sample of each track in a Movie fragment may be stored in moof or traf.

For example, suppose that in XXX1.MP4 and XXX2.MP4 there is no overlap or gap in the video playback sections, and it is guaranteed that the PTS is continuous. In this case, it is sufficient to indicate the information about overlap only for audio or text. If the PTS of the first sample of the video in XXX2.MP4 is T0, the offset value between T0 and the PTS of the first sample of the audio in XXX2.MP4, or the offset value between T0 and the playback end time of the last sample of the audio in XXX1.MP4, may be stored. Alternatively, the offset value between the playback end time of the last audio sample of XXX1.MP4 and the PTS of the first sample of XXX2.MP4 may be stored in XXX2.MP4. The playback device selects or synthesizes audio samples to be output based on these pieces of information related to overlap. Note that the video playback sections may be overlapped, or a gap may be provided instead of overlapping the audio or video playback sections.

In the example shown in FIG. 21, a plurality of files are continuously played using a playlist. However, in the present embodiment, a single MP4 file may be generated by integrating playback sections that are played back continuously.

FIG. 22A is a diagram for explaining a method of generating one MP4 file by integrating playback sections.

In this embodiment, as shown in FIG. 22A, for example, playback section 1 and playback section 2 are integrated to generate XXX3.MP4. The audio in the playback section 1 is composed of, for example, 100 samples from sample 1-1 to sample 1-100. The audio in the playback section 2 is composed of, for example, 100 samples from sample 2-1 to sample 2-100. Hereinafter, audio will be described as an example.

First, if there is no overlap or gap between the playback sections of playback section 1 and playback section 2, samples 2-1 to 2-100 are stored following samples 1-1 to 1-100 in the audio track of XXX3.MP4, and each sample is played back in order during playback. Next, the case where the playback sections of sample 1-100 and sample 2-1 overlap will be described. If sample 1-100 and sample 2-1 have the same playback section (start time: PTS, end time: PTS + playback time length), sample 2-1 is deleted and the audio track of XXX3.MP4 is configured. Thereby, the overlap between samples is eliminated. The same applies to the case where the playback sections of a plurality of samples included in the playback section 1 and the playback sections of a plurality of samples included in the playback section 2 overlap.

On the other hand, if the playback sections of sample 1-100 and sample 2-1 overlap and the playback sections of the samples are not the same, it cannot be handled by deleting the sample.

Therefore, if there is an overlap, samples 1-100 and 2-1 are both stored in the audio track of XXX3.MP4, and information indicating the overlapping playback section is stored. Let ΔT be the time length of the overlapping playback section. In this case, for example, the last sample in the playback section 1 and the first sample in the playback section 2 are stored in different movie fragments, and ΔT is stored in, for example, the traf in the moof. This ΔT indicates that the playback section of length ΔT from the beginning of the audio track in the movie fragment included in the playback section 2 overlaps with the playback section of the audio track of the immediately preceding movie fragment.
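The handling of the boundary between playback section 1 and playback section 2 can be sketched as follows. The sketch assumes that the PTS of every sample is available on a common timeline (for example, derived from the playlist) in ticks of the track timescale, and it only considers the single boundary pair, sample 1-100 and sample 2-1; `Sample` and `integrate_sections` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str
    pts: int        # presentation time on a common timeline, in timescale ticks
    duration: int   # playback time length, in timescale ticks


def integrate_sections(section1, section2):
    """Return (samples of the integrated track, delta_t of the overlap in ticks)."""
    last, first = section1[-1], section2[0]
    if (first.pts, first.duration) == (last.pts, last.duration):
        # Identical playback sections: delete sample 2-1, no overlap remains.
        return section1 + section2[1:], 0
    # Otherwise keep both samples and record the overlap length (0 if none).
    delta_t = max(0, last.pts + last.duration - first.pts)
    return section1 + section2, delta_t


# 100 audio samples of 1024 ticks each; section 2 starts 400 ticks before section 1 ends.
section1 = [Sample(f"1-{i + 1}", i * 1024, 1024) for i in range(100)]
section2 = [Sample(f"2-{i + 1}", 102000 + i * 1024, 1024) for i in range(100)]
track, delta_t = integrate_sections(section1, section2)
```

In this example the returned value of 400 ticks corresponds to the ΔT that would be stored for the movie fragment belonging to playback section 2.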

When there is a gap, a non-reproduction section corresponding to the gap section is generated by the edit list function in MP4 or the duration-is-empty flag set in the movie fragment.

Also, an overlap section (overlapping playback section) or a gap section (a section where a gap exists) may be indicated in the metadata in the MP4 header or an external file. In addition, MP4 including a gap section has higher affinity with conventional MP4 than MP4 including an overlap section. Therefore, when the MP4 file is generated by integrating the playback sections including the overlap sections, for example, in the playback section 2 of FIG. 22A, samples including at least a part of the overlap sections are deleted. In addition, when a gap occurs, a non-reproduction section is generated. This method can be applied not only to audio but also to video and subtitles. Further, information indicating whether or not deletion is necessary, a sample that needs to be deleted, a time length of a gap generated as a result of deletion, and the like may be stored as auxiliary information such as a copy manifest.

A playback apparatus that plays back such an MP4 file plays back audio data in the overlap section based on information, such as the overlap section, included in the MP4 file. That is, if the audio data to be reproduced is data in the overlap section, the playback apparatus synthesizes, for example, the decoding results of the audio samples including the data in each of the overlapping playback sections. Then, the playback device outputs the synthesized decoding result. Further, when integrating playback sections that are continuously played back, it is necessary to identify whether or not an overlap section or a gap section exists in the playback sections to be integrated. Therefore, information indicating the presence or absence of an overlap section and the time length thereof may be acquired as auxiliary information from the optical disc or from the Internet.

FIG. 22B is a block diagram of the file generation apparatus according to the present embodiment.

The file generation apparatus 20 in the present embodiment is a file generation apparatus that generates an MP4 file, and includes an integration unit 21 and a storage unit 22. The integration unit 21 generates one MP4 file by integrating the two streams so that the two streams are continuously reproduced. The storage unit 22 stores, in the generated MP4 file, information indicating a section in which the playback timing overlaps in each of the two streams. For example, the above-mentioned two streams are the playback section 1 and the playback section 2 shown in FIG. 22A, and the generated MP4 file is a file “XXX3.MP4” shown in FIG. 22A. Furthermore, the section where the playback timing overlaps in each of the two streams is, for example, a section of ΔT shown in FIG. 22A.

FIG. 22C is a flowchart of the file generation method in the present embodiment.

The file generation method in the present embodiment is a file generation method for generating an MP4 file, and includes step S21 and step S22. In step S21, one MP4 file is generated by integrating the two streams so that the two streams are continuously reproduced. Next, in step S22, information indicating a section where the reproduction timing overlaps in each of the two streams is stored in the generated MP4 file. Here, in step S21, two streams that are each at least part of an original file configured in the MP4 file format are integrated. In step S21, two streams each including audio data are integrated.

Thus, in the present embodiment, information indicating the overlap section is stored in the MP4 file. Therefore, the playback device that plays back the MP4 file can easily specify the data of the overlap section from the MP4 file using the information. As a result, the playback apparatus can appropriately play back the data by combining the data in the overlapping section. That is, an MP4 file suitable for overlapping reproduction can be generated.

In step S21, if the above-described section exists over a plurality of samples included in any one of the two streams, the two streams are integrated after deleting at least one of the plurality of samples. Thereby, since the sample is deleted, the overlap section can be shortened. As a result, it is possible to reduce the burden of special processing by the playback device for the overlap section.

In step S22, time information indicating the time length of the above-described section is stored in the MP4 file as the above-described information. That is, the time information indicating the above ΔT is stored in the MP4 file. As a result, a playback apparatus that plays back an MP4 file can easily specify the time length of the overlap section using the information. As a result, the reproducing apparatus can appropriately reproduce the data within the specified time length by combining the data of the overlapping sections.

In step S22, the time information is stored in the traf in the moof in the MP4 file. Thereby, the playback apparatus can appropriately acquire the stored time information.

Further, in the file generation method according to the present embodiment, the information may be acquired from a device holding the above information via a communication network such as the Internet. Alternatively, the information may be obtained from an optical disk on which the above information is recorded. Accordingly, the information can be easily stored in the MP4 file without generating information indicating the overlap section.

FIG. 22D is a block diagram of the playback device in the present embodiment.

The playback device 30 in the present embodiment is a playback device that plays back an MP4 file, and includes an extraction unit 31 and a synthesis unit 32. The extraction unit 31 extracts, from the MP4 file, information indicating two sections in which playback timing overlaps in a playback target content (for example, an audio track). The synthesizing unit 32 specifies two sections in the content based on the extracted information, and synthesizes and outputs the decoding results for the respective data in the two sections.

FIG. 22E is a flowchart of the reproduction method according to the present embodiment.

The reproduction method in the present embodiment is a reproduction method for reproducing an MP4 file, and includes step S31 and step S32. In step S31, information indicating two sections where the playback timing overlaps in the content to be played (for example, an audio track) is extracted from the MP4 file. Next, in step S32, two sections in the content are specified based on the extracted information, and the decoding results for the respective data in the two sections are synthesized and output.
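Steps S31 and S32 can be illustrated by the following sketch operating on already decoded PCM data. This description does not specify how the decoding results are synthesized; the linear crossfade used here is only one possible choice and is an assumption, as are the function names and the expression of the overlap as a number of PCM samples (converted beforehand from the extracted time information).

```python
def synthesize_overlap(tail, head):
    """Crossfade two equally long lists of decoded PCM values (assumed method)."""
    if len(tail) != len(head):
        raise ValueError("overlapping sections must have the same length")
    n = len(tail)
    mixed = []
    for i in range(n):
        weight = (i + 1) / (n + 1)          # weight moves from the first section to the second
        mixed.append((1.0 - weight) * tail[i] + weight * head[i])
    return mixed


def play_with_overlap(pcm1, pcm2, overlap):
    """Join two decoded sections whose last/first `overlap` PCM samples coincide in time."""
    if overlap == 0:
        return pcm1 + pcm2
    mixed = synthesize_overlap(pcm1[-overlap:], pcm2[:overlap])
    return pcm1[:-overlap] + mixed + pcm2[overlap:]


output = play_with_overlap([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 2)
```

With such synthesis the overlapping data is output once, over the overlap section, instead of being played back twice in succession.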

Also, the recording medium in the present embodiment is a recording medium that records an MP4 file. The MP4 file includes content (for example, an audio track) that is read and reproduced by a computer, and information indicating two sections in which reproduction timing overlaps. Thereby, the playback device that reads and plays back the MP4 file from the recording medium can easily specify the data of the two sections described above from the MP4 file using the information. As a result, the playback device can appropriately play back the data by combining the data of those sections.

(Embodiment 3)
FIG. 23A is a diagram illustrating an example of a menu screen when an MP4 file is generated from content stored on an optical disc. FIG. 23B is a diagram for explaining a method of generating an MP4 file using an optical disc and a network.

In the optical disc, audio and subtitles in multiple languages are stored, and when the MP4 file is generated, the languages to be stored in the MP4 file can be selected. In this example, Japanese and English are selected for each of audio and subtitles from Japanese, English, Spanish, and Chinese stored in the disc. Here, if the sample size in audio or subtitles differs for each language, the content of the copy manifest file depends on the sample size. For this reason, the number of copy manifest files increases in proportion to the number of combinations of selected languages in audio or subtitles. Therefore, the MP4 file may always store audio and subtitle data in all languages and further store information indicating the language selected by the user, so that the user's desired language can be selected during playback. By doing so, the copy manifest file can be made the same regardless of the selected language. Alternatively, copy manifest files corresponding to two cases may be prepared: the case where only one language is stored for each of audio and subtitles, and the case where all the languages are stored. Also, an audio encoding method such as AAC or AC3 may be selected depending on whether or not the device that plays back the MP4 file supports it. Alternatively, audio data of all encoding methods may be stored in the MP4 file. In the case of storing audio data of all encoding methods, at the time of reproduction, the encoding method is selected based on a user's selection operation or preset information of the reproduction device.

Alternatively, the audio and subtitles of all languages may be stored without selecting a language when generating the MP4 file, and the user may select the language during playback. Further, as shown in FIG. 23B, if the copy manifest file can be acquired via the network, it is not necessary to store the copy manifest file on the optical disc. In particular, when the number of copy manifest files increases in order to make an arbitrary language selectable, acquisition via a network is effective. Only the copy manifest file corresponding to the default language combination may be stored on the optical disc, and the copy manifest files corresponding to the other combinations may be downloaded from a server. Also, the audio or subtitle languages that can be acquired from an optical disc or a network and the audio or subtitle languages included in the MP4 file may each be obtained, and the user may select and acquire, from among the languages not included in the MP4 file, a language that can be acquired from outside.

Alternatively, a list of externally obtainable audio may be stored in the MP4 file or the like. If the playback apparatus cannot decode the audio encoding method used in the MP4 file during playback of the MP4 file, audio in an encoding method supported by the playback apparatus may be selected and acquired from outside. The data acquired from outside may be encoded data including only subtitles or audio, or may be an MP4 file. In this case, at the time of reproduction, the video and the like included in the original MP4 file and the newly acquired data are reproduced in synchronization. Alternatively, a complete MP4 file including all of video, audio, and subtitles may be acquired from outside.

Also, the content stored on the optical disc may be an MP4 file instead of an MPEG2-TS file. In this case, the MP4 file data stored on the optical disc may be copied or exported as it is to a device-bound or media-bound recording medium or device without performing the conversion process. At the time of copying or exporting, the key for encrypting the content may be changed. A device that generates an MP4 file, such as a BD player, may determine whether the content on the optical disc is in the MPEG2-TS format or the MP4 format based on identification information of the data format, thereby determine whether conversion to an MP4 file is necessary, and then generate the MP4 file. Alternatively, auxiliary information such as a copy manifest file may indicate whether conversion to MP4 is necessary. Further, even if the content stored on the optical disc is in the MP4 format, the user may select a type of content, such as audio or subtitles in a specific language, a theatrical release version, or a director's cut version. Then, only the data selected from the MP4 file on the optical disc based on the selection result may be extracted to generate the MP4 file.
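The determination of whether conversion is necessary can be illustrated with a minimal sketch. The disclosure relies on identification information of the data format; the sketch below instead substitutes a simple inspection of the leading bytes, which is an assumption made purely for illustration. It treats data whose second four bytes are `ftyp` as MP4, and data with the sync byte 0x47 at the head of three consecutive 188-byte packets as MPEG2-TS. The function names are hypothetical.

```python
TS_PACKET_SIZE = 188  # TS packet length stated in the background section
TS_SYNC_BYTE = 0x47


def sniff_container(data: bytes) -> str:
    """Guess 'mp4', 'mpeg2-ts', or 'unknown' from the first bytes of a file."""
    # MP4 files normally begin with a box whose type field is 'ftyp'.
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    # MPEG2-TS: check the sync byte at the head of three consecutive packets.
    if len(data) >= TS_PACKET_SIZE * 3 and all(
        data[i * TS_PACKET_SIZE] == TS_SYNC_BYTE for i in range(3)
    ):
        return "mpeg2-ts"
    return "unknown"


def needs_conversion(data: bytes) -> bool:
    """True when the content looks like MPEG2-TS and must be converted to MP4."""
    return sniff_container(data) == "mpeg2-ts"
```

A device could call `needs_conversion` on the head of each content file and copy the file as it is when the result is false and the format is MP4.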

Here, as described with reference to FIG. 7, the file size can also be reduced by indicating, in the copy manifest file, information for specifying the multiplexing units of the output MP4 file, which is effective when there are many copy manifest files. At this time, language-independent information that uniquely determines the units of the movie fragments in the MP4 file may be used as the information shown in the copy manifest. The information is, for example, the PTS or DTS of the sample at the head of a movie fragment, or identification information of the MPEG2-TS file from which the sample data is acquired. Here, when the sample sizes in the MP4 file differ, the contents of the boxes included in the stbl in the moov, or of the trun in the moof, in the header information of the MP4 file differ. For this reason, the MP4 file header information cannot be included in a copy manifest file that is used in common for different languages. Therefore, when converting to an MP4 file, the units of the movie fragments are determined based on the copy manifest file, and the header information of the MP4 file is generated based on the PTS or DTS of each sample or the size of each sample.
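The following is a minimal sketch of this division of labor, under the assumption that the language-independent manifest lists the DTS of the head sample of each movie fragment. The `Sample` type and the function names are hypothetical. Samples are grouped into fragments by DTS, and the per-sample values that a trun would carry are then derived from the actual samples rather than from the manifest.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Sample:
    dts: int   # decoding time stamp in 90 kHz ticks
    pts: int   # presentation time stamp in 90 kHz ticks
    size: int  # sample size in bytes (may differ for each language)


def split_into_fragments(samples, fragment_head_dts):
    """Group samples (in decoding order) into movie fragments.

    fragment_head_dts is the sorted list of DTS values of the first sample
    of each fragment, as the manifest is assumed to provide them.
    """
    fragments = [[] for _ in fragment_head_dts]
    for sample in samples:
        index = bisect_right(fragment_head_dts, sample.dts) - 1
        if index < 0:
            raise ValueError("sample precedes the first fragment")
        fragments[index].append(sample)
    return fragments


def trun_entries(fragment):
    """Per-sample (size, PTS minus DTS) pairs derived from the samples."""
    return [(s.size, s.pts - s.dts) for s in fragment]
```

Because only the fragment boundaries come from the manifest, the same manifest can be reused when the sample sizes change with the language.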

To obtain the PTS, DTS, or size of each sample, it is necessary to search for sample boundaries in the audio, video, or subtitle data that is multiplexed in MPEG2-TS or stored in another area without being multiplexed. Alternatively, processing such as analysis of the PES packet headers is required. The processing load of these operations is high for high-bit-rate video. Therefore, at least for video, information indicating the sizes of the NAL units constituting each access unit (corresponding to a sample of the MP4 file) and the PTS or DTS may be stored in the copy manifest.

FIG. 24 is a diagram showing an example of a copy manifest showing the NAL unit size, PTS, and DTS.

In the copy manifest, for each access unit, information indicating the size, PTS, and DTS of each NAL unit constituting the access unit is stored. With this information, the processing required to search the video stream can be greatly reduced. Furthermore, there is an advantage that the start code format of MPEG2-TS can be converted to the NAL size format of MP4 using the size information. The size information may indicate the size of the start code portion and the size of the NAL unit portion separately. The byte length of the field indicating the size of the NAL unit may be the same as the byte length of the size portion in the NAL size format. Thereby, conversion into the NAL size format is possible simply by replacing the data in the start code portion with the data indicating the size of the NAL unit portion. This start code corresponds to the identification information shown in FIG. 15B and includes zero_byte.
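A minimal sketch of this conversion follows, assuming the manifest supplies, for each NAL unit of an access unit, a pair of the start code size and the NAL unit size. The function name and the manifest representation are hypothetical. No start code search is performed; the sizes alone locate each NAL unit.

```python
def start_code_to_nal_size_format(access_unit: bytes, nal_units, length_size: int = 4) -> bytes:
    """Convert one access unit from the start code format to the NAL size format.

    nal_units is a list of (start_code_size, nal_size) pairs taken from the
    manifest, in the order the NAL units appear in the access unit.
    """
    out = bytearray()
    pos = 0
    for start_code_size, nal_size in nal_units:
        pos += start_code_size  # skip zero_byte (if any) and the start code prefix
        out += nal_size.to_bytes(length_size, "big")  # size field replaces the start code
        out += access_unit[pos:pos + nal_size]
        pos += nal_size
    if pos != len(access_unit):
        raise ValueError("manifest sizes do not cover the access unit")
    return bytes(out)
```

When every start code portion is four bytes and `length_size` is also four, the output has the same length as the input, so the replacement could equally be done in place.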

Also, if all start code portions have the same size, only a default value may be set. Further, the PTS or DTS may be expressed not as the PTS or DTS value in the MPEG2-TS PES header but in the data format used in the MP4 header. For example, for the DTS, the difference between the DTSs of two samples that are consecutive in decoding order can be indicated, and for the PTS, the difference between the DTS and the PTS can be indicated. The time scale of this information may be converted into the time scale used in MP4. Furthermore, information indicating the absolute value of the PTS or DTS of the first sample may be stored. Further, information for identifying a NAL unit to be deleted at the time of conversion to an MP4 file may be added. Further, when similar auxiliary information is stored for an AAC encoded stream, header information such as ADTS or LATM is deleted from the sample data. In this case, if the size of the header information is fixed, only one of the following may be indicated: the total size of the header information and the payload data, or the size of the payload data alone. In the case of audio, since the frame rate is fixed, only a default value may be indicated in the DTS information.
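The difference-based representation can be sketched as follows, assuming per-sample PTS and DTS values on the 90 kHz clock of MPEG2-TS and a hypothetical target time scale. The function name is an assumption. Absolute values are converted first and the differences are taken afterwards, so that rounding does not accumulate across samples.

```python
from fractions import Fraction


def to_mp4_timing(dts_list, pts_list, src_scale=90_000, dst_scale=90_000):
    """Return (first_dts, dts_deltas, pts_minus_dts) in dst_scale ticks.

    dts_list and pts_list hold per-sample values in decoding order.
    dts_deltas has one entry fewer than the number of samples, because the
    last sample has no following DTS to difference against.
    """
    ratio = Fraction(dst_scale, src_scale)

    def convert(ticks):
        return int(ticks * ratio)  # truncates when the ratio is not exact

    dts = [convert(t) for t in dts_list]
    pts = [convert(t) for t in pts_list]
    dts_deltas = [later - earlier for earlier, later in zip(dts, dts[1:])]
    pts_minus_dts = [p - d for p, d in zip(pts, dts)]
    return dts[0], dts_deltas, pts_minus_dts
```

For example, four samples of 24 frames-per-second video (3750 ticks per frame at 90 kHz) converted to a time scale of 24000 give a constant DTS difference of 1000.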

Note that if the audio encoding method is different, the playback time length for each sample may also be different. As a result, since the PTS or DTS for each sample is also different, a copy manifest file may be prepared for each audio encoding method.
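As a concrete illustration, an AAC-LC frame carries 1024 audio samples while an AC-3 frame carries 1536, so the playback time of one MP4 sample differs between the two. The sketch below computes the per-frame duration in 90 kHz ticks; the dictionary and function names are hypothetical.

```python
# Audio samples per coded frame for two common encoding methods.
SAMPLES_PER_FRAME = {"aac-lc": 1024, "ac3": 1536}


def frame_duration_ticks(codec: str, sample_rate: int, timescale: int = 90_000) -> int:
    """Duration of one audio frame in timescale ticks."""
    numerator = SAMPLES_PER_FRAME[codec] * timescale
    if numerator % sample_rate:
        raise ValueError("duration is not an integer number of ticks")
    return numerator // sample_rate


aac_ticks = frame_duration_ticks("aac-lc", 48_000)
ac3_ticks = frame_duration_ticks("ac3", 48_000)
```

At 48 kHz these come to 1920 and 2880 ticks respectively, so the DTS of the n-th sample differs between the two encoding methods and a shared manifest carrying per-sample time stamps would not fit both.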

Here, when subtitle data is stored in the MP4 file, the subtitle data may be stored together at the end of the file.

FIG. 25 is a diagram showing an example of caption data stored at the end of the MP4 file.

In this case, even if the language of the caption data is changed, the movie fragments of the AV (audio-visual) data remain the same. On the other hand, when the video or audio movie fragments and the subtitle movie fragments are interleaved and stored, it is necessary to change the contents of the moof. This is because the positions of the movie fragments of the AV data change when the size of the caption data changes. In addition, caption data is smaller in size than AV data. Therefore, it is also possible to hold in memory, all at once, the subtitle data of the entire content or the subtitle data included in a unit such as a chapter obtained by dividing the content. At this time, if the subtitle data is stored together at the end of the file, there is an advantage that the subtitle data can be obtained easily.

Here, the caption data may be based on a text font or image data in a PNG (Portable Network Graphics) format or the like. In the case of image data, since the data size is larger than that of the text format, trun may be generated for each unit such as a chapter to improve accessibility to subtitle data included in a predetermined unit. Alternatively, the trun may be generated so that the size of the caption data constituting the trun is equal to or smaller than the buffer size in accordance with the buffer size that holds the text data during reproduction.
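The buffer-constrained generation of trun can be sketched as a simple greedy grouping, assuming that the caption samples are processed in order and that each trun is closed as soon as adding the next sample would exceed the buffer size. This grouping policy and the function name are assumptions, not something specified in the disclosure.

```python
def group_into_truns(sample_sizes, buffer_size):
    """Group caption sample sizes so that each trun totals at most buffer_size."""
    runs = []
    current = []
    used = 0
    for size in sample_sizes:
        if size > buffer_size:
            raise ValueError("a single sample exceeds the buffer size")
        if current and used + size > buffer_size:
            runs.append(current)  # close the current trun
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        runs.append(current)
    return runs
```

For sample sizes of 400, 500, 300, 900, and 100 bytes and a 1000-byte buffer, this yields three runs: the first two samples, the third alone, and the last two.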

Furthermore, when storing subtitle data in multiple languages, subtitle data in a specific language can be easily obtained by storing each language in a different movie fragment. At this time, information for specifying the language stored in the movie fragment is required. Therefore, for example, each language may be handled as a different track, and the track ID may be associated with the language. The track ID is indicated by a box in the traf. The information that associates the track ID with the language may be stored in a metadata storage box or the like in MP4, or may be management information different from the MP4 file. Moreover, the correspondence between languages and movie fragments can be applied to audio.
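A minimal sketch of the association between track IDs and languages follows. The table `TRACK_LANGUAGE`, the `Fragment` type, and the concrete track IDs are hypothetical; in practice the association would be read from a metadata box or from separate management information as described above.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    track_id: int  # track ID indicated in the traf of the movie fragment
    offset: int    # byte offset of the movie fragment in the file
    size: int      # byte length of the movie fragment


# Hypothetical association of track IDs with subtitle languages.
TRACK_LANGUAGE = {3: "ja", 4: "en"}


def fragments_for_language(fragments, track_language, language):
    """Return only the movie fragments whose track carries the given language."""
    wanted = {tid for tid, lang in track_language.items() if lang == language}
    return [f for f in fragments if f.track_id in wanted]
```

The same lookup applies unchanged to audio when each audio language is likewise handled as a separate track.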

Randomly accessible samples in video, audio, or subtitles are indicated by mfra. Here, if the playback sections of mutually consecutive video and audio movie fragments match, only the video random access points need be indicated. In this case, an audio sample whose PTS is the same as, immediately before, or immediately after that of the video random access point can be obtained from the immediately following movie fragment. For example, in FIG. 25, the PTS of the first sample of video (V-1) matches that of the first sample of audio (A-1). On the other hand, when the text is stored at the end of the file, it is necessary to indicate random access points independently for the text.
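Locating the audio sample that corresponds to a video random access point can be sketched as a binary search over the audio PTS values of the following movie fragment. The function name and the returned tuple layout are assumptions made for illustration.

```python
from bisect import bisect_left


def neighbouring_audio_pts(audio_pts, video_pts):
    """Return (before, same, after) audio PTS values relative to video_pts.

    audio_pts must be sorted in ascending order. Each element of the result
    is None when no such audio sample exists.
    """
    i = bisect_left(audio_pts, video_pts)
    same = audio_pts[i] if i < len(audio_pts) and audio_pts[i] == video_pts else None
    before = audio_pts[i - 1] if i > 0 else None
    j = i + 1 if same is not None else i
    after = audio_pts[j] if j < len(audio_pts) else None
    return before, same, after
```

A player could start audio at `same` when it exists, and otherwise choose `before` or `after` depending on whether it prefers to trim or to delay the audio at the seek point.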

Also, suppose that audio or subtitle data in all languages in the optical disc content is stored in the MP4 file. Here, when the user selects a desired language at the time of generating the MP4 file, random access points may be indicated in mfra only for the track corresponding to the selected language.

Here, it is assumed that the content stored on the optical disc has different resolutions for video and graphics. For example, the resolution of video is 4K, and the resolution of graphics such as captions is 2K in order to reduce the processing amount.

FIG. 26 is a diagram illustrating a case in which subtitles with 2K resolution are scaled to 4K and displayed. When displaying subtitles, information for specifying a subtitle display area is required together with the subtitle data and its resolution. The display area is specified using, for example, the size and display position of a rectangular display area. For example, the information indicating the resolution of each track can indicate that the subtitle track is 2K and the video track is 4K. Also, for SMPTE (Society of Motion Picture and Television Engineers) or W3C (World Wide Web Consortium) Timed Text and the like, this information may be described as part of the XML (Extensible Markup Language) data that constitutes the Timed Text, or may be stored in a box indicating metadata in the MP4 file.

When playing back an MP4 file, the resolutions of the video and subtitles are acquired, and if the resolutions of both are different, the subtitles are scaled and displayed so as to match the video resolution. At this time, if the caption is image data, the image data is enlarged, and if it is text data, a size that matches the video resolution is selected. The display area is also determined by calculation according to the scaling coefficient. Information indicating the display area after scaling according to the video resolution may be stored.
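The calculation of the scaled display area can be sketched as follows, assuming for illustration that 2K means 1920x1080 and 4K means 3840x2160 and that the display area is a rectangle given by its position and size. The `Rect` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def scale_display_area(area: Rect, subtitle_res, video_res) -> Rect:
    """Scale a subtitle display area from the subtitle resolution to the video resolution.

    subtitle_res and video_res are (width, height) tuples in pixels.
    """
    sx = Fraction(video_res[0], subtitle_res[0])  # horizontal scaling coefficient
    sy = Fraction(video_res[1], subtitle_res[1])  # vertical scaling coefficient
    return Rect(
        x=round(area.x * sx),
        y=round(area.y * sy),
        width=round(area.width * sx),
        height=round(area.height * sy),
    )


scaled = scale_display_area(Rect(480, 900, 960, 120), (1920, 1080), (3840, 2160))
```

Here both coefficients are 2, so the area at (480, 900) with size 960x120 becomes an area at (960, 1800) with size 1920x240. When the resolutions already match, the area is returned unchanged.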

In the above-described embodiment and each modification, each component may be configured by dedicated hardware, or may be realized by executing a software program suitable for each component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. Here, the software that realizes the file generation apparatus and the like in the above-described embodiment and each modification causes the computer to execute each step included in the flowchart shown in FIG. 8 or FIG.

As described above, the file generation device and the playback device according to one or more aspects have been described based on each embodiment and each modification. However, the present disclosure is not limited to these embodiments and modifications. Unless they deviate from the gist of the present disclosure, forms obtained by applying various modifications conceived by those skilled in the art to each embodiment and each modification, and forms constructed by combining the components in the embodiments and modifications, may also be included within the scope of one or more aspects.

For example, in the first embodiment and its modification, a file constituted by MPEG2-TS is used as the original file. However, the original file may be any file or transport stream other than the MPEG2-TS file as long as the file is configured in a file format different from MP4.

Further, in the second embodiment, as shown in FIG. 22A, the playback section 1 and the playback section 2 are integrated, but each of these playback sections may be a movie fragment of an MP4 file or may be a stream.

The following cases may also be included in the present disclosure.

(1) Each of the above devices is specifically a computer system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or hard disk unit. Each device achieves its functions by the microprocessor operating according to the computer program. Here, the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.

(2) A part or all of the components constituting each of the above devices may be configured by one system LSI (Large Scale Integration). The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip, and specifically, a computer system including a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM. The system LSI achieves its functions by the microprocessor operating according to the computer program.

(3) A part or all of the constituent elements constituting each of the above devices may be constituted by an IC card or a single module that can be attached to and detached from each device. The IC card or the module is a computer system including a microprocessor, a ROM, a RAM, and the like. The IC card or the module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.

(4) The present disclosure may be the method described above. Further, the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of the computer program.

The present disclosure may also be the computer program or the digital signal recorded on a computer-readable recording medium such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray (registered trademark) Disc), or a semiconductor memory. The present disclosure may also be the digital signal recorded on these recording media.

In the present disclosure, the computer program or the digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.

Further, the present disclosure may be a computer system including a microprocessor and a memory, and the memory may store the computer program, and the microprocessor may operate according to the computer program.

In addition, the program or the digital signal may be recorded on the recording medium and transferred, or may be transferred via the network or the like, and thereby executed by another independent computer system.

(5) The above embodiment and the above modifications may be combined.

As described above, the data transmission method and the data reproduction method according to one or more aspects of the present disclosure have been described based on the embodiments, but the present disclosure is not limited to the embodiments. Unless they deviate from the gist of the present disclosure, forms obtained by applying various modifications conceived by those skilled in the art to the present embodiments, and forms configured by combining components in different embodiments, may also be included within the scope of one or more aspects of the present disclosure.

The present disclosure has an effect that the processing load can be suppressed, and can be applied to, for example, a device that converts a file format from MPEG2-TS to MP4, and can be used for a device such as a smartphone or a tablet.

DESCRIPTION OF SYMBOLS

10, 20, 100 File generation apparatus
11 Original file acquisition part
12 Difference file acquisition part
13 Procedure file acquisition part
14, 104 Generation part
21 Integration part
22 Storage part
30 Playback apparatus
31 Extraction part
32 Composition part
101 Auxiliary information search part
103 Auxiliary information analysis unit
104a Sample generation unit
104b Sample entry generation unit

Claims

1. A file generation method for generating an MP4 file,
One MP4 file is generated by combining the two streams so that the two streams are played back continuously,
A file generation method for storing information indicating a section in which reproduction timing overlaps in each of the two streams in the generated MP4 file.

2. In the integration of the two streams,
The file generation method according to claim 1, wherein the two streams that are at least a part of an original file each configured in an MP4 file format are integrated.

3. In the integration of the two streams,
When the section exists over a plurality of samples included in any one of the two streams,
The file generation method according to claim 2, wherein the two streams are integrated after deleting at least one of the plurality of samples.

4. In storing the information,
The file generation method according to any one of claims 1 to 3, wherein time information indicating a time length of the section is stored in the MP4 file as the information.

5. In storing the information,
The file generation method according to claim 4, wherein the time information is stored in a traf in a moof in the MP4 file.

6. In the file generation method,
The file generation method according to any one of claims 1 to 5, wherein the information is acquired from a device or an optical disc that holds the information.

7. In the integration of the two streams,
The file generation method according to claim 1, wherein the two streams each including audio data are integrated.

8. A playback method for playing MP4 files,
Extracting from the MP4 file information indicating two sections where playback timing overlaps in the content to be played back,
A reproduction method for specifying the two sections in the content based on the extracted information, and combining and outputting a decoding result for each data of the two sections.

9. A file generation device for generating an MP4 file,
An integration unit that generates one MP4 file by integrating the two streams so that the two streams are continuously played back;
A file generation apparatus comprising: a storage unit that stores information indicating an interval in which reproduction timing overlaps in each of the two streams in the generated MP4 file.

10. A playback device for playing MP4 files,
An extraction unit that extracts information indicating two sections in which reproduction timing overlaps in content to be reproduced from the MP4 file;
A playback device comprising: a combining unit that specifies the two sections in the content based on the extracted information, and combines and outputs a decoding result for each data of the two sections.

11. A recording medium that records an MP4 file,
The MP4 file includes a content that is read and reproduced by a computer, and information indicating two sections in which reproduction timing overlaps in the content.