WO2017169890A1 - Information processing apparatus and method
- Publication number
- WO2017169890A1 (PCT/JP2017/010871)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G11B27/30—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on the same track as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/752—Media network packet handling adapting media to network capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/0017—Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
Definitions
- The present disclosure relates to an information processing apparatus and method, and more particularly to an information processing apparatus and method capable of transmitting higher-quality audio data.
- MPEG-DASH: Moving Picture Experts Group - Dynamic Adaptive Streaming over HTTP
- ISO: International Organization for Standardization
- DSD: Direct Stream Digital
- DSD lossless compression method: a lossless compression method for DSD data
- MPEG-DASH (Dynamic Adaptive Streaming over HTTP) (URL: http://mpeg.chiariglione.org/standards/mpeg-dash/media-presentation-description-and-segment-formats/text-isoiec-23009-12012-dam-1)
- The present disclosure has been made in view of such a situation, and makes it possible to transmit higher-quality audio data.
- An information processing apparatus includes a sample setting unit that, in a file of a predetermined file format storing encoded data of audio data in which blocks, which are access units of the encoded data, are grouped every predetermined number, sets a sample including initialization information used for decoding the group of blocks as a sample, which is the minimum access unit in the file.
- The sample setting unit may be configured to set a sample including the initialization information and the first block of the group, and a sample for each of the other blocks of the group. A sync sample setting unit may further be provided that sets the sample including the initialization information and the first block of the group, set by the sample setting unit, as a sync sample including information necessary for starting decoding.
- A setting unit can be further provided.
- The sample setting unit may set a sample including the initialization information and all blocks of the group.
- A subsample setting unit may further be provided that sets, in the sample including the initialization information and all blocks of the group set by the sample setting unit, a subsample including the initialization information, a subsample including the first block of the group, and a subsample for each of the blocks of the group.
- A sync sample setting unit may further be provided that sets the sample including the initialization information, set by the sample setting unit, as a sync sample including information necessary for starting decoding, and the sample setting unit may further be configured to set a sample that includes all blocks.
- A subsample setting unit may further be provided that sets a subsample for each block in the sample, set by the sample setting unit, that includes all blocks of the group.
- The sample setting unit may be configured to set a sample including the initialization information and the first block of the group, and a sample including all other blocks of the group. A sync sample setting unit may further be provided that sets the sample including the initialization information and the first block of the group, set by the sample setting unit, as a sync sample including information necessary for starting decoding.
- A subsample setting unit may further be provided that sets a subsample for each block in the sample, set by the sample setting unit, that includes all the other blocks of the group.
- The subsample setting unit may further set, in the sample including the initialization information and the first block of the group set by the sample setting unit, a subsample including the initialization information and a subsample including the first block of the group.
- The sample setting unit may further set the sample for each block in a track different from the track in which the sample including the initialization information is set.
- The sample setting unit may further set the sample for each block in a file different from the file in which the sample including the initialization information is set.
- An extension box setting unit that sets information related to the audio data in an extension box of an audio sample entry may further be provided.
- The audio data may be DSD (Direct Stream Digital) data, and the encoded data may be obtained by lossless encoding of the DSD data.
- The file format may be a file format compliant with ISO/IEC 14496.
- An information processing method sets, in a file of a predetermined file format storing encoded data of audio data in which blocks, which are access units of the encoded data, are grouped every predetermined number, a sample including initialization information used for decoding the group of blocks as a sample, which is the minimum access unit in the file.
- An information processing apparatus includes: a sample analysis unit that analyzes a sample which is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and which includes initialization information used for decoding the group of blocks, and that obtains decoder configuration information used for decoding the encoded data based on the analysis result; a setting unit that sets the decoder configuration information obtained by the sample analysis unit; and a decoding unit that decodes the encoded data using the decoder configuration information set by the setting unit.
- An information processing method analyzes a sample which is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and which includes initialization information used for decoding the group of blocks, acquires decoder configuration information used for decoding the encoded data based on the analysis result, sets the acquired decoder configuration information, and decodes the encoded data using the set decoder configuration information.
- In a file storing encoded data of audio data in which blocks, which are access units of the encoded data, are grouped every predetermined number, a sample including initialization information used for decoding a group of blocks is set as a sample, which is the minimum access unit in the file.
- A sample which is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and which includes initialization information used for decoding a group of blocks is analyzed, the decoder configuration information used for decoding the encoded data is acquired based on the analysis result, the acquired decoder configuration information is set, and the encoded data is decoded using the set decoder configuration information.
- Information can be processed.
- Higher-quality audio data can be transmitted.
- FIG. 20 is a block diagram illustrating a main configuration example of a computer.
- <DSD lossless stream MP4 file>
- <Distribution of video and audio>
- Streaming delivery via the Internet is expected as a means of delivering video and music to consumers.
- As a transmission medium, the Internet is less stable than broadcasting or optical discs.
- The maximum transmission bandwidth varies greatly depending on the user's environment.
- A constant transmission bandwidth is not always guaranteed, and the bandwidth fluctuates over time.
- Fluctuating bandwidth also means that the response time to client requests is not constant.
- MPEG-DASH: Moving Picture Experts Group - Dynamic Adaptive Streaming over HTTP
- MPD: Media Presentation Description
- A general HTTP (HyperText Transfer Protocol) server can be used, because delivery uses HTTP rather than a special protocol.
- The file format may be not only MPEG-TS (Moving Picture Experts Group - Transport Stream) but also ISOBMFF (International Organization for Standardization Base Media File Format).
- <MPEG-DASH> An example of data transmission using MPEG-DASH is shown in FIG.
- A file generation device 2 generates video data and audio data as moving image content, encodes them, and packages them into files in a file format for transmission. For example, the file generation device 2 converts these data into files (segments) of about 10 seconds each. The file generation device 2 uploads the generated segment files to the Web server 3. Further, the file generation device 2 generates an MPD file (management file) for managing the moving image content and uploads it to the Web server 3.
- The Web server 3, acting as a DASH server, performs live distribution of the moving image content files generated by the file generation device 2 to the playback terminal 5 via the Internet 4 in accordance with the MPEG-DASH method. For example, the Web server 3 stores the segment files and the MPD file uploaded from the file generation device 2. Further, the Web server 3 transmits a stored segment file or the MPD file to the playback terminal 5 in response to a request from the playback terminal 5.
- The playback terminal 5 executes streaming data control software (hereinafter also referred to as control software) 6, video playback software 7, HTTP access client software (hereinafter referred to as access software) 8, and the like.
- The control software 6 is software that controls the data streamed from the Web server 3. For example, the control software 6 acquires the MPD file from the Web server 3. Further, based on, for example, the MPD file, playback time information indicating the playback time specified by the video playback software 7, and the network bandwidth of the Internet 4, the control software 6 instructs the access software 8 to request transmission of the segment file to be played back.
- The video playback software 7 is software that plays back an encoded stream acquired from the Web server 3 via the Internet 4.
- The video playback software 7 specifies the playback time information to the control software 6.
- When the video playback software 7 obtains a reception start notification from the access software 8, it decodes the encoded stream supplied from the access software 8.
- The video playback software 7 outputs the video data and audio data obtained as a result of decoding.
- The access software 8 is software that controls communication with the Web server 3 using HTTP. For example, the access software 8 supplies a reception start notification to the video playback software 7. Further, the access software 8 transmits to the Web server 3 a transmission request for the encoded stream of the segment file to be played back, in response to a command from the control software 6. The access software 8 then receives the segment file, at a bit rate suited to the communication environment and the like, that the Web server 3 transmits in response to the request. Finally, the access software 8 extracts the encoded stream from the received file and supplies it to the video playback software 7.
- DSD: Direct Stream Digital
- PCM: Pulse Code Modulation
- The sampling frequency is as high as 2.8 MHz, 5.6 MHz, or 11.2 MHz, so the bit rate for two channels is 5.6 Mbps, 11.2 Mbps, or 22.4 Mbps, respectively. Methods for compressing such high-rate DSD data without loss have therefore been devised.
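The bit rates quoted above follow from the sampling frequency, because each DSD sample is 1 bit per channel. The following is a minimal sketch of that arithmetic. It assumes the three sampling frequencies are 64, 128, and 256 times 44.1 kHz (the 256x multiplier and the function name `dsd_bit_rate_bps` are illustrative assumptions, not taken from the text).

```python
BASE_RATE_HZ = 44_100  # CD sampling frequency used as the base rate (assumption)


def dsd_bit_rate_bps(multiplier: int, channels: int = 2, bits_per_sample: int = 1) -> int:
    """Raw (uncompressed) DSD bit rate in bits per second."""
    sampling_hz = BASE_RATE_HZ * multiplier
    return sampling_hz * bits_per_sample * channels


# Print the sampling frequency and the two-channel bit rate for each multiplier.
for multiplier in (64, 128, 256):
    sampling_mhz = BASE_RATE_HZ * multiplier / 1_000_000
    rate_mbps = dsd_bit_rate_bps(multiplier) / 1_000_000
    print(f"{multiplier}x: {sampling_mhz:.4f} MHz, 2ch: {rate_mbps:.4f} Mbps")
```

Under these assumptions the exact two-channel rates are 5.6448, 11.2896, and 22.5792 Mbps, which the text quotes as the nominal figures 5.6, 11.2, and 22.4 Mbps.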
- DST: Direct Stream Transfer
- SACD: Super Audio Compact Disc
- MPEG-4 AAC: Advanced Audio Coding
- ISO/IEC: International Organization for Standardization / International Electrotechnical Commission
- The bit rate is constant, so a bit rate for the video data is selected according to the bandwidth variation of the transmission path.
- The DSD lossless stream has large local rate fluctuations. In other words, the bandwidth margin created by the rate fluctuation can be allocated to video data transmission, which enables higher-quality video data transmission.
- FIG. 4 shows a main configuration example of a compression encoding apparatus corresponding to such a new DSD lossless compression encoding system.
- The compression encoding apparatus 10 shown in FIG. 4 converts an analog audio signal into a digital signal by ΣΔ (sigma-delta) modulation, compression-encodes the converted audio signal, and outputs it.
- That is, the compression encoding apparatus 10 generates a DSD lossless stream by modulating and digitizing an audio signal using the DSD method and encoding the resulting digital data (DSD data) using the new DSD lossless compression encoding method described above.
- Analog audio signals are input from the input unit 11 and supplied to an ADC (Analog Digital Converter) 12.
- The ADC 12 digitizes the supplied analog audio signal by ΣΔ modulation and outputs it to the input buffer 13.
- The ADC 12 includes an adder 21, an integrator 22, a comparator 23, a one-sample delay circuit 24, and a 1-bit DAC (Digital Analog Converter) 25.
- The audio signal supplied from the input unit 11 is supplied to the adder 21.
- The adder 21 adds the analog audio signal from one sample period earlier, supplied from the 1-bit DAC 25, to the audio signal from the input unit 11, and outputs the result to the integrator 22.
- The integrator 22 integrates the audio signal from the adder 21 and outputs it to the comparator 23.
- The comparator 23 compares the input audio signal with its midpoint potential and performs 1-bit quantization every sample period.
- The sampling frequency is 64 or 128 times the conventional 48 kHz or 44.1 kHz.
- The comparator 23 outputs the 1-bit quantized audio signal to the input buffer 13 and also supplies it to the one-sample delay circuit 24.
- The one-sample delay circuit 24 delays the audio signal from the comparator 23 by one sample period and outputs it to the 1-bit DAC 25.
- The 1-bit DAC 25 converts the digital signal from the one-sample delay circuit 24 into an analog signal and outputs it to the adder 21.
- The ADC 12 configured as described above converts the audio signal supplied from the input unit 11 into a 1-bit digital signal (A/D conversion) and outputs it to the input buffer 13.
- With A/D conversion by ΣΔ modulation, a digital audio signal with a wide dynamic range can be obtained even with a small number of bits by, for example, making the sampling frequency sufficiently high.
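The feedback loop formed by the adder 21, integrator 22, comparator 23, one-sample delay circuit 24, and 1-bit DAC 25 can be illustrated with a first-order modulator in discrete time. This is only a sketch under stated assumptions: the input is normalized to the range -1.0 to 1.0, the midpoint potential is taken as 0.0, and the DAC output is subtracted from the input (negative feedback, which is equivalent to adding an inverted DAC output). The function name `sigma_delta_1bit` is hypothetical.

```python
from typing import Iterable, List


def sigma_delta_1bit(samples: Iterable[float]) -> List[int]:
    """First-order 1-bit modulator: returns one bit (0 or 1) per input sample."""
    integrator = 0.0  # state of the integrator (22)
    feedback = 0.0    # DAC (25) output from the previous sample, via the delay (24)
    bits: List[int] = []
    for x in samples:
        # Adder (21) combines the input with the fed-back value; integrator (22) accumulates.
        integrator += x - feedback
        # Comparator (23): 1-bit quantization against the midpoint (assumed 0.0).
        bit = 1 if integrator >= 0.0 else 0
        bits.append(bit)
        # 1-bit DAC (25): map the bit back to an analog level for the next sample.
        feedback = 1.0 if bit else -1.0
    return bits


# Usage: the density of ones tracks the input level.
bits = sigma_delta_1bit([0.5] * 4000)
average_level = sum(2 * b - 1 for b in bits) / len(bits)
print(average_level)
```

The average of the output levels approximates the input amplitude, which is why a 1-bit stream at a sufficiently high sampling frequency can represent a wide dynamic range.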
- A stereo (two-channel) audio signal is input to the ADC 12 from the input unit 11, and the ADC 12 converts it into a 1-bit signal at a sampling frequency 128 times 44.1 kHz and supplies it to the input buffer 13.
- The number of quantization bits can also be 2 bits or 4 bits.
- The input buffer 13 temporarily stores the 1-bit digital audio signal supplied from the ADC 12 and supplies it, in units of one frame, to the control unit 14, the encoding unit 15, and the data amount comparison unit 17 in the subsequent stage.
- One frame is a unit obtained by dividing the audio signal into segments of a predetermined time (period). For example, 3 seconds may be set as one frame.
- In that case, the input buffer 13 supplies the audio signal to the control unit 14, the encoding unit 15, and the data amount comparison unit 17 in units of 3 seconds.
- The audio signal input from the input unit 11 is a stereo (two-channel) signal and is A/D converted into a 1-bit signal at a sampling frequency 128 times 44.1 kHz.
- The ΣΔ-modulated digital signal supplied from the input buffer 13 is also referred to as DSD data.
- The control unit 14 controls the overall operation of the compression encoding apparatus 10.
- The control unit 14 has a function of creating the conversion table table1 that the encoding unit 15 needs for compression encoding and supplying it to the encoding unit 15.
- The control unit 14 creates a data generation count table pretable from one frame of DSD data supplied from the input buffer 13, and then creates the conversion table table1 from the data generation count table pretable.
- The control unit 14 supplies the created conversion table table1 to the encoding unit 15 and the data transmission unit 18.
- The conversion table table1 is created (updated) in units of one frame and supplied to the encoding unit 15.
- The encoding unit 15 uses the conversion table table1 supplied from the control unit 14 to compression-encode the DSD data supplied from the input buffer 13 in units of 4 bits. The DSD data is supplied from the input buffer 13 to the encoding unit 15 at the same time as to the control unit 14, but the encoding unit 15 waits until the conversion table table1 is supplied from the control unit 14.
- The encoding unit 15 encodes 4-bit DSD data into 2-bit data or into 6-bit data, and outputs the encoded data to the encoded data buffer 16.
- The encoded data buffer 16 temporarily buffers the compressed data, that is, the DSD data compression-encoded by the encoding unit 15, and supplies it to the data amount comparison unit 17 and the data transmission unit 18.
- The data amount comparison unit 17 compares, frame by frame, the data amount of the DSD data supplied from the input buffer 13 (hereinafter also referred to as uncompressed data) with that of the compressed data supplied from the encoded data buffer 16. This is because the encoding unit 15 encodes 4-bit DSD data into 2-bit or 6-bit data as described above, so the algorithm allows the data amount after compression to exceed the data amount before compression. The data amount comparison unit 17 therefore compares the data amounts of the compressed and uncompressed data, selects the smaller one, and supplies the data transmission unit 18 with selection control data indicating which one has been selected.
- When supplying selection control data indicating that the uncompressed data has been selected, the data amount comparison unit 17 also supplies the uncompressed data to the data transmission unit 18.
- From the viewpoint of the receiving-side device, the selection control data is a flag indicating whether or not the audio data transmitted from the data transmission unit 18 has been compression-encoded by the encoding unit 15.
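The per-frame choice made by the data amount comparison unit 17 can be sketched as follows. The function name `select_frame_payload`, the use of `bytes` for the frame data, and the tie-breaking rule (uncompressed data is sent when both sizes are equal) are assumptions for illustration; the text only states that the smaller of the two is selected.

```python
from typing import Tuple


def select_frame_payload(uncompressed: bytes, compressed: bytes) -> Tuple[bool, bytes]:
    """Return (selection flag, payload) for one frame.

    The flag plays the role of the selection control data: True means the
    payload is compression-encoded, False means it is raw DSD data.
    """
    is_compressed = len(compressed) < len(uncompressed)
    payload = compressed if is_compressed else uncompressed
    return is_compressed, payload


# Usage: a frame that compresses well, and one where encoding expanded the data.
print(select_frame_payload(b"\x00" * 8, b"\x00" * 5)[0])
print(select_frame_payload(b"\x00" * 8, b"\x00" * 12)[0])
```

Because each 4-bit unit becomes either 2 or 6 bits, the worst case for a frame is an expansion to 1.5 times its original size, which this comparison guards against.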
- Based on the selection control data supplied from the data amount comparison unit 17, the data transmission unit 18 selects either the compressed data supplied from the encoded data buffer 16 or the uncompressed data supplied from the data amount comparison unit 17, and transmits the selected data to the partner apparatus via the output unit 19 together with the selection control data. When transmitting compressed data, the data transmission unit 18 also adds the data of the conversion table table1 supplied from the control unit 14 to the compressed data and transmits it to the partner apparatus. The data transmission unit 18 can also add a synchronization signal and an error correction code (ECC) to the digital signal for each predetermined number of samples before transmission.
- The control unit 14 creates the data generation count table pretable for one frame of DSD data. Here, the DSD data supplied from the input buffer 13 is expressed in units of 4 bits as follows: ... D4[n-3], D4[n-2], D4[n-1], D4[n], D4[n+1], D4[n+2], D4[n+3], ...
- D4[n] represents 4 consecutive bits of data and is hereinafter also referred to as D4 data (n > 3).
- The control unit 14 counts the number of occurrences of each D4 data value that follows the past three D4 data (the past 12 bits of data), and creates the data generation count table pretable[4096][16] shown in FIG.
- [4096] and [16] in pretable[4096][16] indicate that the data generation count table is a table (matrix) with 4096 rows and 16 columns, where rows [0] to [4095] correspond to the values that the past three D4 data can take and columns [0] to [15] correspond to the values that the next D4 data can take.
- … the number of times was 10, the number of times it was "3" was 18, the number of times it was "4" was 20, the number of times it was "5" was 31, the number of times it was "6" was 11, the number of times it was "7" was 0, the number of times it was "8" was 4, the number of times it was "9" was 12, the number of times it was "10" was 5, and the number of times it was "11" to "15" was 0.
- In this way, the control unit 14 counts, for one frame of DSD data, the number of occurrences of each D4 data value that follows the past three D4 data (the past 12 bits of data), and creates the data generation count table pretable.
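The counting step can be sketched as follows, assuming the frame has already been split into 4-bit values and that the 12-bit row index places the oldest D4 data in the most significant bits (an assumption that is consistent with row 117 corresponding to the past data 0, 7, 5). The names `row_index` and `build_pretable` are hypothetical.

```python
from typing import List, Sequence


def row_index(d4_a: int, d4_b: int, d4_c: int) -> int:
    """Pack three past 4-bit values (oldest first) into a 12-bit row index."""
    return (d4_a << 8) | (d4_b << 4) | d4_c


def build_pretable(frame: Sequence[int]) -> List[List[int]]:
    """Count, per 12-bit history, how often each 4-bit value follows it."""
    pretable = [[0] * 16 for _ in range(4096)]
    for n in range(3, len(frame)):
        row = row_index(frame[n - 3], frame[n - 2], frame[n - 1])
        pretable[row][frame[n]] += 1
    return pretable


# Usage with a short, made-up frame of D4 values.
frame = [0, 7, 5, 5, 0, 7, 5, 4, 0, 7, 5, 5]
pretable = build_pretable(frame)
print(pretable[row_index(0, 7, 5)])
```

The first three D4 values of a frame have no complete 12-bit history, so this sketch starts counting at the fourth value.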
- The control unit 14 creates a conversion table table1[4096][3] of 4096 rows and 3 columns based on the previously created data generation count table pretable.
- Each row [0] to [4095] of the conversion table table1[4096][3] corresponds to a value that the past three D4 data can take, and the columns [0] to [2] store the three values with the highest occurrence frequency among the 16 values that the next D4 data can take.
- The first column [0] of the conversion table table1[4096][3] stores the most frequent value, the second column [1] stores the second most frequent value, and the third column [2] stores the third most frequent value.
- FIG. 6 shows an example of the conversion table table1[4096][3] corresponding to the data generation count table pretable shown in FIG. table1[117][0] to [117][2], which form the 118th row of the conversion table table1[4096][3], are {05, 04, 03}.
- This corresponds to the contents of pretable[117][0] to [117][15] in the 118th row of the data generation count table pretable in FIG.
- The most frequent value is "5", which occurred 31 times.
- The second most frequent value is "4", which occurred 20 times.
- The third most frequent value is "3", which occurred 18 times.
- Accordingly, {05} is stored in the 118th row, first column table1[117][0] of the conversion table table1[4096][3], {04} is stored in the 118th row, second column table1[117][1], and {03} is stored in the 118th row, third column table1[117][2].
- Similarly, table1[0][0] to [0][2] in the first row of the conversion table table1[4096][3] correspond to the contents of pretable[0][0] to [0][15] in the first row of the data generation count table pretable in FIG.
- In this way, the conversion table table1[4096][3] of 4096 rows and 3 columns is created based on the previously created data generation count table pretable and supplied to the encoding unit 15.
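Deriving the conversion table from the count table amounts to picking the three most frequent next values in each row. The sketch below assumes ties are broken in favor of the smaller value, which the text does not specify; with that rule, a row whose counts are all zero simply stores 0, 1, 2. The name `build_table1` is hypothetical.

```python
from typing import List, Sequence


def build_table1(pretable: Sequence[Sequence[int]]) -> List[List[int]]:
    """For each row, keep the three values with the highest counts."""
    table1: List[List[int]] = []
    for counts in pretable:
        # Sort the 16 candidate values by descending count, then by value (assumed tie rule).
        ranked = sorted(range(16), key=lambda value: (-counts[value], value))
        table1.append(ranked[:3])
    return table1


# Usage: row 117 filled with the counts that survive in the text.
pretable = [[0] * 16 for _ in range(4096)]
for value, count in {3: 18, 4: 20, 5: 31, 6: 11, 8: 4, 9: 12, 10: 5}.items():
    pretable[117][value] = count
print(build_table1(pretable)[117])
```

For the row-117 counts given in the text, the three largest counts are 31, 20, and 18, so the row resolves to the values 5, 4, 3 regardless of the tie rule.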
- When encoding D4[n], the encoding unit 15 treats the three immediately preceding data D4[n-3], D4[n-2], and D4[n-1] as a single block of 12-bit data, and looks up the three values stored at the address (row) indicated by D4[n-3], D4[n-2], D4[n-1] in the conversion table table1[4096][3], namely table1[D4[n-3], D4[n-2], D4[n-1]][0], table1[D4[n-3], D4[n-2], D4[n-1]][1], and table1[D4[n-3], D4[n-2], D4[n-1]][2].
- The encoding unit 15 checks whether any of these three values is the same as D4[n]. If D4[n] is the same as table1[D4[n-3], D4[n-2], D4[n-1]][0], D4[n] is converted to the 2-bit value “01b”; if it is the same as table1[D4[n-3], D4[n-2], D4[n-1]][1], D4[n] is converted to the 2-bit value “10b”; and if it is the same as table1[D4[n-3], D4[n-2], D4[n-1]][2], D4[n] is converted to the 2-bit value “11b”.
- If none of the three values at the address (row) indicated by D4[n-3], D4[n-2], D4[n-1] in the conversion table table1[4096][3] is the same as D4[n], the encoding unit 15 prepends “00b” to D4[n] and converts it to the 6-bit value “00b + D4[n]”. Here, the “b” in “01b”, “10b”, “11b”, and “00b + D4[n]” denotes binary notation.
- In this way, the encoding unit 15 converts the 4-bit DSD data D4[n] into the 2-bit data “01b”, “10b”, or “11b” using the conversion table table1, or into the 6-bit data “00b + D4[n]”, and outputs the result to the encoded data buffer 16.
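The conversion rule for a single D4[n] can be illustrated with the short sketch below. The helper names `context_index` and `encode_nibble` are hypothetical, and packing D4[n-3] into the most significant bits of the 12-bit row address is an assumption; the text only states that the three preceding values together select the row.

```python
from typing import List, Sequence

def context_index(d3: int, d2: int, d1: int) -> int:
    """Pack D4[n-3], D4[n-2], D4[n-1] into a 12-bit row address (bit order assumed)."""
    return (d3 << 8) | (d2 << 4) | d1

def encode_nibble(table1: List[List[int]], prev3: Sequence[int], cur: int) -> str:
    """Return the code for D4[n] as a bit string: 2 bits on a hit, 6 bits on a miss."""
    row = table1[context_index(*prev3)]
    for col, value in enumerate(row):
        if value == cur:
            # Column 0 -> "01", column 1 -> "10", column 2 -> "11".
            return format(col + 1, "02b")
    # No match: escape prefix "00" followed by the raw 4-bit value.
    return "00" + format(cur, "04b")

# Toy table: every row is {0, 1, 2} except the row for context (1, 2, 3).
table1 = [[0, 1, 2] for _ in range(4096)]
table1[context_index(1, 2, 3)] = [5, 4, 3]

print(encode_nibble(table1, (1, 2, 3), 5))  # 01
print(encode_nibble(table1, (1, 2, 3), 3))  # 11
print(encode_nibble(table1, (1, 2, 3), 9))  # 001001
```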
- FIG. 7 is a diagram illustrating a configuration example of the encoding unit 15 that performs the above-described compression encoding.
- the 4-bit DSD data (for example, D4 [n]) supplied from the input buffer 13 is stored in the register 51 that stores 4 bits.
- the output of the register 51 is connected to one input terminal 56a of the selector 55 and a register 52 for storing 12 bits.
- the register 52 stores the past 12 bits of data immediately preceding the 4-bit DSD data stored in the register 51 (for example, D4[n-3], D4[n-2], D4[n-1]).
- the conversion table processing unit 53 has a conversion table table1 supplied from the control unit 14.
- the conversion table processing unit 53 determines whether the three values at the address indicated by the 12-bit data (for example, D4[n-3], D4[n-2], D4[n-1]) stored in the register 52, namely table1[D4[n-3], D4[n-2], D4[n-1]][0], table1[D4[n-3], D4[n-2], D4[n-1]][1], and table1[D4[n-3], D4[n-2], D4[n-1]][2], include a value identical to the 4-bit data (for example, D4[n]) stored in the register 51. If so, it stores the value corresponding to the column holding the identical value, that is, one of “01b”, “10b”, or “11b”, in the 2-bit register 54.
- the data stored in the 2-bit register 54 is supplied to one input terminal 56c of the selector 55.
- If none of the three values at the address indicated by the 12-bit data (for example, D4[n-3], D4[n-2], D4[n-1]) stored in the register 52 matches the 4-bit data (for example, D4[n]) stored in the register 51, the conversion table processing unit 53 outputs a signal indicating that no conversion is performed (hereinafter referred to as the no-conversion signal) to the selector 55.
- the selector 55 selects one of the three input terminals 56a to 56c and outputs the data acquired from the selected input terminal 56 from the output terminal 57.
- The 4-bit DSD data (for example, D4[n]) stored in the register 51 is supplied to the input terminal 56a, “00b” is supplied to the input terminal 56b, and the 2-bit conversion data stored in the register 54 is supplied to the input terminal 56c.
- When the no-conversion signal is supplied, the selector 55 selects the input terminal 56b and outputs “00b” from the output terminal 57, and then selects the input terminal 56a and outputs the 4-bit DSD data (for example, D4[n]) stored in the register 51 from the output terminal 57. As a result, the 6 bits “00b + D4[n]”, which are output when the conversion table table1 contains no value identical to D4[n], are output from the output terminal 57.
- Otherwise, the selector 55 selects the input terminal 56c and outputs the 2-bit conversion data supplied from the register 54 from the output terminal 57. As a result, the 2 bits that are output when the conversion table table1 contains a value identical to D4[n], that is, “01b”, “10b”, or “11b”, are output from the output terminal 57.
- In step S1, the control unit 14 counts, for one frame of DSD data, the number of occurrences of each D4 data value that follows the past three D4 data (the past 12 bits of data), and creates the data generation count table pretable.
- In step S2, the control unit 14 creates a conversion table table1 having 4096 rows and 3 columns based on the created data generation count table pretable.
- the control unit 14 supplies the created conversion table table1 to the encoding unit 15 and the data transmission unit 18.
- In step S3, the encoding unit 15 performs compression encoding on the DSD data for one frame period using the conversion table table1. Specifically, the encoding unit 15 converts each 4-bit DSD data D4[n] of the frame into the 2-bit data “01b”, “10b”, or “11b”, or into the 6-bit data “00b + D4[n]”. The compressed data obtained by the compression encoding is supplied to the encoded data buffer 16 and the data amount comparison unit 17.
- In step S4, the data amount comparison unit 17 compares the data amount of one frame of uncompressed data supplied from the input buffer 13 with the data amount of one frame of compressed data supplied from the encoded data buffer 16, and determines whether the data amount has been reduced compared with before compression.
- If it is determined in step S4 that the data amount has been reduced compared with before compression, the process proceeds to step S5, and the data amount comparison unit 17 supplies selection control data indicating that the compressed data has been selected to the data transmission unit 18.
- In step S6, the data transmission unit 18 adds, to the compressed data supplied from the encoding unit 15, the selection control data (a flag indicating compression-encoded data) indicating that the compressed data has been selected and the data of the conversion table table1 (conversion table data) supplied from the control unit 14, and transmits the result to the partner apparatus.
- If it is determined in step S4 that the data amount has not been reduced compared with before compression, the process proceeds to step S7, and the data amount comparison unit 17 supplies selection control data indicating that uncompressed data has been selected to the data transmission unit 18 together with the uncompressed data.
- In step S8, the data transmission unit 18 transmits the selection control data (a flag indicating uncompressed data) indicating that uncompressed data has been selected and the uncompressed data to the partner apparatus.
- Steps S1 to S8 described above are repeated for the DSD data supplied sequentially from the input buffer 13 in units of one frame.
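The per-frame flow of steps S1 to S8 can be sketched as one function. This is an illustrative sketch only: `compress_frame` and the returned dictionary layout are hypothetical, the context is assumed to start at zero at the beginning of each frame, ties in the frequency ranking are broken toward the smaller value, and the comparison in S4 considers only the data amounts, as in the description.

```python
from typing import Dict, List

def compress_frame(nibbles: List[int]) -> Dict[str, object]:
    """Run S1-S4 on one frame of 4-bit values and pick compressed or raw output."""
    # S1: count which value follows each 12-bit context (context assumed to start at 0).
    pretable = [[0] * 16 for _ in range(4096)]
    ctx = 0
    for v in nibbles:
        pretable[ctx][v] += 1
        ctx = ((ctx << 4) | v) & 0xFFF

    # S2: keep the three most frequent followers per context.
    table1 = []
    for counts in pretable:
        table1.append(sorted(range(16), key=lambda x: (-counts[x], x))[:3])

    # S3: encode each value as a 2-bit hit code or a 6-bit escape.
    codes = []
    ctx = 0
    for v in nibbles:
        row = table1[ctx]
        if v in row:
            codes.append(format(row.index(v) + 1, "02b"))
        else:
            codes.append("00" + format(v, "04b"))
        ctx = ((ctx << 4) | v) & 0xFFF
    compressed = "".join(codes)

    # S4: compare data amounts; S5/S6 or S7/S8: attach the selection flag.
    if len(compressed) < 4 * len(nibbles):
        return {"compressed": True, "table": table1, "data": compressed}
    return {"compressed": False, "data": nibbles}

result = compress_frame([5] * 100)
print(result["compressed"], len(result["data"]))  # True 200
```

In the highly repetitive toy frame every value hits the first column, so 400 input bits become 200 output bits.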
- FIG. 9 shows a main configuration example of a decoding apparatus corresponding to the above-described new DSD lossless compression encoding method.
- the decoding device 70 in FIG. 9 is a device that receives and decompresses (losslessly decodes) an audio signal that has been compressed and transmitted by the compression coding device 10 in FIG. 4.
- the audio signal compression-encoded and transmitted by the compression encoding apparatus 10 of FIG. 4 is received by the input unit 71 of the decoding device 70 via a network (not shown) (for example, a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, a telephone line network, a satellite communication network, a public line network, or the like) and supplied to the data receiving unit 72.
- the data receiving unit 72 separates the synchronization signal included in the received data, and detects and corrects transmission errors that occur during network transmission. Then, the data reception unit 72 determines whether or not the audio signal is compression-encoded based on selection control data included in the reception data and indicating whether or not the audio signal is compression-encoded. When the audio signal is compression-encoded, the data receiving unit 72 supplies the received compressed data to the encoded data buffer 73. When the audio signal is not compressed and encoded, the data receiving unit 72 supplies the received uncompressed data to the output buffer 76. Further, the data receiving unit 72 supplies the data (conversion table data) of the conversion table table1 included in the received data to the table storage unit 75. The table storage unit 75 stores the conversion table table1 supplied from the data receiving unit 72 and supplies it to the decoding unit 74 as necessary.
- the encoded data buffer 73 temporarily stores the compressed data supplied from the data receiving unit 72 and supplies the compressed data to the subsequent decoding unit 74 at a predetermined timing.
- the decoding unit 74 decodes the compressed data to a state before compression (reversible decoding) and supplies it to the output buffer 76.
- a decoding method by the decoding unit 74 will be described.
- The following describes the case where the compressed data compression-encoded and transmitted by the compression encoding device 10 is expressed in units of 2 bits as follows, and E2[n] is decoded: ... E2[n-3], E2[n-2], E2[n-1], E2[n], E2[n+1], E2[n+2], E2[n+3], ...
- E2[n] represents 2 consecutive bits of data and is also referred to as E2 data.
- the decoding unit 74 first determines the value of E2[n]. If E2[n] is “00b”, the data is not registered in the received conversion table table1[4096][3], so the 4-bit data “E2[n+1] + E2[n+2]” following E2[n] is the data to be decoded. If E2[n] is “01b”, “10b”, or “11b”, the data is registered in the received conversion table table1[4096][3], so the decoding unit 74 refers to the conversion table table1[4096][3] using the 12 bits of D4 data decoded immediately before, D4[n-3], D4[n-2], and D4[n-1], to find the data to be decoded.
- The data to be decoded is the data stored in “table1[D4[n-3], D4[n-2], D4[n-1]][E2[n] - 1]”.
- the decoding unit 74 can decode (reversibly decode) the compressed data to the state before compression.
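The decoding rule can be sketched as the inverse loop below. The function name `decode_bits` is hypothetical, the bit stream is modelled as a string of "0"/"1" characters for readability, the 12-bit row address is assumed to hold D4[n-3] in its most significant bits, and the three values preceding the first code are assumed to be known to the decoder.

```python
from typing import List, Sequence

def decode_bits(table1: List[List[int]], bits: str, context: Sequence[int]) -> List[int]:
    """Decode a string of 2-bit codes and 6-bit escapes back into 4-bit values."""
    ctx = list(context)  # [D4[n-3], D4[n-2], D4[n-1]]
    out = []
    pos = 0
    while pos < len(bits):
        code = int(bits[pos:pos + 2], 2)  # E2[n]
        pos += 2
        if code == 0:
            # "00b": the next 4 bits (E2[n+1] + E2[n+2]) are the raw value.
            value = int(bits[pos:pos + 4], 2)
            pos += 4
        else:
            # "01b"/"10b"/"11b": read column E2[n] - 1 of the context's row.
            row = table1[(ctx[0] << 8) | (ctx[1] << 4) | ctx[2]]
            value = row[code - 1]
        out.append(value)
        ctx = ctx[1:] + [value]  # slide the 12-bit history
    return out

# Toy table: every row is {0, 1, 2} except the row for context (1, 2, 3).
table1 = [[0, 1, 2] for _ in range(4096)]
table1[0x123] = [5, 4, 3]

decoded = decode_bits(table1, "01" + "001001" + "11", (1, 2, 3))
print(decoded)  # [5, 9, 2]
```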
- the decoding unit 74 includes a 2-bit register 91, a 12-bit register 92, a conversion table processing unit 93, a 4-bit register 94, and a selector 95.
- the 2-bit E2 data (for example, E2[n]) supplied from the encoded data buffer 73 is stored in the register 91.
- The output of the selector 95 is supplied to the 12-bit register 92, which stores the 12 bits of data decoded immediately before the 2-bit E2 data (for example, E2[n]) stored in the register 91 (for example, D4[n-3], D4[n-2], D4[n-1]).
- When the E2 data is “00b”, the selector 95 selects the input terminal 96a and outputs the 4-bit data following E2[n], “E2[n+1] + E2[n+2]”, from the output terminal 97 as the decoding result.
- the conversion table processing unit 93 is supplied with the conversion table table1 from the table storage unit 75.
- When the E2 data is “01b”, “10b”, or “11b”, the selector 95 selects the input terminal 96b and outputs the data stored in the register 94 from the output terminal 97 as the decoding result.
- the output buffer 76 appropriately selects either the uncompressed data supplied from the data receiving unit 72 or the decoded data supplied from the decoding unit 74, and supplies the selected data to the analog filter 77.
- the analog filter 77 performs predetermined filter processing such as a low-pass filter and a band-pass filter on the decoded data supplied from the output buffer 76, and outputs the result from the output unit 78.
- In step S21, the data receiving unit 72 determines whether the received data is compression-encoded compressed data based on the selection control data included in the received data.
- If it is determined in step S21 that the received data is compressed data, the process proceeds to step S22, and the data receiving unit 72 supplies the conversion table data included in the received data to the table storage unit 75.
- the conversion table processing unit 93 acquires the received conversion table table1 via the table storage unit 75.
- In step S22, the compressed data included in the received data is also supplied to the encoded data buffer 73.
- In step S23, the decoding unit 74 decodes the compressed data supplied from the encoded data buffer 73 using the conversion table table1, and supplies the decoded data to the output buffer 76. That is, when the 2-bit E2 data (for example, E2[n]) is “00b”, the decoding unit 74 takes the 4-bit data “E2[n+1] + E2[n+2]” following E2[n] as the data to be decoded. When the E2 data is “01b”, “10b”, or “11b”, the decoding unit 74 refers to the conversion table table1 using the 12 bits of D4 data decoded immediately before and takes the data stored in “table1[D4[n-3], D4[n-2], D4[n-1]][E2[n] - 1]” as the data to be decoded.
- If it is determined in step S21 that the received data is not compressed data, that is, it is uncompressed data, the process proceeds to step S24, and the data receiving unit 72 supplies the uncompressed data included in the received data to the output buffer 76.
- uncompressed data or data decoded by the decoding unit 74 is supplied to the output buffer 76, and the data supplied to the output buffer 76 is output to the analog filter 77.
- In step S25, the analog filter 77 performs a predetermined filter process on the data supplied via the output buffer 76.
- the filtered audio signal is output from the output unit 78.
- the above processing is repeatedly executed for the audio signal in units of one frame.
- GOB: Group of Blocks.
- DSD_lossless_payload: a unit in which configuration information (configuration) is added to the head of the GOB.
- Information code book; reference table
- GOB header GOB header
- GOB data GOB data
- DSD lossless stream is composed of a plurality of DSD lossless payloads (DSD_lossless_payload ()).
- one DSD lossless payload is composed of a format version (format version), a GOB config (GOB config), and a GOB.
- the GOB is composed of a GOB header (GOB header), GOB data (GOB data), and 10 blocks (Block 1 to Block 10).
- the GOB header and the GOB data are also referred to as GOB initializers (GOB initializer) used for decoding the GOB.
- the GOB initializer includes decoder configuration information (decoder configuration), metadata (metadata), codebook (code book), and the like used for decoding.
- the block (Block) includes a block header (Block header), left channel audio data (L), right channel audio data (R), and byte alignment (byte align) (when the DSD data is 2-channel left and right).
- DSD_lossless_payload stores, for example, format version, DSD_lossless_gob_configuration (), DSD_lossless_gob (number_of_audio_data), and the like.
- This format version corresponds to the format version shown in FIG.
- DSD_lossless_gob_configuration corresponds to the GOB config (GOB config) in FIG.
- DSD_lossless_gob corresponds to the GOB in FIG.
- DSD_lossless_gob_configuration stores, for example, channel_configuration, number of blocks, sampling_frequency, comment_flag, comment_size, comment_byte, and the like.
- DSD_lossless_gob stores, for example, DSD_lossless_gob_header (), DSD_lossless_gob_data (), DSD_lossless_block (), byte_align (), and the like.
- This DSD_lossless_gob_header () corresponds to the GOB header (GOB header) in FIG.
- DSD_lossless_gob_data corresponds to the GOB data (GOB data) in FIG.
- DSD_lossless_block corresponds to each block (Block 1 to Block 10) in FIG.
- DSD_lossless_gob_header stores, for example, DSD_lossless_block_info and the like.
- DSD_lossless_gob_data stores, for example, gob_codebook_length, gob_codebook [i], and the like.
- gob_codebook [i] corresponds to the code book (code book) in FIG.
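The nesting of these syntax elements can be pictured with a few plain data classes. This is only a structural sketch: the class names, the Python field types, the spelling `number_of_blocks`, and the collapsing of comment_flag, comment_size, and comment_byte into one `comment` field are assumptions, and no bit widths or field order are implied.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GobConfiguration:
    """Mirrors DSD_lossless_gob_configuration()."""
    channel_configuration: int
    number_of_blocks: int
    sampling_frequency: int
    comment: bytes = b""  # stands in for comment_flag / comment_size / comment_byte

@dataclass
class Gob:
    """Mirrors DSD_lossless_gob()."""
    block_info: bytes     # DSD_lossless_gob_header(): DSD_lossless_block_info
    codebook: List[int]   # DSD_lossless_gob_data(): gob_codebook[i]
    blocks: List[bytes]   # one entry per DSD_lossless_block()

@dataclass
class Payload:
    """Mirrors DSD_lossless_payload()."""
    format_version: int
    config: GobConfiguration
    gob: Gob

# Toy instance with placeholder contents for a 10-block GOB.
payload = Payload(
    format_version=0,
    config=GobConfiguration(channel_configuration=2, number_of_blocks=10,
                            sampling_frequency=2_822_400),
    gob=Gob(block_info=b"", codebook=[], blocks=[b""] * 10),
)
print(len(payload.gob.blocks))  # 10
```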
- <Decoding> An example of how the DSD lossless stream of the new DSD lossless compression encoding method described above is decoded will be described.
- data for a predetermined time is collectively managed as GOB. That is, as shown in FIG. 13A, the DSD lossless stream has a configuration in which a GOB initializer and a predetermined number of blocks (for example, 10 blocks) are continuous. Since the GOB initializer has a playback time of 0, the management of the playback time becomes complicated if it is a single access unit. Therefore, the GOB initializer is added to block 1 which is the first block of the GOB, and the GOB initializer and block 1 are used as one access unit.
- the DSD lossless stream decoder expands and decodes each block using the decoder configuration information included in the GOB initializer. Therefore, to decode any block of the GOB, the GOB initializer must first be read into the DSD lossless decoder. For example, when the blocks are decoded in order from the first block (block 1) of the GOB (sequential decoding), the GOB initializer is already attached to block 1, so, as shown in FIG. 13B, the access units need only be input to the DSD lossless decoder in order.
- If the GOB initializer can be read from the DSD lossless stream separately from block 1, then, when decoding is to start from a block in the middle of the GOB (block 6 in this example), the GOB initializer may first be read, added to block 6, and input to the DSD lossless decoder, as shown in FIG. 13D. In this case, decoding of unnecessary blocks can be omitted.
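The way the GOB initializer travels with the first block fed to the decoder can be sketched as follows. The function `decoder_inputs` and the representation of the initializer and blocks as byte strings are hypothetical; the sketch assumes the initializer can be read separately from block 1.

```python
from typing import List

def decoder_inputs(initializer: bytes, blocks: List[bytes], start_block: int) -> List[bytes]:
    """Return the access units to feed the decoder, starting at 1-based start_block."""
    if not 1 <= start_block <= len(blocks):
        raise ValueError("start_block out of range")
    selected = blocks[start_block - 1:]
    # The initializer has a playback time of 0, so it is attached to the
    # first block that is actually input rather than sent on its own.
    return [initializer + selected[0]] + selected[1:]

init = b"I"
blocks = [bytes([i]) for i in range(1, 11)]  # toy stand-ins for block 1 to block 10

sequential = decoder_inputs(init, blocks, 1)
from_block_6 = decoder_inputs(init, blocks, 6)
print(len(sequential), len(from_block_6))  # 10 5
```

Starting at block 6 yields five access units, the first of which carries the initializer, so blocks 1 to 5 are never decoded.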
- The use of a file (hereinafter also referred to as an MP4 file) in the file format (hereinafter also referred to as MP4) defined in Part 14 of MPEG-4 (ISO/IEC 14496-14:2003), which is a format derived from ISOBMFF, was considered.
- Video and audio data can be converted into MP4 files as shown in the example of FIG.
- uncompressed video material (video data) is converted into an image format and encoded with an AVC (Advanced Video Coding) encoder, an HEVC (High Efficiency Video Coding) encoder, or the like, and a file with the extension “bsf” (.bsf file) is generated.
- the .bsf file is a file that stores an encoded stream.
- DSD audio material (DSD data) is encoded by a DSD lossless encoder using, for example, the new DSD lossless compression encoding method described above, and a file with the extension “enc” (.enc file), a file with the extension “afr” (.afr file), a file with the extension “esd” (.esd file), and the like are generated.
- the .enc file is a file that stores an encoded DSD lossless stream.
- the .afr file is a file that stores metadata that assists in creating a sample table when storing in an MP4 file.
- the .esd file is a file that stores metadata for decoder configuration.
- Alternatively, the DSD audio material may be converted to PCM data by DSD-PCM conversion and encoded by an AAC encoder, so that a file with the extension “aac” (.aac file), an .afr file, an .esd file, and the like are generated.
- the .aac file is a file that stores an encoded aac stream.
- Information stored in these files is multiplexed to generate MP4 files.
- an MP4 file has a hierarchical structure called a box.
- an MP4 file includes a file type compatibility box (File Type Compatibility Box (ftyp)), a movie box (Movie Box (moov)), and a media data box (Media Data Box (mdat)).
- the file type compatibility box (ftyp) represents the beginning of the file and stores information for identifying the type of the file format.
- the movie box (moov) stores content metadata and the like.
- the media data box (mdat) stores actual AV data (actual data).
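The top-level box layout can be illustrated by walking the box headers of a byte buffer. This sketch relies on the general ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end); the function name `iter_boxes` and the toy file contents are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Box extends to the end of the enclosing container.
            size = end - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii"), offset + header, offset + size
        offset += size

# Toy file: ftyp with an 8-byte payload, an empty moov, and a 4-byte mdat.
toy = (struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4
       + struct.pack(">I4s", 8, b"moov")
       + struct.pack(">I4s", 12, b"mdat") + b"\x01\x02\x03\x04")

boxes = list(iter_boxes(toy))
print([name for name, _, _ in boxes])  # ['ftyp', 'moov', 'mdat']
```

Because boxes nest, the same function can be applied to the payload range of a container box such as moov to list its children.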
- a movie box includes a movie header box (Movie Header Box), a track box (Track Box (track)), and the like.
- the movie header box stores, for example, movie time axis setting information, enlargement / reduction, rotation, reproduction speed information, and the like.
- a track box (track) is generated for each track.
- the track box (track) stores information about the track, for example.
- a track box has a track header box (Track Header Box), an edit box (Edit Box), a media box (Media Box (mdia)), and the like.
- the track header box stores, for example, information relating to screen composition such as a spatial position, size, enlargement / reduction, layer, etc., information relating to association between tracks, and the like.
- In the edit box, for example, information related to AV synchronization such as time position and playback speed is stored.
- In the media box (mdia), for example, information regarding AV data is stored.
- the media box (mdia) includes a media header box (Media header box), a media handler box (Media handler box), a media information box (Media information box (minf)), and the like.
- In the media header box, for example, information on the type of AV data, media time axis setting, language setting, and the like is stored.
- In the media information box (minf), for example, information on data and samples is stored.
- the media information box (minf) has a data information box (Data Information Box), a sample table box (Sample Table Box (stbl)), and the like.
- the data information box stores information related to data reference such as a data storage location and a reference method.
- In the sample table box, for example, information related to sample management such as data time and address information is stored. A sample is the minimum access unit in the MP4 file format.
- a main configuration example of the sample table box is shown in FIG.
- the sample table box (stbl) includes a sample description box (Sample Description Box), a time-to-sample box (Time To Sample Box), a sample size box (Sample Size Box), a sample-to-chunk box (Sample To Chunk Box), a chunk offset box (Chunk Offset Box), a sync sample box (Sync Sample Box), and a subsample information box (Subsample Information Box).
- In the sample description box, for example, information on the codec, image size, and the like is stored.
- the sample description box stores a sample entry (sample entry) in which information about the sample is stored. Decode configuration information is stored in this sample entry.
- In the time-to-sample box, for example, information regarding the time of the sample is stored.
- In the sample size box, for example, information regarding the size of the sample is stored.
- In the sample-to-chunk box, for example, information regarding the position of sample data is stored.
- In the chunk offset box, for example, information regarding data offset is stored.
- In the sync sample box, for example, information about sync samples is stored.
- a sync sample (Sync Sample) is a sample that can be randomly accessed, that is, a sample that can serve as a starting point of decoding.
- the sync sample box stores information necessary for starting decoding (for example, information necessary for decoding, information indicating the starting point of decoding, etc.).
- An example of the definition of the sync sample is shown in FIG.
- An example of the syntax of the sync sample is shown in FIG.
- An example of the semantics of the sync sample is shown in FIG.
- a subsample is a unit representing a part of a byte range indicated by a sample (sample). That is, the byte range pointed to by the sample can be divided into a plurality of subsamples. In other words, multiple subsamples can be set within a sample.
- An example of syntax of the subsample is shown in FIG.
- An example of subsample semantics is shown in FIG.
- If a DSD lossless stream obtained by encoding high-quality DSD data using the new DSD lossless compression encoding method is converted into an MP4 file and distributed using MPEG-DASH, higher-quality data can be delivered.
- However, a method for storing this DSD lossless stream in an MP4 file had not yet been considered. For example, if, as a rule, one audio sample (one quantized sample) in the elementary stream is assigned to one MP4 sample in the MP4 system layer, the number of MP4 samples becomes enormous. In the case of 2.8 MHz DSD data, for example, 2.8 million MP4 samples would be created every second. For a system that processes MP4 samples one at a time, this is a very heavy and inefficient load and is difficult to realize. Consequently, the DSD lossless stream could not be distributed by MPEG-DASH, and as a result, higher-quality audio data could not be transmitted.
- Therefore, an audio access unit that collects a plurality of quantized samples within a certain time is made to correspond to one MP4 sample.
- By handling access units with a larger data size in this way, with the implementation load taken into account, the processing load can be reduced.
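The difference in sample counts can be made concrete with a small calculation. The figures here are assumptions for illustration: “2.8 MHz” is taken to mean the common DSD rate of 64 x 44.1 kHz, and the rate of 10 access units per second is hypothetical, since the text does not fix the duration of one access unit.

```python
# Assumed DSD rate: 64 x 44.1 kHz one-bit samples per second.
DSD_RATE_HZ = 64 * 44_100

# Hypothetical grouping: 10 access units (MP4 samples) per second.
ACCESS_UNITS_PER_SECOND = 10

# One quantized sample per MP4 sample: one MP4 sample per DSD bit.
mp4_samples_per_second_naive = DSD_RATE_HZ

# One access unit per MP4 sample: each MP4 sample carries many DSD bits.
mp4_samples_per_second_grouped = ACCESS_UNITS_PER_SECOND
dsd_bits_per_mp4_sample = DSD_RATE_HZ // ACCESS_UNITS_PER_SECOND

print(mp4_samples_per_second_naive)    # 2822400
print(mp4_samples_per_second_grouped)  # 10
print(dsd_bits_per_mp4_sample)         # 282240
```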
- FIG. 18 is a block diagram illustrating an example of a configuration of a distribution system that is an aspect of an information processing system to which the present technology is applied.
- a distribution system 100 shown in FIG. 18 is a system for distributing data (contents) such as images and sounds.
- the file generation device 101, the distribution server 102, and the playback terminal 103 are connected via a network 104 so as to communicate with each other.
- the file generation apparatus 101 performs processing related to generation of an MP4 file that stores audio data.
- the file generation device 101 generates audio data, generates an MP4 file that stores the generated audio data, and supplies the generated MP4 file to the distribution server 102.
- the distribution server 102 performs processing related to MP4 file distribution.
- the distribution server 102 acquires and manages the MP4 file supplied from the file generation apparatus 101, and provides a distribution service using MPEG-DASH.
- the distribution server 102 supplies the MP4 file to the playback terminal 103 in response to a request from the playback terminal 103.
- the playback terminal 103 performs processing related to playback of audio data.
- the playback terminal 103 requests the distribution server 102 to distribute the MP4 file according to MPEG-DASH, and acquires the MP4 file supplied in response to the request.
- the playback terminal 103 decodes the MP4 file and plays back the audio data.
- the network 104 is an arbitrary communication network, may be a wired communication network, a wireless communication network, or may be configured by both of them. Further, the network 104 may be configured by a single communication network, or may be configured by a plurality of communication networks.
- the network 104 may include a communication network or a communication path of any communication standard, such as a wired communication network complying with a predetermined standard.
- the file generation device 101, the distribution server 102, and the playback terminal 103 are connected to the network 104 so that they can communicate with each other, and can exchange information with each other via the network 104.
- the file generation apparatus 101, the distribution server 102, and the playback terminal 103 may be connected to the network 104 by wired communication, wireless communication, or both.
- the distribution system 100 is configured with one file generation device 101, one distribution server 102, and one playback terminal 103, but these numbers are arbitrary and need not be the same.
- each of the file generation device 101, the distribution server 102, and the playback terminal 103 may be singular or plural.
- FIG. 19 is a block diagram illustrating a main configuration example of the file generation device 101.
- the file generation apparatus 101 includes a DSD generation unit 111, a DSD encoding unit 112, an MP4 file generation unit 113, and a setting unit 114.
- the DSD generation unit 111 performs processing related to generation of DSD data. For example, the DSD generation unit 111 performs ΔΣ modulation on the input audio signal (audio analog signal) and converts it into DSD data, which is 1-bit digital data. In addition, the DSD generation unit 111 supplies the generated DSD data to the DSD encoding unit 112, for example.
- the DSD encoding unit 112 performs processing related to encoding of DSD data. For example, the DSD encoding unit 112 encodes the DSD data supplied from the DSD generation unit 111 using the new DSD lossless compression encoding method described above, and generates a DSD lossless stream. Further, the DSD encoding unit 112 supplies the generated DSD lossless stream to the MP4 file generation unit 113, for example.
- the MP4 file generation unit 113 performs processing related to generation of an MP4 file. For example, the MP4 file generation unit 113 acquires the DSD lossless stream supplied from the DSD encoding unit 112 and generates an MP4 file that stores the DSD lossless stream. For example, the MP4 file generation unit 113 generates the MP4 file according to the setting by the setting unit 114. Also, the MP4 file generation unit 113 outputs the generated MP4 file to the outside of the file generation apparatus 101, for example. For example, the MP4 file generation unit 113 supplies the MP4 file to the distribution server 102 via the network 104.
- the setting unit 114 performs processing related to the setting for the generation of the MP4 file by the MP4 file generation unit 113.
- the setting unit 114 generates settings related to the generation of the MP4 file, and sets the settings in the MP4 file generation unit 113.
- the setting unit 114 includes a sample table box setting unit 121, a sample entry setting unit 122, a sync sample box setting unit 123, and a subsample information box setting unit 124.
- the sample table box setting unit 121 performs processing related to setting of the sample table box.
- the sample entry setting unit 122 performs processing related to setting of sample entries.
- the sync sample box setting unit 123 performs processing related to the setting of the sync sample box.
- the subsample information box setting unit 124 performs processing related to the setting of the subsample information box.
- the MP4 file generation unit 113 and the setting unit 114 may be a single device (MP4 file generation device 131).
- the MP4 file generation device 131 generates and outputs an MP4 file that stores the input DSD lossless stream.
- the DSD encoding unit 112 may be added to the configuration of the MP4 file generation device 131 to form one device (MP4 file generation device 132).
- the MP4 file generation device 132 generates a DSD lossless stream by lossless encoding the input DSD data, and further generates and outputs an MP4 file that stores the DSD lossless stream.
- FIG. 20 is a block diagram illustrating a main configuration example of the playback terminal 103.
- the playback terminal 103 includes an MP4 file acquisition unit 141, a DSD decoding unit 142, an output control unit 143, an output unit 144, and a control unit 145.
- the MP4 file acquisition unit 141 performs processing related to MP4 file acquisition. For example, the MP4 file acquisition unit 141 requests the distribution server 102 to distribute content in accordance with MPEG-DASH, and acquires the MP4 file of the content supplied in response to the request. Further, for example, the MP4 file acquisition unit 141 extracts a DSD lossless stream or the like from the acquired MP4 file, and supplies it to the DSD decoding unit 142. Also, for example, the MP4 file acquisition unit 141 extracts control information and the like from the acquired MP4 file and supplies it to the control unit 145.
- the DSD decoding unit 142 performs processing related to decoding of the DSD lossless stream. For example, the DSD decoding unit 142 decodes the DSD lossless stream by a decoding method corresponding to the above-described new DSD lossless compression encoding method, and restores the DSD data. Note that the DSD decoding unit 142 performs this decoding based on the control of the control unit 145, for example. For example, the DSD decoding unit 142 supplies the restored DSD data to the output control unit 143.
- the output control unit 143 performs processing related to control of output of DSD data. For example, the output control unit 143 controls the output of the DSD data by discarding the DSD data supplied from the DSD decoding unit 142 or supplying the DSD data to the output unit 144. Note that the output control unit 143 performs this output control based on the control of the control unit 145, for example.
- the output unit 144 performs processing related to output of DSD data.
- the output unit 144 includes a speaker or the like, converts DSD data supplied from the output control unit 143 into an audio signal (audio analog signal), and outputs the audio signal from the speaker or the like.
- the output unit 144 may include an output terminal, for example, and output an audio signal or DSD data to the outside of the playback terminal 103 (supplied to other devices).
- the configuration of the output unit 144 is arbitrary and may include devices other than those described above.
- the control unit 145 performs processing related to control of decoding of the DSD lossless stream and control of output of DSD data. For example, the control unit 145 controls the DSD decoding unit 142 to control decoding of the DSD lossless stream. For example, the control unit 145 controls the output control unit 143 to control the output of DSD data. For example, the control unit 145 acquires control information from the MP4 file acquisition unit 141 and performs these controls based on the control information.
- The control unit 145 includes a sample table box analysis unit 151, a subsample information box analysis unit 152, a sync sample box analysis unit 153, a sample entry analysis unit 154, a decoder configuration information setting unit 155, and a playback control unit 156.
- the sample table box analysis unit 151 performs processing related to analysis of the sample table box.
- the subsample information box analysis unit 152 performs processing related to the analysis of the subsample information box.
- the sync sample box analysis unit 153 performs processing related to the analysis of the sync sample box.
- the sample entry analysis unit 154 performs processing related to analysis of sample entries.
- The decoder configuration information setting unit 155 performs processing related to setting of the decoder configuration information.
- the playback control unit 156 performs processing related to playback control of DSD data.
- The MP4 file acquisition unit 141, the DSD decoding unit 142, the output control unit 143, and the control unit 145 may be configured as a single device (MP4 file playback device 161).
- the MP4 file playback device 161 extracts a DSD lossless stream from the input MP4 file, decodes the DSD lossless stream, and generates DSD data. Further, the MP4 file playback device 161 outputs a desired range (from a desired position to a desired position) of the generated DSD data.
- <MP4 sample settings>
- As described above, no method has yet been considered for storing, in an MP4 file, a DSD lossless stream obtained by encoding high-quality DSD data with the new DSD lossless compression encoding method. For example, it has not been defined what kind of data is assigned to a sample of the MP4 file.
- Therefore, in a file of a predetermined file format that stores encoded data of audio data, where the encoded data has a structure in which blocks, the access units of the encoded data, are grouped in predetermined numbers, a sample including the initialization information used for decoding a group of blocks is set as a sample, which is the minimum access unit in the file.
- the encoded data may be, for example, the above-described DSD lossless stream (a stream obtained by encoding DSD data using the above-described new DSD lossless compression encoding method).
- The block may be a block in the DSD lossless stream.
- The group of blocks may be a GOB.
- The initialization information may be a GOB initializer.
- The predetermined file format may be the above-described MP4 file format (a file format conforming to ISO/IEC 14496).
- the sample may be a sample in the MP4 file. That is, for example, a sample including a DSD lossless stream GOB initializer may be set in an MP4 file.
- the sample entry setting unit 122 sets a sample including such a GOB initializer.
- each block can be decoded using the initializer information.
- the DSD lossless stream can be stored in the MP4 file. Therefore, streaming distribution of a DSD lossless stream using MPEG-DASH is possible, and higher-quality audio data can be transmitted.
- Each block of the DSD lossless stream may be assigned to different samples. That is, one block may be assigned to one sample of the MP4 file. For example, as shown in FIG. 21, a sample may be set as indicated by a double arrow 172 for a DSD lossless stream 171 for 1 GOB (10 blocks). A double arrow 172 indicates the range of the sample.
- a sample for each block may be set.
- A total of 11 samples are set for the DSD lossless stream 171: the sample including the GOB initializer and one sample for each block.
- the sample entry setting unit 122 further sets such a sample for each block.
- a sample including a GOB initializer may be set as a sync sample (Sync Sample).
- an ellipse 173 indicates that the sample is a sync sample.
- The sync sample box setting unit 123 sets a sample including the GOB initializer as a sync sample. In this case, because the sync sample includes the GOB initializer but no block, the playback time of this sample is set to zero.
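- The per-block assignment described above can be sketched as follows. This is a minimal illustration, not the implementation of the sample entry setting unit 122: the `Sample` class, the function name, the block sizes, and the `block_duration` value are all hypothetical. Only the 25-byte initializer size and the 10 blocks per GOB come from the examples in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    size: int       # bytes occupied in the stream
    duration: int   # playback time, in hypothetical timescale units
    is_sync: bool   # True if the sample would be listed in the sync sample box


def samples_per_block(gob_initializer_size: int,
                      block_sizes: List[int],
                      block_duration: int) -> List[Sample]:
    """One sample for the GOB initializer, then one sample per block."""
    # The initializer sample carries no block, so its playback time is zero.
    samples = [Sample(gob_initializer_size, 0, True)]
    samples += [Sample(size, block_duration, False) for size in block_sizes]
    return samples


# Hypothetical block sizes for one GOB of 10 blocks.
gob = samples_per_block(25, [100 + i for i in range(10)], block_duration=1024)
sync_sample_numbers = [n for n, s in enumerate(gob, start=1) if s.is_sync]
```

- With this layout, one GOB yields 11 samples, and only the first one is a sync sample.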
- The first block (block 1) of the GOB may be included in the sample that includes the GOB initializer, and the second and subsequent blocks of the GOB may each be assigned to its own sample.
- A total of 10 samples are set: the sample including the GOB initializer and block 1, and one sample for each of blocks 2 to 10.
- the sample entry setting unit 122 sets a sample including the GOB initializer and the block 1, and further sets a sample for each block for the blocks 2 to 10.
- a sample including the GOB initializer and block 1 may be set as a sync sample (Sync Sample).
- the sync sample box setting unit 123 sets a sample including the GOB initializer and block 1 as a sync sample.
- In the sample including the GOB initializer and block 1, the GOB initializer part and the block 1 part may each be set as a subsample.
- a dotted double-pointed arrow 174 indicates a sub-sample range. That is, in this case, a subsample including the GOB initializer and a subsample including the block 1 are set in the sample including the GOB initializer and the block 1.
- the subsample information box setting unit 124 sets such a subsample.
- the sample including the GOB initializer and block 1 may be set as a sync sample (Sync Sample).
- the sync sample box setting unit 123 sets a sample including the GOB initializer and block 1 as a sync sample. That is, in this case, both the sync sample and the subsample are set.
- sample number (sample number) is set as shown in A of FIG. 22 in the sync sample box (SyncSampleBox (stss)).
- sample number is set as shown in B of FIG. 22 in the sync sample box (stss).
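- As a rough sketch of how such a list of sample numbers might be serialized, the following builds a sync sample box (stss) as a full box with a 32-bit entry count followed by 32-bit sample numbers. The function name is hypothetical, and the sample numbers are not read from FIG. 22: they are derived on the assumption of consecutive GOBs with 11 samples each (1, 12, 23, ...); with 10 samples per GOB they would be 1, 11, 21, ....

```python
import struct
from typing import List


def build_stss(sync_sample_numbers: List[int]) -> bytes:
    """Serialize a sync sample box: box header, version/flags, count, entries."""
    payload = struct.pack(">I", 0)  # version (8 bits) and flags (24 bits), all zero
    payload += struct.pack(">I", len(sync_sample_numbers))  # entry_count
    for number in sync_sample_numbers:
        payload += struct.pack(">I", number)  # 1-based sample_number
    # Box header: 32-bit total size followed by the four-character type.
    return struct.pack(">I4s", 8 + len(payload), b"stss") + payload


# Assumed: three consecutive GOBs, 11 samples per GOB.
stss = build_stss([1, 12, 23])
```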
- sample delta is a parameter indicating (identifying) the position (number) of a sample for setting a subsample.
- the subsample count is a parameter indicating the number of set subsamples.
- the subsample size is a parameter indicating the size of each subsample. For example, subsample size 1 (subsample_size_1) indicates the size of the subsample including the GOB initializer, and its value is 25 bytes (bytes).
- Subsample size 2 (subsample_size_2) indicates the size of the subsample including block 1, and its value depends on each subsample (for example, x01, x02, x03, ...).
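- The three parameters can be illustrated with a small sketch. It assumes the layout with 10 samples per GOB, in which the first sample of each GOB holds the GOB initializer and block 1, and it assumes that sample delta is coded as the difference from the sample number of the previous entry, as in the ISO base media file format. The function name and the block 1 sizes (120, 118, 121) are hypothetical, and the values in the figure may differ.

```python
from typing import Dict, List


def subsample_entries(samples_with_subsamples: Dict[int, List[int]]) -> List[dict]:
    """Map {sample number: [subsample sizes]} to subsample information entries."""
    entries = []
    previous = 0
    for sample_number in sorted(samples_with_subsamples):
        sizes = samples_with_subsamples[sample_number]
        entries.append({
            "sample_delta": sample_number - previous,  # distance from previous entry
            "subsample_count": len(sizes),
            "subsample_sizes": list(sizes),
        })
        previous = sample_number
    return entries


GOB_INITIALIZER_SIZE = 25  # bytes, as in the example above

# Samples 1, 11, and 21 each hold a GOB initializer and block 1 (assumed layout).
entries = subsample_entries({
    1: [GOB_INITIALIZER_SIZE, 120],
    11: [GOB_INITIALIZER_SIZE, 118],
    21: [GOB_INITIALIZER_SIZE, 121],
})
```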
- Decoding can be controlled in units of blocks by setting samples as in (A-1). Thereby, an increase in load can be suppressed as compared with the case where a sample is set for each quantized sample.
- Since a sample including the GOB initializer is set for each GOB, random access in units of blocks can be performed by controlling access so that the first sample of the GOB is read first. For example, when accessing a block in the middle of a GOB, if the GOB initializer is read first, decoding can be started from that block (decoding of the blocks before it is omitted).
- the sample including the GOB initializer can be set as the sync sample. Therefore, the sample including the GOB initializer can be accessed more easily, and any block in the GOB can be accessed. That is, random access in units of blocks becomes possible. Also, since the GOB initializer is stored in a different sample from the block, the GOB initializer can be read without requiring decoding of the block. Therefore, the GOB initializer can be read at a higher speed.
- the sample including the GOB initializer and block 1 can be set as the sync sample. Therefore, random access in block units is possible.
- Setting samples as shown in (A-6) makes it easier to match the number of samples with the playback time, enables faster reading of the GOB initializer, and enables random access in units of blocks.
- All the blocks of a GOB of the DSD lossless stream may be assigned to one sample. That is, 1 GOB may be assigned to one sample of the MP4 file. For example, as shown in FIG. 23, a sample may be set as indicated by a double arrow 172 for a DSD lossless stream 171 for 1 GOB (10 blocks).
- A sample including the entire GOB, that is, the GOB initializer and blocks 1 to 10, may be set.
- one sample is set for the DSD lossless stream 171.
- the sample entry setting unit 122 sets the sample including the GOB initializer and all blocks.
- a sample including the entire GOB set as shown in (B-1) may be set as a sync sample.
- an ellipse 173 indicates a sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- The part consisting of the GOB initializer and block 1, and each of the block parts of blocks 2 to 10, may be set as subsamples.
- A dotted double-pointed arrow 174 indicates a subsample range. That is, in this case, a total of ten subsamples are set in the sample including the entire GOB: a subsample including the GOB initializer and block 1, a subsample including block 2, a subsample including block 3, ..., and a subsample including block 10.
- the subsample information box setting unit 124 sets such a subsample.
- The GOB initializer portion and the block portions of blocks 1 to 10 may each be set as a subsample. That is, in this case, a total of 11 subsamples are set in the sample including the entire GOB: a subsample including the GOB initializer, a subsample including block 1, a subsample including block 2, ..., and a subsample including block 10. For example, the subsample information box setting unit 124 sets such subsamples.
- a sample including the entire GOB in which subsamples are set as in (B-3) may be set as a sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- a sample including the entire GOB in which the subsample is set as shown in (B-4) may be set as the sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- Parameters such as sample delta (sample_delta), subsample count (subsample_count), and subsample size (subsample_size_1, ..., subsample_size_10) are set as shown in FIG.
- the value of the sample delta is set to “1”.
- the value of the subsample count is set to “10”.
- the size of each subsample is set as each subsample size (for example, x11, ..., x110, y11, ..., y110, z11, ..., z110).
- Parameters such as sample delta (sample_delta), subsample count (subsample_count), and subsample size (subsample_size_1, ..., subsample_size_11) are set as shown in FIG.
- the value of the sample delta is set to “1”.
- the value of the subsample count is set to “11”.
- Subsample size 1 (subsample_size_1) indicates the size of the subsample including the GOB initializer, and its value is set to “25” (bytes).
- The size of each subsample is set as each subsample size from subsample size 2 onward (for example, x21, ..., x210, y21, ..., y210, z21, ..., z210).
- Decoding can be controlled in GOB units by setting samples as in (B-1) (or as in (B-2)). Thereby, an increase in load can be suppressed as compared with the case where a sample is set for each quantized sample.
- A sample including a GOB initializer is set for each GOB, and each sample is automatically set as a sync sample. Therefore, random access is possible at least for each GOB. To access a block in the middle of a GOB, decoding may proceed sequentially from block 1, and the output preceding the desired block may be discarded by output control.
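- The GOB-level random access described above can be sketched as follows, assuming one sample per GOB and 10 blocks per GOB as in the example. The function name and the zero-based block index are illustrative assumptions.

```python
from typing import Tuple

BLOCKS_PER_GOB = 10  # as in the 1-GOB example (10 blocks)


def locate(target_block_index: int,
           blocks_per_gob: int = BLOCKS_PER_GOB) -> Tuple[int, int]:
    """Return (1-based sample number to read, decoded blocks to discard).

    target_block_index is a zero-based index over all blocks of the stream.
    """
    gob_index, block_in_gob = divmod(target_block_index, blocks_per_gob)
    # Decoding starts at block 1 of the GOB, so the output of every block
    # before the target is decoded and then dropped by output control.
    return gob_index + 1, block_in_gob


sample_number, blocks_to_discard = locate(37)
```

- Here the block with index 37 lies in the fourth GOB, so sample 4 is read and the output of its first seven blocks is discarded.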
- the GOB initializer and blocks 1 to 10 can be extracted as subsamples.
- the initializer can be read at a higher speed, and random access in units of blocks becomes possible.
- a sample including all blocks in the GOB may be set.
- a total of two samples are set for the DSD lossless stream 171: a sample including a GOB initializer and a sample including all blocks in the GOB.
- the sample entry setting unit 122 further sets a sample including all the blocks in the GOB.
- a sample including a GOB initializer may be set as a sync sample.
- an ellipse 173 indicates a sync sample.
- The sync sample box setting unit 123 sets a sample including the GOB initializer as a sync sample. In this case, because the sync sample includes the GOB initializer but no block, the playback time of this sample is set to zero.
- The sample including the GOB initializer, as distinct from the sample including all blocks in the GOB in which subsamples are set as in (C-3), may be set as a sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- A sample including the GOB initializer and block 1 and a sample including blocks 2 to 10 may be set.
- a total of two samples, a sample including the GOB initializer and the block 1, and a sample including the blocks 2 to 10 are set for the DSD lossless stream 171.
- the sample entry setting unit 122 sets such two samples.
- a sample including the GOB initializer and block 1 may be set as a sync sample.
- the sync sample box setting unit 123 sets a sample including the GOB initializer and block 1 as a sync sample.
- the playback time of this sample is the playback time of block 1.
- In the sample including the GOB initializer and block 1, the GOB initializer portion and the block 1 portion may also each be set as a subsample. In other words, in this case, a total of 11 subsamples are set: a subsample including the GOB initializer, a subsample including block 1, a subsample including block 2, ..., and a subsample including block 10.
- the subsample information box setting unit 124 sets such a subsample.
- The sample including the GOB initializer and block 1, rather than the sample including blocks 2 to 10 in which subsamples are set as in (C-7), may be set as a sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- the sample including the GOB initializer and the block 1 in which the subsample is set as in (C-8) may be set as the sync sample.
- the sync sample box setting unit 123 sets the sync sample in this way.
- Parameters such as sample delta (sample_delta), subsample count (subsample_count), and subsample size (subsample_size_1, ..., subsample_size_10) in the subsample information box are set as shown in FIG.
- The value of the sample delta is set to “2”.
- Since 10 subsamples are set for the sample including blocks 1 to 10, the value of the subsample count is set to “10”.
- the size of each subsample is set as each subsample size (for example, x31, ..., x310, y31, ..., y310, z31, ..., z310).
- Parameters such as sample delta (sample_delta), subsample count (subsample_count), and subsample size (subsample_size_1, ..., subsample_size_9) in the subsample information box are set as shown in FIG.
- the value of the sample delta is set to “2”.
- the value of the subsample count is set to “9”.
- the size of each subsample is set as each subsample size (for example, x41, ..., x49, y41, ..., y49, z41, ..., z49).
- Parameters such as sample delta (sample_delta), subsample count (subsample_count), and subsample size (subsample_size_1, ..., subsample_size_9) in the subsample information box are set as shown in FIG.
- The value of the sample delta is set to “1”.
- The value of the subsample count is set to “2” or “9”.
- the size of each subsample is set as each subsample size (for example, 25, x41, y51, ..., y59, 25, z51).
- Decoding can be controlled in GOB units by setting samples as in (C-2). Thereby, an increase in load can be suppressed as compared with the case where a sample is set for each quantized sample.
- Since a sample including a GOB initializer is set for each GOB, random access is possible at least for each GOB. To access a block in the middle of a GOB, decoding may proceed sequentially from block 1, and the output preceding the desired block may be discarded by output control.
- Each of the blocks 1 to 10 can be extracted as a subsample, so that random access in units of blocks becomes possible.
- the DSD lossless stream can be stored in the MP4 file, and higher quality audio data can be transmitted.
- In step S101, the DSD generation unit 111 of the file generation apparatus 101 performs ΔΣ (delta-sigma) modulation on the audio analog signal to generate DSD data.
- In step S102, the DSD encoding unit 112 encodes the DSD data generated in step S101 using the new DSD lossless compression encoding method described above, and generates a DSD lossless stream.
- In step S103, the MP4 file generation device 131 (that is, the MP4 file generation unit 113 and the setting unit 114) executes the MP4 file generation process to generate an MP4 file that stores the DSD lossless stream generated in step S102. This MP4 file generation process will be described later.
- the MP4 file generation unit 113 provides the generated MP4 file to the distribution server 102 in step S104.
- the distribution data generation process ends.
- the sample table box setting unit 121 sets a sample table box in step S111.
- the sample entry setting unit 122 sets a sample entry.
- the sample entry setting unit 122 refers to the .afr file and sets a byte position (samplesize) to be divided into samples. That is, the sample entry setting unit 122 assigns each block to a sample based on the .afr file.
- As this allocation method, the sample entry setting unit 122 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- the sample entry setting unit 122 refers to the .esd file and stores the decoder configuration information, which is information necessary for decoding the GOB, in the sample entry. That is, the sample entry setting unit 122 sets a GOB initializer including decoder configuration information based on the .esd file and assigns it to the sample.
- As this allocation method, the sample entry setting unit 122 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- In step S115, the sync sample box setting unit 123 refers to the .afr file, creates a list of samples including the GOB initializer, and sets a sync sample box for storing the list.
- In step S116, when subsamples are used, the subsample information box setting unit 124 checks the GOB initializer range, the block boundaries, and so on in each sample, and sets the subsample information box based on that information. Various parameters such as sample delta, subsample count, and subsample size are set.
- In step S117, the setting unit 114 generates a file type compatibility box (ftyp).
- In step S118, the setting unit 114 generates a movie box according to the settings. That is, the setting unit 114 generates a movie box (moov) that stores the sample table box set as described above.
- In step S119, the setting unit 114 generates a media data box (mdat) and stores the DSD lossless stream.
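- Steps S117 to S119 can be sketched in outline as follows. This is only an illustration of the box layout (a 32-bit size, a four-character type, then the payload), not the processing of the setting unit 114. The `isom` brand, the empty movie box payload, and the helper names are assumptions; a real movie box would carry the track and sample table boxes set in the earlier steps.

```python
import struct
from typing import List


def box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a basic box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def build_file(stream: bytes, moov_payload: bytes = b"") -> bytes:
    # S117: file type box (major brand, minor version, one compatible brand).
    ftyp = box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom")
    # S118: movie box; the payload is a placeholder in this sketch.
    moov = box(b"moov", moov_payload)
    # S119: media data box holding the DSD lossless stream bytes.
    mdat = box(b"mdat", stream)
    return ftyp + moov + mdat


def top_level_types(data: bytes) -> List[str]:
    """Walk the top-level boxes and return their types in order."""
    types, pos = [], 0
    while pos < len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        types.append(box_type.decode("ascii"))
        pos += size
    return types


mp4 = build_file(b"\x00" * 36)  # 36 placeholder bytes instead of a real stream
```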
- a DSD lossless stream can be stored in an MP4 file, and higher-quality audio data can be transmitted using MPEG-DASH.
- the playback terminal 103 acquires an MP4 file from the distribution server 102, executes playback processing to play back the MP4 file, and outputs audio data. An example of the flow of this reproduction process will be described with reference to the flowchart of FIG.
- the MP4 file acquisition unit 141 of the reproduction terminal 103 acquires the MP4 file distributed from the distribution server 102 in step S131.
- In step S132, the MP4 file playback device 161 (for example, the DSD decoding unit 142, the output control unit 143, and the control unit 145) performs a decoding process: it extracts the DSD lossless stream from the MP4 file, decodes it, and starts outputting the obtained DSD data from the playback start time.
- step S133 the output unit 144 outputs the sound (audio analog signal) reproduced by the process in step S132.
- In step S141, the sample table box analysis unit 151 refers to the sample table box of the MP4 file acquired by the MP4 file acquisition unit 141, and specifies the chunk of the DSD lossless track corresponding to the playback start time, the sync sample, the byte position of the decoding start sample, and so on.
- In step S142, the sample entry analysis unit 154 refers to the sample entry.
- In step S143, the sample entry analysis unit 154 determines whether or not decoder configuration information exists. If it is determined that it does not exist, the process proceeds to step S144.
- In step S144, the sample entry analysis unit 154 reads the sample data indicated by the sync sample from the MP4 file and acquires the GOB initializer.
- When the GOB initializer is acquired, the process proceeds to step S145. If it is determined in step S143 that decoder configuration information exists, the process in step S144 is omitted, and the process proceeds to step S145.
- In step S145, the decoder configuration information setting unit 155 sets the decoder configuration information included in the MP4 file in the decoder.
- In step S146, the DSD decoding unit 142 reads data (the DSD lossless stream) from the start byte position of the decoding start sample of the MP4 file acquired by the MP4 file acquisition unit 141, based on the decoder configuration information.
- In step S147, the DSD decoding unit 142 starts decoding the read data.
- In step S148, the playback control unit 156 specifies the playback start time to the output control unit 143.
- The output control unit 143 starts outputting the DSD data obtained by the process of step S147 from the designated time.
- When the process of step S148 is completed, the decoding process ends, and the process returns to FIG.
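- The branch in steps S143 and S144 can be sketched as follows. The dictionary-based sample entry, the key name `decoder_configuration`, the function name, and the byte strings are all hypothetical stand-ins; the sketch only shows that the sync sample is read when the sample entry carries no decoder configuration information.

```python
from typing import Dict, List, Optional


def select_decoder_configuration(sample_entry: Dict[str, Optional[bytes]],
                                 samples: List[bytes],
                                 sync_sample_number: int) -> bytes:
    """Return the data that step S145 would set in the decoder."""
    # S142, S143: look for decoder configuration information in the sample entry.
    config = sample_entry.get("decoder_configuration")
    if config is None:
        # S144: read the sample indicated by the sync sample (1-based number)
        # and use the GOB initializer it holds.
        config = samples[sync_sample_number - 1]
    return config


# Hypothetical track: two GOB initializer samples among block samples.
samples = [b"INIT-A", b"blk1", b"blk2", b"INIT-B", b"blk3"]
from_sync_sample = select_decoder_configuration({}, samples, 4)
from_sample_entry = select_decoder_configuration(
    {"decoder_configuration": b"CFG"}, samples, 4)
```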
- the boundary of a fragment may be made to coincide with one of GOB boundaries.
- the top of the fragment is the GOB head.
- a plurality of GOBs may be stored in one fragment.
- <Second Embodiment>
- <Separation of parameter set and elementary stream>
- the GOB initializer (sample) necessary for decoding may be stored in a track different from the track (track) storing the block (sample).
- The GOB initializer, which is a parameter set required for decoding, is at the beginning of each GOB and changes over time. By configuring this parameter set portion as a separate track, the parameter set can be accessed and read more easily.
- a track is a sequence of samples (or chunks).
- The sample is the GOB initializer (header, config, and GOB data (code book)).
- the sample duration is 1 GOB playback time.
- The GOB of the DSD lossless stream has a structure as shown in FIG. That is, the DSD lossless stream 171 for 1 GOB has a GOB initializer and 10 blocks (Block 1 to Block 10).
- An example of an MP4 file storing such a DSD lossless stream 171 is shown in FIG.
- the media data box (mdat) of the MP4 file 181 stores the GOB initializer of the DSD lossless stream 171 and the data of each block.
- In the movie box (moov), two tracks are formed: a DSD lossless parameter set track and a DSD lossless elementary stream track.
- the DSD lossless parameter set track is a track for storing parameter set management information necessary for decoding the DSD lossless stream.
- the DSD lossless elementary stream track is a track for storing data management information of blocks of the DSD lossless stream.
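- The management information of the parameter set of the DSD lossless stream is stored in the sample entry (dsdp) formed in the sample description box (stsd) of the DSD lossless parameter set track. That is, information regarding each GOB initializer of the media data box (mdat) is stored in this sample entry (dsdp).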
- the management information of the elementary stream of the DSD lossless stream is stored in the sample entry (dsde) formed in the sample description box (stsd) of the DSD lossless elementary stream track. That is, information regarding each block of the media data box (mdat) is stored in the sample entry (dsde).
- any of the methods described above with reference to FIGS. 21 to 27 may be applied to assign samples in the DSD lossless elementary stream track. That is, in the example of B in FIG. 33, one sample is assigned to one block, but the way of assigning samples is not limited to this example.
- decoder configuration information (setting information necessary for decoding; set in the decoder before starting decoding) can be acquired at higher speed.
- header information can be acquired at high speed during random access. Therefore, it can be expected that the time until the start of reproduction is shortened and the stream switching speed is increased.
- the GOB initializer sample is a sample with a playback time of zero. By separating such a sample with a playback time of 0 as a separate track, it is possible to prevent a sample other than a sample indicating a block from being included in the track. That is, it is possible to prevent samples having different properties from being mixed in one track, and to manage information more easily.
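- The separation into two tracks can be illustrated by computing, for each track, the byte ranges of the media data that its samples refer to. This is a minimal sketch assuming one sample per block in the elementary stream track; the function name and all sizes other than the 25-byte initializer are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset within the media data, size in bytes)


def split_tracks(gobs: List[Tuple[int, List[int]]]) -> Tuple[List[Range], List[Range]]:
    """gobs lists (initializer size, block sizes) for each GOB in stream order."""
    offset = 0
    parameter_set_track: List[Range] = []  # one sample per GOB initializer
    elementary_track: List[Range] = []     # one sample per block
    for initializer_size, block_sizes in gobs:
        parameter_set_track.append((offset, initializer_size))
        offset += initializer_size
        for size in block_sizes:
            elementary_track.append((offset, size))
            offset += size
    return parameter_set_track, elementary_track


# Two shortened, hypothetical GOBs (a real GOB in the example has 10 blocks).
parameter_samples, block_samples = split_tracks([(25, [10, 20]), (25, [30])])
```

- Because every GOB initializer is reachable through its own track, a player can read the parameter sets without walking the block samples.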
- A sample of the DSD lossless elementary track may include the GOB initializer and block 1 (the first block). That is, in the DSD lossless elementary track, the GOB initializer may be added to the first block (block 1) of the GOB.
- The GOB of the DSD lossless stream has a structure as shown in FIG. That is, the DSD lossless stream 171 for 1 GOB has a GOB initializer and 10 blocks (Block 1 to Block 10).
- An example of an MP4 file for storing such a DSD lossless stream 171 is shown in FIG. 34B.
- the configuration example of the MP4 file 182 shown in FIG. 34B is basically the same as that of the MP4 file 181 shown in FIG. 33B.
- a GOB initializer is added to the first block (block 1) of GOB. Since the data in the media data box referred to by each track may be duplicated, such a sample allocation method is also possible. By using such an allocation method, it is possible to perform decoding only with the sample information of the DSD lossless elementary track.
- In step S161, the sample table box setting unit 121 sets a sample table box, and sets a parameter set track (DSD lossless parameter set track) and an elementary stream track (DSD lossless elementary stream track).
- the sample entry setting unit 122 sets a sample entry ('dsdp' and 'dsde') for each track.
- the sample entry setting unit 122 refers to the .afr file and sets a byte position (samplesize) to be divided into samples. That is, the sample entry setting unit 122 assigns each block to a sample of the DSD lossless elementary stream track based on the .afr file.
- As this allocation method, the sample entry setting unit 122 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- the sample entry setting unit 122 refers to the .esd file and stores the decoder configuration information in the sample entry ('dsdp') of the DSD lossless parameter set track. That is, the sample entry setting unit 122 assigns the GOB initializer including the decoder configuration information to the sample of the DSD lossless parameter set track based on the .esd file.
- As this allocation method, the sample entry setting unit 122 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- Each process from step S165 to step S169 is executed in the same manner as each process from step S115 to step S119 in FIG.
- the MP4 file generation process ends, and the process returns to FIG.
- the DSD lossless stream can be stored in the MP4 file, and higher-quality audio data can be transmitted using MPEG-DASH.
- In step S181, the sample table box analysis unit 151 refers to the sample table box of the DSD lossless elementary stream track of the MP4 file acquired by the MP4 file acquisition unit 141, and specifies the chunk of the DSD lossless track corresponding to the playback start time, the sync sample, the byte position of the decoding start sample, and so on.
- In step S182, the sample entry analysis unit 154 refers to the sample entry ('dsdp') of the DSD lossless parameter set track.
- In step S183, the sample entry analysis unit 154 determines whether or not decoder configuration information exists. If it is determined that it does not exist, the process proceeds to step S184. In this case, in step S184, the sample entry analysis unit 154 reads the sample data indicated by the sync sample from the sample entry ('dsdp') of the DSD lossless parameter set track, and acquires the GOB initializer. When the GOB initializer is acquired, the process proceeds to step S185. If it is determined in step S183 that decoder configuration information exists, the process in step S184 is omitted, and the process proceeds to step S185.
- Each process from step S185 to step S188 is executed in the same manner as each process from step S145 to step S148 in FIG.
- the decoding process ends, and the process returns to FIG.
- the DSD lossless parameter set track and the DSD lossless elementary stream track described above may be different files.
- a DSD lossless stream obtained by DSD lossless encoding of uncompressed DSD data cannot be decoded without a GOB initializer.
- the present technique may be applied to a DRM (Digital Rights Management) system using a DSD lossless stream as an encrypted stream, and using an MP4 file including a GOB initializer as decryption key information.
- For example, a method of managing playback by DRM can be considered.
- The GOB initializer and the blocks may be stored in different MP4 files, and the MP4 file storing the GOB initializer may be delivered separately as the decryption key information needed to decode the MP4 file storing the blocks.
- MP4 files containing only blocks may be shared and widely distributed. For example, copying may be permitted. However, since such an MP4 file does not include the GOB initializer, the content cannot be played back with that file alone.
- The protection scheme info box (sinf) is provided in the sample entry as shown in A of FIG.
- the protection scheme info box (sinf) is provided with boxes such as an original format box (Original Format Box) and a scheme type box (Scheme Type Box).
- FIG. 37B shows examples of syntaxes of the protection scheme info box, the original format box, and the scheme type box.
- the original format box stores information about the stream before encryption.
- uncompressed DSD data is regarded as this pre-encrypted stream.
- the value of the parameter original_format indicating the data format of the stream before encryption is set to 'dsd0' (uncompressed DSD data).
- the scheme type box stores information related to decryption of the encrypted stream.
- a DSD lossless stream obtained by encoding DSD data using a new DSD lossless encoding method is regarded as an encrypted stream.
- The value of the parameter scheme_type indicating the encryption method (encoding method) is set to 'dsde' (new DSD lossless compression method).
- link information to the license file is stored in the scheme type box.
- the specification of the license file is arbitrary, but for example, link information to a GOB initializer that is a decryption key is described.
- The user who acquired the MP4 file 201, in which only the blocks are stored, becomes a legitimate user who can play back the content by paying, for example, and acquires the license file 202 based on the information in the protection scheme info box (sinf) of the DSD lossless elementary stream track of the MP4 file 201.
- the MP4 file 203 in which the GOB initializer corresponding to the MP4 file 201 is stored is acquired.
- the user can play the MP4 file 201 and view the content.
- a DRM system can be constructed by applying this technology and storing the GOB initializer and block in different MP4 files.
- By expressing the correspondence between the MP4 file 201 and the MP4 file 203 via the license file 202, the correspondence can be updated more easily, for example. That is, management of the correspondence becomes easier.
- the file containing the GOB initializer is not limited to the MP4 file, but if it is an MP4 file, the time from the beginning of the file can be known from the information below the sample table box, and the player can associate the GOB initializer with the block.
- the management information of the DSD lossless stream may be stored in the audio sample entry (AudioSampleEntryV1).
- FIG. 39 shows an example of the syntax of the audio sample entry.
- an audio format identifier is set. For example, in the case of DSD data, an identifier “dsd1” indicating DSD data is set. In the channel count (channelcount), the number of channels is set. For example, when the DSD data is 2 channels, the value “2” is set to this parameter. In the sample size (samplesize), the bit depth of the audio data is set. For example, in the case of DSD data, since the bit depth of the DSD data is 1 bit, a value “1” is set in this parameter. The sample rate (samplerate) is set to a value “AC444.10000h” indicating a fixed value “44.1 kHz”. This value is a dummy value, and a correct value is set in an expanded box described later.
- Sampling rate is a parameter set in the sampling rate box (SamplingRateBox) which is an extension box.
- The correct value is set in this sampling rate (sampling_rate). For example, when the sampling frequency of the DSD data is 2.8 MHz, the value “00 2b 11 00h” (2822400 Hz) is set in this parameter. When the sampling frequency of the DSD data is 5.6 MHz, the value “00 56 22 00h” (5644800 Hz) is set in this parameter.
- When the sampling frequency of the DSD data is 11.2 MHz, the value “00 AC 44 00h” (11289600 Hz) is set in this parameter.
- The media time scale (media timescale) is set to the same value as this sampling rate (sampling_rate) or the sample rate (samplerate).
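The byte values quoted for the sample rate and the sampling rate box can be checked with a short sketch. It assumes that samplerate is a 16.16 fixed-point field, as is conventional for ISOBMFF audio sample entries, and that sampling_rate is a plain 32-bit big-endian integer. The function names are hypothetical and are not taken from the specification.

```python
import struct

def samplerate_field(hz: int) -> bytes:
    """Encode a rate as 16.16 fixed point (assumed layout of samplerate)."""
    return struct.pack(">I", hz << 16)

def sampling_rate_field(hz: int) -> bytes:
    """Encode a rate as a 32-bit big-endian integer (assumed layout of sampling_rate)."""
    return struct.pack(">I", hz)

# Nominal DSD rates named in the text and their exact values in Hz.
DSD_RATES_HZ = {"2.8 MHz": 2822400, "5.6 MHz": 5644800, "11.2 MHz": 11289600}

print("dummy samplerate:", samplerate_field(44100).hex(" "))
for label, hz in DSD_RATES_HZ.items():
    print(label, sampling_rate_field(hz).hex(" "))
```

Under these assumptions the dummy value and the three sampling_rate values come out as the byte sequences given in the text.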
- FIG. 40 is a block diagram illustrating a main configuration example of the file generation apparatus 101 in this case.
- the file generation apparatus 101 in this case includes an audio sample entry setting unit 211 in addition to the configuration described with reference to FIG.
- the audio sample entry setting unit 211 performs processing related to the setting of the audio sample entry.
- the sample table box setting unit 121 sets a sample table box in step S201.
- the audio sample entry setting unit 211 sets an audio sample entry.
- the audio sample entry setting unit 211 refers to the .afr file and sets a byte position (samplesize) to be divided into samples. That is, the audio sample entry setting unit 211 assigns each block to a sample based on the .afr file.
- As this allocation method, the audio sample entry setting unit 211 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- the audio sample entry setting unit 211 refers to the .esd file and stores the decoder configuration information, which is information necessary for decoding the GOB, in the audio sample entry. That is, the audio sample entry setting unit 211 sets a GOB initializer including decoder configuration information based on the .esd file and assigns it to the sample. As this allocation method, the audio sample entry setting unit 211 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- Each process from step S205 to step S209 is executed in the same manner as each process from step S115 to step S119 in FIG.
- the MP4 file generation process ends, and the process returns to FIG.
- the DSD lossless stream can be stored in the MP4 file, and information (management information) regarding the DSD lossless stream can be stored in the audio sample entry (AudioSampleEntryV1). Therefore, higher quality audio data can be transmitted using MPEG-DASH.
- FIG. 42 is a block diagram illustrating a main configuration example of the playback terminal 103 in this case.
- the playback terminal 103 in this case includes an audio sample entry analysis unit 221 in addition to the configuration described with reference to FIG.
- the audio sample entry analysis unit 221 performs processing related to the analysis of the audio sample entry.
- In step S231, the sample table box analysis unit 151 refers to the sample table box of the MP4 file acquired by the MP4 file acquisition unit 141 and specifies the chunk of the DSD lossless track corresponding to the playback start time, the sync sample, and the byte position of the decoding start sample.
- the audio sample entry analysis unit 221 refers to the audio sample entry (AudioSampleEntryV1).
- In step S233, the audio sample entry analysis unit 221 determines whether decoder configuration information exists. If it is determined that it does not exist, the process proceeds to step S234.
- In step S234, the audio sample entry analysis unit 221 reads the sample data indicated by the sync sample from the MP4 file and acquires the GOB initializer.
- The process then proceeds to step S235. If it is determined in step S233 that the decoder configuration information exists, the process in step S234 is omitted, and the process proceeds to step S235.
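The branch in steps S233 and S234 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary key `decoder_config` and the two callables are hypothetical stand-ins for the analysis unit's access to the audio sample entry and to the sync sample.

```python
from typing import Callable, Optional

def resolve_decoder_config(
    sample_entry: dict,
    read_sync_sample: Callable[[], bytes],
    parse_gob_initializer: Callable[[bytes], dict],
) -> dict:
    """Return decoder configuration, preferring the audio sample entry."""
    # Step S233: check whether the sample entry carries the configuration.
    config: Optional[dict] = sample_entry.get("decoder_config")
    if config is None:
        # Step S234: fall back to the GOB initializer stored in the sync sample.
        config = parse_gob_initializer(read_sync_sample())
    # The result is what would be handed to the decoder from step S235 onward.
    return config
```

When the sample entry already holds the configuration, the sync sample is never read, which is the saving that skipping step S234 provides.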
- Each process from step S235 to step S238 is executed in the same manner as each process from step S145 to step S148 in FIG.
- the decoding process ends, and the process returns to FIG.
- the DSD lossless stream that is stored and transmitted in the MP4 file can be decoded using the information (management information) related to the DSD lossless stream stored in the audio sample entry (AudioSampleEntryV1), and the audio data can be output. That is, higher quality audio data can be transmitted using MPEG-DASH.
- an extension box may be newly defined in the audio sample entry (AudioSampleEntryV1), and information (config information) unique to the DSD lossless encoding method may be stored in the extension box. This makes it possible to know the stream attributes (decoder configuration information) without accessing the media data box (mdat), and can be expected to speed up the playback process, for example during random access.
- this extension box may store basic decoder configuration information common to the streams, for example.
- format_version and DSD_lossless_gob_configuration may be read from the DSD lossless payload (DSD_lossless_payload ()) of the DSD lossless stream and stored in this extension box (DSDSpecificBox ()).
- an MP4 file 231 configured as shown in FIG. 45B is generated for the DSD lossless stream 171 configured as shown in FIG.
- an extension box ('dsc1') is provided in the DSD audio sample entry (DSDAudioSampleEntryV1) ('dsd1'), and only the basic parameters common to the stream are copied into this extension box ('dsc1').
- GOB-specific information may be stored in this extension box (DSDSpecificBox ()).
- An example of the syntax of the extension box (DSDSpecificBox ()) in this case is shown in FIG.
- DSD_lossless_gob_header () and DSD_lossless_gob_data () (codebook) may also be read and stored in this extension box (DSDSpecificBox ()).
- DSD_lossless_gob_header () and DSD_lossless_gob_data () are read from DSD_lossless_gob (). That is, these pieces of information are information included in the GOB initializer and include information specific to the GOB.
- an MP4 file 232 having a configuration as shown in FIG. 46B is generated for a DSD lossless stream 171 having a configuration as shown in FIG.
- a plurality of extension boxes ('dsc2') are provided in the DSD audio sample entry (DSDAudioSampleEntryV1) ('dsd1'), and each extension box ('dsc2') stores the information (decoder configuration information) necessary for decoding its corresponding GOB.
- the GOB corresponding to the extension box can be decoded. That is, GOB can be decoded without referring to the GOB initializer, and processing can be performed at higher speed.
- FIG. 47 is a block diagram showing a main configuration example of the file generation apparatus 101 in this case.
- the file generation apparatus 101 in this case includes a DSD audio sample entry setting unit 241 in addition to the configuration described with reference to FIG.
- the DSD audio sample entry setting unit 241 performs processing related to audio sample entry settings, extension box settings, and the like.
- the sample table box setting unit 121 sets a sample table box in step S251.
- the DSD audio sample entry setting unit 241 sets a DSD audio sample entry, and further sets an extension box (DSDSpecificBox) therein.
- the DSD audio sample entry setting unit 241 refers to the .afr file and sets a byte position (samplesize) to be divided into samples. That is, the DSD audio sample entry setting unit 241 assigns each block to a sample based on the .afr file. As this allocation method, the DSD audio sample entry setting unit 241 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- the DSD audio sample entry setting unit 241 refers to the .esd file and stores the decoder configuration information in the extension box (DSDSpecificBox) of the DSD audio sample entry. That is, the DSD audio sample entry setting unit 241 sets a GOB initializer including decoder configuration information based on the .esd file and assigns it to the sample. As this allocation method, the DSD audio sample entry setting unit 241 applies, for example, one of the above-described methods (A-1) to (A-6), (B-1) to (B-6), and (C-1) to (C-10).
- Each process from step S255 to step S259 is executed in the same manner as each process from step S115 to step S119 in FIG.
- the MP4 file generation process ends, and the process returns to FIG.
- the DSD lossless stream can be stored in the MP4 file, and information (management information) about the DSD lossless stream can be stored in the extension box (DSDSpecificBox) of the DSD audio sample entry (DSDAudioSampleEntryV1). Therefore, higher quality audio data can be transmitted using MPEG-DASH.
- FIG. 49 is a block diagram illustrating a main configuration example of the playback terminal 103 in this case.
- the playback terminal 103 in this case includes a DSD audio sample entry analysis unit 251 in addition to the configuration described with reference to FIG.
- the DSD audio sample entry analysis unit 251 performs processing relating to analysis of audio sample entries, analysis of expansion boxes, and the like.
- In step S271, the sample table box analysis unit 151 refers to the sample table box of the MP4 file acquired by the MP4 file acquisition unit 141 and specifies the chunk of the DSD lossless track corresponding to the playback start time, the sync sample, and the byte position of the decoding start sample.
- the DSD audio sample entry analysis unit 251 refers to the DSD audio sample entry (DSDAudioSampleEntryV1) and further refers to the extension box (DSDSpecificBox).
- In step S273, the DSD audio sample entry analysis unit 251 sets the decoder configuration information stored in the extension box (DSDSpecificBox) in the decoder.
- Each process from step S274 to step S276 is executed in the same manner as each process from step S146 to step S148 in FIG.
- the decoding process ends, and the process returns to FIG.
- the decoder configuration information stored in the extension box (DSDSpecificBox) of the DSD audio sample entry (DSDAudioSampleEntryV1) is used to decode the GOB and output the audio data.
- the system, device, processing unit, etc. to which this technology is applied can be used in any field, such as traffic, medical care, crime prevention, agriculture, livestock industry, mining, beauty, factory, home appliance, weather, and nature monitoring.
- the present technology can also be applied to a system or device that transmits an image used for viewing.
- the present technology can be applied to a system or a device that is used for transportation.
- the present technology can also be applied to a system or device used for security.
- the present technology can be applied to a system or a device provided for sports.
- the present technology can also be applied to a system or a device provided for agriculture.
- the present technology can also be applied to a system or device used for livestock industry.
- the present technology can also be applied to systems and devices that monitor natural conditions such as volcanoes, forests, and oceans.
- the present technology can be applied to, for example, a weather observation system or a weather observation apparatus that observes weather, temperature, humidity, wind speed, sunshine duration, and the like.
- the present technology can also be applied to systems and devices for observing the ecology of wildlife such as birds, fish, reptiles, amphibians, mammals, insects, and plants.
- <Computer> The series of processes described above can be executed by hardware or can be executed by software.
- a program constituting the software is installed in the computer.
- the computer here includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer that can execute various functions by installing various programs.
- FIG. 51 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
- a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to one another via a bus 904.
- An input / output interface 910 is also connected to the bus 904.
- An input unit 911, an output unit 912, a storage unit 913, a communication unit 914, and a drive 915 are connected to the input / output interface 910.
- the input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like.
- the output unit 912 includes, for example, a display, a speaker, an output terminal, and the like.
- the storage unit 913 includes, for example, a hard disk, a RAM disk, a nonvolatile memory, and the like.
- the communication unit 914 includes a network interface, for example.
- the drive 915 drives a removable medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- the CPU 901 loads the program stored in the storage unit 913 into the RAM 903 via the input / output interface 910 and the bus 904 and executes it, whereby the above-described series of processing is performed.
- the RAM 903 also appropriately stores data necessary for the CPU 901 to execute various processes.
- the program executed by the computer 900 can be recorded and applied to, for example, a removable medium 921 as a package medium or the like.
- the program can be installed in the storage unit 913 via the input / output interface 910 by attaching the removable medium 921 to the drive 915.
- This program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. In that case, the program can be received by the communication unit 914 and installed in the storage unit 913.
- this program can be installed in advance in the ROM 902, the storage unit 913, or the like.
- <Others> In this specification, an example in which various types of information are multiplexed with encoded data (bitstream) and transmitted from the encoding side to the decoding side has been described. However, the method of transmitting such information is not limited to this example. For example, these pieces of information may be transmitted or recorded as separate data associated with the encoded data without being multiplexed with the encoded data.
- the term “associate” means that, for example, an image (which may be a part of an image such as a slice or a block) included in encoded data and information corresponding to the image can be linked at the time of decoding.
- the information associated with the encoded data (image) may be transmitted on a different transmission path from the encoded data (image).
- the information associated with the encoded data (image) may be recorded on a recording medium different from the encoded data (image) (or another recording area of the same recording medium).
- the image and information corresponding to the image may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part of the frame.
- the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing. Accordingly, a plurality of devices housed in separate housings and connected via a network, and a single device housing a plurality of modules in one housing, are both systems.
- the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units).
- the configurations described above as a plurality of devices (or processing units) may be combined into a single device (or processing unit).
- a configuration other than that described above may be added to the configuration of each device (or each processing unit).
- a part of the configuration of a certain device (or processing unit) may be included in the configuration of another device (or other processing unit).
- the present technology can take a configuration of cloud computing in which one function is shared and processed by a plurality of devices via a network.
- the above-described program can be executed in an arbitrary device.
- the device may have necessary functions (functional blocks and the like) so that necessary information can be obtained.
- each step described in the above flowchart can be executed by one device or can be executed by a plurality of devices. Further, when a plurality of processes are included in one step, the plurality of processes included in the one step can be executed by being shared by a plurality of apparatuses in addition to being executed by one apparatus.
- in the program executed by the computer, the processing of the steps describing the program may be executed in time series in the order described in this specification, or may be executed in parallel or individually at a necessary timing, such as when a call is made. That is, as long as no contradiction occurs, the processing of each step may be executed in an order different from the order described above. Furthermore, the processing of the steps describing this program may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.
- An information processing apparatus comprising: a sample setting unit configured to set a sample including initialization information used for decoding the group of blocks as a sample which is a minimum access unit in the file.
- the method further includes a sync sample setting unit configured to set a sample including the initialization information, which is set by the sample setting unit, as a sync sample including information necessary for starting decoding.
- the sample setting unit sets a sample including the initialization information and a head block of the group, and a sample for each block of the other block of the group.
- (1) to (3) The information processing apparatus according to any one of the above.
- a sync sample setting unit configured to set a sample including the initialization information and the first block of the group set by the sample setting unit to a sync sample including information necessary for starting decoding.
- the information processing apparatus according to any one of (1) to (4).
- a subsample including the initialization information and a subsample including the first block of the group are set in the sample including the initialization information and the first block of the group set by the sample setting unit.
- the information processing apparatus according to any one of (1) to (5), further including a subsample setting unit.
- a sync sample setting unit configured to set a sample including the initialization information and the first block of the group set by the sample setting unit as a sync sample including information necessary for starting decoding.
- the information processing apparatus according to any one of (1) to (6).
- the information processing apparatus according to any one of (1) to (8), further including a subsample setting unit configured to set the subsample.
- a sync sample setting unit configured to set the sample including the initialization information set by the sample setting unit to a sync sample including information necessary for starting decoding;
- the information processing apparatus according to any one of (1) to (11), wherein the sample setting unit is further configured to set a sample including all blocks of the group.
- the sample setting unit is configured to set a sample including the initialization information and a head block of the group, and a sample including all other blocks of the group,
- a sync sample setting unit configured to set a sample including the initialization information and the first block of the group set by the sample setting unit to a sync sample including information necessary for starting decoding;
- the information processing apparatus according to any one of (13).
- the method further includes a subsample setting unit configured to set a subsample for each block in the sample set by the sample setting unit and including all the other blocks of the group.
- the sub-sample setting unit further includes a sample including the initialization information and a head block of the group set by the sample setting unit, a sub-sample including the initialization information, and a head of the group
- the information processing apparatus according to any one of (1) to (15).
- the sample setting unit further includes a sample including the initialization information, a sample including the first block of the group, a sample for each block of the other block of the group, and a sample including the initialization information.
- the information processing apparatus according to any one of (1) to (17), wherein the track is set to a track different from the set track.
- the sample setting unit further sets the sample for each block to a file different from a file in which the sample including the initialization information is set.
- the sample setting unit further sets a protection scheme infobox that stores information related to DRM (Digital Rights Management).
- the information processing apparatus according to any one of (1) to (20), further including an audio sample entry setting unit that sets information regarding the audio data in an audio sample entry.
- the information processing device according to any one of (1) to (25), wherein the audio sample entry setting unit sets the same value as the sampling_rate of the sampling rate box in the field sampling_rate.
- the information processing apparatus according to any one of (1) to (26), further including an extension box setting unit that sets information related to the audio data in an extension box of an audio sample entry.
- the information processing apparatus according to any one of (1) to (27), wherein the extension box setting unit stores information specific to the encoding method of the encoded data in the extension box.
- the information processing apparatus according to any one of (1) to (28), wherein the extension box setting unit stores DSD_lossless_gob_configuration in the extension box.
- the information processing apparatus according to any one of (1) to (29), wherein the extension box setting unit further stores DSD_lossless_gob_header and DSD_lossless_gob_data in the extension box.
- the initialization information includes decoder configuration information used for decoding the encoded data.
- the initialization information includes information related to a fragment boundary.
- the audio data is DSD (Direct Stream Digital) data, and the encoded data is obtained by lossless encoding of the DSD data.
- the information processing apparatus according to any one of (1) to (33), wherein the file format is a file format compliant with ISO/IEC 14496.
- (35) The information processing apparatus according to any one of (1) to (34), further including a file generation unit that generates the file based on settings of the sample setting unit.
- (36) The information processing apparatus according to any one of (1) to (35), further including an encoding unit that performs lossless encoding of the audio data to generate the encoded data.
- (37) The information processing apparatus according to any one of (1) to (36), further including an audio data generation unit that generates the audio data.
- a file of a predetermined file format for storing the encoded data which is encoded data of audio data and has a structure in which blocks that are access units of the encoded data are grouped by a predetermined number
- An information processing apparatus comprising: a decoding unit that decodes the encoded data using the decoder configuration information set by the setting unit.
- (40) An information processing method including: analyzing a sample that is a minimum access unit in a file of a predetermined file format storing encoded data of audio data and that includes initialization information used for decoding the group of blocks; obtaining, based on the analysis result, decoder configuration information used for decoding the encoded data; setting the obtained decoder configuration information; and decoding the encoded data using the set decoder configuration information.
- 100 distribution system, 101 file generation device, 102 distribution server, 103 playback terminal, 104 network, 111 DSD generation unit, 112 DSD encoding unit, 113 MP4 file generation unit, 114 setting unit, 121 sample table box setting unit, 122 sample entry setting unit, 123 sync sample box setting unit, 124 subsample information box setting unit, 131 MP4 file generation device, 132 MP4 file generation device, 141 MP4 file acquisition unit, 142 DSD decoding unit, 143 output control unit, 144 output unit, 145 control unit, 151 sample table box analysis unit, 152 subsample information box analysis unit, 153 sync sample analysis unit, 154 sample entry analysis unit, 155 decoder configuration information setting unit, 156 playback control unit, 171 DSD lossless stream, 181 and 182 MP4 file, 201 MP4 file, 202 license file, 203 MP4 file, 211 audio sample entry setting unit, 221 audio sample entry analysis unit, 231 and 232 MP4 files, 241 DSD audio sample entry setting
Abstract
Description
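1. Conversion of a DSD lossless stream into an MP4 file
2. First embodiment (distribution system: association between samples and blocks)
3. Second embodiment (distribution system: separation of parameter set and elementary stream)
4. Third embodiment (distribution system: setting of the audio sample entry)
5. Fourth embodiment (distribution system: setting of the extension box)
6. Others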
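<Distribution of video and audio>
In recent years, streaming distribution via the Internet has been expected as a means of delivering video and music to consumers. However, the Internet as a transmission medium is less stable than broadcasting or optical discs. First, the maximum rate of the transmission band varies greatly depending on the user's environment. Furthermore, even for the same user, a constant transmission band is not always secured, and the band fluctuates over time. A fluctuating transmission band also means that the response time to requests from the client is not constant.
An example of data transmission using MPEG-DASH is shown in FIG. 1. In the information processing system 1 of FIG. 1, the file generation device 2 generates video data and audio data as moving image content, encodes them, and converts them into files in a file format for transmission. For example, the file generation device 2 converts these data into files at intervals of about 10 seconds (segments them). The file generation device 2 uploads the generated segment files to the Web server 3. The file generation device 2 also generates an MPD file (management file) for managing the moving image content and uploads it to the Web server 3.
Meanwhile, video and music data have been increasing in quality, and distribution of higher-quality data is demanded accordingly. For example, DSD (Direct Stream Digital) is known as a high-quality modulation method for audio signals (FIG. 2). As shown in FIG. 2, in the case of PCM (Pulse Code Modulation), the signal value of the analog audio signal at each sampling time is converted into digital data of a fixed number of bits, whereas in the case of DSD, the analog audio signal is ΔΣ-modulated and converted into 1-bit digital data.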
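To make the contrast with PCM concrete, the following is a minimal sketch of a first-order ΔΣ modulator that turns a sequence of values in the range -1 to 1 into a 1-bit stream. The first-order structure is an assumption chosen for brevity; the text does not state the modulator order used for DSD, and practical modulators are of higher order.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: floats in [-1, 1] -> list of 0/1 bits."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback           # accumulate input minus previous output
        bit = 1 if integrator >= 0.0 else 0  # 1-bit quantizer
        feedback = 1.0 if bit else -1.0      # output level fed back on the next step
        bits.append(bit)
    return bits

# A constant input of 0.5 yields a bit pattern whose density of ones encodes the level.
bits = delta_sigma_1bit([0.5] * 1000)
print(sum(bits) / len(bits))
```

The signal level is carried by the density of ones rather than by a multi-bit code word, which is why each output sample needs only one bit.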
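For example, as a lossless compression encoding scheme for DSD data, there is DST (Direct Stream Transfer), which was developed for SACD (Super Audio Compact Disc) and is standardized in MPEG4 AAC (Advanced Audio Coding) (ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) 14496-3). However, DST imposes too heavy a processing load and is not suited to software processing.
Therefore, a new DSD lossless compression encoding scheme has been developed that uses a technique different from DST and can be realized even by software processing on an embedded processor. Using a DSD lossless stream generated by this new DSD lossless compression encoding scheme for distribution makes it possible to reduce the band required for transmission, and real-time decoding by software processing on clients such as PCs and mobile terminals can be expected.
A main configuration example of a compression encoding device supporting this new DSD lossless compression encoding scheme is shown in FIG. 4. The compression encoding device 10 shown in FIG. 4 converts an analog audio signal into a digital signal by ΣΔ (sigma-delta) modulation, compression-encodes the converted audio signal, and outputs it. That is, the compression encoding device 10 modulates the audio signal by the DSD method to digitize it, encodes the resulting digital data (DSD data) with the above-described new DSD lossless compression encoding scheme, and generates a DSD lossless stream.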
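Next, a method by which the control unit 14 creates the data generation count table pretable will be described.
...D4[n-3],D4[n-2],D4[n-1],D4[n],D4[n+1],D4[n+2],D4[n+3],...
Here, D4[n] represents 4 consecutive bits of data and is hereinafter also referred to as D4 data (n>3).
Next, a method by which the control unit 14 creates the conversion table table1 will be described.
Next, a compression encoding method using the conversion table table1, performed by the encoding unit 15, will be described. For example, consider the DSD data supplied from the input buffer 13
...D4[n-3],D4[n-2],D4[n-1],D4[n],D4[n+1],D4[n+2],D4[n+3],...
and the case where the encoding unit 15 encodes D4[n] from this data.
FIG. 7 is a diagram showing a configuration example of the encoding unit 15 that performs the above-described compression encoding.
The compression encoding process performed by the compression encoding device 10 will be described with reference to the flowchart of FIG. 8.
FIG. 9 shows a main configuration example of a decoding device supporting the above-described new DSD lossless compression encoding scheme. The decoding device 70 of FIG. 9 receives the audio signal compression-encoded and transmitted by the compression encoding device 10 of FIG. 4 and decompresses it (lossless decoding).
The decoding method of the decoding unit 74 will be described. The compressed data that has been compression-encoded and transmitted by the compression encoding device 10 is expressed in units of 2 bits as follows, and the case of decoding E2[n] will be described.
...E2[n-3],E2[n-2],E2[n-1],E2[n],E2[n+1],E2[n+2],E2[n+3],...
Here, E2[n] represents 2 consecutive bits of data and is also referred to as E2 data.
The decoding process of the decoding device 70 will be further described with reference to the flowchart of FIG. 10.
In the above-described new DSD lossless compression encoding scheme, DSD data is divided into blocks of fixed length per channel (4096x32=131072 bits) and compressed. After compression, a header is attached to the compressed data of 10 consecutive blocks to form a GOB (Group of Blocks). The unit obtained by adding configuration information (configuration) to the head of the GOB is the DSD lossless payload (DSD_lossless_payload()). The information required to decompress the blocks (code book; reference table) is stored in the GOB header (GOB header) and the GOB data (GOB data). Taking stream switching with AAC into account, the time length of a block (Block (audio frame)) is set to be about the same as that of AAC.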
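The block and GOB structure described above can be sketched as follows. This is only an illustration of the grouping: the actual scheme compresses each block before forming a GOB and attaches a header, which this sketch omits. The function names are hypothetical.

```python
BLOCK_BITS_PER_CHANNEL = 4096 * 32  # 131072 bits per channel, per the text
BLOCKS_PER_GOB = 10                 # blocks grouped into one GOB, per the text

def split_into_blocks(channel_data: bytes) -> list:
    """Cut one channel's DSD data into fixed-length blocks (the last may be short)."""
    block_bytes = BLOCK_BITS_PER_CHANNEL // 8
    return [channel_data[i:i + block_bytes]
            for i in range(0, len(channel_data), block_bytes)]

def group_into_gobs(blocks: list) -> list:
    """Group consecutive blocks into GOBs of BLOCKS_PER_GOB blocks each."""
    return [blocks[i:i + BLOCKS_PER_GOB]
            for i in range(0, len(blocks), BLOCKS_PER_GOB)]

# 25 blocks' worth of data yields two full GOBs and one partial GOB.
blocks = split_into_blocks(bytes(16384 * 25))
print([len(gob) for gob in group_into_gobs(blocks)])
```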
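An example of the syntax of the DSD lossless payload is shown in A of FIG. 12. As shown in A of FIG. 12, the DSD lossless payload (DSD_lossless_payload()) stores, for example, format version, DSD_lossless_gob_configuration(), DSD_lossless_gob(number_of_audio_data), and the like. This format version corresponds to the format version (format version) of FIG. 11. DSD_lossless_gob_configuration() corresponds to the GOB config (GOB config) of FIG. 11. DSD_lossless_gob() corresponds to the GOB of FIG. 11.
An example of how a DSD lossless stream of the above-described new DSD lossless compression encoding scheme is decoded will be described. In the DSD lossless stream, as described above, data for a predetermined time is managed collectively as a GOB. That is, as shown in A of FIG. 13, the DSD lossless stream has a configuration in which a GOB initializer and a predetermined number of blocks (for example, 10 blocks) follow one another. Since the GOB initializer has a playback time of 0, treating it as an access unit on its own would complicate the management of playback time. Therefore, the GOB initializer is attached to block 1, which is the head block of the GOB, and the GOB initializer and block 1 together form one access unit.
In MPEG-DASH, the use of files in the ISOBMFF format specified in ISO/IEC 14496-12 was considered as a means of sending high-quality video and music to users. For example, it was considered to use a file (hereinafter also referred to as an MP4 file) of the file format specified in MPEG-4 Part 14 (ISO/IEC 14496-14:2003) (hereinafter also referred to as MP4), which is a derivative of the ISOBMFF format.
A main configuration example of an MP4 file is shown in A of FIG. 15. An MP4 file has a hierarchical structure of units called boxes. For example, as shown in A of FIG. 15, an MP4 file has a file type compatibility box (File Type Compatibility Box (ftyp)), a movie box (Movie Box (moov)), and a media data box (Media Data Box (mdat)). The file type compatibility box (ftyp) indicates the head of the file and stores information identifying the type of the file format. The movie box (moov) stores content metadata and the like. The media data box (mdat) stores the actual AV data (actual data).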
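The top-level box layout can be walked with a few lines of code. The sketch below assumes the usual ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"); the function name is hypothetical.

```python
import struct

def iter_top_level_boxes(data: bytes):
    """Yield (type, payload_offset, payload_size) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError("corrupt box size")
        yield box_type.decode("latin-1"), offset + header, size - header
        offset += size

# A toy file with an ftyp, an empty moov, and a small mdat.
toy = (struct.pack(">I4s", 12, b"ftyp") + b"mp42"
       + struct.pack(">I4s", 8, b"moov")
       + struct.pack(">I4s", 11, b"mdat") + b"abc")
print(list(iter_top_level_boxes(toy)))
```

A player would descend from moov to the sample table box in the same way, since nested boxes use the same header layout.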
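A sample is the minimum access unit in the MP4 file format. A main configuration example of the sample table box is shown in B of FIG. 15. As shown in B of FIG. 15, the sample table box (stbl) has, for example, a sample description box (Sample Description Box), a time-to-sample box (Time To Sample Box), a sample size box (Sample Size Box), a sample-to-chunk box (Sample To Chunk Box), a chunk offset box (Chunk Offset Box), a sync sample box (Sync Sample Box), and a subsample information box (Subsample Information Box).
As described above, converting a DSD lossless stream, obtained by encoding high-quality DSD data with the new DSD lossless compression encoding scheme, into an MP4 file and streaming it using MPEG-DASH would enable distribution of higher-quality data. However, no method for storing this DSD lossless stream in an MP4 file had been considered. For example, if, following the general rule, one audio sample (one quantized sample) of the elementary stream is assigned to one MP4 sample of the MP4 system layer, the number of MP4 samples becomes enormous. In the case of 2.8 MHz DSD data, for example, 2.8 million MP4 samples would be created per second. In a system that performs processing for each MP4 sample, this is extremely high-load and inefficient, and is difficult to realize. Consequently, a DSD lossless stream could not be distributed by MPEG-DASH, and higher-quality audio data could not be transmitted.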
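The scale of the problem can be checked with simple arithmetic. The sketch below compares the naive mapping with mappings based on the 131072-bit block and the 10-block GOB described earlier, taking 2822400 Hz as the exact rate behind the nominal 2.8 MHz. The variable names are illustrative.

```python
DSD_RATE_HZ = 2_822_400             # exact rate for nominal 2.8 MHz DSD
BLOCK_BITS_PER_CHANNEL = 4096 * 32  # 131072 one-bit samples per block
BLOCKS_PER_GOB = 10

# Naive mapping: one 1-bit audio sample per MP4 sample.
naive_samples_per_second = DSD_RATE_HZ

# Block-based mapping: one block per MP4 sample.
block_duration_s = BLOCK_BITS_PER_CHANNEL / DSD_RATE_HZ
blocks_per_second = DSD_RATE_HZ / BLOCK_BITS_PER_CHANNEL

# GOB-based mapping: one GOB per MP4 sample.
gobs_per_second = blocks_per_second / BLOCKS_PER_GOB

print(naive_samples_per_second, round(block_duration_s * 1000, 2),
      round(blocks_per_second, 2), round(gobs_per_second, 2))
```

A block lasts roughly 46 ms, so a block-based mapping needs on the order of 20 MP4 samples per second instead of millions.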
<配信システム>
以下に、本技術のより詳細について説明する。なお、以下においては、本技術に関連する音声データ(オーディオデータ)の配信について説明する。図18は、本技術を適用した情報処理システムの一態様である配信システムの構成の一例を示すブロック図である。図18に示される配信システム100は、画像や音声などのデータ(コンテンツ)を配信するシステムである。配信システム100において、ファイル生成装置101、配信サーバ102、および再生端末103は、ネットワーク104を介して互いに通信可能に接続されている。
図19は、ファイル生成装置101の主な構成例を示すブロック図である。図19に示されるように、ファイル生成装置101は、DSD生成部111、DSD符号化部112、MP4ファイル生成部113、および設定部114を有する。
図20は、再生端末103の主な構成例を示すブロック図である。図20に示されるように、再生端末103は、MP4ファイル取得部141、DSD復号部142、出力制御部143、出力部144、および制御部145を有する。
上述のように、高品質なDSDデータを新たなDSD可逆圧縮符号化方式で符号化したDSDロスレスストリームをMP4ファイル化に格納する方法はまだ考えられていなかった。例えば、MP4ファイルのサンプルにどのようなデータを割り当てるかが定められていなかった。
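Each block of the DSD lossless stream may be assigned to a different sample. That is, one block may be assigned to one sample of the MP4 file. For example, as shown in FIG. 21, samples may be set as indicated by the double-headed arrows 172 for a DSD lossless stream 171 of one GOB (10 blocks). Each double-headed arrow 172 indicates the range of a sample.
Alternatively, all the blocks of a GOB of the DSD lossless stream may be assigned to a single sample. That is, one GOB may be assigned to one sample of the MP4 file. For example, as shown in FIG. 23, samples may be set as indicated by the double-headed arrow 172 for a DSD lossless stream 171 of one GOB (10 blocks).
For a GOB of the DSD lossless stream, two samples may also be set: a sample that contains the GOB initializer and a sample that does not contain the GOB initializer (a sample containing only blocks). That is, either a GOB initializer or a group of blocks may be assigned to one sample of the MP4 file. For example, as shown in FIG. 25, samples may be set as indicated by the double-headed arrows 172 for a DSD lossless stream 171 of one GOB (10 blocks).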
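The three sample layouts described above can be sketched as follows. This is an illustrative model only: the `Gob` type, the function names, and the representation of a sample as a `(payload, is_sync)` pair are assumptions made for the example. Where the GOB initializer goes in the one-block-per-sample layout, and which samples are marked as sync samples, follow the enumerated configurations (4), (5), (8), (11), and (12) later in this document rather than the figures, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[bytes, bool]  # (payload, is_sync_sample)


@dataclass
class Gob:
    """Hypothetical model of one GOB: an initializer followed by its blocks."""
    initializer: bytes
    blocks: List[bytes]


def one_block_per_sample(gob: Gob) -> List[Sample]:
    # The initializer travels with the first block; that sample is the sync sample.
    first = (gob.initializer + gob.blocks[0], True)
    rest = [(block, False) for block in gob.blocks[1:]]
    return [first] + rest


def one_gob_per_sample(gob: Gob) -> List[Sample]:
    # Every sample starts with an initializer, so every sample is a sync sample.
    return [(gob.initializer + b"".join(gob.blocks), True)]


def initializer_and_block_group(gob: Gob) -> List[Sample]:
    # One sample for the initializer (sync) and one for the blocks only.
    return [(gob.initializer, True), (b"".join(gob.blocks), False)]


# A GOB of 10 one-byte blocks "0".."9" with a one-byte initializer "I".
demo_gob = Gob(initializer=b"I", blocks=[bytes([48 + i]) for i in range(10)])
layout_a = one_block_per_sample(demo_gob)
layout_b = one_gob_per_sample(demo_gob)
layout_c = initializer_and_block_group(demo_gob)
```

The trade-off is granularity against bookkeeping: the first layout gives the finest random-access unit at the cost of more samples, while the second and third keep the sample count per GOB at one or two and rely on subsamples for per-block access.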
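Next, the processing executed in each device of the distribution system 100 is described. First, an example of the flow of the distribution data generation process executed in the file generation device 101 is described with reference to the flowchart of FIG. 28. The file generation device 101 performs this distribution data generation process when generating an MP4 file of audio data.
Next, an example of the flow of the MP4 file generation process executed in step S103 of this distribution data generation process is described with reference to the flowchart of FIG. 29.
Next, playback (decoding) of the MP4 file is described. The following describes the case of random playback as shown in FIG. 30. That is, the decoder configuration information stored in a sync sample is read and, based on that decoder configuration information, playback is started from a block partway through a sample in the same GOB as that sync sample. In this case, as shown in FIG. 30, the playback start time is specified at a block partway through the sample (that is, in the middle of the sample), and the decoding start time is specified at the beginning of that sample.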
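The random-playback case above can be sketched as a lookup over the sample table. The function below is a minimal illustration under stated assumptions: sample start times are given as a sorted list in arbitrary integer time units, sync flags come from something like the Sync Sample Box, and the function name and return structure are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def plan_random_access(sample_starts: List[int], is_sync: List[bool],
                       target_time: int) -> Dict[str, int]:
    """Locate the sync sample to read the configuration from and the decode start."""
    if not sample_starts or target_time < sample_starts[0]:
        raise ValueError("target time is before the first sample")
    # Sample that contains the requested playback start time.
    target_index = bisect_right(sample_starts, target_time) - 1
    # Walk back to the nearest sync sample, which carries the decoder configuration.
    sync_index = target_index
    while sync_index >= 0 and not is_sync[sync_index]:
        sync_index -= 1
    if sync_index < 0:
        raise ValueError("no sync sample at or before the target")
    return {
        "sync_sample": sync_index,
        "target_sample": target_index,
        "decode_start": sample_starts[target_index],  # beginning of that sample
        "playback_start": target_time,                # a block partway through it
    }


# Hypothetical table: two GOBs, each an initializer sample followed by a block sample.
demo_plan = plan_random_access([0, 10, 20, 30], [True, False, True, False], 34)
```

Decoding starts at the beginning of the sample because the blocks inside a sample are reached through the sample, while output is discarded until the requested block is reached.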
Next, an example of the flow of the decoding process executed in step S132 of the playback process is described with reference to the flowcharts of FIG. 32 and FIG. 33.
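<Separation of parameter set and elementary stream>
The GOB initializer (its samples) required for decoding may be stored in a track different from the track that stores the blocks (their samples).
In this case as well, the distribution data generation process can be performed in basically the same flow as described above with reference to the flowchart of FIG. 28. An example of the flow of the MP4 file generation process in this case is described with reference to the flowchart of FIG. 35.
In this case as well, the playback process can be performed in basically the same flow as described above with reference to the flowchart of FIG. 31. An example of the flow of the decoding process in this case is described with reference to the flowchart of FIG. 36.
The DSD lossless parameter set track and the DSD lossless elementary stream track described above may also be placed in mutually different files.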
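The separation into a parameter set track and an elementary stream track can be sketched as below. The data model is an assumption made for illustration: each GOB is a pair of an initializer and its list of blocks, and each parameter-set entry records the index of the first block sample it applies to so that a reader can pair the two tracks. How the tracks actually reference each other is not described in this section.

```python
from typing import List, Tuple

GobData = Tuple[bytes, List[bytes]]  # (initializer, blocks)


def split_into_tracks(gobs: List[GobData]) -> Tuple[List[Tuple[bytes, int]], List[bytes]]:
    """Split GOBs into a parameter set track and an elementary stream track."""
    parameter_set_track: List[Tuple[bytes, int]] = []  # (initializer, first block index)
    elementary_stream_track: List[bytes] = []          # one sample per block
    for initializer, blocks in gobs:
        parameter_set_track.append((initializer, len(elementary_stream_track)))
        elementary_stream_track.extend(blocks)
    return parameter_set_track, elementary_stream_track


# Hypothetical input: two GOBs with two blocks and one block respectively.
demo_params, demo_stream = split_into_tracks([(b"I0", [b"a", b"b"]), (b"I1", [b"c"])])
```

Writing the two returned lists to separate files instead of separate tracks corresponds to the variation mentioned in the last sentence above.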
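<Setting of audio sample entry>
Management information of the DSD lossless stream may be stored in an audio sample entry (AudioSampleEntryV1). FIG. 39 shows an example of the syntax of the audio sample entry. When the audio sample entry is used, each parameter is set as follows.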
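The parameter values can be collected into a small data structure. The sketch below takes its values from the enumerated configurations (22) to (25) later in this document (channelcount 2, samplesize 1, samplerate indicating 44.1 kHz, and a codingname that identifies the format). The four-character code `"dsdl"` is a hypothetical placeholder, and the 16.16 fixed-point encoding of samplerate is an assumption based on how audio sample entries in the ISO base media file format commonly store that field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSampleEntryFields:
    """Illustrative subset of audio sample entry fields for a DSD lossless track."""
    codingname: str           # format identifier; the actual value is not given here
    channelcount: int = 2     # value "2" per configuration (23)
    samplesize: int = 1       # value "1" per configuration (24)
    samplerate_hz: int = 44_100  # "44.1 kHz" per configuration (25)

    def samplerate_fixed_16_16(self) -> int:
        # Assumption: the field is stored as a 16.16 fixed-point number.
        return self.samplerate_hz << 16


# "dsdl" is a hypothetical four-character code used only for this example.
demo_entry = AudioSampleEntryFields(codingname="dsdl")
```

The actual DSD sampling rate does not fit this 16-bit integer part, which is presumably why configuration (26) refers to a separate sampling_rate value in the Sampling Rate Box.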
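FIG. 40 is a block diagram showing a main configuration example of the file generation device 101 in this case. As shown in FIG. 40, the file generation device 101 in this case has an audio sample entry setting unit 211 in addition to the configuration described with reference to FIG. 19. The audio sample entry setting unit 211 performs processing related to setting the audio sample entry.
In this case as well, the distribution data generation process is performed in basically the same manner as described with reference to the flowchart of FIG. 28.
FIG. 42 is a block diagram showing a main configuration example of the playback terminal 103 in this case. As shown in FIG. 42, the playback terminal 103 in this case has an audio sample entry analysis unit 221 in addition to the configuration described with reference to FIG. 20. The audio sample entry analysis unit 221 performs processing related to analyzing the audio sample entry.
In this case as well, the playback process is performed in basically the same manner as described with reference to the flowchart of FIG. 31.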
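<Setting of extension box>
An extension box may be newly defined in the audio sample entry (AudioSampleEntryV1), and information specific to the DSD lossless encoding method (config information) may be stored in that extension box. This makes it possible to learn the attributes of the stream (decoder configuration information) without accessing the contents of the Media Data Box (mdat), so faster playback processing can be expected, for example at random access.
An example of the syntax of this extension box (DSDSpecificBox()) is shown in B of FIG. 44. As shown in B of FIG. 44, this extension box (DSDSpecificBox()) may store, for example, basic decoder configuration information that is common within the stream. For example, format_version and DSD_lossless_gob_configuration() may be read from the DSD lossless payload (DSD_lossless_payload()) of the DSD lossless stream and stored in this extension box (DSDSpecificBox()).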
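The idea of carrying format_version and DSD_lossless_gob_configuration() in an extension box can be sketched as a simple serializer and parser. Everything about the byte layout here is an assumption for illustration: the four-character code `b"dsdC"`, the 8-bit width of format_version, and the treatment of the GOB configuration as an opaque byte string are hypothetical, since the actual syntax is the one in FIG. 44, which is not reproduced in this text.

```python
import struct
from typing import Tuple

ASSUMED_FOURCC = b"dsdC"  # hypothetical box type, not taken from the patent


def build_dsd_specific_box(format_version: int, gob_configuration: bytes) -> bytes:
    """Serialize a box holding format_version and an opaque GOB configuration."""
    # Assumed layout: 1-byte format_version followed by the configuration bytes.
    payload = struct.pack(">B", format_version) + gob_configuration
    return struct.pack(">I4s", 8 + len(payload), ASSUMED_FOURCC) + payload


def parse_dsd_specific_box(box: bytes) -> Tuple[int, bytes]:
    """Return (format_version, gob_configuration) from a box built above."""
    size, fourcc = struct.unpack_from(">I4s", box, 0)
    if fourcc != ASSUMED_FOURCC or size != len(box):
        raise ValueError("not a well-formed box of the assumed type")
    (format_version,) = struct.unpack_from(">B", box, 8)
    return format_version, box[9:size]


demo_box = build_dsd_specific_box(1, b"\xaa\xbb")
demo_version, demo_config = parse_dsd_specific_box(demo_box)
```

Because such a box sits inside the sample entry in moov, a player can obtain the decoder configuration from the metadata alone, which is the speed-up at random access that the text describes.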
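GOB-specific information may also be stored in this extension box (DSDSpecificBox()). An example of the syntax of the extension box (DSDSpecificBox()) in this case is shown in A of FIG. 46. As shown in A of FIG. 46, for example, DSD_lossless_gob_header() and DSD_lossless_gob_data() (codebook) may also be read and stored in this extension box (DSDSpecificBox()). As described above, DSD_lossless_gob_header() and DSD_lossless_gob_data() are read from DSD_lossless_gob(). That is, these pieces of information are contained in the GOB initializer and include information specific to that GOB.
FIG. 47 is a block diagram showing a main configuration example of the file generation device 101 in this case. As shown in FIG. 47, the file generation device 101 in this case has a DSD audio sample entry setting unit 241 in addition to the configuration described with reference to FIG. 19. The DSD audio sample entry setting unit 241 performs processing related to setting the audio sample entry, setting the extension box, and the like.
In this case as well, the distribution data generation process is performed in basically the same manner as described with reference to the flowchart of FIG. 28.
FIG. 49 is a block diagram showing a main configuration example of the playback terminal 103 in this case. As shown in FIG. 49, the playback terminal 103 in this case has a DSD audio sample entry analysis unit 251 in addition to the configuration described with reference to FIG. 20. The DSD audio sample entry analysis unit 251 performs processing related to analyzing the audio sample entry, analyzing the extension box, and the like.
In this case as well, the playback process is performed in basically the same manner as described with reference to the flowchart of FIG. 31.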
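<Standards>
The above describes the case where the DSD lossless stream is stored in an MP4 file and distributed using MPEG-DASH, but the present technology can also be applied to other cases. For example, the present technology can be applied to arbitrary data other than a DSD lossless stream. The present technology can also be applied when the data is stored in an arbitrary file format other than the MP4 file. Furthermore, the present technology can be applied to data distribution under any standard other than MPEG-DASH.
Systems, devices, processing units, and the like to which the present technology is applied can be used in any field, for example, transportation, medical care, crime prevention, agriculture, livestock farming, mining, beauty care, factories, home appliances, meteorology, and nature monitoring.
The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed in a computer. Here, the computer includes a computer built into dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions when various programs are installed.
In this specification, examples were described in which various kinds of information are multiplexed into the encoded data (bitstream) and transmitted from the encoding side to the decoding side, but the method of transmitting such information is not limited to these examples. For example, the information may be transmitted or recorded as separate data associated with the encoded data without being multiplexed into the encoded data. Here, the term "associate" means, for example, making it possible to link an image contained in the encoded data (which may be part of an image, such as a slice or a block) with the information corresponding to that image at the time of decoding. That is, the information associated with the encoded data (image) may be transmitted on a transmission path different from that of the encoded data (image). The information associated with the encoded data (image) may also be recorded on a recording medium different from that of the encoded data (image) (or in a different recording area of the same recording medium). Furthermore, the image and the information corresponding to it may be associated with each other in arbitrary units, such as a plurality of frames, one frame, or part of a frame.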
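(1) An information processing apparatus including a sample setting unit that sets, in a file of a predetermined file format that stores encoded data of audio data, the encoded data having a structure in which blocks serving as access units of the encoded data are grouped every predetermined number, a sample containing initialization information used for decoding a group of the blocks, as a sample that is the minimum access unit in the file.
(2) The information processing apparatus according to (1), in which the sample setting unit further sets a sample for each of the blocks.
(3) The information processing apparatus according to (1) or (2), further including a sync sample setting unit that sets the sample containing the initialization information, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
(4) The information processing apparatus according to any one of (1) to (3), in which the sample setting unit sets a sample containing the initialization information and the first block of the group, and a per-block sample for each of the other blocks of the group.
(5) The information processing apparatus according to any one of (1) to (4), further including a sync sample setting unit that sets the sample containing the initialization information and the first block of the group, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
(6) The information processing apparatus according to any one of (1) to (5), further including a subsample setting unit that sets, in the sample containing the initialization information and the first block of the group set by the sample setting unit, a subsample containing the initialization information and a subsample containing the first block of the group.
(7) The information processing apparatus according to any one of (1) to (6), further including a sync sample setting unit that sets the sample containing the initialization information and the first block of the group, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
(8) The information processing apparatus according to any one of (1) to (7), in which the sample setting unit sets a sample containing the initialization information and all the blocks of the group.
(9) The information processing apparatus according to any one of (1) to (8), further including a subsample setting unit that sets, in the sample containing the initialization information and all the blocks of the group set by the sample setting unit, a subsample containing the initialization information and the first block of the group, and a subsample for each of the blocks of the group.
(10) The information processing apparatus according to any one of (1) to (9), further including a subsample setting unit that sets, in the sample containing the initialization information and all the blocks of the group set by the sample setting unit, a subsample containing the initialization information, a subsample containing the first block of the group, and a subsample for each of the blocks of the group.
(11) The information processing apparatus according to any one of (1) to (10), in which all the samples set by the sample setting unit are sync samples containing information necessary for starting decoding.
(12) The information processing apparatus according to any one of (1) to (11), further including a sync sample setting unit that sets the sample containing the initialization information, set by the sample setting unit, as a sync sample containing information necessary for starting decoding, in which the sample setting unit is further configured to set a sample containing all the blocks of the group.
(13) The information processing apparatus according to any one of (1) to (12), further including a subsample setting unit that sets a subsample for each of the blocks in the sample containing all the blocks of the group set by the sample setting unit.
(14) The information processing apparatus according to any one of (1) to (13), in which the sample setting unit is configured to set a sample containing the initialization information and the first block of the group, and a sample containing all the other blocks of the group, the apparatus further including a sync sample setting unit that sets the sample containing the initialization information and the first block of the group, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
(15) The information processing apparatus according to any one of (1) to (14), further including a subsample setting unit that sets a subsample for each of the blocks in the sample containing all the other blocks of the group set by the sample setting unit.
(16) The information processing apparatus according to any one of (1) to (15), in which the subsample setting unit further sets, in the sample containing the initialization information and the first block of the group set by the sample setting unit, a subsample containing the initialization information and a subsample containing the first block of the group.
(17) The information processing apparatus according to any one of (1) to (16), in which the sample setting unit further sets the per-block samples in a track different from the track in which the sample containing the initialization information is set.
(18) The information processing apparatus according to any one of (1) to (17), in which the sample setting unit further sets a sample containing the initialization information and the first block of the group, and per-block samples for the other blocks of the group, in a track different from the track in which the sample containing the initialization information is set.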
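(19) The information processing apparatus according to any one of (1) to (18), in which the sample setting unit further sets the per-block samples in a file different from the file in which the sample containing the initialization information is set.
(20) The information processing apparatus according to any one of (1) to (19), in which the sample setting unit further sets a Protection Scheme Info Box that stores information on DRM (Digital Rights Management).
(21) The information processing apparatus according to any one of (1) to (20), further including an audio sample entry setting unit that sets information on the audio data in an audio sample entry.
(22) The information processing apparatus according to any one of (1) to (21), in which the audio sample entry setting unit sets, in the field codingname, a predetermined value indicating the format of the audio data.
(23) The information processing apparatus according to any one of (1) to (22), in which the audio sample entry setting unit sets the value "2" in the field channelcount.
(24) The information processing apparatus according to any one of (1) to (23), in which the audio sample entry setting unit sets the value "1" in the field samplesize.
(25) The information processing apparatus according to any one of (1) to (24), in which the audio sample entry setting unit sets a value indicating "44.1 kHz" in the field samplerate.
(26) The information processing apparatus according to any one of (1) to (25), in which the audio sample entry setting unit sets, in the field sampling_rate, the same value as sampling_rate of the Sampling Rate Box.
(27) The information processing apparatus according to any one of (1) to (26), further including an extension box setting unit that sets information on the audio data in an extension box of the audio sample entry.
(28) The information processing apparatus according to any one of (1) to (27), in which the extension box setting unit causes the extension box to store information specific to the encoding method of the encoded data.
(29) The information processing apparatus according to any one of (1) to (28), in which the extension box setting unit causes the extension box to store DSD_lossless_gob_configuration.
(30) The information processing apparatus according to any one of (1) to (29), in which the extension box setting unit causes the extension box to further store DSD_lossless_gob_header and DSD_lossless_gob_data.
(31) The information processing apparatus according to any one of (1) to (30), in which the initialization information includes decoder configuration information used for decoding the encoded data.
(32) The information processing apparatus according to any one of (1) to (31), in which the initialization information includes information on fragment boundaries.
(33) The information processing apparatus according to any one of (1) to (32), in which the audio data is DSD (Direct Stream Digital) data, and the encoded data is obtained by losslessly encoding the DSD data.
(34) The information processing apparatus according to any one of (1) to (33), in which the file format is a file format conforming to ISO/IEC 14496.
(35) The information processing apparatus according to any one of (1) to (34), further including a file generation unit that generates the file on the basis of the settings made by the sample setting unit.
(36) The information processing apparatus according to any one of (1) to (35), further including an encoding unit that losslessly encodes the audio data to generate the encoded data.
(37) The information processing apparatus according to any one of (1) to (36), further including an audio data generation unit that generates the audio data.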
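(38) An information processing method including setting, in a file of a predetermined file format that stores encoded data of audio data, the encoded data having a structure in which blocks serving as access units of the encoded data are grouped every predetermined number, a sample containing initialization information used for decoding a group of the blocks, as a sample that is the minimum access unit in the file.
(39) An information processing apparatus including: a sample analysis unit that analyzes a sample that is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and that contains initialization information used for decoding a group of blocks, and acquires, on the basis of the analysis result, decoder configuration information used for decoding the encoded data; a setting unit that sets the decoder configuration information acquired by the sample analysis unit; and a decoding unit that decodes the encoded data using the decoder configuration information set by the setting unit.
(40) An information processing method including: analyzing a sample that is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and that contains initialization information used for decoding a group of blocks; acquiring, on the basis of the analysis result, decoder configuration information used for decoding the encoded data; setting the acquired decoder configuration information; and decoding the encoded data using the set decoder configuration information.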
Claims (20)
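- An information processing apparatus comprising a sample setting unit that sets, in a file of a predetermined file format that stores encoded data of audio data, the encoded data having a structure in which blocks serving as access units of the encoded data are grouped every predetermined number, a sample containing initialization information used for decoding a group of the blocks, as a sample that is the minimum access unit in the file.
- The information processing apparatus according to claim 1, wherein the sample setting unit is configured to set a sample containing the initialization information and the first block of the group, and a per-block sample for each of the other blocks of the group, the apparatus further comprising a sync sample setting unit that sets the sample containing the initialization information and the first block of the group, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
- The information processing apparatus according to claim 2, further comprising a subsample setting unit that sets, in the sample containing the initialization information and the first block of the group set by the sample setting unit, a subsample containing the initialization information and a subsample containing the first block of the group.
- The information processing apparatus according to claim 1, wherein the sample setting unit sets a sample containing the initialization information and all the blocks of the group.
- The information processing apparatus according to claim 4, further comprising a subsample setting unit that sets, in the sample containing the initialization information and all the blocks of the group set by the sample setting unit, a subsample containing the initialization information and the first block of the group, and a subsample for each of the blocks of the group.
- The information processing apparatus according to claim 4, further comprising a subsample setting unit that sets, in the sample containing the initialization information and all the blocks of the group set by the sample setting unit, a subsample containing the initialization information, a subsample containing the first block of the group, and a subsample for each of the blocks of the group.
- The information processing apparatus according to claim 4, wherein all the samples set by the sample setting unit are sync samples containing information necessary for starting decoding.
- The information processing apparatus according to claim 1, further comprising a sync sample setting unit that sets the sample containing the initialization information, set by the sample setting unit, as a sync sample containing information necessary for starting decoding, wherein the sample setting unit is further configured to set a sample containing all the blocks of the group.
- The information processing apparatus according to claim 8, further comprising a subsample setting unit that sets a subsample for each of the blocks in the sample containing all the blocks of the group set by the sample setting unit.
- The information processing apparatus according to claim 1, wherein the sample setting unit is configured to set a sample containing the initialization information and the first block of the group, and a sample containing all the other blocks of the group, the apparatus further comprising a sync sample setting unit that sets the sample containing the initialization information and the first block of the group, set by the sample setting unit, as a sync sample containing information necessary for starting decoding.
- The information processing apparatus according to claim 10, further comprising a subsample setting unit that sets a subsample for each of the blocks in the sample containing all the other blocks of the group set by the sample setting unit.
- The information processing apparatus according to claim 11, wherein the subsample setting unit further sets, in the sample containing the initialization information and the first block of the group set by the sample setting unit, a subsample containing the initialization information and a subsample containing the first block of the group.
- The information processing apparatus according to claim 1, wherein the sample setting unit further sets the per-block samples in a track different from the track in which the sample containing the initialization information is set.
- The information processing apparatus according to claim 1, wherein the sample setting unit further sets the per-block samples in a file different from the file in which the sample containing the initialization information is set.
- The information processing apparatus according to claim 1, further comprising an extension box setting unit that sets information on the audio data in an extension box of an audio sample entry.
- The information processing apparatus according to claim 1, wherein the audio data is DSD (Direct Stream Digital) data, and the encoded data is obtained by losslessly encoding the DSD data.
- The information processing apparatus according to claim 1, wherein the file format is a file format conforming to ISO/IEC 14496.
- An information processing method comprising setting, in a file of a predetermined file format that stores encoded data of audio data, the encoded data having a structure in which blocks serving as access units of the encoded data are grouped every predetermined number, a sample containing initialization information used for decoding a group of the blocks, as a sample that is the minimum access unit in the file.
- An information processing apparatus comprising: a sample analysis unit that analyzes a sample that is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and that contains initialization information used for decoding a group of blocks, and acquires, on the basis of the analysis result, decoder configuration information used for decoding the encoded data; a setting unit that sets the decoder configuration information acquired by the sample analysis unit; and a decoding unit that decodes the encoded data using the decoder configuration information set by the setting unit.
- An information processing method comprising: analyzing a sample that is the minimum access unit in a file of a predetermined file format storing encoded data of audio data and that contains initialization information used for decoding a group of blocks; acquiring, on the basis of the analysis result, decoder configuration information used for decoding the encoded data; setting the acquired decoder configuration information; and decoding the encoded data using the set decoder configuration information.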
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/088,234 US20200411021A1 (en) | 2016-03-31 | 2017-03-17 | Information processing apparatus and information processing method |
CN201780019337.9A CN108885874A (zh) | 2016-03-31 | 2017-03-17 | 信息处理装置和方法 |
JP2018509039A JP6876928B2 (ja) | 2016-03-31 | 2017-03-17 | 情報処理装置および方法 |
EP17774431.5A EP3438976A4 (en) | 2016-03-31 | 2017-03-17 | INFORMATION PROCESSING DEVICE AND METHOD |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016-072171 | 2016-03-31 | ||
JP2016072171 | 2016-03-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017169890A1 true WO2017169890A1 (ja) | 2017-10-05 |
Family
ID=59964307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/010871 WO2017169890A1 (ja) | 2016-03-31 | 2017-03-17 | 情報処理装置および方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20200411021A1 (ja) |
EP (1) | EP3438976A4 (ja) |
JP (1) | JP6876928B2 (ja) |
CN (1) | CN108885874A (ja) |
WO (1) | WO2017169890A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110611639A (zh) * | 2018-06-14 | 2019-12-24 | 视联动力信息技术股份有限公司 | 流媒体会议的音频数据处理方法和装置 |
WO2020142364A1 (en) * | 2019-01-04 | 2020-07-09 | Tencent America LLC | Flexible interoperability and capability signaling using initialization hierarchy |
US11184665B2 (en) * | 2018-10-03 | 2021-11-23 | Qualcomm Incorporated | Initialization set for network streaming of media data |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017203976A1 (ja) * | 2016-05-24 | 2017-11-30 | ソニー株式会社 | 圧縮符号化装置及び方法、復号装置及び方法、並びにプログラム |
KR102471492B1 (ko) * | 2017-12-27 | 2022-11-28 | 삼성전자 주식회사 | 디스플레이장치 및 그 제어방법 |
EP4131961A4 (en) * | 2020-04-13 | 2023-09-13 | LG Electronics, Inc. | DEVICE FOR TRANSMITTING POINT CLOUD DATA, METHOD FOR TRANSMITTING POINT CLOUD DATA, DEVICE FOR RECEIVING POINT CLOUD DATA AND METHOD FOR RECEIVING POINT CLOUD DATA |
TWI779772B (zh) * | 2021-08-13 | 2022-10-01 | 瑞昱半導體股份有限公司 | 訊號處理方法以及訊號處理器 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001344905A (ja) * | 2000-05-26 | 2001-12-14 | Fujitsu Ltd | データ再生装置、その方法及び記録媒体 |
JP2002196794A (ja) * | 2000-12-25 | 2002-07-12 | Olympus Optical Co Ltd | 音声記録再生装置 |
WO2003032296A1 (fr) * | 2001-10-03 | 2003-04-17 | Sony Corporation | Appareil et procede de codage, appareil et procede de decodage et appareil et procede d'enregistrement de support d'enregistrement |
JP2009544054A (ja) * | 2006-07-18 | 2009-12-10 | トムソン ライセンシング | 非可逆符号化信号のビットストリーム・データ及び上記信号の可逆拡張符号化データのオーディオ・ビットストリーム・データ構造配置 |
JP2014016625A (ja) * | 2008-01-04 | 2014-01-30 | Dolby International Ab | オーディオコーディングシステム、オーディオデコーダ、オーディオコーディング方法及びオーディオデコーディング方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2131590A1 (en) * | 2008-06-02 | 2009-12-09 | Deutsche Thomson OHG | Method and apparatus for generating or cutting or changing a frame based bit stream format file including at least one header section, and a corresponding data structure |
2017
- 2017-03-17 CN CN201780019337.9A patent/CN108885874A/zh not_active Withdrawn
- 2017-03-17 JP JP2018509039A patent/JP6876928B2/ja active Active
- 2017-03-17 WO PCT/JP2017/010871 patent/WO2017169890A1/ja active Application Filing
- 2017-03-17 EP EP17774431.5A patent/EP3438976A4/en not_active Withdrawn
- 2017-03-17 US US16/088,234 patent/US20200411021A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
ERWIN JANSSEN ET AL.: "DSD compression for recent ultra high quality 1-bit coders", AUDIO ENGINEERING SOCIETY, May 2005 (2005-05-01), XP055521872 * |
See also references of EP3438976A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110611639A (zh) * | 2018-06-14 | 2019-12-24 | 视联动力信息技术股份有限公司 | 流媒体会议的音频数据处理方法和装置 |
US11184665B2 (en) * | 2018-10-03 | 2021-11-23 | Qualcomm Incorporated | Initialization set for network streaming of media data |
WO2020142364A1 (en) * | 2019-01-04 | 2020-07-09 | Tencent America LLC | Flexible interoperability and capability signaling using initialization hierarchy |
US11546402B2 (en) | 2019-01-04 | 2023-01-03 | Tencent America LLC | Flexible interoperability and capability signaling using initialization hierarchy |
US11770433B2 (en) | 2019-01-04 | 2023-09-26 | Tencent America LLC | Flexible interoperability and capability signaling using initialization hierarchy |
Also Published As
Publication number | Publication date |
---|---|
JP6876928B2 (ja) | 2021-05-26 |
JPWO2017169890A1 (ja) | 2019-02-14 |
EP3438976A1 (en) | 2019-02-06 |
US20200411021A1 (en) | 2020-12-31 |
EP3438976A4 (en) | 2019-04-24 |
CN108885874A (zh) | 2018-11-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | Wipo information: entry into national phase | Ref document number: 2018509039; Country of ref document: JP |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | Wipo information: entry into national phase | Ref document number: 2017774431; Country of ref document: EP |
 | ENP | Entry into the national phase | Ref document number: 2017774431; Country of ref document: EP; Effective date: 20181031 |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17774431; Country of ref document: EP; Kind code of ref document: A1 |