WO2017073034A1 - Moving image encoding apparatus, moving image encoding method, and program recording medium - Google Patents
- Publication number
- WO2017073034A1 (international application PCT/JP2016/004631)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processing
- unit
- quantization
- transform
- list
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/122—Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/45—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder performing compensation of the inverse transform mismatch, e.g. Inverse Discrete Cosine Transform [IDCT] mismatch
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/88—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks
Definitions
- the present invention relates to a moving image encoding apparatus, a moving image encoding method, and a program recording medium.
- the present invention relates to a moving image encoding apparatus, a moving image encoding method, and a program recording medium that can execute data-dependent processing in parallel without reducing the efficiency of parallel processing.
- MPEG: Moving Picture Experts Group
- ISO: International Organization for Standardization
- H.264: a moving picture coding standard
- H.265: HEVC (High Efficiency Video Coding)
- The H.264 and H.265 encoding methods consist of three parts. The first is a prediction method that reduces inter-frame or intra-frame redundancy. The second is a transform and quantization method that reduces the spatial redundancy of the prediction residual by converting its spatial components into frequency components. The third is an entropy coding method that assigns variable-length codes according to how frequently data values occur. H.264 and H.265 encoding schemes are also called hybrid encoding schemes.
- H.265 achieves twice the encoding efficiency of H.264. In exchange for this higher efficiency, H.265 greatly increases the amount of computation required for encoding.
- H.265 encoding is performed in units of coding blocks (CU: Coding Unit).
- Prediction is performed in units of prediction blocks (PU: Prediction Unit).
- Transformation is performed in units of transform blocks (TU: Transform Unit).
- H.264 has two TU patterns: 4x4 and 8x8.
- H.265 has four TU patterns: 4x4, 8x8, 16x16, and 32x32. This is two more than H.264.
- Here, "4x4" and similar notations denote the TU size.
- 4x4 means a TU that is 4 pixels high and 4 pixels wide. Because more TU patterns can be processed, encoding based on the H.265 standard mixes TUs of various sizes within the screen being processed.
- Patent Document 1 describes an image encoding apparatus that selects a mode and a quantization parameter that are optimal for encoding efficiency.
- Non-Patent Document 1 describes the contents of processing based on the H.265 standard.
- An example of the configuration of a video encoding device (encoder) based on the H.265 standard is shown in FIG. 23.
- FIG. 23 is a block diagram illustrating a configuration example of a moving image encoding device based on the H.265 standard.
- The moving image encoding device shown in FIG. 23 includes the following components: an intra prediction unit 1000, an inter prediction unit 2000, a conversion processing unit 3000, an entropy coding unit 4000, a subtractor 5000, an adder 6000, a multiplexer 7000, and a multiplexer 8000.
- the intra prediction unit 1000 is a prediction processing unit having a function of performing a prediction process for reducing the redundancy in the frame with respect to the spatial component of the input image.
- the inter prediction unit 2000 is a prediction processing unit having a function of performing a prediction process for reducing the redundancy between frames regarding the spatial component of the input image.
- the intra prediction unit 1000 and the inter prediction unit 2000 output the prediction image generated by the prediction process.
- the conversion processing unit 3000 performs a conversion process of converting a spatial component of a residual image, which is a difference between an input image and a predicted image, into a frequency component.
- the conversion processing unit 3000 outputs the conversion coefficient generated by the conversion process.
- The conversion processing unit 3000 also converts the transform coefficients back into pixel information. This pixel information is used by the inter prediction unit 2000, which relies on the image of the previous frame.
- An adder 6000 obtains a reconstructed image by adding the inversely transformed pixel information and the predicted image. The obtained reconstructed image is input to the inter prediction unit 2000, as shown in FIG. 23.
- the entropy coding unit 4000 is a coding processing unit that has a function of scanning a transform coefficient, variable-length-coding the transform coefficient based on the appearance probability of data, and outputting a bit stream.
- The conversion processing unit 3000 converts frequency component information into an easily encodable format. This information is then input to the entropy coding unit 4000 as a bit stream.
- the entropy encoding unit 4000 encodes the input bit stream based on the appearance probability of “0” or “1”.
- CBF: Coded Block Flag
- HM: HEVC Test Model
- FIG. 24 is a block diagram illustrating a configuration example of the conversion processing unit 3000 illustrated in FIG. 23.
- The conversion processing unit 3000 shown in FIG. 24 includes a transform / quantization unit 3100 and an inverse transform / inverse quantization unit 3200.
- the transform / quantization unit 3100 converts the spatial component of the input residual image into a frequency component, and generates a transform coefficient corresponding to the transform result. Next, the transform / quantization unit 3100 quantizes the transform coefficient, and inputs the quantized transform coefficient to the inverse transform / inverse quantization unit 3200.
- the inverse transform / inverse quantization unit 3200 reconstructs an image based on the input transform coefficient so that the once encoded image is used in the inter prediction process for the next frame.
- the inverse transform / inverse quantization unit 3200 performs inverse quantization on the quantized transform coefficient that is a frequency component input from the transform / quantization unit 3100.
- the inverse transform / inverse quantization unit 3200 inversely transforms the inversely quantized transform coefficient into a spatial component.
- Integer DCT: integer-precision Discrete Cosine Transform
- Integer DST: integer-precision Discrete Sine Transform
- Conversion processing is executed for each TU.
- The H.265 standard defines integer-precision orthogonal transforms for both DCT and DST. The result of the conversion process is therefore a matrix product of the pixel values in the TU and a transform matrix defined for each TU size (hereinafter "TU size"). Because this product is taken over the whole TU, the conversion process depends on the relationships between pixels within each row or column. The specific transform formulas are described in Non-Patent Document 1.
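The matrix-product formulation above can be illustrated with a minimal Python sketch. The patent itself contains no code. The sketch applies the H.265 4x4 integer DCT matrix as Y = C * X * C^T to a flat 4x4 block. The standard's intermediate rounding shifts are deliberately omitted, so the magnitudes are unnormalized. Each output value first mixes a whole column and then a whole row, which is the row/column dependence described above.

```python
# H.265 4x4 integer DCT matrix (each row is one basis vector).
# The standard's intermediate rounding/shift stages are omitted here.
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def matmul(a, b):
    """Integer matrix product a * b, with matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(block):
    # Column pass (C * X): each value combines all 4 pixels of one column.
    tmp = matmul(C4, block)
    # Row pass (tmp * C^T): each value combines all 4 entries of one row.
    return matmul(tmp, transpose(C4))


flat = [[1] * 4 for _ in range(4)]
print(forward_transform_4x4(flat))
# [[65536, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

A flat block yields only a DC coefficient. No output value can be computed without reading an entire row or column of the TU.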
- Quantization processing is executed based on the input quantization parameter.
- the quantization process does not depend on the relationship between pixels.
- the inverse transform process is an inverse process of the transform process, and the inverse quantization process is an inverse process of the quantization process.
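By contrast, quantization and inverse quantization act on each coefficient independently. The following Python sketch uses a simplified floating-point model with the commonly cited H.265 step size Qstep = 2^((QP - 4) / 6). This is an assumption: the standard's integer scaling tables and shifts are not reproduced. The function names are hypothetical.

```python
def qstep(qp):
    """Simplified H.265-style quantization step size (floating point)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    step = qstep(qp)

    def q(c):
        # Round half away from zero; each coefficient is handled on its own,
        # with no dependence on neighbouring coefficients.
        level = int(abs(c) / step + 0.5)
        return -level if c < 0 else level

    return [[q(c) for c in row] for row in coeffs]


def dequantize(levels, qp):
    step = qstep(qp)
    return [[lvl * step for lvl in row] for row in levels]


levels = quantize([[10, -7], [3, 0]], qp=10)  # Qstep = 2.0
print(levels)                  # [[5, -4], [2, 0]]
print(dequantize(levels, 10))  # [[10.0, -8.0], [4.0, 0.0]]
```

Because every coefficient is processed element-wise, this stage parallelizes trivially. The row/column-coupled transform is what makes parallel scheduling harder.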
- Signal processing such as video coding involves a large amount of computation.
- Moving image coding can be executed with a high degree of parallelism, so it is expected to run at high speed through parallel processing.
- One form of such parallel processing uses a many-core architecture such as a GPU (Graphics Processing Unit).
- GPGPU: General-Purpose computing on Graphics Processing Units
- CPU: Central Processing Unit
- A GPU has thousands of processor cores and can therefore achieve processing with a high degree of parallelism.
- SIMT: Single Instruction, Multiple Threads
- In NVIDIA's Kepler architecture, which is a type of SIMT architecture, a group of 32 threads is called a warp.
- Instructions are executed per warp. When one of the 32 threads performs different processing, the other threads of the same warp stall.
- A stall is a state in which operation stops and no new operation is accepted. The SIMT architecture is therefore suited to applications that execute the same processing on a large amount of data.
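The warp behavior described above can be shown with a toy Python model. The hypothetical `warp_efficiency` function assumes a simplified scheme: the warp issues each distinct code path once, one after another, and lanes on other paths stall in the meantime. This simplification does not model Kepler's actual scheduler.

```python
WARP_SIZE = 32


def warp_efficiency(lane_paths):
    """lane_paths: one (path_id, cycles) pair per lane of a 32-thread warp.

    Simplified model: the warp issues every distinct path once, serially;
    lanes that are not on the path currently being issued are stalled.
    Returns the fraction of lane-cycles spent doing useful work.
    """
    assert len(lane_paths) == WARP_SIZE
    path_cycles = {}
    for path_id, cycles in lane_paths:
        path_cycles[path_id] = max(cycles, path_cycles.get(path_id, 0))
    issued = sum(path_cycles.values())                # cycles the warp is busy
    useful = sum(cycles for _, cycles in lane_paths)  # lane-cycles doing work
    return useful / (WARP_SIZE * issued)


print(warp_efficiency([("A", 4)] * 32))                    # 1.0
print(warp_efficiency([("A", 4)] * 16 + [("B", 4)] * 16))  # 0.5
```

When half the lanes take a different path, the warp runs both paths back to back and half of its lane-cycles are stalls.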
- The conversion process in moving image coding depends on the relationships between pixels across the rows or columns of a TU. It is therefore difficult to execute with a high degree of parallelism, and processing efficiency decreases. Furthermore, because the conversion process depends on the TU size, allocating the conversion process of each TU to threads becomes complicated.
- Patent Document 2 describes a decoding method in which a plurality of processing units perform processing in units of macroblocks on encoded image data to be processed.
- the decoding method described in Patent Document 2 is characterized in that blocks having a dependency relationship are collectively executed in order to reduce communication between processors.
- FIG. 25 is an explanatory diagram illustrating an example of arrangement of transform blocks based on the H.264 standard.
- In H.264, the image to be processed is composed of 4x4 or 8x8 TUs.
- H.264 has only two TU arrangement patterns per macroblock: sixteen 4x4 TUs, or four 8x8 TUs.
- The degree of parallelism of the 4x4 pattern is 64 (16 TUs x 4 rows).
- The degree of parallelism of the 8x8 pattern is 32 (4 TUs x 8 rows).
- The degree of parallelism per macroblock is therefore 32 or more in either arrangement pattern, so the processing for one macroblock can be assigned to one warp. Consequently, using warps causes no overhead in conversion / quantization processing based on the H.264 standard.
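The figures above follow from assuming one thread per TU row (or column) within a 16x16 macroblock. Here is a short Python check under that assumption. `mb_parallelism` is a hypothetical helper.

```python
WARP_SIZE = 32
MB_SIZE = 16  # H.264 macroblock: 16x16 pixels


def mb_parallelism(tu_size, mb_size=MB_SIZE):
    """Threads available in one macroblock with one thread per TU row."""
    tus_per_mb = (mb_size // tu_size) ** 2  # 16 TUs of 4x4, or 4 TUs of 8x8
    threads_per_tu = tu_size                # one thread per row (or column)
    return tus_per_mb * threads_per_tu


for size in (4, 8):
    p = mb_parallelism(size)
    print(size, p, p >= WARP_SIZE)
# 4 64 True
# 8 32 True
```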
- FIG. 26 is an explanatory diagram showing an example of arrangement of transform blocks based on the H.265 standard.
- the image to be processed may be composed of TUs of all patterns of 4x4, 8x8, 16x16, and 32x32.
- CBF = 0 is set for the 8x8 TU. As a result, some TUs need not undergo the conversion, quantization, inverse conversion, and inverse quantization processes.
- FIG. 27 is a time chart showing an example of processing timing of conversion processing based on the H.265 standard.
- FIG. 27 is a time chart when a thread is assigned as shown in FIG. 26 and conversion processing is executed.
- the arrows shown in FIG. 27 represent TU conversion processing by threads.
- the blank shown in FIG. 27 represents the period during which the thread is stalled.
- one 16x16 TU is converted.
- an architecture such as SIMT cannot convert TUs of different sizes at the same time. That is, a large overhead occurs in the conversion process. The reason is as follows.
- The thread granularity shown in FIG. 27 is coarse. Even if it is made finer, at least one thread must still be assigned to each entire column or row.
- The degree of parallelism of the conversion process for a 4x4 TU, the smallest size, is therefore 4. In conversion processing based on the H.265 standard, the degree of parallelism per block can be low, and in many cases it is difficult to assign the same processing to all 32 threads of a warp.
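Under the same one-thread-per-row assumption, the following Python sketch shows what fraction of a 32-thread warp would be busy if each warp processed a single TU. The function name is hypothetical.

```python
WARP_SIZE = 32


def single_tu_warp_utilization(tu_size):
    """Fraction of a warp's lanes busy when one warp handles one TU."""
    active = min(tu_size, WARP_SIZE)  # one thread per TU row
    return active / WARP_SIZE


for size in (4, 8, 16, 32):
    print(f"{size}x{size}: {single_tu_warp_utilization(size):.3f}")
# 4x4: 0.125
# 8x8: 0.250
# 16x16: 0.500
# 32x32: 1.000
```

Only the 32x32 TU fills a warp on its own. Small H.265 TUs leave most lanes idle unless TUs are grouped.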
- Non-Patent Document 3 describes an example of a technique that solves the above problem and can be applied to the conversion processing unit.
- FIG. 28 is a block diagram illustrating a configuration example of the conversion processing unit 3000 to which the technique described in Non-Patent Document 3 is applied.
- Non-Patent Document 3 describes a technique for a decoder based on the H.264 standard.
- Non-Patent Document 3 describes a technique that collects data of the same TU size into a temporary area and processes it collectively, so that the same processing is assigned to every thread.
- FIG. 28 shows a conversion processing unit obtained by extending the conversion processing unit described in Non-Patent Document 3 to an encoder that performs conversion / quantization processing.
- The conversion processing unit 3000 shown in FIG. 28 includes the following components:
  - transform / quantization units 3101 to 310N;
  - inverse transform / inverse quantization units 3201 to 320N;
  - a gather unit 3900;
  - scatter units 3910 and 3920.
- There are as many transform / quantization units 3101 to 310N, and as many inverse transform / inverse quantization units 3201 to 320N, as there are TU patterns. That is, N is the number of TU patterns. Each unit processes TUs of its corresponding size.
- the gather unit 3900 receives a residual image and TU size information indicating information on TUs constituting the residual image.
- the gather unit 3900 uses the input TU size information to store the input residual image data for each TU size in a temporary area (not shown).
- the transform / quantization units 3101 to 310N perform transform / quantization processing on the residual image data stored in the temporary area corresponding to the TU size to be processed. Since data is stored for each TU size in the temporary area, the transform / quantization units 3101 to 310N can execute parallel processing efficiently. Each transform / quantization unit writes the generated transform coefficient back to the temporary area.
- The inverse transform / inverse quantization units 3201 to 320N each perform inverse transform / inverse quantization processing (inverse quantization and inverse transform) on the transform coefficient data stored in the temporary area for their target TU size. Because the temporary area stores transform coefficients per TU size, these units can execute parallel processing efficiently, just as the transform / quantization units 3101 to 310N do. They write their portions of the generated reconstructed image back to the temporary area.
- The scatter unit 3910 writes the portions of the reconstructed image, reconstructed per TU size by the inverse transform / inverse quantization units 3201 to 320N, back from the temporary area to the original area.
- The scatter unit 3920 writes the transform coefficients generated per TU size by the transform / quantization units 3101 to 310N back from the temporary area to the original area.
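The gather, per-size processing, and scatter flow of FIG. 28 can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the implementation of Non-Patent Document 3. `gather`, `process`, and `scatter` are hypothetical names, and `process` merely doubles each sample in place of real transform / quantization.

```python
from collections import defaultdict


def gather(image, tus):
    """Copy each TU's samples into a temporary area grouped by TU size."""
    temp = defaultdict(list)  # size -> [(x, y, copied block), ...]
    for x, y, size in tus:
        block = [row[x:x + size] for row in image[y:y + size]]
        temp[size].append((x, y, block))
    return temp


def process(temp):
    """Stand-in for the per-size units: all blocks in temp[size] share one
    shape, so a single kernel could handle them in lock-step. Here each
    sample is simply doubled."""
    for blocks in temp.values():
        for i, (x, y, block) in enumerate(blocks):
            blocks[i] = (x, y, [[2 * v for v in row] for row in block])


def scatter(temp, image):
    """Write the processed blocks back from the temporary area."""
    for blocks in temp.values():
        for x, y, block in blocks:
            for dy, row in enumerate(block):
                image[y + dy][x:x + len(row)] = row


image = [[1] * 16 for _ in range(16)]
tus = [(0, 0, 8), (8, 0, 8), (0, 8, 8),
       (8, 8, 4), (12, 8, 4), (8, 12, 4), (12, 12, 4)]
temp = gather(image, tus)
temp_samples = sum(len(b) * len(b[0]) for blocks in temp.values() for _, _, b in blocks)
process(temp)
scatter(temp, image)
print(temp_samples)  # 256: as many samples as the whole 16x16 image
```

The sample count shows that the temporary area holds a copy of every TU, so it can grow as large as the image itself.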
- the gather unit 3900, the scatter unit 3910, and the scatter unit 3920 are mainly realized by a CPU suitable for executing the sequential processes.
- the transform / quantization units 3101 to 310N and the inverse transform / inverse quantization units 3201 to 320N are mainly realized by a GPU suitable for execution of parallel processing.
- each transform / quantization unit and each inverse transform / inverse quantization unit shown in FIG. 28 collectively process only data related to TUs of the same size. That is, when the transformation processing unit 3000 shown in FIG. 28 is realized by a GPU, a plurality of identical processes are allocated to warps that perform transformation / quantization processing and inverse transformation / inverse quantization processing.
- FIG. 29 is a time chart showing another example of processing timing of conversion processing based on the H.265 standard.
- FIG. 29 is a time chart of the conversion processing unit 3000 shown in FIG. 28 executing conversion processing for the TUs in the arrangement example shown in FIG. 26.
- In FIG. 29, warps are divided by TU size, and the threads in use are packed together. As a result, fewer threads stall and the conversion process is executed more efficiently. However, if the number of TUs to be processed is not a multiple of the number of threads per warp, some threads still stall.
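Here is a rough Python sketch of the packing in FIG. 29. It assumes one thread per TU, matching the per-thread arrows of FIG. 27, and that warps are never shared between TU sizes. `packed_schedule` is a hypothetical helper.

```python
import math

WARP_SIZE = 32


def packed_schedule(tu_counts):
    """tu_counts: {TU size: number of TUs of that size to execute}.

    Assumes one thread per TU and no warp shared between TU sizes.
    Returns {size: (warps needed, idle threads in the last warp)}.
    """
    schedule = {}
    for size, count in sorted(tu_counts.items()):
        warps = math.ceil(count / WARP_SIZE)
        idle = warps * WARP_SIZE - count
        schedule[size] = (warps, idle)
    return schedule


print(packed_schedule({4: 40, 8: 32, 16: 5}))
# {4: (2, 24), 8: (1, 0), 16: (1, 27)}
```

Stalls remain only in the final, partially filled warp of each size. When the count is an exact multiple of 32, as for the 8x8 TUs here, no thread is idle.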
- As described above, H.264 is the subject of Non-Patent Document 2. In H.264, no overhead arises even when a warp that performs transformation / quantization processing is assigned to each macroblock.
- Non-Patent Document 2 compares the performance of two configurations. One processes two types of TUs sequentially, as shown in FIG. 27. The other processes the two types of TUs in parallel, as shown in FIG. 29.
- Non-Patent Document 2 describes that the performance of the configuration for processing in parallel is better.
- The technique of Non-Patent Document 3 shown in FIG. 28 is particularly effective for encoding methods, such as H.265, in which TUs are arranged adaptively.
- Patent Document 4 describes that an image analyzing method includes a step of recording the coordinates of an image block.
- the first problem of the conversion processing unit described in Non-Patent Document 3 is that the gather unit is required to have a temporary area.
- the gather unit 3900 shown in FIG. 28 is required to collect data for each TU size. Since the gather unit 3900 stores data in the temporary area for each TU size, it is required to have a temporary area equivalent to the size of the original image at the maximum.
- The conversion processing unit 3000 shown in FIG. 28 may therefore require a memory area at least twice the size of the residual image. As the image to be processed grows larger, an even larger area is required, which adds cost.
- The second problem of the conversion processing unit described in Non-Patent Document 3 is that communication between the CPU and the GPU becomes a major bottleneck. This is especially true for high-resolution images such as 4K or 8K.
- The processes executed by the gather unit 3900, the scatter unit 3910, and the scatter unit 3920 are sequential. A massively parallel architecture such as the SIMT architecture cannot execute sequential processing efficiently. Therefore, if the gather and scatter units are realized on such an architecture, they cannot run efficiently.
- the gather unit 3900, the scatter unit 3910, and the scatter unit 3920 are each realized by a CPU.
- Each transform / quantization unit and each inverse transform / inverse quantization unit is realized by the GPU, while the gather and scatter units run on the CPU. This causes a large amount of communication between the CPU and the GPU, enough to become a bottleneck. A conversion processing unit is therefore needed that realizes all components on the GPU and suppresses communication unrelated to the actual video encoding process.
- An object of the present invention is to provide a moving image coding apparatus, a moving image coding method, and a program recording medium that solve the above-described problems. They can execute moving image coding processing in parallel without reducing the efficiency of parallel processing.
- A moving image encoding apparatus according to the present invention includes two units:
  - a generation unit that generates, for each image block size, position information indicating the position of each of a plurality of image blocks in an image;
  - an image processing unit that performs a conversion process on an image block of a predetermined size at a position indicated by the generated position information.
- A moving image coding method according to the present invention performs two steps. First, it creates, for each image block size, position information indicating the position of each of a plurality of image blocks in an image. Second, it performs a conversion process on an image block of a predetermined size at a position indicated by the created position information.
- A program recording medium according to the present invention records a program that causes a computer to execute two processes:
  - a creation process that creates, for each image block size, position information indicating the position of each of a plurality of image blocks in an image;
  - a conversion process on an image block of a predetermined size at a position indicated by the position information.
- According to the present invention, moving image encoding processing can be executed in parallel without reducing parallel processing efficiency.
- FIG. 2 is a block diagram illustrating a configuration example of the list creation unit 3300.
- FIG. 3 is an explanatory diagram illustrating an example of an execution TU list created by the list creation unit 3300.
- FIG. 4 is a flowchart showing the transformation / quantization processing executed by the conversion processing unit 3000 of the first embodiment.
- FIG. 5 is a flowchart showing the list creation processing executed by the list creation unit 3300.
- A block diagram illustrating a configuration example of a list update unit 3600.
- An explanatory diagram illustrating an example of the process, executed by a list moving unit 3620, of moving execution TU information within a list.
- An explanatory diagram showing another example of the process of moving execution TU information within a list executed by the list moving unit 3620.
- An explanatory diagram showing still another example of the process of moving execution TU information within a list executed by the list moving unit 3620.
- A flowchart showing the transformation / quantization processing executed by the conversion processing unit 3000 of the fourth embodiment.
- A flowchart showing the list update processing executed by the list update unit 3600.
- A block diagram showing a configuration example of a fifth embodiment of the conversion processing unit according to the present invention.
- A block diagram illustrating a configuration example of a list initialization unit 3700.
- A flowchart showing the transformation / quantization processing executed by the conversion processing unit 3000 of the fifth embodiment.
- A flowchart showing the list initialization processing executed by the list initialization unit 3700.
- FIG. 23 is a block diagram illustrating a configuration example of a moving image encoding device based on the H.265 standard.
- FIG. 24 is a block diagram showing a configuration example of the conversion processing unit 3000 shown in FIG. 23.
- FIG. 25 is an explanatory diagram showing an example of arrangement
- A block diagram illustrating a configuration example of an extended list creation unit 4100.
- A flowchart showing the transformation / quantization processing executed by the conversion processing unit 3000 of the sixth embodiment.
- A flowchart showing the extended list creation processing executed by the extended list creation unit 4100.
- An explanatory diagram showing the relationship between an extended list and intermediate data.
- An explanatory diagram showing the compression order of transform coefficients.
- A block diagram showing the outline
- FIG. 1 is a block diagram illustrating a configuration example of a conversion processing unit according to the first embodiment of the present invention. Note that although there are four types of H.265 TU size patterns of 4x4, 8x8, 16x16, and 32x32, it is assumed that there are N types of TU size patterns in this embodiment. Further, the arrows shown in the block diagrams after FIG. 1 show an example of the information flow, and are not intended to limit the information flow.
- The conversion processing unit 3000 of the moving picture coding apparatus of this embodiment does not include the gather unit 3900 or the scatter units 3910 and 3920.
- In particular, it includes no scatter unit for writing image data stored in a temporary area back to the original area.
- Unlike the conversion processing unit 3000 shown in FIG. 28, the conversion processing unit 3000 shown in FIG. 1 includes a list creation unit 3300.
- the configuration of the conversion processing unit 3000 shown in FIG. 1 is the same as the configuration of the conversion processing unit 3000 shown in FIG. 28 except for the list creation unit 3300.
- In FIG. 28, the gather unit 3900 receives TU size information and a residual image. In FIG. 1, by contrast, the list creation unit 3300 receives TU size information and CBFs.
- Each of the transform / quantization units 3101 to 310N shown in FIG. 28 receives two inputs: the address of the temporary area where residual images of the corresponding TU size are collected, and the number of TUs to be executed.
- Each of the transform / quantization units 3101 to 310N shown in FIG. 1 instead receives the residual image and the execution TU list.
- The inverse transform / inverse quantization units 3201 to 320N likewise receive similar data.
- the list creation unit 3300 of this embodiment has a function of creating an execution TU list that is a list in which the position coordinates of TUs are listed for each TU size, with CBF and TU size information as inputs.
- Because the list creation unit 3300 creates a list of position coordinates, it does not need to perform any operation on the input residual image data. Each of the transform / quantization units 3101 to 310N can locate the TUs of its target size in the residual image by using the list for that size.
- The list creation unit 3300 can create the execution TU list in parallel.
- Specifically, it can create the execution TU list in parallel for each 32x32 region, 32x32 being the maximum TU size. That is, when the screen is divided into 32x32 blocks, the list creation unit 3300 can process each 32x32 block in the screen in parallel.
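Here is a minimal Python sketch of what an execution TU list could look like. The data layout is an assumption for illustration: a dict mapping each TU size to a list of `(x, y)` coordinates. The function `build_execution_tu_lists` is also hypothetical. The patent states only that TU position coordinates are listed per TU size, using TU size information and CBFs.

```python
from collections import defaultdict

REGION = 32  # 32x32 regions, matching the largest H.265 TU size


def region_of(x, y):
    return (y // REGION, x // REGION)


def build_execution_tu_lists(tus):
    """tus: iterable of (x, y, size, cbf) describing the TU partition.

    Returns {size: [(x, y), ...]} listing only TUs with CBF != 0. Only TU
    size information and CBFs are read; residual samples are never touched.
    Assumes every TU is aligned to a multiple of its size, so no TU crosses
    a 32x32 region boundary and each region could be handled independently.
    """
    lists = defaultdict(list)
    # Visit TUs region by region, then in raster order within a region.
    ordered = sorted(tus, key=lambda t: (region_of(t[0], t[1]), t[1], t[0]))
    for x, y, size, cbf in ordered:
        if cbf != 0:
            lists[size].append((x, y))
    return dict(lists)


tus = [
    (0, 0, 16, 1), (16, 0, 16, 0), (16, 16, 16, 1),
    (0, 16, 8, 1), (8, 16, 8, 1), (0, 24, 8, 0), (8, 24, 8, 1),
    (32, 0, 32, 1),
]
print(build_execution_tu_lists(tus))
# {16: [(0, 0), (16, 16)], 8: [(0, 16), (8, 16), (8, 24)], 32: [(32, 0)]}
```

The TUs with CBF = 0 at (16, 0) and (0, 24) never enter the list, so no thread is allocated to them later.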
- Each of the transform / quantization units 3101 to 310N of this embodiment executes transform / quantization processing for a plurality of TUs of the corresponding pattern. Therefore, when these units are realized by a SIMT architecture such as a GPU, TUs of the same size are allocated to each warp, and parallel processing is executed efficiently.
- However, the data to be processed may be located discontinuously in memory.
- SIMD: Single Instruction, Multiple Data
- A SIMD architecture processes data located contiguously in memory together. Its parallel processing efficiency therefore decreases when the data to be processed is not contiguous in memory.
- In a SIMT architecture, by contrast, each thread has its own registers and holds the address of its processing target in them. Parallel processing is therefore executed efficiently whether or not the data to be processed is contiguous in memory.
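The following Python sketch simulates this per-thread addressing. It is a simulation, not GPU code. Each logical thread takes one `(x, y)` entry from the execution TU list and reads and writes its TU at its original location, without gathering into a temporary area. `run_size_kernel` and `negate` are hypothetical names, and `negate` stands in for real transform / quantization.

```python
WARP_SIZE = 32


def run_size_kernel(residual, coeffs, positions, size, kernel):
    """Simulated launch for one TU size: logical thread t handles positions[t].

    Each thread holds its own (x, y), so its TU may lie anywhere in the
    residual image; samples are read from and written to their original
    locations instead of a gathered temporary area. Returns the number of
    warps such a launch would occupy.
    """
    for t in range(len(positions)):  # conceptually, all t run in parallel
        x, y = positions[t]          # per-thread "register" holding the address
        block = [row[x:x + size] for row in residual[y:y + size]]
        out = kernel(block)
        for dy in range(size):
            coeffs[y + dy][x:x + size] = out[dy]
    return (len(positions) + WARP_SIZE - 1) // WARP_SIZE


def negate(block):
    """Hypothetical per-TU operation used only for illustration."""
    return [[-v for v in row] for row in block]


residual = [[8 * y + x for x in range(8)] for y in range(8)]
coeffs = [[0] * 8 for _ in range(8)]
warps = run_size_kernel(residual, coeffs, [(4, 0), (0, 4)], 4, negate)
print(warps, coeffs[0][4], coeffs[4][0], coeffs[0][0])  # 1 -4 -32 0
```

The two 4x4 TUs at (4, 0) and (0, 4) are not adjacent in memory, yet both are handled in the same launch. TUs absent from the list, such as the one at (0, 0), are left untouched.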
- Similarly, the inverse transform/inverse quantization units 3201 to 320N of this embodiment execute the inverse transform/inverse quantization processing for all TUs of their corresponding size pattern together. Therefore, when the inverse transform/inverse quantization units 3201 to 320N are implemented on a SIMT architecture such as a GPU, transform coefficients of TUs of the same size are allocated to each warp, and parallel processing is executed efficiently.
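To make the warp allocation concrete, here is a minimal Python sketch that splits one per-size execution TU list into warps. It assumes one thread per TU and a warp width of 32; `assign_to_warps` and `WARP_SIZE` are hypothetical names, not part of the described apparatus. Each thread derives its own address from the listed coordinates, which is why non-contiguous TUs pose no problem under SIMT.

```python
from typing import List, Tuple

WARP_SIZE = 32  # assumption: warp width of current NVIDIA GPUs

def assign_to_warps(tu_list: List[Tuple[int, int]], width: int) -> List[List[int]]:
    """Split one per-size execution TU list into warps, one thread per TU.

    Each thread computes the row-major offset of its TU's top-left sample in a
    residual image `width` samples wide. The offsets need not be contiguous:
    under SIMT every thread keeps its own address in its own registers.
    """
    warps = []
    for start in range(0, len(tu_list), WARP_SIZE):
        chunk = tu_list[start:start + WARP_SIZE]
        warps.append([y * width + x for (x, y) in chunk])
    return warps
```

For example, two 8x8 TUs at (8, 0) and (0, 16) in a 64-sample-wide image land in the same warp with offsets 8 and 1024.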
- FIG. 2 is a block diagram illustrating a configuration example of the list creation unit 3300.
- As shown in FIG. 2, the list creation unit 3300 includes a count unit 3310, an address calculation unit 3320, and a list storage unit 3330.
- The count unit 3310 uses the input TU size information and CBF to count, for each TU size, the TUs to be executed (that is, TUs with CBF ≠ 0) in its allocated area.
- Here, an area is one of the regions into which the residual image is divided so that the list creation processing can be executed in parallel.
- The address calculation unit 3320 calculates, for the allocated area, the address in the list at which each piece of execution TU information is to be stored.
- The list storage unit 3330 writes each piece of execution TU information to the list address obtained by the address calculation unit 3320.
- The execution TU information is thus created by the list storage unit 3330.
- The list storage unit 3330 outputs the list in which all execution TU information has been written as the execution TU list.
- The execution TU list is input to the transform/quantization units 3101 to 310N.
- FIG. 3 is an explanatory diagram showing an example of an execution TU list created by the list creation unit 3300.
- The per-TU-size execution TU lists shown in FIG. 3 are created from the TU arrangement example shown in FIG.
- The execution TU information includes, for example, the x and y coordinates of the TU to be executed.
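As a sequential reference for what the execution TU list contains, the sketch below groups the coordinates of TUs with CBF ≠ 0 by TU size. The input format (one `(x, y, size, cbf)` tuple per TU) and the function name are assumptions for illustration; the list creation unit 3300 builds the same lists in parallel per area.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (x, y, size, cbf) for each TU in the picture.
TU = Tuple[int, int, int, int]

def build_execution_tu_lists(tus: Iterable[TU]) -> Dict[int, List[Tuple[int, int]]]:
    lists: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for x, y, size, cbf in tus:
        if cbf != 0:  # only TUs with a coded residual are executed
            lists[size].append((x, y))
    return dict(lists)
```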
- FIG. 4 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 according to the first embodiment.
- First, the transform processing unit 3000 receives the residual image, the CBF, and the TU size information. Based on the input CBF and TU size information, the list creation unit 3300 creates an execution TU list that lists, for each TU size, the execution TU information of the TUs in its allocated area (step S101).
- Next, the transform/quantization unit 3101 receives list 1 (the list for TU size pattern 1 in the execution TU list created by the list creation unit 3300) and the residual image, and collectively executes the transform/quantization processing on only the TUs of TU size pattern 1 (step S102).
- The inverse transform/inverse quantization unit 3201 receives the transform coefficients output from the transform/quantization unit 3101 and collectively executes the inverse transform/inverse quantization processing (inverse quantization processing and inverse transform processing) on only the transform coefficients of TU size pattern 1 (step S103).
- The transform/quantization unit 3102 receives list 2 (the list for TU size pattern 2 in the execution TU list created by the list creation unit 3300) and the residual image, and collectively executes the transform/quantization processing on only the TUs of TU size pattern 2 (step S104).
- The inverse transform/inverse quantization unit 3202 receives the transform coefficients output from the transform/quantization unit 3102 and collectively executes the inverse transform/inverse quantization processing on only the transform coefficients of TU size pattern 2 (step S105).
- The transform/quantization processing and the inverse transform/inverse quantization processing are repeated in the same way for each of the N TU size patterns (steps S102 to S107). After all N TU size patterns have been processed, the transform processing unit 3000 ends the transform/quantization processing.
- The transform/quantization processing and the inverse transform/inverse quantization processing for the N TU size patterns may be executed sequentially as shown in FIG. 4, or in parallel.
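The per-pattern flow of FIG. 4 can be sketched as follows. The transform is replaced by an identity placeholder and quantization by rounding division, purely to show the batching by TU size; `forward_tq`, `inverse_tq`, and `qstep` are hypothetical and are not the H.265 transform or quantizer.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]

def forward_tq(block: Block, qstep: int) -> Block:
    # Placeholder: identity "transform" followed by rounding-division quantization.
    return [[round(v / qstep) for v in row] for row in block]

def inverse_tq(coeffs: Block, qstep: int) -> Block:
    # Placeholder inverse: dequantize by multiplying back; identity inverse transform.
    return [[c * qstep for c in row] for row in coeffs]

def crop(img: Block, x: int, y: int, s: int) -> Block:
    return [row[x:x + s] for row in img[y:y + s]]

def paste(img: Block, x: int, y: int, block: Block) -> None:
    for dy, row in enumerate(block):
        img[y + dy][x:x + len(row)] = row

def process_all_patterns(residual: Block,
                         exec_lists: Dict[int, List[Tuple[int, int]]],
                         qstep: int) -> Block:
    h, w = len(residual), len(residual[0])
    recon = [[0] * w for _ in range(h)]  # TUs absent from the lists (CBF = 0) stay zero
    for size, coords in sorted(exec_lists.items()):  # one TU size pattern at a time
        # Transform/quantization for every listed TU of this size (S102, S104, ...).
        coeffs = [forward_tq(crop(residual, x, y, size), qstep) for x, y in coords]
        # Inverse transform/inverse quantization for the same batch (S103, S105, ...).
        for (x, y), c in zip(coords, coeffs):
            paste(recon, x, y, inverse_tq(c, qstep))
    return recon
```

Because each size pattern is a separate batch, the loop body could equally be dispatched to N concurrent kernels, matching the note that the patterns may be processed in parallel.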
- FIG. 5 is a flowchart showing the list creation process executed by the list creation unit 3300. That is, the processing in steps S111 to S113 shown in FIG. 5 corresponds to the processing in step S101 shown in FIG.
- The list creation unit 3300 receives the TU size information and the CBF, executes the list creation processing, and outputs a list for each TU size.
- First, the count unit 3310 uses the input TU size information and CBF to count, for each TU size, the TUs in its allocated area that are to undergo transform/quantization processing (step S111).
- As above, an area is one of the regions into which the residual image is divided so that the list creation processing can be executed in parallel. Because step S111 is independent for each area, the count unit 3310 can execute it efficiently in parallel.
- Next, the address calculation unit 3320 receives the TU count information generated by the count unit 3310 and calculates the list addresses to which the execution TU information of the TUs to undergo transform/quantization processing is to be written (step S112).
- The address calculation unit 3320 calculates these addresses separately for each TU size.
- Parallel Scan is a method for efficiently computing partial (prefix) sums with parallel processing, and is used in Stream Compaction.
- Stream Compaction is a process that packs together only the significant elements of input data in which significant elements are scattered discontiguously. It is therefore similar to the processing of the list creation unit 3300, which outputs only the coordinate data of the TUs to be executed. Parallel Scan and Stream Compaction are described in detail in Non-Patent Document 4.
- The address calculation unit 3320 calculates the partial sums of the TU counts using Parallel Scan. It can therefore efficiently calculate, by parallel processing, the addresses needed to create a list in which only the execution TU information of the TUs to be executed is packed together.
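A common way to realize Parallel Scan is the work-efficient up-sweep/down-sweep (Blelloch) scheme. The sketch below simulates that scheme serially and returns the exclusive prefix sum of per-area TU counts, which is the list address at which each area starts writing. Whether Non-Patent Document 4 uses exactly this variant is not stated here, so treat it as one possible choice.

```python
from typing import List

def exclusive_scan(counts: List[int]) -> List[int]:
    """Work-efficient (Blelloch) exclusive prefix sum.

    Iterations of each inner `for` loop are independent, so on a GPU each
    could run as its own thread; here they run one after another.
    """
    n = 1
    while n < len(counts):
        n *= 2
    a = counts + [0] * (n - len(counts))  # pad to a power of two
    # Up-sweep (reduce): build partial sums in place.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # Down-sweep: turn the partial sums into an exclusive scan.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a[:len(counts)]
```

With per-area counts `[2, 5, 1]`, the areas would start writing at list addresses 0, 2, and 7.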
- Finally, the list storage unit 3330 receives the list addresses generated by the address calculation unit 3320 and writes the execution TU information to each address (step S113). Because step S113 is independent for each area, the list storage unit 3330 can execute it efficiently in parallel. After writing all the execution TU information, the list storage unit 3330 outputs the execution TU list, and the list creation unit 3300 ends the list creation processing.
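Steps S111 to S113 together form a stream compaction. The sketch below builds the execution TU list for one TU size from several areas. The input layout (a list of areas, each a list of hypothetical `(x, y, cbf)` tuples) is an assumption, and `itertools.accumulate` stands in serially for the Parallel Scan.

```python
from itertools import accumulate
from typing import List, Tuple

Entry = Tuple[int, int, int]  # hypothetical (x, y, cbf) for TUs of one size

def create_execution_tu_list(areas: List[List[Entry]]) -> List[Tuple[int, int]]:
    # Step S111 (count): independent per area.
    counts = [sum(1 for _, _, cbf in area if cbf != 0) for area in areas]
    # Step S112 (address): the exclusive prefix sum of the counts gives each
    # area's first write position in the packed list.
    offsets = [0] + list(accumulate(counts))[:-1]
    out = [(0, 0)] * sum(counts)
    # Step S113 (store): each area writes only its own slice, so areas are independent.
    for area, base in zip(areas, offsets):
        k = base
        for x, y, cbf in area:
            if cbf != 0:
                out[k] = (x, y)
                k += 1
    return out
```

Only coordinates are written, so the temporary storage scales with the number of TUs rather than with the image size.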
- As described above, the list creation unit 3300 of this embodiment creates, for each TU size, a list that stores data for TUs of that size only.
- Therefore, the transform/quantization units 3101 to 310N and the inverse transform/inverse quantization units 3201 to 320N can collectively process multiple TUs of the same size without performing gather or scatter operations on the image data. That is, the transform/quantization processing and the inverse transform/inverse quantization processing are executed efficiently in parallel.
- In addition, the list created by the list creation unit 3300 contains only TU position information. The temporary area required to create the list is therefore smaller than the temporary area, large enough to hold the entire image, that the gather unit 3900 shown in FIG. requires.
- The list creation unit 3300 of this embodiment can also execute the list creation processing efficiently in parallel on each area of the divided image, so it can be implemented on a many-core architecture such as a GPU.
- That is, a GPU can execute the list creation processing efficiently in parallel. Since the entire transform processing unit 3000, including the list creation unit 3300, can be implemented on a many-core architecture such as a GPU, the encoding processing is executed efficiently.
- Because the moving picture encoding apparatus of this embodiment can perform moving picture encoding without reducing parallel processing efficiency, it can achieve high-speed moving picture encoding.
- FIG. 6 is a block diagram showing a configuration example of the transform processing unit 3000 according to the second embodiment of the present invention. Although H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), this embodiment assumes N TU size patterns.
- As shown in FIG. 6, the transform processing unit 3000 of the moving picture encoding apparatus of this embodiment includes execution check units 3401 to 340N.
- Except for the execution check units 3401 to 340N, the configuration of the transform processing unit 3000 shown in FIG. 6 is the same as that of the transform processing unit 3000 shown in FIG. 1.
- When all the transform coefficients of a TU output from the transform/quantization units 3101 to 310N are 0, the transform processing unit 3000 of this embodiment does not perform the inverse transform/inverse quantization processing on those transform coefficients.
- This is because inverse transforming and inverse quantizing all-zero transform coefficients only yields 0, so the cost of the inverse transform/inverse quantization processing would be wasted.
- The execution check unit 3401 of this embodiment checks whether the transform coefficients of the TUs of its corresponding TU size include any non-zero coefficient.
- Specifically, the execution check unit 3401 receives the transform coefficients output from the transform/quantization unit 3101 and the execution TU list output from the list creation unit 3300, and scans the input transform coefficients.
- If the scan shows that all the transform coefficients of a TU are 0, the execution check unit 3401 adds, to that TU's data in the execution TU list (for example, list 1 or list 2), flag information indicating that the TU is not subject to inverse transform/inverse quantization processing.
- The execution check units 3402 to 340N have the same function as the execution check unit 3401.
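A minimal sketch of the execution check and its effect on the inverse step, assuming each TU's quantized coefficients are given as a 2-D list. The function names and the placeholder dequantizer (multiply by `qstep`, identity inverse transform) are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]

def execution_check(coeff_blocks: Sequence[Block]) -> List[bool]:
    """One flag per TU: True if all its quantized coefficients are zero,
    i.e. the TU is not subject to inverse transform/inverse quantization."""
    return [all(c == 0 for row in block for c in row) for block in coeff_blocks]

def inverse_with_check(coeff_blocks: Sequence[Block], skip_flags: Sequence[bool],
                       qstep: int) -> Tuple[List[Block], int]:
    out: List[Block] = []
    calls = 0
    for block, skip in zip(coeff_blocks, skip_flags):
        if skip:
            # An all-zero input would only produce zeros, so emit them directly.
            out.append([[0] * len(row) for row in block])
        else:
            calls += 1
            out.append([[c * qstep for c in row] for row in block])  # placeholder inverse
    return out, calls
```

`calls` counts how many TUs actually went through the (placeholder) inverse, which is the work the execution check saves.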
- FIG. 7 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 according to the second embodiment.
- The processing in step S201 is the same as that in step S101 shown in FIG. That is, based on the input CBF and TU size information, the list creation unit 3300 creates an execution TU list that lists, for each TU size, the execution TU information of the TUs in its allocated area.
- The transform/quantization unit 3101 receives list 1 (the list for TU size pattern 1 in the execution TU list created by the list creation unit 3300) and the residual image, and collectively executes the transform/quantization processing on only the TUs of TU size pattern 1. The transform/quantization unit 3101 then inputs the resulting transform coefficients to the execution check unit 3401 (step S202).
- Based on the input execution TU list and transform coefficients, the execution check unit 3401 scans the transform coefficients to check whether the transform coefficients of each TU listed in list 1 include a non-zero coefficient.
- If all the transform coefficients of a TU are 0, the execution check unit 3401 adds, to that TU's data in list 1, flag information indicating that the TU is not subject to inverse transform/inverse quantization processing. If the transform coefficients include at least one non-zero coefficient, the execution check unit 3401 leaves list 1 unchanged.
- The execution check unit 3401 then inputs the transform coefficients and the execution TU list to the inverse transform/inverse quantization unit 3201 (step S203).
- The inverse transform/inverse quantization unit 3201 refers to list 1 of the execution TU list input from the execution check unit 3401. If flag information is attached in list 1, the inverse transform/inverse quantization unit 3201 does not perform the inverse transform/inverse quantization processing on the corresponding transform coefficients.
- If no flag information is attached, the inverse transform/inverse quantization unit 3201 performs the inverse transform/inverse quantization processing on the input transform coefficients.
- In this way, the inverse transform/inverse quantization unit 3201 collectively executes the inverse transform/inverse quantization processing on only the transform coefficients of TU size pattern 1 (step S204).
- The transform/quantization processing, execution check processing, and inverse transform/inverse quantization processing are repeated in the same way for each of the N TU size patterns (steps S202 to S210). After all N TU size patterns have been processed, the transform processing unit 3000 ends the transform/quantization processing.
- The transform/quantization processing, execution check processing, and inverse transform/inverse quantization processing for the N TU size patterns may be executed sequentially as shown in FIG. 7, or in parallel.
- The execution check units 3401 to 340N of this embodiment determine whether the input transform coefficients are subject to inverse transform/inverse quantization processing. With the execution check units 3401 to 340N added, the amount of computation for the inverse transform/inverse quantization processing is reduced whenever some transform coefficients do not need that processing.
- FIG. 8 is a block diagram showing a configuration example of the transform processing unit 3000 according to the third embodiment of the present invention. Although H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), this embodiment assumes N TU size patterns.
- As shown in FIG. 8, the transform processing unit 3000 of the moving picture encoding apparatus of this embodiment includes a list creation unit 3500 after the execution check units 3401 to 340N.
- Except for the list creation unit 3500, the configuration of the transform processing unit 3000 shown in FIG. 8 is the same as that of the transform processing unit 3000 shown in FIG. 6.
- The transform processing unit 3000 of this embodiment is characterized in that, before the inverse transform/inverse quantization processing is executed, it creates the execution TU list again using execution TU information that carries flag information indicating that the TU is subject to inverse transform/inverse quantization processing.
- The list creation unit 3500 of this embodiment has the same function as the list creation unit 3300.
- The configuration of the list creation unit 3500 is also the same as that of the list creation unit 3300.
- The list creation unit 3500 takes the TU size information as input and creates an execution TU list that lists, for each TU size, the execution TU information of the TUs in its allocated area. The list creation unit 3500 can execute the creation processing for each area in parallel.
- FIG. 9 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 according to the third embodiment.
- The processing in steps S301 to S302 is the same as that in steps S201 to S202 shown in FIG. That is, based on the input CBF and TU size information, the list creation unit 3300 creates an execution TU list that lists, for each TU size, the execution TU information of the TUs in its allocated area. The transform/quantization unit 3101 then receives list 1 (the list for TU size pattern 1 in that execution TU list) and the residual image, and collectively executes the transform/quantization processing on only the TUs of TU size pattern 1.
- Based on the input execution TU list and transform coefficients, the execution check unit 3401 scans the transform coefficients to check whether the transform coefficients of each TU listed in list 1 include a non-zero coefficient. If the scanned transform coefficients include a non-zero coefficient, the execution check unit 3401 attaches, to that TU's execution TU information in list 1, flag information indicating that the TU is subject to inverse transform/inverse quantization processing (step S303).
- The execution check unit 3401 inputs the execution TU list, with flag information attached to the TUs subject to inverse transform/inverse quantization processing, to the list creation unit 3500.
- The transform/quantization processing and execution check processing are repeated in the same way for each of the N TU size patterns (steps S302 to S307).
- Next, based on the execution TU information carrying flag information indicating that the TU is subject to inverse transform/inverse quantization processing, the list creation unit 3500 creates, for each TU size, an execution TU list of the TUs in its allocated area for the inverse transform/inverse quantization processing (step S308).
- In the execution TU list created by the list creation unit 3500, the execution TU information of TUs not subject to inverse transform/inverse quantization processing has been removed from the execution TU list created by the list creation unit 3300. That is, the result is an execution TU list in which the remaining execution TU information is packed more densely.
- The inverse transform/inverse quantization unit 3201 receives list 1 (the list for TU size pattern 1 in the execution TU list created by the list creation unit 3500) and the transform coefficients output by the transform/quantization unit 3101, and collectively executes the inverse transform/inverse quantization processing on only the transform coefficients of TU size pattern 1 (step S309).
- List 1 contains only the execution TU information of TUs subject to inverse transform/inverse quantization processing. The inverse transform/inverse quantization unit 3201 therefore only needs to execute the processing on the transform coefficients of the TUs to be executed, referring to list 1.
- The inverse transform/inverse quantization processing is repeated in the same way for each of the N TU size patterns (steps S309 to S311). After all N TU size patterns have been processed, the transform processing unit 3000 ends the transform/quantization processing.
- The transform/quantization processing, execution check processing, and inverse transform/inverse quantization processing for the N TU size patterns may be executed sequentially as shown in FIG. 9, or in parallel.
- The list creation unit 3500 of this embodiment re-creates the execution TU list before the inverse transform/inverse quantization processing is performed. The inverse transform/inverse quantization units 3201 to 320N can therefore reduce the number of threads required for the inverse transform/inverse quantization processing, for the following reason.
- Without re-creation, the TUs whose transform coefficients are processed by one warp could include both TUs subject to inverse transform/inverse quantization processing and TUs not subject to it, in which case the inverse transform/inverse quantization processing would not be executed efficiently.
- With the re-created list, the inverse transform/inverse quantization units 3201 to 320N only need to run the threads required to process the transform coefficients of the TUs to be executed.
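To see why packing the list reduces threads, the sketch below counts, for a sequence of per-entry flags (True = TU to be executed), how many warps must be launched and how many of those are mixed. A warp width of 32 and consecutive assignment of list entries to warps are assumptions.

```python
from typing import List, Tuple

WARP_SIZE = 32  # assumption: NVIDIA-style warp width

def warp_stats(exec_flags: List[bool]) -> Tuple[int, int]:
    """Return (warps to launch, mixed warps) for one list, one entry per thread.

    A warp must be launched if any of its entries is to be executed; it is
    mixed ("inefficient") if it also holds entries that are not executed,
    because the threads holding those entries have nothing useful to do.
    """
    launched = mixed = 0
    for start in range(0, len(exec_flags), WARP_SIZE):
        chunk = exec_flags[start:start + WARP_SIZE]
        if any(chunk):
            launched += 1
            if not all(chunk):
                mixed += 1
    return launched, mixed
```

With 128 entries alternating between executed and not executed, four mixed warps are needed; after re-creating the list with only the 64 executed entries, two fully used warps suffice.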
- FIG. 10 is a block diagram showing a configuration example of the transform processing unit 3000 according to the fourth embodiment of the present invention. Although H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), this embodiment assumes N TU size patterns.
- As shown in FIG. 10, the transform processing unit 3000 of the moving picture encoding apparatus of this embodiment includes a list update unit 3600 instead of the list creation unit 3500.
- Except for the list update unit 3600, the configuration of the transform processing unit 3000 shown in FIG. 10 is the same as that of the transform processing unit 3000 shown in FIG. 8.
- The list update unit 3600 of this embodiment is characterized in that it performs a simple update of the execution TU list created by the list creation unit 3300.
- Before the inverse transform/inverse quantization processing is performed, the list update unit 3600 updates the execution TU list using the TU size information including flag information indicating that a TU is subject to inverse transform/inverse quantization processing.
- The function of the list update unit 3600 of this embodiment therefore differs from that of the list creation unit 3300.
- Based on the flag information indicating that a TU is subject to inverse transform/inverse quantization processing, the list update unit 3600 rearranges the execution TU information in the list for each TU size of an arbitrary area so that the execution TU information of the TUs to be executed is grouped together.
- The list update unit 3600 can execute the update processing for each area in parallel.
- The transform processing unit 3000 of this embodiment may include as many list update units 3600 as there are divided areas.
- A SIMT architecture such as a GPU fetches instructions per warp.
- Here, fetching is the first stage of instruction execution in a microprocessor, in which the instruction code is read from memory and transferred to a register in the processor. Because instructions are fetched per warp, all threads in a warp are required to perform the same operation.
- The list update unit 3600 therefore rearranges the execution TU information in the list of an arbitrary area so that the execution TU information of the TUs to be executed is grouped together and all threads in each warp perform the same operation. If the list update unit 3600 did not update the list, any warp whose assigned TUs included a TU not to be executed would contain stalled threads.
- FIG. 11 is a block diagram illustrating a configuration example of the list update unit 3600. As illustrated in FIG. 11, the list update unit 3600 includes a TU execution check unit 3610 and a list moving unit 3620.
- The TU execution check unit 3610 searches for the execution TU information of TUs that are not subject to inverse transform/inverse quantization processing.
- Specifically, the TU execution check unit 3610 searches the execution TU list, which includes flag information indicating that a TU is subject to inverse transform/inverse quantization processing, for the execution TU information of TUs not to be executed.
- The list moving unit 3620 changes the position in the list of the execution TU information of TUs not to be executed in its allocated area. That is, the list moving unit 3620 moves the execution TU information of TUs not to be executed to other positions in the list.
- A SIMT architecture such as a GPU executes processing efficiently when the threads in a warp all do the same processing.
- The list moving unit 3620 therefore rearranges the execution TU information in the list so that the threads in each warp do the same processing.
- FIG. 12 is an explanatory diagram showing an example of how the list moving unit 3620 moves execution TU information within the list.
- In FIG. 12, an unhatched rectangle indicates the execution TU information of a TU to be executed.
- A hatched rectangle indicates the execution TU information of a TU not to be executed.
- An arrow indicates a warp.
- A rectangle containing an arrow indicates execution TU information processed by the warp that the arrow indicates.
- List 12a shows an example of the execution TU information before moving.
- List 12a contains both execution TU information of TUs to be executed and execution TU information of TUs not to be executed.
- Warps that are forced into inefficient processing because TUs to be executed and TUs not to be executed are mixed in the block they process are labeled "inefficient warp".
- Based on the execution TU information in list 12a, the list moving unit 3620 sorts all the execution TU information in the list of an arbitrary area, treating, for example, the execution TU information of TUs to be executed as "1" and that of TUs not to be executed as "0".
- The list moving unit 3620 may sort all the execution TU information in the list using a parallel sort algorithm.
- List 12b in FIG. 12 shows an example of the execution TU information after moving.
- After sorting, the execution TU information of the TUs to be executed is grouped together, as is the execution TU information of the TUs not to be executed. That is, the list moving unit 3620 can reduce the number of "inefficient warps" that are forced into inefficient processing because TUs to be executed and TUs not to be executed are mixed in the block they process.
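The whole-list rearrangement of FIG. 12 amounts to sorting on a 1-bit key. In the sketch below Python's stable `sorted` stands in for a parallel sort (a GPU implementation might use, for example, a radix sort); placing the executed entries first is a choice made for the example, and `sort_by_flag` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def sort_by_flag(entries: Sequence[T], flags: Sequence[bool]) -> Tuple[List[T], List[bool]]:
    # Key False (0) for executed entries so they come first; sorted() is stable,
    # so the original order is kept within each group.
    order = sorted(range(len(entries)), key=lambda i: not flags[i])
    return [entries[i] for i in order], [flags[i] for i in order]
```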
- FIG. 13 is an explanatory diagram showing another example of how the list moving unit 3620 moves execution TU information within the list.
- List 13a shows another example of the execution TU information before moving.
- List 13a is divided into partial list 1 and partial list 2.
- Partial lists 1 and 2 contain several "inefficient warps" in which TUs to be executed and TUs not to be executed are mixed in the block being processed.
- List 13b in FIG. 13 shows another example of the execution TU information after moving.
- The list moving unit 3620 sorts the execution TU information in each partial list independently. Sorting each partial list separately makes it easy to update the execution TU list with less computation than in the example shown in FIG. 12.
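Sorting each partial list independently, as in FIG. 13, only narrows the scope of the sort. In this sketch `part_len` is a hypothetical partial-list length; if it is a multiple of the warp width and warps are aligned to partial lists, at most one warp per partial list can remain mixed.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def sort_partial_lists(entries: Sequence[T], flags: Sequence[bool],
                       part_len: int) -> Tuple[List[T], List[bool]]:
    out_e: List[T] = []
    out_f: List[bool] = []
    for s in range(0, len(entries), part_len):
        e, f = entries[s:s + part_len], flags[s:s + part_len]
        # Each partial list is sorted on its own (executed entries first), so
        # partial lists can be processed independently and in parallel.
        order = sorted(range(len(e)), key=lambda i: not f[i])
        out_e += [e[i] for i in order]
        out_f += [f[i] for i in order]
    return out_e, out_f
```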
- FIG. 14 is an explanatory diagram showing yet another example of how the list moving unit 3620 moves execution TU information within the list.
- List 14a shows yet another example of the execution TU information before moving.
- List 14a is divided into partial list 1 and partial list 2.
- List 14a contains warps A to E, which are "inefficient warps" in which TUs to be executed and TUs not to be executed are mixed in the block being processed.
- The list moving unit 3620 exchanges the execution TU information of TUs processed by different warps. Since processing is executed efficiently when the threads in a warp perform the same operation, exchanging execution TU information so that the blocks processed by the threads in a warp are all TUs to be executed makes the processing efficient.
- For example, the list moving unit 3620 exchanges the execution TU information of the TUs to be executed processed by warp A with the execution TU information of the TUs not to be executed processed by warp B.
- Similarly, the list moving unit 3620 exchanges the execution TU information of the TUs to be executed processed by warp C with the execution TU information of the TUs not to be executed processed by warp E.
- As a result, the execution TU information of TUs not to be executed is gathered into the warps corresponding to warps A and C, so those warps can be eliminated. That is, by exchanging execution TU information, the inverse transform/inverse quantization processing is executed with fewer warps.
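FIG. 14 exchanges entries between warps instead of sorting. The figure shows specific pairs (A with B, C with E); the greedy pairing rule below, which moves executed entries out of the mixed warps holding the fewest of them into the idle slots of the mixed warps holding the most, is a hypothetical generalization, not the rule stated in the text. It returns the new flags and a permutation so the caller can reorder the actual entries with `[entries[p] for p in perm]`.

```python
from typing import List, Tuple

def exchange_between_warps(flags: List[bool], warp_size: int) -> Tuple[List[bool], List[int]]:
    """Swap executed entries out of sparsely filled mixed warps into the idle
    slots of densely filled ones (hypothetical greedy pairing).

    Returns the new per-position flags and `perm`, where position p now holds
    the entry originally at index perm[p].
    """
    f = list(flags)
    perm = list(range(len(f)))
    warps = [list(range(s, min(s + warp_size, len(f)))) for s in range(0, len(f), warp_size)]
    mixed = [w for w in warps if any(f[i] for i in w) and not all(f[i] for i in w)]
    # Receivers (most executed entries) first, donors (fewest) last.
    mixed.sort(key=lambda w: sum(f[i] for i in w), reverse=True)
    lo, hi = 0, len(mixed) - 1
    while lo < hi:
        holes = [i for i in mixed[lo] if not f[i]]  # idle slots in the receiver
        items = [i for i in mixed[hi] if f[i]]      # executed entries in the donor
        for h, t in zip(holes, items):
            f[h], f[t] = f[t], f[h]
            perm[h], perm[t] = perm[t], perm[h]
        if len(holes) <= len(items):
            lo += 1  # receiver now holds only executed entries
        if len(items) <= len(holes):
            hi -= 1  # donor holds none, so its warp need not be launched
    return f, perm
```

Only pairwise swaps are performed, so at most one mixed warp remains afterwards, and every warp that ends up with no executed entries can simply be skipped.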
- FIG. 15 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 according to the fourth embodiment.
- The processing in steps S401 to S407 is the same as that in steps S301 to S307 shown in FIG.
- Next, the list update unit 3600 updates the list for each TU size so that the execution TU information of the TUs to be executed is arranged together (step S408).
- The inverse transform/inverse quantization unit 3201 receives list 1 (the list for TU size pattern 1 in the execution TU list updated by the list update unit 3600) and the transform coefficients output by the transform/quantization unit 3101, and collectively executes the inverse transform/inverse quantization processing on only the transform coefficients of TU size pattern 1 (step S409).
- In list 1, the execution TU information of the TUs subject to inverse transform/inverse quantization processing is arranged together. The inverse transform/inverse quantization unit 3201 therefore only needs to execute the processing on the transform coefficients of the TUs to be executed, referring to list 1.
- The inverse transform/inverse quantization processing is repeated in the same way for each of the N TU size patterns (steps S409 to S411). After all N TU size patterns have been processed, the transform processing unit 3000 ends the transform/quantization processing.
- The transform/quantization processing, execution check processing, and inverse transform/inverse quantization processing for the N TU size patterns may be executed sequentially as shown in FIG. 15, or in parallel.
- FIG. 16 is a flowchart showing the list update processing executed by the list update unit 3600. The processing in steps S421 to S422 shown in FIG. 16 corresponds to step S408 shown in FIG.
- First, based on the TU size information to which flag information indicating TUs subject to inverse transform/inverse quantization processing has been added, the TU execution check unit 3610 searches the input execution TU list for the execution TU information of TUs not to be executed (step S421).
- Next, the list moving unit 3620 moves the execution TU information so that the entries for the TUs not to be executed, found by the TU execution check unit 3610, are grouped together in the list (step S422). After moving the execution TU information, the list update unit 3600 ends the list update processing.
- In this way, the list update unit 3600 of this embodiment performs a simple update of the execution TU list before the inverse transform/inverse quantization processing is executed.
- The amount of computation for this list update processing is smaller than, for example, that of re-creating the execution TU list by computing partial sums as in the third embodiment. The transform processing unit 3000 of this embodiment can therefore reduce the number of threads required for the inverse transform/inverse quantization processing with less computation.
- FIG. 17 is a block diagram illustrating a configuration example of the conversion processing unit 3000 according to the fifth embodiment of the present invention. H.265 defines four TU sizes (4x4, 8x8, 16x16, and 32x32); this embodiment assumes N TU size patterns in general.
- The conversion processing unit 3000 of the moving picture coding apparatus of this embodiment differs from the conversion processing unit 3000 shown in FIG. 10 in that it includes a list initialization unit 3700 and a list update unit 3800 instead of the list creation unit 3300.
- Apart from the list initialization unit 3700 and the list update unit 3800, the configuration of the conversion processing unit 3000 illustrated in FIG. 17 is the same as that of the conversion processing unit 3000 illustrated in FIG. 10.
- One feature of the conversion processing unit 3000 of this embodiment is that the execution TU list is created in a simplified manner using the TU size information.
- The list initialization unit 3700 of the present embodiment has a function of creating, based on the input TU size information, a list for each TU size of the execution TU information of the TUs in its allocated area.
- The list creation unit 3300 of the first to fourth embodiments creates exactly as many items of execution TU information as there are TUs subject to the transform / quantization processing.
- In contrast, the list initialization unit 3700 of the present embodiment creates execution TU information (hereinafter also referred to as entries) for every TU that can theoretically be present in the picture.
- The list initialization unit 3700 can therefore execute the initialization processing for each area in parallel.
- The conversion processing unit 3000 of this embodiment may include as many list initialization units 3700 as there are divided areas.
- The configuration of the list update unit 3800 is the same as that of the list update unit 3600 shown in FIG.
- The list update unit 3800 has a function of updating the list created by the preceding list initialization unit 3700 into a format in which the threads in a warp implementing the transform / quantization units 3101 to 310N can easily execute the transform / quantization processing in parallel.
- The list update unit 3800 can execute the update processing for each area in parallel.
- The conversion processing unit 3000 of the present embodiment may include as many list update units 3800 as there are divided areas.
- FIG. 18 is a block diagram illustrating a configuration example of the list initialization unit 3700.
- The list initialization unit 3700 includes a TU execution check unit 3710 and an entry creation unit 3720.
- The TU execution check unit 3710 has a function of searching for TUs that are not subject to the transform / quantization processing.
- Using the TU size information and the CBF, which indicates the TUs that are not subject to the transform / quantization processing, the TU execution check unit 3710 scans all TUs in its allocated area and searches for the TUs to be excluded.
- The entry creation unit 3720 has a function of creating the entries of the execution TU list for the allocated area.
- For every TU present in the allocated area, the entry creation unit 3720 creates an entry in the corresponding execution TU list, distinguishing execution TUs from non-execution TUs.
- The entry creation unit 3720 stores the created entries in the execution TU list.
- FIG. 19 is a flowchart showing the transform / quantization processing executed by the conversion processing unit 3000 according to the fifth embodiment.
- The list initialization unit 3700 receives the CBF and the TU size information and creates an execution TU list in which the execution TU information of the TUs in its allocated area is listed for each TU size (step S501).
- The list update unit 3800 updates the list so that, within the list for each TU size, the execution TU information of the TUs carrying flag information indicating that they are not subject to the transform / quantization processing is gathered together (step S502).
- The processing in steps S503 to S512 is the same as the processing in steps S402 to S411 shown in FIG. 15.
- The conversion processing unit 3000 then ends the transform / quantization processing.
- The transform / quantization processing, the execution check processing, and the inverse transform / inverse quantization processing for each of the N TU size patterns may be performed sequentially as shown in FIG. 19, or may be performed in parallel.
- FIG. 20 is a flowchart showing the list initialization process executed by the list initialization unit 3700. The processing in steps S521 to S522 shown in FIG. 20 corresponds to the processing in step S501 shown in FIG. 19.
- An arbitrary image area is assigned to each list initialization unit 3700 so that the initialization process is executed in parallel.
- The list initialization unit 3700 receives the TU size information and the CBF.
- Using the input TU size information, the TU execution check unit 3710 counts, for each TU size, the execution target TUs and the non-execution TUs present in its allocated area (step S521).
- The TU execution check unit 3710 passes the resulting TU counts to the entry creation unit 3720.
- Based on the TU counts obtained by the TU execution check unit 3710, the entry creation unit 3720 creates each entry of the execution TU list for the allocated area (step S522).
- The entry creation unit 3720 creates each entry so that execution target TUs and non-execution TUs are distinguished.
- The entry creation unit 3720 stores the created entries in the execution TU list. After all entries have been stored, the list initialization unit 3700 ends the list initialization process.
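Because every area receives the same fixed number of entries regardless of its content, the initialization needs no information from other areas. The following Python sketch illustrates this under stated assumptions: 32x32 areas, the four H.265 TU sizes, and a hypothetical `(x, y, size, cbf)` tuple per coded TU standing in for the TU size information and CBF. It is an illustration, not the patent's data layout.

```python
AREA = 32                  # assumption: each allocated area is 32x32 pixels
TU_SIZES = (4, 8, 16, 32)  # the four H.265 TU sizes


def init_lists(tus, area_x=0, area_y=0):
    """Create one fixed-length entry list per TU size for a single area.

    tus: iterable of (x, y, size, cbf) for the TUs actually coded in the area
    (hypothetical input format). Returns
    ({size: [(x, y, execute), ...]}, {size: number of execution target TUs}).
    """
    actual = {(x, y, s): cbf for x, y, s, cbf in tus}
    lists, counts = {}, {}
    for s in TU_SIZES:
        per_row = AREA // s
        entries = []
        # One slot for every position where a TU of size s could theoretically exist.
        for slot in range(per_row * per_row):
            x = area_x + (slot % per_row) * s
            y = area_y + (slot // per_row) * s
            execute = actual.get((x, y, s), 0) == 1
            entries.append((x, y, execute))
        lists[s] = entries
        counts[s] = sum(1 for e in entries if e[2])
    return lists, counts


# Top half: two 16x16 TUs; bottom half: one 16x16 TU and four 8x8 TUs.
tus = [(0, 0, 16, 1), (16, 0, 16, 0), (0, 16, 16, 1),
       (16, 16, 8, 1), (24, 16, 8, 1), (16, 24, 8, 0), (24, 24, 8, 1)]
lists, counts = init_lists(tus)
print({s: len(l) for s, l in lists.items()}, counts)
# {4: 64, 8: 16, 16: 4, 32: 1} {4: 0, 8: 3, 16: 2, 32: 0}
```

Every area always produces 64 + 16 + 4 + 1 = 85 entries, so each area's entries can be written at a fixed position computed from its area number. A list update step such as the one described for the list update unit 3800 would then gather the `execute=True` entries together.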
- In this way, the list initialization unit 3700 of the present embodiment creates the execution TU list in a simplified manner, and the list update unit 3800 then updates it.
- The combined amount of calculation for the list initialization process and the list update process of the present embodiment is smaller than that of a list creation process that builds the execution TU list from scratch by computing partial sums, for example. The conversion processing unit 3000 according to the present embodiment can therefore reduce the number of threads required for the transform / quantization processing with less computation.
- Embodiment 6
- In general, when an accelerator attached to a CPU, such as a GPU, is used, data transfer between the CPU and the GPU over the bus is unavoidable, and the time spent on this transfer tends to become a major bottleneck.
- For example, the data transfer speed of PCI (Peripheral Component Interconnect) Express, a commonly used bus communication standard, is one to two orders of magnitude lower than the speed of data transfer to the memory inside the CPU or GPU.
- The technique disclosed in Patent Document 3 stores the transform coefficients contained in a block after the transform / quantization processing by keeping only the non-zero values, recorded as their positions within the block and their values. As described above, many transform coefficients are "0" after the transform / quantization processing, so the technique described in Patent Document 3 achieves data compression and can be expected to improve the effective data transfer speed.
- The technique described in Patent Document 3 divides the data into blocks of a predetermined number of pixels, each of which is a unit of parallel execution, so that the blocks can be processed in parallel.
- When compressing a block, the technique described in Patent Document 3 scans the transform coefficients in the block sequentially. When the number of non-zero coefficients exceeds a threshold, it thins out all transform coefficients in the block by reducing their number of bits, thereby reducing the data size required to store the transform coefficients.
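The position/value idea can be illustrated with a minimal Python sketch. It is not Patent Document 3's actual storage format and omits the threshold-based bit reduction; it only shows how keeping the positions and values of the non-zero coefficients shrinks a mostly-zero TU while remaining exactly reversible.

```python
def compress_tu(coeffs):
    """Keep only non-zero coefficients as (positions, values); coeffs is a flat row-major list."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    values = [coeffs[i] for i in positions]
    return positions, values


def decompress_tu(positions, values, n):
    """Rebuild the flat coefficient list of length n."""
    out = [0] * n
    for p, v in zip(positions, values):
        out[p] = v
    return out


tu = [7, 0, -2, 0,
      0, 0, 0, 0,
      1, 0, 0, 0,
      0, 0, 0, 0]               # a 4x4 TU after quantization
pos, val = compress_tu(tu)
print(pos, val)                 # [0, 2, 8] [7, -2, 1]
assert decompress_tu(pos, val, len(tu)) == tu
```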
- The block here is preferably a TU, in which case the compression process is performed for each TU.
- The TUs are preferably compressed in the order in which the subsequent encoding process handles them (the so-called Z-scan order). As described above, transform / quantization produces many non-significant TUs, that is, TUs whose transform coefficients are all "0". The compression process therefore only needs to be performed on significant TUs.
- At encoding time, the encoding unit can identify the position of each TU from the TU size information and the CBF of the frame.
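Within one CTU, the H.265 z-scan order of blocks corresponds to the Morton (bit-interleaved) order of their 4x4-block coordinates. The Python sketch below assumes TU positions are given in pixels relative to the CTU's top-left corner and sorts a set of significant TUs into that order; the input tuples are hypothetical.

```python
def z_order(x, y, min_block=4, bits=8):
    """Morton index of the 4x4 block containing pixel (x, y).

    Bits of the block x coordinate go to even positions, bits of y to odd ones;
    bits=8 covers coordinates up to 1024 pixels.
    """
    bx, by = x // min_block, y // min_block
    code = 0
    for i in range(bits):
        code |= ((bx >> i) & 1) << (2 * i)
        code |= ((by >> i) & 1) << (2 * i + 1)
    return code


# Significant TUs of a 32x32 region as (x, y, size), listed in arbitrary order.
tus = [(16, 16, 16), (0, 16, 8), (16, 0, 16), (0, 0, 16), (8, 16, 8)]
ordered = sorted(tus, key=lambda t: z_order(t[0], t[1]))
print(ordered)
# [(0, 0, 16), (16, 0, 16), (0, 16, 8), (8, 16, 8), (16, 16, 16)]
```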
- Position information of the corresponding TU may also be attached to the compressed data.
- The compression process includes scanning the transform coefficients in each TU.
- The number of coefficients to be scanned depends on the TU size, so TUs of different sizes require different processing. Because transform / quantization produces a mix of TU sizes, processing them together reduces the efficiency of parallel processing. In the compression process as well, processing each TU size separately using the lists used for transform / quantization in the first to fifth embodiments can therefore be expected to improve the efficiency of parallel processing.
- The compressed data has a variable length. To execute the compression processing in parallel, the position at which each piece of compressed data is written must therefore be calculated in advance. Because any TU may become non-significant after the transform / quantization processing, the compression order must be calculated after transform / quantization. At that point, however, each TU has been classified by TU size through the lists, which makes it difficult to calculate a compression order that takes TUs of all sizes into account, as shown in FIG. It would therefore be necessary to calculate the compression order for the compression process and re-create a list for compression, which could become a major bottleneck.
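Once the compressed size of each TU is known and the TUs are arranged in one global compression order, the write position of each TU is the exclusive prefix sum of the sizes before it. A minimal Python sketch (Python 3.8+ for the `initial` argument), using hypothetical sizes in bytes and 0 for non-significant TUs:

```python
from itertools import accumulate


def write_offsets(sizes):
    """sizes[k]: compressed size of the k-th TU in compression order (0 if non-significant).

    Returns (start offset of each TU, total size of the compressed buffer).
    """
    prefix = list(accumulate(sizes, initial=0))  # exclusive prefix sums followed by the total
    return prefix[:-1], prefix[-1]


offsets, total = write_offsets([12, 0, 5, 9])
print(offsets, total)  # [0, 12, 12, 17] 26
```

Each TU can then be written by its own thread without coordination, but only because `sizes` is laid out in a single compression order spanning all TU sizes, which is the ordering that per-size lists do not provide by themselves.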
- FIG. 30 is a block diagram illustrating a configuration example of the conversion processing unit according to the sixth embodiment of the present invention. H.265 defines four TU sizes (4x4, 8x8, 16x16, and 32x32); this embodiment assumes N TU size patterns in general.
- Unlike the conversion processing unit 3000 shown in FIG., the conversion processing unit 3000 of the moving picture coding apparatus of this embodiment includes an extended list creation unit 4100 instead of the list creation unit 3300.
- The conversion processing unit 3000 also includes an intermediate data update unit 4300 and data compression units 4401 to 440N.
- The execution check units 4201 to 420N receive and output intermediate data.
- Except for the extended list creation unit 4100, the execution check units 4201 to 420N, the intermediate data update unit 4300, and the data compression units 4401 to 440N, the configuration of the conversion processing unit 3000 shown in FIG. 30 is the same as that of the conversion processing unit 3000 shown in FIG. 6.
- One feature of the conversion processing unit 3000 according to the present embodiment is that it compresses the transform coefficients to be transferred to the CPU by using the execution TU list and the intermediate data.
- The extended list creation unit 4100 has a function of receiving the TU size information and the CBF and outputting an extended list and intermediate data.
- Each element of the extended list stores position information, in 4x4-block units, corresponding to the intermediate data, which is also held in 4x4-block units.
- The position information in 4x4-block units is information, such as an index, that identifies the position of the corresponding intermediate data.
- The extended list and the intermediate data are associated with each other by an intermediate data index, as illustrated in FIG.
- The intermediate data index makes it possible to access, from the extended list, the intermediate data corresponding to that index.
- For example, the entry (element) whose block coordinates (x, y) are (0, 0) corresponds to index "0", that is, the first entry of the intermediate data.
- The entry whose block coordinates (x, y) are (4, 0) corresponds to index "1", that is, the second entry of the intermediate data.
- In this way, the intermediate data index represents the correspondence between the extended list and the intermediate data.
- The intermediate data indices illustrated in FIG. 34 are those obtained when the offset, described later, is "0".
- Although the intermediate data is described here using an example in which it is generated in 4x4-block units, it is not limited to this.
- The intermediate data may be any data that can be associated with each TU.
- The "position information in 4x4-block units" and the "index" are information indicating the correspondence between the extended list and the intermediate data, and are an example of the "correspondence information" in the present invention.
- The execution check units 4201 to 420N receive the extended list and the intermediate data output from the extended list creation unit 4100, together with the transform coefficients output from the transform / quantization units 3101 to 310N.
- The execution check units 4201 to 420N each have a function of scanning the transform coefficients of the TU indicated by each entry of the extended list to check whether the TU is non-significant, and writing flag information into the intermediate data at the index stored in that entry.
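A minimal Python sketch of the execution check, assuming hypothetical extended-list entries of the form `(x, y, index)`, a dictionary of post-quantization coefficients keyed by TU position, and a plain list standing in for the intermediate data. Each loop iteration touches only its own slot, so on a GPU one thread could handle one entry.

```python
def execution_check(size_list, coeffs_by_pos, intermediate):
    """Write an execution flag for every TU in one size list.

    size_list: extended-list entries (x, y, index) for one TU size.
    coeffs_by_pos: {(x, y): flat coefficient list after transform / quantization}.
    intermediate: one slot per intermediate-data index (hypothetical layout).
    """
    for x, y, index in size_list:
        significant = any(c != 0 for c in coeffs_by_pos[(x, y)])
        intermediate[index] = 1 if significant else 0  # 1 = execute, 0 = non-significant
    return intermediate


coeffs = {(0, 0): [0] * 15 + [3], (4, 0): [0] * 16, (0, 4): [1] + [0] * 15}
flags = execution_check([(0, 0, 0), (4, 0, 1), (0, 4, 2)], coeffs, [0] * 4)
print(flags)  # [1, 0, 1, 0]
```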
- The intermediate data update unit 4300 has a function of receiving the intermediate data after transform / quantization and the CBF, and updating and outputting the intermediate data.
- The updated intermediate data stores the compression order used by the data compression units 4401 to 440N.
- The extended list creation unit 4100 divides the extended list into a plurality of lists, one for each block size.
- Without relying on the extended list, the intermediate data update unit 4300 updates the intermediate data so that the compression order of each block is stored at the link destination recorded in the extended-list entry corresponding to that block.
- The compression order can therefore be updated without using the extended list, which is divided by size.
- Specifically, based on the execution flags contained in the intermediate data, the intermediate data update unit 4300 stores the compression order of each block in the intermediate data pointed to by the index recorded in that block's extended-list entry.
- If execution is represented by "1" and non-execution by "0" in the execution flag entries of the intermediate data, this processing can be realized by calculating the partial sums (prefix sums) of the execution flags over the entries. As described above, the partial sums can be calculated efficiently in parallel by using a parallel scan.
- The intermediate data update unit 4300 may update the intermediate data either by overwriting the execution flags with the compression order or by writing the compression order in addition to the execution flags.
- When the intermediate data update unit 4300 calculates the partial sums in the same manner as the extended list creation unit 4100, the intermediate data can be updated in parallel.
- The intermediate data update unit 4300 operates in parallel on arbitrary fixed-length areas. For example, when the screen is divided into 32x32-block areas that are given to the intermediate data update unit 4300, the intermediate data update unit 4300 can process each 32x32 block in the screen in parallel.
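The parallel scan mentioned above can be sketched with the work-efficient (Blelloch) exclusive scan. This Python version runs the up-sweep and down-sweep sequentially, but the iterations of each inner `for` loop are independent and would correspond to one GPU thread each per step. `update_intermediate` is a hypothetical illustration that turns execution flags into compression orders, writing -1 for blocks that are not executed; it assumes the intermediate-data indices already follow the order in which blocks are to be compressed.

```python
def blelloch_exclusive_scan(values):
    """Work-efficient exclusive prefix sum (up-sweep, then down-sweep)."""
    n = 1
    while n < len(values):
        n *= 2
    a = list(values) + [0] * (n - len(values))  # pad to a power of two
    d = 1
    while d < n:                                # up-sweep: build partial sums in a tree
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    a[n - 1] = 0                                # clear the root
    d = n // 2
    while d >= 1:                               # down-sweep: distribute the prefixes
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a[:len(values)]


def update_intermediate(flags):
    """Replace execution flags (1/0) by the compression order, or -1 for non-executed blocks."""
    order = blelloch_exclusive_scan(flags)
    return [o if f else -1 for f, o in zip(flags, order)]


print(update_intermediate([1, 0, 1, 1, 0, 1]))  # [0, -1, 1, 2, -1, 3]
```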
- The data compression units 4401 to 440N use the extended list and the intermediate data, referring to the intermediate data corresponding to each entry of the extended list, to compress the data for each block size and output the compressed data. Therefore, if the data compression units 4401 to 440N are implemented on a SIMT architecture such as a GPU, like the transform / quantization units 3101 to 310N and the inverse transform / inverse quantization units 3201 to 320N, blocks of the same size are assigned to the same warp and parallel processing is executed efficiently.
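A minimal Python sketch of per-size compression, assuming hypothetical `(x, y, index)` extended-list entries and intermediate data that already holds the compression order (-1 for non-significant TUs). Each size list is processed on its own, as one warp per block size would be, yet the outputs are keyed by the shared compression order, so merging them restores a single stream.

```python
def compress_size_list(size_list, coeffs_by_pos, intermediate):
    """Compress the significant TUs of one size list.

    size_list: extended-list entries (x, y, index); intermediate[index] holds the
    compression order, or -1 for a non-significant TU (hypothetical encoding).
    Returns {compression_order: (positions, values)} with only non-zero coefficients.
    """
    out = {}
    for x, y, index in size_list:
        order = intermediate[index]
        if order < 0:          # non-significant TU: nothing to transfer
            continue
        coeffs = coeffs_by_pos[(x, y)]
        positions = [i for i, c in enumerate(coeffs) if c != 0]
        out[order] = (positions, [coeffs[i] for i in positions])
    return out


coeffs = {(0, 0): [5] + [0] * 15, (4, 0): [0] * 16, (0, 8): [0] * 63 + [2]}
intermediate = [0, -1, 1]
part_4x4 = compress_size_list([(0, 0, 0), (4, 0, 1)], coeffs, intermediate)
part_8x8 = compress_size_list([(0, 8, 2)], coeffs, intermediate)
merged = {**part_4x4, **part_8x8}
stream = [merged[k] for k in sorted(merged)]  # restores the common compression order
print(stream)  # [([0], [5]), ([63], [2])]
```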
- FIG. 31 is a block diagram showing a configuration example of the extended list creation unit 4100.
- The extended list creation unit 4100 includes an index calculation unit 4130.
- The extended list creation unit 4100 also differs from the list creation unit 3300 shown in FIG. 2 in that the list storage unit 3330 is replaced with an extended list storage unit 4140, and in that its outputs are the extended list and the intermediate data.
- The configurations of the block count unit 4110 and the address calculation unit 4120 are the same as those in the list creation unit 3300 shown in FIG. 2. However, the address calculation unit 4120 calculates addresses in the extended list instead of addresses in the list.
- The index calculation unit 4130 has a function of calculating, as an index, the 4x4-block position information of the target block. For example, the index calculation unit 4130 can easily calculate the 4x4-block position information by offsetting the information indicating the relative position of each block within the area assigned to a given thread (relative position information) by the product of a value identifying the thread, such as its thread ID, and the number of blocks in the area.
- FIG. 37 is a diagram illustrating an example of a method of calculating an index including an offset.
- For example, the index of the block whose relative position information is "16" in the area assigned to thread ID "1" is obtained by adding the product of the thread ID (1) and the number of blocks (64) to the relative position value (16), giving 16 + 1 x 64 = 80.
- Here, too, the offset is "0".
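The index calculation reduces to one line of arithmetic. A Python sketch with hypothetical parameter names, assuming 64 blocks per area (a 32x32 area in 4x4-block units) as in the example:

```python
def block_index(relative_pos, thread_id, blocks_per_area=64, offset=0):
    """Intermediate-data index of a block.

    relative_pos: position of the block within the thread's area, in 4x4-block units.
    blocks_per_area: 4x4 blocks per area (64 for a 32x32 area, an assumption here).
    offset: optional base offset ("0" in the example above).
    """
    return offset + thread_id * blocks_per_area + relative_pos


print(block_index(16, 1))  # 80, i.e. 16 + 1 * 64
```

Because the index depends only on the thread ID and the block's relative position, every thread can compute its indices independently of the others.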
- The extended list storage unit 4140 has a function of receiving the extended-list storage address calculated by the address calculation unit 4120 and the index calculated by the index calculation unit 4130, and storing the block position information and the index as list data at that storage address of the extended list.
- FIG. 32 is a flowchart illustrating the transform / quantization processing and the data compression processing executed by the conversion processing unit 3000 according to the sixth embodiment.
- The conversion processing unit 3000 receives a residual image, TU size information, and the CBF.
- Using the input TU size information and CBF, the extended list creation unit 4100 creates, for each block size, an extended list whose list data contain the position information of each execution target block together with the position information of the corresponding intermediate data (step S601).
- Step S602 is the same as the processing in step S202 shown in FIG. That is, the transform / quantization unit 3101 receives list 1, which concerns TU size pattern 1 in the execution TU list created by the list creation unit 3300, together with the residual image, and executes the transform / quantization processing collectively on only the TUs of TU size pattern 1.
- For the list concerning TU size pattern 1 in the extended list created by the extended list creation unit 4100, the execution check unit 4201 determines whether each TU of TU size pattern 1 has become non-significant as a result of the transform / quantization processing, and writes an execution flag into the area for that TU in the intermediate data, using the index recorded in the entry (step S603).
- Step S604 is the same as the processing in step S204 shown in FIG. That is, the inverse transform / inverse quantization unit 3201 performs the inverse transform / inverse quantization processing on the input transform coefficients.
- Step S605 is the same as the processing in step S205 shown in FIG. That is, the transform / quantization unit 3102 executes the transform / quantization processing for TU size pattern 2.
- For the list concerning TU size pattern 2 in the extended list created by the extended list creation unit 4100, the execution check unit 4202 determines whether each TU of TU size pattern 2 has become non-significant as a result of the transform / quantization processing, and writes an execution flag into the area for that TU in the intermediate data, using the index recorded in the entry (step S606).
- Step S607 is the same as the processing in step S207 shown in FIG. That is, the inverse transform / inverse quantization unit 3202 performs the inverse transform / inverse quantization processing on the input transform coefficients.
- The conversion processing unit 3000 processes the lists for TU size pattern 3 onward in the same way as for TU size patterns 1 and 2, repeating the same processing up to the list for TU size pattern N (steps S608 to S610).
- The intermediate data update unit 4300 receives the intermediate data output by the execution check units 4201 to 420N and updates it so that the compression order of each TU is stored in the intermediate data indicated by the index recorded in the corresponding extended-list entry (step S611).
- Using the extended list output by the execution check unit 4201, the transform coefficients output by the transform / quantization unit 3101, and the intermediate data output by the intermediate data update unit 4300, the data compression unit 4401 compresses the transform coefficients of TU size pattern 1 among the transform coefficients of the entire screen (step S612).
- The data compression unit 4402 receives the extended list output by the execution check unit 4202, the transform coefficients output by the transform / quantization unit 3102, and the intermediate data output by the intermediate data update unit 4300, and compresses the transform coefficients of TU size pattern 2 among the transform coefficients of the entire screen (step S613).
- The data compression process is repeated in the same manner for all N TU size patterns (steps S612 to S614). After all N TU size patterns have been processed, the conversion processing unit 3000 ends the conversion processing.
- The conversion processing unit 3000 may execute the conversion processing and the data compression processing for each of the N TU size patterns sequentially, as shown in FIG. 32, or may execute the processing for the N TU size patterns in parallel.
- FIG. 33 is a flowchart showing the extended list creation process executed by the extended list creation unit 4100. The processing in steps S621 to S624 shown in FIG. 33 corresponds to the processing in step S601 shown in FIG. 32.
- The block count unit 4110 counts the number of execution target blocks in the processing target area using the TU size information and the CBF (step S621).
- Using the number of execution target blocks in the assigned area counted by the block count unit 4110, the address calculation unit 4120 calculates the address at which the extended-list entry of each execution target TU is to be stored (step S622).
- Using the TU size information and the CBF, the index calculation unit 4130 calculates the 4x4-block position information corresponding to each execution target block in the assigned area (step S623).
- Using the addresses calculated in step S622 and the indices calculated in step S623, the extended list storage unit 4140 creates an extended-list entry for each execution target block in the assigned area and stores it at the corresponding address (step S624). After storing the extended-list entries for all execution target blocks, the extended list creation unit 4100 ends the extended list creation process.
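Steps S621 to S624 can be sketched in Python as follows. The input format `(x, y, size, cbf, relative_pos)` per TU and the 64-block area are assumptions for illustration; the per-area loop body is what each thread would execute independently once the start addresses are known.

```python
from itertools import accumulate

SIZES = (4, 8, 16, 32)
BLOCKS_PER_AREA = 64  # assumption: 32x32 areas, i.e. 64 blocks of 4x4


def create_extended_list(areas):
    """areas[t]: TUs of the area handled by thread t, as hypothetical tuples
    (x, y, size, cbf, relative_pos), relative_pos being in 4x4-block units.
    Returns {size: [(x, y, index), ...]} containing only execution target TUs."""
    # S621: count execution target blocks per area and size
    counts = {s: [sum(1 for tu in area if tu[2] == s and tu[3]) for area in areas]
              for s in SIZES}
    # S622: first address of each area's entries in each size list (exclusive prefix sum)
    starts = {s: list(accumulate(counts[s], initial=0)) for s in SIZES}
    ext = {s: [None] * starts[s][-1] for s in SIZES}
    for t, area in enumerate(areas):             # each t would be an independent thread
        cursor = {s: starts[s][t] for s in SIZES}
        for x, y, size, cbf, rel in area:
            if not cbf:
                continue
            index = t * BLOCKS_PER_AREA + rel    # S623, with offset 0
            ext[size][cursor[size]] = (x, y, index)  # S624
            cursor[size] += 1
    return ext


areas = [
    [(0, 0, 16, 1, 0), (16, 0, 16, 0, 16), (0, 16, 16, 1, 32), (16, 16, 16, 1, 48)],
    [(32, 0, 32, 1, 0)],
]
print(create_extended_list(areas))
# {4: [], 8: [], 16: [(0, 0, 0), (0, 16, 32), (16, 16, 48)], 32: [(32, 0, 64)]}
```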
- As described above, the extended list creation unit 4100 creates, for each block size, a list storing the data of blocks of that size, and stores in the extended list the correspondence to the intermediate data as an index common to all block sizes. With this configuration, although the extended list itself is separated by block size, the dependencies between block sizes are preserved through the intermediate data. Because the compression order can be calculated from the execution flags in the intermediate data after the transform processing is complete, the same extended list can be used for both the transform / quantization processing and the data compression processing, which reduces the calculation cost of re-creating the list.
- In addition, because the data for all block sizes is managed in the same intermediate data, the compression order for all block sizes can be calculated at once.
- The extended list creation unit 4100 thus solves the problem of the list creation calculation becoming a bottleneck. The moving image processing apparatus according to the present embodiment can therefore perform moving image processing with a reduced amount of calculation for list creation, realizing high-speed moving image processing.
- Embodiments of the moving picture encoding apparatus according to the present invention are not limited to the first to sixth embodiments described above.
- For example, an embodiment of the moving image encoding apparatus according to the present invention may execute other processes in which similar processing is performed, such as other moving image encoding processes, or processes other than transform / quantization, such as motion-compensated prediction.
- Each of the above embodiments can be implemented in hardware, but can also be realized by, for example, a computer program recorded on a recording medium.
- The information processing apparatus shown in FIG. 21 includes a processor 1001, a program memory 1002, a storage medium (recording medium) 1003 for storing video data, and a storage medium 1004 for storing data such as a bit stream.
- The storage medium 1003 and the storage medium 1004 may be separate storage media or storage areas on the same storage medium. A magnetic storage medium such as a hard disk can be used for these storage media.
- In the storage medium 1003, at least the area in which the program is stored is a non-transitory tangible storage medium (non-transitory tangible media).
- The program memory 1002 stores a program for realizing the function of each block shown in FIGS. 1, 6, 8, 10, 17, and 30.
- The processor 1001 implements the functions of the conversion processing units shown in FIGS. 1, 6, 8, 10, 17, and 30 by executing processing according to the program stored in the program memory 1002.
- FIG. 22 is a block diagram showing an example of an outline of a moving image encoding apparatus according to the present invention.
- The moving image coding apparatus 10 according to the present invention includes a creation unit 11 (for example, the list creation unit 3300) that creates, for each image block size, position information indicating the position of each of a plurality of image blocks in an image, and
- an image processing unit 12 (for example, the transform / quantization units 3101 to 310N and the inverse transform / inverse quantization units 3201 to 320N) that performs transform processing on image blocks of a predetermined size at the positions indicated by the position information created by the creation unit 11.
- With this configuration, the video encoding device 10 can execute video encoding processing in parallel without reducing the efficiency of the parallel processing.
- The creation unit 11 may create position information indicating the positions of the image blocks subject to the transform process and the quantization process, and the image processing unit 12 may include:
- a transform quantization unit (for example, the transform / quantization units 3101 to 310N) that refers to the position information and performs the transform process and the quantization process on image blocks of a predetermined size; and
- an inverse transform inverse quantization unit (for example, the inverse transform / inverse quantization units 3201 to 320N) that performs inverse quantization processing and inverse transform processing on the processing result of the transform quantization unit.
- With this configuration, the moving image encoding apparatus can reduce the number of threads required for the transform process and the quantization process.
- The inverse transform inverse quantization unit may perform the inverse quantization process and the inverse transform process only on processing results that are not 0.
- With this configuration, the moving picture coding apparatus can reduce the amount of calculation for the inverse quantization process and the inverse transform process.
- The image processing unit 12 may include a second creation unit (for example, the list creation unit 3500) that, using the processing result of the transform quantization unit, creates for each image block size second position information indicating the positions of the image blocks subject to the inverse quantization process and the inverse transform process, and
- the inverse transform inverse quantization unit may refer to the second position information and perform the inverse quantization process and the inverse transform process on the processing results of the transform quantization unit that correspond to the image blocks subject to those processes.
- With this configuration, the moving image encoding apparatus can reduce the number of threads required for the inverse quantization process and the inverse transform process.
- The image processing unit 12 may include an update unit (for example, the list update unit 3600) that, using the processing result of the transform quantization unit, updates the position information created by the creation unit 11 into third position information in which the information indicating the positions of the image blocks subject to the inverse quantization process and the inverse transform process is stored contiguously, and
- the inverse transform inverse quantization unit may refer to the third position information and perform the inverse quantization process and the inverse transform process, for each predetermined unit, on the processing results of the transform quantization unit that correspond to the image blocks subject to those processes.
- With this configuration, the moving picture encoding apparatus can reduce the number of warps required for the inverse quantization process and the inverse transform process.
- The creation unit 11 (for example, the list initialization unit 3700 and the list update unit 3800) may create position information in which the information indicating the positions of the image blocks subject to the transform process and the quantization process is stored contiguously, and
- the image processing unit 12 may include a transform quantization unit that refers to the position information and performs the transform process and the quantization process, for each predetermined unit, on image blocks of a predetermined size, and an inverse transform inverse quantization unit that performs the inverse quantization process and the inverse transform process on the processing results of the transform quantization unit.
- With this configuration, the moving picture encoding apparatus can reduce the number of warps required for the transform process and the quantization process.
- The creation unit 11 may create the position information in parallel for each of the image areas into which the image data is divided.
- With this configuration, the moving image encoding apparatus can execute the list creation processing for the residual image in parallel.
- FIG. 36 is a block diagram showing another example of the outline of the moving picture encoding apparatus according to the present invention.
- The video encoding device 20 includes a creation unit 21 (for example, an extended list creation unit 4100), an image processing unit 22 (for example, transform/quantization units 3101 to 310N), an update unit 23 (for example, an intermediate data update unit 4300), and a data compression unit 24 (for example, data compression units 4401 to 440N).
- The creation unit 21 creates position information indicating, for each image block size, the position of each of the plurality of image blocks in the image, together with correspondence information indicating the correspondence between the position information and data (for example, intermediate data) in which the order in which the data compression unit 24 compresses the image blocks is stored.
- The image processing unit 22 performs transform processing on image blocks of a predetermined size at the positions indicated by the position information created by the creation unit 21.
- The update unit 23 collectively updates the data created by the creation unit 21 based on the result of the transform processing by the image processing unit 22.
- The data compression unit 24 uses the data updated by the update unit 23 to compress the image blocks for each size.
- With this configuration, the video encoding device 20 can execute video encoding in parallel without reducing parallel processing efficiency.
- Embodiments of the present invention are not limited to the above-described embodiments, and may include modifications that can be understood by those skilled in the art.
- An embodiment of the present invention may also combine some or all of the above-described embodiments as appropriate.
- Some or all of the embodiments of the present invention can be described as in the following supplementary notes, but are not limited thereto.
- Supplementary Note 1: A video encoding device comprising: a creation unit that creates position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and an image processing unit that performs transform processing on an image block of a predetermined size at a position indicated by the created position information.
- Supplementary Note 2: The video encoding device according to Supplementary Note 1, wherein the creation unit creates position information indicating the positions of the image blocks subject to transform and quantization, and the image processing unit includes a transform/quantization unit that refers to the position information and performs transform and quantization on image blocks of a predetermined size, and an inverse transform/inverse quantization unit that performs inverse quantization and inverse transform on the processing results of the transform/quantization unit.
- Supplementary Note 4: The video encoding device according to Supplementary Note 2 or 3, wherein the image processing unit includes a second creation unit that uses the processing results of the transform/quantization unit to create second position information indicating, for each image block size, the positions of the image blocks subject to inverse quantization and inverse transform, and the inverse transform/inverse quantization unit refers to the second position information and performs inverse quantization and inverse transform on the processing results of the transform/quantization unit that correspond to those image blocks.
- Supplementary Note 5: The video encoding device according to Supplementary Note 2 or 3, wherein the image processing unit includes a third creation unit that, using the processing results of the transform/quantization unit, creates third position information, in which information indicating the positions of the image blocks subject to inverse quantization and inverse transform is stored contiguously, by updating the position information created by the creation unit, and the inverse transform/inverse quantization unit refers to the third position information and performs inverse quantization and inverse transform, in predetermined units, on the processing results of the transform/quantization unit that correspond to those image blocks.
- Supplementary Note 6: The video encoding device according to Supplementary Note 1, wherein the creation unit creates position information in which information indicating the positions of the image blocks subject to transform and quantization is stored contiguously, and the image processing unit includes a transform/quantization unit that refers to the position information and performs transform and quantization on image blocks of a predetermined size in predetermined units, and an inverse transform/inverse quantization unit that performs inverse quantization and inverse transform on the processing results of the transform/quantization unit.
- Supplementary Note 7: The video encoding device according to any one of Supplementary Notes 1 to 6, wherein the creation unit creates the position information and correspondence information indicating the correspondence between the position information and data in which the compression order of the image blocks is stored, the device further comprising: an update unit that collectively updates the data based on the result of the transform processing; and a data compression unit that compresses the image blocks for each size using the updated data.
- Supplementary Note 10: A video encoding program for causing a computer to execute creation processing that creates position information indicating, for each image block size, the position of each of a plurality of image blocks in an image, and transform processing on an image block of a predetermined size at a position indicated by the position information.
- The present invention can execute video encoding at high speed without reducing parallel processing efficiency and can therefore process high-resolution video at high speed. It is thus well suited to imaging systems and transcoding systems that require high-resolution processing.
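The extended list of the video encoding device 20 (position information plus correspondence to compression-order data, updated collectively after the transform) can be illustrated with a small sequential sketch. It assumes, hypothetically, that each listed TU carries its index in the compression order (`order`) and that the transform stage reports one significance flag per TU; `TUEntry`, `update_intermediate` and `compress_by_size` are illustrative names, not the patent's. One exclusive prefix sum performs the collective update, after which each per-size compressor can write its output straight into the correct slot of the compression-ordered stream.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class TUEntry:
    x: int       # top-left position of the TU in the picture
    y: int
    size: int    # TU edge length (4, 8, 16, 32, ...)
    order: int   # hypothetical: index of this TU in the compression (bitstream) order


def update_intermediate(entries, significant):
    """Collectively update compression-order data after the transform stage.

    significant[i] is True when entries[i] has at least one non-zero coefficient.
    Returns out_slot, indexed by compression order: the packed output position of
    each significant TU, or None for TUs that produce no compressed data.
    """
    n = len(entries)
    flags = [0] * n
    for entry, sig in zip(entries, significant):
        flags[entry.order] = 1 if sig else 0
    # One exclusive prefix sum over all TUs gives every output position at once.
    offsets = list(accumulate(flags, initial=0))[:n]
    return [offsets[k] if flags[k] else None for k in range(n)]


def compress_by_size(entries, out_slot, compress):
    """Process one TU size at a time, writing each result to its compression-order slot."""
    total = sum(slot is not None for slot in out_slot)
    stream = [None] * total
    for size in sorted({e.size for e in entries}):
        for e in entries:
            if e.size == size and out_slot[e.order] is not None:
                stream[out_slot[e.order]] = compress(e)
    return stream
```

For example, with `entries = [TUEntry(0, 0, 8, order=2), TUEntry(8, 0, 4, order=0), TUEntry(12, 0, 4, order=1)]` and significance `[True, False, True]`, the slots are `[None, 0, 1]` and the stream is `[(12, 0, 4), (0, 0, 8)]` when `compress` returns `(x, y, size)`: the 4x4 batch and the 8x8 batch run separately, yet the output stays in compression order.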
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
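In this example, for TUs of any pattern, there may be TUs on which the transform processing and the like need not be executed.
[Configuration]
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the transform processing unit according to the first embodiment of the present invention. H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), whereas this embodiment assumes N TU size patterns. The arrows in the block diagrams from FIG. 1 onward show an example of the flow of information and are not intended to limit it.
That is, when the picture is divided into regions of 32x32 blocks, the list creation unit 3300 can process each 32x32 block in the picture in parallel.
The list storage unit 3330 outputs the list into which all execution TU information has been written as the execution TU list. The execution TU list is input to the transform/quantization units 3101 to 310N.
The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 4. FIG. 4 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 of the first embodiment.
As described above, a region is an area of the divided residual image, allocated so that the list creation processing can be executed in parallel. Because the processing of step S111 is independent for each region, the count unit 3310 can execute it efficiently in parallel.
Therefore, the address calculation unit 3320 can efficiently calculate, in parallel, addresses such that the resulting list is packed with only the execution TU information of the TUs to be executed.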
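The three stages of list creation (count unit 3310, address calculation unit 3320, list storage unit 3330) can be sketched sequentially as follows. The per-region tuple layout `(x, y, size_index, execute)` and the function name are hypothetical; on a GPU each region would be handled by its own thread or thread block, whereas here the regions are simply looped over. The key step is the exclusive prefix sum over per-region counts, which gives every region a private start address so that all regions can write into one gap-free list without conflicts.

```python
from itertools import accumulate


def create_execution_tu_lists(regions, num_sizes):
    """Build one packed execution TU list per TU size.

    regions: one entry per region of the residual image; each region is a list of
    (x, y, size_index, execute) tuples (a hypothetical layout for illustration).
    Returns lists[size_index] = [(x, y), ...] holding only TUs with execute=True.
    """
    # Count unit: per region and per size, count the TUs to execute.
    # Regions are independent, so each could be counted by its own thread.
    counts = [[0] * num_sizes for _ in regions]
    for r, tus in enumerate(regions):
        for _, _, size, execute in tus:
            if execute:
                counts[r][size] += 1

    lists = []
    for size in range(num_sizes):
        # Address calculation unit: an exclusive prefix sum over the per-region
        # counts gives each region its first write address in the packed list.
        starts = list(accumulate((c[size] for c in counts), initial=0))
        packed = [None] * starts[-1]
        # List storage unit: each region writes its entries from its own start
        # address, so regions never collide and the list contains no gaps.
        for r, tus in enumerate(regions):
            addr = starts[r]
            for x, y, s, execute in tus:
                if execute and s == size:
                    packed[addr] = (x, y)
                    addr += 1
        lists.append(packed)
    return lists
```

With two regions `[[(0, 0, 0, True), (4, 0, 0, False), (8, 0, 1, True)], [(32, 0, 0, True), (40, 0, 1, True)]]` and two sizes, the result is `[[(0, 0), (32, 0)], [(8, 0), (40, 0)]]`: the non-executed TU at (4, 0) leaves no hole in the list.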
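Next, the effects of this embodiment are described. The list creation unit 3300 of this embodiment creates, for each TU size, a list storing data of that TU size. Using the created lists, the transform/quantization units 3101 to 310N and the inverse transform/inverse quantization units 3201 to 320N can process multiple TUs of the same size together, without operations on the image data such as gather or scatter. That is, the transform/quantization processing and the inverse transform/inverse quantization processing are performed efficiently in parallel.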
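To illustrate why per-size lists remove the need for gather and scatter, the sketch below walks a hypothetical list of 4x4 TU origins and applies the H.265 4x4 integer core transform (the DCT-II basis matrix with entries 64, 83 and 36) to each block read directly from the residual image. The normalization shifts and the quantization of a real encoder are omitted, and `batch_transform_4x4` is an illustrative name. Every loop iteration is independent, which is what would let a GPU map one TU per thread.

```python
# H.265/HEVC 4x4 DCT-II integer core matrix (forward transform basis).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def batch_transform_4x4(residual, tu_list):
    """Apply C * X * C^T to every 4x4 TU whose top-left corner (x, y) is in tu_list.

    Blocks are read in place from the residual image; no gather into a temporary
    image is needed. Scaling shifts of a real HEVC encoder are omitted.
    """
    ct = transpose(C4)
    out = []
    for x, y in tu_list:  # independent iterations: one work item per listed TU
        block = [row[x:x + 4] for row in residual[y:y + 4]]
        out.append(matmul(matmul(C4, block), ct))
    return out
```

For a flat residual of ones, every listed TU yields a single DC coefficient of 256 * 256 = 65536 with all other coefficients zero, because each non-DC basis row sums to zero.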
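Therefore, the temporary area required to create the lists is smaller than the temporary area, able to hold at least the whole image, that the gather unit 3900 shown in FIG. 28 requires.
[Configuration]
Next, the second embodiment of the present invention is described with reference to the drawings. FIG. 6 is a block diagram showing a configuration example of the transform processing unit 3000 according to the second embodiment of the present invention. H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), whereas this embodiment assumes N TU size patterns.
The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 7. FIG. 7 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 of the second embodiment.
Next, the effects of this embodiment are described. The execution check units 3401 to 340N of this embodiment determine whether the input transform coefficients are targets of the inverse transform/inverse quantization processing. With the execution check units 3401 to 340N added, the amount of computation for inverse transform/inverse quantization is reduced whenever some transform coefficients do not require it.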
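A minimal sketch of the execution check, assuming it is an all-zero test on the quantized coefficients of each TU (consistent with claim 3's "non-zero processing results", although the exact test used by units 3401 to 340N is not detailed here). Dequantizing zeros gives zeros and the inverse transform is linear, so skipping such TUs and emitting a zero residual leaves the reconstruction unchanged. `needs_inverse` and `inverse_stage` are illustrative names, and `inverse` stands in for the real inverse quantization and inverse transform.

```python
def needs_inverse(coeffs):
    """Execution check: True if the TU has at least one non-zero quantized coefficient."""
    return any(c != 0 for row in coeffs for c in row)


def inverse_stage(tus, inverse):
    """Run `inverse` only on TUs that pass the execution check.

    TUs with all-zero coefficients get an all-zero residual directly.
    Returns the per-TU results and the number of TUs that were skipped.
    """
    results = []
    skipped = 0
    for coeffs in tus:
        if needs_inverse(coeffs):
            results.append(inverse(coeffs))
        else:
            n = len(coeffs)
            results.append([[0] * n for _ in range(n)])
            skipped += 1
    return results, skipped
```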
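[Configuration]
Next, the third embodiment of the present invention is described with reference to the drawings. FIG. 8 is a block diagram showing a configuration example of the transform processing unit 3000 according to the third embodiment of the present invention. H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), whereas this embodiment assumes N TU size patterns.
The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 9. FIG. 9 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 of the third embodiment.
In addition, the transform/quantization unit 3101 receives as input the residual image and list 1, the list for TU size pattern 1 within the execution TU list created by the list creation unit 3300, and performs transform/quantization processing collectively on the TUs of TU size pattern 1 only.
Next, the effects of this embodiment are described. The list creation unit 3500 of this embodiment re-creates the execution TU list before the inverse transform/inverse quantization processing is executed. Therefore, the inverse transform/inverse quantization units 3201 to 320N can reduce the number of threads required for inverse transform/inverse quantization. The reason is as follows.
[Configuration]
Next, the fourth embodiment of the present invention is described with reference to the drawings. FIG. 10 is a block diagram showing a configuration example of the transform processing unit 3000 according to the fourth embodiment of the present invention. H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), whereas this embodiment assumes N TU size patterns.
The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 15. FIG. 15 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 of the fourth embodiment.
Next, the effects of this embodiment are described. The list update unit 3600 of this embodiment performs a simple update of the execution TU list before the inverse transform/inverse quantization processing is executed. The list update processing of this embodiment requires less computation than re-creating the execution TU list by taking partial sums, as in the third embodiment. Therefore, the transform processing unit 3000 of this embodiment can reduce the number of threads required for inverse transform/inverse quantization with less computation.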
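The text does not spell out how the list move unit 3620 rearranges entries. Assuming it performs a stable in-place compaction driven by the per-TU execution check, the update is a single linear pass over the existing list with no separate prefix-sum array. The thread saving can then be seen by counting warps under a one-thread-per-TU mapping, taking 32 threads per warp as on NVIDIA GPUs (also an assumption for illustration). `update_list_in_place` and `warps_needed` are hypothetical names.

```python
import math

WARP_SIZE = 32  # assumption: threads per warp, as on NVIDIA GPUs


def update_list_in_place(tu_list, keep):
    """Stable in-place compaction: move entries whose keep flag is set to the front.

    tu_list and keep are parallel lists; keep[r] is the execution check result
    for tu_list[r]. Returns the new length of tu_list.
    """
    w = 0
    for r in range(len(tu_list)):
        if keep[r]:
            tu_list[w] = tu_list[r]  # w <= r, so unread entries are never overwritten
            w += 1
    del tu_list[w:]
    return w


def warps_needed(num_tus, threads_per_warp=WARP_SIZE):
    """Warps launched when each list entry is processed by one thread."""
    return math.ceil(num_tus / threads_per_warp)
```

For instance, if only every fourth of 100 listed TUs still needs the inverse stage, the compacted list has 25 entries, and the inverse stage needs 1 warp instead of 4.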
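[Configuration]
Next, the fifth embodiment of the present invention is described with reference to the drawings. FIG. 17 is a block diagram showing a configuration example of the transform processing unit 3000 according to the fifth embodiment of the present invention. H.265 has four TU size patterns (4x4, 8x8, 16x16, and 32x32), whereas this embodiment assumes N TU size patterns.
The entry creation unit 3720 stores the created entries in the execution TU list.
The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 19. FIG. 19 is a flowchart showing the transform/quantization processing executed by the transform processing unit 3000 of the fifth embodiment.
After processing has been executed for each of the N TU size patterns, the transform processing unit 3000 ends the transform/quantization processing.
That is, the processing of steps S521 to S522 shown in FIG. 20 corresponds to the processing of step S501 shown in FIG. 19.
Next, the effects of this embodiment are described. The list initialization unit 3700 of this embodiment creates the execution TU list in a simple manner, and the list update unit 3800 then updates it. The list initialization and list update processing of this embodiment require less computation than list creation processing that builds the execution TU list from scratch, for example by taking partial sums. Therefore, the transform processing unit 3000 of this embodiment can reduce the number of threads required for transform/quantization with less computation.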
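A sketch of the initialize-then-update idea, under the assumption (not stated in the text) that the simply created list enumerates every grid position of a given TU size in raster order. Each entry's index is then a closed-form function of its position, so independent threads could write it without counting or prefix sums; a later update drops the entries that are not executed TUs of that size. `init_list` and `update_list` are hypothetical names, and the update is shown as a plain filter.

```python
def init_list(width, height, tu_size):
    """List initialization (sketch): one entry per tu_size grid position, raster order.

    With cols = width // tu_size (width assumed a multiple of tu_size), entry (x, y)
    sits at index (y // tu_size) * cols + x // tu_size, so its address is known in
    advance and no counting or prefix sum is required.
    """
    return [(x, y)
            for y in range(0, height, tu_size)
            for x in range(0, width, tu_size)]


def update_list(entries, is_executed_tu):
    """List update (sketch): keep, in order, only entries that are executed TUs of this size."""
    return [e for e in entries if is_executed_tu(e)]
```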
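[Configuration]
In general, when an accelerator attached to a CPU, such as a GPU, is used, data must be transferred between the CPU and the GPU over a bus, and the time spent on this transfer tends to become a major bottleneck. For example, the data transfer rate of PCI (Peripheral Component Interconnect) Express, a widely used bus standard, is one to two orders of magnitude lower than the rate of data transfer to the memory inside the CPU or GPU.
As noted above, transform and quantization produce many insignificant TUs, that is, TUs whose transform coefficients are all "0", so the compression processing need only be executed on the significant TUs. In this case, the encoding unit can identify the position of each TU during encoding by using the TU size information in the frame and the CBF (coded block flag). Alternatively, to compute TU positions more simply, position information of the TUs corresponding to the compressed data may be appended.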
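The remark that TU positions can be recovered from the TU size information and the CBFs can be illustrated as follows, assuming (beyond what the text states) that the TU sizes of one block are available as quadtree leaves in Z-scan order, as in H.265. Every leaf starts at a Z-order index that is a multiple of its area measured in 4x4 units, so Morton-decoding that index yields its top-left corner. `morton_decode` and `significant_tu_positions` are illustrative names.

```python
def morton_decode(z):
    """Split a Z-order (Morton) index into (x, y): even bits -> x, odd bits -> y."""
    x = y = 0
    bit = 0
    while z:
        x |= (z & 1) << bit
        z >>= 1
        y |= (z & 1) << bit
        z >>= 1
        bit += 1
    return x, y


def significant_tu_positions(leaf_sizes, cbfs, min_size=4):
    """leaf_sizes: TU sizes of one quadtree in Z-scan order; cbfs: matching coded block flags.

    Returns (x, y, size) for the TUs whose CBF is set, i.e. the ones with compressed data.
    """
    z = 0  # Z-order index in units of min_size x min_size blocks
    out = []
    for size, cbf in zip(leaf_sizes, cbfs):
        ux, uy = morton_decode(z)
        if cbf:
            out.append((ux * min_size, uy * min_size, size))
        z += (size // min_size) ** 2  # skip the area covered by this leaf
    return out
```

For a 16x16 block split into an 8x8 TU, four 4x4 TUs and two more 8x8 TUs (sizes `[8, 4, 4, 4, 4, 8, 8]`) with CBFs `[1, 0, 1, 0, 0, 0, 1]`, the significant TUs are at `(0, 0)`, `(12, 0)` and `(8, 8)`.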
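The operation of the transform processing unit 3000 of this embodiment is described below with reference to FIG. 32. FIG. 32 is a flowchart showing the transform/quantization processing and the data compression processing executed by the transform processing unit 3000 of the sixth embodiment.
FIG. 33 is a flowchart showing the extended list creation processing executed by the extended list creation unit 4100. That is, the processing of steps S621 to S624 shown in FIG. 33 corresponds to the processing of step S601 shown in FIG. 32.
Next, the effects of this embodiment are described.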
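Embodiments of the present invention are not limited to the above-described embodiments and may include modifications that can be understood by those skilled in the art. For example, an embodiment of the present invention may combine some or all of the above-described embodiments as appropriate. Some or all of the embodiments of the present invention can also be described as in the following supplementary notes, but are not limited thereto.
(Supplementary Note 1) A video encoding device comprising:
a creation unit that creates position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
an image processing unit that performs transform processing on an image block of a predetermined size at a position indicated by the created position information.
(Supplementary Note 2) The video encoding device according to Supplementary Note 1, wherein the creation unit creates position information indicating the positions of image blocks subject to transform processing and quantization processing, and
the image processing unit includes a transform/quantization unit that refers to the position information and performs transform processing and quantization processing on an image block of a predetermined size, and an inverse transform/inverse quantization unit that performs inverse quantization processing and inverse transform processing on the processing results of the transform/quantization unit.
(Supplementary Note 3) The video encoding device according to Supplementary Note 2, wherein the inverse transform/inverse quantization unit performs inverse quantization processing and inverse transform processing on non-zero processing results.
(Supplementary Note 4) The video encoding device according to Supplementary Note 2 or 3, wherein the image processing unit includes a second creation unit that uses the processing results of the transform/quantization unit to create second position information indicating, for each image block size, the positions of image blocks subject to inverse quantization processing and inverse transform processing, and
the inverse transform/inverse quantization unit refers to the second position information and performs inverse quantization processing and inverse transform processing on the processing results of the transform/quantization unit corresponding to the image blocks subject to inverse quantization processing and inverse transform processing.
(Supplementary Note 5) The video encoding device according to Supplementary Note 2 or 3, wherein the image processing unit includes a third creation unit that, using the processing results of the transform/quantization unit, creates third position information, in which information indicating the positions of image blocks subject to inverse quantization processing and inverse transform processing is contained contiguously, by updating the position information created by the creation unit, and
the inverse transform/inverse quantization unit refers to the third position information and performs inverse quantization processing and inverse transform processing, for each predetermined unit, on the processing results of the transform/quantization unit corresponding to the image blocks subject to inverse quantization processing and inverse transform processing.
(Supplementary Note 6) The video encoding device according to Supplementary Note 1, wherein the creation unit creates position information in which information indicating the positions of image blocks subject to transform processing and quantization processing is contained contiguously, and
the image processing unit includes a transform/quantization unit that refers to the position information and performs transform processing and quantization processing on image blocks of a predetermined size for each predetermined unit, and an inverse transform/inverse quantization unit that performs inverse quantization processing and inverse transform processing on the processing results of the transform/quantization unit.
(Supplementary Note 7) The video encoding device according to any one of Supplementary Notes 1 to 6, wherein the creation unit creates the position information and correspondence information indicating the correspondence between the position information and data in which the compression order of the image blocks is stored, the device further comprising:
an update unit that collectively updates the data based on the result of the transform processing; and
a data compression unit that compresses the image blocks for each size using the updated data.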
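(Supplementary Note 8) A video encoding method comprising:
creating position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
performing transform processing on an image block of a predetermined size at a position indicated by the created position information.
(Supplementary Note 9) The video encoding method according to Supplementary Note 8, comprising: creating position information indicating the positions of image blocks subject to transform processing and quantization processing; referring to the position information and performing transform processing and quantization processing on an image block of a predetermined size; and
performing inverse quantization processing and inverse transform processing on the processing results of the transform processing and quantization processing.
(Supplementary Note 10) A video encoding program for causing a computer to execute:
creation processing of creating position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
transform processing on an image block of a predetermined size at a position indicated by the position information.
(Supplementary Note 11) The video encoding program according to Supplementary Note 10, for causing the computer to execute:
creation processing of creating position information indicating the positions of image blocks subject to the transform processing and quantization processing;
transform/quantization processing of referring to the position information and performing transform processing and quantization processing on an image block of a predetermined size; and
inverse transform/inverse quantization processing of performing inverse quantization processing and inverse transform processing on the processing results of the transform/quantization processing.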
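11 Creation unit
12 Image processing unit
1000 Intra prediction unit
1001 Processor
1002 Program memory
1003, 1004 Storage medium
2000 Inter prediction unit
3000 Transform processing unit
3100 to 310N Transform/quantization unit
3200 to 320N Inverse transform/inverse quantization unit
3300 List creation unit
3310 Count unit
3320 Address calculation unit
3330 List storage unit
3401 to 340N, 4201 to 420N Execution check unit
3500 List creation unit
3600 List update unit
3610, 3710 TU execution check unit
3620 List move unit
3700 List initialization unit
3720 Entry creation unit
3800 List update unit
3900 gather unit
3910, 3920 scatter unit
4000 Entropy encoding unit
4100 Extended list creation unit
4300 Intermediate data update unit
4401 to 440N Data compression unit
5000 Subtractor
6000 Adder
7000, 8000 Multiplexer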
Claims (11)
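- A video encoding device comprising:
creation means for creating position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
image processing means for performing transform processing on an image block of a predetermined size at a position indicated by the created position information.
- The video encoding device according to claim 1, wherein the creation means creates position information indicating the positions of image blocks subject to transform processing and quantization processing, and
the image processing means includes transform/quantization means for referring to the position information and performing transform processing and quantization processing on an image block of a predetermined size, and inverse transform/inverse quantization means for performing inverse quantization processing and inverse transform processing on the processing results of the transform/quantization means.
- The video encoding device according to claim 2, wherein the inverse transform/inverse quantization means performs inverse quantization processing and inverse transform processing on non-zero processing results.
- The video encoding device according to claim 2 or claim 3, wherein the image processing means includes second creation means for creating, using the processing results of the transform/quantization means, second position information indicating, for each image block size, the positions of image blocks subject to inverse quantization processing and inverse transform processing, and
the inverse transform/inverse quantization means refers to the second position information and performs inverse quantization processing and inverse transform processing on the processing results of the transform/quantization means corresponding to the image blocks subject to inverse quantization processing and inverse transform processing.
- The video encoding device according to claim 2 or claim 3, wherein the image processing means includes third creation means for creating, using the processing results of the transform/quantization means, third position information, in which information indicating the positions of image blocks subject to inverse quantization processing and inverse transform processing is contained contiguously, by updating the position information created by the creation means, and
the inverse transform/inverse quantization means refers to the third position information and performs inverse quantization processing and inverse transform processing, for each predetermined unit, on the processing results of the transform/quantization means corresponding to the image blocks subject to inverse quantization processing and inverse transform processing.
- The video encoding device according to claim 1, wherein the creation means creates position information in which information indicating the positions of image blocks subject to transform processing and quantization processing is contained contiguously, and
the image processing means includes transform/quantization means for referring to the position information and performing transform processing and quantization processing on image blocks of a predetermined size for each predetermined unit, and inverse transform/inverse quantization means for performing inverse quantization processing and inverse transform processing on the processing results of the transform/quantization means.
- The video encoding device according to any one of claims 1 to 6, wherein the creation means creates the position information and correspondence information indicating the correspondence between the position information and data in which the compression order of the image blocks is stored, the device further comprising:
update means for collectively updating the data based on the result of the transform processing; and
data compression means for compressing, using the updated data, the image blocks for each size of the image blocks.
- A video encoding method comprising:
creating position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
performing transform processing on an image block of a predetermined size at a position indicated by the created position information.
- The video encoding method according to claim 8, comprising: creating position information indicating the positions of image blocks subject to transform processing and quantization processing;
referring to the position information and performing transform processing and quantization processing on an image block of a predetermined size; and
performing inverse quantization processing and inverse transform processing on the processing results of the transform processing and quantization processing.
- A computer-readable program recording medium recording a program for causing a computer to execute:
creation processing of creating position information indicating, for each image block size, the position of each of a plurality of image blocks in an image; and
transform processing on an image block of a predetermined size at a position indicated by the position information.
- The program recording medium according to claim 10, wherein the program causes the computer to execute:
creation processing of creating position information indicating the positions of image blocks subject to the transform processing and quantization processing;
transform/quantization processing of referring to the position information and performing transform processing and quantization processing on an image block of a predetermined size; and
inverse transform/inverse quantization processing of performing inverse quantization processing and inverse transform processing on the processing results of the transform/quantization processing.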
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017547609A JP6791158B2 (ja) | 2015-10-28 | 2016-10-19 | Video encoding device, video encoding method, and program |
US15/771,560 US20180316920A1 (en) | 2015-10-28 | 2016-10-19 | Video image encoding device, video image encoding method and program recording medium |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015211659 | 2015-10-28 | ||
JP2015-211659 | 2015-10-28 | ||
JP2016153570 | 2016-08-04 | ||
JP2016-153570 | 2016-08-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017073034A1 true WO2017073034A1 (ja) | 2017-05-04 |
Family
ID=58631491
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2016/004631 WO2017073034A1 (ja) | 2015-10-28 | 2016-10-19 | Video encoding device, video encoding method, and program recording medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US20180316920A1 (ja) |
JP (1) | JP6791158B2 (ja) |
WO (1) | WO2017073034A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10771783B2 (en) * | 2018-06-11 | 2020-09-08 | Google Llc | Transforms for large video and image blocks |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012147293A (ja) * | 2011-01-13 | 2012-08-02 | Canon Inc | Image encoding device, image encoding method and program, image decoding device, image decoding method and program |
JP2013017167A (ja) * | 2011-07-01 | 2013-01-24 | Mitsubishi Electric Corp | Method for coding a picture |
WO2015146646A1 (ja) * | 2014-03-28 | 2015-10-01 | Sony Corporation | Image decoding device and method |
-
2016
- 2016-10-19 JP JP2017547609A patent/JP6791158B2/ja active Active
- 2016-10-19 US US15/771,560 patent/US20180316920A1/en not_active Abandoned
- 2016-10-19 WO PCT/JP2016/004631 patent/WO2017073034A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
HIROAKI IGARASHI ET AL.: "Highly Parallel Transformation and Quantization for HEVC Encoder on GPUs", VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 27 November 2016 (2016-11-27), XP055379185 * |
HIROAKI IGARASHI ET AL.: "Parallel processing technique for HEVC transformation and quantization on GPU", ITE TECHNICAL REPORT, vol. 39, no. 47, 26 November 2015 (2015-11-26), pages 87 - 92 * |
Also Published As
Publication number | Publication date |
---|---|
US20180316920A1 (en) | 2018-11-01 |
JP6791158B2 (ja) | 2020-11-25 |
JPWO2017073034A1 (ja) | 2018-08-23 |
Similar Documents
Publication | Title |
---|---|
CN110024392B (zh) | Low-complexity sign prediction for video coding |
US10362311B2 (en) | Method of coding and decoding images, coding and decoding device and computer programs corresponding thereto |
US8218641B2 (en) | Picture encoding using same-picture reference for pixel reconstruction |
US8218640B2 (en) | Picture decoding using same-picture reference for pixel reconstruction |
Cheung et al. | Video coding on multicore graphics processors |
US10542276B2 (en) | Data caching method and apparatus for video decoder |
JP6310524B2 (ja) | Method for encoding and decoding images, encoding and decoding device, and corresponding computer programs |
JP4182442B2 (ja) | Image data processing device, image data processing method, program for the image data processing method, and recording medium recording the program for the image data processing method |
CN103918273B (zh) | Method of determining binary codewords for transform coefficients |
CN104683805B (zh) | Image encoding and decoding method and device |
CN103931197B (zh) | Method of determining binary codewords for transform coefficients |
US20140153635A1 (en) | Method, computer program product, and system for multi-threaded video encoding |
US10225569B2 (en) | Data storage control apparatus and data storage control method |
US8902994B1 (en) | Deblocking filtering |
CN102369522A (zh) | Parallel, pipelined integrated-circuit implementation of a computing engine |
CN103621099A (zh) | Entropy decoding method and decoding device using the same |
CN114125449B (zh) | Neural-network-based video processing method, system, and computer-readable medium |
WO2017073034A1 (ja) | Video encoding device, video encoding method, and program recording medium |
CN116600134B (zh) | Parallel video compression method and device adapted to a graphics engine |
De Cea-Dominguez et al. | GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000 |
CN116389741A (zh) | Method for encoding a current block of image or video data |
CN110035288B (zh) | Method for encoding a video sequence, encoding device, and storage medium |
CN102801980A (zh) | Decoding device and method for scalable video coding |
KR100999505B1 (ko) | Video encoding/decoding device performing macroblock-based data-parallel processing |
KR20140031974A (ko) | Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16859272; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2017547609; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 15771560; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 16859272; Country of ref document: EP; Kind code of ref document: A1 |