WO2020184715A1 - Image processing device, and image processing method

Info

Publication number: WO2020184715A1
Application number: PCT/JP2020/011206
Authority: WO
Inventors: 健治近藤
Original assignee: ソニー株式会社
Priority date: 2019-03-14
Filing date: 2020-03-13
Publication date: 2020-09-17

Abstract

The present disclosure relates to an image processing device and an image processing method which make it possible to reduce calculation complexity. This image processing device comprises: a DMVR processing unit which performs DMVR processing and which includes a motion vector search unit that searches for a motion vector; and a BDOF processing unit which performs BDOF processing and which includes a prediction unit that uses the motion vector to predict a current prediction block. An SAD value obtained when the motion vector search unit searches for a motion vector is supplied from the DMVR processing unit to the BDOF processing unit. The SAD value is used in an early termination determination for determining whether to terminate BDOF processing early. The present technology is applicable, for example, to an image encoding device and an image decoding device.

Description

Image processing device and image processing method

The present disclosure relates to an image processing apparatus and an image processing method, and more particularly to an image processing apparatus and an image processing method capable of reducing the complexity of calculation.

Conventionally, in image processing that includes inter prediction, the BDOF (Bi-Directional Optical Flow) processing disclosed in Non-Patent Document 1 is applied, and the motion vector searched for in the DMVR (Decoder-side Motion Vector Refinement) processing disclosed in Non-Patent Document 2 is used.

Conventionally, the SAD (Sum of Absolute Difference) value is calculated in the BDOF processing, and the early termination determination of the BDOF processing is performed using that SAD value. However, there is a demand to reduce the complexity of calculation in the image processing as a whole.

This disclosure has been made in view of such a situation, and is intended to reduce the complexity of calculation.

The image processing device of one aspect of the present disclosure includes a DMVR processing unit that has a motion vector search unit that searches for a motion vector and that performs DMVR processing, and a BDOF processing unit that has a prediction unit that predicts the current prediction block using the motion vector and that performs BDOF processing. The SAD value obtained when the motion vector search unit searches for the motion vector is supplied from the DMVR processing unit to the BDOF processing unit, and the SAD value is used for the early termination determination that determines whether or not to terminate the BDOF processing early.

In the image processing method of one aspect of the present disclosure, an image processing device including a DMVR processing unit that has a motion vector search unit that searches for a motion vector and that performs DMVR processing, and a BDOF processing unit that has a prediction unit that predicts the current prediction block using the motion vector and that performs BDOF processing, supplies the SAD value obtained when the motion vector search unit searches for the motion vector from the DMVR processing unit to the BDOF processing unit, and uses the SAD value for the early termination determination that determines whether or not to terminate the BDOF processing early.

In one aspect of the present disclosure, the image processing device includes a DMVR processing unit that has a motion vector search unit that searches for a motion vector and that performs DMVR processing, and a BDOF processing unit that has a prediction unit that predicts the current prediction block using the motion vector and that performs BDOF processing. The SAD value obtained when the motion vector search unit searches for the motion vector is supplied from the DMVR processing unit to the BDOF processing unit, and the SAD value is used for the early termination determination that determines whether or not to terminate the BDOF processing early.

FIG. 1 is a diagram showing a reference document.
FIG. 2 is a block diagram illustrating a conventional DMVR processing unit and BDOF processing unit.
FIG. 3 is a block diagram showing a configuration example of an embodiment of the DMVR processing unit and the BDOF processing unit to which the present technology is applied.
FIG. 4 is a flowchart illustrating conventional BDOF processing.
FIG. 5 is a flowchart illustrating BDOF processing to which the present technology is applied.
FIG. 6 is a diagram showing an outline of the test results of the simulation.
FIG. 7 is a block diagram showing a configuration example of an embodiment of a computer-based system to which the present technology is applied.
FIG. 8 is a block diagram showing a configuration example of an embodiment of an image coding device.
FIG. 9 is a flowchart illustrating the coding process.
FIG. 10 is a block diagram showing a configuration example of an embodiment of an image decoding device.
FIG. 11 is a flowchart illustrating the decoding process.
FIG. 12 is a block diagram showing a configuration example of an embodiment of a computer to which the present technology is applied.

<Documents that support technical contents and technical terms>
The scope disclosed herein is not limited to the content of the examples, and the content of reference REF1 shown in FIG. 1, which is known at the time of filing, is also incorporated herein by reference.

In other words, the content described in reference REF1 shown in FIG. 1 is also a basis for determining support requirements. Similarly, technical terms such as Parsing, Syntax, and Semantics are within the scope of the present disclosure even if they are not directly defined in the detailed description of the invention, and shall meet the support requirements of the claims.

<Terms>
In this application, the following terms are defined as follows.

<Block>
Unless otherwise specified, a "block" used in the description as a partial area or a processing unit of an image (picture), as opposed to a block denoting a functional processing unit, indicates an arbitrary partial area in the picture, and its size, shape, characteristics, and the like are not limited. For example, "block" includes any partial area (processing unit) such as a TB (Transform Block), TU (Transform Unit), PB (Prediction Block), PU (Prediction Unit), SCU (Smallest Coding Unit), CU (Coding Unit), LCU (Largest Coding Unit), CTB (Coding Tree Block), CTU (Coding Tree Unit), transform block, subblock, macroblock, tile, or slice.

<Specify block size>
Further, when specifying the size of such a block, not only the block size may be directly specified, but also the block size may be indirectly specified. For example, the block size may be specified using the identification information that identifies the size. Further, for example, the block size may be specified by the ratio or difference with the size of the reference block (for example, LCU or SCU). For example, when transmitting information for specifying a block size as a syntax element or the like, the information for indirectly specifying the size as described above may be used as the information. By doing so, the amount of information of the information can be reduced, and the coding efficiency may be improved. Further, the designation of the block size also includes the designation of the range of the block size (for example, the designation of the range of the allowable block size).

<Unit of information / processing>
The data unit in which various information is set and the data unit targeted by various processes are arbitrary and are not limited to the above-mentioned examples. For example, such information and processes may each be set for, or may each target the data of, a TU (Transform Unit), TB (Transform Block), PU (Prediction Unit), PB (Prediction Block), CU (Coding Unit), LCU (Largest Coding Unit), subblock, block, tile, slice, picture, sequence, or component. Of course, this data unit can be set for each piece of information or each process, and the data units of all information and processes need not be unified. The storage location of such information is arbitrary, and it may be stored in the header, parameter set, or the like of the above-mentioned data unit. Further, it may be stored in a plurality of places.

<Control information>
The control information related to the present technology may be transmitted from the coding side to the decoding side. For example, control information (for example, enabled_flag) that controls whether or not the application of the present technology described above is permitted (or prohibited) may be transmitted. Further, for example, control information indicating an object to which the present technology is applied (or an object to which the present technology is not applied) may be transmitted. For example, control information may be transmitted that specifies the block size (upper and lower limits, or both) to which the present technology is applied (or allowed or prohibited), frames, components, layers, and the like.

<Flag>
In the present specification, a "flag" is information for identifying a plurality of states, and includes not only information used for identifying the two states of true (1) and false (0) but also information that can identify three or more states. Therefore, the value that this "flag" can take may be, for example, the two values 1/0, or three or more values. That is, the number of bits constituting this "flag" is arbitrary, and may be 1 bit or a plurality of bits. Further, the identification information (including the flag) is assumed to take not only a form in which the identification information itself is included in the bit stream but also a form in which difference information of the identification information with respect to certain reference information is included in the bit stream. Therefore, in the present specification, "flag" and "identification information" include not only that information itself but also the difference information with respect to the reference information.

<Associate metadata>
Further, various information (metadata, etc.) regarding the coded data (bit stream) may be transmitted or recorded in any form as long as it is associated with the coded data. Here, the term "associate" means, for example, making one piece of data available (linkable) when processing the other. That is, pieces of data associated with each other may be combined into one piece of data or may remain individual pieces of data. For example, the information associated with the coded data (image) may be transmitted on a transmission path different from that of the coded data (image). Further, for example, the information associated with the coded data (image) may be recorded on a recording medium different from that of the coded data (image), or in another recording area of the same recording medium. Note that this "association" may apply to a part of the data rather than the entire data. For example, an image and information corresponding to the image may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part within a frame.

In the present specification, terms such as "synthesize", "multiplex", "add", "integrate", "include", "store", and "insert" mean combining a plurality of objects into one, for example, combining coded data and metadata into one piece of data, and each denotes one method of "associating" described above. Further, in the present specification, coding includes not only the entire process of converting an image into a bit stream but also a part of that process. For example, it includes not only processing that covers prediction processing, orthogonal transform, quantization, arithmetic coding, and the like, but also processing that collectively refers to quantization and arithmetic coding, processing that covers prediction processing, quantization, and arithmetic coding, and the like. Similarly, decoding includes not only the entire process of converting a bit stream into an image but also a part of that process. For example, it includes not only processing that covers arithmetic decoding, inverse quantization, inverse orthogonal transform, prediction processing, and the like, but also processing that covers arithmetic decoding and inverse quantization, processing that covers arithmetic decoding, inverse quantization, and prediction processing, and the like.

Hereinafter, specific embodiments to which the present technology is applied will be described in detail with reference to the drawings.

<DMVR processing and BDOF processing>
First, DMVR processing and BDOF processing will be described with reference to FIGS. 2 to 6.

In the present disclosure, a technique for terminating the BDOF process earlier is introduced, and the DMVR cost is used in the early termination determination for determining whether or not the BDOF process is terminated earlier.

For example, conventionally, in the BDOF design, early termination using the SAD value is already implemented between the reference image L0 and the reference image L1. On the other hand, in DMVR processing, the same SAD value as BDOF is calculated and the motion vector is adjusted.

Therefore, the SAD value calculated in the DMVR processing can be reused, and simulation results show that doing so shortens the decoding execution time with almost no loss in BD performance. Specifically, in the simulation results, the BD rates of the Y, U, and V components are 0.02%, 0.01%, and -0.02%, respectively, and the encoding time and decoding time in the random access configuration are 100% and 99%, respectively.

FIG. 2 is a block diagram illustrating the conventional DMVR processing unit 11 and BDOF processing unit 12. For example, DMVR processing and BDOF processing are executed when the condition checks of the current VTM (VVC Test Model)-4.0 design are satisfied.

The DMVR processing unit 11 has a motion vector search unit 21, searches for an appropriate motion vector MV by the motion vector search unit 21, and supplies the motion vector MV to the BDOF processing unit 12.

The BDOF processing unit 12 has a SAD calculation unit 22 and a prediction unit 23, and predicts the current prediction block using the motion vector MV supplied from the DMVR processing unit 11. At this time, in the BDOF processing unit 12, the SAD value between the reference image L0 and the reference image L1 calculated by the SAD calculation unit 22 is used for determining the early end of the BDOF processing.

Here, in the search for the motion vector MV performed by the DMVR processing unit 11, the SAD is calculated as a cost for obtaining a more appropriate motion vector MV. Consequently, the SAD is calculated twice, once in the DMVR processing unit 11 and once more in the BDOF processing unit 12.

Therefore, in the present disclosure, it is proposed that the BDOF processing unit 12 reuses the SAD obtained by the DMVR processing unit 11 to reduce the complexity of the calculation.

FIG. 3 is a block diagram showing a configuration example of an embodiment of the DMVR processing unit 11 and the BDOF processing unit 12 to which the present technology is applied.

As shown in FIG. 3, not only is the motion vector MV supplied from the DMVR processing unit 11 to the BDOF processing unit 12, but the SAD value obtained when the motion vector search unit 21 searches for the motion vector MV is also supplied to the SAD calculation unit 22. That is, when the SAD calculation unit 22 can acquire the SAD value supplied from the motion vector search unit 21, it can skip the calculation of the SAD value. The BDOF processing unit 12 can then reuse the SAD value from the DMVR processing unit 11 to determine the early termination of the BDOF processing.
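The reuse described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VTM implementation: the function names `sad`, `dmvr_search`, and `bdof_early_termination`, the integer-only mirrored search, and the search range are all hypothetical and chosen only to show the SAD value travelling from the motion vector search to the early termination determination.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return int(np.abs(diff).sum())

def dmvr_search(ref_l0, ref_l1, x, y, size, search_range=2):
    """Toy integer search with mirrored offsets (assumption, not the VTM search).

    Returns the best offset and the SAD that was computed for it, so the
    caller can hand that SAD on instead of recomputing it.
    """
    best_offset, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            block_l0 = ref_l0[y + dy:y + dy + size, x + dx:x + dx + size]
            block_l1 = ref_l1[y - dy:y - dy + size, x - dx:x - dx + size]
            cost = sad(block_l0, block_l1)
            if best_sad is None or cost < best_sad:
                best_offset, best_sad = (dx, dy), cost
    return best_offset, best_sad

def bdof_early_termination(pred_l0, pred_l1, threshold, sad_from_dmvr=None):
    """Return (terminate_early, sad_was_recomputed).

    When a SAD value is supplied by the DMVR stage it is reused as is;
    otherwise the SAD is calculated here, as in the conventional design.
    """
    if sad_from_dmvr is None:
        cost, recomputed = sad(pred_l0, pred_l1), True
    else:
        cost, recomputed = sad_from_dmvr, False
    return cost < threshold, recomputed
```

In this sketch the second SAD calculation disappears whenever `sad_from_dmvr` is passed, which is the saving the proposal aims at.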

FIG. 4 is a flowchart illustrating the conventional BDOF process.

In the conventional BDOF processing, the BDOF processing unit 12 calculates the SAD value in step S11, applies a gradient filter in step S12, detects the optical flow in step S13, and makes a prediction in step S14.

At this time, in step S14, the BDOF processing unit 12 uses the SAD value in order to terminate the prediction early at the 4 × 4 subblock level. In the current design of BDOF, either normal bi-prediction or prediction using the optical flow technique is selected, and if the SAD value of a 4 × 4 subblock is less than the threshold, the normal bi-prediction is used to reduce complexity. For this reason, the SAD value of each 4 × 4 subblock is required.
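The conventional per-subblock selection can be illustrated with a short sketch. The function name `select_prediction_modes`, the mode labels, and the threshold value are assumptions made for illustration; the gradient filter and optical flow steps themselves are omitted.

```python
import numpy as np

def select_prediction_modes(pred_l0, pred_l1, threshold, sub=4):
    """Choose a prediction mode for every sub x sub subblock.

    Returns (modes, sads): modes holds "bi" where the subblock SAD is below
    the threshold (early termination, normal bi-prediction) and "bdof"
    elsewhere; sads holds the per-subblock SAD values.
    """
    height, width = pred_l0.shape
    diff = np.abs(pred_l0.astype(np.int64) - pred_l1.astype(np.int64))
    # Split the block into tiles of sub x sub samples and sum each tile.
    sads = diff.reshape(height // sub, sub, width // sub, sub).sum(axis=(1, 3))
    modes = np.where(sads < threshold, "bi", "bdof")
    return modes, sads
```

Every subblock needs its own SAD here, which is the cost that the conventional design pays in step S11.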

FIG. 5 is a flowchart illustrating BDOF processing to which the present technology is applied.

In the BDOF processing to which the present technology is applied, the BDOF processing unit 12 acquires the SAD value supplied from the DMVR processing unit 11 in step S21. Then, the BDOF processing unit 12 applies a gradient filter in step S22, detects an optical flow in step S23, and makes a prediction in step S24. Further, in step S24, the BDOF processing unit 12 uses the SAD value acquired in step S21 for the early termination determination for determining whether or not to terminate the prediction at the 4 × 4 subblock level early.

At this time, in step S24, if the BDOF processing unit 12 determines in the early termination determination that the BDOF processing is not to be terminated early, it also determines that the BDOF processing is not to be terminated early in the early termination determination performed after the block is divided. That is, when the BDOF processing unit 12 determines, using the SAD value from the DMVR processing unit 11, that the prediction of a 16 × 16 block is not to be terminated early, it determines that the prediction of the 4 × 4 subblocks is not to be terminated early either. Further, when the block size after dividing the block is 4 × 4, the BDOF processing unit 12 determines in the early termination determination that the BDOF processing is not to be terminated early.
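One possible reading of this rule is sketched below: each 4 × 4 subblock simply inherits the decision made for the 16 × 16 region that contains it, so no per-subblock SAD is needed. The function name `early_termination_map` and the list-of-lists layout are assumptions; the index mapping `>> 2` from 4 × 4 subblock index to 16 × 16 region index follows the region size mentioned in the text.

```python
def early_termination_map(et_sum_diff, et_diff_thres, width, height):
    """Map one SAD per 16x16 region onto every 4x4 subblock of a block.

    et_sum_diff[etx][ety] is the SAD of one 16x16 early termination region
    (for example, the value handed over from the DMVR stage). The result is
    indexed [x_idx][y_idx] per 4x4 subblock and is True where the subblock
    is terminated early, i.e. predicted by normal bi-prediction.
    """
    num_x = width >> 2   # number of 4x4 subblocks horizontally
    num_y = height >> 2  # number of 4x4 subblocks vertically
    return [
        [et_sum_diff[x_idx >> 2][y_idx >> 2] < et_diff_thres
         for y_idx in range(num_y)]
        for x_idx in range(num_x)
    ]
```

For a 32 × 32 block this yields an 8 × 8 map built from only four region SAD values.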

The following describes an experiment to confirm the performance of BDOF processing to which this technology is applied.

For example, BDOF processing to which this technology was applied was implemented in VTM-4.0, and experiments were conducted according to the JVET common test conditions disclosed in Non-Patent Document 3 above.

In addition, a PC cluster with the following platform was used for the simulation.
Encode
- OS: CentOS 6.10
- Compiler: g++ (GCC) 6.3.1
- CPU: Intel Core i7-7700K 4.2 GHz
- SIMD: SSE42
- Memory: 32 GB
Decode
- OS: CentOS 6.10
- Compiler: g++ (GCC) 6.3.1
- CPU: Intel Core i5-2500K 3.3 GHz
- SIMD: SSE42
- Memory: 24 GB

For the runtime measurement, the user time reported in the output log was used. Parallel encoding was used for the random access and all-intra configurations. To measure the encoding time under parallel encoding, the sum of the user times of all frames was calculated.

FIG. 6 shows an outline of the test results of the simulation performed under the above conditions.

As shown in FIG. 6, the influence on coding efficiency is negligible. Also, for the decoder runtime, CE9-2.2a and CE9-2.2b were shown to be slightly faster than CE9-2.2c. A possible reason is that CE9-2.2a and CE9-2.2b can skip BDOF processing in units of the 16x16 early termination region, whereas CE9-2.2c cannot.

Here, the draft specifications are changed as follows.

For X being each of 0 and 1, when predFlagLX[xSbIdx][ySbIdx] is equal to 1, the following applies:
- The reference picture consisting of an ordered two-dimensional array refPicLXL of luma samples and two ordered two-dimensional arrays refPicLXCb and refPicLXCr of chroma samples is derived by invoking the process specified in section 8.5.7.2 of the draft with X and refIdxLX as inputs.

The motion vector offset mvOffset is set equal to refMvLX[xSbIdx][xSbIdx] - mvLX[xSbIdx][ySbIdx].

If one or more of the following conditions are true, mvOffset[0] is set equal to 0.
- xSb is not equal to xCb and mvOffset[0] is less than 0
- (xSb + sbWidth) is not equal to (xCb + cbWidth) and mvOffset[0] is greater than 0

If one or more of the following conditions are true, mvOffset[1] is set equal to 0.
- ySb is not equal to yCb and mvOffset[1] is less than 0
- (ySb + sbHeight) is not equal to (yCb + cbHeight) and mvOffset[1] is greater than 0
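The two sets of conditions on mvOffset can be written compactly as follows. This is an illustrative sketch only; the function name `clamp_mv_offset` and the tuple representation of the offset are assumptions, while the comparisons follow the conditions listed in the text.

```python
def clamp_mv_offset(mv_offset, x_sb, y_sb, sb_width, sb_height,
                    x_cb, y_cb, cb_width, cb_height):
    """Zero a motion vector offset component when either listed condition holds."""
    off_x, off_y = mv_offset

    # Horizontal component, mvOffset[0].
    if (x_sb != x_cb and off_x < 0) or \
       ((x_sb + sb_width) != (x_cb + cb_width) and off_x > 0):
        off_x = 0

    # Vertical component, mvOffset[1].
    if (y_sb != y_cb and off_y < 0) or \
       ((y_sb + sb_height) != (y_cb + cb_height) and off_y > 0):
        off_y = 0

    return (off_x, off_y)
```

For example, for a 16x16 subblock at (16, 0) inside a 32x32 coding block at (0, 0), a negative horizontal offset and a positive vertical offset are both set to 0, while a positive horizontal offset and a negative vertical offset are kept.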

If cIdx is equal to 0, then the following applies:
The array predSamplesLXL is derived by invoking the fractional sample interpolation process specified in section 8.5.7.3 of the draft with the luma location (xCb, yCb), the coding subblock width sbWidth, the coding subblock height sbHeight in luma samples, the luma motion vector offset mvOffset, the luma motion vector refMvLX[xSb][xSb], the reference array refPicLXL, bdofFlag, and cIdx as inputs.

Otherwise, if cIdx is equal to 1, the following applies:
The array predSamplesLXCb is derived by invoking the fractional sample interpolation process specified in section 8.5.7.3 of the draft with the luma location (xCb, yCb), the coding subblock width sbWidth / 2, the coding subblock height sbHeight / 2, the chroma motion vector offset mvOffset, the chroma motion vector refMvLX[xSb][xSb], the reference array refPicLXCb, bdofFlag, and cIdx as inputs.

Otherwise (cIdx is equal to 2), the following applies:
The array predSamplesLXCr is derived by invoking the fractional sample interpolation process specified in section 8.5.7.3 of the draft with the luma location (xCb, yCb), the coding subblock width sbWidth / 2, the coding subblock height sbHeight / 2, the chroma motion vector offset mvOffset, the chroma motion vector refMvLX[xSb][xSb], the reference array refPicLXCr, bdofFlag, and cIdx as inputs.

If bdofFlag is equal to TRUE, then the following applies:
- The variable shift is set equal to Max(2, 14 - BitDepthY).

The variables bdofBlkDiffThres, etSumDiff[etxIdx][etyIdx], numEtxIdx, and numEtyIdx are derived according to the following equation (1).

For xIdx = 0..(sbWidth >> 2) - 1 and yIdx = 0..(sbHeight >> 2) - 1, the variable bdofBlkSumDiff and the bidirectional optical flow utilization flag bdofUtilizationFlag[xIdx][yIdx] are derived according to the following equation (2).

The prediction sample array predSamples is derived as follows.

If cIdx is equal to 0, the prediction samples inside the current luma coding subblock, predSamples[xL + xSb][yL + ySb] with xL = 0..sbWidth - 1 and yL = 0..sbHeight - 1, are derived as follows.

If bdofFlag is equal to TRUE, the bidirectional optical flow sample prediction process specified in section 8.5.7.4 of the draft is invoked with nCbW set equal to the luma coding subblock width sbWidth, nCbH set equal to the luma coding subblock height sbHeight, the sample arrays predSamplesL0L and predSamplesL1L, the variables predFlagL0[xSbIdx][ySbIdx], predFlagL1[xSbIdx][ySbIdx], refIdxL0, refIdxL1, etSumDiff[etxIdx][etyIdx], and bdofUtilizationFlag[xIdx][yIdx] with xIdx = 0..(sbWidth >> 2) - 1 and yIdx = 0..(sbHeight >> 2) - 1 as inputs, and predSamples[xL + xSb][yL + ySb] as output.

Otherwise (bdofFlag is equal to FALSE), the weighted sample prediction process specified in section 8.5.7.5 of the draft is invoked with the luma coding subblock width sbWidth, the luma coding subblock height sbHeight, the sample arrays predSamplesL0L and predSamplesL1L, and the variables predFlagL0[xSbIdx][ySbIdx], predFlagL1[xSbIdx][ySbIdx], refIdxL0, refIdxL1, gbiIdx, and cIdx as inputs, and predSamples[xL + xSb][yL + ySb] as output.

Bi-directional optical flow prediction process

The inputs to this process are as follows:
- Two variables nCbW and nCbH that specify the width and height of the current coding block
- Two (nCbW + 2) x (nCbH + 2) luma prediction sample arrays predSamplesL0 and predSamplesL1
- The prediction list utilization flags predFlagL0 and predFlagL1
- The reference indexes refIdxL0 and refIdxL1
- The sums of absolute differences of the early termination regions etSumDiff[etxIdx][etyIdx] with etxIdx = 0..numEtxIdx - 1 and etyIdx = 0..numEtyIdx - 1
- The early termination region threshold etDiffThres
- The bidirectional optical flow utilization flags bdofUtilizationFlag[xIdx][yIdx] with xIdx = 0..(nCbW >> 2) - 1 and yIdx = 0..(nCbH >> 2) - 1

The output of this process is the (nCbW) x (nCbH) array pbSamples of luma prediction sample values.

The variables bitDepth, shift1, shift2, shift3, shift4, offset4, and mvRefineThres are derived as follows.
- The variable bitDepth is set equal to BitDepthY.
- The variable shift1 is set equal to Max(2, 14 - bitDepth).
- The variable shift2 is set equal to Max(8, bitDepth - 4).
- The variable shift3 is set equal to Max(5, bitDepth - 7).
- The variable shift4 is set equal to Max(3, 15 - bitDepth), and the variable offset4 is set equal to 1 << (shift4 - 1).
- The variable mvRefineThres is set equal to Max(2, 1 << (13 - bitDepth)).
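As a quick check of these derivations, the variables can be computed for a given bit depth as sketched below. The function name `bdof_shift_variables` and the dictionary return type are assumptions; the formulas are the ones listed above. The sketch assumes a bit depth of at most 13, since the shift 1 << (13 - bitDepth) is not defined for a negative shift count.

```python
def bdof_shift_variables(bit_depth):
    """Derive the shift, offset, and threshold variables from the bit depth."""
    if bit_depth > 13:
        # Assumption of this sketch: larger bit depths are out of scope.
        raise ValueError("bit_depth above 13 is not handled in this sketch")
    shift4 = max(3, 15 - bit_depth)
    return {
        "shift1": max(2, 14 - bit_depth),
        "shift2": max(8, bit_depth - 4),
        "shift3": max(5, bit_depth - 7),
        "shift4": shift4,
        "offset4": 1 << (shift4 - 1),
        "mvRefineThres": max(2, 1 << (13 - bit_depth)),
    }
```

For 10-bit video this gives shift1 = 4, shift2 = 8, shift3 = 5, shift4 = 5, offset4 = 16, and mvRefineThres = 8.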

For xIdx = 0..(nCbW >> 2) - 1 and yIdx = 0..(nCbH >> 2) - 1, the following applies:
- The variable xSb is set equal to (xIdx << 2) + 1 and ySb is set equal to (yIdx << 2) + 1.
- If etSumDiff[xSbIdx >> 2][xSbIdx >> 2] is smaller than etDiffThres, or if bdofUtilizationFlag[xSbIdx][yIdx] is equal to FALSE, then for x = xSb - 1..xSb + 2 and y = ySb - 1..ySb + 2, the prediction sample values of the current subblock are derived according to the following equation (3).

- Otherwise (etSumDiff[xSbIdx >> 2][xSbIdx >> 2] >= etDiffThres and bdofUtilizationFlag[xSbIdx][yIdx] is equal to TRUE), the prediction sample values of the current subblock are derived as follows.

For x = xSb - 1..xSb + 4 and y = ySb - 1..ySb + 4, the following steps apply.
1. The position (hx, vy) corresponding to each sample position (x, y) in the prediction sample arrays is derived according to the following equation (4).

2. The variables gradientHL0[x][y], gradientVL0[x][y], gradientHL1[x][y], and gradientVL1[x][y] are derived according to the following equation (5).

3. The variables temp[x][y], tempH[x][y], and tempV[x][y] are derived according to the following equation (6).

The variables sGx2, sGy2, sGxGy, sGxdI, and sGydI are derived according to the following equation (7).

<Computer-based system configuration example>
FIG. 7 is a block diagram showing a configuration example of an embodiment of a computer-based system to which the present technology is applied.

Specifically, FIG. 7 shows a configuration example of a network system in which one or more computers, servers, and the like are connected via a network. Note that the hardware and software environment shown in the embodiment of FIG. 7 is presented as one example of a platform on which the software and/or method according to the present disclosure can be implemented.

As shown in FIG. 7, the network system 31 includes a computer 32, a network 33, a remote computer 34, a web server 35, a cloud storage server 36, and a computer server 37. Here, in the present embodiment, a plurality of instances are executed by one or more of the functional blocks shown in FIG. 7.

Further, FIG. 7 illustrates the detailed configuration of the computer 32. The functional blocks shown in the computer 32 are shown to establish exemplary functions, and the computer 32 is not limited to such a configuration. Further, although the detailed configurations of the remote computer 34, the web server 35, the cloud storage server 36, and the computer server 37 are not shown, they include configurations similar to the functional blocks shown in the computer 32.

The computer 32 can be a personal computer, desktop computer, laptop computer, tablet computer, netbook computer, personal digital assistant, smartphone, or other programmable electronic device capable of communicating with other devices on the network.

The computer 32 includes a bus 41, a processor 42, a memory 43, a non-volatile storage 44, a network interface 46, a peripheral device interface 47, and a display interface 48. In some embodiments, each of these functions is implemented in an individual electronic subsystem (an integrated circuit chip or a combination of chips and related devices); in other embodiments, some of the functions may be combined and mounted on a single chip (SoC (System on Chip)).

Bus 41 can adopt various proprietary or industry standard high-speed parallel or serial peripheral interconnection buses.

The processor 42 may employ one designed and / or manufactured as one or more single or multi-chip microprocessors.

The memory 43 and the non-volatile storage 44 are storage media that can be read by the computer 32. For example, the memory 43 can employ any suitable volatile storage device such as DRAM (Dynamic Random Access Memory) or SRAM (Static RAM). The non-volatile storage 44 can employ at least one of a flexible disk, a hard disk, an SSD (Solid State Drive), a ROM (Read Only Memory), an EPROM (Erasable and Programmable Read Only Memory), a flash memory, a compact disc (CD or CD-ROM), a DVD (Digital Versatile Disc), a card type memory, or a stick type memory.

In addition, the program 45 is stored in the non-volatile storage 44. Program 45 is, for example, a collection of machine-readable instructions and / or data used to create, manage, and control specific software functions. In a configuration in which the memory 43 is much faster than the non-volatile storage 44, the program 45 can be transferred from the non-volatile storage 44 to the memory 43 before being executed by the processor 42.

The computer 32 can communicate and interact with other computers over the network 33 via the network interface 46. The network 33 can be, for example, a LAN (Local Area Network), a WAN (Wide Area Network) such as the Internet, or a combination of LAN and WAN, and can include wired, wireless, or optical fiber connections. In general, the network 33 consists of any combination of connections and protocols that support communication between two or more computers and related devices.

The peripheral device interface 47 can input / output data to / from other devices that can be locally connected to the computer 32. For example, the peripheral interface 47 provides a connection to the external device 51. The external device 51 includes a keyboard, mouse, keypad, touch screen, and / or other suitable input device. The external device 51 may also include, for example, a thumb drive, a portable optical or magnetic disk, and a portable computer readable storage medium such as a memory card.

In embodiments of the present disclosure, for example, the software and data used to implement Program 45 may be stored on such a portable computer readable storage medium. In such an embodiment, the software may be loaded directly into the non-volatile storage 44 or into the memory 43 via the peripheral interface 47. Peripheral device interface 47 may use an industry standard such as RS-232 or USB (Universal Serial Bus) for connection with the external device 51.

The display interface 48 can connect the computer 32 to the display 52, and the display 52 can be used to present a command line or graphical user interface to the user of the computer 32. For example, for the display interface 48, industry standards such as VGA (Video Graphics Array), DVI (Digital Visual Interface), DisplayPort, and HDMI (High-Definition Multimedia Interface) (registered trademark) can be adopted.

<Configuration example of image coding device>
FIG. 8 shows the configuration of an embodiment of an image coding device as an image processing device to which the present disclosure is applied.

The image coding device 60 shown in FIG. 8 encodes image data using prediction processing. Here, as the coding method, for example, a HEVC (High Efficiency Video Coding) method or the like is used.

The image coding device 60 of FIG. 8 has a screen rearrangement buffer 61, a control unit 62, a calculation unit 63, an orthogonal transform unit 64, a quantization unit 65, a lossless coding unit 66, and a storage buffer 67. Further, the image coding device 60 has an inverse quantization unit 68, an inverse orthogonal transform unit 69, an arithmetic unit 70, a deblocking filter 71, an adaptive offset filter 72, an adaptive loop filter 73, a frame memory 74, a selection unit 75, an intra prediction unit 76, a motion prediction / compensation unit 77, a prediction image selection unit 78, and a rate control unit 79.

The screen rearrangement buffer 61 stores the input image data (Picture(s)) and rearranges the stored frames from display order into the frame order for coding according to the GOP (Group of Pictures) structure. The screen rearrangement buffer 61 outputs the images in the rearranged frame order to the calculation unit 63, the intra prediction unit 76, and the motion prediction / compensation unit 77 via the control unit 62.
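The rearrangement from display order to coding order can be sketched as follows. This is a minimal illustration, not the method of this disclosure: the `Frame` tuple, the function name `display_to_coding_order`, and the rule used (each B picture is held back until the next I or P picture that follows it in display order has been emitted) are assumptions chosen to show one common GOP arrangement.

```python
from typing import List, Tuple

# (display index, picture type "I", "P" or "B"); a hypothetical representation
Frame = Tuple[int, str]


def display_to_coding_order(frames: List[Frame]) -> List[Frame]:
    """Reorder frames so that every B picture follows the next reference picture."""
    coded: List[Frame] = []
    pending_b: List[Frame] = []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)   # hold until the next reference picture arrives
        else:
            coded.append(frame)       # I or P picture: emit the reference first
            coded.extend(pending_b)   # then the B pictures that were waiting for it
            pending_b.clear()
    coded.extend(pending_b)           # trailing B pictures with no later reference
    return coded


display = [(0, "I"), (1, "B"), (2, "B"), (3, "P"), (4, "B"), (5, "B"), (6, "P")]
coding = display_to_coding_order(display)
```

Sorting the coded frames by their display index restores the original order, which is the inverse operation performed on the decoding side.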

The control unit 62 controls reading of an image from the screen rearrangement buffer 61.

The calculation unit 63 subtracts the prediction image supplied from the intra prediction unit 76 or the motion prediction / compensation unit 77 via the prediction image selection unit 78 from the image output from the control unit 62, and outputs the resulting difference information to the orthogonal conversion unit 64.

For example, in the case of an image to be intra-encoded, the calculation unit 63 subtracts the prediction image supplied from the intra prediction unit 76 from the image output from the control unit 62. Further, for example, in the case of an image to be inter-encoded, the calculation unit 63 subtracts the predicted image supplied from the motion prediction / compensation unit 77 from the image output from the control unit 62.

The orthogonal conversion unit 64 performs an orthogonal conversion such as a discrete cosine transform or a Karhunen-Loève transform on the difference information supplied from the calculation unit 63, and supplies the resulting conversion coefficients to the quantization unit 65.
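As an illustration of an orthogonal conversion, the sketch below implements a one-dimensional orthonormal DCT-II and its inverse in floating point. It is only a sketch under stated assumptions: the function names `dct` and `idct` are hypothetical, and practical codecs such as HEVC apply two-dimensional integer approximations to blocks rather than this floating-point form.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(samples: List[float]) -> List[float]:
    """Forward 1-D orthonormal DCT-II."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs: List[float]) -> List[float]:
    """Inverse of dct(); because the transform is orthogonal, this is its transpose."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
        for i in range(n)
    ]


residual_row = [1.0, 2.0, 3.0, 4.0]
restored_row = idct(dct(residual_row))
```

Because the transform is orthogonal, applying `idct` to the output of `dct` recovers the input up to floating-point rounding, which is the property the inverse orthogonal conversion relies on.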

The quantization unit 65 quantizes the conversion coefficients output by the orthogonal conversion unit 64. The quantization unit 65 supplies the quantized conversion coefficients to the lossless coding unit 66.

The lossless coding unit 66 applies lossless coding such as variable length coding and arithmetic coding to the quantized conversion coefficient.

The lossless coding unit 66 acquires parameters such as information indicating the intra prediction mode from the intra prediction unit 76, and acquires parameters such as information indicating the inter prediction mode and motion vector information from the motion prediction / compensation unit 77.

The lossless coding unit 66 encodes the quantized conversion coefficients, and also encodes each acquired parameter (syntax element) and makes it part of the header information of the coded data (multiplexes it). The lossless coding unit 66 supplies the coded data obtained by the coding to the storage buffer 67, where it is stored.

For example, in the lossless coding unit 66, lossless coding processing such as variable length coding or arithmetic coding is performed. Examples of variable-length coding include CAVLC (Context-Adaptive Variable Length Coding). Examples of arithmetic coding include CABAC (Context-Adaptive Binary Arithmetic Coding).

The storage buffer 67 temporarily holds the coded stream (Encoded Data) supplied from the lossless coding unit 66 and, at a predetermined timing, outputs it as a coded image to, for example, a recording device or a transmission line (not shown) in the subsequent stage. That is, the storage buffer 67 is also a transmission unit that transmits the coded stream.

Further, the conversion coefficients quantized by the quantization unit 65 are also supplied to the inverse quantization unit 68. The inverse quantization unit 68 inversely quantizes the quantized conversion coefficients by a method corresponding to the quantization by the quantization unit 65. The inverse quantization unit 68 supplies the obtained conversion coefficients to the inverse orthogonal conversion unit 69.
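The relationship between quantization and the corresponding inverse quantization can be sketched with a uniform scalar quantizer. The step size `step` and the function names are assumptions for illustration; the actual scheme derives its scaling from the quantization parameter QP and is not reproduced here.

```python
from typing import List


def quantize(coeffs: List[int], step: int) -> List[int]:
    """Map each coefficient to the nearest integer level (ties rounded away from zero)."""
    levels = []
    for c in coeffs:
        magnitude = (abs(c) + step // 2) // step
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse quantization: scale the levels back by the same step size."""
    return [level * step for level in levels]


coeffs = [100, -37, 4, 0]
levels = quantize(coeffs, step=10)
restored = dequantize(levels, step=10)
```

The restored coefficients differ from the originals by at most half a step, which is why the coding side decodes its own output locally: the prediction reference must match what the decoding side will reconstruct.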

The inverse orthogonal conversion unit 69 performs inverse orthogonal conversion of the supplied conversion coefficient by a method corresponding to the orthogonal conversion processing by the orthogonal conversion unit 64. The inverse orthogonally converted output (restored difference information) is supplied to the calculation unit 70.

The calculation unit 70 adds the prediction image supplied from the intra prediction unit 76 or the motion prediction / compensation unit 77 via the prediction image selection unit 78 to the inverse orthogonal conversion result supplied from the inverse orthogonal conversion unit 69, that is, to the restored difference information, and obtains a locally decoded image (decoded image).

For example, when the difference information corresponds to an image to be intra-encoded, the calculation unit 70 adds the prediction image supplied from the intra prediction unit 76 to the difference information. Further, for example, when the difference information corresponds to an image to be inter-encoded, the calculation unit 70 adds the prediction image supplied from the motion prediction / compensation unit 77 to the difference information.
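The subtraction performed by the calculation unit 63 and the addition performed by the calculation unit 70 can be sketched as a pair of block operations. Representing a block as a list of rows, the 8-bit clipping range, and the function names are assumptions made for this example.

```python
from typing import List

Block = List[List[int]]


def subtract_prediction(original: Block, prediction: Block) -> Block:
    """Difference information: original sample minus predicted sample."""
    return [[o - p for o, p in zip(orow, prow)] for orow, prow in zip(original, prediction)]


def add_prediction(residual: Block, prediction: Block, max_value: int = 255) -> Block:
    """Locally decoded block: residual plus prediction, clipped to the sample range."""
    return [
        [min(max(r + p, 0), max_value) for r, p in zip(rrow, prow)]
        for rrow, prow in zip(residual, prediction)
    ]


original = [[10, 20], [30, 40]]
prediction = [[8, 25], [30, 35]]
residual = subtract_prediction(original, prediction)
decoded = add_prediction(residual, prediction)
```

Without quantization in between, the decoded block equals the original; with quantization, the residual passed to `add_prediction` is only an approximation, so the decoded block is too.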

The decoded image that is the result of the addition is supplied to the deblocking filter 71 and the frame memory 74.

The deblocking filter 71 suppresses block distortion of the decoded image by appropriately performing deblocking filter processing on the image from the calculation unit 70, and supplies the filter processing result to the adaptive offset filter 72. The deblocking filter 71 has parameters β and Tc obtained based on the quantization parameter QP. The parameters β and Tc are threshold values (parameters) used for determining the deblocking filter.

Note that the parameters β and Tc possessed by the deblocking filter 71 are extended from β and Tc defined by the HEVC method. Each offset of the parameters β and Tc is encoded by the lossless coding unit 66 as a parameter of the deblocking filter and transmitted to the image decoding device 80 of FIG. 10 described later.

The adaptive offset filter 72 mainly performs an offset filter (SAO: Sample adaptive offset) process that suppresses ringing on the image after filtering by the deblocking filter 71.

There are nine types of offset filter in total: two types of band offset, six types of edge offset, and no offset. The adaptive offset filter 72 applies filter processing to the image after filtering by the deblocking filter 71, using a quad-tree structure in which the type of offset filter is determined for each divided region and an offset value for each divided region. The adaptive offset filter 72 supplies the filtered image to the adaptive loop filter 73.
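A band offset, one of the offset filter types mentioned above, can be sketched as follows for a single divided region. The choice of 32 bands for 8-bit samples (band index = sample value shifted right by 3), the `offsets` mapping, and the function name are assumptions for illustration; the edge offset types and the quad-tree signalling are not shown.

```python
from typing import Dict, List


def apply_band_offset(samples: List[int], offsets: Dict[int, int], bit_depth: int = 8) -> List[int]:
    """Add a per-band offset to each sample; bands without an entry are left unchanged."""
    shift = bit_depth - 5              # 32 bands assumed: the top five bits select the band
    max_value = (1 << bit_depth) - 1
    out = []
    for s in samples:
        band = s >> shift
        adjusted = s + offsets.get(band, 0)
        out.append(min(max(adjusted, 0), max_value))   # keep the result in the sample range
    return out


region = [0, 7, 8, 255]
filtered = apply_band_offset(region, {0: 2, 1: -1, 31: -3})
```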

In the image coding apparatus 60, the quad-tree structure and the offset value for each divided region are calculated and used by the adaptive offset filter 72. The calculated quad-tree structure and the offset value for each divided region are encoded by the lossless coding unit 66 as adaptive offset parameters and transmitted to the image decoding device 80 of FIG. 10 to be described later.

The adaptive loop filter 73 performs adaptive loop filter (ALF: Adaptive Loop Filter) processing for each processing unit using the filter coefficient on the image after filtering by the adaptive offset filter 72. In the adaptive loop filter 73, for example, a two-dimensional Wiener filter is used as the filter. Of course, a filter other than the Wiener filter may be used. The adaptive loop filter 73 supplies the filter processing result to the frame memory 74.

Although not shown in the example of FIG. 8, in the image coding device 60, the filter coefficients are calculated and used by the adaptive loop filter 73 for each processing unit so as to minimize the residual with respect to the original image from the screen rearrangement buffer 61. The calculated filter coefficients are encoded by the lossless coding unit 66 as adaptive loop filter parameters and transmitted to the image decoding device 80 of FIG. 10 described later.

The frame memory 74 outputs the stored reference image to the intra prediction unit 76 or the motion prediction / compensation unit 77 via the selection unit 75 at a predetermined timing.

For example, in the case of an image to be intra-encoded, the frame memory 74 supplies the reference image to the intra prediction unit 76 via the selection unit 75. Further, for example, in the case of an image to be inter-encoded, the frame memory 74 supplies the reference image to the motion prediction / compensation unit 77 via the selection unit 75.

When the reference image supplied from the frame memory 74 is an image to be intra-encoded, the selection unit 75 supplies the reference image to the intra prediction unit 76. Further, when the reference image supplied from the frame memory 74 is an image to be inter-encoded, the selection unit 75 supplies the reference image to the motion prediction / compensation unit 77.

The intra prediction unit 76 performs intra prediction (in-screen prediction) to generate a prediction image using the pixel values in the screen. The intra prediction unit 76 performs intra prediction in a plurality of modes (intra prediction mode).

The intra prediction unit 76 generates prediction images in all intra prediction modes, evaluates each prediction image, and selects the optimum mode. When the optimum intra prediction mode is selected, the intra prediction unit 76 supplies the prediction image generated in the optimum mode to the calculation unit 63 and the calculation unit 70 via the prediction image selection unit 78.

Further, as described above, the intra prediction unit 76 appropriately supplies parameters such as intra prediction mode information indicating the adopted intra prediction mode to the lossless coding unit 66.

The motion prediction / compensation unit 77 performs motion prediction on the image to be inter-encoded, using the input image supplied from the screen rearrangement buffer 61 and the reference image supplied from the frame memory 74 via the selection unit 75. Further, the motion prediction / compensation unit 77 performs motion compensation processing according to the motion vector detected by the motion prediction, and generates a prediction image (inter-prediction image information).

The motion prediction / compensation unit 77 performs inter-prediction processing in all candidate inter-prediction modes and generates a prediction image. The motion prediction / compensation unit 77 supplies the generated prediction image to the calculation unit 63 and the calculation unit 70 via the prediction image selection unit 78. Further, the motion prediction / compensation unit 77 supplies parameters such as inter-prediction mode information indicating the adopted inter-prediction mode and motion vector information indicating the calculated motion vector to the lossless coding unit 66.

Here, in the motion prediction / compensation unit 77, as described above with reference to FIG. 3, the SAD value is supplied from the DMVR processing unit 11 to the BDOF processing unit 12, and the SAD value is reused for the early termination determination of the BDOF processing.
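The reuse of the SAD value can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of FIG. 3: the `candidates` mapping from a search offset to a pair of prediction blocks, the function names, and the rule "terminate BDOF early when the SAD is below a threshold" with a caller-supplied `threshold` are all hypothetical. The point shown is only that the SAD value computed during the motion vector search is passed on, so the BDOF side does not compute it again.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]
Offset = Tuple[int, int]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def dmvr_search(candidates: Dict[Offset, Tuple[Block, Block]]) -> Tuple[Offset, int]:
    """Return the candidate offset with the smallest SAD, together with that SAD value."""
    costs = {offset: sad(pred_l0, pred_l1) for offset, (pred_l0, pred_l1) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


def bdof_terminates_early(sad_from_dmvr: int, threshold: int) -> bool:
    """Early termination decision that reuses the SAD value instead of recomputing it."""
    return sad_from_dmvr < threshold


candidates = {
    (0, 0): ([[10, 12], [14, 16]], [[11, 12], [13, 18]]),
    (1, 0): ([[10, 12], [14, 16]], [[10, 12], [14, 17]]),
}
best_offset, best_sad = dmvr_search(candidates)
skip_bdof = bdof_terminates_early(best_sad, threshold=2)
```

In this arrangement `sad` is called only inside the search; the early termination check is a single comparison on a value that already exists.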

The prediction image selection unit 78 supplies the output of the intra prediction unit 76 to the calculation unit 63 and the calculation unit 70 in the case of an image to be intra-encoded, and supplies the output of the motion prediction / compensation unit 77 to the calculation unit 63 and the calculation unit 70 in the case of an image to be inter-encoded.

The rate control unit 79 controls the rate of the quantization operation of the quantization unit 65 based on the compressed image stored in the storage buffer 67 so that overflow or underflow does not occur.
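The disclosure does not specify the control law used by the rate control unit 79. Purely as an illustration of buffer-based rate control, the sketch below raises the quantization parameter when the buffer is nearly full and lowers it when the buffer is nearly empty. The function name, the 80 % and 20 % fullness thresholds, and the step of one QP are assumptions; the QP range of 0 to 51 corresponds to 8-bit HEVC.

```python
def update_qp(qp: int, buffer_bits: int, buffer_capacity: int,
              high: float = 0.8, low: float = 0.2) -> int:
    """Adjust QP by one step according to buffer fullness (hypothetical rule)."""
    fullness = buffer_bits / buffer_capacity
    if fullness > high:
        qp += 1        # coarser quantization produces fewer bits: guards against overflow
    elif fullness < low:
        qp -= 1        # finer quantization produces more bits: guards against underflow
    return min(max(qp, 0), 51)


next_qp = update_qp(30, buffer_bits=900, buffer_capacity=1000)
```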

<Operation of image coding device>
With reference to FIG. 9, the flow of the coding process executed by the image coding apparatus 60 as described above will be described.

In step S31, the screen rearrangement buffer 61 stores the input images and rearranges the images from the display order to the encoding order.

When the image to be processed supplied from the screen rearrangement buffer 61 is an image of a block to be intra-processed, the decoded image to be referenced is read from the frame memory 74 and supplied to the intra prediction unit 76 via the selection unit 75.

Based on these images, in step S32, the intra prediction unit 76 intrapredicts the pixels of the block to be processed in all the candidate intra prediction modes. As the decoded pixel to be referred to, a pixel that has not been filtered by the deblocking filter 71 is used.

By this process, intra-prediction is performed in all the candidate intra-prediction modes, and the cost function value is calculated for all the candidate intra-prediction modes. Then, the optimum intra prediction mode is selected based on the calculated cost function value, and the prediction image generated by the intra prediction in the optimum intra prediction mode and the cost function value thereof are supplied to the prediction image selection unit 78.

When the image to be processed supplied from the screen rearrangement buffer 61 is an image to be inter-processed, the image to be referenced is read from the frame memory 74 and supplied to the motion prediction / compensation unit 77 via the selection unit 75. Based on these images, in step S33, the motion prediction / compensation unit 77 performs motion prediction / compensation processing. Here, as described above with reference to FIG. 3, the motion prediction / compensation unit 77 supplies the SAD value from the DMVR processing unit 11 to the BDOF processing unit 12 and reuses the SAD value for the early termination determination of the BDOF processing, so that the complexity of calculation can be reduced as compared with the conventional case.

By this processing, motion prediction processing is performed in all candidate inter-prediction modes, cost function values are calculated for all candidate inter-prediction modes, and the optimum inter-prediction mode is determined based on the calculated cost function values. Then, the prediction image generated in the optimum inter-prediction mode and its cost function value are supplied to the prediction image selection unit 78.

In step S34, the prediction image selection unit 78 determines one of the optimum intra prediction mode and the optimum inter prediction mode as the optimum prediction mode based on the cost function values output from the intra prediction unit 76 and the motion prediction / compensation unit 77. Then, the prediction image selection unit 78 selects the prediction image of the determined optimum prediction mode and supplies it to the calculation units 63 and 70. This prediction image is used for the calculations of steps S35 and S40 described later.
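The selection in steps S32 to S34 amounts to taking the minimum of the cost function values, first within each prediction type and then between the two winners. The sketch below assumes the cost function values have already been computed; the mode names, the dictionaries, and the tie-breaking in favour of intra prediction are hypothetical.

```python
from typing import Dict, Tuple


def best_mode(costs: Dict[str, float]) -> Tuple[str, float]:
    """Return the mode with the smallest cost function value and that value."""
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


def select_prediction(intra_costs: Dict[str, float],
                      inter_costs: Dict[str, float]) -> Tuple[str, str, float]:
    """Choose between the optimum intra mode and the optimum inter mode."""
    intra_mode, intra_cost = best_mode(intra_costs)
    inter_mode, inter_cost = best_mode(inter_costs)
    if intra_cost <= inter_cost:          # tie-break toward intra is an arbitrary choice
        return "intra", intra_mode, intra_cost
    return "inter", inter_mode, inter_cost


choice = select_prediction(
    {"DC": 120.0, "planar": 95.5},
    {"merge": 80.0, "amvp": 88.0},
)
```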

The selection information of this prediction image is supplied to the intra prediction unit 76 or the motion prediction / compensation unit 77. When the prediction image of the optimum intra prediction mode is selected, the intra prediction unit 76 supplies information indicating the optimum intra prediction mode (that is, parameters related to the intra prediction) to the lossless coding unit 66.

When the prediction image of the optimum inter prediction mode is selected, the motion prediction / compensation unit 77 outputs information indicating the optimum inter prediction mode and information according to the optimum inter prediction mode (that is, parameters related to motion prediction) to the lossless coding unit 66. Examples of the information according to the optimum inter prediction mode include motion vector information and reference frame information.

In step S35, the calculation unit 63 calculates the difference between the images sorted in step S31 and the predicted image selected in step S34. The predicted image is supplied to the calculation unit 63 from the motion prediction / compensation unit 77 for inter-prediction and from the intra-prediction unit 76 for intra-prediction via the prediction image selection unit 78.

The amount of difference data is smaller than that of the original image data. Therefore, the amount of data can be compressed as compared with the case where the image is encoded as it is.

In step S36, the orthogonal conversion unit 64 orthogonally converts the difference information supplied from the calculation unit 63. Specifically, orthogonal transforms such as the discrete cosine transform and the Karhunen-Loève transform are performed, and the transform coefficients are output.

In step S37, the quantization unit 65 quantizes the conversion coefficients. In this quantization, the rate is controlled as described in the process of step S47 described later.

The difference information quantized as described above is locally decoded as follows. That is, in step S38, the inverse quantization unit 68 inversely quantizes the conversion coefficients quantized by the quantization unit 65 with characteristics corresponding to the characteristics of the quantization unit 65. In step S39, the inverse orthogonal conversion unit 69 inversely orthogonally converts the conversion coefficients inversely quantized by the inverse quantization unit 68 with characteristics corresponding to the characteristics of the orthogonal conversion unit 64.

In step S40, the calculation unit 70 adds the prediction image input via the prediction image selection unit 78 to the locally decoded difference information, and generates a locally decoded image (an image corresponding to the input to the calculation unit 63).

In step S41, the deblocking filter 71 performs a deblocking filter process on the image output from the calculation unit 70. At this time, the parameters β and Tc extended from β and Tc defined by the HEVC method are used as the threshold value for the determination regarding the deblocking filter. The filtered image from the deblocking filter 71 is output to the adaptive offset filter 72.

Note that each offset of the parameters β and Tc used in the deblocking filter 71, which is input by the user operating an operation unit or the like, is supplied to the lossless coding unit 66 as a parameter of the deblocking filter.

In step S42, the adaptive offset filter 72 performs adaptive offset filter processing. By this processing, filter processing is applied to the image after filtering by the deblocking filter 71, using the quad-tree structure in which the type of offset filter is determined for each divided region and the offset value for each divided region. The filtered image is supplied to the adaptive loop filter 73.

The determined quad-tree structure and the offset value for each divided region are supplied to the lossless coding unit 66 as an adaptive offset parameter.

In step S43, the adaptive loop filter 73 performs adaptive loop filter processing on the image filtered by the adaptive offset filter 72. For example, the image after filtering by the adaptive offset filter 72 is filtered for each processing unit by using the filter coefficient, and the filtering result is supplied to the frame memory 74.

In step S44, the frame memory 74 stores the filtered image. Images not filtered by the deblocking filter 71, the adaptive offset filter 72, and the adaptive loop filter 73 are also supplied and stored in the frame memory 74 from the calculation unit 70.

On the other hand, the conversion coefficient quantized in step S37 described above is also supplied to the lossless coding unit 66. In step S45, the lossless coding unit 66 encodes the quantized conversion coefficient output from the quantizing unit 65 and each of the supplied parameters. That is, the difference image is losslessly coded and compressed by variable length coding, arithmetic coding, and the like. Here, examples of the encoded parameters include deblocking filter parameters, adaptive offset filter parameters, adaptive loop filter parameters, quantization parameters, motion vector information and reference frame information, prediction mode information, and the like.

In step S46, the storage buffer 67 stores the encoded difference image (that is, the coded stream) as a compressed image. The compressed image stored in the storage buffer 67 is appropriately read out and transmitted to the decoding side via the transmission line.

In step S47, the rate control unit 79 controls the rate of the quantization operation of the quantization unit 65 based on the compressed image stored in the storage buffer 67 so that overflow or underflow does not occur.

When the process of step S47 is completed, the coding process is completed.

<Configuration example of image decoding device>
FIG. 10 shows the configuration of an embodiment of an image decoding device as an image processing device to which the present disclosure is applied. The image decoding device 80 shown in FIG. 10 is a decoding device corresponding to the image coding device 60 of FIG. 8.

The coded stream (Encoded Data) encoded by the image coding device 60 is transmitted via a predetermined transmission line to the image decoding device 80 corresponding to the image coding device 60, and is decoded.

As shown in FIG. 10, the image decoding device 80 includes a storage buffer 81, a lossless decoding unit 82, an inverse quantization unit 83, an inverse orthogonal conversion unit 84, a calculation unit 85, a deblocking filter 86, an adaptive offset filter 87, an adaptive loop filter 88, a screen rearrangement buffer 89, a frame memory 90, a selection unit 91, an intra prediction unit 92, a motion prediction / compensation unit 93, and a selection unit 94.

The storage buffer 81 is also a receiving unit that receives the transmitted coded data. The storage buffer 81 receives the transmitted coded data and stores it. This coded data has been encoded by the image coding device 60. The lossless decoding unit 82 decodes the coded data read from the storage buffer 81 at a predetermined timing by a method corresponding to the coding method of the lossless coding unit 66 of FIG. 8.

The lossless decoding unit 82 supplies parameters such as decoded information indicating the intra prediction mode to the intra prediction unit 92, and supplies parameters such as information indicating the inter prediction mode and motion vector information to the motion prediction / compensation unit 93. Further, the lossless decoding unit 82 supplies the decoded deblocking filter parameters to the deblocking filter 86, and supplies the decoded adaptive offset parameters to the adaptive offset filter 87.

The inverse quantization unit 83 inversely quantizes the coefficient data (quantization coefficients) obtained by decoding by the lossless decoding unit 82 by a method corresponding to the quantization method of the quantization unit 65 of FIG. 8. That is, the inverse quantization unit 83 inversely quantizes the quantization coefficients by the same method as the inverse quantization unit 68 of FIG. 8, using the quantization parameters supplied from the image coding device 60.

The inverse quantization unit 83 supplies the inversely quantized coefficient data, that is, the orthogonal conversion coefficients, to the inverse orthogonal conversion unit 84. The inverse orthogonal conversion unit 84 inversely orthogonally converts the orthogonal conversion coefficients by a method corresponding to the orthogonal conversion method of the orthogonal conversion unit 64 of FIG. 8, and obtains decoded residual data corresponding to the residual data before the orthogonal conversion in the image coding device 60.

The decoded residual data obtained by the inverse orthogonal conversion is supplied to the calculation unit 85. Further, the calculation unit 85 is supplied with a prediction image from the intra prediction unit 92 or the motion prediction / compensation unit 93 via the selection unit 94.

The calculation unit 85 adds the decoded residual data and the predicted image, and obtains the decoded image data corresponding to the image data before the predicted image is subtracted by the calculation unit 63 of the image coding device 60. The calculation unit 85 supplies the decoded image data to the deblocking filter 86.

The deblocking filter 86 suppresses block distortion of the decoded image by appropriately performing deblocking filter processing on the image from the calculation unit 85, and supplies the filter processing result to the adaptive offset filter 87. The deblocking filter 86 is basically configured in the same manner as the deblocking filter 71 of FIG. 8. That is, the deblocking filter 86 has parameters β and Tc obtained based on the quantization parameter. The parameters β and Tc are threshold values used for determinations regarding the deblocking filter.

Note that the parameters β and Tc of the deblocking filter 86 are extended from β and Tc specified by the HEVC method. Each offset of the parameters β and Tc of the deblocking filter encoded by the image coding device 60 is received by the image decoding device 80 as a parameter of the deblocking filter, decoded by the lossless decoding unit 82, and used by the deblocking filter 86.

The adaptive offset filter 87 mainly performs offset filter (SAO) processing that suppresses ringing on the image after filtering by the deblocking filter 86.

The adaptive offset filter 87 applies filter processing to the image after filtering by the deblocking filter 86, using a quad-tree structure in which the type of offset filter is determined for each divided region and an offset value for each divided region. The adaptive offset filter 87 supplies the filtered image to the adaptive loop filter 88.

The quad-tree structure and the offset value for each divided region are calculated by the adaptive offset filter 72 of the image coding device 60, and are encoded and transmitted as adaptive offset parameters. Then, the quad-tree structure and the offset value for each divided region encoded by the image coding device 60 are received by the image decoding device 80 as adaptive offset parameters, decoded by the lossless decoding unit 82, and used by the adaptive offset filter 87.

The adaptive loop filter 88 filters the image filtered by the adaptive offset filter 87 for each processing unit using the filter coefficients, and supplies the filter processing result to the frame memory 90 and the screen rearrangement buffer 89.

Although not shown in the example of FIG. 10, in the image decoding device 80, the filter coefficients that were calculated for each LCU by the adaptive loop filter 73 of the image coding device 60 and encoded and transmitted as adaptive loop filter parameters are decoded by the lossless decoding unit 82 and used.

The screen rearrangement buffer 89 rearranges the images, and the images (Decoded Picture (s)) are output to a display (not shown) and displayed. That is, the order of the frames rearranged for the coding order by the screen rearrangement buffer 61 of FIG. 8 is rearranged in the original display order.

The output of the adaptive loop filter 88 is further supplied to the frame memory 90.

The frame memory 90, the selection unit 91, the intra prediction unit 92, the motion prediction / compensation unit 93, and the selection unit 94 correspond to the frame memory 74, the selection unit 75, the intra prediction unit 76, the motion prediction / compensation unit 77, and the prediction image selection unit 78 of the image coding device 60, respectively.

The selection unit 91 reads the image to be inter-processed and the image to be referenced from the frame memory 90 and supplies them to the motion prediction / compensation unit 93. Further, the selection unit 91 reads the image used for intra prediction from the frame memory 90 and supplies it to the intra prediction unit 92.

Information indicating the intra prediction mode and the like, obtained by decoding the header information, is appropriately supplied from the lossless decoding unit 82 to the intra prediction unit 92. Based on this information, the intra prediction unit 92 generates a prediction image from the reference image acquired from the frame memory 90, and supplies the generated prediction image to the selection unit 94.

Information obtained by decoding the header information (prediction mode information, motion vector information, reference frame information, flags, various parameters, and the like) is supplied from the lossless decoding unit 82 to the motion prediction / compensation unit 93.

The motion prediction / compensation unit 93 generates a prediction image from the reference image acquired from the frame memory 90 based on the information supplied from the lossless decoding unit 82, and supplies the generated prediction image to the selection unit 94. Here, in the motion prediction / compensation unit 93, as described above with reference to FIG. 3, the SAD value is supplied from the DMVR processing unit 11 to the BDOF processing unit 12, and the SAD value is reused for the early termination determination of the BDOF processing.

The selection unit 94 selects the prediction image generated by the motion prediction / compensation unit 93 or the intra prediction unit 92 and supplies it to the calculation unit 85.

<Operation of image decoding device>
An example of the flow of the decoding process executed by the image decoding apparatus 80 as described above will be described with reference to FIG.

When the decoding process is started, in step S51, the storage buffer 81 receives the transmitted coded stream (data) and stores it. In step S52, the lossless decoding unit 82 decodes the coded data supplied from the storage buffer 81. The I pictures, P pictures, and B pictures encoded by the lossless coding unit 66 of FIG. 8 are decoded.

Prior to decoding the picture, parameter information such as motion vector information, reference frame information, and prediction mode information (intra prediction mode or inter prediction mode) is also decoded.

When the prediction mode information is the intra prediction mode information, the prediction mode information is supplied to the intra prediction unit 92. When the prediction mode information is the inter-prediction mode information, the motion vector information corresponding to the prediction mode information is supplied to the motion prediction / compensation unit 93. The parameters of the deblocking filter and the adaptive offset parameters are also decoded and supplied to the deblocking filter 86 and the adaptive offset filter 87, respectively.

In step S53, the intra prediction unit 92 or the motion prediction / compensation unit 93 performs prediction image generation processing according to the prediction mode information supplied from the lossless decoding unit 82. Here, as described above with reference to FIG. 3, the motion prediction / compensation unit 93 supplies the SAD value from the DMVR processing unit 11 to the BDOF processing unit 12 and reuses the SAD value for the early termination determination of the BDOF processing, so that the complexity of calculation can be reduced as compared with the conventional case.

That is, when the intra prediction mode information is supplied from the lossless decoding unit 82, the intra prediction unit 92 generates an intra prediction image in the intra prediction mode. When the inter prediction mode information is supplied from the lossless decoding unit 82, the motion prediction / compensation unit 93 performs motion prediction / compensation processing in the inter prediction mode to generate an inter prediction image.

By this process, the prediction image (intra prediction image) generated by the intra prediction unit 92 or the prediction image (inter prediction image) generated by the motion prediction / compensation unit 93 is supplied to the selection unit 94.

In step S54, the selection unit 94 selects a predicted image. That is, the prediction image generated by the intra prediction unit 92 or the prediction image generated by the motion prediction / compensation unit 93 is supplied. Therefore, the supplied predicted image is selected and supplied to the calculation unit 85, and is added to the output of the inverse orthogonal conversion unit 84 in step S57 described later.

The conversion coefficient decoded by the reversible decoding unit 82 in step S52 described above is also supplied to the inverse quantization unit 83. In step S55, the inverse quantization unit 83 dequantizes the conversion coefficient decoded by the reversible decoding unit 82 with a characteristic corresponding to the characteristic of the quantization unit 65 of FIG. 8.

In step S56, the inverse orthogonal conversion unit 84 performs inverse orthogonal conversion on the conversion coefficient inversely quantized by the inverse quantization unit 83, with a characteristic corresponding to the characteristic of the orthogonal conversion unit 64 of FIG. 8. As a result, the difference information corresponding to the input of the orthogonal conversion unit 64 (the output of the calculation unit 63) of FIG. 8 is decoded.

In step S57, the calculation unit 85 adds the predicted image selected in the process of step S54 described above and input via the selection unit 94 to the difference information. This decodes the original image.
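The addition performed in step S57 can be sketched as follows. This is only an illustration: the function name is hypothetical, and the clipping to an 8-bit sample range is an assumption that the text above does not state.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add the decoded difference information to the predicted image.
    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]
```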

In step S58, the deblocking filter 86 performs a deblocking filter process on the image output from the calculation unit 85. At this time, the parameters β and Tc extended from β and Tc defined by the HEVC method are used as the threshold value for the determination regarding the deblocking filter. The filtered image from the deblocking filter 86 is output to the adaptive offset filter 87. In the deblocking filter processing, the offsets of the parameters β and Tc of the deblocking filter supplied from the reversible decoding unit 82 are also used.

In step S59, the adaptive offset filter 87 performs adaptive offset filter processing. By this processing, filter processing is performed on the image filtered by the deblocking filter 86, using the quad-tree structure in which the type of the offset filter is determined for each divided area, and the offset value for each divided area. The filtered image is supplied to the adaptive loop filter 88.

In step S60, the adaptive loop filter 88 performs adaptive loop filter processing on the image filtered by the adaptive offset filter 87. The adaptive loop filter 88 performs filter processing for each processing unit on the input image using the filter coefficient calculated for each processing unit, and supplies the filter processing result to the screen sorting buffer 89 and the frame memory 90.

In step S61, the frame memory 90 stores the filtered image.

In step S62, the screen sorting buffer 89 sorts the images after the adaptive loop filter 88. That is, the order of the frames sorted for coding by the screen sort buffer 61 of the image coding device 60 is rearranged to the original display order. After that, the images sorted by the screen sorting buffer 89 are output to a display (not shown), and the images are displayed.
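The rearrangement in step S62 can be pictured with a small sketch. The picture representation (a tuple of a display index and a label) and the function name are hypothetical; the text above only states that the frames are returned from the coding order to the original display order.

```python
from typing import List, Tuple

# (display_index, label): display_index is an assumed per-picture field
Picture = Tuple[int, str]


def to_display_order(decoded: List[Picture]) -> List[Picture]:
    """Rearrange pictures from decoding order back to display order."""
    return sorted(decoded, key=lambda pic: pic[0])
```

For example, pictures decoded in the order I(0), P(3), B(1), B(2) would be output in the order I(0), B(1), B(2), P(3).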

When the process of step S62 is completed, the decoding process is completed.

<Computer configuration example>
Next, the series of processes (image processing method) described above can be performed by hardware or software. When a series of processes is performed by software, the programs constituting the software are installed on a general-purpose computer or the like.

FIG. 12 is a block diagram showing a configuration example of an embodiment of a computer on which a program for executing the above-mentioned series of processes is installed.

The program can be recorded in advance on the hard disk 105 or ROM 103 as a recording medium built in the computer.

Alternatively, the program can be stored (recorded) in the removable recording medium 111 driven by the drive 109. Such a removable recording medium 111 can be provided as so-called package software. Here, examples of the removable recording medium 111 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

In addition to installing the program on the computer from the removable recording medium 111 as described above, the program can be downloaded to the computer via a communication network or a broadcasting network and installed on the built-in hard disk 105. That is, for example, the program can be transferred wirelessly from a download site to the computer via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet.

The computer includes a CPU (Central Processing Unit) 102, and an input/output interface 110 is connected to the CPU 102 via a bus 101.

When the user inputs a command via the input / output interface 110 by operating the input unit 107 or the like, the CPU 102 executes the program stored in the ROM (Read Only Memory) 103 accordingly. Alternatively, the CPU 102 loads the program stored in the hard disk 105 into the RAM (Random Access Memory) 104 and executes it.

As a result, the CPU 102 performs processing according to the above-mentioned flowchart or processing performed according to the above-mentioned block diagram configuration. Then, the CPU 102 outputs the processing result from the output unit 106, transmits it from the communication unit 108, or records it on the hard disk 105, if necessary, via the input / output interface 110, for example.

The input unit 107 is composed of a keyboard, a mouse, a microphone, and the like. Further, the output unit 106 is composed of an LCD (Liquid Crystal Display), a speaker, or the like.

Here, in the present specification, the processing performed by the computer according to the program does not necessarily have to be performed in chronological order in the order described as the flowchart. That is, the processing performed by the computer according to the program also includes processing executed in parallel or individually (for example, parallel processing or processing by an object).

Further, the program may be processed by one computer (processor) or may be processed in a distributed manner by a plurality of computers. Further, the program may be transferred to a remote computer and executed.

Furthermore, in the present specification, the system means a set of a plurality of constituent elements (devices, modules (parts), etc.), and it does not matter whether or not all constituent elements are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a device in which a plurality of modules are housed in one housing are both systems.

Further, for example, the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units). Conversely, the configurations described above as a plurality of devices (or processing units) may be collectively configured as one device (or processing unit). Further, of course, a configuration other than the above may be added to the configuration of each device (or each processing unit). Further, if the configuration and operation of the entire system are substantially the same, a part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit).

Further, for example, this technology can have a cloud computing configuration in which one function is shared by a plurality of devices via a network and jointly processed.

Further, for example, the above-mentioned program can be executed in any device. In that case, the device may have necessary functions (functional blocks, etc.) so that necessary information can be obtained.

Further, for example, each step described in the above flowchart can be executed by one device or can be shared and executed by a plurality of devices. Further, when a plurality of processes are included in one step, the plurality of processes included in the one step can be executed by one device or shared by a plurality of devices. In other words, a plurality of processes included in one step can be executed as processes of a plurality of steps. On the contrary, the processes described as a plurality of steps can be collectively executed as one step.

In the program executed by the computer, the processing of the steps describing the program may be executed in chronological order according to the order described in this specification, or may be executed in parallel, or individually at a necessary timing such as when a call is made. That is, as long as there is no contradiction, the processing of each step may be executed in an order different from the above-mentioned order. Further, the processing of the steps describing this program may be executed in parallel with the processing of another program, or may be executed in combination with the processing of another program.

It should be noted that the plurality of present technologies described in the present specification can each be implemented independently as long as there is no contradiction. Of course, any plurality of the present technologies can be used in combination. For example, some or all of the techniques described in any of the embodiments may be combined with some or all of the techniques described in other embodiments. It is also possible to carry out a part or all of any of the above-mentioned techniques in combination with other techniques not described above.

<Example of configuration combination>
The present technology can also have the following configurations.
(1)
An image processing device including:
a DMVR processing unit that has a motion vector search unit that searches for a motion vector and performs DMVR (Decoder-side Motion Vector Refinement) processing; and
a BDOF processing unit that has a prediction unit that predicts a current prediction block using the motion vector and performs BDOF (Bi-Directional Optical Flow) processing,
wherein the SAD (Sum of Absolute Difference) value obtained when the motion vector search unit searches for the motion vector is supplied from the DMVR processing unit to the BDOF processing unit, and
the SAD value is used for an early termination determination for determining whether or not to terminate the BDOF processing early.
(2)
The image processing device according to (1) above, wherein when it is determined in the early termination determination that the BDOF processing is not terminated early, it is determined that the BDOF processing is not terminated early also in the early termination determination performed after the block is divided.
(3)
The image processing device according to (2) above, wherein when the block size after division of the block is 4 × 4, it is determined by the early termination determination that the BDOF processing is not terminated early.
(4)
An image processing method including, by an image processing device having a DMVR processing unit that has a motion vector search unit that searches for a motion vector and performs DMVR (Decoder-side Motion Vector Refinement) processing, and a BDOF processing unit that has a prediction unit that predicts a current prediction block using the motion vector and performs BDOF (Bi-Directional Optical Flow) processing:
supplying the SAD (Sum of Absolute Difference) value obtained when the motion vector search unit searches for the motion vector from the DMVR processing unit to the BDOF processing unit; and
using the SAD value for an early termination determination for determining whether or not to terminate the BDOF processing early.
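One possible reading of configurations (2) and (3) is sketched below. The function name, the threshold comparison, and the returned dictionary are assumptions made for illustration and are not taken from the present disclosure. The sketch shows that, once the block-level determination decides not to terminate early, the divided 4 × 4 blocks inherit that result and no SAD value has to be calculated for them.

```python
from typing import Dict, List, Union


def bdof_plan(block_sad: int, block_threshold: int,
              block_h: int, block_w: int,
              sub_size: int = 4) -> Dict[str, Union[bool, List[bool]]]:
    """Assumed decision flow for one prediction block.

    block_sad is the SAD value handed over from the motion vector search;
    block_threshold is a hypothetical threshold."""
    if block_sad < block_threshold:
        # Block-level early termination: BDOF is skipped for the whole block.
        return {"skip_block": True, "skip_subblocks": []}
    # Not terminated early at block level: every divided block is also
    # treated as "not terminated early", without any per-sub-block SAD.
    count = (block_h // sub_size) * (block_w // sub_size)
    return {"skip_block": False, "skip_subblocks": [False] * count}
```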

Note that the present embodiment is not limited to the above-described embodiment, and various changes can be made without departing from the gist of the present disclosure. Further, the effects described in the present specification are merely examples and are not limited, and other effects may be obtained.

11 DMVR processing unit, 12 BDOF processing unit, 21 motion vector search unit, 22 SAD calculation unit, 23 prediction unit

Claims

1. An image processing device comprising:
a DMVR processing unit that has a motion vector search unit that searches for a motion vector and performs DMVR (Decoder-side Motion Vector Refinement) processing; and
a BDOF processing unit that has a prediction unit that predicts a current prediction block using the motion vector and performs BDOF (Bi-Directional Optical Flow) processing,
wherein the SAD (Sum of Absolute Difference) value obtained when the motion vector search unit searches for the motion vector is supplied from the DMVR processing unit to the BDOF processing unit, and
the SAD value is used for an early termination determination for determining whether or not to terminate the BDOF processing early.
2. The image processing device according to claim 1, wherein when it is determined in the early termination determination that the BDOF processing is not terminated early, it is determined that the BDOF processing is not terminated early also in the early termination determination performed after the block is divided.
3. The image processing device according to claim 2, wherein when the block size after division of the block is 4 × 4, it is determined by the early termination determination that the BDOF processing is not terminated early.
4. An image processing method comprising, by an image processing device having a DMVR processing unit that has a motion vector search unit that searches for a motion vector and performs DMVR (Decoder-side Motion Vector Refinement) processing, and a BDOF processing unit that has a prediction unit that predicts a current prediction block using the motion vector and performs BDOF (Bi-Directional Optical Flow) processing:
supplying the SAD (Sum of Absolute Difference) value obtained when the motion vector search unit searches for the motion vector from the DMVR processing unit to the BDOF processing unit; and
using the SAD value for an early termination determination for determining whether or not to terminate the BDOF processing early.