CN116868566A - AI-based image encoding and decoding apparatus and method thereof - Google Patents

AI-based image encoding and decoding apparatus and method thereof

Info

Publication number
CN116868566A
CN116868566A (application CN202280016009.4A)
Authority
CN
China
Prior art keywords
optical flow
current
image
previous
feature data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280016009.4A
Other languages
Chinese (zh)
Inventor
Q·丁
朴慜祐
朴缗茱
崔光杓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020210171269A external-priority patent/KR20220120436A/en
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority claimed from PCT/KR2022/002493 external-priority patent/WO2022177383A1/en
Publication of CN116868566A publication Critical patent/CN116868566A/en


Abstract

A method of reconstructing optical flow using Artificial Intelligence (AI), comprising: obtaining feature data of a current residual optical flow of a current image from a bitstream; obtaining a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on a neural network; obtaining a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, and feature data of a previous residual optical flow; and reconstructing a current optical flow based on the current residual optical flow and the current predicted optical flow.

Description

AI-based image encoding and decoding apparatus and method thereof
Technical Field
The present disclosure relates to image encoding and decoding. More particularly, the present disclosure relates to techniques for encoding and decoding optical flow required for inter prediction of an image by using Artificial Intelligence (AI) (e.g., a neural network), and techniques for encoding and decoding an image.
Background
Codecs such as H.264 Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC) may divide an image into blocks and predictively encode and decode each block through inter prediction or intra prediction.
Intra prediction is a method of compressing pictures by removing spatial redundancy in the pictures, and inter prediction is a method of compressing pictures by removing temporal redundancy between pictures.
A representative example of inter prediction is motion-estimation coding. Motion-estimation coding predicts blocks of a current picture by using a reference picture. A reference block most similar to a current block may be found within a certain search range by using a certain evaluation function. The current block is predicted based on the reference block, and a residual block is generated and encoded by subtracting the resulting prediction block from the current block.
In order to calculate a motion vector indicating the reference block in the reference picture, a motion vector of a previously encoded block may be used as a predicted motion vector of the current block. A differential motion vector corresponding to the difference between the motion vector of the current block and the predicted motion vector is signaled to the decoder.
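As a purely illustrative aside (not part of the disclosure), a minimal Python sketch of the differential motion vector signaling described above; all names are hypothetical:

```python
# Illustrative sketch only: differential motion vector (MVD) signaling.
# mv_pred comes from a previously encoded block; names are hypothetical.

def encode_mvd(mv_current: tuple, mv_pred: tuple) -> tuple:
    """Encoder side: signal only the difference between the motion vector
    of the current block and the predicted motion vector."""
    return (mv_current[0] - mv_pred[0], mv_current[1] - mv_pred[1])

def decode_mv(mvd: tuple, mv_pred: tuple) -> tuple:
    """Decoder side: reconstruct the motion vector from the signaled MVD."""
    return (mvd[0] + mv_pred[0], mvd[1] + mv_pred[1])

mvd = encode_mvd(mv_current=(5, -2), mv_pred=(4, -2))   # -> (1, 0)
assert decode_mv(mvd, mv_pred=(4, -2)) == (5, -2)
```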
Disclosure of Invention
Technical problem
An image encoding and decoding apparatus and method according to an embodiment of the present disclosure may signal an optical flow required for inter prediction at a low bit rate.
An image encoding and decoding apparatus and method according to an embodiment of the present disclosure, which can accurately reconstruct an optical flow, are also provided.
An image encoding and decoding apparatus and method according to an embodiment of the present disclosure are also provided, which can accurately reconstruct an image from a bitstream at a low bit rate.
Technical solution to the problem
According to an aspect of the disclosure, a method of reconstructing optical flow by using Artificial Intelligence (AI) includes: obtaining feature data of a current residual optical flow of a current image from a bitstream; obtaining a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on a neural network; obtaining a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, and feature data of a previous residual optical flow; and reconstructing a current optical flow based on the current residual optical flow and the current predicted optical flow.
The disclosed advantageous effects
The image encoding and decoding apparatus and method according to the embodiments of the present disclosure may signal an optical flow required for inter prediction at a low bit rate.
Further, the image encoding and decoding apparatus and method according to the embodiments of the present disclosure may accurately reconstruct an optical flow.
Further, the image encoding and decoding apparatus and method according to the embodiments of the present disclosure may accurately reconstruct an image from a bitstream at a low bit rate.
Drawings
Fig. 1 is a diagram illustrating an Artificial Intelligence (AI) -based inter prediction process for an image according to an embodiment.
Fig. 2 is a diagram showing a continuous image and an optical flow between continuous images according to an embodiment.
Fig. 3 is a diagram showing a configuration of an image decoding apparatus according to an embodiment.
Fig. 4 is a diagram showing a configuration of the acquirer of fig. 3 according to an embodiment.
Fig. 5 is a diagram illustrating a configuration of the predictive decoder of fig. 3 according to an embodiment.
Fig. 6 is a diagram showing a configuration of an optical flow predictor according to an embodiment.
Fig. 7 is a diagram showing a configuration of an optical flow predictor according to an embodiment.
Fig. 8 is a diagram showing a configuration of an optical flow predictor according to an embodiment.
FIG. 9 is a flow chart illustrating a method of reconstructing optical flow according to an embodiment.
Fig. 10 is a diagram showing another configuration of a predictive decoder.
FIG. 11 is a flow chart illustrating a method of reconstructing optical flow according to an embodiment.
Fig. 12 is a diagram showing a configuration of an image encoding apparatus according to an embodiment.
Fig. 13 is a diagram illustrating a configuration of the predictive encoder of fig. 12 according to an embodiment.
Fig. 14 is a diagram showing a configuration of an optical flow predictor according to an embodiment.
Fig. 15 is a diagram illustrating a configuration of the generator of fig. 12 according to an embodiment.
FIG. 16 is a flow chart illustrating a method of encoding optical flow according to an embodiment.
Fig. 17 is a diagram illustrating another configuration of a predictive encoder according to an embodiment.
Fig. 18 is a diagram showing a structure of a neural network according to an embodiment.
Fig. 19 is a diagram for describing a convolution operation performed at the convolution layer of fig. 18 according to an embodiment.
Fig. 20 is a diagram for describing a method of training a neural network used in an inter prediction process according to an embodiment.
Fig. 21 is a diagram for describing a process in which a training apparatus trains a neural network in an inter-prediction process according to an embodiment.
Fig. 22 is a diagram for describing another process in which a training apparatus trains a neural network in an inter-prediction process according to an embodiment.
Detailed Description
According to an aspect of the disclosure, a method of reconstructing optical flow by using Artificial Intelligence (AI) includes: obtaining feature data of a current residual optical flow of a current image from a bitstream; obtaining a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on a neural network; obtaining a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, and feature data of a previous residual optical flow; and reconstructing a current optical flow based on the current residual optical flow and the current predicted optical flow.
The current image may be reconstructed based on the current residual image data and a current predicted image generated based on the previously reconstructed image and the reconstructed current optical flow.
Obtaining the current predicted optical flow may include selecting a previous optical flow as the current predicted optical flow.
Obtaining the current predicted optical flow may include applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to a first prediction neural network.
Obtaining the current predicted optical flow may include: obtaining a second-order optical flow between the current predicted optical flow and the previous optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, and the feature data of the previous residual optical flow to a second prediction neural network; and generating the current predicted optical flow by modifying the previous optical flow according to the second-order optical flow.
Obtaining the current predicted optical flow may include: obtaining feature data of a second order optical flow between a current predicted optical flow and a previous optical flow from the bitstream; obtaining a second-order optical flow by applying the feature data of the second-order optical flow to a third decoder based on a neural network; and generating a current predicted optical flow by modifying the previous optical flow according to the second order optical flow.
The feature data of the current residual optical flow may be obtained by performing entropy decoding and inverse quantization on the bitstream.
The first decoder based on the neural network may be trained based on: first loss information corresponding to a difference between a current training image and a current reconstructed training image related to the current training image; and second loss information corresponding to entropy of feature data of a current residual optical flow of the current training image.
The feature data of the current residual optical flow may be obtained from the bitstream when the current image corresponds to a predicted (P) frame, and in particular to a P frame that follows another P frame.
The method may further comprise obtaining feature data of the current optical flow from the bitstream; and reconstructing the current optical flow by applying the feature data of the current optical flow to a fourth decoder based on the neural network.
According to an aspect of the disclosure, a computer-readable recording medium has instructions recorded thereon, which when executed by at least one processor of a device for reconstructing optical flow by using Artificial Intelligence (AI), cause the at least one processor to: obtain feature data of a current residual optical flow of a current image from a bitstream; obtain a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on a neural network; obtain a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, or feature data of a previous residual optical flow; and reconstruct the current optical flow by using the current residual optical flow and the current predicted optical flow.
According to an aspect of the disclosure, an apparatus for reconstructing optical flow by using Artificial Intelligence (AI) includes: at least one processor configured to implement a bitstream acquirer configured to acquire feature data of a current residual optical flow from a bitstream of a current image; and a predictive decoder configured to: obtaining a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on the neural network, obtaining a current predicted optical flow using at least one of the previous optical flow, the feature data of the previous optical flow, and the feature data of the previous residual optical flow, and reconstructing the current optical flow based on the current residual optical flow and the current predicted optical flow.
According to one aspect of the disclosure, a method of encoding optical flow by using Artificial Intelligence (AI) includes: obtaining a current predicted optical flow from at least one of the previous optical flow, the feature data of the previous optical flow, and the feature data of the previous residual optical flow; obtaining feature data of a current residual optical flow by applying the current image, the previously reconstructed image, and the current predicted optical flow to a first encoder based on a neural network; and generating a bitstream corresponding to the feature data of the current residual optical flow, wherein the current residual optical flow corresponds to a difference between the current optical flow and the current predicted optical flow.
According to an aspect of the disclosure, an apparatus for encoding optical flow by using Artificial Intelligence (AI) includes: at least one processor configured to implement: a predictive encoder configured to: obtain a current predicted optical flow from at least one of the previous optical flow, the feature data of the previous optical flow, and the feature data of the previous residual optical flow, and obtain the feature data of the current residual optical flow by applying the current image, the previously reconstructed image, and the current predicted optical flow to a first encoder based on a neural network; and a bitstream generator configured to generate a bitstream corresponding to the feature data of the current residual optical flow, wherein the current residual optical flow corresponds to a difference between the current optical flow and the current predicted optical flow.
According to an aspect of the disclosure, a method of reconstructing optical flow by using Artificial Intelligence (AI) includes: obtaining a current residual optical flow by applying feature data of the current residual optical flow to a first decoder based on a neural network; obtaining a current predicted optical flow based on at least one of a previous optical flow corresponding to a previous reconstructed image, feature data of a previous optical flow, or feature data of a previous residual optical flow corresponding to a previous optical flow; combining the current predicted optical flow with the current residual optical flow to obtain a current optical flow; obtaining a current predicted image by performing motion compensation on a previous reconstructed image based on a current optical flow; and reconstructing the current image based on the current prediction image and the current residual image data.
Mode for the invention
While the embodiments of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit embodiments of the disclosure to the particular forms disclosed, but on the contrary, the embodiments of the disclosure are to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
In the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure unclear. It will be understood that the terms "first," "second," and the like, herein are used merely to distinguish one element from another element.
It will also be understood that when an element is referred to as being "connected to" another element, it can be "directly connected to" the other element or be "connected to" the other element through intervening elements unless the context clearly dictates otherwise.
Embodiments may be described and illustrated in terms of blocks that perform one or more of the functions described, as is conventional in the art. These blocks, which may be referred to herein as "units" or "modules" or the like, or as, for example, encoders, decoders, acquirers, quantizers, transformers, subtractors, compensators, modifiers, etc., may be physically implemented by analog or digital circuits (such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits, etc.), and may be driven by firmware and software. For example, the circuitry may be embodied in one or more semiconductor chips, or on a substrate support such as a printed circuit board or the like. The circuitry included in a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware that performs some of the functions of the block and a processor that performs other functions of the block. Each block of an embodiment may be physically separated into two or more interacting and discrete blocks. Likewise, the blocks of the embodiments may be physically combined into more complex blocks.
Two or more elements expressed as "units", "modules", or the like may be combined into one element, or one element may be divided into two or more elements for the function of subdivision. Each element described herein may perform not only its main function, but also some or all of the functions of other elements in addition, and some of the main functions of each element may be performed exclusively by another element.
Throughout this disclosure, the expression "at least one of a, b, or c" indicates only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or variants thereof.
As used herein, the term "image" may refer to a still image or frame, a moving image comprising a plurality of consecutive still images or frames, or video.
"neural network" is a representative example of an artificial neural network model that simulates brain nerves, and is not limited to an artificial neural network model using a particular algorithm. The neural network may also be referred to as a deep neural network.
Further, "parameters" used herein may be values for calculation at each layer included in the neural network, and may be used, for example, to apply an input value to a certain operation formula. The parameter is a value set as a training result, and may be updated based on separate training data as necessary.
Further, "characteristic data" as used herein may refer to data obtained by processing input data by a neural network-based encoder. The feature data may be 1-dimensional or 2-dimensional data including a plurality of samples. The feature data may also be referred to as a potential representation. The feature data represents potential features of the data output by the decoder as described below.
Further, as used herein, a "current image" may refer to an image that is currently to be processed, "current optical flow" may refer to optical flow obtained for the current image, and "current residual data" may refer to residual data obtained for the current image.
Further, as used herein, a "previous image" may refer to an image to be processed prior to a current image, "previous optical flow" may refer to optical flow obtained for a previous image, and "previous residual data" may refer to residual data obtained for a previous image.
Further, "sample points" as used herein may correspond to data assigned to sampling locations in an image, feature map, or feature data, and refer to data to be processed. For example, a sample may be a pixel in a 2-dimensional image.
Fig. 1 is a diagram illustrating an Artificial Intelligence (AI) -based inter prediction process for an image.
Fig. 1 shows a process of encoding and decoding a current image x_i. In the inter prediction, a first encoder 110, a second encoder 130, a first decoder 150, and a second decoder 170 are used. The first encoder 110, the second encoder 130, the first decoder 150, and the second decoder 170 are implemented as neural networks.
Inter prediction is a process of encoding and decoding the current image x_i by using temporal redundancy between the current image x_i and a previously reconstructed image y_{i-1}.
Position differences (or motion vectors) between blocks or samples in the current image x_i and reference blocks or reference samples in the previously reconstructed image y_{i-1} are used to encode and decode the current image x_i. These position differences may be referred to as an optical flow. An optical flow may be defined as a set of motion vectors corresponding to samples or blocks in an image.
An optical flow represents how the positions of samples in the previously reconstructed image y_{i-1} change in the current image x_i, or where in the previously reconstructed image y_{i-1} the reference samples of the current image x_i are located. For example, when the sample located at (1, 1) in the current image x_i was located at (2, 1) in the previously reconstructed image y_{i-1}, the optical flow or motion vector of that sample may be derived as (1 (=2-1), 0 (=1-1)).
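For illustration only, a minimal sketch of the example above, under the assumption that an optical flow is stored as a 2-channel array of per-sample motion vectors; the shapes and conventions are assumptions, not part of the disclosure:

```python
import numpy as np

# Sketch: an optical flow for an H x W image stored as an (H, W, 2) array,
# where flow[y, x] = (dx, dy) points from the current image into the
# previously reconstructed image.
H, W = 4, 4
flow = np.zeros((H, W, 2), dtype=np.float32)

# The sample at (x=1, y=1) in the current image was at (x=2, y=1)
# in the previous reconstruction, as in the example above:
flow[1, 1] = (2 - 1, 1 - 1)   # motion vector (1, 0)
print(flow[1, 1])             # [1. 0.]
```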
In the AI-based image encoding and decoding process, the first encoder 110 and the first decoder 150 are used to obtain the current optical flow g_i of the current image x_i.
Specifically, the previously reconstructed image y_{i-1} and the current image x_i are input to the first encoder 110. The first encoder 110 outputs the feature data w_i of the current optical flow by processing the current image x_i and the previously reconstructed image y_{i-1} according to parameters set as a result of training.
The feature data w_i of the current optical flow represents latent features of the current optical flow.
The feature data w_i of the current optical flow is input to the first decoder 150. The first decoder 150 outputs the current optical flow g_i by processing the input feature data w_i according to parameters set as a result of training.
The previously reconstructed image y_{i-1} is warped, for example through warping 190, according to the current optical flow g_i, and the current predicted image x'_i is obtained as a result of the warping 190. The warping 190 is a geometric transformation that changes the positions of samples in an image. The current predicted image x'_i, which is similar to the current image x_i, is obtained by warping the previously reconstructed image y_{i-1} based on the current optical flow g_i, where the current optical flow g_i represents the relative positions between samples in the previously reconstructed image y_{i-1} and samples in the current image x_i. For example, when the sample located at (1, 1) in the previously reconstructed image y_{i-1} is most similar to the sample located at (2, 1) in the current image x_i, the position of that sample may be changed from (1, 1) to (2, 1) by the warping 190.
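For illustration only, a minimal nearest-neighbor sketch of such a warping; practical codecs typically use bilinear interpolation, and all names and shapes here are assumptions:

```python
import numpy as np

def warp(prev_recon: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Minimal nearest-neighbor backward-warping sketch. prev_recon is a
    grayscale (H, W) image; flow is (H, W, 2) with flow[y, x] = (dx, dy)
    pointing into prev_recon, as in the earlier sketch."""
    H, W = prev_recon.shape
    out = np.zeros_like(prev_recon)
    for y in range(H):
        for x in range(W):
            dx, dy = flow[y, x]
            sx = min(max(int(round(x + dx)), 0), W - 1)  # clamp to image
            sy = min(max(int(round(y + dy)), 0), H - 1)
            out[y, x] = prev_recon[sy, sx]   # fetch the reference sample
    return out
```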
Because the current predicted image x'_i, generated by using the previously reconstructed image y_{i-1}, is not the current image x_i itself, current residual image data r_i corresponding to the difference between the current predicted image x'_i and the current image x_i may be obtained.
For example, the current residual image data r_i is obtained by subtracting the sample values of the current predicted image x'_i from the sample values of the current image x_i.
The current residual image data r_i is input to the second encoder 130. The second encoder 130 outputs the feature data v_i of the current residual image data by processing the current residual image data r_i according to parameters set as a result of training.
The feature data v_i of the current residual image data is input to the second decoder 170. The second decoder 170 outputs the reconstructed current residual image data r'_i by processing the input feature data v_i according to parameters set as a result of training.
The current reconstructed image y_i is obtained by combining the reconstructed current residual image data r'_i with the current predicted image x'_i generated by warping the previously reconstructed image y_{i-1} (e.g., by the warping 190).
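A minimal sketch of this residual path, with the encoder/decoder pair for r_i omitted and r'_i taken equal to r_i purely for illustration:

```python
import numpy as np

# Sketch of the residual path around the warping 190.
x_i = np.array([[10., 20.], [30., 40.]])        # current image
x_pred = np.array([[9., 21.], [30., 38.]])      # x'_i from warping 190
r_i = x_i - x_pred                              # current residual image data
r_rec = r_i                                     # stand-in for decoder output r'_i
y_i = x_pred + r_rec                            # current reconstructed image
assert np.allclose(y_i, x_i)
```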
In the inter prediction process of fig. 1, the feature data w_i of the current optical flow obtained by the first encoder 110 is input to the first decoder 150.
From the viewpoint of the image encoding apparatus, when encoding and decoding the current image x_i, the encoding apparatus should generate a bitstream corresponding to the feature data w_i of the current optical flow in order to signal the feature data w_i of the current optical flow to the decoding apparatus. However, when an object included in the current image x_i and the previous image x_{i-1} has a large motion, the magnitudes of the sample values included in the current optical flow are large, and thus the bit rate of the bitstream generated based on the feature data w_i, which represents the latent features of the current optical flow, may also increase.
In the following embodiments of the present disclosure, the size of a bitstream generated as a result of encoding the current optical flow may be reduced by using the previous optical flow. An example of the correlation between the previous optical flow and the current optical flow will be described with reference to fig. 2.
Referring to fig. 2, a first optical flow 25 is obtained between the current image 23 and the first previous image 22, and a second optical flow 24 is obtained between the first previous image 22 and the second previous image 21.
The first optical flow 25 and the second optical flow 24 of fig. 2 are visualized based on the magnitudes of the motion vectors or samples included in each optical flow.
The first optical flow 25 may be referred to as the current optical flow and the second optical flow 24 may be referred to as the previous optical flow.
Referring to fig. 2, the degree of similarity between the first optical flow 25 and the second optical flow 24 may be identified. For example, the similarity of the sample values of the region a in the first optical flow 25 and the region B in the second optical flow 24 may be identified.
Since objects in temporally successive images tend to move linearly, the degree of similarity between the first optical flow 25 and the second optical flow 24 can be predicted.
That is, when the previous optical flow (e.g., the second optical flow 24) is used to encode the current optical flow (e.g., the first optical flow 25) of the current image 23, the size of the bit stream generated as a result of encoding the current optical flow may be reduced.
Fig. 3 is a diagram showing a configuration of an image decoding apparatus 300 according to an embodiment of the present disclosure.
Referring to fig. 3, an image decoding apparatus 300 according to an embodiment of the present disclosure includes an acquirer 310 and a predictive decoder 330, wherein the acquirer 310 may be, for example, a bitstream acquirer.
The fetcher 310 and the predictive decoder 330 may be implemented as processors and may operate according to instructions stored in memory.
Although the acquirer 310 and the predictive decoder 330 are respectively illustrated in fig. 3, in an embodiment, the acquirer 310 and the predictive decoder 330 may be implemented as one element, for example, one processor. In this case, the acquirer 310 and the prediction decoder 330 may be implemented as a dedicated processor, or a combination of software and a general-purpose processor, such as an Application Processor (AP), a Central Processing Unit (CPU), or a Graphics Processing Unit (GPU). The special purpose processor may include a memory to implement embodiments of the present disclosure, or may include a memory processor to use an external memory.
In an embodiment, the acquirer 310 and the predictive decoder 330 may be implemented as a plurality of processors. In this case, the acquirer 310 and the prediction decoder 330 may be implemented as a combination of dedicated processors, or a combination of software and a general-purpose processor such as an AP, a CPU, or a GPU.
The acquirer 310 acquires a bitstream including a result of encoding the current image.
The acquirer 310 may receive a bitstream transmitted through a network from the image encoding apparatus 1200 described below. In embodiments of the present disclosure, the acquirer 310 may acquire a bitstream from a data storage medium including a magnetic medium (e.g., a hard disk, a floppy disk, or a magnetic tape), an optical medium (e.g., a compact disk read-only memory (CD-ROM) or a Digital Versatile Disk (DVD)), or a magneto-optical medium (e.g., a floptical disk).
The acquirer 310 acquires feature data of the current residual optical flow and feature data of the current residual image data by parsing the bit stream.
The current residual optical flow corresponds to a difference between the current predicted optical flow predicted from the previous optical flow and the current optical flow. The current residual image data corresponds to a difference between a current predicted image predicted from a previous reconstructed image and the current image.
The feature data of the current residual optical flow and the feature data of the current residual image data may be obtained as a result of processing by the encoder based on the neural network.
In an embodiment of the present disclosure, the acquirer 310 may acquire a first bit stream corresponding to the feature data of the current residual optical flow and a second bit stream corresponding to the feature data of the current residual image data, and may acquire the feature data of the current residual optical flow and the feature data of the current residual image data by parsing the first bit stream and the second bit stream, respectively.
The feature data of the current residual light stream and the feature data of the current residual image data are transmitted to the prediction decoder 330, and the prediction decoder 330 obtains a current reconstructed image corresponding to the current image by using the feature data of the current residual light stream and the feature data of the current residual image data.
According to an implementation example, in an embodiment, the feature data of the current residual image data may not be included in the bitstream. The acquirer 310 may acquire feature data of the current residual optical flow from the bitstream, and the prediction decoder 330 may reconstruct the current optical flow. In this case, the image decoding apparatus 300 may be referred to as an optical flow decoding apparatus.
The current optical flow reconstructed by the predictive decoder 330 may be transmitted to another device, and a current reconstructed image may be generated by the other device based on the current optical flow.
In detail, the other device may generate the current reconstructed image by combining current residual image data obtained from the bitstream with a current predicted image generated from a previous reconstructed image according to a current optical flow.
An example of the operation of the acquirer 310 and the predictive decoder 330 will now be described in detail with reference to fig. 4 and 5.
Fig. 4 is a diagram showing a configuration of the acquirer 310 of fig. 3.
Referring to fig. 4, the acquirer 310 includes an entropy decoder 311 and an inverse quantizer 313.
The entropy decoder 311 obtains quantized feature data of the current residual optical flow and quantized feature data of the current residual image data by entropy-decoding the binary bits (bins) included in the bitstream.
The inverse quantizer 313 obtains the feature data of the current residual optical flow and the feature data of the current residual image data by inverse-quantizing the quantized feature data of the current residual optical flow and the quantized feature data of the current residual image data.
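For illustration only, a sketch of the inverse quantization step in isolation, assuming a single uniform quantization step size; the disclosure does not specify the quantization scheme, and all names are hypothetical:

```python
import numpy as np

# Sketch of the inverse quantizer 313 alone (entropy decoding omitted).
def inverse_quantize(q_feature: np.ndarray, step: float) -> np.ndarray:
    """Map integer quantization levels back to feature values."""
    return q_feature.astype(np.float32) * step

q_w = np.array([[3, -1], [0, 2]])      # quantized feature data from the bitstream
w = inverse_quantize(q_w, step=0.5)    # feature data of the current residual flow
```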
According to an implementation example, in an embodiment, the acquirer 310 may further include an inverse transformer. When the image encoding apparatus 1200 described below transforms the feature data of the current residual optical flow and the feature data of the current residual image data from the spatial domain to the frequency domain, the inverse transformer inversely transforms the feature data output from the inverse quantizer 313 from the frequency domain to the spatial domain.
Further, according to an implementation example, in an embodiment, the acquirer 310 may not include the inverse quantizer 313. That is, the feature data of the current residual optical flow and the feature data of the current residual image data can be obtained by the processing of the entropy decoder 311.
Further, according to an implementation example, in an embodiment, the acquirer 310 may acquire the feature data of the current residual optical flow and the feature data of the current residual image data by merely inverse-binarizing the binary bits included in the bitstream. This may be performed when the image encoding apparatus 1200 generates a bitstream by binarizing the feature data of the current residual optical flow and the feature data of the current residual image data (in other words, when the image encoding apparatus 1200 does not apply entropy encoding, transformation, or quantization to the feature data of the current residual optical flow and the feature data of the current residual image data).
Fig. 5 is a diagram showing a configuration of the prediction decoder 330 of fig. 3.
Referring to fig. 5, the prediction decoder 330 may include a first decoder 331, a second decoder 333, an optical flow predictor 334, a first combiner 336, a motion compensator 335, and a second combiner 337.
The first decoder 331 and the second decoder 333 may be stored in a memory. In an embodiment of the present disclosure, the first decoder 331 and the second decoder 333 may be implemented as at least one dedicated processor for AI.
The feature data of the current residual optical flow output from the acquirer 310 is input to the first decoder 331, and the feature data of the current residual image data is input to the second decoder 333.
In an embodiment, in order to accurately reconstruct the current residual image data, the feature data of the current residual optical flow or the feature data of the current optical flow may be concatenated with the feature data of the current residual image data and then input to the second decoder 333. Concatenation refers to a process of combining two or more pieces of feature data in the channel direction.
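For illustration only, a sketch of channel-direction concatenation, with tensor shapes chosen arbitrarily (an assumption, not part of the disclosure):

```python
import numpy as np

# Sketch: two feature tensors of shape (channels, height, width) are
# stacked along the channel axis before entering the second decoder 333.
feat_residual_flow = np.random.randn(64, 16, 16).astype(np.float32)
feat_residual_image = np.random.randn(64, 16, 16).astype(np.float32)
concatenated = np.concatenate([feat_residual_flow, feat_residual_image], axis=0)
print(concatenated.shape)  # (128, 16, 16): channels are combined
```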
The first decoder 331 processes feature data of the current residual optical flow according to parameters set through training to obtain the current residual optical flow. The current residual optical flow, which is 1-dimensional or 2-dimensional data, may include a plurality of samples.
The second decoder 333 obtains the current residual image data by processing the feature data of the current residual image data according to the parameters set through training. The current residual image data, which is 1-dimensional or 2-dimensional data, may include a plurality of samples.
The optical flow predictor 334 obtains a current predicted optical flow by using at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow.
The current predicted optical flow, which is 1-dimensional or 2-dimensional data, may include a plurality of samples.
In embodiments of the present disclosure, optical flow predictor 334 may determine or select a previous optical flow as the current predicted optical flow.
As described with reference to fig. 2, the previous optical flow may be very similar to the current optical flow because objects in successive images tend to move linearly. Therefore, when the previous optical flow is determined as the current predicted optical flow, the size of the sample value of the current residual optical flow and the size of the sample value of the feature data of the current residual optical flow can be reduced.
The current predicted optical flow obtained by the optical flow predictor 334 and the current residual optical flow obtained by the first decoder 331 are provided to the first combiner 336.
The first combiner 336 reconstructs the current optical flow by combining the current predicted optical flow with the current residual optical flow. The first combiner 336 may reconstruct the current optical flow by combining the sample values of the current predicted optical flow with the sample values of the current residual optical flow.
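A minimal sketch of this combining step, assuming flows are stored as arrays of per-sample motion vectors as in the earlier sketches:

```python
import numpy as np

# Sketch of the first combiner 336: the current optical flow is the
# element-wise sum of the predicted flow and the residual flow.
pred_flow = np.array([[[1.0, 0.0]]])       # current predicted optical flow
res_flow = np.array([[[0.25, -0.1]]])      # current residual optical flow
cur_flow = pred_flow + res_flow            # reconstructed current optical flow
```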
The motion compensator 335 generates a current predicted image similar to the current image by processing the previous reconstructed image according to the current optical flow. The previously reconstructed image is an image reconstructed by decoding a previous image to be processed before processing the current image.
The motion compensator 335 may generate the current predicted image by warping the previously reconstructed image according to the current optical flow. The warping used to generate the current predicted image is merely an example, and the motion compensator 335 may apply various types of image processing for changing the positions of samples in the previously reconstructed image in order to generate a current predicted image similar to the current image.
The current prediction image generated by the motion compensator 335 is provided to a second combiner 337.
The second combiner 337 obtains a current reconstructed image by combining the current predicted image with the current residual image data. In an example, the second combiner 337 may obtain the current reconstructed image including a value obtained by summing a sample value of the current prediction image and a sample value of the current residual image data.
The current reconstructed image and the current optical flow may be used for the next image decoding process.
According to an implementation example, in an embodiment, the prediction decoder 330 may reconstruct a current optical flow from the feature data of the current residual optical flow, and may provide the reconstructed current optical flow to another device. In this case, the second decoder 333, the motion compensator 335, and the second combiner 337 may not be included in the predictive decoder 330.
According to an implementation example, when the current residual image data is available from the bitstream, in an embodiment, the second decoder 333 may not be included in the prediction decoder 330. That is, the prediction decoder 330 may generate a current reconstructed image by combining current residual image data obtained from a bitstream with a current predicted image.
According to the embodiments of the present disclosure, because the bitstream is generated based on the current residual optical flow, whose sample values are smaller in magnitude than those of the current optical flow, a lower bit rate can be achieved than when the bitstream is generated from the current optical flow itself.
While in the embodiment corresponding to fig. 5 the optical flow predictor 334 determines the previous optical flow as the current predicted optical flow, examples of the operation of optical flow predictors according to other embodiments of the present disclosure will be described with reference to fig. 6 to 8.
Fig. 6 is a diagram showing a configuration of an optical flow predictor 600 according to an embodiment of the present disclosure.
Referring to FIG. 6, optical flow predictor 600 includes a first prediction neural network 610. The first predictive neural network 610 may be stored in memory. In an embodiment of the present disclosure, the first predictive neural network 610 may be implemented as at least one dedicated processor for AI.
At least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow is input to the first prediction neural network 610 to obtain a current prediction optical flow.
The feature data of the previous optical flow represents latent features of the previous optical flow used during the process of reconstructing the previous image.
In embodiments of the present disclosure, when the previous image is a predicted (P) frame following an intra (I) frame, the feature data of the previous optical flow may be obtained during the process of reconstructing the previous optical flow. The I frame and the P frame will be described below.
In another embodiment of the present disclosure, the predictive decoder 330 may reconstruct the previous optical flow and then may obtain feature data of the previous optical flow by applying the reconstructed previous optical flow to the neural network.
The current predicted optical flow is obtained by the first prediction neural network 610 processing at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow according to the parameters set through training.
As described with reference to fig. 5, when the current prediction optical flow and the current residual optical flow are combined with each other, the current optical flow for generating the current prediction image is obtained. As described below with reference to fig. 20 and 21, the first prediction neural network 610 may be trained by sharing loss information (in other words, having a common purpose) together with the first encoder 1211, the second encoder 1215, the first decoder 331, and the second decoder 333.
Because the data output from the first prediction neural network 610 is combined with the current residual optical flow output by the first decoder 331 and then used to generate the current prediction image, the first prediction neural network 610 may be trained to output a difference between the current optical flow and the current residual optical flow, i.e., the current prediction optical flow.
Fig. 7 is a diagram showing a configuration of an optical flow predictor 700 according to another embodiment of the present disclosure.
Referring to fig. 7, the optical flow predictor 700 includes a second prediction neural network 710 and a modifier 720.
The second predictive neural network 710 may be stored in memory. In an embodiment of the present disclosure, the second predictive neural network 710 may be implemented as at least one dedicated processor for AI.
At least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow is input to the second prediction neural network 710.
The second prediction neural network 710 obtains a second-order optical flow between the current predicted optical flow and the previous optical flow according to the parameters set through training.
The second order optical flow, which may be an optical flow between optical flows, may be defined or expressed as a set of motion vectors corresponding to samples or blocks in the optical flow.
The second order optical flow may represent how the locations of the samples in the previous optical flow change in the current predicted optical flow, or where in the previous optical flow the reference samples of the current predicted optical flow are located. For example, when a sample point located at (1, 1) in the previous optical flow is located at (2, 1) in the current predicted optical flow, the second-order optical flow or motion vector of the sample point may be derived as (1 (=2-1), 0 (=1-1)).
The modifier 720, which may be, for example, an optical flow modifier or an optical flow processor, obtains the current predicted optical flow by processing the previous optical flow according to the second order optical flow.
The operation of the modifier 720 is similar to the operation of the motion compensator 335 of fig. 5. That is, just as the motion compensator 335 may obtain the current predicted image by warping the previously reconstructed image according to the current optical flow, the modifier 720 may obtain the current predicted optical flow by warping the previous optical flow according to the second-order optical flow.
The warping used to generate the current predicted optical flow is merely an example, and the modifier 720 may apply various types of processing for changing the positions of the samples in the previous optical flow in order to generate a current predicted optical flow similar to the current optical flow.
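For illustration only, a sketch of such a modifier, reusing the nearest-neighbor warping idea from the sketch after fig. 1 and treating the two flow channels like image samples (an assumption, not part of the disclosure):

```python
import numpy as np

def warp_flow(prev_flow: np.ndarray, second_order: np.ndarray) -> np.ndarray:
    """Sketch of the modifier 720: warp the previous optical flow by the
    second-order optical flow. prev_flow and second_order are both (H, W, 2)
    arrays of per-sample motion vectors; nearest-neighbor sampling is an
    assumption, as in the earlier warp sketch."""
    H, W, _ = prev_flow.shape
    pred_flow = np.zeros_like(prev_flow)
    for y in range(H):
        for x in range(W):
            dx, dy = second_order[y, x]
            sx = min(max(int(round(x + dx)), 0), W - 1)
            sy = min(max(int(round(y + dy)), 0), H - 1)
            pred_flow[y, x] = prev_flow[sy, sx]
    return pred_flow
```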
Because the data output from the second prediction neural network 710 is used to change the positions of the samples in the previous optical flow, the second prediction neural network 710 may be trained, based on the loss information, to output the data for changing the previous optical flow into the current predicted optical flow (i.e., the second-order optical flow).
Fig. 8 is a diagram showing a configuration of an optical flow predictor 800 according to another embodiment of the present disclosure.
Referring to fig. 8, the optical flow predictor 800 includes a third decoder 810 and a modifier 720. The third decoder 810 may be stored in a memory. In an embodiment of the present disclosure, the third decoder 810 may be implemented as at least one dedicated processor for AI.
The third decoder 810 obtains the second-order optical flow by processing the feature data of the second-order optical flow according to the parameters set by training.
The feature data of the second-order optical flow may be obtained from the bit stream. The acquirer 310 may acquire feature data of the second-order optical flow from the bit stream, and may provide the feature data of the second-order optical flow to the prediction decoder 330.
The image encoding apparatus 1200 may generate a bitstream including the feature data of the current residual optical flow and the feature data of the current residual image data. According to an implementation example, in an embodiment, the image encoding apparatus 1200 may generate a bit stream of feature data further including a second-order optical flow, an example of which will be described below with reference to fig. 14.
The modifier 720 may obtain the current predicted optical flow by processing the previous optical flow according to the second order optical flow.
In embodiments of the present disclosure, the modifier 720 may warp the previous optical flow according to the second-order optical flow to obtain the current predicted optical flow. The warping used to generate the current predicted optical flow is merely an example, and the modifier 720 may apply various types of processing for changing the positions of the samples in the previous optical flow according to the second-order optical flow.
In the embodiment according to fig. 8, the feature data of the second-order optical flow supplied from the image encoding apparatus 1200 is input to the third decoder 810 and processed by the third decoder 810. Accordingly, the complexity of the third decoder 810 may be reduced when compared to the first and second prediction neural networks 610 and 710 that receive and process at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow. This is because the third decoder 810 processes the feature data of the second-order optical flow representing the features of the second-order optical flow itself, and the first and second prediction neural networks 610 and 710 process the previous optical flow, the feature data of the previous optical flow, and/or the feature data of the previous residual optical flow, which may have relatively low correlation with the current and second-order optical flows.
FIG. 9 is a flow chart illustrating a method of reconstructing optical flow according to an embodiment of the present disclosure.
In operation S910, the image decoding apparatus 300 obtains feature data of a current residual optical flow from a bitstream of a current image.
The image decoding apparatus 300 may obtain the feature data of the current residual optical flow by applying at least one of inverse binarization, entropy decoding, inverse quantization, or inverse transformation to binary bits included in the bitstream.
In operation S920, the image decoding apparatus 300 obtains the current residual optical flow by applying the characteristic data of the current residual optical flow to the first decoder based on the neural network.
In operation S930, the image decoding apparatus 300 obtains a current predicted optical flow by using at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow.
In an embodiment of the present disclosure, the image decoding apparatus 300 may determine the previous optical flow as the current predicted optical flow.
In another embodiment of the present disclosure, the image decoding apparatus 300 may obtain the current predicted optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to the first prediction neural network 610.
In another embodiment of the present disclosure, the image decoding apparatus 300 may obtain the second order optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to the second prediction neural network 710, and may obtain the current prediction optical flow by processing the previous optical flow according to the second order optical flow.
In another embodiment of the present disclosure, the image decoding apparatus 300 may obtain the second order optical flow by applying the feature data of the second order optical flow obtained from the bitstream to the third decoder 810, and may obtain the current predicted optical flow by processing the previous optical flow according to the second order optical flow.
In operation S940, the image decoding apparatus 300 reconstructs a current optical flow by using the current residual optical flow and the current predicted optical flow. The image decoding apparatus 300 may obtain the current optical flow by summing the sample value of the current residual optical flow and the sample value of the current predicted optical flow.
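For illustration only, operations S910 to S940 can be summarized in the following sketch, where `first_decoder` and `flow_predictor` are hypothetical callables standing in for the neural-network components and `state` is a hypothetical container for data carried over from the previous image:

```python
# Sketch of one decoding step covering S910 to S940.
# feat_res_flow: feature data of the current residual optical flow,
# already parsed from the bitstream (S910).
def reconstruct_current_flow(feat_res_flow, first_decoder, flow_predictor, state):
    res_flow = first_decoder(feat_res_flow)              # S920
    pred_flow = flow_predictor(state.prev_flow,          # S930
                               state.prev_flow_feat,
                               state.prev_res_flow_feat)
    return pred_flow + res_flow                          # S940: sample-wise sum
```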
In an embodiment of the present disclosure, the image decoding apparatus 300 may obtain the feature data of the current residual image data from the bitstream, and may obtain the current residual image data by applying the feature data of the current residual image data to the second decoder 333. The image decoding apparatus 300 may obtain the current predicted image by processing the previous reconstructed image according to the current optical flow, and may obtain the current reconstructed image by combining the current predicted image with the current residual image data.
In another embodiment of the present disclosure, the image decoding apparatus 300 may obtain current residual image data from a bitstream. The image decoding apparatus 300 may obtain the current predicted image by processing the previous reconstructed image according to the current optical flow, and may obtain the current reconstructed image by combining the current predicted image with the current residual image data.
In another embodiment of the present disclosure, the image decoding apparatus 300 may provide the current optical flow to another device so that the other device obtains the current reconstructed image.
The example of the inter prediction process described with reference to fig. 3 to 9 assumes that the previous image was processed through inter prediction. This is because the previous optical flow used to reconstruct the current optical flow is generated during the inter prediction of the previous image.
That is, when the current picture corresponds to a P frame subsequent to a predicted (P) frame, in other words, when the previous picture is a P frame and the current picture is a P frame, the inter prediction process described with reference to fig. 3 to 9 may be applied. The term "P-frame" refers to an image or frame that can be reconstructed by intra-prediction or inter-prediction. An image or frame that can only be reconstructed by intra prediction is called an intra (I) frame.
Thus, when the previous image is an I-frame, the previous optical flow is not obtained. Accordingly, an example of an inter prediction process in the case where the current picture is a P frame after the I frame, in other words, in the case where the previous picture is an I frame and the current picture is a P frame, is described below.
Fig. 10 is a diagram showing another configuration of the predictive decoder 330.
Referring to fig. 10, the predictive decoder 330 includes a fourth decoder 1010, a second decoder 333, a motion compensator 335, and a second combiner 337.
The fourth decoder 1010 may be stored in a memory. In an embodiment of the present disclosure, the fourth decoder 1010 may be implemented as at least one dedicated processor for AI.
The fourth decoder 1010 obtains the current optical flow by processing the feature data of the current optical flow according to the parameters set by training.
Characteristic data of the current optical flow may be obtained from the bit stream. That is, the acquirer 310 may acquire the feature data of the current optical flow by applying at least one of inverse binarization, entropy decoding, inverse quantization, or inverse transformation to binary bits included in the bitstream.
The second decoder 333 obtains the current residual image data by processing the feature data of the current residual image data according to the parameters set through training.
The motion compensator 335 obtains a current predicted image by processing a previously reconstructed image according to the current optical flow, and the second combiner 337 obtains a current reconstructed image by combining the current predicted image with the current residual image data.
According to an implementation example, in an embodiment, the predictive decoder 330 may send the current optical flow to another device so that the other device obtains the current reconstructed image. In this case, the second decoder 333, the motion compensator 335, and the second combiner 337 may not be included in the predictive decoder 330.
In an embodiment, the predictive decoder 330 may further include a determiner configured to determine whether the current picture is a P frame following an I frame or a P frame following a P frame.
In an embodiment, when the current picture is a P frame following the P frame, the prediction decoder 330 may reconstruct the current optical flow through the first decoder 331, the optical flow predictor 334, and the first combiner 336 of fig. 5, and when the current picture is a P frame following the I frame, the prediction decoder 330 may reconstruct the current optical flow through the fourth decoder 1010 of fig. 10.
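For illustration only, a sketch of such a determiner-driven dispatch, with all names hypothetical:

```python
# Sketch: choose the flow-reconstruction path based on the previous frame type.
# A P frame following a P frame goes through residual-flow decoding (fig. 5);
# a P frame following an I frame decodes the flow directly (fig. 10).
def reconstruct_flow_for_frame(prev_frame_type, bitstream_features, decoders):
    if prev_frame_type == "P":      # P frame following a P frame
        res_flow = decoders["first"](bitstream_features["residual_flow"])
        pred_flow = decoders["predictor"]()
        return pred_flow + res_flow
    else:                           # P frame following an I frame
        return decoders["fourth"](bitstream_features["flow"])
```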
FIG. 11 is a flow chart illustrating a method of reconstructing optical flow according to another embodiment of the present disclosure.
In operation S1110, the image decoding apparatus 300 determines whether the current image is a P frame subsequent to the I frame.
When the current image is a P frame after the I frame, the image decoding apparatus 300 obtains feature data of the current optical flow from the bitstream in operation S1120.
The image decoding apparatus 300 may obtain the feature data of the current optical flow by applying at least one of inverse binarization, entropy decoding, inverse quantization, or inverse transformation to binary bits included in the bitstream.
In operation S1130, the image decoding apparatus 300 obtains the current optical flow by applying the feature data of the current optical flow to the fourth decoder 1010.
When the current picture is not a P frame following the I frame, in other words, when the current picture is a P frame following the P frame, the picture decoding apparatus 300 may reconstruct the current optical flow through operations S910 to S940 of fig. 9.
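The branching between the two reconstruction paths can be illustrated with a minimal Python sketch. The callables and the state dictionary below (first_decoder, fourth_decoder, flow_predictor, previous_flow) are hypothetical stand-ins for the trained networks and the buffered previous-frame data described above, not part of the disclosure.

```python
import numpy as np

def reconstruct_current_flow(frame_type, feature_data, state,
                             first_decoder, fourth_decoder, flow_predictor):
    # Decode-side branching: P frame after an I frame vs. after a P frame.
    if frame_type == "P_after_I":
        # No previous optical flow exists (figs. 10 and 11): the fourth
        # decoder reconstructs the current optical flow directly.
        return fourth_decoder(feature_data)
    # P frame after a P frame (figs. 5 and 9): decode the residual flow and
    # combine it with the flow predicted from previous-frame information.
    residual_flow = first_decoder(feature_data)
    predicted_flow = flow_predictor(state["previous_flow"])
    return residual_flow + predicted_flow

# Toy usage with identity stand-ins on a 4x4 two-channel flow field.
flow = reconstruct_current_flow(
    "P_after_P", np.zeros((4, 4, 2)), {"previous_flow": np.ones((4, 4, 2))},
    first_decoder=lambda f: f, fourth_decoder=lambda f: f,
    flow_predictor=lambda prev: prev)
```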
An example of the operation of the image encoding apparatus 1200 will now be described with reference to fig. 12 to 15.
Fig. 12 is a diagram showing a configuration of an image encoding apparatus 1200 according to an embodiment of the present disclosure.
Referring to fig. 12, the image encoding apparatus 1200 includes a predictive encoder 1210, a generator 1230, an acquirer 1250, and a predictive decoder 1270.
The predictive encoder 1210, the generator 1230, the acquirer 1250, and the predictive decoder 1270 may be implemented as processors, and the predictive encoder 1210, the generator 1230, the acquirer 1250, and the predictive decoder 1270 may operate according to instructions stored in a memory.
Although the predictive encoder 1210, the generator 1230, the acquirer 1250, and the predictive decoder 1270 are shown separately in fig. 12, in an embodiment, they may be implemented as one element, for example, one processor. In this case, the predictive encoder 1210, the generator 1230, the acquirer 1250, and the predictive decoder 1270 may be implemented as a dedicated processor, or as a combination of software and a general-purpose processor such as an AP, a CPU, or a GPU. Further, the dedicated processor may include a memory for implementing embodiments of the present disclosure, or may include a memory processor for using an external memory.
The predictive encoder 1210, the generator 1230, the acquirer 1250, and the predictive decoder 1270 may be implemented as a plurality of processors. In this case, the prediction encoder 1210, the generator 1230, the acquirer 1250, and the prediction decoder 1270 may be implemented as a combination of dedicated processors, or a combination of software and a general-purpose processor such as an AP, a CPU, or a GPU.
The prediction encoder 1210 obtains feature data of a current residual optical flow and feature data of current residual image data by using a current image and a previously reconstructed image.
The prediction encoder 1210 may use a first encoder 1211 based on a neural network and a second encoder 1215 based on a neural network to obtain characteristic data of a current residual optical flow and characteristic data of a current residual image data.
The feature data of the current residual optical flow and the feature data of the current residual image data obtained by the prediction encoder 1210 are transmitted to a generator 1230, which may be a bitstream generator, for example.
The generator 1230 generates a bitstream from the feature data of the current residual optical flow and the feature data of the current residual image data. In an embodiment of the present disclosure, the generator 1230 may generate a first bit stream corresponding to the feature data of the current residual optical flow and a second bit stream corresponding to the feature data of the current residual image data.
The bitstream may be transmitted to the image decoding apparatus 300 through a network. Further, in embodiments of the present disclosure, the bitstream may be recorded on a data storage medium including a magnetic medium (e.g., a hard disk, a floppy disk, or a magnetic tape), an optical medium (e.g., a CD-ROM or a DVD), or a magneto-optical medium (e.g., a floptical disk).
The acquirer 1250, which may be, for example, a bitstream acquirer, acquires the feature data of the current residual optical flow and the feature data of the current residual image data from the bitstream generated by the generator 1230. In an embodiment, the acquirer 1250 may instead receive the feature data of the current residual optical flow and the feature data of the current residual image data directly from the prediction encoder 1210.
The feature data of the current residual optical flow and the feature data of the current residual image data are transmitted to the prediction decoder 1270, and the prediction decoder 1270 reconstructs the current optical flow by using the feature data of the current residual optical flow, and obtains the current reconstructed image by using the current optical flow and the feature data of the current residual image data.
The current optical flow and the current reconstructed image obtained by the predictive decoder 1270 may be used in the next image encoding process.
The configuration and operation of the acquirer 1250 and the predictive decoder 1270 may correspond to the operation of the acquirer 310 and the predictive decoder 330 of fig. 3 to 5, and thus, a detailed description thereof will be omitted. However, in embodiments where predictive encoder 1210 includes optical flow predictor 1217, for example as shown in FIG. 13, unlike predictive decoder 330 of FIG. 5, predictive decoder 1270 may not include optical flow predictor 334. This is because the prediction decoder 1270 can use the current predicted optical flow obtained by the optical flow predictor 1217 included in the prediction encoder 1210.
In an embodiment of the present disclosure, the prediction encoder 1210 may obtain feature data of the current residual optical flow by using the current image and the previously reconstructed image, and the generator 1230 may generate a bitstream corresponding to the feature data of the current residual optical flow. The acquirer 1250 may acquire feature data of the current residual optical flow from the bitstream, and the prediction decoder 1270 may reconstruct the current optical flow based on the feature data of the current residual optical flow.
That is, since the current optical flow is encoded by the prediction encoder 1210, the generator 1230, the acquirer 1250, and the prediction decoder 1270, the image encoding apparatus 1200 may be referred to as an optical flow encoding apparatus in this case.
The current optical flow reconstructed by the prediction decoder 1270 may be transmitted to another device so that the other device encodes the current residual image data. In detail, the other device may encode current residual image data corresponding to a difference between a current image and a current predicted image obtained from a previous reconstructed image according to a current optical flow.
An example of the configuration of the predictive encoder 1210 and the generator 1230 will be described in more detail with reference to fig. 13 to 15.
Fig. 13 is a diagram illustrating a configuration of the predictive coder 1210 of fig. 12.
The predictive encoder 1210 includes an optical flow predictor 1217, a first encoder 1211, a second encoder 1215, and a subtractor 1213.
The first encoder 1211 and the second encoder 1215 may be stored in a memory. In an embodiment of the present disclosure, the first encoder 1211 and the second encoder 1215 may be implemented as at least one dedicated processor for AI.
Referring to fig. 13, the optical flow predictor 1217 obtains a current predicted optical flow by using at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow.
The optical flow predictor 1217 may obtain a current predicted optical flow by using the same method as the optical flow predictor 334 of the image decoding apparatus 300.
For example, the optical flow predictor 1217 may have the same configuration as the optical flow predictor 600 or 700 of fig. 6 or 7, and may obtain a current predicted optical flow.
In detail, as described with reference to fig. 6, the optical flow predictor 1217 may obtain a current predicted optical flow by applying at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow to the first prediction neural network 610.
Further, as described with reference to fig. 7, the optical flow predictor 1217 may obtain a second order optical flow by applying at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow to the second prediction neural network 710, and may obtain a current predicted optical flow by processing the previous optical flow according to the second order optical flow.
In another example, optical flow predictor 1217 may determine the previous optical flow as the current predicted optical flow.
When the optical flow predictor 334 of the image decoding apparatus 300 includes the third decoder 810 and the changer 720 as shown in fig. 8, the optical flow predictor 1217 of the image encoding apparatus 1200 may have a configuration as described below with reference to fig. 14.
At least one of the current image, the previously reconstructed image, or the current predicted optical flow is input to the first encoder 1211. At least one of the current image, the previously reconstructed image, or the current predicted optical flow may be concatenated and then may be input to the first encoder 1211.
Since information on the current optical flow may be derived from the current image and the previously reconstructed image, the first encoder 1211 may output feature data of the current residual optical flow, which corresponds to the difference between the current optical flow identifiable from the current image and the previously reconstructed image and the current predicted optical flow generated by the optical flow predictor 1217.
The first encoder 1211 outputs feature data of the current residual optical flow by processing at least one of the current image, the previously reconstructed image, or the current predicted optical flow according to the parameters set as the training result.
The prediction decoder 1270 of fig. 12 reconstructs a current optical flow based on feature data of the current residual optical flow, and supplies a current predicted image generated from a previous reconstructed image according to the current optical flow to the subtractor 1213.
The subtractor 1213 obtains current residual image data between the current image and the current predicted image. The subtractor 1213 may obtain the current residual image data by subtracting the sample value of the current prediction image from the sample value of the current image.
The current residual image data is input to the second encoder 1215, and the second encoder 1215 outputs feature data of the current residual image data by processing the current residual image data according to parameters set as a training result.
The generator 1230 generates a bitstream based on the characteristic data of the current residual optical flow and the characteristic data of the current residual image data output from the prediction encoder 1210.
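As a rough illustration of this data flow, the following Python sketch traces one P frame through the fig. 13 pipeline. The three callables are hypothetical stand-ins for the first encoder 1211, the second encoder 1215, and the prediction decoder 1270, and the array shapes are illustrative only.

```python
import numpy as np

def encode_p_frame(current, prev_reconstructed, predicted_flow,
                   first_encoder, second_encoder, prediction_decoder):
    # First encoder 1211: feature data of the current residual optical flow.
    feat_res_flow = first_encoder(current, prev_reconstructed, predicted_flow)
    # Prediction decoder 1270: reconstruct the flow, form the predicted image.
    current_predicted = prediction_decoder(feat_res_flow, prev_reconstructed)
    # Subtractor 1213: current residual image data.
    residual = current - current_predicted
    # Second encoder 1215: feature data of the current residual image data.
    feat_res_image = second_encoder(residual)
    return feat_res_flow, feat_res_image  # handed to the generator 1230

# Toy usage with identity-like stand-ins.
cur, prev = np.ones((4, 4)), np.zeros((4, 4))
feats = encode_p_frame(cur, prev, np.zeros((4, 4, 2)),
                       first_encoder=lambda c, p, f: c - p,
                       second_encoder=lambda r: r,
                       prediction_decoder=lambda fd, p: p)
```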
Fig. 14 is a diagram showing a configuration of an optical flow predictor 1217 of the image encoding apparatus 1200 corresponding to the optical flow predictor 800 of fig. 8.
Referring to fig. 14, the optical flow predictor 1217 includes a third encoder 1410, a third decoder 810, and a changer 720. Compared to the optical flow predictor 800 of fig. 8, the optical flow predictor 1217 further includes the third encoder 1410.
The third encoder 1410 and the third decoder 810 may be stored in a memory. In an embodiment of the present disclosure, the third encoder 1410 and the third decoder 810 may be implemented as at least one dedicated processor for AI.
The third encoder 1410 obtains feature data of the second order optical flow by processing at least one of the current image, the previous reconstructed image, the previous optical flow, feature data of the previous optical flow, or feature data of the previous residual optical flow according to parameters set according to training.
A bit stream corresponding to the feature data of the second-order optical flow may be provided to the image decoding apparatus 300.
The third decoder 810 obtains the second-order optical flow by processing the feature data of the second-order optical flow according to the parameters set by training.
The changer 720 may obtain the current predicted optical flow by processing the previous optical flow according to the second-order optical flow.
In embodiments of the present disclosure, the changer 720 may warp the previous optical flow according to the second-order optical flow to obtain the current predicted optical flow. The warping is merely an example, and the changer 720 may apply various types of processing that change the positions of the samples in the previous optical flow to generate the current predicted optical flow.
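A backward-warping operation of this kind can be sketched as follows. This is a minimal nearest-neighbour version written for illustration (a practical implementation would typically use bilinear sampling), and all array shapes are assumptions.

```python
import numpy as np

def backward_warp(field, flow):
    # Warp `field` (H, W, C) by `flow` (H, W, 2): each output sample is taken
    # from the position displaced by the flow, clipped to the image bounds.
    h, w = field.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return field[src_y, src_x]

# Current predicted optical flow: the previous optical flow (H, W, 2)
# warped according to the second-order optical flow.
previous_flow = np.random.rand(4, 4, 2)
second_order_flow = np.zeros((4, 4, 2))
current_predicted_flow = backward_warp(previous_flow, second_order_flow)
```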
The optical flow predictor 1217 of fig. 14 obtains feature data of the second-order optical flow by using various types of data that can be used by the image encoding apparatus 1200. The feature data of the second-order optical flow is signaled to the image decoding apparatus 300. The optical flow predictor 800 of the image decoding apparatus 300 processes the feature data of the second-order optical flow signaled from the image encoding apparatus 1200 by using the third decoder 810, and obtains the second-order optical flow.
The predicted optical flow obtained by the image decoding apparatus 300 by using the feature data of the second-order optical flow signaled from the image encoding apparatus 1200 may be more accurate than a predicted optical flow obtained by the image decoding apparatus 300 itself. This is because the image encoding apparatus 1200 can obtain the feature data of the second-order optical flow by using more types of data than are available to the image decoding apparatus 300. For example, because the current image is not available to the image decoding apparatus 300 before the current image is decoded, the optical flow predictors 600 and 700 of fig. 6 and 7 do not use the current image to obtain the current predicted optical flow.
Fig. 15 is a diagram showing a configuration of the generator 1230 of fig. 12.
Referring to fig. 15, the generator 1230 includes a quantizer 1231 and an entropy encoder 1233.
The quantizer 1231 quantizes the feature data of the current residual optical flow and the feature data of the current residual image data.
The entropy encoder 1233 generates a bitstream by entropy encoding the quantized feature data of the current residual optical flow and the quantized feature data of the current residual image data.
In an embodiment, the generator 1230 may also include a transformer. The transformer transforms the feature data of the current residual optical flow and the feature data of the current residual image data from the spatial domain to the frequency domain, and supplies the transformed feature data to the quantizer 1231.
Further, in an embodiment, the generator 1230 may not include the quantizer 1231. That is, a bitstream corresponding to the feature data of the current residual optical flow and the feature data of the current residual image data may be obtained through the processing of the entropy encoder 1233 alone.
Further, in an embodiment, the generator 1230 may generate a bitstream by binarizing the feature data of the current residual optical flow and the feature data of the current residual image data. That is, when the generator 1230 performs only binarization, the quantizer 1231 and the entropy encoder 1233 may not be included in the generator 1230.
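For illustration, a minimal sketch of the quantization step is given below. The step size and the byte-serialization stand-in for the entropy encoder 1233 are assumptions; an actual implementation would use a lossless entropy coder such as an arithmetic coder.

```python
import numpy as np

def quantize(feature_data, step=0.5):
    # Quantizer 1231: uniform quantization to integer symbols; a decoder-side
    # inverse quantizer multiplies the symbols back by the step size.
    return np.rint(feature_data / step).astype(np.int32)

def entropy_encode(symbols):
    # Stand-in for the entropy encoder 1233: real systems losslessly code the
    # symbol statistics; here the symbols are only serialized to bytes.
    return symbols.tobytes()

bitstream = entropy_encode(quantize(np.random.randn(16)))
```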
FIG. 16 is a flow chart illustrating a method of encoding optical flow according to an embodiment of the present disclosure.
In operation S1610, the image encoding apparatus 1200 obtains a current predicted optical flow from at least one of a previous optical flow, feature data of a previous optical flow, or feature data of a previous residual optical flow.
In an embodiment of the present disclosure, the image encoding apparatus 1200 may determine the previous optical flow as the current predicted optical flow.
In another embodiment of the present disclosure, the image encoding apparatus 1200 may obtain the current predicted optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to the first prediction neural network 610.
In another embodiment of the present disclosure, the image encoding apparatus 1200 may obtain the second order optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to the second prediction neural network 710, and may obtain the current prediction optical flow by processing the previous optical flow according to the second order optical flow.
In another embodiment of the present disclosure, the image encoding apparatus 1200 obtains the feature data of the second order optical flow by applying at least one of the current image, the previous reconstructed image, the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to the third encoder 1410, and obtains the second order optical flow by applying the feature data of the second order optical flow to the third decoder 810. The image encoding apparatus 1200 can obtain the current predicted optical flow by processing the previous optical flow according to the second-order optical flow.
In operation S1620, the image encoding apparatus 1200 obtains feature data of the current residual optical flow by applying at least one of the current image, the previously reconstructed image, or the current predicted optical flow to the first encoder 1211 based on the neural network.
In operation S1630, the image encoding device 1200 generates a bitstream corresponding to the feature data of the current residual optical flow.
In embodiments of the present disclosure, the bitstream may further comprise feature data of the second order optical flow and/or feature data of the current residual image data.
In an embodiment of the present disclosure, the image encoding apparatus 1200 reconstructs a current optical flow from feature data of the current residual optical flow, and obtains a current predicted image by processing a previously reconstructed image based on the reconstructed current optical flow. The image encoding apparatus 1200 may obtain feature data of the current residual image data by applying the current residual image data corresponding to the difference between the current predicted image and the current image to the second encoder 1215. The feature data of the current residual image data may be included in the bitstream.
In another embodiment of the present disclosure, the image encoding apparatus 1200 reconstructs a current optical flow from feature data of the current residual optical flow, and reconstructs a current prediction image by processing a previously reconstructed image based on the reconstructed current optical flow. Current residual image data corresponding to a difference between the current prediction image and the current image may be included in the bitstream.
The encoding process described with reference to fig. 12 to 16 assumes that the previous image was processed through inter prediction. This is because the previous optical flow used to encode the current optical flow is generated in the inter prediction process of the previous image.
That is, the encoding process described with reference to fig. 12 to 16 may be applied when the current picture is a P frame following another P frame. When the previous image is an I frame, the previous optical flow cannot be obtained. Accordingly, an encoding process for the case where the current picture is a P frame following an I frame is described below.
Fig. 17 is a diagram showing another configuration of the predictive coder 1210.
Referring to fig. 17, the predictive encoder 1210 includes a fourth encoder 1710, a second encoder 1215, and a subtractor 1213.
The fourth encoder 1710 and the second encoder 1215 may be stored in a memory. In an embodiment of the present disclosure, the fourth encoder 1710 and the second encoder 1215 may be implemented as at least one dedicated processor for AI.
The fourth encoder 1710 obtains feature data of the current optical flow by processing the current image and the previously reconstructed image according to parameters set according to training.
The prediction decoder 1270 of fig. 12 reconstructs the current optical flow based on the feature data of the current optical flow, and supplies the current predicted image generated from the previous reconstructed image according to the current optical flow to the subtractor 1213. The predictive decoder 1270 may use the fourth decoder 1010 of fig. 10 to reconstruct the current optical flow.
The subtractor 1213 obtains current residual image data between the current image and the current predicted image. The subtractor 1213 may obtain the current residual image data by subtracting the sample value of the current prediction image from the sample value of the current image.
The current residual image data is input to the second encoder 1215, and the second encoder 1215 outputs feature data of the current residual image data by processing the current residual image data according to parameters set as a training result.
The generator 1230 generates a bitstream based on the feature data of the current optical flow and the feature data of the current residual image data output from the prediction encoder 1210.
The bitstream may be transmitted to the image decoding apparatus 300 through a network. The bitstream may be recorded on a data storage medium including a magnetic medium (e.g., a hard disk, a floppy disk, or a magnetic tape), an optical medium (e.g., a compact disc read-only memory (CD-ROM) or a digital versatile disc (DVD)), or a magneto-optical medium (e.g., a floptical disk).
In an embodiment, the predictive coder 1210 may further include a determiner configured to determine whether the current picture is a P frame following an I frame or a P frame following a P frame.
When the current image is a P frame after the P frame, the prediction encoder 1210 may obtain the feature data of the current residual optical flow through the optical flow predictor 1217 and the first encoder 1211 of fig. 13, and when the current image is a P frame after the I frame, the prediction encoder 1210 may obtain the feature data of the current optical flow through the fourth encoder 1710 of fig. 17.
At least one of the first encoder 1211, the second encoder 1215, the third encoder 1410, the fourth encoder 1710, the first decoder 331, the second decoder 333, the third decoder 810, the fourth decoder 1010, the first prediction neural network 610, or the second prediction neural network 710 may include a convolution layer.
An example of a structure of each of the first encoder 1211, the second encoder 1215, the third encoder 1410, the fourth encoder 1710, the first decoder 331, the second decoder 333, the third decoder 810, the fourth decoder 1010, the first prediction neural network 610, and the second prediction neural network 710 will be described with reference to fig. 18.
Fig. 18 is a diagram illustrating a structure of a neural network 1800 according to an embodiment of the present disclosure.
As shown in fig. 18, the input data 1805 is input to a first convolution layer 1810. The input data 1805 varies according to whether the neural network 1800 is used as the first encoder 1211, the second encoder 1215, the third encoder 1410, the fourth encoder 1710, the first decoder 331, the second decoder 333, the third decoder 810, the fourth decoder 1010, the first prediction neural network 610, or the second prediction neural network 710.
For example, when the neural network 1800 is used as the first encoder 1211, the input data 1805 may correspond to the result of concatenating the current image, the previously reconstructed image, and the predicted optical flow. As another example, when the neural network 1800 is used as the second encoder 1215, the input data 1805 may correspond to current residual image data.
The indication "3X 4" marked on the first convolution layer 1810 of fig. 18 may indicate that convolution is performed with one piece of input data 1805 using four filter kernels of size 3X 3. The four feature maps are generated by four filter kernels as a result of the convolution.
The feature maps generated by the first convolution layer 1810 represent unique features of the input data 1805. For example, each feature map may represent a vertical feature, a horizontal feature, or an edge feature of the input data 1805.
An example of the convolution operation performed by the first convolution layer 1810 will be described in detail with reference to fig. 19.
A feature map 1950 may be generated by performing multiplication and addition between the parameters of a filter kernel 1930 of size 3x3 used in the first convolution layer 1810 and the corresponding sample values in the input data 1805. Because four filter kernels 1930 are used in the first convolution layer 1810, four feature maps 1950 may be generated by performing the convolution using the four filter kernels 1930.
In fig. 19, I1 to I49 marked on the input data 1805 indicate samples of the input data 1805, and F1 to F9 marked on the filter kernel 1930 indicate samples of the filter kernel 1930, which may also be referred to as parameters. In addition, M1 to M9 marked on the feature map 1950 indicate the samples of the feature map 1950.
In the convolution operation, the sample values of I1, I2, I3, I8, I9, I10, I15, I16, and I17 of the input data 1805 may be multiplied by F1, F2, F3, F4, F5, F6, F7, F8, and F9 of the filter kernel 1930, respectively, and a value obtained by combining (e.g., adding) the result values of the multiplication may be designated as a value of M1 of the feature map 1950. When a step size of 2 is set for the convolution operation, the sample values of I3, I4, I5, I10, I11, I12, I17, I18, and I19 of the input data 1805 may be multiplied by F1, F2, F3, F4, F5, F6, F7, F8, and F9 of the filter kernel 1930, respectively, and a value obtained by combining the result values of the multiplication may be designated as a value of M2 of the feature map 1950.
While the filter kernel 1930 moves along the input data 1805 according to the stride until it reaches the last sample of the input data 1805, the convolution operation between the sample values in the input data 1805 and the samples of the filter kernel 1930 is performed, and a feature map 1950 having a certain size is obtained.
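The multiply-and-add described above can be reproduced with a short NumPy sketch. The 7x7 input matches the samples I1 to I49 of fig. 19, while the kernel values and the stride of 2 are illustrative assumptions.

```python
import numpy as np

x = np.arange(1, 50, dtype=float).reshape(7, 7)  # samples I1..I49
kernel = np.full((3, 3), 0.1)                    # parameters F1..F9 (illustrative)
stride = 2

feature_map = np.empty((3, 3))                   # (7 - 3) // 2 + 1 = 3 per axis
for i in range(3):
    for j in range(3):
        patch = x[i * stride:i * stride + 3, j * stride:j * stride + 3]
        feature_map[i, j] = np.sum(patch * kernel)  # M1, M2, ... of fig. 19
```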
In accordance with the present disclosure, the parameter values used by the convolution layers of the neural network 1800, e.g., the samples F1 to F9 of the filter kernel 1930, may be optimized by training the neural network 1800.
Although the convolution layers included in the neural network 1800 may perform the convolution operation of fig. 19, the convolution operation of fig. 19 is merely an example, and the embodiment is not limited thereto.
Referring back to fig. 18, the feature maps of the first convolution layer 1810 are input to the first activation layer 1820.
The first activation layer 1820 may give each feature map a nonlinear characteristic. The first activation layer 1820 may use a sigmoid function, a hyperbolic tangent (tanh) function, or a rectified linear unit (ReLU) function, but the embodiment is not limited thereto.
Giving a nonlinear characteristic means that some sample values of the feature map are changed and then output, the change being performed by applying the nonlinear function.
The first activation layer 1820 determines whether to send the sample values of the feature map to the second convolution layer 1830. For example, some of the sample values of the feature map are activated by the first activation layer 1820 and sent to the second convolution layer 1830, while some other sample values are deactivated by the first activation layer 1820 and not sent to the second convolution layer 1830. The unique features of the input data 1805 represented by the feature map are emphasized by the first activation layer 1820.
The feature maps 1825 output from the first activation layer 1820 are input to the second convolution layer 1830. Each of the feature maps 1825 of fig. 18 is a result of processing the feature map 1950 of fig. 19 by the first activation layer 1820.
The indication "3X 4" marked on the second convolution layer 1830 may indicate that the convolution is performed on the input signature 1825 by using four filter kernels of size 3X 3. The output of the second convolution layer 1830 is input to a second activation layer 1840. The second activation layer 1840 may give the input feature map a non-linear characteristic.
The feature maps 1845 output from the second activation layer 1840 are input to the third convolution layer 1850. The indication "3X3X1" marked on the third convolution layer 1850 indicates that convolution is performed by using one filter kernel of size 3x3 to generate one piece of output data 1855.
The output data 1855 varies according to whether the neural network 1800 is used as the first encoder 1211, the second encoder 1215, the third encoder 1410, the fourth encoder 1710, the first decoder 331, the second decoder 333, the third decoder 810, the fourth decoder 1010, the first prediction neural network 610, or the second prediction neural network 710.
For example, when the neural network 1800 is used as the first encoder 1211, the output data 1855 may be characteristic data of the current residual optical flow. As another example, when the neural network 1800 is used as the second encoder 1215, the output data 1855 may be characteristic data of current residual image data.
Although the neural network 1800 is shown in fig. 18 as including three convolution layers and two activation layers, this is merely an example, and the number of convolution layers and activation layers included in the neural network 1800 may vary in embodiments according to implementation examples.
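Under the reading of the layer labels given above, the fig. 18 topology can be sketched in PyTorch as follows. The input/output channel counts, the padding, and the input size are assumptions made only so the sketch runs.

```python
import torch
import torch.nn as nn

neural_network_1800 = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),  # first convolution layer, "3X3X4"
    nn.ReLU(),                                  # first activation layer
    nn.Conv2d(4, 4, kernel_size=3, padding=1),  # second convolution layer, "3X3X4"
    nn.ReLU(),                                  # second activation layer
    nn.Conv2d(4, 1, kernel_size=3, padding=1),  # third convolution layer, "3X3X1"
)

output_data = neural_network_1800(torch.rand(1, 1, 8, 8))  # one piece of output data
```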
Further, in an embodiment, the neural network 1800 may be implemented as a recurrent neural network (RNN). That is, the neural network 1800 according to an embodiment of the present disclosure may change from a convolutional neural network (CNN) structure to an RNN structure.
In an embodiment of the present disclosure, the image decoding apparatus 300 and the image encoding apparatus 1200 may include at least one Arithmetic Logic Unit (ALU) for convolution and activation operations.
The ALU may be implemented as a processor. For convolution operations, the ALU may include a multiplier for multiplying the sample value of the filter kernel by the sample value of the feature map or input data 1805 output from the previous layer, and an adder for adding the result value of the multiplication.
For the activation operation, the ALU may include a multiplier for multiplying the input sample value by a weight for a predetermined sigmoid function, tanh function, or ReLU function, and a comparator for comparing the result of the multiplication with a certain value and determining whether to send the input sample value to the next layer.
Examples of methods of training a neural network used in image encoding and decoding processes are described below with reference to fig. 20 to 22.
Fig. 20 is a diagram for describing a method of training the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090. In an embodiment, optical flow predictor 2090 may correspond to optical flow predictor 600 including first prediction neural network 610 of fig. 6. In an embodiment, optical flow predictor 2090 may correspond to optical flow predictor 700 including second prediction neural network 710 of fig. 7. In an embodiment, optical flow predictor 2090 may correspond to optical flow predictor 800 including third decoder 810. In an embodiment, optical flow predictor 2090 may correspond to optical flow predictor 1217 including third encoder 1410 of fig. 14.
In fig. 20, a current training image 2010, a previously reconstructed training image 2030, and a current reconstructed training image 2050 correspond to the current image, the previously reconstructed image, and the current reconstructed image, respectively.
When training the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090, the similarity between the current reconstructed training image 2050 and the current training image 2010 and the bit rate of the bit stream to be generated by encoding the current training image 2010 may be considered. To this end, in an embodiment of the present disclosure, the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090 may be trained according to the first loss information 2060 corresponding to the similarity between the current training image 2010 and the current reconstructed training image 2050 and the second loss information 2070 and the third loss information 2080 corresponding to the size of the bitstream.
Referring to fig. 20, the current predicted optical flow is obtained by the optical flow predictor 2090. The current predicted optical flow may be obtained according to the embodiment described with reference to fig. 6, the embodiment described with reference to fig. 7, or the embodiment described with reference to fig. 14. In an embodiment, the previous optical flow may be determined to be the current predicted optical flow.
The current predicted optical flow, the current training image 2010, and the previously reconstructed training image 2030 are input to the first encoder 1211. The first encoder 1211 outputs feature data h_i of the current residual optical flow by processing the current predicted optical flow, the current training image 2010, and the previously reconstructed training image 2030.
The feature data h_i of the current residual optical flow is input to the first decoder 331, and the first decoder 331 outputs the current residual optical flow d_i by processing the feature data h_i of the current residual optical flow.
When the current predicted optical flow and the current residual optical flow d_i are combined with each other, for example by the combiner 2095, the current optical flow g_i is obtained.
When the previously reconstructed training image 2030 is warped (e.g., by the warping 190) according to the current optical flow g_i, a current predicted training image x'_i is generated, and current residual image data r_i corresponding to the difference between the current predicted training image x'_i and the current training image 2010 is obtained.
The current residual image data r_i is input to the second encoder 1215, and the second encoder 1215 outputs feature data v_i of the current residual image data by processing the current residual image data r_i.
The feature data v_i of the current residual image data is input to the second decoder 333.
The second decoder 333 outputs reconstructed current residual image data r'_i by processing the feature data v_i of the current residual image data, and when the current predicted training image x'_i and the current residual image data r'_i are combined with each other, the current reconstructed training image 2050 is obtained.
To train the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090, at least one of the first loss information 2060, the second loss information 2070, or the third loss information 2080 may be obtained.
The first loss information 2060 corresponds to the difference between the current training image 2010 and the current reconstructed training image 2050. The difference between the current training image 2010 and the current reconstructed training image 2050 may include at least one of an L1-norm value, an L2-norm value, a structural similarity (SSIM) value, a peak signal-to-noise ratio-human visual system (PSNR-HVS) value, a multi-scale SSIM (MS-SSIM) value, a variance inflation factor (VIF) value, or a video multimethod assessment fusion (VMAF) value determined based on the current training image 2010 and the current reconstructed training image 2050.
Because the first loss information 2060 is related to the quality of the current reconstructed training image 2050, the first loss information 2060 may be referred to as quality loss information.
The second loss information 2070 corresponds to the entropy of the feature data h_i of the current residual optical flow or to the bit rate of the bitstream corresponding to the feature data h_i of the current residual optical flow. Further, the third loss information 2080 corresponds to the entropy of the feature data v_i of the current residual image data or to the bit rate of the bitstream corresponding to the feature data v_i of the current residual image data.
In an embodiment, when the bitstream includes both the feature data h_i of the current residual optical flow and the feature data v_i of the current residual image data, fourth loss information corresponding to the bit rate of that bitstream may be calculated. In this case, the second loss information 2070 and the third loss information 2080 may not be used for training.
Since the second loss information 2070 and the third loss information 2080 are related to efficiency of encoding the current training image 2010, the second loss information 2070 and the third loss information 2080 may be referred to as compression loss information.
The first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090 may be trained to reduce or minimize final loss information derived from at least one of the first loss information 2060, the second loss information 2070, or the third loss information 2080.
In detail, the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090 may be trained to reduce or minimize final loss information by changing values of preset parameters.
In an embodiment of the present disclosure, the final loss information may be calculated according to equation 1.
[Equation 1]
final loss information = a × first loss information + b × second loss information + c × third loss information
In equation 1, a, b, and c represent weights applied to the first loss information 2060, the second loss information 2070, and the third loss information 2080, respectively.
Equation 1 shows that the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the neural network used in the optical flow predictor 2090 may be trained such that the current reconstructed training image 2050 is as similar as possible to the current training image 2010 while the size of the bitstream corresponding to the data output from the first encoder 1211 and the second encoder 1215 is minimized.
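Equation 1 amounts to the weighted sum sketched below. The weight values shown are illustrative assumptions; in practice a, b, and c are chosen for the desired rate-distortion trade-off.

```python
def final_loss_information(first_loss, second_loss, third_loss,
                           a=1.0, b=0.1, c=0.1):
    # Equation 1: quality loss weighted against the two compression losses.
    return a * first_loss + b * second_loss + c * third_loss

loss = final_loss_information(first_loss=0.02, second_loss=1.5, third_loss=2.3)
```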
The training process of fig. 20 corresponds to a training process for a P frame following the P frame. For P frames following the I frame, the fourth encoder 1710 and the fourth decoder 1010 of fig. 10 and 17 may be trained. To this end, the first encoder 1211 and the first decoder 331 of fig. 20 may be replaced by the fourth encoder 1710 and the fourth decoder 1010. The optical flow predictor 2090 and the combiner 2095 may not be used for the training process.
An example of a process of training the second encoder 1215, the second decoder 333, the fourth encoder 1710, and the fourth decoder 1010 for a P frame following an I frame is described below. The current training image 2010 and the previously reconstructed training image 2030 may be input to a fourth encoder 1710.
The fourth encoder 1710 outputs feature data of the current optical flow by processing the current training image 2010 and the previously reconstructed training image 2030, and the feature data of the current optical flow is input to the fourth decoder 1010.
The fourth decoder 1010 outputs the current optical flow by processing the feature data of the current optical flow.
When the previously reconstructed training image 2030 is warped (e.g., by the warping 190) according to the current optical flow, a current predicted training image x'_i is generated, and current residual image data r_i corresponding to the difference between the current predicted training image x'_i and the current training image 2010 is obtained.
The current residual image data r_i is input to the second encoder 1215, and the second encoder 1215 outputs feature data v_i of the current residual image data by processing the current residual image data r_i.
The feature data v_i of the current residual image data is input to the second decoder 333. The second decoder 333 outputs reconstructed current residual image data r'_i by processing the feature data v_i of the current residual image data, and when the current predicted training image x'_i and the current residual image data r'_i are combined with each other, the current reconstructed training image 2050 is obtained.
The second encoder 1215, the second decoder 333, the fourth encoder 1710, and the fourth decoder 1010 may be trained such that final lost information derived from at least one of the first lost information 2060, the second lost information 2070, or the third lost information 2080 is reduced or minimized.
The first loss information 2060 may correspond to the difference between the current training image 2010 and the current reconstructed training image 2050. The second loss information 2070 may correspond to the entropy of the feature data of the current optical flow or the bit rate of the bitstream corresponding to the feature data of the current optical flow. Further, the third loss information 2080 may correspond to the entropy of the feature data v_i of the current residual image data or the bit rate of the bitstream corresponding to the feature data v_i of the current residual image data.
In an embodiment, the second encoder 1215 and the second decoder 333 are used for both the training process for P frames following P frames and the training process for P frames following I frames.
In an embodiment of the present disclosure, the second encoder 1215 and the second decoder 333 may be trained by a training process for a P frame following the P frame, and then may be additionally trained by a training process for a P frame following the I frame.
In another embodiment of the present disclosure, the second encoder 1215 and the second decoder 333 may be trained by a training process for P frames following the I frame, and then may be additionally trained for P frames following the P frame.
In another embodiment of the present disclosure, the second encoder 1215 and the second decoder 333 may be trained separately through a training process for a P frame following an I frame and a training process for a P frame following a P frame. For example, the second encoder 1215 and the second decoder 333 trained through the training process for the P frame after the P frame may be applied to the current picture after the P frame, and the second encoder 1215 and the second decoder 333 trained through the training process for the P frame after the I frame may be applied to the current picture after the I frame.
Fig. 21 is a diagram for describing a procedure in which the training apparatus 2100 trains the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the optical flow prediction neural network 2200.
The optical flow prediction neural network 2200 is a neural network for obtaining a predicted optical flow, and may be the first prediction neural network 610 of fig. 6, the second prediction neural network 710 of fig. 7, or the third encoder 1410 and the third decoder 810 of fig. 14.
The training process described with reference to fig. 20 may be performed by the training device 2100. The training device 2100 may be, for example, the image encoding apparatus 1200 or a separate server. Parameters obtained as a result of training may be stored in the image encoding apparatus 1200 and the image decoding apparatus 300.
Referring to fig. 21, the training apparatus 2100 initially sets parameters of the first encoder 1211, the first decoder 331, the second encoder 1215, the second decoder 333, and the optical flow prediction neural network 2200 (S2110). Accordingly, the first encoder 1211, the first decoder 331, the second encoder 1215, the second decoder 333, and the optical flow prediction neural network 2200 may operate according to the initially set parameters.
In operation S2115, the training device 2100 inputs data (e.g., a previous optical flow, feature data of a previous optical flow, and feature data of a previous residual optical flow) required for the optical flow prediction neural network 2200 to obtain a current predicted optical flow to the optical flow prediction neural network 2200.
In operation S2120, the optical flow prediction neural network 2200 outputs a current predicted optical flow to the first encoder 1211 and the training device 2100 by processing the input data.
In operation S2125, the training apparatus 2100 inputs the current training image 2010 and the previously reconstructed training image 2030 to the first encoder 1211.
In operation S2130, the first encoder 1211 outputs the feature data h_i of the current residual optical flow to the training device 2100 and the first decoder 331 by processing the current predicted optical flow, the current training image 2010, and the previously reconstructed training image 2030.
In operation S2135, the training device 2100 calculates the second loss information 2070 from the feature data h_i of the current residual optical flow.
In operation S2140, the first decoder 331 outputs the current residual optical flow d_i to the training device 2100 by processing the feature data h_i of the current residual optical flow.
In operation S2145, the training device 2100 generates a current predicted training image x'_i by using the current optical flow obtained based on the current predicted optical flow and the current residual optical flow d_i, and obtains current residual image data r_i corresponding to the difference between the current predicted training image x'_i and the current training image 2010.
In operation S2150, the training device 2100 inputs the current residual image data r_i to the second encoder 1215, and in operation S2155, the second encoder 1215 outputs the feature data v_i of the current residual image data to the training device 2100 and the second decoder 333.
In operation S2160, the training device 2100 calculates the third loss information 2080 from the feature data v_i of the current residual image data.
In operation S2165, the second decoder 333 outputs the current residual image data r'_i to the training device 2100 by processing the feature data v_i of the current residual image data.
In operation S2170, the training device 2100 generates a current reconstructed training image 2050 from the current residual image data r'_i and the current predicted training image x'_i.
In operation S2180, the training device 2100 calculates first loss information 2060 corresponding to the difference between the current training image 2010 and the current reconstructed training image 2050.
In operations S2181, S2183, S2185, S2187, and S2189, the training device 2100 calculates final loss information by combining at least one of the first loss information 2060, the second loss information 2070, or the third loss information 2080, and the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the optical flow prediction neural network 2200 update the initially set parameters by back propagation based on the final loss information.
Next, the training apparatus 2100, the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the optical flow prediction neural network 2200 update the parameters by repeatedly performing operations S2115 to S2189 until the final loss information is minimized. In this case, during each repetition operation, the first decoder 331, the second decoder 333, the first encoder 1211, the second encoder 1215, and the optical flow prediction neural network 2200 operate according to the parameters updated in the previous process.
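The repeat-until-minimized loop of fig. 21 corresponds to a standard gradient-descent training loop. The sketch below uses toy convolution modules in place of the five trained networks and a mean-squared-error stand-in for the final loss information; every concrete choice here (module shapes, optimizer, learning rate, step count) is an assumption, not part of the disclosure.

```python
import torch
import torch.nn as nn

modules = nn.ModuleList(nn.Conv2d(2, 2, 3, padding=1) for _ in range(5))
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-4)

target = torch.rand(1, 2, 8, 8)        # stands in for the current training image
for step in range(100):                # repeat operations S2115 to S2189
    x = target
    for m in modules:                  # stands in for one pass through the networks
        x = torch.relu(m(x))
    loss = ((x - target) ** 2).mean()  # stands in for the final loss information
    optimizer.zero_grad()
    loss.backward()                    # back propagation (S2181 to S2189)
    optimizer.step()                   # update the initially set parameters
```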
Fig. 22 is a diagram for describing a process in which the training apparatus 2100 trains the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333.
The training process of fig. 21 may be a training process for a P frame following a P frame, while the training process of fig. 22 may be a training process for a P frame following an I frame.
Referring to fig. 22, the training apparatus 2100 initially sets parameters of the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333 in operation S2210. Accordingly, the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333 may operate according to the initially set parameters. In an embodiment, the second encoder 1215 and the second decoder 333 may initially operate according to the parameters set through the training process of fig. 21.
In operation S2215, the training device 2100 inputs the current training image 2010 and the previously reconstructed training image 2030 to the fourth encoder 1710.
In operation S2220, the fourth encoder 1710 outputs feature data of the current optical flow to the training device 2100 and the fourth decoder 1010 by processing the current training image 2010 and the previously reconstructed training image 2030.
In operation S2225, the training device 2100 calculates second loss information 2070 from the feature data of the current optical flow.
In operation S2230, the fourth decoder 1010 outputs the current optical flow to the training device 2100 by processing the feature data of the current optical flow.
In operation S2235, the training device 2100 generates a current predicted training image x'_i by using the current optical flow, and obtains current residual image data r_i corresponding to the difference between the current predicted training image x'_i and the current training image 2010.
In operation S2240, the training device 2100 inputs the current residual image data r_i to the second encoder 1215, and in operation S2245, the second encoder 1215 outputs the feature data v_i of the current residual image data to the training device 2100 and the second decoder 333.
In operation S2250, the training device 2100 calculates the third loss information 2080 from the feature data v_i of the current residual image data.
In operation S2260, the second decoder 333 outputs the current residual image data r'_i to the training device 2100 by processing the feature data v_i of the current residual image data.
In operation S2265, the training device 2100 generates a current reconstructed training image 2050 from the current residual image data r'_i and the current predicted training image x'_i.
In operation S2270, the training device 2100 calculates first loss information 2060 corresponding to the difference between the current training image 2010 and the current reconstructed training image 2050.
In operations S2271, S2273, S2275, and S2277, the training device 2100 calculates final loss information by combining at least one of the first loss information 2060, the second loss information 2070, or the third loss information 2080, and the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333 update the initially set parameters by back propagation based on the final loss information.
Next, the training apparatus 2100, the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333 update the parameters by repeatedly performing operations S2215 to S2277 until the final loss information is minimized. In this case, during each repetition operation, the fourth encoder 1710, the fourth decoder 1010, the second encoder 1215, and the second decoder 333 operate according to the parameters updated in the previous process.
Embodiments of the present disclosure described herein may be written as computer-executable programs and the written programs may be stored in machine-readable storage media.
A machine-readable storage medium may be provided as a non-transitory storage medium. Here, "non-transitory" means that the storage medium does not include a signal and is tangible, but does not distinguish between data being stored semi-permanently or temporarily in the storage medium. For example, a "non-transitory storage medium" may include a buffer that temporarily stores data.
According to embodiments of the present disclosure, methods according to various embodiments of the present disclosure may be provided in a computer program product. The computer program product is a product that may be traded between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium, e.g., a compact disc read-only memory (CD-ROM), or distributed electronically (e.g., downloaded or uploaded) via an application store or directly between two user devices, e.g., smartphones. When distributed online, at least a portion of the computer program product (e.g., a downloadable application) may be temporarily generated, or at least temporarily stored, in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
While embodiments have been particularly shown and described, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope as defined by the following claims.

Claims (13)

1. A method of reconstructing optical flow using Artificial Intelligence (AI), the method comprising:
obtaining characteristic data of a current residual light stream of a current image from a bit stream;
obtaining a current residual optical flow by applying feature data of the current residual optical flow to a first decoder based on a neural network;
obtaining a current predicted optical flow based on at least one of the previous optical flow, the feature data of the previous optical flow, and the feature data of the previous residual optical flow; and
the current optical flow is reconstructed based on the current residual optical flow and the current predicted optical flow.
2. The method of claim 1, wherein the current image is reconstructed based on the current residual image data and a current predicted image generated based on a previously reconstructed image and a reconstructed current optical flow.
3. The method of claim 1, wherein obtaining the current predicted optical flow comprises selecting the previous optical flow as the current predicted optical flow.
4. The method of claim 1, wherein obtaining the current predicted optical flow comprises applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to a first prediction neural network.
5. The method of claim 1, wherein obtaining the current predicted optical flow comprises:
obtaining a second-order optical flow between the current predicted optical flow and the previous optical flow by applying at least one of the previous optical flow, the feature data of the previous optical flow, or the feature data of the previous residual optical flow to a second prediction neural network; and
generating the current predicted optical flow by modifying the previous optical flow according to the second-order optical flow.
6. The method of claim 1, wherein obtaining the current predicted optical flow comprises:
obtaining, from the bitstream, feature data of a second-order optical flow between the current predicted optical flow and the previous optical flow;
obtaining the second-order optical flow by applying the feature data of the second-order optical flow to a third decoder based on a neural network; and
generating the current predicted optical flow by modifying the previous optical flow according to the second-order optical flow.
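The "modifying" step shared by claims 5 and 6 can be illustrated as a backward warp of the previous optical flow by the second-order optical flow. This is one plausible reading, sketched under that assumption; elementwise addition of the two fields would be another.

```python
import torch
import torch.nn.functional as F

def modify_previous_flow(previous_flow, second_order_flow):
    """Generate the current predicted optical flow by warping the previous
    optical flow according to the second-order flow (assumed semantics)."""
    h, w = previous_flow.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    gx = 2 * (xs + second_order_flow[:, 0]) / (w - 1) - 1
    gy = 2 * (ys + second_order_flow[:, 1]) / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)      # (B, H, W, 2), in [-1, 1]
    return F.grid_sample(previous_flow, grid, align_corners=True)

previous_flow = torch.rand(1, 2, 32, 32)
second_order_flow = torch.zeros(1, 2, 32, 32)  # zero "motion of motion"
# With a zero second-order flow the prediction equals the previous flow.
predicted_flow = modify_previous_flow(previous_flow, second_order_flow)
```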
7. The method of claim 1, wherein the feature data of the current residual optical flow is obtained by performing entropy decoding and inverse quantization on the bitstream.
8. The method of claim 1, wherein the first decoder based on the neural network is trained based on:
first loss information corresponding to a difference between a current training image and a current reconstructed training image related to the current training image; and
second loss information corresponding to entropy of feature data of a current residual optical flow of the current training image.
9. The method of claim 1, wherein the feature data of the current residual optical flow is obtained from the bitstream based on the current image corresponding to a predictive (P) frame and the P frame following another P frame.
10. The method of claim 9, further comprising, based on the P frame following an intra (I) frame:
obtaining feature data of the current optical flow from the bitstream; and
reconstructing the current optical flow by applying the feature data of the current optical flow to a fourth decoder based on a neural network.
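Claims 9 and 10 together decide which flow payload the bitstream carries for a P frame. A hypothetical control-flow sketch follows; the function name and string labels are illustrative, not bitstream syntax.

```python
def flow_payload_kind(frame_type: str, prev_frame_type: str) -> str:
    """Select the flow payload per claims 9-10 (hypothetical helper)."""
    if frame_type != "P":
        raise ValueError("claims 9 and 10 address only P frames")
    if prev_frame_type == "P":
        # P frame after another P frame: the bitstream carries feature data
        # of the current residual optical flow (claim 9, first decoder).
        return "residual-flow feature data"
    # P frame after an I frame: the bitstream carries feature data of the
    # current optical flow itself (claim 10, fourth decoder).
    return "full-flow feature data"

assert flow_payload_kind("P", "P") == "residual-flow feature data"
assert flow_payload_kind("P", "I") == "full-flow feature data"
```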
11. A computer-readable recording medium having recorded thereon a program for executing the method according to claim 1.
12. An apparatus for reconstructing optical flow using Artificial Intelligence (AI), the apparatus comprising:
at least one processor configured to implement:
a bitstream acquirer configured to obtain feature data of a current residual optical flow of a current image from a bitstream; and
a predictive decoder configured to:
obtain a current residual optical flow by applying the feature data of the current residual optical flow to a first decoder based on a neural network,
obtain a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, or feature data of a previous residual optical flow, and
reconstruct a current optical flow based on the current residual optical flow and the current predicted optical flow.
13. A method of encoding an optical flow using Artificial Intelligence (AI), the method comprising:
obtaining a current predicted optical flow based on at least one of a previous optical flow, feature data of the previous optical flow, or feature data of a previous residual optical flow;
obtaining feature data of a current residual optical flow by applying a current image, a previously reconstructed image, and the current predicted optical flow to a first encoder based on a neural network; and
generating a bitstream corresponding to the feature data of the current residual optical flow,
wherein the current residual optical flow corresponds to a difference between a current optical flow and the current predicted optical flow.
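On the encoder side (claim 13), a matching minimal sketch: the first encoder is a placeholder convolution, the input packing (channel concatenation) is an assumption, and quantization plus entropy coding into the bitstream are omitted.

```python
import torch

# Hypothetical placeholder for the neural-network-based first encoder.
first_encoder = torch.nn.Conv2d(8, 16, 3, padding=1)  # 3 + 3 + 2 channels

def encode_residual_flow(current_image, prev_reconstructed, predicted_flow):
    """Produce feature data of the current residual optical flow; its
    decoded form corresponds to (current flow - predicted flow)."""
    x = torch.cat([current_image, prev_reconstructed, predicted_flow], dim=1)
    feature_data = first_encoder(x)
    # Quantization and entropy coding into the bitstream are omitted here.
    return feature_data

cur = torch.rand(1, 3, 64, 64)        # current image
prev = torch.rand(1, 3, 64, 64)       # previously reconstructed image
pred_flow = torch.zeros(1, 2, 64, 64) # current predicted optical flow
feat = encode_residual_flow(cur, prev, pred_flow)
```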
CN202280016009.4A 2021-02-22 2022-02-21 AI-based image encoding and decoding apparatus and method thereof Pending CN116868566A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR10-2021-0023695 2021-02-22
KR10-2021-0123369 2021-09-15
KR10-2021-0171269 2021-12-02
KR1020210171269A KR20220120436A (en) 2021-02-22 2021-12-02 Artificial intelligence based encoding apparatus and decoding apparatus of image, and method thereby
PCT/KR2022/002493 WO2022177383A1 (en) 2021-02-22 2022-02-21 Ai-based image encoding and decoding apparatus, and method by same

Publications (1)

Publication Number Publication Date
CN116868566A true CN116868566A (en) 2023-10-10

Family

ID=88225426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280016009.4A Pending CN116868566A (en) 2021-02-22 2022-02-21 AI-based image encoding and decoding apparatus and method thereof

Country Status (1)

Country Link
CN (1) CN116868566A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination