WO2021080145A1 - Appareil et procédé de remplissage d'image (Image filling apparatus and method) - Google Patents
Appareil et procédé de remplissage d'image (Image filling apparatus and method)
- Publication number
- WO2021080145A1 (PCT/KR2020/011074)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- map
- feature
- feature map
- mask
- encoding
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- The present invention relates to an image filling apparatus and method, and more particularly to an image filling apparatus and method using non-local feature synthesis.
- Image filling refers to the task of synthesizing, for the missing or invalid region of an image, content that can visually and semantically replace it in a manner consistent with the valid region that is not missing. Image filling is also known as image inpainting, image hole filling, or image completion, and can be usefully applied in many applications.
- The most basic approach to image filling is the exemplar-based inpainting method.
- The exemplar-based filling technique searches the valid regions of the image that are not missing for the patch most similar to the missing region, and copies the most similar patch into the hole. This technique is effective for restoring high-frequency texture details, but its computational complexity is high.
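- For background only, the following is a rough sketch (not the method of this invention) of the exemplar-based idea: every fully valid patch is scanned for the best match to the known surroundings of each hole pixel and copied in. The function and parameter names are illustrative, and the brute-force search makes the computational cost mentioned above explicit.

```python
import numpy as np

def naive_exemplar_fill(image, mask, patch=7):
    """Naive exemplar-based fill for a grayscale float image.

    mask: 1 = valid pixel, 0 = missing pixel. Illustrative only; practical
    exemplar-based inpainting adds priority ordering and confidence terms.
    """
    h, w = mask.shape
    r = patch // 2
    out = image.copy()
    # centres of patches that contain no missing pixels (the search pool)
    valid_centres = [(y, x) for y in range(r, h - r) for x in range(r, w - r)
                     if mask[y - r:y + r + 1, x - r:x + r + 1].all()]
    for y, x in np.argwhere(mask == 0):
        if y < r or x < r or y >= h - r or x >= w - r:
            continue  # skip border holes in this simplified sketch
        target = out[y - r:y + r + 1, x - r:x + r + 1]
        known = mask[y - r:y + r + 1, x - r:x + r + 1].astype(bool)
        best, best_cost = None, np.inf
        for vy, vx in valid_centres:          # exhaustive search: the costly part
            cand = image[vy - r:vy + r + 1, vx - r:vx + r + 1]
            cost = np.sum((target[known] - cand[known]) ** 2)
            if cost < best_cost:
                best, best_cost = (vy, vx), cost
        if best is not None:
            out[y, x] = image[best]           # copy the most similar valid pixel
    return out
```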
- In addition, since the exemplar-based technique is not suited to preserving semantic structure, it cannot fill the missing region while maintaining semantic consistency with the existing regions that are not missing, so it has the limitation that missing regions belonging to complex objects and scenes cannot be filled.
- To address this, a partial convolution technique has been proposed, in which a mask assigning a value of 0 to invalid pixels and a value of 1 to valid pixels is applied to distinguish valid from invalid pixels during the convolution operation. In the partial convolution technique, only valid features are extracted and propagated while invalid pixels are not propagated, so that stable filling performance can be obtained.
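- A minimal NumPy sketch of one partial-convolution step may help make this concrete. It assumes the standard formulation in which the output is renormalized by the ratio of the window size to the number of valid pixels and the mask is updated to 1 wherever at least one valid pixel fell under the window; the patent's own Equations 1 and 2, discussed later, may differ in detail.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """One partial-convolution step on a single-channel feature map (no padding).

    x:      H x W feature map
    mask:   H x W binary mask, 1 = valid, 0 = invalid
    weight: k x k convolution kernel
    Assumed standard formulation, not the patent's exact equations.
    """
    k = weight.shape[0]
    h, w = x.shape
    out = np.zeros((h - k + 1, w - k + 1))
    new_mask = np.zeros_like(out)
    window_size = k * k
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            xw = x[i:i + k, j:j + k]
            mw = mask[i:i + k, j:j + k]
            valid = mw.sum()
            if valid > 0:
                # convolve only valid pixels, renormalise by the valid count
                out[i, j] = (weight * xw * mw).sum() * (window_size / valid) + bias
                new_mask[i, j] = 1.0  # window saw at least one valid pixel
            # otherwise the output stays 0 and the pixel remains invalid
    return out, new_mask
```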
- Existing techniques using partial convolution include an encoder that gradually extracts effective features from the image, and a decoder that recovers the region missing from the features extracted by the encoder into a filled image step by step. Each step of the decoder fills the missing region by receiving, through a skip connection, the features and mask extracted at the corresponding step of the encoder and combining them; as a result, the meaningless features of invalid pixels are also transmitted through the skip connection, and visual errors occur.
- An object of the present invention is to provide an image filling apparatus and method that obtain a non-local descriptor by fusing the semantically most similar features, regardless of their distance from the missing region, among the features extracted from the valid, non-missing region of an image, and that can reduce visual errors by reconstructing the missing region based on the non-local descriptor.
- Another object of the present invention is to provide an image filling apparatus and method that apply a non-local feature synthesis layer capable of reconstructing the features of the invalid region using the features of the valid region even when features of the invalid region are transmitted from the encoder to the decoder through a skip connection, thereby obtaining a non-local descriptor and filling the missing region in a semantically consistent manner.
- The image filling apparatus for achieving the above object includes an encoder that receives an input image in which an invalid region is designated by a mask, encodes the input image and the mask in stages according to a previously learned pattern estimation method to acquire feature maps, and updates the mask in a predetermined manner; and a decoder that receives the feature map finally obtained by the encoder together with the feature map and updated mask obtained at each corresponding stage of the encoder, and decodes them sequentially according to a previously learned feature restoration method to obtain reconstructed feature maps.
- The decoder obtains a decoding map in which each pixel of the feature map finally obtained by the encoder, or of the reconstructed feature map obtained at the previous stage, is weighted by its attention; obtains, using the mask obtained at the corresponding stage of the encoder and the feature attention matrix, a hole-fill similarity matrix for searching the valid region for pixels suitable for filling the invalid region; and obtains an encoding map by combining the hole-fill similarity matrix with the feature map obtained at the corresponding stage.
- The reconstructed feature map may then be obtained by combining the decoding map and the encoding map and decoding them according to a previously learned feature restoration method.
- The decoder may include a plurality of decoding layers connected in stages, each of which receives the feature map finally obtained from the encoder or the reconstructed feature map obtained from the previous stage, together with the feature map and mask obtained at the corresponding stage of the encoder, and obtains a reconstructed feature map.
- Each of the plurality of decoding layers may include a non-local feature synthesis layer (NFS layer) that obtains a feature attention matrix representing the per-pixel degree of attention from the feature map finally obtained by the encoder or the reconstructed feature map obtained from the previous stage, according to a previously learned pattern estimation method; obtains a decoding map from the feature attention matrix and the applied feature map or reconstructed feature map; and, using the mask obtained at the corresponding stage of the encoder and the feature attention matrix, selects from the valid region the pixels of the feature map suitable for filling the invalid region to obtain an encoding map. Each decoding layer may further include a deconvolution layer that combines the decoding map and the encoding map and obtains the reconstructed feature map by deconvolving them based on previously learned weights.
- The NFS layer may include a decoding map acquisition unit that extracts three features by convolving the feature map finally obtained from the encoder, or the reconstructed feature map obtained from the previous stage, with three different learned weights, calculates the correlation between two of the three features to obtain the feature attention matrix, multiplies the feature attention matrix by the remaining feature, and adds the result to the applied feature map or reconstructed feature map to obtain the decoding map; and an encoding map acquisition unit that slices the mask obtained at the corresponding stage of the encoder to a predetermined size, multiplies the sliced mask by the transposed mask obtained by inverting and transposing the pixel values of the sliced mask to obtain a hole-fill indicator, multiplies the hole-fill indicator by the feature attention matrix and normalizes the result in a predetermined manner to obtain a hole-fill similarity matrix, multiplies the hole-fill similarity matrix by the applied feature map, and adds the result back to the applied feature map to obtain the encoding map.
- The encoder may include a plurality of encoding layers connected in stages, to which the plurality of decoding layers correspond in reverse order. Each of the plurality of encoding layers receives the input image or the feature map obtained at the previous stage together with a corresponding mask, obtains a feature map by performing a predetermined operation on the feature map and the mask using learned weights, and updates the mask in a predetermined manner.
- The image filling method according to an embodiment for achieving the above object includes an encoding step in which an input image with an invalid region designated by a mask is applied and the input image and the mask are encoded in stages according to a previously learned pattern estimation method to obtain feature maps and updated masks; and a decoding step in which the feature maps and updated masks obtained in stages in the encoding step are applied and decoded according to a previously learned feature restoration method to obtain reconstructed feature maps in stages.
- The decoding step may include obtaining a decoding map in which each pixel of the feature map finally obtained in the encoding step, or of the reconstructed feature map previously obtained in the decoding step, is weighted by its attention; obtaining, using the corresponding mask obtained in the encoding step and the feature attention matrix, a hole-fill similarity matrix that searches the valid region for pixels suitable for filling the invalid region of the feature map, and obtaining an encoding map by combining the hole-fill similarity matrix with the corresponding feature map obtained in the encoding step; and obtaining the reconstructed feature map by combining the decoding map and the encoding map and decoding them according to a previously learned feature restoration method.
- Accordingly, the image filling apparatus and method according to the present invention can fill in the features of the invalid region using the features of the valid region: by applying a non-local feature synthesis layer to obtain a non-local descriptor and reconstructing the missing region based on that descriptor, visual errors can be reduced and the missing region can be filled in a semantically consistent manner.
- FIG. 1 is a view for explaining a concept of filling an image in an image filling apparatus and method according to an embodiment of the present invention.
- FIG. 2 shows a schematic structure of an image filling apparatus according to an embodiment of the present invention.
- FIG. 3 is a diagram for explaining a schematic operation of the image filling apparatus of FIG. 2.
- FIG. 4 shows a detailed configuration of an encoding layer in the image filling apparatus of FIG. 2.
- FIG. 5 shows a detailed configuration of a decoder unit in the image filling apparatus of FIG. 2.
- FIG. 6 shows a detailed configuration of a non-local feature synthesis layer in the decoding layer of FIG. 5.
- FIG. 7 shows an image filling method according to an embodiment of the present invention.
- FIG. 1 is a view for explaining a concept of filling an image in an image filling apparatus and method according to an embodiment of the present invention.
- In FIG. 1, (a) shows an input image including a missing region to be filled, (b) shows an output image in which the missing region has been filled by the image filling apparatus and method of the present embodiment, and (c) shows the original image in which no region is missing.
- (d) to (f) illustrate the concept by which the image filling apparatus and method of the present embodiment extract, in order to fill the image, pixels of the valid region whose features semantically correspond to each of three pixels in the missing region of (a).
- The image filling apparatus and method according to the present embodiment fill each pixel of the invalid region of an image using at least one pixel of the valid region whose features are most similar to it, so that the output image of (b), which is almost identical to the original image shown in (c), can be obtained. In other words, rather than relying only on the features of the valid region around the border of the invalid region, the present embodiment fills the invalid region with the semantically most similar features found anywhere in the valid region, regardless of distance, and can therefore obtain images that are semantically consistent with small visual errors.
- FIG. 2 shows a schematic structure of an image filling apparatus according to an embodiment of the present invention, FIG. 3 is a diagram for explaining the schematic operation of the image filling apparatus of FIG. 2, FIG. 4 shows a detailed configuration of the encoder unit in the image filling apparatus of FIG. 2, FIG. 5 shows a detailed configuration of the decoder unit in the image filling apparatus of FIG. 2, and FIG. 6 shows a detailed configuration of the non-local feature synthesis layer in the decoding layer of FIG. 5.
- The image filling apparatus includes an encoding unit 100 and a decoding unit 200.
- The encoding unit 100 receives an input image in which some regions are missing and extracts the features of the valid regions that are not missing from the input image, and the decoding unit 200 reconstructs the features of the missing, invalid regions based on the features extracted by the encoding unit 100, thereby obtaining an image in which the invalid regions are filled.
- the encoding unit 100 includes an input image acquisition unit 110 and an encoder 120 including a plurality of encoding layers EL1 to EL5 having a multi-stage structure.
- the encoder 120 is shown to include five encoding layers EL1 to EL5, but the number of encoding layers may be variously adjusted.
- the input image acquisition unit 110 acquires an input image IN in which at least some areas are missing.
- The input image acquisition unit 110 may acquire an input image IN in which some regions are already missing; however, as shown in FIG. 3, it may also receive a general image together with a mask MK for removing some regions of the image, and combine them to create the input image.
- The mask MK is used to remove a region judged to be unnecessary in the image, and may be provided by the user.
- The mask MK may be provided with a size corresponding to the applied image, with the pixel values of the region to be removed set to 0, for example, while the pixel values of the remaining region are set to 1.
- The input image acquisition unit 110 may perform an element-wise multiplication between the applied image and the mask MK to obtain an input image IN in which the pixel values of the region designated by the mask MK are set to 0.
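- The masking itself amounts to a single element-wise product; a small sketch, assuming a float image in [0, 1] and a centred square hole like the one in FIG. 3 (the sizes and hole coordinates are illustrative):

```python
import numpy as np

H, W = 256, 256
image = np.random.rand(H, W, 3)     # stand-in for the applied image
mask = np.ones((H, W, 1))           # 1 = valid (keep), 0 = invalid (remove)
mask[96:160, 96:160] = 0.0          # square region to be removed, as in FIG. 3

input_image = image * mask          # element-wise product zeroes the invalid region
```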
- In FIG. 3, the region to be removed from the image by the mask MK is set as a black square in the center; accordingly, it can be seen that the center of the input image IN is removed in a square shape.
- the input image acquisition unit 110 may randomly generate the mask MK.
- Hereinafter, the region removed from the input image IN by the mask is referred to as the invalid region, and the remaining region, which is not removed and retains the pixel values of the applied image, is referred to as the valid region.
- Each of the plurality of encoding layers EL1 to EL5 of the encoder 120 receives the input image IN obtained by the input image acquisition unit 110 or the feature map output from the encoding layer of the previous stage, and obtains a feature map by encoding the applied input image IN or feature map according to a previously learned pattern estimation method to extract features.
- In addition, each of the plurality of encoding layers EL1 to EL5 receives and encodes a mask whose pattern corresponds to the removed region, that is, the invalid region, of the input image IN or feature map.
- That is, the plurality of encoding layers EL1 to EL5 receive the mask corresponding to the input image IN or feature map, and acquire a feature map by performing partial convolution on the applied feature map and mask with weights obtained by learning.
- The mask is applied so that each of the encoding layers EL1 to EL5 extracts features from the valid region while features are not extracted from the invalid region. Since the mask is encoded together with the input image IN or feature map, the encoding layers EL1 to EL5 can extract only features of the valid region, and as a result can extract reliable features.
- Here, the mask is either the mask obtained by the input image acquisition unit 110 or a mask updated in and applied from the previous encoding layer.
- That is, each of the plurality of encoding layers filters the input image IN or feature map using the applied mask, then updates the applied mask in a predetermined manner and transfers it to the next encoding layer.
- In FIG. 4, only a part of the encoder 120 is illustrated for convenience of description, and one l-th encoding layer EL_l among the plurality of encoding layers EL1 to EL5 of the encoder 120 is described as an example.
- The l-th encoding layer EL_l receives the feature map X_enc^(l-1) obtained in the preceding (l-1)-th encoding layer EL_(l-1) (or the input image IN) and the updated mask M_enc^(l-1). It can then obtain the feature map X_enc^l by performing partial convolution, as shown in Equation 1, on the applied feature map X_enc^(l-1) and mask M_enc^(l-1) using the weight W_enc^l acquired by learning.
- Here, u and v denote coordinates in the feature map X_enc^l, and u' and v' denote coordinates (u', v' ∈ R) within the window R of the weight W_enc^l.
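- Equation 1 itself appears only as an image in the published application. A plausible reconstruction, assuming the standard partial-convolution formulation with the symbols defined above (the bias term b and the exact normalization factor are assumptions), is:

```latex
X_{enc}^{l}(u,v) =
\begin{cases}
\dfrac{\sum_{(u',v') \in R} W_{enc}^{l}(u',v')\, X_{enc}^{l-1}(u+u',v+v')\, M_{enc}^{l-1}(u+u',v+v')}
      {\sum_{(u',v') \in R} M_{enc}^{l-1}(u+u',v+v')} + b,
  & \text{if } \sum_{(u',v') \in R} M_{enc}^{l-1}(u+u',v+v') > 0 \\[2ex]
0, & \text{otherwise}
\end{cases}
```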
- In addition, the l-th encoding layer EL_l may update the applied mask M_enc^(l-1) according to Equation 2 to obtain the mask M_enc^l.
- Here, a preset threshold value for controlling the valid region is used; if this threshold is more than half the size of the weight window R, the invalid region of the feature map X_enc^l is reduced compared with that of the applied feature map X_enc^(l-1).
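- Equation 2 is likewise not reproduced in this text. A plausible form consistent with the thresholded mask update described above, writing the preset threshold as τ (the symbol and the direction of the comparison are assumptions), is:

```latex
M_{enc}^{l}(u,v) =
\begin{cases}
1, & \text{if } \sum_{(u',v') \in R} M_{enc}^{l-1}(u+u',v+v') \ge \tau \\[1ex]
0, & \text{otherwise}
\end{cases}
```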
- The l-th encoding layer EL_l not only transfers the obtained feature map X_enc^l and mask M_enc^l to the (l+1)-th encoding layer EL_(l+1), but also delivers them through a skip connection to the corresponding l-th decoding layer DL_l of the decoding unit 200.
- Here, the skip connection simply denotes the path through which the feature map X_enc^l and the mask M_enc^l acquired in an encoding layer are transmitted to the corresponding decoding layer.
- the decoding unit 200 may include a decoder 220 including a plurality of decoding layers DL1 to DL5 having a multi-stage structure and an image output unit 210.
- the plurality of decoding layers DL1 to DL5 of the decoder 220 are configured in an order corresponding to the reverse order of the plurality of encoding layers EL1 to EL5 of the encoder 120.
- Each of the plurality of decoding layers DL1 to DL5 receives the feature map output from the last encoding layer EL5 of the encoding unit 100 or the reconstructed feature map X_dec^(l+1) output from the previous decoding layer, together with the feature map X_enc^l and the mask M_enc^l transmitted from the corresponding encoding layer among the plurality of encoding layers EL1 to EL5, decodes them according to a learned pattern reconstruction method, and outputs a reconstructed feature map X_dec^l.
- Each of the plurality of decoding layers DL1 to DL5 includes a non-local feature synthesis layer (NFS layer), generates a decoding map Y_dec^l and an encoding map Y_enc^l using the reconstructed feature map, the feature map X_enc^l, and the mask M_enc^l, and obtains the reconstructed feature map by performing a predetermined operation on the decoding map Y_dec^l and the encoding map Y_enc^l generated by the NFS layer.
- According to the arrangement of the decoding layers in the reverse order of the encoding layers in this embodiment, the l-th decoding layer DL_l receives the reconstructed feature map X_dec^l from the (l+1)-th decoding layer DL_(l+1), and receives the feature map X_enc^l and the mask M_enc^l from the corresponding encoding layer EL_l.
- The NFS layer of the l-th decoding layer DL_l generates the decoding map Y_dec^l and the encoding map Y_enc^l using the applied reconstructed feature map X_dec^l, the feature map X_enc^l, and the mask M_enc^l.
- The l-th decoding layer DL_l then concatenates the decoding map Y_dec^l generated in the NFS layer and the encoding map Y_enc^l, deconvolves the combined map with weights obtained by learning to obtain the reconstructed feature map X_dec^(l-1), and transfers the obtained reconstructed feature map X_dec^(l-1) to the (l-1)-th decoding layer DL_(l-1) of the next stage.
- The NFS layer is provided so that the plurality of decoding layers DL1 to DL5 do not acquire the pattern for filling the invalid region only from a limited region near the boundary of the invalid region, but instead obtain the pattern with the semantically most similar features from the entire valid region of the feature map and use it to fill the invalid region.
- the NFS layer may include a decoding map acquisition unit DM and an encoding map acquisition unit EM.
- The decoding map acquisition unit DM reshapes the reconstructed feature map X_dec^l of size C_l × H_l × W_l applied from the (l+1)-th decoding layer DL_(l+1), and extracts three features θ_l(X_dec^l; W_θ^l), φ_l(X_dec^l; W_φ^l), and ψ_l(X_dec^l; W_ψ^l) by applying 1 × 1 convolutions with three different weights W_θ^l, W_φ^l, and W_ψ^l to the reshaped reconstructed feature map, thereby embedding the reconstructed feature map X_dec^l into three different feature spaces.
- The feature attention matrix A^l obtained as in Equation 3 is then multiplied by the feature ψ_l(X_dec^l; W_ψ^l) and the scale variable γ_l, and the result is added back to the reconstructed feature map X_dec^l to obtain the decoding map Y_dec^l.
- The scale variable γ_l is a parameter for controlling the feature update process of the decoding layer DL_l; its initial value may be set to 0, and it may be adjusted by learning.
- In other words, the decoding map acquisition unit DM analyzes the degree of attention of each region in the reconstructed feature map X_dec^l and obtains the decoding map Y_dec^l by weighting each region according to the analyzed attention level.
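- Equation 3 is also shown only as an image in the application. A plausible reconstruction in the style of standard non-local / self-attention blocks, using the symbols above and treating the extracted features as C' × H_lW_l matrices (the softmax normalization and the index convention are assumptions), is:

```latex
A_{ij}^{l} =
\frac{\exp\!\big(\theta_{l}(X_{dec}^{l})_{i}^{\top}\,\phi_{l}(X_{dec}^{l})_{j}\big)}
     {\sum_{k}\exp\!\big(\theta_{l}(X_{dec}^{l})_{i}^{\top}\,\phi_{l}(X_{dec}^{l})_{k}\big)},
\qquad
Y_{dec}^{l} = \gamma_{l}\,\psi_{l}(X_{dec}^{l})\,A^{l} + X_{dec}^{l}
```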
- Meanwhile, the encoding map acquisition unit EM receives the feature map X_enc^l and the mask M_enc^l of size C_l × H_l × W_l from the corresponding encoding layer EL_l, and slices the applied mask M_enc^l into a 1 × H_l·W_l vector. The sliced mask m_enc^l is then inverted to (1 − m_enc^l) and transposed, and the transposed inverted mask is multiplied by the sliced mask m_enc^l to obtain the hole-fill indicator (1 − m_enc^l)^T · m_enc^l. This ensures that features are synthesized and filled only in the invalid region designated by the applied mask M_enc^l.
- The hole-fill indicator (1 − m_enc^l)^T · m_enc^l is then multiplied by the feature attention matrix A^l obtained in the decoding map acquisition unit DM, giving (1 − m_enc^l)^T · m_enc^l · A^l, which is normalized to obtain the hole-fill similarity matrix S^l.
- The hole-fill similarity matrix S^l is a matrix that determines whether the feature at position j of the feature map X_enc^l applied from the corresponding encoding layer EL_l is suitable for filling the feature at position i of the decoding map Y_dec^l.
- That is, the hole-fill similarity matrix S^l can be obtained by normalizing, according to Equation 4, the product (1 − m_enc^l)^T · m_enc^l · A^l of the hole-fill indicator and the feature attention matrix A^l.
- In Equation 4, H denotes the set of pixels in the invalid region and V denotes the set of pixels in the valid region.
- Then, as in Equation 5, the obtained hole-fill similarity matrix S^l and the feature map X_enc^l are multiplied, and the feature map X_enc^l is added to the result to generate the encoding map Y_enc^l.
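- Equations 4 and 5 are not reproduced here either. A plausible reconstruction, assuming the hole-fill indicator is applied element-wise to A^l, each row belonging to an invalid pixel i ∈ H is renormalized over the valid pixels j ∈ V, and X_enc^l is treated as a C_l × H_lW_l matrix (all of these are assumptions), is:

```latex
S_{ij}^{l} =
\begin{cases}
\dfrac{\big[(1-m_{enc}^{l})^{\top} m_{enc}^{l}\big]_{ij}\, A_{ij}^{l}}
      {\sum_{k \in V}\big[(1-m_{enc}^{l})^{\top} m_{enc}^{l}\big]_{ik}\, A_{ik}^{l}},
  & i \in H,\; j \in V \\[2ex]
0, & \text{otherwise}
\end{cases}
\qquad\text{(assumed form of Eq. 4)}

Y_{enc}^{l} = X_{enc}^{l}\,(S^{l})^{\top} + X_{enc}^{l}
\qquad\text{(assumed form of Eq. 5)}
```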
- To summarize, the decoding map acquisition unit DM analyzes the degree of attention of each region in the reconstructed feature map X_dec^l and obtains the decoding map Y_dec^l by weighting each region according to the analyzed attention level, while the encoding map acquisition unit EM generates the encoding map Y_enc^l by deriving, using the feature attention matrix A^l obtained by the decoding map acquisition unit DM and the mask applied from the corresponding encoding layer EL_l, the features of the feature map X_enc^l to be used for the invalid region H.
- As described above, the deconvolution layer of the l-th decoding layer DL_l combines the decoding map Y_dec^l and the encoding map Y_enc^l and performs deconvolution with learned weights to acquire the reconstructed feature map X_dec^(l-1), and transfers the obtained reconstructed feature map X_dec^(l-1) to the (l-1)-th decoding layer DL_(l-1) of the next stage.
- That is, because each of the plurality of decoding layers includes the NFS layer, it obtains both the decoding map Y_dec^l, in which each region of the applied reconstructed feature map X_dec^l is weighted by its attention, and the corresponding encoding map, so that the features of the pixels with the highest degree of attention are synthesized into the invalid region of the reconstructed feature map. Therefore, semantically similar pixels with small visual error can be synthesized at each pixel of the invalid region.
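- Putting the pieces together, the following is a condensed PyTorch sketch of a layer in the spirit of the NFS layer described above, using the assumed forms of Equations 3 to 5; the channel reduction in the 1 × 1 convolutions, the softmax, and the epsilon in the normalization are implementation assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NFSLayerSketch(nn.Module):
    """Illustrative non-local feature synthesis layer (assumed forms of Eqs. 3-5).

    x_dec: (B, C, H, W) reconstructed feature map from the previous decoding layer
    x_enc: (B, C, H, W) feature map from the corresponding encoding layer
    mask:  (B, 1, H, W) binary mask, 1 = valid, 0 = invalid
    """

    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.phi = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.psi = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # scale variable, initialised to 0

    def forward(self, x_dec, x_enc, mask):
        b, c, h, w = x_dec.shape
        n = h * w
        # three features from 1x1 convolutions, flattened to (B, C', N)
        th = self.theta(x_dec).view(b, -1, n)
        ph = self.phi(x_dec).view(b, -1, n)
        ps = self.psi(x_dec).view(b, -1, n)
        # feature attention matrix A (B, N, N): correlation between two of the features
        attn = F.softmax(torch.bmm(th.transpose(1, 2), ph), dim=-1)
        # decoding map: attention-weighted third feature, scaled and added back
        y_dec = self.gamma * torch.bmm(ps, attn.transpose(1, 2)).view(b, c, h, w) + x_dec
        # hole-fill indicator: outer product of inverted and plain sliced masks,
        # nonzero only at (invalid row, valid column) pairs
        m = mask.view(b, 1, n)
        indicator = torch.bmm((1.0 - m).transpose(1, 2), m)
        # hole-fill similarity matrix: masked attention, renormalised row by row
        s = indicator * attn
        s = s / (s.sum(dim=-1, keepdim=True) + 1e-8)
        # encoding map: similar valid encoder features synthesised into the holes
        y_enc = torch.bmm(x_enc.view(b, c, n), s.transpose(1, 2)).view(b, c, h, w) + x_enc
        return y_dec, y_enc
```

- In the decoding layer described above, the two returned maps would then be concatenated and passed through the deconvolution layer to produce the reconstructed feature map for the next stage.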
- The image output unit 210 outputs, as the estimated image, the reconstructed feature map output from the last-stage decoding layer DL1 among the plurality of decoding layers DL1 to DL5 of the decoder 220.
- For training, the reconstruction loss L_recon between the estimated image and the original image can be calculated using the L1 norm function, as shown in Equation 6.
- In addition, a perceptual loss L_prec between corresponding layers can be calculated as in Equation 7.
- Here, Φ_l(I_pred) and Φ_l(I_gt) denote the feature map X_enc^l extracted by the l-th encoding layer EL_l for the estimated image I_pred and the original image I_gt, respectively.
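- Equations 6 and 7 are not reproduced in this text. Plausible forms, assuming the usual L1 reconstruction loss between the estimated and original images and an L1 feature-matching (perceptual) loss over the encoder features Φ_l (the per-layer weighting is an assumption), are:

```latex
L_{recon} = \big\| I_{pred} - I_{gt} \big\|_{1}
\qquad\text{(assumed form of Eq. 6)}

L_{prec} = \sum_{l} \big\| \Phi_{l}(I_{pred}) - \Phi_{l}(I_{gt}) \big\|_{1}
\qquad\text{(assumed form of Eq. 7)}
```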
- A style loss (L_SCC) has also been proposed, and in this case the style loss (L_SCC) is calculated according to Equation 8.
- Here, P_l and G_l are matrix representations of Φ_l(I_pred) and Φ_l(I_gt), respectively, and S^l is the hole-fill similarity matrix obtained from the NFS layer.
- hole fill similarity constraint loss (L const-s ) and texture constraint loss (L const-t ) may be defined as in Equations 9 and 10, respectively.
- The style loss (L_SCC) of Equation 8 may be expressed as Equation 11 using the conditional expression of the hole-fill similarity matrix S^l in Equation 4.
- In addition, Equation 11 may be formulated as Equation 12 by taking into account the hole-fill similarity constraint loss (L_const-s) and the texture constraint loss (L_const-t) of Equations 9 and 10.
- The total loss of the image filling apparatus according to the present embodiment is calculated by Equation 13, and the calculated total loss is backpropagated so that the image filling apparatus can be trained.
- Here, λ_recon, λ_prec, λ_SCC, λ_const-s, and λ_const-t are loss weights.
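- Equation 13 is likewise not reproduced. A plausible form, assuming that the total loss is simply the weighted sum of the individual losses listed above, is:

```latex
L_{total} = \lambda_{recon} L_{recon} + \lambda_{prec} L_{prec} + \lambda_{SCC} L_{SCC}
          + \lambda_{const\text{-}s} L_{const\text{-}s} + \lambda_{const\text{-}t} L_{const\text{-}t}
```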
- FIG. 7 shows an image filling method according to an embodiment of the present invention.
- The image filling method is largely divided into an encoding step (S10) of extracting features from the valid region of the input image and a decoding step (S20) of reconstructing and filling the invalid region based on the extracted features.
- In the encoding step (S10), an input image is first obtained (S11).
- The input image IN may be obtained by applying an original image and a mask MK for removing a partial region from the original image and performing an element-wise multiplication; in some cases, the input image IN may be obtained using a randomly generated mask.
- Then, encoding is performed iteratively in stages on the input image IN and the mask MK.
- In each stage, weights obtained by learning are applied to the input image IN (or feature map) and the mask MK, and partial convolution is performed according to Equation 1 to obtain the feature map X_enc^l (S12).
- In addition, the used mask M_enc^(l-1) is updated according to Equation 2 to obtain the mask M_enc^l (S13).
- When encoding is completed, a decoding step is performed (S20).
- In the decoding step (S20), similarly to the encoding step (S10), the applied feature map is repeatedly decoded in stages, and the number of decoding repetitions in the decoding step (S20) may correspond to the number of encoding repetitions in the encoding step (S10).
- In the first decoding stage, the feature map finally obtained in the encoding step may be applied as the reconstructed feature map.
- In the decoding step, the applied reconstructed feature map is first reshaped into a matrix of a predetermined size, and three features θ_l(X_dec^l; W_θ^l), φ_l(X_dec^l; W_φ^l), and ψ_l(X_dec^l; W_ψ^l) are extracted from it by 1 × 1 convolution with three different weights W_θ^l, W_φ^l, and W_ψ^l. The correlation between two of the three features, θ_l(X_dec^l; W_θ^l) and φ_l(X_dec^l; W_φ^l), is then calculated in a predetermined manner to obtain the feature attention matrix A^l (S21), from which the decoding map Y_dec^l is obtained as described above.
- In addition, in the decoding step, the feature map X_enc^l and the mask M_enc^l obtained at the corresponding stage of the encoding step are applied in the reverse order of the encoding.
- A hole-fill indicator is obtained using the applied mask M_enc^l (S23).
- The hole-fill indicator can be obtained by slicing the applied mask M_enc^l to a predetermined size, inverting the sliced mask m_enc^l to obtain (1 − m_enc^l) and transposing it, and multiplying the result by the sliced mask m_enc^l, giving (1 − m_enc^l)^T · m_enc^l.
- Then, the hole-fill similarity matrix S^l is obtained using the hole-fill indicator (1 − m_enc^l)^T · m_enc^l and the obtained feature attention matrix A^l (S24).
- The hole-fill similarity matrix S^l can be obtained by normalizing, according to Equation 4, the product (1 − m_enc^l)^T · m_enc^l · A^l of the hole-fill indicator and the feature attention matrix A^l.
- An encoding map Y_enc^l is then obtained by combining the hole-fill similarity matrix S^l with the feature map X_enc^l. When the decoding map Y_dec^l and the encoding map Y_enc^l have been obtained, they are combined in a predetermined manner, and a deconvolution operation is performed on the combined map using weights obtained by learning to obtain the reconstructed feature map X_dec^(l-1) (S26).
- When the reconstructed feature map X_dec^(l-1) is acquired, it is determined whether decoding, which is performed repeatedly a predetermined number of times, is complete (S27). If decoding is not complete, a feature attention matrix A^(l-1) is obtained for the obtained reconstructed feature map X_dec^(l-1) and decoding is repeated (S21). When it is determined that decoding is complete, the obtained reconstructed feature map is output as the estimated image (S28).
- the method according to the present invention can be implemented as a computer program stored in a medium for execution on a computer.
- the computer-readable medium may be any available medium that can be accessed by a computer, and may also include all computer storage media.
- Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data, and include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.
Abstract
The present invention may relate to an image filling apparatus and method capable of acquiring a non-local descriptor by applying a non-local feature synthesis layer that can fill in a feature of an invalid region using a feature of a valid region, even if the feature of the invalid region is transmitted from an encoder to a decoder through a skip connection, and of reconstructing a missing region on the basis of the non-local descriptor, thereby making it possible to reduce visual errors and to fill the missing region in a semantically consistent manner.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2019-0132702 | 2019-10-24 | ||
KR1020190132702A KR102225024B1 (ko) | 2019-10-24 | 2019-10-24 | 이미지 채움 장치 및 방법 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021080145A1 true WO2021080145A1 (fr) | 2021-04-29 |
Family
ID=75184969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/011074 WO2021080145A1 (fr) | 2019-10-24 | 2020-08-20 | Appareil et procédé de remplissage d'image |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102225024B1 (fr) |
WO (1) | WO2021080145A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298734A (zh) * | 2021-06-22 | 2021-08-24 | 云南大学 | 一种基于混合空洞卷积的图像修复方法及系统 |
CN113538273A (zh) * | 2021-07-13 | 2021-10-22 | 荣耀终端有限公司 | 图像处理方法及图像处理装置 |
WO2023014346A1 (fr) * | 2021-08-02 | 2023-02-09 | Halliburton Energy Services, Inc. | Remplissage d'espace d'image de trou de forage à l'aide d'un apprentissage profond |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113222874B (zh) * | 2021-06-01 | 2024-02-02 | 平安科技(深圳)有限公司 | 应用于目标检测的数据增强方法、装置、设备及存储介质 |
CN114743018B (zh) * | 2022-04-21 | 2024-05-31 | 平安科技(深圳)有限公司 | 图像描述生成方法、装置、设备及介质 |
CN115700781B (zh) * | 2022-11-08 | 2023-05-05 | 广东技术师范大学 | 一种动态场景下基于图像补绘的视觉定位方法及系统 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101539013B1 (ko) | 2014-03-19 | 2015-07-24 | 한림대학교 산학협력단 | 이미지 복원 장치 및 방법 |
JP6564049B2 (ja) * | 2014-11-26 | 2019-08-21 | キュリアス アーイー オサケユイチア | ニューラルネットワーク構造とその方法 |
US10095977B1 (en) * | 2017-10-04 | 2018-10-09 | StradVision, Inc. | Learning method and learning device for improving image segmentation and testing method and testing device using the same |
-
2019
- 2019-10-24 KR KR1020190132702A patent/KR102225024B1/ko active IP Right Grant
-
2020
- 2020-08-20 WO PCT/KR2020/011074 patent/WO2021080145A1/fr active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
EP3401843A1 (fr) * | 2017-05-11 | 2018-11-14 | Nokia Technologies Oy | Procédé, appareil et produit-programme d'ordinateur permettant de modifier un contenu de media |
US20190228508A1 (en) * | 2018-01-24 | 2019-07-25 | Adobe Inc. | Digital Image Fill |
US20190295228A1 (en) * | 2018-03-21 | 2019-09-26 | Nvidia Corporation | Image in-painting for irregular holes using partial convolutions |
Non-Patent Citations (1)
Title |
---|
LIU HONGYU; JIANG BIN; XIAO YI; YANG CHAO: "Coherent Semantic Attention for Image Inpainting", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 4169 - 4178, XP033723446, DOI: 10.1109/ICCV.2019.00427 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298734A (zh) * | 2021-06-22 | 2021-08-24 | 云南大学 | 一种基于混合空洞卷积的图像修复方法及系统 |
CN113298734B (zh) * | 2021-06-22 | 2022-05-06 | 云南大学 | 一种基于混合空洞卷积的图像修复方法及系统 |
CN113538273A (zh) * | 2021-07-13 | 2021-10-22 | 荣耀终端有限公司 | 图像处理方法及图像处理装置 |
CN113538273B (zh) * | 2021-07-13 | 2023-09-19 | 荣耀终端有限公司 | 图像处理方法及图像处理装置 |
WO2023014346A1 (fr) * | 2021-08-02 | 2023-02-09 | Halliburton Energy Services, Inc. | Remplissage d'espace d'image de trou de forage à l'aide d'un apprentissage profond |
Also Published As
Publication number | Publication date |
---|---|
KR102225024B1 (ko) | 2021-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20878521 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20878521 Country of ref document: EP Kind code of ref document: A1 |