WO2021137415A1 - Image processing method and apparatus based on machine learning - Google Patents
Image processing method and apparatus based on machine learning
- Publication number
- WO2021137415A1 (Application No. PCT/KR2020/015722)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- machine learning
- neural network
- convolutional neural
- loss function
Classifications
- G06T3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06T5/00: Image enhancement or restoration
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T5/60: Image enhancement or restoration using machine learning, e.g. neural networks
- G06T5/90: Dynamic range modification of images or parts thereof
- G06N3/045: Combinations of networks
- G06N3/0455: Auto-encoder networks; Encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/0475: Generative networks
- G06N3/08: Learning methods
- G06N3/088: Non-supervised learning, e.g. competitive learning
- G06N3/09: Supervised learning
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/7796: Active pattern-learning based on specific statistical tests
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V10/993: Evaluation of the quality of the acquired pattern
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Definitions
- the present invention relates to an image processing method and apparatus based on machine learning, and more particularly, to an image processing method and apparatus for performing machine learning on a plurality of convolutional neural networks, and processing an image based on this.
- an image improvement method using deep learning and a convolutional neural network can learn a network through supervised learning.
- normal images may be used as input images, and an image obtained by improving the input image may be used as a teacher image for supervised learning.
- An object of the present invention for solving the above problems is to provide an image processing method and apparatus for performing machine learning on a Generative Adversarial Network (GAN) and processing an image based thereon.
- An image processing method based on machine learning for solving the above problems may include generating a first corrected image by inputting an input image to a first convolutional neural network, generating an intermediate image based on the input image, performing machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image, and performing machine learning on a second loss function of the first convolutional neural network based on the first corrected image and an original (natural) image.
- the generating of the intermediate image may include generating the intermediate image by processing the input image based on an algorithm including a fixed parameter value.
- the step of performing machine learning on the first loss function may include performing machine learning in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
- performing machine learning on the second loss function may include inputting the first corrected image to a second convolutional neural network to obtain a first activation, inputting the original image to the second convolutional neural network to obtain a second activation, and performing machine learning on the second convolutional neural network based on the first activation and the second activation.
- the performing machine learning on the second loss function may further include performing machine learning on the second loss function based on a machine learning result of the second convolutional neural network.
- the method may further include performing machine learning on a third loss function of the first convolutional neural network based on the first corrected image and the input image.
- performing machine learning on the third loss function may include inputting the first corrected image to a third convolutional neural network to generate a second corrected image, performing machine learning on the third convolutional neural network based on the second corrected image and the input image, and performing machine learning on the third loss function based on a machine learning result of the third convolutional neural network.
- the step of performing machine learning on the third convolutional neural network may include performing machine learning in a manner that minimizes the sum of the squares of the pixel value differences between the second corrected image and the input image.
- the number of convolutional layers of the first convolutional neural network may be equal to or greater than the number of convolutional layers of the third convolutional neural network.
- An image processing apparatus based on machine learning may include a processor, a memory in which one or more instructions executed by the processor are stored, a first convolutional neural network, and a second convolutional neural network. The one or more instructions may be executed to input an input image to the first convolutional neural network to generate a first corrected image, generate an intermediate image based on the input image, perform machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image, and perform machine learning on a second loss function of the first convolutional neural network based on the first corrected image and the original image.
- the one or more instructions may be executed to process the input image based on an algorithm including a fixed parameter value to generate the intermediate image.
- the one or more instructions may be executed to perform machine learning on the first loss function in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
- when performing machine learning on the second loss function, the one or more instructions may be executed to input the first corrected image to the second convolutional neural network to obtain a first activation, input the original image to the second convolutional neural network to obtain a second activation, and perform machine learning on the second convolutional neural network based on the first activation and the second activation.
- the one or more instructions may be further executed to perform machine learning on the second loss function based on a learning result for the second convolutional neural network.
- the one or more instructions may be further executed to perform machine learning on a third loss function of the first convolutional neural network based on the first corrected image and the input image.
- the image processing apparatus may further include a third convolutional neural network, and when performing machine learning on the third loss function, the one or more instructions may be executed to input the first corrected image to the third convolutional neural network to generate a second corrected image, perform machine learning on the third convolutional neural network based on the second corrected image and the input image, and perform machine learning on the third loss function of the first convolutional neural network based on the machine learning result for the third convolutional neural network.
- when performing machine learning on the third convolutional neural network, the one or more instructions may be executed to perform machine learning in a manner that minimizes the sum of the squares of the pixel value differences between the second corrected image and the input image.
- the number of convolutional layers of the first convolutional neural network may be equal to or greater than the number of convolutional layers of the third convolutional neural network.
- a naturally corrected image can be obtained by generating an intermediate image, which is an unnaturally corrected image, based on an algorithm including fixed parameter values, and performing machine learning based on the intermediate image.
- in addition, a naturally corrected image can be obtained by correcting an input image using a plurality of convolutional neural networks.
- FIG. 1 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
- FIG. 2 is a conceptual diagram of an image tone improvement model according to an embodiment of the present invention.
- FIG. 3 is a conceptual diagram of a naturalness discrimination model according to an embodiment of the present invention.
- FIG. 4 is a conceptual diagram of an inverse improvement model according to an embodiment of the present invention.
- FIG. 5 is a flowchart of a machine learning method according to an embodiment of the present invention.
- FIG. 6 is a conceptual diagram for explaining an effect of an image processing method according to an embodiment of the present invention.
- FIG. 7 is a block diagram of an image processing apparatus according to another embodiment of the present invention.
- terms such as first and second may be used to describe various elements, but the elements should not be limited by these terms; the terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of the plurality of related listed items.
- FIG. 1 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
- an image processing apparatus 100 may include an image acquisition unit 110, an image correction unit 120, an intermediate image generation unit 130, an image storage unit 140, and a learning unit 150.
- the image corrector 120 may include an image tone improvement model 121 .
- the learning unit 150 may include a naturalness discrimination model 151 and an inverse improvement model 152 .
- the image tone improvement model 121 may be a generator of a Generative Adversarial Network (GAN), and the naturalness discrimination model 151 may be a discriminator of the GAN.
- the image acquisition unit 110 may acquire an input image from the outside.
- the image acquisition unit 110 may be a camera.
- the image obtaining unit 110 may transmit the input image to the image correcting unit 120 , the intermediate image generating unit 130 , and the learning unit 150 .
- the image corrector 120 may receive an input image from the image obtainer 110 .
- the image corrector 120 may generate a first corrected image based on the input image.
- the image corrector 120 may generate a first corrected image using the image tone improvement model 121 . This can be described in detail as follows.
- FIG. 2 is a conceptual diagram of an image tone improvement model according to an embodiment of the present invention.
- the image tone enhancement model 200 of FIG. 2 may be configured the same as or similar to the image tone enhancement model 121 of FIG. 1 .
- the image tone improvement model 200 may be a convolutional neural network, and may include an encoder 210 , a residual unit 220 , and a decoder 230 .
- the encoder 210 , the residual unit 220 , and the decoder 230 may include a plurality of convolutional layers 211 to 213 , 221 to 224 , and 231 to 233 , respectively.
- a node of the encoder 210 and a node of the decoder 230 corresponding to the node of the encoder 210 may be connected in a skip-connection method, respectively.
- the speed of machine learning may be increased by connecting the node of the encoder 210 and the node of the decoder 230 .
- the encoder 210 may perform convolution on the input image using the plurality of layers 211 to 213.
- the encoder 210 may extract a feature map by performing convolution on the input image.
- the encoder 210 may perform convolution on the input image using a stride convolution method.
- the encoder 210 may perform convolution on the input image in a stride convolution method with a stride of 2 pixels.
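The stride-2 convolution described above can be sketched with the standard output-size formula. The kernel size and padding values below are illustrative assumptions; the description only fixes the stride of 2.

```python
# Output spatial size of a strided convolution (standard formula).
# kernel_size and padding are assumptions; only stride=2 comes from the text.
def conv_out_size(size: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    return (size + 2 * padding - kernel_size) // stride + 1

# With stride 2, each layer halves height and width, so each feature map
# has 1/4 the area of the previous one, matching the description.
h, w = 256, 256
for _ in range(3):  # three encoder layers (211 to 213)
    h, w = conv_out_size(h), conv_out_size(w)
```

For a hypothetical 256x256 input, three stride-2 layers yield a 32x32 feature map, each step quartering the area as stated for the second and third characteristic maps.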
- the first layer 211 may perform convolution on the input image to generate a first characteristic map.
- the first layer 211 may transmit the first characteristic map to the second layer 212 .
- the second layer 212 may receive the first characteristic map from the first layer 211 .
- the second layer 212 may perform convolution on the first feature map to generate a second feature map.
- the size of the second characteristic map may be 1/4 of the size of the first characteristic map.
- the second layer 212 may transmit the second characteristic map to the third layer 213 .
- the third layer 213 may receive the second characteristic map from the second layer 212 .
- the third layer may generate a third feature map by performing convolution on the second feature map.
- the size of the third characteristic map may be 1/4 of the size of the second characteristic map.
- the third layer 213 may transmit the third characteristic map to the residual unit 220 .
- the residual unit 220 may receive the third characteristic map from the third layer 213 .
- the residual unit 220 may refine the third characteristic map using the plurality of layers 221 to 224 .
- the residual unit 220 may transmit the refined third characteristic map to the decoder 230.
- the decoder 230 may receive the refined third characteristic map from the residual unit 220.
- the decoder 230 may perform convolution on the third feature map using the plurality of layers 231 to 233 .
- the decoder 230 may generate a first corrected image by performing convolution on the third characteristic map.
- the decoder 230 may perform convolution on the third characteristic map using a stride convolution method.
- the decoder 230 may perform convolution on the third characteristic map in a stride convolution method with a stride of 2 pixels.
- the first layer 231 may generate a fourth feature map by performing convolution on the third feature map.
- the size of the fourth characteristic map may be four times the size of the third characteristic map.
- the first layer 231 may transmit the fourth characteristic map to the second layer 232 .
- the second layer 232 may receive the fourth characteristic map from the first layer 231 .
- the second layer 232 may perform convolution on the fourth feature map to generate a fifth feature map.
- the size of the fifth characteristic map may be four times the size of the fourth characteristic map.
- the second layer 232 may transmit the fifth characteristic map to the third layer 233 .
- the third layer 233 may perform convolution on the fifth characteristic map to generate a first corrected image.
- the size of the first correction image may be four times the size of the fifth characteristic map, and may be the same as the size of the input image.
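The decoder's fourfold enlargement per layer (twofold per side) is consistent with a transposed (fractionally strided) convolution; the following size sketch uses illustrative kernel, padding, and output-padding values, which are assumptions not stated in the description.

```python
# Output spatial size of a transposed (fractionally strided) convolution.
# kernel_size, padding, and output_padding are illustrative assumptions.
def deconv_out_size(size: int, kernel_size: int = 3, stride: int = 2,
                    padding: int = 1, output_padding: int = 1) -> int:
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

# Each decoder layer doubles height and width (4x the area), so three
# layers (231 to 233) restore the original input size.
h = 32
for _ in range(3):
    h = deconv_out_size(h)
```

Starting from a hypothetical 32x32 third characteristic map, the three layers produce 64, 128, and finally 256 pixels per side, i.e. the first corrected image matches the size of a 256x256 input.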
- the image correction unit 120 may perform machine learning on the image tone improvement model 121 to correct the input image.
- the image corrector 120 may perform machine learning on the image tone improvement model 121 based on the loss function according to Equation 1 below.
- [Equation 1] may be expressed as L = λ1·L1 + λ2·L2 + λ3·L3, where L is the loss function of the image tone improvement model 121, L1, L2, and L3 are the first, second, and third loss functions, and λ1, λ2, and λ3 are the first, second, and third parameters, respectively.
- the first loss function and the first parameter may relate to the color of the image, the second loss function and the second parameter may relate to the naturalness of the image, and the third loss function and the third parameter may relate to artifacts of the image.
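A minimal sketch of how the three losses and their parameters could combine into a single training loss; the weight values below are placeholders, since the description does not specify them.

```python
# Weighted combination of the color, naturalness, and artifact losses.
# w1, w2, w3 stand in for the first/second/third parameters; their default
# values are arbitrary placeholders, not values from the description.
def total_loss(l_color: float, l_natural: float, l_artifact: float,
               w1: float = 1.0, w2: float = 0.1, w3: float = 1.0) -> float:
    return w1 * l_color + w2 * l_natural + w3 * l_artifact
```

Tuning the three weights trades off color fidelity, perceived naturalness, and artifact suppression in the corrected image.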
- the image corrector 120 may transmit the first corrected image to the learner 150 .
- the intermediate image generator 130 may receive an input image from the image acquirer 110 .
- the intermediate image generator 130 may include an algorithm.
- a parameter value of the algorithm may be a fixed value.
- the intermediate image generator 130 may generate an intermediate image based on the input image.
- the intermediate image may be a color-corrected image from the input image.
- the intermediate image may be an unnatural corrected image.
- the intermediate image generating unit 130 may transmit the intermediate image to the learning unit 150 .
- the image storage unit 140 may include an original (natural) image.
- the original image may be a natural image. Meanwhile, the original image may be an image corresponding to the first corrected image, but may not be limited thereto.
- the image storage unit 140 may transmit the original image to the learning unit 150 .
- the learning unit 150 may receive an input image from the image acquiring unit 110 .
- the learning unit 150 may receive the first corrected image from the image correcting unit 120 .
- the learner 150 may receive the intermediate image from the intermediate image generator 130 .
- the learning unit 150 may receive the original image from the image storage unit 140 .
- the learning unit 150 may perform machine learning on the image tone improvement model 121 based on the first corrected image and the intermediate image.
- the learning unit 150 may perform machine learning based on the loss function of Equation 2 below.
- [Equation 2] may be expressed as L1 = ||G(x) − z||², where L1 is the first loss function of the image tone improvement model 121, G(x) is the first corrected image, and z is the intermediate image. That is, the learner 150 may perform machine learning on the first loss function in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
- Equation 2 uses an MSE (Mean Squared Error) loss function, but instead of the MSE function, another loss function capable of reducing the pixel value difference between the first corrected image and the intermediate image, such as an SSIM (Structural Similarity Index) loss function, may be used.
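The first loss function can be sketched as a mean squared error over pixel values; images are represented here as flat pixel lists for brevity.

```python
# First loss: mean squared pixel-value difference between the first
# corrected image G(x) and the intermediate image z.
def first_loss(corrected, intermediate):
    assert len(corrected) == len(intermediate)
    return sum((a - b) ** 2 for a, b in zip(corrected, intermediate)) / len(corrected)
```

Minimizing this loss pulls the color of the first corrected image toward that of the intermediate (algorithmically color-corrected) image.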
- the learning unit 150 may transmit the machine learning execution result to the image correcting unit 120 .
- the image corrector 120 may receive the machine learning performance result from the learner 150 , and may determine the weight of the image tone improvement model 121 based on the result.
- the weight of the image tone improvement model 121 may be a weight of a convolutional neural network included in the image tone improvement model 121 .
- the learning unit 150 may obtain a determination error based on the first corrected image.
- the learning unit 150 may obtain a determination error using the naturalness determination model 151 .
- the discrimination error may be about whether the first corrected image is a natural image or an unnatural image. This can be described in detail as follows.
- FIG. 3 is a conceptual diagram of a naturalness discrimination model according to an embodiment of the present invention.
- the naturalness determination model 300 of FIG. 3 may be configured the same as or similar to the naturalness determination model 151 of FIG. 1 .
- the naturalness discrimination model 300 may be a convolutional neural network, and may include an encoder 310, a flatten layer 320, and a fully connected layer 330.
- the encoder 310 may include a plurality of layers 311 to 314 .
- the encoder 310 may perform convolution on the first corrected image using the plurality of layers 311 to 314 .
- the encoder 310 may extract a feature map by performing convolution on the first corrected image.
- the encoder 310 may perform convolution on the first corrected image using a stride convolution method.
- the encoder 310 may perform convolution on the first corrected image in a stride convolution method every 2 pixels.
- the first layer 311 may perform convolution on the first corrected image to generate a first characteristic map.
- the first layer 311 may transmit the first characteristic map to the second layer 312 .
- the second layer 312 may receive the first characteristic map from the first layer 311 .
- the second layer 312 may generate a second feature map by performing convolution on the first feature map.
- the size of the second characteristic map may be 1/4 of the size of the first characteristic map.
- the third layer 313 may receive the second characteristic map from the second layer 312 .
- the third layer may generate a third feature map by performing convolution on the second feature map.
- the size of the third characteristic map may be 1/4 of the size of the second characteristic map.
- the third layer 313 may transmit the third characteristic map to the fourth layer 314 .
- the fourth layer 314 may receive the third characteristic map from the third layer 313 .
- the fourth layer 314 may generate a fourth feature map by performing convolution on the third feature map.
- the size of the fourth characteristic map may be 1/4 of the size of the third characteristic map.
- the fourth layer 314 may transmit the fourth characteristic map to the flatten layer 320.
- the flatten layer 320 may receive the fourth characteristic map from the fourth layer 314.
- the flatten layer 320 may change the fourth characteristic map into one dimension by performing a flatten operation on the fourth characteristic map.
- the flatten layer 320 may transmit the one-dimensionally changed fourth characteristic map to the fully connected layer 330.
- the fully connected layer 330 may receive the one-dimensionally changed fourth characteristic map from the flatten layer 320.
- the fully connected layer 330 may generate activation based on the one-dimensionally changed fourth characteristic map. Activation may be a value between 0 and 1.
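A minimal sketch of the flatten layer followed by a fully connected layer with a sigmoid, producing a single activation between 0 and 1; the weights and bias are hypothetical.

```python
import math

# Flatten a 2-D feature map, apply one fully connected unit, and squash
# with a sigmoid so the activation lies in (0, 1). Weights/bias are
# hypothetical; a trained discriminator would learn them.
def discriminator_head(feature_map, weights, bias):
    flat = [v for row in feature_map for v in row]        # flatten to 1-D
    z = sum(w * v for w, v in zip(weights, flat)) + bias  # fully connected
    return 1.0 / (1.0 + math.exp(-z))                     # sigmoid activation
```

An activation near 1 means the model judges the image natural, near 0 unnatural, matching the determination rule described next.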
- the learning unit 150 may obtain a determination error for the first corrected image based on the activation generated by the naturalness determination model 151.
- when the activation value is 0, the learning unit 150 may determine that the first corrected image is unnatural, and when the activation value is 1, it may determine that the first corrected image is natural.
- the learning unit 150 may perform machine learning on the naturalness discrimination model 151 in order to distinguish whether the first corrected image is a natural image.
- the learning unit 150 may perform machine learning on the naturalness determination model 151 based on the original image and the first corrected image.
- the learning unit 150 may perform machine learning on the naturalness determination model 151 based on the loss function of Equation 3 below.
- [Equation 3] may be expressed as L_D = −[log D(y) + log(1 − D(G(x)))], where D(y) is the activation value for the original image and D(G(x)) is the activation value for the first corrected image.
- the learning unit 150 may perform machine learning on the naturalness discrimination model 151 by labeling D(y) as 1 and D(G(x)) as 0. That is, the learning unit 150 may perform machine learning on the naturalness determination model 151 so that the naturalness determination model 151 determines the first corrected image as an unnatural image.
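The labeling scheme above corresponds to a binary cross-entropy discriminator loss; the sketch below adds a small epsilon for numerical stability, which is an implementation assumption.

```python
import math

# Discriminator loss: binary cross-entropy with label 1 for the original
# image activation D(y) and label 0 for the corrected image activation
# D(G(x)). eps guards against log(0) and is an implementation detail.
def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
```

The loss is near zero when the model labels the original 1 and the corrected image 0, and grows as the corrected image fools the discriminator.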
- the learning unit 150 may determine the weight of the naturalness discrimination model 151 based on the machine learning result.
- the weight of the naturalness determination model 151 may be a weight of the convolutional neural network included in the naturalness determination model 151 .
- the learning unit 150 may transmit the machine learning execution result to the image correcting unit 120 .
- the image corrector 120 may receive a machine learning execution result from the naturalness determination model 151 .
- the naturalness discrimination model 151 and the image tone improvement model 121 may have the relationship of a discriminator and a generator in a GAN. Accordingly, when the learning unit 150 performs machine learning on the naturalness discrimination model 151, the image correcting unit 120 may perform machine learning on the image tone improvement model 121 in response thereto.
- the image corrector 120 may perform machine learning on the image tone improvement model 121 to naturally correct the input image.
- the image corrector 120 may perform machine learning on the image tone improvement model 121 based on the loss function of Equation 4 below.
- [Equation 4] may be expressed as L2 = −log D(G(x)). The image corrector 120 may perform machine learning on the image tone improvement model 121 so that the D(G(x)) value becomes 1. That is, the image corrector 120 may perform machine learning on the image tone improvement model 121 so that the naturalness determination model 151 determines the first corrected image as a natural image. Machine learning of the naturalness discrimination model 151 and machine learning of the image tone improvement model 121 may be alternately performed. The image corrector 120 may determine the weight of the image tone improvement model 121 based on the machine learning result.
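A sketch of the generator-side objective that pushes D(G(x)) toward 1, together with a simple alternating schedule; the one-step-each schedule is a common GAN convention and an assumption, not something the description specifies.

```python
import math

# Second (adversarial) loss for the image tone improvement model: small
# when the discriminator activation D(G(x)) is near 1, i.e. when the
# corrected image is judged natural. eps is an implementation detail.
def generator_adv_loss(d_fake: float, eps: float = 1e-12) -> float:
    return -math.log(d_fake + eps)

# Hypothetical alternating schedule: discriminator and generator are
# updated in turn, one step each.
def alternate_steps(n_steps):
    return ["discriminator" if i % 2 == 0 else "generator" for i in range(n_steps)]
```

Alternating the two updates lets the naturalness discrimination model and the image tone improvement model improve against each other, as in standard GAN training.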
- the weight of the image tone improvement model 121 may be a weight of a convolutional neural network included in the image tone improvement model 121 .
- the learner 150 may generate a second corrected image based on the first corrected image.
- the learner 150 may generate a second corrected image using the inverse improvement model 152 . This can be described in detail as follows.
- FIG. 4 is a conceptual diagram of an inverse improvement model according to an embodiment of the present invention.
- the inverse improvement model 400 of FIG. 4 may be configured the same as or similar to the inverse improvement model 152 of FIG. 1 .
- the inverse improvement model 400 may be a convolutional neural network and may include a plurality of convolutional layers 401 to 405 .
- the inverse enhancement model 400 may perform convolution on the first corrected image using the convolutional layers 401 to 405 .
- the inverse enhancement model 400 may generate a second corrected image by performing convolution on the first corrected image.
- the second corrected image may be an image in which color and naturalness correction effects are removed from the first corrected image.
- the number of layers of the inverse enhancement model 400 may be less than or equal to the number of layers of the corrector (eg, the image corrector 120 of FIG. 1 ). Accordingly, the second corrected image may be an image in which color and naturalness correction effects are removed from the first corrected image, and the artifact removal effect of the first corrected image may be maintained.
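A minimal sketch of such a stack of convolutional layers is shown below. The naive single-channel "same"-padded convolution and the identity kernels are illustrative assumptions for demonstration, not the inverse enhancement model's actual architecture or weights:

```python
import numpy as np

def conv2d(image, kernel):
    """'Same'-padded single-channel 2-D convolution (naive loops, for clarity)."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def inverse_model(image, kernels):
    """Apply convolution layers in sequence, as the inverse enhancement
    model 400 applies layers 401 to 405 to the first corrected image."""
    out = image
    for k in kernels:
        out = conv2d(out, k)
    return out

# Five identity kernels leave the image unchanged -- a trivial placeholder
# for learned weights that would instead undo the color/naturalness correction.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
img = np.arange(16, dtype=float).reshape(4, 4)
restored = inverse_model(img, [identity] * 5)
```

In practice the kernels would be learned so that the output approximates the input image, as described in the following paragraphs.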
- the learning unit 150 may perform machine learning on the inverse improvement model 152 based on the second corrected image and the input image.
- the learning unit 150 may learn the inverse enhancement model 152 so that the inverse enhancement model 152 removes the color correction effect and the naturalness correction effect among the correction effects of the first corrected image.
- the learning unit 150 may perform machine learning on the inverse improvement model 152 based on the loss function of Equation 5 below.
- Equation 5 may be a loss function of the inverse enhancement model 152 , C(G(X)) may be the second corrected image, and x may be the input image.
- the learner 150 may perform machine learning on the inverse enhancement model 152 in a manner that minimizes the sum of the squares of the pixel value difference between the second corrected image and the input image.
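Under the stated description of Equation 5 — the sum of the squares of the pixel value differences between the second corrected image C(G(x)) and the input image x — the loss value can be computed as in this sketch (function name assumed for illustration):

```python
import numpy as np

def inverse_model_loss(second_corrected, input_image):
    """Equation-5-style loss: sum of squared pixel differences between the
    second corrected image C(G(x)) and the input image x."""
    diff = second_corrected.astype(float) - input_image.astype(float)
    return float(np.sum(diff ** 2))

x = np.array([[10.0, 20.0], [30.0, 40.0]])      # input image
cgx = np.array([[11.0, 19.0], [30.0, 42.0]])    # second corrected image
loss = inverse_model_loss(cgx, x)               # 1 + 1 + 0 + 4
```

Machine learning of the inverse enhancement model 152 would then adjust its weights to minimize this value.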
- the learning unit 150 may perform machine learning on the image tone improvement model 121 based on the second corrected image and the input image.
- the learning unit 150 may perform machine learning in a manner that reverses the effect of the correction performed by the image tone improvement model 121 .
- the learning unit 150 may perform machine learning on the image tone improvement model 121 based on Equation 6 below.
- Equation 6 may be a third loss function of the image tone enhancement model 121 .
- the learner 150 may perform machine learning on the image tone improvement model 121 in a manner that minimizes the sum of the squares of the pixel value difference between the second corrected image and the input image.
- the learner 150 may determine the weight of the inverse improvement model 152 based on the machine learning result.
- the weight of the inverse improvement model 152 may be a weight of the convolutional neural network included in the inverse improvement model 152 .
- the learning unit 150 may transmit the machine learning result to the image correcting unit 120 .
- the image corrector 120 may receive a machine learning result from the learner 150 .
- the image corrector 120 may determine the weight of the image tone improvement model 121 based on the machine learning result.
- the image corrector 120 may generate a first corrected image based on the weight on which the image corrector 120 has performed machine learning, and may perform a test based on it.
- the weight on which machine learning is performed may be a weight of a convolutional neural network on which machine learning is performed.
- FIG. 5 is a flowchart of a machine learning method according to an embodiment of the present invention.
- the image corrector may generate a first corrected image ( S510 ).
- the image corrector may receive an input image from the image acquisition unit (eg, the image acquisition unit 110 of FIG. 1 ).
- the image corrector may generate the first corrected image by performing correction on the input image.
- the image corrector may generate the first corrected image by using the image tone improvement model (eg, the image tone improvement model 121 of FIG. 1 ).
- the image corrector may perform machine learning on the first loss function of the image tone improvement model ( S520 ).
- the learning unit (eg, the learning unit 150 of FIG. 1 ) may perform machine learning based on the first corrected image and an intermediate image.
- the intermediate image may be generated by the intermediate image generator (eg, the intermediate image generator 130 of FIG. 1 ) based on the input image.
- the learning unit may transmit the machine learning result to the correction unit.
- the image corrector may receive the machine learning result from the learning unit, and may perform machine learning on the first loss function of the image tone improvement model based on the result.
- the image corrector may perform machine learning on the second loss function of the image tone improvement model ( S530 ).
- the learning unit may perform machine learning on the naturalness determination model (eg, the naturalness determination model 151 of FIG. 1 ) based on the first corrected image and the original image.
- the learning unit may perform machine learning on the naturalness discrimination model so that the naturalness discrimination model determines the first corrected image as an unnatural image.
- the learning unit may transmit a result of performing machine learning to the image correcting unit.
- the image corrector may receive the machine learning execution result from the learner.
- the image corrector may perform machine learning on the image tone improvement model.
- the image corrector may perform machine learning on the second loss function of the image tone improvement model.
- the image corrector may perform machine learning on the image tone improvement model in response to machine learning of the naturalness discrimination model.
- the image corrector may perform machine learning on the second loss function of the image tone improvement model so that the naturalness determination model determines the first corrected image as a natural image.
- the image corrector may learn the third loss function of the image tone improvement model ( S540 ).
- the learner may perform machine learning on the inverse improvement model based on the first corrected image and the input image.
- the learning unit may perform machine learning on the inverse enhancement model so that the inverse enhancement model removes a color correction effect and a naturalness correction effect among the correction effects of the first corrected image.
- the learning unit may transmit the machine learning result to the image correcting unit.
- the image corrector may receive the machine learning result from the learner.
- the image corrector may perform machine learning on the third loss function of the image tone improvement model based on the machine learning result.
- the image corrector may perform machine learning of the image tone improvement model by repeating the processes of S510 to S540.
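The repetition of S510 to S540 from FIG. 5 can be sketched as the following training skeleton. All four callables are hypothetical placeholders for the corrector and the three loss-function learning steps; they are not part of the patent itself:

```python
def training_epochs(generate_first_corrected, learn_first_loss,
                    learn_second_loss, learn_third_loss, n_iterations=3):
    """Repeat the S510-S540 sequence of FIG. 5: generate the first corrected
    image, then perform machine learning on each of the three loss functions
    of the image tone improvement model in turn."""
    log = []
    for _ in range(n_iterations):
        corrected = generate_first_corrected()   # S510: first corrected image
        log.append(learn_first_loss(corrected))  # S520: first loss function
        log.append(learn_second_loss(corrected)) # S530: second loss function
        log.append(learn_third_loss(corrected))  # S540: third loss function
    return log

log = training_epochs(lambda: "img",
                      lambda c: "L1", lambda c: "L2", lambda c: "L3")
```

Each iteration corresponds to one pass through the flowchart of FIG. 5, with the weights of the image tone improvement model updated after each loss-function step.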
- FIG. 6 is a conceptual diagram for explaining an effect of an image processing method according to an embodiment of the present invention.
- the first image 601 may be an input image
- the second image 602 may be an image in which only the color of the input image is corrected.
- the third image 603 may be an image in which the color of the input image is corrected and the image is naturally corrected.
- the fourth image 604 may be an image in which the color of the input image is corrected, the image is naturally corrected, and artifacts of the image are removed, ie, an image corrected based on the full loss function of Equation 1.
- the second image 602 may be an unnatural image in which the color of the image is corrected.
- the third image 603 may be a natural image compared to the second image 602 .
- the fourth image 604 may be a natural image compared to the third image 603 .
- the color of the fourth image 604 may be corrected when compared to the first image 601 .
- the image processing apparatus 100 of FIG. 1 may be configured as follows.
- FIG. 7 is a block diagram of an image processing apparatus according to another embodiment of the present invention.
- the image processing apparatus 700 may include at least one processor 710 , a memory 720 , and a transceiver 730 connected to a network to perform communication. Also, the image processing apparatus 700 may further include an input interface device 740 , an output interface device 750 , a storage device 760 , and the like. Each of the components included in the image processing apparatus 700 may be connected by a bus 770 to communicate with each other. Alternatively, each of the components may be connected to the processor 710 through an individual interface or an individual bus rather than the common bus 770 . For example, the processor 710 may be connected to at least one of the memory 720 , the transceiver 730 , the input interface device 740 , the output interface device 750 , and the storage device 760 through a dedicated interface.
- the processor 710 may execute a program command stored in at least one of the memory 720 and the storage device 760 .
- the processor 710 may mean a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to embodiments of the present invention are performed.
- Each of the memory 720 and the storage device 760 may be configured as at least one of a volatile storage medium and a non-volatile storage medium.
- the memory 720 may be configured as at least one of a read only memory (ROM) and a random access memory (RAM).
- the methods according to the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium.
- the computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
- the program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.
- Examples of computer-readable media include hardware devices specially configured to store and carry out program instructions, such as ROM, RAM, flash memory, and the like.
- Examples of program instructions include not only machine language codes such as those generated by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like.
- the hardware device described above may be configured to operate as at least one software module to perform the operations of the present invention, and vice versa.
Abstract
The present invention relates to a machine learning-based image processing method and apparatus. The machine learning-based image processing method according to the present invention may comprise the steps of: generating a first corrected image by inputting an input image into a first convolutional neural network; generating an intermediate image based on the input image; performing machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image; and performing machine learning on a second loss function of the first convolutional neural network based on the first corrected image and a natural image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/770,993 US20220366539A1 (en) | 2019-12-30 | 2020-11-11 | Image processing method and apparatus based on machine learning |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190178401A KR102537207B1 (ko) | 2019-12-30 | 2019-12-30 | 머신 러닝에 기반한 이미지 처리 방법 및 장치 |
KR10-2019-0178401 | 2019-12-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021137415A1 (fr) | 2021-07-08 |
Family
ID=76687092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/015722 WO2021137415A1 (fr) | 2019-12-30 | 2020-11-11 | Procédé et appareil de traitement d'image basé sur l'apprentissage automatique |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220366539A1 (fr) |
KR (1) | KR102537207B1 (fr) |
WO (1) | WO2021137415A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20230125948A (ko) | 2022-02-22 | 2023-08-29 | 김바올 | 무보정 이미지 인증 방법 및 장치 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170070715A (ko) * | 2015-12-14 | 2017-06-22 | 삼성전자주식회사 | 딥러닝 기반 영상 처리 장치 및 방법, 학습 장치 |
KR101871098B1 (ko) * | 2017-01-12 | 2018-06-25 | 포항공과대학교 산학협력단 | 이미지 처리 방법 및 장치 |
JP2019125014A (ja) * | 2018-01-12 | 2019-07-25 | コニカミノルタ株式会社 | 学習装置、学習方法、および学習プログラム |
JP6569047B1 (ja) * | 2018-11-28 | 2019-09-04 | 株式会社ツバサファクトリー | 学習方法、コンピュータプログラム、分類器、及び生成器 |
- 2019-12-30 KR KR1020190178401A patent/KR102537207B1/ko active IP Right Grant
- 2020-11-11 US US17/770,993 patent/US20220366539A1/en active Pending
- 2020-11-11 WO PCT/KR2020/015722 patent/WO2021137415A1/fr active Application Filing
Non-Patent Citations (2)
Title |
---|
SON HYEONGSEOK, LEE GUNHEE, CHO SUNGHYUN, LEE SEUNGYONG: "Naturalness‐Preserving Image Tone Enhancement Using Generative Adversarial Networks", COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS, WILEY-BLACKWELL, OXFORD, vol. 38, no. 7, 1 October 2019 (2019-10-01), Oxford, pages 277 - 285, XP055826258, ISSN: 0167-7055, DOI: 10.1111/cgf.13836 * |
ZHAO HANG, GALLO ORAZIO, FROSIO IURI, KAUTZ JAN: "Loss Functions for Image Restoration With Neural Networks", IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, vol. 3, no. 1, 1 March 2017 (2017-03-01), pages 47 - 57, XP055796352, DOI: 10.1109/TCI.2016.2644865 * |
Also Published As
Publication number | Publication date |
---|---|
KR102537207B1 (ko) | 2023-05-25 |
US20220366539A1 (en) | 2022-11-17 |
KR20210085403A (ko) | 2021-07-08 |
Legal Events
Code | Title | Description
---|---|---
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20911172; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | EP: PCT application non-entry into the European phase | Ref document number: 20911172; Country of ref document: EP; Kind code of ref document: A1