WO2021137415A1 - Machine learning-based image processing method and apparatus - Google Patents

Machine learning-based image processing method and apparatus

Info

Publication number
WO2021137415A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
machine learning
neural network
convolutional neural
loss function
Prior art date
Application number
PCT/KR2020/015722
Other languages
English (en)
Korean (ko)
Inventor
이승용
조성현
손형석
Original Assignee
포항공과대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 포항공과대학교 산학협력단 filed Critical 포항공과대학교 산학협력단
Priority to US17/770,993 priority Critical patent/US20220366539A1/en
Publication of WO2021137415A1 publication Critical patent/WO2021137415A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7796Active pattern-learning, e.g. online learning of image or video features based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/98Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993Evaluation of the quality of the acquired pattern
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present invention relates to an image processing method and apparatus based on machine learning, and more particularly, to an image processing method and apparatus for performing machine learning on a plurality of convolutional neural networks and processing an image based on the result.
  • an image enhancement method using deep learning and a convolutional neural network may train the network through supervised learning.
  • normal images may be used as input images, and images obtained by improving the input images may be used as teacher images for the supervised learning.
  • An object of the present invention for solving the above problems is to provide an image processing method and apparatus for performing machine learning on a Generative Adversarial Network (GAN) and processing an image based thereon.
  • An image processing method based on machine learning for solving the above problems may include generating a first corrected image by inputting an input image to a first convolutional neural network, generating an intermediate image based on the input image, performing machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image, and performing machine learning on a second loss function of the first convolutional neural network based on the first corrected image and a natural image.
  • the generating of the intermediate image may include generating the intermediate image by processing the input image based on an algorithm including a fixed parameter value.
  • the step of performing machine learning on the first loss function may include performing machine learning on the first loss function in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
  • performing machine learning on the second loss function may include inputting the first corrected image to a second convolutional neural network to obtain a first activation, inputting the original image to the second convolutional neural network to obtain a second activation, and performing machine learning on the second convolutional neural network based on the first activation and the second activation.
  • the performing machine learning on the second loss function may further include performing machine learning on the second loss function based on a machine learning result of the second convolutional neural network.
  • the method may further include performing machine learning on a third loss function of the first convolutional neural network based on the first corrected image and the input image.
  • performing machine learning on the third loss function may include inputting the first corrected image to a third convolutional neural network to generate a second corrected image, performing machine learning on the third convolutional neural network based on the second corrected image and the input image, and performing machine learning on the third loss function based on a machine learning result of the third convolutional neural network.
  • the step of performing machine learning on the third convolutional neural network may include performing machine learning on the third convolutional neural network in a manner that minimizes the sum of the squares of the pixel value differences between the second corrected image and the input image.
  • the number of convolutional layers of the first convolutional neural network may be equal to or greater than the number of convolutional layers of the third convolutional neural network.
  • An image processing apparatus based on machine learning may include a processor, a memory in which one or more instructions executed by the processor are stored, a first convolutional neural network, and a second convolutional neural network, wherein the one or more instructions may be executed to input an input image to the first convolutional neural network to generate a first corrected image, generate an intermediate image based on the input image, perform machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image, and perform machine learning on a second loss function of the first convolutional neural network based on the first corrected image and the original image.
  • the one or more instructions may be executed to process the input image based on an algorithm including a fixed parameter value to generate the intermediate image.
  • the one or more instructions may be executed to perform machine learning on the first loss function in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
  • when performing machine learning on the second loss function, the one or more instructions may be executed to input the first corrected image to the second convolutional neural network to obtain a first activation, input the original image to the second convolutional neural network to obtain a second activation, and perform machine learning on the second convolutional neural network based on the first activation and the second activation.
  • the one or more instructions may be further executed to perform machine learning on the second loss function based on a learning result for the second convolutional neural network.
  • the one or more instructions may be further executed to perform machine learning on a third loss function of the first convolutional neural network based on the first corrected image and the input image.
  • the image processing apparatus may further include a third convolutional neural network, and when performing machine learning on the third loss function, the one or more instructions may be executed to input the first corrected image to the third convolutional neural network to generate a second corrected image, perform machine learning on the third convolutional neural network based on the second corrected image and the input image, and perform machine learning on the third loss function of the first convolutional neural network based on the machine learning result for the third convolutional neural network.
  • when performing machine learning on the third convolutional neural network, the one or more instructions may be executed to perform the machine learning in a manner that minimizes the sum of the squares of the pixel value differences between the second corrected image and the input image.
  • the number of convolutional layers of the first convolutional neural network may be equal to or greater than the number of convolutional layers of the third convolutional neural network.
  • a natural corrected image can be obtained by generating an intermediate image, which is an unnaturally corrected image, based on an algorithm with fixed parameter values, and by performing machine learning based on the intermediate image.
  • a natural corrected image can be obtained by correcting an input image using a plurality of convolutional neural networks.
  • FIG. 1 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
  • FIG. 2 is a conceptual diagram of an image tone improvement model according to an embodiment of the present invention.
  • FIG. 3 is a conceptual diagram of a naturalness discrimination model according to an embodiment of the present invention.
  • FIG. 4 is a conceptual diagram of an inverse improvement model according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a machine learning method according to an embodiment of the present invention.
  • FIG. 6 is a conceptual diagram for explaining an effect of an image processing method according to an embodiment of the present invention.
  • FIG. 7 is a block diagram of an image processing apparatus according to another embodiment of the present invention.
  • The terms first, second, etc. may be used to describe various elements, but the elements should not be limited by these terms. These terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of a plurality of related listed items.
  • FIG. 1 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
  • an image processing apparatus 100 may include an image acquisition unit 110, an image correction unit 120, an intermediate image generation unit 130, an image storage unit 140, and a learning unit 150.
  • the image corrector 120 may include an image tone improvement model 121 .
  • the learning unit 150 may include a naturalness discrimination model 151 and an inverse improvement model 152 .
  • the image tone improvement model 121 may be a generator of a Generative Adversarial Network (GAN), and the naturalness discrimination model 151 may be a discriminator of the GAN.
  • the image acquisition unit 110 may acquire an input image from the outside.
  • the image acquisition unit 110 may be a camera.
  • the image obtaining unit 110 may transmit the input image to the image correcting unit 120 , the intermediate image generating unit 130 , and the learning unit 150 .
  • the image corrector 120 may receive an input image from the image obtainer 110 .
  • the image corrector 120 may generate a first corrected image based on the input image.
  • the image corrector 120 may generate a first corrected image using the image tone improvement model 121 . This can be described in detail as follows.
  • FIG. 2 is a conceptual diagram of an image tone improvement model according to an embodiment of the present invention.
  • the image tone enhancement model 200 of FIG. 2 may be configured the same as or similar to the image tone enhancement model 121 of FIG. 1 .
  • the image tone improvement model 200 may be a convolutional neural network, and may include an encoder 210 , a residual unit 220 , and a decoder 230 .
  • the encoder 210 , the residual unit 220 , and the decoder 230 may include a plurality of convolutional layers 211 to 213 , 221 to 224 , and 231 to 233 , respectively.
  • a node of the encoder 210 and a node of the decoder 230 corresponding to the node of the encoder 210 may be connected in a skip-connection method, respectively.
  • the speed of machine learning may be increased by connecting the node of the encoder 210 and the node of the decoder 230 .
  • the encoder 210 may perform convolution on the input image using the plurality of layers 211 to 213.
  • the encoder 210 may extract feature maps by performing convolution on the input image.
  • the encoder 210 may perform convolution on the input image using a strided-convolution method.
  • the encoder 210 may perform convolution on the input image with a stride of 2 pixels.
  • the first layer 211 may perform convolution on the input image to generate a first feature map.
  • the first layer 211 may transmit the first feature map to the second layer 212.
  • the second layer 212 may receive the first feature map from the first layer 211.
  • the second layer 212 may perform convolution on the first feature map to generate a second feature map.
  • the size of the second feature map may be 1/4 of the size of the first feature map.
  • the second layer 212 may transmit the second feature map to the third layer 213.
  • the third layer 213 may receive the second feature map from the second layer 212.
  • the third layer 213 may generate a third feature map by performing convolution on the second feature map.
  • the size of the third feature map may be 1/4 of the size of the second feature map.
  • the third layer 213 may transmit the third feature map to the residual unit 220.
  • the residual unit 220 may receive the third feature map from the third layer 213.
  • the residual unit 220 may refine the third feature map using the plurality of layers 221 to 224.
  • the residual unit 220 may transmit the refined third feature map to the decoder 230.
  • the decoder 230 may receive the refined third feature map from the residual unit 220.
  • the decoder 230 may perform convolution on the third feature map using the plurality of layers 231 to 233.
  • the decoder 230 may generate a first corrected image by performing convolution on the third feature map.
  • the decoder 230 may perform convolution on the third feature map using a strided-convolution method.
  • the decoder 230 may perform convolution on the third feature map with a stride of 2 pixels.
  • the first layer 231 may generate a fourth feature map by performing convolution on the third feature map.
  • the size of the fourth feature map may be four times the size of the third feature map.
  • the first layer 231 may transmit the fourth feature map to the second layer 232.
  • the second layer 232 may receive the fourth feature map from the first layer 231.
  • the second layer 232 may perform convolution on the fourth feature map to generate a fifth feature map.
  • the size of the fifth feature map may be four times the size of the fourth feature map.
  • the second layer 232 may transmit the fifth feature map to the third layer 233.
  • the third layer 233 may perform convolution on the fifth feature map to generate a first corrected image.
  • the size of the first corrected image may be four times the size of the fifth feature map, and may be the same as the size of the input image.
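As a concrete illustration of the structure just described, the following is a minimal PyTorch sketch of an encoder-residual-decoder generator with stride-2 convolutions and skip connections. The channel counts, kernel sizes, and number of residual blocks are illustrative assumptions, since the publication does not fix them.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """One block of the residual unit 220; refines features at constant resolution."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ImageToneEnhancer(nn.Module):
    """Encoder (210) -> residual unit (220) -> decoder (230), as in FIG. 2.
    Assumes input height/width divisible by 8 so the skip connections align."""
    def __init__(self, ch=64):
        super().__init__()
        # Encoder: each stride-2 convolution halves height and width,
        # so each feature map has 1/4 the area of the previous one.
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 3, 2, 1), nn.ReLU())
        self.res = nn.Sequential(*[ResBlock(ch * 4) for _ in range(4)])
        # Decoder: transposed convolutions quadruple the feature-map area.
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU())
        self.dec3 = nn.ConvTranspose2d(ch, 3, 4, 2, 1)

    def forward(self, x):
        f1 = self.enc1(x)        # first feature map
        f2 = self.enc2(f1)       # second feature map (1/4 area of f1)
        f3 = self.enc3(f2)       # third feature map (1/4 area of f2)
        r = self.res(f3)         # refined third feature map
        # Skip connections between matching encoder and decoder nodes,
        # which the text notes can speed up machine learning.
        d1 = self.dec1(r) + f2
        d2 = self.dec2(d1) + f1
        return self.dec3(d2)     # first corrected image, same size as x
```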
  • the image correction unit 120 may perform machine learning on the image tone improvement model 121 to correct the input image.
  • the image corrector 120 may perform machine learning on the image tone improvement model 121 based on the loss function according to Equation 1 below.
  • [Equation 1] L_G = λ_1·L_1 + λ_2·L_2 + λ_3·L_3
  • in Equation 1, L_G may be the overall loss function of the image tone improvement model 121, L_1 may be a first loss function, L_2 may be a second loss function, L_3 may be a third loss function, and λ_1, λ_2, and λ_3 may be parameters weighting the respective terms.
  • the first loss function L_1 and the first parameter λ_1 may relate to the color of the image.
  • the second loss function L_2 and the second parameter λ_2 may relate to the naturalness of the image.
  • the third loss function L_3 and the third parameter λ_3 may relate to artifacts of the image.
  • the image corrector 120 may transmit the first corrected image to the learner 150 .
  • the intermediate image generator 130 may receive an input image from the image acquirer 110 .
  • the intermediate image generator 130 may include an algorithm.
  • the parameter values of the algorithm may be fixed values.
  • the intermediate image generator 130 may generate an intermediate image based on the input image.
  • the intermediate image may be a color-corrected image from the input image.
  • the intermediate image may be an unnatural corrected image.
  • the intermediate image generating unit 130 may transmit the intermediate image to the learning unit 150 .
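The publication does not disclose the specific fixed-parameter algorithm. As one hypothetical stand-in, a fixed gamma curve combined with a fixed saturation gain yields a color-corrected but deliberately overdriven (unnatural) intermediate image; both parameter values below are assumptions.

```python
import numpy as np

def generate_intermediate(image: np.ndarray, gamma: float = 0.7,
                          sat_gain: float = 1.5) -> np.ndarray:
    """Hypothetical fixed-parameter tone correction for the intermediate image.

    `image` is float RGB in [0, 1]. Both parameter values are fixed, as the
    text requires; pushed this hard, the result is vivid but unnatural.
    """
    out = np.power(image, gamma)              # fixed gamma curve brightens shadows
    gray = out.mean(axis=-1, keepdims=True)   # per-pixel luminance proxy
    out = gray + sat_gain * (out - gray)      # fixed saturation boost
    return np.clip(out, 0.0, 1.0)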
  • the image storage unit 140 may store an original image.
  • the original image may be a natural image. Meanwhile, the original image may be an image corresponding to the first corrected image, but may not be limited thereto.
  • the image storage unit 140 may transmit the original image to the learning unit 150 .
  • the learning unit 150 may receive an input image from the image acquiring unit 110 .
  • the learning unit 150 may receive the first corrected image from the image correcting unit 120 .
  • the learner 150 may receive the intermediate image from the intermediate image generator 130 .
  • the learning unit 150 may receive the original image from the image storage unit 140 .
  • the learning unit 150 may perform machine learning on the image tone improvement model 121 based on the first corrected image and the intermediate image.
  • the learning unit 150 may perform machine learning based on the loss function of Equation 2 below.
  • [Equation 2] L_1 = Σ_p (G(x)_p − m_p)²
  • in Equation 2, L_1 may be the first loss function of the image tone improvement model 121, G(x) may be the first corrected image, m may be the intermediate image, and p may index the pixels. That is, the learner 150 may perform machine learning on the first loss function in a manner that minimizes the sum of the squares of the pixel value differences between the first corrected image and the intermediate image.
  • Equation 2 is a mean squared error (MSE) type loss function, but instead of the MSE function, any function capable of reducing the pixel value difference between the first corrected image and the intermediate image, such as an L1 loss function or an SSIM (Structural Similarity Index) loss function, may be used.
  • the learning unit 150 may transmit the machine learning execution result to the image correcting unit 120 .
  • the image corrector 120 may receive the machine learning performance result from the learner 150 , and may determine the weight of the image tone improvement model 121 based on the result.
  • the weight of the image tone improvement model 121 may be a weight of a convolutional neural network included in the image tone improvement model 121 .
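In code, the Equation 2 update reduces to a single reconstruction term. A sketch using PyTorch's mean-reduced MSE, which differs from the sum in Equation 2 only by a constant factor:

```python
import torch.nn.functional as F

def first_loss(corrected, intermediate):
    """Equation 2: mean squared error between the first corrected image G(x)
    and the intermediate image; minimizing it pulls the colors of G(x)
    toward those of the fixed-algorithm intermediate image."""
    return F.mse_loss(corrected, intermediate)
```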
  • the learning unit 150 may obtain a determination error based on the first corrected image.
  • the learning unit 150 may obtain a determination error using the naturalness determination model 151 .
  • the discrimination error may be about whether the first corrected image is a natural image or an unnatural image. This can be described in detail as follows.
  • FIG. 3 is a conceptual diagram of a naturalness discrimination model according to an embodiment of the present invention.
  • the naturalness determination model 300 of FIG. 3 may be configured the same as or similar to the naturalness determination model 151 of FIG. 1 .
  • the naturalness discrimination model 300 may be a convolutional neural network, and may include an encoder 310, a flatten layer 320, and a fully connected layer 330.
  • the encoder 310 may include a plurality of layers 311 to 314 .
  • the encoder 310 may perform convolution on the first corrected image using the plurality of layers 311 to 314 .
  • the encoder 310 may extract a feature map by performing convolution on the first corrected image.
  • the encoder 310 may perform convolution on the first corrected image using a strided-convolution method.
  • the encoder 310 may perform convolution on the first corrected image with a stride of 2 pixels.
  • the first layer 311 may perform convolution on the first corrected image to generate a first feature map.
  • the first layer 311 may transmit the first feature map to the second layer 312.
  • the second layer 312 may receive the first feature map from the first layer 311.
  • the second layer 312 may generate a second feature map by performing convolution on the first feature map.
  • the size of the second feature map may be 1/4 of the size of the first feature map.
  • the third layer 313 may receive the second feature map from the second layer 312.
  • the third layer 313 may generate a third feature map by performing convolution on the second feature map.
  • the size of the third feature map may be 1/4 of the size of the second feature map.
  • the third layer 313 may transmit the third feature map to the fourth layer 314.
  • the fourth layer 314 may receive the third feature map from the third layer 313.
  • the fourth layer 314 may generate a fourth feature map by performing convolution on the third feature map.
  • the size of the fourth feature map may be 1/4 of the size of the third feature map.
  • the fourth layer 314 may transmit the fourth feature map to the flatten layer 320.
  • the flatten layer 320 may receive the fourth feature map from the fourth layer 314.
  • the flatten layer 320 may change the fourth feature map to one dimension by performing a flatten operation on the fourth feature map.
  • the flatten layer 320 may transmit the one-dimensional fourth feature map to the fully connected layer 330.
  • the fully connected layer 330 may receive the one-dimensional fourth feature map from the flatten layer 320.
  • the fully connected layer 330 may generate an activation based on the one-dimensional fourth feature map. The activation may be a value between 0 and 1.
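A matching PyTorch sketch of this discriminator, assuming a 256×256 input and illustrative channel counts:

```python
import torch
import torch.nn as nn

class NaturalnessDiscriminator(nn.Module):
    """Encoder (311-314) -> flatten (320) -> fully connected (330), as in FIG. 3."""
    def __init__(self, ch=32, in_size=256):
        super().__init__()
        layers, c_in = [], 3
        for i in range(4):                 # four stride-2 layers: area /4 each time
            layers += [nn.Conv2d(c_in, ch * 2 ** i, 3, 2, 1), nn.LeakyReLU(0.2)]
            c_in = ch * 2 ** i
        self.encoder = nn.Sequential(*layers)
        self.flatten = nn.Flatten()        # fourth feature map -> one dimension
        side = in_size // 16               # spatial size after four halvings
        self.fc = nn.Linear(c_in * side * side, 1)

    def forward(self, x):
        # Sigmoid keeps the activation between 0 (unnatural) and 1 (natural).
        return torch.sigmoid(self.fc(self.flatten(self.encoder(x))))
```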
  • the learning unit 150 may obtain a determination error for the first corrected image based on the activation generated by the naturalness determination model 151.
  • when the activation value is 0, the learning unit 150 may determine that the first corrected image is unnatural, and when the activation value is 1, it may determine that the first corrected image is natural.
  • the learning unit 150 may perform machine learning on the naturalness discrimination model 151 in order to distinguish whether the first corrected image is a natural image.
  • the learning unit 150 may perform machine learning on the naturalness determination model 151 based on the original image and the first corrected image.
  • the learning unit 150 may perform machine learning on the naturalness determination model 151 based on the loss function of Equation 3 below.
  • [Equation 3] L_D = −log D(y) − log(1 − D(G(x)))
  • in Equation 3, D(y) may be the activation value for the original image y, and D(G(x)) may be the activation value for the first corrected image G(x).
  • the learning unit 150 may perform machine learning on the naturalness discrimination model 151 by labeling D(y) as 1 and D(G(x)) as 0. That is, the learning unit 150 may perform machine learning on the naturalness determination model 151 so that the naturalness determination model 151 determines the first corrected image as an unnatural image.
  • the learning unit 150 may determine the weight of the naturalness discrimination model 151 based on the machine learning result.
  • the weight of the naturalness determination model 151 may be a weight of the convolutional neural network included in the naturalness determination model 151 .
  • the learning unit 150 may transmit the machine learning execution result to the image correcting unit 120 .
  • the image corrector 120 may receive a machine learning execution result from the naturalness determination model 151 .
  • the naturalness discrimination model 151 and the image tone improvement model 121 may have the relationship of a discriminator and a generator in a GAN. Accordingly, when the learning unit 150 performs machine learning on the naturalness discrimination model 151, the image correcting unit 120 may perform machine learning on the image tone improvement model 121 in response thereto.
  • the image corrector 120 may perform machine learning on the image tone improvement model 121 to naturally correct the input image.
  • the image corrector 120 may perform machine learning on the image tone improvement model 121 based on the loss function of Equation 4 below.
  • [Equation 4] L_2 = −log D(G(x))
  • the image corrector 120 may perform machine learning on the image tone improvement model 121 so that the D(G(x)) value approaches 1. That is, the image corrector 120 may perform machine learning on the image tone improvement model 121 so that the naturalness determination model 151 determines the first corrected image as a natural image. Machine learning of the naturalness discrimination model 151 and machine learning of the image tone improvement model 121 may be performed alternately. The image corrector 120 may determine the weight of the image tone improvement model 121 based on the machine learning result.
  • the weight of the image tone improvement model 121 may be a weight of a convolutional neural network included in the image tone improvement model 121 .
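The alternating updates of Equations 3 and 4 can be sketched as follows. Using binary cross-entropy with the 1/0 labels described above is an assumption consistent with standard GAN training:

```python
import torch
import torch.nn.functional as F

def adversarial_step(G, D, opt_G, opt_D, x, y):
    """One alternating GAN update. x: input image batch, y: original images."""
    # Equation 3 -- discriminator: label D(y) as 1 and D(G(x)) as 0.
    pred_real = D(y)
    pred_fake = D(G(x).detach())           # detach: do not update G here
    loss_D = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) \
           + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Equation 4 -- second loss of the generator: push D(G(x)) toward 1.
    pred = D(G(x))
    loss_G = F.binary_cross_entropy(pred, torch.ones_like(pred))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```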
  • the learner 150 may generate a second corrected image based on the first corrected image.
  • the learner 150 may generate a second corrected image using the inverse improvement model 152 . This can be described in detail as follows.
  • FIG. 4 is a conceptual diagram of an inverse improvement model according to an embodiment of the present invention.
  • the inverse improvement model 400 of FIG. 4 may be configured the same as or similar to the inverse improvement model 152 of FIG. 1 .
  • the inverse improvement model 400 may be a convolutional neural network and may include a plurality of convolutional layers 401 to 405 .
  • the inverse enhancement model 400 may perform convolution on the first corrected image using the convolutional layers 401 to 405 .
  • the inverse enhancement model 400 may generate a second corrected image by performing convolution on the first corrected image.
  • the second corrected image may be an image in which color and naturalness correction effects are removed from the first corrected image.
  • the number of layers of the inverse enhancement model 400 may be less than or equal to the number of layers of the correction model (e.g., the image tone improvement model 121 of the image corrector 120 of FIG. 1). Accordingly, the second corrected image may be an image in which the color and naturalness correction effects are removed from the first corrected image, while the artifact removal effect of the first corrected image is maintained.
  • the learning unit 150 may perform machine learning on the inverse improvement model 152 based on the second corrected image and the input image.
  • the learning unit 150 may learn the inverse enhancement model 152 so that the inverse enhancement model 152 removes the color correction effect and the naturalness correction effect among the correction effects of the first corrected image.
  • the learning unit 150 may perform machine learning on the inverse improvement model 152 based on the loss function of Equation 5 below.
  • [Equation 5] L_C = Σ_p (C(G(x))_p − x_p)²
  • in Equation 5, L_C may be the loss function of the inverse enhancement model 152, C(G(x)) may be the second corrected image, and x may be the input image.
  • the learner 150 may perform machine learning on the inverse enhancement model 152 in a manner that minimizes the sum of the squares of the pixel value difference between the second corrected image and the input image.
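A sketch of the inverse improvement model and its Equation 5 update follows. The five-layer structure follows FIG. 4, while the channel count and kernel sizes are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class InverseEnhancer(nn.Module):
    """Five plain convolutional layers (401-405): shallower than the generator,
    so it can undo color/naturalness correction without re-creating artifacts."""
    def __init__(self, ch=32):
        super().__init__()
        body = [nn.Conv2d(3, ch, 3, padding=1), nn.ReLU()]
        for _ in range(3):
            body += [nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU()]
        body += [nn.Conv2d(ch, 3, 3, padding=1)]
        self.body = nn.Sequential(*body)

    def forward(self, x):
        return self.body(x)

def inverse_loss(C, first_corrected, x):
    """Equation 5: make the second corrected image C(G(x)) match the input x."""
    return F.mse_loss(C(first_corrected), x)
```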
  • the learning unit 150 may perform machine learning on the image tone improvement model 121 based on the second corrected image and the input image.
  • the learning unit 150 may perform machine learning in a manner that reverses the effect of the correction performed by the image tone improvement model 121 .
  • the learning unit 150 may perform machine learning on the image tone improvement model 121 based on Equation 6 below.
  • [Equation 6] L_3 = Σ_p (C(G(x))_p − x_p)²
  • in Equation 6, L_3 may be the third loss function of the image tone enhancement model 121; the same error as in Equation 5 may be used, but here it updates the image tone improvement model 121 rather than the inverse improvement model 152.
  • the learner 150 may perform machine learning on the image tone improvement model 121 in a manner that minimizes the sum of the squares of the pixel value difference between the second corrected image and the input image.
  • the learner 150 may determine the weight of the inverse improvement model 152 based on the machine learning result.
  • the weight of the inverse improvement model 152 may be a weight of the convolutional neural network included in the inverse improvement model 152 .
  • the learning unit 150 may transmit the machine learning result to the image correcting unit 120 .
  • the image corrector 120 may receive a machine learning result from the learner 150 .
  • the image corrector 120 may determine the weight of the image tone improvement model 121 based on the machine learning result.
  • the image corrector 120 may generate a first corrected image based on the weights on which machine learning has been performed, and may perform a test based on this.
  • the weight on which machine learning is performed may be a weight of a convolutional neural network on which machine learning is performed.
  • FIG. 5 is a flowchart of a machine learning method according to an embodiment of the present invention.
  • the image corrector may generate a first corrected image ( S510 ).
  • the image corrector may receive an input image from the image acquisition unit (e.g., the image acquisition unit 110 of FIG. 1).
  • the image corrector may generate a first corrected image by performing correction on the input image.
  • the image corrector may generate the first corrected image by using the image tone enhancement model (e.g., the image tone enhancement model 121 of FIG. 1).
  • the image corrector may perform machine learning on the first loss function of the image tone improvement model ( S520 ).
  • the learning unit (e.g., the learning unit 150 of FIG. 1) may perform machine learning based on the first corrected image and the intermediate image.
  • the intermediate image may be generated by the intermediate image generator (eg, the intermediate image generator 130 of FIG. 1 ) based on the input image.
  • the learning unit may transmit the machine learning result to the correction unit.
  • the image corrector may receive the machine learning result from the learning unit, and may perform machine learning on the first loss function of the image tone improvement model based on it.
  • the image corrector may perform machine learning on the second loss function of the image tone improvement model ( S530 ).
  • the learning unit may perform machine learning on the naturalness determination model (eg, the naturalness determination model 151 of FIG. 1 ) based on the first corrected image and the original image.
  • the learning unit may perform machine learning on the naturalness discrimination model so that the naturalness discrimination model determines the first corrected image as an unnatural image.
  • the learning unit may transmit a result of performing machine learning to the image correcting unit.
  • the image corrector may receive a machine learning execution result from the learning unit.
  • the image corrector may perform machine learning on the image tone improvement model.
  • the image corrector may perform machine learning on the second loss function of the image tone improvement model.
  • the image corrector may perform machine learning on the image tone improvement model in response to machine learning of the naturalness discrimination model.
  • the image corrector may perform machine learning on the second loss function of the image tone improvement model so that the naturalness determination model determines the first corrected image as a natural image.
  • the image corrector may learn the third loss function of the image tone improvement model ( S540 ).
  • the learner may perform machine learning on the inverse improvement model based on the first corrected image and the input image.
  • the learning unit may perform machine learning on the inverse enhancement model so that the inverse enhancement model removes a color correction effect and a naturalness correction effect among the correction effects of the first corrected image.
  • the learning unit may transmit the machine learning result to the image correcting unit.
  • the image corrector may receive the machine learning result from the learner.
  • the image corrector may perform machine learning on the third loss function of the image tone improvement model based on the machine learning result.
  • the image corrector may perform machine learning of the image tone improvement model by repeating the processes of S510 to S540.
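Putting S510 to S540 together, one training iteration might look like the sketch below. The weighting factors lam1 to lam3 stand in for the parameters of Equation 1, and their values are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_iteration(G, D, C, opt_G, opt_D, opt_C, x, y_mid, y,
                    lam1=1.0, lam2=0.1, lam3=1.0):
    """x: input image, y_mid: intermediate image, y: original natural image."""
    g = G(x)                               # S510: first corrected image

    # Update the discriminator (Equation 3) and the inverse model (Equation 5)
    # on the detached generator output.
    pred_real, pred_fake = D(y), D(g.detach())
    loss_D = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) \
           + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    loss_C = F.mse_loss(C(g.detach()), x)
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()

    # Update the generator on the combined loss of Equation 1:
    # S520 color term + S530 naturalness term + S540 artifact term.
    pred = D(g)
    loss_G = lam1 * F.mse_loss(g, y_mid) \
           + lam2 * F.binary_cross_entropy(pred, torch.ones_like(pred)) \
           + lam3 * F.mse_loss(C(g), x)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```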
  • FIG. 6 is a conceptual diagram for explaining an effect of an image processing method according to an embodiment of the present invention.
  • the first image 601 may be an input image.
  • the second image 602 may be an image corrected using only the first (color) term L_1 of Equation 1.
  • the third image 603 may be an image corrected based on the first and second (naturalness) terms L_1 and L_2 of Equation 1.
  • the fourth image 604 may be an image corrected based on all three terms of Equation 1, in which the color of the input image is corrected, the image is naturally corrected, and artifacts of the image are removed.
  • the second image 602 may be an unnatural image in which the color of the image is corrected.
  • the third image 603 may be a natural image compared to the second image 602.
  • the fourth image 604 may be a natural image compared to the third image 603 .
  • the color of the fourth image 604 may be corrected when compared to the first image 601 .
  • the image processing apparatus 100 of FIG. 1 may be configured as follows.
  • FIG. 7 is a block diagram of an image processing apparatus according to another embodiment of the present invention.
  • the image processing apparatus 700 may include at least one processor 710, a memory 720, and a transceiver 730 connected to a network to perform communication. The image processing apparatus 700 may further include an input interface device 740, an output interface device 750, a storage device 760, and the like. The components included in the image processing apparatus 700 may be connected by a bus 770 to communicate with each other. However, each of the components may instead be connected to the processor 710 through an individual interface or an individual bus rather than the common bus 770. For example, the processor 710 may be connected to at least one of the memory 720, the transceiver 730, the input interface device 740, the output interface device 750, and the storage device 760 through a dedicated interface.
  • the processor 710 may execute program instructions stored in at least one of the memory 720 and the storage device 760.
  • the processor 710 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed.
  • each of the memory 720 and the storage device 760 may be configured as at least one of a volatile storage medium and a non-volatile storage medium.
  • the memory 720 may be configured as at least one of a read-only memory (ROM) and a random access memory (RAM).
  • the methods according to the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
  • the program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.
  • Examples of computer-readable media include hardware devices specially configured to store and carry out program instructions, such as ROM, RAM, flash memory, and the like.
  • Examples of program instructions include not only machine language codes such as those generated by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like.
  • the hardware device described above may be configured to operate as at least one software module to perform the operations of the present invention, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a machine learning-based image processing method and apparatus. The machine learning-based image processing method according to the present invention may comprise the steps of: generating a first corrected image by inputting an input image to a first convolutional neural network; generating an intermediate image based on the input image; performing machine learning on a first loss function of the first convolutional neural network based on the first corrected image and the intermediate image; and performing machine learning on a second loss function of the first convolutional neural network based on the first corrected image and a natural image.
PCT/KR2020/015722 2019-12-30 2020-11-11 Machine learning-based image processing method and apparatus WO2021137415A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/770,993 US20220366539A1 (en) 2019-12-30 2020-11-11 Image processing method and apparatus based on machine learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190178401A KR102537207B1 (ko) Machine learning-based image processing method and apparatus
KR10-2019-0178401 2019-12-30

Publications (1)

Publication Number Publication Date
WO2021137415A1 true WO2021137415A1 (fr) 2021-07-08

Family

ID=76687092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015722 WO2021137415A1 (fr) Machine learning-based image processing method and apparatus

Country Status (3)

Country Link
US (1) US20220366539A1 (fr)
KR (1) KR102537207B1 (fr)
WO (1) WO2021137415A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230125948A (ko) 2022-02-22 2023-08-29 김바올 Method and apparatus for authenticating an uncorrected image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170070715A * 2015-12-14 2017-06-22 삼성전자주식회사 Deep-learning-based image processing apparatus and method, and learning apparatus
KR101871098B1 * 2017-01-12 2018-06-25 포항공과대학교 산학협력단 Image processing method and apparatus
JP2019125014A * 2018-01-12 2019-07-25 コニカミノルタ株式会社 Learning device, learning method, and learning program
JP6569047B1 * 2018-11-28 2019-09-04 株式会社ツバサファクトリー Learning method, computer program, classifier, and generator

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SON HYEONGSEOK, LEE GUNHEE, CHO SUNGHYUN, LEE SEUNGYONG: "Naturalness‐Preserving Image Tone Enhancement Using Generative Adversarial Networks", COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS, WILEY-BLACKWELL, OXFORD, vol. 38, no. 7, 1 October 2019 (2019-10-01), Oxford, pages 277 - 285, XP055826258, ISSN: 0167-7055, DOI: 10.1111/cgf.13836 *
ZHAO HANG, GALLO ORAZIO, FROSIO IURI, KAUTZ JAN: "Loss Functions for Image Restoration With Neural Networks", IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, vol. 3, no. 1, 1 March 2017 (2017-03-01), pages 47 - 57, XP055796352, DOI: 10.1109/TCI.2016.2644865 *

Also Published As

Publication number Publication date
KR102537207B1 (ko) 2023-05-25
US20220366539A1 (en) 2022-11-17
KR20210085403A (ko) 2021-07-08

Similar Documents

Publication Publication Date Title
  • WO2018217019A1 Device for detecting a variant malicious code on the basis of neural network learning, method therefor, and computer-readable recording medium in which a program for executing the method is recorded
  • WO2022092900A1 Method and device for training a super-resolution network
  • WO2021080145A1 Apparatus and method for image inpainting
  • WO2013103184A1 Apparatus and method for enhancing an image using color channels
  • WO2010038941A2 Apparatus and method for obtaining a high-resolution image
  • WO2019164237A1 Method and device for performing deep learning computation by using a systolic array
  • EP3963516A1 Teaching GANs (generative adversarial networks) to generate per-pixel annotation
  • WO2022045485A1 Apparatus and method for generating a speech video that creates landmarks together
  • WO2022169035A1 Image combining apparatus and method for improving image quality
  • WO2022131497A1 Learning apparatus and method for image generation, and image generation apparatus and method
  • WO2021137415A1 Machine learning-based image processing method and apparatus
  • EP3942481A1 Method of performing, by an electronic device, a convolution operation at a given layer in a neural network, and electronic device therefor
  • WO2022045495A1 Depth map reconstruction methods and electronic computing device for implementing them
  • WO2020233089A1 Test set creation method and apparatus, terminal, and computer-readable storage medium
  • WO2019216513A1 Row-by-row neural processor and data processing method using the same
  • WO2020159016A1 Method for optimizing neural network parameters suitable for hardware implementation, neural network operation method, and apparatus therefor
  • WO2022146050A1 Federated artificial intelligence training method and system for depression diagnosis
  • WO2019039757A1 Device and method for generating training data, and computer program stored in a computer-readable recording medium
  • WO2022169036A1 Image synthesis apparatus and method for improving image quality
  • WO2019117393A1 Learning apparatus and method for depth information generation, depth information generation apparatus and method, and recording medium related thereto
  • WO2022154523A1 Method and device for matching three-dimensional oral scan data via deep-learning-based 3D feature detection
  • WO2021172674A1 Apparatus and method for generating a video summary through recursive graph modeling
  • WO2022004970A1 Neural-network-based key point training apparatus and method
  • WO2020204610A1 Deep-learning-based coloring method, system, and program
  • WO2022255523A1 Method and apparatus for restoring a multi-scale object image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911172

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911172

Country of ref document: EP

Kind code of ref document: A1