WO2022019566A1 - Method for analyzing visualization map for improvement of image transform performance

Info

Publication number: WO2022019566A1
Application number: PCT/KR2021/009071
Authority: WO
Inventors: 박지은; 이진호; 이광희
Original assignee: 펄스나인 주식회사
Priority date: 2020-07-20
Filing date: 2021-07-14
Publication date: 2022-01-27

Abstract

The present invention relates to a method for analyzing a visualization map for the improvement of an image transform performance, the method comprising the steps of: receiving at least one image as an input, and extracting respective feature maps by means of the performance of an image transform algorithm; calculating the mapping relationship between the feature maps, during an image transform step; extracting a visualization map on the basis of the calculated mapping relationship; and modifying the extracted visualization map so as to finally transform into an image having a desired shape. According to the present invention described above, a visualization map can be extracted by calculating the mapping relationship between feature maps, during an image transform step according to the performance of an image transform algorithm, and the extracted visualization map can be modified to finally transform into an image having a desired shape.

Description

Visualization Map Analysis Method to Improve Image Conversion Performance

The present invention relates to a visualization map analysis method and, more particularly, to a visualization map analysis method for improving image conversion performance that, in image conversion using artificial intelligence, introduces an image conversion algorithm so that an image can ultimately be converted into a desired image.

In general, deep learning is defined as a set of machine learning algorithms that attempt high-level abstraction through a combination of several nonlinear transformation methods.

Much research is being conducted on representing given data in a form that a computer can process (e.g., representing the pixel information of an image as a column vector) and applying it to learning.

Various deep learning techniques such as deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) have been applied to fields such as speech signal processing, natural language processing, and image (video) processing, and high-performance application programs are being developed as a result.

As shown in FIG. 1, neural style transfer receives source images and restores an image (or creates a new image) from the feature maps extracted using the CNN 110. For example, new image content C is created by synthesizing image A and image B.

In this series of processes, performance (style expressiveness) deteriorates because of a strong positional preference when features are extracted in neural style transfer. In general, the expressiveness of a style is evaluated by the user's subjective judgment. However, it is difficult to guarantee the reliability of such evaluation results because each user applies different criteria. It is therefore necessary to quantify and explain the subjective judgment of style expressiveness more objectively.

Meanwhile, Korean Patent Application Laid-Open No. 10-2019-0062481 (Patent Document 1) discloses a system for executing a CNN. In that system, a hardware processor is programmed by executable instructions to: receive input activation maps of a convolutional layer, the input activation maps being in a default input activation map layout; reorder pixel values of the input activation maps from the default layout into an interleaved input activation map layout comprising a plurality of clusters of input activation map pixels; and determine output activation maps of the convolutional layer from a plurality of kernel tiles and the plurality of clusters of input activation map pixels, the output activation maps being in an interleaved output activation map layout comprising a plurality of clusters of output activation map pixels.

Patent Document 1 has the advantage that pixel values of the input activation maps of a convolutional layer can be rearranged into an interleaved layout including a plurality of clusters of input activation map pixels, and that the output activation maps can be determined tile by tile using the clusters of input activation map pixels and the kernels. However, it provides no function for calculating the mapping relationship between a content feature map and a style feature map, extracting that relationship as an activation map (a probability map or the like), and modifying the extracted activation map. Consequently, when an unwanted style area is referenced, it cannot be corrected.

The present invention was devised in view of the above. Its object is to provide a visualization map analysis method for improving image conversion performance in which, during image conversion by an image conversion algorithm, a visualization map is extracted by calculating the mapping relationship between feature maps, and the extracted visualization map is modified so that the image is finally converted into an image of a desired shape.

In order to achieve the above object, a visualization map analysis method for improving image conversion performance according to the present invention includes the steps of:

a) receiving at least one image and extracting respective feature maps by performing an image conversion algorithm;

b) calculating a mapping relationship between the feature maps during the image conversion process;

c) extracting a visualization map based on the calculated mapping relationship; and

d) modifying the extracted visualization map to finally convert the image into an image of a desired shape.

Here, in calculating the mapping relationship between the feature maps in step b), a specific region may be set in the feature map of one reference image, and a specific region of each feature map other than that of the reference image may be mapped to each set region and visualized.

In addition, when the image conversion algorithm in step a) is a style transfer algorithm, the similarity between the feature map of the content image and the feature map of the style image may be calculated, and the style (reference) area used for a selected part of the content (target) data may be visualized.

In addition, when the image conversion algorithm in step a) is GAN-based image-to-image translation, the similarity may be calculated using a self-attention GAN, and the style (reference) area used for a selected part of the content (target) data may be visualized.

In addition, prior to step a), the method may further include classifying content (target) data using a pre-trained classification machine learning model.

In this case, if the content (target) data is a photograph, the method may further include segmenting the content (target) data using a semantic segmentation or object detection algorithm.

In addition, in modifying the extracted visualization map in step d), the visualization map may be visualized in a polygonal form and modified, or several pairs of visualization maps may be modified simultaneously by providing a plurality of mapping areas.

In addition, in modifying the extracted visualization map in step d), the visualization map may be modified by exchanging the feature map of the mapping area between the content (target) data and the style (reference) data.

In this case, the method may further include: drawing N simple closed curves in each of the content (target) data and the style (reference) data; and

converting the image, for the region selected in the content image, according to the style selected in the style image, following the order in which the closed curves were drawn.

According to the present invention, a visualization map can be extracted by calculating the mapping relationship between the feature maps during image conversion by an image conversion algorithm, and the image can finally be converted into an image of a desired shape by modifying the extracted visualization map.

FIG. 1 is a diagram illustrating an overview of generating a new image from feature maps extracted from original images using a CNN, based on neural style transfer.

FIG. 2 is a flowchart illustrating the execution process of a visualization map analysis method for improving image conversion performance according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating a method of changing a style by selecting a specific area of a speaker.

FIG. 4 is a diagram illustrating dividing a content image into multiple areas and converting the divided areas into multiple styles taken from a style image.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Throughout the specification, when a part "includes" a certain element, this means that other elements may be further included, rather than excluded, unless otherwise stated. In addition, terms such as "…unit", "…group", "module", and "device" used in the specification mean a unit that processes at least one function or operation, and such a unit can be implemented as hardware, software, or a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before the embodiments are described in detail, the image style transfer technique used in the present invention is briefly described to aid understanding.

Referring to FIG. 1, when arbitrary source images (e.g., images A and B) are provided, a computer system extracts feature maps using a convolutional neural network (CNN) 110. That is, a feature map is extracted for each of the source images A and B. Using the extracted feature maps, a new image (image content C) is finally generated based on the image style transfer technique.

In the series of processes described above, the image is expressed as a single vector. When the CNN model has N filters in a layer, a matrix can be defined that consists of the results of passing each pixel of the image through each filter. A Gram matrix is then defined using this matrix.

A loss function is defined for a given layer from these Gram matrices, and the content loss and the style loss are defined in turn, with weighting coefficients that set the contribution of each layer to the loss.

Finally, a total loss is obtained by weighting the content loss and the style loss. The total loss is thus a linear combination of the content loss and the style loss.
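The quantities named in this passage match the standard neural style transfer formulation of Gatys et al. Assuming that formulation applies here, they can be written as below. The symbols are chosen for illustration and are not taken from the source: $\vec{x}$ is the generated image, $\vec{p}$ and $\vec{a}$ are the content and style images, $F^l$ and $P^l$ are the feature matrices of $\vec{x}$ and $\vec{p}$ in layer $l$ (with $N_l$ filters and $M_l$ positions), $G^l$ and $A^l$ are the Gram matrices of $\vec{x}$ and $\vec{a}$, and $w_l$, $\alpha$, $\beta$ are weighting coefficients.

```latex
\begin{aligned}
F^l &\in \mathbb{R}^{N_l \times M_l}, \qquad
G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk} \\
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2 \\
\mathcal{L}_{content}(\vec{p}, \vec{x}, l) &= \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 \\
\mathcal{L}_{style}(\vec{a}, \vec{x}) &= \sum_{l} w_l E_l \\
\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) &= \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x}, l) + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})
\end{aligned}
```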

An embodiment of the present invention is described below on the basis of the above.

Referring to FIG. 2, in the visualization map analysis method for improving image conversion performance according to an embodiment of the present invention, at least one image is first received and respective feature maps are extracted by performing an image conversion algorithm (step S201). Here, when the image conversion algorithm is a style transfer algorithm, the similarity between the feature map of the content image and the feature map of the style image may be calculated, and the style (reference) area used for a selected part of the content (target) data may be visualized.

In addition, when the image conversion algorithm is GAN-based image-to-image translation, the similarity may be calculated using the self-attention GAN algorithm, and the style (reference) area used for a selected part of the content (target) data may be visualized.
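The similarity calculation for the style transfer case can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the invention: feature maps are stored as lists of channel vectors (one per spatial position), cosine similarity followed by a softmax is assumed because the text does not name a similarity measure, and the names `cosine` and `style_visualization_map` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def style_visualization_map(content_feats, style_feats, selected):
    """Return a probability map over style positions for the selected
    content positions (higher means more similar).

    content_feats / style_feats: lists of channel vectors, one per position.
    selected: non-empty list of content position indices chosen by the user.
    """
    scores = []
    for s in style_feats:
        # Average similarity between this style position and the selection.
        sims = [cosine(content_feats[i], s) for i in selected]
        scores.append(sum(sims) / len(sims))
    # Softmax turns the scores into a map that sums to 1 (an assumed choice).
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

content = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
style = [[0.9, 0.1], [0.0, 2.0], [-1.0, 0.0]]
vis_map = style_visualization_map(content, style, selected=[0])
```

Here `vis_map` holds one weight per style position, so it can be reshaped to the spatial size of the style feature map and overlaid on the style image.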

Here, the self-attention generative adversarial networks (SAGAN) algorithm is a method that supplements the local convolution structure of the existing convolutional neural network (CNN) approach by applying self-attention within generative adversarial networks (GAN). SAGAN is described in the paper "Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-Attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Long Beach, CA, USA, 2019; Volume 97, pp. 7354-7363." A detailed description is therefore omitted in this embodiment.
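For orientation, the attention map that self-attention produces can be sketched as below. This is a simplified illustration rather than the SAGAN reference implementation: the 1x1 convolutions are written as per-position linear maps, the value projection `w_h` is assumed to keep the channel count so the residual sum is valid, and all function and parameter names are hypothetical.

```python
import math

def project(x, w):
    """Per-position linear map (a 1x1 convolution).
    x: list of N channel vectors; w: list of output rows over input channels."""
    return [[sum(wi * xi for wi, xi in zip(row, pos)) for row in w] for pos in x]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_f, w_g, w_h, gamma):
    """Simplified self-attention over N positions.
    Returns (output features, attention map); attention[j][i] is how much
    position j attends to position i."""
    f, g, h = project(x, w_f), project(x, w_g), project(x, w_h)
    attention = []
    for gj in g:
        # Similarity of query position j to every key position i.
        attention.append(softmax([sum(a * b for a, b in zip(fi, gj)) for fi in f]))
    out = []
    for j, row in enumerate(attention):
        # Attention-weighted sum of value features, scaled and added to the input.
        o = [sum(row[i] * h[i][c] for i in range(len(x))) for c in range(len(h[0]))]
        out.append([gamma * oc + xc for oc, xc in zip(o, x[j])])
    return out, attention

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attention = self_attention(x, identity, identity, identity, gamma=0.0)
```

Row `attention[j]` is the quantity of interest for visualization: it shows which positions were referenced when producing position `j`.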

In addition, prior to step S201, the method may further include classifying content (target) data using a pre-trained classification machine learning model.

Also, as a method different from the series of methods described above, a method in which a user directly selects a part to be changed may be applied.

That is, the portion of the content data to be changed to the style of the style data is selected (e.g., by dragging over a specific area with a mouse). The same selection is then made in the style data at the area corresponding to the region selected in the content data, and the selected area is used as the style. If a different area of the style data is to be used, the selected area can be edited. This is described in more detail with reference to FIG. 3.
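A drag selection of this kind is commonly turned into a binary mask over the image. The sketch below assumes a rectangular drag given by its press and release coordinates; the function name `drag_to_mask` and the (row, col) convention are illustrative choices, not part of the source.

```python
def drag_to_mask(height, width, start, end):
    """Build a binary mask from a mouse drag.
    start / end: (row, col) of press and release; either corner order works."""
    r0, r1 = sorted((start[0], end[0]))
    c0, c1 = sorted((start[1], end[1]))
    # Clamp to the image bounds so drags past the edge stay valid.
    r0, c0 = max(r0, 0), max(c0, 0)
    r1, c1 = min(r1, height - 1), min(c1, width - 1)
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0
             for c in range(width)] for r in range(height)]

mask = drag_to_mask(4, 5, start=(2, 3), end=(1, 1))
```

Editing the selection then amounts to building a new mask from the new drag coordinates and using it in place of the old one.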

As shown in (A) and (B) of FIG. 3, when a user selects a region including a specific part (e.g., a speaker part) with a mouse, the style of that part is changed as shown in (C). That is, suppose the user wants to apply only the style part of (A) (the large and small speaker parts in the speaker body) while using the speaker body of (B) as the background (content). When the user selects the region including the specific part with the mouse as described above, the result in (C) uses the speaker body of (B) as the background (content) and converts it with the style of the speaker parts of (A) (the large and small speaker parts). In this case, the background, which has no distinctive features, is not converted.

As shown in (A) of FIG. 3, when a user drags over a region including the eyes, nose, or the like with a mouse, or selects a segmented part with a mouse as shown in (B), the style of that part is changed. In this case, areas without distinctive features, such as the background or ordinary skin, as opposed to the eyes, nose, mouth, and ears, are not converted.

Meanwhile, when each feature map is extracted by performing an image conversion algorithm on the input image as described above, a mapping relationship between the feature maps is calculated during the image conversion process (step S202). Here, in calculating the mapping relationship between the characteristic maps, it is possible to set a specific region in the characteristic map of one reference image, and map a specific region of the other characteristic maps other than the reference image to each set region for visualization.
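One way to picture the region-to-region mapping of step S202 is shown below. It is a sketch under assumptions: regions are given as lists of position indices, each region is summarized by its mean feature vector, and cosine similarity decides the match. None of these choices, nor the names `region_mean` and `map_regions`, come from the source.

```python
import math

def region_mean(feats, region):
    """Mean channel vector over the positions in a region."""
    n = len(region)
    return [sum(feats[p][c] for p in region) / n for c in range(len(feats[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def map_regions(ref_feats, ref_regions, other_feats, other_regions):
    """For each named region set in the reference feature map, pick the
    region of the other feature map whose mean feature is most similar."""
    mapping = {}
    for name, region in ref_regions.items():
        ref_vec = region_mean(ref_feats, region)
        mapping[name] = max(
            other_regions,
            key=lambda k: cosine(ref_vec, region_mean(other_feats, other_regions[k])),
        )
    return mapping

ref_feats = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0]]
ref_regions = {"eye": [0, 1], "skin": [2]}
other_feats = [[0.0, 3.0], [2.0, 0.0], [2.0, 0.1]]
other_regions = {"s0": [0], "s1": [1, 2]}
mapping = map_regions(ref_feats, ref_regions, other_feats, other_regions)
```

The resulting dictionary pairs each set region with one mapped region, which is the information a visualization map would display.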

Thereafter, a visualization map is extracted based on the calculated mapping relationship (step S203).

Then, the extracted visualization map is modified so that the image is finally converted into an image of a desired shape (step S204). Here, in modifying the extracted visualization map, the visualization map may be visualized in a polygonal form and modified, or several pairs of visualization maps may be modified simultaneously by providing a plurality of mapping areas.
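A polygon-based modification could look like the sketch below. The source does not say how the map is edited once the polygon is drawn, so restricting the map to the polygon and renormalizing is an assumption, as are the names `point_in_polygon` and `restrict_map_to_polygon`. The inside test is the standard ray-casting algorithm.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray going right from (x, y).
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def restrict_map_to_polygon(vis_map, polygon):
    """Zero the visualization map outside the polygon and renormalize so the
    remaining weights sum to 1. Cell (row, col) is tested at its center."""
    kept = [[v if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0.0
             for c, v in enumerate(row)] for r, row in enumerate(vis_map)]
    total = sum(map(sum, kept))
    if total == 0:
        return kept  # polygon covers no mass; leave the all-zero map as is
    return [[v / total for v in row] for row in kept]

vis_map = [[0.1, 0.4], [0.3, 0.2]]
left_column = [(0, 0), (1, 0), (1, 2), (0, 2)]  # polygon in (x, y) = (col, row)
edited = restrict_map_to_polygon(vis_map, left_column)
```

With several polygons, the same function can be applied once per mapping area to edit several pairs of maps in one pass.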

In addition, in modifying the extracted visualization map, the visualization map may be modified by exchanging the feature map of the mapping area between the content (target) data and the style (reference) data.
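Read literally, exchanging the feature map of a mapping area means swapping feature vectors between paired positions. The sketch below assumes a one-to-one pairing of positions (regions of different sizes would need resampling, which is not shown); the function name `exchange_region_features` is hypothetical.

```python
def exchange_region_features(content_feats, style_feats, content_region, style_region):
    """Swap feature vectors between paired positions of two feature maps.
    content_region[k] is paired with style_region[k]; inputs are not mutated."""
    if len(content_region) != len(style_region):
        raise ValueError("regions must contain the same number of positions")
    new_content = [list(v) for v in content_feats]
    new_style = [list(v) for v in style_feats]
    for ci, si in zip(content_region, style_region):
        new_content[ci] = list(style_feats[si])
        new_style[si] = list(content_feats[ci])
    return new_content, new_style

content = [[1, 1], [2, 2], [3, 3]]
style = [[9, 9], [8, 8], [7, 7]]
new_content, new_style = exchange_region_features(content, style, [0, 2], [1, 0])
```

Positions outside the mapping area keep their original features, so only the selected area changes when the image is regenerated from the modified feature map.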

In this case, the method may further include: drawing N simple closed curves in each of the content (target) data and the style (reference) data; and converting the image, for the region selected in the content image, according to the style selected in the style image, following the order in which the closed curves were drawn.

Here, if conversion to a different style is desired, an editing function that allows a different area to be selected in the style image may be added.

In addition, a method of converting an image into a multi-style for a multi-region in a series of processes as described above may be applied.

That is, N simple closed curves are first drawn in each of the content data and the style data. Then, following the order in which the closed curves were drawn, each area selected in the content image is converted according to the style selected in the style image. This is described in more detail with reference to FIG. 4.

FIG. 4 illustrates a method in which the content image contains N areas that the user wants to convert, and the N areas are converted into N styles taken from the style image.

First, N closed curves without self-intersections (simple closed curves) are drawn, as shown in (C), in each of the content image shown in (A) and the style image shown in (B).

Then, according to the order of drawing each closed curve, the area selected in the content image is converted into an image as shown in (D) according to the style selected in the style image. Therefore, in this way, image conversion is performed N times in total.
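The ordered, N-pass conversion can be sketched as a simple loop. The per-region conversion itself is not implemented here: `transform` is a stand-in supplied by the caller, images are flattened to lists of scalar values for brevity, and all names are hypothetical.

```python
def convert_multi_region(content, style, curve_pairs, transform):
    """Apply one conversion per pair of closed curves, in drawing order.

    curve_pairs: list of (content_region, style_region), where each region is
    the set of positions enclosed by the k-th curve drawn on that image.
    transform(value, style_values) -> new value; a stand-in for the real
    style conversion, which this sketch does not implement.
    """
    result = list(content)
    for content_region, style_region in curve_pairs:  # N iterations in total
        style_values = [style[p] for p in sorted(style_region)]
        for p in content_region:
            result[p] = transform(result[p], style_values)
    return result

# Stand-in transform: replace the value by the mean of the selected style area.
def mean_of_style(_value, style_values):
    return sum(style_values) / len(style_values)

content = [0.0, 0.0, 0.0, 0.0]
style = [1.0, 3.0, 10.0, 20.0]
pairs = [({0, 1}, {0, 1}), ({1, 2}, {2, 3})]  # second curve overlaps the first
converted = convert_multi_region(content, style, pairs, mean_of_style)
```

Because the pairs are processed in drawing order, a later curve overrides an earlier one where they overlap, which is why the order of drawing matters.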

As described above, the visualization map analysis method for improving image conversion performance according to the present invention extracts a visualization map by calculating the mapping relationship between feature maps during image conversion by an image conversion algorithm, and has the advantage that the image can finally be converted into an image of a desired shape by modifying the extracted visualization map.

In addition, the visualization map analysis method for improving image conversion performance of the present invention can be applied not only to art image conversion but also to photographic image conversion.

Although the present invention has been described in detail through preferred embodiments, the present invention is not limited thereto, and it will be apparent to those of ordinary skill in the art that various changes and applications can be made without departing from the technical spirit of the present invention. Accordingly, the true scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the equivalent scope should be construed as being included in the scope of the present invention.

Claims

1. A visualization map analysis method for improving image conversion performance, comprising the steps of:

a) receiving at least one image and extracting respective feature maps by performing an image conversion algorithm;

b) calculating a mapping relationship between the feature maps during the image conversion process;

c) extracting a visualization map based on the calculated mapping relationship; and

d) modifying the extracted visualization map to finally convert the image into an image of a desired shape.
2. The method of claim 1,

wherein, in calculating the mapping relationship between the feature maps in step b), a specific region is set in the feature map of one reference image, and a specific region of each feature map other than that of the reference image is mapped to each set region and visualized.
3. The method of claim 1,

wherein, when the image conversion algorithm in step a) is a style transfer algorithm, the similarity between the feature map of the content image and the feature map of the style image is calculated, and the style (reference) area used for a selected part of the content (target) data is visualized.
4. The method of claim 1,

wherein, when the image conversion algorithm in step a) is image-to-image translation based on generative adversarial networks (GAN), the similarity is calculated using a self-attention GAN, and the style (reference) area used for a selected part of the content (target) data is visualized.
5. The method of claim 3 or 4,

further comprising the step of classifying content (target) data using a pre-trained classification machine learning model prior to step a).
6. The method of claim 5,

further comprising the step of, when the content (target) data is a photograph, segmenting the content (target) data using a semantic segmentation or object detection algorithm.
7. The method of claim 1,

wherein, in modifying the extracted visualization map in step d), the visualization map is visualized in a polygonal form and modified, or several pairs of visualization maps are modified simultaneously by providing a plurality of mapping areas.
8. The method of claim 1,

wherein, in modifying the extracted visualization map in step d), the visualization map is modified by exchanging the feature map of the mapping area between the content (target) data and the style (reference) data.
9. The method of claim 8,

further comprising the steps of:

drawing N simple closed curves in each of the content (target) data and the style (reference) data; and

converting the image, for the region selected in the content image, according to the style selected in the style image, following the order in which the closed curves are drawn.