WO2019196718A1 - Element image generation method, device and system - Google Patents

Element image generation method, device and system

Info

Publication number
WO2019196718A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
generating
element image
feature
Prior art date
Application number
PCT/CN2019/081217
Other languages
French (fr)
Chinese (zh)
Inventor
孙东慧
张庆
唐浩超
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2019196718A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Definitions

  • the present application relates to the field of computer technology, and in particular, to an element image generation method, apparatus, and system.
  • When constructing an image library of graphic elements (referred to as an element image library), it is often necessary to create a series of element images of the same style, and the designer must design and produce each element image in the element image library one by one, which is time-consuming and labor-intensive.
  • Taking Chinese character libraries as an example, the character set is very large: the GB2312 national standard contains 6,763 commonly used Chinese characters, the GBK encoding contains 21,886 Chinese characters, and the latest GB18030 national standard contains more than 70,044 Chinese characters. Since every new font is designed and produced character by character by the font designer so that all characters share the same style, the workload is very heavy.
  • The embodiments of the present application provide an element image generation method and apparatus, aiming to generate element images efficiently and accurately, so as to improve the efficiency of constructing an element image library and the accuracy of the constructed element images.
  • an embodiment of the present application provides an element image generating method, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • the method is performed by an element image generating module;
  • the element image generating module includes an encoding sub-module, and the encoding sub-module includes M levels of coding units connected stage by stage, where M is a natural number;
  • the step of generating a first feature map based on the initial element image is performed by the first-level coding unit;
  • the step of generating the second feature map is performed by the second-level to Mth-level coding units;
  • the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one coding unit among the second-level to Mth-level coding units.
  • the method further includes:
  • the parameters in the element image generating module and the discriminating module are alternately adjusted according to the degree of difference until the degree of difference satisfies a preset condition.
  • determining a degree of difference between the generated image and the real sample image includes:
  • a discrimination result is output according to a calculation result of the loss function, the discrimination result being used to reflect a degree of difference between the generated image and the real sample image.
  • the loss function includes a feature space loss function for reflecting the difference in feature space between the generated image and the real sample image.
  • the loss function includes an adversarial loss function for reflecting the degree to which the element image generating module contributes to reducing the difference and the degree to which the discriminating module contributes to increasing the difference.
  • the loss function comprises at least one of the following:
  • a pixel space loss function for reflecting the difference between the generated image and the real sample image at corresponding pixel points;
  • a category loss function for reflecting the difference in category between the generated image and the real sample image.
  • the method further includes:
  • the generated image is tested to determine the degree to which it matches the real sample image.
  • testing the generated image produced by the element image generation model and determining its degree of matching with the real sample image includes:
  • determining the structural similarity (SSIM) between the generated image and the real sample image.
  • the method further includes:
  • the element image is a text font.
  • an embodiment of the present application provides an element image generating apparatus, where the apparatus includes:
  • a feature map first generating unit, configured to generate a first feature map based on the initial element image;
  • a feature map second generating unit, configured to generate a second feature map based on the first feature map;
  • a target element image generating unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit is further configured to:
  • generate the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
  • An embodiment of the present application provides an element image generating system, where the system includes an element image generating module; the element image generating module includes an encoding sub-module and a decoding sub-module, the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second-level coding unit to an Mth-level coding unit, configured to generate a second feature map based on the feature map generated by the previous-level coding unit;
  • a decoding sub-module configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps.
  • the system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image to form an element image pair
  • the degree of difference is used to alternately adjust parameters in the element image generating module and the discriminating module such that the degree of difference satisfies a preset condition.
  • an electronic device including:
  • a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • An embodiment of the present application provides a computer readable storage medium storing one or more programs that, when executed by an electronic device including multiple applications, cause the electronic device to perform the following operations:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • the embodiment of the present application provides a method for generating a Chinese character font image, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • the embodiment of the present application provides a text font image generating method, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • The step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps, and during the iterative execution the downsampled element image is introduced at least once as supplementary information for generating the second feature map. On this basis, a target element image corresponding to the initial element image is generated based on the first feature map and at least one of the plurality of second feature maps.
  • The technical solution provided by the embodiments of the present application can not only efficiently generate target element images of different styles from the initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
  • FIG. 1 is a schematic flowchart of an element image generating method according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an element image generation model used in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a coding unit in an element image generation model used in an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a decoding unit in an element image generation model used in an embodiment of the present application
  • FIG. 5 is a schematic flowchart of generating an image of a target element in an embodiment of the present application
  • FIG. 6 is a schematic diagram of a processing procedure of a first-level coding unit in an element image generation model used in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a processing procedure of coding units of each level in an element image generation model used in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an element image generation model training network used in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a discriminating module in an element image generation model training network used in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a model for calculating a feature space loss function according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an element image generating apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an element image generating system according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • CNN: Convolutional Neural Network.
  • Feature map: a representation of the features output by a convolutional layer, pooling layer, fully connected layer, or other layer in the network.
  • An embodiment of the present application provides an element image generating method.
  • The element images to which the embodiments of the present application apply may include various graphic elements, such as character fonts, notation symbols on a musical score, and cartoon character shapes.
  • The purpose of the embodiments of the present application is to determine a style from a small set of element image samples (it can be understood that these samples are usually designed by the user one by one and serve as the basis for determining the style features), and to automatically generate new element images corresponding to the remaining element images in the collection, so that the generated new element images are consistent with the style of the user-designed samples. In this way, element image collections of different styles can be generated efficiently and accurately, realizing automatic expansion of graphic collections in different styles.
  • the element image is embodied as a cartoon image.
  • For example, cartoon images of animals can be designed manually from the original images of some animals, and the original images together with the cartoon images (serving as real sample images) are used as input to train an element image generation model (also referred to as an element image generation module).
  • After training, the images of other animals can be input into the trained element image generation model as initial element images, and cartoon images consistent in style with the hand-designed ones are generated automatically.
  • the element image is embodied as a Chinese font.
  • Each Chinese font requires a corresponding font library.
  • One can manually design a small set of characters in the new font (for example, 1,000 characters) to determine the style of the new font, and then use the original-font images of these characters (for example, in the Song typeface) as input.
  • The new-font images of these characters are used as real sample images to train the element image generation model.
  • Afterwards, the original-font images (for example, Song) of the remaining Chinese characters are used as initial element images, and the trained element image generation model (also referred to as an element image generation module or element image generation network) generates those characters in the new font.
  • In this way, new fonts for Chinese characters can be generated efficiently, improving the efficiency of building font libraries.
  • The element image generation model used in the embodiments of the present application (embodied as the font generation model in the above application scenario) receives the initial element image multiple times (embodied as the original font in the above scenario): it accepts the original initial element image, and also accepts, as supplementary information, downsampled element images obtained by downsampling the original initial element image, thereby reducing information loss during data processing and improving the accuracy of the generated target element image (embodied as the new font in the above scenario).
  • an embodiment of the present application provides an element image generation method, which may include:
  • S101 Generate a first feature map based on the initial element image.
  • S103 Generate a second feature map based on the first feature map.
  • S105: Use the second feature map as a new first feature map, and iteratively perform step S103 to obtain a plurality of second feature maps.
  • S107: Generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • the step of generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
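  • As an illustration of steps S101 to S107, the following sketch (in PyTorch-style Python; the helper names and the choice of bilinear interpolation are illustrative assumptions, not taken from the application) shows the iterative generation of second feature maps, with the downsampled element image injected as supplementary information:

```python
import torch
import torch.nn.functional as F

def generate_feature_maps(x, coding_units):
    """Sketch of S101-S105: coding_units is a list of M stage-by-stage
    coding units; the first produces the first feature map, the rest
    iteratively produce the second feature maps."""
    feats = [coding_units[0](x)]        # S101: first feature map
    for unit in coding_units[1:]:       # S103/S105: iterate
        prev = feats[-1]
        # At least once, inject the initial image, downsampled so that its
        # spatial size matches the current feature map (supplementary info).
        x_down = F.interpolate(x, size=prev.shape[-2:], mode='bilinear',
                               align_corners=False)
        feats.append(unit(torch.cat([prev, x_down], dim=1)))
    return feats                        # consumed by the decoder in S107
```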
  • the initial element image can be determined first.
  • the determined initial element image can be understood as the basis for generating the target element image, and the generated target element image corresponds to the initial element image.
  • Depending on the application, the correspondence between the target element image and the initial element image may differ, and it is implemented by a trained element image generation model (also referred to as an element image generation module).
  • the corresponding target element image and the initial element image reflect different fonts (styles) of the same Chinese character.
  • When determining the initial element image, this can also be understood as determining a batch of images that are input into the model in a single pass according to the batch size.
  • The batch size refers to the number of images input into the model for processing at one time, and is a parameter of a model built on a neural network. For example, if the batch size is 1, only one initial element image is input into the model at a time; if the batch size is 16, then 16 initial element images are input into the model as one batch each time, and 16 target element images are generated correspondingly.
  • the above steps S101 to S107 may be performed to generate a target element image corresponding to the initial element image.
  • Specifically, the target element image corresponding to the initial element image may be generated by the trained element image generation model, based on the initial element image and the downsampled element images obtained by downsampling the initial element image.
  • the step of generating the second feature map is iteratively performed to obtain a plurality of second feature maps.
  • Moreover, during the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map, and the target element image is then generated based on the first feature map and at least one of the plurality of second feature maps.
  • The technical solution provided by the embodiments of the present application can not only efficiently generate target element images of different styles from the initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
  • In other words, the model accepts the original initial element image and also accepts, at least once, a downsampled element image obtained by downsampling it.
  • Target element images, including new element images, can thus be generated efficiently and accurately from the initial element image while reducing information loss during data processing, which helps improve the accuracy of the generated target element images.
  • In the following, an element image embodied as a text font is used as an example: a specific framework of the element image generation model (element image generation module) is illustrated, and the element image generation method provided by the embodiments of the present application is described in detail in combination with this model.
  • FIG. 2 is a schematic diagram showing the framework of an element image generation model suitable for the embodiment of the present application.
  • the element image generation model shown in Fig. 2 includes a generation module including an encoding sub-module (Encoder) and a decoding sub-module (Decoder).
  • the input of the model is an initial element image
  • the output of the model is a target element image corresponding to the initial element image.
  • The input initial element image is text in the original font (for example, the " " character in bold in FIG. 2), and the output target element image is the same text in the new font (for example, the " " character in the new typeface in FIG. 2).
  • The encoding sub-module in the above element image generation model is used to convert the initial element image into high-dimensional feature maps, and the decoding sub-module is used to convert the high-dimensional feature maps into a new image, that is, the output target element image. It can be understood that the data processing performed by the encoding sub-module and that performed by the decoding sub-module are symmetric and mutually inverse.
  • The coding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number. Taking the model shown in FIG. 2 as an example, the value of M is 8.
  • the step of generating the first feature map based on the initial element image is performed by the first level coding unit (such as the coding unit e1 in FIG. 2).
  • The first-level coding unit generates a first feature map based on the initial element image and outputs the first feature map to the second-level coding unit (such as coding unit e2 in FIG. 2).
  • the first level coding unit may also output the first feature map to the decoding unit corresponding to the first level coding unit in the decoding submodule (such as the decoding unit d7 in FIG. 2).
  • The steps of generating the second feature map based on the first feature map (and the downsampled element image) in steps S103 to S105 above are performed by the second-level to Mth-level coding units. Moreover, when generating the second feature map, at least one coding unit among the second-level to Mth-level coding units (for example, at least one of coding units e2 to e8 in FIG. 2) accepts, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained from the initial element image, whose spatial size matches that of the feature map output by the previous-level coding unit.
  • Each coding unit in the coding sub-module has the same structure and is connected stage by stage in order, and each coding unit outputs a feature map to the next-level coding unit (the feature map output by the last-level coding unit is accepted by the first-level decoding unit).
  • The coding unit may be composed of several convolutional layers (Conv), leaky rectified linear unit layers (LReLU), and batch normalization layers (BN).
  • The image or feature map input to the coding unit may be processed and output through the leaky rectified linear unit layer LReLU, the convolutional layer Conv, and the batch normalization layer BN, as shown in FIG. 3.
  • In practice, the number of layers and their order can be adjusted: the convolutional layer Conv and the leaky rectified linear unit layer LReLU are usually paired (for example, arranged as Conv-LReLU-Conv-LReLU or LReLU-Conv-LReLU-Conv), and the batch normalization layer BN can be placed at the end of the coding unit.
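  • As a minimal sketch, such a coding unit could be written as follows in PyTorch (the kernel size, stride, and LReLU slope are assumptions consistent with the halving of spatial size described below, not values fixed by the application):

```python
import torch.nn as nn

class CodingUnit(nn.Module):
    """One encoder stage in the LReLU-Conv-BN arrangement; the stride-2
    convolution halves both the width and the height of the feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)
```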
  • Each decoding unit in the decoding sub-module has the same structure and is likewise connected stage by stage in order, and the number of decoding units is the same as the number of coding units.
  • The decoding unit may be composed of a rectified linear unit layer (ReLU), a deconvolution layer (Deconv), a batch normalization layer (BN), and a dropout layer (Dropout).
  • The feature map input to the decoding unit may be processed and output through the rectified linear unit layer ReLU, the deconvolution layer Deconv, the batch normalization layer BN, and the dropout layer Dropout, as shown in FIG. 4. Similarly, the number of layers and their order can be adjusted while still processing the feature maps and finally generating the target element image.
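  • Correspondingly, a decoding unit might look like the following sketch (again with assumed hyperparameters; the stride-2 transposed convolution matches the doubling of spatial size described below):

```python
import torch.nn as nn

class DecodingUnit(nn.Module):
    """One decoder stage in the ReLU-Deconv-BN-Dropout arrangement; the
    stride-2 transposed convolution doubles the width and the height."""
    def __init__(self, in_ch, out_ch, p_drop=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(),
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2,
                               padding=1),
            nn.BatchNorm2d(out_ch),
            nn.Dropout(p_drop),
        )

    def forward(self, x):
        return self.block(x)
```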
  • Each time a feature map is processed by one level of coding unit, the spatial size of the output feature map is reduced to half of that of the input feature map (specifically, the width of the feature map is halved and the height is halved).
  • Correspondingly, each time a feature map is processed by one level of decoding unit, the spatial size of the output feature map is increased to twice that of the input feature map (specifically, the width of the feature map is doubled and the height is doubled).
  • A coding unit and a decoding unit that satisfy a certain condition correspond to each other and may be called a symmetric pair. Specifically, a coding unit and a decoding unit whose output feature maps have the same spatial size have a corresponding relationship.
  • The size representation of an image can be understood as follows: an image of size 256×256×3 has 256 pixels in both width and height and is represented in the RGB color model, so each pixel is described by three-dimensional attribute data.
  • The spatial sizes of the feature maps output by the 8 levels of coding units in the coding sub-module shown in FIG. 2 are listed in Table 1; following the halving rule they are: e1: 128×128, e2: 64×64, e3: 32×32, e4: 16×16, e5: 8×8, e6: 4×4, e7: 2×2, e8: 1×1.
  • That is, the spatial size changes from the 256×256 of the initial element image to 128×128 after coding unit e1 (the first-level coding unit), then to 64×64 after coding unit e2 (the second-level coding unit), and so on, until it reaches 1×1 after coding unit e8 (the eighth-level coding unit).
  • The number of convolution kernels in each coding unit may differ, resulting in different channel counts in each level's output; preferably, the channel count increases with the level of the coding unit.
  • Similarly, the spatial sizes of the feature maps output by the 8 levels of decoding units in the decoding sub-module shown in FIG. 2 are listed in Table 2; following the doubling rule they are: d1: 2×2, d2: 4×4, d3: 8×8, d4: 16×16, d5: 32×32, d6: 64×64, d7: 128×128, d8: 256×256.
  • the structure of the element image generation model and the correspondence between the coding unit and the decoding unit are briefly explained above.
  • The following describes how, using the trained element image generation model, a target element image corresponding to the initial element image is generated from the initial element image and the downsampled element images obtained by downsampling it; this may specifically include the following steps, as shown in FIG. 5:
  • S1031: Input the initial element image and the downsampled element images into the encoding sub-module.
  • In addition to the initial element image, the downsampled element images obtained by downsampling it are also input into the encoding sub-module, which reduces information loss during data processing and thus yields a more accurate target element image.
  • the first level coding unit (for example, the coding unit e1 in FIG. 2) can directly accept the original initial element image, as shown in FIG. 6.
  • Any Nth-level coding unit after the first level (N being a natural number greater than 1 and not greater than M) may accept, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained by downsampling the initial element image, as shown in FIG. 7.
  • The spatial size of the downsampled element image that a coding unit can accept corresponds to that coding unit; specifically, it should be consistent with the spatial size of the feature map that the coding unit receives from the previous level, so that the coding unit can fuse the two and then perform subsequent data processing.
  • The downsampled element images can be input into every coding unit after the first level (for example, coding units e2 to e8 in FIG. 2), or only into some of the coding units (for example, coding units e2 and e7 in FIG. 2), as long as the spatial size of each downsampled element image is consistent with the spatial size of the previous-level feature map received by the coding unit it enters.
  • For example, the spatial size of the feature map output by coding unit e1 to coding unit e2 is 128×128, so the downsampled element image input into coding unit e2 should also be processed to a spatial size of 128×128.
  • In this way, layers at different depths of the model are supplemented with information from the initial element image at the corresponding scale, which facilitates generating high-quality target element images.
  • The initial element image may be downsampled using any of several methods, such as bilinear interpolation, linear interpolation, or nearest-neighbor interpolation; this application does not limit the method.
  • The coding unit fuses the received feature map with the downsampled element image for subsequent processing.
  • The fusion can be performed in various ways, for example, concatenating along a specific dimension or adding the attribute values of corresponding pixel points.
  • In the embodiment of the present application, the fusion is performed by concatenation along the channel dimension of the feature map.
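  • Concretely, downsampling the initial element image to the required spatial size and fusing it with the previous-level feature map along the channel dimension can be sketched as follows (the interpolation mode and the (B, C, H, W) tensor layout are assumptions):

```python
import torch
import torch.nn.functional as F

def fuse_with_downsampled(feat, image):
    """feat: (B, C, H, W) feature map from the previous-level coding unit;
    image: (B, C_img, 256, 256) initial element image. Returns the
    channel-wise concatenation used as the coding unit's input."""
    image_down = F.interpolate(image, size=feat.shape[-2:], mode='bilinear',
                               align_corners=False)
    return torch.cat([feat, image_down], dim=1)   # concat on channel dim
```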
  • S1033 Output multiple feature maps to the decoding submodule by using the encoding submodule.
  • the coding unit and the decoding unit having the same spatial size of the output feature map have a corresponding relationship, and constitute a symmetric unit pair, for example, ⁇ e1, d7>, ⁇ e3, d5>, and the like.
  • The output of a coding unit is directly connected to its corresponding decoding unit. Therefore, when outputting its feature map, each coding unit in the coding sub-module outputs the generated feature map both to its next-level coding unit (the feature map generated by the last-level coding unit is output to the first-level decoding unit) and to its corresponding decoding unit.
  • Specifically, the first-level coding unit (embodied as coding unit e1) generates a first feature map from the initial element image; the first-level coding unit (e1) outputs the first feature map to the second-level coding unit (embodied as coding unit e2) and to the decoding unit corresponding to the first-level coding unit (embodied as decoding unit d7);
  • the Kth-level coding unit in the coding sub-module (embodied as any of coding units e2 to e7, for example coding unit e3) generates a third feature map from the downsampled element image and the second feature map output by the (K-1)th-level coding unit (correspondingly, coding unit e2); the Kth-level coding unit (for example, coding unit e3) outputs the third feature map to the (K+1)th-level coding unit (correspondingly, coding unit e4) and to the decoding unit corresponding to the Kth-level coding unit (correspondingly, decoding unit d5); where K is a natural number greater than 1 and less than M, and the downsampled element image matches the second feature map in spatial size.
  • S1035 Generate a target element image corresponding to the initial element image according to the plurality of feature maps by using the decoding submodule.
  • Each level of decoding unit in the decoding sub-module processes the feature map received from the previous-level decoding unit, and the feature map it outputs is then fused with the feature map transmitted by the corresponding coding unit to serve as the input of the next-level decoding unit. Specifically:
  • the first-level decoding unit (embodied as decoding unit d1) generates a sixth feature map from the fifth feature map output by the last-level coding unit (embodied as coding unit e8), and outputs the sixth feature map to the second-level decoding unit (embodied as decoding unit d2);
  • the Lth-level decoding unit in the decoding sub-module (embodied as any of decoding units d2 to d7, for example decoding unit d3) generates an eighth feature map from the seventh feature map output by the (L-1)th-level decoding unit (correspondingly, decoding unit d2);
  • the Lth-level decoding unit (for example, decoding unit d3) concatenates, along the channel dimension, the eighth feature map with the ninth feature map output by the coding unit corresponding to the Lth-level decoding unit (correspondingly, coding unit e5), generating a tenth feature map, which is output to the (L+1)th-level decoding unit (correspondingly, decoding unit d4);
  • where L is a natural number greater than 1 and less than M;
  • the Mth-level decoding unit in the decoding sub-module generates the target element image corresponding to the initial element image from the eleventh feature map output by the (M-1)th-level decoding unit.
  • It can be seen that a coding unit in the coding sub-module accepts two kinds of input: the feature map output by the previous-level coding unit and the downsampled element image obtained by downsampling the initial element image, so that information from the initial element image flows in at different stages of the coding sub-module.
  • A decoding unit in the decoding sub-module accepts, in addition to the feature map output by the previous-level decoding unit, the feature map output directly by the corresponding coding unit in the coding sub-module, which further reduces information loss during image data processing.
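  • Putting the pieces together, the generator's forward pass might be sketched as follows (a hypothetical M=8 arrangement consistent with FIG. 2; channel counts and which units receive downsampled inputs are assumptions):

```python
import torch
import torch.nn.functional as F

def generator_forward(x, encoders, decoders):
    """x: (B, C, 256, 256) initial element image; encoders e1..e8 each halve
    the spatial size, decoders d1..d8 each double it."""
    feats = [encoders[0](x)]                      # e1 -> first feature map
    for enc in encoders[1:]:                      # e2..e8
        prev = feats[-1]
        x_down = F.interpolate(x, size=prev.shape[-2:], mode='bilinear',
                               align_corners=False)   # supplementary info
        feats.append(enc(torch.cat([prev, x_down], dim=1)))

    out = decoders[0](feats[-1])                  # d1 processes e8's output
    for dec, skip in zip(decoders[1:], feats[-2::-1]):   # d2..d8 with e7..e1
        # fuse the previous decoder output with the symmetric-pair encoder
        # feature map on the channel dimension
        out = dec(torch.cat([out, skip], dim=1))
    return out                                    # 256x256 target element image
```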
  • As described above, the trained element image generation model may be used to generate a target element image corresponding to the initial element image: the model generates the target element image from the initial element image and the downsampled element images obtained by downsampling the initial element image.
  • On the one hand, the target element image can be generated automatically from the initial element image, so that target element images of different styles can be expanded efficiently from initial element images, improving the efficiency of constructing an element image library.
  • On the other hand, the downsampled element images obtained by downsampling the initial element image are accepted as supplementary information when generating the feature maps, which reduces information loss during data processing and benefits the accuracy of the generated target element image.
  • the above example illustrates a specific implementation example of how to generate a target element image using the trained element image generation model.
  • the element image generation network including the element image generation module and the discrimination module may be specifically trained, as shown in FIG. 8.
  • The element image generating module (hereinafter simply referred to as the generating module) is configured to generate, based on an original image, a generated image corresponding to it; the discriminating module is configured to discriminate the degree of difference between the generated image and the real sample image, and the parameters of the element image generation network are adjusted according to this degree of difference; the real sample image corresponds to the original image, and the two constitute an element image pair.
  • the discriminating module determines whether the input image is a real sample image or a generated image generated by the model.
  • The structure of the discriminating module can be as shown in FIG. 9: the generated image and the real sample image enter the discriminating module, whose first two layers are a convolutional layer Conv and a leaky rectified linear unit layer LReLU, followed by three [Conv - BN (batch normalization) - LReLU] blocks connected in series.
  • the process of training the element image generation network may specifically include:
  • the generated image and the real sample image are input into the discriminating module as training samples, and the discriminating module determines the degree of difference between the generated image and the real sample image; the generated image is labeled as the negative class and the real sample image as the positive class;
  • the parameters in the generating module and the discriminating module are alternately adjusted until the degree of difference satisfies the preset condition.
  • The goal of the generating module is to make the generated image as realistic as possible, so that the discriminating module can be "fooled" (that is, the discriminating module considers that there is no difference between the generated image and the real sample image, or that the difference is sufficiently small).
  • The goal of the discriminating module is to correctly distinguish the real sample image from the generated image. Therefore, the generating module and the discriminating module can be trained alternately. The details are as follows:
  • The system initializes the generating module (Generator) and the discriminating module (Discriminator), denoted G0 and D0 respectively. The system accepts one batch of input images (the number of inputs equals the batch size), where each input is a pair consisting of an original image and the corresponding real sample image. The original images are then fed into the generating module G0, and after a series of data processing, new images are generated, that is, the generated images.
  • the real sample image is marked as a positive class
  • the generated image is marked as a negative class
  • the two are input as a training sample to the discriminating module D0.
  • First, the generating module G0 is held fixed, and the parameters of the discriminating module D0 are updated according to the calculation result of the loss function, so that D0 is updated to a new state denoted D1. Then D1 is held fixed, and the parameters of the generating module G0 are updated according to the calculation result of the loss function, so that G0 is updated to a new state denoted G1.
  • The generating module G and the discriminating module D are trained alternately in this way throughout the training process, until the calculation result of the loss function satisfies the preset requirement and the two reach their optimal states.
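  • The alternating procedure can be sketched as follows (a hypothetical training step assuming the discriminating module outputs a probability; the loss form and optimizers are illustrative, and the application's full loss combines several terms as described below):

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, original, real_sample):
    """One alternating update: D is updated with G fixed, then G with D fixed."""
    # Discriminator step: real sample images are the positive class,
    # generated images the negative class; detach() keeps G fixed.
    fake = G(original).detach()
    pred_real, pred_fake = D(real_sample), D(fake)
    d_loss = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
              F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: try to make D accept the generated image as real.
    pred_fake = D(G(original))
    g_loss = F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```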
  • After training, the initial element image can be input into the model in place of the original image of the training process, and the generated image produced by the model is the target element image desired by the user.
  • During training, an unsupervised learning method is adopted, in which the two neural networks, the generating module and the discriminating module, learn from each other.
  • the output of the generated module needs to mimic the real sample image in the training set as much as possible; and the purpose of the discriminating module is to distinguish the real sample image from the generated image.
  • the two modules compete against each other and constantly adjust the parameters.
  • the final purpose is to make the discriminating module unable to judge whether the output result of the generating module is true.
  • the loss function can be calculated according to the generated image and the real sample image, thereby outputting the discrimination result according to the calculation result of the loss function, reflecting the degree of difference between the generated image and the real sample image.
  • the parameters in the element image generation model can be adjusted according to the calculation result of the loss function.
  • the loss function can have various options.
  • For example, an adversarial loss function can be included.
  • The adversarial loss function is used to reflect the contribution of the generating module to reducing the difference and the contribution of the discriminating module to increasing it. It can be understood that the generating module continuously produces new generated images in the hope of passing the discriminating module's evaluation, while the discriminating module hopes to correctly separate the generated images (labeled as the negative class) from the real sample images (labeled as the positive class). Therefore, the adversarial loss function can be expressed by the following formula:

        V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 − D(G(z)))]
  • where pdata(x) represents the target data distribution, that is, the real sample images input to the discriminating module; pz(z) represents the raw data distribution, that is, the original images input to the generating module; and G(z) represents the generated image produced by the model and input into the discriminating module.
  • the generation module G and the discrimination module D are alternately trained in a mutually opposing manner.
  • Specifically, the generating module G continuously produces new generated images in an attempt to pass the evaluation of the discriminating module D, while D attempts to accurately distinguish the real sample images from the generated ones. Therefore, during training, the discriminating module D assigns higher scores to real sample images and lower scores to generated images; that is, D tends to maximize D(x) and minimize D(G(z)), and hence the discriminating module D maximizes the adversarial loss V(D, G).
  • The generating module G attempts to generate realistic images, so it tends to maximize D(G(z)), thereby minimizing the adversarial loss V(D, G).
  • In addition, a feature space loss function (also known as perceptual loss) can be introduced as a loss function in model training.
  • the feature space loss function is used to reflect the difference in feature space between the generated image and the real sample image.
  • the feature space refers to the space corresponding to the high-dimensional features rich in semantic information after the image is passed through a deep neural network.
  • the vector of the feature space contains more advanced semantic information, so it can be used to measure the difference between the generated image and the real sample image.
  • The perceptual loss can be calculated using various networks such as AlexNet, ResNet, or VGG19.
  • the VGG19 is taken as an example for detailed description.
  • VGG19 is a typical neural network model consisting of multiple convolutional layers (Conv), pooling layers, and fully connected layers (FC); the structure of its convolutional part is shown in FIG. 10.
  • The generated image and the real sample image are each passed through the VGG19 network, the features output by selected convolutional layers are extracted, and the L1 distance between the two sets of features is calculated as the perceptual loss.
  • Because the VGG19 model has many convolutional layers, there are many possible choices; it is recommended to select convolutional layers at different depths. For example, as illustrated in FIG. 10, the features output by the five convolutional layers Conv1_2, Conv2_2, Conv3_2, Conv4_2, and Conv5_2 can be selected to calculate the perceptual loss.
  • The calculation formula is expressed as follows:

        Ploss = Σ_l λ_l · ||φ_l(real) − φ_l(fake)||_1

  • where φ represents the VGG19 network model, l is a selected convolutional layer, real represents the real sample image, fake represents the generated image, λ_l is the weight given to the perceptual loss calculated at each convolutional layer, and || · ||_1 denotes the L1 distance. The overall perceptual loss Ploss is the weighted sum of the perceptual losses calculated at the selected layers.
  • The weight of each layer can be set individually. Considering that convolutional layers near the input capture low-level information while layers closer to the output capture more abstract information, it is recommended to determine the per-layer weights on the principle that the weight of a convolutional layer near the input is smaller than the weight of a convolutional layer near the output.
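  • A sketch of this perceptual loss using a pretrained VGG19 from torchvision is shown below (the layer indices correspond to Conv1_2 through Conv5_2 in torchvision's vgg19.features, and the increasing per-layer weights follow the recommendation above; both are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """Weighted sum of L1 distances between VGG19 features of the real
    sample image and the generated image at several depths."""
    LAYERS = {2: 0.05, 7: 0.1, 12: 0.2, 21: 0.3, 30: 0.35}  # assumed weights

    def __init__(self):
        super().__init__()
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, real, fake):
        loss, x, y = 0.0, real, fake
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.LAYERS:
                loss = loss + self.LAYERS[i] * F.l1_loss(x, y)
            if i >= max(self.LAYERS):
                break
        return loss
```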
  • A pixel space loss function may also be used to reflect the difference between the generated image and the real sample image at corresponding pixel points. Since the purpose of the element image generation module is to produce a generated image that resembles the real sample image as closely as possible, the difference between the two can also be measured by comparing corresponding pixels, that is, the difference in pixel space. Specifically, the L1 distance between the real sample image and the generated image can be calculated as the result of the pixel space loss function.
  • Likewise, a category loss function can be used to reflect the difference in category between the generated image and the real sample image.
  • There are several ways to train the element image generation model, including single-stage training and multi-stage training. Taking the training of a font generation model as an example, single-stage training refers to directly generating a target font from a source font.
  • Multi-stage training is divided into a pre-training phase and a re-training phase: in the pre-training phase, a source font is fixed and used to generate multiple target fonts; in the subsequent re-training phase, the source font is used to generate one target font.
  • This approach is called a one-to-many training method.
  • It is desirable that the discriminating module, in addition to judging whether an image is a real sample image or a generated image, can also correctly predict the category (the font) of the image. Therefore, a category loss function is introduced, embodied as the cross entropy between the true category and the predicted category, as shown in the following formula:

        Lcls = − Σ_c y_c · log(p_c)

  • where y_c is 1 if c is the true category and 0 otherwise, and p_c is the predicted probability of category c.
  • After training, the quality of the generated images can also be tested and evaluated.
  • the degree to which the generated image matches the real sample image can be determined based on one or more of the following metrics:
  • For example, the L1 distance between each real sample image and the corresponding generated image in the test set can be calculated and averaged. It can be understood that the smaller the L1 distance, the closer the generated image is to the real sample image, indicating higher quality of the generated image.
  • The peak signal-to-noise ratio (PSNR) between the generated image and the real sample image can also be calculated:

        PSNR(I, J) = 10 · log10(MAX² / MSE(I, J))

  • where I and J represent two images (specifically, a generated image and a real sample image), MAX is the maximum possible pixel value, and MSE is the mean squared error between I and J.
  • the structural similarity SSIM between the generated image and the real sample image can be calculated.
  • Structural Similarity SSIM measures the difference between two images (ie, the generated image and the real sample image) from three perspectives: structural correlation, contrast, and brightness.
  • The SSIM formula for calculating two images can be expressed as:

        SSIM(X, Y) = ((2·μx·μy + C1) · (2·σxy + C2)) / ((μx² + μy² + C1) · (σx² + σy² + C2))

  • where μx, μy and σx, σy are the means and standard deviations of images X and Y, and σxy is the covariance of the two.
  • C1, C2, and C3 are constants (in the combined form above, the structure term's constant C3 is typically taken as C2/2 and absorbed into the formula).
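  • The three metrics might be computed as in the following sketch (grayscale images as float arrays in [0, 1]; the global, non-windowed SSIM shown here is a simplification of the usual sliding-window formulation):

```python
import numpy as np

def l1_distance(i, j):
    return np.abs(i - j).mean()

def psnr(i, j, max_val=1.0):
    mse = ((i - j) ** 2).mean()
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # standard stabilizing constants for images normalized to [0, 1]
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```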
  • The following takes an element image embodied as a text font as an example to illustrate the evaluation of font quality.
  • First, a single-character recognition model with good recognition performance, trained on a real font data set, is prepared; the font quality evaluation below is based on this single-character recognition model.
  • In the first step, the font images generated by the embodiment of the present application (that is, the generated images) are recognized by the single-character model. If a generated character can be correctly recognized, the newly generated character is at least preliminarily correct in glyph, strokes, and structure. The portion of characters that are not correctly recognized can be filtered out based on the recognition result.
  • In the second step, the three metrics, L1 distance, PSNR, and SSIM, can be further calculated for each generated character, giving a distribution of results for each metric. The average of each distribution is then used as a threshold: for the distributions of the PSNR and SSIM results, the characters below the average are filtered out; for the distribution of the L1 distance results, the characters above the average are filtered out.
  • In the third step, a manual evaluation is performed on the generated characters that passed the screening of the first two steps; if their quality is judged to be good, the threshold of each metric can be adjusted accordingly, and the characters that pass the adjusted thresholds are determined to be of good quality.
  • In the fourth step, the total number of good-quality characters in the verification data set, or their proportion among the generated data, is measured: the larger the number and/or the higher the proportion, the higher the quality of the generated font and the better the font generation model.
  • The above method of evaluating font quality does not merely compute a metric over a sample set. Instead, it starts from the distribution of all generated characters on each metric, uses the single-character recognition model's ability to recognize glyphs, strokes, and structure for screening, and introduces manual evaluation to adjust the threshold of each metric, thereby realizing an interactive evaluation of the quality of the generated fonts that combines the advantages of subjective and objective evaluation and reflects the overall quality of the fonts generated by the model more accurately and comprehensively. On this basis, the parameters of the model can also be adjusted according to the evaluation results.
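  • The threshold screening in steps two and three might be sketched as follows (`metrics` is a hypothetical per-character record; the field names are illustrative):

```python
import numpy as np

def screen_characters(metrics, thresholds=None):
    """metrics: list of dicts like {'char': ..., 'l1': ..., 'psnr': ..., 'ssim': ...}.
    Default thresholds are the distribution means (step two); manual
    evaluation may supply adjusted thresholds instead (step three)."""
    if thresholds is None:
        thresholds = {k: float(np.mean([m[k] for m in metrics]))
                      for k in ('l1', 'psnr', 'ssim')}
    return [m for m in metrics
            if m['psnr'] >= thresholds['psnr']    # below-threshold PSNR: out
            and m['ssim'] >= thresholds['ssim']   # below-threshold SSIM: out
            and m['l1'] <= thresholds['l1']]      # above-threshold L1: out
```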
  • the element image generating method provided by the embodiment of the present application is embodied as a Chinese character font image generating method, which may include the following steps:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled Chinese font image, where the downsampled Chinese font image is obtained by downsampling the initial Chinese font image and matches the first feature map in spatial size.
  • the element image generating method is embodied as a text font image generating method, and may include the following steps:
  • the step of iteratively generating the second feature map includes, at least once:
  • The embodiment of the present application further provides an element image generating apparatus, as shown in FIG. 11.
  • a feature map first generating unit 101 configured to generate a first feature map based on the initial element image
  • a feature map second generating unit 103 configured to generate a second feature map based on the first feature map
  • the target element image generating unit 105 is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit 103 is further configured to:
  • generate the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
  • the above element image generating device corresponds to the element image generating method in the foregoing embodiment.
  • the description in the foregoing method embodiments is applicable to the device, and details are not described herein again.
  • the embodiment of the present application further provides an element image generating system.
  • the system includes an element image generating module.
  • the element image generating module includes an encoding sub-module and a decoding sub-module.
  • the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second-level coding unit to an Mth-level coding unit, configured to generate a second feature map based on the feature map generated by the previous-level coding unit;
  • a decoding sub-module configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps.
  • the above element image generating system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image and constitutes an element image pair
  • the degree of difference is used to alternately adjust the parameters in the element image generating module and the discriminating module so that the degree of difference satisfies the preset condition.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes a processor, optionally including an internal bus, a network interface, and a memory.
  • the memory may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic device may also include hardware required for other services.
  • The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one double-headed arrow is shown in Figure 13, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operating instructions.
  • the memory may include both RAM and non-volatile memory, and provides instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the RAM and runs it, forming the element image generating device at the logical level.
  • the processor executes the program stored in the memory and is specifically configured to perform the following operations: generating a first feature map based on an initial element image; generating a second feature map based on the first feature map; taking the second feature map as a new first feature map and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps; and generating a target element image corresponding to the initial element image based on the first feature map and at least one of the second feature maps;
  • where the step of iteratively generating the second feature map includes, at least once: generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
  • the method performed by the element image generating apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor, or any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the electronic device can also perform the method performed by the element image generating device in FIG. 1 and implement the functions of the element image generating device in the embodiment shown in FIG. 1, which are not described herein again.
  • the embodiment of the present application further provides a computer readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by an electronic device including a plurality of applications, cause the electronic device to perform the method performed by the element image generating apparatus in the embodiment shown in FIG. 1, and specifically to execute: generating a first feature map based on an initial element image; generating a second feature map based on the first feature map; taking the second feature map as a new first feature map and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps; and generating a target element image corresponding to the initial element image based on the first feature map and at least one of the second feature maps;
  • where the step of iteratively generating the second feature map includes, at least once: generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • these computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An element image generation method, device and system. The method comprises: generating a first feature map on the basis of an initial element image (S101); generating a second feature map on the basis of the first feature map (S103); taking the second feature map as a new first feature map (S105), and iteratively executing the step of generating the second feature map; and generating a target element image corresponding to the initial element image on the basis of the first feature map and at least one second feature map (S107). At least one execution of the step of generating the second feature map comprises: generating a second feature map on the basis of the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map. The method can efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, and can also reduce information loss during data processing, improving the accuracy of the generated target element images.

Description

Element image generation method, device and system
The present application claims priority to Chinese Patent Application No. 201810315058.3, filed on April 10, 2018 and entitled "Element image generation method, device and system", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular, to an element image generation method, apparatus, and system.
Background
With the advent of the computer and Internet era, various graphic elements, such as fonts in font libraries, cartoon characters, and fancy symbols, have become an indispensable part of people's work and life, making life more colorful. Graphic elements of different styles combine external artistic expressiveness with rich inner connotation, and have become an effective means for people to express themselves.
When constructing an image library of graphic elements (an element image library for short), it is often necessary to create a series of element images of the same style, and a designer must design and produce every element image in the library one by one, which is time-consuming and labor-intensive. Take fonts in a font library as a concrete example of element images. Chinese character libraries are very large: the GB2312 national standard code contains 6,763 commonly used Chinese characters, the GBK encoding scheme contains 21,886 Chinese characters, and the latest GB18030 national standard code contains as many as 70,244 Chinese characters. Since every new font must be designed and produced stroke by stroke by a font designer, and the work must be repeated for every character in the library so that all characters share the same style, the workload is very heavy.
Although machine learning methods have been introduced in the related art to generate fonts, the results are still unsatisfactory because too much information is lost during data processing.
Therefore, an efficient and accurate element image generation method is urgently needed, in order to improve the efficiency of constructing an element image library and the accuracy of the element images.
Summary of the invention
The embodiments of the present application provide an element image generation method and apparatus, aiming to generate element images efficiently and accurately, so as to improve the efficiency of constructing an element image library and the accuracy of the constructed element images.
The embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides an element image generation method, including:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
Preferably, in the element image generation method provided by the first aspect, the method is performed by an element image generation module; the element image generation module includes an encoding sub-module, and the encoding sub-module includes M levels of coding units connected stage by stage, where M is a natural number;
the step of generating the first feature map based on the initial element image is performed by the first level coding unit;
the step of generating the second feature map is performed by the second level coding unit to the Mth level coding unit.
Preferably, in the element image generation method provided by the first aspect, the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one coding unit among the second level coding unit to the Mth level coding unit.
Preferably, in the element image generation method provided by the first aspect, the method further includes:
determining an element image pair, the element image pair including an original image and a real sample image corresponding to the original image;
generating, by the element image generation module, a generated image corresponding to the original image based on the original image;
discriminating, by a discriminating module, the degree of difference between the generated image and the real sample image, where the generated image is labeled as the negative class and the real sample image is labeled as the positive class;
alternately adjusting the parameters in the element image generation module and the discriminating module according to the degree of difference, until the degree of difference satisfies a preset condition. A sketch of this alternating training is given below.
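The following is a minimal PyTorch-style sketch of the alternating adjustment described above. The models G and D, the optimizers, and the binary cross-entropy criterion are illustrative assumptions and not the patent's exact training procedure.

```python
import torch

def train_step(G, D, opt_G, opt_D, original, real,
               bce=torch.nn.BCEWithLogitsLoss()):
    # Discriminator step: real sample images are the positive class,
    # generated images the negative class.
    fake = G(original)
    d_real = D(real)
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(D(fake.detach()), torch.zeros_like(d_real)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # Generator step: push the discriminator toward labeling the generated
    # image as real, which reduces the degree of difference.
    d_fake = D(fake)
    g_loss = bce(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```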
Preferably, in the element image generation method provided by the first aspect, discriminating the degree of difference between the generated image and the real sample image includes:
calculating a loss function according to the generated image and the real sample image;
outputting a discrimination result according to the calculation result of the loss function, the discrimination result reflecting the degree of difference between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, the loss function includes a feature space loss function, the feature space loss function reflecting the difference in feature space between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, the loss function includes an adversarial loss function, the adversarial loss function reflecting the contribution of the element image generation module to reducing the degree of difference and the contribution of the discriminating module to increasing the degree of difference.
Preferably, in the element image generation method provided by the first aspect, the loss function includes at least one of the following:
a pixel space loss function, the pixel space loss function reflecting the difference between the generated image and the real sample image at corresponding pixel points;
a class loss function, the class loss function reflecting the difference in class between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, after generating, by the element image generation module, the generated image corresponding to the original image based on the original image, the method further includes:
testing the generated image to determine the degree to which the generated image matches the real sample image.
Preferably, in the element image generation method provided by the first aspect, testing the generated image generated by the element image generation model and determining the degree to which the generated image matches the real sample image includes determining the degree of matching according to at least one of the following indicators:
the L1 distance between the generated image and the real sample image;
the peak signal-to-noise ratio (PSNR) between the generated image and the real sample image;
the structural similarity (SSIM) between the generated image and the real sample image.
Sketches of the first two indicators are given below.
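The first two indicators can be computed directly; this minimal sketch assumes the images are tensors scaled to [0, 1]. SSIM involves local window statistics and is usually taken from an off-the-shelf implementation, so it is omitted here.

```python
import torch
import torch.nn.functional as F

def l1_distance(generated, real):
    # Mean absolute difference over all pixels.
    return (generated - real).abs().mean()

def psnr(generated, real, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher means a closer match.
    mse = F.mse_loss(generated, real)
    return 10 * torch.log10(max_val ** 2 / mse)
```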
Preferably, in the element image generation method provided by the first aspect, after determining the degree to which the generated image matches the real sample image, the method further includes:
adjusting the parameters in the element image generation module according to the degree to which the generated image matches the real sample image.
Preferably, in the element image generation method provided by the first aspect, the element image is a text font.
In a second aspect, an embodiment of the present application provides an element image generating apparatus, the apparatus including:
a feature map first generating unit, configured to generate a first feature map based on an initial element image;
a feature map second generating unit, configured to generate a second feature map based on the first feature map;
a target element image generating unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of a plurality of second feature maps;
where at least one of the feature map second generating units is further configured to: generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a third aspect, an embodiment of the present application provides an element image generating system, the system including an element image generating module, the element image generating module including an encoding sub-module and a decoding sub-module; the encoding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number;
the first level coding unit is configured to generate a first feature map based on an initial element image;
the second level coding unit to the Mth level coding unit are configured to generate a second feature map based on the feature map generated by the preceding level coding unit;
the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps.
Preferably, in the element image generating system provided by the third aspect, the system further includes:
an element image discriminating module, configured to discriminate the degree of difference between a generated image and a real sample image;
where the generated image is generated by the element image generating module based on an original image; the real sample image corresponds to the original image, and the two constitute an element image pair; and the degree of difference is used to alternately adjust the parameters in the element image generating module and the discriminating module so that the degree of difference satisfies a preset condition.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a fifth aspect, an embodiment of the present application provides a computer readable storage medium storing one or more programs, the one or more programs, when executed by an electronic device including a plurality of applications, causing the electronic device to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a sixth aspect, an embodiment of the present application provides a Chinese character font image generation method, including:
generating a first feature map based on an initial Chinese character font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image and matching the spatial size of the first feature map.
In a seventh aspect, an embodiment of the present application provides a text font image generation method, including:
generating a first feature map based on an initial text font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target text font image corresponding to the initial text font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image and matching the spatial size of the first feature map.
The above at least one technical solution adopted by the embodiments of the present application can achieve the following beneficial effects:
With the technical solution provided by the embodiments of the present application, on the basis of generating the first feature map from the initial element image, the step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps. During the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map. On this basis, a target element image corresponding to the initial element image is generated based on the first feature map and at least one second feature map of the plurality of second feature maps. Therefore, the technical solution provided by the embodiments of the present application can not only efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
Description of the drawings
The drawings described herein are intended to provide a further understanding of the present application and constitute a part of this application. The illustrative embodiments of the present application and their description are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
FIG. 1 is a schematic flowchart of an element image generation method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an element image generation model used in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a coding unit in the element image generation model used in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a decoding unit in the element image generation model used in an embodiment of the present application;
FIG. 5 is a schematic flowchart of generating a target element image in an embodiment of the present application;
FIG. 6 is a schematic diagram of the processing procedure of the first level coding unit in the element image generation model used in an embodiment of the present application;
FIG. 7 is a schematic diagram of the processing procedure of the coding units at each level in the element image generation model used in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training network for the element image generation model used in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a discriminating module in the training network for the element image generation model used in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a model for calculating a feature space loss function in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an element image generating apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an element image generating system according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description
To facilitate understanding of the embodiments of the present application, several concepts introduced in this application are first described here.
Convolutional Neural Network (CNN): a kind of artificial neural network, composed of a series of basic units connected together, such as convolutional layers, nonlinear activation layers, pooling layers, normalization layers, and fully connected layers.
Feature map: the feature representation output by a convolutional layer, a pooling layer, a fully connected layer, or another layer in the network.
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The embodiments of the present application provide an element image generation method. It can be understood that the element images to which the embodiments of the present application apply may include various graphic elements, such as fonts of characters, notation symbols on musical scores, and cartoon character designs. The purpose of the embodiments of the present application is to automatically generate, based on the style features of a small subset of samples in an element image set (it can be understood that these samples are usually designed one by one by the user and can serve as the basis for determining the style features), new element images corresponding to the other element images in the set, such that the generated element images are consistent in style with the samples designed by the user. In this way, element image sets of different styles can be generated efficiently and accurately, realizing automatic expansion of graphic sets of different styles.
In one application scenario, the element images are embodied as cartoon characters. Suppose the user wishes to design a set of cartoon characters with a unified style. The user can manually design cartoon images of some animals based on their original images, and train the element image generation model (also called the element image generation module) with the original images and the cartoon images of these animals (as real sample images) as input. Images of other animals can then be fed into the trained element image generation model as initial element images to automatically generate other cartoon characters consistent with the manually designed style.
In another application scenario, the element images are embodied as Chinese character fonts. Each Chinese font requires a corresponding font library. Suppose a user wishes to build a new font library. After manually designing the new font for some of the characters (for example, 1,000 characters) to establish the style of the new font, the user can train the element image generation model with the original font of these characters (for example, the Song typeface) as the original images and the newly designed font of these characters as the real sample images. The method provided by the embodiments of the present application can then take the original font (for example, the Song typeface) of the remaining characters as the initial element images (for example, if the font library is built according to the Chinese character standard of the GB18030 national standard code, the number of remaining characters is 70,244 - 1,000 = 69,244; the remaining characters may also be the specific characters for which the user currently needs to generate the new font) and pass them through the trained element image generation model (also called the element image generation module or element image generation network) to generate the new font for the remaining characters. New fonts for Chinese characters can thus be generated efficiently, improving the efficiency of building font libraries.
The element image generation model used in the embodiments of the present application (which can be embodied as the font generation model in the above application scenarios) receives the initial element image multiple times: it accepts both the original initial element image and downsampled element images obtained by downsampling the original initial element image, the latter serving as supplementary information. This reduces information loss during data processing and helps improve the accuracy of the generated target element image (which can be embodied as the new font in the above application scenarios).
As shown in FIG. 1, an embodiment of the present application provides an element image generation method, which may include:
S101: generating a first feature map based on an initial element image;
S103: generating a second feature map based on the first feature map;
S105: taking the second feature map as a new first feature map, and iteratively performing step S103 of generating a second feature map, to obtain a plurality of second feature maps;
S107: generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where at least one execution of step S103, generating a second feature map, includes: generating a second feature map based on the first feature map and a downsampled element image; the downsampled element image is obtained by downsampling the initial element image and matches the spatial size of the first feature map. A minimal sketch of this flow is given below.
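The following is a minimal PyTorch-style sketch of the flow of steps S101 to S107. The names generate, encoders, decode, and inject_levels are hypothetical and introduced only for illustration; the patent does not prescribe this implementation.

```python
import torch
import torch.nn.functional as F

def generate(initial_image, encoders, decode, inject_levels):
    """initial_image: (N, C, H, W); encoders: M stages, each halving H and W."""
    feats = []
    x = encoders[0](initial_image)              # S101: first feature map
    feats.append(x)
    for level, enc in enumerate(encoders[1:], start=2):
        if level in inject_levels:              # at least once, per the method
            down = F.interpolate(initial_image, size=x.shape[-2:],
                                 mode='bilinear', align_corners=False)
            x = torch.cat([x, down], dim=1)     # fuse on the channel dimension
        x = enc(x)                              # S103/S105: next second feature map
        feats.append(x)
    return decode(feats)                        # S107: target element image
```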
It can be understood that, when implementing the element image generation method shown in FIG. 1, the initial element image may be determined first. The determined initial element image can be understood as the basis for generating the target element image, and the generated target element image corresponds to the initial element image. Depending on the user's application requirements, the correspondence between the target element image and the initial element image may differ; specifically, it can be realized by a trained element image generation model (also called an element image generation module). In the application scenario of font generation, this can be understood as follows: the corresponding target element image and initial element image reflect different fonts (styles) of the same Chinese character.
It should be noted that determining the initial element image can also be understood as determining, according to the batch size, a batch of images to be input into the element image generation model for processing at one time. The batch size refers to the number of images input into the model for processing at one time, and is a parameter of models built on neural networks. For example, if the batch size is 1, only one initial element image is input into the model for processing at a time; if the batch size is 16, sixteen initial element images are input into the model as one batch each time, and correspondingly, sixteen target element images will be generated.
On the basis of the determined initial element image, the above steps S101 to S107 may be performed to generate the target element image corresponding to the initial element image. Specifically, based on the trained element image generation model, the target element image corresponding to the initial element image may be generated from the initial element image and the downsampled element images obtained by downsampling the initial element image.
With the above technical solution, on the basis of generating the first feature map from the initial element image, the step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps. During the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map, and the target element image corresponding to the initial element image is then generated based on the first feature map and at least one second feature map of the plurality of second feature maps. Therefore, the technical solution provided by the embodiments of the present application can not only efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element image.
It can be understood that, in any application scenario and no matter how the element image generation model is built and trained, as long as the model can accept the original initial element image and, at least once, a downsampled element image obtained by downsampling it, and generates the target element image corresponding to the initial element image on this basis, a target element image containing new element images can be generated efficiently and accurately from the initial element image, with reduced information loss during data processing and improved accuracy of the generated target element image.
In the following, with reference to the drawings and mainly taking the application scenario in which the element images are embodied as text fonts, the specific framework of the element image generation model (element image generation module) is illustrated by example, and multiple implementation examples of the element image generation method provided by the embodiments of the present application are described in detail in combination with this model.
FIG. 2 shows a schematic framework of an element image generation model suitable for the embodiments of the present application. The element image generation model shown in FIG. 2 includes a generation module (Generator), which includes an encoding sub-module (Encoder) and a decoding sub-module (Decoder). In a specific implementation, the input of the model is the initial element image, and the output of the model is the target element image corresponding to the initial element image. Taking the model used specifically to generate new fonts as an example, the input initial element image is a character in the original font (for example, the character "趵" in boldface in FIG. 2), and the output target element image is the character in the new font (for example, the character "趵" in regular script in FIG. 2).
The encoding sub-module in the above element image generation model (element image generation module) is used to convert the initial element image into high-dimensional feature maps, and the decoding sub-module is used to convert the high-dimensional feature maps into a new image, that is, the output target element image. It can be understood that the processing of data by the encoding sub-module and the decoding sub-module is symmetric and mutually inverse.
In the element image generation module, the encoding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number. Taking the model shown in FIG. 2 as an example, M is 8.
Specifically, the above step S101, generating the first feature map based on the initial element image, is performed by the first level coding unit (coding unit e1 in FIG. 2). On the basis of accepting the initial element image, the first level coding unit generates the first feature map and outputs it to the second level coding unit (coding unit e2 in FIG. 2). In addition, the first level coding unit also outputs the first feature map to the decoding unit in the decoding sub-module corresponding to the first level coding unit (decoding unit d7 in FIG. 2).
The steps of generating the second feature map based on the first feature map (and the downsampled element image) in the above steps S103 to S105 are performed by the second level coding unit to the Mth level coding unit. Moreover, at least one coding unit among the second level coding unit to the Mth level coding unit (for example, at least one of coding units e2 to e8 in FIG. 2), when generating the second feature map, uses not only the feature map output by the preceding level coding unit but also a downsampled element image obtained by downsampling the initial element image, where the downsampled element image matches the spatial size of the feature map output by the preceding level coding unit.
It should be noted that the coding units in the encoding sub-module have the same structure and are connected stage by stage in order; each coding unit outputs a feature map to its next level coding unit (the feature map output by the last level coding unit is accepted by the first level decoding unit). A coding unit may consist of several convolutional layers (Conv), leaky rectified linear unit layers (LReLU), and batch normalization layers (BN). The image or feature map input to a coding unit may be processed successively by a leaky rectified linear unit layer LReLU, a convolutional layer Conv, and a batch normalization layer BN before being output, as shown in FIG. 3. It can be understood that the number of layers and their order can be adjusted; typically, the convolutional layer Conv and the leaky rectified linear unit layer LReLU are designed in pairs (for example, in the form Conv-LReLU-Conv-LReLU or LReLU-Conv-LReLU-Conv), and the batch normalization layer BN may be placed at the end of the coding unit.
The decoding units in the decoding sub-module also have the same structure and are connected stage by stage in order, and the number of decoding units equals the number of coding units. A decoding unit may consist of a rectified linear unit layer (ReLU), a deconvolution layer (Deconv), a batch normalization layer (BN), and a dropout layer (Dropout). The feature map input to a decoding unit may be processed successively by the rectified linear unit layer ReLU, the deconvolution layer Deconv, the batch normalization layer BN, and the dropout layer Dropout before being output, as shown in FIG. 4. Similarly, the number of layers and their order can be adjusted, as long as the feature maps can be processed and the target element image can ultimately be generated. Sketches of both units are given below.
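Given the layer orders shown in FIG. 3 and FIG. 4, one coding unit and one decoding unit could be sketched as follows in PyTorch. The kernel size, stride, padding, LeakyReLU slope, and dropout rate are assumptions chosen so that a coding unit halves and a decoding unit doubles the spatial size; the patent does not fix these hyperparameters.

```python
import torch.nn as nn

def coding_unit(in_ch, out_ch):
    # LReLU -> Conv (stride 2 halves width and height) -> BN, as in FIG. 3.
    return nn.Sequential(
        nn.LeakyReLU(0.2),
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
    )

def decoding_unit(in_ch, out_ch, p_drop=0.5):
    # ReLU -> Deconv (stride 2 doubles width and height) -> BN -> Dropout, as in FIG. 4.
    return nn.Sequential(
        nn.ReLU(),
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.Dropout(p_drop),
    )
```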
In the encoding stage of the encoding sub-module, each time a feature map passes through one level of coding unit, the spatial size of the output feature map is reduced to half the spatial size of the input feature map (specifically, both the width and the height of the feature map are halved). In the decoding stage of the decoding sub-module, each time a feature map passes through one level of decoding unit, the spatial size of the output feature map is increased to twice the spatial size of the input feature map (specifically, both the width and the height are doubled). Therefore, across the encoding and decoding stages, there will be coding units and decoding units whose output feature maps have the same spatial size; a correspondence can be established between a coding unit and a decoding unit satisfying this condition, which may be called a symmetric pair. Specifically, a coding unit and a decoding unit whose output feature maps have the same spatial size correspond to each other.
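As a worked check of this correspondence for M = 8 with a 256×256 input, the symmetric pairs can be enumerated directly:

```python
M = 8
enc = {f"e{k}": 256 // 2 ** k for k in range(1, M + 1)}  # e1: 128, ..., e8: 1
dec = {f"d{k}": 2 ** k for k in range(1, M + 1)}         # d1: 2,   ..., d8: 256
pairs = [(e, d) for e, s in enc.items() for d, t in dec.items() if s == t]
print(pairs)  # [('e1', 'd7'), ('e2', 'd6'), ..., ('e7', 'd1')]
```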
Taking an initial element image of size 256×256×3 as an example (this size can be understood as follows: the image has 256 pixels in both width and height and is represented in the RGB color model, so each pixel is represented by three-dimensional attribute data), the spatial sizes of the feature maps output by the 8 levels of coding units in the encoding sub-module shown in FIG. 2 are listed in Table 1 below.
Table 1. Example output feature map sizes of the coding units
Coding unit:            e1        e2       e3       e4      e5     e6     e7     e8
Output spatial size: 128×128   64×64    32×32    16×16    8×8    4×4    2×2    1×1
(Spatial sizes for a batch of 16; the per-level channel counts vary with the number of convolutional layers, as noted below.)
The output feature map sizes illustrated in Table 1 take a batch size of 16 (batch = 16) as an example. The spatial size starts at 256×256 for the initial element image, becomes 128×128 after processing by coding unit e1 (the first level coding unit), 64×64 after coding unit e2 (the second level coding unit), and so on, until it becomes 1×1 after coding unit e8 (the eighth level coding unit).
It can be understood that, in the above example, the number of convolutional layers in each coding unit may differ, so the number of channels in the output of each level also differs. Generally, to compensate to some extent for the information loss during data processing, the number of convolutional layers preferably increases with the level of the coding unit.
Corresponding to the output feature map sizes illustrated in Table 1, the spatial sizes of the feature maps output by the 8 levels of decoding units in the decoding sub-module shown in FIG. 2 are listed in Table 2 below.
Table 2. Example output feature map sizes of the decoding units
Decoding unit:           d1     d2     d3      d4       d5      d6       d7        d8
Output spatial size:    2×2    4×4    8×8    16×16    32×32   64×64   128×128   256×256
以上简要阐述了元素图像生成模型的结构以及编码单元和解码单元之间的对应关系。在图2示例的元素图像生成模型的基础上,基于训练好的元素图像生成模型,根据初始元素图像和初始元素图像经过降采样处理后的降采样元素图像,生成与初始元素图像相对应的目标元素图像,可以具体包括以下步骤,参见图5所示:The structure of the element image generation model and the correspondence between the coding unit and the decoding unit are briefly explained above. Based on the element image generation model illustrated in FIG. 2, based on the trained element image generation model, a target corresponding to the initial element image is generated according to the downsampled element image after the downsampled processing of the initial element image and the initial element image. The element image may specifically include the following steps, as shown in FIG. 5:
S1031:将初始元素图像和降采样元素图像输入编码子模块。S1031: Input the initial element image and the downsampled element image into the encoding submodule.
需要说明的是,本申请实施例中,将初始元素图像经过降采样处理后的降采样元素图像也输入编码子模块,可以降低数据处理过程中的信息损耗,从而生成更准确的目标元素图像。It should be noted that, in the embodiment of the present application, the downsampled element image after the downsampled processing of the initial element image is also input into the encoding submodule, which can reduce information loss in the data processing process, thereby generating a more accurate target element image.
In a specific implementation, the first-level encoding unit (for example, encoding unit e1 in FIG. 2) may directly receive the original initial element image, as shown in FIG. 6. Any Nth-level encoding unit after the first level (N being a natural number greater than 1 and not greater than M) may receive, in addition to the feature map output by the previous-level encoding unit, a downsampled element image obtained by downsampling, as shown in FIG. 7. For each encoding unit, the spatial size of the downsampled element image it can accept corresponds to that encoding unit; specifically, it should be consistent with the spatial size of the feature map that the encoding unit receives from the previous level, so that the encoding unit can fuse the two before the subsequent data processing.
It can be understood that the downsampled element image may be input to every encoding unit after the first level (for example, encoding units e2 to e8 in FIG. 2), or only to some of the encoding units (for example, encoding units e2 and e7 in FIG. 2), as long as the spatial size of the downsampled element image is consistent with the spatial size of the previous-level feature map received by the encoding unit in question. For example, if the feature map output by encoding unit e1 to encoding unit e2 has a spatial size of 128×128, the downsampled element image input to encoding unit e2 should also be processed to a spatial size of 128×128. In this way, layers at different depths of the model (corresponding to encoding units at different levels) can all be supplied with information from the initial element image at a particular scale, which helps generate a high-quality target element image.
Specifically, when obtaining the downsampled element image, the initial element image may be downsampled in various ways, such as bilinear interpolation, single interpolation, or nearest-neighbor interpolation; the present application does not limit this.
The encoding unit fuses the received feature map with the downsampled element image before the subsequent processing. The fusion can be done in several ways, for example by stacking along a particular dimension or by adding the attribute values of corresponding pixels; stacking along the channel dimension of the feature maps is preferred.
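By way of illustration only, the following sketch shows the preferred fusion (bilinear downsampling of the initial element image to the feature map's spatial size, followed by stacking along the channel dimension); the function name is an assumption of the sketch:

```python
import torch
import torch.nn.functional as F

def fuse_with_downsampled(feature_map, initial_image, mode="bilinear"):
    """Downsample the initial element image to the feature map's spatial
    size and stack the two along the channel dimension."""
    h, w = feature_map.shape[-2:]
    down = F.interpolate(initial_image, size=(h, w), mode=mode,
                         align_corners=False if mode == "bilinear" else None)
    return torch.cat([feature_map, down], dim=1)  # channel-dimension stacking

feat = torch.randn(16, 64, 128, 128)           # e.g. the output of e1
img = torch.randn(16, 3, 256, 256)             # the initial element image
print(fuse_with_downsampled(feat, img).shape)  # torch.Size([16, 67, 128, 128])
```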
S1033: Use the encoding sub-module to output a plurality of feature maps to the decoding sub-module.
As mentioned above, an encoding unit and a decoding unit whose output feature maps have the same spatial size correspond to each other and form a symmetric unit pair, for example <e1, d7> or <e3, d5>. In the embodiments of the present application, the outputs of corresponding encoding and decoding units are directly connected. Therefore, when outputting a feature map, each encoding unit in the encoding sub-module sends the generated feature map both to the next-level encoding unit (the feature map generated by the last-level encoding unit is sent to the first-level decoding unit) and to its corresponding decoding unit. With reference to the model framework diagram shown in FIG. 2, the details are as follows:
The first-level encoding unit (embodied as encoding unit e1) generates the first feature map from the initial element image; the first-level encoding unit (e1) outputs the first feature map to the second-level encoding unit (embodied as e2) and to the decoding unit corresponding to the first-level encoding unit (embodied as decoding unit d7);
The Kth-level encoding unit (embodied as any of encoding units e2 to e7, for example encoding unit e3) generates a third feature map from the downsampled element image and the second feature map output by the (K-1)th-level encoding unit (correspondingly, encoding unit e2); the Kth-level encoding unit (for example, encoding unit e3) outputs the third feature map to the (K+1)th-level encoding unit (correspondingly, encoding unit e4) and to the decoding unit corresponding to the Kth-level encoding unit (correspondingly, decoding unit d5), where K is a natural number greater than 1 and less than M, and the downsampled element image has the same spatial size as the second feature map;
The Mth-level encoding unit (embodied as encoding unit e8) generates a fifth feature map from the downsampled element image and the fourth feature map output by the (M-1)th-level encoding unit (embodied as encoding unit e7), and outputs it to the first-level decoding unit in the decoding sub-module (embodied as decoding unit d1).
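Putting the above together, a minimal sketch of such an encoding pass might look as follows; the channel counts and the internal composition of each unit are assumptions, since the embodiment leaves them open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """M = 8 encoding units; e2..e8 also fuse a matching downsampled image."""
    def __init__(self, chans=(64, 128, 256, 512, 512, 512, 512, 512)):
        super().__init__()
        units, in_ch = [], 3
        for out_ch in chans:
            units.append(nn.Sequential(
                nn.Conv2d(in_ch + 3, out_ch, 4, 2, 1),  # +3 for the fused image
                nn.LeakyReLU(0.2)))
            in_ch = out_ch
        # e1 receives only the raw initial element image
        units[0] = nn.Sequential(nn.Conv2d(3, chans[0], 4, 2, 1),
                                 nn.LeakyReLU(0.2))
        self.units = nn.ModuleList(units)

    def forward(self, image):
        feats, x = [], image
        for i, unit in enumerate(self.units):
            if i > 0:  # fuse a downsampled copy at the current spatial size
                down = F.interpolate(image, size=x.shape[-2:],
                                     mode="bilinear", align_corners=False)
                x = torch.cat([x, down], dim=1)
            x = unit(x)
            feats.append(x)  # kept for the symmetric decoding units
        return feats  # feats[-1] (1x1) goes to d1; earlier ones skip across
```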
S1035: Use the decoding sub-module to generate, from the plurality of feature maps, the target element image corresponding to the initial element image.
Each level of decoding unit in the decoding sub-module first processes the feature map received from the previous-level decoding unit, then fuses the feature map output at its own level with the feature map transmitted from the corresponding encoding unit, and uses the result as the input of the next-level decoding unit. With reference to the model framework diagram shown in FIG. 2, the details are as follows:
The first-level decoding unit (embodied as decoding unit d1) generates a sixth feature map from the fifth feature map output by the last-level encoding unit (embodied as encoding unit e8), and outputs it to the second-level decoding unit (embodied as decoding unit d2);
The Lth-level decoding unit in the decoding sub-module (embodied as any of decoding units d2 to d7, for example decoding unit d3) generates an eighth feature map from the seventh feature map output by the (L-1)th-level decoding unit (correspondingly, decoding unit d2); the Lth-level decoding unit (for example, decoding unit d3) stacks, along the channel dimension, the eighth feature map with the ninth feature map output by the encoding unit corresponding to the Lth-level decoding unit (correspondingly, encoding unit e5), generates a tenth feature map, and outputs it to the (L+1)th-level decoding unit (correspondingly, decoding unit d4), where L is a natural number greater than 1 and less than M;
The Mth-level decoding unit in the decoding sub-module generates, from the eleventh feature map output by the (M-1)th-level decoding unit, the target element image corresponding to the initial element image.
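A matching sketch of the decoding pass follows, under the same assumed channel counts; the use of transposed convolutions to double the spatial size, and the final Tanh, are assumptions, as the embodiment does not fix the upsampling operator:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """M = 8 decoding units; d1..d7 stack the symmetric encoder feature
    (pairs such as <e1, d7> and <e3, d5>) along the channel dimension."""
    def __init__(self, enc_chans=(64, 128, 256, 512, 512, 512, 512, 512)):
        super().__init__()
        dec_chans = (512, 512, 512, 512, 256, 128, 64, 3)
        units, in_ch = [], enc_chans[-1]      # d1 receives e8's 1x1 output
        for i, out_ch in enumerate(dec_chans):
            units.append(nn.Sequential(
                nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),  # doubles the size
                nn.ReLU() if i < 7 else nn.Tanh()))
            # the next input is this output stacked with the mirrored feature
            in_ch = out_ch + (enc_chans[6 - i] if i < 7 else 0)
        self.units = nn.ModuleList(units)

    def forward(self, enc_feats):
        x = enc_feats[-1]                     # the fifth feature map, from e8
        for i, unit in enumerate(self.units):
            x = unit(x)
            if i < 7:                         # fuse the symmetric skip feature
                x = torch.cat([x, enc_feats[6 - i]], dim=1)
        return x                              # target element image, 3x256x256
```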
In the above manner, an encoding unit in the encoding sub-module receives two kinds of input signal, the feature map output by the previous-level encoding unit and the downsampled element image obtained by downsampling the initial element image, so that information from the initial element image flows in at different stages of the encoding sub-module. Meanwhile, a decoding unit in the decoding sub-module receives, in addition to the feature map output by the previous-level decoding unit, the feature map output directly by the corresponding encoding unit in the encoding sub-module, which further reduces the information loss during image data processing.
It can be understood that the numbering of the feature maps above (the first feature map to the eleventh feature map) is merely for convenience of description and does not limit the feature maps themselves.
With the technical solution provided by the embodiments of the present application, after the initial element image is determined, the trained element image generation model can be used to generate the target element image corresponding to the initial element image; specifically, the element image generation model can generate the target element image from the initial element image and the downsampled element image obtained by downsampling the initial element image. With a trained element image generation model, target element images can be generated automatically from initial element images, so that target element images of different styles can be derived efficiently from initial element images, which improves the efficiency of constructing an element image library. In addition, besides receiving the original initial element image, the element image generation model also receives the downsampled element image, obtained by downsampling the initial element image, as supplementary information for generating feature maps; this reduces the information loss during data processing and thus helps improve the accuracy of the generated target element image.
The above examples describe a specific implementation of generating a target element image with a trained element image generation model. It can be understood that, before this, the element image generation model can be trained to meet the usage requirements. Specifically, an element image generation network containing an element image generation module and a discrimination module can be trained, as shown in FIG. 8. The element image generation module (hereinafter simply the generation module) is used to generate, based on an original image, a generated image corresponding to the original image; the discrimination module is used to determine the degree of difference between the generated image and a real sample image and to adjust the parameters in the element image generation network according to that degree of difference, where the real sample image corresponds to the original image and the two form an element image pair.
Since the original image, after being encoded and decoded by the generation module, yields a new image, namely the generated image, the discrimination module is used during training to tell whether an input image is a real sample image or a generated image produced by the model. The structure of the discrimination module may be as shown in FIG. 9. In the discrimination module illustrated in FIG. 9, the generated image and the real sample image enter the discrimination module; the first two layers are a convolution layer Conv and a leaky rectified linear unit layer LReLU, followed by three [convolution layer Conv - batch normalization layer BN - leaky rectified linear unit layer LReLU] blocks connected in series. Then comes a fully connected layer FC, and finally a Sigmoid layer, which maps the result to a value between 0 and 1 representing the probability that the input image is a real sample image. It should be noted that, in a specific implementation, connecting two or four [Conv - BN - LReLU] blocks in the middle is also acceptable, as long as the structure of the discrimination module is not overly complex and meets the requirements.
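A sketch of a discriminator with this layer sequence follows; the channel counts and kernel sizes are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Conv+LReLU, three [Conv-BN-LReLU] blocks, then FC and Sigmoid,
    mirroring the layer sequence described for FIG. 9 (sizes assumed)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2))
        self.fc = nn.Linear(512 * 16 * 16, 1)  # for 256x256 inputs
        self.prob = nn.Sigmoid()

    def forward(self, x):
        h = self.features(x)
        return self.prob(self.fc(h.flatten(1)))  # probability of "real"
```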
Specifically, the process of training the element image generation network may include:
determining an element image pair, the element image pair including an original image and a real sample image corresponding to the original image;
inputting the original image into the generation module, and using the generation module to obtain the generated image corresponding to the original image;
inputting the generated image and the real sample image into the discrimination module as training samples, and using the discrimination module to determine the degree of difference between the generated image and the real sample image, where the generated image is labeled as the negative class and the real sample image as the positive class;
according to the degree of difference, alternately adjusting the parameters in the generation module and the discrimination module until the degree of difference satisfies a preset condition.
During the training of the whole network (which can be understood as a machine learning model), the goal of the generation module is to make the generated image as realistic as possible, so realistic that it can "fool" the discrimination module (that is, make the discrimination module consider that the generated image does not differ from a real sample image, or that the difference is small enough). The goal of the discrimination module is to correctly distinguish real sample images from generated images. Therefore, the generation module and the discrimination module can be trained alternately during training, as follows:
First, the system initializes the generation module (Generator) and the discrimination module (Discriminator), denoted G0 and D0, respectively. The system accepts one batch of input pictures (the number of inputs is the value of the batch size), where each input is a pair of images, namely an original image and the corresponding real sample image. The original image is then fed into generation module G0, which, after a series of data processing steps, produces a new picture, the generated image.
Next, the real sample image is labeled as the positive class and the generated image as the negative class, and the two are input to discrimination module D0 as training samples. At this point generation module G0 is fixed, and the parameters of discrimination module D0 are updated according to the computed value of the loss function, so that D0 is updated to a new state, denoted D1. Then D1 is fixed, and the parameters of generation module G0 are updated according to the computed value of the loss function, so that G0 is updated to a new state, denoted G1. Proceeding back and forth in this way, generation module G and discrimination module D are trained alternately throughout the training process until the computed value of the loss function satisfies the preset requirement and the two reach their best state.
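For illustration, a minimal sketch of one such alternating update in PyTorch follows; the binary cross-entropy form of the adversarial loss is used here, and the function and optimizer names are assumptions of the sketch:

```python
import torch
import torch.nn as nn

def train_step(G, D, original, real_sample, opt_G, opt_D):
    """One alternating update: fix G and update D, then fix D and update G."""
    bce = nn.BCELoss()
    fake = G(original)

    # Step 1: update D with G fixed (real labeled positive, generated negative).
    opt_D.zero_grad()
    real_score = D(real_sample)
    fake_score = D(fake.detach())      # detach: no gradient flows back into G
    loss_D = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    loss_D.backward()
    opt_D.step()

    # Step 2: update G with D fixed (G tries to make D score the fake as real).
    opt_G.zero_grad()
    score = D(fake)
    loss_G = bce(score, torch.ones_like(score))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```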
It can be understood that, after the model has been trained in the above manner, the initial element image can be input to the model in place of the original image used during training, and the generated image produced by the model is exactly the target element image the user expects.
In training the model, an unsupervised learning method is adopted in which the two neural networks, the generation module and the discrimination module, learn by playing against each other. The output of the generation module needs to imitate the real sample images in the training set as closely as possible, while the purpose of the discrimination module is to distinguish real sample images from generated images. The two modules compete against each other and continually adjust their parameters, the ultimate aim being to make the discrimination module unable to judge whether the output of the generation module is real.
It can be understood that, during model training, a loss function can be computed from the generated image and the real sample image, a discrimination result reflecting the degree of difference between the two can be output according to the computed value of the loss function, and the parameters in the element image generation model can then be adjusted according to that value.
In the element image generation network adopted in the embodiments of the present application, the loss function can take several forms. For example, it may include an adversarial loss function, which reflects the contribution of the generation module to reducing the degree of difference and the contribution of the discrimination module to increasing it. It can be understood that the generation module in the model continually produces new generated images in the hope of passing the evaluation of the discrimination module, while the discrimination module hopes to correctly separate the generated images (labeled as the negative class) from the real sample images (labeled as the positive class). Therefore, the adversarial loss function can be expressed by the following formula:
min_G max_D V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 - D(G(z)))]
where pdata(x) denotes the target data distribution and pz(z) is a source data distribution. In the image generation task, pdata(x) corresponds to the real sample images fed into the discrimination module, pz(z) is the distribution of the original images fed into the generation module, and G(z) denotes the generated images produced by the model and fed into the discrimination module.
Since the generation module G and the discrimination module D are trained alternately in a mutually adversarial manner, with G continually producing new generated images in an attempt to pass D's evaluation while D attempts to accurately distinguish real sample images from generated images, the discrimination module D gives real sample images a high score and generated images a low score during training; that is, D tends to maximize D(x) and minimize D(G(z)), and therefore maximizes the adversarial loss function V(D, G). The generation module G, trying to produce realistic images, tends to maximize D(G(z)) and therefore minimizes the adversarial loss function V(D, G).
In addition, a feature space loss function (Perceptual Loss) can be introduced as a loss function in model training. The feature space loss function reflects the difference between the generated image and the real sample image in feature space. The feature space is the space of the semantically rich, high-dimensional features produced by passing an image through a deep neural network. Unlike the pixel space, vectors in the feature space carry more abstract, higher-level semantic information and can therefore be used to measure the difference between the generated image and the real sample image.
In a specific implementation, the perceptual loss can be computed in various ways, for example with AlexNet, ResNet, or VGG19. VGG19 is taken as an example here for a detailed description.
A VGG19 model already trained on the ImageNet data set (a very large image data set containing 1000 classes) can be chosen. VGG19 is a typical neural network model consisting of multiple convolution layers Conv, pooling layers, and fully connected layers FC; the structure of its convolutional part is shown in FIG. 10.
During training, the generated image and the real sample image are each passed through the VGG19 network, the features output by selected convolution layers are taken, and the L1 distance between the two sets of features is computed as the perceptual loss. The VGG19 model has many convolution layers, so there are many ways to make the selection. Choosing convolution layers at different depths is recommended; for example, as illustrated in FIG. 10, the features output by the five convolution layers Conv1_2, Conv2_2, Conv3_2, Conv4_2, and Conv5_2 can be used to compute the perceptual loss. The calculation formula is expressed as follows:
Ploss = Σ_l λ_l · ||Φ_l(real) - Φ_l(fake)||_1
where Φ denotes the VGG19 network model, l is a selected convolution layer, real denotes the real sample image, and fake denotes the generated image. λ_l is the weight of the perceptual loss computed at each convolution layer, and ||.||_1 denotes the L1 distance. The overall perceptual loss Ploss is the weighted sum of the perceptual losses computed at the individual convolution layers. The weight of each layer can be set in different ways. Considering that the closer a convolution layer is to the output end, the more abstract the information it captures, it is recommended to determine the per-layer weights on the principle that convolution layers near the input carry smaller weights than convolution layers near the output. For example, in the example given in FIG. 10, the weights can be set as λ1 = λ2 = λ3 = λ4 = 1 and λ5 = 10.
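A sketch of this loss under the above recommendation follows; the torchvision weights API and the slice indices marking conv1_2 through conv5_2 are assumptions that depend on the torchvision version and its VGG19 layer numbering:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# A VGG19 pre-trained on ImageNet (torchvision >= 0.13 "weights" API); the
# slice end points below are the assumed positions of the ReLU outputs after
# conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2 in torchvision's numbering.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)
ENDS = [4, 9, 14, 23, 32]
LAMBDAS = [1.0, 1.0, 1.0, 1.0, 10.0]  # lambda1..lambda4 = 1, lambda5 = 10

def perceptual_loss(real, fake):
    """Ploss = sum_l lambda_l * ||Phi_l(real) - Phi_l(fake)||_1."""
    loss, start, x, y = 0.0, 0, real, fake
    for end, lam in zip(ENDS, LAMBDAS):
        block = vgg[start:end]  # run only the layers added since the last tap
        x, y = block(x), block(y)
        loss = loss + lam * F.l1_loss(x, y)
        start = end
    return loss
```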
Further, a pixel space loss function can be used to reflect the difference between the generated image and the real sample image at corresponding pixels. Since the purpose of the element image generation module is to make the generated image resemble the real sample image as closely as possible, the difference between the two can also be measured by comparing corresponding pixels, that is, the difference in pixel space. Specifically, the L1 distance between the real sample image and the generated image can be computed as the value of the pixel space loss function.
In addition, a category loss function can be used to reflect the difference in category. There are multiple ways to train the element image generation model, including single-stage and multi-stage training. Taking the training of a font generation model as an example, single-stage training means directly generating a target font from one source font. Multi-stage training is divided into a pre-training stage and a retraining stage: in the pre-training stage, one fixed source font is used to generate multiple target fonts; in the subsequent retraining stage, the same source font is used to generate one target font. This is called a one-to-many training scheme.
Since multiple fonts are involved in multi-stage training, the discrimination module is expected not only to distinguish whether an image is a real sample image or a generated image, but also to correctly predict the category of the image (the font). A category loss function is therefore introduced, embodied as the cross entropy between the true category and the predicted category, as shown in the following formula:
Closs = -Σ_c y_c · log(p_c)

where y_c equals 1 for the true category c and 0 otherwise, and p_c is the category probability predicted by the discrimination module.
After the element image generation model has been trained as above, the quality of the generated images can also be tested and evaluated. For example, the degree to which a generated image matches the real sample image can be determined according to one or more of the following indicators:
the L1 distance between the generated image and the real sample image;
the peak signal-to-noise ratio PSNR between the generated image and the real sample image;
the structural similarity SSIM between the generated image and the real sample image.
Specifically, the L1 distance between every real sample image and the corresponding generated image in the test set can be computed and averaged. It can be understood that the smaller the L1 distance, the closer the generated image is to the real sample image and the higher the quality of the generated image.
Specifically, the peak signal-to-noise ratio PSNR between every real sample image and the corresponding generated image in the test set can be computed. PSNR is a common method of measuring image quality, and its calculation formula is:
PSNR(I, J) = 10 · log10(P^2 / MSE(I, J)),  MSE(I, J) = ||I - J||_2^2 / N
where I and J denote the two images (specifically, the generated image and the real sample image), ||.||_2 denotes the L2 distance, N denotes the number of pixel values per image, and P denotes the peak value; for a typical 3-channel 8-bit image, P = 255. It can be understood that the higher the peak signal-to-noise ratio PSNR, the better the quality of the generated font.
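As a minimal sketch of this computation (NumPy, assuming 8-bit images):

```python
import numpy as np

def psnr(i, j, peak=255.0):
    """PSNR = 10 * log10(P^2 / MSE), with MSE = ||I - J||_2^2 / N."""
    i = np.asarray(i, dtype=np.float64)
    j = np.asarray(j, dtype=np.float64)
    mse = np.mean((i - j) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```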
Specifically, the structural similarity SSIM between the generated image and the real sample image can be computed. The structural similarity SSIM measures the difference between two images (that is, the generated image and the real sample image) from three angles: structural correlation, contrast, and luminance. The SSIM of two images X and Y can be expressed as:
SSIM(X, Y) = l(X, Y) · c(X, Y) · s(X, Y)
l(X, Y) = (2·μ_x·μ_y + C1) / (μ_x^2 + μ_y^2 + C1)
c(X, Y) = (2·σ_x·σ_y + C2) / (σ_x^2 + σ_y^2 + C2)
s(X, Y) = (σ_xy + C3) / (σ_x·σ_y + C3)
where μ_x, μ_y, σ_x, σ_y are the means and standard deviations of images X and Y, σ_xy is the covariance of the two, and C1, C2, C3 are constants.
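A sketch using global image statistics follows; practical SSIM implementations usually compute the statistics over local windows and average, so the global form and the constants chosen here are simplifying assumptions:

```python
import numpy as np

def ssim(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Global-statistics SSIM = l(X,Y) * c(X,Y) * s(X,Y); assumes C3 = C2 / 2."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    c3 = c2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)  # luminance
    con = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)  # contrast
    stru = (sxy + c3) / (sx * sy + c3)                   # structure
    return lum * con * stru
```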
In a specific implementation, it is preferable to test and evaluate the quality of the generated images by combining the above indicators. Taking element images embodied as character fonts as an example, the font quality evaluation process is described below. In the preparation stage, a single-character recognition model with good font recognition performance, trained on a real font data set, can be prepared; the font quality evaluation below is based on this single-character recognition model.
In the first step, the single-character model is used to recognize the font images generated by the embodiments of the present application (that is, the generated images). If a generated character can be recognized correctly, the newly generated character is preliminarily correct in glyph, strokes, structure, and so on. According to the recognition results, the characters that are not correctly recognized can be filtered out.
In the second step, for the characters recognized by the single-character recognition model, the three indicators L1 distance, PSNR, and SSIM can be further computed for each generated character, yielding the result distribution of each of the three indicators. The mean of each distribution is then used as a threshold: for the result distributions of PSNR and SSIM, characters below the mean are filtered out; for the result distribution of the L1 distance, characters above the mean are filtered out.
In the third step, a manual evaluation is performed on all the generated characters screened by the first two steps to identify the characters that people subjectively judge to be of good quality, so that the threshold of each indicator can be adjusted; the adjusted thresholds then determine which characters count as good-quality characters.
In the fourth step, the overall quality of the generated font is measured by the number of good-quality characters in the validation data set, or by their proportion of the validation data set: the larger the number and/or the higher the proportion, the higher the quality of the generated font and the better the font generation model.
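The second-step screening can be sketched as follows; treating the per-character metric arrays as aligned with the character list is an assumption of the sketch:

```python
import numpy as np

def filter_by_distribution(chars, l1, psnr_vals, ssim_vals):
    """Keep characters whose PSNR and SSIM lie above the mean of their
    distributions and whose L1 distance lies below its mean."""
    l1, psnr_vals, ssim_vals = map(np.asarray, (l1, psnr_vals, ssim_vals))
    keep = (psnr_vals >= psnr_vals.mean()) & \
           (ssim_vals >= ssim_vals.mean()) & \
           (l1 <= l1.mean())
    return [c for c, k in zip(chars, keep) if k]
```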
The above font quality evaluation scheme does not compute a single indicator on one sample set; instead, it starts from the distribution of all generated characters on each particular indicator. It combines the single-character recognition model's ability to screen glyphs, strokes, and other structural aspects with manual judgment introduced to adjust the threshold of each indicator, thereby realizing an interactive evaluation of the quality of the generated font. Combining the advantages of subjective and objective evaluation, it can reflect the overall quality of the fonts produced by the model fairly accurately and comprehensively. On this basis, the parameters in the model can also be adjusted according to the evaluation results.
The above mainly exemplifies the specific implementation of the element image generation method provided in the embodiments of the present application. When the element image is specifically a Chinese character font, the element image generation method provided in the embodiments of the present application is embodied as a Chinese character font image generation method, which may include the following steps:
generating a first feature map based on an initial Chinese character font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image, and the downsampled Chinese character font image matching the spatial size of the first feature map.
It can be understood that the related descriptions in the foregoing embodiments of the element image generation method all apply to this Chinese character font image generation method and are not repeated here.
When the element image is specifically a text font, the element image generation method provided in the embodiments of the present application is embodied as a text font image generation method, which may include the following steps:
generating a first feature map based on an initial text font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target text font image corresponding to the initial text font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image, and the downsampled text font image matching the spatial size of the first feature map.
It can be understood that the related descriptions in the foregoing embodiments of the element image generation method all apply to this text font image generation method and are not repeated here.
Correspondingly, an embodiment of the present application further provides an element image generation apparatus, as shown in FIG. 11, including:
a feature map first generation unit 101, configured to generate a first feature map based on an initial element image;
a feature map second generation unit 103, configured to generate a second feature map based on the first feature map;
a target element image generation unit 105, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of a plurality of second feature maps;
wherein at least one of the feature map second generation units 103 is further configured to:
generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
The above element image generation apparatus corresponds to the element image generation method in the foregoing embodiments. The descriptions in the foregoing method embodiments all apply to this apparatus and are not repeated here.
Meanwhile, the embodiments of the present application give a specific implementation of each step. It can be understood that each step can also be implemented in other ways, which the embodiments of the present application do not limit.
An embodiment of the present application further provides an element image generation system, as shown in FIG. 12. The system includes an element image generation module, which includes an encoding sub-module and a decoding sub-module; the encoding sub-module includes M levels of encoding units connected level by level, the decoding sub-module includes M levels of decoding units connected level by level, and M is a natural number; wherein,
the first-level encoding unit is configured to generate a first feature map based on an initial element image;
the second-level to Mth-level encoding units are configured to generate second feature maps based on the feature maps generated by their respective previous-level encoding units;
the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps.
Preferably, the above element image generation system further includes:
an element image discrimination module, configured to determine the degree of difference between a generated image and a real sample image;
wherein,
the generated image is generated by the element image generation module based on an original image;
the real sample image corresponds to the original image, the two forming an element image pair;
the degree of difference is used to alternately adjust the parameters in the element image generation module and the discrimination module so that the degree of difference satisfies a preset condition.
It can be understood that the above element image generation system corresponds to the element image generation method in the foregoing embodiments. The descriptions in the foregoing method embodiments all apply to this system and are not repeated here.
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to FIG. 13, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in FIG. 13, but this does not mean there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming an element image generation apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
The method performed by the element image generation apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device can also perform the method performed by the element image generation apparatus in FIG. 1 and implement the functions of the element image generation apparatus in the embodiment shown in FIG. 1, which are not repeated here in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the element image generation apparatus in the embodiment shown in FIG. 1, and specifically to perform:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random-access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (19)

  1. An element image generation method, comprising:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
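By way of illustration only, the iterative encoding of claim 1 can be sketched in a few lines of PyTorch. Everything concrete below — the strided convolutions, the leaky-ReLU activations, the channel widths, and the channel-wise concatenation of the downsampled image with the feature map — is an assumption made for the sketch; the claim itself fixes none of these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodeUnit(nn.Module):
    """One coding unit: maps the current feature map to the next one,
    optionally injecting a downsampled copy of the initial image."""
    def __init__(self, in_ch, out_ch, inject_image=False):
        super().__init__()
        extra = 1 if inject_image else 0   # one extra channel for the image
        self.inject_image = inject_image
        self.conv = nn.Conv2d(in_ch + extra, out_ch,
                              kernel_size=4, stride=2, padding=1)

    def forward(self, feat, image=None):
        if self.inject_image:
            # Downsample the initial element image so its spatial size
            # matches the incoming feature map, then concatenate.
            ds = F.interpolate(image, size=feat.shape[2:], mode="bilinear",
                               align_corners=False)
            feat = torch.cat([feat, ds], dim=1)
        return F.leaky_relu(self.conv(feat), 0.2)

def encode(image, units):
    """Claim-1 loop: the first unit turns the image into the first feature
    map; each later unit turns the current map into a second feature map."""
    feat = units[0](image)
    maps = [feat]
    for unit in units[1:]:
        feat = unit(feat, image)
        maps.append(feat)
    return maps

# Example with three levels and image injection at the deeper two.
units = nn.ModuleList([EncodeUnit(1, 64),
                       EncodeUnit(64, 128, inject_image=True),
                       EncodeUnit(128, 256, inject_image=True)])
feature_maps = encode(torch.randn(1, 1, 256, 256), units)
```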
  2. The method according to claim 1, wherein the method is performed by an element image generation module; the element image generation module comprises an encoding sub-module, and the encoding sub-module comprises M levels of encoding units connected stage by stage, M being a natural number;
    the step of generating a first feature map based on the initial element image is performed by the first-level encoding unit; and
    the step of generating a second feature map is performed by the second-level to Mth-level encoding units.
  3. The method according to claim 2, wherein the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one of the second-level to Mth-level encoding units.
  4. The method according to claim 2 or 3, further comprising:
    determining an element image pair, the element image pair comprising an original image and a real sample image corresponding to the original image;
    generating, by the element image generation module, a generated image corresponding to the original image based on the original image;
    discriminating, by a discrimination module, a degree of difference between the generated image and the real sample image, wherein the generated image is labeled as the negative class and the real sample image is labeled as the positive class; and
    alternately adjusting parameters in the element image generation module and the discrimination module according to the degree of difference, until the degree of difference satisfies a preset condition.
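Claim 4 describes the familiar alternating schedule of adversarial training. A minimal sketch, assuming a binary cross-entropy criterion and the usual convention (real sample = positive class, generated image = negative class); the optimizers and update order are illustrative assumptions:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(generator, discriminator, g_opt, d_opt, original, real_sample):
    # Discriminator update: widen the measured difference by scoring the
    # real sample as positive and the generated image as negative.
    with torch.no_grad():
        fake = generator(original)
    real_logits = discriminator(real_sample)
    fake_logits = discriminator(fake)
    d_loss = (bce(real_logits, torch.ones_like(real_logits)) +
              bce(fake_logits, torch.zeros_like(fake_logits)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: narrow the difference by pushing the generated
    # image toward the positive class.
    fake_logits = discriminator(generator(original))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Repeating this step until the losses stabilize corresponds to adjusting both modules "until the degree of difference satisfies a preset condition".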
  5. The method according to claim 4, wherein discriminating the degree of difference between the generated image and the real sample image comprises:
    computing a loss function from the generated image and the real sample image; and
    outputting a discrimination result according to the computed value of the loss function, the discrimination result reflecting the degree of difference between the generated image and the real sample image.
  6. The method according to claim 5, wherein the loss function comprises a feature-space loss function, the feature-space loss function reflecting the difference between the generated image and the real sample image in a feature space.
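Claim 6 leaves the feature space unspecified. One common realization compares activations of a fixed pretrained network, as in this sketch; the choice of VGG16, of its first 16 layers, and of a recent torchvision with the `weights` API are all assumptions:

```python
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# A frozen, ImageNet-pretrained VGG16 stands in for "a feature space".
extractor = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def feature_space_loss(generated, real_sample):
    # Single-channel font images are repeated to the 3 channels VGG expects.
    g = generated.repeat(1, 3, 1, 1) if generated.size(1) == 1 else generated
    r = real_sample.repeat(1, 3, 1, 1) if real_sample.size(1) == 1 else real_sample
    # L1 distance between the two images' feature maps.
    return F.l1_loss(extractor(g), extractor(r))
```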
  7. The method according to claim 5, wherein the loss function comprises an adversarial loss function, the adversarial loss function reflecting the contribution of the element image generation module to reducing the degree of difference and the contribution of the discrimination module to increasing the degree of difference.
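Written out, the adversarial loss of claim 7 is the usual two-player objective: the discrimination module adjusts its parameters to raise it, the generation module to lower it. A sketch, assuming the discriminator outputs probabilities in (0, 1):

```python
import torch

def adversarial_loss(d_real, d_fake, eps=1e-8):
    # d_real, d_fake: discriminator probabilities for the real sample image
    # and the generated image. The discriminator maximizes this quantity
    # (increasing the measured difference); the generator minimizes it.
    # eps is a numerical floor, an implementation assumption.
    return torch.mean(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps))
```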
  8. The method according to claim 5, wherein the loss function comprises at least one of the following:
    a pixel-space loss function, the pixel-space loss function reflecting the difference between the generated image and the real sample image at corresponding pixels; and
    a category loss function, the category loss function reflecting the difference in category between the generated image and the real sample image.
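Both terms of claim 8 are short in practice. In the sketch below, the L1 pixel criterion and the auxiliary classifier head producing `class_logits` are assumptions; the claim does not prescribe either:

```python
import torch.nn.functional as F

def pixel_space_loss(generated, real_sample):
    # Mean absolute difference at corresponding pixel positions.
    return F.l1_loss(generated, real_sample)

def category_loss(class_logits, true_class):
    # Cross-entropy between the predicted category of the generated image
    # and the category of the real sample image.
    return F.cross_entropy(class_logits, true_class)
```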
  9. The method according to claim 4, wherein after generating, by the element image generation module, the generated image corresponding to the original image based on the original image, the method further comprises:
    testing the generated image to determine a degree of matching between the generated image and the real sample image.
  10. The method according to claim 9, wherein testing the generated image generated by the element image generation module and determining the degree of matching between the generated image and the real sample image comprises:
    determining the degree of matching between the generated image and the real sample image according to at least one of the following metrics:
    the L1 distance between the generated image and the real sample image;
    the peak signal-to-noise ratio (PSNR) between the generated image and the real sample image; and
    the structural similarity (SSIM) between the generated image and the real sample image.
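The three matching metrics of claim 10 can be computed as follows. Images are assumed normalized to [0, 1], and the SSIM here is the simplified single-window form rather than the sliding Gaussian-window version of the original SSIM paper:

```python
import torch
import torch.nn.functional as F

def l1_distance(x, y):
    return F.l1_loss(x, y).item()

def psnr(x, y, max_val=1.0):
    # Peak signal-to-noise ratio in decibels.
    mse = F.mse_loss(x, y)
    return (10 * torch.log10(max_val ** 2 / mse)).item()

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Whole-image SSIM: compares luminance (means), contrast (variances)
    # and structure (covariance) of the two images in a single window.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))).item()
```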
  11. The method according to claim 9, wherein after determining the degree of matching between the generated image and the real sample image, the method further comprises:
    adjusting parameters in the element image generation module according to the degree of matching between the generated image and the real sample image.
  12. The method according to any one of claims 1 to 3 and 5 to 11, wherein the element image is a text font.
  13. An element image generation apparatus, comprising:
    a first feature map generation unit, configured to generate a first feature map based on an initial element image;
    a second feature map generation unit, configured to generate a second feature map based on the first feature map; and
    a target element image generation unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of a plurality of second feature maps;
    wherein at least one of the second feature map generation units is further configured to:
    generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  14. An element image generation system, comprising an element image generation module, the element image generation module comprising an encoding sub-module and a decoding sub-module, the encoding sub-module comprising M levels of encoding units connected stage by stage, and the decoding sub-module comprising M levels of decoding units connected stage by stage, M being a natural number, wherein:
    the first-level encoding unit is configured to generate a first feature map based on an initial element image;
    the second-level to Mth-level encoding units are configured to generate a second feature map based on the feature map generated by the preceding-level encoding unit; and
    the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of a plurality of second feature maps.
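The system of claim 14 pairs M encoding units with M decoding units. The sketch below realizes this as a U-Net-style generator in which each encoder level's feature map is reused by the mirrored decoder level; the skip connections, channel widths, and activations are assumptions beyond what the claim requires:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, channels=(1, 64, 128, 256)):   # M = 3 levels here
        super().__init__()
        self.enc = nn.ModuleList(
            nn.Conv2d(cin, cout, 4, stride=2, padding=1)
            for cin, cout in zip(channels[:-1], channels[1:]))
        rev = channels[::-1]
        # Decoder inputs double after the first level because the matching
        # encoder feature map is concatenated in.
        self.dec = nn.ModuleList(
            nn.ConvTranspose2d(cin * (1 if i == 0 else 2), cout, 4,
                               stride=2, padding=1)
            for i, (cin, cout) in enumerate(zip(rev[:-1], rev[1:])))

    def forward(self, x):
        feats = []
        for conv in self.enc:                  # encoding sub-module
            x = F.leaky_relu(conv(x), 0.2)
            feats.append(x)
        y = feats[-1]
        for i, deconv in enumerate(self.dec):  # decoding sub-module
            if i > 0:                          # reuse this level's
                y = torch.cat([y, feats[-1 - i]], dim=1)   # encoder map
            y = deconv(y)
            y = torch.tanh(y) if i == len(self.dec) - 1 else F.relu(y)
        return y                               # target element image
```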
  15. The system according to claim 14, further comprising:
    an element image discrimination module, configured to discriminate a degree of difference between a generated image and a real sample image;
    wherein
    the generated image is generated by the element image generation module based on an original image;
    the real sample image corresponds to the original image, the two forming an element image pair; and
    the degree of difference is used to alternately adjust parameters in the element image generation module and the discrimination module, so that the degree of difference satisfies a preset condition.
  16. An electronic device, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  17. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the following operations:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  18. A Chinese character font image generation method, comprising:
    generating a first feature map based on an initial Chinese character font image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image, and the downsampled Chinese character font image matching the first feature map in spatial size.
  19. A text font image generation method, comprising:
    generating a first feature map based on an initial text font image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target text font image corresponding to the initial text font image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image, and the downsampled text font image matching the first feature map in spatial size.
PCT/CN2019/081217 2018-04-10 2019-04-03 Element image generation method, device and system WO2019196718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810315058.3A CN110363830B (en) 2018-04-10 2018-04-10 Element image generation method, device and system
CN201810315058.3 2018-04-10

Publications (1)

Publication Number Publication Date
WO2019196718A1

Family

ID=68163428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081217 WO2019196718A1 (en) 2018-04-10 2019-04-03 Element image generation method, device and system

Country Status (2)

Country Link
CN (1) CN110363830B (en)
WO (1) WO2019196718A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308094B (en) * 2020-11-25 2023-04-18 创新奇智(重庆)科技有限公司 Image processing method and device, electronic equipment and storage medium
CN114169255B (en) * 2022-02-11 2022-05-13 阿里巴巴达摩院(杭州)科技有限公司 Image generation system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1077445A2 (en) * 1999-08-19 2001-02-21 Adobe Systems, Inc. Device dependent rendering of characters
CN104268549A (en) * 2014-09-16 2015-01-07 天津大学 Anti-noise multi-scale local binary pattern characteristic representation method
CN104794504A (en) * 2015-04-28 2015-07-22 浙江大学 Graphic pattern text detection method based on deep learning
CN107644006A (en) * 2017-09-29 2018-01-30 北京大学 A kind of Chinese script character library automatic generation method based on deep neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709875B (en) * 2016-12-30 2020-02-18 北京工业大学 Compressed low-resolution image restoration method based on joint depth network
CN107392973B (en) * 2017-06-06 2020-01-10 中国科学院自动化研究所 Pixel-level handwritten Chinese character automatic generation method, storage device and processing device
CN107330954A (en) * 2017-07-14 2017-11-07 深圳市唯特视科技有限公司 A kind of method based on attenuation network by sliding attribute manipulation image

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767935A (en) * 2019-10-31 2020-10-13 杭州海康威视数字技术股份有限公司 Target detection method and device and electronic equipment
CN111767935B (en) * 2019-10-31 2023-09-05 杭州海康威视数字技术股份有限公司 Target detection method and device and electronic equipment
CN112070658A (en) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style migration method based on deep learning
CN112070658B (en) * 2020-08-25 2024-04-16 西安理工大学 Deep learning-based Chinese character font style migration method

Also Published As

Publication number Publication date
CN110363830B (en) 2023-05-02
CN110363830A (en) 2019-10-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1