WO2019196718A1 - Element image generation method, device and system - Google Patents

Element image generation method, device and system

Info

Publication number
WO2019196718A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
generating
element image
feature
Prior art date
Application number
PCT/CN2019/081217
Other languages
French (fr)
Chinese (zh)
Inventor
孙东慧
张庆
唐浩超
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2019196718A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Definitions

  • the present application relates to the field of computer technology, and in particular, to an element image generation method, apparatus, and system.
  • When constructing an image library of graphic elements (referred to as an element image library), it is often necessary to create a series of element images of the same style, and the designer must design and produce each element image in the element image library one by one, which is time-consuming and labor-intensive.
  • Taking Chinese character libraries as an example, the character set is very large: the GB2312 national standard contains 6,763 commonly used Chinese characters, the GBK encoding contains 21,886 Chinese characters, and the latest GB18030 national standard contains more than 70,044 Chinese characters. Since every new font is designed and produced character by character by the font designer so that all characters share the same style, the workload is very heavy.
  • The embodiments of the present application provide an element image generation method and apparatus, aiming to generate element images efficiently and accurately, so as to improve the efficiency of constructing an element image library and the accuracy of the constructed element images.
  • an embodiment of the present application provides an element image generating method, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • the method is performed by an element image generating module;
  • the element image generating module includes an encoding sub-module, and the encoding sub-module includes M levels of coding units connected stage by stage, where M is a natural number;
  • the step of generating a first feature map based on the initial element image is performed by the first-level coding unit;
  • the step of generating the second feature map is performed by the second-level to Mth-level coding units;
  • the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one coding unit among the second-level to Mth-level coding units.
  • the method further includes:
  • the parameters in the element image generating module and the discriminating module are alternately adjusted according to the degree of difference until the degree of difference satisfies a preset condition.
  • determining a degree of difference between the generated image and the real sample image includes:
  • a discrimination result is output according to a calculation result of the loss function, the discrimination result being used to reflect a degree of difference between the generated image and the real sample image.
  • the loss function includes a feature space loss function for reflecting the difference in feature space between the generated image and the real sample image.
  • the loss function includes an adversarial loss function for reflecting the degree to which the element image generating module contributes to reducing the difference and the degree to which the discriminating module contributes to increasing the difference.
  • the loss function comprises at least one of the following:
  • a pixel space loss function for reflecting the difference between the generated image and the real sample image at corresponding pixel points;
  • a category loss function for reflecting the difference in category between the generated image and the real sample image.
  • the method further includes:
  • the generated image is tested to determine the degree to which it matches the real sample image.
  • testing the generated image produced by the element image generation model and determining its degree of matching with the real sample image includes:
  • determining the structural similarity (SSIM) between the generated image and the real sample image.
  • the method further includes:
  • the element image is a text font.
  • an embodiment of the present application provides an element image generating apparatus, where the apparatus includes:
  • a feature map first generating unit, configured to generate a first feature map based on the initial element image;
  • a feature map second generating unit, configured to generate a second feature map based on the first feature map;
  • a target element image generating unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit is further configured to:
  • generate the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
  • An embodiment of the present application provides an element image generating system, where the system includes an element image generating module; the element image generating module includes an encoding sub-module and a decoding sub-module, the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second-level coding unit to an Mth-level coding unit, configured to generate a second feature map based on the feature map generated by the previous-level coding unit;
  • a decoding sub-module configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps.
  • the system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image to form an element image pair
  • the degree of difference is used to alternately adjust parameters in the element image generating module and the discriminating module such that the degree of difference satisfies a preset condition.
  • an electronic device including:
  • a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • An embodiment of the present application provides a computer readable storage medium storing one or more programs that, when executed by an electronic device including multiple applications, cause the electronic device to perform the following operations:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the spatial size of the downsampled element image matching that of the first feature map.
  • the embodiment of the present application provides a method for generating a Chinese character font image, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • the embodiment of the present application provides a text font image generating method, including:
  • the step of iteratively generating the second feature map includes, at least once:
  • The step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps, and during the iterative execution the downsampled element image is introduced at least once as supplementary information for generating the second feature map. On this basis, a target element image corresponding to the initial element image is generated based on the first feature map and at least one of the plurality of second feature maps.
  • The technical solution provided by the embodiments of the present application can not only efficiently generate target element images of different styles from the initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
  • FIG. 1 is a schematic flowchart of an element image generating method according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an element image generation model used in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a coding unit in an element image generation model used in an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a decoding unit in an element image generation model used in an embodiment of the present application
  • FIG. 5 is a schematic flowchart of generating an image of a target element in an embodiment of the present application
  • FIG. 6 is a schematic diagram of a processing procedure of a first-level coding unit in an element image generation model used in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a processing procedure of coding units of each level in an element image generation model used in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an element image generation model training network used in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a discriminating module in an element image generation model training network used in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a model for calculating a feature space loss function according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an element image generating apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an element image generating system according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • CNN: Convolutional Neural Network.
  • Feature map: a representation of the features output by a convolutional layer, pooling layer, fully connected layer, or other layer in the network.
  • An embodiment of the present application provides an element image generating method.
  • The element images to which the embodiments of the present application apply may include various graphic elements, such as character fonts, notation symbols on a musical score, and cartoon character shapes.
  • The purpose of the embodiments of the present application is to determine a style from a small set of element image samples (it can be understood that these samples are usually designed by the user one by one and serve as the basis for determining the style features), and to automatically generate new element images corresponding to the remaining element images in the collection, so that the generated new element images are consistent with the style of the user-designed samples. In this way, element image collections of different styles can be generated efficiently and accurately, realizing automatic expansion of graphic collections in different styles.
  • the element image is embodied as a cartoon image.
  • For example, cartoon images of animals can be designed manually from the original images of some animals, and the original images together with the cartoon images (serving as real sample images) are used as input to train an element image generation model (also referred to as an element image generation module).
  • After training, the images of other animals can be input into the trained element image generation model as initial element images, and cartoon images consistent in style with the hand-designed ones are generated automatically.
  • the element image is embodied as a Chinese font.
  • Each Chinese font requires a corresponding font library.
  • One can manually design a small set of characters in the new font (for example, 1,000 characters) to determine the style of the new font, and then use the original-font images of these characters (for example, in the Song typeface) as input.
  • The new-font images of these characters are used as real sample images to train the element image generation model.
  • Afterwards, the original-font images (for example, Song) of the remaining Chinese characters are used as initial element images, and the trained element image generation model (also referred to as an element image generation module or element image generation network) generates those characters in the new font.
  • In this way, new fonts for Chinese characters can be generated efficiently, improving the efficiency of building font libraries.
  • The element image generation model used in the embodiments of the present application (embodied as the font generation model in the above application scenario) receives the initial element image multiple times (embodied as the original font in the above scenario): it accepts the original initial element image, and also accepts, as supplementary information, downsampled element images obtained by downsampling the original initial element image, thereby reducing information loss during data processing and improving the accuracy of the generated target element image (embodied as the new font in the above scenario).
  • an embodiment of the present application provides an element image generation method, which may include:
  • S101 Generate a first feature map based on the initial element image.
  • S103 Generate a second feature map based on the first feature map.
  • S105: Use the second feature map as a new first feature map, and iteratively perform step S103 to obtain a plurality of second feature maps.
  • S107: Generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • the step of generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
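  • As an illustration of steps S101 to S107, the following sketch (in PyTorch-style Python; the helper names and the choice of bilinear interpolation are illustrative assumptions, not taken from the application) shows the iterative generation of second feature maps, with the downsampled element image injected as supplementary information:

```python
import torch
import torch.nn.functional as F

def generate_feature_maps(x, coding_units):
    """Sketch of S101-S105: coding_units is a list of M stage-by-stage
    coding units; the first produces the first feature map, the rest
    iteratively produce the second feature maps."""
    feats = [coding_units[0](x)]        # S101: first feature map
    for unit in coding_units[1:]:       # S103/S105: iterate
        prev = feats[-1]
        # At least once, inject the initial image, downsampled so that its
        # spatial size matches the current feature map (supplementary info).
        x_down = F.interpolate(x, size=prev.shape[-2:], mode='bilinear',
                               align_corners=False)
        feats.append(unit(torch.cat([prev, x_down], dim=1)))
    return feats                        # consumed by the decoder in S107
```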
  • the initial element image can be determined first.
  • the determined initial element image can be understood as the basis for generating the target element image, and the generated target element image corresponds to the initial element image.
  • Depending on the application, the correspondence between the target element image and the initial element image may differ, and it is implemented by a trained element image generation model (also referred to as an element image generation module).
  • the corresponding target element image and the initial element image reflect different fonts (styles) of the same Chinese character.
  • When determining the initial element image, this can also be understood as determining a batch of images that are input into the model in a single pass according to the batch size.
  • The batch size refers to the number of images input into the model for processing at one time, and is a parameter of a model built on a neural network. For example, if the batch size is 1, only one initial element image is input into the model at a time; if the batch size is 16, then 16 initial element images are input into the model as one batch each time, and 16 target element images are generated correspondingly.
  • the above steps S101 to S107 may be performed to generate a target element image corresponding to the initial element image.
  • Specifically, the target element image corresponding to the initial element image may be generated by the trained element image generation model, based on the initial element image and the downsampled element images obtained by downsampling the initial element image.
  • the step of generating the second feature map is iteratively performed to obtain a plurality of second feature maps.
  • Moreover, during the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map, and the target element image is then generated based on the first feature map and at least one of the plurality of second feature maps.
  • The technical solution provided by the embodiments of the present application can not only efficiently generate target element images of different styles from the initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
  • In other words, the model accepts the original initial element image and also accepts, at least once, a downsampled element image obtained by downsampling it.
  • Target element images, including new element images, can thus be generated efficiently and accurately from the initial element image while reducing information loss during data processing, which helps improve the accuracy of the generated target element images.
  • In the following, an element image embodied as a text font is used as an example: a specific framework of the element image generation model (element image generation module) is illustrated, and the element image generation method provided by the embodiments of the present application is described in detail in combination with this model.
  • FIG. 2 is a schematic diagram showing the framework of an element image generation model suitable for the embodiment of the present application.
  • the element image generation model shown in Fig. 2 includes a generation module including an encoding sub-module (Encoder) and a decoding sub-module (Decoder).
  • the input of the model is an initial element image
  • the output of the model is a target element image corresponding to the initial element image.
  • The input initial element image is text in the original font (for example, the " " character in bold in FIG. 2), and the output target element image is the same text in the new font (for example, the " " character in the new typeface in FIG. 2).
  • The encoding sub-module in the above element image generation model is used to convert the initial element image into high-dimensional feature maps, and the decoding sub-module is used to convert the high-dimensional feature maps into a new image, that is, the output target element image. It can be understood that the data processing performed by the encoding sub-module and that performed by the decoding sub-module are symmetric and mutually inverse.
  • The coding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number. Taking the model shown in FIG. 2 as an example, the value of M is 8.
  • the step of generating the first feature map based on the initial element image is performed by the first level coding unit (such as the coding unit e1 in FIG. 2).
  • The first-level coding unit generates a first feature map based on the initial element image and outputs the first feature map to the second-level coding unit (such as coding unit e2 in FIG. 2).
  • the first level coding unit may also output the first feature map to the decoding unit corresponding to the first level coding unit in the decoding submodule (such as the decoding unit d7 in FIG. 2).
  • The steps of generating the second feature map based on the first feature map (and the downsampled element image) in steps S103 to S105 above are performed by the second-level to Mth-level coding units. Moreover, when generating the second feature map, at least one coding unit among the second-level to Mth-level coding units (for example, at least one of coding units e2 to e8 in FIG. 2) accepts, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained from the initial element image, whose spatial size matches that of the feature map output by the previous-level coding unit.
  • Each coding unit in the coding sub-module has the same structure and is connected stage by stage in order, and each coding unit outputs a feature map to the next-level coding unit (the feature map output by the last-level coding unit is accepted by the first-level decoding unit).
  • The coding unit may be composed of several convolutional layers (Conv), leaky rectified linear unit layers (LReLU), and batch normalization layers (BN).
  • The image or feature map input to the coding unit may be processed and output through the leaky rectified linear unit layer LReLU, the convolutional layer Conv, and the batch normalization layer BN, as shown in FIG. 3.
  • In practice, the number of layers and their order can be adjusted: the convolutional layer Conv and the leaky rectified linear unit layer LReLU are usually paired (for example, arranged as Conv-LReLU-Conv-LReLU or LReLU-Conv-LReLU-Conv), and the batch normalization layer BN can be placed at the end of the coding unit.
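  • As a minimal sketch, such a coding unit could be written as follows in PyTorch (the kernel size, stride, and LReLU slope are assumptions consistent with the halving of spatial size described below, not values fixed by the application):

```python
import torch.nn as nn

class CodingUnit(nn.Module):
    """One encoder stage in the LReLU-Conv-BN arrangement; the stride-2
    convolution halves both the width and the height of the feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)
```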
  • Each decoding unit in the decoding sub-module has the same structure and is likewise connected stage by stage in order, and the number of decoding units is the same as the number of coding units.
  • The decoding unit may be composed of a rectified linear unit layer (ReLU), a deconvolution layer (Deconv), a batch normalization layer (BN), and a dropout layer (Dropout).
  • The feature map input to the decoding unit may be processed and output through the rectified linear unit layer ReLU, the deconvolution layer Deconv, the batch normalization layer BN, and the dropout layer Dropout, as shown in FIG. 4. Similarly, the number of layers and their order can be adjusted while still processing the feature maps and finally generating the target element image.
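  • Correspondingly, a decoding unit might look like the following sketch (again with assumed hyperparameters; the stride-2 transposed convolution matches the doubling of spatial size described below):

```python
import torch.nn as nn

class DecodingUnit(nn.Module):
    """One decoder stage in the ReLU-Deconv-BN-Dropout arrangement; the
    stride-2 transposed convolution doubles the width and the height."""
    def __init__(self, in_ch, out_ch, p_drop=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(),
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2,
                               padding=1),
            nn.BatchNorm2d(out_ch),
            nn.Dropout(p_drop),
        )

    def forward(self, x):
        return self.block(x)
```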
  • Each time a feature map is processed by one level of coding unit, the spatial size of the output feature map is reduced to half of that of the input feature map (specifically, the width of the feature map is halved and the height is halved).
  • Correspondingly, each time a feature map is processed by one level of decoding unit, the spatial size of the output feature map is increased to twice that of the input feature map (specifically, the width of the feature map is doubled and the height is doubled).
  • A coding unit and a decoding unit that satisfy a certain condition correspond to each other and may be called a symmetric pair. Specifically, a coding unit and a decoding unit whose output feature maps have the same spatial size have a corresponding relationship.
  • The size representation of an image can be understood as follows: an image of size 256×256×3 has 256 pixels in both width and height and is represented in the RGB color model, so each pixel is described by three-dimensional attribute data.
  • The spatial sizes of the feature maps output by the 8 levels of coding units in the coding sub-module shown in FIG. 2 are listed in Table 1; following the halving rule they are: e1: 128×128, e2: 64×64, e3: 32×32, e4: 16×16, e5: 8×8, e6: 4×4, e7: 2×2, e8: 1×1.
  • That is, the spatial size changes from the 256×256 of the initial element image to 128×128 after coding unit e1 (the first-level coding unit), then to 64×64 after coding unit e2 (the second-level coding unit), and so on, until it reaches 1×1 after coding unit e8 (the eighth-level coding unit).
  • The number of convolution kernels in each coding unit may differ, resulting in different channel counts in each level's output; preferably, the channel count increases with the level of the coding unit.
  • Similarly, the spatial sizes of the feature maps output by the 8 levels of decoding units in the decoding sub-module shown in FIG. 2 are listed in Table 2; following the doubling rule they are: d1: 2×2, d2: 4×4, d3: 8×8, d4: 16×16, d5: 32×32, d6: 64×64, d7: 128×128, d8: 256×256.
  • the structure of the element image generation model and the correspondence between the coding unit and the decoding unit are briefly explained above.
  • The following describes how, using the trained element image generation model, a target element image corresponding to the initial element image is generated from the initial element image and the downsampled element images obtained by downsampling it; this may specifically include the following steps, as shown in FIG. 5:
  • S1031: Input the initial element image and the downsampled element images into the encoding sub-module.
  • In addition to the initial element image, the downsampled element images obtained by downsampling it are also input into the encoding sub-module, which reduces information loss during data processing and thus yields a more accurate target element image.
  • the first level coding unit (for example, the coding unit e1 in FIG. 2) can directly accept the original initial element image, as shown in FIG. 6.
  • Any Nth-level coding unit after the first level (N being a natural number greater than 1 and not greater than M) may accept, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained by downsampling the initial element image, as shown in FIG. 7.
  • The spatial size of the downsampled element image that a coding unit can accept corresponds to that coding unit; specifically, it should be consistent with the spatial size of the feature map that the coding unit receives from the previous level, so that the coding unit can fuse the two and then perform subsequent data processing.
  • The downsampled element images can be input into every coding unit after the first level (for example, coding units e2 to e8 in FIG. 2), or only into some of the coding units (for example, coding units e2 and e7 in FIG. 2), as long as the spatial size of each downsampled element image is consistent with the spatial size of the previous-level feature map received by the coding unit it enters.
  • For example, the spatial size of the feature map output by coding unit e1 to coding unit e2 is 128×128, so the downsampled element image input into coding unit e2 should also be processed to a spatial size of 128×128.
  • In this way, layers at different depths of the model are supplemented with information from the initial element image at the corresponding scale, which facilitates generating high-quality target element images.
  • The initial element image may be downsampled using any of several methods, such as bilinear interpolation, linear interpolation, or nearest-neighbor interpolation; this application does not limit the method.
  • The coding unit fuses the received feature map with the downsampled element image for subsequent processing.
  • The fusion can be performed in various ways, for example, concatenating along a specific dimension or adding the attribute values of corresponding pixel points.
  • In the embodiment of the present application, the fusion is performed by concatenation along the channel dimension of the feature map.
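  • Concretely, downsampling the initial element image to the required spatial size and fusing it with the previous-level feature map along the channel dimension can be sketched as follows (the interpolation mode and the (B, C, H, W) tensor layout are assumptions):

```python
import torch
import torch.nn.functional as F

def fuse_with_downsampled(feat, image):
    """feat: (B, C, H, W) feature map from the previous-level coding unit;
    image: (B, C_img, 256, 256) initial element image. Returns the
    channel-wise concatenation used as the coding unit's input."""
    image_down = F.interpolate(image, size=feat.shape[-2:], mode='bilinear',
                               align_corners=False)
    return torch.cat([feat, image_down], dim=1)   # concat on channel dim
```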
  • S1033 Output multiple feature maps to the decoding submodule by using the encoding submodule.
  • the coding unit and the decoding unit having the same spatial size of the output feature map have a corresponding relationship, and constitute a symmetric unit pair, for example, ⁇ e1, d7>, ⁇ e3, d5>, and the like.
  • The output of a coding unit is directly connected to its corresponding decoding unit. Therefore, when outputting its feature map, each coding unit in the coding sub-module outputs the generated feature map both to its next-level coding unit (the feature map generated by the last-level coding unit is output to the first-level decoding unit) and to its corresponding decoding unit.
  • Specifically, the first-level coding unit (embodied as coding unit e1) generates a first feature map from the initial element image; the first-level coding unit (e1) outputs the first feature map to the second-level coding unit (embodied as coding unit e2) and to the decoding unit corresponding to the first-level coding unit (embodied as decoding unit d7);
  • the Kth-level coding unit in the coding sub-module (embodied as any of coding units e2 to e7, for example coding unit e3) generates a third feature map from the downsampled element image and the second feature map output by the (K-1)th-level coding unit (correspondingly, coding unit e2); the Kth-level coding unit (for example, coding unit e3) outputs the third feature map to the (K+1)th-level coding unit (correspondingly, coding unit e4) and to the decoding unit corresponding to the Kth-level coding unit (correspondingly, decoding unit d5); where K is a natural number greater than 1 and less than M, and the downsampled element image matches the second feature map in spatial size.
  • S1035 Generate a target element image corresponding to the initial element image according to the plurality of feature maps by using the decoding submodule.
  • Each level of decoding unit in the decoding sub-module processes the feature map received from the previous-level decoding unit, and the feature map it outputs is then fused with the feature map transmitted by the corresponding coding unit to serve as the input of the next-level decoding unit. Specifically:
  • the first-level decoding unit (embodied as decoding unit d1) generates a sixth feature map from the fifth feature map output by the last-level coding unit (embodied as coding unit e8), and outputs the sixth feature map to the second-level decoding unit (embodied as decoding unit d2);
  • the Lth-level decoding unit in the decoding sub-module (embodied as any of decoding units d2 to d7, for example decoding unit d3) generates an eighth feature map from the seventh feature map output by the (L-1)th-level decoding unit (correspondingly, decoding unit d2);
  • the Lth-level decoding unit (for example, decoding unit d3) concatenates, along the channel dimension, the eighth feature map with the ninth feature map output by the coding unit corresponding to the Lth-level decoding unit (correspondingly, coding unit e5), generating a tenth feature map, which is output to the (L+1)th-level decoding unit (correspondingly, decoding unit d4);
  • where L is a natural number greater than 1 and less than M;
  • the Mth-level decoding unit in the decoding sub-module generates the target element image corresponding to the initial element image from the eleventh feature map output by the (M-1)th-level decoding unit.
  • It can be seen that a coding unit in the coding sub-module accepts two kinds of input: the feature map output by the previous-level coding unit and the downsampled element image obtained by downsampling the initial element image, so that information from the initial element image flows in at different stages of the coding sub-module.
  • A decoding unit in the decoding sub-module accepts, in addition to the feature map output by the previous-level decoding unit, the feature map output directly by the corresponding coding unit in the coding sub-module, which further reduces information loss during image data processing.
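  • Putting the pieces together, the generator's forward pass might be sketched as follows (a hypothetical M=8 arrangement consistent with FIG. 2; channel counts and which units receive downsampled inputs are assumptions):

```python
import torch
import torch.nn.functional as F

def generator_forward(x, encoders, decoders):
    """x: (B, C, 256, 256) initial element image; encoders e1..e8 each halve
    the spatial size, decoders d1..d8 each double it."""
    feats = [encoders[0](x)]                      # e1 -> first feature map
    for enc in encoders[1:]:                      # e2..e8
        prev = feats[-1]
        x_down = F.interpolate(x, size=prev.shape[-2:], mode='bilinear',
                               align_corners=False)   # supplementary info
        feats.append(enc(torch.cat([prev, x_down], dim=1)))

    out = decoders[0](feats[-1])                  # d1 processes e8's output
    for dec, skip in zip(decoders[1:], feats[-2::-1]):   # d2..d8 with e7..e1
        # fuse the previous decoder output with the symmetric-pair encoder
        # feature map on the channel dimension
        out = dec(torch.cat([out, skip], dim=1))
    return out                                    # 256x256 target element image
```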
  • As described above, the trained element image generation model may be used to generate a target element image corresponding to the initial element image: the model generates the target element image from the initial element image and the downsampled element images obtained by downsampling the initial element image.
  • On the one hand, the target element image can be generated automatically from the initial element image, so that target element images of different styles can be expanded efficiently from initial element images, improving the efficiency of constructing an element image library.
  • On the other hand, the downsampled element images obtained by downsampling the initial element image are accepted as supplementary information when generating the feature maps, which reduces information loss during data processing and benefits the accuracy of the generated target element image.
  • the above example illustrates a specific implementation example of how to generate a target element image using the trained element image generation model.
  • the element image generation network including the element image generation module and the discrimination module may be specifically trained, as shown in FIG. 8.
  • The element image generating module (hereinafter simply referred to as the generating module) is configured to generate, based on an original image, a generated image corresponding to it; the discriminating module is configured to discriminate the degree of difference between the generated image and the real sample image, and the parameters of the element image generation network are adjusted according to this degree of difference; the real sample image corresponds to the original image, and the two constitute an element image pair.
  • the discriminating module determines whether the input image is a real sample image or a generated image generated by the model.
  • The structure of the discriminating module can be as shown in FIG. 9: the generated image and the real sample image enter the discriminating module, whose first two layers are a convolutional layer Conv and a leaky rectified linear unit layer LReLU, followed by three [Conv - BN (batch normalization) - LReLU] blocks connected in series.
  • the process of training the element image generation network may specifically include:
  • the generated image and the real sample image are input into the discriminating module as training samples, and the discriminating module determines the degree of difference between the generated image and the real sample image; the generated image is labeled as the negative class and the real sample image as the positive class;
  • the parameters in the generating module and the discriminating module are alternately adjusted until the degree of difference satisfies the preset condition.
  • The goal of the generating module is to make the generated image as realistic as possible, so that the discriminating module can be "fooled" (that is, the discriminating module considers that there is no difference between the generated image and the real sample image, or that the difference is sufficiently small).
  • The goal of the discriminating module is to correctly distinguish the real sample image from the generated image. Therefore, the generating module and the discriminating module can be trained alternately. The details are as follows:
  • The system initializes the generating module (Generator) and the discriminating module (Discriminator), denoted G0 and D0 respectively. The system accepts one batch of input images (the number of inputs equals the batch size), where each input is a pair consisting of an original image and the corresponding real sample image. The original images are then fed into the generating module G0, and after a series of data processing, new images are generated, that is, the generated images.
  • the real sample image is marked as a positive class
  • the generated image is marked as a negative class
  • the two are input as a training sample to the discriminating module D0.
  • First, the generating module G0 is held fixed, and the parameters of the discriminating module D0 are updated according to the calculation result of the loss function, so that D0 is updated to a new state denoted D1. Then D1 is held fixed, and the parameters of the generating module G0 are updated according to the calculation result of the loss function, so that G0 is updated to a new state denoted G1.
  • The generating module G and the discriminating module D are trained alternately in this way throughout the training process, until the calculation result of the loss function satisfies the preset requirement and the two reach their optimal states.
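  • The alternating procedure can be sketched as follows (a hypothetical training step assuming the discriminating module outputs a probability; the loss form and optimizers are illustrative, and the application's full loss combines several terms as described below):

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, original, real_sample):
    """One alternating update: D is updated with G fixed, then G with D fixed."""
    # Discriminator step: real sample images are the positive class,
    # generated images the negative class; detach() keeps G fixed.
    fake = G(original).detach()
    pred_real, pred_fake = D(real_sample), D(fake)
    d_loss = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
              F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: try to make D accept the generated image as real.
    pred_fake = D(G(original))
    g_loss = F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```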
  • After training, the initial element image can be input into the model in place of the original image of the training process, and the generated image produced by the model is the target element image desired by the user.
  • During training, an unsupervised learning method is adopted, in which the two neural networks, the generating module and the discriminating module, learn from each other.
  • the output of the generated module needs to mimic the real sample image in the training set as much as possible; and the purpose of the discriminating module is to distinguish the real sample image from the generated image.
  • the two modules compete against each other and constantly adjust the parameters.
  • the final purpose is to make the discriminating module unable to judge whether the output result of the generating module is true.
  • the loss function can be calculated according to the generated image and the real sample image, thereby outputting the discrimination result according to the calculation result of the loss function, reflecting the degree of difference between the generated image and the real sample image.
  • the parameters in the element image generation model can be adjusted according to the calculation result of the loss function.
  • the loss function can have various options.
  • For example, an adversarial loss function can be included.
  • The adversarial loss function is used to reflect the contribution of the generating module to reducing the difference and the contribution of the discriminating module to increasing it. It can be understood that the generating module continuously produces new generated images in the hope of passing the discriminating module's evaluation, while the discriminating module hopes to correctly separate the generated images (labeled as the negative class) from the real sample images (labeled as the positive class). Therefore, the adversarial loss function can be expressed by the following formula:

        V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 − D(G(z)))]
  • where pdata(x) represents the target data distribution, that is, the real sample images input to the discriminating module; pz(z) represents the raw data distribution, that is, the original images input to the generating module; and G(z) represents the generated image produced by the model and input into the discriminating module.
  • the generation module G and the discrimination module D are alternately trained in a mutually opposing manner.
  • Specifically, the generating module G continuously produces new generated images in an attempt to pass the evaluation of the discriminating module D, while D attempts to accurately distinguish the real sample images from the generated ones. Therefore, during training, the discriminating module D assigns higher scores to real sample images and lower scores to generated images; that is, D tends to maximize D(x) and minimize D(G(z)), and hence the discriminating module D maximizes the adversarial loss V(D, G).
  • The generating module G attempts to generate realistic images, so it tends to maximize D(G(z)), thereby minimizing the adversarial loss V(D, G).
  • In addition, a feature space loss function (also known as perceptual loss) can be introduced as a loss function in model training.
  • the feature space loss function is used to reflect the difference in feature space between the generated image and the real sample image.
  • the feature space refers to the space corresponding to the high-dimensional features rich in semantic information after the image is passed through a deep neural network.
  • the vector of the feature space contains more advanced semantic information, so it can be used to measure the difference between the generated image and the real sample image.
  • The perceptual loss can be calculated using various networks such as AlexNet, ResNet, or VGG19.
  • the VGG19 is taken as an example for detailed description.
  • VGG19 is a typical neural network model consisting of multiple convolutional layers (Conv), pooling layers, and fully connected layers (FC); the structure of its convolutional part is shown in FIG. 10.
  • The generated image and the real sample image are each passed through the VGG19 network, the features output by selected convolutional layers are extracted, and the L1 distance between the two sets of features is calculated as the perceptual loss.
  • Because the VGG19 model has many convolutional layers, there are many possible choices; it is recommended to select convolutional layers at different depths. For example, as illustrated in FIG. 10, the features output by the five convolutional layers Conv1_2, Conv2_2, Conv3_2, Conv4_2, and Conv5_2 can be selected to calculate the perceptual loss.
  • The calculation formula is expressed as follows:

        Ploss = Σ_l λ_l · ||φ_l(real) − φ_l(fake)||_1

  • where φ represents the VGG19 network model, l is a selected convolutional layer, real represents the real sample image, fake represents the generated image, λ_l is the weight given to the perceptual loss calculated at each convolutional layer, and || · ||_1 denotes the L1 distance. The overall perceptual loss Ploss is the weighted sum of the perceptual losses calculated at the selected layers.
  • The weight of each layer can be set individually. Considering that convolutional layers near the input capture low-level information while layers closer to the output capture more abstract information, it is recommended to determine the per-layer weights on the principle that the weight of a convolutional layer near the input is smaller than the weight of a convolutional layer near the output.
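  • A sketch of this perceptual loss using a pretrained VGG19 from torchvision is shown below (the layer indices correspond to Conv1_2 through Conv5_2 in torchvision's vgg19.features, and the increasing per-layer weights follow the recommendation above; both are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """Weighted sum of L1 distances between VGG19 features of the real
    sample image and the generated image at several depths."""
    LAYERS = {2: 0.05, 7: 0.1, 12: 0.2, 21: 0.3, 30: 0.35}  # assumed weights

    def __init__(self):
        super().__init__()
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, real, fake):
        loss, x, y = 0.0, real, fake
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.LAYERS:
                loss = loss + self.LAYERS[i] * F.l1_loss(x, y)
            if i >= max(self.LAYERS):
                break
        return loss
```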
  • A pixel space loss function may also be used to reflect the difference between the generated image and the real sample image at corresponding pixel points. Since the purpose of the element image generation module is to produce a generated image that resembles the real sample image as closely as possible, the difference between the two can also be measured by comparing corresponding pixels, that is, the difference in pixel space. Specifically, the L1 distance between the real sample image and the generated image can be calculated as the result of the pixel space loss function.
  • Likewise, a category loss function can be used to reflect the difference in category between the generated image and the real sample image.
  • There are several ways to train the element image generation model, including single-stage training and multi-stage training. Taking the training of a font generation model as an example, single-stage training refers to directly generating a target font from a source font.
  • Multi-stage training is divided into a pre-training phase and a re-training phase: in the pre-training phase, a source font is fixed and used to generate multiple target fonts; in the subsequent re-training phase, the source font is used to generate one target font.
  • This approach is called a one-to-many training method.
  • It is desirable that the discriminating module, in addition to judging whether an image is a real sample image or a generated image, can also correctly predict the category (the font) of the image. Therefore, a category loss function is introduced, embodied as the cross entropy between the true category and the predicted category, as shown in the following formula:

        Lcls = − Σ_c y_c · log(p_c)

  • where y_c is 1 if c is the true category and 0 otherwise, and p_c is the predicted probability of category c.
  • After training, the quality of the generated images can also be tested and evaluated.
  • the degree to which the generated image matches the real sample image can be determined based on one or more of the following metrics:
  • For example, the L1 distance between each real sample image and the corresponding generated image in the test set can be calculated and averaged. It can be understood that the smaller the L1 distance, the closer the generated image is to the real sample image, indicating higher quality of the generated image.
  • The peak signal-to-noise ratio (PSNR) between the generated image and the real sample image can also be calculated:

        PSNR(I, J) = 10 · log10(MAX² / MSE(I, J))

  • where I and J represent two images (specifically, a generated image and a real sample image), MAX is the maximum possible pixel value, and MSE is the mean squared error between I and J.
  • the structural similarity SSIM between the generated image and the real sample image can be calculated.
  • Structural Similarity SSIM measures the difference between two images (ie, the generated image and the real sample image) from three perspectives: structural correlation, contrast, and brightness.
  • The SSIM formula for calculating two images can be expressed as:

        SSIM(X, Y) = ((2·μx·μy + C1) · (2·σxy + C2)) / ((μx² + μy² + C1) · (σx² + σy² + C2))

  • where μx, μy and σx, σy are the means and standard deviations of images X and Y, and σxy is the covariance of the two.
  • C1, C2, and C3 are constants (in the combined form above, the structure term's constant C3 is typically taken as C2/2 and absorbed into the formula).
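  • The three metrics might be computed as in the following sketch (grayscale images as float arrays in [0, 1]; the global, non-windowed SSIM shown here is a simplification of the usual sliding-window formulation):

```python
import numpy as np

def l1_distance(i, j):
    return np.abs(i - j).mean()

def psnr(i, j, max_val=1.0):
    mse = ((i - j) ** 2).mean()
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # standard stabilizing constants for images normalized to [0, 1]
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```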
  • The following takes an element image embodied as a text font as an example to illustrate the evaluation of font quality.
  • First, a single-character recognition model with good recognition performance, trained on a real font data set, is prepared; the font quality evaluation below is based on this single-character recognition model.
  • In the first step, the font images generated by the embodiment of the present application (that is, the generated images) are recognized by the single-character model. If a generated character can be correctly recognized, the newly generated character is at least preliminarily correct in glyph, strokes, and structure. The portion of characters that are not correctly recognized can be filtered out based on the recognition result.
  • In the second step, the three metrics, L1 distance, PSNR, and SSIM, can be further calculated for each generated character, giving a distribution of results for each metric. The average of each distribution is then used as a threshold: for the distributions of the PSNR and SSIM results, the characters below the average are filtered out; for the distribution of the L1 distance results, the characters above the average are filtered out.
  • In the third step, a manual evaluation is performed on the generated characters that passed the screening of the first two steps; if their quality is judged to be good, the threshold of each metric can be adjusted accordingly, and the characters that pass the adjusted thresholds are determined to be of good quality.
  • In the fourth step, the total number of good-quality characters in the verification data set, or their proportion among the generated data, is measured: the larger the number and/or the higher the proportion, the higher the quality of the generated font and the better the font generation model.
  • The above method of evaluating font quality does not merely compute a metric over a sample set. Instead, it starts from the distribution of all generated characters on each metric, uses the single-character recognition model's ability to recognize glyphs, strokes, and structure for screening, and introduces manual evaluation to adjust the threshold of each metric, thereby realizing an interactive evaluation of the quality of the generated fonts that combines the advantages of subjective and objective evaluation and reflects the overall quality of the fonts generated by the model more accurately and comprehensively. On this basis, the parameters of the model can also be adjusted according to the evaluation results.
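  • The threshold screening in steps two and three might be sketched as follows (`metrics` is a hypothetical per-character record; the field names are illustrative):

```python
import numpy as np

def screen_characters(metrics, thresholds=None):
    """metrics: list of dicts like {'char': ..., 'l1': ..., 'psnr': ..., 'ssim': ...}.
    Default thresholds are the distribution means (step two); manual
    evaluation may supply adjusted thresholds instead (step three)."""
    if thresholds is None:
        thresholds = {k: float(np.mean([m[k] for m in metrics]))
                      for k in ('l1', 'psnr', 'ssim')}
    return [m for m in metrics
            if m['psnr'] >= thresholds['psnr']    # below-threshold PSNR: out
            and m['ssim'] >= thresholds['ssim']   # below-threshold SSIM: out
            and m['l1'] <= thresholds['l1']]      # above-threshold L1: out
```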
  • the element image generating method provided by the embodiment of the present application is embodied as a Chinese character font image generating method, which may include the following steps:
  • the step of iteratively generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled Chinese font image, where the downsampled Chinese font image is obtained by downsampling the initial Chinese font image and matches the first feature map in spatial size.
  • the element image generating method is embodied as a text font image generating method, and may include the following steps:
  • the step of iteratively generating the second feature map includes, at least once:
  • The embodiment of the present application further provides an element image generating apparatus, as shown in FIG. 11.
  • a feature map first generating unit 101 configured to generate a first feature map based on the initial element image
  • a feature map second generating unit 103 configured to generate a second feature map based on the first feature map
  • the target element image generating unit 105 is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit 103 is further configured to:
  • generate the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the first feature map in spatial size.
  • the above element image generating device corresponds to the element image generating method in the foregoing embodiment.
  • the description in the foregoing method embodiments is applicable to the device, and details are not described herein again.
  • the embodiment of the present application further provides an element image generating system.
  • the system includes an element image generating module.
  • the element image generating module includes an encoding sub-module and a decoding sub-module.
  • the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second-level coding unit to an Mth-level coding unit, configured to generate a second feature map based on the feature map generated by the previous-level coding unit;
  • a decoding sub-module configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps.
  • the above element image generating system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image and constitutes an element image pair
  • the degree of difference is used to alternately adjust the parameters in the element image generating module and the discriminating module so that the degree of difference satisfies the preset condition.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes a processor, optionally including an internal bus, a network interface, and a memory.
  • the memory may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic device may also include hardware required for other services.
  • The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one double-headed arrow is shown in Figure 13, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operating instructions.
  • the memory may include both RAM and non-volatile memory, and provides instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the RAM and runs it, forming the element image generating device at the logical level.
  • the processor executes the program stored in the memory and is specifically configured to perform the following operations: generating a first feature map based on an initial element image; generating a second feature map based on the first feature map; taking the second feature map as a new first feature map and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps; and generating a target element image corresponding to the initial element image based on the first feature map and at least one of the second feature maps;
  • where the step of iteratively generating the second feature map includes, at least once: generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
  • the method performed by the element image generating apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor, or any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the electronic device can also perform the method performed by the element image generating device in FIG. 1 and implement the functions of the element image generating device in the embodiment shown in FIG. 1, which are not described herein again.
  • the embodiment of the present application further provides a computer readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by an electronic device including a plurality of applications, cause the electronic device to perform the method performed by the element image generating apparatus in the embodiment shown in FIG. 1, and specifically to execute: generating a first feature map based on an initial element image; generating a second feature map based on the first feature map; taking the second feature map as a new first feature map and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps; and generating a target element image corresponding to the initial element image based on the first feature map and at least one of the second feature maps;
  • where the step of iteratively generating the second feature map includes, at least once: generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • these computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An element image generation method, device and system. The method comprises: generating a first feature map on the basis of an initial element image (S101); generating a second feature map on the basis of the first feature map (S103); taking the second feature map as a new first feature map (S105), and iteratively executing the step of generating the second feature map; and generating a target element image corresponding to the initial element image on the basis of the first feature map and at least one second feature map (S107). At least one execution of the step of generating the second feature map comprises: generating a second feature map on the basis of the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map. The method can efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, and can also reduce information loss during data processing, improving the accuracy of the generated target element images.

Description

Element image generation method, device and system
The present application claims priority to Chinese Patent Application No. 201810315058.3, filed on April 10, 2018 and entitled "Element image generation method, device and system", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular, to an element image generation method, apparatus, and system.
Background
With the advent of the computer and Internet era, various graphic elements, such as fonts in font libraries, cartoon characters, and fancy symbols, have become an indispensable part of people's work and life, making life more colorful. Graphic elements of different styles combine external artistic expressiveness with rich inner connotation, and have become an effective means for people to express themselves.
When constructing an image library of graphic elements (an element image library for short), it is often necessary to create a series of element images of the same style, and a designer must design and produce every element image in the library one by one, which is time-consuming and labor-intensive. Take fonts in a font library as a concrete example of element images. Chinese character libraries are very large: the GB2312 national standard code contains 6,763 commonly used Chinese characters, the GBK encoding scheme contains 21,886 Chinese characters, and the latest GB18030 national standard code contains as many as 70,244 Chinese characters. Since every new font must be designed and produced stroke by stroke by a font designer, and the work must be repeated for every character in the library so that all characters share the same style, the workload is very heavy.
Although machine learning methods have been introduced in the related art to generate fonts, the results are still unsatisfactory because too much information is lost during data processing.
Therefore, an efficient and accurate element image generation method is urgently needed, in order to improve the efficiency of constructing an element image library and the accuracy of the element images.
Summary of the invention
The embodiments of the present application provide an element image generation method and apparatus, aiming to generate element images efficiently and accurately, so as to improve the efficiency of constructing an element image library and the accuracy of the constructed element images.
The embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides an element image generation method, including:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
Preferably, in the element image generation method provided by the first aspect, the method is performed by an element image generation module; the element image generation module includes an encoding sub-module, and the encoding sub-module includes M levels of coding units connected stage by stage, where M is a natural number;
the step of generating the first feature map based on the initial element image is performed by the first level coding unit;
the step of generating the second feature map is performed by the second level coding unit to the Mth level coding unit.
Preferably, in the element image generation method provided by the first aspect, the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one coding unit among the second level coding unit to the Mth level coding unit.
Preferably, in the element image generation method provided by the first aspect, the method further includes:
determining an element image pair, the element image pair including an original image and a real sample image corresponding to the original image;
generating, by the element image generation module, a generated image corresponding to the original image based on the original image;
discriminating, by a discriminating module, the degree of difference between the generated image and the real sample image, where the generated image is labeled as the negative class and the real sample image is labeled as the positive class;
alternately adjusting the parameters in the element image generation module and the discriminating module according to the degree of difference, until the degree of difference satisfies a preset condition. A sketch of this alternating training is given below.
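The following is a minimal PyTorch-style sketch of the alternating adjustment described above. The models G and D, the optimizers, and the binary cross-entropy criterion are illustrative assumptions and not the patent's exact training procedure.

```python
import torch

def train_step(G, D, opt_G, opt_D, original, real,
               bce=torch.nn.BCEWithLogitsLoss()):
    # Discriminator step: real sample images are the positive class,
    # generated images the negative class.
    fake = G(original)
    d_real = D(real)
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(D(fake.detach()), torch.zeros_like(d_real)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # Generator step: push the discriminator toward labeling the generated
    # image as real, which reduces the degree of difference.
    d_fake = D(fake)
    g_loss = bce(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```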
Preferably, in the element image generation method provided by the first aspect, discriminating the degree of difference between the generated image and the real sample image includes:
calculating a loss function according to the generated image and the real sample image;
outputting a discrimination result according to the calculation result of the loss function, the discrimination result reflecting the degree of difference between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, the loss function includes a feature space loss function, the feature space loss function reflecting the difference in feature space between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, the loss function includes an adversarial loss function, the adversarial loss function reflecting the contribution of the element image generation module to reducing the degree of difference and the contribution of the discriminating module to increasing the degree of difference.
Preferably, in the element image generation method provided by the first aspect, the loss function includes at least one of the following:
a pixel space loss function, the pixel space loss function reflecting the difference between the generated image and the real sample image at corresponding pixel points;
a class loss function, the class loss function reflecting the difference in class between the generated image and the real sample image.
Preferably, in the element image generation method provided by the first aspect, after generating, by the element image generation module, the generated image corresponding to the original image based on the original image, the method further includes:
testing the generated image to determine the degree to which the generated image matches the real sample image.
Preferably, in the element image generation method provided by the first aspect, testing the generated image generated by the element image generation model and determining the degree to which the generated image matches the real sample image includes determining the degree of matching according to at least one of the following indicators:
the L1 distance between the generated image and the real sample image;
the peak signal-to-noise ratio (PSNR) between the generated image and the real sample image;
the structural similarity (SSIM) between the generated image and the real sample image.
Sketches of the first two indicators are given below.
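The first two indicators can be computed directly; this minimal sketch assumes the images are tensors scaled to [0, 1]. SSIM involves local window statistics and is usually taken from an off-the-shelf implementation, so it is omitted here.

```python
import torch
import torch.nn.functional as F

def l1_distance(generated, real):
    # Mean absolute difference over all pixels.
    return (generated - real).abs().mean()

def psnr(generated, real, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher means a closer match.
    mse = F.mse_loss(generated, real)
    return 10 * torch.log10(max_val ** 2 / mse)
```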
Preferably, in the element image generation method provided by the first aspect, after determining the degree to which the generated image matches the real sample image, the method further includes:
adjusting the parameters in the element image generation module according to the degree to which the generated image matches the real sample image.
Preferably, in the element image generation method provided by the first aspect, the element image is a text font.
In a second aspect, an embodiment of the present application provides an element image generating apparatus, the apparatus including:
a feature map first generating unit, configured to generate a first feature map based on an initial element image;
a feature map second generating unit, configured to generate a second feature map based on the first feature map;
a target element image generating unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of a plurality of second feature maps;
where at least one of the feature map second generating units is further configured to: generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a third aspect, an embodiment of the present application provides an element image generating system, the system including an element image generating module, the element image generating module including an encoding sub-module and a decoding sub-module; the encoding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number;
the first level coding unit is configured to generate a first feature map based on an initial element image;
the second level coding unit to the Mth level coding unit are configured to generate a second feature map based on the feature map generated by the preceding level coding unit;
the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps.
Preferably, in the element image generating system provided by the third aspect, the system further includes:
an element image discriminating module, configured to discriminate the degree of difference between a generated image and a real sample image;
where the generated image is generated by the element image generating module based on an original image; the real sample image corresponds to the original image, and the two constitute an element image pair; and the degree of difference is used to alternately adjust the parameters in the element image generating module and the discriminating module so that the degree of difference satisfies a preset condition.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a fifth aspect, an embodiment of the present application provides a computer readable storage medium storing one or more programs, the one or more programs, when executed by an electronic device including a plurality of applications, causing the electronic device to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map.
In a sixth aspect, an embodiment of the present application provides a Chinese character font image generation method, including:
generating a first feature map based on an initial Chinese character font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image and matching the spatial size of the first feature map.
In a seventh aspect, an embodiment of the present application provides a text font image generation method, including:
generating a first feature map based on an initial text font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map, to obtain a plurality of second feature maps;
generating a target text font image corresponding to the initial text font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where the step of iteratively generating the second feature map includes, at least once, the following step:
generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image and matching the spatial size of the first feature map.
The above at least one technical solution adopted by the embodiments of the present application can achieve the following beneficial effects:
With the technical solution provided by the embodiments of the present application, on the basis of generating the first feature map from the initial element image, the step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps. During the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map. On this basis, a target element image corresponding to the initial element image is generated based on the first feature map and at least one second feature map of the plurality of second feature maps. Therefore, the technical solution provided by the embodiments of the present application can not only efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element images.
Description of the drawings
The drawings described herein are intended to provide a further understanding of the present application and constitute a part of this application. The illustrative embodiments of the present application and their description are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
FIG. 1 is a schematic flowchart of an element image generation method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an element image generation model used in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a coding unit in the element image generation model used in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a decoding unit in the element image generation model used in an embodiment of the present application;
FIG. 5 is a schematic flowchart of generating a target element image in an embodiment of the present application;
FIG. 6 is a schematic diagram of the processing procedure of the first level coding unit in the element image generation model used in an embodiment of the present application;
FIG. 7 is a schematic diagram of the processing procedure of the coding units at each level in the element image generation model used in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training network for the element image generation model used in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a discriminating module in the training network for the element image generation model used in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a model for calculating a feature space loss function in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an element image generating apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an element image generating system according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description
To facilitate understanding of the embodiments of the present application, several concepts introduced in this application are first described here.
Convolutional Neural Network (CNN): a kind of artificial neural network, composed of a series of basic units connected together, such as convolutional layers, nonlinear activation layers, pooling layers, normalization layers, and fully connected layers.
Feature map: the feature representation output by a convolutional layer, a pooling layer, a fully connected layer, or another layer in the network.
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The embodiments of the present application provide an element image generation method. It can be understood that the element images to which the embodiments of the present application apply may include various graphic elements, such as fonts of characters, notation symbols on musical scores, and cartoon character designs. The purpose of the embodiments of the present application is to automatically generate, based on the style features of a small subset of samples in an element image set (it can be understood that these samples are usually designed one by one by the user and can serve as the basis for determining the style features), new element images corresponding to the other element images in the set, such that the generated element images are consistent in style with the samples designed by the user. In this way, element image sets of different styles can be generated efficiently and accurately, realizing automatic expansion of graphic sets of different styles.
In one application scenario, the element images are embodied as cartoon characters. Suppose the user wishes to design a set of cartoon characters with a unified style. The user can manually design cartoon images of some animals based on their original images, and train the element image generation model (also called the element image generation module) with the original images and the cartoon images of these animals (as real sample images) as input. Images of other animals can then be fed into the trained element image generation model as initial element images to automatically generate other cartoon characters consistent with the manually designed style.
In another application scenario, the element images are embodied as Chinese character fonts. Each Chinese font requires a corresponding font library. Suppose a user wishes to build a new font library. After manually designing the new font for some of the characters (for example, 1,000 characters) to establish the style of the new font, the user can train the element image generation model with the original font of these characters (for example, the Song typeface) as the original images and the newly designed font of these characters as the real sample images. The method provided by the embodiments of the present application can then take the original font (for example, the Song typeface) of the remaining characters as the initial element images (for example, if the font library is built according to the Chinese character standard of the GB18030 national standard code, the number of remaining characters is 70,244 - 1,000 = 69,244; the remaining characters may also be the specific characters for which the user currently needs to generate the new font) and pass them through the trained element image generation model (also called the element image generation module or element image generation network) to generate the new font for the remaining characters. New fonts for Chinese characters can thus be generated efficiently, improving the efficiency of building font libraries.
The element image generation model used in the embodiments of the present application (which can be embodied as the font generation model in the above application scenarios) receives the initial element image multiple times: it accepts both the original initial element image and downsampled element images obtained by downsampling the original initial element image, the latter serving as supplementary information. This reduces information loss during data processing and helps improve the accuracy of the generated target element image (which can be embodied as the new font in the above application scenarios).
As shown in FIG. 1, an embodiment of the present application provides an element image generation method, which may include:
S101: generating a first feature map based on an initial element image;
S103: generating a second feature map based on the first feature map;
S105: taking the second feature map as a new first feature map, and iteratively performing step S103 of generating a second feature map, to obtain a plurality of second feature maps;
S107: generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
where at least one execution of step S103, generating a second feature map, includes: generating a second feature map based on the first feature map and a downsampled element image; the downsampled element image is obtained by downsampling the initial element image and matches the spatial size of the first feature map. A minimal sketch of this flow is given below.
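The following is a minimal PyTorch-style sketch of the flow of steps S101 to S107. The names generate, encoders, decode, and inject_levels are hypothetical and introduced only for illustration; the patent does not prescribe this implementation.

```python
import torch
import torch.nn.functional as F

def generate(initial_image, encoders, decode, inject_levels):
    """initial_image: (N, C, H, W); encoders: M stages, each halving H and W."""
    feats = []
    x = encoders[0](initial_image)              # S101: first feature map
    feats.append(x)
    for level, enc in enumerate(encoders[1:], start=2):
        if level in inject_levels:              # at least once, per the method
            down = F.interpolate(initial_image, size=x.shape[-2:],
                                 mode='bilinear', align_corners=False)
            x = torch.cat([x, down], dim=1)     # fuse on the channel dimension
        x = enc(x)                              # S103/S105: next second feature map
        feats.append(x)
    return decode(feats)                        # S107: target element image
```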
It can be understood that, when implementing the element image generation method shown in FIG. 1, the initial element image may be determined first. The determined initial element image can be understood as the basis for generating the target element image, and the generated target element image corresponds to the initial element image. Depending on the user's application requirements, the correspondence between the target element image and the initial element image may differ; specifically, it can be realized by a trained element image generation model (also called an element image generation module). In the application scenario of font generation, this can be understood as follows: the corresponding target element image and initial element image reflect different fonts (styles) of the same Chinese character.
It should be noted that determining the initial element image can also be understood as determining, according to the batch size, a batch of images to be input into the element image generation model for processing at one time. The batch size refers to the number of images input into the model for processing at one time, and is a parameter of models built on neural networks. For example, if the batch size is 1, only one initial element image is input into the model for processing at a time; if the batch size is 16, sixteen initial element images are input into the model as one batch each time, and correspondingly, sixteen target element images will be generated.
On the basis of the determined initial element image, the above steps S101 to S107 may be performed to generate the target element image corresponding to the initial element image. Specifically, based on the trained element image generation model, the target element image corresponding to the initial element image may be generated from the initial element image and the downsampled element images obtained by downsampling the initial element image.
With the above technical solution, on the basis of generating the first feature map from the initial element image, the step of generating the second feature map is performed iteratively to obtain a plurality of second feature maps. During the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map, and the target element image corresponding to the initial element image is then generated based on the first feature map and at least one second feature map of the plurality of second feature maps. Therefore, the technical solution provided by the embodiments of the present application can not only efficiently expand target element images of different styles from initial element images, improving the efficiency of constructing an element image library, but can also reduce information loss during data processing, which helps improve the accuracy of the generated target element image.
It can be understood that, in any application scenario and no matter how the element image generation model is built and trained, as long as the model can accept the original initial element image and, at least once, a downsampled element image obtained by downsampling it, and generates the target element image corresponding to the initial element image on this basis, a target element image containing new element images can be generated efficiently and accurately from the initial element image, with reduced information loss during data processing and improved accuracy of the generated target element image.
In the following, with reference to the drawings and mainly taking the application scenario in which the element images are embodied as text fonts, the specific framework of the element image generation model (element image generation module) is illustrated by example, and multiple implementation examples of the element image generation method provided by the embodiments of the present application are described in detail in combination with this model.
FIG. 2 shows a schematic framework of an element image generation model suitable for the embodiments of the present application. The element image generation model shown in FIG. 2 includes a generation module (Generator), which includes an encoding sub-module (Encoder) and a decoding sub-module (Decoder). In a specific implementation, the input of the model is the initial element image, and the output of the model is the target element image corresponding to the initial element image. Taking the model used specifically to generate new fonts as an example, the input initial element image is a character in the original font (for example, the character "趵" in boldface in FIG. 2), and the output target element image is the character in the new font (for example, the character "趵" in regular script in FIG. 2).
The encoding sub-module in the above element image generation model (element image generation module) is used to convert the initial element image into high-dimensional feature maps, and the decoding sub-module is used to convert the high-dimensional feature maps into a new image, that is, the output target element image. It can be understood that the processing of data by the encoding sub-module and the decoding sub-module is symmetric and mutually inverse.
In the element image generation module, the encoding sub-module includes M levels of coding units connected stage by stage, and the decoding sub-module includes M levels of decoding units connected stage by stage, where M is a natural number. Taking the model shown in FIG. 2 as an example, M is 8.
Specifically, the above step S101, generating the first feature map based on the initial element image, is performed by the first level coding unit (coding unit e1 in FIG. 2). On the basis of accepting the initial element image, the first level coding unit generates the first feature map and outputs it to the second level coding unit (coding unit e2 in FIG. 2). In addition, the first level coding unit also outputs the first feature map to the decoding unit in the decoding sub-module corresponding to the first level coding unit (decoding unit d7 in FIG. 2).
The steps of generating the second feature map based on the first feature map (and the downsampled element image) in the above steps S103 to S105 are performed by the second level coding unit to the Mth level coding unit. Moreover, at least one coding unit among the second level coding unit to the Mth level coding unit (for example, at least one of coding units e2 to e8 in FIG. 2), when generating the second feature map, uses not only the feature map output by the preceding level coding unit but also a downsampled element image obtained by downsampling the initial element image, where the downsampled element image matches the spatial size of the feature map output by the preceding level coding unit.
It should be noted that the coding units in the encoding sub-module have the same structure and are connected stage by stage in order; each coding unit outputs a feature map to its next level coding unit (the feature map output by the last level coding unit is accepted by the first level decoding unit). A coding unit may consist of several convolutional layers (Conv), leaky rectified linear unit layers (LReLU), and batch normalization layers (BN). The image or feature map input to a coding unit may be processed successively by a leaky rectified linear unit layer LReLU, a convolutional layer Conv, and a batch normalization layer BN before being output, as shown in FIG. 3. It can be understood that the number of layers and their order can be adjusted; typically, the convolutional layer Conv and the leaky rectified linear unit layer LReLU are designed in pairs (for example, in the form Conv-LReLU-Conv-LReLU or LReLU-Conv-LReLU-Conv), and the batch normalization layer BN may be placed at the end of the coding unit.
The decoding units in the decoding sub-module also have the same structure and are connected stage by stage in order, and the number of decoding units equals the number of coding units. A decoding unit may consist of a rectified linear unit layer (ReLU), a deconvolution layer (Deconv), a batch normalization layer (BN), and a dropout layer (Dropout). The feature map input to a decoding unit may be processed successively by the rectified linear unit layer ReLU, the deconvolution layer Deconv, the batch normalization layer BN, and the dropout layer Dropout before being output, as shown in FIG. 4. Similarly, the number of layers and their order can be adjusted, as long as the feature maps can be processed and the target element image can ultimately be generated. Sketches of both units are given below.
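Given the layer orders shown in FIG. 3 and FIG. 4, one coding unit and one decoding unit could be sketched as follows in PyTorch. The kernel size, stride, padding, LeakyReLU slope, and dropout rate are assumptions chosen so that a coding unit halves and a decoding unit doubles the spatial size; the patent does not fix these hyperparameters.

```python
import torch.nn as nn

def coding_unit(in_ch, out_ch):
    # LReLU -> Conv (stride 2 halves width and height) -> BN, as in FIG. 3.
    return nn.Sequential(
        nn.LeakyReLU(0.2),
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
    )

def decoding_unit(in_ch, out_ch, p_drop=0.5):
    # ReLU -> Deconv (stride 2 doubles width and height) -> BN -> Dropout, as in FIG. 4.
    return nn.Sequential(
        nn.ReLU(),
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.Dropout(p_drop),
    )
```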
In the encoding stage of the encoding sub-module, each time a feature map passes through one level of coding unit, the spatial size of the output feature map is reduced to half the spatial size of the input feature map (specifically, both the width and the height of the feature map are halved). In the decoding stage of the decoding sub-module, each time a feature map passes through one level of decoding unit, the spatial size of the output feature map is increased to twice the spatial size of the input feature map (specifically, both the width and the height are doubled). Therefore, across the encoding and decoding stages, there will be coding units and decoding units whose output feature maps have the same spatial size; a correspondence can be established between a coding unit and a decoding unit satisfying this condition, which may be called a symmetric pair. Specifically, a coding unit and a decoding unit whose output feature maps have the same spatial size correspond to each other.
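As a worked check of this correspondence for M = 8 with a 256×256 input, the symmetric pairs can be enumerated directly:

```python
M = 8
enc = {f"e{k}": 256 // 2 ** k for k in range(1, M + 1)}  # e1: 128, ..., e8: 1
dec = {f"d{k}": 2 ** k for k in range(1, M + 1)}         # d1: 2,   ..., d8: 256
pairs = [(e, d) for e, s in enc.items() for d, t in dec.items() if s == t]
print(pairs)  # [('e1', 'd7'), ('e2', 'd6'), ..., ('e7', 'd1')]
```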
Taking an initial element image of size 256×256×3 as an example (this size can be understood as follows: the image has 256 pixels in both width and height and is represented in the RGB color model, so each pixel is represented by three-dimensional attribute data), the spatial sizes of the feature maps output by the 8 levels of coding units in the encoding sub-module shown in FIG. 2 are listed in Table 1 below.
Table 1. Example output feature map sizes of the coding units
Coding unit:            e1        e2       e3       e4      e5     e6     e7     e8
Output spatial size: 128×128   64×64    32×32    16×16    8×8    4×4    2×2    1×1
(Spatial sizes for a batch of 16; the per-level channel counts vary with the number of convolutional layers, as noted below.)
The output feature map sizes illustrated in Table 1 take a batch size of 16 (batch = 16) as an example. The spatial size starts at 256×256 for the initial element image, becomes 128×128 after processing by coding unit e1 (the first level coding unit), 64×64 after coding unit e2 (the second level coding unit), and so on, until it becomes 1×1 after coding unit e8 (the eighth level coding unit).
It can be understood that, in the above example, the number of convolutional layers in each coding unit may differ, so the number of channels in the output of each level also differs. Generally, to compensate to some extent for the information loss during data processing, the number of convolutional layers preferably increases with the level of the coding unit.
Corresponding to the output feature map sizes illustrated in Table 1, the spatial sizes of the feature maps output by the 8 levels of decoding units in the decoding sub-module shown in FIG. 2 are listed in Table 2 below.
Table 2. Example output feature map sizes of the decoding units
Decoding unit:           d1     d2     d3      d4       d5      d6       d7        d8
Output spatial size:    2×2    4×4    8×8    16×16    32×32   64×64   128×128   256×256
以上简要阐述了元素图像生成模型的结构以及编码单元和解码单元之间的对应关系。在图2示例的元素图像生成模型的基础上,基于训练好的元素图像生成模型,根据初始元素图像和初始元素图像经过降采样处理后的降采样元素图像,生成与初始元素图像相对应的目标元素图像,可以具体包括以下步骤,参见图5所示:The structure of the element image generation model and the correspondence between the coding unit and the decoding unit are briefly explained above. Based on the element image generation model illustrated in FIG. 2, based on the trained element image generation model, a target corresponding to the initial element image is generated according to the downsampled element image after the downsampled processing of the initial element image and the initial element image. The element image may specifically include the following steps, as shown in FIG. 5:
S1031:将初始元素图像和降采样元素图像输入编码子模块。S1031: Input the initial element image and the downsampled element image into the encoding submodule.
需要说明的是,本申请实施例中,将初始元素图像经过降采样处理后的降采样元素图像也输入编码子模块,可以降低数据处理过程中的信息损耗,从而生成更准确的目标元素图像。It should be noted that, in the embodiment of the present application, the downsampled element image after the downsampled processing of the initial element image is also input into the encoding submodule, which can reduce information loss in the data processing process, thereby generating a more accurate target element image.
In a specific implementation, the first-level encoding unit (for example, encoding unit e1 in FIG. 2) may directly receive the original initial element image, as shown in FIG. 6. Any Nth-level encoding unit after the first level (N being a natural number greater than 1 and not greater than M) may receive, in addition to the feature map output by the previous-level encoding unit, a downsampled element image obtained by downsampling, as shown in FIG. 7. For each encoding unit, the spatial size of the downsampled element image it can accept corresponds to that encoding unit; specifically, it should be consistent with the spatial size of the feature map that the encoding unit receives from the previous level, so that the encoding unit can fuse the two before the subsequent data processing.
It can be understood that the downsampled element image may be input to every encoding unit after the first level (for example, encoding units e2 to e8 in FIG. 2), or only to some of the encoding units (for example, encoding units e2 and e7 in FIG. 2), as long as the spatial size of the downsampled element image is consistent with the spatial size of the previous-level feature map received by the encoding unit in question. For example, if the feature map output by encoding unit e1 to encoding unit e2 has a spatial size of 128×128, the downsampled element image input to encoding unit e2 should also be processed to a spatial size of 128×128. In this way, layers at different depths of the model (corresponding to encoding units at different levels) can all be supplied with information from the initial element image at a particular scale, which helps generate a high-quality target element image.
Specifically, when obtaining the downsampled element image, the initial element image may be downsampled in various ways, such as bilinear interpolation, single interpolation, or nearest-neighbor interpolation; the present application does not limit this.
The encoding unit fuses the received feature map with the downsampled element image before the subsequent processing. The fusion can be done in several ways, for example by stacking along a particular dimension or by adding the attribute values of corresponding pixels; stacking along the channel dimension of the feature maps is preferred.
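By way of illustration only, the following sketch shows the preferred fusion (bilinear downsampling of the initial element image to the feature map's spatial size, followed by stacking along the channel dimension); the function name is an assumption of the sketch:

```python
import torch
import torch.nn.functional as F

def fuse_with_downsampled(feature_map, initial_image, mode="bilinear"):
    """Downsample the initial element image to the feature map's spatial
    size and stack the two along the channel dimension."""
    h, w = feature_map.shape[-2:]
    down = F.interpolate(initial_image, size=(h, w), mode=mode,
                         align_corners=False if mode == "bilinear" else None)
    return torch.cat([feature_map, down], dim=1)  # channel-dimension stacking

feat = torch.randn(16, 64, 128, 128)           # e.g. the output of e1
img = torch.randn(16, 3, 256, 256)             # the initial element image
print(fuse_with_downsampled(feat, img).shape)  # torch.Size([16, 67, 128, 128])
```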
S1033: Use the encoding sub-module to output a plurality of feature maps to the decoding sub-module.
As mentioned above, an encoding unit and a decoding unit whose output feature maps have the same spatial size correspond to each other and form a symmetric unit pair, for example <e1, d7> or <e3, d5>. In the embodiments of the present application, the outputs of corresponding encoding and decoding units are directly connected. Therefore, when outputting a feature map, each encoding unit in the encoding sub-module sends the generated feature map both to the next-level encoding unit (the feature map generated by the last-level encoding unit is sent to the first-level decoding unit) and to its corresponding decoding unit. With reference to the model framework diagram shown in FIG. 2, the details are as follows:
The first-level encoding unit (embodied as encoding unit e1) generates the first feature map from the initial element image; the first-level encoding unit (e1) outputs the first feature map to the second-level encoding unit (embodied as e2) and to the decoding unit corresponding to the first-level encoding unit (embodied as decoding unit d7);
The Kth-level encoding unit (embodied as any of encoding units e2 to e7, for example encoding unit e3) generates a third feature map from the downsampled element image and the second feature map output by the (K-1)th-level encoding unit (correspondingly, encoding unit e2); the Kth-level encoding unit (for example, encoding unit e3) outputs the third feature map to the (K+1)th-level encoding unit (correspondingly, encoding unit e4) and to the decoding unit corresponding to the Kth-level encoding unit (correspondingly, decoding unit d5), where K is a natural number greater than 1 and less than M, and the downsampled element image has the same spatial size as the second feature map;
The Mth-level encoding unit (embodied as encoding unit e8) generates a fifth feature map from the downsampled element image and the fourth feature map output by the (M-1)th-level encoding unit (embodied as encoding unit e7), and outputs it to the first-level decoding unit in the decoding sub-module (embodied as decoding unit d1).
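Putting the above together, a minimal sketch of such an encoding pass might look as follows; the channel counts and the internal composition of each unit are assumptions, since the embodiment leaves them open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """M = 8 encoding units; e2..e8 also fuse a matching downsampled image."""
    def __init__(self, chans=(64, 128, 256, 512, 512, 512, 512, 512)):
        super().__init__()
        units, in_ch = [], 3
        for out_ch in chans:
            units.append(nn.Sequential(
                nn.Conv2d(in_ch + 3, out_ch, 4, 2, 1),  # +3 for the fused image
                nn.LeakyReLU(0.2)))
            in_ch = out_ch
        # e1 receives only the raw initial element image
        units[0] = nn.Sequential(nn.Conv2d(3, chans[0], 4, 2, 1),
                                 nn.LeakyReLU(0.2))
        self.units = nn.ModuleList(units)

    def forward(self, image):
        feats, x = [], image
        for i, unit in enumerate(self.units):
            if i > 0:  # fuse a downsampled copy at the current spatial size
                down = F.interpolate(image, size=x.shape[-2:],
                                     mode="bilinear", align_corners=False)
                x = torch.cat([x, down], dim=1)
            x = unit(x)
            feats.append(x)  # kept for the symmetric decoding units
        return feats  # feats[-1] (1x1) goes to d1; earlier ones skip across
```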
S1035: Use the decoding sub-module to generate, from the plurality of feature maps, the target element image corresponding to the initial element image.
Each level of decoding unit in the decoding sub-module first processes the feature map received from the previous-level decoding unit, then fuses the feature map output at its own level with the feature map transmitted from the corresponding encoding unit, and uses the result as the input of the next-level decoding unit. With reference to the model framework diagram shown in FIG. 2, the details are as follows:
The first-level decoding unit (embodied as decoding unit d1) generates a sixth feature map from the fifth feature map output by the last-level encoding unit (embodied as encoding unit e8), and outputs it to the second-level decoding unit (embodied as decoding unit d2);
The Lth-level decoding unit in the decoding sub-module (embodied as any of decoding units d2 to d7, for example decoding unit d3) generates an eighth feature map from the seventh feature map output by the (L-1)th-level decoding unit (correspondingly, decoding unit d2); the Lth-level decoding unit (for example, decoding unit d3) stacks, along the channel dimension, the eighth feature map with the ninth feature map output by the encoding unit corresponding to the Lth-level decoding unit (correspondingly, encoding unit e5), generates a tenth feature map, and outputs it to the (L+1)th-level decoding unit (correspondingly, decoding unit d4), where L is a natural number greater than 1 and less than M;
The Mth-level decoding unit in the decoding sub-module generates, from the eleventh feature map output by the (M-1)th-level decoding unit, the target element image corresponding to the initial element image.
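A matching sketch of the decoding pass follows, under the same assumed channel counts; the use of transposed convolutions to double the spatial size, and the final Tanh, are assumptions, as the embodiment does not fix the upsampling operator:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """M = 8 decoding units; d1..d7 stack the symmetric encoder feature
    (pairs such as <e1, d7> and <e3, d5>) along the channel dimension."""
    def __init__(self, enc_chans=(64, 128, 256, 512, 512, 512, 512, 512)):
        super().__init__()
        dec_chans = (512, 512, 512, 512, 256, 128, 64, 3)
        units, in_ch = [], enc_chans[-1]      # d1 receives e8's 1x1 output
        for i, out_ch in enumerate(dec_chans):
            units.append(nn.Sequential(
                nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),  # doubles the size
                nn.ReLU() if i < 7 else nn.Tanh()))
            # the next input is this output stacked with the mirrored feature
            in_ch = out_ch + (enc_chans[6 - i] if i < 7 else 0)
        self.units = nn.ModuleList(units)

    def forward(self, enc_feats):
        x = enc_feats[-1]                     # the fifth feature map, from e8
        for i, unit in enumerate(self.units):
            x = unit(x)
            if i < 7:                         # fuse the symmetric skip feature
                x = torch.cat([x, enc_feats[6 - i]], dim=1)
        return x                              # target element image, 3x256x256
```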
In the above manner, an encoding unit in the encoding sub-module receives two kinds of input signal, the feature map output by the previous-level encoding unit and the downsampled element image obtained by downsampling the initial element image, so that information from the initial element image flows in at different stages of the encoding sub-module. Meanwhile, a decoding unit in the decoding sub-module receives, in addition to the feature map output by the previous-level decoding unit, the feature map output directly by the corresponding encoding unit in the encoding sub-module, which further reduces the information loss during image data processing.
It can be understood that the numbering of the feature maps above (the first feature map to the eleventh feature map) is merely for convenience of description and does not limit the feature maps themselves.
With the technical solution provided by the embodiments of the present application, after the initial element image is determined, the trained element image generation model can be used to generate the target element image corresponding to the initial element image; specifically, the element image generation model can generate the target element image from the initial element image and the downsampled element image obtained by downsampling the initial element image. With a trained element image generation model, target element images can be generated automatically from initial element images, so that target element images of different styles can be derived efficiently from initial element images, which improves the efficiency of constructing an element image library. In addition, besides receiving the original initial element image, the element image generation model also receives the downsampled element image, obtained by downsampling the initial element image, as supplementary information for generating feature maps; this reduces the information loss during data processing and thus helps improve the accuracy of the generated target element image.
The above examples describe a specific implementation of generating a target element image with a trained element image generation model. It can be understood that, before this, the element image generation model can be trained to meet the usage requirements. Specifically, an element image generation network containing an element image generation module and a discrimination module can be trained, as shown in FIG. 8. The element image generation module (hereinafter simply the generation module) is used to generate, based on an original image, a generated image corresponding to the original image; the discrimination module is used to determine the degree of difference between the generated image and a real sample image and to adjust the parameters in the element image generation network according to that degree of difference, where the real sample image corresponds to the original image and the two form an element image pair.
Since the original image, after being encoded and decoded by the generation module, yields a new image, namely the generated image, the discrimination module is used during training to tell whether an input image is a real sample image or a generated image produced by the model. The structure of the discrimination module may be as shown in FIG. 9. In the discrimination module illustrated in FIG. 9, the generated image and the real sample image enter the discrimination module; the first two layers are a convolution layer Conv and a leaky rectified linear unit layer LReLU, followed by three [convolution layer Conv - batch normalization layer BN - leaky rectified linear unit layer LReLU] blocks connected in series. Then comes a fully connected layer FC, and finally a Sigmoid layer, which maps the result to a value between 0 and 1 representing the probability that the input image is a real sample image. It should be noted that, in a specific implementation, connecting two or four [Conv - BN - LReLU] blocks in the middle is also acceptable, as long as the structure of the discrimination module is not overly complex and meets the requirements.
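A sketch of a discriminator with this layer sequence follows; the channel counts and kernel sizes are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Conv+LReLU, three [Conv-BN-LReLU] blocks, then FC and Sigmoid,
    mirroring the layer sequence described for FIG. 9 (sizes assumed)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2))
        self.fc = nn.Linear(512 * 16 * 16, 1)  # for 256x256 inputs
        self.prob = nn.Sigmoid()

    def forward(self, x):
        h = self.features(x)
        return self.prob(self.fc(h.flatten(1)))  # probability of "real"
```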
Specifically, the process of training the element image generation network may include:
determining an element image pair, the element image pair including an original image and a real sample image corresponding to the original image;
inputting the original image into the generation module, and using the generation module to obtain the generated image corresponding to the original image;
inputting the generated image and the real sample image into the discrimination module as training samples, and using the discrimination module to determine the degree of difference between the generated image and the real sample image, where the generated image is labeled as the negative class and the real sample image as the positive class;
according to the degree of difference, alternately adjusting the parameters in the generation module and the discrimination module until the degree of difference satisfies a preset condition.
During the training of the whole network (which can be understood as a machine learning model), the goal of the generation module is to make the generated image as realistic as possible, so realistic that it can "fool" the discrimination module (that is, make the discrimination module consider that the generated image does not differ from a real sample image, or that the difference is small enough). The goal of the discrimination module is to correctly distinguish real sample images from generated images. Therefore, the generation module and the discrimination module can be trained alternately during training, as follows:
First, the system initializes the generation module (Generator) and the discrimination module (Discriminator), denoted G0 and D0, respectively. The system accepts one batch of input pictures (the number of inputs is the value of the batch size), where each input is a pair of images, namely an original image and the corresponding real sample image. The original image is then fed into generation module G0, which, after a series of data processing steps, produces a new picture, the generated image.
Next, the real sample image is labeled as the positive class and the generated image as the negative class, and the two are input to discrimination module D0 as training samples. At this point generation module G0 is fixed, and the parameters of discrimination module D0 are updated according to the computed value of the loss function, so that D0 is updated to a new state, denoted D1. Then D1 is fixed, and the parameters of generation module G0 are updated according to the computed value of the loss function, so that G0 is updated to a new state, denoted G1. Proceeding back and forth in this way, generation module G and discrimination module D are trained alternately throughout the training process until the computed value of the loss function satisfies the preset requirement and the two reach their best state.
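For illustration, a minimal sketch of one such alternating update in PyTorch follows; the binary cross-entropy form of the adversarial loss is used here, and the function and optimizer names are assumptions of the sketch:

```python
import torch
import torch.nn as nn

def train_step(G, D, original, real_sample, opt_G, opt_D):
    """One alternating update: fix G and update D, then fix D and update G."""
    bce = nn.BCELoss()
    fake = G(original)

    # Step 1: update D with G fixed (real labeled positive, generated negative).
    opt_D.zero_grad()
    real_score = D(real_sample)
    fake_score = D(fake.detach())      # detach: no gradient flows back into G
    loss_D = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    loss_D.backward()
    opt_D.step()

    # Step 2: update G with D fixed (G tries to make D score the fake as real).
    opt_G.zero_grad()
    score = D(fake)
    loss_G = bce(score, torch.ones_like(score))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```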
It can be understood that, after the model has been trained in the above manner, the initial element image can be input to the model in place of the original image used during training, and the generated image produced by the model is exactly the target element image the user expects.
In training the model, an unsupervised learning method is adopted in which the two neural networks, the generation module and the discrimination module, learn by playing against each other. The output of the generation module needs to imitate the real sample images in the training set as closely as possible, while the purpose of the discrimination module is to distinguish real sample images from generated images. The two modules compete against each other and continually adjust their parameters, the ultimate aim being to make the discrimination module unable to judge whether the output of the generation module is real.
It can be understood that, during model training, a loss function can be computed from the generated image and the real sample image, a discrimination result reflecting the degree of difference between the two can be output according to the computed value of the loss function, and the parameters in the element image generation model can then be adjusted according to that value.
In the element image generation network adopted in the embodiments of the present application, the loss function can take several forms. For example, it may include an adversarial loss function, which reflects the contribution of the generation module to reducing the degree of difference and the contribution of the discrimination module to increasing it. It can be understood that the generation module in the model continually produces new generated images in the hope of passing the evaluation of the discrimination module, while the discrimination module hopes to correctly separate the generated images (labeled as the negative class) from the real sample images (labeled as the positive class). Therefore, the adversarial loss function can be expressed by the following formula:
min_G max_D V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 - D(G(z)))]
where pdata(x) denotes the target data distribution and pz(z) is a source data distribution. In the image generation task, pdata(x) corresponds to the real sample images fed into the discrimination module, pz(z) is the distribution of the original images fed into the generation module, and G(z) denotes the generated images produced by the model and fed into the discrimination module.
Since the generation module G and the discrimination module D are trained alternately in a mutually adversarial manner, with G continually producing new generated images in an attempt to pass D's evaluation while D attempts to accurately distinguish real sample images from generated images, the discrimination module D gives real sample images a high score and generated images a low score during training; that is, D tends to maximize D(x) and minimize D(G(z)), and therefore maximizes the adversarial loss function V(D, G). The generation module G, trying to produce realistic images, tends to maximize D(G(z)) and therefore minimizes the adversarial loss function V(D, G).
In addition, a feature space loss function (Perceptual Loss) can be introduced as a loss function in model training. The feature space loss function reflects the difference between the generated image and the real sample image in feature space. The feature space is the space of the semantically rich, high-dimensional features produced by passing an image through a deep neural network. Unlike the pixel space, vectors in the feature space carry more abstract, higher-level semantic information and can therefore be used to measure the difference between the generated image and the real sample image.
In a specific implementation, the perceptual loss can be computed in various ways, for example with AlexNet, ResNet, or VGG19. VGG19 is taken as an example here for a detailed description.
A VGG19 model already trained on the ImageNet data set (a very large image data set containing 1000 classes) can be chosen. VGG19 is a typical neural network model consisting of multiple convolution layers Conv, pooling layers, and fully connected layers FC; the structure of its convolutional part is shown in FIG. 10.
During training, the generated image and the real sample image are each passed through the VGG19 network, the features output by selected convolution layers are taken, and the L1 distance between the two sets of features is computed as the perceptual loss. The VGG19 model has many convolution layers, so there are many ways to make the selection. Choosing convolution layers at different depths is recommended; for example, as illustrated in FIG. 10, the features output by the five convolution layers Conv1_2, Conv2_2, Conv3_2, Conv4_2, and Conv5_2 can be used to compute the perceptual loss. The calculation formula is expressed as follows:
Ploss = Σ_l λ_l · ||Φ_l(real) - Φ_l(fake)||_1
where Φ denotes the VGG19 network model, l is a selected convolution layer, real denotes the real sample image, and fake denotes the generated image. λ_l is the weight of the perceptual loss computed at each convolution layer, and ||.||_1 denotes the L1 distance. The overall perceptual loss Ploss is the weighted sum of the perceptual losses computed at the individual convolution layers. The weight of each layer can be set in different ways. Considering that the closer a convolution layer is to the output end, the more abstract the information it captures, it is recommended to determine the per-layer weights on the principle that convolution layers near the input carry smaller weights than convolution layers near the output. For example, in the example given in FIG. 10, the weights can be set as λ1 = λ2 = λ3 = λ4 = 1 and λ5 = 10.
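A sketch of this loss under the above recommendation follows; the torchvision weights API and the slice indices marking conv1_2 through conv5_2 are assumptions that depend on the torchvision version and its VGG19 layer numbering:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# A VGG19 pre-trained on ImageNet (torchvision >= 0.13 "weights" API); the
# slice end points below are the assumed positions of the ReLU outputs after
# conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2 in torchvision's numbering.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)
ENDS = [4, 9, 14, 23, 32]
LAMBDAS = [1.0, 1.0, 1.0, 1.0, 10.0]  # lambda1..lambda4 = 1, lambda5 = 10

def perceptual_loss(real, fake):
    """Ploss = sum_l lambda_l * ||Phi_l(real) - Phi_l(fake)||_1."""
    loss, start, x, y = 0.0, 0, real, fake
    for end, lam in zip(ENDS, LAMBDAS):
        block = vgg[start:end]  # run only the layers added since the last tap
        x, y = block(x), block(y)
        loss = loss + lam * F.l1_loss(x, y)
        start = end
    return loss
```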
Further, a pixel space loss function can be used to reflect the difference between the generated image and the real sample image at corresponding pixels. Since the purpose of the element image generation module is to make the generated image resemble the real sample image as closely as possible, the difference between the two can also be measured by comparing corresponding pixels, that is, the difference in pixel space. Specifically, the L1 distance between the real sample image and the generated image can be computed as the value of the pixel space loss function.
In addition, a category loss function can be used to reflect the difference in category. There are multiple ways to train the element image generation model, including single-stage and multi-stage training. Taking the training of a font generation model as an example, single-stage training means directly generating a target font from one source font. Multi-stage training is divided into a pre-training stage and a retraining stage: in the pre-training stage, one fixed source font is used to generate multiple target fonts; in the subsequent retraining stage, the same source font is used to generate one target font. This is called a one-to-many training scheme.
Since multiple fonts are involved in multi-stage training, the discrimination module is expected not only to distinguish whether an image is a real sample image or a generated image, but also to correctly predict the category of the image (the font). A category loss function is therefore introduced, embodied as the cross entropy between the true category and the predicted category, as shown in the following formula:
Closs = -Σ_c y_c · log(p_c)

where y_c equals 1 for the true category c and 0 otherwise, and p_c is the category probability predicted by the discrimination module.
After the element image generation model has been trained as above, the quality of the generated images can also be tested and evaluated. For example, the degree to which a generated image matches the real sample image can be determined according to one or more of the following indicators:
the L1 distance between the generated image and the real sample image;
the peak signal-to-noise ratio PSNR between the generated image and the real sample image;
the structural similarity SSIM between the generated image and the real sample image.
Specifically, the L1 distance between every real sample image and the corresponding generated image in the test set can be computed and averaged. It can be understood that the smaller the L1 distance, the closer the generated image is to the real sample image and the higher the quality of the generated image.
Specifically, the peak signal-to-noise ratio PSNR between every real sample image and the corresponding generated image in the test set can be computed. PSNR is a common method of measuring image quality, and its calculation formula is:
PSNR(I, J) = 10 · log10(P^2 / MSE(I, J)),  MSE(I, J) = ||I - J||_2^2 / N
where I and J denote the two images (specifically, the generated image and the real sample image), ||.||_2 denotes the L2 distance, N denotes the number of pixel values per image, and P denotes the peak value; for a typical 3-channel 8-bit image, P = 255. It can be understood that the higher the peak signal-to-noise ratio PSNR, the better the quality of the generated font.
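As a minimal sketch of this computation (NumPy, assuming 8-bit images):

```python
import numpy as np

def psnr(i, j, peak=255.0):
    """PSNR = 10 * log10(P^2 / MSE), with MSE = ||I - J||_2^2 / N."""
    i = np.asarray(i, dtype=np.float64)
    j = np.asarray(j, dtype=np.float64)
    mse = np.mean((i - j) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```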
Specifically, the structural similarity SSIM between the generated image and the real sample image can be computed. The structural similarity SSIM measures the difference between two images (that is, the generated image and the real sample image) from three angles: structural correlation, contrast, and luminance. The SSIM of two images X and Y can be expressed as:
SSIM(X, Y) = l(X, Y) · c(X, Y) · s(X, Y)
l(X, Y) = (2·μ_x·μ_y + C1) / (μ_x^2 + μ_y^2 + C1)
c(X, Y) = (2·σ_x·σ_y + C2) / (σ_x^2 + σ_y^2 + C2)
s(X, Y) = (σ_xy + C3) / (σ_x·σ_y + C3)
where μ_x, μ_y, σ_x, σ_y are the means and standard deviations of images X and Y, σ_xy is the covariance of the two, and C1, C2, C3 are constants.
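A sketch using global image statistics follows; practical SSIM implementations usually compute the statistics over local windows and average, so the global form and the constants chosen here are simplifying assumptions:

```python
import numpy as np

def ssim(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Global-statistics SSIM = l(X,Y) * c(X,Y) * s(X,Y); assumes C3 = C2 / 2."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    c3 = c2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)  # luminance
    con = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)  # contrast
    stru = (sxy + c3) / (sx * sy + c3)                   # structure
    return lum * con * stru
```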
In a specific implementation, it is preferable to test and evaluate the quality of the generated images by combining the above indicators. Taking element images embodied as character fonts as an example, the font quality evaluation process is described below. In the preparation stage, a single-character recognition model with good font recognition performance, trained on a real font data set, can be prepared; the font quality evaluation below is based on this single-character recognition model.
In the first step, the single-character model is used to recognize the font images generated by the embodiments of the present application (that is, the generated images). If a generated character can be recognized correctly, the newly generated character is preliminarily correct in glyph, strokes, structure, and so on. According to the recognition results, the characters that are not correctly recognized can be filtered out.
In the second step, for the characters recognized by the single-character recognition model, the three indicators L1 distance, PSNR, and SSIM can be further computed for each generated character, yielding the result distribution of each of the three indicators. The mean of each distribution is then used as a threshold: for the result distributions of PSNR and SSIM, characters below the mean are filtered out; for the result distribution of the L1 distance, characters above the mean are filtered out.
In the third step, a manual evaluation is performed on all the generated characters screened by the first two steps to identify the characters that people subjectively judge to be of good quality, so that the threshold of each indicator can be adjusted; the adjusted thresholds then determine which characters count as good-quality characters.
In the fourth step, the overall quality of the generated font is measured by the number of good-quality characters in the validation data set, or by their proportion of the validation data set: the larger the number and/or the higher the proportion, the higher the quality of the generated font and the better the font generation model.
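The second-step screening can be sketched as follows; treating the per-character metric arrays as aligned with the character list is an assumption of the sketch:

```python
import numpy as np

def filter_by_distribution(chars, l1, psnr_vals, ssim_vals):
    """Keep characters whose PSNR and SSIM lie above the mean of their
    distributions and whose L1 distance lies below its mean."""
    l1, psnr_vals, ssim_vals = map(np.asarray, (l1, psnr_vals, ssim_vals))
    keep = (psnr_vals >= psnr_vals.mean()) & \
           (ssim_vals >= ssim_vals.mean()) & \
           (l1 <= l1.mean())
    return [c for c, k in zip(chars, keep) if k]
```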
The above font quality evaluation scheme does not compute a single indicator on one sample set; instead, it starts from the distribution of all generated characters on each particular indicator. It combines the single-character recognition model's ability to screen glyphs, strokes, and other structural aspects with manual judgment introduced to adjust the threshold of each indicator, thereby realizing an interactive evaluation of the quality of the generated font. Combining the advantages of subjective and objective evaluation, it can reflect the overall quality of the fonts produced by the model fairly accurately and comprehensively. On this basis, the parameters in the model can also be adjusted according to the evaluation results.
The above mainly exemplifies the specific implementation of the element image generation method provided in the embodiments of the present application. When the element image is specifically a Chinese character font, the element image generation method provided in the embodiments of the present application is embodied as a Chinese character font image generation method, which may include the following steps:
generating a first feature map based on an initial Chinese character font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image, and the downsampled Chinese character font image matching the spatial size of the first feature map.
It can be understood that the related descriptions in the foregoing embodiments of the element image generation method all apply to this Chinese character font image generation method and are not repeated here.
When the element image is specifically a text font, the element image generation method provided in the embodiments of the present application is embodied as a text font image generation method, which may include the following steps:
generating a first feature map based on an initial text font image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target text font image corresponding to the initial text font image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image, and the downsampled text font image matching the spatial size of the first feature map.
It can be understood that the related descriptions in the foregoing embodiments of the element image generation method all apply to this text font image generation method and are not repeated here.
Correspondingly, an embodiment of the present application further provides an element image generation apparatus, as shown in FIG. 11, including:
a feature map first generation unit 101, configured to generate a first feature map based on an initial element image;
a feature map second generation unit 103, configured to generate a second feature map based on the first feature map;
a target element image generation unit 105, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of a plurality of second feature maps;
wherein at least one of the feature map second generation units 103 is further configured to:
generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
The above element image generation apparatus corresponds to the element image generation method in the foregoing embodiments. The descriptions in the foregoing method embodiments all apply to this apparatus and are not repeated here.
Meanwhile, the embodiments of the present application give a specific implementation of each step. It can be understood that each step can also be implemented in other ways, which the embodiments of the present application do not limit.
An embodiment of the present application further provides an element image generation system, as shown in FIG. 12. The system includes an element image generation module, which includes an encoding sub-module and a decoding sub-module; the encoding sub-module includes M levels of encoding units connected level by level, the decoding sub-module includes M levels of decoding units connected level by level, and M is a natural number; wherein,
the first-level encoding unit is configured to generate a first feature map based on an initial element image;
the second-level to Mth-level encoding units are configured to generate second feature maps based on the feature maps generated by their respective previous-level encoding units;
the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps.
Preferably, the above element image generation system further includes:
an element image discrimination module, configured to determine the degree of difference between a generated image and a real sample image;
wherein,
the generated image is generated by the element image generation module based on an original image;
the real sample image corresponds to the original image, the two forming an element image pair;
the degree of difference is used to alternately adjust the parameters in the element image generation module and the discrimination module so that the degree of difference satisfies a preset condition.
It can be understood that the above element image generation system corresponds to the element image generation method in the foregoing embodiments. The descriptions in the foregoing method embodiments all apply to this system and are not repeated here.
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to FIG. 13, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in FIG. 13, but this does not mean there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming an element image generation apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
The method performed by the element image generation apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device can also perform the method performed by the element image generation apparatus in FIG. 1 and implement the functions of the element image generation apparatus in the embodiment shown in FIG. 1, which are not repeated here in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the element image generation apparatus in the embodiment shown in FIG. 1, and specifically to perform:
generating a first feature map based on an initial element image;
generating a second feature map based on the first feature map;
taking the second feature map as a new first feature map, and iteratively performing the step of generating a second feature map to obtain a plurality of second feature maps;
generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map of the plurality of second feature maps;
wherein the step of iteratively generating the second feature map includes performing the following step at least once:
generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the spatial size of the first feature map.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random-access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (19)

  1. An element image generation method, comprising:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
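By way of illustration only, the iterative encoding of claim 1 can be sketched in a few lines of PyTorch. Everything concrete below — the strided convolutions, the leaky-ReLU activations, the channel widths, and the channel-wise concatenation of the downsampled image with the feature map — is an assumption made for the sketch; the claim itself fixes none of these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodeUnit(nn.Module):
    """One coding unit: maps the current feature map to the next one,
    optionally injecting a downsampled copy of the initial image."""
    def __init__(self, in_ch, out_ch, inject_image=False):
        super().__init__()
        extra = 1 if inject_image else 0   # one extra channel for the image
        self.inject_image = inject_image
        self.conv = nn.Conv2d(in_ch + extra, out_ch,
                              kernel_size=4, stride=2, padding=1)

    def forward(self, feat, image=None):
        if self.inject_image:
            # Downsample the initial element image so its spatial size
            # matches the incoming feature map, then concatenate.
            ds = F.interpolate(image, size=feat.shape[2:], mode="bilinear",
                               align_corners=False)
            feat = torch.cat([feat, ds], dim=1)
        return F.leaky_relu(self.conv(feat), 0.2)

def encode(image, units):
    """Claim-1 loop: the first unit turns the image into the first feature
    map; each later unit turns the current map into a second feature map."""
    feat = units[0](image)
    maps = [feat]
    for unit in units[1:]:
        feat = unit(feat, image)
        maps.append(feat)
    return maps

# Example with three levels and image injection at the deeper two.
units = nn.ModuleList([EncodeUnit(1, 64),
                       EncodeUnit(64, 128, inject_image=True),
                       EncodeUnit(128, 256, inject_image=True)])
feature_maps = encode(torch.randn(1, 1, 256, 256), units)
```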
  2. The method according to claim 1, wherein the method is performed by an element image generation module; the element image generation module comprises an encoding sub-module, and the encoding sub-module comprises M levels of encoding units connected stage by stage, M being a natural number;
    the step of generating a first feature map based on the initial element image is performed by the first-level encoding unit; and
    the step of generating a second feature map is performed by the second-level to Mth-level encoding units.
  3. The method according to claim 2, wherein the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one of the second-level to Mth-level encoding units.
  4. The method according to claim 2 or 3, further comprising:
    determining an element image pair, the element image pair comprising an original image and a real sample image corresponding to the original image;
    generating, by the element image generation module, a generated image corresponding to the original image based on the original image;
    discriminating, by a discrimination module, a degree of difference between the generated image and the real sample image, wherein the generated image is labeled as the negative class and the real sample image is labeled as the positive class; and
    alternately adjusting parameters in the element image generation module and the discrimination module according to the degree of difference, until the degree of difference satisfies a preset condition.
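Claim 4 describes the familiar alternating schedule of adversarial training. A minimal sketch, assuming a binary cross-entropy criterion and the usual convention (real sample = positive class, generated image = negative class); the optimizers and update order are illustrative assumptions:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(generator, discriminator, g_opt, d_opt, original, real_sample):
    # Discriminator update: widen the measured difference by scoring the
    # real sample as positive and the generated image as negative.
    with torch.no_grad():
        fake = generator(original)
    real_logits = discriminator(real_sample)
    fake_logits = discriminator(fake)
    d_loss = (bce(real_logits, torch.ones_like(real_logits)) +
              bce(fake_logits, torch.zeros_like(fake_logits)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: narrow the difference by pushing the generated
    # image toward the positive class.
    fake_logits = discriminator(generator(original))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Repeating this step until the losses stabilize corresponds to adjusting both modules "until the degree of difference satisfies a preset condition".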
  5. The method according to claim 4, wherein discriminating the degree of difference between the generated image and the real sample image comprises:
    computing a loss function from the generated image and the real sample image; and
    outputting a discrimination result according to the computed value of the loss function, the discrimination result reflecting the degree of difference between the generated image and the real sample image.
  6. The method according to claim 5, wherein the loss function comprises a feature-space loss function, the feature-space loss function reflecting the difference between the generated image and the real sample image in a feature space.
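Claim 6 leaves the feature space unspecified. One common realization compares activations of a fixed pretrained network, as in this sketch; the choice of VGG16, of its first 16 layers, and of a recent torchvision with the `weights` API are all assumptions:

```python
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# A frozen, ImageNet-pretrained VGG16 stands in for "a feature space".
extractor = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def feature_space_loss(generated, real_sample):
    # Single-channel font images are repeated to the 3 channels VGG expects.
    g = generated.repeat(1, 3, 1, 1) if generated.size(1) == 1 else generated
    r = real_sample.repeat(1, 3, 1, 1) if real_sample.size(1) == 1 else real_sample
    # L1 distance between the two images' feature maps.
    return F.l1_loss(extractor(g), extractor(r))
```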
  7. The method according to claim 5, wherein the loss function comprises an adversarial loss function, the adversarial loss function reflecting the contribution of the element image generation module to reducing the degree of difference and the contribution of the discrimination module to increasing the degree of difference.
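Written out, the adversarial loss of claim 7 is the usual two-player objective: the discrimination module adjusts its parameters to raise it, the generation module to lower it. A sketch, assuming the discriminator outputs probabilities in (0, 1):

```python
import torch

def adversarial_loss(d_real, d_fake, eps=1e-8):
    # d_real, d_fake: discriminator probabilities for the real sample image
    # and the generated image. The discriminator maximizes this quantity
    # (increasing the measured difference); the generator minimizes it.
    # eps is a numerical floor, an implementation assumption.
    return torch.mean(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps))
```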
  8. The method according to claim 5, wherein the loss function comprises at least one of the following:
    a pixel-space loss function, the pixel-space loss function reflecting the difference between the generated image and the real sample image at corresponding pixels; and
    a category loss function, the category loss function reflecting the difference in category between the generated image and the real sample image.
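Both terms of claim 8 are short in practice. In the sketch below, the L1 pixel criterion and the auxiliary classifier head producing `class_logits` are assumptions; the claim does not prescribe either:

```python
import torch.nn.functional as F

def pixel_space_loss(generated, real_sample):
    # Mean absolute difference at corresponding pixel positions.
    return F.l1_loss(generated, real_sample)

def category_loss(class_logits, true_class):
    # Cross-entropy between the predicted category of the generated image
    # and the category of the real sample image.
    return F.cross_entropy(class_logits, true_class)
```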
  9. The method according to claim 4, wherein after generating, by the element image generation module, the generated image corresponding to the original image based on the original image, the method further comprises:
    testing the generated image to determine a degree of matching between the generated image and the real sample image.
  10. The method according to claim 9, wherein testing the generated image generated by the element image generation module and determining the degree of matching between the generated image and the real sample image comprises:
    determining the degree of matching between the generated image and the real sample image according to at least one of the following metrics:
    the L1 distance between the generated image and the real sample image;
    the peak signal-to-noise ratio (PSNR) between the generated image and the real sample image; and
    the structural similarity (SSIM) between the generated image and the real sample image.
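The three matching metrics of claim 10 can be computed as follows. Images are assumed normalized to [0, 1], and the SSIM here is the simplified single-window form rather than the sliding Gaussian-window version of the original SSIM paper:

```python
import torch
import torch.nn.functional as F

def l1_distance(x, y):
    return F.l1_loss(x, y).item()

def psnr(x, y, max_val=1.0):
    # Peak signal-to-noise ratio in decibels.
    mse = F.mse_loss(x, y)
    return (10 * torch.log10(max_val ** 2 / mse)).item()

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Whole-image SSIM: compares luminance (means), contrast (variances)
    # and structure (covariance) of the two images in a single window.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))).item()
```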
  11. The method according to claim 9, wherein after determining the degree of matching between the generated image and the real sample image, the method further comprises:
    adjusting parameters in the element image generation module according to the degree of matching between the generated image and the real sample image.
  12. The method according to any one of claims 1 to 3 and 5 to 11, wherein the element image is a text font.
  13. An element image generation apparatus, comprising:
    a first feature map generation unit, configured to generate a first feature map based on an initial element image;
    a second feature map generation unit, configured to generate a second feature map based on the first feature map; and
    a target element image generation unit, configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of a plurality of second feature maps;
    wherein at least one of the second feature map generation units is further configured to:
    generate a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  14. An element image generation system, comprising an element image generation module, the element image generation module comprising an encoding sub-module and a decoding sub-module, the encoding sub-module comprising M levels of encoding units connected stage by stage, and the decoding sub-module comprising M levels of decoding units connected stage by stage, M being a natural number, wherein:
    the first-level encoding unit is configured to generate a first feature map based on an initial element image;
    the second-level to Mth-level encoding units are configured to generate a second feature map based on the feature map generated by the preceding-level encoding unit; and
    the decoding sub-module is configured to generate a target element image corresponding to the initial element image based on the first feature map and at least one of a plurality of second feature maps.
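The system of claim 14 pairs M encoding units with M decoding units. The sketch below realizes this as a U-Net-style generator in which each encoder level's feature map is reused by the mirrored decoder level; the skip connections, channel widths, and activations are assumptions beyond what the claim requires:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, channels=(1, 64, 128, 256)):   # M = 3 levels here
        super().__init__()
        self.enc = nn.ModuleList(
            nn.Conv2d(cin, cout, 4, stride=2, padding=1)
            for cin, cout in zip(channels[:-1], channels[1:]))
        rev = channels[::-1]
        # Decoder inputs double after the first level because the matching
        # encoder feature map is concatenated in.
        self.dec = nn.ModuleList(
            nn.ConvTranspose2d(cin * (1 if i == 0 else 2), cout, 4,
                               stride=2, padding=1)
            for i, (cin, cout) in enumerate(zip(rev[:-1], rev[1:])))

    def forward(self, x):
        feats = []
        for conv in self.enc:                  # encoding sub-module
            x = F.leaky_relu(conv(x), 0.2)
            feats.append(x)
        y = feats[-1]
        for i, deconv in enumerate(self.dec):  # decoding sub-module
            if i > 0:                          # reuse this level's
                y = torch.cat([y, feats[-1 - i]], dim=1)   # encoder map
            y = deconv(y)
            y = torch.tanh(y) if i == len(self.dec) - 1 else F.relu(y)
        return y                               # target element image
```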
  15. The system according to claim 14, further comprising:
    an element image discrimination module, configured to discriminate a degree of difference between a generated image and a real sample image;
    wherein
    the generated image is generated by the element image generation module based on an original image;
    the real sample image corresponds to the original image, the two forming an element image pair; and
    the degree of difference is used to alternately adjust parameters in the element image generation module and the discrimination module, so that the degree of difference satisfies a preset condition.
  16. An electronic device, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  17. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the following operations:
    generating a first feature map based on an initial element image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target element image corresponding to the initial element image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image, and the downsampled element image matching the first feature map in spatial size.
  18. A Chinese character font image generation method, comprising:
    generating a first feature map based on an initial Chinese character font image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target Chinese character font image corresponding to the initial Chinese character font image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled Chinese character font image, the downsampled Chinese character font image being obtained by downsampling the initial Chinese character font image, and the downsampled Chinese character font image matching the first feature map in spatial size.
  19. A text font image generation method, comprising:
    generating a first feature map based on an initial text font image;
    generating a second feature map based on the first feature map;
    taking the second feature map as a new first feature map and iteratively performing the step of generating a second feature map, to obtain a plurality of second feature maps; and
    generating a target text font image corresponding to the initial text font image based on the first feature map and at least one of the plurality of second feature maps;
    wherein the iteratively performed step of generating a second feature map comprises, at least once, the following step:
    generating a second feature map based on the first feature map and a downsampled text font image, the downsampled text font image being obtained by downsampling the initial text font image, and the downsampled text font image matching the first feature map in spatial size.
PCT/CN2019/081217 2018-04-10 2019-04-03 Element image generation method, device and system WO2019196718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810315058.3A CN110363830B (en) 2018-04-10 2018-04-10 Element image generation method, device and system
CN201810315058.3 2018-04-10

Publications (1)

Publication Number Publication Date
WO2019196718A1

Family

ID=68163428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081217 WO2019196718A1 (en) 2018-04-10 2019-04-03 Element image generation method, device and system

Country Status (2)

Country Link
CN (1) CN110363830B (en)
WO (1) WO2019196718A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308094B (en) * 2020-11-25 2023-04-18 创新奇智(重庆)科技有限公司 Image processing method and device, electronic equipment and storage medium
CN114169255B (en) * 2022-02-11 2022-05-13 阿里巴巴达摩院(杭州)科技有限公司 Image generation system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1077445A2 (en) * 1999-08-19 2001-02-21 Adobe Systems, Inc. Device dependent rendering of characters
CN104268549A (en) * 2014-09-16 2015-01-07 天津大学 Anti-noise multi-scale local binary pattern characteristic representation method
CN104794504A (en) * 2015-04-28 2015-07-22 浙江大学 Graphic pattern text detection method based on deep learning
CN107644006A (en) * 2017-09-29 2018-01-30 北京大学 A kind of Chinese script character library automatic generation method based on deep neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709875B (en) * 2016-12-30 2020-02-18 北京工业大学 Compressed low-resolution image restoration method based on joint depth network
CN107392973B (en) * 2017-06-06 2020-01-10 中国科学院自动化研究所 Pixel-level handwritten Chinese character automatic generation method, storage device and processing device
CN107330954A (en) * 2017-07-14 2017-11-07 深圳市唯特视科技有限公司 A kind of method based on attenuation network by sliding attribute manipulation image

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767935A (en) * 2019-10-31 2020-10-13 杭州海康威视数字技术股份有限公司 Target detection method and device and electronic equipment
CN111767935B (en) * 2019-10-31 2023-09-05 杭州海康威视数字技术股份有限公司 Target detection method and device and electronic equipment
CN112070658A (en) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style migration method based on deep learning
CN112070658B (en) * 2020-08-25 2024-04-16 西安理工大学 Deep learning-based Chinese character font style migration method

Also Published As

Publication number Publication date
CN110363830B (en) 2023-05-02
CN110363830A (en) 2019-10-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1