WO2019196718A1 - Element image generation method, device and system

Element image generation method, device and system

Info

Publication number
WO2019196718A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
generating
element image
feature
Prior art date
Application number
PCT/CN2019/081217
Other languages
English (en)
Chinese (zh)
Inventor
孙东慧
张庆
唐浩超
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2019196718A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Definitions

  • the present application relates to the field of computer technology, and in particular, to an element image generation method, apparatus, and system.
  • When constructing an image library of graphic elements (referred to as an element image library), it is often necessary to create a series of element images of the same style, and the designer has to design and produce each element image in the element image library one by one, which is time-consuming and labor-intensive.
  • the size of the Chinese character library is very large.
  • the GB2312 national standard code contains 6,763 commonly used Chinese characters
  • the GBK encoding scheme contains 21,886 Chinese characters
  • the latest GB18030 national standard contains more than 70,044 Chinese characters. Since each new font is designed and produced by a font designer, the labor of designing every character in the font must be repeated so that all characters share the same style; the workload is therefore very heavy.
  • the embodiment of the present application provides an element image generation method and apparatus, aiming to efficiently and accurately generate an element image, so as to improve the efficiency of constructing an element image library and improve the accuracy of the constructed element image.
  • an embodiment of the present application provides an element image generating method, including:
  • the step of generating the second feature map by the iterative process includes the following steps at least once:
  • the downsampled element image is obtained by downsampling the initial element image, and the downsampled element image matches the spatial size of the first feature map.
  • the method is performed by an element image generating module;
  • the element image generating module includes an encoding sub-module, and the encoding sub-module includes M levels of coding units connected stage by stage, where M is a natural number;
  • the step of generating a first feature map based on the initial element image is performed by the first level coding unit;
  • the step of generating the second feature map is performed by the second level coding unit to the Mth level coding unit.
  • the step of generating a second feature map based on the first feature map and the downsampled element image is performed by at least one coding unit among the second-level to Mth-level coding units.
  • the method further includes:
  • the parameters in the element image generating module and the discriminating module are alternately adjusted according to the degree of difference until the degree of difference satisfies a preset condition.
  • determining a degree of difference between the generated image and the real sample image includes:
  • a discrimination result is output according to a calculation result of the loss function, the discrimination result being used to reflect a degree of difference between the generated image and the real sample image.
  • the loss function includes a feature space loss function for reflecting the difference in feature space between the generated image and the real sample image.
  • the loss function includes an adversarial loss function for reflecting the degree to which the element image generating module contributes to reducing the degree of difference and the degree to which the discriminating module contributes to increasing the degree of difference.
  • the loss function comprises at least one of the following:
  • a pixel space loss function for reflecting a difference between the generated image and the real sample image at a corresponding pixel point
  • a class loss function for reflecting a difference in categories between the generated image and the real sample image.
  • the method further includes:
  • the generated image is tested to determine the degree to which it matches the real sample image.
  • the generated image generated by the element image generating model is tested, and the matching degree of the generated image with the real sample image is determined, including:
  • calculating the structural similarity SSIM between the generated image and the real sample image.
  • the method further includes:
  • the element image is a text font.
  • an embodiment of the present application provides an element image generating apparatus, where the apparatus includes:
  • a feature map first generating unit configured to generate a first feature map based on the initial element image
  • a second generation unit of the feature map configured to generate a second feature map based on the first feature map
  • a target element image generating unit configured to generate a target element image corresponding to the initial element image based on the first feature map and the at least one second feature map of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit is further configured to:
  • the downsampled element image is obtained by downsampling the initial element image, and the downsampled element image matches the spatial size of the first feature map.
  • an embodiment of the present application provides an element image generating system, where the system includes an element image generating module, the element image generating module includes an encoding sub-module and a decoding sub-module, the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second level coding unit to an Mth level coding unit, configured to generate a second feature map based on the feature map generated by the upper level coding unit;
  • a decoding submodule configured to generate a target element image corresponding to the initial element image based on the first feature map and the at least one second feature map of the plurality of second feature maps.
  • the system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image to form an element image pair
  • the degree of difference is used to alternately adjust parameters in the element image generating module and the discriminating module such that the degree of difference satisfies a preset condition.
  • an electronic device including:
  • a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations:
  • the step of generating the second feature map by the iterative process includes the following steps at least once:
  • the downsampled element image is obtained by downsampling the initial element image, and the downsampled element image matches the spatial size of the first feature map.
  • an embodiment of the present application provides a computer readable storage medium storing one or more programs, which, when executed by an electronic device including multiple applications, cause the electronic device to perform the following operations:
  • the step of generating the second feature map by the iterative process includes the following steps at least once:
  • the downsampled element image is obtained by downsampling the initial element image, and the downsampled element image matches the spatial size of the first feature map.
  • the embodiment of the present application provides a method for generating a Chinese character font image, including:
  • the step of generating the second feature map by the iterative process includes the following steps at least once:
  • the embodiment of the present application provides a text font image generating method, including:
  • the step of generating the second feature map by the iterative process includes the following steps at least once:
  • the step of generating the second feature map is iteratively performed to obtain a plurality of second feature maps. And, during the iterative execution, the downsampled element image is introduced at least once as supplementary information for generating the second feature map. Based on this, a target element image corresponding to the initial element image is generated based on the first feature map and the at least one second feature map of the plurality of second feature maps.
  • the technical solution provided by the embodiment of the present application can not only efficiently generate target element images of different styles from the initial element image, improving the efficiency of constructing an element image library, but can also reduce information loss in the data processing process, which helps improve the accuracy of the generated target element images.
  • FIG. 1 is a schematic flowchart diagram of an element image generating method according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an element image generation model used in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a coding unit in an element image generation model used in an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a decoding unit in an element image generation model used in an embodiment of the present application
  • FIG. 5 is a schematic flowchart of generating an image of a target element in an embodiment of the present application
  • FIG. 6 is a schematic diagram of a processing procedure of a first-level coding unit in an element image generation model used in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a processing procedure of coding units of each level in an element image generation model used in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an element image generation model training network used in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a discriminating module in an element image generation model training network used in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a model for calculating a feature space loss function according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an element image generating apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an element image generating system according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • CNN: Convolutional Neural Network
  • Feature Map: a representation of features output by a convolutional layer, pooling layer, fully connected layer, or other layer in the network.
  • An embodiment of the present application provides an element image generating method.
  • the element image to which the embodiment of the present application is applied may include various graphic elements such as a font of a character, a mark symbol on a musical score, and a cartoon character shape.
  • the purpose of the embodiment of the present application is to take a small set of element image samples of a chosen style (it can be understood that this part of the samples is usually designed by the user one by one and serves as the basis for determining the style features), and to automatically generate new element images corresponding to the other element images in the collection, so that the generated new element images are consistent with the style of the user-designed samples. In this way, element image collections of different styles can be generated efficiently and accurately, realizing automatic expansion of graphic collections of different styles.
  • the element image is embodied as a cartoon image.
  • cartoon images can be manually designed based on the original images of some animals, and the original images together with the corresponding cartoon images (as real sample images) are used as input to train an element image generation model (also referred to as an element image generation module). Then, images of other animals can be input as initial element images into the trained element image generation model, and further cartoon images consistent with the style of the hand-designed ones can be generated automatically.
  • the element image is embodied as a Chinese font.
  • Each Chinese font requires a corresponding font library.
  • one can manually design a small number of characters in the new font (for example, 1,000 Chinese characters) to determine the style of the new font, then use the images of these characters in an original font (for example, Song) as the original images and the images of the same characters in the new font as the real sample images to train the element image generation model.
  • after training, the images of the remaining Chinese characters in the original font (for example, Song) are used as initial element images, and the trained element image generation model (also referred to as the element image generation module or element image generation network) generates those characters in the new font.
  • in this way, new fonts for Chinese characters can be generated efficiently, improving the efficiency of building font libraries.
  • the element image generation model used in the embodiment of the present application (which can be embodied as a font generation model in the above application scenario) receives the initial element image (which can be embodied as the original font in the above application scenario) multiple times: it accepts the original initial element image, and also accepts, as supplementary information, downsampled element images obtained by downsampling the original initial element image, thereby reducing information loss during data processing and improving the accuracy of the generated target element image (which can be embodied as the new font in the above application scenario).
  • an embodiment of the present application provides an element image generation method, which may include:
  • S101 Generate a first feature map based on the initial element image.
  • S103 Generate a second feature map based on the first feature map.
  • S105 The second feature map is used as a new first feature map, and step S103 is performed iteratively to generate further second feature maps, thereby obtaining a plurality of second feature maps.
  • S107 Generate a target element image corresponding to the initial element image based on the first feature map and the at least one second feature map of the plurality of second feature maps;
  • the step of generating the second feature map includes, at least once:
  • generating the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the spatial size of the first feature map.
  • the initial element image can be determined first.
  • the determined initial element image can be understood as the basis for generating the target element image, and the generated target element image corresponds to the initial element image.
  • the correspondence between the target element image and the initial element image may take different forms, and may specifically be implemented by a trained element image generation model (also referred to as an element image generation module).
  • the corresponding target element image and the initial element image reflect different fonts (styles) of the same Chinese character.
  • when determining the initial element image, this can also be understood as determining a batch of images that are input into the model together for processing, according to the batch size.
  • the batch size refers to the number of images that are input into the model for processing at a single time, and is a parameter of a model constructed based on a neural network. For example, if the batch size is 1, only one initial element image is input into the model for processing at a time; if the batch size is 16, then 16 initial element images are input into the model as one batch each time, and 16 target element images will be generated accordingly.
  • the above steps S101 to S107 may be performed to generate a target element image corresponding to the initial element image.
  • specifically, the trained element image generation model may be used to generate the target element image corresponding to the initial element image, according to the initial element image and the downsampled element image obtained by downsampling the initial element image.
  • the step of generating the second feature map is iteratively performed to obtain a plurality of second feature maps.
  • the downsampled element image is introduced as supplementary information at least once for generating the second feature map, and the target element image is then generated based on the first feature map and at least one of the plurality of second feature maps.
  • the technical solution provided by the embodiment of the present application can not only efficiently generate target element images of different styles from the initial element image, improving the efficiency of constructing an element image library, but can also reduce information loss in the data processing process, which helps improve the accuracy of the generated target element images.
  • the model can accept the original initial element image and accept the downsampled element image after the downsampling process at least once.
  • in this way, target element images, including new element images, can be generated efficiently and accurately based on the initial element image, and information loss in the data processing process can be reduced, which helps improve the accuracy of the generated target element images.
  • In the following, an element image embodied as a text font is used as an example, a specific framework of an element image generation model (element image generation module) is illustrated, and the element image generation method provided by the embodiment of the present application is described in detail in combination with this element image generation model.
  • FIG. 2 is a schematic diagram showing the framework of an element image generation model suitable for the embodiment of the present application.
  • the element image generation model shown in Fig. 2 includes a generation module including an encoding sub-module (Encoder) and a decoding sub-module (Decoder).
  • the input of the model is an initial element image
  • the output of the model is a target element image corresponding to the initial element image.
  • the input initial element image is a character in the original font (for example, the " ⁇ " character in bold in FIG. 2), and the output target element image is the same character in the new font (for example, the " ⁇ " character in the new typeface in FIG. 2).
  • the encoding sub-module in the above element image generation model is used to convert the initial element image into a high-dimensional feature map
  • the decoding sub-module is used to convert the high-dimensional feature map into a new image, that is, the output target element image. It can be understood that the data processing performed by the encoding sub-module and the decoding sub-module is symmetric and reciprocal.
  • the coding sub-module includes M-level coding units connected in stages, and M is a natural number.
  • the decoding sub-module includes M-level decoding units connected in stages, and M is a natural number. Taking the model shown in Figure 2 as an example, the value of M is 8.
  • the step of generating the first feature map based on the initial element image is performed by the first level coding unit (such as the coding unit e1 in FIG. 2).
  • the first-level coding unit generates a first feature map based on the image of the initial element, and outputs the first feature map to the second-level coding unit (such as the coding unit e2 in FIG. 2).
  • the first level coding unit may also output the first feature map to the decoding unit corresponding to the first level coding unit in the decoding submodule (such as the decoding unit d7 in FIG. 2).
  • the steps of generating the second feature map based on the first feature map (and the downsampled element image) in the above steps S103 to S105 are performed by the second-level to Mth-level coding units. Moreover, at least one coding unit among the second-level to Mth-level coding units (for example, at least one of coding units e2 to e8 in FIG. 2), when generating its second feature map, also accepts, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained from the initial element image, where the downsampled element image matches the spatial size of the feature map output by the previous-level coding unit.
  • each coding unit in the coding sub-module has the same structure and is connected stage by stage in order, and each coding unit outputs a feature map to the next-level coding unit (the feature map output by the last-level coding unit is accepted by the first-level decoding unit).
  • the coding unit may be composed of a number of convolutional layers Conv, leaky rectified linear unit layers LReLU, and batch normalization layers BN.
  • the image or feature map input to the coding unit may be processed and output through the leaky rectified linear unit layer LReLU, the convolution layer Conv, and the batch normalization layer BN, as shown in FIG. 3.
  • Preferably, the convolutional layer Conv and the leaky rectified linear unit layer LReLU are paired (for example, in the form Conv-LReLU-Conv-LReLU or LReLU-Conv-LReLU-Conv), and the batch normalization layer BN can be placed at the end of the coding unit.
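  • As a minimal sketch (not the patent's reference implementation), a coding unit of the LReLU-Conv-BN form with a stride-2 convolution that halves the width and height could look as follows in PyTorch; the kernel size, stride and channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EncodeUnit(nn.Module):
    """One coding unit: LReLU -> stride-2 Conv -> BN (halves width and height)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2, inplace=True),                     # leaky rectified linear unit
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=4, stride=2, padding=1),       # 2x spatial downsampling
            nn.BatchNorm2d(out_channels),                        # batch normalization
        )

    def forward(self, x):
        return self.block(x)
```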
  • Each decoding unit in the decoding sub-module has the same structure and is also connected in stages in order. And, the number of decoding units is the same as the number of coding units.
  • the decoding unit may be composed of rectified linear unit layers ReLU, deconvolution layers Deconv, batch normalization layers BN, and dropout layers Dropout.
  • the feature map input to the decoding unit may be processed and output through the rectified linear unit layer ReLU, the deconvolution layer Deconv, the batch normalization layer BN, and the dropout layer Dropout, as shown in FIG. 4. Similarly, the number of layers and their order can be adjusted while still realizing the processing of the feature map and, finally, the generation of the target element image.
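  • Correspondingly, a decoding unit of the ReLU-Deconv-BN-Dropout form that doubles the spatial size could be sketched as follows (again an illustrative assumption rather than the patent's exact layer configuration):

```python
class DecodeUnit(nn.Module):
    """One decoding unit: ReLU -> stride-2 transposed Conv -> BN -> Dropout (doubles width and height)."""
    def __init__(self, in_channels, out_channels, dropout=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels, out_channels,
                               kernel_size=4, stride=2, padding=1),  # 2x spatial upsampling
            nn.BatchNorm2d(out_channels),
            nn.Dropout2d(dropout),
        )

    def forward(self, x):
        return self.block(x)
```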
  • each time a feature map is processed by one level of coding unit, the spatial size of the output feature map is reduced to half of that of the input feature map (specifically, the width is halved and the height is halved).
  • correspondingly, each time a feature map is processed by one level of decoding unit, the spatial size of the output feature map is increased to twice that of the input feature map (specifically, the width is doubled and the height is doubled).
  • a coding unit and a decoding unit satisfying a certain condition may be associated with each other, forming what can be called a symmetric pair. Specifically, a coding unit and a decoding unit whose output feature maps have the same spatial size are regarded as corresponding to each other.
  • the size representation of the image can be understood as follows: the number of pixels in both the width and the height of the image is 256, and the image is represented by the RGB color model, so each pixel is represented by three-dimensional attribute data.
  • the spatial size of the feature map output by the 8-level coding unit included in the coding sub-module shown in FIG. 2 is as shown in Table 1.
  • the spatial size changes from 256×256 for the initial element image to 128×128 after processing by coding unit e1 (the first-level coding unit), then to 64×64 after coding unit e2 (the second-level coding unit), and so on, until it becomes 1×1 after processing by coding unit e8 (the eighth-level coding unit).
  • the number of convolution layers in each coding unit may be different, resulting in different values of channels in each level of output.
  • the number of convolution layers preferably increases with the number of stages of the coding unit.
  • the spatial size of the feature map output by the 8-level decoding unit included in the decoding sub-module shown in Fig. 2 is as shown in Table 2 below.
  • the structure of the element image generation model and the correspondence between the coding unit and the decoding unit are briefly explained above.
  • generating a target element image corresponding to the initial element image, according to the initial element image and the downsampled element image obtained by downsampling the initial element image, may specifically include the following steps, as shown in FIG. 5:
  • S1031 Input the initial element image and the downsampled element image into the encoding submodule.
  • the downsampled element image after the downsampled processing of the initial element image is also input into the encoding submodule, which can reduce information loss in the data processing process, thereby generating a more accurate target element image.
  • the first level coding unit (for example, the coding unit e1 in FIG. 2) can directly accept the original initial element image, as shown in FIG. 6.
  • any Nth-level coding unit after the first-level coding unit (N is a natural number greater than 1 and not greater than M) may accept, in addition to the feature map output by the previous-level coding unit, a downsampled element image obtained by downsampling the initial element image, as shown in FIG. 7.
  • the spatial size of the downsampled element image that a coding unit can accept corresponds to that coding unit; specifically, it should be consistent with the spatial size of the feature map the coding unit receives from the previous-level coding unit, so that the coding unit can fuse the two and then perform subsequent data processing.
  • the downsampled element image can be input into every coding unit after the first-level coding unit (for example, coding units e2 to e8 in FIG. 2), or only into some of them (for example, coding units e2 and e7 in FIG. 2), as long as the spatial size of the downsampled element image coincides with the spatial size of the previous-level feature map received by the coding unit it is fed into.
  • For example, the spatial size of the feature map output by coding unit e1 to coding unit e2 is 128×128, so the downsampled element image input into coding unit e2 should also be processed to a spatial size of 128×128.
  • In this way, layers at different depths of the model are supplemented with information from the initial element image at the corresponding size, which facilitates the generation of high-quality target element images.
  • the initial element image may be downsampled by a variety of methods, such as bilinear interpolation, linear interpolation, or nearest-neighbor interpolation; this application does not limit the choice.
  • the coding unit combines the received feature map and the downsampled element image for subsequent processing.
  • the specific method of fusion has various options, for example, superimposing (concatenating) along a specific dimension or adding the attribute values of corresponding pixel points.
  • Preferably, the superimposition is performed along the channel dimension of the feature map, i.e., channel-wise concatenation.
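  • A minimal sketch of this fusion step, assuming bilinear downsampling of the initial element image and concatenation along the channel dimension (dim=1 in PyTorch's NCHW layout):

```python
import torch.nn.functional as F

def fuse_with_downsampled(prev_feature_map, initial_image):
    """Downsample the initial element image to the spatial size of the incoming
    feature map and concatenate the two along the channel dimension."""
    target_size = prev_feature_map.shape[-2:]                     # (H, W) of previous feature map
    downsampled = F.interpolate(initial_image, size=target_size,
                                mode="bilinear", align_corners=False)
    return torch.cat([prev_feature_map, downsampled], dim=1)      # channel-wise concatenation
```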
  • S1033 Output multiple feature maps to the decoding submodule by using the encoding submodule.
  • the coding unit and the decoding unit having the same spatial size of the output feature map have a corresponding relationship, and constitute a symmetric unit pair, for example, ⁇ e1, d7>, ⁇ e3, d5>, and the like.
  • the coding unit and the decoding unit that have a corresponding relationship are directly connected at their outputs. Therefore, when outputting a feature map, each coding unit in the coding sub-module outputs the generated feature map both to its next-level coding unit (the feature map generated by the last-level coding unit is output to the first-level decoding unit) and to its corresponding decoding unit.
  • the first-level coding unit (embodied as coding unit e1) generates a first feature map according to the initial element image; the first-level coding unit (e1) outputs the first feature map to the second-level coding unit (embodied as e2) and to the decoding unit corresponding to the first-level coding unit (embodied as decoding unit d7);
  • the Kth-level coding unit (embodied as any one of coding units e2 to e7, for example coding unit e3) generates a third feature map according to the downsampled element image and the second feature map output by the (K-1)th-level coding unit (correspondingly, coding unit e2); the Kth-level coding unit (e.g., coding unit e3) outputs the third feature map to the (K+1)th-level coding unit (correspondingly, coding unit e4) and to the decoding unit corresponding to the Kth-level coding unit (correspondingly, decoding unit d5); where K is a natural number greater than 1 and less than M, and the downsampled element image matches the spatial size of the second feature map.
  • S1035 Generate a target element image corresponding to the initial element image according to the plurality of feature maps by using the decoding submodule.
  • each level of decoding unit in the decoding sub-module processes the feature map it receives from the previous-level decoding unit; the feature map output at that level is then merged with the feature map transmitted from the corresponding coding unit and serves as the input of the next-level decoding unit. Specifically:
  • the first-level decoding unit (embodied as decoding unit d1) generates a sixth feature map according to the fifth feature map output by the last-level coding unit (embodied as coding unit e8), and outputs the sixth feature map to the second-level decoding unit (embodied as decoding unit d2);
  • the Lth-level decoding unit in the decoding sub-module (specifically, any one of decoding units d2 to d7, for example decoding unit d3) generates an eighth feature map according to the seventh feature map output by the (L-1)th-level decoding unit (correspondingly, decoding unit d2);
  • the Lth-level decoding unit (e.g., decoding unit d3) superimposes, along the channel dimension, the eighth feature map and the ninth feature map output by the coding unit corresponding to the Lth-level decoding unit (correspondingly, coding unit e5) to generate a tenth feature map, which is output to the (L+1)th-level decoding unit (correspondingly, decoding unit d4);
  • L is a natural number greater than 1 and less than M;
  • the target element image corresponding to the initial element image is generated by the Mth stage decoding unit in the decoding submodule according to the eleventh feature map output by the M-1th stage decoding unit.
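  • Putting these pieces together, a sketch of the generator's forward pass for M = 8 is shown below; the per-level channel counts, the assumption that every coding unit e2 to e8 receives the downsampled image, the three-channel input, and the helper classes EncodeUnit/DecodeUnit and fuse_with_downsampled from the earlier sketches are all illustrative assumptions, not the patent's exact configuration:

```python
class Generator(nn.Module):
    def __init__(self, in_ch=3, out_ch=3, ch=(64, 128, 256, 512, 512, 512, 512, 512)):
        super().__init__()
        # first-level coding unit e1 accepts the raw initial element image
        self.e1 = nn.Conv2d(in_ch, ch[0], kernel_size=4, stride=2, padding=1)
        # coding units e2..e8 also accept the downsampled image (hence "+ in_ch" input channels)
        self.encoders = nn.ModuleList(
            [EncodeUnit(ch[i - 1] + in_ch, ch[i]) for i in range(1, 8)])
        # decoding units d1..d8; d2..d8 receive the skip feature map concatenated on channels
        self.d1 = DecodeUnit(ch[7], ch[6])
        self.decoders = nn.ModuleList(
            [DecodeUnit(ch[7 - i] * 2, ch[6 - i]) for i in range(1, 7)])
        self.d8 = nn.Sequential(nn.ReLU(True),
                                nn.ConvTranspose2d(ch[0] * 2, out_ch, 4, 2, 1),
                                nn.Tanh())

    def forward(self, x):
        feats = [self.e1(x)]                       # first feature map (from e1)
        for enc in self.encoders:                  # e2..e8: fuse downsampled image, then encode
            feats.append(enc(fuse_with_downsampled(feats[-1], x)))
        y = self.d1(feats[-1])                     # d1 takes the last (1x1) feature map
        for i, dec in enumerate(self.decoders):    # d2..d7: concat with corresponding encoder output
            y = dec(torch.cat([y, feats[6 - i]], dim=1))
        return self.d8(torch.cat([y, feats[0]], dim=1))
```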
  • In the above process, the coding units in the coding sub-module accept two kinds of input: the feature map output by the previous-level coding unit and the downsampled element image obtained by downsampling the initial element image, so that information from the initial element image flows into the coding sub-module at different stages.
  • In addition to the feature map output by the previous-level decoding unit, the decoding units in the decoding sub-module accept the feature map output directly by the corresponding coding unit in the coding sub-module, thereby further reducing information loss during image data processing.
  • the trained element image generation model may be used to generate a target element image corresponding to the initial element image.
  • Specifically, the element image generation model may generate the target element image according to the initial element image and the downsampled element image obtained by downsampling the initial element image.
  • the target element image can be automatically generated according to the initial element image, so that the target element image of different styles can be efficiently expanded according to the initial element image, and the efficiency of constructing the element image library is improved.
  • Moreover, the downsampled element image obtained by downsampling the initial element image is accepted as supplementary information for generating feature maps, thereby reducing information loss during data processing and helping to improve the accuracy of the generated target element image.
  • the above example illustrates a specific implementation example of how to generate a target element image using the trained element image generation model.
  • the element image generation network including the element image generation module and the discrimination module may be specifically trained, as shown in FIG. 8.
  • An element image generating module (hereinafter also simply referred to as the generating module) is configured to generate, based on an original image, a generated image corresponding to the original image; the discriminating module is configured to discriminate the degree of difference between the generated image and the real sample image, and the parameters in the element image generation network are adjusted according to the degree of difference; the real sample image corresponds to the original image, and together they constitute an element image pair.
  • the discriminating module determines whether the input image is a real sample image or a generated image generated by the model.
  • the structure of the discriminating module can be as shown in FIG. 9.
  • after the generated image and the real sample image enter the discriminating module, the first two layers are a convolution layer Conv and a leaky rectified linear unit layer LReLU, followed by three blocks of [convolution layer Conv - batch normalization layer BN - leaky rectified linear unit layer LReLU] connected in series.
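  • A sketch of a discriminator with this layout (Conv-LReLU followed by three Conv-BN-LReLU blocks); the channel counts and the final per-patch scoring convolution are assumptions added for illustration:

```python
class Discriminator(nn.Module):
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, ch, 4, 2, 1), nn.LeakyReLU(0.2, True)]
        for i in range(3):                                    # three Conv-BN-LReLU blocks in series
            layers += [nn.Conv2d(ch * 2 ** i, ch * 2 ** (i + 1), 4, 2, 1),
                       nn.BatchNorm2d(ch * 2 ** (i + 1)),
                       nn.LeakyReLU(0.2, True)]
        layers += [nn.Conv2d(ch * 8, 1, 4, 1, 1)]             # real/fake score map (assumed output head)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```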
  • the process of training the element image generation network may specifically include:
  • the generated image and the real sample image are used as a training sample input discriminating module, and the discriminating module determines the degree of difference between the generated image and the real sample image; wherein the generated image is marked as a negative class, and the real sample image is marked as a positive class;
  • the parameters in the generating module and the discriminating module are alternately adjusted until the degree of difference satisfies the preset condition.
  • the goal of the generation module is to make the generated image as realistic as possible, so that the discriminating module can be "fooled" (that is, the discriminating module considers that there is no difference between the generated image and the real sample image, or that the difference is small enough).
  • the goal of the discriminating module is to correctly distinguish the real sample image from the generated image. Therefore, the generating module and the discriminating module can be trained alternately. The details are as follows:
  • the system initializes the generation module (Generator) and the discriminating module (Discriminator), which are respectively recorded as G0 and D0; the system accepts a batch of input pictures (the number of inputs is the value of the batch size), where each input is a pair of images: the original image and the corresponding real sample image. Then, the original image is sent to the generating module G0, and after a series of data processing, a new image, that is, the generated image, is produced.
  • the real sample image is marked as a positive class
  • the generated image is marked as a negative class
  • the two are input as a training sample to the discriminating module D0.
  • the generating module G0 is fixed, and the parameter of the discriminating module D0 is updated according to the calculation result of the loss function, so that D0 is updated to a new state and recorded as D1.
  • D1 is fixed, and the parameters of the generating module G0 are updated according to the calculation result of the loss function, so that G0 is updated to a new state, which is denoted as G1.
  • the generating module G and the discriminating module D are alternately trained throughout the training process, so that the calculation result of the loss function satisfies the preset requirement, and the two achieve the optimal state.
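  • A condensed sketch of one round of this alternating update (fix G, update D; then fix D, update G), reusing the Generator and Discriminator sketches above and assuming a simple BCE-based adversarial loss; in the full method the generator objective would also include the pixel, feature-space and category losses described below:

```python
def train_step(G, D, original, real_sample, opt_G, opt_D):
    bce = nn.BCEWithLogitsLoss()

    # --- update D with G fixed: real sample labelled positive, generated image negative ---
    fake = G(original).detach()                  # detach so gradients do not flow into G
    d_real, d_fake = D(real_sample), D(fake)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- update G with D fixed: try to make the discriminator accept the generated image ---
    d_fake = D(G(original))
    g_loss = bce(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()

# usage (assumed optimizer settings): opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
#                                     opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
```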
  • the initial element image can be input into the model as the original image in the training process, and the generated image generated by the model is the target element image desired by the user.
  • an unsupervised learning method is adopted, in which the two neural networks, the generation module and the discriminating module, learn by competing against each other.
  • the output of the generated module needs to mimic the real sample image in the training set as much as possible; and the purpose of the discriminating module is to distinguish the real sample image from the generated image.
  • the two modules compete against each other and constantly adjust the parameters.
  • the final purpose is to make the discriminating module unable to judge whether the output result of the generating module is true.
  • the loss function can be calculated according to the generated image and the real sample image, thereby outputting the discrimination result according to the calculation result of the loss function, reflecting the degree of difference between the generated image and the real sample image.
  • the parameters in the element image generation model can be adjusted according to the calculation result of the loss function.
  • the loss function can have various options.
  • an adversarial loss function can be included.
  • the adversarial loss function is used to reflect the contribution of the generation module to reducing the degree of difference and the contribution of the discriminating module to increasing the degree of difference. It can be understood that the generation module in the model continuously generates new generated images and hopes to pass the evaluation of the discriminating module, while the discriminating module hopes to correctly separate the generated images (marked as the negative class) from the real sample images (marked as the positive class). The adversarial loss function can therefore be expressed by the following formula:
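  • (In the standard GAN minimax formulation, which matches the variable descriptions below, the adversarial loss reads:)

    min_G max_D V(D, G) = E_{x ~ p_data(x)}[ log D(x) ] + E_{z ~ p_z(z)}[ log(1 - D(G(z))) ]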
  • pdata(x) represents the target data distribution and pz(z) is a raw data distribution.
  • pdata(x) represents the real sample image to be input to the discriminating module
  • pz(z) is the original image of the input generating module
  • G(z) represents the generated image generated by the model and input into the discriminating module.
  • the generation module G and the discrimination module D are alternately trained in a mutually opposing manner.
  • the generating module G continuously generates new generated images in an attempt to pass the evaluation of the discriminating module D, while the discriminating module D attempts to accurately distinguish the real sample images from the generated images. Therefore, during training, the discriminating module D gives a higher score to the real sample image and a lower score to the generated image; that is, D tends to maximize D(x) and minimize D(G(z)). The discriminating module D therefore tends to maximize the adversarial loss function V(D, G).
  • the generation module G attempts to generate realistic images, so it tends to maximize D(G(z)) and thus to minimize the adversarial loss function V(D, G).
  • the feature space loss function (also known as the perceptual loss) can be introduced as a loss function in model training.
  • the feature space loss function is used to reflect the difference in feature space between the generated image and the real sample image.
  • the feature space refers to the space corresponding to the high-dimensional features rich in semantic information after the image is passed through a deep neural network.
  • the vector of the feature space contains more advanced semantic information, so it can be used to measure the difference between the generated image and the real sample image.
  • the perceptual loss can be calculated using various networks, such as AlexNet, ResNet, or VGG19.
  • the VGG19 is taken as an example for detailed description.
  • VGG19 is a typical neural network model consisting of multiple convolutional layers Conv, pooling layers and fully connected layers FC. The structure of its convolutional part is shown in FIG. 10.
  • the generated image and the real sample image are each passed through the VGG19 network, the features of selected convolutional layers are extracted, and the L1 distance between the two sets of features is calculated as the perceptual loss.
  • since the VGG19 model has many convolutional layers, there are also many options; it is recommended to select convolutional layers of different depths. For example, as illustrated in FIG. 10, the features output by the five convolutional layers Conv1_2, Conv2_2, Conv3_2, Conv4_2, and Conv5_2 can be selected to calculate the perceptual loss.
  • the calculation formula is expressed as follows:
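  • (Using Φ to denote the VGG19 network, Φ_l the features it outputs at convolutional layer l, and λ_l the corresponding weight, the standard form of this weighted L1 feature-space loss is:)

    Ploss = Σ_l λ_l · || Φ_l(real) - Φ_l(fake) ||_1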
  • Φ represents the VGG19 network model.
  • l is the selected convolutional layer, real represents the real sample image, and fake represents the generated image.
  • λ_l is the weight with which each selected convolutional layer's perceptual loss is counted.
  • the subscript 1 indicates that the L1 distance is calculated.
  • the overall perceptual loss Ploss is the weighted sum of the perceptual losses calculated at the selected layers.
  • the weight of each layer can be set differently. Considering that convolutional layers closer to the input capture lower-level information while layers closer to the output capture more abstract information, it is recommended to determine the weights according to the principle that the weight of a convolutional layer near the input is smaller than the weight of a convolutional layer near the output.
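  • A sketch of this perceptual loss using torchvision's pretrained VGG19; the feature indices mapping to Conv1_2 through Conv5_2 and the weight values are assumptions for illustration (note that grayscale font images would need to be expanded to three channels first):

```python
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    # indices of Conv1_2, Conv2_2, Conv3_2, Conv4_2, Conv5_2 in torchvision's VGG19 "features"
    LAYERS = (2, 7, 12, 21, 30)

    def __init__(self, weights=(0.1, 0.1, 0.2, 0.3, 0.3)):
        super().__init__()
        # older torchvision versions use models.vgg19(pretrained=True) instead
        self.vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)            # VGG19 is only a fixed feature extractor
        self.weights = weights

    def forward(self, fake, real):
        loss, x, y = 0.0, fake, real
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.LAYERS:
                w = self.weights[self.LAYERS.index(i)]
                loss = loss + w * torch.mean(torch.abs(x - y))   # weighted L1 in feature space
            if i >= max(self.LAYERS):
                break
        return loss
```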
  • the pixel space loss function may also be used, to reflect the difference between the generated image and the real sample image at corresponding pixel points. Since the purpose of the element image generating module is to make the generated image as similar as possible to the real sample image, the difference between the two can also be measured by comparing corresponding pixel points, that is, the difference in pixel space. Specifically, the L1 distance between the real sample image and the generated image can be calculated as the result of the pixel space loss function.
  • the category loss function can also be used, to reflect the difference in categories between the generated image and the real sample image.
  • There are many training methods for the element image generation model, including a single-stage training mode and a multi-stage training mode. Taking the training of a font generation model as an example, single-stage training refers to directly generating one target font from a source font.
  • the multi-stage training method is divided into a pre-training phase and a re-training phase. In the pre-training phase, a source font is fixed and a plurality of target fonts are generated from it; in the subsequent re-training phase, the source font is used to generate a single target font.
  • the method is called a one-to-many training method.
  • Preferably, the discriminating module can also correctly predict the category of the image (i.e., the font), in addition to judging whether the image is a real sample image or a generated image. Therefore, a category loss function is introduced, embodied as the cross entropy between the real category and the predicted category, as shown in the following formula:
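  • (In the standard cross-entropy form, with y_c the true category indicator, ŷ_c the predicted probability, and C the number of categories:)

    Closs = - Σ_{c=1}^{C} y_c · log( ŷ_c )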
  • the quality of the generated images can also be tested and evaluated.
  • the degree to which the generated image matches the real sample image can be determined based on one or more of the following metrics:
  • the L1 distance between all the real sample images and the generated images in the test set can be calculated and averaged. It can be understood that the smaller the L1 distance, the closer the generated image is to the real sample image, indicating that the quality of the generated image is higher.
  • the peak signal-to-noise ratio (PSNR) between the generated image and the real sample image can also be calculated.
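  • (In the standard definition, with MAX the maximum pixel value, m×n the image size, and MSE the mean squared error between the two images:)

    PSNR(I, J) = 10 · log10( MAX^2 / MSE ),   MSE = (1 / (m·n)) Σ_{i,j} ( I(i,j) - J(i,j) )^2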
  • I and J represent two images (specifically, a generated image and a real sample image),
  • the structural similarity SSIM between the generated image and the real sample image can be calculated.
  • Structural Similarity SSIM measures the difference between two images (ie, the generated image and the real sample image) from three perspectives: structural correlation, contrast, and brightness.
  • the SSIM formula for calculating two images can be expressed as:
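  • (In the standard three-component form, consistent with the constants described below:)

    SSIM(X, Y) = l(X, Y) · c(X, Y) · s(X, Y)
    l(X, Y) = (2 μ_x μ_y + C_1) / (μ_x^2 + μ_y^2 + C_1)
    c(X, Y) = (2 σ_x σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)
    s(X, Y) = (σ_xy + C_3) / (σ_x σ_y + C_3)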
  • μ_x, μ_y and σ_x, σ_y are the means and standard deviations of the images X and Y, respectively, and σ_xy is the covariance of the two.
  • C 1 , C 2 and C 3 are constants.
  • The following takes an element image embodied as a text font as an example to illustrate the evaluation of font quality.
  • First, a single-character recognition model with good recognition performance, trained on a real font data set, can be prepared. The following font quality evaluation is based on this single-character recognition model.
  • the font images generated by the embodiment of the present application (that is, the generated images) are recognized by the single-character recognition model. If a generated character can be correctly recognized, the newly generated character is at least initially correct in terms of glyph, strokes, and structure. The characters that are not correctly recognized can be filtered out based on the recognition result.
  • Next, the three indicators (L1 distance, PSNR, and SSIM) of each generated character can be calculated, giving the distribution of results for each indicator. The average of each distribution is then used as a threshold: for the PSNR and SSIM distributions, characters below the average are filtered out; for the L1 distance distribution, characters above the average are filtered out.
  • Third, a manual (subjective) evaluation is performed on the generated characters retained after the first two screening steps; based on which characters are subjectively judged to be of good quality, the threshold of each indicator can be adjusted, and the characters selected by the adjusted thresholds are determined to be good-quality characters.
  • The fourth step is to measure the total number of good-quality characters, or their proportion of the generated data in the verification data set. The larger the number and/or the higher the proportion, the higher the quality of the generated font and the better the font generation model.
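  • A minimal sketch of the metric-based screening in the second step, using scikit-image's PSNR/SSIM implementations; the recognition filtering and the manual threshold adjustment are omitted, and the grayscale-image assumption is illustrative:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def screen_generated_characters(generated, real):
    """generated, real: aligned lists of grayscale images (numpy uint8 arrays in [0, 255])."""
    l1   = np.array([np.mean(np.abs(g.astype(float) - r.astype(float)))
                     for g, r in zip(generated, real)])
    psnr = np.array([peak_signal_noise_ratio(r, g, data_range=255)
                     for g, r in zip(generated, real)])
    ssim = np.array([structural_similarity(r, g, data_range=255)
                     for g, r in zip(generated, real)])

    # use the mean of each distribution as the initial threshold:
    # keep characters above average for PSNR/SSIM, below average for L1 distance
    keep = (psnr >= psnr.mean()) & (ssim >= ssim.mean()) & (l1 <= l1.mean())
    return keep, keep.mean()        # boolean mask and proportion of retained characters
```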
  • the above method of evaluating font quality does not simply calculate an indicator over a sample set; instead, it starts from the distribution of all generated characters on each indicator, uses the single-character recognition model's ability to recognize glyphs, strokes, and structure for screening, and introduces manual evaluation to adjust the threshold of each indicator. This realizes an interactive evaluation of the quality of the generated fonts that combines the advantages of subjective and objective evaluation, and can more accurately and comprehensively reflect the overall quality of the fonts generated by the model. On this basis, the parameters in the model can also be adjusted according to the evaluation results.
  • the element image generating method provided by the embodiment of the present application is embodied as a Chinese character font image generating method, which may include the following steps:
  • the step of iteratively generating the second feature map includes the following steps at least once:
  • the second feature map is generated based on the first feature map and a downsampled Chinese font image, where the downsampled Chinese font image is obtained by downsampling the initial Chinese font image and matches the spatial size of the first feature map.
  • the element image generating method is embodied as a text font image generating method, and may include the following steps:
  • the step of iteratively generating the second feature map includes the following steps at least once:
  • the embodiment of the present application further provides an element image generating apparatus, as shown in FIG. 11.
  • a feature map first generating unit 101 configured to generate a first feature map based on the initial element image
  • a feature map second generating unit 103 configured to generate a second feature map based on the first feature map
  • the target element image generating unit 105 is configured to generate a target element image corresponding to the initial element image based on the first feature map and the at least one second feature map of the plurality of second feature maps;
  • in at least one iteration, the feature map second generating unit 103 is further configured to:
  • generate the second feature map based on the first feature map and a downsampled element image, where the downsampled element image is obtained by downsampling the initial element image and matches the spatial size of the first feature map.
  • the above element image generating device corresponds to the element image generating method in the foregoing embodiment.
  • the description in the foregoing method embodiments is applicable to the device, and details are not described herein again.
  • the embodiment of the present application further provides an element image generating system.
  • the system includes an element image generating module.
  • the element image generating module includes an encoding sub-module and a decoding sub-module.
  • the encoding sub-module includes M levels of coding units connected stage by stage, the decoding sub-module includes M levels of decoding units connected stage by stage, and M is a natural number;
  • a first level coding unit configured to generate a first feature map based on the initial element image
  • a second level coding unit to an Mth level coding unit, configured to generate a second feature map based on the feature map generated by the upper level coding unit;
  • a decoding submodule configured to generate a target element image corresponding to the initial element image based on the first feature map and the at least one second feature map of the plurality of second feature maps.
  • the above element image generating system further includes:
  • An element image discriminating module for discriminating the degree of difference between the generated image and the real sample image
  • the generated image is generated by the element image generation module based on the original image
  • the real sample image corresponds to the original image and constitutes an element image pair
  • the degree of difference is used to alternately adjust the parameters in the element image generating module and the discriminating module so that the degree of difference satisfies the preset condition.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes a processor, optionally including an internal bus, a network interface, and a memory.
  • the memory may include volatile memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • the electronic device may also include hardware required for other services.
  • the processor, the network interface, and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one double-headed arrow is shown in Figure 13, but it does not mean that there is only one bus or one type of bus.
  • the program can include program code, the program code including computer operating instructions.
  • the memory can include both volatile memory and non-volatile memory, and provides instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming an element image generating device at the logical level.
  • the processor executes the program stored in the memory and is specifically used to perform the following operations:
  • generating a first feature map based on an initial element image;
  • generating a second feature map based on the first feature map, taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map;
  • generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map;
  • wherein at least one round of generating the second feature map includes: generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the first feature map in spatial size.
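As a concrete, purely illustrative example of that spatial-size matching: if the initial element image is 256×256 and the feature map at the current encoding level is 32×32, the injected copy of the image has to be reduced by the same overall factor of 8, e.g.:

    import torch
    import torch.nn.functional as F

    init_img = torch.rand(1, 1, 256, 256)   # initial element image, e.g. a glyph
    feat = torch.rand(1, 128, 32, 32)        # feature map at the current encoding level
    down = F.interpolate(init_img, size=feat.shape[-2:], mode='bilinear', align_corners=False)
    assert down.shape[-2:] == feat.shape[-2:]            # spatial sizes now match
    merged = torch.cat([feat, down], dim=1)  # 128 + 1 channels enter the next coding unit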
  • the method performed by the element image generating apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by an instruction in the form of software.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software modules can be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the electronic device can also perform the method performed by the element image generating device in FIG. 1 and implement the functions of the element image generating device in the embodiment shown in FIG. 1.
  • details are not described herein again in the embodiments of the present application.
  • the embodiment of the present application further provides a computer readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by an electronic device including a plurality of applications, cause the electronic device to perform the method performed by the element image generating apparatus in the embodiment shown in FIG. 1, and specifically to execute:
  • generating a first feature map based on an initial element image;
  • generating a second feature map based on the first feature map, taking the second feature map as a new first feature map, and iteratively performing the step of generating the second feature map;
  • generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map;
  • wherein at least one round of generating the second feature map includes: generating the second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the first feature map in spatial size.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an element image generation method, device and system. The method comprises: generating a first feature map based on an initial element image (S101); generating a second feature map based on the first feature map (S103), taking the second feature map as a new first feature map (S105), and iteratively performing the step of generating the second feature map; and generating a target element image corresponding to the initial element image based on the first feature map and at least one second feature map (S107). At least one round of generating the second feature map comprises: generating a second feature map based on the first feature map and a downsampled element image, the downsampled element image being obtained by downsampling the initial element image and matching the spatial size of the first feature map. The method can effectively extend target element images of different styles from initial element images, improve the efficiency of constructing an element image library, reduce information loss during data processing, and improve the accuracy of the generated target element image.
PCT/CN2019/081217 2018-04-10 2019-04-03 Element image generation method, device and system WO2019196718A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810315058.3 2018-04-10
CN201810315058.3A CN110363830B (zh) Element image generation method, device and system

Publications (1)

Publication Number Publication Date
WO2019196718A1 true WO2019196718A1 (fr) 2019-10-17

Family

ID=68163428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081217 WO2019196718A1 (fr) Element image generation method, device and system

Country Status (2)

Country Link
CN (1) CN110363830B (fr)
WO (1) WO2019196718A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767935A (zh) * 2019-10-31 2020-10-13 杭州海康威视数字技术股份有限公司 Target detection method and apparatus, and electronic device
CN112070658A (zh) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style transfer method based on deep learning

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308094B (zh) * 2020-11-25 2023-04-18 创新奇智(重庆)科技有限公司 Image processing method and apparatus, electronic device and storage medium
CN114169255B (zh) * 2022-02-11 2022-05-13 阿里巴巴达摩院(杭州)科技有限公司 Image generation system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1077445A2 (fr) * 1999-08-19 2001-02-21 Adobe Systems, Inc. Device dependent rendering of characters
CN104268549A (zh) * 2014-09-16 2015-01-07 天津大学 Noise-resistant multi-scale local binary pattern feature representation method
CN104794504A (zh) * 2015-04-28 2015-07-22 浙江大学 Deep-learning-based method for detecting characters in graphic patterns
CN107644006A (zh) * 2017-09-29 2018-01-30 北京大学 Automatic generation method for handwritten Chinese font libraries based on deep neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709875B (zh) * 2016-12-30 2020-02-18 北京工业大学 Compressed low-resolution image restoration method based on a joint deep network
CN107392973B (zh) * 2017-06-06 2020-01-10 中国科学院自动化研究所 Pixel-level automatic generation method for handwritten Chinese characters, storage device and processing apparatus
CN107330954A (zh) * 2017-07-14 2017-11-07 深圳市唯特视科技有限公司 Method for manipulating images via sliding attributes based on an attenuation network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1077445A2 (fr) * 1999-08-19 2001-02-21 Adobe Systems, Inc. Device dependent rendering of characters
CN104268549A (zh) * 2014-09-16 2015-01-07 天津大学 Noise-resistant multi-scale local binary pattern feature representation method
CN104794504A (zh) * 2015-04-28 2015-07-22 浙江大学 Deep-learning-based method for detecting characters in graphic patterns
CN107644006A (zh) * 2017-09-29 2018-01-30 北京大学 Automatic generation method for handwritten Chinese font libraries based on deep neural networks

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767935A (zh) * 2019-10-31 2020-10-13 杭州海康威视数字技术股份有限公司 Target detection method and apparatus, and electronic device
CN111767935B (zh) * 2019-10-31 2023-09-05 杭州海康威视数字技术股份有限公司 Target detection method and apparatus, and electronic device
CN112070658A (zh) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style transfer method based on deep learning
CN112070658B (zh) * 2020-08-25 2024-04-16 西安理工大学 Chinese character font style transfer method based on deep learning

Also Published As

Publication number Publication date
CN110363830B (zh) 2023-05-02
CN110363830A (zh) 2019-10-22

Similar Documents

Publication Publication Date Title
WO2019196718A1 (fr) Element image generation method, device and system
US11657230B2 (en) Referring image segmentation
US20210326656A1 (en) Panoptic segmentation
WO2021073493A1 (fr) Image processing method and device, neural network training method, image processing method for a combined neural network model, combined neural network model construction method, neural network processor and storage medium
CN113313022B (zh) Training method for a character recognition model and method for recognizing characters in an image
WO2018086519A1 (fr) Method and device for identifying specific text information
CN115438215B (zh) Image-text bidirectional search and matching model training method, apparatus, device and medium
CN113961736B (zh) Text-to-image generation method and apparatus, computer device and storage medium
CN109816659B (zh) Image segmentation method, apparatus and system
CN110826609B (zh) Two-stream feature fusion image recognition method based on reinforcement learning
CN113221879A (zh) Text recognition and model training method, apparatus, device and storage medium
CN111079374B (zh) Font generation method, apparatus and storage medium
CN113870286B (zh) Foreground segmentation method based on multi-level features and mask fusion
US11494431B2 (en) Generating accurate and natural captions for figures
TWI803243B (zh) Image augmentation method, computer device and storage medium
CN111325660A (zh) Remote sensing image style transfer method based on text data
US20220301106A1 (en) Training method and apparatus for image processing model, and image processing method and apparatus
CN112669215A (zh) Method and apparatus for training a text image generation model and generating text images
CN110347853B (zh) Image hash code generation method based on a recurrent neural network
Zhou et al. Attention transfer network for nature image matting
CN116485649A (zh) End-to-end image splicing localization method and system
CN111340139B (zh) Method and apparatus for determining image content complexity
WO2021137942A1 (fr) Pattern generation
CN110796115A (zh) Image detection method and apparatus, electronic device and readable storage medium
CN117392284B (zh) Adaptive condition-enhanced text-to-image generation method, system, apparatus and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19786018

Country of ref document: EP

Kind code of ref document: A1