CN114564768A - End-to-end intelligent plane design method based on deep learning - Google Patents


Info

Publication number
CN114564768A
Authority
CN
China
Prior art keywords
text
design
image
poster
layout
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210218256.4A
Other languages
Chinese (zh)
Inventor
李金遥
刘金华
刘传蔓
黄东晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202210218256.4A
Publication of CN114564768A
Current legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/10 Geometric CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end intelligent planar design method based on deep learning. Using a collected poster data set, the original data are first screened and partitioned; the two subtasks of layout design and attribute confirmation are learned jointly, and the model is trained and tuned on the training set until its performance reaches an optimal state. Given input image and text information, the trained model is invoked: the intelligent planar design framework extracts the feature information of the image and the text, searches for a suitable layout of the design, confirms the attributes of the composition text, and automatically generates a harmonious planar design. The invention uses a unified joint-training framework to prevent differences in data distribution during training, reduces the error propagation found in pipeline models, and exploits the advantages of end-to-end training; it requires no manually defined aesthetic rules for planar design, learning them from data instead, and it does not depend on image saliency-map detection, so it generalizes better to a variety of planar design tasks.

Description

End-to-end intelligent plane design method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an end-to-end intelligent planar design method based on deep learning.
Background
Planar (graphic) design, poster design for example, is widely used in daily life as an important medium of visual communication. Designers often need a great deal of time and effort to create a harmonious and aesthetically pleasing graphic design, and the task presents a high threshold for ordinary people without a professional foundation, since it demands both refined aesthetics and specialized design knowledge. With the rapid progress of computer vision technology, interest in intelligent computer-aided design has grown increasingly strong.
At present, various methods for assisting planar design can significantly reduce the time and labor consumed by basic design work; they can be divided mainly into traditional methods, rule-based methods, and data-driven methods.
Traditional methods of automatic planar design usually rely on design rules or structured data; designers often obey such rules when creating graphic designs, for example aesthetic principles of layout and harmonious color models, but these methods offer limited intelligence and a narrow range of application. Rule-based approaches use design rules to assist specific design tasks, including layout synthesis, color generation, and image thumbnailing; yet aesthetic rules are complex and difficult to define and constrain with specific rules. Ali et al., in the document "Learning cancellation network for semantic segmentation", propose a constraint-based recommendation system that can generate a magazine cover design according to the user's preferences, but the method is only applicable to specific magazine cover templates.
With the development of deep learning, data-driven methods derive specific design attributes from training images and use them to solve planar design problems. Yang et al., in "Recommendation system for automatic design of magazine covers", demonstrate the effectiveness of optimization methods that use aesthetic design principles and propose a learning-based method that generates magazine covers with the help of a large database; however, the rules remain restrictive and the results suffer from a templating problem.
Zheng et al., in "Content-aware generative modeling of graphic design layouts", propose a content-aware deep generative model for graphic design layouts that can synthesize layouts based on visual and textual features, but it is mainly applied to magazine layouts and is not suitable for generating more diverse graphic posters.
Yang et al., in the document "assistant frame for comprehensive learning of visual representation", devised a system for automatically generating digital magazine covers by summarizing a set of topic-related templates and introducing a computational framework covering the key elements of layout design; however, this is a rule-based system with limited intelligence.
Feng et al., in "A dataset and a baseline model for spatial object detection", propose an interactive design system, Scap, intended to enhance the interactivity of the automatic layout generation process: it can generate layout suggestions for a set of images and texts input by a user, but it is mainly based on saliency detection and has limitations across different planar designs.
While the above techniques have contributed to generating suitable text-image layouts, the layout rules they derive do not generalize to the generation of planar design works, and they ignore the critical impact that design-element attributes other than layout have on planar designs. Meanwhile, the mainstream intelligent solutions rely excessively on saliency detection, which greatly limits the development of intelligent planar design.
Disclosure of Invention
Aiming at four major problems in the current field of intelligent planar design: first, excessive reliance on saliency-map detection, which limits intelligent planar design; second, the lack of consideration of, and solutions for, the styles of design elements in planar designs; third, the accumulation of propagated errors caused by the prevalent use of pipeline model structures; and fourth, the shortage of structured poster data; the invention provides an end-to-end intelligent planar design method based on deep learning that comprehensively considers the given image and design elements and automatically generates harmonious and attractive planar designs.
An end-to-end intelligent planar design method based on deep learning comprises the following steps:
(1) collecting semi-structured poster data, cleaning and screening them, and storing the composition-text attributes and the background image of each poster separately, so as to provide the required training data for the intelligent planar design model;
(2) jointly training the two subtasks of layout design and attribute confirmation in the intelligent planar design model, so that the model can extract features from the multimodal (visual and textual) views of the poster data;
(3) fusing the image and text features with the image module and the text module of the model, and decoding to obtain a layout density inference map;
(4) determining the layout design from the layout density inference map using an approximate inference algorithm;
(5) integrating the global image features and the local image features, and determining the attribute category of the composition text from the classifier output (an illustrative sketch of how these steps compose is given after this list).
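For illustration only, the following minimal Python sketch shows one way the five steps above could compose at inference time. Every module and function name here (encode_image, encode_text, fuse, decode_density, search_layout, classify_attributes) is a hypothetical placeholder, not an identifier disclosed by the invention.

```python
# Hypothetical composition of the five-step pipeline described above.
def design_poster(background_image, title_text, model):
    # Steps (2)-(3): extract and fuse multimodal features, decode a density map
    image_feats = model.encode_image(background_image)
    text_feats = model.encode_text(title_text)
    fused = model.fuse(image_feats, text_feats)
    density_map = model.decode_density(fused)

    # Step (4): approximate inference of the text-box layout on the density map
    text_box = model.search_layout(density_map)

    # Step (5): attribute categories from global + local (density-weighted) views
    attributes = model.classify_attributes(fused, background_image, density_map)
    return text_box, attributes
```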
Further, step (1) is implemented as follows: semi-structured poster data are first collected from publicly available web pages and screened; the poster background images, title text sequences, relative coordinates of the text boxes, text font information and other corresponding composition-text attributes in the collected poster data set are recorded and stored; and, taking the text sequence and background image as input and the rendered poster as target, the required training data are provided for the neural network in the intelligent planar design network framework. Training with self-supervised signals from semi-structured posters overcomes the difficulty of obtaining annotations.
Further, step (2) is implemented as follows: first, the two main parts that are key to planar poster design, the visual representation and the text representation, are aggregated, taking into account not only the characteristics of the image but also the semantics of the input text; the aggregated multimodal features are then used as the input of a decoder, which outputs a density map of the same size as the original image, where each element value corresponds to a score for the underlying pixel and represents the weight with which that pixel is selected to appear in the text region; an objective function is then designed for the training process, the overall objective being to minimize the sum of the layout prediction loss and the attribute recognition loss, and parameter learning of the model is performed by second-order gradient optimization.
Further, the expression of the objective function is as follows:
$$\mathcal{L} = \mathcal{L}_{layout} + \mathcal{L}_{attr}$$

$$\mathcal{L}_{layout} = -\sum_{i,j}\left[G_{i,j}\log\sigma(M_{i,j}) + (1 - G_{i,j})\log\left(1 - \sigma(M_{i,j})\right)\right]$$

$$\mathcal{L}_{attr} = -\sum_{a \in Attributes}\log \hat{p}_a(y_a)$$

wherein: $\mathcal{L}$ is the objective function, $\mathcal{L}_{layout}$ is the layout prediction loss function, $\mathcal{L}_{attr}$ is the attribute recognition loss function, $M_{i,j}$ is the element value of pixel $(i,j)$ in the density map, $G_{i,j}$ is a binary indicator of whether pixel $(i,j)$ is located in the text region, $\hat{p}_a(y_a)$ is the probability that attribute $a$ belongs to its suitable category $y_a$, $\sigma(\cdot)$ is the sigmoid function, and $Attributes$ denotes the set of attributes.
Furthermore, the intelligent planar design model adopts an encoder-decoder architecture overall. In the image module, a convolutional neural network extracts the image features of the poster; as the network deepens, the features are encoded in more channels, and the feature map encodes a wider receptive field. In the text module, a pre-trained language model extracts context-aware text features: distributed vectors are adopted that are expected to carry the semantic information of the input text; average pooling then yields a fixed-dimension distributed representation of the entire input sequence; a multilayer perceptron converts the text representation into a vector space similar to that of the image representation; and the text is finally represented as a single vector. The image features and text features undergo channel-level feature fusion and are decoded by the decoder to obtain the layout density inference map.
Further, step (4) adopts a search-based layout design: the position and size of the composition text are predicted by searching on the feature-fused layout density inference map, the score of each pixel being modeled to indicate suitable regions for the composition text, so that the layout design predicts the position and size of the composition-text region by solving an optimization problem on the density inference map. Starting from a local maximum of the layout density inference map, the rectangular region is gradually enlarged by heuristic search to determine the text-object region; assuming that the correct position of the composition-text input is approximately centered on the local maximum and that the score of a candidate region is almost convex in the distance from each edge to the local maximum, an approximate inference algorithm exploits the locality of the density map to determine the layout design.
Further, in step (5), after the poster layout design has been determined, the continuous attributes among the composition-text attributes (including text font, color, font size, etc.) are discretized into several categories and designed separately; the composition-text attribute design model then collects features from two sources: on the one hand, the hidden image features among the poster image-text features are collected as a global summary of the overall input; on the other hand, local features are collected from a weighted local view of the original image. Since the color of text information is generally constrained by the hue of the local text region, the composition-text attribute design model extracts local-view image features with a similar convolutional encoder, integrates the global and local features of the image, produces scores with a multilayer-perceptron classifier, and thereby jointly determines the category of each composition-text attribute.
The invention uses a unified joint-training framework to prevent differences in data distribution during training, reduces the error propagation found in pipeline models, and exploits the advantages of end-to-end training; meanwhile, the invention does not require manually defined aesthetic rules for planar design but learns them from data, and it does not depend on image saliency-map detection, so it generalizes better to a variety of planar design tasks.
Compared with the prior art, the invention has the following characteristics and beneficial technical effects:
1. The invention is the first to provide an end-to-end framework for intelligent planar design, avoiding both the weakness of error propagation produced by multi-step pipelines and the difficulty of maintaining aesthetic constraints.
2. The invention designs an end-to-end network that jointly learns layout design and attribute determination and trains the network with self-supervised signals extracted from semi-structured posters, overcoming the difficulty of obtaining labels.
3. Experimental results on crawled data demonstrate the effectiveness of the framework of the invention; extensive experiments and analysis of the results show that the end-to-end intelligent planar design method based on deep learning has outstanding advantages over previous saliency-based methods.
Drawings
Fig. 1 is a schematic overall flow chart of the intelligent planar design method of the present invention.
FIG. 2 is a schematic diagram of the self-supervision signals in the planar design data.
FIG. 3 is a schematic diagram of an image and text feature fusion model.
FIG. 4 is a schematic diagram of the search-based layout design.
FIG. 5 is a schematic diagram of an attribute determination process based on fused features.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
In this embodiment, the collected poster data set is used: the original poster data set is first screened and partitioned, the two subtasks of layout design and attribute confirmation are learned jointly, and training and parameter tuning are carried out on the training set so that the model performance reaches an optimal state. Given input image and text information, the trained model is invoked: the intelligent planar design framework extracts the feature information of the image and the text, searches for a suitable layout of the design, confirms the composition-text attributes, and automatically generates a harmonious planar design. As shown in fig. 1, the specific process flow of this embodiment is as follows:
(1) Self-supervised planar design data collection and preprocessing.
Semi-structured poster data are collected from publicly available web pages and screened; the poster background images, basic composition units, title text sequences, relative coordinates of the text boxes, text font information and other corresponding design-element attributes in the collected poster data set are recorded and used as annotation information and self-supervision signals for the training data. Taking the text sequence and background image as input and the rendered poster as target, the required training signal is provided for the neural network in the intelligent poster design framework; training with self-supervised signals from semi-structured posters overcomes the difficulty of obtaining annotations. The data used in this embodiment are poster data, and the underlying data mainly consist of: poster background picture, basic composition unit, caption content and text sequence, character font, font color, font size, font position, etc., as shown in fig. 2.
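For concreteness, one plausible schema for a single self-supervised training record of this kind is sketched below in Python; the field names and types are editorial assumptions rather than part of the original disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PosterSample:
    """Hypothetical layout of one semi-structured poster record."""
    background_path: str   # poster background image file
    title_text: str        # caption/title text sequence
    text_box: Tuple[float, float, float, float]  # relative coords (x1, y1, x2, y2) in [0, 1]
    font_family: str       # character font
    font_color: str        # e.g. "#ffffff"
    font_size: int         # later discretized into categories
```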
(2) Joint training of the intelligent planar design framework.
The intelligent planar design framework uses a unified neural network for layout design and attribute confirmation, so the two subtasks enjoy the benefit of joint training. The two main parts that are key to planar poster design, the visual representation and the text representation, are aggregated as follows, taking into account not only the characteristics of the image but also the semantics of the input text:

$$F = \mathrm{Concat}(H_V,\ \mathrm{REP}(U),\ \mathrm{REP}(L))$$

wherein $H_V \in \mathbb{R}^{C_V \times H \times W}$ denotes the hidden image feature map extracted by the encoder, with $C_V$, $H$ and $W$ the number of channels, height and width of the feature map, respectively; the text representation $U$ is replicated along the height and width dimensions to align with the visual representation (the replication operation is denoted REP); and a scalar feature is additionally added: the length $L$ of the input token sequence $T = (T_1, T_2, \ldots, T_n)$.
The aggregated multimodal feature $F$ is used as input to the decoder, which outputs a density map of the same size as the original image:

$$M = \mathrm{Decoder}(F), \qquad M \in \mathbb{R}^{H \times W}$$

Each element $M_{i,j}$ corresponds to pixel $I_{i,j}$ and represents the weight with which that pixel is selected to appear in the text region.
The layout-design objective $\mathcal{L}_{layout}$ and the composition-text attribute-confirmation objective $\mathcal{L}_{attr}$ are respectively given by:

$$\mathcal{L} = \mathcal{L}_{layout} + \mathcal{L}_{attr}$$

$$\mathcal{L}_{layout} = -\sum_{i,j}\left[G_{i,j}\log\sigma(M_{i,j}) + (1 - G_{i,j})\log\left(1 - \sigma(M_{i,j})\right)\right]$$

$$\mathcal{L}_{attr} = -\sum_{a \in Attributes}\log \hat{p}_a(y_a)$$

wherein $G_{i,j}$ is a binary indicator of whether pixel $(i,j)$ lies in a text region, and $\hat{p}_a(y_a)$ is the probability that attribute $a$ belongs to its suitable category $y_a$. The overall goal is to minimize the sum of the layout design loss and the attribute confirmation loss; parameter learning of the model is performed by second-order gradient optimization, and joint training helps the intelligent planar design network extract better features from the multimodal (visual and textual) views of the data.
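A minimal sketch of this joint objective is given below, assuming a PyTorch implementation with pixel-wise binary cross-entropy for the layout term and one cross-entropy term per discretized attribute; the reduction and weighting choices are assumptions, since the description specifies only that the summed loss is minimized (the second-order gradient optimization is omitted here).

```python
import torch
import torch.nn.functional as nnf

def joint_loss(density_map, text_mask, attr_logits, attr_labels):
    """Sketch of L = L_layout + L_attr (assumption: unweighted sum).

    density_map: (H, W) raw scores M from the decoder
    text_mask:   (H, W) float binary indicator G (1 inside the text box)
    attr_logits: dict mapping attribute name -> (num_classes,) logits
    attr_labels: dict mapping attribute name -> ground-truth class index
    """
    # Pixel-wise binary cross-entropy between sigma(M) and G
    layout_loss = nnf.binary_cross_entropy_with_logits(density_map, text_mask)
    # Cross-entropy over each discretized composition-text attribute
    attr_loss = sum(
        nnf.cross_entropy(attr_logits[a].unsqueeze(0),
                          torch.tensor([attr_labels[a]]))
        for a in attr_logits
    )
    return layout_loss + attr_loss
```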
(3) Image-text feature fusion and layout density inference.
The intelligent planar design model is trained on the collected poster training set and adopts an encoder-decoder architecture overall, as shown in fig. 3. In the image module, a convolutional neural network extracts the image features of the poster; as the network goes deeper, it encodes the features in more channels and uses the feature map to encode a wider receptive field:

$$H_V = \mathrm{Encoder}(I)$$

In the text module, a pre-trained language model extracts context-aware text features:

$$U = \mathrm{MLP}\big(\mathrm{AVG}(\{E_1, E_2, \ldots, E_n\})\big)$$

wherein each token $T_i$ is embedded into a token representation $E_i$; these distributed vectors are expected to carry the semantic information of the input text. Average pooling (denoted AVG) then yields a fixed-dimension distributed representation of the entire input sequence, and a multilayer perceptron (denoted MLP) converts the text representation into a vector space similar to that of the image representation, so that the final text representation is a single vector $U$.

The image features and text features undergo channel-level feature fusion and are decoded by the decoder to obtain the layout density inference map $M$.
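The channel-level aggregation F = Concat(H_V, REP(U), REP(L)) can be sketched as follows with PyTorch tensors; the broadcasting details are a straightforward reading of the REP operation, not a disclosed implementation.

```python
import torch

def fuse_features(H_V: torch.Tensor, U: torch.Tensor, L: int) -> torch.Tensor:
    """Channel-level fusion of image features, text vector, and length scalar.

    H_V: (C_V, H, W) hidden image feature map from the CNN encoder
    U:   (C_T,)     pooled text vector from the language model + MLP
    L:   length of the input token sequence (scalar feature)
    """
    _, H, W = H_V.shape
    U_rep = U[:, None, None].expand(-1, H, W)     # replicate text over H and W
    L_rep = torch.full((1, H, W), float(L))       # replicate the length scalar
    return torch.cat([H_V, U_rep, L_rep], dim=0)  # (C_V + C_T + 1, H, W)
```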
(4) Search-based layout design.
The layout design module predicts the position and size of the composition text by searching on the feature-fused layout density map; the score of each pixel is modeled to indicate suitable regions for the composition text.
The layout design predicts the position and size of the composition-text region by solving an optimization problem on the density map. Based on the output of the decoder, $\sigma(M_{i,j})$ denotes the probability that pixel $I_{i,j}$ lies inside the bounding box of the given text sequence, where $\sigma$ is the sigmoid function:

$$\sigma(M_{i,j}) = \frac{1}{1 + e^{-M_{i,j}}}$$

Given the density map $M$ and the corresponding probability matrix $\sigma(M)$, the corner coordinates of the text box are determined: the lower-left corner $(x_1, y_1)$ and the upper-right corner $(x_2, y_2)$, with $x_1 < x_2$ and $y_1 < y_2$; the prediction task is thereby converted into a constrained optimization problem. Starting from a local maximum of the layout density map, the rectangle is gradually enlarged by heuristic search to determine the text-object region. Assuming that the correct position of the composition-text input is approximately centered on the local maximum and that the score of a candidate region is almost convex in the distance from each edge to the local maximum, an approximate inference algorithm exploits the locality of the density map to determine the layout design, as shown in fig. 4.
(5) Composition-text attribute determination based on fused features.
Through the data-driven joint learning of step (2), the fused feature F contains key information not only for the layout design but also for the composition-text attributes. After the poster layout design has been determined, the continuous attributes among the composition-text attributes (including text font, color, font size, etc.) are discretized into several categories and designed separately. The composition-text attribute design model collects features mainly from two sources: on the one hand, the hidden image features among the poster image-text features, as a global summary of the overall input; on the other hand, local features collected from weighted local views of the original image. In particular, since the color of text information is generally constrained by the hue of the local text region, the model uses a similar convolutional encoder to extract local-view image features, as shown in fig. 5, merges the global and local features of the image, and produces scores with a multilayer-perceptron classifier, thereby jointly determining the category of each composition-text attribute.
The feature $F$ from the multimodal feature-extraction network describes a global view of the input, while local views of the image also have a critical effect on attribute determination; for example, the color of the text input is generally constrained by the hue of the local text region. The original image is therefore weighted as

$$I' = \sigma(M) \odot I$$

applying the probability density map as an attention weight on each pixel of the original image, where $\odot$ denotes the element-wise (position-aligned) product. A similar convolutional encoder extracts the local-view image features:

$$F_l = \mathrm{Encoder}(I')$$

The global and local features are concatenated, and an MLP classifier outputs logits, which are normalized into a probability distribution by the softmax function; $p_i$ represents the probability that the attribute belongs to the $i$-th class:

$$\mathrm{logit} = \mathrm{MLP}(F, F_l), \qquad p_i = \frac{\exp(\mathrm{logit}_i)}{\sum_k \exp(\mathrm{logit}_k)}$$
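A plausible sketch of this attribute-determination branch in PyTorch is given below; the layer shapes and sizes are illustrative assumptions, with only the overall structure (density-weighted local view, shared global feature, MLP classifier with softmax output) taken from the description.

```python
import torch
import torch.nn as nn

class AttributeHead(nn.Module):
    """Hypothetical attribute branch: sigma(M) re-weights the image pixel-wise,
    a small convolutional encoder extracts the local-view feature F_l, and an
    MLP over the concatenation [F, F_l] emits per-class logits."""

    def __init__(self, global_dim: int, local_dim: int, num_classes: int):
        super().__init__()
        self.local_encoder = nn.Sequential(
            nn.Conv2d(3, local_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool the local view down to a vector
        )
        self.classifier = nn.Sequential(
            nn.Linear(global_dim + local_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, F_global, image, density_map):
        # I' = sigma(M) (element-wise) I: density map as per-pixel attention
        weighted = torch.sigmoid(density_map).unsqueeze(1) * image
        F_l = self.local_encoder(weighted).flatten(1)  # (N, local_dim)
        logits = self.classifier(torch.cat([F_global, F_l], dim=1))
        return torch.softmax(logits, dim=1)            # p_i for each class
```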
the performance of an end-to-end intelligent plane design framework on various mainstream models is researched, and Test loss (lower is better), Jaccard similarity (higher is better) and ACC (higher is better) indexes are used for respectively evaluating the overall quality, the layout generation quality and the attribute design quality.
As the backbone structure becomes more capable, the end-to-end intelligent planar design framework obtains better design works in terms of overall quality, and the overall performance and the effectiveness of each component improve accordingly. As shown in Table 1, the performance of the end-to-end intelligent planar design network of the invention on the layout-prediction subtask is superior to that of the saliency-detection-based method, which obtains a lower Jaccard similarity at test time than all AuPoD-series networks except AuPoD-FCN32. The results also show that the end-to-end intelligent planar design framework can benefit from the inductive bias contributed by better saliency detection.
TABLE 1 (rendered as an image in the original publication; it reports Test loss, Jaccard similarity, and ACC for each backbone variant)
The foregoing description of the embodiments is provided to enable one of ordinary skill in the art to make and use the invention. It will be readily apparent to those skilled in the art that various modifications to these embodiments may be made, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on the disclosure of the present invention should fall within the protection scope of the present invention.

Claims (8)

1. An end-to-end intelligent planar design method based on deep learning, comprising the following steps:
(1) collecting semi-structured poster data, cleaning and screening them, and storing the composition-text attributes and the background image of each poster separately, so as to provide the required training data for the intelligent planar design model;
(2) jointly training the two subtasks of layout design and attribute confirmation in the intelligent planar design model, so that the model can extract features from the multimodal view of the poster data;
(3) fusing the image and text features with the image module and the text module of the model, and decoding to obtain a layout density inference map;
(4) determining the layout design from the layout density inference map using an approximate inference algorithm;
(5) integrating the global image features and the local image features, and determining the attribute category of the composition text from the classifier output.
2. The end-to-end intelligent planar design method of claim 1, wherein: step (1) is implemented as follows: semi-structured poster data are first collected from publicly available web pages and screened; the poster background images, title text sequences, relative coordinates of the text boxes, text font information and other corresponding composition-text attributes in the collected poster data set are recorded and stored; and, taking the text sequence and background image as input and the rendered poster as target, the required training data are provided for the neural network in the intelligent planar design network framework.
3. The end-to-end intelligent planar design method of claim 1, wherein: step (2) is implemented as follows: first, the two main parts that are key to planar poster design, the visual representation and the text representation, are aggregated, taking into account not only the characteristics of the image but also the semantics of the input text; the aggregated multimodal features are then used as the input of a decoder, which outputs a density map of the same size as the original image, where each element value corresponds to a score for the underlying pixel and represents the weight with which that pixel is selected to appear in the text region; an objective function is then designed for the training process, the overall objective being to minimize the sum of the layout prediction loss and the attribute recognition loss, and parameter learning of the model is performed by second-order gradient optimization.
4. The end-to-end intelligent planar design method of claim 3, wherein: the expression of the objective function is as follows:
$$\mathcal{L} = \mathcal{L}_{layout} + \mathcal{L}_{attr}$$

$$\mathcal{L}_{layout} = -\sum_{i,j}\left[G_{i,j}\log\sigma(M_{i,j}) + (1 - G_{i,j})\log\left(1 - \sigma(M_{i,j})\right)\right]$$

$$\mathcal{L}_{attr} = -\sum_{a \in Attributes}\log \hat{p}_a(y_a)$$

wherein: $\mathcal{L}$ is the objective function, $\mathcal{L}_{layout}$ is the layout prediction loss function, $\mathcal{L}_{attr}$ is the attribute recognition loss function, $M_{i,j}$ is the element value of pixel $(i,j)$ in the density map, $G_{i,j}$ is a binary indicator of whether pixel $(i,j)$ is located in the text region, $\hat{p}_a(y_a)$ is the probability that attribute $a$ belongs to its suitable category $y_a$, $\sigma(\cdot)$ is the sigmoid function, and $Attributes$ denotes the set of attributes.
5. The end-to-end intelligent planar design method of claim 1, wherein: the intelligent planar design model adopts an encoder-decoder architecture overall; in the image module, a convolutional neural network extracts the image features of the poster, and as the network deepens the features are encoded in more channels and the feature map encodes a wider receptive field; in the text module, a pre-trained language model extracts context-aware text features, that is, distributed vectors expected to carry the semantic information of the input text are adopted, average pooling then yields a fixed-dimension distributed representation of the entire input sequence, a multilayer perceptron converts the text representation into a vector space similar to that of the image representation, and the text is finally represented as a single vector; and the image features and text features undergo channel-level feature fusion and are decoded by the decoder to obtain the layout density inference map.
6. The end-to-end intelligent planar design method of claim 1, wherein: step (4) adopts a search-based layout design, that is, the position and size of the composition text are predicted by searching on the feature-fused layout density inference map, the score of each pixel being modeled to indicate suitable regions for the composition text, so that the layout design predicts the position and size of the composition-text region by solving an optimization problem on the density inference map; starting from a local maximum of the layout density inference map, the rectangular region is gradually enlarged by heuristic search to determine the text-object region; and, assuming that the correct position of the composition-text input is approximately centered on the local maximum and that the score of a candidate region is almost convex in the distance from each edge to the local maximum, an approximate inference algorithm exploits the locality of the density map to determine the layout design.
7. The end-to-end intelligent planar design method of claim 1, wherein: after the poster layout design has been determined, the continuous attributes among the composition-text attributes are discretized into several categories and designed separately, and the composition-text attribute design model collects features from two sources: on the one hand, the hidden image features among the poster image-text features are collected as a global summary of the overall input; on the other hand, local features are collected from a weighted local view of the original image; and, since the color of text information is generally constrained by the hue of the local text region, the composition-text attribute design model extracts local-view image features with a similar convolutional encoder, integrates the global and local features of the image, produces scores with a multilayer-perceptron classifier, and thereby jointly determines the category of each composition-text attribute.
8. The end-to-end intelligent planar design method of claim 1, wherein: the collected poster data set is used; the original poster data set is first screened and partitioned, the two subtasks of layout design and attribute confirmation are learned jointly, and training and parameter tuning are carried out on the training set so that the model performance reaches an optimal state; and, according to the input image and text information, the trained model is invoked, whereby the intelligent planar design framework extracts the feature information of the image and the text, searches for a suitable layout of the design, confirms the composition-text attributes, and automatically generates a harmonious planar design.
CN202210218256.4A 2022-03-03 2022-03-03 End-to-end intelligent plane design method based on deep learning Pending CN114564768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210218256.4A CN114564768A (en) 2022-03-03 2022-03-03 End-to-end intelligent plane design method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210218256.4A CN114564768A (en) 2022-03-03 2022-03-03 End-to-end intelligent plane design method based on deep learning

Publications (1)

Publication Number Publication Date
CN114564768A true CN114564768A (en) 2022-05-31

Family

ID=81717527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210218256.4A Pending CN114564768A (en) 2022-03-03 2022-03-03 End-to-end intelligent plane design method based on deep learning

Country Status (1)

Country Link
CN (1) CN114564768A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998482A (en) * 2022-06-13 2022-09-02 厦门大学 Intelligent generation method of characters and artistic patterns
CN116776827A (en) * 2023-08-23 2023-09-19 山东捷瑞数字科技股份有限公司 Artificial intelligent typesetting method, device, equipment and readable storage medium
CN116776827B (en) * 2023-08-23 2023-11-21 山东捷瑞数字科技股份有限公司 Artificial intelligent typesetting method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
WO2021147726A1 (en) Information extraction method and apparatus, electronic device and storage medium
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN110750959A (en) Text information processing method, model training method and related device
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN114743020B (en) Food identification method combining label semantic embedding and attention fusion
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
CN113535917A (en) Intelligent question-answering method and system based on travel knowledge map
CN113779211A (en) Intelligent question-answer reasoning method and system based on natural language entity relationship
CN114564768A (en) End-to-end intelligent plane design method based on deep learning
CN110502655B (en) Method for generating image natural description sentences embedded with scene character information
CN114996488A (en) Skynet big data decision-level fusion method
CN114239585A (en) Biomedical nested named entity recognition method
CN116860978B (en) Primary school Chinese personalized learning system based on knowledge graph and large model
CN116610778A (en) Bidirectional image-text matching method based on cross-modal global and local attention mechanism
CN112036178A (en) Distribution network entity related semantic search method
CN115311465A (en) Image description method based on double attention models
CN114880307A (en) Structured modeling method for knowledge in open education field
Nam et al. A survey on multimodal bidirectional machine learning translation of image and natural language processing
CN117497178A (en) Knowledge-graph-based common disease auxiliary decision-making method
CN115017884A (en) Text parallel sentence pair extraction method based on image-text multi-mode gating enhancement
CN114048314A (en) Natural language steganalysis method
CN117648429A (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
Ma et al. Ontology-based BERT model for automated information extraction from geological hazard reports
CN116662591A (en) Robust visual question-answering model training method based on contrast learning
CN116843175A (en) Contract term risk checking method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination