CN113793403A - Text image synthesis method for simulating drawing process - Google Patents


Info

Publication number
CN113793403A
CN113793403A (application CN202110953553.9A; granted publication CN113793403B)
Authority
CN
China
Prior art keywords
information, text, image, synthesizing, foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110953553.9A
Other languages
Chinese (zh)
Other versions
CN113793403B (en)
Inventor
俞文心
张志强
戚原瑞
吴筱迪
刘露
龚俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202110953553.9A
Publication of CN113793403A
Application granted
Publication of CN113793403B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a text-to-image synthesis method that simulates the drawing process. The method comprises: inputting descriptive text information and synthesizing corresponding contour information from the text information; synthesizing the foreground information of the image from the synthesized contour information combined with the input text information; synthesizing matching background information from the synthesized foreground information combined with the originally input text information; and synthesizing the final image result from the obtained foreground information, background information, and text information. The method can be used in any text-based image synthesis algorithm to further improve the quality of image synthesis; it greatly improves the practicability of image synthesis technology and promotes both the development of text-to-image synthesis and the adoption of image synthesis software.

Description

Text image synthesis method for simulating drawing process
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a text image synthesis method for simulating a drawing process.
Background
Synthesizing images from text information has attracted widespread attention in recent years within the image synthesis branch of computer vision. Text information can describe the essential content of the image to be synthesized, and entering text matches people's everyday input habits. Text also offers better flexibility than image synthesis driven by simple category labels. Image synthesis based on text information therefore pushes image synthesis software toward friendlier interfaces: a user can input text according to personal requirements and obtain an image result that matches their subjective intent. This strongly promotes both the practicability of image synthesis technology and the adoption of image synthesis software.
Among existing image synthesis technologies, the best synthesis quality is achieved by deep learning methods. Most existing methods, however, combine good synthesis quality with poor practicability. Some methods synthesize images from image category labels, which improves utility to some extent, but a category label carries little information, so overall utility remains limited. The more practical existing approach is to synthesize images from text information. Current text-to-image technology, however, synthesizes the foreground and background of the image directly in one pass and lacks a reasonable, staged synthesis procedure. As a result, the synthesis quality of current text-to-image methods is only moderate and, on the whole, leaves considerable room for improvement.
Disclosure of Invention
To solve these problems, the invention provides a text-to-image synthesis method that simulates the drawing process. It can be used in any text-based image synthesis algorithm to further improve image synthesis quality, thereby greatly improving the practicability of image synthesis technology, promoting the development of text-to-image synthesis, and encouraging the adoption of image synthesis software.
To achieve this purpose, the invention adopts the following technical scheme: a text-to-image synthesis method simulating a painting process, comprising the steps of:
S10, inputting the descriptive text information and synthesizing corresponding contour information from the text information;
S20, synthesizing the foreground information of the image from the synthesized contour information combined with the input text information;
S30, synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information;
S40, synthesizing the final image result from the obtained foreground information, background information, and text information.
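The four steps above can be sketched as one pipeline in which each stage is a pluggable callable. All function names and the trivial stand-in stages below are illustrative assumptions, not terminology from the patent:

```python
# Hypothetical orchestration of steps S10-S40. Each stage is passed in
# as a callable so that any concrete network can realize it.
def synthesize_image(text, encode_text, draw_contour,
                     draw_foreground, predict_background, compose):
    s = encode_text(text)             # encode text into a text vector
    i_c = draw_contour(s)             # S10: contour from the text vector
    i_f = draw_foreground(i_c, s)     # S20: foreground from contour + text
    i_b = predict_background(i_f, s)  # S30: background inferred from foreground + text
    return compose(i_f, i_b, s)       # S40: final image from foreground, background, text

# Trivial stand-ins only show the data flow; real stages would be neural networks.
result = synthesize_image(
    "a red bird on a branch",
    encode_text=lambda t: len(t),
    draw_contour=lambda s: ("contour", s),
    draw_foreground=lambda c, s: ("foreground", c),
    predict_background=lambda f, s: ("background", f),
    compose=lambda f, b, s: ("image", f, b),
)
print(result[0])  # -> image
```

Because each later stage consumes only the outputs of earlier stages plus the text vector, the stages can be trained and swapped independently.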
Further, in step S10, the input text information is encoded into a corresponding text vector by a text encoder, and the text vector is then decoded into corresponding contour information by successive deconvolution operations.
Further, in step S20, when synthesizing the foreground information of the image from the synthesized contour information combined with the input text information, the contour information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the contour information and the text vector are synthesized into the foreground information of the image by a deconvolution operation.
Further, in step S30, when synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information, the foreground information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information and the text vector are inferred into matching background information by a predictive synthesis operation.
Further, in step S40, when synthesizing the final image result from the obtained foreground information, background information, and text information, the background information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information, the feature vector of the background information, and the text vector are synthesized into the final image result by a deconvolution operation.
The beneficial effects of this technical scheme are as follows:
The whole process of the invention synthesizes sequentially from the contour to the foreground to the background and finally to the whole image. This simple-to-complex synthesis process refines the task of each stage so that each stage can focus on its own task, and the performance of each stage can therefore be better. In this way, the final synthesis of high-quality image results is well guaranteed. The text information participates in the whole process to ensure that the finally synthesized image matches the semantics of the input text.
The method first infers and synthesizes simple contour information from the text, then synthesizes a corresponding foreground result from the synthesized contour information, and finally synthesizes corresponding background information and the final image result from the foreground content. The synthesis proceeds from easy to difficult, from a simple contour to the foreground to the final image, much like the process of drawing a picture, so that a more real and credible image is synthesized step by step. This more reasonable synthesis process promotes the development of text-to-image synthesis and the adoption of image synthesis software.
The method can be used in any text-based image synthesis algorithm to further improve the quality of image synthesis; it greatly improves the practicability of image synthesis technology and promotes both the development of image synthesis technology and the spread of related image synthesis software.
Drawings
FIG. 1 is a flow chart of the text-to-image synthesis method simulating a painting process according to the present invention;
FIG. 2 is a schematic diagram of the text-to-image synthesis method simulating a drawing process according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention more apparent, the invention is further described with reference to the accompanying drawings.
In this embodiment, referring to fig. 1 and fig. 2, the present invention provides a text-to-image synthesis method simulating a drawing process, comprising the steps of:
S10, inputting the descriptive text information and synthesizing corresponding contour information from the text information;
S20, synthesizing the foreground information of the image from the synthesized contour information combined with the input text information;
S30, synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information;
S40, synthesizing the final image result from the obtained foreground information, background information, and text information.
As an optimization of the above embodiment, in step S10, the input text information is encoded into a corresponding text vector by a text encoder, and the text vector is then decoded into corresponding contour information by successive deconvolution operations.
The specific process formula is as follows:
Text vector: s = φ(T);
Contour information: I_c = deconvolution(s);
where T represents the input text information, φ represents the text encoder, and deconvolution denotes a deconvolution operation.
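A minimal sketch of this stage, assuming PyTorch and purely illustrative layer sizes (none of these dimensions come from the patent): the text vector s is projected to a small spatial map and upsampled to a contour map I_c by successive transposed convolutions.

```python
import torch
import torch.nn as nn

class ContourGenerator(nn.Module):
    """Sketch of S10: I_c = deconvolution(s). Sizes are illustrative."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.fc = nn.Linear(text_dim, 64 * 4 * 4)  # text vector -> 4x4 feature map
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 4 -> 8
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 16 -> 32
            nn.Sigmoid(),  # one-channel contour map in [0, 1]
        )

    def forward(self, s):
        x = self.fc(s).view(-1, 64, 4, 4)
        return self.deconv(x)  # I_c: (B, 1, 32, 32)

s = torch.randn(2, 128)       # a batch of two text vectors
i_c = ContourGenerator()(s)
print(i_c.shape)              # torch.Size([2, 1, 32, 32])
```

Each transposed convolution doubles the spatial resolution, which is what "successive deconvolution operations" suggests; a real implementation would likely use more stages and larger output resolution.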
As an optimization of the above embodiment, in step S20, when synthesizing the foreground information of the image from the synthesized contour information combined with the input text information, the contour information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the contour information and the text vector are synthesized into the foreground information of the image by a deconvolution operation.
The specific process formula is as follows:
Feature vector of the contour information: fea_c = CNN(I_c);
Foreground information: I_f = deconvolution(fea_c, s);
where CNN is a convolutional neural network and deconvolution denotes a deconvolution operation.
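The two formulas can be sketched as follows, again assuming PyTorch and illustrative sizes: a small convolutional stack plays the role of CNN, and the text vector is broadcast spatially and concatenated with fea_c before the deconvolution (one common way to combine a vector with a feature map; the patent does not specify the mechanism).

```python
import torch
import torch.nn as nn

class ForegroundGenerator(nn.Module):
    """Sketch of S20: fea_c = CNN(I_c); I_f = deconvolution(fea_c, s)."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(  # fea_c = CNN(I_c)
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
        )
        self.deconv = nn.Sequential(  # I_f = deconvolution(fea_c, s)
            nn.ConvTranspose2d(32 + text_dim, 16, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),              # 16 -> 32
            nn.Tanh(),
        )

    def forward(self, i_c, s):
        fea_c = self.cnn(i_c)                             # (B, 32, 8, 8)
        s_map = s[:, :, None, None].expand(-1, -1, 8, 8)  # broadcast text vector spatially
        return self.deconv(torch.cat([fea_c, s_map], 1))  # I_f: (B, 3, 32, 32)

i_f = ForegroundGenerator()(torch.rand(2, 1, 32, 32), torch.randn(2, 128))
print(i_f.shape)  # torch.Size([2, 3, 32, 32])
```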
As an optimization of the above embodiment, in step S30, when synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information, the foreground information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information and the text vector are inferred into matching background information by a predictive synthesis operation.
The specific process formula is as follows:
Feature vector of the foreground information: fea_f = CNN(I_f);
Background information: I_b = prediction(fea_f, s);
where CNN is a convolutional neural network and prediction denotes the predictive synthesis operation.
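The patent does not specify how the prediction operation is realized. As one assumption, it can be sketched as an encoder-decoder in which a decoder infers a background image from the foreground features and the text vector (all sizes illustrative):

```python
import torch
import torch.nn as nn

class BackgroundPredictor(nn.Module):
    """Sketch of S30: fea_f = CNN(I_f); I_b = prediction(fea_f, s).
    'prediction' is realized here, as an assumption, by a decoder."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(  # fea_f = CNN(I_f)
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
        )
        self.predict = nn.Sequential(  # I_b = prediction(fea_f, s)
            nn.ConvTranspose2d(32 + text_dim, 16, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),              # 16 -> 32
            nn.Tanh(),
        )

    def forward(self, i_f, s):
        fea_f = self.cnn(i_f)                              # (B, 32, 8, 8)
        s_map = s[:, :, None, None].expand(-1, -1, 8, 8)   # broadcast text vector spatially
        return self.predict(torch.cat([fea_f, s_map], 1))  # I_b: (B, 3, 32, 32)

i_b = BackgroundPredictor()(torch.rand(2, 3, 32, 32), torch.randn(2, 128))
print(i_b.shape)  # torch.Size([2, 3, 32, 32])
```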
As an optimization of the above embodiment, in step S40, when synthesizing the final image result from the obtained foreground information, background information, and text information, the background information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information, the feature vector of the background information, and the text vector are synthesized into the final image result by a deconvolution operation.
The specific process formula is as follows:
Feature vector of the background information: fea_b = CNN(I_b);
Final image result: I_g = deconvolution(fea_f, fea_b, s);
where CNN is a convolutional neural network and deconvolution denotes a deconvolution operation.
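A sketch of this final stage, under the same illustrative assumptions as before; fea_f is assumed to be a 32-channel 8x8 feature map supplied by the foreground stage, and fea_b is computed here from I_b before the joint deconvolution:

```python
import torch
import torch.nn as nn

class FinalComposer(nn.Module):
    """Sketch of S40: fea_b = CNN(I_b); I_g = deconvolution(fea_f, fea_b, s)."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(  # fea_b = CNN(I_b)
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
        )
        self.deconv = nn.Sequential(  # I_g = deconvolution(fea_f, fea_b, s)
            nn.ConvTranspose2d(32 + 32 + text_dim, 32, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),                   # 16 -> 32
            nn.Tanh(),
        )

    def forward(self, fea_f, i_b, s):
        fea_b = self.cnn(i_b)                                    # (B, 32, 8, 8)
        s_map = s[:, :, None, None].expand(-1, -1, 8, 8)         # broadcast text vector
        return self.deconv(torch.cat([fea_f, fea_b, s_map], 1))  # I_g: (B, 3, 32, 32)

fea_f = torch.randn(2, 32, 8, 8)   # assumed foreground features from the previous stage
i_g = FinalComposer()(fea_f, torch.rand(2, 3, 32, 32), torch.randn(2, 128))
print(i_g.shape)  # torch.Size([2, 3, 32, 32])
```

Concatenating all three sources before the final deconvolution lets the composer reconcile foreground, background, and text semantics in one decoding pass, matching the formula's three arguments.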
Specific examples may employ:
1. Text-to-image system
A web interface similar to Baidu Translate is provided, in which text information can be entered manually; clicking a synthesis button then generates the corresponding image result, yielding an image that matches the user's subjective intent.
2. Text-to-image software
The software comprises two parts: synthesis of the image result and display of the intermediate process results.
Text-to-image software built with the present invention allows a user to enter text information, after which the software automatically synthesizes the corresponding image. The software can also display the staged results, including the contour information, foreground content, and background content generated during synthesis. The software can be used in computer-aided structural design.
The foregoing shows and describes the general principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which are presented in the specification and drawings only to illustrate its principle; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A text-to-image synthesis method simulating a painting process, comprising the steps of:
S10, inputting the descriptive text information and synthesizing corresponding contour information from the text information;
S20, synthesizing the foreground information of the image from the synthesized contour information combined with the input text information;
S30, synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information;
S40, synthesizing the final image result from the obtained foreground information, background information, and text information.
2. A text-to-image synthesis method simulating a painting process according to claim 1, wherein in step S10, the input text information is encoded into a corresponding text vector by a text encoder, and the text vector is then decoded into corresponding contour information by successive deconvolution operations.
3. A text-to-image synthesis method simulating a painting process according to claim 2, wherein in step S20, when synthesizing the foreground information of the image from the synthesized contour information combined with the input text information, the contour information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the contour information and the text vector are synthesized into the foreground information of the image by a deconvolution operation.
4. A text-to-image synthesis method simulating a painting process according to claim 3, wherein in step S30, when synthesizing corresponding background information from the synthesized foreground information combined with the originally input text information, the foreground information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information and the text vector are inferred into matching background information by a predictive synthesis operation.
5. A text-to-image synthesis method simulating a painting process according to claim 2, wherein in step S40, when synthesizing the final image result from the obtained foreground information, background information, and text information, the background information is encoded into a corresponding feature vector by a convolutional neural network, and the feature vector of the foreground information, the feature vector of the background information, and the text vector are synthesized into the final image result by a deconvolution operation.
CN202110953553.9A 2021-08-19 2021-08-19 Text image synthesizing method for simulating painting process Active CN113793403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110953553.9A CN113793403B (en) 2021-08-19 2021-08-19 Text image synthesizing method for simulating painting process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110953553.9A CN113793403B (en) 2021-08-19 2021-08-19 Text image synthesizing method for simulating painting process

Publications (2)

Publication Number Publication Date
CN113793403A true CN113793403A (en) 2021-12-14
CN113793403B CN113793403B (en) 2023-09-22

Family

ID=79182069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110953553.9A Active CN113793403B (en) 2021-08-19 2021-08-19 Text image synthesizing method for simulating painting process

Country Status (1)

Country Link
CN (1) CN113793403B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110157221A1 (en) * 2009-12-29 2011-06-30 Ptucha Raymond W Camera and display system interactivity
WO2011112522A2 (en) * 2010-03-10 2011-09-15 Microsoft Corporation Text enhancement of a textual image undergoing optical character recognition
CN102724554A (en) * 2012-07-02 2012-10-10 西南科技大学 Scene-segmentation-based semantic watermark embedding method for video resource
CN105184074A (en) * 2015-09-01 2015-12-23 哈尔滨工程大学 Multi-modal medical image data model based medical data extraction and parallel loading method
CN107305696A (en) * 2016-04-22 2017-10-31 阿里巴巴集团控股有限公司 A kind of image generating method and device
CN107895393A (en) * 2017-10-24 2018-04-10 天津大学 A kind of story image sequence generation method of comprehensive word and shape
CN111507328A (en) * 2020-04-13 2020-08-07 北京爱咔咔信息技术有限公司 Text recognition and model training method, system, equipment and readable storage medium
US20200285855A1 (en) * 2017-06-05 2020-09-10 Umajin Inc. Hub and spoke classification system
CN112734881A (en) * 2020-12-01 2021-04-30 北京交通大学 Text synthesis image method and system based on significance scene graph analysis


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Zhiqiang et al.: "Text to Image Synthesis Using Two-Stage Generation and Two-Stage Discrimination", International Conference on Knowledge Science, Engineering and Management, vol. 11776, pages 110-114 *
ZHANG Zhiqiang: "Research on Text-to-Image Conversion Algorithms Based on Deep Learning" (in Chinese), China Master's Theses Full-text Database (electronic journal), pages 138-423 *

Also Published As

Publication number Publication date
CN113793403B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN110880315A (en) Personalized voice and video generation system based on phoneme posterior probability
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
Saunders et al. Signing at scale: Learning to co-articulate signs for large-scale photo-realistic sign language production
CN110162766B (en) Word vector updating method and device
Ren et al. Two-stage sketch colorization with color parsing
CN112002301A (en) Text-based automatic video generation method
Qiao et al. Efficient style-corpus constrained learning for photorealistic style transfer
CN115294427A (en) Stylized image description generation method based on transfer learning
CN112819692A (en) Real-time arbitrary style migration method based on double attention modules
Yi et al. Quality metric guided portrait line drawing generation from unpaired training data
Zhang et al. A survey on multimodal-guided visual content synthesis
Lv et al. Generating chinese classical landscape paintings based on cycle-consistent adversarial networks
Wang et al. Towards harmonized regional style transfer and manipulation for facial images
Rao et al. UMFA: a photorealistic style transfer method based on U-Net and multi-layer feature aggregation
CN113793403A (en) Text image synthesis method for simulating drawing process
Liu et al. Bi-lstm sequence modeling for on-the-fly fine-grained sketch-based image retrieval
WO2023154192A1 (en) Video synthesis via multimodal conditioning
Yu et al. Sketch beautification: Learning part beautification and structure refinement for sketches of man-made objects
Bai et al. Itstyler: Image-optimized text-based style transfer
Ma et al. Data-Driven Computer Choreography Based on Kinect and 3D Technology
CN112435319A (en) Two-dimensional animation generating system based on computer processing
Tan et al. Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
CN113793404A (en) Artificially controllable image synthesis method based on text and outline
Song et al. Virtual Human Talking-Head Generation
Rahul et al. Morphology & word sense disambiguation embedded multimodal neural machine translation system between Sanskrit and Malayalam

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant