CN116597048A - Image file generation method, device, equipment and program product - Google Patents


Publication number: CN116597048A
Authority: CN (China)
Prior art keywords: training; image; generation model; image generation; picture
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202310449587.3A
Other languages: Chinese (zh)
Inventors: 郭冬雨, 陈斌, 周旭, 张亚中, 陈起进
Current Assignee: Alibaba China Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310449587.3A priority Critical patent/CN116597048A/en
Publication of CN116597048A publication Critical patent/CN116597048A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/776 Validation; Performance evaluation
    • G06V 10/803 Fusion of input or preprocessed data
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • Y02T 10/40 Engine management systems

Abstract

The application provides a method, an apparatus, a device, and a program product for generating an image file, relating to the technical field of AI images. The method of generating an image file comprises: optimizing an image generation model according to preset conditions; acquiring a training picture and a training text, wherein the training text comprises image subject information; fine-tuning the image generation model according to the training picture and the training text; and generating an image file through the fine-tuned image generation model according to input information. According to the embodiments of the application, the generalization ability of the image generation model is preserved during training and image file generation.

Description

Image file generation method, device, equipment and program product
Technical Field
The present application relates to the technical field of AI images, and in particular, to a method, an apparatus, a device, and a program product for generating an image file.
Background
With the continuous development of AIGC (AI-Generated Content) technology, which produces material content through AI, demand for personalized customization of material images keeps growing.
The official materials provided by traditional platforms are mainly icons, which are of low quality and limited variety, while external high-quality materials cannot be redistributed or sublicensed because of copyright restrictions. Materials created with current AIGC technology often suffer from overly generalized descriptive text, which yields images with too many elements and an uncontrollable, unfocused subject, so they cannot meet practical requirements.
Disclosure of Invention
According to an aspect of the present application, there is provided a method of generating an image file, including: optimizing an image generation model according to preset conditions; acquiring a training picture and a training text, wherein the training text comprises image subject information; fine-tuning the image generation model according to the training picture and the training text; and generating an image file through the fine-tuned image generation model according to input information.
According to some embodiments, optimizing the image generation model according to preset conditions includes: encoding through the image generation model; generating a control vector according to the preset conditions; and decoding according to the control vector so as to control the output result of the image generation model.
According to some embodiments, fine-tuning the image generation model according to the training picture and the training text comprises: performing an initial fine-tuning of the image generation model according to the training picture and the training text; and performing a secondary fine-tuning of the initially fine-tuned image generation model according to the training picture.
According to some embodiments, performing the initial fine-tuning of the image generation model based on the training picture and the text information includes: setting a loss function of the image generation model; inputting the training picture, combined with the text information, into the image generation model to obtain a prior picture; and updating the loss function according to the training picture and the prior picture.
According to some embodiments, performing the secondary fine-tuning of the initially fine-tuned image generation model according to the training picture includes: acquiring a downsampled picture corresponding to the training picture; generating a downsampled-training picture combination from the downsampled picture and the training picture; and updating the image generation model through the downsampled-training picture combination.
According to some embodiments, generating an image file through the fine-tuned image generation model according to the input information comprises: inputting an image subject text meeting the preset conditions; and generating the image file through the fine-tuned image generation model according to the image subject text.
According to some embodiments, the image subject text includes keyword information that matches the image subject information of the training text.
According to an aspect of the present application, there is provided an image file generating apparatus, including: a data acquisition module configured to acquire a training picture, a training text, and an image subject text; a data processing module configured to set and optimize an image generation model according to preset conditions, perform the initial and secondary fine-tuning of the image generation model according to the training picture and the training text, and generate an image file through the fine-tuned image generation model according to the image subject text; and an output module configured to output the image file.
According to an aspect of the present application, there is provided an electronic apparatus including: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods as described above.
According to an aspect of the application, there is provided a computer program product comprising a computer program or instructions which, when executed by a processor, implements the method described above.
According to the embodiments of the application, fine-tuning the image generation model keeps the style of the material images generated by the model consistent, and an image file of suitable similarity that meets the requirements is generated even when the input description is not sufficiently concrete, thereby improving creation efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application.
Fig. 1 shows a flowchart of a method of generating an image file according to an exemplary embodiment of the present application.
FIG. 2 shows a flow chart of fine tuning of an image generation model according to an example embodiment of the application.
Fig. 3A shows a training picture according to an example embodiment of the application.
Fig. 3B shows a priori pictures according to an example embodiment of the application.
FIG. 4 shows a schematic diagram of fine tuning of an image generation model and generation of an image file according to an example embodiment of the application.
Fig. 5 shows a block diagram of an image file generating apparatus according to an exemplary embodiment of the present application.
Fig. 6 shows a block diagram of an electronic device according to an example embodiment of the application.
Detailed Description
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation entries for the user to select authorization or rejection.
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application can be practiced without one or more of the specific details, or with other methods, components, materials, devices, operations, etc. In these instances, well-known structures, methods, devices, implementations, materials, or operations are not shown or described in detail.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The application provides a method, an apparatus, a device, and a program product for generating an image file, which optimize an image generation model to improve image-creation efficiency and mine material based on a provided material topic, generating material images of suitable similarity while avoiding copyright-infringement risk.
A method, apparatus, device and program product for generating an image file according to an embodiment of the present application will be described in detail with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a method of generating an image file according to an exemplary embodiment of the present application.
As shown in fig. 1, in step S110, the image file generating apparatus optimizes an image generation model according to preset conditions.
For example, in step S110, the image file generating apparatus sets an image generation model in advance, and fuses control information into the image generation model according to preset conditions, thereby controlling the output result of the image generation model.
The image file generating means sets the image generation model in advance so that text or pictures input into the image generation model are encoded, and the picture obtained by decoding is output.
According to some embodiments, the image generation model may employ a Stable Diffusion model, which includes a UNet network.
The image file generating device generates a control vector of the image generating model according to preset conditions so as to limit the decoding result of the image generating model, and therefore control of pictures output by the image generating model is achieved.
According to some embodiments, the preset conditions include various control conditions introduced through a cross-attention mechanism (cross-attention), such as text, category, etc., including descriptions of image subjects.
According to some embodiments, the image file generating device generates a corresponding text control vector from the text control condition through the image generation model and fuses the text control vector into an intermediate layer of the UNet network of the image generation model, so as to ensure that the picture vector obtained after decoding is consistent in content with the text control vector.
According to some embodiments, the control of the decoding result of the image generation model through the control vector may be achieved by the following formula:
Attention(Q, K, V) = softmax(QKᵀ / √d) · V
where Q = W · τ_θ(y) is the text control vector, K and V are vectors obtained during decoding, d is the dimension of the key vectors, W is a network parameter matrix, and τ_θ(y) is the multi-modal information encoding function preset in the image generation model.
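As a concrete illustration, the scaled dot-product attention formula above can be sketched in a few lines of NumPy. The shapes and random inputs below are purely hypothetical and are not taken from the patent:

```python
import numpy as np

def cross_attention(Q, K, V):
    """Compute softmax(Q Kᵀ / √d) @ V, as in the formula above."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity logits
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V

# Hypothetical shapes: 4 text-control tokens attend over 16 latent positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # text control vectors
K = rng.normal(size=(16, 8))   # keys from the UNet intermediate layer
V = rng.normal(size=(16, 8))   # values from the UNet intermediate layer
out = cross_attention(Q, K, V)
print(out.shape)  # (4, 8): one fused vector per text-control token
```

Because the softmax rows sum to one, each output row is a convex combination of the value vectors, which is how the text control vector steers the decoded content.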
In step S120, the image file generating apparatus acquires a training picture and a training text.
For example, in step S120, the image file generating apparatus acquires a training picture and a corresponding training text for fine-tuning training of the image generation model.
The image file generating device obtains training pictures through a preset material database, and obtains training texts in a manual setting mode, wherein the training texts comprise image theme information, and the training pictures correspond to the image theme information.
According to some embodiments, the image file generating apparatus may obtain training pictures through a preset UED (User Experience Design) platform and obtain manually set training texts. The training pictures are associated with the training texts through the image subject information contained in the training texts.
In step S130, the image file generating apparatus fine-tunes the image generation model based on the training picture and the training text.
For example, in step S130, the image file generating apparatus trains the image generation model with the training picture as input data, combined with the training text, and fine-tunes the image generation model according to the training result.
The image file generating means sets a loss function of the image generation model.
According to some embodiments, the loss function of the image generation model includes a reconstruction loss (Reconstruction Loss), which drives the image generation model to reconstruct the input picture from the input text information. The loss function also includes a prior-preservation term (Class-Specific Prior Preservation Loss), which ensures that small-sample fine-tuning does not cause overfitting or semantic distortion in the image generation model.
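A minimal sketch of these two loss terms, assuming both take a mean-squared-error form combined with a weighting factor `lam`; the MSE form and the weighting are illustrative assumptions, as the patent does not specify them:

```python
import numpy as np

def reconstruction_loss(pred, target):
    # Drives the model to reconstruct the training picture from the text.
    return np.mean((pred - target) ** 2)

def prior_preservation_loss(pred_prior, prior):
    # Penalises drift from the class-prior pictures, guarding against
    # overfitting and semantic distortion under small-sample fine-tuning.
    return np.mean((pred_prior - prior) ** 2)

def total_loss(pred, target, pred_prior, prior, lam=1.0):
    # lam balances the two terms; 1.0 is an assumed default.
    return reconstruction_loss(pred, target) + lam * prior_preservation_loss(pred_prior, prior)

pic = np.zeros((8, 8))
print(total_loss(pic, pic, pic, pic))  # 0.0 when both terms match exactly
```

During the initial fine-tuning, `total_loss` would be minimized by gradient descent until it reaches the preset optimization target.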
The image file generating device firstly takes the training pictures as input data, takes the training texts as labels, trains the image generating model and obtains priori pictures.
According to some embodiments, the image file generating apparatus inputs a plurality of training pictures obtained from the UED platform into the image generating model, and takes the manually set training text as a tag, so that the image generating model can learn any description related to the image subject information contained in the training text.
According to some embodiments, the description format of the image subject information of the training text may be [example, type]. For example, if the image file generating means trains an image generation model for a personalized face, the image subject information of the training text can be described as [person name, person]. For another example, if it trains a personalized rabbit-style image generation model, the image subject information of the training text can be described as [rabbit_2023, animal].
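The bracketed [example, type] description can be parsed with a short helper; the exact delimiter handling below is an assumption drawn from the examples above, not a format the patent defines:

```python
def parse_subject(description):
    """Split a '[example, type]' description into its two fields."""
    example, kind = description.strip("[] ").split(",", 1)
    return example.strip(), kind.strip()

print(parse_subject("[rabbit_2023, animal]"))  # ('rabbit_2023', 'animal')
print(parse_subject("[person name, person]"))  # ('person name', 'person')
```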
The image file generating means provides the a priori pictures generated by training to the image generating model for initial fine tuning of the image generating model.
According to some embodiments, the image file generating device uses a training picture as input data, uses a training text as a label, and combines the prior picture to update the reconstruction loss function and the prior information retaining function in a gradient descent mode, so that the reconstruction loss function and the prior information retaining function reach a preset optimization target.
The image file generating device downsamples the training pictures and performs a secondary fine-tuning of the image generation model using the downsampled pictures.
According to some embodiments, the image file generating device downsamples a training picture to obtain a downsampled picture of lower resolution than the training picture, and combines the two into a downsampled-training picture combination.
According to some embodiments, the image file generating device may compute the gradient loss of the image generation model from the mean squared error between the picture the model outputs for the downsampled picture and the corresponding training picture, and then update the image generation model according to this gradient loss to complete the secondary fine-tuning.
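The secondary fine-tuning data can be sketched as follows. Average pooling as the downsampling method and a factor of 2 are illustrative assumptions; the patent does not fix either choice:

```python
import numpy as np

def downsample(picture, factor=2):
    # Average-pool a (H, W) picture into a lower-resolution copy.
    h, w = picture.shape
    return picture.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def secondary_pairs(training_pictures, factor=2):
    # Build the downsampled-training picture combinations for the secondary pass.
    return [(downsample(p, factor), p) for p in training_pictures]

def mse(model_output, training_picture):
    # Mean squared error from which the gradient loss is derived.
    return np.mean((model_output - training_picture) ** 2)

pics = [np.arange(16.0).reshape(4, 4)]
low, full = secondary_pairs(pics)[0]
print(low.shape, full.shape)  # (2, 2) (4, 4)
```

In the secondary pass, the model would take `low` as input, and `mse` between its output and `full` would drive the parameter update.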
In step S140, the image file generating means generates an image file through the fine-tuned image generation model based on the input information.
For example, in step S140, the image file generating apparatus generates a corresponding image file from the input information through the image generation model that has undergone the initial and secondary fine-tuning.
The image file generating device inputs an image subject text meeting the preset conditions into the image generation model that has undergone the initial and secondary fine-tuning, and generates an image file corresponding to the image subject text through the image generation model.
According to some embodiments, the image subject text input by the image file generating means contains keyword information matching the image subject information of the training text. For example, if the image subject information of the training text is "2023", the input image subject text should contain the keyword "2023".
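Keyword matching between the input image subject text and the training text might be checked like this; the split-on-underscore rule is an assumption drawn from the "rabbit_2023" examples, not a rule the patent specifies:

```python
def subject_keywords(training_text):
    # "rabbit_2023" -> {"rabbit", "2023"}
    return set(training_text.replace("_", " ").split())

def matches_subject(image_subject_text, training_text):
    # Every keyword from the training text must appear in the input prompt.
    prompt_words = set(image_subject_text.replace("_", " ").split())
    return subject_keywords(training_text) <= prompt_words

print(matches_subject("Chinese style rabbit_2023 animal", "rabbit_2023"))  # True
print(matches_subject("Chinese style dragon", "rabbit_2023"))              # False
```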
According to the embodiments of the application, the underlying model is optimized and the production efficiency of image files is improved; even when the input text description is insufficient, an image file of suitable similarity that meets the requirements can be generated from a small number of provided theme-style image materials.
FIG. 2 shows a flow chart of fine tuning of an image generation model according to an example embodiment of the application.
As shown in fig. 2, in step S210, the image file generating apparatus generates a priori pictures through the image generation model.
For example, in step S210, the image file generating apparatus generates a priori pictures through an image generation model in combination with training texts, with the training pictures as input data.
Before training the image generation model, the image file generation device first sets a loss function of the image generation model. Further, the image file generating device trains the image generation model with the training picture as input data and the training text as a tag, and obtains the prior picture.
According to some embodiments, assume the image file generating means trains a personalized rabbit-style image generation model. The image file generating apparatus obtains the 3 pictures shown in fig. 3A as training pictures through the UED platform, and the text information rabbit_2023 is manually entered as the training text.
The image file generating apparatus trains the image generation model with the 3 training pictures as input data and rabbit_2023 as the tag, and obtains the prior picture shown in fig. 3B.
As can be seen, the prior picture shown in fig. 3B looks realistic but still has flaws: its style differs from that of the training pictures shown in fig. 3A, and it does not convey their warm and happy atmosphere.
In step S220, the image file generating apparatus performs the initial fine-tuning of the image generation model in combination with the prior picture.
For example, in step S220, the image file generating apparatus performs the initial fine-tuning of the image generation model through the training picture and the training text, in combination with the prior picture.
The image file generating device provides the priori pictures for the image generating model, and continuously trains the image generating model according to the training pictures and the training texts so as to carry out initial fine adjustment on the image generating model.
According to some embodiments, the image file generating device uses the training picture as input data, uses the training text as a label, and updates the loss function of the image generating model in a gradient descent mode in combination with the prior picture, so that generalization of the image generating model is not affected in the fine tuning process.
In step S230, the image file generating apparatus performs the secondary fine-tuning of the image generation model that has undergone the initial fine-tuning.
For example, in step S230, the image file generating apparatus performs the secondary fine-tuning of the initially fine-tuned image generation model through the training picture and the downsampled picture corresponding to the training picture.
The image file generating device obtains a downsampled picture of the training picture, combines the training picture with the corresponding downsampled picture to generate a downsampled-training picture combination, and performs secondary fine tuning on the image generating model through the downsampled-training picture combination.
According to some embodiments, the image file generating device performs downsampling processing on the training picture to obtain a downsampled picture having a lower resolution than the training picture. The image file generating device updates the image generating model through the downsampling-training picture combination generated by combining the training picture and the corresponding downsampling picture so as to finish secondary fine adjustment of the image generating model.
According to the embodiments of the application, fine-tuning the model in this way preserves its generalization ability during training and keeps the style of pictures generated before and after fine-tuning consistent, so that an image file meeting the requirements can be generated.
FIG. 4 shows a schematic diagram of fine tuning of an image generation model and generation of an image file according to an example embodiment of the application.
As shown in fig. 4, the image file generating apparatus (not shown in fig. 4) sets the image generation model 100.
After optimizing the image generation model 100 according to the preset conditions, the image file generation device fine-tunes the image generation model 100.
The image file generating means trains the image generation model 100 with the training picture 220 as input data and the training text 210 as a tag and obtains a priori picture (not shown in fig. 4), wherein the training text 210 includes image subject information, and the training picture 220 corresponds to the image subject information included in the training text 210.
The image file generating means uses the training pictures 220 as input data, the training texts 210 as labels, and in combination with the prior pictures, performs an initial fine tuning of the image generation model 100 to update the loss function in the image generation model 100.
The image file generating means performs a downsampling process on the training picture 220 and performs a secondary fine tuning on the image generation model 100 through the training picture 220 and a corresponding downsampled picture (not shown in fig. 4) to continue updating the image generation model 100.
After the fine tuning of the image generation model 100 is completed, the image file generation device inputs the input information 310 of the model into the image generation model 100, wherein the input information 310 of the model includes keywords that match the image subject information of the training text 210.
For example, the content of the training text 210 is "rabbit_2023", and its image subject information is "rabbit" and "2023". The content of the input information 310 of the model is "Chinese style rabbit_2023 animal …", which contains the keywords "rabbit" and "2023" matching the image subject information of the training text 210.
The image file generating means performs an inference calculation on the image generating model 100 which has been trimmed, based on the input information 310 of the model, to obtain an image file 320 generated by the model.
Fig. 5 shows a block diagram of an image file generating apparatus according to an exemplary embodiment of the present application.
As shown in fig. 5, the image file generating apparatus 400 includes a data acquisition module 410, a data processing module 420, and an output module 430.
The data acquisition module 410 obtains a training picture through a preset material database (such as a UED platform), and receives a manually set training text, wherein the training text comprises image subject information, the training picture corresponds to the image subject information, and the training text and the training picture are used for training and fine tuning of an image generation model.
After the fine tuning of the image generation model is completed, the data acquisition module 410 also obtains manually entered information (e.g., image subject text) for generating a satisfactory image file from the image generation model.
The data processing module 420 presets an image generation model and generates a control vector for it according to preset conditions, so as to control the picture output by the image generation model and thereby optimize the model.
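The control-vector mechanism can be sketched as follows: preset conditions are mapped to a fixed-length vector that then modulates the decoder's output. Everything here (the condition names "style_weight" and "similarity", and the linear modulation) is an assumed illustration, not the patent's construction.

```python
import numpy as np

def build_control_vector(preset: dict, dim: int = 4) -> np.ndarray:
    # Map preset conditions to a fixed-length conditioning vector.
    # The keys below are hypothetical examples of preset conditions.
    vec = np.zeros(dim)
    vec[0] = preset.get("style_weight", 1.0)
    vec[1] = preset.get("similarity", 0.5)
    return vec

def controlled_decode(latent: np.ndarray, control: np.ndarray) -> np.ndarray:
    # Sketch of decoding under control: scale the latent by the style
    # weight and shift it by the similarity target, steering the output
    # picture toward the preset conditions.
    return latent * control[0] + control[1]
```

In a real model the control vector would condition the decoder network itself rather than linearly transform the latent, but the data flow is the same: preset conditions → control vector → controlled output.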
After the optimization of the image generation model is completed, the data processing module 420 trains the image generation model with the training picture as input data and the training text as a label, and obtains the prior pictures.
The data processing module 420 then uses the training picture as input data and the training text as a label and, in combination with the prior pictures, updates the loss function of the image generation model so as to perform the initial fine tuning.
The data processing module 420 downsamples the training picture to obtain a corresponding downsampled picture, and then performs the secondary fine tuning on the initially fine-tuned image generation model using downsampling-training picture combinations formed from each training picture and its corresponding downsampled picture.
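A downsampling and pairing step of the kind described above might look like this. The factor-of-2 block average is an assumption for illustration; the patent does not fix the downsampling method.

```python
import numpy as np

def downsample(img: np.ndarray, factor: int = 2) -> np.ndarray:
    # Block-average downsampling: each factor x factor block of pixels
    # is replaced by its mean, shrinking the picture by `factor`.
    h, w = img.shape[:2]
    h, w = h - h % factor, w - w % factor
    img = img[:h, :w]
    return img.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))

def make_combinations(training_pictures):
    # Build the downsampling-training picture combinations used for the
    # secondary fine tuning: one (downsampled, full-resolution) pair
    # per training picture.
    return [(downsample(p), p) for p in training_pictures]
```

Training on such pairs exposes the model to the same content at two resolutions, which is one plausible way to realize the secondary fine tuning described above.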
The data processing module 420 inputs the image subject text containing keywords matching the image subject information of the training text into the image generation model to generate a corresponding image file through inference calculation of the image generation model.
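The requirement that the image subject text contain keywords matching the training text's image subject information can be checked with a simple helper. The underscore-splitting of the training text (as in the "rabbit_2023" example) and the case-insensitive substring match are assumptions for this sketch.

```python
def subject_keywords(training_text: str) -> list:
    # Derive image subject information from a training text such as
    # "rabbit_2023" -> ["rabbit", "2023"].
    return [part for part in training_text.split("_") if part]

def prompt_matches_subject(prompt: str, training_text: str) -> bool:
    # True when the model input contains every subject keyword, e.g.
    # a prompt like "Chinese style rabbit_2023 animal ..." matches
    # the training text "rabbit_2023".
    lowered = prompt.lower()
    return all(k.lower() in lowered for k in subject_keywords(training_text))
```

A check like this could gate the inference step, ensuring the input information actually targets the subject the model was fine-tuned on.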
The output module 430 is used to output the image file generated by the image generation model.
Fig. 6 shows a block diagram of an electronic device according to an example embodiment of the application.
As shown in fig. 6, the electronic device 600 is only an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 600 takes the form of a general purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like. The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the methods according to the various exemplary embodiments of the present application described in this specification. For example, the processing unit 610 may perform the method shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 6201 and/or cache memory unit 6202, and may further include Read Only Memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the description of the embodiments above, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. The technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions causing a computing device (a personal computer, a server, a mobile terminal, a network device, etc.) to perform the method according to the embodiments of the present application.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The computer-readable medium carries one or more programs which, when executed by such a device, cause the device to perform the aforementioned functions.
Those skilled in the art will appreciate that the modules described in the embodiments may be distributed across several devices, or may, with corresponding variations, be implemented in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into a plurality of sub-modules.
According to some embodiments of the application, the technical scheme of the application can generate image files of suitable similarity that meet requirements from a small amount of themed image material; by optimizing the model, the production efficiency of image material is improved and cost is saved.
The foregoing detailed description of the embodiments of the application is presented only to assist in understanding the method of the application and its core ideas. Meanwhile, based on the ideas of the present application, those skilled in the art may make changes or modifications to the specific embodiments and application scope, and such changes fall within the protection scope of the present application. In view of the foregoing, this description should not be construed as limiting the application.

Claims (10)

1. A method of generating an image file, comprising:
optimizing an image generation model according to preset conditions;
acquiring a training picture and a training text, wherein the training text comprises image theme information;
fine tuning the image generation model according to the training pictures and the training texts;
and generating an image file through the fine-tuned image generation model according to input information.
2. The method of claim 1, wherein optimizing the image generation model according to the preset condition comprises:
encoding by the image generation model;
generating a control vector according to the preset condition;
and decoding according to the control vector so as to control the output result of the image generation model.
3. The method of claim 1, wherein fine-tuning the image generation model based on the training pictures and the training text comprises:
performing initial fine adjustment on the image generation model according to the training pictures and the training texts;
and performing secondary fine tuning on the image generation model subjected to the initial fine tuning according to the training picture.
4. The method according to claim 3, wherein initially fine-tuning the image generation model according to the training picture and the training text comprises:
setting a loss function of the image generation model;
inputting the training picture into the image generation model in combination with the text information to obtain a priori picture;
and updating the loss function according to the training picture and the prior picture.
5. The method according to claim 3, wherein performing secondary fine tuning on the initially fine-tuned image generation model according to the training picture comprises:
acquiring a downsampled picture corresponding to the training picture;
generating a downsampling-training picture combination according to the downsampling picture and the training picture;
updating the image generation model by the downsampling-training picture combination.
6. The method of claim 1, wherein generating an image file through the fine-tuned image generation model according to the input information comprises:
inputting an image theme text according to the preset condition;
and generating the image file through the fine-tuned image generation model according to the image theme text.
7. The method of claim 6, wherein the image subject text includes keyword information that matches image subject information of the training text.
8. An image file generation apparatus, comprising:
the data acquisition module is used for acquiring training pictures and training texts; acquiring an image theme text;
the data processing module is used for setting and optimizing an image generation model according to preset conditions; performing initial fine tuning and secondary fine tuning on the image generation model according to the training pictures and the training texts; and generating an image file through the fine-tuned image generation model according to the image subject text;
and the output module outputs the image file.
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the method of any of claims 1-7.
CN202310449587.3A 2023-04-18 2023-04-18 Image file generation method, device, equipment and program product Pending CN116597048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310449587.3A CN116597048A (en) 2023-04-18 2023-04-18 Image file generation method, device, equipment and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310449587.3A CN116597048A (en) 2023-04-18 2023-04-18 Image file generation method, device, equipment and program product

Publications (1)

Publication Number Publication Date
CN116597048A true CN116597048A (en) 2023-08-15

Family

ID=87599860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310449587.3A Pending CN116597048A (en) 2023-04-18 2023-04-18 Image file generation method, device, equipment and program product

Country Status (1)

Country Link
CN (1) CN116597048A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315417A (en) * 2023-09-04 2023-12-29 浙江大学 Diffusion model-based garment pattern fusion method and system
CN117315417B (en) * 2023-09-04 2024-05-14 浙江大学 Diffusion model-based garment pattern fusion method and system


Similar Documents

Publication Publication Date Title
US10380996B2 (en) Method and apparatus for correcting speech recognition result, device and computer-readable storage medium
CN109583952B (en) Advertisement case processing method, device, equipment and computer readable storage medium
JP2023539532A (en) Text classification model training method, text classification method, device, equipment, storage medium and computer program
CN113011186B (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN111368538A (en) Voice interaction method, system, terminal and computer readable storage medium
CN111090728A (en) Conversation state tracking method and device and computing equipment
CN101458681A (en) Voice translation method and voice translation apparatus
KR102076793B1 (en) Method for providing electric document using voice, apparatus and method for writing electric document using voice
CN110929094A (en) Video title processing method and device
WO2022001888A1 (en) Information generation method and device based on word vector generation model
CN107861954A (en) Information output method and device based on artificial intelligence
CN116820429B (en) Training method and device of code processing model, electronic equipment and storage medium
CN111985243B (en) Emotion model training method, emotion analysis device and storage medium
JPH07222248A (en) System for utilizing speech information for portable information terminal
CN115359314A (en) Model training method, image editing method, device, medium and electronic equipment
CN111368531B (en) Translation text processing method and device, computer equipment and storage medium
CN116644168A (en) Interactive data construction method, device, equipment and storage medium
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN116909435A (en) Data processing method and device, electronic equipment and storage medium
CN116597048A (en) Image file generation method, device, equipment and program product
CN116958738A (en) Training method and device of picture recognition model, storage medium and electronic equipment
CN117216544A (en) Model training method, natural language processing method, device and storage medium
CN112799658B (en) Model training method, model training platform, electronic device, and storage medium
US20200394733A1 (en) Systems and methods for mobile device-based legal self help
CN114860869A (en) Controllable universal dialogue model with generalized intentions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination