CN116433468A - Data processing method and device for image generation - Google Patents


Info

Publication number
CN116433468A
Authority
CN
China
Prior art keywords
data
image
style
base map
descriptor
Prior art date
Legal status
Pending
Application number
CN202310154767.9A
Other languages
Chinese (zh)
Inventor
Liang Tianming (梁天明)
Fan Ling (范凌)
Current Assignee
Tezign Shanghai Information Technology Co Ltd
Original Assignee
Tezign Shanghai Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tezign Shanghai Information Technology Co Ltd
Priority to CN202310154767.9A
Publication of CN116433468A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a data processing method and device for image generation. Image demand data are acquired, the image demand data comprising base map data and user description data, where the base map data are the data of a base map used for generating a target image. Image descriptor extraction based on a deep learning model is performed on the base map data to obtain image descriptor data, i.e. descriptor data representing the base map used for generating the target image. Image generation is then performed according to the image descriptor data and the user description data to obtain the target image. In the process of generating the target image from the base map, descriptors are extracted from the base map, the image style of the target image is determined according to the extracted base map style, and a target image with the same style as the base map but with variable content is generated. This solves the prior-art problem of poor results when generating an image from a base map, and improves the quality of the target image generated from a base map.

Description

Data processing method and device for image generation
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data processing method and apparatus for image generation.
Background
With the continuous development of information technology, AI algorithms that generate images from text have advanced rapidly, and AI drawing is widely used: a creator generates a new image conforming to the style of the description words and the base map by inputting description words or adding a base map. In the field of text-to-image generation, the structure of the descriptors has a great influence on the quality of the generated image. In the prior art, when a target image is generated from a base map, the description words corresponding to the base map are determined through a preset descriptor database, and the image is generated from those description words. However, because the descriptors in the preset database are collected common descriptors, the style of the base map is poorly reproduced when the target image is generated from them, so the generated image is of poor quality.
Therefore, the prior art has a problem of poor effect in generating an image from a base map.
Disclosure of Invention
The main purpose of the application is to provide a data processing method and device for image generation, so as to solve the problem that in the prior art, the effect of generating an image according to a base map is poor, and improve the image effect of generating a target image according to the base map in AI drawing.
To achieve the above object, a first aspect of the present application proposes a data processing method for image generation, including:
acquiring image demand data, wherein the image demand data comprises base image data and user description data, and the base image data is base image data for generating a target image;
performing image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data, wherein the image descriptor data is used for representing descriptor data for generating a target image base map;
and performing image generation processing according to the image descriptor data and the user description data to obtain a target image.
In some optional embodiments of the present application, performing image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data includes:
performing image style extraction processing on the base map data based on a preset style extraction model to obtain image style data, wherein the image style data is style data representing the base map used for generating the target image;
performing image color extraction processing on the base map data to obtain image color data, wherein the image color data is color data for representing a base map for generating a target image;
performing image detail supplementing processing on the base map data based on a preset detail database to obtain image detail data, wherein the image detail data are detail data used for representing images;
and determining the image descriptor data according to the image style data, the image color data and the image detail data.
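The text does not specify how the three descriptor groups are combined into the image descriptor data; a minimal Python sketch, assuming each group is a list of words and the final result is a comma-separated prompt (the function name and joining format are assumptions):

```python
def build_image_prompt(style_words, color_words, detail_words):
    """Assemble one generation prompt from style, color and detail
    descriptors. The comma-joined format is an illustrative assumption,
    not the patent's stated implementation."""
    parts = []
    for group in (style_words, color_words, detail_words):
        for word in group:
            # Deduplicate while preserving order
            if word not in parts:
                parts.append(word)
    return ", ".join(parts)
```

For example, `build_image_prompt(["oil painting"], ["warm red"], ["high detail"])` yields a single prompt string combining all three descriptor types.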
In some optional embodiments of the present application, performing image style extraction processing on the base map data based on a preset style extraction model to obtain image style data includes:
performing first image style extraction processing on the base image data based on a preset image style descriptor matching model to obtain first image style data, wherein the first image style data is the base image style descriptor data subjected to the first image style extraction processing;
performing second image style extraction processing on the base image data based on a preset visual question-answering model to obtain second image style data, wherein the second image style data is the base image style descriptor data subjected to the second image style extraction processing;
the image style data is determined from the first image style data and the second image style data.
In some optional embodiments of the present application, performing a first image style extraction process on the base map data based on a preset image style descriptor matching model, to obtain first image style data includes:
performing vector extraction processing based on image coding on the base map data to obtain image vector data, wherein the image vector data is data for representing a base map image vector;
carrying out vector extraction processing based on text coding on the preset image style descriptor to obtain text vector data, wherein the text vector data is data for representing text vectors of the preset image style descriptor;
and screening the image vector data and the text vector data based on the similarity to obtain the first image style data, wherein the first image style data is data of a preset image style descriptor corresponding to the text vector meeting a preset similarity rule.
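The similarity-based screening step can be sketched as follows, assuming the image and text vectors have already been produced by the encoders; the cosine measure and the top-k cut-off stand in for the unspecified "preset similarity rule":

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def screen_style_words(image_vec, text_vecs, top_k=2):
    """Rank preset style descriptors by similarity to the base-map image
    vector and keep the top_k best matches (an assumed cut-off rule)."""
    scored = sorted(text_vecs.items(),
                    key=lambda kv: cosine_similarity(image_vec, kv[1]),
                    reverse=True)
    return [word for word, _ in scored[:top_k]]
```

In practice the vectors would come from a model such as CLIP; here they are plain lists so the ranking logic stands alone.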
In some optional embodiments of the present application, performing a second image style extraction process on the base map data based on a preset visual question-answering model to obtain second image style data includes:
performing image style extraction processing based on a preset first style question and answer on the base map data based on a preset visual question and answer model to obtain first question and answer image style data, wherein the first question and answer image style data is base map style data used for representing first style question and answer;
performing image style extraction processing based on a preset second style question and answer on the base map data based on a preset visual question and answer model to obtain second question and answer image style data, wherein the second question and answer image style data is base map style data used for representing second style question and answer;
and determining the second image style data according to the first question-answer image style data and the second question-answer image style data.
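How the two question-answer results are combined into the second image style data is left open; one plausible rule is a simple order-preserving merge of the answer words (an assumption, not the patent's stated method):

```python
def merge_style_answers(first_answer, second_answer):
    """Merge the answers to the two style questions into one descriptor
    list, dropping duplicate words. The tokenisation and deduplication
    rules here are illustrative assumptions."""
    merged = []
    for answer in (first_answer, second_answer):
        for word in answer.lower().replace(",", " ").split():
            if word not in merged:
                merged.append(word)
    return merged
```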
In some optional embodiments of the present application, performing image color extraction processing on the base map data to obtain image color data includes:
preprocessing the base map data based on image scaling processing to obtain scaled base map data;
performing color space-based conversion processing on the scaled base map data to obtain converted base map data, wherein the converted base map data are data used for representing the converted scaled base map color space;
performing pixel screening processing of a color similarity threshold on the converted base map data to obtain base map color pixel data, wherein the base map color pixel data is image pixel data used for representing that the color similarity threshold is met;
and determining the image color data according to the base image color pixel data, wherein the image color data is the data of the corresponding color of the base image color pixel.
In some optional embodiments of the present application, performing image generation processing according to the image descriptor data and the user description data, to obtain a target image includes:
extracting the user description data based on an image main body to obtain image main body data, wherein the image main body data is data representing the main body in the user's target image;
performing recognition processing based on image intention on the user description data to obtain image description intention data, wherein the image description intention data is data representing the user's target image intention;
and inputting the image main body data and the image descriptor data into a preset artificial intelligence drawing model to obtain the target image data.
According to a second aspect of the present application, there is provided a data processing apparatus for image generation, comprising:
the system comprises a data acquisition module, a storage module and a storage module, wherein the data acquisition module is used for acquiring image demand data, the image demand data comprises base image data and user description data, and the base image data is base image data used for generating a target image;
the descriptor extraction module is used for carrying out image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data, wherein the image descriptor data is used for representing descriptor data for generating a target image base map;
and the image generation module is used for carrying out image generation processing according to the image descriptor data and the user description data to obtain a target image.
According to a third aspect of the present application, there is provided a computer-readable storage medium storing computer instructions for causing a computer to execute the above data processing method for image generation.
According to a fourth aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the data processing method for image generation described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
in the application, image demand data are acquired, the image demand data comprising base map data and user description data, where the base map data are the data of a base map used for generating a target image; image descriptor extraction processing based on a deep learning model is performed on the base map data to obtain image descriptor data, the image descriptor data representing the base map used for generating the target image; and image generation processing is performed according to the image descriptor data and the user description data to obtain the target image. In the process of generating the target image from the base map, descriptors are extracted from the base map, the image style of the target image is determined according to the extracted base map style, and a target image with the same style as the base map but with variable content is generated, which solves the prior-art problem of poor results when generating an image from a base map and improves the quality of the target image generated from a base map.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, together with its other features, objects and advantages. The drawings of the illustrative embodiments of the present application and their descriptions are for the purpose of illustrating the present application and are not to be construed as unduly limiting it. In the drawings:
FIG. 1 is a flow chart of a data processing method for image generation provided herein;
FIG. 2 is a flow chart of a data processing method for image generation provided herein;
FIG. 3 is a flow chart of a data processing method for image generation provided herein;
fig. 4 is a schematic diagram of a data processing apparatus for image generation provided in the present application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the following detailed description is made with reference to the accompanying drawings of the embodiments of the present application. It is apparent that the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein, without inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the present application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal" and the like indicate an azimuth or a positional relationship based on that shown in the drawings. These terms are used primarily to better describe the present application and its embodiments and are not intended to limit the indicated device, element or component to a particular orientation or to be constructed and operated in a particular orientation.
Also, some of the terms described above may be used to indicate other meanings in addition to orientation or positional relationships, for example, the term "upper" may also be used to indicate some sort of attachment or connection in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "configured," "provided," "connected," "coupled," and "sleeved" are to be construed broadly. For example, "connected" may be in a fixed connection, a removable connection, or a unitary construction; may be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements, or components. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
With the rapid development of AI algorithms that generate images from text, AI drawing is widely used on Internet social media: by inputting description words or adding a base map, the AI algorithm generates a new image that conforms to the style of the description words or the base map. In the field of text-to-image generation, the descriptors (hereinafter referred to as the "prompt") have a large influence on the generated image.
When a user wants to generate an image with a style similar to a base map, the user can collect the prompt words related to the base map and generate a target image from the acquired prompt. In the prior art, a good prompt is largely obtained through repeated human trial and error, and prompts shared by others can be reused to generate further picture samples in the same style. Another approach is a dedicated "prompt engineer" position responsible for collecting and filtering common, high-quality prompt vocabularies; the curated prompt vocabularies can make the generated pictures look "better", but neither approach reproduces the style of the base map well, so the image generated from a base map is of poor quality.
In an alternative embodiment of the present application, a data processing method for image generation is provided, in which prompt extraction based on a neural network model is performed on the current base map to obtain a prompt consistent with the style and other descriptive information of the base map image, and the target image is generated according to the prompt extracted from the base map, thereby solving the prior-art problem that the generated image is poor because the selected prompt is not similar to the base map style.
In an alternative embodiment of the present application, there is provided a data processing method for image generation, and fig. 1 is a flowchart of a data processing method for image generation provided in the present application, as shown in fig. 1, and the method includes the following steps:
s101: acquiring image demand data;
the image demand data comprises base image data and user description data, wherein the base image data is base image data used for generating a target image, the user description data is image related information of the target image required by a user, the image related information comprises data such as image main body information, intention information, state information and the like, the user description data is text or voice information obtained through man-machine interaction, and processing for generating the image according to the text is carried out according to the base image data and the user description data, so that the target image required by the user is obtained.
S102: performing image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data;
the image descriptor data are used for representing descriptor data for generating a base image of a target image, the base image is subjected to image descriptor extraction through a deep learning model, image descriptors corresponding to the style of the base image are obtained, AI drawing is carried out according to the descriptors of the base image, the obtained target image is the same as the style of the base image, the problem that in the prior art, the effect of generating the target image according to preset descriptors is poor is solved, and the image generation effect is improved.
In an alternative embodiment of the present application, there is provided a data processing method for image generation, and fig. 2 is a flowchart of a data processing method for image generation provided in the present application, as shown in fig. 2, and the method includes the following steps:
S201: performing image style extraction processing on the base map data based on a preset style extraction model to obtain image style data;
The image style data are style data representing the base map used for generating the target image. Image style descriptors are extracted from the base map through a preset deep learning model. Style descriptor extraction in different modes is configured to achieve style extraction in different dimensions, which improves the comprehensiveness of the descriptors and further improves the similarity to the base map style during image generation.
In another alternative embodiment of the present application, there is provided a data processing method for image generation, and fig. 3 is a flowchart of a data processing method for image generation provided in the present application, as shown in fig. 3, and the method includes the following steps:
s301: performing first image style extraction processing on the base map data based on a preset image style descriptor matching model to obtain first image style data;
the first image style data is base picture style descriptor data after the first image style extraction processing.
In another alternative embodiment of the present application, there is provided a data processing method for image generation for extracting first image style data, the method comprising:
performing vector extraction processing based on image coding on the base map data to obtain image vector data, wherein the image vector data is data for representing the base map image vector; carrying out vector extraction processing based on text coding on the preset image style descriptor to obtain text vector data, wherein the text vector data is data for representing the text vector of the preset image style descriptor; and performing similarity-based screening processing on the image vector data and the text vector data to obtain the first image style data, wherein the first image style data is data of preset image style descriptors corresponding to the text vectors meeting preset similarity rules.
In an alternative embodiment of the present application, based on a preset image style descriptor matching model, text vectors are extracted from pre-prepared style description texts by text encoding, an image vector is extracted from the base map by image encoding, similarity matching is performed between the image vector and the text vectors, and the style description texts whose similarity ranking meets a preset threshold are taken as the first image style data according to the similarity scores sorted from high to low. By classifying style types, multiple sets of style texts corresponding to different style types can be configured and matched separately, for example an "artist" set (e.g. Sanskyline, zileucite, etc.), a "literature and art" set (e.g. abstract art, boy style, etc.), a "detailed description" set (e.g. high definition, wide angle), and a "media description" set (e.g. 3D rendering, photography, oil painting, etc.), where the sets corresponding to different style types can be configured according to the user's drawing requirements.
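The per-set matching described above can be sketched as follows; `encode_text` stands in for a CLIP-style text encoder, and returning the single best descriptor per set is an assumed selection rule:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def match_style_sets(image_vec, style_sets, encode_text, top_per_set=1):
    """For each style-type set ('artist', 'media description', ...),
    return the descriptor(s) whose text vectors best match the base-map
    image vector. Set names and top_per_set are illustrative."""
    picks = {}
    for set_name, words in style_sets.items():
        ranked = sorted(words,
                        key=lambda w: cosine_similarity(image_vec,
                                                        encode_text(w)),
                        reverse=True)
        picks[set_name] = ranked[:top_per_set]
    return picks
```

Matching each set separately guarantees that every style dimension (artist, media, detail level) contributes at least one descriptor, rather than letting one dominant dimension crowd out the others in a single global ranking.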
In an alternative embodiment of the present application, the preset image style descriptor matching model may be a contrastive language-image pre-training model (Contrastive Language-Image Pretraining, hereinafter CLIP), which learns the mapping between an image and its descriptive text. Its text encoding module and image encoding module establish a similarity relation through the feature codes produced by the two encoders, so that the base map style descriptors are determined according to the similarity relation established between the base map and the candidate style descriptors.
S302: performing second image style extraction processing on the base map data based on a preset visual question-answering model to obtain second image style data;
the second image style data is base picture style descriptor data after second image style extraction processing, and the second image style extraction processing is question-answering processing based on a preset visual question-answering model.
In another alternative embodiment of the present application, there is provided a data processing method for image generation for extracting second image style data, the method comprising:
performing image style extraction processing based on a preset first style question and answer on the base map data based on a preset visual question and answer model to obtain first question and answer image style data, wherein the first question and answer image style data is base map style data used for representing first style question and answer; performing image style extraction processing based on a preset second style question and answer on the base map data based on a preset visual question and answer model to obtain second question and answer image style data, wherein the second question and answer image style data is base map style data used for representing second style question and answer; and determining second image style data according to the first question-answer image style data and the second question-answer image style data.
For example, two style-related questions are set in the preset visual question-answering model, such as "What style does the picture belong to?" and "What is the style of this image?", realizing question answering about the style and presentation form of the image.
In an alternative embodiment of the application, style-related questions can be set according to the drawing demand data, realizing the recognition and extraction of the base map style.
S303: image style data is determined from the first image style data and the second image style data.
The image style data includes first image style data and second image style data.
S202: performing image color extraction processing on the base map data to obtain image color data;
The image color data are color data representing the base map of the target image. Color is an important factor of the generated image, so consistency between the target image and the base map style is achieved through color extraction, improving the quality of the image generated from the base map.
In another alternative embodiment of the present application, there is provided a data processing method for image generation for extracting image color data, the method comprising:
preprocessing the base map data based on image scaling processing to obtain scaled base map data; performing color space-based conversion processing on the scaled base map data to obtain converted base map data, wherein the converted base map data are used for representing data obtained by converting the color space of the scaled base map; performing pixel screening processing of a color similarity threshold on the converted base map data to obtain base map color pixel data, wherein the base map color pixel data is used for representing image pixel data meeting the color similarity threshold; and determining image color data according to the base image color pixel data, wherein the image color data is the data of the corresponding color of the base image color pixel.
For example: uniformly scale the base map to N×N, e.g. N = 30; convert the scaled map to the Lab color space, in which color similarity can be measured by the distance between pixel values; set a color similarity threshold T, e.g. T = 100, and for each pixel compute the number M of pixels whose distance to the current pixel is smaller than T, so that R = M/(N×N) represents the proportion of the current pixel's color in the base map; after the proportions of all pixels are computed, select the set of colors corresponding to the pixels whose proportion is larger than a threshold (e.g. T1 = 0.2) and add this color set to the prompt to obtain the base map color descriptors.
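The pixel-proportion selection above can be sketched directly; the sketch assumes the image has already been scaled and converted to Lab, taking a flat list of Lab tuples, with the example thresholds T = 100 and T1 = 0.2 as defaults:

```python
import math

def dominant_colors(lab_pixels, dist_threshold=100.0, ratio_threshold=0.2):
    """Select dominant colors from a flat list of Lab pixel tuples.
    For each pixel, count the pixels within dist_threshold of it; keep
    the colors whose proportion exceeds ratio_threshold. The scaling
    and Lab conversion steps are omitted here."""
    n = len(lab_pixels)
    selected = []
    for p in lab_pixels:
        # M: number of pixels within the color-similarity threshold of p
        m = sum(1 for q in lab_pixels if math.dist(p, q) < dist_threshold)
        if m / n > ratio_threshold and p not in selected:
            selected.append(p)
    return selected
```

This is O(n²) over the pixel list, which is why the base map is first scaled down to a small N×N grid (N = 30 gives 900 pixels) before the comparison.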
S203: performing image detail supplementing processing on the base map data based on a preset detail database to obtain image detail data;
The image detail data are detail data used for representing the image. Through a preset detail database, the descriptor matching of the target image can be optimized: descriptors that improve the target image are stored in the preset detail database and serve as supplementary descriptors for generating the target image. For example, the preset detail database may store multiple groups of descriptors, such as environmental factors, lighting conditions, and detail supplements, from which a selection is made at random each time.
In the embodiment of the application, the descriptors for generating the target image are supplemented with detail, which guides the AI drawing model to generate richer details according to the prompt. The AI drawing model generates on the basis of the given prompt, so a richer prompt further improves the rationality of the rendered target image.
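A minimal sketch of the detail-supplement step, assuming a hypothetical preset detail database (the group names and descriptors below are illustrative, not from the patent): one descriptor is drawn at random from each group and appended to the prompt.

```python
import random

# Hypothetical preset detail database; groups and entries are illustrative.
DETAIL_DB = {
    "environment": ["in a misty forest", "on an urban rooftop at dusk"],
    "lighting": ["soft rim light", "golden hour sunlight"],
    "detail": ["intricate textures", "highly detailed"],
}

def supplement_details(prompt, rng=random):
    """Append one randomly selected descriptor from each group to the prompt."""
    extras = [rng.choice(group) for group in DETAIL_DB.values()]
    return ", ".join([prompt] + extras)
```

Passing a seeded `random.Random` makes the selection reproducible; in production each call would draw a fresh random combination, increasing the diversity of the generated images.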
S204: image descriptor data is determined from the image style data, the image color data, and the image detail data.
S103: and performing image generation processing according to the image descriptor data and the user description data to obtain a target image.
In another alternative embodiment of the present application, there is provided a data processing method for image generation for generating a target image from image descriptor data and user description data, the method including:
performing image main body extraction on the user description data to obtain image main body data, wherein the image main body data are data describing the main body of the user's target image; performing image intention recognition on the user description data to obtain image description intention data, wherein the image description intention data are data representing the user's intention for the target image; and inputting the image main body data and the image descriptor data into a preset artificial intelligence drawing model to obtain the target image data.
In the embodiment of the application, descriptors are extracted from the given base map to obtain the style description corresponding to the base map and information such as the artist. By assembling this style information into a prompt, an image with the same style as the base map but a variable main body and high diversity can be generated; additional random styles are added to supplement the style and increase diversity, which improves the image effect of the target image.
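The assembly of the final prompt can be sketched as follows. This is a schematic illustration, assuming the main body and the style, color, and detail descriptors have already been extracted as lists of strings; the call to the actual drawing model is omitted.

```python
def assemble_prompt(main_body, style, colors, details):
    """Join the user's main body with the base-map style, color,
    and detail descriptors into the prompt for the drawing model."""
    parts = [main_body] + list(style) + list(colors) + list(details)
    return ", ".join(p for p in parts if p)
```

Only the main body varies between calls, so the generated images share the base map's style while the content changes, which matches the stated goal of same-style, variable-subject generation.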
In another alternative embodiment of the present application, there is provided a data processing apparatus for image generation, and fig. 4 is a schematic diagram of the data processing apparatus for image generation provided in the present application, as shown in fig. 4, including:
a data acquisition module 41, configured to acquire image demand data, where the image demand data includes base map data and user description data, and the base map data is base map data for generating a target image;
the descriptor extraction module 42 is configured to perform image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data, where the image descriptor data is used for representing descriptor data for generating a base map of the target image;
the image generation module 43 is configured to perform image generation processing according to the image descriptor data and the user description data, so as to obtain a target image.
The specific manner in which the operations of the units in the above embodiments are performed has been described in detail in the embodiments related to the method, and will not be described in detail here.
In summary, in the present application, image demand data are acquired, where the image demand data include base map data and user description data, and the base map data are the base map data for generating a target image; image descriptor extraction processing based on a deep learning model is performed on the base map data to obtain image descriptor data, where the image descriptor data represent the descriptor data of the base map used to generate the target image; and image generation processing is performed according to the image descriptor data and the user description data to obtain the target image. When generating the target image from the base map, descriptors are extracted from the base map, the image style of the target image is determined by the extracted base map style, and a target image with the same style as the base map but variable image content is generated. This solves the problem in the prior art that the effect of generating an image from a base map is poor, and improves the image effect of generating the target image from the base map.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
It will be apparent to those skilled in the art that the modules or steps of the application described above may be implemented in a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices; they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by the computing devices; or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (10)

1. A data processing method for image generation, comprising:
acquiring image demand data, wherein the image demand data comprises base image data and user description data, and the base image data is base image data for generating a target image;
performing image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data, wherein the image descriptor data is used for representing descriptor data for generating a target image base map;
and performing image generation processing according to the image descriptor data and the user description data to obtain a target image.
2. The data processing method according to claim 1, wherein performing image descriptor extraction processing based on a deep learning model on the base map data, obtaining image descriptor data includes:
performing image style extraction processing on the base map data based on a preset style extraction model to obtain image style data, wherein the image style data is style data used for representing a base map used for generating a target image;
performing image color extraction processing on the base map data to obtain image color data, wherein the image color data is color data for representing a base map for generating a target image;
performing image detail supplementing processing on the base map data based on a preset detail database to obtain image detail data, wherein the image detail data are detail data used for representing images;
and determining the image descriptor data according to the image style data, the image color data and the image detail data.
3. The data processing method according to claim 2, wherein performing image style extraction processing on the base map data based on a preset style extraction model, obtaining image style data includes:
performing first image style extraction processing on the base image data based on a preset image style descriptor matching model to obtain first image style data, wherein the first image style data is the base image style descriptor data subjected to the first image style extraction processing;
performing second image style extraction processing on the base image data based on a preset visual question-answering model to obtain second image style data, wherein the second image style data is the base image style descriptor data subjected to the second image style extraction processing;
the image style data is determined from the first image style data and the second image style data.
4. The data processing method according to claim 3, wherein performing a first image style extraction process on the base map data based on a preset image style descriptor matching model, to obtain first image style data includes:
performing vector extraction processing based on image coding on the base map data to obtain image vector data, wherein the image vector data is data for representing a base map image vector;
carrying out vector extraction processing based on text coding on the preset image style descriptor to obtain text vector data, wherein the text vector data is data for representing text vectors of the preset image style descriptor;
and screening the image vector data and the text vector data based on the similarity to obtain the first image style data, wherein the first image style data is data of a preset image style descriptor corresponding to the text vector meeting a preset similarity rule.
5. A data processing method according to claim 3, wherein performing a second image style extraction process on the base map data based on a preset visual question-answering model, to obtain second image style data includes:
performing image style extraction processing based on a preset first style question and answer on the base map data based on a preset visual question and answer model to obtain first question and answer image style data, wherein the first question and answer image style data is base map style data used for representing first style question and answer;
performing image style extraction processing based on a preset second style question and answer on the base map data based on a preset visual question and answer model to obtain second question and answer image style data, wherein the second question and answer image style data is base map style data used for representing second style question and answer;
and determining the second image style data according to the first question-answer image style data and the second question-answer image style data.
6. The data processing method according to claim 2, wherein performing image color extraction processing on the base map data to obtain image color data includes:
preprocessing the base map data based on image scaling processing to obtain scaled base map data;
performing color space-based conversion processing on the scaled base map data to obtain converted base map data, wherein the converted base map data are data used for representing the converted scaled base map color space;
performing pixel screening processing of a color similarity threshold on the converted base map data to obtain base map color pixel data, wherein the base map color pixel data is image pixel data used for representing that the color similarity threshold is met;
and determining the image color data according to the base image color pixel data, wherein the image color data is the data of the corresponding color of the base image color pixel.
7. The data processing method according to claim 1, wherein performing image generation processing based on the image descriptor data and the user description data to obtain a target image includes:
extracting the user description data based on an image main body to obtain image main body data, wherein the image main body data is data for a main body in a user target image;
performing recognition processing based on image intention on the user description data to obtain image description intention data, wherein the image description intention data is data for representing user target image intention;
And inputting the image main body data and the image descriptor data into a preset artificial intelligence drawing model to obtain the target image data.
8. A data processing apparatus for image generation, comprising:
the system comprises a data acquisition module, a storage module and a storage module, wherein the data acquisition module is used for acquiring image demand data, the image demand data comprises base image data and user description data, and the base image data is base image data used for generating a target image;
the descriptor extraction module is used for carrying out image descriptor extraction processing based on a deep learning model on the base map data to obtain image descriptor data, wherein the image descriptor data is used for representing descriptor data for generating a target image base map;
and the image generation module is used for carrying out image generation processing according to the image descriptor data and the user description data to obtain a target image.
9. A computer-readable storage medium storing computer instructions for causing the computer to execute the data processing method for image generation according to any one of claims 1 to 7.
10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the data processing method for image generation of any of claims 1-7.
CN202310154767.9A 2023-02-22 2023-02-22 Data processing method and device for image generation Pending CN116433468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310154767.9A CN116433468A (en) 2023-02-22 2023-02-22 Data processing method and device for image generation


Publications (1)

Publication Number Publication Date
CN116433468A true CN116433468A (en) 2023-07-14

Family

ID=87078520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310154767.9A Pending CN116433468A (en) 2023-02-22 2023-02-22 Data processing method and device for image generation

Country Status (1)

Country Link
CN (1) CN116433468A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118035493A (en) * 2023-12-31 2024-05-14 上海稀宇极智科技有限公司 Image generation method, device, equipment, storage medium and program product



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination