CN114169255A - Image generation system and method


Info

Publication number
CN114169255A
Authority
CN
China
Prior art keywords
image
information
detail
initial design
images
Prior art date
Legal status
Granted
Application number
CN202210127000.2A
Other languages
Chinese (zh)
Other versions
CN114169255B (en)
Inventor
李智康
李沛珂
周慧玲
周畅
杨红霞
周靖人
Current Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202210127000.2A
Publication of CN114169255A
Application granted
Publication of CN114169255B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 — Computer-aided design [CAD]
    • G06F30/20 — Design optimisation, verification or simulation
    • G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 — 2D [Two Dimensional] image generation
    • G06T11/60 — Editing figures and text; Combining figures or text

Abstract

The application provides an image generation system and method. The image generation system includes an image generation component and an image information processing component. The image generation component acquires at least one original image, where the at least one original image includes a detail pattern image and/or a base pattern image, and converts each original image into information of that image; the information of an original image comprises n first image sequences corresponding to n feature spaces, each first image sequence representing the original image in the feature space corresponding to that sequence. The image information processing component inputs the information of the at least one original image into an image processing model and generates information of N initial design images through codeword processing; the image generation component then generates M industrial design images according to the information of the N initial design images. N is a positive integer, M is an integer greater than or equal to N, and n is an integer greater than 1. For industrial design targeting product details, the scheme improves the output efficiency of industrial design images.

Description

Image generation system and method
Technical Field
The present application relates to the field of intelligent manufacturing technologies, and in particular, to an image generation system and method.
Background
At present, garment design often draws on existing garment elements to create new styles, and must consider the overall design effect and the coordination among elements in terms of color, material, silhouette, style, connotation, and so on. Because this design approach is inefficient and time-consuming, it cannot keep pace with rapidly changing fashion trends. Similar problems arise in industrial design in other manufacturing industries: designing products around details is inefficient and slow, and cannot match the pace of product iteration.
Disclosure of Invention
The image generation system and method provided by the embodiments of the present application aim to realize efficient industrial design.
In a first aspect, an embodiment of the present application provides an image generation system, including: an image generation component configured to acquire at least one original image, where the at least one original image includes a detail pattern image and/or a base pattern image, the detail pattern image reflects the detail pattern requirement and the detail editing region of the industrial design image, and the base pattern image reflects the base pattern requirement and the detail editing region of the industrial design image, and, for each original image of the at least one original image, to convert the original image into information of the original image, where the information of the original image includes n first image sequences corresponding to n feature spaces, respectively, and each first image sequence represents the original image in the feature space corresponding to that sequence; and an image information processing component configured to input the information of the at least one original image into an image processing model and generate information of N initial design images through codeword processing; the image generation component is further configured to generate M industrial design images according to the information of the N initial design images; where N is a positive integer, M is an integer greater than or equal to N, and n is an integer greater than 1.
In a second aspect, an embodiment of the present application provides an image generation method, including: acquiring at least one original image, where the at least one original image includes a detail pattern image and/or a base pattern image, the detail pattern image reflects the detail pattern requirement and the detail editing region of the industrial design image, and the base pattern image reflects the base pattern requirement and the detail editing region of the industrial design image; for each original image of the at least one original image, converting the original image into information of the original image, where the information of the original image includes n first image sequences corresponding to n feature spaces, respectively, and each first image sequence represents the original image in the feature space corresponding to that sequence; inputting the information of the at least one original image into an image processing model and generating information of N initial design images through codeword processing; and generating M industrial design images according to the information of the N initial design images; where N is a positive integer, M is an integer greater than or equal to N, and n is an integer greater than 1.
In the embodiments of the present application, the detail pattern requirement and the detail editing region are automatically identified from the acquired original images; information of multiple initial design images that meet the pattern requirements (e.g., the base pattern requirement and the detail pattern requirement) or target the detail editing region is generated from the information of the original images; and multiple industrial design images are then generated from that information. For industrial design around product details, this improves the output efficiency of industrial design images.
Furthermore, an original image is represented by n first image sequences distributed over n feature spaces. Compared with the prior art, in which an original image is represented in a single feature space by a single feature sequence, the feature dimensionality of the original image is much richer; the original image is then processed in the n feature spaces to obtain the industrial design image, increasing the definition of the industrial design image.
Drawings
Fig. 1 is a schematic structural diagram of an image generation system according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the internal architecture of an image generation system according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of model training and codebook generation according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of an image generation method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The image generation systems and methods provided herein are applicable to a variety of manufacturing industries, including, without limitation, food, apparel, tobacco, furniture, papermaking, printing, sports and entertainment products, pharmaceuticals, and chemicals. Industrial design gives practical form to many aspects of the manufacturing process. For example, in the apparel industry, industrial designs may include product-oriented garment designs and packaging designs, user-oriented advertising designs, store-oriented display designs, and the like; in the printing industry, they may include product-oriented typesetting designs, production-equipment-oriented mechanical designs, and the like. Industrial designs may also include graphic designs, web site designs, interior designs, environmental designs, and so on across manufacturing industries. Given how widely industrial design is applied in manufacturing, making it intelligent becomes important. The present application is described below using the apparel industry as an example, but is not limited thereto.
According to the image generation scheme provided by the embodiments of the present application, the detail pattern requirement and the detail editing region are automatically identified from the acquired original images; information of multiple initial design images that meet the pattern requirements (e.g., the base pattern requirement and the detail pattern requirement) or target the detail editing region is generated from the information of the original images; and multiple industrial design images are then generated from that information, realizing efficient industrial design around product details.
For example, when the scheme is applied to the apparel industry, a local region of a garment can be efficiently given diversified designs, producing multiple different garment design images; for instance, the collar of a T-shirt might be redesigned as a red round collar, a square collar with lace trim, a drawstring collar, a hooded collar, and so on. As another example, when the scheme is applied to the home furnishing industry, a local indoor region can be efficiently given diversified designs, producing multiple different interior design images; for instance, the dining table might be redesigned as a round glass table, a square wooden table, a long black table, and so on.
Furthermore, an original image is represented by n first image sequences distributed over n feature spaces. Compared with the prior art, in which an original image is represented in a single feature space by a single feature sequence, the feature dimensionality is much richer; image processing across the n feature spaces then yields industrial design images of higher definition.
The technical solution in the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of an image generation system according to an embodiment of the present application. As shown in fig. 1, the system 100 includes an image information processing component 110 and an image generation component 120, connected via a network.
The embodiments of the present application do not limit how the image information processing component 110 and the image generation component 120 are arranged. Optionally, the two components are deployed in a distributed manner, with overall system functionality realized by a cloud computing system. The image information processing component 110 may be deployed in the cloud or, alternatively, on various e-commerce platforms or user sides; wherever it is deployed, it may run on a terminal device or a server. The image generation component 120 can be deployed in a similar manner, which is not repeated here. For example, the image information processing component 110 may be deployed in the cloud as a cloud server, running the neural network model for image processing with the resource advantages of the cloud, while the image generation component 120 is deployed on an e-commerce platform or user side to facilitate acquiring original images and outputting industrial design images.
The terminal device may be, for example, a mobile phone, a tablet computer, a desktop computer, a terminal device in industrial control, or the like. The terminal device in the embodiments of the present application may also be a wearable device, also called a wearable smart device: a general term for everyday wearables designed and developed with wearable technology, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device worn directly on the body or integrated into the user's clothing or accessories.
The server may be, for example, a conventional server, a cloud server, or a cluster of servers, etc.
The image generation component 120 is first configured to acquire at least one original image. The at least one original image may include a detail pattern image, which reflects the detail pattern requirement and the detail editing region of the industrial design image, and/or a base pattern image, which reflects the base pattern requirement and the detail editing region of the industrial design image. For example, in garment design, the detail pattern image may be an image of a hood: the detail pattern requirement it reflects is the depicted hood, and the detail editing region it reflects is the collar region. As another example, the base pattern image may be an image of a T-shirt in which the collar position is marked as the detail editing region; the base pattern requirement it reflects is the depicted T-shirt, the detail editing region it reflects is the collar region, and the resulting industrial design image may be a hooded T-shirt.
It should be understood that the detail pattern requirements in this embodiment may include, but are not limited to: size, color, style, material, texture, transparency, type, brand, and the like.
Next, the image generation component 120 may convert each of the at least one original image to obtain the information of each original image. The information of an original image comprises n first image sequences corresponding to n feature spaces, where n is an integer greater than 1 and each first image sequence represents the original image in its corresponding feature space. Note that a feature is an abstraction of the original image, i.e., a numerical representation of it, and a feature in a feature space is a higher-dimensional abstraction of the image. For example, if a feature vector in one feature space can represent 256 feature dimensions, then 4 feature spaces, in each of which a feature vector can again represent 256 feature dimensions, can together represent 256^4, roughly 4.3 billion, feature combinations. Compared with the expressive power of a single image sequence, the n first image sequences therefore express the original image far more richly, and this rich representation provides the basis for generating high-definition industrial design images.
Referring to fig. 1, the image generation component 120 may include an image sequence generation module 121, configured to convert an input original image into n first image sequences based on n pre-trained codebooks. Further, the image sequence generation module 121 may transmit the generated n first image sequences to the image information processing component 110.
A codebook is a dictionary that captures the mapping between image sequences and image pixels, so that an image can be expressed as an image sequence; in effect, a codebook is a discretized, text-like representation of the image. The n codebooks correspond one-to-one with the n feature spaces, and each codebook reflects the mapping between image sequences and image pixels in its corresponding feature space. Note that the n codebooks may be generated during model training; the specific generation method is described below.
As shown in fig. 2, the image sequence generation module 121 may include an encoder 1211 and a product quantization module 1212. The image generation component may input the original image into the encoder 1211, which encodes it to obtain the image features of the original image; the product quantization module 1212 then represents these image features, based on the n pre-trained codebooks, as n first image sequences corresponding to the n feature spaces (the n first image sequences may be referred to as the information of the original image).
Taking a base pattern image (which contains a detail editing region) as the example original image: after the base pattern image is input into the encoder 1211, the encoder encodes it to obtain its image features. For example, the encoder 1211 may convert a 256 × 256 × 3 base pattern image into 8 × 8 × 1024 image features, where 256 × 256 is the height h × width w of the image, 3 is the number of color channels of an RGB image, 8 × 8 indicates that the image is partitioned into an 8 × 8 grid of 32 × 32-pixel blocks, and 1024 is the number of features per block.
Further, the product quantization module 1212 may quantize the image features based on the n codebooks, mapping the image features of the base pattern image into n first image sequences corresponding to the n feature spaces. Continuing the above example, the encoder 1211 feeds the 8 × 8 × 1024 image features into the product quantization module 1212, and the 1024-dimensional feature of each block is split into n = 1024/256 = 4 sub-vectors of 256 dimensions each, one per feature space. In each feature space, the base pattern image is represented by 8 × 8 = 64 codewords from the corresponding codebook, so one image sequence contains the index numbers of 64 codewords, the 4 image sequences together contain 256 codeword indices, and each codeword represents a 256-dimensional feature. For example, the first codebook may contain codewords with index numbers 1 to 256 (corresponding to the first feature space), the second codebook codewords with index numbers 257 to 512 (corresponding to the second feature space), and so on. "First codebook", "second codebook", "first feature space", and "second feature space" are used only to distinguish different codebooks or feature spaces and carry no other limiting meaning.
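To make the quantization concrete, the following is a minimal Python sketch of the product quantization step under the assumptions of the running example (n = 4 codebooks of 256 codewords each, an 8 × 8 grid of blocks with 1024-dimensional features); the random codebooks and all names are illustrative assumptions, not the patented implementation:

import numpy as np

n_codebooks, codebook_size, subdim = 4, 256, 256      # 4 x 256 = 1024 dims
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(n_codebooks, codebook_size, subdim))

def product_quantize(features):
    # features: (64, 1024) encoder output for the 8 x 8 blocks
    sequences = []
    for i in range(n_codebooks):
        sub = features[:, i * subdim:(i + 1) * subdim]      # 256-dim slice
        # nearest codeword per block in feature space i (squared L2 distance)
        d = ((sub[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(-1)
        sequences.append(d.argmin(axis=1))                  # 64 index numbers
    return sequences    # n first image sequences = information of the image

info = product_quantize(rng.normal(size=(64, 1024)))
print([s.shape for s in info])    # [(64,), (64,), (64,), (64,)]

Each of the four returned sequences is one first image sequence: 64 codeword index numbers representing the image in one feature space.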
Similarly, the encoder 1211 may encode the detail pattern image to obtain its image features, and the product quantization module 1212 may then represent those features, based on the n codebooks, as n first image sequences corresponding to the n feature spaces; for specifics, refer to the example above, which is not repeated here.
Referring to fig. 1, in some embodiments the image generation system 100 further includes a text sequence generation module 130, configured to obtain a detail description text describing the detail pattern requirement and/or the detail editing region of the industrial design image and to convert it into a text sequence, which may be a one-dimensional feature sequence of the detail description text. The detail description text may be, for example, "design the collar of the T-shirt as a hood", where "the collar of the T-shirt" reflects the detail editing region and "hood" reflects the detail pattern requirement.
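As an illustration of the text path, a minimal sketch of converting a detail description text into a one-dimensional text sequence via a vocabulary lookup follows; the patent does not specify the tokenizer, so the vocabulary and function below are assumptions:

vocab = {"design": 1, "the": 2, "collar": 3, "of": 4, "t-shirt": 5,
         "as": 6, "a": 7, "hood": 8, "<unk>": 0}

def to_text_sequence(text):
    # map each whitespace token to its vocabulary index
    return [vocab.get(token, vocab["<unk>"]) for token in text.lower().split()]

print(to_text_sequence("design the collar of the T-shirt as a hood"))
# [1, 2, 3, 4, 2, 5, 6, 7, 8]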
It can be understood that the n first image sequences of the detail pattern image, the n first image sequences of the base pattern image, and the text sequence of the detail description text can all serve as inputs to the image information processing component, so that it can derive requirements from multi-modal input. For example, it may edit the detail editing region of the base pattern image based on the detail pattern requirement reflected by the detail pattern image and the detail description text, together with the base pattern requirement reflected by the base pattern image, so as to generalize the information of N initial design images.
In some embodiments, the input to the image information processing component 110 also includes a task type identifier, which indicates the task type, for example garment editing or virtual try-on.
Referring to fig. 1, the image information processing component 110 may generate information of N initial design images from the n first image sequences of the detail pattern image, the n first image sequences of the base pattern image, and the text sequence of the detail description text, where N is a positive integer.
On one hand, after the image information processing component 110 performs codeword processing, based on the n codebooks, on the image sequences and the text sequence of each input, each piece of initial design image information among the N comprises n second image sequences corresponding to the n feature spaces. As with the n first image sequences, the expressive power of n second image sequences far exceeds that of a single second image sequence, and this rich representation of the initial design image provides the basis for generating high-definition industrial design images.
On the other hand, the image information processing component 110 may generate information of more than one initial design image, and when it does so, industrial design efficiency is further improved.
Still referring to fig. 1, the image information processing component 110 may include an image processing model 111, which may be used to generate the information of the N initial design images. Illustratively, the information of the original images and the text sequence are input into the image processing model 111, and the information of the N initial design images is generated based on the n pre-trained codebooks.
The following describes, by way of example, the process by which the image information processing component 110 generates the information of the N initial design images.
the image information processing component 110 inputs the n first image sequences (or referred to as information of the base style image) corresponding to the base style image, the n first image sequences (or referred to as information of the detail style image) corresponding to the detail style image, and the text sequence (or referred to as information of the detail description text) corresponding to the detail description text into the image processing model 111. Referring to fig. 2, the image processing model 111 predicts, according to the information of the input detail pattern image, N codewords of N second detail images corresponding to N initial design image distributions based on N pre-trained codebooks; or, the image processing model 111 predicts, according to the information of the input detail description text, N pre-trained codebooks, to obtain codewords of N third detail images corresponding to the N initial design images, respectively; or, the image processing model 111 predicts, according to the information of the input detail description image and the information of the detail description text, the N fifth detail images corresponding to the N initial design images respectively based on the N pre-trained codebooks.
Then, the image processing model 111 replaces detail elements in the N first image sequences corresponding to the base pattern image N times based on the N codebooks according to the information of the input base pattern image, and obtains information of N initial design images. The detail elements in the n first image sequences are used to represent the detail editing regions in the base style image in n feature spaces. Illustratively, for one of N initial design images, detail elements in N first image sequences are masked by the image processing model 111, and a codeword of a first detail image corresponding to the detail element is predicted based on N pre-trained codebooks, where the first detail image may be any one of the detail images (e.g., the second detail image, the third detail image, and the fifth detail image) described above, or the first detail image may be generalized in a network, and the codeword of the first detail image replaces the mask of the detail element to obtain information of the initial design image.
For example, after the image processing model 111 masks detail elements in N first image sequences corresponding to the base pattern image, the masks of the detail elements in the N first image sequences corresponding to the base pattern image may be replaced N times according to code words of N second detail images, so as to obtain information of N initial design images. In the codeword processing process, the codewords of the N second detail images may be replaced by the codewords of the N third detail images or the codewords of the N fifth detail images, which are similar to each other and are not described herein again.
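The mask-and-replace codeword processing described above can be sketched as follows; predict_detail_codewords stands in for the image processing model's prediction and is hypothetical:

import numpy as np

MASK = -1    # stands in for the mask X over a codeword index

def edit_sequences(first_sequences, detail_positions, predict_detail_codewords):
    # first_sequences: n first image sequences of the base pattern image
    # detail_positions: block indices covering the detail editing region
    edited = []
    for space, seq in enumerate(first_sequences):
        seq = seq.copy()
        seq[detail_positions] = MASK          # mask the detail elements
        seq[detail_positions] = predict_detail_codewords(space, detail_positions)
        edited.append(seq)
    return edited    # n second image sequences: one initial design image's info

Calling this N times, each time with different predicted detail codewords (from the second, third, or fifth detail images, or generalized by the network), yields the information of the N initial design images.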
It is understood that any of these detail images may be abstract information, represented by discretized codewords during processing by the image processing model 111.
Optionally, the image processing model 111 may send the information of the N initial design images to the image generation component 120 one by one or all together; this application does not limit this.
It should also be understood that taking the information of the base pattern image, the information of the detail pattern image, and the information of the detail description text as the input of the image processing model 111 is only an example and does not limit the present application. For example, the input may be only part of this information, or may further include feature information beyond what is enumerated here.
When the input of the image processing model 111 is only part of the information among the base pattern image information, the detail pattern image information, and the detail description text information, the image processing model 111 can still generate the information of the N initial design images.
As a first example, the input of the image processing model 111 is the information of the base pattern image. Referring to fig. 2, the image processing model 111 performs N replacements of the detail elements in the n first image sequences of the base pattern image, based on the n pre-trained codebooks, to obtain the information of the N initial design images. The detail elements in the n first image sequences of the base pattern image represent the detail editing region of the base pattern image in the n feature spaces.
For example, as shown in fig. 2, when generating the information of one initial design image, the image processing model 111 may mask the detail elements in the n first image sequences of the base pattern image (in fig. 2, the codeword index numbers of the detail elements are replaced by the mask X), predict the codewords of a first detail image corresponding to the detail elements based on the n pre-trained codebooks, and replace the mask of the detail elements with those codewords to obtain the information of the initial design image.
It will be appreciated that the first detail image may be generalized within the network, i.e., generated by the image processing model 111 without a specified detail pattern.
In this first example, the image processing model 111 derives the detail editing region to be edited and the base pattern requirement from the information of the base pattern image, predicts a plausible detail image (e.g., the first detail image) for that region, and obtains the information of the initial design image through codeword replacement, so that the image generation component 120 can produce a detail-edited industrial design image obtained by editing the base pattern image.
It should be understood that the masked detail elements are not limited to the detail editing region of the base pattern image: the image processing model 111 may adaptively adjust the detail editing region and/or regions outside it, making the final industrial design image more reasonable and aesthetically pleasing. For example, if the base pattern image is a short-sleeved T-shirt, the detail editing region is the collar region, and codeword replacement with the first detail image produces a hooded collar, the image processing model 111 may simultaneously treat the sleeves as a detail editing region and edit the T-shirt from short-sleeved to long-sleeved.
As a second example, the detail design requirement information includes detail pattern information, which may include at least one of the information of the detail description text and the information of the detail pattern image. In this case, the image information processing component 110 may input the detail pattern information into the image processing model 111, which generates the information of the N initial design images. Referring to fig. 2, for one initial design image of the N, the image processing model 111 predicts the codewords of the first detail image corresponding to that initial design image based on the n pre-trained codebooks and obtains the information of that initial design image from those codewords. For example, the image processing model 111 may apply the predicted codewords of the first detail image to any target base image; the target base image may be predicted by the image processing model 111 or obtained by it in advance, which is not limited in this application.
In this second example, the image processing model 111 derives a detail editing region or a detail pattern requirement from the detail pattern information, predicts the codewords of a third detail image based on the detail pattern requirement, and replaces the elements of the detail editing region of the target base image with those codewords to generate the information of an initial design image.
Of course, the information of the detail description text, the information of the detail pattern image, and the information of the base pattern image may be input into the image processing model 111 in any combination. Understandably, the richer the input to the image processing model 111, the more closely the generated industrial design images conform to the requirements; the sparser the input, the more the image information processing component 110 must identify or predict unknown requirements, which increases the flexibility of the industrial design.
Alternatively, the image generation system 100 may initiate the image generation process based on a user's detail design instruction.
For example, at least one of the detail description text, the detail pattern image, and the base pattern image may be carried in the user's detail design instruction.
As can be appreciated from the foregoing, the image information processing component 110 can process multi-modal input into image information. For example, it may generate the information of an image from a text sequence (a text-to-image process), from an image sequence (an image-to-image process), or from a text sequence together with an image sequence (a text-and-image-to-image process).
In the embodiment of the present application, the image generation component 120 is configured to generate M industrial design images from the information of the N initial design images produced by the image information processing component 110, for example via the image generation module 122, where M is an integer greater than or equal to N. The image generation component 120 may reconstruct the N initial design images from the n second image sequences of each and use them directly as N industrial design images; or it may generalize over the n second image sequences of the N initial design images to obtain information of more (e.g., M) initial design images and then reconstruct M industrial design images from that information.
Optionally, when N is 1, M equals N; when N is greater than 1, M is greater than N. That is, when there are two or more initial design images, the image generation component 120 can generalize over their information to obtain M industrial design images. The generalization procedure is described below.
In some embodiments, after the image information processing component 110 generates the information of an initial design image (i.e., the n second image sequences corresponding to the n feature spaces), it may determine, for each initial design image, target elements in the n second image sequences whose confidence is below a threshold; such low-confidence elements may degrade the quality of the generated industrial design image in dimensions such as design plausibility, aesthetics, and definition. The image information processing component 110 (for example, its image processing model 111) may then mask the target elements, predict the codewords of a fourth detail image corresponding to the target elements based on the n pre-trained codebooks, and replace the masks of the target elements with those codewords, obtaining updated information of the initial design image (i.e., n updated second image sequences).
Further, the image information processing component 110 continues to determine elements with confidence below the threshold in the updated n second image sequences and repeats the masking and codeword replacement described above. This refinement of the generation quality of the industrial design image ends when the confidence of every element in the n second image sequences is equal to or above the threshold, or when the number of elements below the threshold falls under a preset value.
Optionally, the target elements may be all of, or only some of, the elements in the n second image sequences whose confidence is below the threshold. For example, the image information processing component 110 may randomly select one or more of them as the target elements.
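A minimal sketch of this confidence-driven refinement loop follows; model is a hypothetical predictor that returns replacement codewords and their confidences for the masked positions, and the loop assumes the predicted confidences eventually improve:

import numpy as np

def refine(second_sequences, confidences, model, threshold=0.9, max_low=0):
    # second_sequences, confidences: one array per feature space
    while True:
        low = [np.flatnonzero(c < threshold) for c in confidences]
        if sum(len(ix) for ix in low) <= max_low:
            return second_sequences            # quality criterion met
        for space, ix in enumerate(low):
            if len(ix) == 0:
                continue
            # mask the target elements and predict a "fourth detail image"
            codewords, conf = model(space, second_sequences, ix)
            second_sequences[space][ix] = codewords
            confidences[space][ix] = conf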
In some embodiments, the image processing model may be, but is not limited to, the extremely large-scale Chinese multi-modal pretraining model M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer). The M6 model is a very large-scale multi-modal language pretraining model: a general-purpose large artificial intelligence model with hundreds of billions or even trillions of parameters.
Referring to fig. 1, the image generation component 120 may include, for example, an image generation module 122.
The image generation module 122 may be implemented as a decoder, see the decoder 1221 in fig. 2. The decoder 1221 may be, for example, the decoder of a generative adversarial network (GAN), such as a style-decoupled GAN model that enables linear interpolation by semantically decoupling the style of an image sequence. The decoder 1221 may be configured to perform image reconstruction, i.e., the process of decoding an encoded image sequence to obtain an image.
Illustratively, the image information processing component 110, for example its image processing model 111, may input the obtained information of the N initial design images into the decoder 1221, and the decoder 1221 generates the M industrial design images based on that information.
As mentioned above, the image generation component 120 generates the M industrial design images in one of two ways. In the first, it reconstructs the N initial design images from the n second image sequences of each and uses them as N industrial design images. In the second, it generalizes over the n second image sequences of the N initial design images to obtain information of more (e.g., M) initial design images and then reconstructs M industrial design images from that information. For the second way, the generalization can be achieved by the decoder 1221 in the image generation component 120.
For example, the decoder 1221 may linearly interpolate the information of the N initial design images to obtain information of M initial design images. As described above, the information of each initial design image consists of n second image sequences corresponding to the n feature spaces; the decoder 1221 may linearly interpolate, space by space, between the codewords of the n second image sequences of any two of the N initial design images, thereby generalizing to information of more (e.g., M) initial design images. The decoder 1221 then generates the M industrial design images from that information; this generation is the decoder's decoding of the M pieces of initial design image information, realizing the reconstruction from image sequence to image.
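A minimal sketch of this generalization by linear interpolation follows. Discrete codeword indices cannot be averaged directly, so the sketch interpolates the codeword embeddings and re-quantizes to the nearest codeword; the re-quantization step is an assumption for illustration, as the patent only states that codewords in the same feature space are linearly interpolated:

import numpy as np

def interpolate_designs(seq_a, seq_b, codebooks, alphas=(0.25, 0.5, 0.75)):
    # seq_a, seq_b: n second image sequences of two initial design images
    results = []
    for alpha in alphas:
        mixed = []
        for i, (a, b) in enumerate(zip(seq_a, seq_b)):    # same feature space
            va, vb = codebooks[i][a], codebooks[i][b]     # codeword embeddings
            v = (1 - alpha) * va + alpha * vb             # linear interpolation
            d = ((v[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(-1)
            mixed.append(d.argmin(axis=1))                # back to index numbers
        results.append(mixed)     # information of one extra initial design image
    return results                # contributes to the M > N generalization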
Based on the above examples, the image generation component 120 may be implemented as an image generation model whose structure includes, but is not limited to, the encoder 1211, the product quantization module 1212, the decoder 1221, and the discriminator 1222; a network in which an encoder and a decoder are deployed in this way is commonly called a generative adversarial network. The image generation model may be obtained through adversarial training, during which the n codebooks are trained synchronously, so the model can be regarded as a neural network model trained together with the n codebooks. Through the encoder 1211 and the product quantization module 1212, a two-dimensional image can be compressed by the n codebooks into n one-dimensional image sequences; through the decoder 1221, the n image sequences corresponding to an initial design image can be restored as faithfully as possible to the original two-dimensional image. For example, an original RGB image I of resolution H × W (H the image height, W the image width) is first processed by the encoder 1211 into image features of size (H/f) × (W/f) × C, downsampled by a factor f, where C is the feature dimension; the product quantization module 1212 then turns these into n discrete sequences of length (H/f) × (W/f). At the same resolution, n discrete sequences over n feature spaces have stronger expressive power, i.e., greater realism and richer detail. Finally, the decoder 1221 restores the image to an image J of resolution H × W or higher. The encoder 1211 is a convolutional neural network composed of residual networks and downsampling modules; the decoder 1221 has a similar structure, with the downsampling modules replaced by upsampling modules. The product quantization module internally holds n code dictionaries, i.e., the codebooks, and replaces each input feature with the closest codeword in the codebook.
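As a structural illustration, a minimal PyTorch-flavoured sketch of such an encoder (residual blocks plus strided-convolution downsampling) follows; the layer sizes are assumptions matching the running example, not the patented configuration:

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)              # residual connection

def make_encoder(c=1024):
    layers, ch = [nn.Conv2d(3, 64, 3, padding=1)], 64
    for _ in range(5):                       # 2**5 = 32x downsampling: 256 -> 8
        layers += [ResBlock(ch), nn.Conv2d(ch, min(ch * 2, c), 4, 2, 1)]
        ch = min(ch * 2, c)
    return nn.Sequential(*layers)            # (B, 3, 256, 256) -> (B, 1024, 8, 8)

The decoder mirrors this structure, replacing the strided convolutions with transposed convolutions (or upsampling modules) to restore the H × W resolution.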
From the perspective of the model training principle, the image generation model consists of a generative model and a discriminative model. During training, the generative model mainly learns the real image distribution (for example, the distribution of the first image samples) so that the images it generates become realistic enough that the discriminative model cannot tell whether the data is real. The discriminative model judges whether a received image comes from the real world, i.e., outputs the probability that the data is a real image rather than a generated one, and feeds the training loss back to the generative model, improving the latter's ability to generate near-real images. The whole process can be regarded as a game between the two models: through continual alternating iteration they eventually reach a dynamic balance in which the discriminative model cannot decide whether data given by the generative model is real, its output probability hovering around 0.5, akin to a random guess.
In an optional embodiment, the training of the image serialization model may be completed in a single stage, during which the generative model and the discriminative model are directly trained against each other to obtain the image generation model and the n codebooks. Here, adversarial training refers to the mutually balancing process in which the generative model generates observation images as close as possible to the input first image samples so as to deceive the discriminative model, while the discriminative model tries to separate the generated observation images from the first image samples.
The image processing model 111 may be obtained by further model training based on the n codebooks trained synchronously during image generation model training. Accordingly, this embodiment obtains the image generation model and the image processing model 111 through the following two training stages.
First stage: a plurality of first image samples are input into a first initial network model, which is trained adversarially to obtain the image generation model and the n codebooks, as shown in fig. 3. The first initial network model is the initial image generation model and comprises an initial encoder, an initial product quantization module, and an initial decoder. The initial encoder extracts features from a first image sample to obtain image features; the initial n codebooks discretize the image features in n different feature spaces, yielding n image sequences, each containing the index numbers of the codewords to which the discretized image features are assigned; and the initial decoder decodes the n image sequences to obtain a reconstructed image. A first-stage loss function is then computed from the difference between the reconstructed image and the first image sample, for example the residual between the two images; if the loss does not meet the requirement, iteration continues until it satisfies the set requirement, yielding a high-quality image generation model and the n codebooks.
Because the n codebooks need to be general-purpose, cross-domain first image samples may be collected when training the n codebooks and the image generation model; that is, the codebooks are obtained by model training on first image samples spanning multiple domains.
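A minimal PyTorch-flavoured sketch of one first-stage training step (reconstruction loss plus adversarial training) follows; module internals are elided, all names are hypothetical, and the real first stage also updates the n codebooks jointly:

import torch
import torch.nn.functional as F

def first_stage_step(encoder, pq, decoder, discriminator, opt_g, opt_d, x):
    # generator side: reconstruct the first image sample x
    z = pq(encoder(x))                 # n discrete sequences (straight-through)
    x_rec = decoder(z)
    fake_logits = discriminator(x_rec)
    loss_g = F.l1_loss(x_rec, x) + F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))   # fool the discriminator
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # discriminator side: separate real samples from reconstructions
    real = discriminator(x)
    fake = discriminator(x_rec.detach())
    loss_d = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
             F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_d.item()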
Second stage: a plurality of sample groups are input into a second initial network model, which is trained based on the n trained codebooks and the sample groups to obtain the image processing model, as shown in fig. 3. The second initial network model is the initial image processing model. Each of the sample groups may include at least one of a base pattern image sample, a detail pattern image sample, and a detail description text. For example, during training of the image processing model, the trained image generation model and the text sequence generation module may be used together to convert the samples in each group into one-dimensional text sequences or image sequences before they are input into the image processing model. The training of the image processing model may, for example, be unsupervised; alternatively, the image processing model may be trained together with the image generation model obtained in the first stage, for example through adversarial training with the decoder 1221 and the discriminator 1222 of the image generation model until the loss function meets the set requirement, yielding the image processing model 111.
Training the image processing model requires collecting multiple sample groups, which may come from the same domain or from different domains. When the sample groups come from the same domain, training precision improves, i.e., the resulting image processing model has stronger processing capability for that domain; for example, when applied to the garment design industry, all the sample groups may relate to garment design. When the sample groups come from different domains, the generality of the image processing model improves. This application does not limit the choice.
In addition to the image generation system, an embodiment of the present application also provides an image generation method, which can produce industrial design images designed around detail patterns in various manufacturing industries. As shown in fig. 4, the method 200 includes:
s210, acquiring at least one original image, wherein the at least one original image comprises a detail pattern image and/or a basic pattern image, the detail pattern image is used for reflecting the detail pattern requirement and the detail editing area of the industrial design image, and the basic pattern image is used for reflecting the basic pattern requirement and the detail editing area of the industrial design image;
s220, for each original image in the at least one original image, converting the original image into information of the original image, where the information of the original image includes n first image sequences corresponding to n feature spaces, respectively, and one first image sequence is used to represent the original image in the feature space corresponding to the first image sequence;
s230, inputting the information of the at least one original image into an image processing model, and generating N pieces of information of the initial design image through code word processing;
s240, generating M industrial design images according to the information of the N initial design images;
wherein N is a positive integer, M is an integer greater than or equal to N, and N is an integer greater than 1.
In some embodiments, converting the original image into information of the original image includes: inputting the original image into an encoder, which encodes it to obtain the image features of the original image; and representing, by a product quantization module and based on n pre-trained codebooks, the image features as n first image sequences corresponding to the n feature spaces, where a codebook is a discretized, text-like representation of an image sequence.
In some embodiments, the at least one original image comprises the base pattern image, and inputting the information of the at least one original image into the image processing model and generating the information of the N initial design images through codeword processing includes: inputting the information of the base pattern image into the image processing model and performing N replacements of the detail elements in the n first image sequences of the base pattern image based on the n codebooks to obtain the information of the N initial design images, where the detail elements in the n first image sequences represent the detail editing region of the base pattern image in the n feature spaces.
In some embodiments, this includes: for one initial design image of the N, masking, by the image processing model, the detail elements in the n first image sequences; predicting the codewords of the first detail image corresponding to the detail elements based on the n pre-trained codebooks; and replacing the mask of the detail elements with the codewords of the first detail image to obtain the information of that initial design image.
In some embodiments, the at least one original image further comprises the detail pattern image, and the method further comprises: inputting the information of the detail pattern image into the image processing model and predicting, based on the n codebooks, the codewords of N second detail images corresponding to the N initial design images. Obtaining the information of the N initial design images then includes: masking the detail elements in the n first image sequences of the base pattern image and replacing those masks N times with the codewords of the N second detail images.
In some embodiments, the method further comprises: acquiring a detail description text that describes the detail pattern requirement and/or the detail editing region of the industrial design image; converting the detail description text into a text sequence; and inputting the text sequence into the image processing model and predicting, based on the n codebooks, the codewords of N third detail images corresponding to the N initial design images. Obtaining the information of the N initial design images then includes: masking the detail elements in the n first image sequences of the base pattern image and replacing those masks N times with the codewords of the N third detail images.
In some embodiments, the method further comprises: determining, from the n second image sequences corresponding to an initial design image, target elements whose confidence is below a threshold; masking the target elements, and predicting, based on the n pre-trained codebooks, the codewords of a fourth detail image corresponding to the target elements; replacing the masks of the target elements with the codewords of the fourth detail image to obtain n updated second image sequences; and iterating this process until the confidence of every element in the n second image sequences is equal to or above the threshold.
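This iterative refinement could look like the sketch below; the confidence measure and the max_iters safety guard are assumptions beyond the patent, which only requires re-predicting low-confidence elements until all meet the threshold:

import torch

def refine(seqs, model, mask_id, threshold, max_iters=10):
    """Iteratively re-mask and re-predict low-confidence elements of the
    n second image sequences until every element meets the threshold."""
    for _ in range(max_iters):
        probs = model(seqs).softmax(dim=-1)                      # (n, L, vocab_size)
        conf = probs.gather(-1, seqs.unsqueeze(-1)).squeeze(-1)  # confidence of current codewords
        low = conf < threshold                                   # the target elements
        if not low.any():
            break                                                # all elements meet the threshold
        masked = seqs.masked_fill(low, mask_id)                  # mask the target elements
        pred = model(masked).argmax(dim=-1)                      # re-predict their codewords
        seqs = torch.where(low, pred, seqs)
    return seqs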
In some embodiments, generating the M industrial design images from the information of the N initial design images comprises: inputting the information of the N initial design images into a decoder, and generating the M industrial design images in the decoder based on that information.
In some embodiments, generating the M industrial design images in the decoder comprises: performing linear interpolation on the information of the N initial design images through the decoder to obtain information of M initial design images, and generating the M industrial design images based on the information of the M initial design images.
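One plausible reading of the N-to-M expansion is pairwise linear interpolation of the decoder-side representations. The sketch below is an assumption (the patent does not disclose where interpolation points are placed) and assumes N >= 2 with (M - 1) divisible by (N - 1):

import torch

def interpolate_designs(z, m):
    """Expand the representations of N initial design images to M >= N by
    linear interpolation between consecutive pairs.

    z: (N, ...) decoder-side representations of the N initial design images
    """
    n = z.shape[0]
    k = (m - 1) // (n - 1)                       # interpolation points per consecutive pair
    ts = torch.linspace(0, 1, steps=k + 1)[:-1]  # include the start, exclude the end
    out = [(1 - t) * z[i] + t * z[i + 1] for i in range(n - 1) for t in ts]
    out.append(z[-1])                            # close with the final endpoint
    return torch.stack(out)                      # (M, ...)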
In some embodiments, the method further comprises: acquiring a plurality of cross-domain first image samples; and inputting the plurality of first image samples into a first initial network model and performing adversarial training on it to obtain the image generation model and the n codebooks, wherein the first initial network model consists of a generative adversarial network and a product quantization module.
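The product quantization module is what yields the n parallel codeword sequences, one per feature space. A minimal sketch of the quantization step (the even feature split and all names are assumptions):

import torch

def product_quantize(features, codebooks):
    """Split encoder features into n sub-vectors and snap each to its own
    codebook, yielding the n first image sequences.

    features:  (L, D) flattened image features, D assumed divisible by n
    codebooks: list of n tensors, each of shape (K, D // n)
    """
    chunks = features.chunk(len(codebooks), dim=-1)  # one (L, D/n) chunk per feature space
    seqs = [torch.cdist(sub, cb).argmin(dim=-1)      # nearest codeword index per position
            for sub, cb in zip(chunks, codebooks)]
    return torch.stack(seqs)                         # (n, L) codeword indices

During the adversarial training, the codebooks would presumably be learned jointly with the encoder and decoder, for example via a straight-through gradient estimator as in VQ-style models.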
In some embodiments, the method further comprises: collecting a plurality of sample groups; and inputting the plurality of sample groups into a second initial network model and training it based on the n codebooks and the plurality of sample groups to obtain the image processing model, which is used to generate the information of the N initial design images.
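Training the second model plausibly amounts to masked-codeword prediction over the sample groups; the step below is a hedged sketch, with the masking ratio, the loss, and the model signature all being assumptions:

import torch
import torch.nn.functional as F

def train_step(model, optimizer, seqs, mask_id, mask_ratio=0.3):
    """One hypothetical training step: randomly mask codewords in a sample
    group and train the model to recover them."""
    mask = torch.rand(seqs.shape, device=seqs.device) < mask_ratio
    logits = model(seqs.masked_fill(mask, mask_id))   # (n, L, vocab_size)
    loss = F.cross_entropy(logits[mask], seqs[mask])  # supervise masked positions only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()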
According to the image generation method provided by the embodiments of the present application, the detail pattern requirement and the detail editing area are automatically identified from the acquired original images; information of a plurality of initial design images that meet the pattern requirements (for example, the basic pattern requirement and the detail pattern requirement) or target the detail editing area is generated based on the information of the original images; and a plurality of industrial design images are generated based on the information of the plurality of initial design images. For the industrial design of product details, this improves the output efficiency of industrial design images.
It should be noted that the steps of the methods provided in the above embodiments may all be executed by the same device, or by different devices. For example, steps 210 to 230 may be executed by device A; alternatively, steps 210, 220, and 240 may be executed by device A and step 230 by device B; and so on.
Fig. 5 is a schematic structural diagram of an electronic device 300 according to an embodiment of the present application; fig. 5 is described as an example only, and the present application is not limited thereto. The electronic device 300 shown in fig. 5 may be a terminal device or a server; when deployed in the cloud as a server, the electronic device 300 may be implemented as a cloud server. The electronic device 300 includes a processor 310, which may call and run a computer program from a memory to implement the methods in the embodiments of the present application.
Optionally, as shown in fig. 5, the electronic device 300 may further include a memory 330, from which the processor 310 may call and run a computer program to implement the methods in the embodiments of the present application.
The memory 330 may be a separate device from the processor 310, or may be integrated in the processor 310.
Optionally, as shown in fig. 5, the electronic device 300 may further include a transceiver 320, and the processor 310 may control the transceiver 320 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices.
The transceiver 320 may include a transmitter and a receiver, and may further include one or more antennas.
Optionally, the electronic device 300 may implement the corresponding processes of the image generation system in the methods of the embodiments of the present application; for brevity, details are not repeated here.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It will be appreciated that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. The volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). The memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiments of the present application also provide a computer-readable storage medium for storing a computer program.
Optionally, the computer-readable storage medium may be applied to the electronic device in the embodiments of the present application, and the computer program causes the computer to execute the corresponding processes executed by the image generation system in the methods of the embodiments of the present application; for brevity, details are not repeated here.
The embodiments of the present application also provide a computer program product comprising computer program instructions.
Optionally, the computer program product may be applied to the electronic device in the embodiments of the present application, and the computer program instructions cause the computer to execute the corresponding processes executed by the image generation system in the methods of the embodiments of the present application; for brevity, details are not repeated here.
The embodiments of the present application also provide a computer program.
Optionally, the computer program may be applied to the electronic device in the embodiments of the present application; when the computer program runs on a computer, it causes the computer to execute the corresponding processes executed by the image generation system in the methods of the embodiments of the present application; for brevity, details are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. An image generation system, comprising:
the image generation component is used for acquiring at least one original image, wherein the at least one original image comprises a detail pattern image and/or a basic pattern image, the detail pattern image is used for reflecting the detail pattern requirement and the detail editing area of an industrial design image, the basic pattern image is used for reflecting the basic pattern requirement and the detail editing area of the industrial design image, and for each original image in the at least one original image, the original image is converted into information of the original image, the information of the original image comprises n first image sequences respectively corresponding to n feature spaces, and one first image sequence is used for representing the original image in the feature space corresponding to the first image sequence;
the image information processing component is used for inputting the information of the at least one original image into an image processing model, and generating information of N initial design images through codeword processing;
the image generation component is further used for generating M industrial design images according to the information of the N initial design images;
wherein N is a positive integer, M is an integer greater than or equal to N, and n is an integer greater than 1.
2. The system according to claim 1, wherein the image generation component converting the original image into the information of the original image specifically comprises:
inputting the original image into an encoder, and encoding the original image to obtain the image characteristics of the original image;
and representing, by a product quantization module and based on n pre-trained codebooks, the image characteristics of the original image as n first image sequences respectively corresponding to the n feature spaces, wherein a codebook is a discretized text representation of an image sequence.
3. The system of claim 2, wherein the at least one original image comprises the basic pattern image; the image information processing component inputting the information of the at least one original image into an image processing model and generating information of N initial design images through codeword processing specifically comprises:
inputting the information of the basic pattern image into the image processing model, and performing N replacements of the detail elements in the n first image sequences corresponding to the basic pattern image based on the n codebooks, to obtain the information of the N initial design images;
wherein the detail elements in the n first image sequences are used to represent the detail editing area of the basic pattern image in the n feature spaces.
4. The system according to claim 3, wherein the image information processing component inputting the information of the basic pattern image into the image processing model and performing N replacements of the detail elements in the n first image sequences corresponding to the basic pattern image based on the n codebooks to obtain the information of the N initial design images specifically comprises:
for one of the N initial design images, the image information processing component masks the detail elements in the n first image sequences through the image processing model, and predicts, based on the n pre-trained codebooks, the codewords of a first detail image corresponding to the detail elements;
and replaces the masks of the detail elements with the codewords of the first detail image to obtain the information of the initial design image.
5. The system according to claim 3 or 4, wherein the at least one original image further comprises the detail pattern image; the image information processing component is further configured to:
input the information of the detail pattern image into the image processing model, and predict, based on the n codebooks, the codewords of N second detail images corresponding to the N initial design images respectively;
and the image information processing component inputting the information of the basic pattern image into the image processing model and performing the N replacements based on the n codebooks specifically comprises:
masking the detail elements in the n first image sequences corresponding to the basic pattern image, and replacing the masks of the detail elements N times according to the codewords of the N second detail images, to obtain the information of the N initial design images.
6. The system of claim 3 or 4, further comprising a text sequence generation module;
the text sequence generation module is used for acquiring a detail description text that describes the detail pattern requirement and/or the detail editing area of the industrial design image, and converting the detail description text into a text sequence;
the image information processing component is further configured to input the text sequence into the image processing model, and predict, based on the n codebooks, the codewords of N third detail images corresponding to the N initial design images respectively;
and the image information processing component inputting the information of the basic pattern image into the image processing model and performing the N replacements based on the n codebooks specifically comprises:
masking the detail elements in the n first image sequences corresponding to the basic pattern image, and replacing the masks of the detail elements N times according to the codewords of the N third detail images, to obtain the information of the N initial design images.
7. The system of any of claims 1 to 4, wherein the image information processing component is further configured to:
determine, from the n second image sequences corresponding to the initial design image, target elements whose confidence is below a threshold; mask the target elements, and predict, based on the n pre-trained codebooks, the codewords of a fourth detail image corresponding to the target elements; replace the masks of the target elements with the codewords of the fourth detail image to obtain n updated second image sequences; and iterate over the n second image sequences until the confidence of each element in the n second image sequences is equal to or above the threshold.
8. The system according to any one of claims 1 to 4, wherein the image generation component generating the M industrial design images from the information of the N initial design images specifically comprises:
inputting the information of the N initial design images into a decoder, and generating the M industrial design images in the decoder based on the information of the N initial design images.
9. The system of claim 8, wherein the image generation component generating, in the decoder, the M industrial design images based on the information of the N initial design images specifically comprises:
performing linear interpolation on the information of the N initial design images through the decoder to obtain information of M initial design images;
generating the M industrial design images based on information of the M initial design images.
10. An image generation method, comprising:
acquiring at least one original image, wherein the at least one original image comprises a detail pattern image and/or a basic pattern image, the detail pattern image is used for reflecting the detail pattern requirement and the detail editing area of the industrial design image, and the basic pattern image is used for reflecting the basic pattern requirement and the detail editing area of the industrial design image;
for each original image in the at least one original image, converting the original image into information of the original image, where the information of the original image includes n first image sequences corresponding to n feature spaces, respectively, and one first image sequence is used to represent the original image in the feature space corresponding to the first image sequence;
inputting the information of the at least one original image into an image processing model, and generating information of N initial design images through codeword processing;
generating M industrial design images according to the information of the N initial design images;
wherein N is a positive integer, M is an integer greater than or equal to N, and n is an integer greater than 1.
11. The method of claim 10, wherein converting the original image into the information of the original image comprises:
inputting the original image into an encoder, and encoding the original image to obtain the image characteristics of the original image;
and representing, by a product quantization module and based on n pre-trained codebooks, the image characteristics of the original image as n first image sequences respectively corresponding to the n feature spaces, wherein a codebook is a discretized text representation of an image sequence.
12. The method according to claim 11, wherein the at least one original image comprises the basic pattern image; and inputting the information of the at least one original image into an image processing model and generating information of N initial design images through codeword processing comprises:
inputting the information of the basic pattern image into the image processing model, and performing N replacements of the detail elements in the n first image sequences corresponding to the basic pattern image based on the n codebooks, to obtain the information of the N initial design images;
wherein the detail elements in the n first image sequences are used to represent the detail editing area of the basic pattern image in the n feature spaces.
13. The method according to claim 12, wherein inputting the information of the basic pattern image into the image processing model and performing N replacements of the detail elements in the n first image sequences corresponding to the basic pattern image based on the n codebooks to obtain the information of the N initial design images comprises:
for one of the N initial design images, masking, through the image processing model, the detail elements in the n first image sequences, and predicting, based on the n pre-trained codebooks, the codewords of a first detail image corresponding to the detail elements;
and replacing the masks of the detail elements with the codewords of the first detail image to obtain the information of the initial design image.
14. The method according to claim 12 or 13, wherein the at least one original image further comprises the detail pattern image; and the method further comprises:
inputting the information of the detail pattern image into the image processing model, and predicting, based on the n codebooks, the codewords of N second detail images corresponding to the N initial design images respectively;
wherein inputting the information of the basic pattern image into the image processing model and performing the N replacements based on the n codebooks to obtain the information of the N initial design images comprises:
masking the detail elements in the n first image sequences corresponding to the basic pattern image, and replacing the masks of the detail elements N times according to the codewords of the N second detail images, to obtain the information of the N initial design images.
CN202210127000.2A 2022-02-11 2022-02-11 Image generation system and method Active CN114169255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127000.2A CN114169255B (en) 2022-02-11 2022-02-11 Image generation system and method


Publications (2)

Publication Number Publication Date
CN114169255A true CN114169255A (en) 2022-03-11
CN114169255B CN114169255B (en) 2022-05-13

Family

ID=80489665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127000.2A Active CN114169255B (en) 2022-02-11 2022-02-11 Image generation system and method

Country Status (1)

Country Link
CN (1) CN114169255B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5748780A (en) * 1994-04-07 1998-05-05 Stolfo; Salvatore J. Method and apparatus for imaging, image processing and data compression
CN110892424A (en) * 2017-05-23 2020-03-17 Intel Corporation Method and apparatus for discriminative semantic transfer and physical heuristic optimization of features in deep learning
CN110363830A (en) * 2018-04-10 2019-10-22 Alibaba Group Holding Ltd Element image generation method, apparatus and system
US20200160612A1 (en) * 2018-11-21 2020-05-21 Best Apps, LLC Computer aided systems and methods for creating custom products
CN109754377A (en) * 2018-12-29 2019-05-14 Chongqing University of Posts and Telecommunications Multi-exposure image fusion method
CN113836787A (en) * 2021-08-13 2021-12-24 Nanjing University of Aeronautics and Astronautics Supersonic air inlet flow state monitoring method based on discriminative feature learning
CN113449135A (en) * 2021-08-31 2021-09-28 Alibaba Damo Institute Hangzhou Technology Co Ltd Image generation system and method
CN113989420A (en) * 2021-09-30 2022-01-28 Alibaba Cloud Computing (Beijing) Co., Ltd. Image generation system and method
CN113888532A (en) * 2021-11-09 2022-01-04 Infervision Medical Technology Co., Ltd. Medical image analysis method and device based on plain CT data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Hoda Mohammadzade et al.: "Dynamic Time Warping-Based Features with Class-Specific Joint Importance Maps for Action Recognition Using Kinect Depth Sensor", IEEE Sensors Journal *
Xiao Sun et al.: "Scene Categorization Using Deeply Learned Gaze Shifting Kernel", IEEE Transactions on Cybernetics *
Zhou Chang: "Research and Application of Topic-Based Image Caption Generation", China Master's Theses Full-text Database, Information Science and Technology *
Bai Xuefei et al.: "Multi-Expression Face Recognition Method Based on Neural Network Ensemble", Computer Engineering and Applications *
Guo Wenyi: "Research and Implementation of a Pose-Customizable Face Generation Method", China Master's Theses Full-text Database, Information Science and Technology *
Gu Qinting: "Research on Fabric Style Evaluation Based on Dynamic Image Sequences", China Master's Theses Full-text Database, Engineering Science and Technology I *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723843A (en) * 2022-06-01 2022-07-08 Guangdong Shidi Intelligent Technology Co., Ltd. Method, device, equipment and storage medium for generating virtual clothing through multi-modal fusion
CN114723843B (en) * 2022-06-01 2022-12-06 Guangdong Shidi Intelligent Technology Co., Ltd. Method, device, equipment and storage medium for generating virtual clothing through multi-modal fusion

Also Published As

Publication number Publication date
CN114169255B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
US11875268B2 (en) Object recognition with reduced neural network weight precision
CN109840531B (en) Method and device for training multi-label classification model
CN113449135B (en) Image generation system and method
US11531889B2 (en) Weight data storage method and neural network processor based on the method
CN115204183B (en) Knowledge enhancement-based two-channel emotion analysis method, device and equipment
Räsänen et al. Modeling dependencies in multiple parallel data streams with hyperdimensional computing
CN109857893A (en) Picture retrieval method, device, computer equipment and storage medium
CN114169255B (en) Image generation system and method
CN113781164B (en) Virtual fitting model training method, virtual fitting method and related devices
CN111598111A (en) Three-dimensional model generation method and device, computer equipment and storage medium
US20220335685A1 (en) Method and apparatus for point cloud completion, network training method and apparatus, device, and storage medium
CN110688897A (en) Pedestrian re-identification method and device based on joint judgment and generation learning
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN114418919B (en) Image fusion method and device, electronic equipment and storage medium
CN114071141A (en) Image processing method and equipment
CN114491125A (en) Cross-modal figure clothing design generation method based on multi-modal codebook
Liu et al. Joint graph learning and matching for semantic feature correspondence
Sento Image compression with auto-encoder algorithm using deep neural network (DNN)
CN114998583A (en) Image processing method, image processing apparatus, device, and storage medium
Bulat et al. Matrix and tensor decompositions for training binary neural networks
CN112990154B (en) Data processing method, computer equipment and readable storage medium
WO2021245072A1 (en) A method for a distributed learning
CN117292020A (en) Image generation method, device, electronic equipment and storage medium
WO2023071805A1 (en) Motion animation generation method and apparatus, computer device, storage medium, computer program, and computer program product
CN115827878A (en) Statement emotion analysis method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant