CN117649477B - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium

Info

Publication number
CN117649477B
Authority
CN
China
Prior art keywords
image
sub
rendering
initial
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410125653.6A
Other languages
Chinese (zh)
Other versions
CN117649477A (en)
Inventor
杨伟东
何俊烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202410125653.6A priority Critical patent/CN117649477B/en
Publication of CN117649477A publication Critical patent/CN117649477A/en
Application granted granted Critical
Publication of CN117649477B publication Critical patent/CN117649477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses an image processing method, an image processing device, image processing equipment and a storage medium, which can be applied to the field of artificial intelligence. The method comprises the following steps: determining at least one sub-image of the image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises one image element of the image to be processed; inputting the sub-image and corresponding rendering prompt information into a generating model for each sub-image to obtain an initial image after rendering the sub-image; according to the sub-image, carrying out pixel adjustment on the corresponding initial image to obtain a target image after rendering the sub-image; and generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image. By adopting the embodiment of the application, the image rendering effect can be improved, and the applicability is high.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method, apparatus, device, and storage medium.
Background
With the development of computer technology, image rendering has become one of the key technologies in a number of fields including animation.
In the prior art, a unified rendering mode is often adopted to render the entire image to be processed, so that the rendered image elements interfere with each other's styles (for example, a male character may be feminized), and not every image element in the image to be processed can be rendered well. Therefore, how to improve the rendering effect of the image is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, image processing equipment and a storage medium, which can improve the image rendering effect and have high applicability.
In one aspect, an embodiment of the present application provides an image processing method, including:
determining at least one sub-image of the image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises an image element of the image to be processed;
Inputting the sub-image and corresponding rendering prompt information into a generated model for each sub-image to obtain an initial image after rendering the sub-image; according to the sub-image, carrying out pixel adjustment on the corresponding initial image to obtain a target image after rendering the sub-image;
and generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
In another aspect, an embodiment of the present application provides an image processing apparatus, including:
the image determining module is used for determining at least one sub-image of the image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises an image element of the image to be processed;
The image rendering module is used for inputting the sub-image and corresponding rendering prompt information into a generating model for each sub-image to obtain an initial image after the sub-image is rendered; according to the sub-image, carrying out pixel adjustment on the corresponding initial image to obtain a target image after rendering the sub-image;
And the image fusion module is used for generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;
The memory is used for storing a computer program;
the processor is used for executing the image processing method provided by the embodiment of the application when the computer program is called.
In another aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program that is executed by a processor to implement the image processing method provided by the embodiment of the present application.
In another aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a computer program, where the computer program implements the image processing method provided by the embodiment of the present application when the computer program is executed by a processor.
In the embodiment of the application, when the image to be processed is rendered, independent rendering prompt information and the image element can be input into the generative model for each image element of the image to be processed, so that each image element can be rendered independently through the generative model to obtain an initial image. For each image element, the initial image obtained by this preliminary rendering can then be subjected to pixel adjustment according to the image element, so that the pixels of all the image elements blend better. Further, the target image obtained by rendering the image to be processed from the pixel-adjusted target images has a better rendering effect and high applicability.
Drawings
Fig. 1 is a schematic diagram of an image processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of an image processing method according to an embodiment of the present application;
FIG. 3a is a schematic diagram of an image to be processed according to an embodiment of the present application;
FIG. 3b is a schematic view of an example segmentation of an image provided by an embodiment of the present application;
FIG. 3c is a schematic view of a sub-image provided by an embodiment of the present application;
FIG. 3d is a schematic view of a scene of a determined sub-image according to an embodiment of the present application;
FIG. 3e is another schematic view of a determined sub-image provided by an embodiment of the present application;
FIG. 4 is another schematic illustration of a sub-image provided by an embodiment of the present application;
FIG. 5a is a schematic representation of a rendering style provided by an embodiment of the present application;
FIGS. 5 b-5 c are schematic diagrams illustrating a comparison of a first set of rendering results according to embodiments of the present application;
FIGS. 6a-6 b are diagrams illustrating a second set of rendering results according to embodiments of the present application;
FIGS. 6 c-6 d are schematic diagrams illustrating a third set of rendering results according to embodiments of the present application;
FIG. 7 is another flow chart of an image processing method according to an embodiment of the present application;
FIGS. 8 a-8 c are a fourth set of rendering results versus schematic diagrams provided by embodiments of the present application;
FIG. 9 is a schematic diagram of rendering results provided by an embodiment of the present application;
fig. 10 is a schematic structural view of an image processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The image processing method provided by the embodiment of the application can be applied to the field of artificial intelligence, and particularly, the rendering processing of the image is realized through the generated model.
Referring to fig. 1, fig. 1 is a schematic diagram of an image processing method according to an embodiment of the present application. As shown in fig. 1, for the image 11 to be processed, when image rendering of the image 11 to be processed is required, at least one sub-image 12 of the image 11 to be processed and rendering prompt information 13 of each sub-image may be determined first.
Wherein one sub-image 12 of each image 11 to be processed comprises one image element of the image 11 to be processed. For example, any one of the sub-images 12 may include a background element of the image 11 to be processed, a person, an animal, an object, a special effect, etc. of the image 11 to be processed, which may be specifically determined based on the actual application scene requirement, and is not limited herein.
Further, for each sub-image 12, the sub-image 12 and the corresponding rendering prompt information 13 may be input into the device 14, so that the device 14 may render the sub-image 12 according to the rendering prompt information 13 of the sub-image 12 to obtain a rendered initial image 15, and perform pixel adjustment on the corresponding initial image 15 according to the sub-image 12 to obtain a target image 16 after rendering the sub-image 12.
The device 14 runs a pre-trained generating model, that is, the device 14 renders the corresponding sub-image 12 according to the rendering prompt information 13 corresponding to each sub-image 12 based on the pre-trained generating model to obtain a rendered initial image 15.
The device 14 may be a server or a terminal, and may be specifically determined based on actual application scenario requirements, which is not limited herein.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and basic cloud computing services such as big data and artificial intelligence platforms.
The terminal can be a smart phone, a tablet personal computer, a notebook computer, a desktop computer, a smart sound box, a smart watch, a vehicle-mounted terminal, an aircraft, a smart home appliance (such as a smart television) or a wearable device.
Based on this, after obtaining the target image 16 after rendering each sub-image 12, the target image 17 after rendering the image to be processed may be generated from the target image 16 corresponding to each sub-image 12.
Referring to fig. 2, fig. 2 is a flow chart of an image processing method according to an embodiment of the application. As shown in fig. 2, the image processing method provided in the embodiment of the present application may specifically include:
s21, determining rendering prompt information of at least one sub-image of the image to be processed and each sub-image.
In some possible embodiments, the image to be processed comprises at least one image element.
The image elements in the embodiments of the present application include, but are not limited to, background elements, characters, objects, buildings, animals, plants, animation special effects, etc., which can be specifically determined based on actual application scene requirements, and are not limited herein.
For example, the image to be processed in the embodiment of the present application may be an image obtained through 3D-to-2D rendering (cel shading), an image synthesized based on any software, program or model, an original image photographed by a photographing device, or the like, which may be specifically determined based on actual application scene requirements, and is not limited herein.
3D-to-2D rendering (cel shading), namely 3D modeling combined with 2D-style rendering, is a special non-photorealistic rendering method. The technique analyzes flat colors and outlines on the basis of the appearance of a three-dimensional object, so that the object retains three-dimensional perspective while also presenting a two-dimensional effect, finally giving the 3D graphics a texture similar to 2D hand drawing.
As an example, the image to be processed in the embodiment of the present application may be a video frame image in a cartoon video, where each sub-image of the video frame image includes a background element in the video frame image, or includes an animated character in the video frame image, or an animated special effect, or an object, etc.
In some possible embodiments, when determining each sub-image of the image to be processed, at least one mask area in the image to be processed may be determined based on image instance segmentation, where each mask area corresponds to one image element in the image to be processed. All pixels within one mask area have the same pixel value, and the pixel values corresponding to any two mask areas are different.
Based on the method, a plurality of mask areas can be determined from the image to be processed, and then the image corresponding to each mask area can be determined as one sub-image of the image to be processed, or the images except for each mask area in the image to be processed can be determined as background elements and determined as one sub-image of the image to be processed.
Referring to fig. 3a, fig. 3a is a schematic diagram of an image to be processed according to an embodiment of the present application. Image example segmentation is performed on the image to be processed shown in fig. 3a, so as to obtain an image example segmentation schematic diagram shown in fig. 3 b. Two mask areas are shown in FIG. 3b, corresponding to the automobile element and the special effects element of FIG. 3a, respectively.
Based on this, the image element corresponding to each mask area in fig. 3b can be determined as one sub-image of the image to be processed, i.e., sub-image 1 and sub-image 2 in fig. 3c, respectively.
Further, the remainder of fig. 3a excluding sub-image 1 and sub-image 2 (i.e., the background element) may be determined as sub-image 3.
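The extraction of sub-images from the instance-segmentation result described above can be illustrated with a short sketch. The following Python snippet is a minimal illustration only and assumes the instance mask is a label array in which pixels outside every mask area carry the value 0; the function name is hypothetical.

import numpy as np

def split_into_sub_images(image: np.ndarray, instance_mask: np.ndarray):
    """Split an image to be processed into per-element sub-images.

    image is an H x W x 3 array; instance_mask is an H x W label array in
    which all pixels of one mask area share the same value and different
    mask areas carry different values (0 is assumed to mean "no mask area").
    """
    sub_images = []
    foreground = np.zeros(instance_mask.shape, dtype=bool)
    for label in np.unique(instance_mask):
        if label == 0:  # assumed label for pixels outside every mask area
            continue
        region = instance_mask == label
        foreground |= region
        sub = np.zeros_like(image)
        sub[region] = image[region]  # keep only this image element
        sub_images.append(sub)
    # The remainder outside all mask areas is kept as the background sub-image.
    background = np.zeros_like(image)
    background[~foreground] = image[~foreground]
    sub_images.append(background)
    return sub_images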
Alternatively, in the practical application of the cartoon scene, in the case that each element in the cartoon video has been subjected to 3D modeling, for each video image frame of the cartoon video, the sub-image in each video image frame may be directly determined based on the 3D modeling of each element.
Alternatively, when the image to be processed is a video frame image, some sub-images of the image to be processed may be determined based on the Track Anything technique. That is, through the Track Anything technique, the same image element can be located in a plurality of consecutive video frame images, and thus some of the sub-images in the image to be processed can be determined.
The Track Anything technique generally tracks the same object in a video using computer vision and machine learning techniques: the object to be tracked can be specified by a simple click or selection and is then tracked automatically throughout the video. Specifically, the Track Anything technique is typically based on video object tracking and segmentation algorithms and models, such as the SAM (Segment Anything Model) and VOS (Video Object Segmentation) models. These models use deep learning to identify and track target objects in the video and can handle a variety of complex scenes and dynamic changes.
Referring to fig. 3d, fig. 3d is a schematic view of a scene of determining a sub-image according to an embodiment of the present application. For example, a video frame sequence represents the running process of an athlete; the athlete in the video frame sequence can be tracked by the Track Anything technique to obtain a mask sequence for the athlete. The mask sequence includes a plurality of mask areas, each corresponding to the athlete in one of the video frame images in the video frame sequence. The mask area corresponding to the athlete in each video frame image can thus be determined rapidly, and the image content of each mask area in the corresponding video frame image is then determined to be a sub-image of that video frame image.
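As an illustration of how per-frame masks from such a tracking model could be turned into sub-images, a minimal sketch follows; load_tracker and tracker.track are hypothetical stand-ins for whatever Track Anything style interface (for example a SAM plus video object segmentation pipeline) is actually used, since the text does not fix an API.

import numpy as np

def sub_images_for_tracked_element(video_frames, first_frame_selection):
    tracker = load_tracker()  # hypothetical loader, not a real library call
    mask_sequence = tracker.track(video_frames, first_frame_selection)  # assumed interface
    sub_images = []
    for frame, mask in zip(video_frames, mask_sequence):
        region = mask.astype(bool)
        sub = np.zeros_like(frame)
        sub[region] = frame[region]  # image content of the mask area in this frame
        sub_images.append(sub)
    return sub_images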
Optionally, when each sub-image of the image to be processed is determined, in the case that the image to be processed is a video frame image, since the background element of the image to be processed is often unchanged in a plurality of continuous video frame images, when rendering a plurality of continuous video frame images including the image to be processed, only the background image needs to be rendered once, thereby avoiding random jitter of the picture caused by randomness of the generated model.
In this case, a first video frame sequence may be determined, the first video frame sequence including the image to be processed, and the background element of each video frame image in the first video frame sequence being the same. That is, in the case where the image to be processed is a video frame image, a video frame sequence that includes the image to be processed and whose video frame images share the same background element is intercepted as the first video frame sequence.
Further, the background elements of each video frame image in the first video frame sequence are fused to obtain a background image corresponding to the first video frame sequence, and the background image is further determined to be a sub-image, including the background elements, of each video frame image in the first video frame sequence.
The background element of each video frame image in the first video frame sequence can be determined through a background recognition model, and the area except for each mask area in each video frame image can be determined to be the background element through an image instance segmentation mode.
As an example, referring to fig. 3e, fig. 3e is a schematic view of a scene of determining a sub-image according to an embodiment of the present application. As shown in fig. 3e, if the video frame sequence in which the image to be processed shown in fig. 3a is located includes a video frame image 1, a video frame image 2, and a video frame image 3 (i.e., the first video frame sequence), the background element 1 of the video frame image 1 is obtained by performing image instance segmentation and foreground element removal on the video frame image 1, the background element 2 of the video frame image 2 is obtained by performing image instance segmentation and foreground element removal on the video frame image 2, and the background element 3 of the video frame image 3 is obtained by performing image instance segmentation and foreground element removal on the video frame image 3.
Further, the background element 1, the background element 2 and the background element 3 are fused to obtain the background element corresponding to each video frame image in the first video frame sequence, that is, a sub-image that corresponds to each video frame image and includes the background element is obtained; for the image to be processed shown in fig. 3a, this is its sub-image including the background element.
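One possible way to fuse the per-frame background elements into a single background sub-image is sketched below; the text does not specify the fusion operator, so a per-pixel mean over the frames in which each pixel is visible is used here purely as an assumption.

import numpy as np

def fuse_background(background_frames, hole_masks):
    """Fuse the background elements of the frames in a first video frame sequence.

    background_frames: list of H x W x 3 frames with foreground elements removed.
    hole_masks: for each frame, a boolean H x W mask of the pixels left empty by
    that removal. A per-pixel mean over the frames in which the pixel is visible
    is used as one plausible fusion rule (assumption).
    """
    stack = np.stack(background_frames).astype(np.float64)   # T x H x W x 3
    valid = (~np.stack(hole_masks))[..., None]                # T x H x W x 1
    counts = np.maximum(valid.sum(axis=0), 1)                 # avoid division by zero
    fused = (stack * valid).sum(axis=0) / counts
    return fused.astype(np.uint8)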
If a 3D model exists for the background element corresponding to the image to be processed, the background image that does not include any foreground element can be derived directly with the same shader parameters and used as the sub-image including the background element of the image to be processed.
Wherein a shader is a simple procedure that describes the characteristics of vertices or pixels. The vertex shader describes the attributes of the vertex (location, texture coordinates, color, etc.), while the pixel shader describes the characteristics of the pixel (color, z-depth, and alpha values).
In some possible embodiments, for each sub-image, the rendering prompt information corresponding to the sub-image is instruction information indicating that the generative model renders the sub-image.
Alternatively, the rendering hint information of each sub-image may include related information for describing the image elements of the sub-image.
The relevant information for objectively describing the image elements of the sub-image includes, but is not limited to, color features, texture features, shape features, category features, action features, style features, name information, size features, wearing features of each part, direction features, etc. of the image elements, which may be specifically determined based on actual application scene requirements, and is not limited herein.
Wherein, for each sub-image, when determining the related information for describing the image element of the sub-image, the sub-image may be input into a tag inference model (such as a DeepDanbooru model) to obtain the image tags of the sub-image, each image tag being used for describing one image feature of the image element of the sub-image.
Further, each image tag may be determined as related information for describing the image element of the sub-image.
As an example, referring to fig. 4, fig. 4 is another schematic diagram of a sub-image provided by an embodiment of the present application. Assuming that the person in fig. 4 has red hair and wears a yellow jacket, the related information for describing the image element of the sub-image may be 1girl (one girl), solo (single person), long hair, shorts, grey background, red hair, navel (exposed navel), simple background, jacket, standing, full body (whole body), midriff (abdomen), jewelry, short shorts, looking at viewer (looking at the viewer), yellow jacket, choker (choker necklace).
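A minimal sketch of producing such descriptive tags with a Danbooru-style tagger is shown below; tagger.predict and the score threshold are assumptions, since the exact tag-inference interface is not specified by the text.

def describe_image_element(sub_image, tagger, score_threshold=0.5):
    """Build the related information describing the image element of a sub-image.

    tagger stands in for a Danbooru-style tag inference model (e.g. DeepDanbooru);
    tagger.predict is assumed to return a {tag: confidence} mapping and is not
    the real API of any particular library.
    """
    scores = tagger.predict(sub_image)
    tags = [tag for tag, score in scores.items() if score >= score_threshold]
    # Each retained tag describes one image feature of the element,
    # e.g. "1girl", "red_hair", "yellow_jacket".
    return tags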
Optionally, the rendering prompt information of each sub-image may include related information for prompting the rendering feature of the initial image corresponding to the sub-image.
The initial image corresponding to each sub-image is an image obtained by rendering the sub-image through a generated model.
That is, the related information for prompting the rendering characteristics of the initial image corresponding to each sub-image is positive prompt information for instructing the generative model to render the image, and is specifically used for instructing the generative model to render the sub-image so that the initial image obtained by rendering has the rendering characteristics described by the related information.
The rendering characteristics of the initial image corresponding to each sub-image are rendering characteristics required to be included in the rendered initial image after the sub-image is rendered, including, but not limited to, image contrast, image saturation, image sharpness, artistic effect, image transparency, hue, style, image characteristics that the image element should have (such as related information for describing the image element), etc., which may be specifically determined based on actual application scene requirements, and are not limited herein.
As an example, the related information for prompting the rendering characteristics of the initial image corresponding to the sub-image shown in fig. 4 may be masterpiece, best quality, detailed (rich in detail), red eyes, no bangs, sidelighting (side lighting), high saturation, colorful (rich color), portrait, realistic, lustrous skin (shiny skin).
Optionally, the rendering prompt information of each sub-image may include related information for prompting a rendering feature to be avoided of the initial image corresponding to the sub-image.
The initial image corresponding to each sub-image is an image obtained by rendering the sub-image through a generated model.
That is, the related information for prompting the rendering features to be avoided of the initial image corresponding to each sub-image is negative prompt information for instructing the generative model to render the image, and is specifically used for instructing the generative model to render the sub-image so that the rendered initial image avoids (does not have) the rendering features described by the related information.
The rendering features to be avoided of the initial image corresponding to each sub-image are rendering features that the rendered initial image does not have after the sub-image is rendered, including but not limited to image contrast, image saturation, image sharpness, artistic effect, image transparency, hue, style, image features that the image element should have (such as the related information used for describing the image element), etc., which may be specifically determined based on actual application scene requirements, and are not limited herein.
As an example, the related information for prompting the rendering features to be avoided of the initial image corresponding to the sub-image shown in fig. 4 may be sketch, lowres (low resolution), normal quality, monochrome, grayscale, skin spots, acnes (acne), skin blemishes, bad anatomy, long hair, facing away, looking away (looking elsewhere), tilted head, multiple people, text, missing fingers, extra digits (extra fingers), fewer digits (missing fingers), cropping, low quality, jpeg artifacts, signature, watermark, usernames, blurry, poorly drawn hands, poorly drawn face, mutation, deformed, extra limbs, extra arms, extra legs, malformed limbs, fused fingers, too many fingers, long neck, cross-eyed, mutated hands, polar lowres (very low resolution), bad proportions, extra feet, and the like.
Optionally, the rendering hint information of each sub-image may include at least one of:
related information for describing image elements of the sub-image;
Related information for prompting rendering characteristics of the initial image corresponding to the sub-image;
and the related information is used for prompting the rendering characteristics to be avoided of the initial image corresponding to the sub-image.
As an example, the rendering prompt information of each sub-image includes related information for describing an image element of the sub-image, related information for prompting a rendering feature of an initial image corresponding to the sub-image, and related information for prompting a rendering feature to be avoided of the initial image corresponding to the sub-image.
The rendering prompt information of each sub-image can further comprise indication information for indicating the generated model to perform image rendering on the sub-image according to the corresponding information.
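To illustrate how these kinds of rendering prompt information might be combined into the input of the generative model, the following sketch assembles them as comma-separated positive and negative prompt strings; the comma-separated format is an assumption and is not fixed by the text.

def build_rendering_prompt(element_tags, desired_features, features_to_avoid):
    """Assemble the rendering prompt information of one sub-image.

    element_tags: related information describing the image element.
    desired_features: rendering features the initial image should have.
    features_to_avoid: rendering features the initial image should not have.
    Comma-separated tag strings are used here as an assumed prompt format.
    """
    prompt = ", ".join(list(element_tags) + list(desired_features))
    negative_prompt = ", ".join(features_to_avoid)
    return prompt, negative_prompt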
S22, inputting the sub-images and corresponding rendering prompt information into a generated model for each sub-image to obtain an initial image after rendering the sub-images; and carrying out pixel adjustment on the corresponding initial image according to the sub-image to obtain a target image after rendering the sub-image.
In some possible embodiments, for each sub-image, after determining the rendering prompt information corresponding to the sub-image, the sub-image and the corresponding rendering prompt information may be input into a generating model, so as to obtain an initial image after the sub-image is rendered.
The generative model includes, but is not limited to, a generative adversarial network (GAN), a conditional GAN, a language-conditioned generative model (Language Conditioned Generator, LCG), a Stable Diffusion model, etc., and may also be another large model with image and text generation capability, which may be specifically determined based on actual application scene requirements, and is not limited herein.
In some possible embodiments, for each sub-image, when the sub-image and the corresponding rendering prompt information are input into the generative model to obtain an initial image after the sub-image is rendered, the generative model may be updated according to at least one of the target style control model or the structure control model, so as to obtain an updated generative model.
The target style control model is used for enabling the updated generative model to render the sub-image according to the specified rendering style of the sub-image.
Wherein different rendering styles correspond to different style control models, and the target style control model is the style control model corresponding to the specified rendering style of the sub-image. By updating the generative model through different style control models, the updated generative model can render an initial image in the rendering style (such as a character, a drawing style, an object, and the like) corresponding to the corresponding style control model.
As an example, referring to fig. 5a, fig. 5a is a schematic view of rendering styles provided in an embodiment of the present application. For the image a, after the model update of the generative model is performed by the style control model 1, the updated generative model may be caused to render the image a according to the rendering style 1, after the model update of the generative model is performed by the style control model 2, the updated generative model may be caused to render the image a according to the rendering style 2, and after the model update of the generative model is performed by the style control model 3, the updated generative model may be caused to render the image a according to the rendering style 3.
The style control model may be a Low-Rank Adaptation (LoRA) model of a large model, or may be another network model, which may be specifically determined based on actual application scene requirements, and is not limited herein.
By adopting the style control model, the original parameters of the U-Net structure can be frozen, a pair of parameter matrices is designed for each network layer of the U-Net structure, and the number of parameters to be trained is reduced through the parameter matrix pairs.
LoRA is a method for fine-tuning a large model that reduces the number of trainable parameters while losing as little model performance as possible; its principle is to fine-tune the weight parameters of a pre-trained large model. Specifically, the method freezes the parameters of the original generative model, adds additional network parameters to the model, and in the fine-tuning stage updates only the additional network parameters rather than the original network parameters. LoRA assumes that the fine-tuning stage targets a single task, and that when tuned on a single task the generative model has a low intrinsic dimension (i.e., the minimum dimension required for solving the task), which is often far smaller than the original dimension of the generative model. Therefore, the generative model can approach the effect of full-parameter fine-tuning by tuning only a small number of parameters.
For example, let W1 denote the model parameters of the generative model, and let AB^T be a low-rank matrix, i.e., the model parameters of the style control model. The model parameters of the generative model are updated according to the model parameters of the style control model, giving the updated model parameters of the generative model W0 = W1 + AB^T.
As an example, if the generative model is a Stable Diffusion model and the style control model is a LoRA model, the model parameters of the LoRA model may be used to update the parameters of the Denoising U-Net (denoising network) of the Stable Diffusion model, so as to obtain an updated Stable Diffusion model.
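The weight update W0 = W1 + AB^T can be written down directly. The following sketch applies it to a single weight matrix; the scaling factor (often alpha / r) used by many LoRA implementations is omitted here for simplicity.

import numpy as np

def lora_update(W1: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Update one frozen weight matrix of the generative model with a LoRA pair.

    W1 is a d_out x d_in weight of, for example, one U-Net layer; A (d_out x r)
    and B (d_in x r) are the trained low-rank pair, so W0 = W1 + A @ B.T.
    """
    return W1 + A @ B.T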
Referring to fig. 5b-5c, fig. 5b-5c are schematic diagrams illustrating a comparison of a first set of rendering results according to an embodiment of the present application. After the generative model is updated through the style control model corresponding to a brush-stroke rendering style, the updated generative model is used to render the image in fig. 5b to obtain a rendered initial image (fig. 5c). Because of the style control model corresponding to the brush-stroke rendering style, the brush-stroke texture of the rendered image in fig. 5c is noticeably stronger than that in fig. 5b.
If the generative model is updated according to the target style control model, then after the update, the sub-image and the corresponding rendering prompt information can be input into the updated generative model for each sub-image, so as to obtain an initial image that is rendered from the sub-image according to the rendering prompt information and has the specified rendering style corresponding to the target style control model.
The structure control model is used for rendering the sub-images according to the image structures of the sub-images, that is, when the updated generated model renders the sub-images, the layout, the characters, the image structures and the like of the image elements of the sub-images can be reserved.
The above-mentioned structure control model includes, but is not limited to ControlNet, ip-adapter, etc., which may be specifically determined based on the actual application scenario requirements, and is not limited herein.
If the generative model is updated according to the structure control model, then after the update, the sub-image and the corresponding rendering prompt information can be input into the updated generative model for each sub-image, so as to obtain an initial image that is rendered from the sub-image according to the rendering prompt information and retains the basic composition of the sub-image.
If the generative model is updated according to both the structure control model and the target style control model, then after the update, the sub-image and the corresponding rendering prompt information can be input into the updated generative model for each sub-image, so as to obtain an initial image that is rendered from the sub-image according to the rendering prompt information, retains the basic composition of the sub-image, and has the specified rendering style corresponding to the target style control model.
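A sketch of invoking such an updated generative model with the Hugging Face diffusers library is given below. The model identifiers, LoRA path and argument values are placeholders, and the concrete generative model, structure control model and pipeline used in the embodiment are not fixed by the text.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Placeholder identifiers and paths, chosen only for illustration.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float32
)
pipe.load_lora_weights("path/to/target_style_lora")  # target style control model (placeholder path)

def render_sub_image(sub_image, control_image, prompt, negative_prompt, strength=0.5):
    """Render one sub-image with the updated generative model.

    sub_image: the sub-image to be rendered (PIL image).
    control_image: structure condition, e.g. an edge map preserving the basic composition.
    strength: denoising strength (see the discussion of denoising parameters below).
    """
    return pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=sub_image,
        control_image=control_image,
        strength=strength,
    ).images[0]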
In some possible embodiments, when the image element included in a sub-image is a person object, the sub-image obtained by image instance segmentation is prone to losing person details, because a person object carries relatively rich image detail. Directly rendering such a sub-image through the generative model is therefore likely to produce a person object with lost details. In addition, because the background element affects the light and color of the contour around the person, the rendered person object can look abrupt against its surroundings.
As an example, referring to fig. 6 a-6 b, fig. 6 a-6 b are a second set of rendering results versus schematic diagrams provided by an embodiment of the present application. Fig. 6a is an initial image obtained by directly rendering a person object through a generative model, and fig. 6b is an initial image obtained by rendering a person object including a background element through a generative model. By rendering the background element and the character object together, character details (such as hand details) of the character object can be reserved, so that rendering results of the character object are more complete and are more fused with the background element.
Based on this, when the image element included in any one of the sub-images is a person object, a first combined image of the image to be processed may be determined, the first combined image including the background element of the image to be processed and the person object to which the sub-image corresponds.
Further, the first combined image and the rendering prompt information corresponding to the sub image can be input into a generated model or an updated generated model, so as to obtain an initial image after the first combined image is rendered.
Based on the above, an initial image after rendering the sub-image is determined according to the initial image corresponding to the first combined image. For example, the initial image corresponding to the first combined image may be directly determined as the initial image after the sub-image is rendered, or the background element of the initial image corresponding to the first combined image may be removed, and the initial image after the background element is removed may be determined as the initial image after the sub-image is rendered.
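A minimal sketch of constructing the first combined image by pasting the person object onto the background element is shown below; the simple mask-paste compositing rule is an assumption and not the only way the combination could be performed.

import numpy as np

def first_combined_image(background_element, person_sub_image, person_mask):
    """Build the first combined image: background element plus the person object.

    The person object is placed onto the background element so the generative
    model can render it together with its surroundings (assumed compositing rule).
    """
    region = person_mask.astype(bool)
    combined = background_element.copy()
    combined[region] = person_sub_image[region]
    return combined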
In some possible embodiments, when the generative model includes a denoising network, such as the Denoising U-Net in the Stable Diffusion model, the denoising parameter of the denoising network is used to control the denoising strength and reduce the noise in the image. The higher the value, the more obvious the denoising effect, but image details may be lost. In brief, the larger the denoising parameter, the less the generated image resembles the original image; conversely, the smaller the parameter, the smaller the difference from the original image.
Based on this, for each sub-image, if the image element included in the sub-image is a person object or other objects (such as a building, a vehicle, etc.), a lower denoising parameter may be applied to the generated model, so that the initial image obtained by rendering retains the structural information of the sub-image as much as possible. If the image elements included in the sub-image are special effect elements, higher denoising parameters can be applied to the generated model, so that the special effect elements in the initial image obtained by rendering are diversified as much as possible and are more real.
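As an illustration, the choice of denoising parameter per element type could be expressed as a small lookup table; the concrete values below are assumptions and are not taken from the text.

# Illustrative lookup only; the strength values are assumed, not specified.
DENOISING_STRENGTH = {
    "person": 0.35,   # lower: preserve the structure of person and object elements
    "object": 0.35,
    "effect": 0.75,   # higher: let special-effect elements vary and look more natural
}

def denoising_strength_for(element_type: str) -> float:
    return DENOISING_STRENGTH.get(element_type, 0.5)  # assumed default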
As an example, referring to fig. 6c-6d, fig. 6c-6d are a third set of rendering results comparing diagrams according to an embodiment of the present application. Fig. 6c is a sub-image including special effect elements. After fig. 6c is rendered by a generative model with a higher denoising parameter, the special effects in the resulting initial image (fig. 6d) are more diversified and more realistic than the special effect elements before rendering.
In some possible embodiments, different sub-images correspond to different rendering prompt information, so that, under the influence of the different rendering prompt information, the initial images corresponding to different sub-images may have different rendering styles, and the image content of different initial images may be inconsistent in brightness, white balance, and the like. Therefore, for each sub-image, after the initial image after rendering the sub-image is obtained through the generative model, pixel adjustment can be performed on the corresponding initial image according to the sub-image, so as to obtain the target image after rendering the sub-image.
Specifically, for each sub-image, an initial HSV pixel value for each pixel of the sub-image, and a corresponding HSV pixel value for each pixel of the initial image, may be determined.
That is, for each sub-image, the sub-image and the corresponding initial image may be converted to HSV space, resulting in HSV pixel values for each pixel of the sub-image and the corresponding initial image.
Among them, the HSV space is also called the hexagonal pyramid color space and is likewise described by three parameters, where H represents hue, S represents saturation, and V represents brightness (value). Hue H is the basic attribute of a color and is measured by the angle in the hexagonal pyramid model, where 0° represents red, 120° represents green, 240° represents blue, 60° represents yellow, 180° represents cyan, and 300° represents violet. Saturation S is the degree to which a color approaches a pure spectral color: a spectral color mixed with white forms other colors, and the higher the proportion of the spectral color, the closer the color is to the spectral color and the higher its saturation, so the color appears more vivid. Brightness V is the lightness of a color and is subdivided into two types: the brightness of a light source color, which depends on the luminance of the light-emitting object, and the brightness of an object color, which depends on the physical light transmittance of the object.
Further, for each sub-image, a first intermediate image may be determined from the sub-image and the corresponding initial image.
The initial HSV pixel value of each pixel point of the first intermediate image is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point.
That is, for each sub-image, the sub-image and the corresponding initial image are subjected to difference processing on the initial HSV pixel values at each pixel point to obtain the difference of the initial HSV pixel values of each pixel point, and then the first intermediate image is obtained according to the difference of the initial HSV pixel values of each pixel point.
On the basis, a pixel point, the initial HSV pixel value of which is larger than a preset pixel value, in the first intermediate image can be determined to be a first pixel point, and the initial HSV pixel value of the first pixel point in the first intermediate image is set to be the first pixel value, so that the second intermediate image is obtained.
The first pixel value is smaller than a preset pixel value, for example, an initial HSV pixel value of a first pixel point in the first intermediate image is set to 0 (first pixel value).
By setting the initial HSV pixel value of the first pixel point in the first intermediate image to the first pixel value, the influence of the pixel points with hue, saturation and brightness too high on the rendering result can be eliminated.
Further, for each sub-image, after the second intermediate image is obtained based on the sub-image and the corresponding initial image, the second intermediate image may be subjected to erosion processing a first preset number of times and/or dilation processing a second preset number of times to obtain a third intermediate image. The erosion processing and/or dilation processing can reduce the influence, on the final rendering result, of image details newly added during image rendering.
For each sub-image, after the third intermediate image corresponding to the sub-image is obtained, a target image after rendering the sub-image can be obtained according to the sub-image and the corresponding third intermediate image.
For example, the initial HSV pixel values of the sub-image and of the corresponding third intermediate image may be averaged at each pixel point to finally obtain the target image after rendering the sub-image.
Or for each sub-image, a first standard deviation and a first mean of the HSV pixel values for each pixel of the sub-image and a second standard deviation and a second mean of the initial HSV pixel values for each pixel of the third intermediate image to which the sub-image corresponds may be determined.
Thus, for each pixel of the sub-image, the target HSV pixel value of the pixel may be determined based on the first standard deviation and the first average value of the HSV pixel values of each pixel of the sub-image, the second standard deviation and the second average value of the initial HSV pixel values of each pixel of the third intermediate image corresponding to the sub-image, and the initial HSV pixel value of the third intermediate image at the pixel.
The target HSV pixel value of each pixel point may specifically be determined as: H = (A - B2) × (C1 / C2) + B1.
Wherein H is the target HSV pixel value, A is the initial HSV pixel value of the third intermediate image at the pixel point, B2 is the second mean of the initial HSV pixel values of the pixel points of the third intermediate image, C1 is the first standard deviation of the initial HSV pixel values of the pixel points of the sub-image, C2 is the second standard deviation of the initial HSV pixel values of the pixel points of the third intermediate image, and B1 is the first mean of the initial HSV pixel values of the pixel points of the sub-image.
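The whole pixel-adjustment procedure (HSV conversion, difference image, thresholding, erosion/dilation, and statistics matching) can be sketched as follows with OpenCV; the preset threshold, the iteration counts and the per-channel application of the formula are assumptions rather than values fixed by the text.

import cv2
import numpy as np

def pixel_adjust(sub_image_bgr, initial_image_bgr,
                 preset_value=200, erode_iters=1, dilate_iters=1):
    """Pixel adjustment of a rendered initial image against its sub-image (a sketch)."""
    sub = cv2.cvtColor(sub_image_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
    init = cv2.cvtColor(initial_image_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)

    # First intermediate image: per-pixel difference of the HSV pixel values.
    first = sub - init

    # Second intermediate image: pixels above the preset value are set to 0.
    second = first.copy()
    second[first > preset_value] = 0.0

    # Third intermediate image: erosion and/or dilation to suppress details
    # newly added by the rendering.
    kernel = np.ones((3, 3), np.uint8)
    third = cv2.erode(second, kernel, iterations=erode_iters)
    third = cv2.dilate(third, kernel, iterations=dilate_iters)

    # Statistics matching: H = (A - B2) * (C1 / C2) + B1, applied per HSV channel.
    b1, c1 = sub.mean(axis=(0, 1)), sub.std(axis=(0, 1))
    b2, c2 = third.mean(axis=(0, 1)), third.std(axis=(0, 1)) + 1e-6
    target_hsv = (third - b2) * (c1 / c2) + b1

    target_hsv = np.clip(target_hsv, 0, 255)
    target_hsv[..., 0] = np.clip(target_hsv[..., 0], 0, 179)  # OpenCV hue range
    return cv2.cvtColor(target_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)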
S23, generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
In some possible embodiments, after the target image corresponding to each sub-image of the image to be processed is determined, the position information of each sub-image in the image to be processed may be determined, and then the target images corresponding to the sub-images may be fused according to the position information of each sub-image in the image to be processed, so as to obtain the target image after rendering the image to be processed.
As an example, the target image corresponding to each first sub-image may be placed on the second sub-image according to the position information of the sub-image in the image to be processed, so as to obtain a new image, and the image may be determined as the target image after rendering the image to be processed.
The second sub-image is a sub-image comprising a background element of the image to be processed, and the first sub-image is other sub-images except the second sub-image in all sub-images of the image to be processed.
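A minimal sketch of this position-based placement is shown below; the representation of each element as a (target image, mask, position) triple is an assumption chosen for illustration.

import numpy as np

def fuse_target_images(background_target, element_targets):
    """Place each first-sub-image target image back onto the rendered background.

    element_targets: list of (target_image, mask, (top, left)) tuples, where mask
    marks the element's pixels and (top, left) is its position in the image to be
    processed (assumed representation).
    """
    result = background_target.copy()
    for target, mask, (top, left) in element_targets:
        h, w = mask.shape
        region = mask.astype(bool)
        result[top:top + h, left:left + w][region] = target[region]
    return result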
Optionally, when generating the target image after rendering the image to be processed according to each sub-image of the image to be processed, each sub-image may be input into an image fusion model, image fusion is performed on the sub-images by the image fusion model, and the fusion result is determined to be the target image after rendering the image to be processed.
In the embodiment of the application, the generative model, the style control model and the structure control model can be obtained by training based on artificial intelligence (AI) technologies such as machine learning (ML), as well as cloud computing in cloud technology, artificial intelligence cloud services, and the like.
Wherein artificial intelligence is the intelligence of simulating, extending and expanding a person using a digital computer or a machine controlled by a digital computer, sensing the environment, obtaining knowledge, and using knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence.
Machine learning is the specialized study of how computers simulate or implement learning behavior of humans to acquire new knowledge or skills, reorganizing existing knowledge structures to continually improve their own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence.
In the embodiment of the application, related images that need to be stored, such as the sub-images of the image to be processed, the initial image corresponding to each sub-image, and the target images, may be stored in a designated storage space, where the designated storage space includes, but is not limited to, cloud storage, a database (such as a MySQL database), a blockchain, the storage space of the device itself executing the image processing method provided by the embodiment of the application, and the like, and may be specifically determined based on actual application scene requirements, and is not limited herein.
The database may be considered as an electronic file cabinet, i.e. a place where electronic files are stored, and may be a relational database (SQL database) or a non-relational database (NoSQL database), which is not limited herein. The method and the device can be used for storing the sub-images of the image to be processed, the initial image and the target image corresponding to each sub-image and other related images which need to be stored. Blockchains are novel application modes of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, encryption algorithms, and the like. Blockchains are essentially a de-centralized database, which is a string of data blocks that are generated in association using cryptographic methods. In the embodiment of the application, each data block in the block chain can store the sub-image of the image to be processed, the initial image corresponding to each sub-image, the target image and other related images which need to be stored. Cloud storage is a new concept which extends and develops in the concept of cloud computing, and refers to that a large number of storage devices (storage devices are also called storage nodes) of different types in a network are combined to work cooperatively through application software or application interfaces through functions of cluster application, grid technology, a distributed storage file system and the like, so as to jointly store sub-images of images to be processed, initial images corresponding to each sub-image, related images needing to be stored, and the like.
The image processing process (such as the determination process of the sub-image, the pixel adjustment process, the image fusion process and the like) related in the embodiment of the application can be realized based on cloud computing in cloud technology. The cloud technology is a hosting technology for unifying serial resources such as hardware, software, network and the like in a wide area network or a local area network to realize calculation, storage, processing and sharing of data. The cloud technology is based on the general names of network technology, information technology, integration technology, management platform technology, application technology and the like applied by the cloud computing business mode, can form a resource pool, and is flexible and convenient as required.
Cloud computing is a computing mode, and is a product of the fusion of traditional computer and network technologies, such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, load balancing, and the like. Cloud computing distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services as required. The network providing the resources is called the "cloud"; the resources in the "cloud" can be expanded infinitely, obtained at any time, used on demand, expanded at any time, and paid for according to use.
As an example, the image processing method provided by the embodiment of the present application is further described below with reference to fig. 7. Fig. 7 is another flow chart of an image processing method according to an embodiment of the present application, where the image processing method shown in fig. 7 may specifically include:
S701, determining rendering prompt information of at least one sub-image of an image to be processed and each sub-image.
Specifically, when determining each sub-image of the image to be processed, at least one mask area in the image to be processed may be determined based on an image instance segmentation method, where each mask area corresponds to one image element in the image to be processed. All pixels within one mask area have the same pixel value, and the pixel values corresponding to any two mask areas are different.
Alternatively, in the practical application of the cartoon scene, in the case that each element in the cartoon video has been subjected to 3D modeling, for each video image frame of the cartoon video, the sub-image in each video image frame may be directly determined based on the 3D modeling of each element.
Alternatively, when the image to be processed is a video frame image, some sub-images of the image to be processed may be determined based on the Track Anything technique. That is, through the Track Anything technique, the same image element can be located in a plurality of consecutive video frame images, and thus some of the sub-images in the image to be processed can be determined.
Optionally, when each sub-image of the image to be processed is determined, in the case that the image to be processed is a video frame image, since the background element of the image to be processed is often unchanged in a plurality of continuous video frame images, when rendering a plurality of continuous video frame images including the image to be processed, only the background image needs to be rendered once, thereby avoiding random jitter of the picture caused by randomness of the generated model.
In this case, a first video frame sequence may be determined, the first video frame sequence including the image to be processed, and the background element of each video frame image in the first video frame sequence being the same. That is, in the case where the image to be processed is a video frame image, a video frame sequence that includes the image to be processed and whose video frame images share the same background element is intercepted as the first video frame sequence.
Further, the background elements of each video frame image in the first video frame sequence are fused to obtain a background image corresponding to the first video frame sequence, and the background image is further determined to be a sub-image, including the background elements, of each video frame image in the first video frame sequence.
Specifically, the rendering hint information of each sub-image may include at least one of:
related information for describing image elements of the sub-image;
Related information for prompting rendering characteristics of the initial image corresponding to the sub-image;
and the related information is used for prompting the rendering characteristics to be avoided of the initial image corresponding to the sub-image.
The rendering prompt information may be specifically described with reference to S21 in fig. 2, and is not described herein.
S702, updating the generative model according to at least one of the target style control model or the structural control model to obtain an updated generative model.
Specifically, for each sub-image, when the sub-image and the corresponding rendering prompt information are input into the generated model to obtain an initial image after rendering the sub-image, the generated model can be updated according to at least one of the target style control model or the structure control model to obtain an updated generated model.
The target style control model is used for enabling the updated generated model to render the sub-image according to the designated rendering style of the sub-image.
Different rendering styles correspond to different style control models, and the target style control model is the style control model corresponding to the designated rendering style of the sub-image. By updating the generated model with different style control models, the updated generated model can render initial images in the rendering style (such as a character, a drawing style, an item, and the like) corresponding to the respective style control model.
The style control model may be a Low-Rank Adaptation (LoRA) model, or another network model; the structure control model includes, but is not limited to, ControlNet, IP-Adapter, and the like. The specific models may be determined based on the requirements of the actual application scene, which is not limited herein.
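For illustration, such an update may look roughly as follows when implemented with the open-source diffusers library; the model identifiers, file names, and parameter values below are placeholders chosen for the sketch and are not part of the embodiment:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

sub_image = Image.open("sub_image.png")   # sub-image to be rendered (placeholder file)
edge_map = Image.open("edge_map.png")     # structural condition image (placeholder file)

# structure control model, e.g. an edge-conditioned ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# target style control model: LoRA weights of the designated rendering style
pipe.load_lora_weights("path/to/style_lora")      # placeholder path

initial_image = pipe(prompt="description of the image element",
                     negative_prompt="rendering characteristics to be avoided",
                     image=sub_image, control_image=edge_map,
                     strength=0.6).images[0]
```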
S703, inputting each sub-image and corresponding rendering prompt information into the updated generated model to obtain an initial image after rendering each sub-image.
Specifically, for each sub-image, after the generated model is updated according to the structure control model and the target style control model, the sub-image and the corresponding rendering prompt information can be input into the updated generated model to obtain an initial image that renders the sub-image according to the rendering prompt information, retains the basic composition of the sub-image, and has the designated rendering style corresponding to the target style control model.
S704, determining an initial HSV pixel value of each pixel point of each sub-image, and an initial HSV pixel value of each pixel point of the corresponding initial image.
Specifically, for each sub-image, the sub-image and the corresponding initial image may be converted into HSV space, resulting in an HSV pixel value for each pixel of the sub-image and the corresponding initial image.
S705, determining a first intermediate image corresponding to each sub-image, wherein the initial HSV pixel value of each pixel point of the first intermediate image is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point.
In particular, for each sub-image, a first intermediate image may be determined from the sub-image and the corresponding initial image.
The initial HSV pixel value of each pixel point of the first intermediate image is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point.
That is, for each sub-image, the sub-image and the corresponding initial image are subjected to difference processing on the initial HSV pixel values at each pixel point to obtain the difference of the initial HSV pixel values of each pixel point, and then the first intermediate image is obtained according to the difference of the initial HSV pixel values of each pixel point.
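A minimal sketch of S704 and S705 follows; 8-bit BGR inputs, OpenCV/NumPy, and reading the difference as an absolute per-pixel difference are assumptions of the sketch rather than requirements of the embodiment:

```python
import cv2
import numpy as np

def hsv_difference(sub_image, initial_image):
    """Return the HSV values of both images and the first intermediate image (their per-pixel difference)."""
    sub_hsv = cv2.cvtColor(sub_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    init_hsv = cv2.cvtColor(initial_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    first_intermediate = np.abs(sub_hsv - init_hsv)   # difference at every pixel point
    return sub_hsv, init_hsv, first_intermediate
```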
S706, setting the initial HSV pixel value of the first pixel point in the first intermediate image as the first pixel value, and obtaining a second intermediate image.
Specifically, for each sub-image, a pixel point in the first intermediate image, where the initial HSV pixel value is greater than the preset pixel value, may be determined as a first pixel point, and the initial HSV pixel value of the first pixel point in the first intermediate image is set as the first pixel value, so as to obtain the second intermediate image.
The first pixel value is smaller than a preset pixel value, for example, an initial HSV pixel value of a first pixel point in the first intermediate image is set to 0 (first pixel value).
By setting the initial HSV pixel values of the first pixel points in the first intermediate image to the first pixel value, the influence of pixel points whose hue, saturation, or value difference is too large on the rendering result can be eliminated.
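Continuing the sketch for S706, pixel points whose difference exceeds the preset pixel value are set to the first pixel value; both values below are arbitrary placeholders, since the embodiment does not fix them:

```python
import numpy as np

PRESET_PIXEL_VALUE = 40.0   # placeholder threshold
FIRST_PIXEL_VALUE = 0.0     # smaller than the preset pixel value

def suppress_large_differences(first_intermediate):
    """Second intermediate image: first pixel points (difference > preset value) set to the first pixel value."""
    second_intermediate = first_intermediate.copy()
    second_intermediate[second_intermediate > PRESET_PIXEL_VALUE] = FIRST_PIXEL_VALUE
    return second_intermediate
```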
S707, performing erosion and dilation processing on the second intermediate image to obtain a third intermediate image.
Specifically, for each sub-image, after the second intermediate image is obtained based on the sub-image and the corresponding initial image, the second intermediate image may be subjected to erosion processing a first preset number of times and/or dilation processing a second preset number of times to obtain the third intermediate image. The erosion and/or dilation processing reduces the influence of image details newly added during rendering on the final rendering result.
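S707 may be sketched with standard morphological operations; the structuring element and the two repetition counts are placeholders:

```python
import cv2
import numpy as np

def erode_and_dilate(second_intermediate, first_times=2, second_times=2):
    """Third intermediate image: erosion a first preset number of times, then dilation a second preset number of times."""
    kernel = np.ones((3, 3), np.uint8)                                # placeholder structuring element
    eroded = cv2.erode(second_intermediate, kernel, iterations=first_times)
    return cv2.dilate(eroded, kernel, iterations=second_times)
```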
S708, determining a first standard deviation and a first mean value of initial HSV pixel values of all pixel points of each sub-image; a second standard deviation and a second mean of the initial HSV pixel values of the respective pixels of the third intermediate image are determined.
S709, for each pixel point of each sub-image, determining a target HSV pixel value of the pixel point based on the first average value, the initial HSV pixel value of the third intermediate image at the pixel point, the second average value, the first standard deviation, and the second standard deviation.
Specifically, for each sub-image, the target HSV pixel value of each pixel point of the sub-image may be determined based on the first standard deviation and the first average value of the HSV pixel values of the pixel points of the sub-image, the second standard deviation and the second average value of the initial HSV pixel values of the pixel points of the third intermediate image corresponding to the sub-image, and the initial HSV pixel value of the third intermediate image at the pixel point.
The target HSV pixel value of each pixel point may be determined as: target HSV pixel value = (initial HSV pixel value of the third intermediate image at the pixel point − second mean) × (first standard deviation / second standard deviation) + first mean, where the first mean and first standard deviation are those of the initial HSV pixel values of the pixel points of the sub-image, and the second mean and second standard deviation are those of the initial HSV pixel values of the pixel points of the third intermediate image.
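Under the same assumptions, S708 and S709 amount to a per-channel mean and standard-deviation transfer from the third intermediate image toward the statistics of the sub-image:

```python
import numpy as np

def match_statistics(sub_hsv, third_intermediate, eps=1e-6):
    """target = (x - mean(third)) * std(sub) / std(third) + mean(sub), per HSV channel."""
    mean_sub, std_sub = sub_hsv.mean(axis=(0, 1)), sub_hsv.std(axis=(0, 1))
    mean_third, std_third = third_intermediate.mean(axis=(0, 1)), third_intermediate.std(axis=(0, 1))
    target_hsv = (third_intermediate - mean_third) * (std_sub / (std_third + eps)) + mean_sub
    return np.clip(target_hsv, 0, 255)   # simplified clipping; OpenCV's 8-bit H channel is actually 0-179
```

Converting the resulting target HSV values back to the original color space (for example with cv2.cvtColor and COLOR_HSV2BGR) then yields the rendered target image described in S710.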
S710, generating a target image after rendering each sub-image according to the target HSV pixel value of each pixel point of each sub-image.
Specifically, for each sub-image, an image composed of each pixel point of the sub-image with the target HSV pixel value may be determined as a target image after rendering the sub-image.
And S711, fusing the target images corresponding to the sub-images according to the position information of the sub-images in the image to be processed, so as to obtain the target image after rendering the image to be processed.
Specifically, after the target image corresponding to each sub-image of the image to be processed is determined, the position information of each sub-image in the image to be processed can be determined, and the target images corresponding to the sub-images can then be fused according to this position information to obtain the target image after rendering the image to be processed.
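A hedged sketch of S711, assuming the position information of each sub-image is recorded as the boolean mask of its element within the image to be processed:

```python
import numpy as np

def fuse_targets(image_shape, target_images, element_masks):
    """Paste each rendered target image back at the position of its image element."""
    canvas = np.zeros(image_shape, dtype=np.uint8)
    for target, mask in zip(target_images, element_masks):
        canvas[mask] = target[mask]   # the mask area encodes the sub-image's position information
    return canvas
```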
In the embodiment of the application, when the image to be processed is rendered, the image to be processed can be split into a plurality of sub-images each including a single image element, and independent rendering prompt information is determined for each image element. Each sub-image and its rendering prompt information can then be input into the generated model, realizing independent rendering of every image element, including tiny image elements, avoiding the mutual interference among rendering styles that occurs when the whole image to be processed is rendered at once, and improving the rendering effect of each image element.
Meanwhile, in addition to the related information describing the image elements of the sub-image, the rendering prompt information of each sub-image may also include related information prompting the rendering characteristics of the corresponding initial image and/or the rendering characteristics to be avoided, which improves the rendering capability of the generated model and the rendering effect of each image element.
Moreover, when each sub-image is rendered, the generated model is updated through the style control model, so that the updated generated model can render the corresponding sub-image according to the designated rendering style; the stylization of each image element is therefore more independent and obvious, reflecting the style differences among image elements. For example, as shown in fig. 8a to fig. 8c, which are a fourth set of rendering result comparison diagrams according to an embodiment of the present application, fig. 8a is an unrendered image to be processed. After the generated model is updated by the style control model corresponding to the feminine rendering style, the character object in fig. 8a is rendered by the updated generated model, and the character object in fig. 8b is more feminine than the character object in fig. 8a. After the generated model is updated by the style control model corresponding to the masculine rendering style, the character object in fig. 8a is rendered by the updated generated model, and the character object in fig. 8c is more masculine than the character object in fig. 8a.
The generated model is updated through the structure control model, so that the updated generated model can keep the structural information of the image elements of the sub-images in the process of rendering the corresponding sub-images, and the change of the image structures of the rendered image elements is avoided.
Further, for each sub-image, pixel adjustment is performed on the corresponding initial image according to the sub-image, so that the pixel values of the target images corresponding to the sub-images blend together and each rendered image element appears more coordinated and not obtrusive. As shown in fig. 9, which is a schematic diagram of a rendering result provided by an embodiment of the present application, after the image to be processed is rendered by the image processing method provided by the embodiment of the application, the styles of the rendered male object and the rendered female object are differentiated, and at the same time both are better fused with the background element, so that the rendering effect is more coordinated and the applicability is high.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing device provided by the embodiment of the application comprises:
an image determining module 101, configured to determine at least one sub-image of an image to be processed and rendering prompt information of each of the sub-images, where each of the sub-images includes an image element of the image to be processed;
The image rendering module 102 is configured to, for each of the sub-images, input the sub-image and the corresponding rendering prompt information into a generated model to obtain an initial image after rendering the sub-image; and perform pixel adjustment on the corresponding initial image according to the sub-image to obtain a target image after rendering the sub-image;
And the image fusion module 103 is used for generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
In some possible embodiments, the rendering hint information of each of the above sub-images includes at least one of:
related information for describing image elements of the sub-image;
Related information for prompting rendering characteristics of the initial image corresponding to the sub-image;
and the related information is used for prompting the rendering characteristics to be avoided of the initial image corresponding to the sub-image.
In some possible embodiments, for each of the sub-images, the image rendering module 102 is configured to:
determining an initial HSV pixel value of each pixel point of the sub-image and an initial HSV pixel value of each pixel point of the corresponding initial image;
Determining an initial HSV pixel value of each pixel point of the first intermediate image, wherein the initial HSV pixel value is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point;
setting an initial HSV pixel value of a first pixel point in the first intermediate image as a first pixel value to obtain a second intermediate image, wherein the initial HSV pixel value of the first pixel point is larger than a preset pixel value;
and performing erosion and dilation processing on the second intermediate image to obtain a third intermediate image, and obtaining a target image after rendering the sub-image according to the sub-image and the third intermediate image.
In some possible embodiments, for each of the sub-images, the image rendering module 102 is configured to:
Determining a first standard deviation and a first mean of initial HSV pixel values of all pixel points of the sub-image;
Determining a second standard deviation and a second average value of initial HSV pixel values of all pixel points of the third intermediate image;
For each pixel point of the sub-image, determining a target HSV pixel value of the pixel point based on the first mean value, the initial HSV pixel value of the third intermediate image at the pixel point, the second mean value, the first standard deviation and the second standard deviation;
and generating a target image after rendering the sub-image according to the target HSV pixel value of each pixel point of the sub-image.
In some possible embodiments, for each of the sub-images, the image rendering module 102 is configured to:
Updating the generated model according to at least one of the target style control model or the structure control model to obtain an updated generated model;
The target style control model is associated with a designated rendering style of the sub-image, and is used for rendering the sub-image according to the designated rendering style by the updated generation model; the structure control model is used for rendering the sub-image according to the image structure of the sub-image by the updated generated model;
and inputting the sub-image and the corresponding rendering prompt information into the updated generated model to obtain an initial image after rendering the sub-image.
In some possible embodiments, for each of the above sub-images, when the image element included in the sub-image is a person object, the image rendering module 102 is configured to:
Determining a first combined image of an image to be processed, wherein the first combined image comprises a background element of the image to be processed and a character object corresponding to the sub-image;
Inputting the first combined image and rendering prompt information corresponding to the sub-image into a generating model to obtain an initial image after the first combined image is rendered;
And determining an initial image after rendering the sub-image according to the initial image corresponding to the first combined image.
In some possible embodiments, when the image to be processed is a video frame image, the image determining module 101 is configured to:
Determining a first video frame sequence, wherein the first video frame sequence comprises the image to be processed, and the background element of each video frame image in the first video frame sequence is the same;
and fusing the background elements of each video frame image in the first video frame sequence to obtain a background image corresponding to the first video frame sequence, and determining the background image as a sub-image of each video frame image, wherein the sub-image comprises the background elements.
In some possible embodiments, the image fusion module 103 is configured to:
determining position information of each sub-image in the image to be processed;
And fusing the target images corresponding to the sub-images according to the position information of the sub-images in the image to be processed to obtain the target image after rendering the image to be processed.
In a specific implementation, the image processing apparatus may execute the implementation provided by each step in fig. 2 and/or fig. 7 through each built-in functional module, and specifically, the implementation provided by each step may be referred to, which is not described herein again.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 11, the electronic device 1100 in the present embodiment may include: processor 1101, network interface 1104, and memory 1105, and further, the above-described electronic device 1100 may further include: an object interface 1103, and at least one communication bus 1102. Wherein communication bus 1102 is used to facilitate connection communications among the components. The object interface 1103 may include a Display screen (Display) and a Keyboard (Keyboard), and the optional object interface 1103 may further include a standard wired interface and a wireless interface. Network interface 1104 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1105 may be a high-speed RAM memory or a non-volatile memory (NVM), such as at least one magnetic disk memory. The memory 1105 may also optionally be at least one storage device located remotely from the processor 1101. As shown in fig. 11, an operating system, a network communication module, an object interface module, and a device control application may be included in the memory 1105 as one type of computer-readable storage medium.
In the electronic device 1100 shown in fig. 11, the network interface 1104 may provide network communication functionality; while object interface 1103 is primarily an interface for providing input to objects; and the processor 1101 may be configured to invoke the device control application stored in the memory 1105 to implement:
determining at least one sub-image of the image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises an image element of the image to be processed;
Inputting the sub-image and corresponding rendering prompt information into a generated model for each sub-image to obtain an initial image after rendering the sub-image; according to the sub-image, carrying out pixel adjustment on the corresponding initial image to obtain a target image after rendering the sub-image;
and generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
In some possible embodiments, the rendering hint information of each of the above sub-images includes at least one of:
related information for describing image elements of the sub-image;
Related information for prompting rendering characteristics of the initial image corresponding to the sub-image;
and the related information is used for prompting the rendering characteristics to be avoided of the initial image corresponding to the sub-image.
In some possible embodiments, for each of the above sub-images, the above processor 1101 is configured to:
determining an initial HSV pixel value of each pixel point of the sub-image and an initial HSV pixel value of each pixel point of the corresponding initial image;
Determining an initial HSV pixel value of each pixel point of the first intermediate image, wherein the initial HSV pixel value is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point;
setting an initial HSV pixel value of a first pixel point in the first intermediate image as a first pixel value to obtain a second intermediate image, wherein the initial HSV pixel value of the first pixel point is larger than a preset pixel value;
and performing erosion and dilation processing on the second intermediate image to obtain a third intermediate image, and obtaining a target image after rendering the sub-image according to the sub-image and the third intermediate image.
In some possible embodiments, for each of the above sub-images, the above processor 1101 is configured to:
Determining a first standard deviation and a first mean of initial HSV pixel values of all pixel points of the sub-image;
Determining a second standard deviation and a second average value of initial HSV pixel values of all pixel points of the third intermediate image;
For each pixel point of the sub-image, determining a target HSV pixel value of the pixel point based on the first mean value, the initial HSV pixel value of the third intermediate image at the pixel point, the second mean value, the first standard deviation and the second standard deviation;
and generating a target image after rendering the sub-image according to the target HSV pixel value of each pixel point of the sub-image.
In some possible embodiments, for each of the above sub-images, the above processor 1101 is configured to:
Updating the generated model according to at least one of the target style control model or the structure control model to obtain an updated generated model;
The target style control model is associated with a designated rendering style of the sub-image, and is used for rendering the sub-image according to the designated rendering style by the updated generation model; the structure control model is used for rendering the sub-image according to the image structure of the sub-image by the updated generated model;
and inputting the sub-image and the corresponding rendering prompt information into the updated generated model to obtain an initial image after rendering the sub-image.
In some possible embodiments, for each of the above sub-images, when the image element included in the sub-image is a person object, the processor 1101 is configured to:
Determining a first combined image of an image to be processed, wherein the first combined image comprises a background element of the image to be processed and a character object corresponding to the sub-image;
Inputting the first combined image and rendering prompt information corresponding to the sub-image into a generating model to obtain an initial image after the first combined image is rendered;
And determining an initial image after rendering the sub-image according to the initial image corresponding to the first combined image.
In some possible embodiments, when the image to be processed is a video frame image, the processor 1101 is further configured to:
Determining a first video frame sequence, wherein the first video frame sequence comprises the image to be processed, and the background element of each video frame image in the first video frame sequence is the same;
and fusing the background elements of each video frame image in the first video frame sequence to obtain a background image corresponding to the first video frame sequence, and determining the background image as a sub-image of each video frame image, wherein the sub-image comprises the background elements.
In some possible embodiments, the processor 1101 is configured to:
determining position information of each sub-image in the image to be processed;
And fusing the target images corresponding to the sub-images according to the position information of the sub-images in the image to be processed to obtain the target image after rendering the image to be processed.
It should be appreciated that in some possible embodiments, the processor 1101 may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In a specific implementation, the electronic device 1100 may execute, through each functional module built in the electronic device, an implementation manner provided by each step in fig. 2 and/or fig. 7, and specifically, the implementation manner provided by each step may be referred to, which is not described herein again.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored and executed by a processor to implement the method provided by each step in fig. 2 and/or fig. 7, and specifically, the implementation manner provided by each step may be referred to, which is not described herein.
The computer readable storage medium may be the image processing apparatus or an internal storage unit of the electronic device provided in any of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. The computer readable storage medium may also include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application provide a computer program product comprising a computer program for executing the method provided by the steps of fig. 2 by a processor.
The terms first, second and the like in the claims and in the description and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to the list of steps or elements but may, alternatively, include other steps or elements not listed or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (11)

1. An image processing method, the method comprising:
determining at least one sub-image of an image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises one image element of the image to be processed;
inputting, for each sub-image, the sub-image and corresponding rendering prompt information into a generated model to obtain an initial image after rendering the sub-image; determining an initial HSV pixel value of each pixel point of the sub-image and the corresponding initial image; setting an initial HSV pixel value of a first pixel point, whose initial HSV pixel value is larger than a preset pixel value, in a first intermediate image as a first pixel value to obtain a second intermediate image, wherein the initial HSV pixel value of each pixel point of the first intermediate image is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point; and performing erosion and dilation processing on the second intermediate image to obtain a third intermediate image, and obtaining a target image after rendering the sub-image according to the sub-image and the third intermediate image;
And generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
2. The method of claim 1, wherein the rendering hints for each of the sub-images comprises at least one of:
related information for describing image elements of the sub-image;
Related information for prompting rendering characteristics of the initial image corresponding to the sub-image;
and the related information is used for prompting the rendering characteristics to be avoided of the initial image corresponding to the sub-image.
3. The method according to claim 1, wherein for each of the sub-images, the obtaining a rendered target image for the sub-image from the sub-image and the third intermediate image comprises:
Determining a first standard deviation and a first mean of initial HSV pixel values of all pixel points of the sub-image;
determining a second standard deviation and a second mean of initial HSV pixel values of all pixel points of the third intermediate image;
for each pixel point of the sub-image, determining a target HSV pixel value of the pixel point based on the first mean value, the initial HSV pixel value of the third intermediate image at the pixel point, the second mean value, the first standard deviation and the second standard deviation;
and generating a target image after rendering the sub-image according to the target HSV pixel value of each pixel point of the sub-image.
4. The method according to claim 1, wherein for each sub-image, the inputting the sub-image and the corresponding rendering prompt information into the generative model to obtain the initial image after rendering the sub-image includes:
Updating the generated model according to at least one of the target style control model or the structure control model to obtain an updated generated model;
The target style control model is associated with a designated rendering style of the sub-image, and is used for rendering the sub-image according to the designated rendering style by the updated generated model; the structure control model is used for enabling the updated generated model to render the sub-image according to the image structure of the sub-image;
and inputting the sub-image and the corresponding rendering prompt information into the updated generated model to obtain an initial image after rendering the sub-image.
5. The method according to claim 1, wherein for each of the sub-images, when the image element included in the sub-image is a person object, the step of inputting the sub-image and the corresponding rendering prompt information into the generated model to obtain the initial image after rendering the sub-image includes:
Determining a first combined image of an image to be processed, wherein the first combined image comprises a background element of the image to be processed and a character object corresponding to the sub-image;
inputting the first combined image and rendering prompt information corresponding to the sub-image into a generating model to obtain an initial image after the first combined image is rendered;
And determining an initial image after rendering the sub-image according to the initial image corresponding to the first combined image.
6. The method of claim 1, wherein when the image to be processed is a video frame image, the method further comprises:
Determining a first video frame sequence, wherein the first video frame sequence comprises the image to be processed, and the background element of each video frame image in the first video frame sequence is the same;
And fusing the background elements of each video frame image in the first video frame sequence to obtain a background image corresponding to the first video frame sequence, and determining the background image as a sub-image of each video frame image, wherein the sub-image comprises the background elements.
7. The method according to claim 1, wherein the generating the target image after rendering the image to be processed according to the target image corresponding to each sub-image includes:
determining the position information of each sub-image in the image to be processed;
And fusing the target images corresponding to the sub-images according to the position information of the sub-images in the image to be processed to obtain the target image after rendering the image to be processed.
8. An image processing apparatus, characterized in that the apparatus comprises:
The image determining module is used for determining at least one sub-image of the image to be processed and rendering prompt information of each sub-image, wherein each sub-image comprises one image element of the image to be processed;
The image rendering module is used for, for each sub-image, inputting the sub-image and corresponding rendering prompt information into a generated model to obtain an initial image after rendering the sub-image; determining an initial HSV pixel value of each pixel point of the sub-image and the corresponding initial image; setting an initial HSV pixel value of a first pixel point, whose initial HSV pixel value is larger than a preset pixel value, in a first intermediate image as a first pixel value to obtain a second intermediate image, wherein the initial HSV pixel value of each pixel point of the first intermediate image is the difference between the initial HSV pixel value of the sub-image at the same pixel point and the initial HSV pixel value of the corresponding initial image at the same pixel point; and performing erosion and dilation processing on the second intermediate image to obtain a third intermediate image, and obtaining a target image after rendering the sub-image according to the sub-image and the third intermediate image;
And the image fusion module is used for generating a target image after rendering the image to be processed according to the target image corresponding to each sub-image.
9. An electronic device comprising a processor and a memory, the processor and the memory being interconnected;
The memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1 to 7 when the computer program is invoked.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any one of claims 1 to 7.
11. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202410125653.6A 2024-01-30 2024-01-30 Image processing method, device, equipment and storage medium Active CN117649477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410125653.6A CN117649477B (en) 2024-01-30 2024-01-30 Image processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN117649477A CN117649477A (en) 2024-03-05
CN117649477B true CN117649477B (en) 2024-06-04

Family

ID=90048176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410125653.6A Active CN117649477B (en) 2024-01-30 2024-01-30 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117649477B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750170A (en) * 2019-10-31 2021-05-04 华为技术有限公司 Fog feature identification method and device and related equipment
CN113379705A (en) * 2021-06-09 2021-09-10 苏州智加科技有限公司 Image processing method, image processing device, computer equipment and storage medium
CN114339362A (en) * 2021-12-08 2022-04-12 腾讯科技(深圳)有限公司 Video bullet screen matching method and device, computer equipment and storage medium
CN116645668A (en) * 2023-07-21 2023-08-25 腾讯科技(深圳)有限公司 Image generation method, device, equipment and storage medium
CN117078790A (en) * 2023-10-13 2023-11-17 腾讯科技(深圳)有限公司 Image generation method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6941014B2 (en) * 2000-12-15 2005-09-06 Xerox Corporation Method and apparatus for segmenting an image using a combination of image segmentation techniques


Also Published As

Publication number Publication date
CN117649477A (en) 2024-03-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant