WO2023125374A1 - Image processing method and apparatus, electronic device, and storage medium - Google Patents

Image processing method and apparatus, electronic device, and storage medium Download PDF

Info

Publication number
WO2023125374A1
WO2023125374A1 · PCT/CN2022/141815 · CN2022141815W
Authority
WO
WIPO (PCT)
Prior art keywords
style
image
target
model
trained
Prior art date
Application number
PCT/CN2022/141815
Other languages
French (fr)
Chinese (zh)
Inventor
白须
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 filed Critical 北京字跳网络技术有限公司
Publication of WO2023125374A1 publication Critical patent/WO2023125374A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, for example, to an image processing method and apparatus, an electronic device, and a storage medium.
  • Image style transfer can be understood as rendering an image into an image with a specific artistic style.
  • In the related art, image style transfer is mostly implemented by texture synthesis.
  • Alternatively, a style transfer model is trained, and the image is converted into a certain style based on the trained style transfer model.
  • However, style transfer models in the related art cannot handle images whose subjects have different attributes, so the resulting style transfer images are of poor quality, which in turn degrades the user experience.
  • In view of this, the present disclosure provides an image processing method and apparatus, an electronic device, and a storage medium, so as to obtain a special effect image of a target style type and improve the richness of the displayed image content.
  • an embodiment of the present disclosure provides an image processing method, the method including: acquiring an image to be processed that includes a target subject; inputting the image to be processed and a subject attribute of the target subject into a target style conversion model, to obtain a target special effect image in which the target subject is converted into a target style type; and displaying the target special effect image in an image display area.
  • an embodiment of the present disclosure further provides an image processing device, which includes:
  • a to-be-processed image acquisition module, configured to acquire an image to be processed that includes a target subject;
  • a special effect image determination module, configured to input the image to be processed and a subject attribute of the target subject into a target style conversion model, to obtain a target special effect image in which the target subject is converted into a target style type;
  • an image display module, configured to display the target special effect image in an image display area.
  • an embodiment of the present disclosure further provides an electronic device, and the electronic device includes:
  • one or more processors;
  • a storage apparatus, configured to store one or more programs;
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method according to any one of the embodiments of the present disclosure.
  • the embodiments of the present disclosure further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the image processing method according to any one of the embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a target model to be trained provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variants are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • the disclosed technical solution can be applied to any image style conversion scenario, for example, converting a captured static image into an image of a certain theme style; the theme style may be a Japanese style, a Korean style, or any theme style designed by a designer. It can also be applied to special-effects video shooting, for example, converting a certain user in the captured frame, or all users in the entire frame, into a video of a certain theme style.
  • the style type may match the style of the user's makeup, and at the same time, the frame to which the user belongs may also be converted to match that makeup style.
  • users in each video frame to be processed can be displayed according to a corresponding style theme, or the entire video frame can be converted into a certain theme style.
  • Fig. 1 is a schematic flow chart of an image processing method provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is applicable to the situation of converting an image frame into a target style type in any image display scene supported by the Internet.
  • the method can be executed by an image processing apparatus, which can be implemented in the form of software and/or hardware, for example by an electronic device such as a mobile terminal, a PC terminal, or a server.
  • the scene of arbitrary image display is usually implemented by the cooperation of the client and the server.
  • the method provided in this embodiment can be executed by the server, the client, or the cooperation of the client and the server.
  • the method includes:
  • the device for executing the image processing method provided by the embodiments of the present disclosure may be integrated into application software supporting image processing functions, and the software may be installed in electronic equipment, for example, the electronic equipment may be a mobile terminal or a PC terminal, etc.
  • the application software may be any type of image/video processing software; the specific applications are not enumerated here, as long as image/video processing can be realized. It may also be a specially developed application that implements the addition and display of special effects, or it may be integrated into a corresponding page, so that the user can add special effects through the page integrated in the PC terminal.
  • the image to be processed may be an image collected based on the application software.
  • the image including the target subject may be captured in real time based on the application software.
  • the associated video frames or images are processed into a style type consistent with the target style transfer type. It is also possible to set a corresponding special effect based on the style type, so that after the user is detected to trigger the special effect, all captured images are converted into the corresponding style type.
  • the scene of image shooting or video shooting there may be multiple subjects in the captured image.
  • all users captured in the frame may be used as target subjects. It is also possible to mark one or more users as the target subject before the special effect is added; correspondingly, when the collected image to be processed is determined to include the target subject, the image is processed.
  • the style theme conversion control can be triggered.
  • the image to be processed can be collected based on the camera device deployed on the terminal device.
  • the target subject may or may not be included in the image to be processed.
  • An image randomly obtained from a webpage may also be used as the image to be processed.
  • the image to be processed can be converted into a target special effect image consistent with the target style type.
  • the image to be processed is an image captured randomly or downloaded, and may also be an image captured in real time.
  • the target style conversion model is pre-trained and used to convert the image to be processed into a model of the corresponding style type.
  • the target style transfer model can be a GAN network based on a residual (res) structure.
  • the target special effect image is an image obtained after being processed by the target style transfer model.
  • in the target special effect image, the target subject may be converted into the target theme style, or the entire image to be processed may be converted into the corresponding target theme style type.
  • the image style output by the target style conversion model is consistent with the style type of the training samples used when training it. For example, if the style type of the style images in the training samples is style A, then the target style type is style A and the model outputs images of style A; if the style type of the style images in the training samples is style B, then the model outputs images of style B. That is, the style type output by the target style conversion model is consistent with the style type used during model training.
  • the subject attribute can be the gender attribute or style type attribute of the target subject.
  • the gender attribute can be male or female
  • the style type attribute can be a preset style type.
  • if the target style conversion model is expected to output images of one given style type from among multiple style types, the style type and gender type corresponding to different labels can be defined in the alpha channel.
  • the image to be processed may be processed into an image corresponding to a corresponding style type according to the gender attribute and/or style attribute of the image to be processed.
  • subject attributes are defined to avoid the related-art need to train a separate model for each gender, which consumes memory. Instead, after the image to be processed is acquired, the subject attribute of its target subject is identified and input into the target style conversion model as alpha-channel information, so that a single target style conversion model can perform style transfer on image content with different gender attributes across different images to be processed.
  • in the related art, implementing multiple style types requires training a separate model for each style type, and a single model may be unable to perform multiple style type conversions.
  • therefore, the input of the model to be trained must include not only the image whose style type is to be converted, but also the edited label value of the alpha channel, so that the conversion model can produce the target special effect image of this technical solution.
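As a minimal sketch of this idea, the snippet below appends a constant alpha channel whose value encodes a (gender, style) label to an RGB image. The label table, names, and values are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

# Hypothetical label table: each (gender, style) pair maps to one
# alpha-channel value. The concrete values are illustrative assumptions.
ATTRIBUTE_LABELS = {
    ("female", "style_a"): 0.25,
    ("male", "style_a"): 0.50,
    ("female", "style_b"): 0.75,
    ("male", "style_b"): 1.00,
}

def attach_attribute_channel(rgb, gender, style):
    """Append a constant alpha channel carrying the subject-attribute
    label, producing a 4-channel model input."""
    h, w, c = rgb.shape
    assert c == 3, "expected an RGB image"
    label = ATTRIBUTE_LABELS[(gender, style)]
    alpha = np.full((h, w, 1), label, dtype=rgb.dtype)
    return np.concatenate([rgb, alpha], axis=2)

rgba = attach_attribute_channel(
    np.zeros((4, 4, 3), dtype=np.float32), "male", "style_a")
```

A single model receiving such a 4-channel input can then condition its style transfer on the label plane without needing one network per gender.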
  • the subject attribute of the target subject in the image to be processed may be determined based on a subject attribute identification module deployed in the terminal. The subject attribute and the image to be processed are input into the target style conversion model to obtain a target special effect image in which the target subject is converted into the target style type, or in which the entire image to be processed is converted into the target style type.
  • before the image to be processed is input into the target style transfer model, the method further includes: scaling the image to be processed to an image of a fixed size, which may be 384*384 pixels.
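The scaling step can be sketched as follows. This pure-NumPy nearest-neighbour resize is only illustrative; a real pipeline would use an image library with bilinear or better resampling.

```python
import numpy as np

def resize_nearest(img, size=(384, 384)):
    """Nearest-neighbour resize to the fixed model input size."""
    h, w = img.shape[:2]
    th, tw = size
    rows = np.arange(th) * h // th   # source row for each target row
    cols = np.arange(tw) * w // tw   # source column for each target column
    return img[rows][:, cols]

# Example: shrink a 720p frame to the assumed 384x384 model input.
scaled = resize_nearest(np.zeros((720, 1280, 3), dtype=np.uint8))
```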
  • the above-mentioned processing method can be performed on the obtained samples.
  • the image display area can be understood as an area where the target special effect image is displayed.
  • the target image may be displayed on the display interface.
  • the image display area can be divided into two areas, split either left-right or top-bottom.
  • One area displays the target special effect image;
  • the other area displays the originally acquired image to be processed.
  • If a target video is to be formed, special effect processing can be performed on the images to be processed that are collected in real time, to obtain target special effect images.
  • a plurality of target special effect images are sequentially spliced to obtain a target video.
  • the target style type includes Japanese style, Korean style, ancient costume style, comic style, or at least one of multiple preset style types to be selected.
  • The ancient costume style can include styles from any dynasty.
  • in this embodiment, the subject attribute of the target subject in the image to be processed can be determined, and the subject attribute and the image to be processed can be used as input parameters of the pre-trained target style conversion model, to obtain a target special effect image in which the target subject in the image to be processed is converted into the target style type; the image to be processed can thereby be converted into that style type, improving the user experience.
  • Fig. 2 is a schematic flow chart of an image processing method provided by another embodiment of the present disclosure.
  • a style conversion model to be trained can be trained to obtain the target style conversion model.
  • As shown in Fig. 2, the method comprises:
  • the third original image including facial information may be collected in a real environment, downloaded from a webpage, or generated randomly based on a facial image generation model.
  • the image to be used is an image after processing the style of the third original image, for example, it may be an image designed by a designer and corresponding to a certain style type.
  • the third style image is an image in one of one or more styles hand-drawn from the original image. The style types can be various, and these style types serve as the candidate style types.
  • the image obtained after converting the third original image into a certain style is used as the image to be used. That is, the image to be used is an image of a certain style.
  • image cropping processing can be used.
  • the cropping process can be understood as aligning on the nose tip and the center of the eyes as reference points, or on the center of the chin and the center of the eyes as reference points. The display ratio of the facial image in the display interface is adjusted to expand the cropping range of the image, so that the entire head (including hair) and the face are shown on the display interface, together with the corresponding background information.
  • the image generation model to be trained may be a styleganv2 model, and the model parameters in the model are default values at this time.
  • the third output image is an image of a certain style randomly generated based on the image generation model to be trained. That is, the style type of the third output image is indeterminate.
  • Gaussian noise is randomly sampled Gaussian noise.
  • the image generation model to be trained processes the Gaussian noise to obtain third output images of different styles.
  • the image generation model to be trained is trained to obtain the target image generation model, which can generate sample data of different style types, so that a model capable of realizing conversion of different style types can be obtained based on the sample data training.
  • S230 Process the third output image and the third style image based on the first discriminator, determine a loss value, and correct model parameters in the image generation model to be trained based on the loss value.
  • the input of the first discriminator is the third output image and the third style image.
  • the first discriminator is configured to determine a loss value between the third output image and the third style image.
  • the model parameters in the image generation model to be trained can be corrected according to the loss value.
  • the target image generation model is an image generation model obtained through final training. Repeat the above steps based on multiple training samples until the convergence of the loss function is detected, and use the image generation model obtained in this case as the target image generation model.
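The adversarial loss that drives this parameter correction can be illustrated as follows. The logit values are made up, and the numerically stable binary cross-entropy form is a standard GAN choice rather than the disclosure's exact loss.

```python
import numpy as np

def bce_with_logits(logits, target):
    # Numerically stable binary cross-entropy on raw logits.
    return float(np.mean(np.maximum(logits, 0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))

# Illustrative logits: the discriminator scores hand-drawn style
# images (real) and generator outputs (fake). Values are invented.
real_logits = np.array([2.0, 1.5, 3.0])
fake_logits = np.array([-1.0, -0.5, 0.2])

# Discriminator loss: push real scores toward 1 and fake scores toward 0.
d_loss = bce_with_logits(real_logits, 1.0) + bce_with_logits(fake_logits, 0.0)
# Generator loss: make the discriminator score fakes as real.
g_loss = bce_with_logits(fake_logits, 1.0)
```

In training, `d_loss` updates the first discriminator and `g_loss` updates the image generation model, repeating until the loss converges.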
  • S250 Process Gaussian noise based on the target image generation model to obtain the second style image.
  • the randomly sampled Gaussian noise can be processed based on the target image generation model, so as to obtain the second style image for training the first style transfer model.
  • the number of second style images can be as large as possible.
  • style types of the second style images may be the same or different.
  • the style types of the second style images can be as many and rich as possible.
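Sampling second style images from Gaussian noise can be sketched as below. The fixed random linear map standing in for the generator is only a placeholder for the trained styleganv2-style network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "generator": a fixed linear projection followed by tanh.
# The real target image generation model is a deep network.
PROJ = rng.standard_normal((64, 16 * 16 * 3))

def toy_generator(z):
    """Map a 64-dim Gaussian latent to a small image-shaped array."""
    return np.tanh(z @ PROJ).reshape(16, 16, 3)

# Sample Gaussian noise and generate a batch of second style images.
latents = rng.standard_normal((4, 64))
second_style_images = [toy_generator(z) for z in latents]
```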
  • the second style image can be further processed while the original second style image is retained.
  • the expression editing model may be a model for adding facial expressions to the target subject in the second style image.
  • the facial expression may be an open mouth, with different degrees of opening, or an expression such as smiling or laughing; the specific expressions are not limited in this embodiment.
  • while the original second style image is retained, the second style image can be input into the expression editing model to add expressions to the users in it, obtaining a second style image with changed expression content. Based on the original second style image and the expression-edited second style image, training samples for training the first style model can be obtained.
  • in this embodiment, the target image generation model obtained through training can process Gaussian noise to obtain second style images of various style types, and expressions can be added to the second style images based on the expression editing model to obtain second style images with changed expression content. The first style conversion model is then obtained by training on the second style images, and the training samples for training the target style conversion model are determined based on the first style conversion model, so that the target style conversion model can be obtained by training on those samples. This avoids the related-art problem that samples of uneven quality prevent effective style conversion of the original image; style conversion can be performed on the image to be processed, improving the match between the converted image and the user and improving the user experience.
  • Fig. 3 is a schematic flow chart of an image processing method provided by another embodiment of the present disclosure.
  • after the second style image is obtained based on the target image generation model, the first style conversion model can be trained based on the second style image and the second original image collected in a real environment.
  • As shown in Fig. 3, the method comprises:
  • S310 Determine at least one second style image.
  • S320 Construct a target model to be trained including a style processing model to be trained, a target discriminator, and a target style comparer.
  • target discriminator and target style comparer are pre-trained models.
  • the target model to be trained includes a style processing model to be trained, a target discriminator and a target style comparer.
  • the output of the style processing model to be trained is fed to the target discriminator and to the target style comparer; based on the outputs of the target discriminator and the target style comparer, the model parameters in the target model to be trained are corrected to obtain the target model to be used.
  • the style processing model to be trained includes: a style feature extractor to be trained, a content feature extractor to be trained, a feature fusion unit to be trained, and a compiler to be trained.
  • the style model to be trained is a GAN (Generative Adversarial Network) model based on the starganv2 structure. This model is primarily set up to generate batches of unpaired data, i.e., second style images.
  • the input of the target model to be trained is two images, one image is the image collected in the real environment, that is, the second original image; the other image is an image of a certain style generated based on the image generation model.
  • the style types of the multiple second style images may be the same or different.
  • the style type corresponding to the second style image may be used as the style type to be selected.
  • the target model to be used is a model obtained through training based on the second style image and the second original image.
  • the target model to be trained may be trained based on the second original image and the second style image as follows: the second style images and second original images are combined to obtain multiple second training samples, where each second training sample includes one second original image and one second style image. For each second training sample, the content splicing features of the second original image are obtained based on the content feature extractor to be trained, and the style splicing features of the second style image are obtained based on the style feature extractor to be trained; the content splicing features and the style splicing features are fused based on the feature fusion model to be trained to obtain fusion features, and the fusion features are input into the compiler to be trained to obtain an actual output image. The actual output image and the second style image are input into the target discriminator to determine a first loss value, and into the target style comparer to determine a style loss value; based on the first loss value and the style loss value, the model parameters in the style processing model to be trained are corrected.
  • the second style image and the second original image may be randomly combined to obtain a plurality of second training samples.
  • Each training sample includes a second style image and a second original image.
  • the processing method for each second training sample is the same, and the processing of one of the training samples is taken as an example for introduction.
  • For example, the second original image and the second style image are input into the target model to be trained.
  • the content mosaic feature and style mosaic feature are fused to obtain the fusion feature.
  • Compile and process the fusion features based on the compiler to be trained to obtain the actual output image.
  • the ideal actual output image should include the image content of the second original image and the style features of the second style image.
  • because the model parameters in the style processing model to be trained are initially default values, there are some differences between the obtained actual output image and the ideal image.
  • therefore, the actual output image is processed by the target discriminator and the target style comparer in the target model to be trained. The actual output image and its corresponding second style image are input into the target discriminator to obtain the first loss value; at the same time, the actual output image and its corresponding second style image are input into the target style comparer to determine the style loss value. Based on the first loss value and the style loss value, the model parameters in the style processing model to be trained can be corrected. With convergence of the loss function in the style processing model to be trained as the training target, the target model to be used is obtained.
  • the second original image is A
  • the second style image is B.
  • the image content of original image A is obtained based on the content feature extractor to be trained, and at the same time the style features of style image B are obtained based on the style feature extractor to be trained; the image content and the style features are spliced based on the feature fusion model to be trained to obtain the fusion features corresponding to actual output image C, and the fusion features are compiled based on the compiler to be trained to obtain actual output image C.
  • the model parameters in the style processing model to be trained in the target model to be trained are corrected until the loss function in the style processing model to be trained converges, and the target model to be used is obtained.
  • the target discriminator and the target style comparer in the target model to be used can be removed, that is, only the trained style processing model is retained, so as to obtain the style model to be used.
  • an original image and an image of a preferred style type can be input into the style model to be used, to obtain an image consistent with that style type; at this point, the content of the obtained image is consistent with the content of the original image.
  • S350 Determine a reference image of the target style type, and determine the first style conversion model based on the reference image and the style model to be used.
  • the target style type is the style type finally selected according to the preference of the user.
  • the reference image is an image consistent with the target style type.
  • the reference image can be bound with the style model to be used, so that after an original image is input, the style features of the reference image are extracted based on the style model to be used, and the image content of the first original image is fused with those style features to obtain an image matching the target style type.
  • the model after binding the reference image and the style model to be used is used as the first style conversion model.
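The binding step can be sketched as below: the reference image's style code is computed once at construction time and stored, so inference needs only the original image. The extractor and the fusion step here are placeholder assumptions, not the disclosure's networks.

```python
import numpy as np

rng = np.random.default_rng(2)
W_STYLE = rng.standard_normal((48, 8))  # stand-in style feature extractor

class FirstStyleConversionModel:
    """Binds one reference image's style code at construction, so that
    conversion afterwards takes only an original image."""
    def __init__(self, reference_image):
        self.style_code = reference_image.reshape(-1) @ W_STYLE
    def convert(self, original):
        # Placeholder fusion; a real model decodes content + style code.
        return np.tanh(original + self.style_code.mean())

model = FirstStyleConversionModel(rng.standard_normal((4, 4, 3)))
styled = model.convert(rng.standard_normal((4, 4, 3)))
```

Precomputing the style code is what makes the bound model behave as a single-input converter for one fixed target style.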
  • because the model structure of the first style conversion model is relatively complex, a mobile terminal device on which it is deployed may have insufficient computing power. Based on this, the first style conversion model can be deployed on the server, so that the server performs the style conversion processing on the image.
  • the technical solutions of the embodiments of the present disclosure can generate second style images of various styles based on the target image generation model, train the target model to be trained based on the second style images and the original images to obtain the target model to be used, and package the target model to be used with a pre-selected image of a certain style type to obtain the first style conversion model.
  • the first style conversion model can convert an input original image into a target special effect image consistent with the packaged style type, so that various collected images can be processed, improving the convenience of sample acquisition and image content processing.
  • Fig. 5 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • on the basis of the foregoing embodiments, the first training sample can be constructed based on the first style conversion model, and the style conversion model to be trained can then be trained based on the first training sample to obtain the target style conversion model.
  • for an example implementation, refer to the detailed description of this technical solution; technical terms that are the same as or correspond to those in the foregoing embodiments are not repeated here.
  • the method includes:
  • the first style conversion model cannot be directly deployed on terminal devices because of its high computing-power requirements. Based on this, corresponding training samples can be constructed based on the first style conversion model, and the target style conversion model trained on them can be deployed on the terminal device.
  • the first original image may be an image collected by a camera device in a real environment, or an image generated based on a certain image generation model. In order to improve the accuracy of model training, as many first original images as possible can be obtained.
  • the first original image may or may not have a corresponding style.
  • Brightness changes may be performed on the first original image, for example, brightness adjustment may be performed on the entire image, or only on the face region in the original image. Applying a random brightness correction makes the lighting conditions in the training samples more varied.
  • For example, the face pixels in the image can be extracted and their brightness increased.
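The face-only brightening step above can be sketched as follows. This is a minimal illustration, assuming the face region is supplied as a boolean mask from an external face-segmentation step; the patent does not specify the adjustment formula:

```python
import numpy as np

def brighten_face(image, face_mask, factor_range=(1.0, 1.3), rng=None):
    """Randomly brighten only the pixels covered by a face mask.

    image: H x W x 3 uint8 array; face_mask: H x W boolean array.
    The gain is drawn uniformly from factor_range (assumed form).
    """
    rng = np.random.default_rng(rng)
    factor = rng.uniform(*factor_range)   # random brightness gain
    out = image.astype(np.float32)
    out[face_mask] *= factor              # scale face pixels only
    return np.clip(out, 0, 255).astype(np.uint8)
```

Background pixels are left untouched, so only the lighting of the face region varies across augmented samples.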
  • the style type of the first style image is consistent with the target style type.
  • the first style image is generated based on the pre-trained first style transfer model.
  • The style type corresponding to the first style conversion model is the pre-bound image style type. That is, according to the user's preference among multiple style images, one of the image style types can be selected and bound to the pre-trained target model, so as to obtain the first style conversion model.
  • the first style conversion model can be deployed on the server, so that after receiving the image to be processed, the bound image and the image to be processed can be processed based on the first style conversion model to obtain the target special effect image of the target style type, and display on the client side.
  • the calculation amount of this model is very large, and it is not suitable for deployment on terminal devices. Therefore, corresponding training samples can be obtained based on the first style conversion model, and then the above-mentioned target style conversion model can be obtained by training based on the corresponding training samples.
  • the collected first original image is processed based on the pre-trained first style conversion model to obtain the first style image consistent with the style type of the first style conversion model. Based on this method, multiple training samples are obtained.
  • Obtaining the corresponding first style image based on the first original image may include: performing content extraction on the current first original image based on the content feature extractor to obtain image content features; performing style extraction, based on the style feature extractor, on a preset reference style image consistent with the target style type to obtain image style features; fusing the image content features and the image style features based on the feature fuser to obtain features to be compiled; and obtaining the first style image corresponding to the current first original image by processing the features to be compiled with the compiler.
  • the content of the first original image is obtained based on the content feature extractor
  • the style feature of the reference image is obtained based on the style feature extractor.
  • the image content and style features are fused to obtain the fused features.
  • the first original image may be subjected to style conversion processing to obtain a first style image of the first original image under the target style type. Based on the first original image and the corresponding first style image, a first training sample is determined.
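The extractor–fuser–compiler pipeline described above can be sketched as a simple composition of four stages. The functions below are toy stand-ins (channel statistics rather than learned networks) that illustrate only the data flow the patent describes, not its actual models:

```python
import numpy as np

def extract_content(image):
    # stand-in content feature extractor: per-channel mean
    return image.mean(axis=(0, 1))

def extract_style(reference):
    # stand-in style feature extractor: per-channel standard deviation
    return reference.std(axis=(0, 1))

def fuse(content_features, style_features):
    # stand-in feature fuser: concatenate the two feature vectors
    return np.concatenate([content_features, style_features])

def compile_features(fused, out_shape):
    # stand-in "compiler": decode the fused features back to image shape
    return np.broadcast_to(fused[:out_shape[-1]], out_shape).copy()

def first_style_image(original, reference):
    """Produce a first style image for one first original image."""
    fused = fuse(extract_content(original), extract_style(reference))
    return compile_features(fused, original.shape)
```

Running every collected first original image through `first_style_image` against one fixed reference image yields the paired samples used for training.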
  • the model parameters in the style transfer model to be trained are default values.
  • the object attribute may be the gender attribute of the target subject in the original image.
  • the style transfer model to be trained may be trained based on a plurality of first training samples to obtain a target style transfer model.
  • The processing manner for each first training sample is the same, so the processing of one training sample is used as an example for introduction.
  • the first training sample includes a corresponding original image and a first style image corresponding to the first original image.
  • The gender attribute of the target object in the first original image can be determined based on a corresponding algorithm, and the gender attribute and the first original image are input into the style conversion model to be trained.
  • The first style image corresponding to the first original image is used as the expected output of the style conversion model to be trained, and the style conversion model to be trained is trained to obtain the target style conversion model.
  • the size of the image needs to be scaled to a certain size, so as to improve the efficiency of the model for image processing.
  • the label whose gender attribute is female is set to 0
  • the label whose gender attribute is male is set to 1
  • Four input channels are constructed to train the target style conversion model, so that images with different gender attributes can be processed.
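One plausible reading of the four-channel construction (the patent does not spell out the layout) is the three RGB channels plus a constant plane carrying the 0/1 gender label described above:

```python
import numpy as np

GENDER_LABELS = {"female": 0.0, "male": 1.0}  # labels as set above

def build_model_input(rgb_image, gender):
    """Stack a constant gender-label plane onto an H x W x 3 image,
    giving the assumed four-channel input of the style conversion model."""
    height, width, _ = rgb_image.shape
    label_plane = np.full((height, width, 1), GENDER_LABELS[gender],
                          dtype=np.float32)
    return np.concatenate([rgb_image.astype(np.float32), label_plane], axis=2)
```

Encoding the attribute as an extra plane lets a single convolutional model condition on gender without any architectural change.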
  • the technical solution of the embodiment of the present disclosure can train the target style conversion model deployed on the terminal device, so that when the image to be processed is collected, the style type conversion can be performed on the image based on the client, which improves the convenience of image processing.
  • FIG. 6 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • the device includes: an image acquisition module 510 to be processed, a special effect image determination module 520 and an image display module 530 .
  • the to-be-processed image acquisition module 510 is set to acquire the to-be-processed image including the target subject;
  • the special effect image determination module 520 is set to input the subject attribute of the to-be-processed image and the target subject into the target style conversion model , to obtain the target special effect image converted from the target subject into the target style type;
  • the image display module 530 is configured to display the target special effect image in the image display area.
  • the device includes:
  • The first training sample acquisition module is configured to acquire a plurality of first training samples; wherein the first training samples include a first original image and a first style image consistent with the target style type, and the first style image is generated by the first style conversion model;
  • The first training module is configured to, for each first training sample, use the first original image in the current first training sample and the object attribute of the object to be processed as the input of the style conversion model to be trained, and use the first style image in the current first training sample as the output of the style model to be trained, so as to obtain the target style conversion model through training; wherein the object attribute matches the subject attribute.
  • the first training sample acquisition module includes:
  • a first original image acquisition unit configured to acquire at least one first original image
  • the first style image acquisition unit is configured to stylize the current first original image based on the first style conversion model for the first original image, to obtain a first style image consistent with the target style type;
  • the first training sample acquisition unit is configured to determine the plurality of first training samples based on the first original image and the corresponding first style image.
  • the first style conversion model includes a style feature extractor, a content feature extractor, a feature fusion device and a compiler, and the first style image acquisition unit is set to:
  • the device includes:
  • the first model construction unit is configured to construct a target model to be trained including a style processing model to be trained, a target discriminator, and a target style comparer; wherein, the target discriminator and the target style comparer are pre-trained
  • The model determination unit to be used is configured to train the target model to be trained according to at least one second style image of at least one style type to be selected and at least one second original image, to obtain the target model to be used; wherein the second style image is determined based on the target image generation model. The to-be-used style model determining unit is configured to take the style processing model to be trained, as trained in the target model to be used, as the style model to be used;
  • The first style model determination unit is configured to determine a reference image corresponding to the target style type, and determine the first style conversion model based on the reference image and the style model to be used; wherein the target style type is one of the at least one style type to be selected.
  • The style processing model to be trained includes: a style feature extractor to be trained, a content feature extractor to be trained, a feature fusion device to be trained, and a compiler to be trained, and the model determination unit to be used is set to:
  • A plurality of second training samples are obtained by randomly combining the second style images and the second original images; wherein each second training sample includes a second original image and a second style image. For each second training sample, the content splicing features of the second original image are obtained based on the content feature extractor to be trained, and the style splicing features of the second style image are obtained based on the style feature extractor to be trained.
  • The feature fusion model to be trained fuses the content splicing features and the style splicing features to obtain fusion features, and the fusion features are input into the compiler to be trained to obtain an actual output image. The actual output image and the second style image are input into the target discriminator to determine a first loss value, and into the target style comparer to determine a style loss value. Based on the first loss value and the style loss value, the model parameters in the style processing model to be trained are corrected.
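How the first loss value and the style loss value might be combined to correct the style processing model can be sketched as follows. The particular loss forms and the weighting are assumptions for illustration, not taken from the patent:

```python
import math

def first_loss(fake_score):
    # adversarial loss from the target discriminator's score on the
    # actual output image (assumed non-saturating form)
    return -math.log(max(fake_score, 1e-8))

def style_loss(actual_stats, target_stats):
    # distance between style statistics from the target style comparer
    return sum((a - t) ** 2
               for a, t in zip(actual_stats, target_stats)) / len(actual_stats)

def combined_loss(fake_score, actual_stats, target_stats, style_weight=1.0):
    """Total loss used to correct the style processing model parameters."""
    return first_loss(fake_score) + style_weight * style_loss(
        actual_stats, target_stats)
```

A gradient step on `combined_loss` pushes the output both toward fooling the discriminator and toward matching the reference style statistics.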
  • The first style model determining unit is configured to determine a target style type from the at least one style type to be selected, obtain a reference image consistent with the target style type, and package the reference image with the style model to be used to obtain the first style conversion model.
  • the device also includes:
  • the third style image acquiring unit is configured to acquire a third original image in an image to be used under the style type to be selected, and cut the image to be used to obtain a third style image;
  • the third output image acquisition unit is configured to input Gaussian noise into the image generation model to be trained to obtain a third output image
  • A model parameter correction unit configured to process the third output image and the third style image based on the first discriminator to determine a loss value, and correct the model parameters in the image generation model to be trained based on the loss value;
  • the target image generation model determination unit is configured to use the convergence of the loss function in the image generation model to be trained as the training target to obtain the target image generation model.
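A minimal sketch of the adversarial loop described above, with a one-parameter toy generator standing in for the image generation model to be trained and a statistics-matching stand-in for the first discriminator (all functional forms here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w_gen = 0.0  # single parameter of the toy "image generation model to be trained"

def generator(noise, w):
    # maps Gaussian noise to a "third output image" (here a 1-D sample)
    return w * noise

def discriminator_loss(sample, target_mean=2.0):
    # stand-in first discriminator: distance from the real-style statistics
    return (sample.mean() - target_mean) ** 2

for _ in range(200):
    noise = rng.normal(1.0, 0.1, size=8)  # Gaussian noise input
    loss = discriminator_loss(generator(noise, w_gen))
    # finite-difference correction of the model parameter based on the loss
    grad = (discriminator_loss(generator(noise, w_gen + 1e-4)) - loss) / 1e-4
    w_gen -= 0.05 * grad
# training target reached: the loss converges as the generator's output
# statistics approach those of the real style images
```

In the patent's setting the same convergence criterion applies, with the discriminator loss driving the generator toward the distribution of the third style images.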
  • the device includes:
  • the second style image acquisition unit is configured to process Gaussian noise based on the target image generation model to obtain a second style image
  • the second style image updating unit is configured to add expressions to the second style image based on the expression editing model generated through pre-training, and update the second style image.
  • the target style type includes Japanese style, Korean style, ancient costume style, comic style or multiple preset style types to be selected.
  • The subject attribute of the target subject in the image to be processed can be determined, and the subject attribute and the image to be processed can be used as the input parameters of the pre-trained target style conversion model to obtain a target special effect image in which the target subject in the image to be processed is converted into the target style type. The image to be processed can thus be converted into the style type, improving the user experience.
  • the image processing device provided by the embodiment of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The terminal equipment in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • An electronic device 500 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 501, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503.
  • The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500.
  • the processing device 501, ROM 502, and RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • The following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509.
  • the communication means 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data. While FIG. 7 shows electronic device 500 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 509, or from storage means 508, or from ROM 502.
  • When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • The electronic device provided by the embodiment of the present disclosure belongs to the same inventive concept as the image processing method provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
  • An embodiment of the present disclosure provides a computer storage medium, on which a computer program is stored, and when the program is executed by a processor, the image processing method provided in the foregoing embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • The client and the server can communicate using any currently known or future-developed network protocol, such as the Hypertext Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: acquires an image to be processed including a target subject; inputs the image to be processed and the subject attribute of the target subject into a target style conversion model to obtain a target special effect image in which the target subject is converted into a target style type; and
  • the target special effect image is displayed in the image display area.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides an image processing method, the method including:
  • an image to be processed including a target subject is acquired;
  • the image to be processed and a subject attribute of the target subject are input into a target style conversion model to obtain a target special effect image in which the target subject is converted into a target style type;
  • the target special effect image is displayed in the image display area.
  • Example 2 provides an image processing method, the method further includes:
  • wherein the first training samples include a first original image and a first style image consistent with the target style type; the first style image is generated by the first style conversion model;
  • the first original image in the current first training sample and the object attribute of the object to be processed are used as the input of the style conversion model to be trained, and the first style image in the current first training sample is used as the output of the style model to be trained, so as to train and obtain the target style conversion model;
  • the object attribute matches the subject attribute.
  • Example 3 provides an image processing method, wherein,
  • the acquisition of multiple first training samples includes:
  • For the first original image, the current first original image is stylized based on the first style conversion model to obtain a first style image consistent with the target style type;
  • the plurality of first training samples is determined based on the first original image and the corresponding first style image.
  • Example 4 provides an image processing method, wherein the first style conversion model includes a style feature extractor, a content feature extractor, a feature fusion unit, and a compiler, and the stylization processing of the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type includes:
  • Example 5 provides an image processing method, the method further includes:
  • constructing a target model to be trained including a style processing model to be trained, a target discriminator, and a target style comparer; wherein the target discriminator and the target style comparer are pre-trained;
  • the constructed target model to be trained is trained according to at least one second style image of at least one style type to be selected and at least one second original image, to obtain the target model to be used; wherein the second style image is determined based on the target image generation model;
  • the style processing model to be trained, as trained in the target model to be used, is taken as the style model to be used;
  • the target style type is one of the at least one style type to be selected.
  • Example 6 provides an image processing method, wherein the style processing model to be trained includes: a style feature extractor to be trained, a content feature extractor to be trained, a feature fusion device to be trained, and a compiler to be trained; and the training of the target model to be trained according to the plurality of second style images of at least one style type to be selected and the at least one second original image, to obtain the target model to be used, includes:
  • a plurality of second training samples are obtained by randomly combining the second style image and the second original image; wherein, the second training samples include a second original image and a second style image;
  • the content splicing features of the second original image are obtained based on the content feature extractor to be trained, and the style splicing features of the second style image are obtained based on the style feature extractor to be trained;
  • the feature fusion model to be trained fuses the content splicing features and the style splicing features to obtain fusion features, and the fusion features are input into the compiler to be trained to obtain the actual output image;
  • Example 7 provides an image processing method, wherein the determining a reference image corresponding to the target style type, and determining the first style conversion model based on the reference image and the style model to be used, includes:
  • Example 8 provides an image processing method, the method further includes:
  • Gaussian noise is input into the image generation model to be trained to obtain a third output image;
  • Example 9 provides an image processing method, the method further includes:
  • Example 10 provides an image processing method, wherein the target style type includes a Japanese style, a Korean style, an ancient costume style, a comic style, or multiple preset style types to be selected.
  • Example Eleven provides an image processing device, which includes:
  • the image acquisition module to be processed is configured to acquire the image to be processed including the target subject
  • the special effect image determination module is configured to input the image to be processed and the subject attributes of the target subject into the target style conversion model to obtain a target special effect image that converts the target subject into a target style type;
  • the image display module is configured to display the target special effect image in the image display area.
  • the technical solution of the embodiment of the present disclosure realizes the conversion of the image to be processed into an image with the target theme style, which improves the richness of the image display content, the novelty of the theme style, and the adaptability between the theme style and the user.

Abstract

An image processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring an image to be processed comprising a target subject (S110); inputting the image to be processed and a subject attribute of the target subject into a target style conversion model to obtain a target special effect image in which the target subject has been converted into a target style type (S120); and displaying the target special effect image in an image display area (S130).

Description

Image processing method, apparatus, electronic device and storage medium
This application claims priority to the Chinese patent application with application number 202111641158.3 filed with the China Patent Office on December 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, for example, to an image processing method, apparatus, electronic device, and storage medium.
Background Art
Image style transfer can be understood as rendering an image into an image with a specific artistic style. In the related art, image style transfer is mostly implemented by texture synthesis; alternatively, a style transfer model is trained so that an image is converted into a certain style based on the model.
However, training a style transfer model requires a large amount of style data, which is difficult to collect in practice, and a model trained under such conditions cannot achieve a good style transfer effect. At the same time, the style transfer model in the related art cannot process images with different subject attributes, so the resulting style transfer images are of poor quality, which in turn degrades the user experience.
发明内容Contents of the invention
本公开提供一种图像处理方法、装置、电子设备及存储介质,以得到目标风格类型的特效图像,提高图像内容显示丰富性。The present disclosure provides an image processing method, device, electronic equipment, and storage medium, so as to obtain a special effect image of a target style type and improve the display richness of image content.
第一方面,本公开实施例提供了一种图像处理方法,该方法包括:In a first aspect, an embodiment of the present disclosure provides an image processing method, the method including:
获取包括目标主体的待处理图像;Obtain an image to be processed including a target subject;
将所述待处理图像和所述目标主体的主体属性,输入至目标风格转换模型中,得到将目标主体转换为目标风格类型的目标特效图像;Inputting the image to be processed and the subject attributes of the target subject into the target style conversion model to obtain a target special effect image that converts the target subject into a target style type;
将所述目标特效图像于图像展示区域中显示。The target special effect image is displayed in the image display area.
第二方面,本公开实施例还提供了一种图像处理装置,该装置包括:In a second aspect, an embodiment of the present disclosure further provides an image processing device, which includes:
待处理图像采集模块,设置为获取包括目标主体的待处理图像;The image acquisition module to be processed is configured to acquire the image to be processed including the target subject;
特效图像确定模块,设置为将所述待处理图像和所述目标主体的主体属性,输入至目标风格转换模型中,得到将目标主体转换为目标风格类型的目标特效图像;The special effect image determination module is configured to input the image to be processed and the subject attributes of the target subject into the target style conversion model to obtain a target special effect image that converts the target subject into a target style type;
图像显示模块,设置为将所述目标特效图像于图像展示区域中显示。The image display module is configured to display the target special effect image in the image display area.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors; and
a storage apparatus configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method according to any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the image processing method according to any embodiment of the present disclosure.
Brief Description of the Drawings
Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a target model to be trained provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description of Embodiments
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit steps that are shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
It should be noted that terms such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "a plurality of" in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Before the technical solution is introduced, its application scenarios are described by way of example. The technical solution of the present disclosure can be applied to any image style conversion scenario, for example, converting a captured still image into an image of a certain theme style, where the theme style may be a Japanese style, a Korean style, or any theme style created by a designer. It can also be applied to special-effect video shooting, for example, converting one user in the captured picture, or all users in the entire picture, into a video of a certain theme style. The style type may match a makeup style, and the picture the user appears in may likewise be converted to match that makeup style.
In this embodiment, the user in each video frame to be processed may be displayed in the corresponding style theme, or the entire video frame may be converted to a certain theme style.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. This embodiment is applicable to converting an image into a target style type in any Internet-supported image display scenario. The method may be performed by an image processing apparatus, which may be implemented in software and/or hardware, for example, by an electronic device such as a mobile terminal, a PC, or a server. An image display scenario is usually implemented by a client and a server in cooperation; the method provided in this embodiment may be performed by the server, by the client, or by the client and the server in cooperation.
As shown in FIG. 1, the method includes:
S110. Acquire an image to be processed that includes a target subject.
The apparatus performing the image processing method provided by the embodiments of the present disclosure may be integrated into application software that supports an image processing function, and the software may be installed on an electronic device such as a mobile terminal or a PC. The application software may be any software for image/video processing; the specific applications are not enumerated here, as long as image/video processing can be realized. It may also be a specially developed application for adding and displaying special effects, or a page integrated on a PC through which the user performs special-effect processing.
The image to be processed may be an image captured by the application software. In practice, an image including the target subject may be captured in real time, and style-type conversion may then be applied to the image the target subject belongs to; for example, the video frame or image containing the target subject is processed into the style type corresponding to the target style conversion. A special effect may also be configured for that style type, and once the user is detected triggering the effect, all captured pictures are converted into the corresponding style type.
In an image or video shooting scenario, there may be multiple subjects in the captured picture; for example, in a crowded scene, all users appearing in the picture may be taken as target subjects. Alternatively, one or more users may be marked as the target subject before the effect is added; correspondingly, when an image to be processed is captured and determined to include the target subject, it is processed.
In other words, if a target video corresponding to the target style type needs to be generated, a style-theme conversion control may be triggered. Meanwhile, the image to be processed may be captured by a camera deployed on the terminal device. The image to be processed may or may not include the target subject; an image randomly obtained from a webpage may also serve as the image to be processed. The image to be processed can then be converted into a target special-effect image consistent with the target style type.
S120. Input the image to be processed and the subject attribute of the target subject into the target style conversion model to obtain a target special-effect image in which the target subject is converted to the target style type.
The image to be processed may be a randomly captured or downloaded image, or an image captured in real time. The target style conversion model is a pre-trained model that converts the image to be processed into the corresponding style type; it may be a GAN based on a residual (res) structure. The target special-effect image is the image produced by the target style conversion model: in it, either the target subject alone, or the entire image to be processed, is rendered in the target theme style.
It should also be noted that the image style output by the target style conversion model is consistent with the style type of the training samples used to train it. For example, if the style images in the training samples are of style A, the target style type is style A and the model outputs style-A images; if they are of style B, the model outputs style-B images. That is, the style type the model outputs matches the style type used during training.
The subject attribute may be a gender attribute or a style-type attribute of the target subject; for example, the gender attribute may be male or female, and the style-type attribute may be a preset style type. If the target style conversion model is to output images of one style type among several, the style types and gender types corresponding to different labels can be defined in the alpha channel. After the image to be processed is acquired, it can be processed into the image of the matching style type according to its gender attribute and/or style attribute. The reason for defining subject attributes is to avoid the related-art need to train a separate model for each gender, which consumes memory. Instead, after the image to be processed is acquired, the subject attribute of the target subject is identified and fed into the target style conversion model as alpha-channel information, so that a single model performs style conversion on image content of different gender attributes across different images to be processed. Likewise, in the related art, supporting multiple style types requires training a model per style type, so a single model cannot perform multiple style-type conversions.
It should also be noted that, to convert to a certain style type, multiple style types may be displayed on the interface for the user to choose from; the label information in the alpha channel is determined according to the user's selection, so a single target style conversion model yields the target special-effect image of the corresponding style type.
It should also be noted that, to achieve the above effect, the input of the model during training must include not only the images whose style type is to be converted but also the edited alpha-channel label values, so that a target special-effect conversion model capable of executing this technical solution is obtained.
For example, after the image to be processed is captured, the subject attribute of the target object in it may be determined by a subject-attribute recognition module deployed on the terminal. The subject attribute and the image to be processed are then input together into the target style conversion model to obtain a target special-effect image in which the target subject, or the entire image to be processed, is converted to the target style type.
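The attribute-as-alpha-channel mechanism described above can be sketched in a few lines. The attribute names and label values below are illustrative assumptions only; the publication does not specify a concrete encoding:

```python
import numpy as np

# Hypothetical label table: the publication defines labels per
# gender/style combination in the alpha channel but gives no concrete
# values, so these codes are made up for illustration.
ATTRIBUTE_LABELS = {
    ("female", "style_a"): 0.25,
    ("male",   "style_a"): 0.50,
    ("female", "style_b"): 0.75,
    ("male",   "style_b"): 1.00,
}

def attach_attribute_channel(rgb, gender, style):
    """Append the subject attribute as a constant alpha channel (H, W, 4)."""
    h, w, _ = rgb.shape
    label = ATTRIBUTE_LABELS[(gender, style)]
    alpha = np.full((h, w, 1), label, dtype=rgb.dtype)
    return np.concatenate([rgb, alpha], axis=-1)

rgb = np.zeros((384, 384, 3), dtype=np.float32)
rgba = attach_attribute_channel(rgb, "female", "style_b")
print(rgba.shape)     # (384, 384, 4)
```

A single 4-channel input of this shape lets one model branch on subject attributes without maintaining per-gender or per-style networks, which is the memory saving the passage describes.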
On the basis of the above technical solution, before the image to be processed is input into the target style conversion model, the method further includes: scaling the image to be processed to a fixed size, which may be 384*384 pixels. Correspondingly, during model training, the same processing may be applied to the obtained samples. When the technical solution is deployed on a terminal device, the image to be processed can be responded to quickly and the corresponding target special-effect image obtained, which improves image processing efficiency.
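A minimal, dependency-free sketch of the 384*384 scaling step. Nearest-neighbour sampling is used here only to keep the example self-contained; a real deployment would more likely use a bilinear resize from an image library:

```python
import numpy as np

def resize_nearest(img, size=(384, 384)):
    """Nearest-neighbour resize to the model's fixed input size.
    Illustrative stand-in for a library resize (e.g. bilinear)."""
    h, w = img.shape[:2]
    th, tw = size
    rows = np.arange(th) * h // th   # source row for each target row
    cols = np.arange(tw) * w // tw   # source column for each target column
    return img[rows][:, cols]

frame = np.random.rand(720, 1280, 3).astype(np.float32)  # e.g. a camera frame
model_input = resize_nearest(frame)
print(model_input.shape)  # (384, 384, 3)
```

Applying the identical resize at training and inference time, as the passage recommends, keeps the deployed model's input distribution consistent with what it saw during training.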
S130. Display the target special-effect image in the image display area.
The image display area is the area in which the target special-effect image is displayed.
For example, after the target special-effect image is obtained, it may be presented on the display interface.
Correspondingly, to make the display more engaging and easier to compare, the image display area may be divided into two regions, side by side or top and bottom. One region displays the target special-effect image and the other displays the originally captured image to be processed. When the user triggers the split-screen control or a split-screen instruction, the images are displayed in this manner.
It should also be noted that, to form a target video, special-effect processing may be applied to the images to be processed that are captured in real time, and the resulting target special-effect images are stitched together in sequence to obtain the target video.
In this embodiment, the target style type includes at least one of a Japanese style, a Korean style, an ancient-costume style, a comic style, or multiple preset candidate style types. The ancient-costume style may include the style of any dynasty.
In the technical solution of this embodiment, when an image to be processed including a target subject is acquired, the subject attribute of the target subject is determined; the subject attribute and the image to be processed serve as input parameters of the pre-trained target style conversion model, yielding a target special-effect image in which the target subject is converted to the target style type. The image to be processed thus undergoes style-type conversion, improving the user experience.
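The frame-by-frame stitching into a target video can be illustrated with a toy accumulator; a real implementation would hand the styled frames to a video encoder, which is outside the scope of this sketch:

```python
import numpy as np

class EffectVideoBuilder:
    """Collect per-frame special-effect images in capture order and track
    the resulting clip length. Illustrative stand-in for a video encoder."""

    def __init__(self, fps=30):
        self.fps = fps
        self.frames = []

    def append(self, styled_frame):
        # Each styled_frame is one target special-effect image.
        self.frames.append(styled_frame)

    def duration_seconds(self):
        return len(self.frames) / self.fps

builder = EffectVideoBuilder(fps=30)
for _ in range(90):                     # 90 styled frames at 30 fps
    builder.append(np.zeros((384, 384, 3), dtype=np.uint8))
print(builder.duration_seconds())      # 3.0
```

Appending frames in capture order preserves the temporal ordering the passage requires when the special-effect images are concatenated into the target video.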
FIG. 2 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. On the basis of the foregoing embodiment, this embodiment first introduces how to train the target image generation model, so that second training samples can be determined based on it and the first style conversion model trained; on the basis of the first style conversion model, the target style conversion model can then be trained. For an example implementation, see the detailed description of this technical solution. Technical terms identical or corresponding to those in the above embodiments are not repeated here.
As shown in FIG. 2, the method includes:
S210. Acquire an image to be used of the third original image under a candidate style type, and crop the image to be used to obtain a third style image.
The third original image including facial information may be captured in a real environment, downloaded from a webpage, or randomly generated by a facial image generation model. The image to be used is the third original image after style processing; for example, it may be an image created by a designer to correspond to a certain style type. The third style image is an image in which the original image is hand-drawn in one or more style types. The style types may vary widely, and each of them serves as a candidate style type. The image obtained by converting the third original image into a certain style type is taken as the image to be used; that is, the image to be used is an image of a certain style type.
Images in the related art mostly contain only facial information, so the trained model can only style-process facial images, which lowers image realism. For this reason, the image to be used may be cropped. Cropping can be understood as aligning on the nose tip and the eye centers as reference points, or on the chin center and the eye centers as reference points, and adjusting the display proportion of the facial image so as to enlarge the cropping range, keeping the entire head (including hair) and face in the frame together with the corresponding background information.
It should be noted that a limited number of third original images and corresponding images to be used are acquired, and the image generation model to be trained is trained on these limited third original images and corresponding images to be used to obtain the target image generation model.
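One way to realize the enlarged crop described above is to derive a square box from the facial reference points (eye centers and chin center). The expansion factor below is a made-up illustration, not a value from the publication:

```python
import numpy as np

def head_crop_box(left_eye, right_eye, chin, expand=2.2):
    """Square crop box anchored on the eye centres and chin centre,
    expanded so the crop keeps the whole head plus background.
    The expand factor is an illustrative assumption."""
    eyes = (np.asarray(left_eye, float) + np.asarray(right_eye, float)) / 2.0
    chin = np.asarray(chin, dtype=float)
    face_h = np.linalg.norm(chin - eyes)      # eye-to-chin distance
    half = face_h * expand                    # half-width of the crop
    cx, cy = (eyes + chin) / 2.0              # crop centre between eyes and chin
    return (cx - half, cy - half, cx + half, cy + half)

box = head_crop_box(left_eye=(150, 200), right_eye=(250, 200), chin=(200, 330))
print(tuple(round(v, 1) for v in box))
```

Because the box is scaled from the eye-chin distance rather than a tight face detector output, the crop naturally includes hair and background, which is what lets the downstream model learn more than bare facial style.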
S220. Input Gaussian noise into the image generation model to be trained to obtain a third output image.
The image generation model to be trained may be a styleganv2 model, whose parameters are at their default values at this point. The third output image is an image of some style randomly generated by the model; that is, its style type is not fixed. The Gaussian noise is randomly sampled, and by processing it the model produces third output images of different style types.
In this embodiment, the image generation model to be trained is trained to obtain the target image generation model, which can generate sample data of different style types; a model capable of converting between different style types can then be trained on this sample data.
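Sampling a Gaussian latent vector and decoding it into an image can be sketched with a toy linear generator; this stands in for the styleganv2 network, which is in reality a deep convolutional model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a styleganv2-like generator: a fixed random linear map
# from a 512-d Gaussian latent to a small RGB image (illustrative only).
W = rng.standard_normal((512, 16 * 16 * 3)).astype(np.float32) * 0.02

def generate(z):
    img = np.tanh(z @ W)          # squash to [-1, 1], like typical GAN output
    return img.reshape(16, 16, 3)

z = rng.standard_normal(512).astype(np.float32)  # randomly sampled Gaussian noise
sample = generate(z)
print(sample.shape)
```

Each fresh draw of `z` yields a different output image, which is why repeatedly sampling noise through the (eventually trained) generator produces style samples in arbitrary quantity.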
S230. Process the third output image and the third style image with a first discriminator, determine a loss value, and correct the model parameters of the image generation model to be trained based on the loss value.
The inputs of the first discriminator are the third output image and the third style image. The first discriminator is configured to determine a loss value between the third output image and the third style image, according to which the model parameters of the image generation model to be trained are corrected.
S240. Take convergence of the loss function of the image generation model to be trained as the training objective to obtain the target image generation model.
The target image generation model is the image generation model obtained at the end of training. The above steps are repeated over multiple training samples until the loss function is detected to converge, and the image generation model obtained at that point is taken as the target image generation model.
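A standard adversarial objective consistent with this discriminator setup is the binary cross-entropy GAN loss; the discriminator scores below are illustrative numbers, not outputs of the actual model:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy; pred are probabilities in (0, 1)."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

# Hypothetical discriminator probabilities for the two kinds of input.
d_fake = np.array([0.3, 0.4])   # scores on third output images (generated)
d_real = np.array([0.8, 0.9])   # scores on third style images (hand-crafted)

d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)   # discriminator objective
g_loss = bce(d_fake, 1.0)                       # drives the generator update
print(round(d_loss, 3), round(g_loss, 3))
```

`g_loss` is the loss value fed back to correct the generator's parameters: it is large while the discriminator confidently rejects generated images and shrinks as the generator's outputs approach the style distribution.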
S250. Process Gaussian noise with the target image generation model to obtain the second style images.
For example, randomly sampled Gaussian noise can be processed by the target image generation model to obtain the second style images used to train the first style conversion model. The number of second style images can be as large as desired.
It should be noted that the style types of the second style images may be the same or different. Of course, so that the trained target model can generate images of different style types, the second style images should cover as many style types as possible.
S260. Add expressions to the second style images based on a pre-trained expression editing model, and update the second style images.
To enrich the samples for training the first style conversion model, the second style images may be further processed while the originals are retained. The expression editing model is a model that adds facial expressions to the target subject in a second style image. The facial expression may be an open mouth (the degree of opening may vary), a smile, a laugh, and so on; the specific expressions are not limited in this embodiment.
In other words, after the second style images are obtained, each may be input into the expression editing model, while the original is retained, to add expressions to the user in the image, yielding second style images with varied expression content. From the original second style images and the expression-edited second style images, training samples for the first style model are obtained.
With the technical solution of this embodiment, the target image generation model obtained by training processes Gaussian noise to obtain second style images of multiple style types; expressions are then added based on the expression editing model to obtain second style images with varied expression content. The first style conversion model is trained on the second style images, the training samples for the target style conversion model are determined based on the first style conversion model, and the target style conversion model is obtained from those samples. This avoids the related-art situation in which samples of uneven quality prevent effective style conversion of the original image; the image to be processed can be style-converted, improving the match between the converted image and the user and thereby the user experience.
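The retain-and-augment step can be sketched as follows, with a trivial pixel perturbation standing in for the learned expression editing model (the edited region and strength are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def edit_expression(style_img, strength=0.3):
    """Stand-in for the expression editing model: brighten a rough mouth
    region. The real model is a learned editor; region/strength are
    made-up illustration values."""
    out = style_img.copy()
    h, w = out.shape[:2]
    out[int(h * 0.65):int(h * 0.85), int(w * 0.3):int(w * 0.7)] += strength
    return np.clip(out, 0.0, 1.0)

base_samples = [rng.random((64, 64, 3)).astype(np.float32) for _ in range(4)]
# Originals are retained alongside the expression-edited variants.
augmented = base_samples + [edit_expression(img) for img in base_samples]
print(len(augmented))  # 8
```

Keeping the originals while appending the edited variants doubles the sample count without discarding any style information, which is exactly the enrichment the passage is after.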
FIG. 3 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. On the basis of the foregoing embodiments, after the second style images are obtained from the target image generation model, the first style conversion model can be trained on the second style images and second original images captured in a real environment. For an example implementation, see the detailed description of this technical solution. Technical terms identical or corresponding to those in the above embodiments are not repeated here.
As shown in FIG. 3, the method includes:
S310. Determine at least one second style image.
Second style images of multiple style types can be determined based on the foregoing embodiments.
S320. Construct a target model to be trained that includes a style processing model to be trained, a target discriminator, and a target style comparator.
Before the first style conversion model is obtained, the target model to be trained must first be constructed and trained to obtain the target model to be used. The first style conversion model is then obtained by processing the target model.
It should also be noted that the target discriminator and the target style comparator are pre-trained models.
The structures of the target model to be trained and of the style processing model to be trained can be understood with reference to FIG. 4. As shown in FIG. 4, the target model to be trained includes the style processing model to be trained, the target discriminator, and the target style comparator. The output of the style processing model to be trained is the input of both the target discriminator and the target style comparator, and their outputs are used to correct the model parameters of the target model to be trained so as to obtain the target model to be used. Still referring to FIG. 4, the style processing model to be trained includes: a style feature extractor to be trained, a content feature extractor to be trained, a feature fusion unit to be trained, and a compiler to be trained. The style model to be trained is a GAN (Generative Adversarial Network) model based on the starganv2 structure. This model is mainly configured to generate unpaired data in batches, i.e., the second style images.
S330、根据至少一种待选择风格类型的多幅第二风格图像,以及至少一幅第二原始图像,对构建的目标待训练模型进行训练,得到目标待使用模型。S330. Train the constructed target model to be trained according to multiple second style images of at least one style type to be selected and at least one second original image to obtain a target model to be used.
需要说明的是,目标待训练模型的输入为两幅图像,一幅图像为真实环境中采集的图像,即第二原始图像;另一幅图像为基于图像生成模型生成的某种风格的图像。多幅第二风格图像的风格类型可以相同可以不同。可以将第二风格图像对应的风格类型作为待选择风格类型。目标待使用模型为基于第二风格图像和第二原始图像,训练得到的模型。It should be noted that the input of the target model to be trained is two images, one image is the image collected in the real environment, that is, the second original image; the other image is an image of a certain style generated based on the image generation model. The style types of the multiple second style images may be the same or different. The style type corresponding to the second style image may be used as the style type to be selected. The target model to be used is a model obtained through training based on the second style image and the second original image.
In this embodiment, the target model to be trained may be trained on the second original images and the second style images as follows: a plurality of second training samples are obtained by combining the second style images and the second original images, where each second training sample includes one second original image and one second style image. For each second training sample, the content splicing features of the second original image are obtained by the content feature extractor to be trained, and the style splicing features of the second style image are obtained by the style feature extractor to be trained; the content splicing features and the style splicing features are fused by the feature fusion model to be trained to obtain fused features, and the fused features are input into the compiler to be trained to obtain an actual output image. The actual output image and the second style image are input into the target discriminator to determine a first loss value, and into the target style comparator to determine a style loss value. Based on the first loss value and the style loss value, the model parameters of the style processing model to be trained within the target model to be trained are corrected, and convergence of the loss function of the style processing model to be trained is taken as the training objective, so as to obtain the target model to be used.
The second style images and the second original images may be randomly combined to obtain a plurality of second training samples, each of which includes one second style image and one second original image. Since every second training sample is processed in the same way, the processing of one training sample is described as an example.
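The random combination step above can be sketched as follows. The seed-based sampling is an illustrative choice for reproducibility; the embodiment only requires that each sample pair one second original image with one second style image.

```python
import random

def build_second_training_samples(second_originals, second_style_images,
                                  n_samples, seed=0):
    """Randomly pair second original images with second style images
    to form second training samples (original, style) tuples."""
    rng = random.Random(seed)
    return [(rng.choice(second_originals), rng.choice(second_style_images))
            for _ in range(n_samples)]

samples = build_second_training_samples(["orig_a", "orig_b"],
                                        ["style_x", "style_y"], 4)
```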
For example, a second original image and a second style image are input into the target model to be trained. The content feature extractor to be trained obtains the image content of the second original image, i.e., the content splicing features; the style feature extractor to be trained obtains the style splicing features of the second style image. The feature fusion model to be trained fuses the content splicing features and the style splicing features to obtain fused features, and the compiler to be trained processes the fused features to obtain an actual output image. Ideally, the actual output image should contain the image content of the second original image together with the style features of the second style image. However, because the model parameters of the style processing model to be trained are initialized to default values, the actual output image differs to some extent from the ideal result. At this point, the target discriminator and the target style comparator in the target model to be trained are used: inputting the actual output image and its corresponding second style image into the target discriminator yields the first loss value, and inputting the actual output image and its corresponding second style image into the target style comparator yields the style loss value.
Based on the first loss value and the style loss value, the model parameters of the style processing model to be trained can be corrected. Convergence of the loss function of the style processing model to be trained is taken as the training objective, and the target model to be used is obtained.
Exemplarily, let the second original image be A and the second style image be B. After A and B are input into the target model to be trained, the content feature extractor to be trained obtains the image content of original image A, and the style feature extractor to be trained obtains the style features of style image B. The feature fusion model to be trained splices the image content and the style features to obtain the fused features corresponding to an actual output image C, and the compiler to be trained processes the fused features to produce the actual output image C. Inputting C and B into the target style comparator yields the style loss value, and inputting C and B into the target discriminator yields the first loss value. Based on the first loss value and the style loss value, the model parameters of the style processing model to be trained in the target model to be trained are corrected until the loss function of the style processing model to be trained converges, and the target model to be used is obtained.
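How the two loss values could be combined before the parameter correction can be sketched numerically. The non-saturating GAN loss for the first loss value and the mean-squared style-feature distance for the style loss are common choices and are assumptions here, not the formulas of this embodiment; the `style_weight` balancing factor is likewise hypothetical.

```python
import math

def first_loss_value(real_score, fake_score):
    """Hypothetical discriminator-based loss (non-saturating GAN form)."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return -math.log(sigmoid(real_score)) - math.log(1.0 - sigmoid(fake_score))

def style_loss_value(output_features, style_features):
    """Hypothetical style comparator: mean squared distance between
    the style features of image C and those of image B."""
    return sum((o - s) ** 2
               for o, s in zip(output_features, style_features)) / len(style_features)

def total_loss(first_loss, style_loss, style_weight=1.0):
    """Combine both loss values into the quantity driving the correction."""
    return first_loss + style_weight * style_loss
```

Training would repeat this computation over the second training samples and update the parameters of the style processing model to be trained until `total_loss` stops decreasing, i.e., the loss function converges.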
S340. Take the trained style processing model in the target model to be used as the style model to be used.
For example, after the target model to be used is obtained, the target discriminator and the target style comparator may be removed from it, i.e., only the trained style processing model is retained, thereby obtaining the style model to be used. In practical applications, after the style model to be used has been trained, an original image and an image of a preferred style type can be input into it to obtain an image consistent with that style type, and the content of the obtained image is consistent with the content of the original image.
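The removal step above can be sketched as follows. Modelling the target model to be used as a dict of named components is an assumption for illustration; in a real framework the training-only sub-modules would simply be dropped from the module graph.

```python
# The discriminator and style comparator are needed only during training;
# for deployment, keep just the trained style processing model.
TRAINING_ONLY = {"target_discriminator", "target_style_comparator"}

def extract_style_model(target_model_to_be_used):
    """Return only the components forming the style model to be used."""
    return {name: part for name, part in target_model_to_be_used.items()
            if name not in TRAINING_ONLY}

full_model = {
    "style_extractor": "...",
    "content_extractor": "...",
    "feature_fuser": "...",
    "compiler": "...",
    "target_discriminator": "...",
    "target_style_comparator": "...",
}
style_model_to_be_used = extract_style_model(full_model)
```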
S350. Determine a reference image of the target style type, and determine the first style conversion model based on the reference image and the style model to be used.
The target style type is the style type finally selected according to the user's preference, and accordingly the reference image is an image consistent with the target style type. The reference image may be bound to the style model to be used, so that after an original image is input, the style features of the reference image are extracted by the style model to be used and fused with the image content of the first original image, producing an image consistent with the target style type. The model obtained by binding the reference image to the style model to be used serves as the first style conversion model.
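One way the binding could work is sketched below: the reference image's style features are computed once when the model is bound, so that at inference only the content path runs per input. This one-time precomputation is a design inference, not something the embodiment mandates, and all components are stand-in lambdas.

```python
def make_first_style_conversion_model(style_extractor, content_extractor,
                                      feature_fuser, compiler, reference_image):
    """Bind a reference image to the style model to be used, yielding a
    one-argument conversion function (the first style conversion model)."""
    bound_style = style_extractor(reference_image)  # computed once at binding time
    def convert(original_image):
        return compiler(feature_fuser(content_extractor(original_image), bound_style))
    return convert

model = make_first_style_conversion_model(
    style_extractor=lambda img: [p * 0.5 for p in img],   # stand-in
    content_extractor=lambda img: list(img),              # stand-in
    feature_fuser=lambda c, s: [a + b for a, b in zip(c, s)],
    compiler=lambda f: f,
    reference_image=[0.2, 0.4],
)
styled = model([0.1, 0.1])
```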
It should be noted that because the structure of the first style conversion model is relatively complex, deploying it on a mobile terminal device may result in insufficient computing power. For this reason, the first style conversion model may be deployed on the server side, so that the server performs the style conversion processing on images.
According to the technical solution of this embodiment of the present disclosure, second style images of multiple style types can be generated based on the target image generation model; the target model to be trained is trained on the second style images and the original images to obtain the target model to be used; and the target model to be used is packaged with a pre-selected image of a certain style type to obtain the first style conversion model. The first style conversion model can convert an input original image into a target special-effect image consistent with the packaged style type, so that various captured images can be processed, improving the convenience of sample acquisition and of image content processing.
Fig. 5 is a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. On the basis of the foregoing embodiments, after the first style conversion model is obtained, first training samples can be constructed based on the first style conversion model, and the style conversion model to be trained is then trained on the first training samples to obtain the target style conversion model. For an example implementation, refer to the detailed description of this technical solution. Technical terms that are the same as or correspond to those in the foregoing embodiments are not repeated here.
As shown in Fig. 5, the method includes:
It should be noted that, as described above, the first style conversion model cannot be deployed directly on a terminal device because of its high computing-power requirements. Accordingly, corresponding training samples can be constructed based on the first style conversion model, and a target style conversion model that can be deployed on the terminal device is obtained through training.
S410. Acquire at least one first original image.
The first original image may be an image captured by a camera device in a real environment, or an image generated by some image generation model. To improve the accuracy of model training, as many first original images as possible may be acquired. A first original image may or may not have a corresponding style.
It should also be noted that various brightness changes may be applied to the first original image: for example, the brightness of the entire image may be adjusted, or only the face region of the original image may be changed in brightness. Such random brightness correction makes the lighting conditions seen by the trained network more varied. Meanwhile, to emphasize the style conversion effect on the face, the facial pixels in the image can be extracted and brightened.
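A minimal sketch of this augmentation on a flat list of pixel intensities follows. The gain range `[0.8, 1.2]` and the extra face-brightening factor `1.1` are illustrative assumptions; the embodiment only specifies random whole-image brightness changes plus optional brightening of facial pixels.

```python
import random

def random_brightness(pixels, face_mask=None, low=0.8, high=1.2, seed=None):
    """Scale all pixels by a random gain; if a face mask is given,
    additionally brighten the facial pixels. Values are clamped to [0, 1]."""
    gain = random.Random(seed).uniform(low, high)
    out = [min(1.0, p * gain) for p in pixels]
    if face_mask is not None:
        out = [min(1.0, v * 1.1) if is_face else v
               for v, is_face in zip(out, face_mask)]
    return out
```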
S420. For each first original image, stylize the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type.
The style type of the first style image is consistent with the target style type, and the first style image is generated by the pre-trained first style conversion model. The style type corresponding to the first style conversion model is determined by the pre-bound image: according to the user's preference among multiple style images, an image of one style type is selected and bound to the pre-trained target model, thereby obtaining the first style conversion model. The first style conversion model may be deployed on the server side, so that after an image to be processed is received, the bound image and the image to be processed are processed by the first style conversion model to obtain a target special-effect image of the target style type, which is then displayed on the client. However, the computational cost of this model is very high, making it unsuitable for deployment on terminal devices; therefore, training samples can be derived from the first style conversion model, and the above target style conversion model is then trained on those samples.
For example, the acquired first original images are processed by the pre-trained first style conversion model to obtain first style images consistent with the style type of the first style conversion model. In this way, multiple training samples are acquired.
In this embodiment, obtaining the first style image corresponding to a first original image may proceed as follows: the content feature extractor performs content extraction on the current first original image to obtain image content features; the style feature extractor performs style feature extraction on a preset reference style image consistent with the target style type to obtain image style features; the feature fuser fuses the image content features and the image style features to obtain features to be compiled; and the compiler processes the features to be compiled to obtain the first style image corresponding to the current first original image.
For example, with reference to Fig. 4, after the first original image is input into the first style conversion model, the content feature extractor obtains the content of the first original image, the style feature extractor obtains the style features of the reference image, and the feature fuser fuses the image content and the style features to obtain fused features. The fused features are input into the compiler to obtain an image of the first original image in the target style type.
S430. Determine the plurality of first training samples based on the first original images and the corresponding first style images.
For example, the first style conversion model can perform style conversion on a first original image to obtain the first style image of the first original image in the target style type. A first training sample is determined from the first original image and the corresponding first style image.
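This sample-construction step is, in effect, a teacher-labelling loop: the server-side first style conversion model stylizes each first original image, and the (original, stylized) pair becomes one training sample for the lightweight on-device model. The conversion function below is a stand-in for the real model.

```python
def build_first_training_samples(first_originals, first_style_conversion_model):
    """Pair each first original image with the first style image produced
    for it by the first style conversion model."""
    return [(image, first_style_conversion_model(image))
            for image in first_originals]

# Stand-in for the real (server-side) first style conversion model.
fake_converter = lambda image: [1.0 - p for p in image]
first_samples = build_first_training_samples([[0.25], [0.75]], fake_converter)
```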
S440. For each first training sample, take the first original image in the current first training sample and the object attributes of the object to be processed in that original image as the input of the style conversion model to be trained, take the first style image in the current first training sample as the output of the style model to be trained, and train to obtain the target style conversion model.
The model parameters of the style conversion model to be trained are default values. The object attribute may be the gender attribute of the target subject in the original image. The style conversion model to be trained can be trained on the plurality of first training samples to obtain the target style conversion model.
It should be noted that every first training sample is processed in the same way, so the processing of one training sample is described as an example.
For example, a first training sample includes a first original image and the first style image corresponding to it. Before the first original image is input into the style conversion model to be trained, the gender attribute of the target object in the first original image can be determined by a corresponding algorithm. The gender attribute and the first original image are input into the style conversion model to be trained, the first style image corresponding to the first original image is taken as the output of the style conversion model to be trained, and the model is trained to obtain the target style conversion model.
For example, before model training based on the first training samples, the images also need to be scaled to a certain size, so as to improve the efficiency of image processing by the model. Meanwhile, the label for the female gender attribute is set to 0 and the label for the male gender attribute is set to 1, and a four-channel input is constructed, so that a single target style conversion model is trained that can process images of different gender attributes.
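The preprocessing above can be sketched as follows: the colour channels are resized to a fixed size and a constant label plane (0 = female, 1 = male) is appended as the fourth channel. Representing each channel as a 1-D list, the nearest-neighbour resize, and the exact channel layout are illustrative assumptions.

```python
def resize_nearest(channel, size):
    """Nearest-neighbour resize of a 1-D channel to `size` samples."""
    n = len(channel)
    return [channel[int(i * n / size)] for i in range(size)]

def build_four_channel_input(rgb_channels, gender_label, size):
    """Resize the three colour channels and append a constant gender-label
    plane, yielding the four-channel model input."""
    assert gender_label in (0, 1)  # 0 = female, 1 = male
    channels = [resize_nearest(c, size) for c in rgb_channels]
    channels.append([float(gender_label)] * size)  # label plane as 4th channel
    return channels
```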
S450. Acquire an image to be processed that includes the target subject.
S460. Input the image to be processed and the subject attribute of the target subject into the target style conversion model to obtain a target special-effect image in which the target subject is converted into the target style type.
S470. Display the target special-effect image in the image display area.
According to the technical solution of this embodiment of the present disclosure, a target style conversion model deployed on a terminal device can be obtained through training, so that when an image to be processed is captured, its style type can be converted on the client side, improving the convenience of image processing.
Fig. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. The apparatus includes: a to-be-processed image acquisition module 510, a special-effect image determination module 520, and an image display module 530.
The to-be-processed image acquisition module 510 is configured to acquire an image to be processed that includes a target subject; the special-effect image determination module 520 is configured to input the image to be processed and the subject attribute of the target subject into a target style conversion model to obtain a target special-effect image in which the target subject is converted into a target style type; and the image display module 530 is configured to display the target special-effect image in an image display area.
On the basis of the above technical solution, the apparatus includes:
a first training sample acquisition module, configured to acquire a plurality of first training samples, where each first training sample includes a first original image and a first style image consistent with the target style type, the first style image being generated by a first style conversion model;
a first training module, configured to, for each first training sample, take the first original image in the current first training sample and the object attributes of the object to be processed as the input of a style conversion model to be trained, take the first style image in the current first training sample as the output of the style model to be trained, and train to obtain the target style conversion model, where the object attributes match the subject attribute.
On the basis of the above technical solution, the first training sample acquisition module includes:
a first original image acquisition unit, configured to acquire at least one first original image;
a first style image acquisition unit, configured to, for each first original image, stylize the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type;
a first training sample acquisition unit, configured to determine the plurality of first training samples based on the first original images and the corresponding first style images.
On the basis of the above technical solution, the first style conversion model includes a style feature extractor, a content feature extractor, a feature fuser, and a compiler, and the first style image acquisition unit is configured to:
perform content extraction on the current first original image by the content feature extractor to obtain image content features; perform style feature extraction, by the style feature extractor, on a preset reference style image consistent with the target style type to obtain image style features; fuse the image content features and the image style features by the feature fuser to obtain features to be compiled; and process the features to be compiled by the compiler to obtain the first style image corresponding to the current first original image.
On the basis of the above technical solution, the apparatus includes:
a first model construction unit, configured to construct a target model to be trained that includes a style processing model to be trained, a target discriminator, and a target style comparator, where the target discriminator and the target style comparator are pre-trained; a to-be-used model determination unit, configured to train the constructed target model to be trained according to a plurality of second style images of at least one style type to be selected and at least one second original image to obtain a target model to be used, where the second style images are determined based on a target image generation model; a to-be-used style model determination unit, configured to take the trained style processing model in the target model to be used as the style model to be used; and a first style model determination unit, configured to determine a reference image corresponding to the target style type and to determine the first style conversion model based on the reference image and the style model to be used, where the target style type is one of the at least one style type to be selected.
On the basis of the above technical solution, the style processing model to be trained includes: a style feature extractor to be trained, a content feature extractor to be trained, a feature fuser to be trained, and a compiler to be trained, and the to-be-used model determination unit is configured to:
obtain a plurality of second training samples by randomly combining the second style images and the second original images, where each second training sample includes one second original image and one second style image; for each second training sample, obtain the content splicing features of the second original image by the content feature extractor to be trained and the style splicing features of the second style image by the style feature extractor to be trained, fuse the content splicing features and the style splicing features by the feature fusion model to be trained to obtain fused features, and input the fused features into the compiler to be trained to obtain an actual output image; input the actual output image and the second style image into the target discriminator to determine a first loss value, and input the actual output image and the second style image into the target style comparator to determine a style loss value; and based on the first loss value and the style loss value, correct the model parameters of the style processing model to be trained, taking convergence of the loss function of the style processing model to be trained as the training objective, to obtain the target model to be used.
On the basis of the above technical solution, the first style model determination unit is configured to determine the target style type from the at least one style type to be selected, acquire a reference image consistent with the target style type, and package the reference image with the style model to be used to obtain the first style conversion model.
On the basis of the above technical solution, the apparatus further includes:
a third style image acquisition unit, configured to acquire an image to be used of a third original image in a style type to be selected, and crop the image to be used to obtain a third style image;
a third output image acquisition unit, configured to input Gaussian noise into an image generation model to be trained to obtain a third output image;
a model parameter correction unit, configured to process the third output image and the third style image by a first discriminator, determine a loss value, and correct the model parameters of the image generation model to be trained based on the loss value;
a target image generation model determination unit, configured to take convergence of the loss function of the image generation model to be trained as the training objective to obtain the target image generation model.
On the basis of the above technical solution, the apparatus includes:
a second style image acquisition unit, configured to process Gaussian noise based on the target image generation model to obtain a second style image;
a second style image updating unit, configured to add an expression to the second style image based on a pre-trained expression editing model, and update the second style image.
On the basis of the above technical solution, the target style type includes a Japanese style, a Korean style, an ancient-costume style, a comic style, or multiple preset style types to be selected.
According to the technical solution of this embodiment of the present disclosure, when an image to be processed that includes a target subject is acquired, the subject attribute of the target subject in the image to be processed can be determined, and the subject attribute and the image to be processed are taken as input parameters of the pre-trained target style conversion model to obtain a target special-effect image in which the target subject in the image to be processed is converted into the target style type. The image to be processed can thus undergo style type conversion, improving the user experience.
The image processing apparatus provided by this embodiment of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the executed method.
It is worth noting that the units and modules included in the above apparatus are divided only according to functional logic, and the division is not limited thereto, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the embodiments of the present disclosure.
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. Referring now to FIG. 7, a schematic structural diagram of an electronic device 500 (such as the terminal device or server in FIG. 7) suitable for implementing an embodiment of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 500 may include a processing device 501 (such as a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Typically, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows the electronic device 500 having various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
According to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
The names of the messages or information exchanged between multiple devices in the embodiments of the present disclosure are used for illustrative purposes only and are not intended to limit the scope of these messages or information.
The electronic device provided by this embodiment of the present disclosure belongs to the same concept as the image processing method provided by the above embodiments. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored. When the program is executed by a processor, the image processing method provided by the above embodiments is implemented.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to:
acquire an image to be processed that includes a target subject;
input the image to be processed and subject attributes of the target subject into a target style conversion model to obtain a target special-effect image in which the target subject is converted into a target style type; and
display the target special-effect image in an image display area.
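The three steps above (acquire, convert, display) can be sketched as a minimal pure-Python data flow. Every name here (`acquire_image`, `StyleConversionModel`, `display`) is an illustrative stand-in, not an API from the disclosure, and the "model" simply tags the image rather than running a neural network.

```python
def acquire_image():
    """Stand-in for acquiring an image to be processed (a tiny 2x2 'image')."""
    return {"pixels": [[0.1, 0.2], [0.3, 0.4]], "subject": "face"}

class StyleConversionModel:
    """Stand-in for the target style conversion model: it receives the image
    and the subject attributes and returns a stylized result."""
    def __init__(self, style_type):
        self.style_type = style_type

    def convert(self, image, subject_attributes):
        # A real model would run a trained network; here we only tag the image.
        return {
            "pixels": image["pixels"],
            "style": self.style_type,
            "subject_attributes": subject_attributes,
        }

def display(special_effect_image):
    """Stand-in for rendering the result in the image display area."""
    return f"showing {special_effect_image['style']} image"

image = acquire_image()                # step 1: acquire image with target subject
model = StyleConversionModel("comic")  # target style type, e.g. comic style
result = model.convert(image, {"age": "adult", "gender": "female"})  # step 2
print(display(result))                 # step 3: display in the image area
```

The subject-attribute dictionary is purely hypothetical; the disclosure does not enumerate which attributes are passed to the model.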
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example 1] provides an image processing method, the method including:
acquiring an image to be processed that includes a target subject;
inputting the image to be processed and subject attributes of the target subject into a target style conversion model to obtain a target special-effect image in which the target subject is converted into a target style type; and
displaying the target special-effect image in an image display area.
According to one or more embodiments of the present disclosure, [Example 2] provides an image processing method, the method further including:
acquiring a plurality of first training samples, where each first training sample includes a first original image and a first style image consistent with the target style type, and the first style image is generated by a first style conversion model; and
for each first training sample, using the first original image in the current first training sample and object attributes of the object to be processed as an input of a style conversion model to be trained, and using the first style image in the current first training sample as an output of the style conversion model to be trained, to train and obtain the target style conversion model;
where the object attributes match the subject attributes.
According to one or more embodiments of the present disclosure, [Example 3] provides an image processing method, where
the acquiring a plurality of first training samples includes:
acquiring at least one first original image;
for each first original image, stylizing the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type; and
determining the plurality of first training samples based on the first original images and the corresponding first style images.
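The sample-construction step above can be sketched in a few lines: each first original image is run through a stand-in first style conversion model, and every original is paired with its generated style image to form a training sample. The model function and the image names are hypothetical placeholders.

```python
def first_style_conversion_model(original_image, target_style="comic"):
    # Hypothetical stand-in: a real model would stylize the pixels of the
    # original image into the target style.
    return {"style": target_style, "source": original_image}

first_original_images = ["img_a", "img_b", "img_c"]

# Pair each original with its generated style image to build the samples.
first_training_samples = [
    {"original": img, "style_image": first_style_conversion_model(img)}
    for img in first_original_images
]
```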
According to one or more embodiments of the present disclosure, [Example 4] provides an image processing method, where the first style conversion model includes a style feature extractor, a content feature extractor, a feature fuser, and a compiler, and the stylizing the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type includes:
performing content extraction on the current first original image based on the content feature extractor to obtain image content features;
performing style feature extraction, based on the style feature extractor, on a preset reference style image consistent with the target style type to obtain image style features;
fusing the image content features and the image style features based on the feature fuser to obtain features to be compiled; and
processing the features to be compiled based on the compiler to obtain a first style image corresponding to the current first original image.
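The four-stage pipeline above (content extraction, style extraction, fusion, compilation) can be sketched as a pure-Python data flow. The "features" are plain lists, and every function is a toy stand-in for a learned network component; the element-wise operations are illustrative assumptions, not the actual learned transforms.

```python
def content_feature_extractor(original_image):
    # Extract content features from the image to be stylized (flatten).
    return [p for row in original_image for p in row]

def style_feature_extractor(reference_style_image):
    # Extract style features from the preset reference image of the target style.
    return [p * 0.5 for row in reference_style_image for p in row]

def feature_fuser(content_features, style_features):
    # Fuse content and style features into the features to be compiled
    # (element-wise sum as a placeholder for a learned fusion).
    return [c + s for c, s in zip(content_features, style_features)]

def compiler(fused_features, width=2):
    # "Compile" the fused features back into an image grid.
    return [fused_features[i:i + width]
            for i in range(0, len(fused_features), width)]

original = [[0.2, 0.4], [0.6, 0.8]]   # current first original image
reference = [[1.0, 1.0], [1.0, 1.0]]  # reference image of the target style

content = content_feature_extractor(original)
style = style_feature_extractor(reference)
first_style_image = compiler(feature_fuser(content, style))
```

Because the reference image is fixed per target style type, its style features could be extracted once and reused for every original image, which is consistent with packaging the reference image together with the model (Example 7).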
According to one or more embodiments of the present disclosure, [Example 5] provides an image processing method, the method further including:
constructing a target model to be trained that includes a style processing model to be trained, a target discriminator, and a target style comparator, where the target discriminator and the target style comparator are pre-trained;
training the constructed target model to be trained according to a plurality of second style images of at least one style type to be selected and at least one second original image, to obtain a target model to be used, where the second style images are determined based on a target image generation model;
using the trained style processing model in the target model to be used as a style model to be used; and
determining a reference image corresponding to the target style type, and determining the first style conversion model based on the reference image and the style model to be used, where the target style type is one of the at least one style type to be selected.
According to one or more embodiments of the present disclosure, [Example 6] provides an image processing method, where the style processing model to be trained includes a style feature extractor to be trained, a content feature extractor to be trained, a feature fuser to be trained, and a compiler to be trained, and the training the constructed target model to be trained according to a plurality of second style images of at least one style type to be selected and at least one second original image, to obtain a target model to be used, includes:
obtaining a plurality of second training samples by randomly combining the second style images and the second original images, where each second training sample includes one second original image and one second style image;
for each second training sample, acquiring content splicing features of the second original image based on the content feature extractor to be trained, acquiring style splicing features of the second style image based on the style feature extractor to be trained, fusing the content splicing features and the style splicing features based on the feature fuser to be trained to obtain fused features, and inputting the fused features into the compiler to be trained to obtain an actual output image;
inputting the actual output image and the second style image into the target discriminator to determine a first loss value, and inputting the actual output image and the second style image into the target style comparator to determine a style loss value; and
correcting model parameters in the style processing model to be trained based on the first loss value and the style loss value, and taking the convergence of the loss function in the style processing model to be trained as the training objective, to obtain the target model to be used.
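The loss combination described above can be sketched as follows: a pre-trained discriminator yields the first loss, a pre-trained style comparator yields the style loss, and their sum drives the parameter correction. The specific loss formulas, the equal weighting, and the update rule are placeholder assumptions, not the actual training procedure of the disclosure.

```python
def target_discriminator(actual_output, second_style_image):
    # First loss value: mean absolute gap to the reference style image
    # (a crude stand-in for a real adversarial loss).
    return sum(abs(a - b) for a, b in
               zip(actual_output, second_style_image)) / len(actual_output)

def target_style_comparator(actual_output, second_style_image):
    # Style loss value: compare coarse statistics (here: means) of both images.
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(actual_output) - mean(second_style_image))

def training_step(params, actual_output, second_style_image, lr=0.1):
    first_loss = target_discriminator(actual_output, second_style_image)
    style_loss = target_style_comparator(actual_output, second_style_image)
    total = first_loss + style_loss
    # Placeholder parameter correction: nudge every parameter by the total loss.
    new_params = [p - lr * total for p in params]
    return new_params, total

params = [0.5, 0.5]          # model parameters of the style processing model
actual = [0.2, 0.6]          # actual output image (flattened)
reference = [0.4, 0.8]       # second style image (flattened)
params, loss = training_step(params, actual, reference)
```

In practice the loop would repeat over all second training samples until the loss function converges, at which point the trained style processing model is extracted as the style model to be used.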
According to one or more embodiments of the present disclosure, [Example 7] provides an image processing method, where the determining a reference image corresponding to the target style type, and determining the first style conversion model based on the reference image and the style model to be used, includes:
determining the target style type from at least one style type to be selected; and
acquiring a reference image consistent with the target style type, and packaging the reference image with the style model to be used to obtain the first style conversion model.
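One way to read the "packaging" step above is as binding the reference image into the style model once, up front, so that the resulting first style conversion model needs only the original image at call time. The sketch below models this with `functools.partial`; the file names and the style-type table are hypothetical.

```python
from functools import partial

def style_model_to_be_used(original_image, reference_image):
    # Stand-in for the trained style model: combines both inputs.
    return f"{original_image} in the style of {reference_image}"

style_types = {"comic": "comic_ref.png", "ancient": "ancient_ref.png"}
target_style_type = "comic"               # chosen from the candidate types
reference_image = style_types[target_style_type]

# Packaging: bind the reference image into the model once, up front.
first_style_conversion_model = partial(style_model_to_be_used,
                                       reference_image=reference_image)
result = first_style_conversion_model("photo.png")
```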
According to one or more embodiments of the present disclosure, [Example 8] provides an image processing method, the method further including:
acquiring an image to be used of a third original image under a style type to be selected, and cropping the image to be used to obtain a third style image;
inputting Gaussian noise into an image generation model to be trained to obtain a third output image;
processing the third output image and the third style image based on a first discriminator to determine a loss value, and correcting model parameters in the image generation model to be trained based on the loss value; and
taking the convergence of the loss function in the image generation model to be trained as the training objective, to obtain the target image generation model.
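The generator-training loop above can be sketched with a toy stand-in: Gaussian noise is fed to an "image generation model", a first discriminator scores the output against the third style image, and the parameters are corrected each iteration until the loss converges. The affine generator, the MSE loss, and the hand-written update rule are all illustrative assumptions.

```python
import random

random.seed(0)

def gaussian_noise(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def generate(params, noise):
    # Toy "image generation model": an affine map from noise to pixels.
    scale, bias = params
    return [scale * z + bias for z in noise]

def first_discriminator(output_image, third_style_image):
    # Loss value: mean squared error to the target-style image.
    return sum((o - t) ** 2 for o, t in
               zip(output_image, third_style_image)) / len(output_image)

third_style_image = [0.5, 0.5, 0.5, 0.5]  # cropped third style image (flattened)
params = [1.0, 0.0]                       # [scale, bias] of the toy generator

loss = float("inf")
for _ in range(200):                      # iterate until the loss converges
    noise = gaussian_noise(4)
    output = generate(params, noise)
    loss = first_discriminator(output, third_style_image)
    # Crude correction: shrink the noise scale, move the bias toward the target.
    params[0] *= 0.9
    params[1] += 0.1 * (sum(third_style_image) / 4 - params[1])
```

A real implementation would instead backpropagate the discriminator loss through a generator network; the disclosure specifies only that the loss corrects the model parameters until convergence.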
According to one or more embodiments of the present disclosure, [Example 9] provides an image processing method, the method further including:
processing Gaussian noise based on the target image generation model to obtain a second style image; and
adding an expression to the second style image based on a pre-trained expression editing model, to update the second style image.
According to one or more embodiments of the present disclosure, [Example 10] provides an image processing method, where the target style type includes a Japanese style, a Korean style, an ancient-costume style, a comic style, or a plurality of preset style types to be selected.
According to one or more embodiments of the present disclosure, [Example 11] provides an image processing apparatus, the apparatus including:
an image acquisition module configured to acquire an image to be processed that includes a target subject;
a special-effect image determination module configured to input the image to be processed and subject attributes of the target subject into a target style conversion model to obtain a target special-effect image in which the target subject is converted into a target style type; and
an image display module configured to display the target special-effect image in an image display area.
In addition, although various operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
The technical solutions of the embodiments of the present disclosure make it possible to convert an image to be processed into an image with a target theme style, improving the richness of the displayed image content, the novelty of the theme style, and the adaptability of the theme style to the user.

Claims (13)

1. An image processing method, comprising:
acquiring an image to be processed that includes a target subject;
inputting the image to be processed and subject attributes of the target subject into a target style conversion model to obtain a target special-effect image in which the target subject is converted into a target style type; and
displaying the target special-effect image in an image display area.
2. The method according to claim 1, further comprising:
acquiring a plurality of first training samples, wherein each first training sample includes a first original image and a first style image consistent with the target style type, and the first style image is generated by a first style conversion model; and
for each first training sample, using the first original image in the current first training sample and object attributes of the object to be processed as an input of a style conversion model to be trained, and using the first style image in the current first training sample as an output of the style conversion model to be trained, to train and obtain the target style conversion model;
wherein the object attributes are consistent with the subject attributes.
3. The method according to claim 2, wherein the acquiring a plurality of first training samples comprises:
acquiring at least one first original image;
for each first original image, stylizing the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type; and
determining the plurality of first training samples based on the first original images and the corresponding first style images.
4. The method according to claim 3, wherein the first style conversion model comprises a style feature extractor, a content feature extractor, a feature fuser, and a compiler, and the stylizing the current first original image based on the first style conversion model to obtain a first style image consistent with the target style type comprises:
performing content extraction on the current first original image based on the content feature extractor to obtain image content features;
performing style feature extraction, based on the style feature extractor, on a preset reference style image consistent with the target style type to obtain image style features;
fusing the image content features and the image style features based on the feature fuser to obtain features to be compiled; and
processing the features to be compiled based on the compiler to obtain a first style image corresponding to the current first original image.
5. The method according to claim 3, further comprising:
constructing a target model to be trained that includes a style processing model to be trained, a target discriminator, and a target style comparator, wherein the target discriminator and the target style comparator are pre-trained;
training the constructed target model to be trained according to a plurality of second style images of at least one style type to be selected and at least one second original image, to obtain a target model to be used, wherein the second style images are determined based on a target image generation model;
using the trained style processing model in the target model to be used as a style model to be used; and
determining a reference image corresponding to the target style type, and determining the first style conversion model based on the reference image and the style model to be used, wherein the target style type is one of the at least one style type to be selected.
  6. The method according to claim 5, wherein the style processing model to be trained comprises: a style feature extractor to be trained, a content feature extractor to be trained, a feature fuser to be trained, and a compiler to be trained; and wherein training the constructed target model to be trained according to the plurality of second style images of the at least one style type to be selected and the at least one second original image, to obtain the target model to be used, comprises:
    obtaining a plurality of second training samples by randomly combining the second style images and the second original images; wherein each second training sample comprises one second original image and one second style image;
    for each second training sample, obtaining content splicing features of the second original image based on the content feature extractor to be trained, obtaining style splicing features of the second style image based on the style feature extractor to be trained, fusing the content splicing features and the style splicing features based on the feature fuser to be trained to obtain fused features, and inputting the fused features into the compiler to be trained to obtain an actual output image;
    inputting the actual output image and the second style image into the target discriminator to determine a first loss value, and inputting the actual output image and the second style image into the target style comparator to determine a style loss value; and
    correcting model parameters in the style processing model to be trained based on the first loss value and the style loss value, and taking convergence of a loss function in the style processing model to be trained as the training objective, to obtain the target model to be used.
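For illustration only (not part of the claims), the training step recited in claim 6 can be sketched roughly as follows, assuming PyTorch and toy network shapes. All module names (`style_extractor`, `fuser`, `compiler`, and so on) are hypothetical stand-ins for the extractors, fuser, compiler, and discriminator named in the claim, and the style comparator is approximated here by comparing style-feature statistics:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the components named in claim 6 (shapes are illustrative only).
style_extractor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
content_extractor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
fuser = nn.Conv2d(16, 8, 1)               # fuses concatenated content + style features
compiler = nn.Conv2d(8, 3, 3, padding=1)  # "compiler" (decoder) producing the output image
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.AdaptiveAvgPool2d(1))

params = (list(style_extractor.parameters()) + list(content_extractor.parameters())
          + list(fuser.parameters()) + list(compiler.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

def train_step(original, style):
    # Extract content features from the original image and style features from the style image.
    content_feat = content_extractor(original)
    style_feat = style_extractor(style)
    # Fuse the two feature maps and decode the fused features into the actual output image.
    fused = fuser(torch.cat([content_feat, style_feat], dim=1))
    output = compiler(fused)
    # First loss value: the discriminator should score the output as genuine style.
    adv_loss = (discriminator(output) - 1).pow(2).mean()
    # Style loss value: compare style-feature statistics of the output and the style image.
    style_loss = (style_extractor(output).mean() - style_feat.mean()).pow(2)
    loss = adv_loss + style_loss
    # Correct the model parameters based on the two losses.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.rand(1, 3, 16, 16), torch.rand(1, 3, 16, 16))
```

In practice this step would be repeated over the randomly combined training samples until the loss function converges, which is the stated training objective.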
  7. The method according to claim 5, wherein determining a reference image corresponding to the target style type, and determining the first style conversion model based on the reference image and the style model to be used, comprises:
    determining the target style type from the at least one style type to be selected; and
    acquiring a reference image consistent with the target style type, and encapsulating the reference image with the style model to be used to obtain the first style conversion model.
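As an illustrative sketch only (not part of the claims), the "encapsulating" step of claim 7 can be read as bundling the reference image and the style model into a single deployable artifact. The archive layout and file names below are assumptions for illustration:

```python
import io
import json
import zipfile

def package_style_model(model_bytes: bytes, reference_image: bytes,
                        target_style: str) -> bytes:
    """Bundle the style model, its reference image, and style metadata into one archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("style_model.bin", model_bytes)      # the style model to be used
        zf.writestr("reference.png", reference_image)    # the reference image
        zf.writestr("meta.json", json.dumps({"target_style_type": target_style}))
    return buf.getvalue()

# Hypothetical payloads stand in for real model weights and image pixels.
bundle = package_style_model(b"weights", b"pixels", "comic")
with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
    names = sorted(zf.namelist())
```

The resulting bundle corresponds to the first style conversion model: one unit that carries both the reference image and the style model.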
  8. The method according to claim 5, further comprising:
    acquiring an image to be used of a third original image under a style type to be selected, and cropping the image to be used to obtain a third style image;
    inputting Gaussian noise into an image generation model to be trained to obtain a third output image;
    processing the third output image and the third style image based on a first discriminator to determine a loss value, and correcting model parameters in the image generation model to be trained based on the loss value; and
    taking convergence of a loss function in the image generation model to be trained as the training objective, to obtain the target image generation model.
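As an illustrative sketch only (not part of the claims), the adversarial training of claim 8 can be written as a standard GAN step, assuming PyTorch; layer sizes and all names are hypothetical:

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the image generation model and the first discriminator of claim 8.
generator = nn.Sequential(nn.Linear(16, 3 * 8 * 8), nn.Tanh())
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def gan_step(style_images):
    batch = style_images.shape[0]
    noise = torch.randn(batch, 16)                # Gaussian noise input
    fake = generator(noise).view(batch, 3, 8, 8)  # the "third output image"

    # Discriminator: separate real (third) style images from generated ones.
    d_loss = (bce(discriminator(style_images), torch.ones(batch, 1))
              + bce(discriminator(fake.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: correct its parameters so the discriminator scores fakes as real.
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

d_loss, g_loss = gan_step(torch.rand(4, 3, 8, 8))
```

Iterating this step until the loss converges yields the target image generation model, which claim 9 then uses to synthesize second style images from fresh Gaussian noise.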
  9. The method according to claim 8, further comprising:
    processing Gaussian noise based on the target image generation model to obtain a second style image; and
    adding an expression to the second style image based on a pre-trained expression editing model, to update the second style image.
  10. The method according to any one of claims 1-9, wherein the target style type comprises a Japanese style, a Korean style, an ancient-costume style, a comic style, or one of a plurality of preset style types to be selected.
  11. An image processing apparatus, comprising:
    an image acquisition module configured to acquire an image to be processed that comprises a target subject;
    a special effect image determination module configured to input the image to be processed and subject attributes of the target subject into a target style conversion model, to obtain a target special effect image in which the target subject is converted into a target style type; and
    an image display module configured to display the target special effect image in an image display area.
  12. An electronic device, comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method according to any one of claims 1-10.
  13. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the image processing method according to any one of claims 1-10.
PCT/CN2022/141815 2021-12-29 2022-12-26 Image processing method and apparatus, electronic device, and storage medium WO2023125374A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111641158.3 2021-12-29
CN202111641158.3A CN114331820A (en) 2021-12-29 2021-12-29 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023125374A1 true WO2023125374A1 (en) 2023-07-06

Family

ID=81017138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141815 WO2023125374A1 (en) 2021-12-29 2022-12-26 Image processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114331820A (en)
WO (1) WO2023125374A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611434A (en) * 2024-01-17 2024-02-27 腾讯科技(深圳)有限公司 Model training method, image style conversion method and device and electronic equipment

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331820A (en) * 2021-12-29 2022-04-12 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114866706A (en) * 2022-06-01 2022-08-05 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN115145442A (en) * 2022-06-07 2022-10-04 杭州海康汽车软件有限公司 Environment image display method and device, vehicle-mounted terminal and storage medium
CN114926326A (en) * 2022-06-28 2022-08-19 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN116126182A (en) * 2022-09-08 2023-05-16 北京字跳网络技术有限公司 Special effect processing method and device, electronic equipment and storage medium
CN115249221A (en) * 2022-09-23 2022-10-28 阿里巴巴(中国)有限公司 Image processing method and device and cloud equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146825A * 2018-10-12 2019-01-04 深圳美图创新科技有限公司 Photography style conversion method, device and readable storage medium
CN110598781A (en) * 2019-09-05 2019-12-20 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
US20200134797A1 (en) * 2018-10-31 2020-04-30 Boe Technology Group Co., Ltd. Image style conversion method, apparatus and device
CN111402112A (en) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111784567A (en) * 2020-07-03 2020-10-16 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and computer-readable medium for converting an image
CN113780326A (en) * 2021-03-02 2021-12-10 北京沃东天骏信息技术有限公司 Image processing method and device, storage medium and electronic equipment
CN114331820A (en) * 2021-12-29 2022-04-12 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205813B (en) * 2016-12-16 2022-06-03 微软技术许可有限责任公司 Learning network based image stylization
US10891723B1 (en) * 2017-09-29 2021-01-12 Snap Inc. Realistic neural network based image style transfer
CN108961198B (en) * 2018-07-09 2021-06-08 中国海洋大学 Underwater image synthesis method of multi-grid generation countermeasure network and application thereof
CN110830706A (en) * 2018-08-08 2020-02-21 Oppo广东移动通信有限公司 Image processing method and device, storage medium and electronic equipment
CN111583097A (en) * 2019-02-18 2020-08-25 北京三星通信技术研究有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109949213B (en) * 2019-03-15 2023-06-16 百度在线网络技术(北京)有限公司 Method and apparatus for generating image
US11625576B2 (en) * 2019-11-15 2023-04-11 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image style transformation
CN112669308A (en) * 2021-01-06 2021-04-16 携程旅游信息技术(上海)有限公司 Image generation method, system, device and storage medium based on style migration
CN113705302A (en) * 2021-03-17 2021-11-26 腾讯科技(深圳)有限公司 Training method and device for image generation model, computer equipment and storage medium
CN113850712A (en) * 2021-09-03 2021-12-28 北京达佳互联信息技术有限公司 Training method of image style conversion model, and image style conversion method and device

Also Published As

Publication number Publication date
CN114331820A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
WO2023125374A1 (en) Image processing method and apparatus, electronic device, and storage medium
US20220239882A1 (en) Interactive information processing method, device and medium
WO2023125361A1 (en) Character generation method and apparatus, electronic device, and storage medium
CN111669502B (en) Target object display method and device and electronic equipment
CN111399729A (en) Image drawing method and device, readable medium and electronic equipment
US20230421716A1 (en) Video processing method and apparatus, electronic device and storage medium
WO2023125379A1 (en) Character generation method and apparatus, electronic device, and storage medium
WO2019227429A1 (en) Method, device, apparatus, terminal, server for generating multimedia content
WO2023093897A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
WO2023138560A1 (en) Stylized image generation method and apparatus, electronic device, and storage medium
WO2023109842A1 (en) Image presentation method and apparatus, and electronic device and storage medium
WO2022142875A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2023045710A1 (en) Multimedia display and matching methods and apparatuses, device and medium
WO2023040749A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2022171024A1 (en) Image display method and apparatus, and device and medium
US20240119082A1 (en) Method, apparatus, device, readable storage medium and product for media content processing
US11818491B2 (en) Image special effect configuration method, image recognition method, apparatus and electronic device
WO2023109829A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2022233223A1 (en) Image splicing method and apparatus, and device and medium
WO2023138549A1 (en) Image processing method and apparatus, and electronic device and storage medium
WO2023138498A1 (en) Method and apparatus for generating stylized image, electronic device, and storage medium
CN112785669B (en) Virtual image synthesis method, device, equipment and storage medium
JP2023538825A (en) Methods, devices, equipment and storage media for picture to video conversion
WO2023202543A1 (en) Character processing method and apparatus, and electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914628

Country of ref document: EP

Kind code of ref document: A1