WO2023138560A1 - Stylized image generation method and apparatus, electronic device and storage medium - Google Patents

Stylized image generation method and apparatus, electronic device and storage medium

Info

Publication number
WO2023138560A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
image
model
stylized
target
Prior art date
Application number
PCT/CN2023/072539
Other languages
English (en)
French (fr)
Inventor
李文越
周财进
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Publication of WO2023138560A1 publication Critical patent/WO2023138560A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2024Style variation

Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, for example, to a stylized image generation method and apparatus, an electronic device, and a storage medium.
  • In the related art, when images are processed by algorithms deployed on a server, the process of the server receiving the image, processing it, and feeding back the processing result introduces a large delay, and related algorithms that require large computing resources cannot be deployed directly on the client. Meanwhile, the obtained image may have flaws; for example, some features in the image may fail to correspond to features in the original image, which results in a poor image processing effect and degrades the user experience.
  • Embodiments of the present disclosure provide a stylized image generation method and apparatus, an electronic device, and a storage medium, which improve the pairing quality of training data and reduce the latency of stylized image processing.
  • In a first aspect, an embodiment of the present disclosure provides a method for generating a stylized image, the method including:
  • determining a plurality of initial paired data, and training based on the plurality of initial paired data to obtain a style model to be used, wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
  • determining a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset screening conditions, and processing each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
  • obtaining a target style image corresponding to each original image to be processed by performing deformation processing on the style image to be used, and using each original image to be processed and the corresponding target style image as stylized paired data;
  • training a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
  • the embodiment of the present disclosure also provides a device for generating a stylized image, which includes:
  • the style model determination module to be used is configured to determine a plurality of initial paired data, and train a style model to be used based on the plurality of initial paired data; wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
  • the style image determination module to be used is configured to determine a plurality of original images to be processed from the original images in the plurality of initial pairing data based on preset screening conditions, and process each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
  • the target style image determination module is configured to obtain a target style image corresponding to each original image to be processed by deforming the style image to be used, and use each original image to be processed and the corresponding target style image as stylized pairing data;
  • the target stylized conversion model determination module is configured to train the stylized conversion model to be trained based on the stylized paired data to obtain the target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
  • an embodiment of the present disclosure further provides an electronic device, and the electronic device includes:
  • a processor and a storage device for storing a program which, when executed by the processor, causes the processor to implement the method for generating a stylized image according to any one of the embodiments of the present disclosure.
  • the embodiments of the present disclosure further provide a storage medium containing computer-executable instructions, the computer-executable instructions are used to execute the method for generating a stylized image as described in any one of the embodiments of the present disclosure when executed by a computer processor.
  • FIG. 1 is a schematic flowchart of a method for generating a stylized image provided in Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic diagram of generating stylized pairing data provided by Embodiment 1 of the present disclosure
  • FIG. 3 is an example of an original image, a style image to be used, and a target style image provided by Embodiment 1 of the present disclosure
  • FIG. 4 is a structural block diagram of a stylized image generation device provided by Embodiment 2 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • the application scenarios may be illustrated first.
  • the embodiments of the present disclosure can be applied to any scenario in which a special effect video needs to be generated. For example, in the process of shooting a video through a related application, a corresponding pattern can be generated on the display interface based on the user's drawing operation, and then a special effect video containing the 3D model corresponding to the pattern can be generated and displayed on the display interface.
  • FIG. 1 is a schematic flowchart of a method for generating a stylized image provided by Embodiment 1 of the present disclosure.
  • This embodiment is applicable to generating better-matched training data based on an image processing model in the related art, so as to train, based on the obtained training data, a model that is suitable for the mobile terminal and has a better image processing effect.
  • the method can be executed by a device for generating a stylized image, and the device can be implemented in the form of software and/or hardware; the hardware may be an electronic device, such as a mobile terminal, a personal computer (PC), or a server.
  • the scene of special effect video display is usually implemented by the cooperation of the client and the server.
  • the method provided in this embodiment can be executed by the server, the client, or the cooperation of the client and the server.
  • the method of the present embodiment comprises:
  • S110 Determine a plurality of initial paired data, and train based on the plurality of initial paired data to obtain a style model to be used.
  • the device for executing the stylized image generation method may be integrated into application software that supports special effect video processing functions, and the software may be installed in an electronic device.
  • the electronic device may be a mobile terminal or a PC.
  • the application software may be a type of software for image/video processing, as long as the application software can realize image/video processing.
  • the application software can also be a specially developed application program that adds special effects and displays them, or it can be integrated into a corresponding page, through which a user on a PC can process video frames or specific images.
  • the algorithm adopted by the application software for generating the target stylized image should be compatible with the mobile terminal.
  • Compared with the server side, the computing power and computing resources of the mobile side are relatively limited. Therefore, algorithms deployed on the server side that require high computing power cannot run directly on the mobile side; instead, a lightweight model matched to the computing power of the mobile side needs to be retrained for the mobile platform, so as to serve users with lower latency.
  • In this embodiment, in order to obtain a model suitable for the mobile side with a better image processing effect, it is first necessary to use a three-dimensional (3-Dimension, 3D) style generation model in the related art to generate the corresponding training data.
  • multiple original images including facial information are acquired; each original image is input into a pre-trained 3D style generation model to obtain a corresponding initial style image after processing facial information.
  • the 3D style generation model in the related art may be a model deployed on the server to generate an initial style image.
  • the 3D style generation model may be a model for generating 3D game character style images.
  • For the 3D style generation model, the input is an original image containing the user's facial features; it can be understood that the image at least contains the user's facial features, for example, a daily photo or an ID photo of the user. Correspondingly, the output is an image in which the user's face presents a specific 3D style while the original facial features are retained. When the model is one for generating 3D game character style images, the user's face in the output image presents a visual effect similar to a character in the game; enhanced by the 3D game character style effect, the facial features appear smoother, clearer, and more three-dimensional.
  • the training set used to train the 3D style generation model may contain multiple open-source user facial images with diversity, for example, user avatars containing multiple genders, multiple age groups, multiple character expressions, and multiple visual angles. These images can be obtained from an open-source image database.
  • After the model is trained based on the training set, it can be deployed on the server associated with the application, which will not be repeated in this embodiment.
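  • To make the data-preparation step above concrete, the following is a minimal sketch of assembling the initial paired data. `style_3d_model` and the image directory are hypothetical stand-ins for the server-side 3D style generation model and the open-source face dataset; the patent does not specify either implementation.

```python
# Hypothetical sketch: build (original image, initial style image) pairs by
# running each open-source face image through the 3D style generation model.
from pathlib import Path
from PIL import Image

def build_initial_pairs(image_dir: str, style_3d_model):
    """Collect the initial paired data described in this embodiment."""
    pairs = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        original = Image.open(path).convert("RGB")
        initial_style = style_3d_model(original)  # server-side inference (assumed callable)
        pairs.append((original, initial_style))
    return pairs
```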
  • Each initial paired data includes the original image and the initial style image obtained after the original image is processed by the 3D style generation model. It can be understood that, in practical applications, after the original image is processed by the 3D style generation model, the obtained image retains the user's original facial features while making the facial image present the visual effect of the 3D game character style.
  • In this embodiment, the image output by the 3D style generation model may match only some of the features in the original image. For example, the two images match in the parts reflecting features such as the user's gender and posture, but the finer features of the user's facial image cannot be matched accurately. For example, in the input original image, when the user shows a relatively exaggerated expression, a large facial movement is produced; in the output initial style image, the corresponding features may then fail to match the parts with large facial movement accurately. Therefore, in order to provide better paired training data for the model that ultimately needs to be trained, the first style model to be trained also needs to be trained based on the initial paired data, so as to obtain the style model to be used.
  • the first style model to be trained can be a peer-to-peer (Peer to Peer, P2P) model deployed on the server. Similar to the 3D style generation model, this model also takes an original image including facial information as input and outputs an image of a specific style type, such as a 3D game character style image.
  • Given that the training data (i.e., the initial paired data) already has a certain pairing quality, the image output by the trained style model to be used matches the original image to a higher degree. The training process of the first style model to be trained is described below.
  • Optionally: obtain a first style model to be trained; for each initial paired data, use the original image in the initial paired data as an input of the first style model to be trained to obtain a first output image corresponding to the original image; determine a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters in the first style model to be trained based on the loss value; and take the convergence of the first loss function in the first style model to be trained as the training target to obtain the style model to be used.
  • the existing first style model to be trained can be used to process a large amount of initial pairing data to generate a first output image, that is, an image with a higher degree of matching with the original image input by the user.
  • a loss value between the first output image and the initial style image may be determined.
  • When correcting the model parameters with the loss value, the training error of the first loss function in the first style model to be trained can be used as the condition for detecting whether the first loss function has converged, for example, whether the training error is smaller than a preset error, whether the trend of the training error has stabilized, or whether the current number of iterations equals a preset number. If a convergence condition is met, for example, the training error of the loss function is smaller than the preset error or the trend of the training error has stabilized, the training of the first style model to be trained is complete, and the iterative training can be stopped. If the convergence condition is not yet met, other initial paired data can be acquired to continue training the model until the training error of the loss function falls within a preset range.
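  • As an illustration of the training loop just described, the sketch below trains the first style model on the initial paired data and stops when the training error stabilizes. The architecture, the L1 loss, and the convergence threshold are assumptions for illustration; the patent only requires a loss between the first output image and the initial style image, with convergence of the first loss function as the training target.

```python
# Minimal PyTorch sketch of training the first style model on initial paired data.
import torch
import torch.nn as nn

def train_first_style_model(model, loader, epochs=50, eps=1e-3, device="cuda"):
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)
    criterion = nn.L1Loss()  # assumed loss; the patent does not fix its form
    prev_err = float("inf")
    for _ in range(epochs):
        total = 0.0
        for original, initial_style in loader:        # one initial paired datum per sample
            original = original.to(device)
            initial_style = initial_style.to(device)
            first_output = model(original)            # first output image
            loss = criterion(first_output, initial_style)
            opt.zero_grad()
            loss.backward()                           # correct model parameters via loss value
            opt.step()
            total += loss.item()
        avg_err = total / len(loader)
        if abs(prev_err - avg_err) < eps:             # training error has stabilized
            break
        prev_err = avg_err
    return model                                      # the style model to be used
```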
  • S120 Determine a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset filtering conditions, and process each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed.
  • the style model to be used is a P2P model used to generate a style image of a 3D game character
  • compared with the initial style image, the image output by the model matches the original image better, but the overall quality of the output image still falls short of the original image; for example, the definition of the output image is lower than that of the original image. Therefore, in order to obtain better-matched data, a plurality of original images to be processed can be screened out from the original images based on preset conditions, so that the style model to be used only processes the screened images.
  • the screening process of the original images will be described below.
  • an original image whose change angle of the part to be adjusted is greater than a preset change angle threshold is determined as the original image to be processed.
  • the parts to be adjusted include facial features, such as the user's eyes, nose, ears, and mouth, etc.
  • the preset change angle thresholds may be respectively set change angles for the user's facial features, for example, angles set respectively for the angle between the upper and lower contours of the eyes, or the degree of curvature of the mouth.
  • relevant image processing software can be used to determine the angles of the user's facial features in the original image, and then compare these angles with the preset change angle threshold.
  • the original image may be determined as the original image to be used.
  • these images can be copied to a specific database, or these images can be tagged with a specific identifier.
  • In this embodiment, when angles beyond the user's everyday expression changes are used as the preset change angles, the determined original images to be processed are images whose expression change range is at least greater than the user's daily expression change range, which can be understood as the facial images corresponding to the user producing relatively large facial movements and exaggerated expressions.
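  • The screening condition can be sketched as follows. `detect_landmarks` stands in for any facial landmark detector, and the specific points and degree threshold are illustrative assumptions; the patent only requires comparing the change angles of facial parts against preset thresholds.

```python
# Hypothetical sketch: keep only originals whose facial parts (here, the mouth)
# change by more than a preset change angle threshold.
import math

MOUTH_ANGLE_THRESHOLD = 25.0  # preset change angle threshold in degrees (assumed value)

def part_angle(left, center, right):
    """Angle (degrees) at `center` formed by two contour points of a facial part."""
    a = math.atan2(left[1] - center[1], left[0] - center[0])
    b = math.atan2(right[1] - center[1], right[0] - center[0])
    ang = abs(math.degrees(a - b)) % 360
    return min(ang, 360 - ang)

def select_originals_to_process(images, detect_landmarks):
    selected = []
    for img in images:
        lm = detect_landmarks(img)  # assumed: returns named landmark coordinates
        angle = part_angle(lm["mouth_left"], lm["mouth_bottom"], lm["mouth_right"])
        if angle > MOUTH_ANGLE_THRESHOLD:
            selected.append(img)    # exaggerated expression: original image to be processed
    return selected
```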
  • each original image to be processed is input into a style model to be used to obtain a style image to be used corresponding to each original image to be processed.
  • Since the input of the style model to be used is the facial image corresponding to the user's large facial movements and exaggerated expressions, which differs from the input of the initial 3D style generation model, the corresponding outputs of the two models also differ. It can be understood that, in the two output images, the features corresponding to the user's facial features differ considerably.
  • these images can be input to the P2P model deployed on the server for generating 3D game character style images for processing, thereby obtaining 3D game character style images with a higher degree of matching with the original images.
  • S130. Obtain a target style image corresponding to each original image to be processed by performing deformation processing on the style image to be used, and use each original image to be processed and the corresponding target style image as stylized paired data.
  • In this embodiment, although the obtained style image to be used matches the original image to be used to a higher degree, the features in the image corresponding to the parts where the user produces large facial movements still cannot be matched accurately with the corresponding parts in the original image. For example, in the original image to be used, the corners of the user's mouth are raised considerably, while in the obtained style image to be used the corners of the mouth are not raised as much, even though the facial image presents the 3D game character style. Therefore, in order to construct higher-quality training samples for the model that the application ultimately adopts to serve users, the style image to be used also needs to be deformed. In practical applications, a Thin Plate Spline (TPS) transformation, which is a non-rigid deformation, can be used to process the style image to be used.
  • the input of the deformation algorithm is multiple groups of matching point pairs of the same parts in the two images, for example, the matching point pairs between the user's mouth in the original image to be processed and the user's mouth in the style image to be used; correspondingly, the output is the mapped coordinates of the same parts of the two images. The deformation process is described below.
  • Optionally: determine the pixel point information of the key points in each original image to be processed and in the style image to be used; and determine deformation parameters based on the pixel point information, so as to attach the parts to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters to obtain the target style image.
  • Since the goal of TPS is to solve for a function f such that f(P_i) = P_i' (1 ≤ i ≤ n) while the bending energy is minimized, the points on the style image to be used can be well corrected by interpolation. Treating the deformation function as the process of bending a steel plate so that the plate passes through the given n points, the energy required to bend the plate can be expressed as

$$I_f=\iint_{\mathbb{R}^2}\left(\left(\frac{\partial^2 f}{\partial x^2}\right)^2+2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2+\left(\frac{\partial^2 f}{\partial y^2}\right)^2\right)dx\,dy,$$

where (x, y) are the arbitrary coordinates of a key point in the original image to be processed.
  • It can be shown that the TPS interpolation function is exactly the function with minimal bending energy, and by derivation the mapping from the arbitrary coordinates (x, y) of a key point in the original image to be processed to the arbitrary deformed coordinates (x_i, y_i) is obtained:

$$f(x,y)=a_1+a_x x+a_y y+\sum_{i=1}^{n} w_i\,U\!\left(\lVert(x_i,y_i)-(x,y)\rVert\right),\qquad U(r)=r^2\log r^2,$$

where w is the coefficient matrix and U is the basis function.
  • the deformation parameters of the mouth part are determined based on the above formula, and based on the deformation parameters, the mouth part in the original image to be processed is attached to the position corresponding to the mouth part in the style image to be used, so as to realize the replacement of the original part of the style image to be used that does not match the user's actual mouth features.
  • the style image to be used after the key points are pasted and adjusted is the target style image.
  • the target style image not only retains the unique facial features of the user in the original image to be processed, but also makes the image present a visual effect of the style of a 3D game character, which has a high degree of matching with the original image to be processed.
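  • The TPS mapping above can be sketched in a few lines of NumPy: fit the coefficient matrix w and the affine terms from matched key-point pairs (for example, mouth contour points in the two images), then map the coordinates of the part to be adjusted. This is a generic TPS solver under the standard formulation, not the patent's exact implementation.

```python
# Generic thin plate spline fit/map with basis U(r) = r^2 log r^2.
import numpy as np

def tps_fit(src, dst):
    """src, dst: (n, 2) matched key points; returns parameters mapping src -> dst."""
    n = src.shape[0]
    d = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(d > 0, d**2 * np.log(d**2 + 1e-12), 0.0)   # basis function U
    P = np.hstack([np.ones((n, 1)), src])                   # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([dst, np.zeros((3, 2))])
    return src, np.linalg.solve(A, b)                       # w stacked over affine coeffs

def tps_map(params, pts):
    """Map arbitrary (m, 2) key-point coordinates to their deformed coordinates."""
    src, wa = params
    d = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    U = np.where(d > 0, d**2 * np.log(d**2 + 1e-12), 0.0)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ wa[:-3] + P @ wa[-3:]
```

  • In the setting above, `tps_fit` would be given the mouth key points of the original image to be processed and of the style image to be used, and `tps_map` then supplies the coordinates needed to attach the mouth region onto the style image.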
  • After the target style image is obtained, it can be combined with the corresponding original image to be processed, so as to obtain stylized paired data.
  • the stylized paired data is the data used to train the model actually adopted by the application, for example, the data used to train the image processing model that needs to be deployed on the mobile terminal.
  • Referring to FIG. 2, after the original images including the user's facial information are acquired, the original images can first be processed using the model deployed on the server for generating 3D game character style images, and the processing results are combined with the original images to obtain a first version of paired training data. Based on these training data, the P2P model to be trained, which is likewise used to generate 3D game character style images, can be trained.
  • After the P2P model is trained, its output images pair better with the original images; however, because the overall quality of the output images is lower than that of the original images, it is still necessary to screen out, from the original images, the original images to be processed in which the user shows a large expression and some facial parts exhibit large angle changes, and then input these original images to be processed into the trained P2P model to obtain the corresponding style images to be used, that is, the 3D game character style images corresponding to the images with large expressions and large angle changes in some facial parts. In these images, the parts with large expressions and large angle changes still differ from the corresponding parts in the original images to be used. A compact orchestration sketch of this pipeline follows.
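  • The sketch below ties the steps of FIG. 2 together; every callable is a placeholder for a component discussed in this embodiment.

```python
# Hypothetical end-to-end assembly of the stylized paired data (FIG. 2).
def build_stylized_pairs(originals, style_3d_model, train_p2p, screen, tps_attach):
    initial_pairs = [(img, style_3d_model(img)) for img in originals]  # initial paired data
    p2p_model = train_p2p(initial_pairs)                               # style model to be used
    to_process = screen(originals)                                     # exaggerated expressions
    stylized_pairs = []
    for img in to_process:
        style_img = p2p_model(img)                                     # style image to be used
        target = tps_attach(img, style_img)                            # target style image
        stylized_pairs.append((img, target))
    return stylized_pairs          # trains the mobile-side stylized conversion model
```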
  • the image processing model deployed on the mobile terminal may be a lightweight stylized conversion model used to generate 3D game character style images. It can be understood that after inputting an image including user facial information into the stylized conversion model for processing, a 3D game character style image with a high degree of matching with the user's facial features can be generated with relatively low computing resources.
  • the stylized conversion model to be trained may be a model based on a Generative Adversarial Network (GAN).
  • a generative adversarial network couples a generative network with a discriminative network. The generative network randomly samples from a latent space as input, and its output needs to imitate the real samples in the training set as much as possible. The input of the discriminative network is either a real sample or the output of the generative network, and its purpose is to distinguish the output of the generative network from the real samples.
  • the stylized conversion model to be trained in this embodiment can likewise be combined with a discriminator to be trained, so that the stylized conversion model after parameter correction can generate the target stylized image in the subsequent process.
  • multiple lightweight models, that is, models with fewer network layers and less refined model parameters, can be developed in advance.
  • These models can be used to process the original image; since they generate images based on a smaller number of channels, the signal-to-noise ratio of the generated images is lower, but the latency of image processing is reduced.
  • a model with better image processing effect and shorter running time can be selected from multiple models as the stylized conversion model to be trained.
  • the discriminator to be trained can be made to satisfy Lipschitz continuity in the subsequent training process, namely

$$\lVert f(x_1)-f(x_2)\rVert \le K\,\lVert x_1-x_2\rVert,$$

where K is a constant; the smallest constant K satisfying the above condition is the Lipschitz constant.
  • To this end, spectral normalization can be applied: a singular value decomposition (Singular Value Decomposition, SVD) is performed on the parameters of each layer of the neural network, and after the decomposition the largest singular value is set to 1, that is, after each update of the network parameters, they are divided by the largest singular value. On this basis, the stretch factor of each layer of the network in the discriminator will not exceed 1, so that the discriminator to be trained satisfies Lipschitz continuity during the training process, and the training process of the discriminator to be trained is more stable and easier to converge.
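  • A minimal sketch of this constraint: after each parameter update, each weight matrix is divided by its largest singular value so that the stretch factor of every layer stays at most 1. In practice, PyTorch's torch.nn.utils.spectral_norm achieves the same effect via power iteration; the explicit SVD here simply mirrors the description above.

```python
# Keep each discriminator layer 1-Lipschitz by normalizing its largest singular value.
import torch

@torch.no_grad()
def normalize_spectra(discriminator):
    for p in discriminator.parameters():
        if p.dim() >= 2:                             # weight matrices / conv kernels
            w = p.view(p.shape[0], -1)
            sigma_max = torch.linalg.svdvals(w)[0]   # largest singular value
            if sigma_max > 1.0:
                p.div_(sigma_max)                    # set largest singular value to 1
```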
  • the training process of the stylized conversion model to be trained can be as follows: input the original image to be processed in the stylized paired data into the style conversion model to be trained to obtain a second actual output image; input the second actual output image and the target style image in the stylized paired data into the discriminator to be trained to obtain a discrimination result; adjust the model parameters of the style conversion model to be trained and of the discriminator to be trained based on the discrimination result and the constraints; and take the convergence of the loss functions of the style conversion model to be trained and the discriminator to be trained as the training target to obtain the target style conversion model.
  • the existing stylized conversion model to be trained can be used to process a large amount of stylized paired data to generate a second actual output image, that is, an image with a higher degree of matching with the original image input by the user.
  • the loss value between the second actual output image and the target style image in the stylized paired data can be determined based on the discriminator to be trained. It can be understood that the loss value is the discrimination result.
  • When adjusting the model parameters, the training error of the loss functions in the style conversion model to be trained and the discriminator to be trained can be used as the condition for detecting whether the loss functions have converged, for example, whether the training error is smaller than a preset error, whether the trend of the training error has stabilized, or whether the current number of iterations equals a preset number.
  • If the convergence condition is met, the iterative training can be stopped; if it is not yet met, other stylized paired data can be acquired to continue training the model and the discriminator until the training error of the loss function falls within the preset range. It can be understood that when the training error of the loss function converges, the trained target stylized conversion model is obtained; at this point, after an original image including the user's facial information is input into the model, an image can be obtained that retains the user's original facial features while presenting the visual effect of the 3D game character style.
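  • The adversarial training loop just described can be sketched as follows; a binary cross-entropy GAN loss is assumed for illustration (the patent does not fix the loss form), and `normalize_spectra` is the constraint sketch above.

```python
# Minimal adversarial training sketch for the stylized conversion model.
import torch
import torch.nn as nn

def train_stylized_gan(gen, disc, loader, device="cuda"):
    g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()
    for original, target_style in loader:            # stylized paired data
        original = original.to(device)
        target_style = target_style.to(device)
        fake = gen(original)                         # second actual output image
        # Discriminator: target style image vs. generated image -> discrimination result.
        real_logits = disc(target_style)
        fake_logits = disc(fake.detach())
        d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
                 bce(fake_logits, torch.zeros_like(fake_logits))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        normalize_spectra(disc)                      # enforce the Lipschitz constraint
        # Generator: adjust model parameters based on the discrimination result.
        gen_logits = disc(fake)
        g_loss = bce(gen_logits, torch.ones_like(gen_logits))
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return gen                                       # target stylized conversion model
```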
  • the target stylized model can also be deployed to the client, so that when video frames to be processed are acquired, they are stylized based on the target stylized model to obtain target video frames, and the target video is obtained based on all the target video frames.
  • Since the client can be installed in the user's mobile terminal device, which generally has the function of capturing the user's facial image, after the target stylized model is deployed on the client and it is detected that the user triggers the control associated with generating the target stylized image, the facial information in the video captured by the front or rear camera of the mobile terminal can be recognized.
  • Optionally, these video frames can be processed based on the target stylized model to obtain the target video frames. Since the target stylized model generates 3D game character style images for the user, in the target video frames the user's facial information presents the visual effect of the 3D game character style while retaining its original features.
  • the processed video frames are spliced to obtain the target video, which can be presented to the user by displaying the target video on a display interface associated with the application, thereby enhancing the interest of the video content.
  • Since the trained target stylized model can be deployed directly on the mobile terminal, the user's facial images and videos can be processed directly on the mobile terminal, as sketched below. This avoids the cumbersome process of uploading images/videos to the server and receiving the processing results from the server, reduces the lag and delay when the application software presents the target video to the user, and makes the playback of video with the target stylized image smoother, thereby improving the user experience.
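  • The on-device inference path can be sketched as a frame-by-frame loop with no server round-trip. Frame capture, the mobile export format (e.g., TorchScript or ONNX), and display are outside the patent's text and are only stubbed here.

```python
# Hypothetical client-side sketch: stylize captured frames with the deployed model.
import torch

def stylize_video(frames, target_model, device="cpu"):
    """frames: iterable of HxWxC uint8 tensors; yields target video frames."""
    target_model.eval()
    with torch.no_grad():
        for frame in frames:
            x = frame.permute(2, 0, 1).unsqueeze(0).float().div(255).to(device)
            y = target_model(x)                                  # stylize on-device
            out = y.squeeze(0).clamp(0, 1).mul(255).byte().permute(1, 2, 0)
            yield out.cpu()                                      # splice into target video
```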
  • FIG. 3 may also be used as an example for illustration.
  • Taking the first image in FIG. 3, which contains the user's mouth-open expression, as the original image: the original image is input into the style model to be used that has been trained on the server, and the style image to be used, of the 3D game character style type, is obtained. As can be seen from the figure, although this image reflects some overall features of the user's face and presents the visual effect of the 3D game character style, it does not reproduce the user's mouth-open expression.
  • After the style image to be used is deformed, the target style image is output. The target style image, like the style image to be used, reflects some overall features of the user's face and presents the visual effect of the 3D game character style; in addition, it reproduces the user's mouth-open expression, so that the output image and the original image remain highly consistent in key features such as the facial features, eliminating the poor matching between the image obtained based on the style model to be used and the original image.
  • In the technical solution of this embodiment, a plurality of initial paired data are determined first, and a style model to be used is obtained through training based on them; a plurality of original images to be processed are determined from the original images in the initial paired data based on preset screening conditions, and each original image to be processed is processed by the style model to be used to obtain the corresponding style image to be used; a target style image corresponding to each original image to be processed is obtained by deforming the style image to be used, and each original image to be processed together with its corresponding target style image is used as stylized paired data; finally, the stylized conversion model to be trained is trained based on the stylized paired data to obtain the target stylized conversion model, so that when video frames to be processed are acquired, stylization is performed based on the target stylized conversion model to obtain the processed target video. In this way, the pairing quality of the final training data is improved, and a lightweight model suitable for the mobile terminal is trained based on these data, which reduces latency while ensuring the image processing effect.
  • Fig. 4 is a structural block diagram of a stylized image generation device provided in Embodiment 2 of the present disclosure, which can execute the stylized image generation method provided in any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • the device includes: a to-be-used style model determining module 210 , a to-be-used style image determining module 220 , a target style image determining module 230 and a target stylized conversion model determining module 240 .
  • the style model determination module 210 to be used is configured to determine a plurality of initial paired data, and train a style model to be used based on the plurality of initial paired data; wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model.
  • the to-be-used style image determination module 220 is configured to determine a plurality of to-be-processed original images from the original images in the plurality of initial paired data based on preset filtering conditions, and process each to-be-processed original image based on the to-be-used style model to obtain a to-be-used style image corresponding to each to-be-processed original image.
  • the target style image determination module 230 is configured to obtain a target style image corresponding to each original image to be processed by performing deformation processing on the style image to be used, and use each original image to be processed and the corresponding target style image as stylized pairing data.
  • the target stylized conversion model determination module 240 is configured to train the stylized conversion model to be trained based on the stylized paired data to obtain the target stylized conversion model, so that when the video frame to be processed is acquired, stylize the video frame to be processed based on the target stylized conversion model to obtain the processed target video.
  • the to-be-used style model determination module 210 includes an original image acquisition unit and an initial style image generation unit.
  • the original image acquiring unit is configured to acquire multiple original images including facial information.
  • the initial style image generation unit is configured to input each original image into the pre-trained 3D style generation model to obtain the corresponding initial style image after processing the facial information.
  • the module 210 for determining the style model to be used further includes a unit for obtaining a style model to be trained, a first output image generating unit, a loss value determining unit, and a unit for determining a style model to be used.
  • the style model acquisition unit to be trained is configured to acquire the first style model to be trained.
  • the first output image generating unit is configured to, for each initial paired data, use an original image in each initial paired data as an input of the first style model to be trained, and obtain a first output image corresponding to the original image.
  • the loss value determining unit is configured to determine a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters in the first style model to be trained based on the loss value.
  • the to-be-used style model determining unit is configured to take the convergence of the first loss function in the first to-be-trained style model as a training target to obtain the to-be-used style model.
  • the preset screening condition includes that the change angle of the part to be adjusted is greater than a preset change angle threshold.
  • the to-be-used style image determination module 220 is further configured to determine an original image whose change angle of the part to be adjusted in the original image is greater than a preset change angle threshold as the original image to be processed; wherein the part to be adjusted includes facial features.
  • the style image to be used determining module 220 is further configured to input each original image to be processed into the style model to be used to obtain a style image to be used corresponding to each original image; wherein, the style image to be used has different characteristics from the initial style image corresponding to each original image to be processed.
  • the target style image determination module 230 includes a pixel point information determination unit and a target style image generation unit.
  • the pixel point information determining unit is configured to determine the pixel point information of key points in each of the original image to be processed and the style image to be used.
  • the target style image generating unit is configured to determine deformation parameters based on the pixel point information, so as to attach the parts to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters to obtain the target style image.
  • the device for generating a stylized image further includes a parameter adjustment constraint condition setting module.
  • the parameter adjustment constraint setting module is configured to determine the stylized conversion model to be trained having the target network structure, combine a discriminator to be trained for the stylized conversion model to be trained, and set parameter adjustment constraints for the discriminator to be trained, so as to constrain the adjustment of the model parameters in the stylized conversion model to be trained and in the discriminator to be trained based on the constraints, to obtain the target stylized conversion model.
  • the target stylization conversion model determination module 240 includes a second actual output image generation unit, a discrimination result generation unit, a parameter adjustment unit, and a target stylization conversion model determination unit.
  • the second actual output image generating unit is configured to input the original image to be processed in the stylized paired data into the style conversion model to be trained to obtain a second actual output image.
  • the discrimination result generating unit is configured to input the second actual output image and the target style image in the stylized paired data into the discriminator to be trained to obtain a discrimination result.
  • a parameter adjustment unit configured to adjust model parameters in the style conversion model to be trained and the discriminator to be trained based on the discrimination result and the constraint conditions.
  • the target stylized conversion model determination unit is configured to take the convergence of the loss function in the to-be-trained style conversion model and the to-be-trained discriminator as a training target, and obtain the target stylized conversion model.
  • the device for generating a stylized image further includes a model deployment module.
  • the model deployment module is configured to deploy the target stylized model to the client, so that when the video frame to be processed is obtained, the video frame to be processed is stylized based on the target stylized model to obtain the target video frame, and the target video is obtained based on all target video frames.
  • a plurality of initial paired data is determined first, and a style model to be used is obtained through training based on the plurality of initial paired data, a plurality of original images to be processed are determined from the original images in the plurality of initial paired data based on preset screening conditions, and each original image to be processed is processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed; a target style image corresponding to each original image to be processed is obtained by deforming the style image to be used, and each original image to be processed and the corresponding target style image are used as stylized paired data.
  • the stylized paired data is used to train the stylized conversion model to be trained to obtain the target stylized conversion model, so that when the video frame to be processed is obtained, the stylized processing is performed based on the target stylized conversion model to obtain the processed target video.
  • the pairing of the final training data is improved, and a lightweight model suitable for the mobile terminal is trained based on the training data.
  • the device for generating a stylized image provided by an embodiment of the present disclosure can execute the method for generating a stylized image provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present disclosure.
  • the terminal equipment in the embodiments of the present disclosure may include mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), Tablet PC (Portable Android Device, PAD), Portable Media Player (Portable Media Player, PMP), vehicle-mounted terminal (eg, vehicle-mounted navigation terminal) and other mobile terminals, and fixed terminals such as digital television (television, TV), desktop computer and so on.
  • the electronic device shown in FIG. 5 is just an example.
  • the electronic device 300 may include a processing device (such as a central processing unit, a graphics processor, etc.) 301, which may perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 302 or a program loaded from a storage device 308 into a random access memory (Random Access Memory, RAM) 303.
  • ROM Read-Only Memory
  • RAM Random Access Memory
  • the processing device 301, ROM 302, and RAM 303 are connected to each other through a bus 304.
  • An input/output (Input/Output, I/O) interface 305 is also connected to the bus 304 .
  • the following devices can be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 307 including, for example, a liquid crystal display (Liquid Crystal Display, LCD), a speaker, a vibrator, etc.; a storage device 308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 309.
  • the communication means 309 may allow the electronic device 300 to perform wireless or wired communication with other devices to exchange data. While FIG. 5 shows electronic device 300 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer readable medium, the computer program including program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 309, or from the storage device 308, or from the ROM 302.
  • When the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the electronic device provided by the embodiment of the present disclosure belongs to the same inventive concept as the method for generating a stylized image provided by the above embodiment.
  • For technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
  • An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the method for generating a stylized image provided by the above embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • the computer-readable storage medium may include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer readable medium can be transmitted by any appropriate medium, including: electric wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any appropriate combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocols such as HyperText Transfer Protocol (HyperText Transfer Protocol, HTTP), and can be interconnected with any form or medium of digital data communication (for example, a communication network).
  • Examples of communication networks include local area networks (Local Area Network, LAN), wide area networks (Wide Area Network, WAN), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is caused to: determine a plurality of initial paired data, and train based on the plurality of initial paired data to obtain a style model to be used, wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model; determine a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset screening conditions, and process each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed; obtain a target style image corresponding to each original image to be processed by deforming the style image to be used, and use each original image to be processed and the corresponding target style image as stylized paired data; and train a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages—such as Java, Smalltalk, C++, and conventional procedural programming languages—such as the “C” language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a LAN or WAN, or it can be connected to an external computer (eg via the Internet using an Internet Service Provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (Field Programmable Gate Array, FPGA), Application Specific Integrated Circuits (Application Specific Integrated Circuit, ASIC), Application Specific Standard Parts (Application Specific Standard Parts, ASSP), Systems on Chip (System on Chip, SOC), Complex Programmable Logic Devices (Complex Programmable Logic Device, CPLD), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may comprise an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • the machine-readable storage medium may include one or more wire-based electrical connections, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, CD-ROM, optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • Example 1 provides a method for generating a stylized image, the method including:
  • determining a plurality of initial paired data, and training based on the plurality of initial paired data to obtain a style model to be used, wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
  • determining a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset screening conditions, and processing each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
  • obtaining a target style image corresponding to each original image to be processed by deforming the style image to be used, and using each original image to be processed and the corresponding target style image as stylized paired data;
  • training a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
  • Example 2 provides a method for generating a stylized image, which further includes: acquiring a plurality of original images including facial information; and inputting each original image into a pre-trained 3D style generation model to obtain a corresponding initial style image after the facial information is processed.
  • Example 3 provides a method for generating a stylized image, which further includes: obtaining a first style model to be trained; for each initial paired data, using the original image in the initial paired data as an input of the first style model to be trained to obtain a first output image corresponding to the original image; determining a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters in the first style model to be trained based on the loss value; and taking the convergence of the first loss function in the first style model to be trained as the training target to obtain the style model to be used.
  • Example 4 provides a method for generating a stylized image, which further includes:
  • the preset screening condition includes that the change angle of the part to be adjusted is greater than a preset change angle threshold, and determining the plurality of original images to be processed from the original images in the plurality of initial paired data based on the preset screening condition includes: determining an original image in which the change angle of the part to be adjusted is greater than the preset change angle threshold as the original image to be processed, wherein the parts to be adjusted include facial features.
  • Example 5 provides a method for generating a stylized image, which further includes:
  • each original image to be processed is input into the style model to be used to obtain a style image to be used corresponding to each original image;
  • the features in the style image to be used are different from those in the initial style image corresponding to each original image to be processed.
  • Example 6 provides a method for generating a stylized image, which further includes:
  • determining pixel point information of key points in each original image to be processed and in the style image to be used; and determining a deformation parameter based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameter to obtain the target style image.
  • Example 7 provides a method for generating a stylized image, which further includes: determining a stylized conversion model to be trained having a target network structure; and combining a discriminator to be trained for the stylized conversion model to be trained, and setting parameter adjustment constraints for the discriminator to be trained, so as to constrain the adjustment of model parameters in the stylized conversion model to be trained and the discriminator to be trained based on the constraints to obtain the target stylized conversion model.
  • Example 8 provides a method for generating a stylized image, which further includes:
  • inputting the original image to be processed in the stylized paired data into the style conversion model to be trained to obtain a second actual output image; inputting the second actual output image and the target style image in the stylized paired data into the discriminator to be trained to obtain a discrimination result; adjusting the model parameters of the style conversion model to be trained and of the discriminator to be trained based on the discrimination result and the constraints; and taking the convergence of the loss functions of the style conversion model to be trained and the discriminator to be trained as the training target to obtain the target style conversion model.
  • Example 9 provides a method for generating a stylized image, which further includes:
  • the target stylization model is deployed to the client, so that when the video frames to be processed are obtained, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and target videos are obtained based on all target video frames.
  • Example 10 provides a device for generating a stylized image, including:
  • the style model determination module to be used is configured to determine a plurality of initial paired data, and train a style model to be used based on the plurality of initial paired data; wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
  • the style image determination module to be used is configured to determine a plurality of original images to be processed from the original images in the plurality of initial pairing data based on preset screening conditions, and process each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
  • the target style image determination module is configured to obtain a target style image corresponding to each original image to be processed by deforming the style image to be used, and use each original image to be processed and the corresponding target style image as stylized pairing data;
  • the target stylized conversion model determination module is configured to train the stylized conversion model to be trained based on the stylized paired data to obtain the target stylized conversion model, so that when a video frame to be processed is acquired, stylization processing is performed on the video frame to be processed based on the target stylized conversion model to obtain a processed target video.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Graphics (AREA)
  • Architecture (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present disclosure provide a stylized image generation method and apparatus, an electronic device, and a storage medium. The method includes: determining a plurality of initial paired data, and training based on the initial paired data to obtain a style model to be used; determining original images to be processed from the original images based on preset screening conditions, and processing the original images to be processed based on the style model to be used to obtain style images to be used; performing deformation processing on the style images to be used to obtain target style images, and using each original image to be processed and the corresponding target style image as stylized paired data; and training a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model.

Description

Stylized image generation method and apparatus, electronic device, and storage medium
The present disclosure claims priority to Chinese Patent Application No. 202210080456.8, filed with the China Patent Office on January 24, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, for example, to a stylized image generation method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of image processing technology, users can use a variety of applications to process images so that the processed images present the style they desire.
In the related art, after an algorithm deployed on the server processes an image, an image of the corresponding style is obtained; however, the process of the server receiving the image, processing it, and feeding back the processing result introduces a large delay, and related algorithms with large computing resource requirements cannot be deployed directly on the client. Meanwhile, the obtained image may have flaws; for example, some features in the image may fail to correspond to features in the original image, which results in a poor image processing effect and degrades the user experience.
Summary
Embodiments of the present disclosure provide a stylized image generation method and apparatus, an electronic device, and a storage medium, which improve the pairing quality of training data and reduce the latency of stylized image processing.
In a first aspect, an embodiment of the present disclosure provides a stylized image generation method, the method including:
determining a plurality of initial paired data, and training based on the plurality of initial paired data to obtain a style model to be used, wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
determining a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset screening conditions, and processing each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
obtaining a target style image corresponding to each original image to be processed by performing deformation processing on the style image to be used, and using each original image to be processed and the corresponding target style image as stylized paired data;
training a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
In a second aspect, an embodiment of the present disclosure further provides a stylized image generation apparatus, the apparatus including:
a style-model-to-be-used determination module, configured to determine a plurality of initial paired data, and train based on the plurality of initial paired data to obtain a style model to be used, wherein each of the initial paired data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
a style-image-to-be-used determination module, configured to determine a plurality of original images to be processed from the original images in the plurality of initial paired data based on preset screening conditions, and process each original image to be processed based on the style model to be used to obtain a style image to be used corresponding to each original image to be processed;
a target style image determination module, configured to obtain a target style image corresponding to each original image to be processed by performing deformation processing on the style image to be used, and use each original image to be processed and the corresponding target style image as stylized paired data;
a target stylized conversion model determination module, configured to train a stylized conversion model to be trained based on the stylized paired data to obtain a target stylized conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylized conversion model to obtain a processed target video.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, the electronic device including:
a processor; and
a storage device for storing a program,
wherein when the program is executed by the processor, the processor is caused to implement the stylized image generation method according to any one of the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the stylized image generation method according to any one of the embodiments of the present disclosure.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a stylized image generation method provided by Embodiment 1 of the present disclosure;
FIG. 2 is a schematic diagram of generating stylized paired data provided by Embodiment 1 of the present disclosure;
FIG. 3 shows, as an example, an original image, a style image to be used, and a target style image provided by Embodiment 1 of the present disclosure;
FIG. 4 is a structural block diagram of a stylized image generation apparatus provided by Embodiment 2 of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms, and these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only.
It should be understood that the steps recorded in the method embodiments of the present disclosure may be executed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown.
The term "comprise" and its variations as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
Before the present disclosure is introduced, its application scenarios may first be illustrated. The embodiments of the present disclosure can be applied to any scenario in which a special effect video needs to be generated; for example, in the process of shooting a video through a related application, a corresponding pattern can be generated on the display interface based on the user's drawing operation, a special effect video containing the three-dimensional model corresponding to the pattern can then be generated, and the special effect video can be displayed on the display interface.
Embodiment 1
FIG. 1 is a schematic flowchart of a stylized image generation method provided by Embodiment 1 of the present disclosure. This embodiment is applicable to generating better-matched training data based on an image processing model in the related art, so as to train, based on the obtained training data, a model that is suitable for the mobile side and has a better image processing effect. The method can be executed by a stylized image generation apparatus, which can be implemented in the form of software and/or hardware; the hardware may be an electronic device, such as a mobile terminal, a personal computer (Personal Computer, PC), or a server. The scenario of displaying special effect videos is usually implemented by the cooperation of a client and a server; the method provided in this embodiment can be executed by the server, by the client, or by the client and the server in cooperation.
As shown in FIG. 1, the method of this embodiment includes:
S110. Determine a plurality of initial paired data, and train based on the plurality of initial paired data to obtain a style model to be used.
In this embodiment, the apparatus executing the stylized image generation method provided by the embodiments of the present disclosure can be integrated into application software that supports special effect video processing functions, and the software can be installed in an electronic device; optionally, the electronic device may be a mobile terminal, a PC, or the like. The application software may be a type of software for image/video processing, as long as it can realize image/video processing. The application software may also be a specially developed application program that adds special effects and displays them, or it may be integrated into a corresponding page through which a user on a PC can process video frames or specific images.
Meanwhile, since the application software can be installed and run on a mobile terminal, the algorithm adopted by the application software for generating the target stylized image should be adapted to the mobile terminal. For example, compared with the server side, the computing power and computing resources of the mobile side are relatively limited; therefore, algorithms deployed on the server side that require high computing power cannot run directly on the mobile side, and a lightweight model matched to the computing power of the mobile side needs to be retrained for the mobile platform, thereby achieving the effect of serving users with lower latency.
In this embodiment, in order to obtain a model suitable for the mobile terminal with a better image processing effect, corresponding training data first needs to be generated by using a three-dimensional (3D) style generation model in the related art. Optionally, multiple original images including facial information are acquired, and each original image is input into the pre-trained 3D style generation model to obtain a corresponding initial style image in which the facial information has been processed.
In this embodiment, the 3D style generation model in the related art may be a model deployed on the server side for generating initial style images. In practical applications, the 3D style generation model may be a model for generating images in a 3D game character style. The input of the 3D style generation model is an original image containing the user's facial features; it can be understood that the image at least contains the user's facial features, such as the user's daily photo or ID photo. Correspondingly, the output of the model is an image in which the user's face presents a specific 3D style type while the original facial features are retained, which can be understood as a 3D facial image of a specific style type. When the model is one for generating images in a 3D game character style, the user's face in the output image presents a visual effect similar to a character in the game; with the 3D game character style effect, the facial features are smoother, clearer, and more three-dimensional.
Those skilled in the art should understand that the training set used to train the 3D style generation model may contain multiple open-source, diverse user facial images, for example, user portraits covering multiple genders, multiple age groups, multiple facial expressions, and multiple viewing angles; these images may be obtained from open-source image databases. In an embodiment, after the model is trained based on the training set, the model may be deployed on the server side associated with the application, which will not be repeated here.
On this basis, each piece of initial pairing data includes an original image and an initial style image obtained after the original image is processed by the 3D style generation model. It can be understood that, in practical applications, after the original image is processed by the 3D style generation model, the obtained image retains the user's original facial features while presenting the visual effect of the 3D game character style.
In this embodiment, the image output by the 3D style generation model may match only some of the features in the original image. For example, the parts of the two images reflecting features such as the user's gender and posture match, but finer features of the user's facial image cannot be matched accurately. For instance, when the user shows a rather exaggerated expression in the input original image, large facial movements occur; in this case, the corresponding features in the output initial style image may fail to match the parts with large facial movements accurately. Therefore, in order to provide better-paired training data for the model to be finally trained, a first style model to be trained further needs to be trained based on the initial pairing data, so as to obtain the style model to be used.
In this embodiment, the first style model to be trained may be a peer-to-peer (Peer to Peer, P2P) model deployed on the server side. Similar to the 3D style generation model, this model also takes an original image including facial information as input and outputs an image of a specific style type, for example, an image in a 3D game character style. It can be understood that, since the training data (i.e., the initial pairing data) already provides a certain pairing quality, the image output by the trained style model to be used has a higher degree of matching with the original image. The training process of the first style model to be trained is described below.
Optionally, a first style model to be trained is acquired; for each piece of initial pairing data, the original image in the piece of initial pairing data is taken as the input of the first style model to be trained, to obtain a first output image corresponding to the original image; a loss value is determined based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters of the first style model to be trained based on the loss value; and the convergence of the first loss function in the first style model to be trained is taken as the training goal, to obtain the style model to be used.
Optionally, after multiple pieces of initial pairing data are acquired, the existing first style model to be trained may be used to process a large amount of initial pairing data to generate first output images, i.e., images with a higher degree of matching with the original images input by users. In an embodiment, after the first output image is obtained, the loss value between the first output image and the initial style image may be determined. When the loss value is used to correct the model parameters of the first style model to be trained, the training error of the first loss function in the first style model to be trained, i.e., the loss parameter, may be taken as the condition for detecting whether the first loss function reaches convergence, for example, whether the training error is less than a preset error, whether the variation trend of the training error tends to be stable, or whether the current number of iterations equals a preset number. If it is detected that the convergence condition is met, for example, the training error of the loss function is less than the preset error or the variation trend of the training error tends to be stable, it indicates that the training of the first style model to be trained is completed, and the iterative training may be stopped at this point. If it is detected that the convergence condition is not currently met, other initial pairing data may be acquired to continue training the model until the training error of the loss function falls within the preset range. It can be understood that, when the training error of the loss function converges, the trained style model to be used is obtained; at this point, after an original image including the user's facial information is input into the model, an image with a higher degree of matching with the original image can be obtained.
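By way of a non-limiting illustration, the paired training described above may be sketched as follows, assuming PyTorch; the model class, the L1 criterion standing in for the first loss function, and the preset error value are assumptions of this sketch rather than limitations of the embodiment:

```python
import itertools

import torch
from torch import nn

def train_first_style_model(model, paired_loader, preset_error=0.01, max_iters=10_000):
    """Fit the first style model to be trained on (original image, initial style image) pairs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
    criterion = nn.L1Loss()  # stand-in for the first loss function

    for step, (original, initial_style) in enumerate(itertools.cycle(paired_loader)):
        first_output = model(original)                 # first output image
        loss = criterion(first_output, initial_style)  # loss value vs. the initial style image

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                               # correct the model parameters

        # Convergence condition: training error below the preset error,
        # or the preset number of iterations reached.
        if loss.item() < preset_error or step >= max_iters:
            break
    return model
```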
S120. Determine multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and process each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed.
In this embodiment, when the style model to be used is a P2P model for generating images in a 3D game character style, although the image output by the model has a higher degree of matching with the original image than the initial style image does, the overall quality of the output image still falls short of the original image; for example, the sharpness of the output image is lower than that of the original image. Therefore, in order to obtain data with a higher pairing quality, multiple original images to be processed may be screened out of the original images based on the preset condition, so that the style model to be used processes only the screened images. The screening process of the original images is described below.
Optionally, among all original images, the original images in which the change angle of a part to be adjusted is greater than a preset change angle threshold are determined as the original images to be processed. For example, the parts to be adjusted include facial feature parts, such as the user's eyes, nose, ears, and mouth, and the preset change angle thresholds may be change angles set separately for the user's facial features, for example, angles set separately for the included angle between the upper and lower contours of the eyes, or for the degree of curvature of the mouth. In practical applications, related image processing software may be used to determine the angles of the user's facial features in the original image, and these angles are then compared with the preset change angle thresholds. When it is determined that a facial feature angle is greater than the corresponding preset change angle threshold, the original image may be determined as an original image to be used. In practical applications, after the original images to be used are determined, in order to facilitate subsequent use by the model, these images may be copied into a specific database, or a specific identifier may be attached to them.
In this embodiment, when angles beyond the user's daily expressions are taken as the preset change angles, the determined original images to be processed are images whose expression variation is at least greater than that of the user's daily expressions, which can be understood as the facial images corresponding to the user making large facial movements and exaggerated expressions.
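As one possible realization of this screening step, the following sketch thresholds a mouth-opening angle computed from facial landmarks; the landmark names and the 25-degree threshold are hypothetical, and any landmark detector may supply the points:

```python
import numpy as np

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex`, formed by the rays toward p1 and p2."""
    vertex = np.asarray(vertex, dtype=float)
    v1 = np.asarray(p1, dtype=float) - vertex
    v2 = np.asarray(p2, dtype=float) - vertex
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_images_to_process(samples, mouth_angle_threshold=25.0):
    """Keep originals whose mouth-opening angle exceeds the preset change angle threshold.

    `samples` is assumed to be an iterable of (image, landmarks) pairs, where
    `landmarks` maps names such as "mouth_corner", "upper_lip", and "lower_lip"
    to 2D points produced by any landmark detector.
    """
    selected = []
    for image, lm in samples:
        mouth_angle = angle_at(lm["mouth_corner"], lm["upper_lip"], lm["lower_lip"])
        if mouth_angle > mouth_angle_threshold:
            selected.append(image)  # exaggerated expression: keep for reprocessing
    return selected
```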
In an embodiment, after the original images to be used are screened out, each original image to be processed is input into the style model to be used, to obtain the style image to be used corresponding to each original image to be processed. Since the input of the style model to be used is the facial images corresponding to the user making large facial movements and exaggerated expressions, which differs from the input of the initial 3D style generation model, the corresponding outputs of the two models also differ; it can be understood that, in the two kinds of output images, the features corresponding to the user's facial feature parts differ considerably.
For example, after the images in which the user makes large facial movements are obtained as the original images to be used, these images can be input into the P2P model deployed on the server side for generating images in a 3D game character style, so as to obtain images in a 3D game character style with a higher degree of matching with the original images.
S130. Obtain, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and take each original image to be processed and the corresponding target style image as stylized paired data.
In this embodiment, although the obtained style image to be used has a higher degree of matching with the original image to be used, the features in the image corresponding to the parts where the user makes large facial movements still cannot be matched accurately with the corresponding parts in the original image. For example, in the original image to be used, the corners of the user's mouth are raised considerably, while in the obtained style image to be used, although the user's facial image presents the visual effect of the 3D game character style, the corners of the mouth are not raised as markedly as in the corresponding original image to be used. Therefore, in order to construct higher-quality training samples for the model finally adopted by the application to provide services to users, deformation processing further needs to be performed on the style image to be used. In practical applications, the Thin Plate Spline (TPS) transform may be used to process the style image to be used. TPS is a non-rigid deformation; the input of the deformation algorithm is multiple groups of matching point pairs of the same parts in the two images, for example, the matching point pairs between the user's mouth in the original image to be processed and the user's mouth in the style image to be used, and correspondingly, the output is the mapped coordinates of the same parts of the two images. The deformation process is described below.
Optionally, the pixel point information of the key points in each original image to be processed and in the style image to be used is determined; deformation parameters are determined based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters, to obtain the target style image.
Since the goal of TPS is to solve for a function $f$ such that $f(P_i) = P_i'\ (1 \le i \le n)$ while the bending energy function is minimized, the points on the style image to be used can be well corrected by interpolation. Regarding the deformation function as the process of bending a steel plate so that the plate passes through the given $n$ points, the energy required to bend the plate can be expressed as:

$$I_f = \iint_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 f}{\partial x^2} \right)^2 + 2\left( \frac{\partial^2 f}{\partial x\,\partial y} \right)^2 + \left( \frac{\partial^2 f}{\partial y^2} \right)^2 \right] dx\, dy$$

where $(x, y)$ is any coordinate of a key point in the original image to be processed.
It can be proved that the TPS interpolation function is exactly the function with the minimum bending energy:

$$f(x, y) = a_1 + a_2 x + a_3 y + \sum_{i=1}^{n} w_i\, U\big(\lVert (x_i, y_i) - (x, y) \rVert\big), \qquad U(r) = r^2 \log r^2$$

Through derivation, the mapping from any coordinate $(x, y)$ of a key point in the original image to be processed to any deformed coordinate $(x_i, y_i)$ can be obtained by solving:

$$\begin{bmatrix} K & P \\ P^{\mathrm{T}} & 0 \end{bmatrix} \begin{bmatrix} w \\ a \end{bmatrix} = \begin{bmatrix} v \\ 0 \end{bmatrix}$$

where $w$ is the coefficient matrix and $U$ is the basis function; here $K_{ij} = U(\lVert P_i - P_j \rVert)$, each row of $P$ is $(1, x_i, y_i)$, and $v$ stacks the matched target coordinates.
For example, after the style image to be used is obtained, the pixel point information of the key points in the original image to be processed and in the style image to be used first needs to be determined, i.e., in the two images, the pixel point information corresponding to the mouth part with the markedly raised corners is determined. Optionally, the deformation parameters of the mouth part are determined based on the above formulas, and the mouth part in the original image to be processed is attached, based on the deformation parameters, to the position corresponding to the mouth part in the style image to be used, so as to replace the original part in the style image to be used that does not match the user's actual mouth features.
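A compact NumPy sketch of the TPS solve described by the formulas above is given below, one coordinate channel at a time; fitting the x and y channels separately yields the full 2D mapping, and the actual pixel warp and attachment are omitted for brevity:

```python
import numpy as np

def tps_basis(d2):
    """Basis function U on squared distances: U(r) = r^2 log r^2, with U(0) = 0."""
    out = np.zeros_like(d2, dtype=float)
    mask = d2 > 0
    out[mask] = d2[mask] * np.log(d2[mask])
    return out

def fit_tps(source_pts, target_coords):
    """Solve for the TPS weights w and affine terms a from n matching point pairs.

    source_pts: (n, 2) key points in the original image to be processed.
    target_coords: (n,) one coordinate channel of the matched key points
    in the style image to be used.
    """
    n = len(source_pts)
    d2 = np.sum((source_pts[:, None, :] - source_pts[None, :, :]) ** 2, axis=-1)
    K = tps_basis(d2)                                # (n, n) basis matrix
    P = np.hstack([np.ones((n, 1)), source_pts])     # (n, 3) affine part
    L = np.block([[K, P], [P.T, np.zeros((3, 3))]])  # full TPS linear system
    rhs = np.concatenate([target_coords, np.zeros(3)])
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]                    # w, (a1, a2, a3)

def tps_map(pts, source_pts, w, a):
    """Map arbitrary (m, 2) coordinates through the fitted deformation."""
    d2 = np.sum((pts[:, None, :] - source_pts[None, :, :]) ** 2, axis=-1)
    return tps_basis(d2) @ w + a[0] + pts @ a[1:]
```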
It can be understood that the style image to be used after the key point parts are attached and adjusted is the target style image. The target style image not only retains the user's unique facial features from the original image to be processed, but also makes the image present the visual effect of the 3D game character style, with a high degree of matching with the original image to be processed.
In this embodiment, after the target style image is obtained, it can be combined with the corresponding original image to be processed to obtain the stylized paired data. The stylized paired data is the data used to train the model actually adopted by the application, for example, the data used to train the image processing model to be deployed on the mobile terminal.
In order to describe the process of determining the stylized paired data in a more holistic manner, this process is described below with reference to FIG. 2.
Referring to FIG. 2, after the original images including the user's facial information are acquired, the model deployed on the server side for generating images in a 3D game character style may first be used to process the original images, and the processing results are combined with the original images to obtain the first version of paired training data. Based on this training data, the P2P model to be trained, which is also used for generating images in a 3D game character style, can be trained. It can be understood that after the P2P model is trained, its output images have a higher pairing quality with the original images; however, since the overall quality of the output images is lower than that of the original images, the original images to be processed in which the user shows large expressions and some facial parts change at large angles further need to be screened out of the original images, and these original images to be processed are then input into the trained P2P model to obtain the corresponding style images to be used, i.e., the images in a 3D game character style corresponding to the images with large expressions and large-angle changes of some facial parts. In these images, the parts with large expressions and large-angle changes of some facial parts still differ from the corresponding parts in the original images to be used.
Continuing to refer to FIG. 2, in order to improve the pairing quality between the style images to be used and the corresponding original images, the parts in the style images to be used corresponding to the user's large expressions and large-angle changes of some facial parts further need to be deformed based on the TPS technique. For example, when the corners of the user's mouth are raised considerably in the original image to be processed, the corresponding part in the style image to be used may be deformed based on the TPS technique; after the deformation parameters are obtained, the mouth region in the original image can be attached to the image to be used, so as to obtain a target style image with a higher pairing quality with the original image to be processed. Finally, the target style image is combined with the corresponding original image to be processed, i.e., the final version of paired training data is constructed; with this better-paired training data, the target stylization conversion model finally used by the application can be trained.
It should be noted that, in practical applications, the image processing model deployed on the mobile terminal may be a lightweight stylization conversion model for generating images in a 3D game character style. It can be understood that, after an image including the user's facial information is input into the stylization conversion model for processing, an image in a 3D game character style with a high degree of matching with the user's facial features can be generated with lower computing resources.
Based on the above description, it can be determined that, before the stylization conversion model is trained with the stylized paired data, the stylization conversion model to be trained with the target network structure first needs to be determined; a discriminator to be trained is concatenated with the stylization conversion model to be trained, and a parameter adjustment constraint condition is set for the discriminator to be trained, so that the model parameters in the stylization conversion model to be trained and the discriminator to be trained are adjusted under the constraint condition, to obtain the target stylization conversion model.
In practical applications, the stylization conversion model to be trained may be a model based on a Generative Adversarial Network (GAN). A generative adversarial network associates a generative network with a discriminative network. The generative network takes random samples from the latent space as input, and its output needs to imitate the real samples in the training set as closely as possible; the input of the discriminative network is the real samples and the output of the generative network. On this basis, it can be understood that the stylization conversion model to be trained in this embodiment can also be concatenated with a discriminator to be trained, so that in the subsequent process the stylization conversion model with corrected parameters regenerates a target stylized image.
It should be noted that, in order to adapt the stylization model to be trained to the computing capability of the mobile terminal, multiple lightweight models may be developed in advance, i.e., models with fewer network layers and less refined model parameters; after these models are deployed on the mobile terminal, the original images can be processed. Since these models can generate images with a lower signal-to-noise ratio based on a lower number of channels, the image processing delay can be reduced. After all models output the processing results corresponding to the original images, based on factors such as the image processing effect and the model running time, a model with a better image processing effect and a shorter running time can be selected from the multiple models as the stylization conversion model to be trained.
In this embodiment, an ordinary P2P model easily produces a large amount of erroneous data when trained on the 3D game character style, which is precisely caused by the instability of the discriminator to be trained. Therefore, in order to improve the stability of the subsequent model training process, a parameter adjustment constraint condition further needs to be set for the discriminator to be trained, for example, Spectral Norm Regularization (SNR). This technique introduces a regularization constraint from the perspective of the spectral norm of the parameter matrix of each layer of the neural network, making the neural network insensitive to input perturbations, so that the subsequent training process is more stable and converges more easily.
Optionally, when the parameter adjustment constraint condition set for the discriminator to be trained is SNR, the discriminator to be trained can be made to satisfy Lipschitz continuity in the subsequent training process, i.e.,

$$\lVert f(x_1) - f(x_2) \rVert \le K \lVert x_1 - x_2 \rVert$$

where $K$ is a constant; for the function $f$, the smallest constant $K$ satisfying the above condition is the Lipschitz constant. In this embodiment, SNR performs a Singular Value Decomposition (SVD) on the parameters of each layer of the neural network and sets the largest singular value to 1 after the decomposition, i.e., the parameters are divided by the largest singular value after each update of the network parameters. On this basis, the stretching of each network layer in the discriminator does not exceed 1, so that the discriminator to be trained satisfies Lipschitz continuity during training, and the training process of the discriminator to be trained is more stable and converges more easily.
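As a concrete way to impose this constraint, the sketch below wraps each discriminator layer with PyTorch's built-in spectral normalization; note that torch.nn.utils.spectral_norm approximates the largest singular value by power iteration rather than a full SVD, which is the usual practical substitute for the per-update SVD described above, and the channel widths here are illustrative assumptions:

```python
import torch
from torch import nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    """Discriminator to be trained, with each layer's spectral norm pinned to about 1
    so that the network remains approximately 1-Lipschitz during training."""

    def __init__(self, in_channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(in_channels, base, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base, base * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base * 2, 1, 4, stride=1, padding=1)),
        )

    def forward(self, x):
        return self.net(x)  # patch-level real/fake scores
```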
S140. Train the stylization conversion model to be trained based on the stylized paired data to obtain the target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
In this embodiment, the process of training the stylization conversion model to be trained may be: inputting the original images to be processed in the stylized paired data into the stylization conversion model to be trained, to obtain second actual output images; inputting the second actual output images and the target style images in the stylized paired data into the discriminator to be trained, to obtain discrimination results; adjusting the model parameters in the stylization conversion model to be trained and the discriminator to be trained based on the discrimination results and the constraint condition; and taking the convergence of the loss functions in the stylization conversion model to be trained and the discriminator to be trained as the training goal, to obtain the target stylization conversion model.
Optionally, after multiple groups of stylized paired data are acquired, the existing stylization conversion model to be trained may be used to process a large amount of stylized paired data to generate second actual output images, i.e., images with a higher degree of matching with the original images input by users. For example, after the second actual output image is obtained, the loss value between the second actual output image and the target style image in the stylized paired data may be determined based on the discriminator to be trained; it can be understood that this loss value is the discrimination result. When the loss value is used to correct the model parameters in the stylization conversion model to be trained and the discriminator to be trained under the preset spectral norm regularization constraint condition, the training error of the loss functions in the stylization conversion model to be trained and the discriminator to be trained, i.e., the loss parameter, may be taken as the condition for detecting whether the loss functions reach convergence, for example, whether the training error is less than a preset error, whether the variation trend of the training error tends to be stable, or whether the current number of iterations equals a preset number. If it is detected that the convergence condition is met, for example, the training error of the loss function is less than the preset error or the variation trend of the training error tends to be stable, it indicates that the training of the stylization conversion model to be trained and the discriminator to be trained is completed, and the iterative training may be stopped at this point. If it is detected that the convergence condition is not currently met, other stylized paired data may be acquired to continue training the model and the discriminator until the training error of the loss function falls within the preset range. It can be understood that, when the training error of the loss function converges, the trained target stylization conversion model is obtained; at this point, after an original image including the user's facial information is input into the model, a result that retains the user's original facial features while presenting the visual effect of the 3D game character style can be obtained.
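For illustration only, one adversarial update on a batch of stylized paired data may look as follows; this is a sketch assuming PyTorch, and the generator G, discriminator D, and the BCE-plus-L1 losses are assumptions of the sketch rather than the only possible choices:

```python
import torch
from torch import nn

def train_step(G, D, opt_G, opt_D, original, target_style, l1_weight=100.0):
    """One adversarial update on a batch of stylized paired data."""
    bce = nn.BCEWithLogitsLoss()

    # 1) Discriminator: score target style images as real, generator outputs as fake.
    fake = G(original)  # second actual output image
    d_real = D(target_style)
    d_fake = D(fake.detach())
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2) Generator: fool the discriminator while staying close to the target style image.
    d_fake = D(fake)
    loss_G = bce(d_fake, torch.ones_like(d_fake)) \
        + l1_weight * nn.functional.l1_loss(fake, target_style)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    return loss_G.item(), loss_D.item()  # losses feed the convergence check described above
```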
It should be noted that, after the target stylization model is obtained, the target stylization model may also be deployed in the client, so that when video frames to be processed are acquired, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and the target video is obtained based on all the target video frames.
Since the client can be installed in the user's mobile terminal device, and mobile terminal devices generally have the function of capturing images of the user's face, after the target stylization model is deployed in the client and it is detected that the user triggers the control associated with generating the target stylization model, the facial information in the video captured by the front or rear camera of the mobile terminal can be recognized; when the user's facial information is recognized, the corresponding video clip can be split into multiple video frames, or a video manually imported into the application by the user can be directly split into multiple video frames.
In an embodiment, after multiple video frames are acquired, these video frames can be processed based on the target stylization model to obtain target video frames. It can be understood that when the target stylization model generates images in a 3D game character style for the user, the user's facial information in the target video frames retains its original features while presenting the visual effect of the 3D game character style. Finally, the processed video frames are spliced to obtain the target video; by displaying the target video in the display interface associated with the application, it can be presented to the user, which enhances the interest of the video content.
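A minimal sketch of this on-device frame pipeline, assuming OpenCV for video I/O, is given below; `stylize` stands for the deployed target stylization model and is a hypothetical callable mapping an RGB frame to a stylized RGB frame of the same size:

```python
import cv2

def stylize_video(input_path, output_path, stylize):
    """Split a video into frames, stylize each frame on-device, and splice the target video."""
    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break  # all video frames to be processed have been read
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        target = stylize(rgb)  # target video frame
        writer.write(cv2.cvtColor(target, cv2.COLOR_RGB2BGR))

    cap.release()
    writer.release()  # target video assembled from all target video frames
```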
It should be noted that, since the trained target stylization model can be deployed directly on the mobile terminal, the user's facial image/video can be processed directly on the mobile terminal, which avoids the cumbersome process of uploading the image/video to the server and then receiving the processing result from the server, reduces the stutter and delay when the application software presents the target video to the user, and makes the playback of the video with the target stylized image smoother, thereby improving the user experience.
In this embodiment, in order to show more clearly the differences among the original image, the style image to be used output by the style model to be used, and the target style image output by the target stylization model, FIG. 3 may be taken as an example for description.
Referring to FIG. 3, after the user captures an image containing facial information with the camera of the mobile terminal, this image can be taken as the original image, i.e., the first image, which contains the user's open-mouth expression. In an embodiment, by inputting the original image into the style model to be used that has been trained on the server side, a style image to be used in a 3D game character style can be obtained. As can be seen from the figure, although this image reflects some of the features of the user's face as a whole and presents the visual effect of the 3D game character style, it does not show the user's open-mouth expression; that is, in the style image to be used, the part corresponding to the mouth is inconsistent with the original image.
Continuing to refer to FIG. 3, after the target stylization model is trained based on the solution of this embodiment and deployed on the mobile terminal to process the original image, the target style image can be output. As can be seen from the figure, the target style image not only reflects some of the features of the user's face as a whole and presents the visual effect of the 3D game character style, as the style image to be used does, but also reproduces the user's open-mouth expression in the image, so that the output image maintains high consistency with the original image in key features such as the facial features, which eliminates the problem of poor matching between the image obtained based on the style model to be used and the original image.
In this embodiment, multiple pieces of initial pairing data are first determined, and a style model to be used is trained based on the multiple pieces of initial pairing data; multiple original images to be processed are determined from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and each original image to be processed is processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed; a target style image corresponding to each original image to be processed is obtained by performing deformation processing on the style image to be used, and each original image to be processed and the corresponding target style image are taken as stylized paired data; finally, the stylization conversion model to be trained is trained based on the stylized paired data to obtain the target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video. By introducing the style model to be used and performing deformation processing on the style image to be used, the pairing quality of the finally obtained training data is improved; a lightweight model suitable for the mobile terminal is trained based on the training data, which avoids the cumbersome process of image data flowing between the client and the server, reduces the time delay of stylized image processing, and improves the user experience.
Embodiment 2
FIG. 4 is a structural block diagram of a stylized image generation device provided by Embodiment 2 of the present disclosure; the device can perform the stylized image generation method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the performed method. As shown in FIG. 4, the device includes: a style-model-to-be-used determination module 210, a style-image-to-be-used determination module 220, a target style image determination module 230, and a target stylization conversion model determination module 240.
The style-model-to-be-used determination module 210 is configured to determine multiple pieces of initial pairing data, and train, based on the multiple pieces of initial pairing data, a style model to be used; where each piece of the initial pairing data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model.
The style-image-to-be-used determination module 220 is configured to determine multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and process each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed.
The target style image determination module 230 is configured to obtain, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and take each original image to be processed and the corresponding target style image as stylized paired data.
The target stylization conversion model determination module 240 is configured to train a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
On the basis of the above implementations, the style-model-to-be-used determination module 210 includes an original image acquisition unit and an initial style image generation unit.
The original image acquisition unit is configured to acquire multiple original images including facial information.
The initial style image generation unit is configured to input each original image into the pre-trained 3D style generation model to obtain a corresponding initial style image in which the facial information has been processed.
On the basis of the above implementations, the style-model-to-be-used determination module 210 further includes a style-model-to-be-trained acquisition unit, a first output image generation unit, a loss value determination unit, and a style-model-to-be-used determination unit.
The style-model-to-be-trained acquisition unit is configured to acquire a first style model to be trained.
The first output image generation unit is configured to, for each piece of initial pairing data, take the original image in the piece of initial pairing data as the input of the first style model to be trained, to obtain a first output image corresponding to the original image.
The loss value determination unit is configured to determine a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters of the first style model to be trained based on the loss value.
The style-model-to-be-used determination unit is configured to take the convergence of the first loss function in the first style model to be trained as the training goal, to obtain the style model to be used.
On the basis of the above implementations, the preset screening condition includes that the change angle of the part to be adjusted is greater than a preset change angle threshold.
Optionally, the style-image-to-be-used determination module 220 is further configured to determine, among the original images, the original images in which the change angle of the part to be adjusted is greater than the preset change angle threshold as the original images to be processed; where the parts to be adjusted include facial feature parts.
Optionally, the style-image-to-be-used determination module 220 is further configured to input each original image to be processed into the style model to be used, to obtain the style image to be used corresponding to each original image; where the features in the style image to be used differ from those in the initial style image corresponding to each original image to be processed.
On the basis of the above implementations, the target style image determination module 230 includes a pixel point information determination unit and a target style image generation unit.
The pixel point information determination unit is configured to determine the pixel point information of the key points in each original image to be processed and in the style image to be used.
The target style image generation unit is configured to determine deformation parameters based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters, to obtain the target style image.
On the basis of the above implementations, the stylized image generation device further includes a parameter adjustment constraint condition setting module.
The parameter adjustment constraint condition setting module is configured to determine the stylization conversion model to be trained with the target network structure; concatenate a discriminator to be trained with the stylization conversion model to be trained, and set a parameter adjustment constraint condition for the discriminator to be trained, so that the model parameters in the stylization conversion model to be trained and the discriminator to be trained are adjusted under the constraint condition, to obtain the target stylization conversion model.
On the basis of the above implementations, the target stylization conversion model determination module 240 includes a second actual output image generation unit, a discrimination result generation unit, a parameter adjustment unit, and a target stylization conversion model determination unit.
The second actual output image generation unit is configured to input the original images to be processed in the stylized paired data into the stylization conversion model to be trained, to obtain second actual output images.
The discrimination result generation unit is configured to input the second actual output images and the target style images in the stylized paired data into the discriminator to be trained, to obtain discrimination results.
The parameter adjustment unit is configured to adjust the model parameters in the stylization conversion model to be trained and the discriminator to be trained based on the discrimination results and the constraint condition.
The target stylization conversion model determination unit is configured to take the convergence of the loss functions in the stylization conversion model to be trained and the discriminator to be trained as the training goal, to obtain the target stylization conversion model.
On the basis of the above implementations, the stylized image generation device further includes a model deployment module.
The model deployment module is configured to deploy the target stylization model in the client, so that when video frames to be processed are acquired, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and the target video is obtained based on all the target video frames.
In this embodiment, multiple pieces of initial pairing data are first determined, and a style model to be used is trained based on the multiple pieces of initial pairing data; multiple original images to be processed are determined from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and each original image to be processed is processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed; a target style image corresponding to each original image to be processed is obtained by performing deformation processing on the style image to be used, and each original image to be processed and the corresponding target style image are taken as stylized paired data; finally, the stylization conversion model to be trained is trained based on the stylized paired data to obtain the target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video. By introducing the style model to be used and performing deformation processing on the style image to be used, the pairing quality of the finally obtained training data is improved; a lightweight model suitable for the mobile terminal is trained based on the training data, which avoids the cumbersome process of image data flowing between the client and the server, reduces the time delay of stylized image processing, and improves the user experience.
The stylized image generation device provided by the embodiments of the present disclosure can perform the stylized image generation method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the performed method.
It is worth noting that the units and modules included in the above device are only divided according to functional logic, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other.
Embodiment 3
FIG. 5 is a structural schematic diagram of an electronic device provided by Embodiment 3 of the present disclosure. Referring now to FIG. 5, it shows a structural schematic diagram of an electronic device (e.g., the terminal device or server in FIG. 5) 300 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (Portable Android Device, PAD), a Portable Media Player (PMP), and a vehicle-mounted terminal (e.g., a vehicle-mounted navigation terminal), as well as fixed terminals such as a digital television (TV) and a desktop computer. The electronic device shown in FIG. 5 is merely an example.
As shown in FIG. 5, the electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301, which can perform various appropriate actions and processing according to a program stored in a Read-Only Memory (ROM) 302 or a program loaded from a storage device 306 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other through a bus 304. An Input/Output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, and a vibrator; a storage device 308 including, for example, a magnetic tape and a hard disk; and a communication device 309. The communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 300 with various devices, it should be understood that it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication device 309, or installed from the storage device 306, or installed from the ROM 302. When the computer program is executed by the processing device 301, the above functions in the methods of the embodiments of the present disclosure are performed.
The names of the messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The electronic device provided by the embodiments of the present disclosure belongs to the same inventive concept as the stylized image generation method provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Embodiment 4
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored, and the program, when executed by a processor, implements the stylized image generation method provided by the above embodiments.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. The computer-readable storage medium may include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory, a read-only memory, an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take multiple forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted with any appropriate medium, including: a wire, an optical cable, Radio Frequency (RF), etc., or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the above one or more programs are executed by the electronic device, the electronic device is caused to:
determine multiple pieces of initial pairing data, and train, based on the multiple pieces of initial pairing data, a style model to be used; where each piece of the initial pairing data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
determine multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and process each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed;
obtain, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and take each original image to be processed and the corresponding target style image as stylized paired data; and
train a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the above programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (e.g., through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram may represent a module, a program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on Chip (SOC), a Complex Programmable Logic Device (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. The machine-readable storage medium may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example 1] provides a stylized image generation method, the method including:
determining multiple pieces of initial pairing data, and training, based on the multiple pieces of initial pairing data, a style model to be used; where each piece of the initial pairing data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
determining multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and processing each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed;
obtaining, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and taking each original image to be processed and the corresponding target style image as stylized paired data;
training a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
According to one or more embodiments of the present disclosure, [Example 2] provides a stylized image generation method, further including:
optionally, acquiring multiple original images including facial information; and
inputting each original image into the pre-trained 3D style generation model to obtain a corresponding initial style image in which the facial information has been processed.
According to one or more embodiments of the present disclosure, [Example 3] provides a stylized image generation method, further including:
optionally, acquiring a first style model to be trained;
for each piece of the multiple pieces of initial pairing data, taking the original image in the piece of initial pairing data as the input of the first style model to be trained, to obtain a first output image corresponding to the original image;
determining a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct the model parameters of the first style model to be trained based on the loss value; and
taking the convergence of the first loss function in the first style model to be trained as the training goal, to obtain the style model to be used.
According to one or more embodiments of the present disclosure, [Example 4] provides a stylized image generation method, further including:
optionally, the preset screening condition includes that the change angle of the part to be adjusted is greater than a preset change angle threshold, and the determining multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on the preset screening condition includes:
determining, among the original images, the original images in which the change angle of the part to be adjusted is greater than the preset change angle threshold as the original images to be processed;
where the parts to be adjusted include facial feature parts.
According to one or more embodiments of the present disclosure, [Example 5] provides a stylized image generation method, further including:
optionally, inputting each original image to be processed into the style model to be used, to obtain the style image to be used corresponding to each original image;
where the features in the style image to be used differ from those in the initial style image corresponding to each original image to be processed.
According to one or more embodiments of the present disclosure, [Example 6] provides a stylized image generation method, further including:
optionally, determining the pixel point information of the key points in each original image to be processed and in the style image to be used; and
determining deformation parameters based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters, to obtain the target style image.
According to one or more embodiments of the present disclosure, [Example 7] provides a stylized image generation method, further including:
optionally, determining the stylization conversion model to be trained with the target network structure; and
concatenating a discriminator to be trained with the stylization conversion model to be trained, and setting a parameter adjustment constraint condition for the discriminator to be trained, so that the model parameters in the stylization conversion model to be trained and the discriminator to be trained are adjusted under the constraint condition, to obtain the target stylization conversion model.
According to one or more embodiments of the present disclosure, [Example 8] provides a stylized image generation method, further including:
optionally, inputting the original images to be processed in the stylized paired data into the stylization conversion model to be trained, to obtain second actual output images;
inputting the second actual output images and the target style images in the stylized paired data into the discriminator to be trained, to obtain discrimination results;
adjusting the model parameters in the stylization conversion model to be trained and the discriminator to be trained based on the discrimination results and the constraint condition; and
taking the convergence of the loss functions in the stylization conversion model to be trained and the discriminator to be trained as the training goal, to obtain the target stylization conversion model.
According to one or more embodiments of the present disclosure, [Example 9] provides a stylized image generation method, further including:
optionally, deploying the target stylization model in the client, so that when video frames to be processed are acquired, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and the target video is obtained based on all the target video frames.
According to one or more embodiments of the present disclosure, [Example 10] provides a stylized image generation device, including:
a style-model-to-be-used determination module, configured to determine multiple pieces of initial pairing data, and train, based on the multiple pieces of initial pairing data, a style model to be used; where each piece of the initial pairing data includes an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
a style-image-to-be-used determination module, configured to determine multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and process each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed;
a target style image determination module, configured to obtain, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and take each original image to be processed and the corresponding target style image as stylized paired data;
a target stylization conversion model determination module, configured to train a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.

Claims (20)

  1. A stylized image generation method, comprising:
    determining multiple pieces of initial pairing data, and training, based on the multiple pieces of initial pairing data, a style model to be used; wherein each piece of the initial pairing data comprises an original image and an initial style image obtained after the original image is processed by a three-dimensional (3D) style generation model;
    determining multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and processing each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed;
    obtaining, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and taking each original image to be processed and the corresponding target style image as stylized paired data; and
    training a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
  2. The method according to claim 1, wherein the determining multiple pieces of initial pairing data comprises:
    acquiring multiple original images comprising facial information; and
    inputting each original image into the pre-trained 3D style generation model to obtain a corresponding initial style image in which the facial information has been processed.
  3. The method according to claim 1, wherein the training, based on the multiple pieces of initial pairing data, the style model to be used comprises:
    acquiring a first style model to be trained;
    for each piece of initial pairing data, taking the original image in the piece of initial pairing data as an input of the first style model to be trained, to obtain a first output image corresponding to the original image;
    determining a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct model parameters of the first style model to be trained based on the loss value; and
    taking convergence of a first loss function in the first style model to be trained as a training goal, to obtain the style model to be used.
  4. The method according to claim 1, wherein the preset screening condition comprises that a change angle of a part to be adjusted is greater than a preset change angle threshold, and the determining multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on the preset screening condition comprises:
    determining, among the original images, original images in which the change angle of the part to be adjusted is greater than the preset change angle threshold as the original images to be processed;
    wherein parts to be adjusted comprise facial feature parts.
  5. The method according to claim 1, wherein the processing the multiple original images to be processed based on the style model to be used, to obtain the style image to be used, comprises:
    inputting each original image to be processed into the style model to be used, to obtain the style image to be used corresponding to each original image;
    wherein features in the style image to be used differ from those in the initial style image corresponding to each original image to be processed.
  6. The method according to claim 1, wherein the obtaining, by performing deformation processing on the style image to be used, the target style image corresponding to each original image to be processed comprises:
    determining pixel point information of key points in each original image to be processed and in the style image to be used; and
    determining deformation parameters based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters, to obtain the target style image.
  7. The method according to claim 1, wherein before the training the stylization conversion model to be trained based on the stylized paired data to obtain the target stylization conversion model, the method further comprises:
    determining a stylization conversion model to be trained with a target network structure; and
    concatenating a discriminator to be trained with the stylization conversion model to be trained, and setting a parameter adjustment constraint condition for the discriminator to be trained, so that model parameters in the stylization conversion model to be trained and the discriminator to be trained are adjusted under the constraint condition, to obtain the target stylization conversion model.
  8. The method according to claim 7, wherein the training the stylization conversion model to be trained based on the stylized paired data to obtain the target stylization conversion model comprises:
    inputting the original images to be processed in the stylized paired data into the stylization conversion model to be trained, to obtain second actual output images;
    inputting the second actual output images and the target style images in the stylized paired data into the discriminator to be trained, to obtain discrimination results;
    adjusting the model parameters in the stylization conversion model to be trained and the discriminator to be trained based on the discrimination results and the constraint condition; and
    taking convergence of loss functions in the stylization conversion model to be trained and the discriminator to be trained as a training goal, to obtain the target stylization conversion model.
  9. The method according to claim 1, further comprising:
    deploying the target stylization model in a client, so that when video frames to be processed are acquired, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and a target video is obtained based on all the target video frames.
  10. A stylized image generation device, comprising:
    a style-model-to-be-used determination module, configured to determine multiple pieces of initial pairing data, and train, based on the multiple pieces of initial pairing data, a style model to be used; wherein each piece of the initial pairing data comprises an original image and an initial style image obtained after the original image is processed by a 3D style generation model;
    a style-image-to-be-used determination module, configured to determine multiple original images to be processed from the original images in the multiple pieces of initial pairing data based on a preset screening condition, and process each original image to be processed based on the style model to be used, to obtain a style image to be used corresponding to each original image to be processed;
    a target style image determination module, configured to obtain, by performing deformation processing on the style image to be used, a target style image corresponding to each original image to be processed, and take each original image to be processed and the corresponding target style image as stylized paired data; and
    a target stylization conversion model determination module, configured to train a stylization conversion model to be trained based on the stylized paired data to obtain a target stylization conversion model, so that when a video frame to be processed is acquired, the video frame to be processed is stylized based on the target stylization conversion model to obtain a processed target video.
  11. The device according to claim 10, wherein the style-model-to-be-used determination module comprises:
    an original image acquisition unit, configured to acquire multiple original images comprising facial information; and
    an initial style image generation unit, configured to input each original image into the pre-trained 3D style generation model to obtain a corresponding initial style image in which the facial information has been processed.
  12. The device according to claim 10, wherein the style-model-to-be-used determination module comprises:
    a style-model-to-be-trained acquisition unit, configured to acquire a first style model to be trained;
    a first output image generation unit, configured to, for each piece of initial pairing data, take the original image in the piece of initial pairing data as an input of the first style model to be trained, to obtain a first output image corresponding to the original image;
    a loss value determination unit, configured to determine a loss value based on the first output image and the initial style image corresponding to the original image, so as to correct model parameters of the first style model to be trained based on the loss value; and
    a style-model-to-be-used determination unit, configured to take convergence of a first loss function in the first style model to be trained as a training goal, to obtain the style model to be used.
  13. The device according to claim 10, wherein the preset screening condition comprises that a change angle of a part to be adjusted is greater than a preset change angle threshold, and
    the style-image-to-be-used determination module is further configured to determine, among the original images, original images in which the change angle of the part to be adjusted is greater than the preset change angle threshold as the original images to be processed; wherein parts to be adjusted comprise facial feature parts.
  14. The device according to claim 10, wherein the style-image-to-be-used determination module is further configured to input each original image to be processed into the style model to be used, to obtain the style image to be used corresponding to each original image;
    wherein features in the style image to be used differ from those in the initial style image corresponding to each original image to be processed.
  15. The device according to claim 10, wherein the target style image determination module comprises:
    a pixel point information determination unit, configured to determine pixel point information of key points in each original image to be processed and in the style image to be used; and
    a target style image generation unit, configured to determine deformation parameters based on the pixel point information, so as to attach the part to be adjusted in each original image to be processed to the style image to be used based on the deformation parameters, to obtain the target style image.
  16. The device according to claim 10, further comprising:
    a parameter adjustment constraint condition setting module, configured to determine a stylization conversion model to be trained with a target network structure, concatenate a discriminator to be trained with the stylization conversion model to be trained, and set a parameter adjustment constraint condition for the discriminator to be trained, so that model parameters in the stylization conversion model to be trained and the discriminator to be trained are adjusted under the constraint condition, to obtain the target stylization conversion model.
  17. The device according to claim 16, wherein the target stylization conversion model determination module comprises:
    a second actual output image generation unit, configured to input the original images to be processed in the stylized paired data into the stylization conversion model to be trained, to obtain second actual output images;
    a discrimination result generation unit, configured to input the second actual output images and the target style images in the stylized paired data into the discriminator to be trained, to obtain discrimination results;
    a parameter adjustment unit, configured to adjust the model parameters in the stylization conversion model to be trained and the discriminator to be trained based on the discrimination results and the constraint condition; and
    a target stylization conversion model determination unit, configured to take convergence of loss functions in the stylization conversion model to be trained and the discriminator to be trained as a training goal, to obtain the target stylization conversion model.
  18. The device according to claim 10, further comprising:
    a model deployment module, configured to deploy the target stylization model in a client, so that when video frames to be processed are acquired, the video frames to be processed are stylized based on the target stylization model to obtain target video frames, and a target video is obtained based on all the target video frames.
  19. An electronic device, comprising:
    a processor; and
    a storage device configured to store a program,
    wherein when the program is executed by the processor, the processor is caused to implement the stylized image generation method according to any one of claims 1-9.
  20. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to perform the stylized image generation method according to any one of claims 1-9.