CN113592708A - Image processing method and device

Info

Publication number: CN113592708A
Application number: CN202110951407.2A
Authority: CN (China)
Prior art keywords: image, style, animation, portrait, local
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 文为 (Wen Wei)
Current and original assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority and filing date: 2021-08-18
Publication date: 2021-11-02

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/04 - Context-preserving transformations, e.g. by using an importance map
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Abstract

Embodiments of the invention provide an image processing method and device comprising the following steps: acquiring an image to be processed and an animation video into which it is to be embedded; performing semantic segmentation on the image to be processed to obtain local feature images with semantic categories; inputting each local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain local animation-style images; and fusing the local animation-style images into an animation-style image and embedding the animation-style image into the animation video. According to the embodiments of the invention, the image to be processed is stylized according to semantic category, producing local animation-style images in different styles that are then fused into the animation-style image.

Description

Image processing method and device
Technical Field
Embodiments of the present invention relate to the field of image technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
Background
Image style migration converts an image from one style to another while leaving its content unchanged, making the image more vivid and interesting; for example, an image can be converted into an animation-style or hand-drawn-style image.
Algorithms for stylizing images already exist, but they apply a single stylization to the whole image, so specific objects in the image cannot be stylized individually: every object in an image receives the same style, for example the persons and the background, even though different objects should have different styles. The stylization effect of such images is therefore poor.
Disclosure of Invention
Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium, so as to apply the corresponding stylization to each local feature image of an image and thereby achieve a better stylization effect. The specific technical solutions are as follows:
In a first aspect of the present invention, there is provided an image processing method, including:
acquiring an image to be processed and an animation video to be embedded;
performing semantic segmentation on the image to be processed to obtain a local feature image with semantic categories;
inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
fusing the local animation-style images to obtain an animation-style image;
and embedding the animation-style image into the animation video.
Optionally, before the local feature image is input into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method further includes:
acquiring sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image;
respectively inputting the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image;
determining a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
Optionally, the semantic categories include at least one of persons, articles, animals, and backgrounds, where persons may further include close-up persons.
Optionally, before the local feature image is input into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method further includes:
acquiring portrait sample data, wherein the portrait sample data comprises a portrait sample image and a target portrait animation-style image;
inputting the portrait sample image into a style migration model to be trained to obtain a predicted portrait animation-style image, wherein the predicted portrait animation-style image is obtained by the style migration model segmenting the portrait sample image into portrait local feature images with corresponding portrait semantic categories, generating the portrait local animation-style image corresponding to each portrait semantic category, and then fusing the portrait local animation-style images;
determining a loss value of the style migration model to be trained according to the difference between the predicted portrait animation-style image and the target portrait animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained portrait style migration model.
Optionally, the portrait semantic categories include at least one of hair, skin, and clothing.
Optionally, inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image includes:
when the semantic category of the local feature image is a person, determining the image proportion of the local feature image in the image to be processed;
and when the image proportion reaches a preset image proportion, determining that the semantic category of the local feature image is a close-up person, and inputting the local feature image into the portrait style migration model to obtain a portrait local animation-style image.
In a second aspect of the present invention, there is also provided an image processing apparatus comprising:
the image acquisition module is used for acquiring an image to be processed and an animation video to be embedded;
the semantic segmentation module is used for performing semantic segmentation on the image to be processed to obtain a local feature image with a semantic category;
the stylization processing module is used for inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
the image fusion module is used for fusing the local animation-style images to obtain an animation-style image;
and the style image embedding module is used for embedding the animation-style image into the animation video.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute any of the image processing methods described above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image processing methods described above.
According to the image processing method provided by the embodiments of the invention, the image to be processed and the animation video into which it is to be embedded are acquired; the image to be processed is semantically segmented to obtain local feature images with semantic categories; each local feature image is then input, according to its semantic category, into the corresponding style migration model to obtain a local animation-style image; the local animation-style images are fused into the animation-style image corresponding to the image to be processed; and the animation-style image is embedded into the animation video. Because each local feature image of the image to be processed is stylized by the style migration model corresponding to its semantic category, local animation-style images with different styles are obtained and then fused into the animation-style image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of semantic segmentation of an image to be processed according to an embodiment of the present invention;
FIG. 3 is a stylization schematic diagram of a close-up portrait according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of embedding an advertisement image into an animation according to an embodiment of the present invention;
FIG. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In a specific implementation, a common image style migration is to convert a real scene into an animation style, for example converting a photo taken of a real scene into an animation-style image.
In the course of business operations, an image to be processed may be embedded in a media object, where the media object may be an animation video (including animations or cartoons) and the like, and the image to be processed may be an advertisement image. Taking animation video as an example, most advertisement images available at present are based on real scenes; if such a real-scene advertisement image is embedded directly, it clashes with the style of the animation video, while manually producing an animation-style advertisement image for a particular animation video is time-consuming and labor-intensive.
In view of the above problems, the core concept of the embodiments of the present invention is an image processing method that can automatically generate an animation-style image whose style corresponds to the media object, such as the animation video into which it is to be embedded, and then embed the animation-style image into that animation video. The method is simple and practical: advertisement images for media objects can be produced in batches, existing real-scene advertisement images are fully reused, and a great deal of manpower and material resources are saved.
In addition, although algorithms for stylizing images exist at the present stage, they apply stylization to the whole image and cannot apply the appropriate stylization to different objects in the image. The image processing method disclosed in the embodiments of the present invention therefore performs semantic segmentation on the different local areas of the image, such as different persons, animals, articles, and backgrounds, to obtain local feature images with semantic categories; applies the corresponding stylization to each local feature image according to its semantic category to obtain local animation-style images; and finally fuses the local animation-style images into a complete animation-style image, so that different objects in the animation-style image have their corresponding styles and the stylization effect is better.
Referring to fig. 1, which is a flowchart illustrating steps of an image processing method provided in an embodiment of the present invention, as shown in fig. 1, the method may specifically include the following steps:
step 101, acquiring an image to be processed and an animation video to be embedded.
The image to be processed may be an image in a media object. Specifically, the media object may be an animation, a cartoon, a movie, or another media product, and the image to be processed is an image to be embedded into the media product, for example an advertisement image. The image to be processed may also be an image acquired from a network or a local device, or an image captured by the user with a device such as a smartphone, tablet computer, or camera, which is not limited in the embodiments of the present invention. The animation video may be a media object with a non-realistic style, such as an animation or a cartoon, which is likewise not limited in the embodiments of the present invention.
For example, an advertisement image or a sticker image to be shown inside the animation video while it plays may serve as the image to be processed in the embodiments of the present invention.
Step 102, performing semantic segmentation on the image to be processed to obtain local feature images with semantic categories.
The semantic categories may include at least one of person, article, animal, and background, and person may further include close-up person. Specifically, a close-up person is a person occupying a larger area of the image to be processed, for example 30% or more of it.
Semantic segmentation classifies the image at the pixel level, grouping pixels of the same category together: pixels belonging to a person are classified as person, pixels belonging to a motorcycle are classified as article, and pixels that fit no specific category, such as everything in the image other than persons, articles, and animals, can be uniformly classified as background. As a specific example, semantic segmentation may employ a CNN (Convolutional Neural Network) or an FCN (Fully Convolutional Network).
In the embodiment of the invention, semantic segmentation of the image to be processed yields local feature images with corresponding semantic categories, for example local feature images corresponding to persons, articles, animals, and the background. Specifically, the embodiment of the present invention may segment the image to be processed with an FCN and add an MRF (Markov Random Field) and a CRF (Conditional Random Field) to refine the segmentation result, thereby obtaining the local feature images of the image to be processed together with their semantic categories.
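As a rough illustration of this step, the sketch below uses a pretrained FCN from torchvision in place of the patent's segmentation network, with the CRF/MRF refinement omitted; the reduction of the PASCAL-VOC class ids to the coarse categories used here is a hypothetical mapping for illustration.

```python
# Minimal sketch: semantic segmentation of the image to be processed into
# per-category masks. A pretrained torchvision FCN stands in for the
# patent's segmentation network; CRF/MRF refinement is omitted.
import torch
from torchvision import models
from torchvision.transforms import functional as F
from PIL import Image

def segment(image_path):
    model = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()
    image = Image.open(image_path).convert("RGB")
    x = F.normalize(F.to_tensor(image),
                    mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]).unsqueeze(0)  # [1, 3, H, W]
    with torch.no_grad():
        logits = model(x)["out"]           # [1, 21, H, W] per-class scores
    labels = logits.argmax(dim=1)[0]       # [H, W] per-pixel VOC class ids
    masks = {
        "person": labels == 15,                     # VOC id 15: person
        "article": (labels == 7) | (labels == 14),  # car, motorbike
    }
    masks["background"] = ~(masks["person"] | masks["article"])
    return image, masks
```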
Referring to fig. 2, a schematic diagram of semantic segmentation of an image to be processed according to an embodiment of the present invention is shown. On the left is the image to be processed; after FCN semantic segmentation followed by CRF and MRF optimization, the local feature images corresponding to each semantic category are output on the right, namely those corresponding to persons (such as the person on the motorcycle and the person behind the vehicle), articles (the vehicle and the motorcycle), and the background (everything other than the persons and articles).
For the local feature image of a person, if the area it occupies in the image to be processed reaches a preset image proportion, for example 30%, its semantic category can be further refined to close-up person.
Step 103, inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image.
In a specific implementation, different animation videos have different styles, such as cute, hot-blooded, or comedic, and the embodiment of the invention can train a corresponding style migration model for each style.
In the embodiment of the invention, style migration models are trained in advance for each semantic category within each style; for example, style migration models can be trained separately for persons, articles, animals, backgrounds, and close-up persons. After semantic segmentation of the image to be processed yields the local feature images with semantic categories, each local feature image can be input into its corresponding style migration model to obtain a local animation-style image.
For example, assuming semantic segmentation of the image to be processed yields local feature images corresponding to a person, an article, an animal, and the background, the local feature image of the person may be input into the style migration model for persons, that of the article into the model for articles, that of the animal into the model for animals, and that of the background into the model for backgrounds; if the person is a close-up person, the local feature image of the close-up person may instead be input into the style migration model for close-up persons.
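A minimal sketch of this routing step follows; the dictionary layout and the per-category generators are assumptions standing in for the patent's trained models. In a deployment there would be one such model dictionary per animation-video style, so the same image can be restyled for different target videos.

```python
import torch

def stylize_locals(local_images: dict, models_by_category: dict) -> dict:
    """Route each local feature image to the style migration model trained
    for its semantic category (and for the target video's style).

    local_images: category name -> [1, 3, H, W] masked image tensor
    models_by_category: category name -> trained generator network
    """
    stylized = {}
    with torch.no_grad():
        for category, image in local_images.items():
            generator = models_by_category[category]  # person/article/animal/background
            stylized[category] = generator(image)     # local animation-style image
    return stylized
```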
Step 104, fusing the local animation-style images to obtain an animation-style image.
In the embodiment of the invention, the local animation-style images are fused into an animation-style image comprising one or more styles. Specifically, the different local animation-style images are fused progressively until all of them are merged into a complete animation-style image.
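A minimal sketch of the fusion step, assuming the segmentation masks from step 102 are still available; a production system would typically feather the mask borders to hide seams, which is omitted here.

```python
import torch

def fuse(stylized: dict, masks: dict) -> torch.Tensor:
    """Progressively composite the local animation-style images back into
    one complete animation-style image using the segmentation masks."""
    categories = list(stylized)
    canvas = torch.zeros_like(stylized[categories[0]])
    for category in categories:                   # progressive fusion
        m = masks[category].to(canvas.dtype)      # [H, W] binary mask
        canvas = canvas * (1 - m) + stylized[category] * m
    return canvas
```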
Step 105, embedding the animation-style image into the animation video.
In the embodiment of the present invention, the image to be processed may be an advertisement image to be embedded into a media object, for example a real-scene advertisement image to be embedded into the animation video. Therefore, after the image to be processed has been converted into an animation-style image of the target style, it is embedded into the corresponding media object in ways such as a sticker-type overlay or a mid-roll insert; for example, the image converted into the animation style may be embedded into the animation video.
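As a sketch of the sticker-type embedding, the following overlays the fused animation-style image onto every frame of the video with OpenCV; the placement coordinates and codec are illustrative assumptions, and a real insert would be keyed to particular shots and timestamps.

```python
import cv2

def embed(video_in: str, video_out: str, style_image: str,
          x: int = 40, y: int = 40):
    cap = cv2.VideoCapture(video_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(video_out, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    overlay = cv2.imread(style_image)
    oh, ow = overlay.shape[:2]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame[y:y + oh, x:x + ow] = overlay  # paste the stylized ad as a sticker
        writer.write(frame)
    cap.release()
    writer.release()
```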
According to the image processing method above, the image to be processed is acquired and semantically segmented into local feature images with semantic categories; each local feature image is input, according to its semantic category, into the corresponding style migration model to obtain a local animation-style image; the local animation-style images are fused into the animation-style image corresponding to the image to be processed; and the animation-style image is embedded into the animation video. Because each local feature image is stylized by the model matching its semantic category, local animation-style images with different styles are obtained and then fused into the final animation-style image.
In an exemplary embodiment of the present invention, before step 103 of inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method may further include:
acquiring sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image;
respectively inputting the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image;
determining a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
Specifically, sample data is collected in advance for each semantic category, for example for persons, articles, animals, and backgrounds. A sample image is an original image whose style is to be migrated; it may be acquired from a network or a local device, or captured with a device such as a smartphone, tablet computer, or camera, for example a landscape image taken with a camera. A target animation-style image is an image whose style differs from that of the sample image but whose image details are similar; it can be obtained by manual drawing or other means, for example an animation-style counterpart of an animal image.
For example, a sample image M1 is obtained in advance, and style migration is performed on it to obtain a target animation-style image M2 in the animation style. The details of the sample image M1 and the target animation-style image M2 correspond: if there is a person in M1, there is a corresponding person in M2, and only the styles of the two differ.
Specifically, the sample image corresponding to each semantic category is input into a style migration model to be trained, such as a CycleGAN, to obtain a predicted animation-style image of the sample image. A loss value is then calculated according to a loss function from the difference between the predicted animation-style image and the target animation-style image, and the model parameters of the style migration model are updated based on the loss value. Training then continues on the updated model with the sample images and target animation-style images until the loss value of the style migration model satisfies the iteration-stopping condition, for example the loss value falls below a preset loss threshold or the number of iterations reaches a preset count; the model at that point can be used as the trained style migration model for that semantic category.
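A minimal sketch of this per-category training loop follows. For simplicity it assumes paired (sample, target) data and a plain L1 reconstruction loss; an actual CycleGAN is trained on unpaired data with adversarial and cycle-consistency losses, so this stands in only for the compute-loss/update/stop cycle described above.

```python
import torch
import torch.nn as nn

def train_category_model(generator: nn.Module, loader,
                         epochs: int = 20, loss_threshold: float = 0.01):
    """loader yields (sample_image, target_animation_image) pairs for one
    semantic category (person / article / animal / background)."""
    opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
    criterion = nn.L1Loss()
    for epoch in range(epochs):
        for sample, target in loader:
            predicted = generator(sample)        # predicted animation-style image
            loss = criterion(predicted, target)  # difference from the target image
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() < loss_threshold:         # iteration-stopping condition
            break
    return generator
```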
In the above embodiment, sample data is collected for each semantic category, and the style migration models are trained on that data to obtain a trained model per semantic category, for example separate models for persons, articles, animals, and backgrounds. When the style of an image to be processed is migrated, the local feature image of each object is handled by its corresponding style migration model, so different objects in the image receive their corresponding styles and the stylization effect of the resulting animation-style image is better.
In an exemplary embodiment of the present invention, before step 103 of inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method may further include:
acquiring portrait sample data, wherein the portrait sample data comprises a portrait sample image and a target portrait animation-style image;
inputting the portrait sample image into a style migration model to be trained to obtain a predicted portrait animation-style image, wherein the predicted portrait animation-style image is obtained by the style migration model segmenting the portrait sample image into portrait local feature images with corresponding portrait semantic categories, generating the portrait local animation-style image corresponding to each portrait semantic category, and then fusing the portrait local animation-style images;
determining a loss value of the style migration model to be trained according to the difference between the predicted portrait animation-style image and the target portrait animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained portrait style migration model.
Where the semantic categories include persons, persons may be further divided into close-up persons. In the embodiment of the invention, a close-up person occupies a larger area of the image to be processed, so in order to achieve a better image effect when migrating the style of close-up persons, a style migration model is trained on portrait sample data, yielding a portrait style migration model for close-up persons.
Specifically, portrait sample data, such as portraits in various postures or upper-body portraits, is collected in advance. A portrait sample image is an original image whose style is to be migrated; it may be acquired from a network or a local device, or captured with a device such as a smartphone, tablet computer, or camera, for example a portrait taken with a camera. A target portrait animation-style image is an image whose style differs from that of the portrait sample image but whose image details are similar; it can be obtained by manual drawing or other means, for example an animation-style counterpart of a portrait image.
For example, a portrait sample image N1 is obtained in advance, and style migration is performed on it to obtain a target portrait animation-style image N2 in the animation style. The portrait details of N1 and N2 correspond: the hairstyle and clothes of the person in the portrait sample image also appear on the corresponding person in the target animation-style image, and only the styles of the two differ.
Specifically, the portrait sample image is input into the style migration model to be trained to obtain a predicted portrait animation-style image. A loss value is then calculated according to a loss function from the difference between the predicted portrait animation-style image and the target portrait animation-style image, and the model parameters of the style migration model are updated based on the loss value. Training continues on the updated model with the portrait sample images and target portrait animation-style images until the model satisfies the iteration-stopping condition, for example the loss value falls below a preset threshold or the number of iterations reaches a preset count; the model at that point can be used as the trained portrait style migration model.
It should be noted that the portrait style migration model has built-in semantic segmentation: it automatically segments the local feature image of a close-up person into portrait local feature images of multiple portrait semantic categories, generates the portrait local animation-style image corresponding to each portrait semantic category, and fuses them to obtain the portrait animation-style image. The portrait semantic categories may include at least one of hair, skin, and clothing.
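The forward pass of such a model might look like the following sketch: segment the close-up person into hair/skin/clothing regions, stylize each region with its own sub-generator, and fuse the results. All module names and the label layout are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PortraitStyleModel(nn.Module):
    PARTS = ("hair", "skin", "clothing")

    def __init__(self, part_segmenter: nn.Module,
                 part_generators: nn.ModuleDict):
        super().__init__()
        self.segmenter = part_segmenter    # pixels -> portrait part labels
        self.generators = part_generators  # one generator per portrait part

    def forward(self, portrait: torch.Tensor) -> torch.Tensor:  # [1, 3, H, W]
        labels = self.segmenter(portrait).argmax(dim=1, keepdim=True)
        out = torch.zeros_like(portrait)
        for idx, part in enumerate(self.PARTS):
            mask = (labels == idx).to(portrait.dtype)
            # stylize this portrait local feature image, then fuse by mask
            out = out + self.generators[part](portrait * mask) * mask
        return out  # fused portrait animation-style image
```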
In the above embodiment, portrait sample data is collected and used to train the style migration model, yielding a trained portrait style migration model. When the style of an image to be processed is migrated, the local feature image corresponding to a close-up person is handled by the portrait style migration model, so the stylization effect for close-up persons in the image to be processed is better.
In an exemplary embodiment of the present invention, step 103 of inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image may include:
when the semantic category of the local feature image is a person, determining the image proportion of the local feature image in the image to be processed;
and when the image proportion reaches a preset image proportion, determining that the semantic category of the local feature image is a close-up person, and inputting the local feature image into the portrait style migration model to obtain a portrait local animation-style image.
Specifically, semantic segmentation of the image to be processed yields local feature images of several semantic categories, each of which can then be input into the style migration model corresponding to its category to obtain a local animation-style image. If the image to be processed contains the semantic category person, it must further be determined whether the person is a close-up person; if so, the local feature image of the close-up person is input into the portrait style migration model to obtain the portrait local animation-style image.
The preset image proportion may be 30%, for example. Specifically, when a local feature image of a person is present, the proportion of the image to be processed that it occupies is determined; if that proportion reaches the preset image proportion of 30%, the semantic category can be refined to close-up person, and the local feature image of the close-up person is input into the portrait style migration model to obtain the portrait local animation-style image.
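The proportion test itself is a one-line computation over the person mask from the segmentation step; a sketch under that assumption:

```python
import torch

def is_close_up(person_mask: torch.Tensor, threshold: float = 0.30) -> bool:
    """A person region counts as a close-up person when its mask covers at
    least the preset proportion of the image (30% in the example above)."""
    proportion = person_mask.float().mean().item()  # fraction of all pixels
    return proportion >= threshold
```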
Referring to fig. 3, a stylization schematic diagram of a close-up portrait according to an embodiment of the present invention is shown, reading from left to right. The first image is the image to be processed containing a close-up person, from which a local feature image of the close-up person and a local feature image of the background are semantically segmented. The local feature image of the close-up person is input into the portrait style migration model, which performs portrait semantic segmentation to obtain the portrait local feature images of the close-up person's hair, face, clothes, and so on, stylizes them according to the style of the second image to obtain the portrait local animation-style image corresponding to each portrait semantic category, and fuses these into the portrait animation-style image. Meanwhile, the local feature image of the background is input into its style migration model to obtain a local animation-style image, which is fused with the portrait animation-style image to obtain the animation-style image shown as the third image.
In the above embodiment, when a local feature image of a person is present, it is further determined whether the person is a close-up person; if so, the local feature image corresponding to the close-up person can be input into the portrait style migration model to obtain the portrait local animation-style image. The stylization of the local feature image of the close-up person is thus better, and the animation-style image obtained by fusing the local animation-style images based on it has a better stylization effect.
In order to help those skilled in the art better understand the embodiments of the present invention, image processing is described below with a concrete scheme, taking conversion into an animation style as an example. Referring to fig. 4, a schematic flowchart of embedding an advertisement image into an animation according to an embodiment of the present invention is shown, comprising the following steps (a pipeline sketch chaining the helpers above follows this list):
Step 401, collecting sample data corresponding to the different semantic categories of the animation, as well as portrait sample data;
Step 402, training the style migration models to be trained on the sample data of each semantic category, where the semantic categories may include persons, articles, animals, backgrounds, and the like;
Step 403, training the style migration model to be trained on the portrait sample data for close-up portraits;
Step 404, obtaining the style migration models for the semantic categories and the portrait style migration model for close-up portraits;
Step 405, acquiring an advertisement image and performing semantic segmentation on it to obtain local feature images with semantic categories;
Step 406, if a local feature image of a close-up person is included, inputting it into the portrait style migration model;
Step 407, fusing the local animation-style images output by the style migration models and the portrait style migration model to obtain the animation-style image, and embedding the animation-style image into the animation video.
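Chaining the helpers sketched earlier gives a rough end-to-end version of this flow; everything here (function names, mask handling, file paths) is illustrative, not the patent's actual implementation.

```python
from torchvision.transforms.functional import to_tensor
from torchvision.utils import save_image

def advertise_in_animation(ad_image_path, video_in, video_out,
                           models_by_category, portrait_model):
    image, masks = segment(ad_image_path)                    # step 405
    locals_ = {c: to_tensor(image).unsqueeze(0) * m.float()  # masked crops
               for c, m in masks.items()}
    if "person" in masks and is_close_up(masks["person"]):   # step 406
        stylized = {"person": portrait_model(locals_["person"])}
        rest = {c: v for c, v in locals_.items() if c != "person"}
        stylized.update(stylize_locals(rest, models_by_category))
    else:
        stylized = stylize_locals(locals_, models_by_category)
    fused = fuse(stylized, masks)                            # step 407
    save_image(fused, "ad_animation_style.png")
    embed(video_in, video_out, "ad_animation_style.png")
```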
In the embodiment of the invention, for an advertisement image to be embedded into an animation, an animation-style image in the style of that animation can be generated automatically based on the style migration models and the portrait style migration model and embedded into the animation video, with no manual processing required. Advertisement images can thus be produced in batches, which saves advertisement production costs for companies, makes rapid launch of advertisements possible, and helps increase the revenue from animation advertising.
To sum up, in the embodiments of the present invention, the content of the image to be processed is analyzed and understood through semantic segmentation to obtain the local feature images corresponding to each semantic category, and the local feature images are then mapped to the corresponding styles by the style migration models. The animation-style image of the image to be processed is therefore more vivid and closer to the corresponding media object, for example the style of an animation, and matches better when embedded into the animation video, giving a better viewing experience.
Referring to fig. 5, which is a block diagram of an image processing apparatus according to an embodiment of the present invention, as shown in fig. 5, the apparatus 50 may specifically include the following modules:
the image acquisition module 501 is used for acquiring an image to be processed and an animation video to be embedded;
a semantic segmentation module 502, configured to perform semantic segmentation on the image to be processed to obtain a local feature image with a semantic category;
the stylization processing module 503 is configured to input the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
the image fusion module 504 is configured to fuse the local animation-style images to obtain an animation-style image;
and the style image embedding module 505 is configured to embed the animation-style image into the animation video.
In an exemplary embodiment of the invention, the apparatus further comprises a style migration model training module, configured to: acquire sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image; respectively input the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image; determine a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image; and adjust the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
In an exemplary embodiment of the invention, the semantic categories may include at least one of persons, articles, animals, and backgrounds, where persons include close-up persons.
In an exemplary embodiment of the invention, the apparatus further comprises a portrait migration model training module, configured to: acquire portrait sample data, wherein the portrait sample data comprises a portrait sample image and a target portrait animation-style image; input the portrait sample image into a style migration model to be trained to obtain a predicted portrait animation-style image, wherein the predicted portrait animation-style image is obtained by the style migration model segmenting the portrait sample image into portrait local feature images with corresponding portrait semantic categories, generating the portrait local animation-style image corresponding to each portrait semantic category, and then fusing the portrait local animation-style images; determine a loss value of the style migration model to be trained according to the difference between the predicted portrait animation-style image and the target portrait animation-style image; and adjust the model parameters of the style migration model to be trained according to the loss value to obtain the trained portrait style migration model.
In an exemplary embodiment of the invention, the portrait semantic categories may include at least one of hair, skin, and clothing.
In an exemplary embodiment of the present invention, the stylization processing module 503 is configured to determine, when the semantic category of the local feature image is a person, the image proportion of the local feature image in the image to be processed; and when the image proportion reaches a preset image proportion, determine that the semantic category of the local feature image is a close-up person and input the local feature image into the portrait style migration model to obtain a portrait local animation-style image.
For the above device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 61, a communication interface 62, a memory 63, and a communication bus 64, where the processor 61, the communication interface 62, and the memory 63 communicate with one another through the communication bus 64;
a memory 63 for storing a computer program;
the processor 61 is configured to implement the following steps when executing the program stored in the memory 63:
acquiring an image to be processed and an animation video to be embedded;
performing semantic segmentation on the image to be processed to obtain a local feature image with semantic categories;
inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
fusing the local animation-style images to obtain an animation-style image;
and embedding the animation-style image into the animation video.
Optionally, before the local feature image is input into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method further includes:
acquiring sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image;
respectively inputting the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image;
determining a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
Optionally, the semantic categories include at least one of persons, articles, animals, and backgrounds, where persons may further include close-up persons.
Optionally, before the local feature image is input into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method further includes:
acquiring portrait sample data, wherein the portrait sample data comprises a portrait sample image and a target portrait animation-style image;
inputting the portrait sample image into a style migration model to be trained to obtain a predicted portrait animation-style image, wherein the predicted portrait animation-style image is obtained by the style migration model segmenting the portrait sample image into portrait local feature images with corresponding portrait semantic categories, generating the portrait local animation-style image corresponding to each portrait semantic category, and then fusing the portrait local animation-style images;
determining a loss value of the style migration model to be trained according to the difference between the predicted portrait animation-style image and the target portrait animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained portrait style migration model.
Optionally, the portrait semantic categories include at least one of hair, skin, and clothing.
Optionally, inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image includes:
when the semantic category of the local feature image is a person, determining the image proportion of the local feature image in the image to be processed;
and when the image proportion reaches a preset image proportion, determining that the semantic category of the local feature image is a close-up person, and inputting the local feature image into the portrait style migration model to obtain a portrait local animation-style image.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in the figure, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The memory may include a random access memory (RAM) or a non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to execute the image processing method described in any of the above embodiments.
In yet another embodiment, the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the image processing method described in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., Solid State Disk (SSD)), or the like.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus comprising that element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed and an animation video to be embedded;
performing semantic segmentation on the image to be processed to obtain a local feature image with semantic categories;
inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
fusing the local animation-style images to obtain an animation-style image;
and embedding the animation-style image into the animation video.
2. The method according to claim 1, wherein before the local feature image is input into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image to obtain the local animation-style image, the method further comprises:
acquiring sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image;
respectively inputting the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image;
determining a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
3. The method of claim 2, wherein the semantic categories include at least one of persons, articles, animals, and backgrounds, the persons including close-up persons.
4. The method according to claim 3, wherein before the local feature image is input into the corresponding style migration model according to its semantic category to obtain the local animation-style image, the method further comprises:
acquiring portrait sample data, wherein the portrait sample data comprises a portrait sample image and a target portrait animation-style image;
inputting the portrait sample image into a style migration model to be trained to obtain a predicted portrait animation-style image, wherein the predicted portrait animation-style image is obtained by the style migration model segmenting the portrait sample image into portrait local feature images with corresponding portrait semantic categories, generating the portrait local animation-style image corresponding to each portrait semantic category, and then fusing the portrait local animation-style images;
determining a loss value of the style migration model to be trained according to the difference between the predicted portrait animation-style image and the target portrait animation-style image;
and adjusting the model parameters of the style migration model to be trained according to the loss value to obtain the trained portrait style migration model.
5. The method of claim 4, wherein the portrait semantic categories include at least one of hair, skin, and clothing.
6. The method according to claim 4, wherein inputting the local feature image into the corresponding style migration model according to its semantic category to obtain the local animation-style image comprises:
when the semantic category of the local feature image is a person, determining the image proportion of the local feature image in the image to be processed;
and when the image proportion reaches a preset image proportion, determining that the semantic category of the local feature image is a close-up person, and inputting the local feature image into the portrait style migration model to obtain a portrait local animation-style image.
7. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring an image to be processed and an animation video to be embedded;
the semantic segmentation module is used for performing semantic segmentation on the image to be processed to obtain a local feature image with a semantic category;
the stylization processing module is used for inputting the local feature image into the style migration model corresponding to the style of the animation video and the semantic category of the local feature image, to obtain a local animation-style image;
the image fusion module is used for fusing the local animation-style images to obtain an animation-style image;
and the style image embedding module is used for embedding the animation-style image into the animation video.
8. The apparatus of claim 7, further comprising a style migration model training module, configured to: acquire sample data of each semantic category, wherein the sample data comprises a sample image and a target animation-style image; respectively input the sample images into the style migration model to be trained corresponding to each semantic category to obtain a predicted animation-style image; determine a loss value of the style migration model to be trained corresponding to each semantic category according to the difference between the predicted animation-style image and the target animation-style image; and adjust the model parameters of the style migration model to be trained according to the loss value to obtain the trained style migration model corresponding to each semantic category.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 6 when executing a program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN202110951407.2A (filed 2021-08-18, priority date 2021-08-18) - Image processing method and device - Pending - CN113592708A

Priority Applications (1)

    • CN202110951407.2A (priority and filing date 2021-08-18): Image processing method and device

Applications Claiming Priority (1)

    • CN202110951407.2A (priority and filing date 2021-08-18): Image processing method and device

Publications (1)

    • CN113592708A, published 2021-11-02

Family ID: 78238229

Family Applications (1)

    • CN202110951407.2A (pending): Image processing method and device

Country Status (1)

    • CN: CN113592708A

Patent Citations (2) (cited by examiner)

    • CN110310222A (priority 2019-06-20, published 2019-10-08, Beijing QIYI Century Science and Technology Co Ltd): Image style transfer method and apparatus, electronic device, and storage medium
    • CN112752147A (priority 2020-09-04, published 2021-05-04, Tencent Technology (Shenzhen) Co Ltd): Video processing method, device and storage medium

Cited By (2) (cited by examiner)

    • CN117036203A (priority 2023-10-08, published 2023-11-10, 杭州黑岩网络科技有限公司 (Hangzhou Heiyan Network Technology Co Ltd)): Intelligent drawing method and system
    • CN117036203B (priority 2023-10-08, granted 2024-01-26, 杭州黑岩网络科技有限公司 (Hangzhou Heiyan Network Technology Co Ltd)): Intelligent drawing method and system

Legal Events

    • PB01: Publication
    • SE01: Entry into force of request for substantive examination