WO2023232056A1 - Image processing method and apparatus, storage medium, and electronic device - Google Patents

Image processing method and apparatus, storage medium, and electronic device

Info

Publication number
WO2023232056A1
WO2023232056A1 · PCT/CN2023/097319 · CN2023097319W
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
stylized
target area
stylization
Prior art date
Application number
PCT/CN2023/097319
Other languages
English (en)
French (fr)
Inventor
张朋
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023232056A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Definitions

  • An image receiving module configured to receive an image to be processed and a mask image of the target area in the image to be processed
  • An embodiment of the present disclosure also provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method provided by any embodiment of the present disclosure.
  • Figure 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic structural diagram of a stylization processing system provided by an embodiment of the present disclosure
  • Figure 8 is a schematic comparison diagram of an image to be processed and a stylized image provided by an embodiment of the present disclosure
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “include” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” means “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • Before using the technical solutions disclosed in the embodiments of this disclosure, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in this disclosure, and the user's authorization should be obtained.
  • the data involved in this technical solution shall comply with the requirements of corresponding laws, regulations and relevant regulations.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is suitable for converting an image to be processed into a stylized image.
  • This method can be performed by an image processing device provided by an embodiment of the present disclosure.
  • the image processing device can be implemented in software and/or hardware, optionally by an electronic device, which can be a mobile terminal, a personal computer (Personal Computer, PC), or a server.
  • the method includes the following steps.
  • S110 Receive an image to be processed and a mask image of the target area in the image to be processed.
  • S120 Process the image to be processed and the mask image based on the stylization processing system to obtain a stylized image associated with the target area.
  • the image to be processed is the original image to be stylized.
  • the image to be processed may be a single image, or may be multiple frame images in the video to be processed.
  • the processing method of the present disclosure is performed on each frame image in the video to be processed, and the resulting multiple frames of stylized images form a stylized video.
  • the image to be processed can be imported from an external device, collected through an externally connected image acquisition device (such as a camera), imported from local storage (such as a local photo album), or collected in real time through an image acquisition device (such as the device's own camera).
  • the application scenarios of the embodiments of the present disclosure include stylization processing scenarios for a single input image (collected in real time or historically), stylization processing scenarios for already-collected videos, and stylization processing scenarios for videos collected in real time (such as live videos).
  • the mask image of the target area in the image to be processed is an image that distinguishes the target area from other areas of the image to be processed in the form of a mask.
  • the target area is an area that maintains a strong correlation with the original content during stylization; the number of target areas can be one or more, determined according to the needs of the operating user.
  • the mask image may be in the form of an image or of a data matrix, which is not limited here.
  • the target area may be manually selected by the operating user. For example, after the image to be processed is received, it is displayed on the display screen of the device; in the area selection mode, when the user's area selection operation is detected, the target area corresponding to the operation is determined. The area selection mode can be entered automatically after the image to be processed is displayed, or an area selection control can be set on the display page, and the mode is entered when the control is triggered.
  • the region selection operation may be a region contour drawing operation, that is, using a finger or a mouse to draw the contour of the target area in the image to be processed, and by identifying the input contour, the area within the contour is determined as the target area.
  • the area selection operation can also be to determine the target area by setting the position and size of the area selection box.
  • the shapes of the area selection box, such as a rectangular box and a circular box, can be displayed on the display page of the image to be processed.
  • when any selection box shape is selected and a click operation within the display area of the image to be processed is detected, the position of the selection box is determined; when a drag operation on the selection box is detected, its position is adjusted according to the drag operation.
  • when a sliding operation within the display area of the image to be processed is detected, the size of the selection box is adjusted according to the sliding operation. Based on the position and size of the selection box, the area within it is determined as the target area.
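  • The 0/1 mask convention described below is straightforward to reproduce; the following is a minimal NumPy sketch (the function name and arguments are illustrative, not from the patent) that builds a binary mask from a rectangular selection box:

```python
import numpy as np

def rect_selection_mask(image_shape, top, left, height, width):
    # 1 inside the selection box (target area), 0 elsewhere,
    # matching the 0/1 mask convention the patent describes.
    mask = np.zeros(image_shape[:2], dtype=np.float32)
    mask[top:top + height, left:left + width] = 1.0
    return mask

# A 100x80 box whose top-left corner sits at row 40, column 60
# of a 512x512 RGB image to be processed.
mask = rect_selection_mask((512, 512, 3), top=40, left=60, height=100, width=80)
```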
  • a variety of regional recognition models can be preset, such as facial recognition models, portrait recognition models, food recognition models, etc.
  • the corresponding region recognition model is called according to the region type.
  • the image to be processed is processed based on the called region recognition model, and the segmented image of the target area is output.
  • the default type of the target area can be set in advance, for example, the target area is a facial area.
  • the facial area of the image to be processed is identified and the facial area is used as the target area.
  • the default type can be set and edited according to user needs.
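  • As a sketch of how such preset recognition models and a user-editable default type might be organized (all names below are hypothetical; the patent does not specify an implementation):

```python
import numpy as np
from typing import Callable, Dict

def dummy_segmenter(image: np.ndarray) -> np.ndarray:
    # Stand-in for a pre-trained segmentation network; a real system
    # would return a 0/1 mask of the recognized region here.
    return np.zeros(image.shape[:2], dtype=np.float32)

# Preset region recognition models keyed by region type.
REGION_MODELS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "face": dummy_segmenter,
    "portrait": dummy_segmenter,
    "food": dummy_segmenter,
}

DEFAULT_REGION_TYPE = "face"  # user-editable default type

def mask_for(image: np.ndarray, region_type: str = DEFAULT_REGION_TYPE) -> np.ndarray:
    # Call the recognition model matching the requested region type
    # and return the segmented mask of the target area.
    return REGION_MODELS[region_type](image)
```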
  • the pre-trained stylized processing system processes the image to be processed and the mask image to obtain a stylized image corresponding to the image to be processed.
  • the target area in the stylized image is associated with the target area of the image to be processed,
  • that is, the target area in the stylized image has a high similarity with the target area of the image to be processed, so that a high degree of authenticity and consistency of the target area is retained while the image is stylized.
  • the stylization processing system uses the mask image as auxiliary information and integrates the original content into the processing of the target area, so that the target area of the stylized image remains highly consistent with the target area of the image to be processed.
  • the stylized image obtained based on the stylization processing system is displayed, for example, the image to be processed and the stylized image can be displayed on the same display page to facilitate comparison of the image to be processed and the stylized image.
  • the technical solution provided by this embodiment provides auxiliary information for the stylization process of the image to be processed by setting a mask image of the target area for the received image to be processed, so as to distinguish the target area from the non-target area.
  • Pre-set a trained stylized processing system with stylized processing capabilities.
  • the image to be processed and the mask map are processed based on the stylization processing system, and the target area and non-target area in the image to be processed are distinguished based on the mask map, to obtain a stylized image associated with the target area. The stylized image balances the image style with the content consistency of the target area, improves the correlation between the target area and the original content, and, on top of converting the image style, keeps the content of the target area highly recognizable relative to the original content, so that the original content is well reflected in the converted image style.
  • the stylization processing system includes a coding model, an image reconstruction model, and an image stylization model.
  • the coding model is used to encode the input image to obtain the image coding corresponding to the input image.
  • the coding model may be a neural network model.
  • the image reconstruction model and the image stylization model may be neural network models, for example, generator models.
  • the input information of the image reconstruction model and the image stylization model is encoded data, based on which the corresponding image is generated.
  • the image reconstruction model is used to restore the encoded data to the image to be processed, and the image stylization model is used to generate a stylized image based on the encoded data.
  • the encoding model is connected to the image reconstruction model and the image stylization model respectively, and corresponding network layers of the image reconstruction model and the image stylization model are connected.
  • the corresponding network layer connections here are used to transfer feature information from network layers in the image reconstruction model to network layers in the image stylization model.
  • the image reconstruction model and the image stylization model each include multiple network layers, and there is a corresponding relationship between the network layers in the image reconstruction model and the image stylization model.
  • the network layers with a corresponding relationship can be some of the network layers in the models or all of them. For example, a corresponding relationship is set between network layers in the same processing stage.
  • in some embodiments, the image reconstruction model and the image stylization model include different network layers,
  • for example, the numbers of network layers are different, or the types or structures of the network layers are different.
  • for example, the first network layer of the image reconstruction model can be connected to the first network layer of the image stylization model, and the second network layer of the image reconstruction model to the third network layer of the image stylization model; these are only examples, and the corresponding relationship can be determined based on the structures of the two models and the processing functions of their network layers.
  • in some embodiments, the image reconstruction model and the image stylization model have the same structure and the same network layers, and are trained separately with different training data. By setting the two models to the same structure, the network layers at the same depth process the input information at the same stage, so connecting layers of the same depth transmits feature information that matches when fused. On the basis of simplifying the determination of the layer correspondence between the two models, this improves the matching degree of the feature information and the accuracy of the stylized image.
  • the corresponding network layers of the image reconstruction model and the image stylization model are connected, that is, the network layers of the same depth are connected.
  • each network layer of the image reconstruction model transmits the feature information it outputs to the corresponding network layer of the image stylization model.
  • the network layer in the image stylization model fuses the feature information generated by its own layer with the feature information transmitted by the corresponding network layer in the image reconstruction model to obtain the output feature information.
  • the feature information output by the network layers in the image reconstruction model and the image stylization model can be a feature map or a feature matrix, which is not limited here.
  • processing the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area includes: inputting the image to be processed into the In the coding model, the image coding of the image to be processed is obtained; the image coding is input to the image reconstruction model, and the characteristic information of the network layer during the image coding process of the image reconstruction model is obtained; and the The image coding and mask map are input to the input end of the image stylization model, and the feature information of the network layer in the image reconstruction model is input to the corresponding network layer of the image stylization model to obtain the target Region-associated stylized images.
  • the image coding and the mask map are input, as input information, to the image stylization model at its input end, and the feature information generated by multiple network layers of the image reconstruction model is used as the input information of the corresponding network layers of the image stylization model.
  • the network layer of the image stylization model generates the initial feature information of the current network layer based on the image coding or on the target feature information output by the previous network layer; based on the mask map, the initial feature information of the current network layer is fused with the feature information input by the corresponding network layer of the image reconstruction model to obtain the target feature information of the current network layer, which is input to the next network layer, until the last network layer of the image stylization model outputs the stylized image associated with the target area.
  • any network layer of the image stylization model generates the initial feature information g2fi of the current layer based on the image coding or the target feature information output by the previous layer, where i is the index of the network layer; based on the mask map, g1fi (the feature information output by the i-th network layer of the image reconstruction model) and g2fi are fused to obtain the target feature information of the current layer, which is used as the input of the next layer.
  • the feature information of each network layer can be in the form of a feature map; accordingly, the feature map output by the last network layer is the stylized image associated with the target area.
  • the input information of the first network layer of the image stylization model is image coding, and the input information of non-first network layers is the target feature information output by the previous network layer.
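  • Putting the data flow just described into code, a schematic sketch of the forward pass could look as follows (it assumes both generators expose their layers as equal-length lists of torch.nn.Module objects and that a fuse function implementing the mask-based fusion is available; all names are illustrative):

```python
import torch

@torch.no_grad()
def stylize(encoder, recon_layers, style_layers, image, mask, fuse):
    code = encoder(image)                 # image coding

    # Pass 1: reconstruction model, recording each layer's features g1f_i.
    recon_feats, h = [], code
    for layer in recon_layers:
        h = layer(h)
        recon_feats.append(h)

    # Pass 2: stylization model; each layer's initial features g2f_i are
    # fused with the recorded g1f_i under the mask before moving on.
    h = code
    for g1f, layer in zip(recon_feats, style_layers):
        g2f = layer(h)
        h = fuse(g1f, g2f, mask)          # target feature information
    return h                              # last feature map = stylized image
```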
  • the network layer in the image stylization model can fuse the feature information g1fi from the image reconstruction model and the initial feature information g2fi of the current network layer by combining the feature information corresponding to the target area and to the non-target area with different weights, and composing the feature information fused separately for the target area and the non-target area into the target feature information. The fusion with different weights for the target and non-target areas can be implemented based on the mask map.
  • fusing the initial feature information of the current network layer with the feature information input by the corresponding network layer of the image reconstruction model to obtain the target feature information of the current network layer includes: fusing, based on the first weight group, the feature information within the target area from the initial feature information and from the feature information input by the corresponding network layer of the image reconstruction model, to obtain the first fusion feature; fusing, based on the second weight group, the feature information outside the target area from the initial feature information and from the feature information input by the corresponding network layer of the image reconstruction model, to obtain the second fusion feature; and obtaining the target feature information of the current network layer based on the first fusion feature and the second fusion feature.
  • the first weight group contains the fusion weights, within the target area, of the initial feature information and of the feature information input by the corresponding network layer of the image reconstruction model.
  • the second weight group contains the fusion weights, outside the target area, of the initial feature information and of the feature information input by the corresponding network layer of the image reconstruction model.
  • the first weight group includes a first weight a for the initial feature information and a second weight 1-a for the feature information input by the corresponding network layer of the image reconstruction model, where neither weight is zero, both are values greater than 0 and less than 1, and their sum is 1.
  • the second weight group includes a third weight b for the initial feature information and a fourth weight 1-b for the feature information input by the corresponding network layer of the image reconstruction model, where the two weights sum to 1, b is greater than 0 and less than or equal to 1, and 1-b is greater than or equal to 0 and less than 1.
  • the weight values in the first weight group and the second weight group can be set according to the fusion requirements; by adjusting the weight values, the degree of content consistency of the target area can be controlled, and stylized images meeting different content consistency levels can be obtained.
  • the feature fusion of any network layer of the image stylization model can be implemented by the following formula: gfi = a*g1fi*mask + (1-a)*g2fi*mask + b*g2fi*(1-mask) + (1-b)*g1fi*(1-mask), where gfi is the target feature information output by the i-th network layer of the image stylization model, g1fi is the feature information generated by the image reconstruction model at its i-th network layer, g2fi is the initial feature information generated by the i-th network layer of the image stylization model, and mask is the mask image.
  • the pixel positions in the target area of the mask image can be set to 1, and the pixel positions in the non-target area to 0; accordingly, mask in the formula marks the target area as 1, so a*g1fi*mask + (1-a)*g2fi*mask fuses the feature information of the target area, and 1-mask marks the non-target area as 1, so b*g2fi*(1-mask) + (1-b)*g1fi*(1-mask) fuses the non-target area.
  • in some embodiments, a > 1-b; that is, relative to the non-target area, the weight corresponding to the feature information generated by the image reconstruction model is increased in the target area, to reduce the degree of stylization of the target area and improve its similarity to the original content.
  • in some embodiments, the non-target area directly uses the initial feature information without feature fusion, to increase the degree of stylization of the non-target area; in that case the target feature information is gfi = a*g1fi*mask + (1-a)*g2fi*mask + g2fi*(1-mask).
  • Each network layer in the image stylization model performs the above processing until the last network layer outputs a stylized image.
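  • As a minimal sketch of this per-layer fusion (the weight values a=0.8 and b=1.0 are illustrative assumptions only; the patent does not fix them, and b=1.0 reproduces the simplified variant in which the non-target area keeps the stylization model's features unchanged):

```python
import torch

def fuse(g1f, g2f, mask, a=0.8, b=1.0):
    # gfi = a*g1fi*mask + (1-a)*g2fi*mask
    #     + b*g2fi*(1-mask) + (1-b)*g1fi*(1-mask)
    # mask is 1 inside the target area and 0 outside; it is assumed to have
    # been resized to the spatial size of this layer's feature maps.
    target = a * g1f * mask + (1 - a) * g2f * mask
    non_target = b * g2f * (1 - mask) + (1 - b) * g1f * (1 - mask)
    return target + non_target
```

Raising a (or lowering b) pulls the corresponding region toward the reconstruction features, that is, toward the original content.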
  • in the technical solution of the embodiments of the present disclosure, the image reconstruction model obtains the feature information of multiple network layers in the process of restoring the image coding, and inputs this feature information into the corresponding network layers of the image stylization model.
  • in the process of the image stylization model handling the image coding, each network layer fuses, based on the mask map, the initial feature information generated by its own layer with the feature information input by the image reconstruction model, so that
  • the feature information in the target and non-target areas is fused with different weights, the degree of stylization in the target area is adjusted, and a stylized image associated with the target area is obtained. While the image to be processed is stylized, the degree of stylization of a local area can be adjusted.
  • the training process of the image reconstruction model includes: training the image reconstruction model and the discriminant network model to be trained based on random data and sample images to obtain a trained image reconstruction model.
  • the image reconstruction model is the generator in a generative adversarial network,
  • and the discriminant network model can be the discriminator in the generative adversarial network.
  • the generative adversarial network is trained through the training data.
  • when the training of the generative adversarial network is completed, the trained image reconstruction model is obtained.
  • see FIG. 3, which is a schematic diagram of the training process of the image reconstruction model provided by an embodiment of the present disclosure.
  • the generative adversarial network includes a generator G1 and a discriminator D1.
  • the generator G1 and the discriminator D1 are trained alternately until the training end condition is met, and the trained generator G1 is determined as the image generator.
  • the alternating training process includes: fixing the network parameters in the generator G1, inputting random data into the generator G1, obtaining the reconstructed image output by the generator G1, and using the reconstructed image or training data as the input information of the discriminator D1.
  • the discriminator D1 outputs the discrimination result of the input information, and adjusts the network parameters of the discriminator D1 according to the label of the input information and the loss function.
  • after the discriminator D1 has gone through a preset amount of training, D1 is fixed and the generator G1 is trained; that is, the network parameters of G1 are adjusted through the determined loss function. The above training process is executed alternately until a condition such as convergence is reached, and the trained generator G1 is determined as the image reconstruction model.
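  • The alternating scheme just described can be summarized in a short PyTorch sketch (a non-saturating binary cross-entropy GAN loss and Adam optimizers are assumptions; the patent leaves the loss function open, and all names are illustrative):

```python
import torch
import torch.nn.functional as F

def train_reconstruction_gan(G1, D1, real_loader, z_dim, steps):
    opt_g = torch.optim.Adam(G1.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D1.parameters(), lr=2e-4)
    for _, real in zip(range(steps), real_loader):
        z = torch.randn(real.size(0), z_dim)          # random data

        # Discriminator step: G1 fixed (its output is detached).
        fake = G1(z).detach()
        logit_r, logit_f = D1(real), D1(fake)
        d_loss = (F.binary_cross_entropy_with_logits(logit_r, torch.ones_like(logit_r))
                  + F.binary_cross_entropy_with_logits(logit_f, torch.zeros_like(logit_f)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: only G1's parameters are updated.
        logit_f = D1(G1(z))
        g_loss = F.binary_cross_entropy_with_logits(logit_f, torch.ones_like(logit_f))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G1
```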
  • the random data can be random noise.
  • the data format of the random data can be set according to the input requirements of the image reconstruction model.
  • the data format can include a data length, which can be the same as the output data length of the encoding model.
  • the training data used to train the discriminator D1 can be collected through image acquisition equipment,
  • for example by photographing real objects at different shooting angles and under different light intensities.
  • the real objects are not limited here and can be determined according to the training needs.
  • the real object may be a real person or the like.
  • the training data may also be rendered virtual characters, or may be images generated by a pre-trained generative adversarial network, etc., without limitation.
  • the training process of the coding model includes: iteratively executing the following training process until the training conditions are met, and obtaining the trained coding model: inputting the sample image into the coding model to be trained, and obtaining the training image Encoding; inputting the training image encoding into the trained image reconstruction model to obtain a reconstructed image; adjusting model parameters of the encoding model based on the sample image and the reconstructed image.
  • FIG. 4 is a schematic diagram of the training process of a coding model provided by an embodiment of the present disclosure.
  • the trained image reconstruction model assists in training the coding model: the training data is input into the coding model to be trained to obtain the training image coding output by the coding model, where the training data can be the training data used to train the image reconstruction model, which is not limited here.
  • the training image coding is input into the image reconstruction model, which generates a reconstructed image according to the training image coding; the input training data is the theoretical (ground-truth) version of the reconstructed image, a loss function is determined based on the training data and the reconstructed image, and the network parameters of the coding model are adjusted based on the loss function.
  • the type of loss function is not limited here.
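  • A sketch of this loop, with the trained reconstruction model frozen as a fixed decoder and a plain pixel-wise L2 loss assumed (the patent does not restrict the loss type):

```python
import torch
import torch.nn.functional as F

def train_encoder(encoder, G1, sample_loader, epochs):
    G1.eval()
    for p in G1.parameters():
        p.requires_grad_(False)        # only the encoder is updated
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for _ in range(epochs):
        for sample in sample_loader:
            code = encoder(sample)     # training image coding
            recon = G1(code)           # reconstructed image
            # The input sample is the "theoretical" reconstructed image.
            loss = F.mse_loss(recon, sample)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder
```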
  • the training method of the image stylization model includes: performing parameter initialization on the image stylization model based on the model parameters of the image reconstruction model; and training the initialized image stylization model and the discriminant network model to be trained based on random data and stylized sample images, to obtain the trained image stylization model.
  • the image stylization model is the generator in a generative adversarial network, and the discriminant network model can be the discriminator in the generative adversarial network.
  • the generative adversarial network is trained through the training data; when its training is completed, the trained image stylization model is obtained.
  • the training data for the image stylization model can be stylized images, which can be read from open-source commercially usable datasets, or produced by retouching with image processing software (Photoshop, PS), by rendering virtual characters, or by a generative adversarial network; this is not limited here.
  • the image stylization model has the same structure as the image reconstruction model.
  • by using the network parameters of the trained image reconstruction model as the initial network parameters of the image stylization model, that is, performing parameter initialization on the image stylization model, and then iteratively training the initialized model, the trained image stylization model is obtained. Assigning values to the network parameters of the image stylization model during initialization helps accelerate its training, shortens its training time, reduces the amount of training data it needs during training, and lowers the difficulty of preparing training data.
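  • Because the two generators share one architecture, the initialization step amounts to a parameter copy; a minimal sketch (the function name is illustrative):

```python
import copy

def init_stylization_model(G1):
    # The stylization model starts as a parameter-for-parameter copy of
    # the trained reconstruction model, then is fine-tuned on stylized data.
    G2 = copy.deepcopy(G1)
    return G2
```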
  • FIG. 5 is a schematic diagram of the training process of the image stylization model provided by an embodiment of the present disclosure.
  • the training process of the image stylization model is similar to the training process of the image reconstruction model.
  • the generator and discriminator after initialization are trained alternately until the training end conditions are met, which will not be described again here.
  • the method further includes: using the image to be processed and the stylized image as an image pair in the training sample; training an end-to-end mobile network model based on multiple image pairs, Obtain an end-to-end stylized network model.
  • the end-to-end mobile network model can include an encoder and a decoder.
  • the encoder can downsample the input image, and the decoder can upsample the output features of the previous network layer; the numbers of network layers of the encoder and the decoder are not limited here.
  • compared with the stylization processing system, the mobile network model has a simple structure, occupies little memory, and consumes little computing power at run time. It is suitable for deployment on mobile devices such as mobile phones, to stylize images on the device and obtain a stylized image associated with the target area of the input image.
  • among the image pairs used as training data, the target areas of the multiple images to be processed are of the same kind; accordingly, the trained mobile network model can obtain a stylized image associated with the target area of the image to be processed.
  • the image to be processed is an image including a face area,
  • the target area is the face area,
  • and the stylized image is a stylized image associated with the face area.
  • accordingly, the mobile network model trained on such image pairs can stylize an input image and obtain a stylized image associated with the facial area of the input image.
  • in the technical solution provided by this embodiment, the mobile network model is trained on the input images and output images of the stylization processing system, yielding a mobile network model adapted to mobile applications and realizing image stylization on the mobile terminal.
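  • Training the small model then reduces to supervised regression on the (input, stylized output) pairs produced offline by the full system; a sketch assuming an L1 loss (the patent leaves the loss open):

```python
import torch
import torch.nn.functional as F

def distill_mobile(mobile_net, pairs_loader, epochs):
    opt = torch.optim.Adam(mobile_net.parameters(), lr=1e-4)
    for _ in range(epochs):
        for src, styled in pairs_loader:   # pairs made by the full system
            pred = mobile_net(src)         # predicted stylized image
            loss = F.l1_loss(pred, styled) # styled acts as the standard data
            opt.zero_grad(); loss.backward(); opt.step()
    return mobile_net
```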
  • FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • the method also includes: extracting the target area in the image to be processed to obtain a target area image; inputting the target area image into the stylization processing system to obtain a local stylized image of the target area; and fusing the stylized image associated with the target area with the local stylized image to obtain a target stylized image. Correspondingly, displaying the stylized image associated with the target area includes: displaying the target stylized image.
  • the method includes the following steps.
  • S250 Fuse the stylized image associated with the target area with the local stylized image to obtain a target stylized image.
  • in the technical solution provided by this embodiment, the local image formed from the target area is stylized to obtain a local stylized image.
  • the local stylized image is not affected by the content of the non-target area and is highly consistent with the content of the target area in the image to be processed.
  • the local stylized image is fused with the overall stylized image corresponding to the image to be processed to obtain the target stylized image, improving the consistency between the target area in the target stylized image and the original content.
  • the facial area is extracted from the image to be processed to form a facial area image.
  • the facial area image is stylized to obtain a local stylized image of the facial area.
  • the local stylized image is fused, in the facial area, with the stylized image obtained by processing the whole image to be processed, yielding a style map more consistent with the real face.
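  • The final fusion can be sketched as a mask-weighted pixel blend (w is an assumed fusion weight, and local_styled is assumed to have been pasted back to full-image coordinates; the patent only says the image weights are preset):

```python
import numpy as np

def blend_target(global_styled, local_styled, mask, w=0.6):
    # Inside the target area, mix the local and global stylized results;
    # outside it, keep the global stylized image untouched.
    m = mask[..., None]                          # broadcast over channels
    fused = m * (w * local_styled + (1 - w) * global_styled) \
            + (1 - m) * global_styled
    return fused.astype(global_styled.dtype)
```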
  • Figure 8 is a schematic comparison diagram of an image to be processed and a stylized image provided by an embodiment of the present disclosure.
  • the left image in Figure 8 is the image to be processed, and the right image is the stylized image obtained after processing by the stylization processing system.
  • the portrait in the figure is a virtual portrait synthesized by the device and is only an example.
  • the image to be processed is a portrait image containing a facial area.
  • the target area is the facial area,
  • and the stylization processing system converts the image into an ancient style.
  • the degree of stylization of the areas other than the facial area in the image to be processed, especially the background and hair areas, is greater than that of the facial area.
  • the facial area keeps a high similarity to the original content, so that the face can be clearly recognized from the stylized image, and an obvious mismatch between the stylized image and the input image to be processed is avoided.
  • Figure 9 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure. As shown in Figure 9, the device includes: an image receiving module 410, an image processing module 420 and an image display module 430.
  • the image receiving module 410 is configured to receive an image to be processed and a mask image of the target area in the image to be processed;
  • the image processing module 420 is configured to process the image to be processed and the mask image based on a stylized processing system to obtain a stylized image associated with the target area;
  • the image display module 430 is configured to display a stylized image associated with the target area.
  • the technical solution provided by the embodiment of the present disclosure provides auxiliary information for the stylization process of the image to be processed by setting a mask image of the target area for the received image to be processed, so as to distinguish the target area from the non-target area.
  • Pre-set a trained stylized processing system with stylized processing capabilities.
  • the image to be processed and the mask map are processed based on the stylization processing system, and the target area and non-target area in the image to be processed are distinguished based on the mask map, to obtain a stylized image associated with the target area that balances the image style with the content consistency of the target area.
  • the stylization processing system includes a coding model, an image reconstruction model, and an image stylization model, wherein the coding model is connected to the image reconstruction model and the image stylization model respectively,
  • and the corresponding network layers of the image reconstruction model and the image stylization model are connected.
  • the image processing module 420 includes:
  • An image coding determination module configured to input the image to be processed into the coding model and obtain the image coding of the image to be processed
  • a feature information determination module configured to input the image coding into the image reconstruction model to obtain the feature information of the network layer during the image coding process by the image reconstruction model;
  • a stylized image determination module configured to input the image coding and the mask map to the input end of the image stylization model, and to input the feature information of the network layers in the image reconstruction model into the corresponding network layers of the image stylization model, obtaining a stylized image associated with the target area.
  • the network layer in the image stylization model generates the initial feature information of the current network layer based on the feature information input by the previous network layer, fuses, based on the mask map, the initial feature information of the current network layer with the feature information input by the corresponding network layer of the image reconstruction model to obtain the target feature information of the current network layer, and inputs the target feature information of the current network layer to the next network layer, until the last network layer of the image stylization model outputs a stylized image associated with the target area.
  • the network layer in the image stylization model fuses, based on the first weight group, the feature information within the target area from the initial feature information and from the feature information input by the corresponding network layer of the image reconstruction model, to obtain the first fusion feature; fuses, based on the second weight group, the feature information outside the target area from the initial feature information and from the feature information input by the corresponding network layer of the image reconstruction model, to obtain the second fusion feature; and obtains the target feature information of the current network layer based on the first fusion feature and the second fusion feature.
  • the device further includes:
  • a local stylized image generation module configured to input the target area image to the stylization processing system to obtain a local stylized image of the target area
  • the image display module 430 is configured to display the target stylized image.
  • the device further includes:
  • the image reconstruction model training module is configured to train the image reconstruction model and discriminant network model to be trained based on random data and sample images, and obtain the trained image reconstruction model.
  • the device further includes:
  • the coding model training module is configured to iteratively execute the following training process until the training conditions are met and the trained coding model is obtained: input the sample image into the coding model to be trained to obtain the training image coding; input the training image coding into From the image reconstruction model that has been trained, a reconstructed image is obtained; and the model parameters of the encoding model are adjusted based on the sample image and the reconstructed image.
  • the device further includes:
  • the image stylization model training module is configured to: perform parameter initialization on the image stylization model based on the model parameters of the image reconstruction model; and train the initialized image stylization model and the discriminant network model to be trained based on random data and stylized sample images, to obtain the trained image stylization model.
  • the image to be processed is an image including a facial area, and the target area is a facial area;
  • the image processing module 420 is configured to: process the image to be processed including the facial area and the mask image of the facial area based on the stylization processing system to obtain a stylized image associated with the facial area.
  • the device further includes:
  • the mobile terminal model training module is configured to determine the image to be processed and the stylized image as an image pair in the training samples, and to train the end-to-end mobile network model based on multiple image pairs to obtain an end-to-end stylized network model.
  • the image processing device provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has functional modules and effects corresponding to the execution method.
  • the multiple units and modules included in the above device are only divided according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be achieved; in addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (such as a central processing unit or a graphics processor) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 502 or a program loaded from a storage device 508 into a random access memory (Random Access Memory, RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, ROM 502 and RAM 503 are connected to each other via a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504.
  • the following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope;
  • output devices 507 including, for example, a liquid crystal display (Liquid Crystal Display, LCD), speakers, and vibrators; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509.
  • Communication device 509 may allow electronic device 500 to communicate wirelessly or wiredly with other devices to exchange data.
  • although FIG. 10 illustrates an electronic device 500 with multiple devices, it is not required that all illustrated devices be implemented or available; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 509, or from storage device 508, or from ROM 502.
  • when the computer program is executed by the processing device 501, the above-mentioned functions defined in the method of the embodiments of the present disclosure are performed.
  • the electronic device provided by the embodiments of the present disclosure and the image processing method provided by the above embodiments belong to the same inventive concept.
  • technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware. Among them, the name of a unit does not constitute a limitation on the unit itself.
  • FPGA Field Programmable Gate Array
  • ASIC Application Specific Integrated Circuit
  • ASSP Application Specific Standard Parts
  • SOC System on Chip
  • CPLD Complex Programmable Logic Device
  • Example 7 provides an image processing method, further including:
  • the training process of the image reconstruction model includes: training the image reconstruction model and the discriminant network model to be trained based on random data and sample images to obtain a trained image reconstruction model.
  • the training process of the encoding model includes: iteratively executing the following training process until the training conditions are met, and obtaining a trained encoding model: inputting sample images into the encoding model to be trained to obtain training image encoding; encoding the training image Input it into the image reconstruction model that has been trained to obtain a reconstructed image; adjust the model parameters of the encoding model based on the sample image and the reconstructed image.
  • Example 9 provides an image processing method, further comprising: the training method of the image stylization model includes: performing parameter initialization on the image stylization model based on the model parameters of the image reconstruction model; and training the initialized image stylization model and the discriminant network model to be trained based on random data and stylized sample images, to obtain the trained image stylization model.
  • Example 10 provides an image processing method, further comprising: the image to be processed is an image including a facial area, and the target area is a facial area;
  • processing the image to be processed and the mask image based on the stylization processing system to obtain a stylized image associated with the target area includes: processing, based on the stylization processing system, the image to be processed including the facial area and the mask image of the facial area, to obtain a stylized image associated with the facial area.
  • Example 11 provides an image processing method, further comprising:
  • the method further includes: determining the image to be processed and the stylized image as image pairs in the training samples, and training an end-to-end mobile network model based on the multiple image pairs to obtain an end-to-end stylized network model.
  • Example 12 provides an image processing device, including:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method and apparatus, a storage medium, and an electronic device. The image processing method includes: receiving an image to be processed and a mask image of a target area in the image to be processed; processing the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area; and displaying the stylized image associated with the target area.

Description

Image processing method and apparatus, storage medium, and electronic device
This application claims priority to Chinese patent application No. 202210625667.5, filed with the China Patent Office on June 02, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to image processing technology, for example, to an image processing method and apparatus, a storage medium, and an electronic device.
Background
With the continuous development of technology, more and more application software has entered users' lives, gradually enriching their spare time. For example, users can record their lives in the form of videos or images through a variety of applications and upload the videos or images to the network.
Collected videos or images can be stylized through application software, but the stylized image obtained by such processing differs greatly from, and correlates poorly with, the original content before processing, so that it does not reflect the content of the original image well. For example, after a portrait image is stylized, key regions such as the face in the resulting stylized image differ so much from those in the original portrait that the two cannot be recognized as the same person.
Summary
The present disclosure provides an image processing method and apparatus, a storage medium, and an electronic device, so as to improve the correlation between a stylized image and the content of the original image.
An embodiment of the present disclosure provides an image processing method, including:
receiving an image to be processed and a mask image of a target area in the image to be processed;
processing the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area; and
displaying the stylized image associated with the target area.
An embodiment of the present disclosure further provides an image processing apparatus, including:
an image receiving module configured to receive an image to be processed and a mask image of a target area in the image to be processed;
an image processing module configured to process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area; and
an image display module configured to display the stylized image associated with the target area.
An embodiment of the present disclosure further provides an electronic device, including:
one or more processors; and
a storage device configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method provided by any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the image processing method provided by any embodiment of the present disclosure.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
Figure 2 is a schematic structural diagram of a stylization processing system provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of the training process of an image reconstruction model provided by an embodiment of the present disclosure;
Figure 4 is a schematic diagram of the training process of a coding model provided by an embodiment of the present disclosure;
Figure 5 is a schematic diagram of the training process of an image stylization model provided by an embodiment of the present disclosure;
Figure 6 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
Figure 7 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
Figure 8 is a schematic comparison diagram of an image to be processed and a stylized image provided by an embodiment of the present disclosure;
Figure 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Figure 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the drawings. Although some embodiments of the present disclosure are shown in the drawings, the present disclosure can be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for understanding the present disclosure. The drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
The steps described in the method embodiments of the present disclosure can be executed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
The term "include" and its variants as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
Concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
The modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive, and should be understood as "one or more" unless the context clearly indicates otherwise.
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
Before using the technical solutions disclosed in the embodiments of the present disclosure, the user shall be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization shall be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly remind the user that the requested operation will require obtaining and using the user's personal information. The user can thus autonomously choose, based on the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application, server, or storage medium, that performs the operations of the technical solution of the present disclosure.
As an optional implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may also carry a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
The above notification and authorization process is only illustrative and does not limit the implementation of the present disclosure; other methods that satisfy relevant laws and regulations can also be applied to the implementation of the present disclosure.
The data involved in this technical solution (including the data itself and the acquisition or use of the data) shall comply with the requirements of corresponding laws, regulations, and relevant provisions.
Figure 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. The embodiment is applicable to converting an image to be processed into a stylized image. The method can be executed by the image processing apparatus provided by an embodiment of the present disclosure, which can be implemented in software and/or hardware, optionally by an electronic device, which can be a mobile terminal, a personal computer (Personal Computer, PC), or a server. As shown in Figure 1, the method includes the following steps.
S110: Receive an image to be processed and a mask image of a target area in the image to be processed.
S120: Process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area.
S130: Display the stylized image associated with the target area.
The image to be processed is the original image to be stylized. In some embodiments, the image to be processed can be a single image or multiple frames of a video to be processed; accordingly, the processing method of the present disclosure is performed on each frame of the video, and the resulting multiple frames of stylized images can form a stylized video.
The image to be processed can be imported from an external device, collected through an externally connected image acquisition device (such as a camera), imported from local storage (such as a local photo album), or collected in real time by an image acquisition device (such as the device's own camera). Accordingly, the application scenarios of the embodiments of the present disclosure include stylization of a single input image (collected in real time or historically), stylization of already-collected videos, and stylization of videos collected in real time (such as live-streaming videos).
The mask image of the target area in the image to be processed is an image that distinguishes, in mask form, the target area from the other areas of the image to be processed. The target area is an area that maintains a strong correlation with the original content during stylization; the number of target areas can be one or more, determined according to the needs of the operating user. The mask image can be in the form of an image or of a data matrix, which is not limited here.
In some embodiments, the target area can be manually selected by the operating user. For example, after the image to be processed is received, it is displayed on the display screen of the device; in the area selection mode, when the user's area selection operation is detected, the target area corresponding to the operation is determined. The area selection mode can be entered automatically after the image to be processed is displayed, or an area selection control can be provided on the display page, and the mode is entered when the control is triggered.
The area selection operation can be an area contour drawing operation, that is, drawing the contour of the target area in the image to be processed with a finger or a mouse; by recognizing the input contour, the area within the contour is determined as the target area. The area selection operation can also determine the target area by setting the position and size of an area selection box. On the display page of the image to be processed, the shapes of the area selection box, such as a rectangular box and a circular box, can be displayed; when one of the shapes is selected, and a click operation within the display area of the image to be processed is detected, the position of the selection box is determined; when a drag operation on the selection box is detected, its position is adjusted according to the drag operation; and when a sliding operation within the display area is detected, its size is adjusted according to the sliding operation. Based on the position and size of the selection box, the area within it is determined as the target area.
In some embodiments, the target area can be obtained through automatic recognition and automatic segmentation. Optionally, region types are provided on the display page of the image to be processed; for example, the region types can include face, eyes, mouth, portrait, food, flowers, trees, foreground, background, and so on, and the region type of the target area can be determined according to the user's selection. According to the selected region type, the image to be processed is recognized, and image segmentation is performed based on the recognition result to obtain the target area. For example, if the selected type is face, the facial area is recognized and segmented in the image to be processed and used as the target area. The number of recognized areas can be multiple, and the recognized areas can be selected from: for example, if two facial areas are recognized in the image to be processed, the selected facial area can be used as the target area and the unselected one as a non-target area.
Multiple region recognition models can be preset, including, for example, a face recognition model, a portrait recognition model, and a food recognition model; the corresponding region recognition model is called according to the region type, the image to be processed is processed based on the called model, and a segmented image of the target area is output.
A default type of the target area can be set in advance, for example, the target area is the facial area; accordingly, after the image to be processed is received, its facial area is recognized and used as the target area. The default type can be set and edited according to user needs.
The image to be processed is masked based on the target area, where the target area and the non-target area can be distinguished by 0 and 1.
In this embodiment, a pre-trained stylization processing system processes the image to be processed and the mask image to obtain a stylized image corresponding to the image to be processed. The target area in the stylized image is associated with the target area of the image to be processed, that is, the two have a high similarity, so that while the image is stylized, a high degree of authenticity and consistency of the target area is retained. In processing the image to be processed, the stylization processing system uses the mask image as auxiliary information and integrates the original content into the processing of the target area, so that the target area of the stylized image remains highly consistent with that of the image to be processed.
The image style into which the stylization processing system converts the image is not limited here and can be determined according to style conversion requirements. For example, the conversion style corresponding to the stylization processing system can include an ancient style, an impressionist style, a simple line drawing style, and so on. Stylization processing systems for different style types can be trained with images of the corresponding style type, which is not limited here. The structure of the stylization processing system is also not limited here: in some embodiments it can be one machine learning model, such as a neural network model or a deep neural network model; in some embodiments it can be composed of multiple machine learning models, which can be of the same type or of different types.
The stylized image obtained by the stylization processing system is displayed; for example, the image to be processed and the stylized image can be displayed on the same display page to facilitate comparing them.
In the technical solution provided by this embodiment, a mask image of the target area is set for the received image to be processed, providing auxiliary information for the stylization process so as to distinguish the target area from the non-target area. A trained stylization processing system with stylization capability is set in advance; the image to be processed and the mask image are processed by the system, and the target and non-target areas are distinguished based on the mask image, to obtain a stylized image associated with the target area. This stylized image balances the image style with the content consistency of the target area and improves the correlation between the target area and the original content, so that, on top of converting the image style, the content of the target area remains highly recognizable relative to the original content, and the original content is well reflected in the converted style.
In an embodiment, the stylization processing system includes a coding model, an image reconstruction model, and an image stylization model. The coding model is used to encode an input image to obtain the corresponding image coding, and can be a neural network model. The image reconstruction model and the image stylization model can be neural network models, for example generator models. Their input information is encoded data, from which the corresponding image is generated: the image reconstruction model restores the encoded data to the image to be processed, and the image stylization model generates a stylized image from the encoded data. The coding model is connected to the image reconstruction model and the image stylization model respectively, and corresponding network layers of the image reconstruction model and the image stylization model are connected; these connections transfer feature information from network layers of the image reconstruction model to network layers of the image stylization model. The two models each include multiple network layers, and a correspondence exists between their layers; the corresponding layers can be some or all of the layers, for example layers in the same processing stage. In some embodiments, the two models include different network layers, for example different numbers of layers, or layers of different types or structures. For example, the first layer of the image reconstruction model can be connected to the first layer of the image stylization model, and the second layer of the reconstruction model to the third layer of the stylization model; these are only examples, and the correspondence can be determined from the structures of the two models and the processing functions of their layers. In some embodiments, the image reconstruction model and the image stylization model have the same structure and the same network layers and are trained separately on different training data; with identical structures, layers of the same depth in the two models process input information at the same stage, and connecting layers of the same depth transmits feature information that matches when fused. On the basis of simplifying the determination of the layer correspondence between the two models, this improves the matching degree of the fused feature information and the accuracy of the stylized image. When corresponding layers are connected, that is, layers of the same depth, each layer of the image reconstruction model transmits its output feature information to the corresponding layer of the image stylization model, and the layer of the stylization model fuses the feature information generated by itself with the transmitted feature information to obtain its output. The feature information output by a network layer of either model can be a feature map or a feature matrix, which is not limited here.
In some embodiments, processing the image to be processed and the mask image based on the stylization processing system to obtain a stylized image associated with the target area includes: inputting the image to be processed into the coding model to obtain the image coding of the image to be processed; inputting the image coding into the image reconstruction model to obtain the feature information of its network layers during the processing of the image coding; and inputting the image coding and the mask image to the input end of the image stylization model, and inputting the feature information of the network layers of the image reconstruction model into the corresponding network layers of the image stylization model, to obtain the stylized image associated with the target area.
For example, see Figure 2, a schematic structural diagram of a stylization processing system provided by an embodiment of the present disclosure. The image to be processed is input into the coding model as its input information, and the corresponding image coding is obtained; the coding can be in the form of a data matrix or a data vector, which is not limited here. The image coding is input into the image reconstruction model, which can include multiple network layers; each layer generates feature information from its input, inputs the generated feature information to the next layer, and, where a connection exists between that layer and a layer of the image stylization model, also inputs the generated feature information to the corresponding layer of the image stylization model.
The image coding and the mask image are input, as input information, to the image stylization model at its input end, and the feature information generated by the network layers of the image reconstruction model serves as input information of the corresponding layers of the image stylization model. A network layer of the image stylization model generates the initial feature information of the current layer based on the image coding or on the target feature information output by the previous layer, fuses, based on the mask image, the initial feature information of the current layer with the feature information input by the corresponding layer of the image reconstruction model to obtain the target feature information of the current layer, and inputs that target feature information to the next layer, until the last layer of the image stylization model outputs the stylized image associated with the target area.
Taking the case where the image reconstruction model G1 and the image stylization model G2 have the same model structure and layers of the same depth in G1 and G2 are connected as an example, the feature information output by the layers of G1 can be denoted G1F = {g1f1, g1f2, g1f3, ..., g1fn}, where n is the number of layers of G1; g1f1, the feature information output by the first layer, is input into the first layer of the image stylization model G2, and so on. Any layer of the image stylization model generates the initial feature information g2fi of the current layer from the image coding or from the target feature information output by the previous layer, where i is the layer index. Based on the mask image, g1fi and g2fi are fused to obtain the target feature information of the current layer, which serves as the input of the next layer. The feature information of each layer can be in the form of a feature map; accordingly, the feature map output by the last layer is the stylized image associated with the target area. The input information of the first layer of the image stylization model is the image coding; the input information of every other layer is the target feature information output by the previous layer.
The layer of the image stylization model can fuse the feature information g1fi from the image reconstruction model and the initial feature information g2fi of the current layer by combining the feature information corresponding to the target area and to the non-target area with different weights, and composing the per-area fused feature information into the target feature information. The fusion with different weights for target and non-target areas can be implemented based on the mask image.
Optionally, fusing, based on the mask image, the initial feature information of the current layer with the feature information input by the corresponding layer of the image reconstruction model to obtain the target feature information of the current layer includes: fusing, based on a first weight group, the feature information within the target area from the initial feature information and from the feature information input by the corresponding layer of the image reconstruction model, to obtain a first fusion feature; fusing, based on a second weight group, the feature information outside the target area from the initial feature information and from the feature information input by the corresponding layer of the image reconstruction model, to obtain a second fusion feature; and obtaining the target feature information of the current layer based on the first fusion feature and the second fusion feature.
The first weight group contains the fusion weights, within the target area, of the initial feature information and of the feature information input by the corresponding layer of the image reconstruction model; the second weight group contains their fusion weights in the non-target area, that is, outside the target area. For example, the first weight group includes a first weight a for the initial feature information and a second weight 1-a for the feature information input by the corresponding layer of the image reconstruction model, where neither weight is zero, both are values greater than 0 and less than 1, and they sum to 1. The second weight group includes a third weight b for the initial feature information and a fourth weight 1-b for the feature information input by the corresponding layer of the image reconstruction model, where the two weights sum to 1, b is greater than 0 and less than or equal to 1, and 1-b is greater than or equal to 0 and less than 1. The weight values in the two groups can be set according to fusion requirements; by adjusting them, the degree of content consistency of the target area can be controlled, yielding stylized images meeting different consistency levels.
The feature fusion of any layer of the image stylization model can be implemented by the following formula: gfi = a*g1fi*mask + (1-a)*g2fi*mask + b*g2fi*(1-mask) + (1-b)*g1fi*(1-mask), where gfi is the target feature information output by the i-th layer of the image stylization model, g1fi is the feature information generated by the image reconstruction model at its i-th layer, g2fi is the initial feature information generated by the i-th layer of the image stylization model, and mask is the mask image. In this embodiment, pixel positions within the target area of the mask image can be set to 1 and pixel positions in the non-target area to 0; accordingly, in the formula, mask marks the target area as 1, so a*g1fi*mask + (1-a)*g2fi*mask fuses the feature information of the target area, and 1-mask marks the non-target area as 1, so b*g2fi*(1-mask) + (1-b)*g1fi*(1-mask) fuses the non-target area. In some embodiments, a > 1-b; that is, relative to the non-target area, the weight of the feature information generated by the image reconstruction model is increased in the target area, reducing the target area's degree of stylization and improving its similarity to the original content.
In some embodiments, the non-target area directly uses the initial feature information without feature fusion, to increase the degree of stylization of the non-target area; accordingly, the target feature information can be computed as: gfi = a*g1fi*mask + (1-a)*g2fi*mask + g2fi*(1-mask).
Every network layer of the image stylization model performs the above processing until the last layer outputs the stylized image.
In the technical solution of the embodiments of the present disclosure, the image reconstruction model obtains the feature information of multiple network layers while restoring the image coding and inputs it into the corresponding layers of the image stylization model; while processing the image coding, each layer of the image stylization model fuses, based on the mask image, the initial feature information it generates with the feature information input by the image reconstruction model, so that feature information in the target and non-target areas is fused with different weights, the degree of stylization of the target area is adjusted, and a stylized image associated with the target area is obtained. While the image to be processed is stylized, the degree of stylization of a local area remains adjustable.
在上述实施例的基础上,所述图像重建模型的训练过程,包括:基于随机数据和样本图像对待训练的图像重建模型和判别网络模型进行训练,得到训练完成的图像重建模型。本实施例中,图像重建模型为生成对抗网络中的生成器,判别网络模型可以是生成对抗网络中的判别器,通过训练数据对生成对抗网络进行训练,在生成对抗网络训练完成的情况下,得到训练完成的图像重建模型。示例性的,参见图3,图3是本公开实施例提供的图像重建模型的训练过程的示意图。示例性的,生成对抗网络中,包括生成器G1和判别器D1,对生成器G1和判别器D1进行交替的训练,直到满足训练结束条件,将训练完成的生成器G1确定为图像生成器。交替训练过程包括:固定生成器G1中的网络参数,将随机数据输入至生成器G1中,得到生成器G1输出的重建图像,将该重建图像或者训练数据作为判别器D1的输入信息,判别器D1输出对输入信息的判别结果,并根据输入信息的标签确实损失函数,对判别器D1进行网络参数的调节。在对判别器D1经过预设训练的过程后,固定判别器D1,对生成器G1进行训练,即通过确定的损失函数对生成器G1进行网络参数的调节,交替执行上述训练过程,直到达到收敛状态等条件的情况下,将训练好的生成器G1确定为图像重建模型。
在上述实施例中,随机数据可以是随机噪声,可选的,根据图像重建模型的输入需求,设置随机数据的数据格式,该数据格式可以是包括数据长度,该数据长度可以是与编码模型的输出数据长度相同。用于对判别器D1进行训练的训练数据可以是通过图像采集设备采集得到,例如对真实对象在不同拍摄角度、不同光线强度下采集得到,此处不限定真实对象,可根据训练需求确定,在一些实施例中,真实对象可以是真实人物等。示例性的,训练数据还可以是虚拟人物经渲染得到,还可以是预先训练的生成对抗网络生成的图像等,对此不做限定。
On the basis of the above embodiments, the training process of the encoding model includes: iteratively performing the following training process until the training condition is met, to obtain the trained encoding model: inputting a sample image into the encoding model to be trained to obtain a training image encoding; inputting the training image encoding into the trained image reconstruction model to obtain a reconstructed image; and adjusting the model parameters of the encoding model based on the sample image and the reconstructed image.
For example, referring to Figure 4, Figure 4 is a schematic diagram of the training process of the encoding model provided by an embodiment of the present disclosure. The encoding model is trained with the help of the already trained image reconstruction model. Training data is fed into the encoding model to be trained to obtain the training image encoding it outputs, where the training data may be the training data used for training the image reconstruction model, which is not limited here. The training image encoding is fed into the image reconstruction model, which generates a reconstructed image from it; the input training data serves as the ground truth of the reconstructed image, a loss function is determined based on the training data and the reconstructed image, and the network parameters of the encoding model are adjusted based on this loss function. The type of the loss function is not limited here. The above training process is performed iteratively until the end-of-training condition is met, and the trained encoding model is determined.
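A minimal sketch of this encoder training, assuming PyTorch; the L1 reconstruction loss is an illustrative choice, since the text leaves the loss type open.

```python
import torch
import torch.nn.functional as F

def train_encoder(encoder, g1, sample_loader, epochs, device="cpu"):
    """Train the encoding model against the frozen, pre-trained
    reconstruction model G1."""
    g1.eval()
    for p in g1.parameters():          # G1 stays fixed during encoder training
        p.requires_grad_(False)

    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for _ in range(epochs):
        for sample in sample_loader:
            sample = sample.to(device)
            code = encoder(sample)     # training image encoding
            recon = g1(code)           # reconstructed image
            loss = F.l1_loss(recon, sample)  # sample is the ground truth
            opt.zero_grad()
            loss.backward()
            opt.step()
```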
On the basis of the above embodiments, the training method of the image stylization model includes: initializing the parameters of the image stylization model based on the model parameters of the image reconstruction model; and training the initialized image stylization model to be trained and a discriminative network model based on random data and stylized sample images, to obtain the trained image stylization model.
The image stylization model is the generator of a generative adversarial network, and the discriminative network model may be the discriminator of the generative adversarial network; the generative adversarial network is trained with training data, and once its training is complete, the trained image stylization model is obtained. Here, the training data used to train the image stylization model may be stylized images, which may be read from open-source, commercially usable datasets, or be produced by retouching with image processing software (e.g., Photoshop), by rendering virtual characters, or by a generative adversarial network, which is not limited here.
In this embodiment, the image stylization model has the same structure as the image reconstruction model. The network parameters of the already trained image reconstruction model are used as the initial network parameters of the image stylization model, i.e., the parameters of the image stylization model are initialized, and the initialized image stylization model is trained iteratively to obtain the trained image stylization model. Assigning values to the network parameters of the image stylization model during initialization helps accelerate its training process, shortens the training time, and at the same time reduces the amount of training data required during training, lowering the difficulty of preparing training data.
For example, referring to Figure 5, Figure 5 is a schematic diagram of the training process of the image stylization model provided by an embodiment of the present disclosure. The training process of the image stylization model is similar to that of the image reconstruction model: the initialized generator and a discriminator are trained alternately until the end-of-training condition is met, which is not repeated here.
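The parameter initialization step can be as simple as copying the trained reconstruction generator, assuming both models are built from the same module class; the snippet below is a sketch, not the disclosed procedure.

```python
import copy
import torch

def init_stylization_model(g1: torch.nn.Module) -> torch.nn.Module:
    """Initialize the stylization generator G2 from the trained
    reconstruction generator G1 (same architecture assumed)."""
    g2 = copy.deepcopy(g1)             # same structure, same starting weights
    # Equivalent alternative, if G2 is constructed separately
    # (G2Class is a hypothetical name for the shared module class):
    #   g2 = G2Class(); g2.load_state_dict(g1.state_dict())
    for p in g2.parameters():
        p.requires_grad_(True)         # G2 is then fine-tuned adversarially
    return g2
```

Starting the adversarial fine-tuning from these weights is what shortens training and reduces the amount of stylized data needed, since G2 begins as a generator that can already reconstruct realistic images.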
On the basis of the above embodiments, the method further includes: taking the image to be processed and the stylized image as an image pair in the training samples; and training an end-to-end mobile-side network model based on multiple image pairs, to obtain an end-to-end stylization network model. Optionally, the end-to-end mobile-side network model may include an encoder and a decoder, where the encoder may downsample the input image and the decoder may upsample the output features of the preceding network layer; the numbers of network layers of the encoder and the decoder are not limited here.
Compared with the stylization processing system, the mobile-side network model has a simple structure, occupies little memory, and consumes little computing power at run time, making it suitable for deployment on mobile devices such as mobile phones, so that images can be stylized on the mobile device to obtain a stylized image associated with the target area of the input image.
For example, the image to be processed and the stylized image obtained by processing it with the stylization processing system are taken as an image pair, where the image to be processed serves as the input data of the mobile-side network model and the stylized image serves as the reference data for the predicted stylized output of the mobile-side network model, used together with the prediction to build a loss function for adjusting the model parameters of the mobile-side network model. The above training process is performed iteratively to obtain a mobile-side network model with stylization capability, as sketched below.
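A minimal sketch of this setup, assuming PyTorch; the tiny encoder-decoder architecture MobileStylizer and the L1 loss are illustrative assumptions, since the text fixes neither the layer counts nor the loss.

```python
import torch
import torch.nn as nn

class MobileStylizer(nn.Module):
    """Illustrative lightweight encoder-decoder (not the patented architecture)."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(                 # downsampling path
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                 # upsampling path
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_mobile_model(pair_loader, epochs, device="cpu"):
    """Train on (image to be processed, stylized image) pairs produced by
    the full stylization processing system."""
    model = MobileStylizer().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for src, target in pair_loader:               # target: the system's stylized output
            src, target = src.to(device), target.to(device)
            loss = nn.functional.l1_loss(model(src), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```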
Among the multiple image pairs serving as the training data, the target areas of the multiple images to be processed are the same; accordingly, the trained mobile-side network model can produce stylized images associated with the target area of an image to be processed. In some embodiments, the image to be processed is an image including a face area, the target area is the face area, and the stylized image is a stylized image associated with the face area; accordingly, the mobile-side network model trained on the above image pairs can stylize an input image and produce a stylized image associated with the face area of the input image.
In the technical solution provided by this embodiment, the mobile-side network model is trained on the input and output images of the stylization processing system, so as to obtain a mobile-side network model suited to mobile applications, enabling image stylization on the mobile side.
Referring to Figure 6, Figure 6 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. The above embodiments are elaborated on this basis. Optionally, the method further includes: extracting the target area from the image to be processed to obtain a target area image; inputting the target area image into the stylization processing system to obtain a local stylized image of the target area; and fusing the stylized image associated with the target area with the local stylized image to obtain a target stylized image. Accordingly, displaying the stylized image associated with the target area includes: displaying the target stylized image. Referring to Figure 6, the method includes the following steps.
S210. Receive an image to be processed and a mask image of the target area in the image to be processed.
S220. Process the image to be processed and the mask image based on the stylization processing system to obtain a stylized image associated with the target area.
S230. Extract the target area from the image to be processed to obtain a target area image.
S240. Input the target area image into the stylization processing system to obtain a local stylized image of the target area.
S250. Fuse the stylized image associated with the target area with the local stylized image to obtain a target stylized image.
S260. Display the target stylized image.
In this embodiment, the target area is segmented out of the image to be processed to obtain the target area image, and the target area image serves as the input image of the stylization processing system; processing it yields the style map of the target area, i.e., the local stylized image of the target area. The mask image corresponding to the target area image may be a mask image whose pixel values are all 1.
Fusing the stylized image obtained by processing the whole image to be processed with the local stylized image obtained by locally processing the target area image yields the target stylized image, improving the content consistency between the target area of the target stylized image and the target area of the image to be processed.
Fusing the stylized image obtained by processing the whole image to be processed with the local stylized image obtained by locally processing the target area image may be implemented by weighting the corresponding pixels of the stylized image and the local stylized image, with the fusion weights of the images set in advance.
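A sketch of this pixel-weighted fusion, assuming NumPy arrays of matching shape and alignment; the weight w, the function name, and restricting the blend to the target area via the mask (consistent with the face-area fusion described later) are illustrative assumptions.

```python
import numpy as np

def blend(global_stylized: np.ndarray, local_stylized: np.ndarray,
          mask: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Per-pixel weighted fusion of the whole-image stylized result with
    the local (target-area-only) stylized result.

    mask: (H, W, 1) map, 1 inside the target area, 0 outside; the local
    result is assumed already aligned to the target area's position.
    w: preset fusion weight of the local result inside the target area.
    """
    inside = w * local_stylized + (1.0 - w) * global_stylized
    return mask * inside + (1.0 - mask) * global_stylized
```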
In the embodiments of the present disclosure, step S220 and steps S230-S240 may be performed sequentially or in parallel, which is not limited here.
In the technical solution provided by this embodiment, the local image formed by the target area is stylized to obtain a local stylized image; this local stylized image is unaffected by the content of the non-target area and is highly consistent with the content of the target area in the image to be processed. Fusing the local stylized image with the overall stylized image corresponding to the image to be processed yields the target stylized image, improving the consistency between the target area of the target stylized image and the original content.
Referring to Figure 7, Figure 7 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. On the basis of the above embodiments, a processing flow for an application scenario is provided. Referring to Figure 7, the method includes the following steps.
S310. Receive an image to be processed that includes a face area, and a mask image of the face area.
S320. Process the image to be processed that includes the face area and the mask image of the face area based on the stylization processing system, to obtain a stylized image associated with the face area.
S330. Display the stylized image associated with the face area.
In this embodiment, the stylization processing system includes an encoding model, an image reconstruction model, and an image stylization model, which are obtained from training data consisting of portrait images and stylized portrait images. Accordingly, the encoding model is a portrait encoding model, the image reconstruction model is a portrait reconstruction model, and the image stylization model is a portrait stylization model.
When an image to be processed that includes a face area is received, the face area in the image to be processed is determined. For example, the image to be processed may be recognized by a face recognition model (or, for example, a face segmentation model) to obtain the face area of the image to be processed and the mask image of the face area, i.e., the face mask. A sketch of such mask generation follows.
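In this sketch, the segmenter callable is a stand-in for whatever face recognition or segmentation model is used, which the text does not pin down; the 0.5 threshold is likewise an assumption.

```python
import numpy as np

def face_mask(image: np.ndarray, segmenter) -> np.ndarray:
    """Build the binary face mask from a segmentation model's output.

    segmenter: any face segmentation model returning a per-pixel face
    probability map in [0, 1] (hypothetical interface).
    """
    prob = segmenter(image)                     # (H, W) face probability map
    mask = (prob > 0.5).astype(np.float32)      # 1 inside the face area, 0 outside
    return mask[..., None]                      # (H, W, 1), broadcastable over channels
```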
The image to be processed is fed into the encoding model to obtain the image encoding, and the image encoding is fed into the image reconstruction model to obtain the set of feature information output by its multiple network layers, i.e., G1F = {g1f1, g1f2, g1f3, ..., g1fn}, where n is the number of layers of G1. The image encoding and the mask image are fed into the image stylization model through its input end, and the feature information output by the multiple network layers of the image reconstruction model is fed into the corresponding network layers of the image stylization model, where it is fused region by region with the initial feature information of those network layers via the face mask, the fusion being gfi = a*g1fi*mask + (1-a)*g2fi*mask + g2fi*(1-mask), until the stylized image is output. The face-area features obtained by the image reconstruction model and the image stylization model are weighted and mixed via the face mask, while the non-face area, which contains the hair and background features, uses the features of the image stylization model; this makes the degree of stylization of the face area controllably adjustable while keeping the hair and background stylized.
The face area is extracted from the image to be processed to form a face area image, and the face area image is stylized based on the stylization processing system to obtain a local stylized image of the face area. Through face fusion technology, the local stylized image is fused in the face area with the stylized image obtained above by processing the whole image to be processed, yielding a style map whose face area is more consistent with the real face.
For example, referring to Figure 8, Figure 8 is a schematic comparison diagram of an image to be processed and a stylized image provided by an embodiment of the present disclosure; the left image in Figure 8 is the image to be processed, and the right image is the stylized image obtained by the stylization processing system. The portrait in the figure is a virtual portrait synthesized by a device and is only an example. The image to be processed is a portrait containing a face area; accordingly, the target area is the face area, and the style to which the stylization processing system converts the image is an ancient style. As Figure 8 shows, the areas outside the face area of the image to be processed, especially areas such as the background and hair, are stylized to a greater degree than the face area and are converted to the ancient style, while the face area, on top of the ancient-style conversion, retains a high similarity to the original content, so that the resemblance of the face is clearly recognizable in the stylized image and an obvious inconsistency between the stylized image and the input image to be processed is avoided.
Figure 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. As shown in Figure 9, the apparatus includes an image receiving module 410, an image processing module 420, and an image display module 430.
The image receiving module 410 is configured to receive an image to be processed and a mask image of the target area in the image to be processed;
the image processing module 420 is configured to process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area;
the image display module 430 is configured to display the stylized image associated with the target area.
In the technical solution provided by the embodiments of the present disclosure, a mask image of the target area is set for the received image to be processed, providing auxiliary information for the stylization of the image to be processed so as to distinguish the target area from the non-target area. A pre-trained stylization processing system with stylization capability is set up in advance; the image to be processed and the mask image are processed based on the stylization processing system, and the target area and the non-target area of the image to be processed are processed differentially based on the mask image, to obtain a stylized image associated with the target area that balances the image style with the content consistency of the target area.
On the basis of the above embodiments, optionally, the stylization processing system includes an encoding model, an image reconstruction model, and an image stylization model, where the encoding model is connected to the image reconstruction model and to the image stylization model respectively, and the corresponding network layers of the image reconstruction model and of the image stylization model are connected.
On the basis of the above embodiments, optionally, the image processing module 420 includes:
an image encoding determination module, configured to input the image to be processed into the encoding model to obtain an image encoding of the image to be processed;
a feature information determination module, configured to input the image encoding into the image reconstruction model to obtain feature information of the network layers produced while the image reconstruction model processes the image encoding;
a stylized image determination module, configured to input the image encoding and the mask image into the input end of the image stylization model, and to input the feature information of the network layers of the image reconstruction model into the corresponding network layers of the image stylization model, to obtain a stylized image associated with the target area.
On the basis of the above embodiments, optionally, a network layer of the image stylization model generates the initial feature information of the current network layer based on the feature information fed in from the preceding network layer, fuses the initial feature information of the current network layer with the feature information fed in from the corresponding network layer of the image reconstruction model based on the mask image to obtain the target feature information of the current network layer, and feeds the target feature information of the current network layer into the next network layer, until the last network layer of the image stylization model outputs the stylized image associated with the target area.
On the basis of the above embodiments, optionally, a network layer of the image stylization model performs, based on a first weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that lie inside the target area, to obtain a first fused feature; performs, based on a second weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that lie outside the target area, to obtain a second fused feature; and obtains the target feature information of the current network layer based on the first fused feature and the second fused feature.
On the basis of the above embodiments, optionally, the apparatus further includes:
a target area image extraction module, configured to extract the target area from the image to be processed to obtain a target area image;
a local stylized image generation module, configured to input the target area image into the stylization processing system to obtain a local stylized image of the target area;
an image fusion module, configured to fuse the stylized image associated with the target area with the local stylized image to obtain a target stylized image;
the image display module 430 being configured to display the target stylized image.
On the basis of the above embodiments, optionally, the apparatus further includes:
an image reconstruction model training module, configured to train the image reconstruction model to be trained and a discriminative network model based on random data and sample images, to obtain the trained image reconstruction model.
On the basis of the above embodiments, optionally, the apparatus further includes:
an encoding model training module, configured to iteratively perform the following training process until the training condition is met, to obtain the trained encoding model: inputting a sample image into the encoding model to be trained to obtain a training image encoding; inputting the training image encoding into the trained image reconstruction model to obtain a reconstructed image; and adjusting the model parameters of the encoding model based on the sample image and the reconstructed image.
On the basis of the above embodiments, optionally, the apparatus further includes:
an image stylization model training module, configured to: initialize the parameters of the image stylization model based on the model parameters of the image reconstruction model; and train the initialized image stylization model to be trained and a discriminative network model based on random data and stylized sample images, to obtain the trained image stylization model.
On the basis of the above embodiments, optionally, the image to be processed is an image including a face area, and the target area is the face area;
the image processing module 420 is configured to process the image to be processed that includes the face area and the mask image of the face area based on the stylization processing system, to obtain a stylized image associated with the face area.
On the basis of the above embodiments, optionally, the apparatus further includes:
a mobile-side model training module, configured to determine the image to be processed and the stylized image as an image pair in the training samples, and to train an end-to-end mobile-side network model based on multiple image pairs, to obtain an end-to-end stylization network model.
The image processing apparatus provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has the functional modules and effects corresponding to executing the method.
The multiple units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the names of the multiple functional units are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the embodiments of the present disclosure.
Figure 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. Referring now to Figure 10, it shows a schematic structural diagram of an electronic device (e.g., the terminal device or server in Figure 10) 500 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (Portable Android Device, PAD), a portable media player (PMP), and a vehicle-mounted terminal (e.g., a vehicle-mounted navigation terminal), as well as fixed terminals such as a digital television (TV) and a desktop computer. The electronic device shown in Figure 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Figure 10, the electronic device 500 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following apparatuses may be connected to the I/O interface 505: an input apparatus 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 508 including, for example, a magnetic tape and a hard disk; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 10 shows the electronic device 500 with multiple apparatuses, it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed.
According to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 509, or installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the above functions defined in the methods of the embodiments of the present disclosure are executed.
The names of the messages or information exchanged between the multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The electronic device provided by the embodiments of the present disclosure belongs to the same concept as the image processing method provided by the above embodiments; technical details not exhaustively described in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
The embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when the computer program is executed by a processor, the image processing method provided by the above embodiments is implemented.
The computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. The computer-readable storage medium may include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including: an electric wire, an optical cable, radio frequency (RF), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as the HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device is caused to:
receive an image to be processed and a mask image of the target area in the image to be processed; process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area; and display the stylized image associated with the target area.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on a user computer, partly on a user computer, as a standalone software package, partly on a user computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user computer through any kind of network, including a LAN or a WAN, or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented with a dedicated hardware-based system that executes the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the name of a unit does not, in one case, constitute a limitation on the unit itself.
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. The machine-readable storage medium includes an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM, a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. The storage medium may be a non-transitory storage medium.
According to one or more embodiments of the present disclosure, [Example 1] provides an image processing method, including:
receiving an image to be processed and a mask image of the target area in the image to be processed; processing the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area; and displaying the stylized image associated with the target area.
According to one or more embodiments of the present disclosure, [Example 2] provides an image processing method, further including:
the stylization processing system including an encoding model, an image reconstruction model, and an image stylization model, where the encoding model is connected to the image reconstruction model and to the image stylization model respectively, and the corresponding network layers of the image reconstruction model and of the image stylization model are connected.
According to one or more embodiments of the present disclosure, [Example 3] provides an image processing method, further including:
processing the image to be processed and the mask image based on the stylization processing system to obtain the stylized image associated with the target area, including: inputting the image to be processed into the encoding model to obtain an image encoding of the image to be processed; inputting the image encoding into the image reconstruction model to obtain feature information of the network layers produced while the image reconstruction model processes the image encoding; and inputting the image encoding and the mask image into the input end of the image stylization model, and inputting the feature information of the network layers of the image reconstruction model into the corresponding network layers of the image stylization model, to obtain the stylized image associated with the target area.
According to one or more embodiments of the present disclosure, [Example 4] provides an image processing method, further including:
a network layer of the image stylization model generating the initial feature information of the current network layer based on the image encoding or on the target feature information output by the preceding network layer, fusing the initial feature information of the current network layer with the feature information fed in from the corresponding network layer of the image reconstruction model based on the mask image to obtain the target feature information of the current network layer, and feeding the target feature information of the current network layer into the next network layer, until the last network layer of the image stylization model outputs the stylized image associated with the target area.
According to one or more embodiments of the present disclosure, [Example 5] provides an image processing method, further including: a network layer of the image stylization model performing, based on a first weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that lie inside the target area, to obtain a first fused feature; performing, based on a second weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that lie outside the target area, to obtain a second fused feature; and obtaining the target feature information of the current network layer based on the first fused feature and the second fused feature.
According to one or more embodiments of the present disclosure, [Example 6] provides an image processing method, further including:
the method further including: extracting the target area from the image to be processed to obtain a target area image; inputting the target area image into the stylization processing system to obtain a local stylized image of the target area; and fusing the stylized image associated with the target area with the local stylized image to obtain a target stylized image;
accordingly, displaying the stylized image associated with the target area including: displaying the target stylized image.
According to one or more embodiments of the present disclosure, [Example 7] provides an image processing method, further including:
the training process of the image reconstruction model including: training the image reconstruction model to be trained and a discriminative network model based on random data and sample images, to obtain the trained image reconstruction model.
According to one or more embodiments of the present disclosure, [Example 8] provides an image processing method, further including:
the training process of the encoding model including: iteratively performing the following training process until the training condition is met, to obtain the trained encoding model: inputting a sample image into the encoding model to be trained to obtain a training image encoding; inputting the training image encoding into the trained image reconstruction model to obtain a reconstructed image; and adjusting the model parameters of the encoding model based on the sample image and the reconstructed image.
According to one or more embodiments of the present disclosure, [Example 9] provides an image processing method, further including: the training method of the image stylization model including: initializing the parameters of the image stylization model based on the model parameters of the image reconstruction model; and training the initialized image stylization model to be trained and a discriminative network model based on random data and stylized sample images, to obtain the trained image stylization model.
According to one or more embodiments of the present disclosure, [Example 10] provides an image processing method, further including: the image to be processed being an image including a face area, and the target area being the face area;
processing the image to be processed and the mask image based on the stylization processing system to obtain the stylized image associated with the target area, including: processing the image to be processed that includes the face area and the mask image of the face area based on the stylization processing system, to obtain a stylized image associated with the face area.
According to one or more embodiments of the present disclosure, [Example 11] provides an image processing method, further including:
the method further including: determining the image to be processed and the stylized image as an image pair in the training samples, and training an end-to-end mobile-side network model based on multiple image pairs, to obtain an end-to-end stylization network model.
According to one or more embodiments of the present disclosure, [Example 12] provides an image processing apparatus, including:
an image receiving module, configured to receive an image to be processed and a mask image of the target area in the image to be processed;
an image processing module, configured to process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area;
an image display module, configured to display the stylized image associated with the target area.
In addition, although multiple operations are depicted in a particular order, this should not be understood as requiring that these operations be executed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although multiple implementation details are included in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Some features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, multiple features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (14)

  1. An image processing method, comprising:
    receiving an image to be processed and a mask image of a target area in the image to be processed;
    processing the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area;
    displaying the stylized image associated with the target area.
  2. The method according to claim 1, wherein the stylization processing system comprises an encoding model, an image reconstruction model, and an image stylization model, wherein the encoding model is connected to the image reconstruction model and to the image stylization model respectively, and corresponding network layers of the image reconstruction model and of the image stylization model are connected.
  3. The method according to claim 2, wherein processing the image to be processed and the mask image based on the stylization processing system to obtain the stylized image associated with the target area comprises:
    inputting the image to be processed into the encoding model to obtain an image encoding of the image to be processed;
    inputting the image encoding into the image reconstruction model to obtain feature information of network layers produced while the image reconstruction model processes the image encoding;
    inputting the image encoding and the mask image into an input end of the image stylization model, and inputting the feature information of the network layers of the image reconstruction model into corresponding network layers of the image stylization model, to obtain the stylized image associated with the target area.
  4. The method according to claim 3, wherein a network layer of the image stylization model generates initial feature information of the current network layer based on the image encoding or on target feature information output by the preceding network layer, fuses the initial feature information of the current network layer with the feature information fed in from the corresponding network layer of the image reconstruction model based on the mask image to obtain target feature information of the current network layer, and inputs the target feature information of the current network layer into the next network layer, until the last network layer of the image stylization model outputs the stylized image associated with the target area.
  5. The method according to claim 4, wherein a network layer of the image stylization model performs, based on a first weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that respectively lie inside the target area, to obtain a first fused feature; performs, based on a second weight group, feature fusion on the parts of the initial feature information and of the feature information fed in from the corresponding network layer of the image reconstruction model that respectively lie outside the target area, to obtain a second fused feature; and obtains the target feature information of the current network layer based on the first fused feature and the second fused feature.
  6. The method according to claim 1, further comprising:
    extracting the target area from the image to be processed to obtain a target area image;
    inputting the target area image into the stylization processing system to obtain a local stylized image of the target area;
    fusing the stylized image associated with the target area with the local stylized image to obtain a target stylized image;
    wherein displaying the stylized image associated with the target area comprises:
    displaying the target stylized image.
  7. The method according to claim 2, wherein the training process of the image reconstruction model comprises:
    training the image reconstruction model to be trained and a discriminative network model based on random data and sample images, to obtain the trained image reconstruction model.
  8. The method according to claim 2, wherein the training process of the encoding model comprises:
    iteratively performing the following training process until a training condition is met, to obtain the trained encoding model:
    inputting a sample image into the encoding model to be trained to obtain a training image encoding;
    inputting the training image encoding into the trained image reconstruction model to obtain a reconstructed image;
    adjusting model parameters of the encoding model based on the sample image and the reconstructed image.
  9. The method according to claim 2, wherein the training method of the image stylization model comprises:
    initializing parameters of the image stylization model based on model parameters of the image reconstruction model;
    training the initialized image stylization model to be trained and a discriminative network model based on random data and stylized sample images, to obtain the trained image stylization model.
  10. The method according to claim 1, wherein the image to be processed is an image comprising a face area, and the target area is the face area;
    wherein processing the image to be processed and the mask image based on the stylization processing system to obtain the stylized image associated with the target area comprises:
    processing the image to be processed comprising the face area and the mask image of the face area based on the stylization processing system, to obtain a stylized image associated with the face area.
  11. The method according to claim 1, further comprising:
    determining the image to be processed and the stylized image as an image pair in training samples, and training an end-to-end mobile-side network model based on multiple image pairs, to obtain an end-to-end stylization network model.
  12. An image processing apparatus, comprising:
    an image receiving module, configured to receive an image to be processed and a mask image of a target area in the image to be processed;
    an image processing module, configured to process the image to be processed and the mask image based on a stylization processing system to obtain a stylized image associated with the target area;
    an image display module, configured to display the stylized image associated with the target area.
  13. An electronic device, comprising:
    one or more processors;
    a storage apparatus, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the image processing method according to any one of claims 1-11.
  14. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to execute the image processing method according to any one of claims 1-11.
PCT/CN2023/097319 2022-06-02 2023-05-31 Image processing method and apparatus, storage medium, and electronic device WO2023232056A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210625667.5A CN114913061A (zh) 2022-06-02 2022-06-02 Image processing method and apparatus, storage medium, and electronic device
CN202210625667.5 2022-06-02

Publications (1)

Publication Number Publication Date
WO2023232056A1 true WO2023232056A1 (zh) 2023-12-07

Family

ID=82771482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097319 WO2023232056A1 (zh) 2022-06-02 2023-05-31 Image processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN114913061A (zh)
WO (1) WO2023232056A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913061A (zh) * 2022-06-02 2022-08-16 北京字跳网络技术有限公司 Image processing method and apparatus, storage medium, and electronic device
CN116862757A (zh) * 2023-05-19 2023-10-10 上海任意门科技有限公司 Method, apparatus, electronic device, and medium for controlling the degree of face stylization


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150170002A1 (en) * 2013-05-31 2015-06-18 Google Inc. Object detection using deep neural networks
CN112424834A (zh) * 2018-08-01 2021-02-26 Oppo广东移动通信有限公司 Method and device for image processing
CN109712068A (zh) * 2018-12-21 2019-05-03 云南大学 Image style transfer and simulation method for gourd pyrography
CN114913061A (zh) * 2022-06-02 2022-08-16 北京字跳网络技术有限公司 Image processing method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN114913061A (zh) 2022-08-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23815234

Country of ref document: EP

Kind code of ref document: A1