CN116229054A - Image processing method and device and electronic equipment

Info

Publication number
CN116229054A
CN116229054A
Authority
CN
China
Prior art keywords
image
target
original image
class object
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211594249.0A
Other languages
Chinese (zh)
Inventor
朱渊略
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202211594249.0A
Publication of CN116229054A
Priority to PCT/CN2023/134020 (WO2024125267A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Color Image Communication Systems (AREA)
  • Facsimile Image Signal Circuits (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing method, an image processing device and electronic equipment. A specific implementation of the method comprises: acquiring an original image to be processed, the original image comprising a first class object that corresponds to a first region in the original image; determining a mask image corresponding to the original image based on the first region; segmenting the original image to obtain a segmented image comprising at least part of preset category semantics; and obtaining a target image based on the original image, the mask image and the segmented image, the target image including a second class object converted from the first class object in the original image. With this embodiment, the edges of the target object converted from the specified object in the target image are clearer, texture details are richer, the image is more realistic, the display effect is improved, and the problem of abnormal color is resolved.

Description

Image processing method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to an image processing method, an image processing device and electronic equipment.
Background
Artificial intelligence technology is increasingly applied in the image field, and is often used to convert a specified object in an original image into a target object, so as to obtain a target image that includes the target object. At present, target images obtained by processing an original image with the related technology suffer from problems such as unclear edges of the target object, poor display effect of the processed area, abnormal colors, and missing texture details. An improved scheme for converting a specified object in an original image into a target object is therefore needed.
Disclosure of Invention
The disclosure provides an image processing method, an image processing device and electronic equipment.
According to a first aspect, there is provided an image processing method, the method comprising:
acquiring an original image to be processed; the original image comprises a first class object; the first class object corresponds to a first region in the original image;
determining a mask image corresponding to the original image based on the first region;
segmenting the original image to obtain a segmented image comprising at least part of preset category semantics;
obtaining a target image based on the original image, the mask image and the segmentation image; the target image includes a second class object transformed from the first class object in the original image.
According to a second aspect, there is provided an image processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an original image to be processed; the original image comprises a first class object; the first class object corresponds to a first region in the original image;
the determining module is used for determining a mask image corresponding to the original image based on the first area;
the segmentation module is used for carrying out segmentation processing on the original image to obtain a segmented image comprising at least part of preset category semantics;
the processing module is used for obtaining a target image based on the original image, the mask image and the segmentation image; the target image includes a second class object transformed from the first class object in the original image.
According to a third aspect, there is provided a computer readable storage medium storing a computer program which when executed by a processor implements the method of any one of the first aspects.
According to a fourth aspect, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the first aspects when executing the program.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
the embodiment of the disclosure provides an image processing method and device, which combine a mask image and a segmentation image corresponding to an original image, and convert a first class object appointed in the original image into a second class object to obtain a target image. Because the specified object in the original image is converted, only the partial area corresponding to the specified object needs to be changed greatly. Therefore, the edge of the target object converted from the appointed object in the target image is clearer, the texture details are richer, the image is more real, the display effect is improved, and the problem of abnormal color is solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present disclosure, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of a scenario of an image processing illustrated in accordance with an exemplary embodiment of the present disclosure;
FIG. 2 is a training scenario schematic of an image processing model shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of a training method of an image processing model according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
FIG. 7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure;
fig. 8 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
Artificial intelligence technology is increasingly applied in the image field, and is often used to convert a specified object in an original image into a target object, so as to obtain a target image that includes the target object. For example, the hairstyle of a person in a person image is changed (e.g., long hair is changed to short hair), or the person in the person image is otherwise changed, etc. At present, target images obtained by processing an original image with the related technology suffer from problems such as unclear edges of the target object, poor display effect of the processed area, abnormal colors, and missing texture details. An improved scheme for converting a specified object in an original image into a target object is therefore needed.
In the image processing scheme of the disclosure, the mask image and the segmented image corresponding to an original image are combined to convert a specified first class object in the original image into a second class object, thereby obtaining a target image. Because only the specified object in the original image is converted, only the local area corresponding to that object needs to change significantly. Therefore, the edges of the target object in the target image are clearer, texture details are richer, the image is more realistic, the display effect is improved, and the problem of abnormal color is resolved.
Referring to fig. 1, a schematic view of an image processing scene is shown according to an exemplary embodiment. The scheme of the present disclosure is illustrated below with reference to fig. 1 through a complete application example, which describes a specific image processing procedure.
As shown in fig. 1, the original image A1 is the image to be processed and includes an object m. A mask image B1 corresponding to the original image A1 may be generated according to the region S corresponding to the object m in the original image A1. For example, the values of the pixels in the region S may be set to 1, and the values of the pixels in the region S' outside the region S may be set to 0. Meanwhile, semantic analysis can be performed on the original image A1 to obtain a segmented image C1 corresponding to the original image A1. The original image A1, the mask image B1 and the segmented image C1 are subjected to a merging process, for example a stacking process, to obtain merged data D1. The number of channels of the merged data D1 is the sum of the numbers of channels of the original image A1, the mask image B1 and the segmented image C1. For example, if the number of channels of the original image A1 is a, the number of channels of the mask image B1 is b, and the number of channels of the segmented image C1 is c, the number of channels of the merged data D1 is a+b+c.
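The channel-wise merging described above can be sketched as follows (Python with PyTorch). This is only an illustration: the tensor shapes, the channel counts a=3, b=1, c=8 and the variable names are assumptions made for the example and are not prescribed by the disclosure.

    import torch

    # Illustrative shapes only; the disclosure does not fix the image size or channel counts.
    a, b, c, H, W = 3, 1, 8, 512, 512
    original = torch.rand(1, a, H, W)                        # original image A1
    mask = torch.randint(0, 2, (1, b, H, W)).float()         # mask image B1 (region S = 1, elsewhere 0)
    segmentation = torch.rand(1, c, H, W)                    # segmented image C1 (one channel per semantic class)

    # "Merging" as channel-wise stacking: the channel count of D1 is a + b + c.
    merged = torch.cat([original, mask, segmentation], dim=1)
    assert merged.shape[1] == a + b + c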
The merged data D1 may then be input into an image processing network, which processes the multi-channel merged data D1. When the merged data D1 is processed, the same or different convolution kernels may be applied in the shallow layers of the image processing network to convolve each channel of the merged data D1, and the convolution results of the channels may then be summed. In this way, the information in the mask image B1 and the segmented image C1 guides the processing of the image.
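Continuing the sketch above, convolving each channel and summing the results is what a standard multi-channel convolution layer performs; the number of output filters (64) and the kernel size are assumptions.

    import torch.nn as nn

    # A shallow convolution over all a+b+c input channels: each output feature map sums the
    # per-channel convolution results, so the mask and segmentation channels can guide
    # how the image channels are processed.
    shallow_conv = nn.Conv2d(in_channels=a + b + c, out_channels=64, kernel_size=3, padding=1)
    features = shallow_conv(merged)   # shape: (1, 64, H, W)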
Finally, the image processing network may output image data having a+b+c channels, where the data of a channels can constitute the target image A2, the data of b channels can constitute the mask image B2, and the data of the remaining c channels can constitute the segmented image C2. The target image A2 includes an object n, which is converted from the object m included in the original image A1. The mask image B2 and the segmented image C2 both correspond to the target image A2 and describe the object n. The target image A2 can thus be extracted from the image data with a+b+c channels output by the image processing network.
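Assuming the network output keeps the same a+b+c channel layout as its input, the target image, mask and segmentation can be recovered by splitting channels, for example:

    # Placeholder for the network output; in practice this is produced by the image processing network.
    output = torch.rand(1, a + b + c, H, W)
    target_image, output_mask, output_segmap = torch.split(output, [a, b, c], dim=1)
    # target_image ~ A2, output_mask ~ B2 (mask of object n), output_segmap ~ C2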
Referring to fig. 2, a training scenario diagram of an image processing model is shown according to an exemplary embodiment. The scheme of the present disclosure is schematically illustrated below with reference to fig. 2 in conjunction with a complete specific application example. The application example describes a training process for a specific image processing model.
As shown in fig. 2, first, a sample image A3, a sample mask image B3 and a sample segmented image C3 corresponding to the sample image A3, a label image A5, and a label mask image B5 and a label segmented image C5 corresponding to the label image A5 may be acquired from a training set created in advance. The sample image A3, the sample mask image B3 and the sample segmented image C3 are combined to obtain merged data D2. The number of channels of the merged data D2 is the sum of the numbers of channels of the sample image A3, the sample mask image B3 and the sample segmented image C3. For example, if the number of channels of the sample image A3 is a, the number of channels of the sample mask image B3 is b, and the number of channels of the sample segmented image C3 is c, the number of channels of the merged data D2 is a+b+c. The merged data D2 may be input into the image processing network to be trained, which processes the multi-channel merged data D2.
Then, the image processing network to be trained may output image data with a+b+c channels, where the data of a channels can constitute a predicted image A4, the data of b channels can constitute a predicted mask image B4, and the data of the remaining c channels can constitute a predicted segmented image C4. The predicted image A4, the predicted mask image B4 and the predicted segmented image C4 may be obtained from the image data with a+b+c channels output by the image processing network to be trained. The predicted image A4 and the label image A5 are then each input into a discriminator P to be trained, which can be used to judge whether the predicted image A4 and the label image A5 are real or fake.
Finally, based on the predicted image A4, the predicted mask image B4, the predicted segmented image C4 and the result output by the discriminator P, the network parameters of the image processing network to be trained and of the discriminator P may be adjusted in turn with reference to the label image A5, the label mask image B5 and the label segmented image C5. Specifically, in the training stage of the discriminator, the network parameters of the discriminator P can be adjusted based on the result output by the discriminator P, so that its judgment of whether an image is real or fake becomes increasingly accurate.
In the training stage of the image processing network, the prediction loss may be determined based on the predicted image A4, the predicted mask image B4, the predicted segmented image C4, the result output by the discriminator P, the label image A5, the label mask image B5 and the label segmented image C5. The prediction loss may be composed of loss terms L1, L2, L3, L4, L5, L6, L7, etc., and the network parameters of the image processing network are adjusted with the aim of reducing the prediction loss.
The loss terms L1, L2 and L3 are determined based on the predicted image A4 and the label image A5. The loss term L1 can represent the difference in image features between the predicted image A4 and the label image A5, and may specifically be determined based on the mean absolute error of the pixel values of the predicted image A4 and the label image A5. The loss term L2 can represent the difference in visual perception between the predicted image A4 and the label image A5, and the loss term L3 can represent the difference in image style between the predicted image A4 and the label image A5. Specifically, the predicted image A4 and the label image A5 may each be input into a convolutional neural network trained in advance to obtain two feature maps, and the loss term L2 is obtained by computing the difference between the two feature maps. The loss term L3 is obtained by computing the difference between the Gram matrices corresponding to the two feature maps.
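A minimal sketch of these three image losses is given below. The use of a VGG-16 feature extractor is an assumption; the disclosure only refers to a convolutional neural network trained in advance, without naming it.

    import torch
    import torch.nn.functional as F
    import torchvision

    # Assumed pretrained feature extractor (the disclosure does not name one).
    vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()

    def gram(feat):
        # Gram matrix of a feature map, used for the style difference (loss term L3).
        n, ch, h, w = feat.shape
        f = feat.reshape(n, ch, h * w)
        return f @ f.transpose(1, 2) / (ch * h * w)

    def image_losses(pred, label):
        l1 = F.l1_loss(pred, label)                      # L1: mean absolute pixel error
        with torch.no_grad():
            f_label = vgg(label)
        f_pred = vgg(pred)
        l2 = F.l1_loss(f_pred, f_label)                  # L2: perceptual (feature) difference
        l3 = F.l1_loss(gram(f_pred), gram(f_label))      # L3: style difference via Gram matrices
        return l1, l2, l3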
The loss term L4 is determined based on the difference of the target area M between the predicted image A4 and the label image A5, where the target area M is the area in which the sample mask image B3 and the label mask image B5 differ. The loss term L4 can represent the difference in color of the target area M between the predicted image A4 and the label image A5.
The loss term L5 is determined based on the predicted mask image B4 and the label mask image B5, and may be a weighted sum of the binary cross entropy loss and the region mutual information loss between the predicted mask image B4 and the label mask image B5. The loss term L6 is determined based on the predicted segmented image C4 and the label segmented image C5, and may be a weighted sum of the binary cross entropy loss and the region mutual information loss between the predicted segmented image C4 and the label segmented image C5. The loss term L7 is determined based on the result output by the discriminator P, and may be a loss term representing the generative adversarial network loss. It will be appreciated that the prediction loss may also include other loss terms, and the embodiment is not limited in this respect.
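The weighted combination used for the loss terms L5 and L6 can be sketched as follows; the weights and the region mutual information implementation (rmi_loss) are assumptions, since the disclosure does not specify them.

    import torch.nn.functional as F

    def mask_or_seg_loss(pred, label, w_bce=1.0, w_rmi=1.0, rmi_loss=None):
        # Weighted sum of binary cross entropy and a region mutual information term,
        # as used for L5 (mask images) and L6 (segmented images).
        bce = F.binary_cross_entropy(pred.clamp(1e-6, 1 - 1e-6), label)
        rmi = rmi_loss(pred, label) if rmi_loss is not None else 0.0   # hypothetical RMI callable
        return w_bce * bce + w_rmi * rmi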
The present disclosure will be described in detail with reference to specific embodiments.
Fig. 3 is a flowchart illustrating an image processing method according to an exemplary embodiment. The execution body of the method may be implemented as any device, platform, server or cluster of devices having computing, processing capabilities. The method comprises the following steps:
As shown in fig. 3, in step 301, an original image to be processed is acquired.
In this embodiment, the original image is the image to be processed and includes a first class object. Processing the original image according to this embodiment may mean converting the first class object included in the original image into a second class object. For example, in one scenario, the original image may be a person image, the object to be converted in the original image may be the person's hair, and the first class object may be, for example, the long hair of the person in the original image. Processing the original image may be converting the long hair included in the original image into short hair, resulting in the target image, where the target image includes the short hair of the person after the conversion (i.e., the second class object).
For another example, in another scenario, the original image may be a whole-body image of a person, the object to be converted in the original image may be the person's clothing, and the first class object may be, for example, a long skirt worn by the person in the original image. Processing the original image may be converting the long skirt included in the original image into a short skirt, resulting in the target image, where the target image may include the short skirt worn by the person after the conversion (i.e., the second class object). It can be appreciated that the present solution may also be applied in other scenarios, and the embodiment is not limited to a specific application scenario.
In step 302, a mask image is determined based on the first region corresponding to the first class object in the original image.
In this embodiment, the first class object corresponds to the first region in the original image, and a mask image corresponding to the original image may be generated according to the first region, where the mask image may be a single-channel binary image. For example, the values of the pixels in the first region may be set to 1 or 255, and the values of the pixels in the other regions outside the first region may be set to 0. Thus, in a scene where the object to be converted is a person's hair, the mask image corresponding to the original image appears with the hair area white and the other areas black. In a scene where the object to be converted is a person's clothing, the mask image corresponding to the original image appears with the clothing area white and the other areas black.
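A minimal sketch of building such a single-channel binary mask (NumPy); the function name and the choice of 255 for the foreground are illustrative assumptions.

    import numpy as np

    def make_mask(first_region):
        # first_region: HxW boolean array, True where the first class object (e.g. hair) lies.
        mask = np.zeros(first_region.shape, dtype=np.uint8)
        mask[first_region] = 255    # the disclosure allows either 1 or 255 for the first region
        return mask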
In step 303, the original image is subjected to a segmentation process to obtain a segmented image.
In this embodiment, for example, semantic segmentation processing may be performed on the original image to obtain a segmented image comprising at least part of preset category semantics. Alternatively, the original image may be processed using a semantic segmentation model that is trained based on images including the preset category semantics. For example, the semantics of n preset categories may be set in advance according to the scene and the need, and the semantic segmentation model may then be trained based on the semantics of the n preset categories and images including the semantics of the n preset categories. The semantic segmentation model is then used to perform semantic segmentation on the original image, and the obtained segmented image includes any number of the n preset categories of semantics. For example, in a scenario where the object to be converted is a person's hair, the semantics of the n preset categories may include, but are not limited to: background, hair, face, clothing, body skin, hands, glasses, hats, etc.
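Any semantic segmentation model producing per-pixel scores over the n preset categories can be used here; the sketch below only shows how such scores could be turned into an n-channel segmented image, and the function and variable names are assumptions.

    import torch
    import torch.nn.functional as F

    def to_segmented_image(class_logits, n_classes):
        # class_logits: (1, n_classes, H, W) scores from a semantic segmentation model trained
        # on the n preset categories (background, hair, face, clothing, ...).
        labels = class_logits.argmax(dim=1)                              # per-pixel class index
        return F.one_hot(labels, n_classes).permute(0, 3, 1, 2).float()  # (1, n_classes, H, W)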
In step 304, a target image is obtained based on the original image, the mask image, and the segmented image.
In this embodiment, the first class object included in the original image may be converted into the second class object based on the original image, the mask image and the segmented image to obtain the target image. Alternatively, the original image, the mask image and the segmented image may be converted using a pre-trained target model to obtain the target image. The target model may be a deep convolutional neural network model; alternatively, the target model may be a U2-Net (u2net) type model. The target model may be trained by conventional supervised learning, or may be trained by a supervised generative adversarial network; it will be appreciated that the embodiment is not limited in this respect.
Specifically, the original image, the mask image and the segmented image may be combined to obtain combined data, so that the number of channels of the combined data is the sum of the numbers of channels of the original image, the mask image and the segmented image. The merging process may be a channel-wise merging process (such as a stacking process). For example, the original image may be a 3-channel RGB image, the mask image may be a 1-channel binary image, and the segmented image may be an n-channel semantic segmentation image (n being the number of preset semantic categories), so that (4+n)-channel combined data is obtained after the merging process. The merging process does not change the information of any channel. The combined data may be input into the target model for the conversion processing to obtain the target image.
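The whole of step 304 can be summarized by the following sketch; the target model's interface (one merged tensor in, one target image out) is an assumption made for illustration.

    def process(original_rgb, mask_1ch, seg_nch, target_model):
        # Channel-wise merge (3 + 1 + n channels) followed by the conversion pass.
        merged = torch.cat([original_rgb, mask_1ch, seg_nch], dim=1)   # (1, 4+n, H, W)
        return target_model(merged)                                    # target image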
According to the image processing method of this embodiment, the mask image and the segmented image corresponding to an original image are combined to convert a specified first class object in the original image into a second class object, thereby obtaining a target image. Because only the specified object in the original image is converted, only the local area corresponding to that object needs to change significantly. Therefore, the edges of the target object converted from the specified object in the target image are clearer, texture details are richer, the image is more realistic, the display effect is improved, and the problem of abnormal color is resolved.
FIG. 4 is a flowchart illustrating a method of training an image processing model, according to an exemplary embodiment. The model may be the object model for processing the image as referred to in the embodiment of fig. 3. The execution body of the method may be implemented as any device, platform, server or cluster of devices having computing, processing capabilities. The method may comprise iteratively performing the steps of:
As shown in fig. 4, in step 401, a sample image, a sample mask image, a sample segmented image, a label image, a label mask image, and a label segmented image are acquired.
In this embodiment, a large number of candidate images for training may be acquired in advance, each candidate image including the first class object, and a candidate label image may be acquired for each candidate image. The candidate label image corresponding to any candidate image includes a second class object converted from the first class object in that candidate image. For example, in a scene where the object to be converted is a person's hair, if the first class object is the person's long hair and the second class object is the person's short hair, each candidate image includes long hair, and the candidate label image corresponding to each candidate image includes the short hair converted from the long hair in that candidate image. For another example, in a scene where the object to be converted is a person's clothing, if the first class object is a person's long skirt and the second class object is a person's short skirt, each candidate image includes a long skirt, and the candidate label image corresponding to each candidate image includes the short skirt converted from the long skirt in that candidate image.
It will be appreciated that the candidate label image corresponding to a candidate image may be obtained in any reasonable manner. For example, the candidate image may be manually modified to obtain the corresponding candidate label image. The candidate image may also be processed by another model with similar functionality but inferior results and then manually refined to obtain the corresponding candidate label image. The embodiment is not limited as to the specific manner of acquiring the candidate label images.
Then, a mask image and a segmented image corresponding to each candidate image are acquired (for the manner of acquisition, reference may be made to the manner of acquiring the mask image and segmented image corresponding to the original image in the embodiment of fig. 3), and a mask image and a segmented image corresponding to each candidate label image are acquired. The training set is then constructed from the candidate images, the mask images and segmented images corresponding to the candidate images, the candidate label images, and the mask images and segmented images corresponding to the candidate label images.
When the target model is trained, a candidate image can be taken from the training set as the sample image, the sample mask image, the sample segmented image and the label image corresponding to the sample image are acquired, and the label mask image and the label segmented image corresponding to the label image are acquired. The sample image includes a first class object, the first class object corresponds to a second region in the sample image, and the sample mask image is determined based on the second region. The label image is an image obtained by converting the first class object in the sample image into a second class object, the second class object corresponds to a third region in the label image, and the label mask image is determined based on the third region.
In step 402, a predicted image, a predicted mask image, and a predicted segmented image are obtained based on the sample image, the sample mask image, the sample segmented image, and the target model to be trained.
In this embodiment, the sample image, the sample mask image and the sample segmented image may be subjected to a merging process (such as a stacking process) to obtain merged sample data. The number of channels of the merged sample data is the sum of the numbers of channels of the sample image, the sample mask image and the sample segmented image. The merged sample data is then input into the target model to be trained, and the target model to be trained can output a predicted image, a predicted mask image corresponding to the predicted image, and a predicted segmented image. The predicted image includes a second class object converted from the first class object in the sample image.
The second class object included in the label image and the second class object included in the predicted image are not completely identical, even though both are converted from the first class object included in the sample image. The label image serves as a reference image against which the predicted image is compared, and has a better display effect.
In step 403, a prediction loss is determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image, and the label segmented image, and in step 404, model parameters of the target model to be trained are adjusted with the goal of reducing the prediction loss.
In this embodiment, the prediction loss may be determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image and the label segmented image. The prediction loss may include a first loss term, a second loss term and a third loss term. The first loss term is determined based on the predicted image and the label image; it may be a loss term representing the image feature difference between the predicted image and the label image, and may specifically be determined based on the mean absolute error between the predicted image and the label image. The second loss term is determined based on the predicted mask image and the label mask image, and may be a weighted sum of the binary cross entropy loss and the region mutual information loss between the predicted mask image and the label mask image. The third loss term is determined based on the predicted segmented image and the label segmented image, and may be a weighted sum of the binary cross entropy loss and the region mutual information loss between the predicted segmented image and the label segmented image.
Optionally, the prediction loss may further include a fourth loss term, which is determined based on the difference of a target region between the predicted image and the label image. The target region is the region in which the sample mask image and the label mask image differ. Specifically, the target region may be determined by computing the absolute value of the difference between the pixel values of corresponding pixels of the sample mask image and the label mask image. Because the fourth loss term used to guide the adjustment of the model parameters is obtained based on the target region, the trained target model processes the specified object in the original image more accurately and in a more targeted way.
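Determining the target region from the two mask images can be sketched as follows; the threshold value is an assumption for binary masks whose pixel values differ where the regions differ.

    import torch

    def target_region(sample_mask, label_mask, threshold=0.5):
        # Region where the sample mask and the label mask differ; the fourth loss term
        # is computed only over this region.
        return (sample_mask - label_mask).abs() > threshold    # boolean map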
Alternatively, the fourth loss term may be determined as follows: a first difference value of the target region between the predicted image and the label image in a first color space is determined, a second difference value of the target region between the predicted image and the label image in a second color space is determined, and the fourth loss term is determined based on a weighted sum of the first difference value and the second difference value. The first color space may be, for example, the RGB color space, and the second color space may be, for example, the LAB color space. The first difference value may be the mean absolute error of the target region between the predicted image and the label image in the RGB color space, and the second difference value may be the mean absolute error of the target region between the predicted image and the label image in the LAB color space. The weight corresponding to the first difference value is greater than the weight corresponding to the second difference value. Because the loss term used to guide the adjustment of the model parameters is obtained based on a weighted sum of the differences of the target region between the predicted image and the label image in different color spaces, the color effect of the image obtained when the trained target model processes the original image is better.
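A sketch of the fourth loss term under these choices is given below. The kornia rgb_to_lab call is used as one possible RGB-to-LAB conversion, and the weights (with the RGB weight larger, as stated above) are assumptions.

    import torch.nn.functional as F
    import kornia.color as kc

    def fourth_loss(pred, label, region, w_rgb=1.0, w_lab=0.5):
        # Weighted sum of the mean absolute errors of the target region in RGB and LAB space.
        region = region.expand_as(pred)                      # broadcast the boolean region to all channels
        rgb_diff = F.l1_loss(pred[region], label[region])
        lab_pred, lab_label = kc.rgb_to_lab(pred), kc.rgb_to_lab(label)
        lab_diff = F.l1_loss(lab_pred[region], lab_label[region])
        return w_rgb * rgb_diff + w_lab * lab_diff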
Further, the target model may alternatively be trained by a supervised generative adversarial network. In that case, the predicted image and the label image are input into a discriminator to be trained, and the discriminator is adjusted based on the result it outputs. A fifth loss term included in the prediction loss may also be determined based on the result output by the discriminator, and the fifth loss term may be a loss term representing the generative adversarial network loss. It will be appreciated that the prediction loss may also include other loss terms, and the embodiment is not limited in this respect.
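One common way to realize the discriminator adjustment and the fifth loss term is the non-saturating GAN loss sketched below; the disclosure only states that the discriminator judges real (label) versus generated (predicted) images, so the exact loss form is an assumption.

    import torch
    import torch.nn.functional as F

    def gan_losses(discriminator, pred_image, label_image):
        # Discriminator loss: label images should be judged real, predicted images fake.
        d_real = discriminator(label_image)
        d_fake = discriminator(pred_image.detach())
        d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
                 F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
        # Generator side: the fifth loss term pushes predicted images to be judged real.
        g_out = discriminator(pred_image)
        fifth_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
        return d_loss, fifth_loss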
In the process of training the target model, this embodiment combines the label image, the label mask image and the label segmented image corresponding to the sample image to obtain the prediction loss, and trains the target model based on the prediction loss. The process by which the target model converts the specified object in the original image is therefore more targeted, so that the edges of the target object in the target image are clearer, texture details are richer, the image is more realistic, the display effect is improved, and the problem of abnormal color is resolved.
It should be noted that although the operations of the methods of the embodiments of the present disclosure are described above in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, in order to achieve desirable results. Rather, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Corresponding to the foregoing image processing method embodiments, the present disclosure also provides embodiments of an image processing apparatus.
As shown in fig. 5, fig. 5 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure, which may include: an acquisition module 501, a determination module 502, a segmentation module 503 and a processing module 504.
The acquiring module 501 is configured to acquire an original image to be processed, where the original image includes a first class object, and the first class object corresponds to a first area in the original image.
A determining module 502, configured to determine, based on the first area, a mask image corresponding to the original image.
The segmentation module 503 is configured to perform segmentation processing on the original image, so as to obtain a segmented image that includes at least part of preset category semantics.
A processing module 504, configured to obtain a target image based on the original image, the mask image and the segmented image. The target image includes a second class object transformed from the first class object in the original image.
In some implementations, the segmentation module 503 is configured to process the original image using a semantic segmentation model to obtain the segmented image, the semantic segmentation model being trained based on images comprising the preset category semantics.
In other embodiments, the processing module 504 may include: a conversion sub-module (not shown).
The conversion sub-module is used for converting the original image, the mask image and the segmentation image by using the target model to obtain a target image. The conversion process is a process of converting the first-class object into the second-class object.
In other embodiments, the conversion sub-module is configured to: combine the original image, the mask image and the segmentation image to obtain combined data, the number of channels of the combined data being the sum of the numbers of channels of the original image, the mask image and the segmentation image; and input the combined data into the target model for the conversion processing to obtain the target image.
In other embodiments, the target model is trained by iteratively performing the following steps: acquiring a sample image and a label image, wherein a second region in the sample image corresponds to the first class object and a third region in the label image corresponds to the second class object; obtaining a predicted image based on the sample image and the target model to be trained; determining a prediction loss based on the predicted image and the label image, the prediction loss including a fourth loss term that is determined based on the difference of a target region between the predicted image and the label image, the target region being determined based on the second region and the third region; and adjusting model parameters of the target model to be trained with the aim of reducing the prediction loss.
In other embodiments, the fourth loss term may be determined by: determining a first difference value of the target region in the predicted image and the label image in the first color space, determining a second difference value of the target region in the predicted image and the label image in the second color space, and determining a fourth loss term based on a weighted sum of the first difference value and the second difference value.
For the device embodiments, since they essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the embodiments of the present disclosure. Those of ordinary skill in the art can understand and implement them without undue burden.
Fig. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure. As shown in fig. 6, the electronic device 910 includes a processor 911 and memory 912, which may be used to implement a client or server. Memory 912 is used to non-transitory store computer-executable instructions (e.g., one or more computer program modules). The processor 911 is operable to execute computer-executable instructions that, when executed by the processor 911, perform one or more steps of the image processing methods described above, thereby implementing the image processing methods described above. The memory 912 and the processor 911 may be interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, the processor 911 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capabilities and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be an X86 or ARM architecture, or the like. The processor 911 may be a general-purpose processor or a special-purpose processor that can control other components in the electronic device 910 to perform desired functions.
For example, memory 912 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random access memory (RAM) and/or cache memory (cache) and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 911 to implement various functions of the electronic device 910. Various applications and various data, as well as various data used and/or generated by the applications, etc., may also be stored in the computer readable storage medium.
It should be noted that, in the embodiments of the present disclosure, specific functions and technical effects of the electronic device 910 may refer to the above description about the image processing method, which is not repeated herein.
Fig. 7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 920 is suitable, for example, for implementing the image processing method provided by the embodiments of the present disclosure. The electronic device 920 may be a terminal device or the like, and may be used to implement a client or a server. The electronic device 920 may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), wearable electronic devices, and the like, and stationary terminals such as digital TVs, desktop computers, smart home devices, and the like. It should be noted that the electronic device 920 shown in fig. 7 is only an example, and does not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 920 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 921, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 922 or a program loaded from the storage apparatus 928 into a Random Access Memory (RAM) 923. In the RAM 923, various programs and data required for the operation of the electronic device 920 are also stored. The processing device 921, the ROM 922, and the RAM 923 are connected to each other through a bus 924. An input/output (I/O) interface 925 is also connected to bus 924.
In general, the following devices may be connected to the I/O interface 925: input devices 926 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 927 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 928 including, for example, magnetic tape, hard disk, etc.; and communication device 929. The communication device 929 may allow the electronic apparatus 920 to communicate wirelessly or by wire with other electronic apparatuses to exchange data. While fig. 7 shows the electronic device 920 with various means, it is to be understood that not all of the illustrated means are required to be implemented or provided, and that the electronic device 920 may alternatively be implemented or provided with more or fewer means.
For example, according to an embodiment of the present disclosure, the above-described image processing method may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program, carried on a non-transitory computer readable medium, the computer program comprising program code for performing the above-described image processing method. In such an embodiment, the computer program may be downloaded and installed from a network via the communications device 929, or from the storage device 928, or from the ROM 922. The functions defined in the image processing method provided by the embodiments of the present disclosure can be implemented when the computer program is executed by the processing device 921.
Fig. 8 is a schematic diagram of a storage medium according to some embodiments of the present disclosure. For example, as shown in FIG. 8, the storage medium 930 may be a non-transitory computer-readable storage medium for storing non-transitory computer-executable instructions 931. The image processing methods described in embodiments of the present disclosure may be implemented when the non-transitory computer-executable instructions 931 are executed by a processor, for example, one or more steps in accordance with the image processing methods described above may be performed when the non-transitory computer-executable instructions 931 are executed by a processor.
For example, the storage medium 930 may be applied to the above-described electronic device, and for example, the storage medium 930 may include a memory in the electronic device.
For example, the storage medium may include a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, or any combination of the foregoing, as well as other suitable storage media.
For example, the description of the storage medium 930 may refer to the description of the memory in the embodiment of the electronic device, and the repetition is omitted. The specific functions and technical effects of the storage medium 930 may be referred to the above description of the image processing method, and will not be repeated here.
It should be noted that in the context of this disclosure, a computer-readable medium can be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. An image processing method, the method comprising:
acquiring an original image to be processed; the original image comprises a first class object; the first class object corresponds to a first region in the original image;
determining a mask image corresponding to the original image based on the first region;
segmenting the original image to obtain a segmented image comprising at least part of preset category semantics;
obtaining a target image based on the original image, the mask image and the segmented image; the target image includes a second class object transformed from the first class object in the original image.
2. The method of claim 1, wherein the segmenting the original image to obtain a segmented image comprising at least part of preset category semantics comprises:
processing the original image by adopting a semantic segmentation model to obtain the segmented image; the semantic segmentation model is trained based on images comprising the preset category semantics.
3. The method of claim 1, wherein the deriving a target image based on the original image, the mask image, and the segmented image comprises:
converting the original image, the mask image and the segmentation image by using a target model to obtain the target image; the conversion process is a process of converting the first class object into the second class object.
4. A method according to claim 3, wherein said converting the original image, the mask image and the segmented image using a target model to obtain the target image comprises:
combining the original image, the mask image and the segmented image to obtain combined data; the number of channels of the combined data is the sum of the numbers of channels of the original image, the mask image and the segmented image;
and inputting the combined data to the target model for the conversion processing to obtain the target image.
5. The method according to claim 3, wherein the target model is trained by iteratively performing the steps of:
acquiring a sample image and a label image; wherein a second region in the sample image corresponds to a first class object; a third region in the label image corresponds to a second class object;
obtaining a predicted image based on the sample image and a target model to be trained;
determining a prediction loss based on the predicted image and the label image; the prediction loss includes a fourth loss term determined based on the difference of a target region between the predicted image and the label image; wherein the target region is determined based on the second region and the third region;
and adjusting model parameters of the target model to be trained with the aim of reducing the prediction loss.
6. The method of claim 5, wherein the fourth loss term is determined by:
determining a first difference value of the target region between the predicted image and the label image in a first color space;
determining a second difference value of the target region between the predicted image and the label image in a second color space;
determining the fourth loss term based on a weighted sum of the first difference value and the second difference value.
7. An image processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an original image to be processed; the original image comprises a first class object; the first class object corresponds to a first region in the original image;
the determining module is used for determining a mask image corresponding to the original image based on the first area;
the segmentation module is used for carrying out segmentation processing on the original image to obtain a segmented image comprising at least part of preset category semantics;
the processing module is used for obtaining a target image based on the original image, the mask image and the segmentation image; the target image includes a second class object transformed from the first class object in the original image.
8. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-6.
9. An electronic device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-6.
CN202211594249.0A 2022-12-13 2022-12-13 Image processing method and device and electronic equipment Pending CN116229054A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211594249.0A CN116229054A (en) 2022-12-13 2022-12-13 Image processing method and device and electronic equipment
PCT/CN2023/134020 WO2024125267A1 (en) 2022-12-13 2023-11-24 Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211594249.0A CN116229054A (en) 2022-12-13 2022-12-13 Image processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116229054A true CN116229054A (en) 2023-06-06

Family

ID=86590031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211594249.0A Pending CN116229054A (en) 2022-12-13 2022-12-13 Image processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN116229054A (en)
WO (1) WO2024125267A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024125267A1 (en) * 2022-12-13 2024-06-20 北京字跳网络技术有限公司 Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511580A (en) * 2022-01-28 2022-05-17 北京字跳网络技术有限公司 Image processing method, device, equipment and storage medium
CN114863482A (en) * 2022-05-17 2022-08-05 北京字跳网络技术有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN116229054A (en) * 2022-12-13 2023-06-06 北京字跳网络技术有限公司 Image processing method and device and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024125267A1 (en) * 2022-12-13 2024-06-20 北京字跳网络技术有限公司 Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product

Also Published As

Publication number Publication date
WO2024125267A1 (en) 2024-06-20

Similar Documents

Publication Publication Date Title
US11734851B2 (en) Face key point detection method and apparatus, storage medium, and electronic device
US10936919B2 (en) Method and apparatus for detecting human face
US20200334830A1 (en) Method, apparatus, and storage medium for processing video image
CN107622240B (en) Face detection method and device
US10719693B2 (en) Method and apparatus for outputting information of object relationship
CN110009059B (en) Method and apparatus for generating a model
CN111783626B (en) Image recognition method, device, electronic equipment and storage medium
CN110210501B (en) Virtual object generation method, electronic device and computer-readable storage medium
CN110298850B (en) Segmentation method and device for fundus image
CN114187624B (en) Image generation method, device, electronic equipment and storage medium
WO2024125267A1 (en) Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product
CN113505848A (en) Model training method and device
CN111833242A (en) Face transformation method and device, electronic equipment and computer readable medium
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN110288532B (en) Method, apparatus, device and computer readable storage medium for generating whole body image
CN114332553A (en) Image processing method, device, equipment and storage medium
US20230036366A1 (en) Image attribute classification method, apparatus, electronic device, medium and program product
CN109241930B (en) Method and apparatus for processing eyebrow image
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN113284206A (en) Information acquisition method and device, computer readable storage medium and electronic equipment
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN112966592A (en) Hand key point detection method, device, equipment and medium
CN110059739B (en) Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
CN112183303A (en) Transformer equipment image classification method and device, computer equipment and medium
CN116258800A (en) Expression driving method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination