CN114926491B - Matting method and device, electronic equipment and storage medium

Info

Publication number: CN114926491B
Authority: CN (China)
Prior art keywords: image, matting, sample, training, training samples
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202210515100.2A
Other languages: Chinese (zh)
Other versions: CN114926491A
Inventors: 高宇康, 焦少慧, 杜绪晗, 程京
Current and original assignee: Beijing ByteDance Network Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)

Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202210515100.2A
Publication of CN114926491A
Application granted
Publication of CN114926491B

Classifications

    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/12: Edge-based segmentation
    • G06T7/155: Segmentation; Edge detection involving morphological operators
      (all under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection)

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure disclose a matting method and apparatus, an electronic device and a storage medium. The method comprises the following steps: inputting an image to be matted and a prior background image corresponding to the target background image in the image to be matted into a pre-trained target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted. The target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples; each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image. With this technical scheme, fine matting can be achieved.

Description

Matting method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the field of image processing, in particular to a matting method, a matting device, electronic equipment and a storage medium.
Background
With the development of internet technology, matting is being applied ever more widely, for example in the production of film and television dramas and in the post-processing of publicity photos.
However, existing matting schemes suffer from low matting fineness.
Disclosure of Invention
The embodiment of the disclosure provides a matting method, a matting device, electronic equipment and a storage medium, so as to achieve the effect of fine matting.
In a first aspect, an embodiment of the present disclosure provides a matting method, which may include:
acquiring a trained target matting model, an image to be matted, and a prior background image corresponding to the target background image in the image to be matted;
inputting the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
In a second aspect, an embodiment of the present disclosure further provides a matting apparatus, which may include:
an image acquisition module, configured to acquire a trained target matting model, an image to be matted, and a prior background image corresponding to the target background image in the image to be matted;
a matting module, configured to input the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, which may include:
One or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the matting method provided by any embodiment of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the matting method provided by any of the embodiments of the present disclosure.
According to the technical schemes of the embodiments of the disclosure, the image to be matted and the prior background image corresponding to the target background image in the image to be matted are input into the trained target matting model, so that the target transparency map corresponding to the target foreground image in the image to be matted can be obtained from the output of the target matting model. The target matting model is trained on a plurality of groups of first training samples and a plurality of groups of second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image. In other words, the target matting model combines background-prior-based foreground matting training on the first training samples with foreground segmentation training on the second training samples, which guarantees matting fineness, in particular at the edges of the foreground target.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a flowchart of a matting method in a first embodiment of the present disclosure;
Fig. 2a is a schematic diagram of a first training sample in a matting method in the first embodiment of the disclosure;
Fig. 2b is a schematic diagram of a second training sample in a matting method in the first embodiment of the disclosure;
Fig. 3 is a matting effect diagram in a matting method in the first embodiment of the present disclosure;
Fig. 4 is a flowchart of a matting method in a second embodiment of the disclosure;
Fig. 5 is a schematic diagram of a third training sample in a matting method in the second embodiment of the disclosure;
Fig. 6 is a flowchart of a matting method in a third embodiment of the present disclosure;
Fig. 7 is a flowchart of a matting method in a fourth embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of an original matting model in a matting method in the fourth embodiment of the disclosure;
Fig. 9 is a block diagram of a matting apparatus in a fifth embodiment of the present disclosure;
Fig. 10 is a schematic structural diagram of an electronic device in a sixth embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In the following, each embodiment simultaneously provides optional features and examples; the features described in the embodiments may be combined to form multiple alternative schemes, and each numbered embodiment should not be regarded as only a single technical solution.
Example 1
Fig. 1 is a flowchart of a matting method provided in a first embodiment of the present disclosure. This embodiment is applicable to fine matting, and in particular to achieving fine matting based on a target matting model, where the target matting model is a matting model obtained by combining foreground segmentation training with background-prior-based foreground matting training. The method may be performed by a matting apparatus provided by an embodiment of the disclosure, which may be implemented in software and/or hardware and may be integrated into an electronic device.
Referring to fig. 1, the method of the embodiment of the disclosure specifically includes the following steps:
S110, acquiring a trained target matting model, an image to be matted, and a prior background image corresponding to the target background image in the image to be matted, where the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
The first sample image may be regarded as a sample image composed of two parts, a sample background image and a first foreground image; the first background image may be a background image corresponding to the sample background image, such as a pre-captured background image that is the same as or similar to the sample background image. The first transparency map may be a pre-annotated transparency map corresponding to the first foreground image (i.e., used to extract the first foreground image from the first sample image), specifically a floating-point matrix of the same size as the first sample image with pixel values between 0 and 1. For example, referring to fig. 2a, the first sample image, the first background image and the first transparency map are shown in order from left to right. The first sample image (which may also be referred to as the first sample original image), the first background image, and the first transparency map are taken as a group of first training samples.
The second sample image may be regarded as a sample image composed of two parts, a certain background image and a second foreground image; the second background image may be a background image corresponding to the second sample image, such as a background image generated from the pixel values of the pixel points in the second sample image, or that certain background image. The annotated segmentation map may be a pre-annotated segmentation map corresponding to the second foreground image (i.e., used to segment the second foreground image from the second sample image), specifically a binary matrix of the same size as the second sample image with pixel values of 0 or 1. For example, referring to fig. 2b, the second sample image, the second background image and the annotated segmentation map are shown in order from left to right. The second sample image, the second background image and the annotated segmentation map are taken as a group of second training samples.
On this basis, the target matting model can be obtained by training on a plurality of groups of first training samples and a plurality of groups of second training samples; specifically, background-prior-based foreground matting training on the first training samples is combined with foreground segmentation training on the second training samples, and the target matting model is obtained by mixing the two. The former lets the model learn what the matting target is, and the latter improves the fineness of the matting; the two cooperate with each other, so that the trained target matting model can better guarantee both the accuracy and the fineness of the matting.
In particular, when the sample background image and the first background image are the same background image, the first sample image may be regarded as a composite of the first foreground image and the first background image. If model training were performed only on first training samples obtained in this way, the mismatch between composite images and real scenes would easily lead to imprecise matting edges at application time. Mixing foreground segmentation training into the background-prior-based foreground matting training therefore effectively alleviates the problem of imprecise matting edges.
The image to be matted may be regarded as an image, composed of a target foreground image and a target background image, on which foreground matting is to be performed; the prior background image may be a background image that corresponds to the target background image and serves as prior knowledge, such as a pre-captured background image that is the same as or similar to the target background image.
S120, inputting the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted.
The image to be matted and the prior background image are input into the target matting model, and the target transparency map corresponding to the target foreground image, i.e., the transparency map used to extract the target foreground image from the image to be matted, is obtained from the output of the target matting model.
To verify the effectiveness of the method, referring for example to fig. 3, the image to be matted on the left and the prior background image in the middle are input into the target matting model, and the target transparency map on the right is obtained from the output of the target matting model. As fig. 3 shows, the method accurately extracts the foreground target in the image to be matted, with high matting fineness at the edges of the foreground target. In practical applications, the method can optionally be applied to live-streaming and video-on-demand scenes with fixed camera positions (such as education, e-commerce, television programs and the like), where it can be used to remove and/or replace the target background image.
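As a purely illustrative sketch of this inference flow (not an implementation fixed by the patent), the following PyTorch snippet assumes a hypothetical serialized model file, a model exported to return the final transparency map directly, and the file names shown; all of these are assumptions.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def to_tensor(path):
    # (1, 3, H, W) float tensor in [0, 1].
    return TF.to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

# Hypothetical artifact name; the patent does not fix a concrete API.
model = torch.jit.load("target_matting_model.pt").eval()

image = to_tensor("frame.png")        # image to be matted
prior_bg = to_tensor("prior_bg.png")  # pre-captured prior background image

with torch.no_grad():
    # Assumed to return the target transparency map, (1, 1, H, W) in [0, 1].
    alpha = model(image, prior_bg)

# Background replacement, as in the fixed-camera live-streaming scenario:
new_bg = to_tensor("new_bg.png")
composite = alpha * image + (1 - alpha) * new_bg
```

Removing the background rather than replacing it amounts to compositing onto a solid color in the same way.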
According to the technical scheme of this embodiment, the image to be matted and the prior background image corresponding to the target background image in the image to be matted are input into the trained target matting model, so that the target transparency map corresponding to the target foreground image in the image to be matted can be obtained from the output of the target matting model. The target matting model is trained on a plurality of groups of first training samples and a plurality of groups of second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image. In other words, the target matting model combines background-prior-based foreground matting training on the first training samples with foreground segmentation training on the second training samples, which guarantees matting fineness, in particular at the edges of the foreground target.
Example 2
Fig. 4 is a flowchart of a matting method provided in a second embodiment of the present disclosure. This embodiment is optimized on the basis of the alternatives in the above embodiment. In this embodiment, optionally, the target matting model may be obtained by training in advance through the following steps: acquiring an original matting model to be trained, first training samples and second training samples; and training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples to obtain the target matting model. Explanations of terms that are the same as or correspond to those of the above embodiment are not repeated here.
Accordingly, as shown in fig. 4, the method of this embodiment may specifically include the following steps:
S210, acquiring an original matting model to be trained, first training samples and second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
The original matting model may be a matting model to be trained.
In practical applications, optionally, for a first training sample, the first sample image may be obtained as follows: acquiring a captured image and the first transparency map, and extracting the first foreground image from the captured image according to the first transparency map; then acquiring the first background image, and compositing the first background image with the first foreground image to obtain the first sample image. The captured image may be a pre-captured image containing the first foreground image, and the first transparency map may be the transparency map corresponding to the first foreground image, so that the first foreground image can be extracted from the captured image based on the first transparency map. The first background image may then be a background image captured in advance for use as prior knowledge, and compositing it with the first foreground image yields the first sample image. This technical scheme effectively reduces the difficulty of collecting first training samples.
Still optionally, the second background image may be generated as follows: acquiring the second sample image, and generating the second background image according to the pixel values of the pixel points in the second sample image, for example according to the average of all pixel values, in which case the resulting second background image is a gray image; this effectively yields a second background image corresponding to the second sample image.
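The two sample-construction steps above can be sketched as follows; this is a minimal illustration assuming float NumPy arrays in [0, 1], with broadcasting between an (H, W, 1) transparency map and (H, W, 3) images, not code taken from the patent.

```python
import numpy as np

def make_first_sample(captured, alpha_gt, first_bg):
    """Extract the first foreground image from the captured image via the
    annotated transparency map alpha_gt, then composite it onto the first
    background image: I = alpha * F + (1 - alpha) * B."""
    return alpha_gt * captured + (1.0 - alpha_gt) * first_bg

def make_second_background(second_sample):
    """Generate a gray second background image from the average pixel
    value of the second sample image (one possible realization)."""
    return np.full_like(second_sample, second_sample.mean())
```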
S220, training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples to obtain a target matting model.
Training the original matting model on a plurality of groups of first training samples lets it learn what the matting target is; training it on a plurality of groups of second training samples lets it matte out finer targets. The target matting model obtained through these two training processes therefore effectively guarantees matting fineness, in particular at the edges of the foreground target. It should be noted that the two training processes may be performed sequentially or simultaneously, which is not specifically limited here.
S230, acquiring an image to be matted and a prior background image corresponding to the target background image in the image to be matted, and inputting the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted.
According to the technical scheme, an original matting model to be trained, a first training sample and a second training sample are obtained, and the original matting model is trained based on a plurality of groups of first training samples and a plurality of groups of second training samples, so that a target matting model capable of guaranteeing matting fineness, particularly matting fineness at the edge of a foreground target, can be obtained.
In an optional technical solution, on the basis of the second embodiment, the matting method may further include: acquiring third training samples, where each third training sample comprises a third sample image, a third background image corresponding to the third sample image, and a second transparency map corresponding to the third foreground image in the third sample image; and training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples then includes: training the original matting model based on a plurality of groups of first training samples, a plurality of groups of second training samples, and a plurality of groups of third training samples. The third sample image may be regarded as a sample image composed of a certain background image and a third foreground image; the third background image may be a background image corresponding to the third sample image, such as a background image generated from the pixel values of the pixel points in the third sample image (which may be a gray image), or that certain background image; the second transparency map may be a pre-annotated transparency map corresponding to the third foreground image (i.e., used to extract the third foreground image from the third sample image). For example, referring to fig. 5, the third sample image, the third background image, and the second transparency map are shown in order from left to right. The third sample image, the third background image and the second transparency map are taken as a group of third training samples, and the original matting model is then trained based on a plurality of groups of first, second and third training samples. With this technical scheme, adding foreground matting training based on a plurality of groups of third training samples in the model training stage lets the original matting model better learn information at the edges and details of foreground targets, further improving the matting fineness of the trained target matting model.
On this basis, optionally, the foreground target in the second foreground image and the foreground target in the third foreground image both correspond to a first type, and the plurality of groups of first training samples include both first training samples whose first foreground image corresponds to the first type and first training samples whose first foreground image corresponds to a second type, the first type and the second type being different. The first type may be the type of the foreground targets in the second and third foreground images, such as persons, houses, roads or trees, and can be understood as the matting target to which the target matting model is mainly applied. To ensure the generality of the target matting model, the plurality of groups of first training samples may thus include first training samples whose foreground targets are of the first type as well as first training samples whose foreground targets are of the second type, so that the trained target matting model achieves fine matting both for the main foreground targets and for general targets of other types.
In another optional technical solution, on the basis of the second embodiment, training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples to obtain a target matting model may include: acquiring a matting loss function, and performing matting training on the original matting model based on the matting loss function and a plurality of groups of first training samples; obtaining a segmentation loss function, and carrying out segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of second training samples; and obtaining the target matting model.
The matting loss function may be a loss function suitable for matting training of the original matting model based on a plurality of groups of first training samples (and, of course, also based on a plurality of groups of third training samples). In practical applications, optionally, the matting loss function may include at least one of a Laplacian loss function, an L1 loss function, an edge loss function, and a foreground object loss function. Specifically, the Laplacian loss function may be a loss function l_lap for enhancing detail learning; in practical applications, optionally, it may take the standard Laplacian-pyramid form l_lap = Σ_s 2^(s−1) · mean(|Lap_s(A) − Lap_s(alpha_gt)|), where Lap_s denotes the s-th Laplacian pyramid level, A is the predicted transparency map obtained by inputting the first sample image and the first background image into the original matting model, and alpha_gt is the first transparency map. The L1 loss function may be a loss function l_l1 for globally supervising the accuracy of A; in practical applications it may be expressed as l_l1 = mean(|A − alpha_gt|). The edge loss function may be a loss function l_edge set to enhance detail learning of the edge portion, in which the weight of the edge portion is increased through an edge region edge_region obtained from dilation and erosion results, so that edges are more stable and clear; in practical applications, optionally, it may be expressed as l_edge = mean(|A − alpha_gt| · edge_region), where edge_region may be the difference between the dilated and eroded first transparency map, i.e., edge_region = dilate(alpha_gt) − erode(alpha_gt). The foreground object loss function may be a loss function l_obj set to increase the integrity of the matted foreground portion, specifically by increasing the weight of the foreground portion through a dilation result dilate_region; in practical applications, optionally, it may be expressed as l_obj = mean(|A − alpha_gt| · dilate_region), where dilate_region may be the dilated first transparency map, i.e., dilate_region = dilate(alpha_gt). The matting loss function and the plurality of groups of first training samples cooperate with each other, so that matting training of the original matting model can be effectively achieved.
The segmentation loss function may include a loss function l_seg suitable for segmentation training of the original matting model based on a plurality of groups of second training samples; in practical applications, optionally, it may be expressed as l_seg = BCE(Seg, seg_label), where BCE is the binary cross entropy, Seg is the predicted segmentation map obtained after the second sample image and the second background image are input into the original matting model, and seg_label is the annotated segmentation map. Still optionally, seg_label may be annotated manually, or may be obtained by converting alpha_gt, for example seg_label = (alpha_gt > THR), where THR is a preset threshold; in other words, pixels of alpha_gt greater than the threshold may be set to 1, and pixels less than or equal to the threshold set to 0, thereby obtaining seg_label. The segmentation loss function and the plurality of groups of second training samples cooperate with each other, so that segmentation training of the original matting model can be effectively achieved.
The two parts cooperate with each other, so that the target matting model can be obtained, achieving effective training of the target matting model.
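To make the loss formulas above concrete, here is a hedged PyTorch sketch; the pyramid depth, the morphological kernel size, and the equal weighting of the terms are illustrative assumptions rather than values fixed by the patent. Tensors A, alpha_gt, seg_logits and seg_label are (N, 1, H, W).

```python
import torch
import torch.nn.functional as F

def dilate(x, k=15):
    # Morphological dilation via max pooling (kernel size is an assumption).
    return F.max_pool2d(x, k, stride=1, padding=k // 2)

def erode(x, k=15):
    return -F.max_pool2d(-x, k, stride=1, padding=k // 2)

def laplacian_pyramid(x, levels=5):
    pyr = []
    for _ in range(levels):
        down = F.avg_pool2d(x, 2)
        up = F.interpolate(down, size=x.shape[-2:], mode="bilinear", align_corners=False)
        pyr.append(x - up)  # band-pass (Laplacian) level
        x = down
    return pyr

def matting_loss(A, alpha_gt):
    l_l1 = (A - alpha_gt).abs().mean()
    l_lap = sum(2 ** s * (a - b).abs().mean()
                for s, (a, b) in enumerate(zip(laplacian_pyramid(A),
                                               laplacian_pyramid(alpha_gt))))
    edge_region = dilate(alpha_gt) - erode(alpha_gt)
    l_edge = ((A - alpha_gt).abs() * edge_region).mean()
    dilate_region = dilate(alpha_gt)
    l_obj = ((A - alpha_gt).abs() * dilate_region).mean()
    return l_l1 + l_lap + l_edge + l_obj  # equal weights: an assumption

def segmentation_loss(seg_logits, seg_label):
    # l_seg = BCE(Seg, seg_label); seg_label may also be derived as
    # (alpha_gt > THR).float() when only alpha_gt is annotated.
    return F.binary_cross_entropy_with_logits(seg_logits, seg_label)
```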
On this basis, optionally, performing matting training on the original matting model based on the matting loss function and a plurality of groups of first training samples may include: for each group of first training samples, inputting the first sample image and the first background image in the first training sample into the original matting model to obtain a predicted transparency map, and adjusting the network parameters in the original matting model according to the predicted transparency map, the first transparency map in the first training sample, and the matting loss function. Performing segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of second training samples may include: for each group of second training samples, inputting the second sample image and the second background image in the second training sample into the original matting model to obtain a predicted segmentation map, and adjusting the network parameters according to the predicted segmentation map, the annotated segmentation map in the second training sample, and the segmentation loss function. Obtaining the target matting model may then include: obtaining the target matting model according to the adjustment results of the network parameters. In other words, the matting loss function and the plurality of groups of first training samples cooperate so that the network parameters of the original matting model are adjusted toward a better matting effect; similarly, the segmentation loss function and the plurality of groups of second training samples cooperate so that the network parameters are adjusted toward a better segmentation effect. After the two adjustments, the target matting model is obtained, achieving effective training of the target matting model.
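A minimal training-step sketch under these definitions follows; the optimizer, learning rate, interleaving of the two kinds of batches, and the assumed (A_sm, seg_logits, A_lg) model interface (matching the fig. 8 structure described later) are all assumptions, since the embodiment allows the two trainings to run sequentially or simultaneously.

```python
import itertools
import torch
import torch.nn.functional as F

# model, first_loader, second_loader, matting_loss and segmentation_loss
# are assumed to be defined as in the sketches above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for (img1, bg1, alpha_gt), (img2, bg2, seg_label) in zip(
        first_loader, itertools.cycle(second_loader)):
    # Matting training on a batch of first training samples; the matting
    # loss is applied to both the intermediate and the refined output.
    A_sm, _, A_lg = model(img1, bg1)
    alpha_gt_sm = F.interpolate(alpha_gt, scale_factor=0.5, mode="bilinear",
                                align_corners=False)
    loss = matting_loss(A_sm, alpha_gt_sm) + matting_loss(A_lg, alpha_gt)

    # Segmentation training on a batch of second training samples
    # (seg_label assumed annotated at the segmentation output resolution).
    _, seg_logits, _ = model(img2, bg2)
    loss = loss + segmentation_loss(seg_logits, seg_label)

    # Adjust the network parameters toward both objectives.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```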
Example 3
Fig. 6 is a flowchart of a matting method provided in a third embodiment of the present disclosure. This embodiment is optimized on the basis of the alternatives in the second embodiment described above. In this embodiment, optionally, the matting method may further include: acquiring the first background image, performing data augmentation on the first background image to obtain a fourth background image, and taking the first sample image, the fourth background image and the first transparency map as a group of fourth training samples; and training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples then includes: training the original matting model based on a plurality of groups of first training samples, a plurality of groups of second training samples, and a plurality of groups of fourth training samples. Explanations of terms that are the same as or correspond to those of the above embodiments are not repeated here.
Accordingly, as shown in fig. 6, the method of this embodiment may specifically include the following steps:
S310, acquiring an original matting model to be trained, first training samples and second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
S320, performing data augmentation on the first background image to obtain a fourth background image, and taking the first sample image, the fourth background image and the first transparency map as a group of fourth training samples.
In practical applications, the prior background image is captured in advance while the image to be matted is captured later, which means there may be differences between the prior background image and the target background image in the image to be matted, for example differences caused by shadows, artifacts, camera shake, illumination changes and the like, and these may affect the prediction fineness of the target transparency map. Therefore, to ensure the robustness of the target matting model in application, the first background image may be augmented, and the resulting fourth background image used to simulate the situation, arising at the model application stage, where the prior background image differs from the target background image due to some changing factor; that is, the difference between the fourth background image and the sample background image simulates the difference between the prior background image and the target background image. The first sample image, the fourth background image and the first transparency map are taken as a group of fourth training samples, so that the original matting model can subsequently be trained jointly with a plurality of groups of first training samples and a plurality of groups of second training samples.
S330, training the original matting model based on a plurality of groups of first training samples, a plurality of groups of second training samples and a plurality of groups of fourth training samples.
S340, acquiring an image to be matted and a prior background image corresponding to the target background image in the image to be matted, and inputting the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted.
According to the technical scheme of this embodiment, data augmentation is performed on the first background image to obtain a fourth background image that simulates the difference between the prior background image and the target background image, and fourth training samples consisting of the first sample image, the fourth background image and the first transparency map can be added to the model training process, thereby guaranteeing the robustness of the trained target matting model.
In an optional technical solution, on the basis of the third embodiment, performing data augmentation on the first background image includes at least one of the following operations: rotating and/or translating the first background image; adjusting an environmental parameter of the first background image, where the environmental parameter includes at least one of a brightness parameter, a hue parameter and a contrast parameter; applying Gaussian blur and/or motion blur to the first background image; and generating shadows and/or artifacts in the first background image. Rotating and/or translating the first background image simulates slight camera shake, for example randomly rotating it (within ±10 degrees) and/or randomly translating it (within ±50 px) with a certain probability. Adjusting the environmental parameters or applying Gaussian/motion blur simulates changes in the shooting environment, such as random brightness, hue and contrast adjustments and random Gaussian/motion blur applied with a certain probability. Generating shadows and/or artifacts in the first background image simulates shadow and artifact conditions. This technical scheme, in cooperation with the other technical features, can effectively reduce the interference that shadows, artifacts, changes in shooting conditions and the like cause to the prediction fineness of the target transparency map. A sketch of these operations is given below.
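These augmentation operations might be realized as follows with torchvision; this is a sketch under stated assumptions: the probabilities are arbitrary, the rotation/translation ranges follow the text, the shadow is a crude rectangular stand-in, and motion blur (which would need a custom directional kernel) is omitted.

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def augment_background(bg):
    """Produce a fourth background image from a first background image
    (a (3, H, W) float tensor), simulating camera shake, lighting
    changes, blur, and shadows/artifacts."""
    # Slight camera shake: random rotation within +/-10 degrees and
    # random translation within +/-50 px, with a certain probability.
    if random.random() < 0.5:
        bg = TF.affine(bg, angle=random.uniform(-10, 10),
                       translate=[random.randint(-50, 50), random.randint(-50, 50)],
                       scale=1.0, shear=[0.0])
    # Shooting-environment changes: brightness / hue / contrast jitter.
    if random.random() < 0.5:
        bg = T.ColorJitter(brightness=0.3, contrast=0.3, hue=0.1)(bg)
    if random.random() < 0.3:
        bg = TF.gaussian_blur(bg, kernel_size=9)
    # Crude shadow/artifact: darken a random rectangular region.
    if random.random() < 0.3:
        _, h, w = bg.shape
        y0, x0 = random.randrange(h // 2), random.randrange(w // 2)
        bg = bg.clone()
        bg[:, y0:y0 + h // 3, x0:x0 + w // 3] *= 0.6
    return bg
```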
Example 4
Fig. 7 is a flowchart of a matting method provided in a fourth embodiment of the present disclosure. This embodiment is optimized on the basis of the alternatives in the second and third embodiments described above. In this embodiment, optionally, the original matting model includes an encoding layer and a decoding layer connected in sequence; the encoding layer includes a first encoding layer and a second encoding layer, the first encoding layer being used to encode the first sample image and the second sample image, and the second encoding layer being used to encode the first background image and the second background image; and/or the output channels of the decoding layer include a segmentation output channel for outputting the predicted segmentation map obtained after the second sample image and the second background image are processed by the encoding layer and the decoding layer; and/or, for a current layer among the decoding layers, the input features of the current layer include a concatenated feature, obtained by concatenating the output feature of the previous decoding layer, the intermediate encoder output feature of the same scale as that output feature, and the first sample image resized to that scale. Explanations of terms that are the same as or correspond to those of the above embodiments are not repeated here.
Accordingly, as shown in fig. 7, the method of this embodiment may specifically include the following steps:
S410, acquiring an original matting model to be trained, where the original matting model includes an encoding layer and a decoding layer connected in sequence, the encoding layer includes a first encoding layer and a second encoding layer, the first encoding layer is used to encode the first sample image and the second sample image, and the second encoding layer is used to encode the first background image and the second background image; the output channels of the decoding layer include a segmentation output channel for outputting the predicted segmentation map obtained after the second sample image and the second background image are processed by the encoding layer and the decoding layer; and, for a current layer among the decoding layers, the input features of the current layer include a concatenated feature obtained by concatenating the output feature of the previous decoding layer, the intermediate encoder output feature of the same scale as that output feature, and the first sample image resized to that scale.
Taking the first sample image and the first background image as an example, the advantage of encoding them with separate first and second encoding layers is that no strong dependency is imposed between the two images (i.e., they are decoupled), so that even if some factor changes make the sample background image and the first background image differ, a good matting result can still be obtained, and the model remains applicable at the application stage; in addition, when the second background image is a gray image, a non-separated encoding layer could not process such a second sample image and second background image. To implement foreground segmentation training based on the plurality of groups of second training samples, a segmentation output channel for outputting the predicted segmentation map may be provided among the output channels of the decoding layer. For a current layer among the decoding layers, the input features of the current layer include a concatenated feature, obtained by concatenating the output feature of the previous decoding layer, the intermediate encoder output feature of the same scale as that output feature, and the first sample image resized to that scale. The advantage of using the resized first sample image as part of the concatenated feature is that it adds a skip connection between the first sample image and the decoding layer, so that the input features of the decoding layer acquire lower-level information and multi-level features are fused, improving the fineness of model training.
S420, acquiring first training samples and second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
S430, training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples to obtain the target matting model.
S440, acquiring an image to be matted and a prior background image corresponding to the target background image in the image to be matted, and inputting the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted.
According to the technical scheme of this embodiment, decoupling between the first sample image and the first background image (and likewise between the second sample image and the second background image) is achieved through the separated encoding layers, thereby guaranteeing the robustness of the trained target matting model and better handling the encoding of a second background image that is a gray image together with the second sample image; adding a segmentation output channel among the output channels of the decoding layer ensures the effective progress of the segmentation training; and using the resized first sample image as part of the input features of the decoding layer improves the fineness of model training.
To better understand the original matting model as a whole, an exemplary description is given below with reference to a specific example. Illustratively, referring to fig. 8, the original matting model includes an input layer (input), an encoding layer (encoder), a decoding layer (decoder), an optimization layer (refiner), and an output layer (output). Specifically:
Input: taking the first sample image I and the first background image B as examples, the size (shape) of I is (3, h, w), and the shape of B is (3, h, w).
The encoder comprises a first encoding layer (encoder1), a second encoding layer (encoder2) and a third encoding layer (encoder3). The structures of encoder1 and encoder2 may be the front half of a common backbone (such as ResNet, VGGNet, MobileNet and the like); encoder3 aggregates and further extracts the features of I and B, and its structure may be a 1x1 convolution for dimension reduction followed by the back half of a common backbone.
The decoder may consist of four convolution + upsampling layers; its outputs are the intermediate predicted transparency map A_sm of shape (1, h/2, w/2), the predicted segmentation map Seg of shape (1, h/2, w/2), and the intermediate feature hd of shape (32, h/2, w/2) used as an input feature of the refiner.
The refiner may consist of an upsampling layer and three convolution layers, outputting the final high-resolution, high-quality predicted transparency map A_lg of shape (1, h, w). The refiner may also be referred to as an optimizer or an optimization network.
In practical applications, optionally, the matting loss function described above is applied to both A_sm and A_lg, i.e., A in the above formulas may be either A_sm or A_lg. A skeleton of this architecture is sketched below.
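For orientation, a PyTorch skeleton of fig. 8 follows. The overall wiring (two separated encoders, an aggregating encoder3, a decoder producing A_sm/Seg/hd at half resolution, and a refiner producing A_lg at full resolution) follows the text; the channel widths are assumptions, the layer counts are reduced for brevity (the text describes four convolution + upsampling layers in the decoder), and the skip connections with the resized sample image from this embodiment are omitted.

```python
import torch
import torch.nn as nn

class OriginalMattingModel(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        def front_half():  # stand-in for the front half of a common backbone
            return nn.Sequential(
                nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.encoder1 = front_half()  # encodes I (sample image)
        self.encoder2 = front_half()  # encodes B (background image)
        # encoder3: 1x1 conv dimension reduction + back half of a backbone.
        self.encoder3 = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 1),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        # decoder: convolution + upsampling; 34 channels = hd (32) + A_sm + Seg.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(feat, 34, 3, padding=1))
        # refiner: upsampling + three convolution layers -> full-resolution A_lg.
        self.refiner = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(33, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, I, B):
        f = self.encoder3(torch.cat([self.encoder1(I), self.encoder2(B)], dim=1))
        out = self.decoder(f)                        # (N, 34, h/2, w/2)
        hd, A_sm, seg = out[:, :32], out[:, 32:33], out[:, 33:34]
        A_lg = self.refiner(torch.cat([hd, A_sm], dim=1))
        return torch.sigmoid(A_sm), seg, torch.sigmoid(A_lg)
```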
Example 5
Fig. 9 is a block diagram of a matting apparatus provided in a fifth embodiment of the present disclosure; the apparatus is configured to execute the matting method provided by any of the above embodiments. The apparatus and the matting method of the above embodiments belong to the same inventive conception, and for details not described in this apparatus embodiment, reference may be made to the above matting method embodiments. Referring to fig. 9, the apparatus may specifically include: an image acquisition module 510 and a matting module 520.
The image acquisition module 510 is configured to acquire a trained target matting model, an image to be matted, and a prior background image corresponding to the target background image in the image to be matted;
the matting module 520 is configured to input the image to be matted and the prior background image into the target matting model to obtain a target transparency map corresponding to the target foreground image in the image to be matted;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image.
In the matting apparatus provided by the fifth embodiment of the present disclosure, the image acquisition module and the matting module cooperate with each other: the image to be matted and the prior background image corresponding to the target background image in the image to be matted are input into the trained target matting model, so that the target transparency map corresponding to the target foreground image in the image to be matted can be obtained from the output of the target matting model. The target matting model is trained on a plurality of groups of first training samples and a plurality of groups of second training samples, where each first training sample comprises a first sample image, a first background image corresponding to the sample background image in the first sample image, and a first transparency map corresponding to the first foreground image in the first sample image, and each second training sample comprises a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to the second foreground image in the second sample image. In other words, the target matting model combines background-prior-based foreground matting training on the first training samples with foreground segmentation training on the second training samples, thereby guaranteeing matting fineness, in particular at the edges of the foreground target.
The matting device provided by the embodiment of the disclosure can execute the matting method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the matting apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present disclosure.
Example 6
Referring now to fig. 10, a schematic diagram of an electronic device (e.g., a terminal device or server in fig. 10) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 10 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphic processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While an electronic device 600 having various means is shown in fig. 10, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
Example seven
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
acquiring a trained target matting model, a to-be-matted image, and a prior background image corresponding to a target background image in the to-be-matted image;
inputting the to-be-matted image and the prior background image into the target matting model to obtain a target transparent image corresponding to a target foreground image in the to-be-matted image;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprising a first sample image, a first background image corresponding to a sample background image in the first sample image, and a first transparent image corresponding to a first foreground image in the first sample image, and each second training sample comprising a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to a second foreground image in the second sample image.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the image acquisition module may also be described as "a module for acquiring a trained target matting model, a to-be-matted image, and a prior background image corresponding to a target background image in the to-be-matted image".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [Example One] provides a matting method, which may include:
acquiring a trained target matting model, a to-be-matted image, and a prior background image corresponding to a target background image in the to-be-matted image;
inputting the to-be-matted image and the prior background image into the target matting model to obtain a target transparent image corresponding to a target foreground image in the to-be-matted image;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprising a first sample image, a first background image corresponding to a sample background image in the first sample image, and a first transparent image corresponding to a first foreground image in the first sample image, and each second training sample comprising a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to a second foreground image in the second sample image.
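By way of illustration only, the following is a minimal sketch of this inference step, assuming a PyTorch model that accepts the to-be-matted image and the prior background image as two tensors and outputs a single-channel alpha matte; the function and parameter names are illustrative, not part of the disclosed implementation.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def matte_with_prior_background(model: torch.nn.Module,
                                image_path: str,
                                background_path: str) -> torch.Tensor:
    """Run a trained matting model on a to-be-matted image and its prior
    background image, returning the predicted transparent image (alpha matte)."""
    image = TF.to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    background = TF.to_tensor(Image.open(background_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        # Assumed interface: the model consumes both tensors and emits a
        # single-channel alpha map with values in [0, 1].
        alpha = model(image, background)
    return alpha.squeeze(0)
```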
According to one or more embodiments of the present disclosure, [Example Two] provides the method of Example One, where the target matting model may be pre-trained by:
acquiring an original matting model to be trained, a first training sample and a second training sample;
and training the original matting model based on a plurality of groups of first training samples and a plurality of groups of second training samples to obtain a target matting model.
According to one or more embodiments of the present disclosure, [Example Three] provides the method of Example Two, where the matting method may further include:
Acquiring a third training sample, wherein the third training sample comprises a third sample image, a third background image corresponding to the third sample image and a second transparent image corresponding to a third foreground image in the third sample image;
training the original matting model based on the plurality of sets of first training samples and the plurality of sets of second training samples may include:
The original matting model is trained based on a plurality of sets of first training samples, a plurality of sets of second training samples, and a plurality of sets of third training samples.
According to one or more embodiments of the present disclosure, [Example Four] provides the method of Example Three, where the foreground target in the second foreground image and the foreground target in the third foreground image each correspond to a first type, and the plurality of groups of first training samples include both first training samples whose first foreground image corresponds to the first type and first training samples whose first foreground image corresponds to a second type, the first type and the second type being different.
According to one or more embodiments of the present disclosure, [Example Five] provides the method of Example Two, where training the original matting model based on the plurality of groups of first training samples and the plurality of groups of second training samples to obtain the target matting model may include:
acquiring a matting loss function, and performing matting training on the original matting model based on the matting loss function and a plurality of groups of first training samples;
obtaining a segmentation loss function, and carrying out segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of second training samples;
And obtaining the target matting model.
According to one or more embodiments of the present disclosure, [Example Six] provides the method of Example Five, where performing matting training on the original matting model based on the matting loss function and the plurality of groups of first training samples may include:
for each group of first training samples, inputting the first sample image and the first background image in the first training sample into the original matting model to obtain a predicted transparent image;
adjusting network parameters in the original matting model according to the predicted transparent image, the first transparent image in the first training sample, and the matting loss function;
The segmentation training of the original matting model based on the segmentation loss function and the plurality of groups of second training samples may include:
for each group of second training samples, inputting the second sample image and the second background image in the second training sample into the original matting model to obtain a predicted segmentation map;
adjusting the network parameters according to the predicted segmentation map, the annotated segmentation map in the second training sample, and the segmentation loss function;
Obtaining the target matting model may include:
and obtaining the target matting model according to the adjustment result of the network parameters.
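A minimal sketch of one such combined training round, assuming the model returns a transparent-image head and a segmentation head in a single forward pass; the batch layout, the L1 and cross-entropy stand-ins for the matting and segmentation loss functions, and the joint parameter update (the two passes could equally be alternated) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_one_round(model, optimizer, first_sample, second_sample):
    """One matting pass on a first training sample followed by one
    segmentation pass on a second training sample; both passes adjust
    the same network parameters."""
    # Matting pass: (first sample image, first background image, first transparent image).
    image, background, gt_alpha = first_sample
    pred_alpha, _ = model(image, background)          # predicted transparent image
    matting_loss = F.l1_loss(pred_alpha, gt_alpha)    # stand-in matting loss

    # Segmentation pass: (second sample image, second background image, annotated map).
    image2, background2, gt_seg = second_sample
    _, pred_seg = model(image2, background2)          # predicted segmentation logits
    seg_loss = F.binary_cross_entropy_with_logits(pred_seg, gt_seg)

    optimizer.zero_grad()
    (matting_loss + seg_loss).backward()              # adjust shared network parameters
    optimizer.step()
    return matting_loss.item(), seg_loss.item()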
According to one or more embodiments of the present disclosure, [Example Seven] provides the method of Example Five, where the matting loss function includes at least one of a Laplacian loss function, an L1 loss function, an edge loss function, and a foreground target loss function. The edge loss function is constructed according to the predicted transparent image, the first transparent image, and a dilation-erosion result, where the dilation-erosion result is obtained by respectively dilating and eroding the first transparent image; the foreground target loss function is constructed according to the predicted transparent image, the first transparent image, and a dilation result, where the dilation result is obtained by dilating the first transparent image.
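A plausible construction of the two dilation/erosion-based terms, using OpenCV morphology on the ground-truth transparent image; the kernel size and the plain L1 penalty inside each region are assumptions, not the disclosed formulas.

```python
import cv2
import numpy as np

def edge_and_foreground_losses(pred_alpha: np.ndarray,
                               gt_alpha: np.ndarray,
                               kernel_size: int = 15):
    """Edge loss: penalize errors in the band where a dilated and an eroded
    copy of the ground-truth alpha disagree (the contour region).
    Foreground-target loss: penalize errors inside the dilated alpha."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(gt_alpha, kernel)   # dilation result
    eroded = cv2.erode(gt_alpha, kernel)     # erosion result

    edge_mask = (dilated - eroded) > 1e-3    # uncertain band around the edge
    fg_mask = dilated > 1e-3                 # foreground plus a safety margin

    abs_err = np.abs(pred_alpha - gt_alpha)
    edge_loss = float(abs_err[edge_mask].mean()) if edge_mask.any() else 0.0
    fg_loss = float(abs_err[fg_mask].mean()) if fg_mask.any() else 0.0
    return edge_loss, fg_loss
```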
According to one or more embodiments of the present disclosure, [Example Eight] provides the method of Example Two, where the matting method may further include:
performing data augmentation on the first background image to obtain a fourth background image, and taking the first sample image, the fourth background image and the first transparent image as a group of fourth training samples;
training the original matting model based on the plurality of sets of first training samples and the plurality of sets of second training samples may include:
The original matting model is trained based on a plurality of sets of first training samples, a plurality of sets of second training samples, and a plurality of sets of fourth training samples.
According to one or more embodiments of the present disclosure, [Example Nine] provides the method of Example Eight, where performing data augmentation on the first background image includes at least one of the following (a sketch follows this list):
rotating and/or translating the first background image;
Adjusting an environmental parameter of the first background image, wherein the environmental parameter comprises at least one of a brightness parameter, a hue parameter and a contrast parameter;
performing Gaussian blur and/or motion blur on the first background image;
Shadows and/or artifacts are generated in the first background image.
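An illustrative augmentation routine covering several items from the list above; all parameter ranges, and the crude band-darkening used to stand in for shadow/artifact generation, are assumptions rather than the disclosed settings.

```python
import random
import cv2
import numpy as np

def augment_background(bg: np.ndarray) -> np.ndarray:
    """Apply random rotation/translation, brightness/contrast jitter,
    Gaussian blur, and a crude shadow to an HxWx3 uint8 background image."""
    h, w = bg.shape[:2]

    # Small random rotation and translation.
    angle = random.uniform(-5, 5)
    tx, ty = random.randint(-10, 10), random.randint(-10, 10)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[:, 2] += (tx, ty)
    bg = cv2.warpAffine(bg, m, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Brightness/contrast jitter (hue could likewise be jittered in HSV space).
    alpha = random.uniform(0.8, 1.2)   # contrast factor
    beta = random.uniform(-20, 20)     # brightness offset
    bg = cv2.convertScaleAbs(bg, alpha=alpha, beta=beta)

    # Gaussian blur standing in for the blur-type augmentations.
    if random.random() < 0.5:
        bg = cv2.GaussianBlur(bg, (5, 5), 0)

    # Crude shadow: darken a random horizontal band.
    y0 = random.randint(0, h // 2)
    bg[y0:y0 + h // 4] = (bg[y0:y0 + h // 4] * 0.7).astype(np.uint8)
    return bg
```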
According to one or more embodiments of the present disclosure, [Example Ten] provides the method of Example Two, where the matting method may further include:
acquiring a second sample image;
and generating a second background image according to the pixel value of each pixel point in the second sample image.
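One plausible reading of this step, for when no separately captured background exists for a second sample image, is to synthesize the second background image from the sample's own pixel values, here simply its per-channel mean color; the mean-color choice is an assumption, not the disclosed formula.

```python
import numpy as np

def background_from_pixels(second_sample: np.ndarray) -> np.ndarray:
    """Build a background image of the same shape as the sample, filled
    with the per-channel mean of the sample's pixel values."""
    mean_color = second_sample.reshape(-1, second_sample.shape[-1]).mean(axis=0)
    background = np.empty_like(second_sample)
    background[...] = mean_color.astype(second_sample.dtype)
    return background
```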
According to one or more embodiments of the present disclosure, [Example Eleven] provides the method of Example Two, where the matting method may further include:
in a case where the sample background image is the same as the first background image: acquiring a captured image and the first transparent image, and extracting the first foreground image from the captured image according to the first transparent image;
and acquiring the first background image, and compositing the first background image with the first foreground image to obtain the first sample image.
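The synthesis in the last step is standard alpha compositing; a minimal sketch, assuming float images and an alpha map in [0, 1]:

```python
import numpy as np

def composite_first_sample(foreground: np.ndarray,
                           alpha: np.ndarray,
                           first_background: np.ndarray) -> np.ndarray:
    """I = alpha * F + (1 - alpha) * B: blend the extracted first foreground
    image onto the first background image, yielding a first sample image
    whose true transparent image is `alpha` by construction."""
    if alpha.ndim == 2:                     # promote HxW alpha to HxWx1
        alpha = alpha[..., None]
    return alpha * foreground + (1.0 - alpha) * first_background
```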
According to one or more embodiments of the present disclosure, [Example Twelve] provides the method of Example Two, where the original matting model includes coding layers and decoding layers connected in sequence; wherein,
The coding layer comprises a first coding layer and a second coding layer, the first coding layer is used for coding the first sample image and the second sample image, and the second coding layer is used for coding the first background image and the second background image;
and/or,
the output channels of the decoding layers include a segmentation output channel, the segmentation output channel being used for outputting the predicted segmentation map obtained after the decoding layers process the second sample image and the second background image;
and/or,
for a current layer among the decoding layers, the input features of the current layer include a concatenation feature, where the concatenation feature is obtained by concatenating the output feature of the decoding layer immediately preceding the current layer, the intermediate output feature that has the same scale as that output feature among the intermediate output features of the coding layers, and the first sample image resized to that scale.
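A sketch of that concatenated decoder input, assuming the scale matching is done by comparing spatial sizes; the names and the bilinear resize are illustrative assumptions.

```python
from typing import List
import torch
import torch.nn.functional as F

def build_decoder_input(prev_out: torch.Tensor,
                        encoder_feats: List[torch.Tensor],
                        first_sample_image: torch.Tensor) -> torch.Tensor:
    """Concatenate, along the channel axis: the previous decoding layer's
    output, the encoder intermediate feature of the same spatial scale,
    and the first sample image resized to that scale."""
    h, w = prev_out.shape[-2:]
    # Pick the encoder feature whose spatial size matches; raises if none does.
    skip = next(f for f in encoder_feats if f.shape[-2:] == (h, w))
    resized = F.interpolate(first_sample_image, size=(h, w),
                            mode="bilinear", align_corners=False)
    return torch.cat([prev_out, skip, resized], dim=1)
```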
According to one or more embodiments of the present disclosure, [Example Thirteen] provides a matting apparatus, which may include:
an image acquisition module, configured to acquire a trained target matting model, a to-be-matted image, and a prior background image corresponding to a target background image in the to-be-matted image;
an image matting module, configured to input the to-be-matted image and the prior background image into the target matting model to obtain a target transparent image corresponding to a target foreground image in the to-be-matted image;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprising a first sample image, a first background image corresponding to a sample background image in the first sample image, and a first transparent image corresponding to a first foreground image in the first sample image, and each second training sample comprising a second sample image, a second background image corresponding to the second sample image, and an annotated segmentation map corresponding to a second foreground image in the second sample image.
According to one or more embodiments of the present disclosure, [Example Fourteen] provides the apparatus of Example Thirteen, where the target matting model may be pre-trained through the following modules:
a first sample acquisition module, configured to acquire an original matting model to be trained, the first training samples, and the second training samples;
a target matting model obtaining module, configured to train the original matting model based on the plurality of groups of first training samples and the plurality of groups of second training samples to obtain the target matting model.
According to one or more embodiments of the present disclosure, [Example Fifteen] provides the apparatus of Example Fourteen, where the matting apparatus may further include:
a second sample acquisition module, configured to acquire a third training sample, where the third training sample includes a third sample image, a third background image corresponding to the third sample image, and a second transparent image corresponding to a third foreground image in the third sample image;
the model training module may include:
a first model training unit, configured to train the original matting model based on the plurality of groups of first training samples, the plurality of groups of second training samples, and the plurality of groups of third training samples.
According to one or more embodiments of the present disclosure, [Example Sixteen] provides the apparatus of Example Fifteen, where the foreground target in the second foreground image and the foreground target in the third foreground image each correspond to a first type, and the plurality of groups of first training samples include both first training samples whose first foreground image corresponds to the first type and first training samples whose first foreground image corresponds to a second type, the first type and the second type being different.
According to one or more embodiments of the present disclosure, [Example Seventeen] provides the apparatus of Example Fourteen, where the target matting model obtaining module may include:
The matting training unit is used for acquiring a matting loss function and performing matting training on the original matting model based on the matting loss function and a plurality of groups of first training samples;
The segmentation training unit is used for acquiring a segmentation loss function and carrying out segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of second training samples;
The target matting model obtaining unit is used for obtaining the target matting model.
According to one or more embodiments of the present disclosure, [Example Eighteen] provides the apparatus of Example Seventeen, where the matting training unit may include:
a predicted transparent image obtaining subunit, configured to, for each group of first training samples, input the first sample image and the first background image in the first training sample into the original matting model to obtain a predicted transparent image;
a first network parameter adjustment subunit, configured to adjust network parameters in the original matting model according to the predicted transparent image, the first transparent image in the first training sample, and the matting loss function;
the segmentation training unit may include:
a predicted segmentation map obtaining subunit, configured to, for each group of second training samples, input the second sample image and the second background image in the second training sample into the original matting model to obtain a predicted segmentation map;
a second network parameter adjustment subunit, configured to adjust the network parameters according to the predicted segmentation map, the annotated segmentation map in the second training sample, and the segmentation loss function;
the target matting model obtaining unit may be specifically configured to:
and obtaining the target matting model according to the adjustment result of the network parameters.
According to one or more embodiments of the present disclosure, [Example Nineteen] provides the apparatus of Example Seventeen, where the matting loss function includes at least one of a Laplacian loss function, an L1 loss function, an edge loss function, and a foreground target loss function. The edge loss function is constructed according to the predicted transparent image, the first transparent image, and a dilation-erosion result, where the dilation-erosion result is obtained by respectively dilating and eroding the first transparent image; the foreground target loss function is constructed according to the predicted transparent image, the first transparent image, and a dilation result, where the dilation result is obtained by dilating the first transparent image.
According to one or more embodiments of the present disclosure, [Example Twenty] provides the apparatus of Example Fourteen, where the matting apparatus may further include:
The data augmentation module is used for carrying out data augmentation on the first background image to obtain a fourth background image, and taking the first sample image, the fourth background image and the first transparent image as a group of fourth training samples;
the model training module may include:
a third model training unit, configured to train the original matting model based on the plurality of groups of first training samples, the plurality of groups of second training samples, and the plurality of groups of fourth training samples.
According to one or more embodiments of the present disclosure, [Example Twenty-One] provides the apparatus of Example Twenty, where the data augmentation module may include at least one of:
A rotation and translation unit for rotating and/or translating the first background image;
An environmental parameter adjustment unit, configured to adjust an environmental parameter of the first background image, where the environmental parameter includes at least one of a brightness parameter, a hue parameter, and a contrast parameter;
A blurring processing unit, configured to perform gaussian blurring and/or motion blurring on the first background image;
And the shadow artifact generating unit is used for generating shadows and/or artifacts in the first background image.
According to one or more embodiments of the present disclosure, [Example Twenty-Two] provides the apparatus of Example Fourteen, where the matting apparatus may further include:
the image reacquiring module is used for acquiring a second sample image;
And the second background image generation module is used for generating a second background image according to the pixel value of each pixel point in the second sample image.
According to one or more embodiments of the present disclosure, [Example Twenty-Three] provides the apparatus of Example Fourteen, where the matting apparatus may further include:
The first foreground image extraction module is used for acquiring an acquired image and a first transparent image, and extracting the first foreground image from the acquired image according to the first transparent image;
The first sample image obtaining module is used for obtaining a first background image, and synthesizing the first background image and the first foreground image to obtain a first sample image.
According to one or more embodiments of the present disclosure, [Example Twenty-Four] provides the apparatus of Example Fourteen, where the original matting model includes coding layers and decoding layers connected in sequence; wherein,
The coding layer comprises a first coding layer and a second coding layer, the first coding layer is used for coding the first sample image and the second sample image, and the second coding layer is used for coding the first background image and the second background image;
and/or,
the output channels of the decoding layers include a segmentation output channel, the segmentation output channel being used for outputting the predicted segmentation map obtained after the decoding layers process the second sample image and the second background image;
and/or,
for a current layer among the decoding layers, the input features of the current layer include a concatenation feature, where the concatenation feature is obtained by concatenating the output feature of the decoding layer immediately preceding the current layer, the intermediate output feature that has the same scale as that output feature among the intermediate output features of the coding layers, and the first sample image resized to that scale.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure involved herein is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (15)

1. A matting method, comprising:
acquiring a trained target matting model, a to-be-matted image and a prior background image corresponding to a target background image in the to-be-matted image;
inputting the to-be-matted image and the prior background image into the target matting model to obtain a target transparent image corresponding to a target foreground image in the to-be-matted image;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprising a first sample image, a first background image corresponding to a sample background image in the first sample image and a first transparent image corresponding to a first foreground image in the first sample image, and each second training sample comprising a second sample image, a second background image corresponding to the second sample image and an annotated segmentation map corresponding to a second foreground image in the second sample image.
2. A method as in claim 1 wherein the target matting model is pre-trained by:
acquiring an original matting model to be trained, the first training sample and the second training sample;
And training the original matting model based on a plurality of groups of the first training samples and a plurality of groups of the second training samples to obtain the target matting model.
3. The method as recited in claim 2, further comprising:
acquiring a third training sample, wherein the third training sample comprises a third sample image, a third background image corresponding to the third sample image and a second transparent image corresponding to a third foreground image in the third sample image;
the training the original matting model based on the plurality of groups of the first training samples and the plurality of groups of the second training samples includes:
And training the original matting model based on a plurality of groups of the first training samples, a plurality of groups of the second training samples and a plurality of groups of the third training samples.
4. The method according to claim 3, wherein a foreground target in the second foreground image and a foreground target in the third foreground image each correspond to a first type, and the plurality of groups of the first training samples comprise both first training samples whose first foreground image corresponds to the first type and first training samples whose first foreground image corresponds to a second type, the first type and the second type being different.
5. A method as in claim 2 wherein training the original matting model based on the plurality of sets of the first training samples and the plurality of sets of the second training samples to obtain the target matting model comprises:
acquiring a matting loss function, and performing matting training on the original matting model based on the matting loss function and a plurality of groups of first training samples;
obtaining a segmentation loss function, and carrying out segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of second training samples;
And obtaining the target matting model.
6. A method as in claim 5 wherein said matting training the original matting model based on the matting loss function and a plurality of sets of the first training samples comprises:
inputting, for each group of the first training samples, the first sample image and the first background image in the first training sample into the original matting model to obtain a predicted transparent image;
adjusting network parameters in the original matting model according to the predicted transparent image, the first transparent image in the first training sample and the matting loss function;
the performing segmentation training on the original matting model based on the segmentation loss function and a plurality of groups of the second training samples comprises the following steps:
inputting, for each group of the second training samples, the second sample image and the second background image in the second training sample into the original matting model to obtain a predicted segmentation map;
adjusting the network parameters according to the predicted segmentation map, the annotated segmentation map in the second training sample and the segmentation loss function;
the obtaining the target matting model comprises the following steps:
and obtaining the target matting model according to the adjustment result of the network parameters.
7. The method according to claim 5, wherein the matting loss function comprises at least one of a Laplacian loss function, an L1 loss function, an edge loss function and a foreground target loss function, the edge loss function being a loss function constructed according to a predicted transparent image, the first transparent image and a dilation-erosion result, the dilation-erosion result comprising results obtained by respectively dilating and eroding the first transparent image, and the foreground target loss function being a loss function constructed according to the predicted transparent image, the first transparent image and a dilation result, the dilation result being a result obtained by dilating the first transparent image.
8. The method as recited in claim 2, further comprising:
performing data augmentation on the first background image to obtain a fourth background image, and taking the first sample image, the fourth background image and the first transparent image as a group of fourth training samples;
the training the original matting model based on the plurality of groups of the first training samples and the plurality of groups of the second training samples includes:
And training the original matting model based on a plurality of groups of the first training samples, a plurality of groups of the second training samples and a plurality of groups of the fourth training samples.
9. The method of claim 8, wherein the data augmentation of the first background image comprises at least one of:
Rotating and/or translating the first background image;
adjusting an environmental parameter of the first background image, wherein the environmental parameter comprises at least one of a brightness parameter, a hue parameter and a contrast parameter;
Performing Gaussian blur and/or motion blur on the first background image;
shadows and/or artifacts are generated in the first background image.
10. The method as recited in claim 2, further comprising:
Acquiring the second sample image;
And generating the second background image according to the pixel value of each pixel point in the second sample image.
11. The method as recited in claim 2, further comprising:
in a case where the sample background image is the same as the first background image: acquiring a captured image and the first transparent image, and extracting the first foreground image from the captured image according to the first transparent image;
and acquiring the first background image, and synthesizing the first background image and the first foreground image to obtain the first sample image.
12. A method as in claim 2 wherein the original matting model comprises coding and decoding layers connected in sequence; wherein,
The encoding layers include a first encoding layer for encoding the first sample image and the second sample image, and a second encoding layer for encoding the first background image and the second background image;
and/or,
the output channels of the decoding layers comprise a segmentation output channel for outputting a predicted segmentation map obtained after the second sample image and the second background image are processed by the coding layers and the decoding layers;
and/or,
for a current layer among the decoding layers, input features of the current layer comprise a concatenation feature, wherein the concatenation feature is obtained by concatenating an output feature of the decoding layer immediately preceding the current layer, an intermediate output feature that has the same scale as that output feature among intermediate output features of the coding layers, and the first sample image resized to that scale.
13. A matting apparatus comprising:
an image acquisition module, configured to acquire a trained target matting model, a to-be-matted image and a prior background image corresponding to a target background image in the to-be-matted image;
an image matting module, configured to input the to-be-matted image and the prior background image into the target matting model to obtain a target transparent image corresponding to a target foreground image in the to-be-matted image;
wherein the target matting model is obtained through training on a plurality of groups of first training samples and a plurality of groups of second training samples, each first training sample comprising a first sample image, a first background image corresponding to a sample background image in the first sample image and a first transparent image corresponding to a first foreground image in the first sample image, and each second training sample comprising a second sample image, a second background image corresponding to the second sample image and an annotated segmentation map corresponding to a second foreground image in the second sample image.
14. An electronic device, comprising:
One or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the matting method according to any one of claims 1 to 12.
15. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a matting method according to any one of claims 1 to 12.