CN111784726A - Image matting method and device - Google Patents
- Publication number: CN111784726A
- Application number: CN201910912853.5A
- Authority
- CN
- China
- Prior art keywords
- image
- original image
- trimap
- sample
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Abstract
The embodiments of the application disclose a portrait matting method and device. One embodiment of the method comprises: acquiring an original image presenting a portrait; inputting the original image into a pre-trained trimap generation model to obtain a trimap of the original image; determining a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and intercepting a portrait image from the original image by using the mask of the original image. This embodiment decouples the trimap generation model from the matting model, which facilitates modifying or enhancing either model independently.
Description
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a portrait matting method and a portrait matting device.
Background
With the rapid development of smartphones and internet technologies, taking pictures with mobile phones and editing and sharing them have become an important part of people's lives. Portrait photos play an important role in everyday entertainment, social interaction, and sharing. A portrait photo generally refers to a photo in the form of a portrait plus a background. For such photos, there is a common need to separate the portrait from the background, i.e., portrait segmentation or, as it is commonly called, portrait matting.
Disclosure of Invention
The embodiment of the application provides a portrait matting method and device.
In a first aspect, an embodiment of the present application provides a method for portrait matting, including: acquiring an original image presenting a portrait; inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; determining a mask of the original image based on the original image, the trimap image of the original image and a pre-trained matting model; and intercepting a portrait image from the original image by using the mask of the original image.
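The four steps of the first aspect can be sketched end to end. The sketch below is illustrative only: `generate_trimap` and `matting_model` are toy per-pixel stand-ins for the pre-trained models the application describes, operating on small grayscale grids rather than real images.

```python
# Illustrative sketch of the claimed pipeline: image -> trimap -> mask -> portrait.
# The two "models" are simple per-pixel rules, NOT the trained networks of the
# application; they only show how the decoupled stages compose.

FG, BG, UNKNOWN = 255, 0, 128  # common trimap label convention (assumption)

def generate_trimap(image):
    """Stand-in trimap generation model: label confident pixels, leave the rest unknown."""
    return [[FG if px > 200 else BG if px < 50 else UNKNOWN for px in row]
            for row in image]

def matting_model(image, trimap):
    """Stand-in matting model: produce an alpha mask in [0, 1] per pixel."""
    return [[1.0 if t == FG else 0.0 if t == BG else px / 255.0
             for px, t in zip(irow, trow)]
            for irow, trow in zip(image, trimap)]

def matte_portrait(image):
    trimap = generate_trimap(image)       # trimap of the original image
    alpha = matting_model(image, trimap)  # mask of the original image
    # intercept the portrait: alpha-weighted pixels of the original image
    return [[a * px for a, px in zip(arow, irow)]
            for arow, irow in zip(alpha, image)]

image = [[230, 120, 10],
         [210, 128, 30]]
portrait = matte_portrait(image)
```

Because the trimap generator and the matting model are separate callables with a plain data interface between them, either one can be replaced or retrained without touching the other, which is the decoupling the application emphasizes.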
In some embodiments, determining a mask for the original image based on the original image, the trimap of the original image, and the pre-trained matting model comprises: correcting the trimap of the original image; and inputting the original image and the corrected trimap into the pre-trained matting model to obtain a mask of the original image.
In some embodiments, correcting the trimap of the original image comprises: determining at least one connected region of the foreground region in the trimap of the original image; determining, among the at least one connected region of the foreground region, a connected region whose area is smaller than a preset first area threshold as a first target connected region; and changing the first target connected region from the foreground region to the unknown region to obtain the corrected trimap.
In some embodiments, correcting the trimap of the original image comprises: determining at least one connected region of the background region in the trimap of the original image; determining, among the at least one connected region of the background region, a connected region whose area is smaller than a preset second area threshold as a second target connected region; and changing the second target connected region from the background region to the unknown region to obtain the corrected trimap.
In some embodiments, the trimap generation model is trained as follows: acquiring a first training sample set, wherein a first training sample comprises a first sample image and a first sample trimap, and the first sample trimap is generated by performing a morphological transformation on a mask corresponding to the first sample image; and taking the first sample image and the first sample trimap in the first training sample set as the input and the expected output of a first initial model, respectively, and training the first initial model by a machine learning method to obtain the trimap generation model.
In some embodiments, the matting model is trained as follows: acquiring a second training sample set, wherein a second training sample comprises a second sample image, a second sample trimap, and a sample mask, and the second sample image presents a portrait; and taking the second sample image and the second sample trimap in a second training sample of the second training sample set as the input of a second initial model, taking the sample mask corresponding to the input second sample image and second sample trimap as the expected output of the second initial model, and training the second initial model by a machine learning method to obtain the matting model.
In a second aspect, an embodiment of the present application provides a portrait matting device, including: an acquisition unit configured to acquire an original image presented with a portrait; the input unit is configured to input the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; a determining unit configured to determine a mask of the original image based on the original image, a trimap of the original image, and a pre-trained matting model; and the intercepting unit is configured to intercept the portrait image from the original image by using the mask of the original image.
In some embodiments, the determining unit is further configured to determine the mask of the original image based on the original image, the trimap of the original image, and the pre-trained matting model by: correcting the trimap of the original image; and inputting the original image and the corrected trimap into the pre-trained matting model to obtain a mask of the original image.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The portrait matting method and device provided by the above embodiments of the application first obtain an original image presenting a portrait; then input the original image into a pre-trained trimap generation model to obtain a trimap of the original image; then determine a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and finally intercept a portrait image from the original image by using the mask of the original image. In this way, the trimap generation model and the matting model are decoupled, which facilitates modifying or enhancing either model independently.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which various embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a portrait matting method according to the application;
FIG. 3 is a flow diagram of yet another embodiment of a portrait matting method according to the application;
FIG. 4 is a schematic diagram of one application scenario of a portrait matting method according to the application;
FIG. 5 is a schematic diagram of the structure of one embodiment of a portrait matting device according to the application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates an exemplary system architecture 100 to which the image matting method or apparatus of the present application can be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., the terminal devices 101, 102, 103 may send captured original images presenting a portrait to the server 105; the server 105 may also send pre-trained trimap generation models and pre-trained matting models to the terminal devices 101, 102, 103). Various communication client applications, such as an image processing application, a camera application, a face recognition application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal device 101, 102, 103 may first obtain an original image presenting a portrait; then, inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; then, determining a mask of the original image based on the original image, the trimap image of the original image and a pre-trained matting model; finally, the mask of the original image can be used to cut out the portrait image from the original image.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a camera and supporting information interaction, including but not limited to smart phones, tablet computers, laptop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services. For example, an original image presented with a portrait may be analyzed to intercept the portrait image from the original image. The server 105 may first obtain an original image presenting a portrait; then, inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; then, determining a mask of the original image based on the original image, the trimap image of the original image and a pre-trained matting model; finally, the mask of the original image can be used to cut out the portrait image from the original image.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the image matting method provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105.
It should be further noted that the terminal devices 101, 102, 103 may locally store a pre-trained trimap generation model and a pre-trained matting model, and the terminal devices 101, 102, 103 may determine the trimap of the original image and the mask of the original image, so as to intercept the portrait image from the original image. The exemplary system architecture 100 may not have a network 104 and server 105 in this case.
It should be noted that the server 105 may locally store an original image presenting a portrait and may acquire that original image locally. The exemplary system architecture 100 may not have terminal devices 101, 102, 103 and network 104 in this case.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of image matting according to the application is shown. The image matting method comprises the following steps:
In this embodiment, an execution body of the portrait matting method (e.g., the server shown in FIG. 1) can acquire an original image presenting a portrait. A portrait is a planar or stereoscopic depiction of a person as a whole, typically including facial features such as the eyes, nose, mouth, ears, and eyebrows. The term may also describe a person's full-body appearance, i.e., a rough outline of the person.
In this embodiment, the execution body may input the original image acquired in step 201 into a pre-trained trimap generation model to obtain a trimap of the original image. The trimap may indicate the foreground region, the background region, and the unknown (uncertain) region of the original image. Here, the trimap generation model may be used to characterize the correspondence between an original image and its trimap; that is, the trimap generation model may generate the trimap of the original image based on the original image. Such a trimap generation model may be trained in various ways.
As an example, the correspondence table in which correspondence between a plurality of sample images and trimap images corresponding to the sample images is stored may be generated by a technician based on statistics of a large number of sample images and trimap images corresponding to the sample images (for example, may be a trimap image drawn by the technician based on a mask corresponding to the sample images), and the correspondence table may be used as a trimap image generation model.
And step 203, determining a mask of the original image based on the original image, the trimap image of the original image and the pre-trained matting model.
In this embodiment, the execution body may determine the mask of the original image based on the original image acquired in step 201, the trimap of the original image obtained in step 202, and the pre-trained matting model. Note that a mask separates the region to be kept from the region outside it; the mask here generally refers to a mask with an alpha channel, where the alpha channel encodes the per-pixel transparency (and translucency) of the picture. Specifically, the execution body may input the original image and the trimap of the original image into the pre-trained matting model to obtain the mask of the original image.
Here, the above-mentioned matting model may be used to characterize the correspondence between the pair (original image, trimap of the original image) and the mask of the original image; that is, the matting model may generate the mask of the original image based on both the original image and its trimap. Such a matting model may be trained in various ways.
As an example, a correspondence table storing correspondence between a plurality of sample images and trimap images of the sample images and masks of the sample images, which is generated by a technician based on statistics of a large number of sample images, trimap images of the sample images, and masks of the sample images, may be used as the matting model.
And step 204, intercepting a portrait image from the original image by using the mask of the original image.
In this embodiment, the execution body may intercept the portrait image from the original image by using the mask of the original image determined in step 203. Since the mask marks which pixels lie outside the portrait, the execution body may determine the region inside the masked boundary as the portrait image, thereby intercepting the portrait image from the original image.
In some optional implementations of this embodiment, the trimap generation model may be obtained through training by the execution body, or by another execution body used for training the trimap generation model, in the following manner:
at step S1, a first set of training samples may be obtained.
Here, a first training sample in the first training sample set may include a first sample image and a first sample trimap. The first sample trimap may be generated by performing a morphological transformation on a mask corresponding to the first sample image. The mask here generally refers to a mask with an alpha channel, where the alpha channel encodes the per-pixel transparency (and translucency) of the picture.
The core problem of the matting technique is to solve the following formula (1):

I_p = α_p · F_p + (1 - α_p) · B_p        (1)

where I_p is the pixel value of the p-th pixel of the image (a known quantity), α_p is the transparency of the p-th pixel, F_p is the foreground pixel value of the p-th pixel, and B_p is the background pixel value of the p-th pixel. For the understanding of this formula, the original image can be viewed as foreground and background blended according to a certain weight (the transparency α_p): α_p = 1 for pixels that are certainly foreground, α_p = 0 for pixels that are certainly background, and α_p between 0 and 1 for pixels that are neither (the unknown region).
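The compositing formula (1) can be checked numerically; the pixel values below are illustrative, not drawn from the application.

```python
# Numeric check of compositing equation (1): I_p = a_p*F_p + (1 - a_p)*B_p,
# with illustrative foreground/background values for a single pixel.
def composite(alpha, fg, bg):
    return alpha * fg + (1 - alpha) * bg

assert composite(1.0, 200, 40) == 200    # certainly foreground: I_p = F_p
assert composite(0.0, 200, 40) == 40     # certainly background: I_p = B_p
assert composite(0.25, 200, 40) == 80.0  # unknown region: a blend of the two
```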
Here, the mask corresponding to an image may be morphologically transformed to obtain a trimap of the image as follows: the mask may be subjected to a dilation (dilate) transform and an erosion (erode) transform. Note that dilation and erosion conventionally act on the foreground region (the white, highlighted portion). Dilation expands the foreground region in the image, and the band added by the dilation operation is taken as part of the unknown region; it is easy to see that the background region (the black portion) naturally shrinks after the dilation operation. Erosion shrinks the foreground region in the image, and the band removed by the erosion operation is taken as part of the unknown region; likewise, the foreground region (the white portion) naturally shrinks after the erosion operation.
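The dilate/erode construction described above can be sketched on a toy binary mask. The pure-Python 3×3 morphology below is an illustrative stand-in for a real image-processing library; the label values 255/128/0 for foreground/unknown/background are a common convention, not mandated by the application.

```python
# Sketch: trimap from a binary mask via erosion and dilation, as described
# above. Pixels surviving erosion -> foreground (255); pixels outside the
# dilated mask -> background (0); the band in between -> unknown (128).

def neighbors(mask, r, c):
    """Values in the 3x3 neighborhood of (r, c), clipped at the borders."""
    h, w = len(mask), len(mask[0])
    return [mask[i][j]
            for i in range(max(0, r - 1), min(h, r + 2))
            for j in range(max(0, c - 1), min(w, c + 2))]

def dilate(mask):
    return [[1 if any(neighbors(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def erode(mask):
    return [[1 if all(neighbors(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def trimap_from_mask(mask):
    fg, region = erode(mask), dilate(mask)
    return [[255 if fg[r][c] else 128 if region[r][c] else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

# Toy mask: a 3x3 foreground block centered in a 7x7 grid.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1

tri = trimap_from_mask(mask)
```

Running with a larger structuring element (or repeated passes) widens the unknown band, which is exactly the effect of varying the dilation and erosion parameters discussed in the next paragraph.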
If different parameters (dilation parameter and erosion parameter) are used in the morphological transformation, the resulting trimaps differ. To improve the quality of the trimaps generated by the trained trimap generation model in application, each first sample trimap is usually generated by morphologically transforming the mask corresponding to the first sample image with different parameters.
In step S2, the first sample image and the first sample trimap in the first training sample of the first training sample set may be used as the input and the expected output of the first initial model, respectively, and the first initial model is trained by using a machine learning method to obtain a trimap generation model.
Here, the first sample image in a first training sample of the first training sample set may be input into the first initial model to obtain a trimap of the first sample image, and the first initial model may be trained by a machine learning method with the first sample trimap in that training sample as the expected output. Specifically, the difference between the obtained trimap and the first sample trimap may first be calculated using a preset loss function; a standard cross-entropy loss function may be adopted for this purpose. Then, the network parameters of the first initial model may be adjusted based on the calculated difference, and training ends when a preset training-end condition is satisfied. For example, the preset training-end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The first initial model may be a convolutional neural network, a deep neural network, or the like.
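The standard cross-entropy loss mentioned above can be illustrated per pixel over the three trimap classes; the predicted probabilities below are stand-ins for a model's softmax output, not real network outputs.

```python
import math

# Minimal sketch of the cross-entropy loss for trimap prediction, computed
# per pixel over three classes (foreground / background / unknown). Values
# are illustrative stand-ins for a model's softmax output.

def cross_entropy(pred_probs, target_class):
    """-log p(target) for one pixel; pred_probs is a distribution over classes."""
    return -math.log(pred_probs[target_class])

def mean_cross_entropy(pred, target):
    """Average loss over all pixels of a predicted trimap."""
    losses = [cross_entropy(p, t) for p, t in zip(pred, target)]
    return sum(losses) / len(losses)

# Two pixels: the model is confident and right on the first, unsure on the second.
pred = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
target = [0, 2]  # 0 = foreground, 1 = background, 2 = unknown
loss = mean_cross_entropy(pred, target)
```

During training, the network parameters are adjusted to drive this quantity down, which pushes the predicted class probabilities toward the first sample trimap.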
Here, the trained trimap generation model may be based on DeepLab V3+ with a ResNet-50 backbone, and may include an encoder that down-samples the input image by a factor of 16 and a decoder that up-samples by a factor of 4. DeepLab V3+ can adjust the filter's field of view (via atrous convolution) and thereby control the resolution of the feature responses computed by the convolutional neural network.
In some optional implementations of this embodiment, the matting model may be obtained through training by the execution body, or by another execution body used for training the matting model, in the following manner:
in step S1, a second set of training samples may be obtained.
Here, a second training sample in the second training sample set may include a second sample image, a second sample trimap, and a sample mask. The second sample image typically presents a portrait. The second sample trimap may be generated by performing a morphological transformation on the mask corresponding to the second sample image. Here, when morphologically transforming that mask, the parameters applied in the transformation are generally set so as to increase the extent of the transformation.
In step S2, the matting model may be obtained by training the second initial model with a machine learning method, taking the second sample image and the second sample trimap in a second training sample of the second training sample set as the input of the second initial model, and taking the sample mask corresponding to the input second sample image and second sample trimap as the expected output of the second initial model.
Here, the second initial model may be trained by a machine learning method by inputting the second sample image and the second sample trimap of a second training sample into the second initial model to obtain a mask of the second sample image, and using the sample mask in that training sample as the expected output. Specifically, the difference between the obtained mask and the sample mask may first be calculated using a preset loss function; a regression loss function may be adopted for this purpose. Then, the network parameters of the second initial model may be adjusted based on the calculated difference, and training ends when a preset training-end condition is satisfied. For example, the preset training-end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The second initial model may be a convolutional neural network, a deep neural network, or the like.
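The application names only "a regression loss function" for the matting model; mean absolute error between the predicted alpha mask and the sample mask, shown below, is one illustrative choice, and the toy masks are not real data.

```python
# Sketch of a regression loss for the matting model: mean absolute difference
# between the predicted alpha mask and the sample mask. The application only
# specifies "a regression loss function"; MAE is an illustrative choice here.

def mae_loss(pred_alpha, sample_alpha):
    flat = [(p, t) for prow, trow in zip(pred_alpha, sample_alpha)
            for p, t in zip(prow, trow)]
    return sum(abs(p - t) for p, t in flat) / len(flat)

pred   = [[1.0, 0.6], [0.2, 0.0]]  # predicted alpha mask (toy 2x2)
sample = [[1.0, 0.5], [0.0, 0.0]]  # sample mask (expected output)
loss = mae_loss(pred, sample)
```

Unlike the per-class cross-entropy used for the trimap model, a regression loss treats alpha as a continuous value in [0, 1], matching the semi-transparent pixels of the unknown region.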
In the method provided by the embodiments of the application, the original image is input into a pre-trained trimap generation model to obtain a trimap of the original image, and a mask of the original image is determined based on the original image, the trimap of the original image, and a pre-trained matting model. In this way, the trimap generation model and the matting model are decoupled, which facilitates modifying or enhancing either model independently.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method of image matting is shown. The process 300 of the image matting method includes the following steps:
In the present embodiment, the steps 301-302 can be performed in a similar manner to the steps 201-202, and will not be described herein again.
In this embodiment, the execution body may correct the trimap of the original image obtained in step 302. For example, if the execution body is a terminal device, it may present the original image together with its trimap, and a user may correct the trimap presented by the terminal device against the original image. If the execution body is a server, it may send the original image and its trimap to a target terminal device (the terminal device of the person correcting the trimap); the target terminal device may then present the original image together with its trimap, and a user may correct the trimap presented by the target terminal device against the original image; finally, the target terminal device may send the corrected trimap of the original image back to the execution body.
Step 304: input the original image and the corrected trimap into a pre-trained matting model to obtain a mask of the original image.
In this embodiment, the execution subject may input the original image acquired in step 301 and the corrected trimap obtained in step 303 into a pre-trained matting model to obtain a mask of the original image.
Here, the above-mentioned matting model may be used to characterize the correspondence between the pair (original image, trimap of the original image) and the mask of the original image; that is, the matting model can generate the mask of the original image from both the original image and its trimap. A matting model that represents this correspondence can be trained in various ways.
As an example, a correspondence table may serve as the matting model: a technician, based on statistics over a large number of sample images, trimaps of those sample images, and masks of those sample images, generates a table storing the correspondence between each (sample image, trimap) pair and its mask.
In this embodiment, step 305 may be performed in a similar manner as step 204, and is not described herein again.
In some optional implementations of the embodiment, the execution subject may correct the trimap of the original image as follows. First, it may determine at least one connected region of the foreground region in the trimap of the original image. In an image, the smallest unit is a pixel, and each pixel has 8 neighboring pixels around it. Two adjacency relations are commonly used: 4-adjacency and 8-adjacency. 4-adjacency counts only the 4 neighbors above, below, to the left, and to the right; 8-adjacency additionally counts the 4 diagonal neighbors, for 8 in total. If pixels A and B are adjacent, A and B are said to be connected; stated without proof, connectivity is transitive: if A is connected to B and B is connected to C, then A is connected to C. Intuitively, mutually connected points form one region, while unconnected points fall into different regions; such a set of mutually connected points is called a connected region. Here, the adjacency relation used for pixels in a connected region is usually 8-adjacency. Then, for each connected region among the at least one connected region of the foreground region, the execution subject may determine its area, and take any connected region whose area is smaller than a preset first area threshold as a first target connected region. Finally, the execution subject may change the region attribute of each first target connected region in the trimap from foreground to unknown, obtaining a corrected trimap.
In some optional implementations of the embodiment, the execution subject may correct the trimap of the original image as follows. First, it may determine at least one connected region of the background region in the trimap of the original image. Here, the adjacency relation used for pixels in a connected region is usually 8-adjacency. Then, for each of the at least one connected region of the background region, the execution subject may determine its area, and take any connected region whose area is smaller than a preset second area threshold as a second target connected region. Finally, the execution subject may change the region attribute of each second target connected region in the trimap from background to unknown, obtaining a corrected trimap.
It should be noted that the execution subject may apply both corrections: changing first target connected regions in the trimap of the original image from foreground to unknown, and changing second target connected regions from background to unknown, so as to obtain the corrected trimap.
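The correction procedure described above can be sketched in plain Python with numpy (a minimal illustration, not the patent's implementation; the trimap codes 255 = foreground, 128 = unknown, 0 = background are an assumed convention, and the 8-adjacency labeling is written out directly for clarity):

```python
import numpy as np

FG, BG, UNKNOWN = 255, 0, 128  # assumed trimap region codes


def _connected_regions(mask):
    """Yield each 8-connected region of True cells as a list of (row, col) pixels."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    for r0 in range(h):
        for c0 in range(w):
            if mask[r0, c0] and not seen[r0, c0]:
                stack, region = [(r0, c0)], []
                seen[r0, c0] = True
                while stack:
                    r, c = stack.pop()
                    region.append((r, c))
                    for dr in (-1, 0, 1):          # 8-adjacency: all neighbors,
                        for dc in (-1, 0, 1):      # including diagonals
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < h and 0 <= cc < w \
                                    and mask[rr, cc] and not seen[rr, cc]:
                                seen[rr, cc] = True
                                stack.append((rr, cc))
                yield region


def correct_trimap(trimap, fg_area_threshold, bg_area_threshold):
    """Relabel small foreground/background connected regions as unknown."""
    out = trimap.copy()
    for value, threshold in ((FG, fg_area_threshold), (BG, bg_area_threshold)):
        for region in _connected_regions(trimap == value):
            if len(region) < threshold:            # area below the preset threshold
                for r, c in region:
                    out[r, c] = UNKNOWN
    return out
```

In practice a library routine such as OpenCV's `connectedComponentsWithStats` would replace the hand-rolled labeling; the two thresholds correspond to the preset first and second area thresholds in the text.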
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the image matting method in this embodiment highlights step 303, correcting the trimap of the original image, and step 304, inputting the original image and the corrected trimap into a pre-trained matting model to obtain the mask of the original image. The scheme described in this embodiment can therefore correct the generated trimap, improving the accuracy of the matting result.
With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the image matting method according to this embodiment. In the application scenario of fig. 4, the execution subject of the portrait matting method (e.g., a server or a terminal device) may first acquire an original image 401 presenting a portrait. The execution subject may then input the original image 401 into the pre-trained trimap generation model 402 to obtain the trimap 403 of the original image 401, and correct the trimap 403 to obtain a corrected trimap 404. Next, the execution subject may input the original image 401 and the corrected trimap 404 into a pre-trained matting model 405 to obtain a mask 406 of the original image 401. Finally, the execution subject may use the mask 406 to cut a portrait image 407 out of the original image 401.
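The last step of the scenario — cutting the portrait image 407 out of the original image 401 with the mask 406 — amounts to alpha-compositing the image over a background. A minimal numpy sketch (the helper name and the white default background are illustrative assumptions; the mask is assumed to be an alpha matte scaled to [0, 1]):

```python
import numpy as np


def cut_out_portrait(image, alpha, background=None):
    """Composite the masked portrait over a background (white by default)."""
    alpha = alpha[..., None].astype(np.float64)   # (H, W) -> (H, W, 1) for broadcasting
    if background is None:
        background = np.full_like(image, 255, dtype=np.uint8)  # plain white backdrop
    out = alpha * image + (1.0 - alpha) * background
    return out.astype(np.uint8)
```

Fully opaque pixels (alpha = 1) keep the original image, fully transparent ones (alpha = 0) show the background, and fractional alpha values blend the two, which is what preserves soft edges such as hair.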
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a portrait matting device, which corresponds to the method embodiment shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the image matting device 500 of the present embodiment includes: an acquisition unit 501, an input unit 502, a determination unit 503, and a clipping unit 504. Wherein the obtaining unit 501 is configured to obtain an original image presented with a portrait; the input unit 502 is configured to input an original image into a pre-trained trimap image generation model, resulting in a trimap image of the original image; the determination unit 503 is configured to determine a mask of the original image based on the original image, the trimap of the original image, and the pre-trained matting model; the clipping unit 504 is configured to clip the portrait image from the original image using the mask of the original image.
In the present embodiment, for the specific processing of the acquisition unit 501, the input unit 502, the determination unit 503, and the clipping unit 504 of the image matting device 500, refer to step 201, step 202, step 203, and step 204 in the embodiment corresponding to fig. 2.
In some optional implementations of the present embodiment, the determining unit 503 may correct the trimap of the original image. For example, if the execution subject is a terminal device, the determining unit 503 may present the original image and the trimap of the original image, and a user may correct the presented trimap. If the execution subject is a server, the determining unit 503 may send the original image and the trimap of the original image to a target terminal device (the terminal device of the person who corrects the trimap); the target terminal device then presents the original image and the trimap, a user corrects the presented trimap, and finally the target terminal device sends the corrected trimap back to the execution subject. The determining unit 503 may then input the original image and the corrected trimap into a pre-trained matting model to obtain a mask of the original image. Here, the matting model may be used to characterize the correspondence between the pair (original image, trimap of the original image) and the mask of the original image; that is, the matting model can generate the mask of the original image from both the original image and its trimap. A matting model that represents this correspondence can be trained in various ways.
As an example, a correspondence table may serve as the matting model: a technician, based on statistics over a large number of sample images, trimaps of those sample images, and masks of those sample images, generates a table storing the correspondence between each (sample image, trimap) pair and its mask.
In some optional implementations of the present embodiment, the determining unit 503 may correct the trimap of the original image as follows. First, it may determine at least one connected region of the foreground region in the trimap of the original image. In an image, the smallest unit is a pixel, and each pixel has 8 neighboring pixels around it. Two adjacency relations are commonly used: 4-adjacency and 8-adjacency. 4-adjacency counts only the 4 neighbors above, below, to the left, and to the right; 8-adjacency additionally counts the 4 diagonal neighbors, for 8 in total. If pixels A and B are adjacent, A and B are said to be connected; stated without proof, connectivity is transitive: if A is connected to B and B is connected to C, then A is connected to C. Intuitively, mutually connected points form one region, while unconnected points fall into different regions; such a set of mutually connected points is called a connected region. Here, the adjacency relation used for pixels in a connected region is usually 8-adjacency. Then, for each connected region among the at least one connected region of the foreground region, the determining unit 503 may determine its area, and take any connected region whose area is smaller than a preset first area threshold as a first target connected region. Finally, the determining unit 503 may change the region attribute of each first target connected region in the trimap from foreground to unknown, obtaining a corrected trimap.
In some optional implementations of the present embodiment, the determining unit 503 may correct the trimap of the original image as follows. First, it may determine at least one connected region of the background region in the trimap of the original image. Here, the adjacency relation used for pixels in a connected region is usually 8-adjacency. Then, for each of the at least one connected region of the background region, the determining unit 503 may determine its area, and take any connected region whose area is smaller than a preset second area threshold as a second target connected region. Finally, the determining unit 503 may change the region attribute of each second target connected region in the trimap from background to unknown, obtaining a corrected trimap.
In some optional implementations of this embodiment, the trimap generation model may be obtained by training, performed by the execution subject or by another execution subject used for training the trimap generation model, in the following manner:
at step S1, a first set of training samples may be obtained.
Here, a first training sample in the first training sample set may include a first sample image and a first sample trimap. The first sample trimap may be generated by performing a morphological transformation on the mask corresponding to the first sample image. The mask here generally refers to a mask with an alpha (transparency) channel: the alpha channel encodes, per pixel, how transparent or translucent the picture is.
The core problem of the matting technique is to solve the following formula (1):

I_p = α_p · F_p + (1 − α_p) · B_p    (1)

where I_p is the pixel value of the p-th pixel of the image (a known quantity), α_p is the transparency of the p-th pixel, F_p is the foreground pixel value of the p-th pixel, and B_p is the background pixel value of the p-th pixel. To understand this formula, the original image can be viewed as foreground and background blended according to the weight α_p (the transparency): α = 1 for pixels that are definitely foreground, α = 0 for pixels that are definitely background, and 0 < α < 1 for pixels that cannot be definitely assigned to foreground or background (the unknown region).
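Formula (1) can be checked numerically: blending a foreground pixel and a background pixel with weight α reproduces the observed pixel, and when F and B are known (and differ), α can be recovered by inverting the formula. A small numpy illustration with arbitrary pixel values:

```python
import numpy as np

# Formula (1): I_p = alpha_p * F_p + (1 - alpha_p) * B_p
foreground = np.array([200.0, 200.0, 200.0])  # F_p, an arbitrary RGB foreground pixel
background = np.array([50.0, 50.0, 50.0])     # B_p, an arbitrary RGB background pixel
alpha = 0.75                                  # alpha_p, partial transparency (unknown region)

composite = alpha * foreground + (1.0 - alpha) * background  # I_p, the observed pixel

# Inverting (1) recovers alpha when F_p and B_p are known and F_p != B_p:
recovered_alpha = (composite - background) / (foreground - background)
```

With the values above, the composite pixel is 162.5 per channel and the recovered alpha is 0.75; the hard part of matting in practice is that F_p and B_p are not known for pixels in the unknown region.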
Here, the mask corresponding to an image may be morphologically transformed into a trimap of the image as follows: a dilation transformation and an erosion transformation may be applied to the mask corresponding to the image. It should be noted that dilation and erosion are usually applied to the foreground region. Dilation expands the foreground region in the image, and the band added by the dilation operation is taken as part of the unknown region; it is easy to see that the background region naturally shrinks after dilation. Erosion shrinks the foreground region in the image, and the band removed by the erosion operation is taken as part of the unknown region; it is easy to see that the foreground region naturally shrinks after erosion.
In the morphological transformation process, different parameters yield different trimaps. To improve the quality of the trimaps produced by the trained trimap generation model at inference time, each first sample trimap is usually generated by applying morphological transformations with different parameters to the mask corresponding to the first sample image.
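Generating a sample trimap from a mask by dilation and erosion, with the transformation radii as the varied parameters, can be sketched as follows (plain-numpy morphology with a square structuring element; a real pipeline would more likely use OpenCV's `dilate`/`erode`, and the trimap codes 255/128/0 are an assumed convention):

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2*radius+1)^2 square structuring element."""
    padded = np.pad(mask, radius, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + h, dc:dc + w]   # union of all shifted copies
    return out


def erode(mask, radius):
    """Binary erosion with a (2*radius+1)^2 square structuring element."""
    padded = np.pad(mask, radius, constant_values=True)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out &= padded[dr:dr + h, dc:dc + w]   # intersection of all shifted copies
    return out


def mask_to_trimap(mask, dilate_radius, erode_radius):
    """Trimap from a binary foreground mask; different radii give different trimaps."""
    definite_fg = erode(mask, erode_radius)       # survives erosion -> foreground
    maybe_fg = dilate(mask, dilate_radius)        # outside the dilation -> background
    trimap = np.full(mask.shape, 128, dtype=np.uint8)  # everything else is unknown
    trimap[definite_fg] = 255
    trimap[~maybe_fg] = 0
    return trimap
```

Calling `mask_to_trimap` with several different radii on the same mask realizes the varied-parameter generation described above.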
In step S2, the first sample image and the first sample trimap in the first training sample of the first training sample set may be used as the input and the expected output of the first initial model, respectively, and the first initial model is trained by using a machine learning method to obtain a trimap generation model.
Here, the first sample image in a first training sample of the first training sample set may be input into the first initial model to obtain a trimap of the first sample image, and the first initial model may be trained by a machine learning method with the first sample trimap in that training sample as the expected output. Specifically, the difference between the obtained trimap and the first sample trimap in the first training sample may first be calculated using a preset loss function; for example, a standard cross-entropy loss function may be used. Then, the network parameters of the first initial model may be adjusted based on the calculated difference, and the training ends once a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; the calculated difference is less than a preset difference threshold. The first initial model may be a convolutional neural network, a deep neural network, or the like.
Here, the trained trimap generation model may be a structure based on DeepLab V3+ with a ResNet-50 backbone, which may include an encoder that downsamples the input image by a factor of 16 and a decoder that upsamples by a factor of 4. DeepLab V3+ uses atrous (dilated) convolution to adjust the receptive field of the filters and to control the resolution of the feature responses computed by the convolutional neural network.
In some optional implementations of this embodiment, the matting model may be obtained by training, performed by the execution subject or by another execution subject used for training the matting model, in the following manner:
in step S1, a second set of training samples may be obtained.
Here, a second training sample in the second training sample set may include a second sample image, a second sample trimap, and a sample mask. The second sample image typically presents a portrait. The second sample trimap may be generated by performing a morphological transformation on the mask corresponding to the second sample image. Here, when morphologically transforming the mask corresponding to the second sample image, the parameters of the transformation are generally set so as to enlarge the range of the morphological transformation.
In step S2, the matting model may be obtained by taking the second sample image and the second sample trimap in a second training sample of the second training sample set as the inputs of a second initial model, taking the sample mask corresponding to those inputs as the expected output of the second initial model, and training the second initial model by a machine learning method.
Here, the second initial model may be trained by a machine learning method: the second sample image and the second sample trimap in a second training sample of the second training sample set are input into the second initial model to obtain a mask of the second sample image, and the sample mask in that training sample serves as the expected output of the second initial model. Specifically, the difference between the obtained mask and the sample mask in the second training sample may first be calculated using a preset loss function; for example, a regression loss function may be used as the loss function. Then, the network parameters of the second initial model may be adjusted based on the calculated difference, and the training ends once a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; the calculated difference is less than a preset difference threshold. The second initial model may be a convolutional neural network, a deep neural network, or the like.
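As a sketch of the regression loss: the patent does not fix its exact form, so mean absolute error between the predicted and sample alpha masks — restricted here to the trimap's unknown region, a common choice in matting work — is used purely for illustration (the 128 code for "unknown" is an assumed convention):

```python
import numpy as np


def alpha_regression_loss(pred_alpha, true_alpha, trimap, unknown_value=128):
    """Mean absolute error over pixels in the trimap's unknown region.

    Foreground/background pixels are already decided by the trimap, so only
    the unknown region contributes to this illustrative regression loss.
    """
    unknown = (trimap == unknown_value)
    if not unknown.any():
        return 0.0
    return float(np.abs(pred_alpha[unknown] - true_alpha[unknown]).mean())
```

Frameworks offer equivalent built-ins (e.g. an L1 loss applied under a mask); squared error or a combination of losses would also fit the patent's generic "regression loss function".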
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an original image presenting a portrait; inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; determining a mask of the original image based on the original image, the trimap image of the original image and a pre-trained matting model; and intercepting a portrait image from the original image by using the mask of the original image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an input unit, a determination unit, and an interception unit. Where the names of the cells do not in some cases constitute a limitation of the cells themselves, for example, the acquisition cell may also be described as a "cell that acquires an original image presented with a portrait".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept defined above — for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.
Claims (10)
1. A method of portrait matting, comprising:
acquiring an original image presenting a portrait;
inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image;
determining a mask of the original image based on the original image, the trimap image of the original image and a pre-trained matting model;
and intercepting a portrait image from the original image by using the mask of the original image.
2. The method of claim 1, wherein the determining a mask for the original image based on the original image, the trimap of the original image, and a pre-trained matting model comprises:
correcting the trimap image of the original image;
and inputting the original image and the corrected trimap into a pre-trained matting model to obtain a mask of the original image.
3. The method of claim 2, wherein the modifying the trimap image of the original image comprises:
determining at least one connected region of a foreground region in a trimap image of the original image;
determining a connected region with a region area smaller than a preset first area threshold value in at least one connected region of the foreground region as a first target connected region;
and changing the first target connected region from a foreground region to an unknown region to obtain a corrected trimap image.
4. The method of claim 2, wherein the modifying the trimap image of the original image comprises:
determining at least one connected region of a background region in the trimap image of the original image;
determining a connected region with a region area smaller than a preset second area threshold value in at least one connected region of the background region as a second target connected region;
and changing the second target connected region from a background region to an unknown region to obtain a corrected trimap image.
5. The method of claim 1, wherein the trimap generation model is trained by:
acquiring a first training sample set, wherein a first training sample comprises a first sample image and a first sample trimap, and the first sample trimap is generated by performing morphological transformation on a mask corresponding to the first sample image;
and respectively taking the first sample image and the first sample trimap image in the first training sample set as the input and the expected output of the first initial model, and training the first initial model by using a machine learning method to obtain a trimap image generation model.
6. The method of claim 1, wherein the matting model is trained by:
acquiring a second training sample set, wherein a second training sample comprises a second sample image, a second sample trimap, and a sample mask, and a portrait is presented in the second sample image;
and taking a second sample image and a second sample trimap in a second training sample in the second training sample set as the input of a second initial model, taking a sample mask corresponding to the input second sample image and second sample trimap as the expected output of the second initial model, and training the second initial model by using a machine learning method to obtain a matting model.
7. A portrait matting device comprising:
an acquisition unit configured to acquire an original image presented with a portrait;
the input unit is configured to input the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image;
a determining unit configured to determine a mask of the original image based on the original image, a trimap of the original image, and a pre-trained matting model;
and the intercepting unit is configured to intercept a portrait image from the original image by using the mask of the original image.
8. The apparatus of claim 7, wherein the determining unit is further configured to determine the mask of the original image based on the original image, the trimap of the original image, and the pre-trained matting model by:
correcting the trimap of the original image;
and inputting the original image and the corrected trimap into the pre-trained matting model to obtain the mask of the original image.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910912853.5A CN111784726A (en) | 2019-09-25 | 2019-09-25 | Image matting method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111784726A true CN111784726A (en) | 2020-10-16 |
Family
ID=72755044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910912853.5A Pending CN111784726A (en) | 2019-09-25 | 2019-09-25 | Image matting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111784726A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577875A (en) * | 2013-11-20 | 2014-02-12 | 北京联合大学 | CAD (computer-aided design) people counting method based on FAST (features from accelerated segment test) |
CN103996203A (en) * | 2014-06-13 | 2014-08-20 | 北京锐安科技有限公司 | Method and device for detecting whether face in image is sheltered |
CN104134234A (en) * | 2014-07-16 | 2014-11-05 | 中国科学技术大学 | Full-automatic three-dimensional scene construction method based on single image |
CN107452010A (en) * | 2017-07-31 | 2017-12-08 | 中国科学院长春光学精密机械与物理研究所 | A kind of automatically stingy nomography and device |
CN108961279A (en) * | 2018-06-28 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Image processing method, device and mobile terminal |
CN109241973A (en) * | 2018-08-21 | 2019-01-18 | 南京工程学院 | A kind of full-automatic soft dividing method of character under grain background |
CN109461167A (en) * | 2018-11-02 | 2019-03-12 | Oppo广东移动通信有限公司 | The training method of image processing model scratches drawing method, device, medium and terminal |
CN109543595A (en) * | 2018-11-19 | 2019-03-29 | 上海交通大学 | The training method and detection method of the electric wire of convolutional neural networks are separated based on depth |
CN109712145A (en) * | 2018-11-28 | 2019-05-03 | 山东师范大学 | A kind of image matting method and system |
CN110148102A (en) * | 2018-02-12 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Image composition method, ad material synthetic method and device |
History
- 2019-09-25: Application CN201910912853.5A filed (publication CN111784726A); legal status: Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541927A (en) * | 2020-12-18 | 2021-03-23 | Oppo广东移动通信有限公司 | Method, device, equipment and storage medium for training and matting model |
WO2022127454A1 (en) * | 2020-12-18 | 2022-06-23 | Oppo广东移动通信有限公司 | Method and device for training cutout model and for cutout, equipment, and storage medium |
CN113469929A (en) * | 2021-09-03 | 2021-10-01 | 北京美摄网络科技有限公司 | Training data generation method and device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Title | |
---|---|---|
CN109816589B (en) | Method and apparatus for generating cartoon style conversion model | |
CN111369427A (en) | Image processing method, image processing device, readable medium and electronic equipment | |
CN109829432B (en) | Method and apparatus for generating information | |
CN110516678B (en) | Image processing method and device | |
CN110298851B (en) | Training method and device for human body segmentation neural network | |
CN110288625B (en) | Method and apparatus for processing image | |
CN110349107B (en) | Image enhancement method, device, electronic equipment and storage medium | |
CN111414879A (en) | Face shielding degree identification method and device, electronic equipment and readable storage medium | |
CN112381717A (en) | Image processing method, model training method, device, medium, and apparatus | |
CN112418249A (en) | Mask image generation method and device, electronic equipment and computer readable medium | |
CN115311178A (en) | Image splicing method, device, equipment and medium | |
CN110310293B (en) | Human body image segmentation method and device | |
CN111784726A (en) | Image matting method and device | |
CN114004905A (en) | Method, device and equipment for generating character style image and storage medium | |
CN113902636A (en) | Image deblurring method and device, computer readable medium and electronic equipment | |
CN113689372A (en) | Image processing method, apparatus, storage medium, and program product | |
CN110689478B (en) | Image stylization processing method and device, electronic equipment and readable medium | |
CN110619602B (en) | Image generation method and device, electronic equipment and storage medium | |
CN112714263A (en) | Video generation method, device, equipment and storage medium | |
CN110059739B (en) | Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium | |
CN110070482B (en) | Image processing method, apparatus and computer readable storage medium | |
CN111369475A (en) | Method and apparatus for processing video | |
CN110636331B (en) | Method and apparatus for processing video | |
CN113223128B (en) | Method and apparatus for generating image | |
CN114040129A (en) | Video generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||