CN111047604A - Transparency mask extraction method and device for high-definition image and storage medium


Info

Publication number
CN111047604A
Authority
CN
China
Prior art keywords
pixel
region
node
value
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911203685.9A
Other languages
Chinese (zh)
Other versions
CN111047604B (en)
Inventor
冯夫健
王林
黄翰
谭棉
刘爽
魏嘉银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Minzu University
Original Assignee
Guizhou Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Minzu University filed Critical Guizhou Minzu University
Priority to CN201911203685.9A priority Critical patent/CN111047604B/en
Publication of CN111047604A publication Critical patent/CN111047604A/en
Application granted granted Critical
Publication of CN111047604B publication Critical patent/CN111047604B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a transparency mask extraction method, device and storage medium for high-definition images. The method comprises: marking an unknown region in the high-definition image; dividing the unknown region into a plurality of sub-regions according to the pixel information in the unknown region; converting each sub-region into a node of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to the edge weights; and generating a node optimization queue according to the edge weights between the nodes, determining the foreground region and background region according to the queue, selecting pixel values, and solving for their optimal value. The method divides the high-definition image into regions by pixel point, expresses the divided regions as nodes of a graph structure, calculates edge weights, obtains a node optimization queue from the edge weights, rapidly determines the foreground and background regions within the queue, and performs optimization solving to finally obtain the optimal foreground mask value, achieving both high calculation precision and high speed.

Description

Transparency mask extraction method and device for high-definition image and storage medium
Technical Field
The invention mainly relates to the technical field of image processing, in particular to a transparency mask extraction method and device for a high-definition image and a storage medium.
Background
At present, the resolution of images shot by mobile devices such as mobile phones and cameras is increasingly high. Transparency mask extraction for high-definition images is mainly applied to film and television special effects, where different foreground objects are composited into a specified scene; the higher the extraction precision, the better the visual effect of the composite image. Existing methods for extracting the transparency mask of a high-definition image suffer from excessively long computation time and low computation precision.
Disclosure of Invention
The invention provides a transparency mask extraction method and device for a high-definition image and a storage medium, aiming at the defects of the prior art.
The technical scheme for solving the technical problems is as follows: a transparency mask extraction method of a high-definition image comprises the following steps:
inputting a high-definition image, and marking an unknown region, a foreground region and a background region in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
converting each subregion into a node of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
and generating a node optimization queue according to the edge weights among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, solving the optimal value of the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solution as the optimal foreground mask value.
Another technical solution of the present invention for solving the above technical problems is as follows: a transparency mask extraction device for high definition images comprises:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generating module is used for converting each subregion into a node of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
and the optimization module is used for generating a node optimization queue according to the edge weight among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, carrying out optimal value solution on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solution as an optimal foreground mask value.
Another technical solution of the present invention for solving the above technical problems is as follows: a transparency mask extraction apparatus for high definition images comprises a memory, a processor and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the transparency mask extraction method for high definition images as described above is realized.
Another technical solution of the present invention for solving the above technical problems is as follows: a computer-readable storage medium, storing a computer program which, when executed by a processor, implements a transparency mask extraction method for high definition images as described above.
The invention has the beneficial effects that the high-definition image is divided into regions by pixel point, the divided regions are expressed as nodes of a graph structure, edge weights are calculated, a node optimization queue is obtained from the edge weights, the foreground and background regions are rapidly determined within the queue, and the pixel values in the region are optimized and solved to finally obtain the optimal foreground mask value.
Drawings
Fig. 1 is a schematic flowchart of a transparency mask extraction method for a high definition image according to an embodiment of the present invention;
fig. 2 is a schematic diagram of functional modules of a transparency mask extraction apparatus for high definition images according to an embodiment of the present invention;
fig. 3 is a schematic node diagram of a graph structure according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flowchart of a transparency mask extraction method for a high definition image according to an embodiment of the present invention.
As shown in fig. 1, a method for extracting a transparency mask of a high definition image includes the following steps:
inputting a high-definition image, and marking an unknown area in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
converting each subregion into a node of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
and generating a node optimization queue according to the edge weights among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, solving the optimal value of the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solution as the optimal foreground mask value.
Specifically, the node optimization queue is generated by applying a minimum spanning tree method to the edge weights.
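By way of illustration, the following is a minimal sketch of deriving such a queue, assuming the minimum spanning tree is built over the edge weights and traversed breadth-first from an arbitrary root node; the 4-node weight matrix is invented for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

# Symmetric adjacency of four region nodes; zeros mean "not adjacent".
weights = np.array([[0.0, 2.0, 0.9, 0.0],
                    [2.0, 0.0, 1.4, 0.5],
                    [0.9, 1.4, 0.0, 1.1],
                    [0.0, 0.5, 1.1, 0.0]])

# Keep the lightest set of edges that still connects every node.
mst = minimum_spanning_tree(csr_matrix(weights))

# Traversing the tree from node 0 yields a visit order usable as the
# node optimization queue (the choice of root is an assumption).
queue, _ = breadth_first_order(mst + mst.T, i_start=0, directed=False)
print(queue)  # [0 2 3 1] for the matrix above
```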
It should be understood that a high-definition (HD) image refers to a high-resolution image, i.e., an image with a vertical resolution of at least 720 lines (720p).
Specifically, the unknown region, the foreground region and the background region are calibrated in the high-definition image as follows: the texture edge of the target in the high-definition image is dilated with a preset template, the dilated band is taken as the unknown region, the part of the target region not covered by the dilated band is taken as the foreground region, and the rest of the image outside the target region is taken as the background region.
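By way of illustration, the following is a minimal sketch of this calibration step, assuming the preset template is a morphological structuring element and the target is supplied as a binary mask; the band radius is an assumed parameter.

```python
import cv2
import numpy as np

def make_trimap(target_mask: np.ndarray, band: int = 10) -> np.ndarray:
    """target_mask: uint8 binary mask of the target object (255 = target).
    Returns a trimap: 255 = foreground, 128 = unknown, 0 = background."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (2 * band + 1, 2 * band + 1))
    dilated = cv2.dilate(target_mask, kernel)  # grows the target outward
    eroded = cv2.erode(target_mask, kernel)    # shrinks it inward
    trimap = np.zeros_like(target_mask)        # start as all background
    trimap[dilated > 0] = 128                  # band around the edge: unknown
    trimap[eroded > 0] = 255                   # target interior: foreground
    return trimap
```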
In this embodiment, the high-definition image is divided into regions by pixel point, the divided regions are expressed as nodes of a graph structure, edge weights are calculated, a node optimization queue is obtained from the edge weights, the foreground and background regions are rapidly determined within the queue, and the pixel values in the region are optimized and solved to finally obtain the optimal foreground mask value, achieving both high calculation precision and high speed.
Optionally, as an embodiment of the present invention, the process of dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region includes:

let the i-th pixel point in the unknown region be $p_i$, $i = 1, 2, \ldots, n$, where n is a positive integer;

calculate the mean shift vector $m(p_i)$ corresponding to each pixel from the pixel information in the unknown region according to the pixel mean shift formula:

$$m(p_i) = \frac{\sum_{j=1}^{n} p_j \, g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)}{\sum_{j=1}^{n} g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)} - p_i$$

where any pixel $p_i$ consists of five dimensions R, G, B, x, y: R, G and B are the color of the pixel point $p_i$ in RGB space, and x and y are its plane coordinates on the high-definition image; $g(\cdot)$ is the kernel profile function; $h$ is the bandwidth, $h > 0$; and $\|\cdot\|_2$ denotes the Euclidean distance;

iterate the calculation of the mean shift vector $m(p_i)$ until the five-dimensional data points converge, so that each point reaches its local density maximum;

divide the mean-shift results corresponding to the n pixel points into w classes, wherein the Euclidean distance in the five-dimensional space between any two pixel points within each class is less than the bandwidth h;

merge the classes whose pixel count is smaller than the preset pixel-number threshold M into adjacent classes, generating w' classes, each of which represents a sub-region.
It should be understood that the five-dimensional space represents five dimensions, namely R, G, B, x, y.
In the above embodiment, the local density is obtained from the color of each pixel and the distances between pixels; pixels whose pairwise Euclidean distance in the five-dimensional space is below the bandwidth are grouped into classes, and classes are merged according to their pixel counts, yielding a number of classes, each of which represents a sub-region.
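By way of illustration, the following is a minimal sketch of this grouping step, using scikit-learn's MeanShift over the five-dimensional (R, G, B, x, y) features in place of the iteration described above; the bandwidth value is an assumption, and the merging of small classes is indicated only as a comment.

```python
import numpy as np
from sklearn.cluster import MeanShift

def segment_unknown(image: np.ndarray, unknown_mask: np.ndarray,
                    h: float = 20.0) -> np.ndarray:
    """Cluster unknown pixels in (R, G, B, x, y) space.
    image: HxWx3 array; unknown_mask: HxW boolean/uint8 mask.
    Returns one class label per unknown pixel."""
    ys, xs = np.nonzero(unknown_mask)
    feats = np.column_stack([image[ys, xs].astype(float),        # R, G, B
                             xs.astype(float), ys.astype(float)])  # x, y
    labels = MeanShift(bandwidth=h, bin_seeding=True).fit_predict(feats)
    # Classes with fewer than a preset threshold M pixels would then be
    # merged into an adjacent class, yielding the final w' sub-regions.
    return labels
```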
Optionally, as an embodiment of the present invention, the process of calculating edge weights between adjacent nodes and generating a graph structure according to each edge weight includes:

defining the edge weight $b_{i,j}$ between nodes as

$$b_{i,j} = \left\|C_i - C_j\right\|_2 + \left\|S_i - S_j\right\|_2$$

where $C_i$ is the standardized color information of the pixel point,

$$C_i = \frac{X'_{Wi} - \bar{X}_W}{\sigma_W}$$

and $S_i$ is the standardized plane-coordinate information of the pixel point,

$$S_i = \frac{X'_{Si} - \bar{X}_S}{\sigma_S}$$

here $X'_{Wi}$ denotes the RGB three-dimensional color information of the midpoint pixel of the i-th class region, $\bar{X}_W$ and $\sigma_W^2$ denote the mean and variance of the RGB three-dimensional color information of the midpoint pixels of the respective class regions, $X'_{Si}$ denotes the plane coordinate information of the midpoint pixel of the i-th class region, and $\bar{X}_S$ and $\sigma_S^2$ denote the mean and variance of the plane coordinate information of the midpoint pixels of the respective class regions;

generating the graph structure according to the edge weights $b_{i,j}$ between the nodes.
The midpoint pixel of the i-th class region is defined as the pixel located at the mean plane coordinates $(\bar{x}_i, \bar{y}_i)$ of the region, where

$$\bar{x}_i = \frac{1}{N_i}\sum_{j \in \Omega_i} x_j, \qquad \bar{y}_i = \frac{1}{N_i}\sum_{j \in \Omega_i} y_j$$

$N_i$ denotes the number of pixels of the i-th class region, $\Omega_i$ denotes the set of pixels of the i-th class region, $\bar{x}_i$ and $\bar{y}_i$ are the plane x and y coordinate values of the midpoint pixel of the i-th class region, and $x_j$ and $y_j$ are the plane x and y coordinate values of the j-th pixel of the i-th class region.
The process of converting each sub-region into a node of a graph structure is as follows: the sub-regions are labeled, each label representing a node of the graph structure.
Specifically, before the edge weights are defined, each node must be labeled. As shown in fig. 3, the regions of the unknown region U of the original image that correspond to the w' finally generated classes are labeled, where each label represents one divided region (w' regions in total) and each label is a node of the graph structure.
In the above embodiment, each divided region is converted into a graph-node representation, and the edge-weight relationship between regions is defined by color and distance, which facilitates generation of the node optimization queue.
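By way of illustration, the following is a minimal sketch of the feature standardization and weight computation; because the closed form of b_{i,j} appears in the original only as an equation image, the Euclidean combination of standardized color and spatial distances used here is an assumption consistent with the surrounding definitions.

```python
import numpy as np

def edge_weights(colors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """colors: (w', 3) RGB of each region's midpoint pixel;
    centers: (w', 2) plane coordinates of those midpoint pixels.
    Returns the full weight matrix b[i, j] (adjacency filtering omitted)."""
    eps = 1e-8  # guards against zero variance across regions
    C = (colors - colors.mean(axis=0)) / (colors.std(axis=0) + eps)    # C_i
    S = (centers - centers.mean(axis=0)) / (centers.std(axis=0) + eps)  # S_i
    w = len(colors)
    b = np.zeros((w, w))
    for i in range(w):
        for j in range(w):
            # Assumed combination: standardized color distance plus
            # standardized spatial distance between midpoint pixels.
            b[i, j] = np.linalg.norm(C[i] - C[j]) + np.linalg.norm(S[i] - S[j])
    return b
```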
Optionally, as an embodiment of the present invention, the process of determining the foreground region and the background region according to the node optimization queue includes:
optimizing the sub-region, foreground region and background region corresponding to the i-th node of the node optimization queue to obtain the optimal value of the i-th node, and taking this optimal value as the optimal foreground mask value of the i-th node;

taking the optimal foreground mask value of the i-th node as the initial solution information of the (i+1)-th node region, and optimizing the foreground region and background region of the (i+1)-th node according to this initial solution information to obtain the optimal foreground mask value of the (i+1)-th node;

and so on, until all node regions in the node optimization queue have been optimized and the optimal foreground mask value of the entire unknown region is obtained.
In the above embodiment, each node in the node optimization queue is optimized to obtain the optimal foreground mask value of the whole unknown region, so that the extracted transparency mask result is more accurate.
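By way of illustration, the following is a minimal sketch of this queue-driven sweep; solve_node is a hypothetical stand-in for the per-node sample-pair optimization detailed below, accepting a warm-start solution and returning the node's optimal foreground mask values.

```python
def sweep(queue, solve_node):
    """Optimize nodes in queue order, warm-starting each node
    from the result of its predecessor in the queue."""
    result, init = {}, None
    for node in queue:
        init = solve_node(node, init)  # optimal foreground mask values for this node
        result[node] = init
    return result
```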
Optionally, as an embodiment of the present invention, the process of optimizing the sub-region, foreground region and background region corresponding to the i-th node of the node optimization queue to obtain the optimal value of the i-th node includes:

calculating, for each unknown pixel of the i-th node, a transparency estimate from sampled foreground and background pixels according to the pixel calculation formula

$$\hat{\alpha}_k = \frac{\left(C_u^k - B^k\right)\cdot\left(F^k - B^k\right)}{\left\|F^k - B^k\right\|_2^2}$$

where $C_u^k$ represents the color value of the k-th unknown pixel in the unknown region, $B^k$ represents the k-th background value selected in the background region, and $F^k$ represents the k-th foreground value selected from the foreground region;

taking all pixels in the foreground region and the background region corresponding to the i-th node as optimization variables X, randomly selecting pixel values from the foreground region and the background region, and assigning the optimization variables X according to the selected pixel values to obtain a solution set $P = (X_1, X_2, \ldots, X_N)$, where N represents the number of solutions;

evaluating each solution in the solution set P to obtain the optimal value of the i-th node, wherein the evaluation process includes: if $f(X_i) > f(X_j)$, then $X_j$ learns from $X_i$ according to the learning formula $X_j = X_j + \lambda(X_i - X_j)$; $X_i$ then continues to be compared with the next solution in the solution set P to obtain a comparison error value; the comparison stops when the comparison error values of the N solutions are all smaller than a preset error value, and the resulting optimal solution is taken as the optimal value of the i-th node.
In the above embodiment, candidate solutions over the pixel values are evaluated and compared with one another to obtain comparison error values, and these errors are checked against a preset error threshold to obtain the optimal value, so that a more accurate foreground mask value can be obtained.
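By way of illustration, the following is a minimal sketch of the evaluation-and-learning step for a single unknown pixel. The transparency estimate follows the standard compositing-equation formula (an assumption, since the pixel formula appears in the original only as an equation image), the fitness f is taken as the negative reconstruction error, and the learning step is X_j = X_j + λ(X_i − X_j) as stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_hat(c_u, f, b):
    """Estimate transparency for unknown color c_u from a (F, B) sample pair."""
    fb = f - b
    return float(np.clip(np.dot(c_u - b, fb) / (np.dot(fb, fb) + 1e-8), 0.0, 1.0))

def fitness(c_u, f, b):
    """Negative reconstruction error of the compositing equation (higher is better)."""
    a = alpha_hat(c_u, f, b)
    return -np.linalg.norm(c_u - (a * f + (1.0 - a) * b))

def learn(solutions, c_u, lam=0.5, iters=50):
    """solutions: (N, 6) float array, each row a candidate (F, B) pair.
    Worse solutions repeatedly learn from better ones; returns the best row."""
    for _ in range(iters):
        i, j = rng.integers(len(solutions), size=2)
        fi = fitness(c_u, solutions[i, :3], solutions[i, 3:])
        fj = fitness(c_u, solutions[j, :3], solutions[j, 3:])
        if fi > fj:  # X_j learns from the better solution X_i
            solutions[j] += lam * (solutions[i] - solutions[j])
    best = max(range(len(solutions)),
               key=lambda k: fitness(c_u, solutions[k, :3], solutions[k, 3:]))
    return solutions[best]
```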
Fig. 2 is a schematic diagram of functional modules of a transparency mask extraction apparatus for high definition images according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a transparency mask extracting apparatus for a high definition image includes:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generating module is used for converting each subregion into a node of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
and the optimization module is used for generating a node optimization queue according to the edge weights among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, performing optimal value solving on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solving as the optimal foreground mask value.
Optionally, as an embodiment of the present invention, the region segmentation module is specifically configured to:

let the i-th pixel point in the unknown region be $p_i$, $i = 1, 2, \ldots, n$, where n is a positive integer;

calculate the mean shift vector $m(p_i)$ corresponding to each pixel from the pixel information in the unknown region according to the pixel mean shift formula:

$$m(p_i) = \frac{\sum_{j=1}^{n} p_j \, g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)}{\sum_{j=1}^{n} g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)} - p_i$$

where any pixel $p_i$ consists of five dimensions R, G, B, x, y: R, G and B are the color of the pixel point $p_i$ in RGB space, and x and y are its plane coordinates on the high-definition image; $g(\cdot)$ is the kernel profile function; $h$ is the bandwidth, $h > 0$; and $\|\cdot\|_2$ denotes the Euclidean distance;

iterate the calculation of the mean shift vector $m(p_i)$ until the five-dimensional data points converge, so that each point reaches its local density maximum;

divide the mean-shift results corresponding to the n pixel points into w classes, wherein the Euclidean distance in the five-dimensional space between any two pixel points within each class is less than the bandwidth h;

merge the classes whose pixel count is smaller than the preset pixel-number threshold M into adjacent classes, generating w' classes, each of which represents a sub-region.
Optionally, as an embodiment of the present invention, the graph structure generating module is specifically configured to:

define the edge weight $b_{i,j}$ between nodes as

$$b_{i,j} = \left\|C_i - C_j\right\|_2 + \left\|S_i - S_j\right\|_2$$

where $C_i$ is the standardized color information of the pixel point,

$$C_i = \frac{X'_{Wi} - \bar{X}_W}{\sigma_W}$$

and $S_i$ is the standardized plane-coordinate information of the pixel point,

$$S_i = \frac{X'_{Si} - \bar{X}_S}{\sigma_S}$$

here $X'_{Wi}$ denotes the RGB three-dimensional color information of the midpoint pixel of the i-th class region, $\bar{X}_W$ and $\sigma_W^2$ denote the mean and variance of the RGB three-dimensional color information of the midpoint pixels of the respective class regions, $X'_{Si}$ denotes the plane coordinate information of the midpoint pixel of the i-th class region, and $\bar{X}_S$ and $\sigma_S^2$ denote the mean and variance of the plane coordinate information of the midpoint pixels of the respective class regions;

generate the graph structure according to the edge weights $b_{i,j}$ between the nodes.
Optionally, as another embodiment of the present invention, a transparency mask extracting apparatus for a high definition image includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the computer program is executed by the processor, the transparency mask extracting method for a high definition image as described above is implemented.
Alternatively, as an embodiment of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the transparency mask extraction method for high definition images as described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A transparency mask extraction method of a high-definition image is characterized by comprising the following steps:
inputting a high-definition image, and marking an unknown region, a foreground region and a background region in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
converting each subregion into a node of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
and generating a node optimization queue according to the edge weights among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, solving the optimal value of the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solution as the optimal foreground mask value.
2. The transparency mask extraction method for high-definition images according to claim 1, wherein the process of dividing the unknown region into a plurality of sub-regions according to the pixel information in the unknown region comprises:

letting the i-th pixel point in the unknown region be $p_i$, $i = 1, 2, \ldots, n$, where n is a positive integer;

calculating the mean shift vector $m(p_i)$ corresponding to each pixel from the pixel information in the unknown region according to the pixel mean shift formula:

$$m(p_i) = \frac{\sum_{j=1}^{n} p_j \, g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)}{\sum_{j=1}^{n} g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)} - p_i$$

where any pixel $p_i$ consists of five dimensions R, G, B, x, y: R, G and B are the color of the pixel point $p_i$ in RGB space, and x and y are its plane coordinates on the high-definition image; $g(\cdot)$ is the kernel profile function; $h$ is the bandwidth, $h > 0$; and $\|\cdot\|_2$ denotes the Euclidean distance;

iterating the calculation of the mean shift vector $m(p_i)$ until the five-dimensional data points converge, so that each point reaches its local density maximum;

dividing the mean-shift results corresponding to the n pixel points into w classes, wherein the Euclidean distance in the five-dimensional space between any two pixel points within each class is less than the bandwidth h;

merging the classes whose pixel count is smaller than the preset pixel-number threshold M into adjacent classes, generating w' classes, each of which represents a sub-region.
3. The transparency mask extraction method for high-definition images according to claim 1, wherein the process of calculating edge weights between adjacent nodes and generating the graph structure according to the respective edge weights comprises:

defining the edge weight $b_{i,j}$ between nodes as

$$b_{i,j} = \left\|C_i - C_j\right\|_2 + \left\|S_i - S_j\right\|_2$$

where $C_i$ is the standardized color information of the pixel point,

$$C_i = \frac{X'_{Wi} - \bar{X}_W}{\sigma_W}$$

and $S_i$ is the standardized plane-coordinate information of the pixel point,

$$S_i = \frac{X'_{Si} - \bar{X}_S}{\sigma_S}$$

here $X'_{Wi}$ denotes the RGB three-dimensional color information of the midpoint pixel of the i-th class region, $\bar{X}_W$ and $\sigma_W^2$ denote the mean and variance of the RGB three-dimensional color information of the midpoint pixels of the respective class regions, $X'_{Si}$ denotes the plane coordinate information of the midpoint pixel of the i-th class region, and $\bar{X}_S$ and $\sigma_S^2$ denote the mean and variance of the plane coordinate information of the midpoint pixels of the respective class regions;

generating the graph structure according to the edge weights $b_{i,j}$ between the nodes.
4. The method for extracting transparency masks of high-definition images according to claim 1, wherein the process of solving the optimal value of the selected pixel values according to the node optimization queue comprises:
optimizing the sub-region, foreground region and background region corresponding to the i-th node of the node optimization queue to obtain the optimal value of the i-th node, and taking this optimal value as the optimal foreground mask value of the i-th node;

taking the optimal foreground mask value of the i-th node as the initial solution information of the (i+1)-th node region, and optimizing the foreground region and background region of the (i+1)-th node according to this initial solution information to obtain the optimal foreground mask value of the (i+1)-th node;

and so on, until all node regions in the node optimization queue have been optimized and the optimal foreground mask value of the entire unknown region is obtained.
5. The method for extracting transparency masks of high-definition images according to claim 4, wherein the process of optimizing the sub-region, the foreground region and the background region corresponding to the i-th node of the node optimization queue to obtain the optimal value of the i-th node comprises:

calculating, for each unknown pixel of the i-th node, a transparency estimate from sampled foreground and background pixels according to the pixel calculation formula

$$\hat{\alpha}_k = \frac{\left(C_u^k - B^k\right)\cdot\left(F^k - B^k\right)}{\left\|F^k - B^k\right\|_2^2}$$

where $C_u^k$ represents the color value of the k-th unknown pixel in the unknown region, $B^k$ represents the k-th background value selected in the background region, and $F^k$ represents the k-th foreground value selected from the foreground region;

taking all pixels in the foreground region and the background region corresponding to the i-th node as optimization variables X, randomly selecting pixel values from the foreground region and the background region, and assigning the optimization variables X according to the selected pixel values to obtain a solution set $P = (X_1, X_2, \ldots, X_N)$, where N represents the number of solutions;

evaluating each solution in the solution set P to obtain the optimal value of the i-th node, wherein the evaluation process comprises: if $f(X_i) > f(X_j)$, then $X_j$ learns from $X_i$ according to the learning formula $X_j = X_j + \lambda(X_i - X_j)$; $X_i$ then continues to be compared with the next solution in the solution set P to obtain a comparison error value; the comparison stops when the comparison error values of the N solutions are all smaller than a preset error value, and the resulting optimal solution is taken as the optimal value of the i-th node.
6. A transparency mask extraction device for high-definition images, characterized by comprising:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generating module is used for converting each subregion into a node of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
and the optimization module is used for generating a node optimization queue according to the edge weight among the nodes, selecting pixel values in the sub-regions, the foreground region and the background region, carrying out optimal value solution on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by the solution as an optimal foreground mask value.
7. The apparatus according to claim 6, wherein the region segmentation module is specifically configured to:

let the i-th pixel point in the unknown region be $p_i$, $i = 1, 2, \ldots, n$, where n is a positive integer;

calculate the mean shift vector $m(p_i)$ corresponding to each pixel from the pixel information in the unknown region according to the pixel mean shift formula:

$$m(p_i) = \frac{\sum_{j=1}^{n} p_j \, g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)}{\sum_{j=1}^{n} g\!\left(\left\|\frac{p_i - p_j}{h}\right\|_2^2\right)} - p_i$$

where any pixel $p_i$ consists of five dimensions R, G, B, x, y: R, G and B are the color of the pixel point $p_i$ in RGB space, and x and y are its plane coordinates on the high-definition image; $g(\cdot)$ is the kernel profile function; $h$ is the bandwidth, $h > 0$; and $\|\cdot\|_2$ denotes the Euclidean distance;

iterate the calculation of the mean shift vector $m(p_i)$ until the five-dimensional data points converge, so that each point reaches its local density maximum;

divide the mean-shift results corresponding to the n pixel points into w classes, wherein the Euclidean distance in the five-dimensional space between any two pixel points within each class is less than the bandwidth h;

merge the classes whose pixel count is smaller than the preset pixel-number threshold M into adjacent classes, generating w' classes, each of which represents a sub-region.
8. The apparatus according to claim 6, wherein the graph structure generating module is specifically configured to:

define the edge weight $b_{i,j}$ between nodes as

$$b_{i,j} = \left\|C_i - C_j\right\|_2 + \left\|S_i - S_j\right\|_2$$

where $C_i$ is the standardized color information of the pixel point,

$$C_i = \frac{X'_{Wi} - \bar{X}_W}{\sigma_W}$$

and $S_i$ is the standardized plane-coordinate information of the pixel point,

$$S_i = \frac{X'_{Si} - \bar{X}_S}{\sigma_S}$$

here $X'_{Wi}$ denotes the RGB three-dimensional color information of the midpoint pixel of the i-th class region, $\bar{X}_W$ and $\sigma_W^2$ denote the mean and variance of the RGB three-dimensional color information of the midpoint pixels of the respective class regions, $X'_{Si}$ denotes the plane coordinate information of the midpoint pixel of the i-th class region, and $\bar{X}_S$ and $\sigma_S^2$ denote the mean and variance of the plane coordinate information of the midpoint pixels of the respective class regions;

generate the graph structure according to the edge weights $b_{i,j}$ between the nodes.
9. A transparency mask extraction device for high definition images, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that when the computer program is executed by the processor, the transparency mask extraction method for high definition images according to any of claims 1 to 5 is implemented.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the transparency mask extraction method for high definition images according to any one of claims 1 to 5.
CN201911203685.9A 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium Active CN111047604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911203685.9A CN111047604B (en) 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911203685.9A CN111047604B (en) 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium

Publications (2)

Publication Number Publication Date
CN111047604A (en) 2020-04-21
CN111047604B CN111047604B (en) 2023-04-28

Family

ID=70233222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911203685.9A Active CN111047604B (en) 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium

Country Status (1)

Country Link
CN (1) CN111047604B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989353A (en) * 2010-12-10 2011-03-23 中国科学院深圳先进技术研究院 Image matting method
CN102291520A (en) * 2006-05-26 2011-12-21 佳能株式会社 Image processing method and image processing apparatus
CN102651135A (en) * 2012-04-10 2012-08-29 电子科技大学 Optimized direction sampling-based natural image matting method
CN103942794A (en) * 2014-04-16 2014-07-23 南京大学 Image collaborative cutout method based on confidence level
CN104134192A (en) * 2014-07-23 2014-11-05 中国科学院深圳先进技术研究院 Image defogging method and system
CN105931244A (en) * 2016-04-29 2016-09-07 中科院成都信息技术股份有限公司 Supervision-free image matting method and apparatus
CN106056606A (en) * 2016-05-30 2016-10-26 乐视控股(北京)有限公司 Image processing method and device
CN110400323A (en) * 2019-07-30 2019-11-01 上海艾麒信息科技有限公司 It is a kind of to scratch drawing system, method and device automatically
CN110503704A (en) * 2019-08-27 2019-11-26 北京迈格威科技有限公司 Building method, device and the electronic equipment of three components


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARUNAVA DE et al.: "Masking Based Segmentation of Diseased MRI Images" *
颜学名: "Research on Heuristic Algorithms Based on Bipartite Graph Structure Information" *

Also Published As

Publication number Publication date
CN111047604B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN109816012B (en) Multi-scale target detection method fusing context information
JP6088792B2 (en) Image detection apparatus, control program, and image detection method
CN109829398B (en) Target detection method in video based on three-dimensional convolution network
US20210042929A1 (en) Three-dimensional object detection method and system based on weighted channel features of a point cloud
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN108986152B (en) Foreign matter detection method and device based on difference image
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN111144213B (en) Object detection method and related equipment
CN112184759A (en) Moving target detection and tracking method and system based on video
CN110910421B (en) Weak and small moving object detection method based on block characterization and variable neighborhood clustering
WO2016165064A1 (en) Robust foreground detection method based on multi-view learning
CN104766065B (en) Robustness foreground detection method based on various visual angles study
JP2018124890A (en) Image processing apparatus, image processing method, and image processing program
CN110147816B (en) Method and device for acquiring color depth image and computer storage medium
US11741615B2 (en) Map segmentation method and device, motion estimation method, and device terminal
CN112465021B (en) Pose track estimation method based on image frame interpolation method
JP2018195084A (en) Image processing apparatus, image processing method, program, and storage medium
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN111047604A (en) Transparency mask extraction method and device for high-definition image and storage medium
CN112417961B (en) Sea surface target detection method based on scene prior knowledge
CN112418344A (en) Training method, target detection method, medium and electronic device
JP2018010359A (en) Information processor, information processing method, and program
JP2020042608A (en) Detection apparatus and program
JP2011113177A (en) Method and program for structuring three-dimensional object model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant