CN114399423A - Image content removing method, system, medium, device and data processing terminal

Image content removing method, system, medium, device and data processing terminal

Info

Publication number
CN114399423A
Authority
CN
China
Prior art keywords: filling, area, image, filled, points
Prior art date
Legal status: Granted
Application number
CN202111512616.3A
Other languages
Chinese (zh)
Other versions
CN114399423B (en)
Inventor
孟繁杰
冯浩楠
李润鑫
李廷轩
焦聪雨
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University filed Critical Xidian University
Priority to CN202111512616.3A priority Critical patent/CN114399423B/en
Publication of CN114399423A publication Critical patent/CN114399423A/en
Application granted granted Critical
Publication of CN114399423B publication Critical patent/CN114399423B/en
Status: Active


Classifications

    • G06T3/04

Abstract

The invention belongs to the technical field of image filling, and discloses an image content removing method, system, medium, device, and data processing terminal. Feature points are extracted within coarse (fuzzy) sampling ranges in two images, and the locally optimal homography from the corresponding region in image A to the region to be filled in image S is approximated from the positional correspondence of the matched feature point pairs. The required region is then cut out of the transformed corresponding region of image A and filled into the region to be filled. Finally, single-frame filling and a Poisson fusion algorithm are used to fix texture truncation, color difference, and similar problems at the filling edges, yielding a complete removal result. The invention removes a target object from an image by an image filling method that combines single-frame and multi-frame filling; it restores the originally occluded real scene background more efficiently and more realistically, meets the requirement of removing occluding objects with high fidelity, greatly improves matching precision, and reduces the amount of computation.

Description

Image content removing method, system, medium, device and data processing terminal
Technical Field
The present invention relates to the field of image filling technologies, and in particular, to an image content removal method, system, medium, device, and data processing terminal.
Background
At present, photos play a very important role in people's daily lives, and with the great improvement in the photographic capability of intelligent mobile terminals, people are increasingly accustomed to using a smartphone to conveniently record beautiful scenery, memorable moments, and the like. However, defects and occlusions in a photo cast a shadow over the captured moment. How to remove an occluding object from a photo while keeping the whole image and its details realistic therefore comes down to how to use image filling technology to adaptively fill in the hole left after the occluding object is cut out.
Existing image filling techniques can be divided into two broad categories in principle: single-frame filling and multi-frame filling.
Single-frame filling means that the algorithm uses only the image to be filled itself and involves no additional pictures. Traditional single-frame filling algorithms mainly fill missing pixels from the image's own information, for example the pixel-level information around the region to be filled, and are suitable when the region to be filled lies in a local part of the foreground target object. However, when the image structure is complex, or the region to be filled grows to cover a key part or even all of the foreground object, the lack of key information (such as inter-pixel relationships and contour information) makes it very difficult in most cases for single-frame filling to fill the foreground object in accordance with the true semantics of the scene.
With the development of deep learning methods in recent years, the filling problem can be simplified into an end-to-end mapping problem. Through training on large datasets, single-frame filling algorithms based on deep learning can learn deeper and more abstract features beyond raw pixels, and are expected to complement and eventually replace traditional single-frame filling methods. However, handling the trade-off between over-fitting and under-fitting remains an urgent open problem for image-filling neural networks. Most models assume by default that foreground and background can be distinguished well; when the actual region to be filled completely covers the foreground, or lies on the boundary between foreground and background, the model can only extrapolate from the experience of its training samples, and the filling result often fails to match the actual image scene. At the same time, the data cost and training cost of the large, diverse datasets required for neural network training remain non-negligible factors.
Multi-frame filling algorithms search for pixel blocks similar to the missing region in data outside the image to be processed and use them for filling. By copying and pasting, such methods can ensure with high probability that the filling result is semantically valid, and they work well for backgrounds (such as sky, grassland, beaches, and the like). On the one hand, however, such methods generally require a database consisting of a large number of different datasets, and a sufficient amount of candidate data is the most direct means of improving multi-frame filling accuracy; on the other hand, because foreground objects in practice are particular and unique, it is difficult to find pixel blocks in the database suitable for filling the region to be filled, so database-based multi-frame filling is hard to apply to local scenes where a foreground object must be filled.
In some specific scenarios, such as photographing at tourist attractions, suppose passers-by are to be removed from the picture as far as possible. With a single-frame filling algorithm, the occluded parts usually cannot be finely restored because the background pixel information is missing. On the other hand, if several frames are captured continuously within a short time, the background in those frames (e.g., buildings) does not change while the foreground (e.g., occlusion caused by the flow of people) moves. A multi-frame filling method can exploit this difference between the properties of foreground and background in the target scene: the image information of the fill region occluded by a foreground object can be recovered from different frames, playing to multi-frame filling's strength at filling backgrounds, while the foreground-background junctions that multi-frame filling handles poorly are supplemented by a single-frame filling method, finally producing an image with extremely high confidence. For example, the background not occluded by passers-by is extracted from the frames immediately before and after the target image, processed, and filled into the adjacent frame whose background is occluded; finally, single-frame filling optimizes the filling boundary by combining local original pixel information with the newly filled pixel information, making the transition more natural.
If the corresponding positions of multiple continuous images are directly compared and filled (see fig. 3), the camera positions of the successive shots are not exactly the same, both because of objective factors such as lens distortion and spherical projection and because, in daily life, a photographer using a mobile phone does not employ auxiliary equipment to keep the shooting angle and position fixed. Filling results that violate the semantics of the scene therefore appear: object seams fail to line up, people lean, the overall perspective of the image to be filled is not respected, and so on. Hence, in a multi-frame filling algorithm, registering and aligning the two pictures to eliminate the spatial difference between the image to be filled and the filling image is a key step before the actual filling. At present, mainstream registration methods are mostly based on maximizing the registration confidence of the picture globally, and are applied to the overall stitching of two pictures.
Taking a scene with one image to be filled and one filling image as an example, the image to be filled is image S (see fig. 4), the region to be filled in S is Sd (see fig. 5), the image used for filling is image A (see fig. 6), and the region at the position in A corresponding to Sd is Ad (see fig. 7). Registration based on the whole picture transforms and overlaps the spatially overlapping parts of the two images to the maximum extent, and suits fields that focus on the overall effect, such as image stitching. In image filling, however, more attention must be paid to local detail; with a traditional globally based registration result, the matching of the local regions Sd and Ad is unsatisfactory and unsuitable for local filling (see figs. 8-9). On the one hand, the homography is computed with the full image as reference, and a transformation matrix that accounts for the whole image cannot meet the detail requirements of a specific fill region, owing to the dimensional gap between the real three-dimensional scene and the two-dimensional image; on the other hand, computing and comparing feature points over the whole image generates a huge amount of computation and adds redundancy to the algorithm. Therefore, a new method and system for removing image content are needed to overcome the shortcomings of the prior art.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) When the image structure is complex, or the region to be filled grows to cover a key part or even all of the foreground target object, the loss of key information makes it difficult in most cases for single-frame filling to produce a result that conforms to the true semantics of the scene.
(2) Handling the trade-off between over-fitting and under-fitting remains an urgent open problem for image-filling neural networks. Most models assume by default that foreground and background can be distinguished well; when the actual region to be filled completely covers the foreground or lies at the junction of foreground and background, the model can only extrapolate from the experience of its training samples, and the filling result generally cannot match the actual image scene. At the same time, the data cost and training cost of the large, diverse datasets required for neural network training remain non-negligible factors.
(3) Multi-frame filling algorithms need a database consisting of a large number of different datasets, and a sufficient amount of candidate data is the most direct means of improving multi-frame filling accuracy; because foreground objects in practice are particular and unique, it is difficult to find pixel blocks in the database suitable for filling the region to be filled, so database-based multi-frame filling is hard to apply to local scenes where a foreground object must be filled.
(4) With a traditional globally based registration result, the matching of the local regions Sd and Ad is unsatisfactory and unsuitable for local image filling. Because the region to be filled Sd usually occupies far less than 30% of the whole image, performing the homography with the full image as reference yields a transformation matrix that cannot meet the detail requirements of the specific fill region, owing to the dimensional gap between the real three-dimensional scene and the two-dimensional image; and computing and comparing feature points over the whole image generates a huge amount of computation and adds redundancy to the algorithm.
The difficulty in solving the above problems and defects is:
the technical difficulty is as follows: with the continuous improvement of image-related software and hardware technologies, the resolution and the field of view of scene images obtained through sampling by various devices are continuously increased. However, due to the limitation of the homography principle, a larger scene image means that more areas do not conform to the homography principle, and the two images with different viewing angles cannot be unified into the same viewing angle everywhere by using the traditional homography, but the phenomena of stretching, shortening and the like with a lot of distortions occur.
In addition, the core of current deep learning algorithms is still a model trained on data, and a large amount of data-specific training is needed to reach the target result. A data-driven algorithm is inevitably limited by the data itself, and it is difficult to maintain an excellent level of performance when facing out-of-distribution generalization.
Environmental difficulty: the recent development of deep learning in machine vision has led many researchers to attack the image content removal problem with deep learning methods while largely abandoning in-depth research on traditional image processing methods. Yet the robustness and generalization of deep learning algorithms are still limited, making large-scale industrial application difficult; meanwhile, because traditional methods receive little attention, progress on them in the field of image content removal has been slow.
The significance of solving the problems and the defects is as follows:
based on the thought of the interest region, the problem that the large image transformation does not conform to the homography transformation region can be solved well by using the local homography transformation technology, the generalization of the homography transformation in the image processing field is expanded, and the method can be more suitable for the progress of various technologies.
Second, the extensibility of traditional image processing algorithms is explored; the respective advantages and disadvantages of traditional and deep learning algorithms in the image field are correctly understood, the suitability of different algorithms to different application scenarios is established, and the coordinated, healthy development of various image algorithms is promoted.
Disclosure of Invention
The present invention provides an image content removing method, system, medium, device and data processing terminal, and particularly relates to an image content removing method, system, medium, device and data processing terminal based on local optimal homography transformation.
The present invention is achieved as follows. An image content removing method includes:
First, feature points are extracted within the coarse (fuzzy) sampling ranges of the two images, and the local homography from the corresponding region in image A to the region to be filled in image S is approximated from the positional correspondence of the matched feature point pairs; that is, the same planar object photographed by cameras at different spatial positions is transformed into the same viewing angle. The required region is then cut out of the transformed corresponding region of image A and filled into the region to be filled. Finally, single-frame filling and a Poisson fusion algorithm are used to fix texture truncation, color difference, and similar problems at the filling edges, yielding a complete removal result.
Further, the image content removing method includes the steps of:

Step one, for a target image containing a region to be filled and a filling image used for filling, placing the determined filling image into the region to be filled, using a locally optimal homography transformation algorithm, for corresponding filling. The global transformation idea of traditional homography cannot, in most cases, handle regions that do not satisfy the homography principle and cannot accurately obtain a high-fidelity transformation result. The locally optimal homography algorithm adopts the idea of a region of interest and treats the region to be filled as a high-priority region of the transformation, which better overcomes the local distortion problem of the traditional algorithm.

Step two, performing adaptive filling with a single-frame filling algorithm on the filled edge region. Existing single-frame image filling algorithms depend heavily on the pixel values surrounding the fill region: they achieve good continuity when filling small areas but cannot meet the semantic requirements when filling large areas, so an ideal result is hard to obtain when they are used alone. The single-frame filling algorithm is therefore combined with the multi-frame filling of step one: the central body of the fill region is filled by multi-frame filling, and the edges are filled adaptively by the single-frame algorithm, preserving high edge continuity while guaranteeing the semantic environment of the fill region.

Step three, eliminating the color gradient difference between the fill region and the original image using the Poisson fusion method. Human vision is particularly sensitive to color gradients in an RGB image, and sharp gradient changes automatically create the impression of a seam at edges. Poisson fusion smooths the change of pixel values between the fill region and the surrounding area, making the transition more natural; it greatly improves the visual impression while minimizing the perceptibility of the fill region to the human eye.
Further, in the step one, for the target image including the area to be filled and the filling image used for filling, the determined filling image is placed in the area to be filled by using a locally optimal homography transformation algorithm for corresponding filling, including:
(1) extracting feature points;
(2) adding respective description vectors for the feature points and searching matching point pairs in the two graphs;
(3) and solving a homography transformation matrix according to the two-dimensional coordinate information of the matched characteristic point pairs.
Further, in step (1), the extracting of feature points includes:

For the image S to be filled, the information inside the region Sd to be filled is regarded as unknown; after Ad is filled into Sd, the most important requirement is continuity between Ad and the surrounding environment. Based on this geometric consideration, the region Sd to be filled is expanded outward by 20%, without exceeding the image boundary, to serve as the rough sampling range of the feature points, called Sd_out.
For the filling image A, since the homography has not yet been performed, the positions of the same objects in S and A do not necessarily fall on exactly the same coordinates; at the same time, because the time interval between adjacent frames is short, the filling region appears near the coordinates of the region to be filled. Image A is therefore sampled fuzzily: the sampling range is the region of A at the same position as Sd, expanded by 50%, and this expanded region is called Ad_out.
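For concreteness, both sampling ranges can be generated from the binary region masks by morphological dilation. The sketch below is a minimal illustration in Python assuming uint8 masks (255 inside the region); the helper name expand_mask and the bounding-box-based expansion radius are illustrative assumptions, not the patent's exact construction.

```python
import cv2
import numpy as np

def expand_mask(mask, ratio):
    """Dilate a binary region mask outward by `ratio` of its bounding-box size."""
    ys, xs = np.nonzero(mask)
    w = int(xs.max()) - int(xs.min()) + 1            # bounding-box width of the region
    h = int(ys.max()) - int(ys.min()) + 1            # bounding-box height
    r = max(1, int(ratio * max(w, h)))               # expansion radius in pixels
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    return cv2.dilate(mask, kernel)                  # dilation stays inside image bounds

# sd = ... uint8 mask of the region to be filled in S (255 inside, 0 outside)
# sd_out = expand_mask(sd, 0.20)                     # rough sampling range in S
# ad_out = expand_mask(sd, 0.50)                     # fuzzy sampling range in A
# ring = cv2.subtract(sd_out, sd)                    # (Sd_out - Sd), the corner search band
```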
Corner points inside (Sd_out − Sd) and local feature points in Ad_out are detected, matched, and searched, so that the transformation relation of the two image regions can be computed from the matched feature point pairs; the Harris corner detection matching method is used.
The Harris corner detection matching algorithm is based on the idea of sliding a fixed window over the image in arbitrary directions and comparing the degree of gray-level change of the pixels in the window before and after the slide: if sliding in any direction produces a large gray-level change, a corner can be considered to exist in the window.
When the window is displaced by [u, v] in the plane, the gray-level change of the pixels in the window before and after the slide is described as:

E(u,v) = Σ_(x,y) w(x,y)·[I(x+u, y+v) − I(x,y)]²

where [u, v] is the two-dimensional offset of the window; E(u, v) is the gray-level difference between the offset position and the position before sliding; (x, y) ranges over the pixel coordinates inside the window; w(x, y) is the window function, i.e., the weight assigned to the pixel value at each position in the window; and I(x, y) is the intensity of the pixel at the current position, i.e., its gray value in the image.
According to this expression, when the window slides over a flat area, the gray level does not change and E(u, v) ≈ 0; if the window slides over a texture-rich region, the gray-level change is large. The algorithm therefore looks for positions where E(u, v) is large for sliding in every direction, not merely in one particular direction.
A Taylor expansion of the above equation gives:

E(u,v) ≈ Σ_(x,y) w(x,y)·[I_x·u + I_y·v]² = [u, v]·M·[u, v]ᵀ

where I_x is the partial derivative of the pixel intensity I in the x direction and I_y is the partial derivative of I in the y direction.
Let matrix M be:

M = Σ_(x,y) w(x,y) · [ I_x²      I_x·I_y ]
                     [ I_x·I_y   I_y²    ]
m is a covariance matrix by definition.
Three scenes are tested separately, namely a flat area, a straight edge, and a corner, to obtain their gradient distributions. From the eigenvalue and eigenvector properties of the covariance matrix, the three scenarios can be characterized by their eigenvalues: when both eigenvalues are large, the window contains a corner; when one eigenvalue is large and the other small, the window contains an edge; when both eigenvalues are small, the window lies in a flat area.
Based on these properties, and for convenience of measurement, the response R is constructed:

R = det M − k·(trace M)²

where det M = λ₁·λ₂ and trace M = λ₁ + λ₂, λ₁ and λ₂ being the two eigenvalues of the covariance matrix M, and k is a real coefficient.

The properties of the function R correspond exactly to those of the covariance matrix M: when R is large, both eigenvalues are large, i.e., the window contains a corner; when R < 0, one eigenvalue is large and the other small, and the window contains an edge; when |R| is small, both eigenvalues are small and the window lies in a flat area.
By sliding the window over the regions (Sd_out − Sd) and Ad_out, the R value of each position is computed, and the window positions whose R value exceeds the threshold are recorded as feature points.
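As a concrete illustration of this step, the following sketch computes the Harris response over an image and keeps the positions inside the interest-region mask whose response exceeds a relative threshold. OpenCV's cornerHarris stands in for the sliding-window computation; the block size, Sobel aperture, k, and threshold ratio are assumed values.

```python
import cv2
import numpy as np

def harris_points(gray, region_mask, k=0.04, thresh_ratio=0.01):
    """Return (y, x) coordinates and R values of corners inside region_mask."""
    R = cv2.cornerHarris(np.float32(gray), blockSize=5, ksize=3, k=k)
    R = np.where(region_mask > 0, R, 0)              # restrict to the interest region
    ys, xs = np.nonzero(R > thresh_ratio * R.max())  # window positions with large R
    return np.stack([ys, xs], axis=1), R[ys, xs]

# pts_s, resp_s = harris_points(gray_s, ring)        # corners in (Sd_out - Sd)
# pts_a, resp_a = harris_points(gray_a, ad_out)      # corners in Ad_out
```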
On average, one feature point is taken from every 15 × 15 pixel grid, i.e., 10 feature points from a 150 × 150 area, and these enter the feature point matching stage to ensure that corresponding feature point pairs can be matched; an adaptive non-maximum suppression method is used to preferentially select a specific number of keypoints.
The idea of adaptive non-maximum suppression is as follows. The feature points extracted in the previous step form a set X. Each point x_i is taken as a circle center, and the minimum circle radius r_i is found such that the R values of all other feature points inside the circle are smaller than that of x_i; r_i is added to the queue. Traversing all points in the set yields, for each point, the minimum radius r_i satisfying the constraint; the radii are sorted in descending order, and the first n corresponding points are selected as the keypoint set obtained after adaptive non-maximum suppression.

Expressed mathematically:

r_i = min_j ‖x_i − x_j‖,  s.t.  f(x_i) < c·f(x_j),  x_j ∈ X

where r_i is the minimum circle radius satisfying the constraint, x_i and x_j are the two-dimensional coordinates of the keypoints obtained in the previous step, f(x_i) is the R value at point x_i, X is the set of all keypoints, and c = 0.9.
In the actual calculation, first the number of feature points to be retained is determined from the pixel area of the region; since the subsequent computation requires the same number of feature points in (Sd_out − Sd) and Ad_out, this number, denoted n, is computed with reference to the larger region Ad_out. Next, the keypoint with the maximum R value, R_max, is found and added to the queue, and the value 0.9·R_max is computed. All keypoints of the region are then traversed: if a keypoint x_i satisfies R_i > 0.9·R_max, its radius is set to infinity; if R_i < 0.9·R_max, the nearest point whose R value exceeds R_i/0.9 is found and the distance r_i between the two points is recorded. Finally, all r_i are sorted, and the n points with the largest r_i are selected.
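A minimal sketch of the adaptive non-maximum suppression just described: for each keypoint the suppression radius r_i is the distance to the nearest keypoint strong enough that f(x_i) < c·f(x_j) (infinity when no such point exists, as for the strongest point), and the n points with the largest radii are kept. The O(m²) loop is an assumed simple implementation, adequate for a few hundred candidates.

```python
import numpy as np

def anms(points, responses, n, c=0.9):
    """points: (m, 2) keypoint coordinates; responses: (m,) Harris R values."""
    m = len(points)
    radii = np.full(m, np.inf)                       # the strongest point keeps infinity
    for i in range(m):
        stronger = responses[i] < c * responses      # points that suppress point i
        if stronger.any():
            d = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = d.min()                       # minimum radius r_i of the constraint
    keep = np.argsort(-radii)[:n]                    # descending suppression radius
    return points[keep], responses[keep]
```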
In step (2), adding description vectors to the feature points and searching for matching point pairs in the two images includes:

After n feature points are found in the regions (Sd_out − Sd) and Ad_out respectively, the features of each point are described so that correspondences between image A and image S can be found; a descriptor is constructed from the two-dimensional positional relationships around each keypoint.
An appropriate Gaussian blur is applied to the grayscale image; a region of 40 × 40 pixels centered on each feature point is taken as its description area, so that on average each description area contains 3-4 feature points including the central one; the region is down-sampled to 8 × 8 and flattened into a 64-dimensional vector; and the vectors are normalized. Thus each feature point generates a 64-dimensional vector as its descriptor, and an n × 64-dimensional feature matrix is obtained for each of the two registration areas.
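A minimal sketch of this descriptor construction, assuming a grayscale uint8 image: Gaussian blur, a 40 × 40 patch around each keypoint (obtained here via reflective border padding, an assumed choice the text does not specify), down-sampling to 8 × 8, and normalization of the resulting 64-dimensional vector.

```python
import cv2
import numpy as np

def describe(gray, points):
    """Return an (n, 64) feature matrix, one 64-d descriptor per keypoint."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # appropriate Gaussian blur
    padded = cv2.copyMakeBorder(blurred, 20, 20, 20, 20, cv2.BORDER_REFLECT)
    desc = []
    for y, x in points:
        patch = padded[y:y + 40, x:x + 40]                 # 40x40 area centered on (y, x)
        small = cv2.resize(patch, (8, 8), interpolation=cv2.INTER_AREA)
        v = small.astype(np.float32).ravel()               # 64-dimensional vector
        desc.append((v - v.mean()) / (v.std() + 1e-8))     # normalize the descriptor
    return np.stack(desc)
```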
Once all feature points have descriptors, matched feature points in the two images are screened out as feature point pairs, and feature point sets with low correlation are filtered away. The variance of the description vectors is computed between every pair of the n feature points; candidates are sorted by the ratio of the smallest to the second-smallest variance, and pairs whose ratio is below the threshold of 0.5 are taken as the screened matching points.
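A minimal sketch of this screening step, reading the "variance" between description vectors as the mean squared difference and applying the ratio of the smallest to the second-smallest value with the 0.5 threshold from the text; the exact distance definition is an assumed interpretation.

```python
import numpy as np

def match_pairs(desc_s, desc_a, ratio=0.5):
    """Return index pairs (i in S, j in A) passing the variance-ratio test."""
    pairs = []
    for i, d in enumerate(desc_s):
        var = ((desc_a - d) ** 2).mean(axis=1)       # variance to every descriptor in A
        j1, j2 = np.argsort(var)[:2]                 # smallest and second-smallest
        if var[j1] < ratio * var[j2]:                # keep only confident matches
            pairs.append((i, j1))
    return pairs
```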
In step (3), solving the homography transformation matrix from the two-dimensional coordinate information of the matched feature point pairs includes:

After the point pairs with the highest matching confidence are found, the homography matrix is solved from the two-dimensional coordinate correspondence between the point pairs; the region Ad_out of image A is then transformed, and the required part is filled into the region Sd to be filled of image S.
Let a point (x1, y1) in image A match a point (x2, y2) in image S; then the correspondence is:

s·[x2, y2, 1]ᵀ = H(3×3)·[x1, y1, 1]ᵀ

where H(3×3) is the homography matrix, the key variable describing the transformation relation, and s is the homogeneous scale factor.
For each group of matching points (x_i, y_i) ↔ (x'_i, y'_i), the following equation holds:

s·[x'_i, y'_i, 1]ᵀ = H·[x_i, y_i, 1]ᵀ,   H = [h11 h12 h13; h21 h22 h23; h31 h32 h33]

where the h entries are the homography coefficients. From the correspondence between plane coordinates and homogeneous coordinates:

x'_i = (h11·x_i + h12·y_i + h13) / (h31·x_i + h32·y_i + h33)
y'_i = (h21·x_i + h22·y_i + h23) / (h31·x_i + h32·y_i + h33)

Further transformation gives:

(h31·x_i + h32·y_i + h33)·x'_i = h11·x_i + h12·y_i + h13
(h31·x_i + h32·y_i + h33)·y'_i = h21·x_i + h22·y_i + h23

Written in matrix form, with h = [h11, h12, h13, h21, h22, h23, h31, h32, h33]ᵀ:

[ x_i  y_i  1   0    0    0   −x_i·x'_i   −y_i·x'_i   −x'_i ]
[ 0    0    0   x_i  y_i  1   −x_i·y'_i   −y_i·y'_i   −y'_i ] · h = 0

i.e., each group of matching points (x_i, y_i) ↔ (x'_i, y'_i) yields 2 equations.
Since the homography matrix H gives exactly the same mapping as aH for any a ≠ 0:

(aH)·[x_i, y_i, 1]ᵀ = a·s·[x'_i, y'_i, 1]ᵀ ~ [x'_i, y'_i, 1]ᵀ

i.e., the point (x_i, y_i) is mapped to (x'_i, y'_i) whether it is transformed by H or by aH.
If the coefficients are collected into the vector h = [h11, h12, ..., h33]ᵀ, then stacking the two equations of every matching pair yields the homogeneous linear system:

A·h = 0

where A is the 2n × 9 coefficient matrix built from the matching points.

Although the homography matrix H has 9 unknowns, the constraint h33 = 1 is generally added during the solution, leaving h11 ... h32 as 8 unknowns, i.e., only 8 degrees of freedom. Since each group of matching points (x_i, y_i) ↔ (x'_i, y'_i) yields 2 equations, only n = 4 groups of non-collinear matching points are needed to solve for the unique H.
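A minimal sketch of this direct linear solution: the two equations of each matching pair are stacked into A·h = 0, the null-space direction is taken from the SVD, and the result is normalized so that h33 = 1. With exactly 4 non-collinear pairs this reproduces the unique solution described above; with more pairs it returns the least-squares solution.

```python
import numpy as np

def solve_homography(src, dst):
    """src, dst: (n, 2) matched points (x_i, y_i) -> (x'_i, y'_i), n >= 4."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    A = np.asarray(rows, dtype=np.float64)           # 2n x 9 coefficient matrix
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)                         # null-space direction of A
    return H / H[2, 2]                               # enforce the constraint h33 = 1
```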
If fewer than 4 matched feature point pairs survive the previous screening step, the algorithm returns to the local non-maximum suppression step and appropriately relaxes the suppression to increase the number of matched pairs, until at least 4 pairs are obtained. In practice, the number of screened matching keypoint pairs is in most cases far greater than the 4 required, and a random sampling algorithm is used to find the 4 pairs with the smallest matching error: with one image as reference, 4 points are randomly selected from the region (Sd_out − Sd) each time and their 4 matches are found in the region Ad_out; a homography matrix is computed from these 4 pairs by the method above; the remaining feature points in (Sd_out − Sd) are projected into Ad_out by this homography, and the number that coincide with the original feature points of Ad_out is counted. This step is repeated 2000 times, and the homography matrix with the most accurate pairing is selected as the final transformation matrix. At this point, the locally optimal projective transformation between the two images has been found.
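A minimal sketch of this random sampling loop, reusing solve_homography from the previous sketch; the 3-pixel coincidence tolerance is an assumed value. In practice, cv2.findHomography(src, dst, cv2.RANSAC) implements the same idea.

```python
import numpy as np

def ransac_homography(src, dst, iters=2000, tol=3.0):
    """Pick the homography whose projection matches the most remaining points."""
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])        # homogeneous source points
    best_H, best_inliers = None, -1
    for _ in range(iters):
        idx = np.random.choice(n, 4, replace=False)  # a random sample of 4 pairs
        H = solve_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]            # back to plane coordinates
        inliers = int((np.linalg.norm(proj - dst, axis=1) < tol).sum())
        if inliers > best_inliers:                   # keep the most accurate pairing
            best_H, best_inliers = H, inliers
    return best_H
```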
Further, in step two, adaptively filling the filled edge region using a single-frame filling algorithm includes:
the filling region is transformed to a region to be filled by local optimal homography transformation, and the perspective relation and the spatial information of the image to be filled are met to the maximum extent; and secondarily filling the edge of the area to be filled Sd by using a single-frame filling method, so as to replace the originally partially discontinuous filling edge.
The single-frame filling algorithm is divided into three parts: priority calculation, matching-block search, and copy filling.

First, the image is divided into a known region and a region to be filled, and the filling priority of the edge pixels is calculated along the boundary to be filled.
Suppose the whole image area is I, the area to be filled is Ω, and the edge of the filled area is ∂Ω. The priority of a block ψ_p centered on a target-area edge point p is computed as:

P(p) = C(p)·D(p)

where C(p) is the confidence term and D(p) is the data term, defined as:

C(p) = ( Σ_{q ∈ ψ_p ∩ (I−Ω)} C(q) ) / |ψ_p|

D(p) = |∇I_p⊥ · n_p| / α

where |ψ_p| is the area of block ψ_p; α is the image normalization factor, 255 for a uint8-format image; n_p is the unit normal vector at point p on the fill-region edge ∂Ω; and ∇I_p⊥ is the isophote (iso-illumination line) at point p.
The initialization values are:

C(p) = 0, p ∈ Ω;
C(p) = 1, p ∈ I − Ω.
the block ψ is formed with a set block size centering on the boundary point p of the highest current priorityp(ii) a Finding similar blocks in a known area according to a matching criterion, the block phi centered at q' and q ″q′And block psiq″(ii) a Separately calculating the block psiq′And block psiq″And psipMean square error E ', E'. When the difference between E 'and E' is more than 20%, selecting the block with the minimum mean square error for filling; however, when the difference between E 'and E' is less than 20%, and the macroscopic impression difference of human eyes is not large, because of the continuity of the change between adjacent pixels of the image, the space Euclidean distance discrimination is introduced, and the distance and the block psi are selectedpThe pixel block with the closest plane distance is filled.
Further, in step three, eliminating the color gradient difference between the fill region and the original image using Poisson fusion includes:

Using the Poisson fusion method, the color gradients at the edge seams of each filled region are eliminated and the overall tone of the different regions is unified. The image g is filled into the background S, where S ⊂ R² is the definition domain of the image; Ω is a closed subset of S with boundary ∂Ω; f* is a known scalar function defined on and outside the boundary ∂Ω (i.e., on S minus the interior of Ω); f is an unknown scalar function defined on the interior of Ω; and v is the vector field defined over the Ω domain.
That is, the known data include: the gradient field v of the region Ω, and the boundary ∂Ω of the fill region.
After filling, the following two conditions are met:
(1) keeping the fill image g undistorted, i.e. the gradient of the fill content is as close as possible to v;
(2) the transition is seamless, i.e. the border pixel values of the padding are consistent with the existing S.
The following mathematical expressions can be obtained by satisfying the two requirements:
min_f ∬_Ω |∇f − v|²,   with   f|_∂Ω = f*|_∂Ω

where ∇ is the gradient operator and the vector field v is the guidance field; the expression ∬_Ω |∇f − v|² and the condition f|_∂Ω = f*|_∂Ω correspond to requirements (1) and (2), respectively. The solution to this problem is the unique solution of the Poisson partial differential equation under the given Dirichlet boundary condition:
Δf = div v over Ω,   with   f|_∂Ω = f*|_∂Ω

where div v = ∂u/∂x + ∂v/∂y is the divergence of v = (u, v), and Δ = ∂²/∂x² + ∂²/∂y² is the Laplacian operator, i.e., the second-order gradient; the boundary condition in this case is called a Dirichlet boundary.
The Poisson equation with Dirichlet boundary conditions can be solved step by step using an iterative method: starting from the known pixel values on the boundary ∂Ω, the filled area is corrected ring by ring along the gradient direction of the guidance vector field v, achieving a visual effect more harmonious with the background. For a common three-channel color image, the Poisson equation needs to be solved independently on each of the three channels.
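A minimal sketch of this per-channel iterative solution, using Jacobi sweeps of the discrete Poisson equation Δf = div v with the guidance field v taken as the gradient of the source g (so that div v is the Laplacian of g); pixels outside Ω keep their background values, which enforces the Dirichlet boundary condition. The mask is assumed to stay away from the image border; OpenCV's cv2.seamlessClone offers a ready-made equivalent.

```python
import numpy as np

def poisson_blend_channel(f_star, g, omega, iters=500):
    """f_star: background channel; g: source channel; omega: bool mask of Omega."""
    f = f_star.astype(np.float64).copy()
    g = g.astype(np.float64)
    f[omega] = g[omega]                                  # initial guess inside Omega
    lap_g = (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
             np.roll(g, 1, 1) + np.roll(g, -1, 1) - 4 * g)   # div v = Laplacian of g
    for _ in range(iters):
        nb = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
              np.roll(f, 1, 1) + np.roll(f, -1, 1))      # sum of the 4 neighbors
        f[omega] = ((nb - lap_g) / 4.0)[omega]           # Jacobi update inside Omega only
    return f

# Three-channel images are solved channel by channel:
# blended = np.dstack([poisson_blend_channel(S[..., c], g[..., c], omega)
#                      for c in range(3)])
```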
Another object of the present invention is to provide an image content removing system applying the image content removing method, the image content removing system comprising:
the image filling module is used for, given a target image containing a region to be filled and a filling image used for filling, placing the determined filling image into the region to be filled using the locally optimal homography transformation algorithm for corresponding filling;
the self-adaptive filling module is used for performing self-adaptive filling by using a single-frame filling algorithm aiming at the filled edge area;
and the color gradient difference elimination module is used for eliminating the color gradient difference between the filling area and the original image by using a Poisson fusion method.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
first, extracting feature points within the coarse (fuzzy) sampling ranges of the two images; approximating, from the positional correspondence of the matched feature point pairs, the local homography from the corresponding region in image A to the region to be filled in image S, that is, transforming the same planar objects photographed by cameras at different spatial positions into the same viewing angle; and finally cutting the required regions out of the transformed corresponding region of image A and filling them into the region to be filled.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
first, extracting feature points within the coarse (fuzzy) sampling ranges of the two images; approximating, from the positional correspondence of the matched feature point pairs, the local homography from the corresponding region in image A to the region to be filled in image S, that is, transforming the same planar objects photographed by cameras at different spatial positions into the same viewing angle; and finally cutting the required regions out of the transformed corresponding region of image A and filling them into the region to be filled.
Another object of the present invention is to provide an information data processing terminal for implementing the image content removal system.
Combining all the above technical schemes, the invention has the following advantages and positive effects: the image content removing method provided by the invention uses continuous multi-frame image information and an image filling method combining single-frame and multi-frame filling to remove the target object from an image; the algorithm restores the originally occluded real scene background more efficiently and more realistically, and meets the requirement of removing occluding objects with high fidelity.
Software that can remove image content is almost nonexistent in the current mobile-terminal market; in Android or iOS, the image editors supplied with the device focus mainly on cropping, contrast adjustment, adding filters, and the like. In the personal computer market, software usable for image content removal is represented by Photoshop, developed by Adobe. Photoshop processes digital images composed of pixels as basic units, and its many editing and drawing tools let users edit pictures effectively. Photoshop offers users two ways to remove part of an image's content.

One follows the idea of single-frame filling: after the user selects a target area with a selection tool, Photoshop identifies and analyzes the surrounding area and fills and replaces the selected area according to the result; its defect is that a single-frame filling algorithm consistently struggles to fill a large real scene reasonably. The other follows the idea of multi-frame filling: the user must first prepare 2 or more image materials, register the layers manually after importing them into Photoshop, and then use tools such as the brush to make the target area's layer transparent so that the material at the same position in the other layer shows through across layers; this manual process is complex.

The characteristic of removing image content with software such as Photoshop is that its functional modules expose every operation step in great detail. For professional image-processing practitioners or skilled users, the processing parameters can be adjusted flexibly for different task scenarios to obtain the most refined effect; but for most ordinary users there is a high usage threshold, which seriously harms the experience. The invention provides end-to-end image content removal software integrating several processing algorithms; during use, the user need not master any specific algorithm flow, and only needs to select the picture material and the target area to obtain the processed result. This fills the market gap in image content removal and provides users with a one-stop solution supporting both mobile terminals and personal computers.
The invention not only eliminates the interference of non-interest regions in the image and greatly improves matching precision, but also reduces the amount of computation through the idea of seeking a locally optimal solution, greatly lowering the algorithm's demands on hardware and guaranteeing its efficiency when running on a mobile terminal.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an image content removing method according to an embodiment of the present invention.
FIG. 2 is a block diagram of an image content removal system according to an embodiment of the present invention;
in the figure: 1. an image filling module; 2. a self-adaptive filling module; 3. and a color gradient difference elimination module.
Fig. 3 is a schematic diagram of a result of multi-frame direct padding according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of fig. S provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of an Sd mask provided in an embodiment of the present invention.
Fig. 6 is a schematic diagram of fig. a provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of an Ad mask (same as Sd) provided in an embodiment of the present invention.
Fig. 8 is a schematic diagram of a global optimal registration filling result according to an embodiment of the present invention.
Fig. 9 is a schematic diagram of problem labeling of global optimal registration filling according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of an Sd mask provided in an embodiment of the present invention.
Fig. 11 is a schematic diagram of an Sd_out mask according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of an Ad mask according to an embodiment of the present invention.
Fig. 13 is a schematic diagram of an Ad_out mask according to an embodiment of the present invention.
Fig. 14 is a schematic diagram of an (Sd_out − Sd) mask provided in the embodiment of the present invention.
Fig. 15 is a schematic diagram of feature points of the region (Sd_out − Sd) according to the embodiment of the present invention.
Fig. 16 is a schematic diagram of feature points of the region Ad provided in an embodiment of the present invention.
Fig. 17 is a schematic diagram of feature points of the limited region (Sd_out − Sd) according to the embodiment of the present invention.
Fig. 18 is a schematic diagram of feature points of the restricted region Ad_out according to the embodiment of the present invention.
Fig. 19 is a globally optimal registration overlay provided by an embodiment of the present invention.
Fig. 20 is a schematic diagram of a global optimal registration filling result according to an embodiment of the present invention.
Fig. 21 is a locally optimal registration overlay provided by an embodiment of the present invention.
Fig. 22 is a schematic diagram of a locally optimal registration filling result provided by an embodiment of the present invention.
Fig. 23 is a schematic diagram of a global optimal registration filling result compared with a local optimal registration filling result according to an embodiment of the present invention.
Fig. 24 is a schematic diagram of a breakpoint generated when a tree is cut off in an image according to an embodiment of the present invention.
Fig. 25 is a schematic diagram of calculating the priority of the single frame padding algorithm according to the embodiment of the present invention.
Fig. 26 is a schematic diagram of a search for a matching block in a single frame filling algorithm according to an embodiment of the present invention.
Fig. 27 is a schematic diagram illustrating a result of filling the margin of the left filling area with the single-frame filling algorithm according to the embodiment of the present invention.
Fig. 28 is a diagram of a filled edge without single-frame filling added, according to an embodiment of the present invention.
Fig. 29 is a diagram of a filled edge with single-frame filling added, according to an embodiment of the invention.
Fig. 30 is a schematic diagram illustrating a multi-frame filling effect compared with a single-frame multi-frame combined filling effect according to an embodiment of the present invention.
Fig. 31 is a schematic illustration of the problem of not using poisson fusion as provided by embodiments of the present invention.
Fig. 32 is a simplified schematic diagram of a filling situation for filling the image g into the background S according to the embodiment of the present invention.
Fig. 33 is a diagram of a poisson fusion system provided by an embodiment of the present invention.
Fig. 34 is a diagram after poisson fusion as provided by embodiments of the present invention.
Fig. 35 is a schematic diagram showing a comparison between poisson fusion according to an embodiment of the present invention.
Fig. 36 is a schematic diagram of another example, before filling, provided by an embodiment of the invention.
Fig. 37 is a schematic diagram of the other example before Poisson fusion, according to an embodiment of the present invention.
Fig. 38 is a schematic diagram of the other example after Poisson fusion, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides an image content removing method, system, medium, device and data processing terminal, and the following describes the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the image content removing method provided by the embodiment of the present invention includes the following steps:
s101, for a target image containing a region to be filled and a filling image used for filling, putting the determined filling image into the region to be filled by using a locally optimal homography algorithm for corresponding filling;
s102, carrying out self-adaptive filling by using a single-frame filling algorithm aiming at the filled edge area;
s103, eliminating the color gradient difference between the filling area and the original image by using a Poisson fusion method.
As shown in fig. 2, an image content removing system provided by an embodiment of the present invention includes:
the image filling module 1 is used for placing the determined filling image into the area to be filled by using a local optimal homography transformation algorithm for a target image containing the area to be filled and the filling image for filling;
the self-adaptive filling module 2 is used for performing self-adaptive filling by using a single-frame filling algorithm aiming at the filled edge area;
and the color gradient difference elimination module 3 is used for eliminating the color gradient difference between the filling area and the original image by using a Poisson fusion method.
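To show how the three modules compose end to end, the sketch below approximates the pipeline with OpenCV built-ins: ORB features restricted by region masks, a RANSAC homography, Telea inpainting for the edge pass, and seamless cloning for the Poisson step. It is purely illustrative: ORB, inpaint, and seamlessClone stand in for the patent's Harris/adaptive-suppression matching, single-frame filling, and Poisson fusion steps, and all parameter values are assumptions.

```python
import cv2
import numpy as np

def remove_content(img_s, img_a, sd, sd_out, ad_out):
    """img_s: frame with occluder (mask sd); img_a: neighboring clean frame."""
    ring = cv2.subtract(sd_out, sd)                       # (Sd_out - Sd) search band
    orb = cv2.ORB_create(500)
    k_a, d_a = orb.detectAndCompute(img_a, ad_out)        # features inside Ad_out
    k_s, d_s = orb.detectAndCompute(img_s, ring)          # features inside the band
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d_a, d_s)
    src = np.float32([k_a[m.queryIdx].pt for m in matches])
    dst = np.float32([k_s[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # locally optimal homography
    warped = cv2.warpPerspective(img_a, H, img_s.shape[1::-1])
    filled = img_s.copy()
    filled[sd > 0] = warped[sd > 0]                       # module 1: multi-frame fill
    edge = cv2.dilate(sd, np.ones((5, 5), np.uint8)) - sd
    filled = cv2.inpaint(filled, edge, 3, cv2.INPAINT_TELEA)  # module 2: edge fill
    ys, xs = np.nonzero(sd_out)                           # module 3: Poisson fusion
    center = ((int(xs.min()) + int(xs.max())) // 2,
              (int(ys.min()) + int(ys.max())) // 2)       # keep the patch aligned
    return cv2.seamlessClone(filled, img_s, sd_out, center, cv2.NORMAL_CLONE)
```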
The technical solution of the present invention is further described below with reference to specific examples.
1. The invention provides an image filling method which utilizes continuous multi-frame image information and combines single-frame filling and multi-frame filling to achieve the purpose of removing a target object in an image.
The method comprises three steps: for a target image containing a region to be filled and a filling image used for filling, firstly, the determined filling image is placed into the region to be filled by using a locally optimal homography transformation algorithm for corresponding filling, then, a single-frame filling algorithm is used for self-adaptive filling aiming at a filled edge region, and finally, the color gradient difference between the filling region and an original image is eliminated by using a Poisson fusion method.
2. The content is as follows: image content removing method based on local optimal homography transformation
The algorithm comprises the following steps: first, feature points are extracted within the coarse (fuzzy) sampling ranges of the two images, and the local homography from the corresponding region in image A to the region to be filled in image S is approximated from the positional correspondence of the matched feature point pairs, i.e., the same planar objects photographed by cameras at different spatial positions are transformed into the same viewing angle; finally, the required region is cut out of the transformed corresponding region of image A and filled into the region to be filled.
2.1 extracting feature points
First, for the image S to be filled, the information inside the region Sd to be filled is regarded as unknown; after Ad is filled into Sd, the most important requirement is continuity between Ad and its surrounding environment. Based on this geometric consideration, the region Sd to be filled is expanded outward by 20% (not exceeding the image boundary) as the rough sampling range of the feature points, called Sd_out. Here, the Sd mask is shown in fig. 10 and the Sd_out mask in fig. 11.
For the filling image A, since the homography has not yet been performed, the positions of the same objects in S and A do not necessarily fall on exactly the same coordinates; but because the time interval between adjacent frames is short, the filling region usually appears near the coordinates of the region to be filled. Image A is therefore sampled fuzzily: the sampling range is the region of A at the same position as Sd, expanded by 50%, and this expanded region is called Ad_out. Here, the Ad mask is shown in fig. 12 and the Ad_out mask in fig. 13.
The idea of seeking a locally optimal solution not only eliminates the interference of non-interest regions in the image and greatly improves matching precision, but also reduces the amount of computation, greatly lowering the algorithm's demands on hardware and guaranteeing its efficiency when running on a mobile terminal.
In order to find the most suitable filling content in image A, and since the information inside the region Sd is useless, the feature points in the region (Sd_out − Sd) and the region Ad_out are found first; these approximately reflect the transformation relation between the regions Sd_out and Ad_out. The (Sd_out − Sd) mask is shown in fig. 14.
Corner points inside (Sd_out − Sd) and local feature points in Ad_out are detected, matched, and searched, and the transformation relation of the two image regions is computed from the matched feature point pairs. Many keypoint detection and matching methods exist, commonly the scale-invariant feature transform matching algorithm (SIFT), Harris corner detection matching, and histogram of oriented gradients feature extraction (HOG). The SIFT matching algorithm has rotation and scale invariance but a large computational cost; the matching precision of Harris corner detection is not as excellent as SIFT's, but its threshold control over feature point extraction is more flexible and the algorithm is faster; histogram-of-gradients feature extraction ignores the influence of illumination and color on the image and is generally applied in target detection. Given the locally optimal matching characteristic of the present algorithm, the accuracy requirement on feature extraction and matching is not high, while a faster computation speed is needed to meet the practical requirements of mobile-terminal applications. On balance, therefore, the Harris corner detection matching method is used in the feature point detection and matching stage.
The basic idea of the Harris corner detection matching algorithm is to slide a fixed window over the image in arbitrary directions and compare the degree of gray-level change of the pixels in the window before and after the slide; if sliding in any direction produces a large gray-level change, a corner can be considered to exist in the window.

When the window is displaced by [u, v] in the plane, the gray-level change of the pixels in the window before and after the slide is described as:

E(u,v) = Σ_(x,y) w(x,y)·[I(x+u, y+v) − I(x,y)]²

where [u, v] is the two-dimensional offset of the window; E(u, v) is the gray-level difference between the offset position and the position before sliding; (x, y) ranges over the pixel coordinates inside the window; w(x, y) is the window function, i.e., the weight assigned to the pixel value at each position in the window; and I(x, y) is the intensity of the pixel at the current position, i.e., its gray value in the image.
From this expression, when the window slides over a flat area the gray level does not change and E(u, v) = 0; if the window slides over a texture-rich region, the gray-level change is large. The aim of the algorithm is therefore to find the positions where the gray level changes greatly for sliding in any direction, not merely in some particular direction.
Taylor expansion of the above equation gives:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} \left( \sum_{(x,y)} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}$$
where $I_x$ is the partial derivative of the pixel intensity I in the x-direction and $I_y$ is the partial derivative in the y-direction.
Let the matrix M be:

$$M = \sum_{(x,y)} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
m is a covariance matrix by definition.
Testing three scenes — a flat area, a straight edge, and a corner — yields gradient distribution diagrams. From the eigenvalues and eigenvectors of the covariance matrix, the three scenes can be characterized by eigenvalue attributes: when both eigenvalues are large, the window contains a corner; when one eigenvalue is large and the other small, the window contains an edge; when both eigenvalues are small, the window lies in a flat area.
From these properties, for convenience of measurement, R is constructed as:

$$R = \det M - k\,(\operatorname{trace} M)^2$$

where $\det M = \lambda_1 \lambda_2$, $\operatorname{trace} M = \lambda_1 + \lambda_2$, $\lambda_1, \lambda_2$ are the two eigenvalues of the covariance matrix M, and k is a real coefficient.
The properties of the function R then correspond exactly to those of the covariance matrix M: when R is large, both eigenvalues are large, i.e., the window contains a corner; when R < 0, one eigenvalue is much larger than the other and the window contains an edge; when |R| is small, both eigenvalues are small and the window lies in a flat area.
Sliding the window over the regions (Sd_out − Sd) and Ad_out, the R value of each position is computed, and the window positions whose R value exceeds a threshold are recorded as feature points.
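As an illustrative sketch of the response computation just described — not the patent's reference implementation — the matrix M can be assembled from Sobel derivatives with a Gaussian window and R evaluated pixel-wise; the function name harris_response and the 0.01·max threshold are assumptions of the example.

```python
import cv2
import numpy as np

def harris_response(gray, k=0.04, sigma=1.0):
    # Assemble the matrix M per pixel from Sobel derivatives with a
    # Gaussian window w, then evaluate R = det(M) - k * trace(M)^2.
    gray = np.float32(gray)
    Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # dI/dx
    Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # dI/dy
    Ixx = cv2.GaussianBlur(Ix * Ix, (0, 0), sigma)    # window-weighted Ix^2
    Iyy = cv2.GaussianBlur(Iy * Iy, (0, 0), sigma)    # window-weighted Iy^2
    Ixy = cv2.GaussianBlur(Ix * Iy, (0, 0), sigma)    # window-weighted IxIy
    det = Ixx * Iyy - Ixy * Ixy                       # lambda1 * lambda2
    trace = Ixx + Iyy                                 # lambda1 + lambda2
    return det - k * trace * trace

# R = harris_response(gray) * (region_mask > 0)   # restrict to the region
# pts = np.argwhere(R > 0.01 * R.max())           # positions above threshold
```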
The feature points of the regions (Sd _ out-Sd) are shown in FIG. 15, and the feature points of the region Ad are shown in FIG. 16.
Since the number of feature points cannot be specified before extraction, many feature points remain after the region restriction, which is needed to guarantee correspondences for the subsequent calculation. Experiments show, however, that in real scenes — considering factors such as shooting distance and environmental complexity — taking on average one feature point per 15×15-pixel cell (i.e., roughly 100 feature points in a 150×150 area) for the matching stage is appropriate: corresponding feature-point pairs can still be matched, and errors are reduced without adding excessive redundant computation. Most key points are therefore discarded; only points with distinct features are kept, distributed evenly over the region. An adaptive non-maximum suppression method is used here to preferentially select a fixed number of key points.
The idea of adaptive non-maximum suppression is as follows. For the set of feature points extracted in the previous step, take each point $x_i$ as a circle center and find the distance $r_i$ to the nearest feature point strong enough to suppress it — the minimum circle radius satisfying the constraint below, such that the R values of all other feature points strictly inside the circle are less than that of $x_i$ — and add $r_i$ to a queue. After traversing all points in the set and obtaining the minimum radius $r_i$ for each, sort the points by $r_i$ in descending order and select the first n as the key point set obtained after adaptive non-maximum suppression.
Expressed in mathematical language:

$$r_i = \min_{j} \|x_i - x_j\| \quad \text{s.t.} \quad f(x_i) < c \cdot f(x_j),\; x_j \in \mathcal{I}$$

where $r_i$ is the minimum circle radius satisfying the constraint, $x_i, x_j$ are the two-dimensional coordinates of the key points obtained in the previous step, $f(x_i)$ is the R value at point $x_i$, $\mathcal{I}$ is the set of all key points, and c = 0.9.
In actual calculation, the invention reverses this process. First, the number of feature points to be screened is computed from the pixel area of the region; since the subsequent calculation requires the same number of feature points in (Sd_out − Sd) and in Ad_out, the adaptively acquired count, set as n, is computed with reference to the region Ad_out, which has the larger pixel area. Second, the key point with the maximum R value, Rmax, is found and added to the queue, and the value Rmax·0.9 is computed. All key points in the region are then traversed: if a key point $x_i$ has $R_i$ > Rmax·0.9, its radius is set to infinity; if $R_i$ < Rmax·0.9, the distance $r_i$ from this point to the nearest point whose R value exceeds $R_i/0.9$ is computed and recorded. Finally, all radii are sorted and the n points with the largest $r_i$ are found.
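A minimal sketch of this suppression-radius selection, assuming NumPy and taking keypoint coordinates with their R values as inputs (the function name adaptive_nms and the array layout are illustrative):

```python
import numpy as np

def adaptive_nms(points, r_values, n, c=0.9):
    # A point is suppressed by any point x_j with c * f(x_j) > f(x_i);
    # r_i is the distance to the nearest such point (infinity for the
    # global maximum), and the n points with the largest r_i are kept.
    pts = np.asarray(points, dtype=np.float64)    # (N, 2) keypoint coords
    f = np.asarray(r_values, dtype=np.float64)    # Harris R value per point
    radii = np.full(len(pts), np.inf)
    for i in range(len(pts)):
        stronger = c * f > f[i]                   # points that suppress i
        stronger[i] = False
        if stronger.any():
            radii[i] = np.linalg.norm(pts[stronger] - pts[i], axis=1).min()
    return pts[np.argsort(-radii)[:n]]            # descending radii, top n
```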
Fig. 17 shows characteristic points of the restricted area (Sd _ out-Sd), and fig. 18 shows characteristic points of the restricted area Ad _ out.
2.2 adding respective description vectors to the feature points and finding matching point pairs in the two graphs
After n feature points have been found in each of the regions (Sd_out − Sd) and Ad_out, the features of each point must be described in order to find the correspondences between graph A and graph S. Because in ordinary use the differences in distance, scale, angle, and similar properties between consecutive frames are small, and the positional relationships between feature points are strongly consistent, the descriptor is constructed from the two-dimensional positional relationships between the key points.
First, to suppress the influence of image noise, a moderate Gaussian blur is applied to the grayscale image. Next, since the previous step leaves on average one feature point per 15×15-pixel range, a 40×40-pixel region is taken centered on each feature point to describe the relationships between feature points; with this size, each description region contains on average 3-4 feature points including the central one. The region is then downsampled to 8×8 and flattened into a 64-dimensional vector. Finally, the vectors are normalized, so that each feature point yields a 64-dimensional vector as its descriptor, giving an n×64 feature matrix for each of the two registration areas.
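A sketch of this descriptor construction under the stated sizes (40×40 patch, 8×8 downsample, normalized 64-dimensional vector); the blur kernel and the border handling are assumptions of the example:

```python
import cv2
import numpy as np

def describe(gray, keypoints, patch=40, out=8):
    # Blur, take a patch x patch window around each keypoint, downsample
    # to out x out, flatten, and L2-normalize into a 64-D descriptor.
    blurred = cv2.GaussianBlur(np.float32(gray), (5, 5), 0)
    half = patch // 2
    descs = []
    for x, y in np.int32(keypoints):
        win = blurred[y - half:y + half, x - half:x + half]
        if win.shape != (patch, patch):
            continue                                  # too close to border
        v = cv2.resize(win, (out, out)).flatten()     # 64-dimensional
        descs.append(v / (np.linalg.norm(v) + 1e-8))  # normalized
    return np.array(descs)                            # n x 64 feature matrix
```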
With descriptors for all feature points in hand, the feature points that match across the two images are screened out as feature point pairs, and point sets with low correlation are filtered away. The variance of the descriptor difference is computed between every pair of the n feature points; to obtain higher confidence, candidates are ranked from small to large by the ratio of the smallest variance to the second-smallest, and pairs whose ratio falls below a threshold of 0.5 are kept as the screened matching points.
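This screening can be sketched as a ratio test over pairwise descriptor differences; here the mean squared difference stands in for the variance measure, and the names and layout are illustrative:

```python
import numpy as np

def match_descriptors(d1, d2, thresh=0.5):
    # For each descriptor in d1, rank all of d2 by mean squared
    # difference and accept the best candidate only when it beats the
    # second best by the ratio threshold.
    pairs = []
    for i, v in enumerate(d1):
        errs = ((d2 - v) ** 2).mean(axis=1)
        j, j2 = np.argsort(errs)[:2]
        if errs[j] / (errs[j2] + 1e-12) < thresh:     # confident match
            pairs.append((i, j))
    return pairs
```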
2.3 solving the homography transformation matrix according to the two-dimensional coordinate information of the matched characteristic point pairs
After the point pairs with the highest matching confidence have been found, the homography transformation matrix must be solved from their two-dimensional coordinate correspondences, so that the image region Ad_out can be transformed and the required part filled into the region Sd of graph S.
Let the point $(x_1, y_1)$ in graph A match the point $(x_2, y_2)$ in graph S, i.e., they have the correspondence:

$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H_{3\times3} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

where $H_{3\times3}$ is the homography matrix, the key variable for obtaining the transformation relation.
More generally, each set of matching points $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ satisfies:

$$\begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} \sim \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$
where the $h$ entries are the homography coefficients. From the correspondence between plane coordinates and homogeneous coordinates, the above formula can be expressed as:

$$x'_i = \frac{h_{11}x_i + h_{12}y_i + h_{13}}{h_{31}x_i + h_{32}y_i + h_{33}}, \qquad y'_i = \frac{h_{21}x_i + h_{22}y_i + h_{23}}{h_{31}x_i + h_{32}y_i + h_{33}}$$
Rearranging gives:

$$(h_{31}x_i + h_{32}y_i + h_{33})\,x'_i = h_{11}x_i + h_{12}y_i + h_{13}$$
$$(h_{31}x_i + h_{32}y_i + h_{33})\,y'_i = h_{21}x_i + h_{22}y_i + h_{23}$$
writing in matrix form:
Figure BDA0003398699040000157
that is, each set of matching points
Figure BDA0003398699040000158
Figure BDA0003398699040000158
2 sets of equations can be obtained.
Since the homography matrix H is in fact equivalent to aH (where a ≠ 0), for example:

$$(aH)\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = a\left(H\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}\right) \sim H\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$

because homogeneous coordinates are defined only up to a nonzero scale. That is, whether the point $(x_i, y_i)$ is mapped via H or via aH, the result is $(x'_i, y'_i)$. If one takes $a = 1/h_{33}$, then:

$$aH = \begin{bmatrix} h_{11}/h_{33} & h_{12}/h_{33} & h_{13}/h_{33} \\ h_{21}/h_{33} & h_{22}/h_{33} & h_{23}/h_{33} \\ h_{31}/h_{33} & h_{32}/h_{33} & 1 \end{bmatrix}$$

whose last element equals 1.
Therefore, although the homography matrix H has 9 unknowns, the constraint $h_{33} = 1$ is generally added during the solution, leaving $h_{11} \ldots h_{32}$, 8 unknowns in total, i.e., only 8 degrees of freedom. Since each set of matching points $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ contributes 2 sets of equations, only n = 4 sets of non-collinear matching points are needed to solve for H uniquely.
If the number of matched feature point pairs screened in the previous step is less than 4, the procedure returns to the non-maximum suppression step with a suitably reduced degree of local suppression, so as to increase the number of matched pairs, repeating until more than 4 pairs are obtained. In practice, the number of screened matching key point pairs is in most cases far more than the 4 required, so a random sampling algorithm is used to find the 4 pairs with the minimum matching error: taking one image as reference, 4 points are randomly selected from the region (Sd_out − Sd) each time and their 4 paired points found in the region Ad_out; a homography transformation matrix is computed from these 4 pairs as above; the remaining feature points in (Sd_out − Sd) are transformed and projected into Ad_out according to this matrix; and the number that coincide with the original feature points of Ad_out is counted. This step is repeated 2000 times, and the homography transformation matrix with the most accurate pairings is selected as the final transformation matrix. At this point, the locally optimal projective transformation relation between the two images has been found.
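A sketch of this solve-and-sample procedure, assuming NumPy and matched coordinate arrays: homography_from_4 solves the 8-unknown system above under $h_{33} = 1$, and ransac_homography runs the 2000-iteration sampling loop; the 3-pixel coincidence tolerance is an assumption of the example.

```python
import numpy as np

def homography_from_4(src, dst):
    # Solve the 8-unknown linear system above with h33 = 1,
    # from exactly 4 point correspondences.
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, iters=2000, tol=3.0):
    # Randomly sample 4 pairs, fit H, project the remaining points, and
    # keep the H that carries the most points onto their partners.
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best, best_count = None, -1
    ones = np.ones((len(src), 1))
    for _ in range(iters):
        idx = np.random.choice(len(src), 4, replace=False)
        try:
            H = homography_from_4(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue                                  # degenerate sample
        proj = (H @ np.hstack([src, ones]).T).T
        proj = proj[:, :2] / proj[:, 2:3]             # back to plane coords
        count = (np.linalg.norm(proj - dst, axis=1) < tol).sum()
        if count > best_count:
            best, best_count = H, count
    return best
```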
The globally optimal registration overlay is shown in fig. 19 and the corresponding fill in fig. 20; the locally optimal registration overlay is shown in fig. 21 and the corresponding fill in fig. 22; a comparison of the globally and locally optimal registration fills is shown in fig. 23.
3. Reserving a filling margin for single-frame filling
The locally optimal homography transformation carries the filling region onto the region to be filled while respecting, to the greatest extent, the perspective relation and spatial information of the image to be filled, which greatly improves the subjective impression of the result. However, the homography matrix H has only 8 degrees of freedom: in two-dimensional space, 4 pairs of corresponding feature points already suffice to determine an H that satisfies the mathematical conditions. Moreover, because the camera projects a three-dimensional scene down to a two-dimensional image, some distortion is objectively unavoidable. Consequently, even though the number of screened local feature points is far above 4, and even though random sampling yields matching pairs that satisfy the locally optimal transformation, the homography can never map all feature points perfectly onto the new plane: near the edge of the filling area, the transformed edge key points fail to coincide exactly with the edge key points of the original image. Edge key points here are the connection points of originally complete content that is truncated by the region to be filled or the filling region (e.g., the break points produced where a tree is cut off in the image).
The break point generated by the tree being cut off in the image is shown in fig. 24.
To alleviate the imperfect coincidence of the edge key points, the algorithm fills the edge of the region Sd a second time using a single-frame filling method, replacing the originally partially discontinuous filling edge. At this stage most of the pixel-value information outside and inside the filled area is already known, so single-frame filling can reason from real pixel information at different angles inside and outside, achieving a far better result than filling a large unknown area from a single frame alone.
The single-frame filling algorithm consists of three parts: priority calculation, matching-block search, and copy filling. First, the image is divided into a known region and a region to be filled, and the filling priority of the edge pixels is calculated along the boundary to be filled.
As shown in fig. 25, the whole image area is I, the area to be filled is Ω, and the boundary of the fill area is $\partial\Omega$. For the block $\psi_p$ centered on the target-region boundary point p, the priority is calculated as:

$$P(p) = C(p) \cdot D(p)$$

where P(p) is the priority value at point p, C(p) is the confidence term, and D(p) is the data term, defined as:

$$C(p) = \frac{\sum_{q \in \psi_p \cap (I - \Omega)} C(q)}{|\psi_p|}, \qquad D(p) = \frac{|\nabla I_p^{\perp} \cdot n_p|}{\alpha}$$

where $|\psi_p|$ is the area of block $\psi_p$, q is another point entering the priority calculation with point p, α is an image normalization factor (generally 255 for a uint8-format image), $n_p$ is the unit normal vector at point p on the fill-region boundary $\partial\Omega$, and $\nabla I_p^{\perp}$ is the isophote (isolux line) at point p. The initialization values are:

C(p) = 0, p ∈ Ω
C(p) = 1, p ∈ I − Ω
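The priority computation can be sketched as follows, assuming a confidence map initialized as above, image gradients, and unit normals along the fill border; all function and parameter names are illustrative, and patches are assumed to lie away from the image border:

```python
import numpy as np

def priorities(conf, grad_x, grad_y, normals, border_pts, half=4, alpha=255.0):
    # P(p) = C(p) * D(p) along the current fill border: C(p) averages the
    # confidence map over the patch psi_p (the map is 0 inside Omega, so
    # summing the whole patch realizes the formula); D(p) projects the
    # isophote (gradient rotated 90 degrees) onto the unit normal n_p.
    out = []
    for (y, x), (nx_, ny_) in zip(border_pts, normals):   # normals as (nx, ny)
        patch = conf[y - half:y + half + 1, x - half:x + half + 1]
        c = patch.sum() / patch.size                   # confidence term C(p)
        iso = np.array([-grad_y[y, x], grad_x[y, x]])  # isophote (-Iy, Ix)
        d = abs(iso @ np.array([nx_, ny_])) / alpha    # data term D(p)
        out.append(c * d)
    return np.array(out)                               # one priority per point
```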
then, the block ψ is formed with the set block size centering on the boundary point p at which the current priority is the highestp(see FIG. 26(a)), and then find similar blocks in the known region based on the matching criteria, such as block ψ centered at q 'and q' in FIG. 26(b)q′And block psiq″. Separately calculating the block psiq′And block psiq″And psipMean square error E ', E'. It was found experimentally that when the difference between E' and E "is less than 20%, taking into account the effect of noise in the image, no matter whether block ψ is usedq′And block psiq″Filling, no significant difference from the macroscopic point of view of the image normally viewed by the human eye. When the difference between E 'and E' is more than 20%, the block with the minimum mean square error is selected for filling, so that a more excellent filling effect can be presented; however, when the difference between E 'and E' is less than 20%, and the macroscopic impression difference of human eyes is not large, because of the continuity of the change between adjacent pixels of the image, the space Euclidean distance discrimination is introduced, and the distance and the block psi are selectedpThe pixel blocks with the closest plane distance are filled, so that the continuous change property of the pixels is more consistent in microscopic details, and the image gradient increase caused by filling is avoided as much as possible (see fig. 26 (c)).
The edges of the reserved fill area filled by the single-frame filling algorithm are shown in fig. 27; the result without single-frame edge filling is shown in fig. 28 and the result with it in fig. 29; a comparison of multi-frame filling with combined single- and multi-frame filling is shown in fig. 30.
4. Poisson fusion
With this image filling algorithm combining single-frame and multi-frame methods — different filling means for different areas, scenes, and conditions — the image to be filled already obtains a good filling result. However, the human eye is very sensitive to color gradients, and under the influence of shooting time, angle, and weather, the overall tones of the pixels inside and outside the fill are difficult to unify, so that subjectively the filled part involuntarily separates from the rest of the image, producing a strong sense of discontinuity. Therefore, a Poisson fusion method is used to eliminate the color gradient at the edge seams of each filled area and to unify the overall tone across regions, greatly improving the concealment and credibility of the image filling.
The result obtained without Poisson fusion is shown in fig. 31; the problem of filling the image g into the background S addressed by the present invention can be simplified to fig. 32.
Here $S \subset \mathbb{R}^2$ is the definition domain of the image; Ω is a closed subset of S with boundary $\partial\Omega$; $f^*$ is a known scalar function defined on the boundary of and outside the Ω domain; f is an unknown scalar function defined on the interior of the Ω domain; and v is the vector field defined over the Ω domain. That is, the data we know comprise: the gradient field v of the region Ω, and the pixel values on the boundary $\partial\Omega$ of the filled-in region.
We want to fill in a way that satisfies the following two conditions:
1. the fill image g is kept undistorted, i.e., the gradient of the filled content is as close as possible to v;
2. the transition is seamless, i.e., the boundary pixel values of the filled content coincide with the existing S.
Satisfying the above two requirements leads to the mathematical expression:

$$\min_{f} \iint_{\Omega} \lvert \nabla f - v \rvert^2 \quad \text{with} \quad f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}$$

where $\nabla$ is the gradient operator and the vector field v is the guidance field; the minimized integrand $\lvert \nabla f - v \rvert^2$ corresponds to condition 1 above, and the boundary constraint $f|_{\partial\Omega} = f^*|_{\partial\Omega}$ to condition 2. The solution of this problem is the unique solution of the Poisson partial differential equation under the given Dirichlet boundary condition:
$$\Delta f = \operatorname{div} v \ \text{ over } \Omega, \quad \text{with} \quad f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}$$

where $\operatorname{div} v = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}$ denotes the divergence of v = (u, v), and $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplacian, i.e., the second-order gradient. The boundary condition at this time is called a Dirichlet boundary.
The Poisson equation with Dirichlet boundary conditions is typically solved step by step with an iterative method: starting from the pixel values on the boundary $\partial\Omega$, the filled area is corrected progressively, ring by ring, following the gradient direction of the guidance field v, so as to reach a visual effect that harmonizes with the background. For an ordinary three-channel color image, the Poisson equation must be solved independently on each of the three channels.
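The iterative solution can be sketched as a Jacobi relaxation of the discrete Poisson equation. In this sketch, div_v would typically be the divergence of the guidance field (for v = ∇g, the Laplacian of the source image g), the iteration count is illustrative, and the mask is assumed not to touch the image border. In practice, OpenCV's cv2.seamlessClone implements the same Poisson image editing and can be used directly.

```python
import numpy as np

def poisson_blend_channel(f_star, mask, div_v, iters=2000):
    # Jacobi relaxation of the discrete Poisson equation: the sum of the
    # four neighbors minus 4*f equals div v inside Omega, while pixels
    # outside Omega stay fixed at f_star (Dirichlet boundary).
    f = f_star.astype(np.float64).copy()
    inside = mask > 0                                 # Omega, away from border
    for _ in range(iters):
        nb = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
              np.roll(f, 1, 1) + np.roll(f, -1, 1))   # 4-neighbor sum
        f_new = (nb - div_v) / 4.0
        f[inside] = f_new[inside]                     # update Omega only
    return f

# For an RGB image, run the solver once per channel; alternatively,
# cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE) performs
# the same Poisson editing in one call.
```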
The image before Poisson fusion is shown in fig. 33, the image after Poisson fusion in fig. 34, and a before/after comparison in fig. 35. As a further example, fig. 36 shows the image before filling, fig. 37 the result before Poisson fusion, and fig. 38 the result after Poisson fusion.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in the form of a computer program product, the product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image content removing method, characterized in that feature points are extracted in the fuzzy-range areas of two images; the locally optimal homography transformation relation from the corresponding area in graph A to the area to be filled in graph S is approximately obtained from the positional correspondence of the matched feature point pairs; the required area is then cut from the transformed corresponding area of graph A and filled into the area to be filled; and finally problems such as texture truncation and color difference at the filling edge are optimized using single-frame filling and a Poisson fusion algorithm to obtain a complete removal result.
2. The image content removal method according to claim 1, characterized by comprising the steps of:
step one, for a target image containing a region to be filled and a filling image used for filling, putting the determined filling image into the region to be filled by using a local optimal homography conversion algorithm for corresponding filling;
step two, performing self-adaptive filling by using a single-frame filling algorithm aiming at the filled edge area;
and thirdly, eliminating the color gradient difference between the filling area and the original image by using a Poisson fusion method.
3. The image content removing method according to claim 2, wherein in the first step, for the target image containing the region to be filled and the filling image used for filling, the determined filling image is placed in the region to be filled by using a locally optimal homography transformation algorithm for corresponding filling, and the method comprises:
(1) extracting feature points;
(2) adding respective description vectors for the feature points and searching for local optimal matching point pairs in the two graphs;
(3) and solving the local optimal homography transformation matrix according to the two-dimensional coordinate information of the matched characteristic point pair.
4. The image content removing method according to claim 3, wherein in the step (1), the extracting the feature points includes:
regarding the image S to be filled, the information of the area Sd to be filled is treated as unknown, and the most important requirement is continuity between Ad, once filled into Sd, and the surrounding environment information; based on geometric characteristics, the region Sd to be filled is expanded outwards by 20%, without exceeding the image boundary, and taken as the rough sampling range of feature points, called Sd_out;
for the filling diagram A, because homography transformation is not carried out yet, the positions of the same objects in the diagrams S and A do not necessarily correspond to the completely same coordinates, but simultaneously, because the time interval between adjacent frames is short, the filling area appears near the coordinates of the area to be filled; the image A is subjected to fuzzy sampling, namely, the sampling range is expanded to include 50% of the area which is the same as Sd in the image A and the expansion of the area, and the area is called Ad _ out;
matching and searching are carried out by detecting an angular point inside (Sd _ out-Sd) and a local characteristic point in Ad _ out, so that the transformation relation of two image areas is calculated through the matched characteristic point pairs, and a Harris angular point detection matching method is used;
the idea of the Harris corner detection matching algorithm is to slide a fixed window over the image in any direction and compare the degree of gray-level change of the pixels in the window before and after sliding; if sliding in any direction produces a large gray-level change, a corner exists in the window;
when the window moves in a plane [ u, v ], the gray level change of the pixel points in the window corresponding to the window before and after sliding is described as follows:
$$E(u,v) = \sum_{(x,y)} w(x,y)\,[I(x+u,\,y+v) - I(x,\,y)]^2$$
wherein [u, v] is the two-dimensional offset of the window; E(u, v) is the gray-level difference between this offset position and the position before sliding; (x, y) ranges over the two-dimensional pixel coordinates inside the window; w(x, y) is the window function, i.e., the weight coefficient of the pixel value at each position in the window during the calculation; I is the intensity of the pixel at the given position, corresponding to its gray value in the image;
according to the expression, when the window slides over a flat area the gray level does not change and E(u, v) = 0; if the window slides over a texture-rich region, the gray-level change is large; the aim of the algorithm is to find the positions where the gray level changes greatly for sliding in any direction, not merely in a particular direction;
Taylor expansion of the above equation gives:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} \left( \sum_{(x,y)} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}$$
wherein $I_x$ is the partial derivative of the pixel intensity I in the x-direction and $I_y$ is the partial derivative in the y-direction;
let the matrix M be:

$$M = \sum_{(x,y)} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
obtaining M as a covariance matrix according to the definition;
three scenes — a flat area, a straight edge, and a corner — are tested respectively to obtain gradient distribution diagrams; from the eigenvalues and eigenvectors of the covariance matrix, the three scenes are characterized by eigenvalue attributes: when both eigenvalues are large, the window contains a corner; when one eigenvalue is large and the other small, the window contains an edge; when both eigenvalues are small, the window lies in a flat area;
from these properties, construct R as:

$$R = \det M - k\,(\operatorname{trace} M)^2$$

where $\det M = \lambda_1 \lambda_2$, $\operatorname{trace} M = \lambda_1 + \lambda_2$, $\lambda_1, \lambda_2$ are the two eigenvalues of the covariance matrix M, and k is a real coefficient.
The properties of the function R correspond exactly to those of the covariance matrix M: when R is large, both eigenvalues are large, i.e., the window contains a corner; when R < 0, one eigenvalue is much larger than the other and the window contains an edge; when |R| is small, both eigenvalues are small and the window lies in a flat area;
calculating the R value of each position by setting the movement of a sliding window on a region (Sd _ out-Sd) and Ad _ out, and recording the window position with the R value larger than a threshold value, namely a characteristic point;
taking on average one feature point per 15×15-pixel cell (i.e., roughly 100 feature points in a 150×150 area) into the matching stage to ensure that corresponding feature point pairs are matched, and preferentially selecting a specific number of key points using an adaptive non-maximum suppression method;
the idea of adaptive non-maximum suppression is to form a set $\mathcal{I}$ from the feature points extracted in the previous step; for each point $x_i$, taking it as the circle center, find the distance $r_i$ to the nearest feature point strong enough to suppress it — the minimum circle radius satisfying the constraint below, such that the R values of all other feature points strictly inside the circle are less than that of $x_i$ — and add it to the queue; traverse all points in the set to obtain the minimum radius $r_i$ satisfying the constraint for each, sort them in descending order, and select the first n corresponding points as the key point set obtained after adaptive non-maximum suppression;
expressed in mathematical language:

$$r_i = \min_{j} \|x_i - x_j\| \quad \text{s.t.} \quad f(x_i) < c \cdot f(x_j),\; x_j \in \mathcal{I}$$

where $r_i$ is the minimum circle radius satisfying the constraint, $x_i, x_j$ are the two-dimensional coordinates of the key points obtained in the previous step, $f(x_i)$ is the R value at point $x_i$, $\mathcal{I}$ is the set of all key points, and c = 0.9;
in actual calculation, the number of feature points to be screened is calculated from the pixel area of the region; since subsequent calculation requires the same number of feature points in (Sd_out − Sd) and in Ad_out, the number of adaptively acquired feature points, set as n, is calculated with reference to the region Ad_out, which has the larger pixel area; secondly, the key point with the maximum R value, Rmax, is found and added to the queue, and the value Rmax·0.9 is obtained; all key points in the area are traversed: if a key point $x_i$ has $R_i$ > Rmax·0.9, its radius is set to infinity; if $R_i$ < Rmax·0.9, the distance $r_i$ from this point to the nearest point whose R value exceeds $R_i/0.9$ is calculated and recorded; finally, all radii are sorted and the n points with the largest $r_i$ are found;
in the step (2), adding respective description vectors to the feature points and searching for matching point pairs in the two graphs includes:
after n feature points are found in the region (Sd _ out-Sd) and Ad _ out respectively, describing the features of each point, wherein the features are used for finding the corresponding relation of the feature points between the graph A and the graph S, and constructing a descriptor by using the two-dimensional position relation between the key points;
a moderate Gaussian blur is applied to the grayscale image; a region of 40×40 pixels is taken centered on each feature point, i.e., the 40×40-pixel area around each feature point serves as its description area, so that each description area contains on average 3-4 feature points including the central one; the region is downsampled to 8×8 and flattened into a 64-dimensional vector; the vectors are normalized, i.e., each feature point generates a 64-dimensional vector as its descriptor, so that an n×64 feature matrix is obtained for each of the two registration areas;
after descriptors for all feature points are available, the matched feature points in the two images are screened out as feature point pairs, and feature point sets with low correlation are filtered out; the variance of the descriptor difference is calculated between every pair of the n feature points, the candidates are sorted from small to large by the ratio of the minimum variance to the second-minimum variance, and the point pairs below a threshold of 0.5 are taken as the screened matching points;
in step (3), the solving of the homography transformation matrix according to the two-dimensional coordinate information of the matching feature point pair includes:
after finding out the point pair with the highest matching confidence coefficient, solving a homography transformation matrix of the point pair according to the two-dimensional coordinate matching relation between the point pairs, transforming the Ad _ out of the graph A area and then filling the required part into the area Sd to be filled of the graph S;
let the point $(x_1, y_1)$ in graph A match the point $(x_2, y_2)$ in graph S, i.e., they have the correspondence:

$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H_{3\times3} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

where $H_{3\times3}$ is the homography matrix, the key variable for solving the transformation relation;
each set of matching points $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ satisfies:

$$\begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} \sim \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$
where the $h$ entries are the homography coefficients; from the correspondence between plane coordinates and homogeneous coordinates:

$$x'_i = \frac{h_{11}x_i + h_{12}y_i + h_{13}}{h_{31}x_i + h_{32}y_i + h_{33}}, \qquad y'_i = \frac{h_{21}x_i + h_{22}y_i + h_{23}}{h_{31}x_i + h_{32}y_i + h_{33}}$$
further transformation gives:

$$(h_{31}x_i + h_{32}y_i + h_{33})\,x'_i = h_{11}x_i + h_{12}y_i + h_{13}$$
$$(h_{31}x_i + h_{32}y_i + h_{33})\,y'_i = h_{21}x_i + h_{22}y_i + h_{23}$$

written in matrix form:

$$\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x_i x'_i & -y_i x'_i & -x'_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -x_i y'_i & -y_i y'_i & -y'_i \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{12} \\ \vdots \\ h_{33} \end{bmatrix} = 0$$
i.e., each set of matching points $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ yields 2 sets of equations;
since the homography matrix H is exactly equivalent to aH, where a ≠ 0:

$$(aH)\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = a\left(H\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}\right) \sim H\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$

i.e., whether the point $(x_i, y_i)$ is mapped via H or via aH, the result is $(x'_i, y'_i)$;
if $a = 1/h_{33}$ is taken, then:

$$aH = \begin{bmatrix} h_{11}/h_{33} & h_{12}/h_{33} & h_{13}/h_{33} \\ h_{21}/h_{33} & h_{22}/h_{33} & h_{23}/h_{33} \\ h_{31}/h_{33} & h_{32}/h_{33} & 1 \end{bmatrix}$$
although the homography matrix H has 9 unknowns, the constraint $h_{33} = 1$ is generally added during the solution, leaving $h_{11} \ldots h_{32}$, 8 unknowns in total, i.e., only 8 degrees of freedom; since each set of matching points $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ corresponds to 2 sets of equations, the unique solution of H can be obtained from only n = 4 sets of non-collinear matching points;
if the number of matched feature point pairs screened in the previous step is less than 4, the procedure returns to the non-maximum suppression step of the feature points, appropriately reducing the degree of local suppression to increase the number of matched pairs, until more than 4 matched feature point pairs are obtained; in actual situations the number of screened matching key point pairs is in most cases far more than the 4 required, and a random sampling algorithm is used to obtain the 4 pairs with the minimum matching error; taking one image as reference, 4 points are randomly selected from the area (Sd_out − Sd) each time and the 4 paired points found in the area Ad_out; a homography transformation matrix is calculated from the 4 pairs of points according to the above method, the remaining feature points in (Sd_out − Sd) are transformed and projected into the area Ad_out according to it, and the number coinciding with the original feature points of the area Ad_out is counted; this step is repeated 2000 times, and the homography transformation matrix with the most accurate pairings is selected as the final transformation matrix; at this point, the locally optimal projective transformation relation of the two images has been found.
5. The image content removing method according to claim 2, wherein in step two, the adaptively padding for the padded edge region by using a single frame padding algorithm comprises:
the filling region is transformed to a region to be filled by local optimal homography transformation, and the perspective relation and the spatial information of the image to be filled are met to the maximum extent; carrying out secondary filling on the edge of the area Sd to be filled by using a single-frame filling method, wherein the secondary filling is used for replacing the original partially discontinuous filling edge;
the single frame filling algorithm comprises three parts of priority calculation, matching block search and copy filling;
firstly, dividing an image into a known region and a region to be filled, and calculating the filling priority of edge pixel points along edges from the boundary to be filled;
suppose the whole image area is I, the area to be filled is Ω, and the boundary of the fill area is $\partial\Omega$; the priority calculation formula for the block $\psi_p$ centered on the target-area boundary point p is:

P(p) = C(p) * D(p);
where C(p) is the confidence term and D(p) is the data term, defined as:

$$C(p) = \frac{\sum_{q \in \psi_p \cap (I - \Omega)} C(q)}{|\psi_p|}, \qquad D(p) = \frac{|\nabla I_p^{\perp} \cdot n_p|}{\alpha}$$

wherein $|\psi_p|$ is the area of block $\psi_p$; α is the image normalization factor, 255 for a uint8-format image; $n_p$ is the unit normal vector at point p on the fill-region boundary $\partial\Omega$; and $\nabla I_p^{\perp}$ is the isophote (isolux line) at point p;
the initialization values are:
C(p)=0,p∈Ω;
C(p)=1,p∈I-Ω;
the block ψ is formed with a set block size centering on the boundary point p of the highest current priorityp(ii) a Finding similar blocks in a known area according to a matching criterion, the block phi centered at q' and q ″q′And block psiq″(ii) a Separately calculating the block psiq′And block psiq″And psipThe mean square error of E ', E'; when the difference between E 'and E' is more than 20%, selecting the block with the minimum mean square error for filling; however, when the difference between E 'and E' is less than 20%, and the macroscopic impression difference of human eyes is not large, because of the continuity of the change between adjacent pixels of the image, the space Euclidean distance discrimination is introduced, and the distance and the block psi are selectedpThe pixel block with the closest plane distance is filled.
6. The method for removing image content according to claim 2, wherein in step three, the method for removing the color gradient difference between the filled region and the original image by using poisson fusion comprises:
using the Poisson fusion method, the color gradient at the edge seams of each filled area is eliminated and the overall tone of the different areas unified, filling the image g into the background S, where $S \subset \mathbb{R}^2$ is the definition domain of the image; Ω is a closed subset of S with boundary $\partial\Omega$; $f^*$ is a known scalar function defined on the boundary of and outside the Ω domain; f is an unknown scalar function defined on the interior of the Ω domain; v is the vector field defined over the Ω domain; that is, the known data include: the gradient field v of the region Ω, and the pixel values on the boundary $\partial\Omega$ of the filled-in region;
After filling, the following two conditions are met:
(1) keeping the fill image g undistorted, i.e. the gradient of the fill content is as close as possible to v;
(2) seamless transition is required, namely the boundary pixel value of the filling content is consistent with the existing S;
satisfying these two requirements yields the following mathematical expression:

$$\min_{f} \iint_{\Omega} \lvert \nabla f - v \rvert^2 \quad \text{with} \quad f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}$$

where $\nabla$ is the gradient operator and the vector field v is the guidance field; the minimized term and the boundary constraint correspond to conditions (1) and (2) respectively; the solution of this problem is the unique solution of the Poisson partial differential equation under the given Dirichlet boundary condition:

$$\Delta f = \operatorname{div} v \ \text{ over } \Omega, \quad \text{with} \quad f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}$$
where $\operatorname{div} v = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}$ represents the divergence of v = (u, v), and $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplacian, i.e., the second-order gradient; the boundary condition at this time is called a Dirichlet boundary;
the solution to solve the poisson equation with dirichlet boundary conditions can be solved step by step using an iterative method: the resolved pixel value is bounded by
Figure FDA0003398699030000095
Is taken as a starting point, follows the gradient direction of the guide vector field v, and is gradually repaired once and for a circleFilling the area to achieve a more harmonious visual effect with the background; for a common three-channel color image, the poisson equation needs to be solved independently on three channels.
7. An image content removal system for implementing the image content removal method according to any one of claims 1 to 6, the image content removal system comprising:
the image filling module is used for putting the determined filling image into the area to be filled by using a local optimal homography conversion algorithm for the target image containing the area to be filled and the filling image for filling;
the self-adaptive filling module is used for performing self-adaptive filling by using a single-frame filling algorithm aiming at the filled edge area;
and the color gradient difference elimination module is used for eliminating the color gradient difference between the filling area and the original image by using a Poisson fusion method.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
firstly, extracting characteristic points in fuzzy range areas in two images, approximately obtaining a local homography transformation relation from a corresponding area in a graph A to an area to be filled in a graph S according to the position corresponding relation of corresponding characteristic point pairs, namely, transforming objects in the same plane shot by a camera from different spatial positions into the same angle, and finally intercepting required areas from the corresponding area of the transformed graph A and filling the required areas into the area to be filled.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
firstly, extracting characteristic points in fuzzy range areas in two images, approximately obtaining a local homography transformation relation from a corresponding area in a graph A to an area to be filled in a graph S according to the position corresponding relation of corresponding characteristic point pairs, namely, transforming objects in the same plane shot by a camera from different spatial positions into the same angle, and finally intercepting required areas from the corresponding area of the transformed graph A and filling the required areas into the area to be filled.
10. An information data processing terminal characterized by being configured to implement the image content removal system according to claim 7.
CN202111512616.3A 2021-12-08 2021-12-08 Image content removing method, system, medium, device and data processing terminal Active CN114399423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111512616.3A CN114399423B (en) 2021-12-08 2021-12-08 Image content removing method, system, medium, device and data processing terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111512616.3A CN114399423B (en) 2021-12-08 2021-12-08 Image content removing method, system, medium, device and data processing terminal

Publications (2)

Publication Number Publication Date
CN114399423A true CN114399423A (en) 2022-04-26
CN114399423B CN114399423B (en) 2024-03-19

Family

ID=81227143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111512616.3A Active CN114399423B (en) 2021-12-08 2021-12-08 Image content removing method, system, medium, device and data processing terminal

Country Status (1)

Country Link
CN (1) CN114399423B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130208997A1 (en) * 2010-11-02 2013-08-15 Zte Corporation Method and Apparatus for Combining Panoramic Image
US20150097827A1 (en) * 2013-10-09 2015-04-09 Adobe Systems Incorporated Target Region Fill Utilizing Transformations
CN110246100A (en) * 2019-06-11 2019-09-17 山东师范大学 A kind of image repair method and system based on angle perception Block- matching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Hongying; Hu Zheng: "Moving object detection combining CenSurE features and spatio-temporal information", Optics and Precision Engineering, no. 09

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116000484A (en) * 2023-03-28 2023-04-25 湖南视比特机器人有限公司 Workpiece secondary positioning method, positioning device, workpiece groove cutting method and device

Also Published As

Publication number Publication date
CN114399423B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant