CN107992874B - Image salient target region extraction method and system based on iterative sparse representation - Google Patents

Image salient target region extraction method and system based on iterative sparse representation

Info

Publication number
CN107992874B
CN107992874B (application CN201711387624.3A)
Authority
CN
China
Prior art keywords
pixel
sal
image
significance
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711387624.3A
Other languages
Chinese (zh)
Other versions
CN107992874A (en)
Inventor
张永军
王祥
谢勋伟
李彦胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201711387624.3A priority Critical patent/CN107992874B/en
Publication of CN107992874A publication Critical patent/CN107992874A/en
Application granted granted Critical
Publication of CN107992874B publication Critical patent/CN107992874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for extracting a salient target region of an image based on iterative sparse representation. The method first performs superpixel segmentation on the original image with a SLIC (simple linear iterative clustering) method under several different superpixel-number parameters, generating a group of segmented images with superpixel regions of different sizes. Then, for the segmentation result at each scale, a classical visual attention detection result is taken as the initial saliency map to constrain the selection of foreground and background sample regions; the reconstruction residual of each superpixel region is further calculated as a saliency factor through a sparse representation process, the single-scale saliency detection result map is optimized by recursive iterative operation, and the final salient target detection result is obtained through multi-scale saliency map fusion. The method can effectively remedy the defects of traditional methods, such as inconsistent saliency evaluation inside a single target, difficulty in detecting salient targets near the image edge, and incomplete extraction of multiple salient targets.

Description

Image salient target region extraction method and system based on iterative sparse representation
Technical Field
The invention belongs to the field of computer vision and image processing, and relates to an image salient target region extraction technology based on iterative sparse representation.
Background
Image visual saliency analysis is an important basic research topic in computer vision, psychology, neuroscience and related fields; it is a computational counterpart of the biological ability of the human eye to quickly and accurately capture, from a scene, the target regions that draw visual attention. Through image saliency analysis, the target regions people are interested in can be effectively extracted, data compression can be achieved, and efficient management and utilization of data can be completed; it is also a basic link in many image processing problems.
Automatic saliency analysis of images by computer was first realized in 1998, and as its application prospects have been continuously explored, new automatic salient object detection algorithms have emerged one after another. From the point of view of the solution strategy, existing salient object extraction algorithms can be roughly divided into two categories: data-driven bottom-up detection methods and task-driven top-down detection methods. The former automatically processes and identifies an input image according to empirical cognition to realize traditional cognitive saliency analysis, and is usually an unsupervised automatic extraction algorithm; the latter performs targeted analysis of the image in combination with an actual target task, extracting target objects that meet specific application requirements, and is usually a recognition algorithm under supervised learning. On the other hand, from the perspective of the form of the extracted result, existing methods can be further divided into visual-attention-based saliency analysis algorithms, which generate a pixel-level saliency prediction map, and salient object extraction methods, which take the extracted complete salient target region as the final goal.
In bottom-up unsupervised approaches, due to the lack of high-level biological cognitive information, certain hypothetical constraints are usually introduced to complete the detection task. Empirical analysis shows that objects distributed near the middle of the image are more attractive to visual attention, whereas the saliency of areas near the image edges is generally lower; meanwhile, local areas with high contrast also show higher visual saliency. Saliency detection methods combining image center/boundary constraints and contrast analysis have therefore developed rapidly and shown outstanding detection performance. However, as research and application deepen, the dependence of these methods on hypothetical conditions becomes more and more prominent: 1) when a salient object is close to the image edge, it usually cannot be detected correctly; 2) with methods based on local contrast analysis, the extracted salient target region is incomplete and the saliency evaluation inside the target is not uniform; 3) methods based on global contrast analysis often fail when multiple salient objects are present at the same time. Therefore, how to overcome these defects, weaken the dependence on hypothetical constraints in the absence of high-level cognitive information, improve the uniformity and integrity of salient target extraction, and strengthen the adaptability of the algorithm remains a technical problem that needs further research.
Disclosure of Invention
The invention aims to provide a technical scheme for consistently extracting image salient target regions under a natural background, which can make full use of the comprehensive differences between foreground and background in the image, integrate the inherent relations among salient targets, weaken the dependence of the saliency analysis process on traditional hypothetical constraints, realize consistent extraction of multiple salient targets, and ensure both the internal integrity of a single salient target and the completeness of multiple salient targets.
In order to achieve the above object, the technical solution provided by the present invention is a method for extracting a salient object region of an image based on iterative sparse representation, comprising the following steps:
step 1, preprocessing data: setting different SLIC superpixel numbers, carrying out multi-scale superpixel segmentation on the original image, using saliency detection based on classical visual attention, and setting the detection result as the initial saliency map SAL_0;
Step 2, extracting the salient features of the pixel-level original image, wherein the salient features comprise color features, position features and gradient features;
step 3, calculating the average value of all original pixel features in the superpixel region of each scale to obtain the superpixel region features under single-scale segmentation;
step 4, aiming at the segmentation result of a single scale, calculating a saliency map through recursive sparse representation, and comprising the following substeps:
step 4.1, superpixel initial saliency calculation: according to SAL_0, solving the initial significance level of each superpixel through mean value calculation;
step 4.2, extracting foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f;
step 4.3, extracting background samples: arranging the initial significance levels of the superpixels in ascending order, and taking the top p2% of superpixels as the alternative background sample D_b1; extracting the superpixels touching the image boundary as the alternative background sample D_b2; the background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
step 4.4, performing double sparse representation and sparse residual calculation, wherein the foreground sample and the background sample are respectively used as dictionaries to perform sparse representation on all the superpixels, and a reconstructed residual is calculated, wherein the formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
step 4.5, calculating the significance factor: fusing ε_bi and ε_fi according to formula (6), assigning the fusion result of each superpixel to all original image pixels within that superpixel, and calculating the significance factor map SAL_i;
SAL_i = ε_bi / (ε_fi + σ^2) (6)
wherein σ^2 is a non-negative tuning parameter;
step 4.6, recursive processing: calculating the correlation coefficient rela between the significance factor map SAL_i and the initial saliency map SAL_0 according to formula (7); if rela < K, let SAL_0 = SAL_i and the whole process of step 4 is repeatedly executed; if rela > K, the recursion is ended and the current SAL_i is output as the saliency detection result at this scale; wherein K is a similarity determination threshold,
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
and 5, fusing multi-scale significance detection results, performing equal-weight linear combination on significance results under each single scale, and calculating a final significance detection result.
Further, in step 2, the salient features comprise RGB, Lab, x, y, and first-order and second-order gradients, 13 dimensions in total, expressed as {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}, wherein R, G, B, L, a, b constitute the six-dimensional color feature, R, G, B and L, a, b being the RGB and Lab color information respectively; x and y are the position feature, i.e. the row-column coordinates of the pixel in the image; f_x, f_y, f_xx, f_yy, f_xy are the gradient feature, representing the first-order and second-order differences of the pixel in the X and Y directions respectively, and their calculation formula is as follows:
f_x = (f(i+1,j) - f(i-1,j))/2
f_y = (f(i,j+1) - f(i,j-1))/2
f_xx = (f_x(i+1,j) - f_x(i-1,j))/2 (8)
f_yy = (f_y(i,j+1) - f_y(i,j-1))/2
f_xy = (f_x(i,j+1) - f_x(i,j-1))/2
where f (i, j) is the image matrix and i, j is the image pixel row column number.
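For illustration only (not part of the claimed method), the 13-dimensional pixel feature and the differences of formula (8) could be computed as in the following Python sketch; it assumes an 8-bit RGB input, computes the gradients on the grayscale image (the patent does not specify the channel), and uses scikit-image for the Lab conversion.

```python
import numpy as np
from skimage import color

def pixel_features(rgb):
    """Assemble the 13-D per-pixel feature {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}.

    np.gradient reproduces the central differences of formula (8) at interior pixels;
    computing them on the grayscale image is an assumption.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    lab = color.rgb2lab(rgb / 255.0)            # L, a, b channels
    gray = color.rgb2gray(rgb / 255.0)          # f(i, j) used for the gradients
    h, w = gray.shape
    ii, jj = np.mgrid[0:h, 0:w]                 # row-column coordinates (x, y)
    fx = np.gradient(gray, axis=0)              # first-order differences
    fy = np.gradient(gray, axis=1)
    fxx = np.gradient(fx, axis=0)               # second-order differences
    fyy = np.gradient(fy, axis=1)
    fxy = np.gradient(fx, axis=1)
    feats = np.dstack([rgb, lab, ii, jj, fx, fy, fxx, fyy, fxy])
    return feats.reshape(-1, 13)                # one 13-D row per original pixel
```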
The invention also correspondingly provides an image salient target region extraction system based on iterative sparse representation, which comprises the following modules,
a preprocessing module for preprocessing data: setting different SLIC superpixel numbers, performing multi-scale superpixel segmentation on the original image, using saliency detection based on classical visual attention, and setting the detection result as the initial saliency map SAL_0;
The salient feature extraction module is used for extracting salient features of the pixel-level original image, wherein the salient features comprise color features, position features and gradient features;
the super-pixel region feature acquisition module is used for calculating the mean value of all original pixel features in the super-pixel region of each scale to obtain the super-pixel region features under single-scale segmentation;
the sparse representation module is used for calculating the saliency map through recursive sparse representation aiming at the segmentation result of a single scale, and comprises the following sub-modules:
a first sub-module for superpixel initial saliency calculation: according to SAL_0, solving the initial significance level of each superpixel through mean value calculation;
a second sub-module for extracting foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f;
a third sub-module for extracting background samples: arranging the initial significance levels of the superpixels in ascending order, taking the top p2% of superpixels as the alternative background sample D_b1, and extracting the superpixels touching the image boundary as the alternative background sample D_b2; the background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
the fourth submodule is used for dual sparse representation and sparse residual calculation, all the superpixels are sparsely represented and reconstructed residual is calculated by taking the foreground sample and the background sample as dictionaries, and the formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
a fifth sub-module for calculating the significance factor: fusing ε_bi and ε_fi according to formula (6), assigning the fusion result of each superpixel to all original image pixels within that superpixel, and calculating the significance factor map SAL_i;
SAL_i = ε_bi / (ε_fi + σ^2) (6)
wherein σ^2 is a non-negative tuning parameter;
a sixth sub-module for recursive processing: calculating the correlation coefficient rela between the significance factor map SAL_i and the initial saliency map SAL_0 according to formula (7); if rela < K, SAL_0 = SAL_i is set and the whole process of the sparse representation module is repeatedly executed; if rela > K, the recursion is ended and the current SAL_i is output as the saliency detection result at this scale; wherein K is a similarity determination threshold,
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
and the detection result fusion module is used for fusing multi-scale significance detection results, performing equal-weight linear combination on significance results under each single scale, and calculating a final significance detection result.
Further, the salient features in the salient feature extraction module comprise RGB, Lab, x, y, and first-order and second-order gradients, 13 dimensions in total, expressed as {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}, wherein R, G, B, L, a, b constitute the six-dimensional color feature, R, G, B and L, a, b being the RGB and Lab color information respectively; x and y are the position feature, i.e. the row-column coordinates of the pixel in the image; f_x, f_y, f_xx, f_yy, f_xy are the gradient feature, representing the first-order and second-order differences of the pixel in the X and Y directions respectively, and their calculation formula is as follows:
f_x = (f(i+1,j) - f(i-1,j))/2
f_y = (f(i,j+1) - f(i,j-1))/2
f_xx = (f_x(i+1,j) - f_x(i-1,j))/2 (8)
f_yy = (f_y(i,j+1) - f_y(i,j-1))/2
f_xy = (f_x(i,j+1) - f_x(i,j-1))/2
where f (i, j) is the image matrix and i, j is the image pixel row column number.
The method of the invention firstly utilizes a SLIC segmentation method of a plurality of groups of different pixel number parameters to carry out superpixel segmentation on an original image, generates a group of segmented images with different superpixel areas and establishes multi-scale source data. And then, aiming at the segmentation result of each scale, taking the classical visual attention detection result as the initial saliency map to restrict the selection of foreground and background sample regions, further calculating the reconstructed residual error of each super-pixel region as a saliency factor through a sparse representation process, optimizing a saliency detection result map under a single scale by combining recursive iterative operation, and finally obtaining a final saliency target and a detection result through multi-scale saliency map fusion. The technical scheme of the invention has the following advantages:
1) the image is divided into multi-scale superpixel images through several groups of SLIC segmentations, so that, on the one hand, the SLIC method effectively preserves image contour information and keeps the interior of the same target region consistent during saliency detection; on the other hand, multi-scale segmentation gives the algorithm better adaptability and robustness for detecting targets of different sizes.
2) The pixel (region) significance is calculated through a dual sparse representation process based on a foreground dictionary and a background dictionary. On the one hand, the reconstruction residual is used as the significance level index and the visual saliency similarity between pixels is judged from a global perspective, which differs from traditional methods based on contrast and image boundary constraints and can effectively solve the problem of incomplete target detection; on the other hand, the dual sparse representation analyses the attributes of all pixels more comprehensively when judging their significance levels, which further improves the robustness of the algorithm.
3) The dependence of the algorithm on the initial saliency map generated based on the classical visual attention model can be weakened to a certain extent through the recursive optimization process, and the reliability of the algorithm is improved.
Drawings
Fig. 1 compares the salient object detection results of the method of the present invention with those of traditional methods in the embodiment of the present invention: (a) the input image; (b) the salient object ground truth; (c) the detection result of a traditional local-contrast-based method; (d) the detection result of a global-contrast-based method; (e) the detection result of an image-boundary-constraint method; and (f) the detection result of the method of the present invention.
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
The following describes a specific embodiment of the present invention with reference to the drawings and examples.
The invention provides an image salient target region extraction method based on iterative sparse representation. By carrying out saliency analysis on an image, the method extracts the target regions that attract the most human visual attention, which serves effective data optimization and data compression and is a basic link of many image processing problems. Research shows that traditional salient object extraction methods based on local contrast, global contrast and image boundary constraints generally depend strongly on the corresponding constraint conditions, and are prone to non-uniform saliency inside a single object, incomplete multi-target detection, and failure to extract salient objects near the image boundary, as shown in Fig. 1(c)-(e). The method of the invention calculates the pixel significance level using dual sparse representation and reconstruction residuals, optimizes the detection result through a recursive iterative process, and improves applicability by fusing multi-scale detection results. As shown in Fig. 1(f), the single-target saliency in the detection result of the method has better consistency, multi-target detection is more complete, and the missed detection of salient objects close to the image boundary in traditional methods is remedied to a certain extent; the embodiment fully demonstrates that the method has stronger detection performance than traditional general salient object extraction methods. As shown in Fig. 2, the specific implementation provided by this embodiment comprises the following steps:
and step 1, data acquisition. The saliency detection opens up the source data set raw image and the saliency target truth data is downloaded.
And 2, preprocessing data: carrying out multi-scale superpixel segmentation on the original image, and carrying out saliency detection with a classical visual attention model to generate the initial saliency map SAL_0.
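A minimal Python sketch of this preprocessing, assuming scikit-image's SLIC implementation, is given below; the superpixel numbers are illustrative, and the initial saliency map SAL_0 is assumed to come from any classical visual attention model computed separately.

```python
import numpy as np
from skimage.segmentation import slic

def multiscale_superpixels(image, n_segments_list=(100, 200, 300, 400)):
    """Run SLIC with several superpixel-number parameters (one label map per scale)."""
    return [slic(image, n_segments=n, compactness=10, start_label=0)
            for n in n_segments_list]

def superpixel_mean_saliency(sal0, labels):
    """Mean of the initial saliency map SAL_0 inside each superpixel (step 5.1)."""
    n_sp = labels.max() + 1
    return np.array([sal0[labels == k].mean() for k in range(n_sp)])
```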
And 3, extracting the 13-dimensional features of the original image pixels, namely {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}.
And 4, extracting the superpixel region features under single-scale segmentation through mean value calculation, namely F = {mR, mG, mB, mL, ma, mb, mx, my, mf_x, mf_y, mf_xx, mf_yy, mf_xy}, where F is the region feature vector and mX (X = R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy) is the mean of the attribute X over all original image pixels within the superpixel region.
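Assuming pixel_feats is the (H·W)×13 matrix of pixel features from step 3 and labels a single-scale SLIC label map, the region features could be computed as in the following illustrative sketch.

```python
import numpy as np

def superpixel_features(pixel_feats, labels):
    """Region feature F: per-superpixel mean of the 13-D pixel features."""
    lab = labels.ravel()
    n_sp = lab.max() + 1
    feats = np.zeros((n_sp, pixel_feats.shape[1]))
    for k in range(n_sp):
        feats[k] = pixel_feats[lab == k].mean(axis=0)   # mean over the pixels of superpixel k
    return feats
```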
Step 5, aiming at the segmentation result of the single scale, calculating a saliency map through recursive sparse representation, and comprising the following sub-steps of:
and 5.1, obtaining a super-pixel initial saliency map through mean value calculation.
And 5.2, extracting the superpixel regions with higher initial significance levels as foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f; in this embodiment p1 = 20, and those skilled in the art can select an appropriate value as required;
and 5.3, extracting a background sample by combining the initial saliency map and the image boundary constraint. The initial significance levels of the super-pixels are arranged in an ascending order, and the first p2% of the super-pixels are taken as alternative background samples Db1In this embodiment, p2 is 20, and those skilled in the art can select an appropriate value as required; extracting superpixels contacting the image boundary as alternative background samples Db2The background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
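A possible implementation of the sample selection of steps 5.2-5.3 and formula (1) is sketched below; the handling of the percentage cut-offs (rounding, keeping at least one sample) is an implementation assumption.

```python
import numpy as np

def select_samples(init_sal, labels, p1=20, p2=20):
    """Foreground sample D_f and background sample D_b = D_b1 + D_b2 - D_f (formula (1))."""
    n_sp = len(init_sal)
    desc = np.argsort(-init_sal)
    fg = set(desc[: max(1, int(n_sp * p1 / 100))].tolist())        # D_f: top p1% most salient
    asc = np.argsort(init_sal)
    bg1 = set(asc[: max(1, int(n_sp * p2 / 100))].tolist())        # D_b1: top p2% least salient
    border = np.unique(np.concatenate([labels[0, :], labels[-1, :],
                                       labels[:, 0], labels[:, -1]]))
    bg2 = set(border.tolist())                                     # D_b2: superpixels on the image boundary
    bg = (bg1 | bg2) - fg                                          # formula (1)
    return sorted(fg), sorted(bg)
```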
and 5.4, respectively carrying out two groups of sparse representations on all the super pixels of the image by using the foreground sample and the background sample, and calculating corresponding reconstruction residual errors. The formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters, both taken as 0.01 in this embodiment; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
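The double sparse representation of step 5.4 could be sketched as follows. The patent does not name a particular solver; scikit-learn's Lasso is used here as one possible L1-regularized least-squares solver (its objective scales the data-fit term by 1/(2·n), so its alpha corresponds to λ only up to a constant factor).

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_residuals(F, dict_idx, lam=0.01):
    """Sparse-code each region feature F_i over the dictionary formed by the sample
    superpixels (columns of D) and return the reconstruction residuals."""
    D = F[dict_idx].T                              # 13 x n_atoms dictionary
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    residuals = np.zeros(len(F))
    for i, fi in enumerate(F):
        solver.fit(D, fi)                          # approx. argmin ||F_i - D a||^2 + lam ||a||_1
        residuals[i] = np.sum((fi - D @ solver.coef_) ** 2)
    return residuals

# eps_f = sparse_residuals(F, fg_idx)              # residuals over the foreground dictionary
# eps_b = sparse_residuals(F, bg_idx)              # residuals over the background dictionary
```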
and 5.5, fusing the two groups of reconstructed residuals to generate a super-pixel significance factor, and acquiring an original image significance factor graph by taking the requirement of consistent original image pixel significance in a super-pixel region as a criterion. According to formula (6) to epsilonbiAnd εfiFusing, giving the super-pixel fusion result to all original image pixels in the super-pixel fusion result, and calculating to obtain a significant factor graph SALi
SALi=εbi/(εfi2) (6)
Wherein sigma2The parameters are non-negative adjustment parameters, in this embodiment, 0.1 is taken, and a person skilled in the art can select a proper value as required;
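Formula (6) and the assignment of the superpixel values back to the pixel grid could then look as follows; the normalization to [0, 1] is an added assumption not stated in the patent.

```python
import numpy as np

def saliency_factor_map(eps_b, eps_f, labels, sigma2=0.1):
    """SAL_i = eps_b / (eps_f + sigma^2) per superpixel, broadcast to every pixel of that superpixel."""
    sal_sp = eps_b / (eps_f + sigma2)
    sal_sp = (sal_sp - sal_sp.min()) / (np.ptp(sal_sp) + 1e-12)   # [0, 1] normalization (assumed)
    return sal_sp[labels]                                         # map each pixel to its superpixel value
```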
and 5.6, comparing the significant factor graph obtained in the step 5.5 with the initial significant graph in the step 5.1, executing a recursion processing process, and outputting a significant detection result under the current scale when the recursion is finished. Calculating the significance factor graph SAL according to equation (7)iAnd an initial saliency map SAL0The rela coefficient between them, if rela<K, then let SAL0=SALiAnd the whole process of the step 4 is repeatedly executed; if rela>K, then the recursion is ended and the current SAL is outputiThe significance detection result at the scale is obtained; wherein K is a similarity determination threshold, and in this embodiment, K is 0.99, and a person skilled in the art can select an appropriate value as needed;
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
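The recursion of step 5.6 can be tied together as in the sketch below, reusing the helper functions from the previous sketches; corr2 reproduces MATLAB's 2-D correlation coefficient, and the iteration cap is an added safeguard not present in the patent.

```python
import numpy as np

def corr2(a, b):
    """2-D correlation coefficient of two maps (formula (7))."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

def single_scale_saliency(F, labels, sal0, K=0.99, max_iter=20):
    """Recursive single-scale saliency (step 5), using the helper sketches above."""
    sal_i = sal0
    for _ in range(max_iter):
        init_sal = superpixel_mean_saliency(sal0, labels)
        fg, bg = select_samples(init_sal, labels)
        eps_f = sparse_residuals(F, fg)
        eps_b = sparse_residuals(F, bg)
        sal_i = saliency_factor_map(eps_b, eps_f, labels)
        if corr2(sal_i, sal0) > K:        # similar enough: stop the recursion
            break
        sal0 = sal_i                      # otherwise SAL_0 := SAL_i and repeat
    return sal_i
```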
and 6, performing mean value fusion on the significance detection results under the multi-scale condition to generate a final target extraction result.
Theoretically, the whole technical scheme realizes, with the support of the sparse representation principle, consistent and complete extraction of the salient target regions in natural background images. Different from traditional salient object detection methods based on contrast and image boundary constraints, the method makes full use of the comprehensive differences between foreground and background in the image and integrates the inherent relations within a single salient target and among multiple salient targets, thereby avoiding the dependence on hypothetical conditions faced by traditional contrast-constraint and boundary-constraint detection methods. Sparse representation is taken as the means of image pixel consistency analysis, the sparse reconstruction residual as the pixel difference index, and the sparse reconstruction residuals over the image foreground and background dictionaries as the saliency factor, so that consistent extraction of multiple salient targets is realized and both the internal integrity of a single salient target and the completeness of multi-target extraction are ensured.
In specific implementation, the technical scheme of the invention can realize automatic operation flow based on a computer software technology, and can also realize a corresponding system in a modularized mode. The embodiment of the invention provides an image salient target region extraction system based on iterative sparse representation, which comprises the following modules:
a preprocessing module for preprocessing data: setting different SLIC superpixel numbers, performing multi-scale superpixel segmentation on the original image, using saliency detection based on classical visual attention, and setting the detection result as the initial saliency map SAL_0;
The salient feature extraction module is used for extracting salient features of the pixel-level original image, wherein the salient features comprise color features, position features and gradient features;
the super-pixel region feature acquisition module is used for calculating the mean value of all original pixel features in the super-pixel region of each scale to obtain the super-pixel region features under single-scale segmentation;
the sparse representation module is used for calculating the saliency map through recursive sparse representation aiming at the segmentation result of a single scale, and comprises the following sub-modules:
a first sub-module for superpixel initial saliency calculation: according to SAL_0, solving the initial significance level of each superpixel through mean value calculation;
a second sub-module for extracting foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f;
a third sub-module for extracting background samples: arranging the initial significance levels of the superpixels in ascending order, taking the top p2% of superpixels as the alternative background sample D_b1, and extracting the superpixels touching the image boundary as the alternative background sample D_b2; the background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
the fourth submodule is used for dual sparse representation and sparse residual calculation, all the superpixels are sparsely represented and reconstructed residual is calculated by taking the foreground sample and the background sample as dictionaries, and the formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
a fifth sub-module for calculating the significance factor: fusing ε_bi and ε_fi according to formula (6), assigning the fusion result of each superpixel to all original image pixels within that superpixel, and calculating the significance factor map SAL_i;
SAL_i = ε_bi / (ε_fi + σ^2) (6)
wherein σ^2 is a non-negative tuning parameter;
a sixth sub-module for recursive processing: calculating the correlation coefficient rela between the significance factor map SAL_i and the initial saliency map SAL_0 according to formula (7); if rela < K, SAL_0 = SAL_i is set and the whole process of the sparse representation module is repeatedly executed; if rela > K, the recursion is ended and the current SAL_i is output as the saliency detection result at this scale; wherein K is a similarity determination threshold,
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
and the detection result fusion module is used for fusing multi-scale significance detection results, performing equal-weight linear combination on significance results under each single scale, and calculating a final significance detection result.
The salient features in the salient feature extraction module comprise RGB, Lab, x, y, and first-order and second-order gradients, 13 dimensions in total, expressed as {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}, wherein R, G, B, L, a, b constitute the six-dimensional color feature, R, G, B and L, a, b being the RGB and Lab color information respectively; x and y are the position feature, i.e. the row-column coordinates of the pixel in the image; f_x, f_y, f_xx, f_yy, f_xy are the gradient feature, representing the first-order and second-order differences of the pixel in the X and Y directions respectively, and their calculation formula is as follows:
f_x = (f(i+1,j) - f(i-1,j))/2
f_y = (f(i,j+1) - f(i,j-1))/2
f_xx = (f_x(i+1,j) - f_x(i-1,j))/2 (8)
f_yy = (f_y(i,j+1) - f_y(i,j-1))/2
f_xy = (f_x(i,j+1) - f_x(i,j-1))/2
where f (i, j) is the image matrix and i, j is the image pixel row column number.
The specific implementation of each module may refer to the corresponding steps of the method and is not repeated here.
The above description of the embodiments is merely illustrative of the basic technical solution of the present invention and is not limited to the above embodiments. Any simple modification, addition, equivalent change or variation of the described embodiments may be made by persons or teams in the field to which the invention pertains without departing from the essential spirit of the invention or exceeding the scope defined by the claims.

Claims (4)

1. The image salient object region extraction method based on the iterative sparse representation is characterized by comprising the following steps of:
step 1, preprocessing data: setting different SLIC superpixel numbers, carrying out multi-scale superpixel segmentation on the original image, using saliency detection based on classical visual attention, and setting the detection result as the initial saliency map SAL_0;
Step 2, extracting the salient features of the pixel-level original image, wherein the salient features comprise color features, position features and gradient features;
step 3, calculating the average value of all original pixel features in the superpixel region of each scale to obtain the superpixel region features under single-scale segmentation;
step 4, aiming at the segmentation result of a single scale, calculating a saliency map through recursive sparse representation, and comprising the following substeps:
step 4.1, superpixel initial saliency calculation: according to SAL_0, solving the initial significance level of each superpixel through mean value calculation;
step 4.2, extracting foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f;
step 4.3, extracting background samples: arranging the initial significance levels of the superpixels in ascending order, and taking the top p2% of superpixels as the alternative background sample D_b1; extracting the superpixels touching the image boundary as the alternative background sample D_b2; the background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
step 4.4, performing double sparse representation and sparse residual calculation, wherein the foreground sample and the background sample are respectively used as dictionaries to perform sparse representation on all the superpixels, and a reconstructed residual is calculated, wherein the formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
step 4.5, calculating the significance factor: fusing ε_bi and ε_fi according to formula (6), assigning the fusion result of each superpixel to all original image pixels within that superpixel, and calculating the significance factor map SAL_i;
SAL_i = ε_bi / (ε_fi + σ^2) (6)
wherein σ^2 is a non-negative tuning parameter;
step 4.6, recursive processing: calculating the correlation coefficient rela between the significance factor map SAL_i and the initial saliency map SAL_0 according to formula (7); if rela is less than K, let SAL_0 = SAL_i and the whole process of step 4 is repeatedly executed; if rela > K, the recursion is ended and the current SAL_i is output as the saliency detection result at this scale; wherein K is a similarity determination threshold,
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
and 5, fusing multi-scale significance detection results, performing equal-weight linear combination on significance results under each single scale, and calculating a final significance detection result.
2. The image salient object region extraction method based on iterative sparse representation as claimed in claim 1, wherein the salient features in step 2 comprise RGB, Lab, x, y, and first-order and second-order gradients, 13 dimensions in total, expressed as {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}, wherein R, G, B, L, a, b constitute the six-dimensional color feature, R, G, B and L, a, b being the RGB and Lab color information respectively; x and y are the position feature, i.e. the row-column coordinates of the pixel in the image; f_x, f_y, f_xx, f_yy, f_xy are the gradient feature, representing the first-order and second-order differences of the pixel in the X and Y directions respectively, calculated as follows:
f_x = (f(i+1,j) - f(i-1,j))/2
f_y = (f(i,j+1) - f(i,j-1))/2
f_xx = (f_x(i+1,j) - f_x(i-1,j))/2 (8)
f_yy = (f_y(i,j+1) - f_y(i,j-1))/2
f_xy = (f_x(i,j+1) - f_x(i,j-1))/2
where f (i, j) is the image matrix and i, j is the image pixel row column number.
3. The image salient object region extraction system based on the iterative sparse representation is characterized by comprising the following modules:
a preprocessing module for preprocessing data: setting different SLIC superpixel numbers, performing multi-scale superpixel segmentation on the original image, using saliency detection based on classical visual attention, and setting the detection result as the initial saliency map SAL_0;
The salient feature extraction module is used for extracting salient features of the pixel-level original image, wherein the salient features comprise color features, position features and gradient features;
the super-pixel region feature acquisition module is used for calculating the mean value of all original pixel features in the super-pixel region of each scale to obtain the super-pixel region features under single-scale segmentation;
the sparse representation module is used for calculating the saliency map through recursive sparse representation aiming at the segmentation result of a single scale, and comprises the following sub-modules:
a first sub-module for superpixel initial saliency calculation: according to SAL_0, solving the initial significance level of each superpixel through mean value calculation;
a second sub-module for extracting foreground samples: arranging the initial significance levels of the superpixels in descending order, and taking the top p1% of superpixels as the foreground sample D_f;
a third sub-module for extracting background samples: arranging the initial significance levels of the superpixels in ascending order, taking the top p2% of superpixels as the alternative background sample D_b1, and extracting the superpixels touching the image boundary as the alternative background sample D_b2; the background sample calculation formula is as follows:
D_b = D_b1 + D_b2 - D_f (1)
the fourth submodule is used for dual sparse representation and sparse residual calculation, all the superpixels are sparsely represented and reconstructed residual is calculated by taking the foreground sample and the background sample as dictionaries, and the formula is as follows:
α_fi = argmin_α ||F_i - D_f·α||_2^2 + λ_f·||α||_1 (2)
ε_fi = ||F_i - D_f·α_fi||_2^2 (3)
α_bi = argmin_α ||F_i - D_b·α||_2^2 + λ_b·||α||_1 (4)
ε_bi = ||F_i - D_b·α_bi||_2^2 (5)
wherein i represents the superpixel number; F_i is the feature vector of the superpixel region; λ_b, λ_f are regularization parameters; α_bi, α_fi are respectively the sparse representation results over the background and foreground dictionaries; ε_bi, ε_fi are respectively the background and foreground sparse reconstruction residuals;
a fifth sub-module for calculating the significance factor: fusing ε_bi and ε_fi according to formula (6), assigning the fusion result of each superpixel to all original image pixels within that superpixel, and calculating the significance factor map SAL_i;
SAL_i = ε_bi / (ε_fi + σ^2) (6)
wherein σ^2 is a non-negative tuning parameter;
a sixth sub-module for recursive processing: calculating the correlation coefficient rela between the significance factor map SAL_i and the initial saliency map SAL_0 according to formula (7); if rela is less than K, SAL_0 = SAL_i is set and the whole process of the sparse representation module is repeatedly executed; if rela > K, the recursion is ended and the current SAL_i is output as the saliency detection result at this scale; wherein K is a similarity determination threshold,
rela=corr2(A,B) (7)
wherein corr2() is the correlation coefficient calculation function; A and B are the matrices or images to be compared; rela is the correlation coefficient between A and B, and the larger its value, the more similar A and B are, otherwise the greater their difference;
and the detection result fusion module is used for fusing multi-scale significance detection results, performing equal-weight linear combination on significance results under each single scale, and calculating a final significance detection result.
4. The iterative sparse representation-based image salient object region extraction system according to claim 3, wherein the salient features in the salient feature extraction module comprise RGB, Lab, x, y, and first-order and second-order gradients, 13 dimensions in total, expressed as {R, G, B, L, a, b, x, y, f_x, f_y, f_xx, f_yy, f_xy}, wherein R, G, B, L, a, b constitute the six-dimensional color feature, R, G, B and L, a, b being the RGB and Lab color information respectively; x and y are the position feature, i.e. the row-column coordinates of the pixel in the image; f_x, f_y, f_xx, f_yy, f_xy are the gradient feature, representing the first-order and second-order differences of the pixel in the X and Y directions respectively, calculated as follows:
f_x = (f(i+1,j) - f(i-1,j))/2
f_y = (f(i,j+1) - f(i,j-1))/2
f_xx = (f_x(i+1,j) - f_x(i-1,j))/2 (8)
f_yy = (f_y(i,j+1) - f_y(i,j-1))/2
f_xy = (f_x(i,j+1) - f_x(i,j-1))/2
where f (i, j) is the image matrix and i, j is the image pixel row column number.
CN201711387624.3A 2017-12-20 2017-12-20 Image salient target region extraction method and system based on iterative sparse representation Active CN107992874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711387624.3A CN107992874B (en) 2017-12-20 2017-12-20 Image salient target region extraction method and system based on iterative sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711387624.3A CN107992874B (en) 2017-12-20 2017-12-20 Image salient target region extraction method and system based on iterative sparse representation

Publications (2)

Publication Number Publication Date
CN107992874A CN107992874A (en) 2018-05-04
CN107992874B true CN107992874B (en) 2020-01-07

Family

ID=62039459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711387624.3A Active CN107992874B (en) 2017-12-20 2017-12-20 Image salient target region extraction method and system based on iterative sparse representation

Country Status (1)

Country Link
CN (1) CN107992874B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117852B (en) * 2018-07-10 2021-08-17 武汉大学 Unmanned aerial vehicle image adaptation area automatic extraction method and system based on sparse representation
CN109102465A (en) * 2018-08-22 2018-12-28 周泽奇 A kind of calculation method of the content erotic image auto zoom of conspicuousness depth of field feature
CN109886267A (en) * 2019-01-29 2019-06-14 杭州电子科技大学 A kind of soft image conspicuousness detection method based on optimal feature selection
CN110490204B (en) * 2019-07-11 2022-07-15 深圳怡化电脑股份有限公司 Image processing method, image processing device and terminal
CN111191650B (en) * 2019-12-30 2023-07-21 北京市新技术应用研究所 Article positioning method and system based on RGB-D image visual saliency
CN111242941B (en) * 2020-01-20 2023-05-30 南方科技大学 Salient region detection method and device based on visual attention
CN111274964B (en) * 2020-01-20 2023-04-07 中国地质大学(武汉) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN112766032A (en) * 2020-11-26 2021-05-07 电子科技大学 SAR image saliency map generation method based on multi-scale and super-pixel segmentation
CN112700438B (en) * 2021-01-14 2024-06-21 成都铁安科技有限责任公司 Ultrasonic flaw judgment method and ultrasonic flaw judgment system for imbedded part of train axle
CN114332572B (en) * 2021-12-15 2024-03-26 南方医科大学 Method for extracting breast lesion ultrasonic image multi-scale fusion characteristic parameters based on saliency map-guided hierarchical dense characteristic fusion network
CN115424037A (en) * 2022-10-12 2022-12-02 武汉大学 Salient target region extraction method based on multi-scale sparse representation
CN115690418B (en) * 2022-10-31 2024-03-12 武汉大学 Unsupervised automatic detection method for image waypoints

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526123B2 (en) * 2004-02-12 2009-04-28 Nec Laboratories America, Inc. Estimating facial pose from a sparse representation
CN101556690A (en) * 2009-05-14 2009-10-14 复旦大学 Image super-resolution method based on overcomplete dictionary learning and sparse representation
CN101980284A (en) * 2010-10-26 2011-02-23 北京理工大学 Two-scale sparse representation-based color image noise reduction method
CN104240256A (en) * 2014-09-25 2014-12-24 西安电子科技大学 Image salient detecting method based on layering sparse modeling
CN105930812A (en) * 2016-04-27 2016-09-07 东南大学 Vehicle brand type identification method based on fusion feature sparse coding model
CN106203430A (en) * 2016-07-07 2016-12-07 北京航空航天大学 A kind of significance object detecting method based on foreground focused degree and background priori
CN106530271A (en) * 2016-09-30 2017-03-22 河海大学 Infrared image significance detection method
CN106815842A (en) * 2017-01-23 2017-06-09 河海大学 A kind of improved image significance detection method based on super-pixel
CN107067037A (en) * 2017-04-21 2017-08-18 河南科技大学 A kind of method that use LLC criterions position display foreground
CN107229917A (en) * 2017-05-31 2017-10-03 北京师范大学 A kind of several remote sensing image general character well-marked target detection methods clustered based on iteration
CN107301643A (en) * 2017-06-06 2017-10-27 西安电子科技大学 Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Approximate Correction of Length Distortion for Direct Georeferencing in Map Projection Frame; Yongjun Zhang et al.; IEEE; 2013-11-30; Vol. 10, No. 6; full text *
Construction of Manifolds via Compatible Sparse Representations; Ruimin Wang et al.; ACM; 2016-12-31; full text *
Salient object detection via contrast information and object vision organization cues; Shengxiang Qi et al.; Elsevier; 2015-04-29; full text *
Super-Resolution of Single Text Image by Sparse Representation; Rim Walha et al.; ACM; 2012-12-31; full text *
Infrared vehicle detection technology based on visual saliency and target confidence; Nannan Qi et al.; CNKI; 2017-06-30; Vol. 46, No. 6; full text *

Also Published As

Publication number Publication date
CN107992874A (en) 2018-05-04

Similar Documents

Publication Publication Date Title
CN107992874B (en) Image salient target region extraction method and system based on iterative sparse representation
Januszewski et al. High-precision automated reconstruction of neurons with flood-filling networks
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
Gould et al. Region-based segmentation and object detection
CN106778687B (en) Fixation point detection method based on local evaluation and global optimization
CN107680106A (en) A kind of conspicuousness object detection method based on Faster R CNN
CN111612008A (en) Image segmentation method based on convolution network
CN110033007A (en) Attribute recognition approach is worn clothes based on the pedestrian of depth attitude prediction and multiple features fusion
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
US20220237789A1 (en) Weakly supervised multi-task learning for cell detection and segmentation
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
Savian et al. Optical flow estimation with deep learning, a survey on recent advances
CN107423771B (en) Two-time-phase remote sensing image change detection method
Sima et al. Bottom-up merging segmentation for color images with complex areas
Wang et al. Pedestrian detection in infrared image based on depth transfer learning
Madessa et al. A deep learning approach for specular highlight removal from transmissive materials
Li et al. DeepSIR: Deep semantic iterative registration for LiDAR point clouds
Dmitriev12 et al. Efficient correction for em connectomics with skeletal representation
CN108664968B (en) Unsupervised text positioning method based on text selection model
Ma et al. An attention-based progressive fusion network for pixelwise pavement crack detection
Wang et al. Nuclei instance segmentation using a transformer-based graph convolutional network and contextual information augmentation
Imtiaz et al. BAWGNet: Boundary aware wavelet guided network for the nuclei segmentation in histopathology images
CN116228795A (en) Ultrahigh resolution medical image segmentation method based on weak supervised learning
CN115861630A (en) Cross-waveband infrared target detection method and device, computer equipment and storage medium
Bhavani et al. Robust 3D face recognition in unconstrained environment using distance based ternary search siamese network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant