CN113408408A - Sight tracking method combining skin color and iris characteristics

Sight tracking method combining skin color and iris characteristics

Info

Publication number
CN113408408A
CN113408408A (application number CN202110674313.5A)
Authority
CN
China
Prior art keywords: face, point, detected, iris, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110674313.5A
Other languages
Chinese (zh)
Inventor
黄祖胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jiaxuan Information Technology Co ltd
Original Assignee
Hangzhou Jiaxuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jiaxuan Information Technology Co ltd filed Critical Hangzhou Jiaxuan Information Technology Co ltd
Priority to CN202110674313.5A
Publication of CN113408408A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics

Abstract

The invention provides a gaze tracking method and system combining skin color and iris features. The gaze tracking method comprises the following steps: extracting a face region to be detected from a face image to be detected containing a face to be detected by using a pre-trained face region segmentation model; acquiring a preset number of face feature points in the face region to be detected by an enhanced gradient feature method; extracting an orbit contour from the face feature points, and searching for the iris edge within the orbit contour with a sliding window so as to obtain the iris center position from the iris edge; and correcting the iris center position with a projection mapping algorithm, combining the corrected face feature points and the corrected iris center position into an eye movement vector, correcting the eye movement vector with the projection mapping algorithm, calculating an initial fixation point coordinate from the corrected eye movement vector, and performing compensation calculation on the initial fixation point coordinate with an SVR (support vector regression) model to obtain the final fixation point coordinate. The method and device complete the calculation of the final fixation point coordinate without additional dedicated hardware.

Description

Sight tracking method combining skin color and iris characteristics
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a sight tracking method combining skin color and iris characteristics.
Background
With the progress of computer research, human-computer interaction has become one of its hot topics, and interaction modes have gradually developed and extended in multiple directions. Traditional modes such as keyboard-and-mouse operation and handle control have evolved into new modes that use the senses and body, such as voice recognition, gesture recognition and human eye recognition, making human-computer interaction more direct, flexible and convenient, and more intuitive and intelligent than the traditional modes. Among these, eye-tracking interaction has received a great deal of attention.
Gaze tracking technology is the process of collecting a user's eye information through corresponding equipment and extracting corresponding features to estimate and map the fixation point. Gaze interaction can be combined with intelligent control technology and serve as assistive equipment that helps people with limited mobility improve their level of self-care. The change of the fixation position can be collected while a user reads, so that the content and operation habits that interest the user can be discovered and advertisement placement can be optimized; it can also be used in scenarios such as online teaching or online testing to monitor the user's current learning and examination state. In addition, the technology can be applied in many fields such as smartphones, entertainment and driving assistance. However, most existing gaze tracking devices rely on infrared light sources or special hardware such as an eye tracker, and therefore cannot track and monitor a user's gaze during online testing.
Disclosure of Invention
In view of the above problems in the prior art, a gaze tracking method combining skin color and iris features is provided.
The specific technical scheme is as follows:
a method for tracking sight line by combining skin color and iris characteristics comprises the following steps:
step S1, extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
step S2, obtaining human face feature points with the number of preset feature points in the human face area to be detected by an enhanced gradient feature method;
step S3, extracting and obtaining an eye socket outline according to the human face characteristic points, and searching and obtaining an iris edge in the eye socket outline by adopting a sliding window so as to obtain an iris center position according to the iris edge;
and step S4, correcting the iris center position by using a projection mapping algorithm, combining the face feature points and the corrected iris center position to obtain an eye movement vector, correcting the eye movement vector by using the projection mapping algorithm, calculating an initial fixation point coordinate from the corrected eye movement vector, and performing compensation calculation on the initial fixation point coordinate with an SVR (support vector regression) model to obtain the final fixation point coordinate.
Preferably, the gaze tracking method, wherein the step S1 specifically includes a step of creating a face region segmentation model and a step of extracting a face region to be detected:
the step of creating the face region segmentation model specifically comprises:
step S11, acquiring a training image set, wherein the training image set comprises a plurality of training face images including training faces;
step S12, preprocessing the training face image to obtain a processed image;
step S13, mapping the RGB color space of the processed image to the YCbCr color space, obtaining the chroma vector of each pixel point of the processed image, and obtaining the chroma vector sample set corresponding to the training image set, wherein the chroma vector sample set comprises the chroma vector of each pixel point of the processed image corresponding to each training image in the training image set;
step S14, calculating the spatial distribution of the chroma vector sample set to obtain the statistical characteristics of the skin color values;
step S15, skin color mean parameters and covariance matrix parameters are obtained according to statistical characteristic analysis, and a human face region segmentation model is obtained according to skin color mean parameters and covariance matrix parameter fitting;
the step of extracting the face region to be detected specifically comprises the following steps:
step S16, calculating to obtain a likelihood value matrix corresponding to the face image to be detected through a face region segmentation model, wherein the likelihood value matrix comprises the similarity between each pixel point in the face image to be detected and the skin color;
and step S17, sequentially separating the maximum similarity in the likelihood value matrix to separate the skin color area from the background area so as to extract the face area to be detected.
Preferably, the gaze tracking method, wherein the face feature points in step S2 include: canthus feature points;
step S2 specifically includes a step of acquiring the canthus feature points:
step S21A: acquiring a face area to be detected, and enhancing the gradient characteristics of the face area to be detected in the horizontal direction;
step S22A: respectively carrying out differential projection mapping on the horizontal direction and the vertical direction of the enhanced human face area to be detected to obtain an orbit contour;
step S23A: carrying out binarization processing on an eye image where the eye socket outline is located;
step S24A: searching white pixels from two ends to the center of the eye image after binarization processing, and setting a first white pixel obtained by searching as a first external canthus feature point and a second external canthus feature point;
searching white pixels from the center position to two ends of the eye image after binarization processing, and setting a first white pixel obtained by searching as a first inner canthus feature point and a second inner canthus feature point;
the face feature points in step S2 include: a mouth corner feature point;
step S2 specifically includes a step of acquiring a mouth corner feature point:
step S21B: acquiring a face area to be detected, and enhancing gradient characteristics of the face area to be detected in the face vertical direction;
step S22B: taking the position located below the lower boundary of the orbit contour, at a distance of six times the height of the orbit contour, as the initial boundary of the mouth region;
step S23B: performing projection curve smoothing filtering based on the initial boundary, and then taking an extreme value at the lower position as a longitudinal coordinate of the mouth region;
searching an inflection point of the gray level projection curve upwards according to the position, and setting a vertical coordinate of the inflection point as an upper boundary of the mouth area;
searching an inflection point of the gray level projection curve downwards according to the position, and setting a vertical coordinate of the inflection point as a lower boundary of the mouth region;
creating an upper boundary and a lower boundary of the mouth region to obtain the mouth region;
step S24B: calculating a first mouth corner point and a second mouth corner point of the mouth region by adopting an SUSAN algorithm;
the face feature points in step S2 include: contour feature points;
step S2 specifically includes a step of acquiring contour feature points:
step S21C, carrying out denoising processing on the human face area to be detected, and calculating to obtain the gradient value of the human face area to be detected after denoising processing;
step S22C, performing enhancement processing on the gradient feature in the vertical direction of the detected face contour;
step S23C, performing binarization processing on the face contour subjected to enhancement processing to obtain an edge curve distributed in the vertical direction;
step S24C, searching from the first mouth corner point to the adjacent edge curve to obtain an edge point, and setting the edge point obtained by searching as a first contour feature point;
and searching from the second mouth corner point to the adjacent edge curve to obtain edge points, and setting the edge points obtained by searching as second contour feature points.
Preferably, the gaze tracking method, wherein the step S24B specifically comprises the steps of:
step S24B1, sliding a preset circular template in the mouth region, acquiring the absolute value of the difference between the gray value of a pixel point of the circular template and the gray value of a preset pixel point, and judging, according to the absolute value and an absolute value threshold, whether the pixel point and the preset pixel point belong to the univalue segment assimilating nucleus (USAN) region;
step S24B2, calculating, according to the comparison results, the USAN count and the USAN area over all pixel points belonging to the USAN region;
step S24B3: comparing the USAN count with a USAN count threshold to obtain the edge response from the comparison result;
and step S24B4, taking pixel points corresponding to the two maximum edge responses as a first mouth corner point and a second mouth corner point.
Preferably, in the gaze tracking method, a Canny operator is adopted in step S21C to calculate the gradient value of the denoised face region to be detected; the Canny operator consists of two convolution templates, one in the x direction and one in the y direction.
Preferably, the gaze tracking method, wherein the step S3 specifically includes:
step S31, extracting the eye socket contour according to the face feature points;
step S32, image processing is carried out on the orbit contour, and a red component in the orbit contour is reserved to obtain a processed orbit contour;
step S33, taking the pixel point through which the most gradient vectors pass in the orbit contour as the iris gradient center;
step S34, the sliding window takes the iris gradient central point as the starting point to search the iris edge to obtain the iris outline;
and step S35, performing ellipse fitting on the iris outline by using a least square method, calculating the center coordinate of the ellipse obtained by fitting, and taking the center coordinate as the center position of the iris.
Preferably, the gaze tracking method, wherein the step S4 specifically comprises the steps of:
step S41, establishing a projection mapping matrix according to a projection mapping algorithm, wherein the projection mapping matrix is used for representing the projection mapping relation, in the image coordinate system, between the coordinates of corresponding pixel points on the face image to be detected (a first plane) and on the mirror-image face image (a second plane), the first plane and the second plane being in a mirror-image relation;
step S42, normalizing the projection mapping matrix to create a projection mapping formula, and calculating the position coordinates of the face feature points of the face image to be detected on the mirror-image face image according to the projection mapping formula;
step S43, substituting the position coordinates of the reference points in the face characteristic points on the face image to be detected and the face image on the mirror image surface into a preset linear formula to obtain linear equations with preset quantity;
step S44, correcting the center position of the iris according to a projection mapping algorithm;
step S45, combining the corrected human face feature points and the corrected iris center position to obtain an eye movement vector;
step S46, calculating an initial fixation point coordinate according to a preset fixation point mapping formula and an eye movement vector;
and step S47, performing compensation calculation on the initial fixation point coordinates by using an SVR model to obtain final fixation point coordinates.
Preferably, the gaze tracking method, wherein the step S4 further includes a step of creating an SVR model, specifically including the steps of:
step A1, acquiring the calibration points of the preset calibration point number in the face image to be detected, and acquiring a training data set corresponding to the calibration points, which is acquired after the user performs the fixation motion on each calibration point;
and step A2, inputting the training data set into the initial model for SVR training to obtain the SVR model.
Preferably, in the gaze tracking method, the compensation calculation of the initial fixation point coordinates using the SVR model in step S4 comprises the following steps: calculating the offsets of the initial fixation point coordinate in the two directions through the SVR gaze drop-point compensation model;
and performing compensation calculation on the initial fixation point coordinate according to the offset in the two directions to obtain a final fixation point coordinate.
Also included is a gaze tracking system that combines skin tone and iris characteristics, comprising:
the extraction module is used for extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
the characteristic point acquisition module is used for acquiring human face characteristic points with the number of preset characteristic points in the human face region to be detected by an enhanced gradient characteristic method;
the iris center calculation module is used for extracting an eye socket outline according to the characteristic points of the human face, searching and acquiring an iris edge in the eye socket outline by adopting a sliding window, and calculating to obtain an iris center position according to the iris edge;
and the fixation point calculation module corrects the iris center position and the face characteristic points by adopting a projection mapping algorithm, obtains eye movement vectors by combining the corrected face characteristic points and the corrected iris center position, calculates an initial fixation point coordinate according to the eye movement vectors, and performs compensation calculation on the initial fixation point coordinate by using an SVR (support vector regression) model to obtain a final fixation point coordinate.
The technical scheme has the following advantages or beneficial effects:
the final fixation point coordinate can be obtained without extra equipment by combining the skin color method with the eye movement vector obtained by calculation, so that the simplicity and the applicability of calculation are improved.
The iris edge within the orbit contour is obtained through a sliding window search and the iris center position is calculated from the iris edge; correcting the iris center position with a projection mapping algorithm improves the accuracy of the obtained iris center position. An eye movement vector is then obtained by combining the face feature points and the corrected iris center position, an initial fixation point coordinate is calculated from the eye movement vector, and the initial fixation point coordinate is compensated with an SVR (support vector regression) model to obtain the final fixation point coordinate, which improves the precision of the final fixation point coordinate.
Drawings
Embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings. The drawings are, however, to be regarded as illustrative and explanatory only and are not restrictive of the scope of the invention.
FIG. 1 is a circular template diagram of an embodiment of the gaze tracking method of the present invention;
FIG. 2 is a diagram of a sliding window of an embodiment of the gaze tracking method of the present invention;
FIG. 3 is a schematic diagram of a sliding window search according to an embodiment of the gaze tracking method of the present invention;
FIG. 4 is a schematic view of a human face imaging method according to an embodiment of the gaze tracking method of the present invention;
fig. 5 is a haar rectangle feature diagram of the embodiment of the gaze tracking method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The invention comprises a sight tracking method combining skin color and iris characteristics, wherein the sight tracking method comprises the following steps of:
step S1, extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
step S2, obtaining human face feature points with the number of preset feature points in the human face area to be detected by an enhanced gradient feature method;
step S3, extracting and obtaining an eye socket outline according to the human face characteristic points, and searching and obtaining an iris edge in the eye socket outline by adopting a sliding window so as to obtain an iris center position according to the iris edge;
and step S4, correcting the iris center position by using a projection mapping algorithm, combining the face feature points and the corrected iris center position to obtain an eye movement vector, correcting the eye movement vector by using the projection mapping algorithm, calculating an initial fixation point coordinate from the corrected eye movement vector, and performing compensation calculation on the initial fixation point coordinate with an SVR (support vector regression) model to obtain the final fixation point coordinate.
In the above embodiment, the purpose of the gaze tracking method is to detect the behavior of the user during online testing; therefore, the calculation of the user's gaze fixation point and drop point coordinates must be completed without additional dedicated hardware. The method specifically comprises the following steps:
firstly, a face region segmentation model is constructed by a skin color method; based on this pre-constructed skin-color-based face region segmentation model, the preset number of face feature points are located, the face feature points are combined to obtain a face feature vector, and the orbit contour is extracted for subsequent face recognition and eye movement vector acquisition;
then, obtaining the iris edge by a sliding window searching method, calculating according to the iris edge to obtain the iris center position, and correcting the iris center position by using a projection mapping algorithm, so that the corrected iris center position and the previous face characteristic points jointly form an eye movement vector.
Wherein the sliding window comprises a left window, a right window and a sliding window center, as shown in fig. 2;
subsequently, the eye movement vector is corrected with the projection mapping algorithm, so that the initial fixation point coordinate can be calculated from the corrected eye movement vector;
and finally, performing compensation calculation on the initial fixation point coordinate through an SVR (support vector regression) model to obtain the final fixation point coordinate.
In the embodiment, the skin color method and the eye movement vector obtained by calculation are combined, so that the final fixation point coordinate can be obtained without extra equipment, and the calculation simplicity and applicability are improved.
In the above embodiment, the iris edge within the orbit contour is obtained through a sliding window search, as shown in fig. 3, where the dotted line represents the moving route of the sliding window; the iris center position is then calculated from the iris edge and corrected with a projection mapping algorithm, which improves the accuracy of the obtained iris center position. An eye movement vector is obtained by combining the face feature points and the corrected iris center position, an initial fixation point coordinate is calculated from the eye movement vector, and the initial fixation point coordinate is compensated with an SVR (support vector regression) model to obtain the final fixation point coordinate, which improves the precision of the final fixation point coordinate.
Further, in the above embodiment, step S1 specifically includes a step of creating a face region segmentation model and a step of extracting a face region to be detected:
the step of creating the face region segmentation model specifically comprises:
step S11, acquiring a training image set, wherein the training image set comprises a plurality of training face images including training faces;
step S12, preprocessing the training face image to obtain a processed image;
step S13, mapping the RGB color space of the processed image to the YCbCr color space to obtain the chroma vector of each pixel point of the processed image, and obtaining the chroma vector sample set Cb-Cr corresponding to the training image set, wherein the chroma vector sample set comprises the chroma vector (Cb, Cr) of each pixel point of the processed image corresponding to each training image in the training image set;
Mapping the RGB color space of the processed image to the YCbCr color space by adopting the following formula;
[Formula (1): the RGB to YCbCr color space conversion; the original equation image is not reproduced here.]
wherein, in the above formula (1), Y represents luminance in the YCbCr color space;
cb represents a blue density offset in the YCbCr color space;
cr represents a red density offset in the YCbCr color space;
r represents a red component in the RGB color space;
g represents a green component in the RGB color space;
b denotes a blue component in the RGB color space.
Step S14, calculating the spatial distribution of the chroma vector sample set Cb-Cr to obtain the statistical characteristics of the skin color values, as shown in the following formula (2):
step S15, obtaining skin color mean parameter and covariance matrix parameter according to statistical characteristic analysis, as shown in the following formula (2), obtaining a face region segmentation model according to skin color mean parameter and covariance matrix parameter fitting;
m = E(x) = (mean(Cb), mean(Cr))^T, with mean(Cb) = (1/N) Σ_{i=1}^{N} Cb_i and mean(Cr) = (1/N) Σ_{i=1}^{N} Cr_i;
C = E[ (x - m)(x - m)^T ];  (2)

wherein, in the above formula (2), m is used to represent the skin color mean parameter;
mean(Cb) is used to represent the average value of the blue density offsets in the YCbCr color space of the image;
mean(Cr) is used to represent the average value of the red density offsets in the YCbCr color space of the image;
T is used to represent the transpose;
N is used to represent the number of image pixel points;
i is used to represent the i-th blue/red density offset, with i in the range [1, N];
E is used to represent the expectation over the sample set;
x is used to represent the color vector (Cb, Cr)^T of each pixel point;
C is used to represent the covariance matrix parameter;
σ is used to represent x - m, so that C = E[σ σ^T];
as a preferred embodiment, the face region segmentation model may be a single Gaussian model of the skin color distribution, and the single Gaussian model can be used to calculate the similarity between every pixel point in the image and skin color, i.e. the skin color likelihood value. After the likelihood value matrix is obtained, a normalization operation is performed using the maximum value in the likelihood value matrix, followed by binarization and morphological processing, so that the skin color area can be separated from the background, a preliminary candidate area possibly containing the face is obtained, and a more accurate candidate area containing the face can then be obtained.
The step of extracting the face region to be detected specifically comprises the following steps:
step S16, calculating to obtain a likelihood value matrix corresponding to the face image to be detected through a face region segmentation model, wherein the likelihood value matrix comprises the similarity between each pixel point in the face image to be detected and the skin color as shown in the following formula (3);
Pa(Cb, Cr) = exp[ -0.5 (x - m)^T C^(-1) (x - m) ];  (3)

wherein, in the above formula (3), Pa(Cb, Cr) is used to represent the likelihood value matrix;
and step S17, sequentially separating the maximum similarity in the likelihood value matrix to separate the skin color area from the background area so as to extract the face area to be detected.
In the above embodiment, the training face images in the training image set include skin color images of different races, ages, genders and body areas (including the face, neck, trunk and limbs) under different illumination conditions, which improves the accuracy of the created face region segmentation model and thus the accuracy of extracting the face region to be detected.
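For illustration only, the following minimal Python sketch (using OpenCV and NumPy; the function names, the binarization threshold and the morphological kernel are assumptions rather than part of the patent) shows how a single-Gaussian skin color model of the kind described in steps S11 to S17 could be fitted in the (Cb, Cr) plane and applied to obtain the likelihood matrix of formula (3):

import cv2
import numpy as np

def fit_skin_model(train_images_bgr, skin_masks):
    # Collect (Cb, Cr) samples from masked skin pixels and fit mean/covariance (steps S12-S15).
    samples = []
    for img, mask in zip(train_images_bgr, skin_masks):
        img = cv2.GaussianBlur(img, (5, 5), 0)                 # step S12: denoising preprocessing
        ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)         # step S13: RGB -> YCbCr mapping
        cr, cb = ycrcb[..., 1], ycrcb[..., 2]
        samples.append(np.stack([cb[mask > 0], cr[mask > 0]], axis=1))
    x = np.concatenate(samples).astype(np.float64)
    m = x.mean(axis=0)                                         # skin color mean parameter
    c = np.cov(x, rowvar=False)                                # covariance matrix parameter
    return m, c

def skin_likelihood(img_bgr, m, c):
    # Likelihood matrix of formula (3): exp(-0.5 (x - m)^T C^-1 (x - m)) per pixel (steps S16-S17).
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float64)
    x = np.stack([ycrcb[..., 2], ycrcb[..., 1]], axis=-1) - m  # (Cb, Cr) minus the mean
    d2 = np.einsum('...i,ij,...j->...', x, np.linalg.inv(c), x)
    p = np.exp(-0.5 * d2)
    p /= p.max()                                               # normalize with the maximum likelihood
    mask = (p > 0.4).astype(np.uint8) * 255                    # binarize (threshold value is an assumption)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return p, mask                                             # likelihood map and skin/background mask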
Further, in the above embodiment, step S12 specifically includes: and denoising the training face image to obtain a processed image.
Further, in the above embodiment, the denoising process is performed by a gaussian filtering operation.
Further, in the above embodiment, the face feature points in step S2 include: the corner of the eye characteristic point, the corner of the mouth characteristic point, and the intersection point of the extension line of the corner of the mouth and the face outline.
Further, in the above embodiment, step S2 specifically includes:
the method comprises the steps of selecting gray level features to segment lovers, and then using face feature enhancement to reduce noise and other infection factors as much as possible to obtain feature points. Wherein, it is necessary to decide whether to use the enhanced gradient feature in the horizontal direction of the face or to process the image by using the enhanced gradient feature in the vertical direction of the face according to the matched positions (eyes, nose, mouth), as shown in equations (4) and (5), and then to segment the image by using the gray scale integral projection, as shown in equation (6).
It should be noted that the eye corner feature points require enhancing the gradient feature of the face in the horizontal direction, the mouth corner feature points require enhancing the gradient feature of the face in the vertical direction, and the nose does not require gradient feature enhancement.
Formulas (4) and (5) enhance the gradient in the horizontal and vertical directions of the face, respectively, using two pairs of convolution templates g1, g2 and g3, g4 (the templates themselves are given as equation images in the original and are not reproduced here):

Gg1(x, y) = f(x, y) * g1 + f(x, y) * g2;  (4)
Gg2(x, y) = f(x, y) * g3 + f(x, y) * g4;  (5)

Formula (6) gives the gray integral projections:

S_h(y) = Σ_{x = x_left}^{x_right} f(x, y),   S_v(x) = Σ_{y = y_up}^{y_down} f(x, y);  (6)

wherein f(x, y) in formulas (4) and (5) is the value of the pixel point (x, y) in the face image to be detected, Gg1(x, y) in formula (4) is the image after the horizontal gradient is enhanced, and Gg2(x, y) in formula (5) is the image after the vertical gradient is enhanced. In formula (6), S_h(y) is the integral projection in the horizontal direction, S_v(x) is the integral projection in the vertical direction, and f(x, y) is the gray value at the pixel point (x, y) in the image; x_left and x_right are the leftmost and rightmost positions of a row of pixel points, and y_up and y_down are the uppermost and lowermost positions of a column of pixel points.
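As a rough illustration of the gradient enhancement of formulas (4) and (5) and the gray integral projections of formula (6), the Python sketch below substitutes Sobel templates for the unspecified templates g1 to g4 (an assumption; the patent's own templates are only given as equation images), and the direction-to-feature mapping follows the note above:

import cv2
import numpy as np

def enhance_gradient(gray, direction):
    # Horizontal enhancement is used for the eye corner features, vertical for the mouth corner features.
    if direction == 'horizontal':
        g = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # assumed stand-in for g1, g2
    else:
        g = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # assumed stand-in for g3, g4
    return cv2.convertScaleAbs(g)

def integral_projections(gray):
    s_h = gray.sum(axis=1)   # S_h(y): sum of each row (horizontal integral projection)
    s_v = gray.sum(axis=0)   # S_v(x): sum of each column (vertical integral projection)
    return s_h, s_v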
Further, in the above embodiment, the face feature points in step S2 include: canthus feature points;
step S2 specifically includes a step of acquiring the canthus feature points:
step S21A: acquiring a face area to be detected, and enhancing the gradient characteristics of the face area to be detected in the horizontal direction;
step S22A: respectively carrying out differential projection mapping on the horizontal direction and the vertical direction of the enhanced human face area to be detected to obtain an orbit contour;
step S23A: carrying out binarization processing on an eye image where the eye socket outline is located;
step S24A: searching white pixels from two ends to the center of the eye image after binarization processing, setting a first white pixel obtained by searching as a first external canthus feature point and a second external canthus feature point, namely searching white pixels from the leftmost side and the rightmost side to the middle of the eye image after binarization processing, and respectively marking the first white pixel as a first external canthus feature point (the first external canthus feature point at this time can be a left external canthus feature point) and a second external canthus feature point (the second external canthus feature point at this time can be a right external canthus feature point);
searching white pixels from the center position of the binarized eye image toward the two ends, and setting the first white pixel obtained in each direction as a first inner canthus feature point and a second inner canthus feature point; that is, white pixels are searched from the center of the binarized eye image toward the leftmost side and the rightmost side, and the first white pixel found in each direction is respectively marked as the first inner canthus feature point (the first inner canthus feature point at this time can be the left inner canthus feature point) and the second inner canthus feature point (the second inner canthus feature point at this time can be the right inner canthus feature point).
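The corner search of steps S23A and S24A can be sketched as follows; the Otsu binarization, its polarity and the column-wise scan are illustrative assumptions:

import cv2
import numpy as np

def find_eye_corners(eye_gray):
    # Binarize the eye image (polarity/threshold are assumptions) and scan for the first white pixels.
    _, binary = cv2.threshold(eye_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(binary)                 # coordinates of white pixels
    cols = np.unique(xs)
    mid = eye_gray.shape[1] // 2
    def first_white(col):                       # topmost white pixel of a given column
        return int(col), int(ys[xs == col].min())
    outer_left = first_white(cols.min())                  # scan from the left end toward the center
    outer_right = first_white(cols.max())                 # scan from the right end toward the center
    inner_left = first_white(cols[cols <= mid].max())     # scan from the center toward the left
    inner_right = first_white(cols[cols >= mid].min())    # scan from the center toward the right
    return outer_left, inner_left, inner_right, outer_right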
Further, in the above embodiment, the face feature points in step S2 include: a mouth corner feature point;
step S2 specifically includes a step of acquiring a mouth corner feature point:
step S21B: acquiring a face area to be detected, and enhancing gradient characteristics of the face area to be detected in the face vertical direction;
step S22B: taking the position located below the lower boundary of the orbit contour, at a distance of six times the height of the orbit contour, as the initial boundary of the mouth region;
step S23B: performing projection curve smoothing filtering based on the initial boundary, and then taking an extreme value at the lower position as a longitudinal coordinate of the mouth region;
searching an inflection point of the gray projection curve upwards according to the position (namely the vertical coordinate of the mouth area), and setting the vertical coordinate of the inflection point as the upper boundary of the mouth area;
searching an inflection point of the gray projection curve downwards according to the position (namely the vertical coordinate of the mouth area), and setting the vertical coordinate of the inflection point as the lower boundary of the mouth area;
creating an upper boundary and a lower boundary of the mouth region to obtain the mouth region;
step S24B: a first mouth corner point and a second mouth corner point of the mouth region are calculated using the SUSAN algorithm.
Further, in the above embodiment, the step S24B specifically includes the following steps:
step S24B1, sliding a preset circular template in the mouth region, acquiring the absolute value of the difference between the gray value of a pixel point of the circular template and the gray value of a preset pixel point, and judging, according to the absolute value and an absolute value threshold, whether the pixel point and the preset pixel point belong to the univalue segment assimilating nucleus (USAN) region;
preferably, a preset circular template can be adopted to slide in the mouth region, and an absolute value of a difference between a gray value of a pixel point of the circular template and a gray value of a preset pixel point is obtained, wherein the preset pixel point is a central pixel point of the circular template;
the circular template is shown in figure 1;
comparing the absolute value corresponding to the pixel point with the absolute value threshold: when the absolute value is smaller than or equal to the preset absolute value threshold, the pixel point and the preset pixel point are determined to belong to the USAN region; when the absolute value is larger than the preset absolute value threshold, they are determined not to belong to the USAN region;
the comparison method can be shown in the following formula (7):
c(r, r0) = 1, if |I(r) - I(r0)| ≤ t;   c(r, r0) = 0, if |I(r) - I(r0)| > t;  (7)
as another preferred embodiment, the same preset circular template (shown in fig. 1) can be slid in the mouth region, with the preset pixel point being the central pixel point of the circular template, and the comparison performed as follows:
acquiring the ratio between the absolute value corresponding to the pixel point and the absolute value threshold, and performing the calculation shown in the following formula (8) to obtain the comparison result, wherein when the comparison result is 1 the pixel point and the preset pixel point are determined to belong to the USAN region, and when the comparison result is 0 they are determined not to belong to the USAN region;
the comparison method may be as shown in the following formula (8):
c(r, r0) = exp{ -[ (I(r) - I(r0)) / t ]^6 };  (8)
wherein, in the above formulas (7) and (8),
r0 is used to represent the central pixel point of the circular template;
r is used to represent a pixel point in the circular template;
I(r0) is used to represent the gray value of the central pixel point r0;
I(r) is used to represent the gray value of the pixel point r;
t is the threshold used to represent the gray difference;
c(r, r0) is used to represent the comparison result: c(r, r0) = 1 means that r and r0 belong to the USAN region, and c(r, r0) = 0 means that r and r0 do not belong to the USAN region;
it should be noted that the comparison result obtained by the formula (8) is more accurate than that obtained by the formula (7).
Step S24B2, calculating, according to the comparison results, the USAN count and the USAN area over all pixel points belonging to the USAN region;
the USAN count of all pixel points belonging to the USAN region is calculated according to the following formula (9):

n(r0) = Σ_{r ∈ D(r0)} c(r, r0);  (9)

wherein n(r0) is used to represent the USAN count;
r0 is used to represent the central pixel point of the circular template;
r is used to represent a pixel point in the circular template;
D(r0) is used to represent the area of all pixel points under the circular template;
step S24B3: comparing the USAN count with the USAN count threshold to obtain the edge response from the comparison result;
wherein, the smaller the USAN region, the larger the edge response;
the comparison formula is shown in the following formula (10):

R(r0) = g - n(r0), if n(r0) < g;   R(r0) = 0, otherwise;  (10)

wherein R(r0) is used to represent the edge response;
r0 is used to represent the central pixel point of the circular template;
g is used to represent the USAN count threshold, where g = 0.75·n_max and n_max is the maximum value of the USAN pixel count in the SUSAN algorithm;
n(r0) is used to represent the USAN count;
step S24B4, taking pixel points corresponding to the two largest edge responses as a first mouth corner point and a second mouth corner point;
the leftmost point (i.e., the left mouth corner point) after the filtering may be used as the first mouth corner point, and the rightmost point (i.e., the right mouth corner point) may be used as the second mouth corner point.
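A minimal Python sketch of the SUSAN mouth corner response described in steps S24B1 to S24B4, using the exponential similarity of formula (8) and the response of formula (10); the template radius and the gray difference threshold t are assumptions:

import numpy as np

def susan_response(gray, t=27.0, radius=3):
    h, w = gray.shape
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = (xx ** 2 + yy ** 2) <= radius ** 2          # circular template D(r0)
    offsets = np.argwhere(mask) - radius
    g = 0.75 * mask.sum()                              # geometric threshold g = 0.75 * n_max
    response = np.zeros(gray.shape, dtype=np.float64)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            i0 = float(gray[y, x])                     # gray value of the nucleus I(r0)
            n = 0.0
            for dy, dx in offsets:
                diff = (float(gray[y + dy, x + dx]) - i0) / t
                n += np.exp(-diff ** 6)                # similarity c(r, r0) of formula (8)
            if n < g:                                  # edge response of formula (10)
                response[y, x] = g - n
    return response                                     # the two largest responses give the two mouth corner points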
Further, in the above embodiment, the face feature points in step S2 include: contour feature points;
step S2 specifically includes a step of acquiring contour feature points:
step S21C, carrying out denoising processing on the human face area to be detected, and calculating the gradient value of the human face area to be detected after denoising processing;
it should be noted that the gradient value is different from the gradient feature, the gradient value is a constant, and the gradient feature is a vector.
A Canny operator is used to calculate the gradient value and gradient direction, non-maximum suppression is applied to the gradient values, and edges are detected and connected with a dual-threshold algorithm to obtain more accurate image edges.
The step S21C may specifically include the following steps: and denoising the human face region to be detected by using Gaussian filtering, and calculating the gradient value of the denoised human face region to be detected by using a Canny operator.
Step S22C, applying non-maximum suppression to the gradient values, detecting and connecting edges with the dual-threshold algorithm to obtain the image edge, which is the face contour, and enhancing the gradient feature in the vertical direction of the detected face contour;
step S23C, performing binarization processing on the face contour subjected to enhancement processing to obtain an edge curve distributed in the vertical direction;
step S24C, searching from the first mouth corner point to the adjacent edge curve to obtain an edge point, and setting the edge point obtained by searching as a first contour feature point;
and searching from the second mouth corner point to the adjacent edge curve to obtain edge points, and setting the edge points obtained by searching as second contour feature points.
It should be noted that, when the first mouth corner point is the left mouth corner point, the adjacent edge curve is on the left side of the left mouth corner point, that is, the edge point is searched on the adjacent edge curve from the left mouth corner point to the left;
similarly, when the second mouth corner point is the right mouth corner point, the adjacent edge curve is right to the right mouth corner point.
Further, in the above embodiment, in step S21C, the gaussian filtering is used to perform denoising processing on the human face region to be detected, as shown in the following formula (11):
G_f(x) = (1 / (sqrt(2π)·σ)) · exp( -x^2 / (2σ^2) );  (11)

wherein x is used to represent the abscissa (the offset from the kernel center);
G_f(x) is used to represent the Gaussian filter;
σ is used to represent the standard deviation of the Gaussian kernel.
Further, in the above embodiment, in step S21C, a Canny operator is used to calculate and obtain the gradient value of the denoised face region to be detected, as shown in the following formulas (12) to (16).
The Canny operator consists of two convolution templates, one in the x direction and one in the y direction. Let f(x, y) represent the gray value of the point (x, y); the first-order partial derivatives in the x direction and the y direction are calculated respectively.
P(x, y) = [ f(x+1, y) - f(x, y) + f(x+1, y+1) - f(x, y+1) ] / 2,
S(x, y) = [ f(x, y+1) - f(x, y) + f(x+1, y+1) - f(x+1, y) ] / 2,
M(x, y) = sqrt( P(x, y)^2 + S(x, y)^2 ),
θ(x, y) = arctan( S(x, y) / P(x, y) );  (12) to (16)

(the 2x2 convolution templates Sx and Sy themselves are given as equation images in the original and are not reproduced here)

wherein, in the above formulas (12) to (16),
Sx and Sy represent the two convolution templates of the Canny operator in the x direction and the y direction;
f(x, y) is expressed as the gray value of the point (x, y);
P(x, y) and S(x, y) represent the first partial derivatives in the x direction and the y direction;
M(x, y) represents the gradient value;
θ(x, y) represents the gradient direction.
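The gradient computation of formulas (12) to (16) can be sketched with the classical 2x2 first-difference neighbourhood (assumed here to correspond to the patent's Sx and Sy templates):

import numpy as np

def canny_gradient(f):
    f = f.astype(np.float64)
    # first partial derivatives averaged over the 2x2 neighbourhood, P in x and S in y
    p = 0.5 * (f[:-1, 1:] - f[:-1, :-1] + f[1:, 1:] - f[1:, :-1])
    s = 0.5 * (f[1:, :-1] - f[:-1, :-1] + f[1:, 1:] - f[:-1, 1:])
    m = np.hypot(p, s)            # gradient value M(x, y)
    theta = np.arctan2(s, p)      # gradient direction theta(x, y)
    return m, theta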
Further, in the above embodiment, step S3 specifically includes:
step S31, extracting the eye socket contour according to the face feature points;
step S32, image processing is carried out on the orbit contour, and a red component in the orbit contour is reserved to obtain a processed orbit contour;
step S33, taking the pixel point through which the most gradient vectors pass in the orbit contour as the iris gradient center;
wherein, the number of gradient vectors passing through the pixel point is calculated by the following formula (17);
c_c = (1/N) Σ_{i=1}^{N} w_c · (d_i^T g_i)^2,   with d_i = (p_i - c) / || p_i - c ||;  (17)

wherein c_c is the gradient-vote score of a candidate center c, interpreted as the number of gradient vectors passing through the pixel point; the iris gradient center is the point with the largest c_c, i.e. the point through which the most gradient vectors pass, so that the dot product of the normalized vector d_i pointing from the center to the point p_i and the normalized gradient g_i is maximal;
p_i is used to represent a point pointed to from the gradient center c;
d_i is used to represent the normalized vector pointing from c to the point p_i;
g_i is used to represent the normalized gradient at the point p_i;
i is used to index the pixel points;
c is used to represent a candidate gradient center;
N is used to represent the number of pixel points of the orbit contour;
w_c is used to represent the probability weight that the pixel becomes the iris gradient center, w_c = 255 - I(c_x, c_y), i.e. the lower the pixel value at c, the more likely it is the position of the iris gradient center;
x_i and y_i are used to represent the abscissa and ordinate of the i-th point.
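The gradient-vote search of formula (17) can be sketched as below; the Sobel gradients, the percentile used to keep only strong gradient points and the exhaustive scan over candidate centers are choices made for clarity, not requirements of the patent:

import cv2
import numpy as np

def iris_gradient_center(eye_gray):
    eye = eye_gray.astype(np.float64)
    gx = cv2.Sobel(eye, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(eye, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    sel = mag > np.percentile(mag, 90)              # keep only strong gradient points (assumption)
    pts = np.argwhere(sel)                          # (y, x) coordinates of the points p_i
    g = np.stack([gx[sel], gy[sel]], axis=1) / mag[sel][:, None]   # normalized gradients g_i
    h, w = eye.shape
    best, center = -1.0, (w // 2, h // 2)
    for cy in range(h):
        for cx in range(w):
            d = pts[:, ::-1] - np.array([cx, cy])   # vectors from the candidate center c to p_i
            norm = np.linalg.norm(d, axis=1)
            norm[norm == 0] = 1.0
            d = d / norm[:, None]                   # normalized d_i
            score = np.mean(np.maximum(d[:, 0] * g[:, 0] + d[:, 1] * g[:, 1], 0.0) ** 2)
            score *= 255.0 - eye[cy, cx]            # weight w_c = 255 - I(c_x, c_y)
            if score > best:
                best, center = score, (cx, cy)
    return center                                    # (x, y) of the iris gradient center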
step S34, the sliding window takes the iris gradient central point as the starting point to search the iris edge to obtain the iris outline;
as shown in the following formula (18):

E = (1/N) Σ_{j=1}^{N} I_j - (1/N) Σ_{i=1}^{N} I_i;  (18)

wherein I_i and I_j (i = 1, 2, …, N, j = 1, 2, …, N) are the pixel values of the pixels in the left window and the right window of the sliding window, respectively;
N is used to represent the number of pixel points in the sliding window;
when the sliding window searches to the right (the angular difference between the search direction θ and the initial rotation angle θ_o of the eye lies within a preset range), the energy E of the sliding window is the difference between the pixel average of the right window and the pixel average of the left window, and the opposite holds when searching to the left.
Taking sliding to the right as an example, starting from a starting point, the left window and the right window are both in the iris, the pixel value difference is small, when the window slides to the right window and begins to leave the iris feature, a white pixel region begins to appear in the right window, the pixel average value of the right window is increased at the moment, the left window is still in the iris, and the pixel average value is basically kept unchanged. Thereafter, the energy E of the sliding window slowly increases as it slides to the right, and slowly decreases as the right window leaves the iris region and the left window also begins to leave the iris region until the entire sliding window leaves the iris region.
Thus, the iris profile (where there may be small peaks, resulting from screen reflection hot spots) can be obtained by sliding the peaks of the energy E of the window.
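A minimal sketch of the sliding-window energy of formula (18) for a rightward search along the row of the iris gradient center (the window size and the fixed search row are assumptions; the patent also searches in other directions):

import numpy as np

def window_energy_profile(eye_gray, center, half=4, max_step=40):
    cx, cy = center
    h, w = eye_gray.shape
    energies = []
    for step in range(max_step):
        x = cx + step                                # slide the window center to the right
        if x - 2 * half < 0 or x + 2 * half >= w or cy - half < 0 or cy + half >= h:
            break
        left = eye_gray[cy - half:cy + half + 1, x - 2 * half:x].astype(np.float64)
        right = eye_gray[cy - half:cy + half + 1, x:x + 2 * half].astype(np.float64)
        energies.append(right.mean() - left.mean())  # energy E of formula (18)
    peak = int(np.argmax(energies)) if energies else 0
    return peak, energies                            # the peak of E marks the iris edge in this direction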
Step S35, carrying out ellipse fitting on the iris outline by using a least square method, calculating the center coordinate of an ellipse obtained by fitting, and taking the center coordinate as the center position of the iris;
wherein, ellipse fitting is performed on the iris outline as shown in the following formula (19):
x^2 + a0·x·y + a1·y^2 + a2·x + a3·y + a4 = 0;  (19)
in the above formula (19), x is used to represent the abscissa;
a0, a1, a2, a3 and a4 are used to represent fitting parameters;
y is used to represent the ordinate;
wherein, the center coordinates of the fitted ellipse are calculated by the following formula (20):

I_x = (a0·a3 - 2·a1·a2) / (4·a1 - a0^2),   I_y = (a0·a2 - 2·a3) / (4·a1 - a0^2);  (20)

wherein, in the above formula (20), I_x is used to represent the abscissa of the ellipse center;
I_y is used to represent the ordinate of the ellipse center.
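The least-squares conic fit of formula (19) and the closed-form center of formula (20) can be sketched as follows; the center expressions follow from setting the partial derivatives of the conic to zero:

import numpy as np

def fit_ellipse_center(points):
    # points: (N, 2) array of iris contour pixels (x, y); solves formula (19) in the least-squares sense.
    x, y = points[:, 0].astype(np.float64), points[:, 1].astype(np.float64)
    a_mat = np.stack([x * y, y ** 2, x, y, np.ones_like(x)], axis=1)
    b_vec = -(x ** 2)
    a0, a1, a2, a3, a4 = np.linalg.lstsq(a_mat, b_vec, rcond=None)[0]
    denom = 4.0 * a1 - a0 ** 2
    ix = (a0 * a3 - 2.0 * a1 * a2) / denom   # I_x of formula (20)
    iy = (a0 * a2 - 2.0 * a3) / denom        # I_y of formula (20)
    return ix, iy                            # the iris center position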
Further, in the above embodiment, step S4 specifically includes the following steps:
step S41, establishing a projection mapping matrix according to a projection mapping algorithm, wherein the projection mapping matrix is used for representing the projection mapping relation, in the image coordinate system, between the coordinates of corresponding pixel points on the face image to be detected (denoted as a first plane Q1) and on the mirror-image face image (denoted as a second plane Q2), the first plane Q1 and the second plane Q2 being in a mirror-image relation, as shown in FIG. 4;
s · [x', y', 1]^T = H_p · [x, y, 1]^T,   H_p = [ h11 h12 h13; h21 h22 h23; h31 h32 h33 ];  (21)

wherein h11 to h33 represent the 9 mapping coefficients;
H_p is used to represent the projection mapping matrix of the projection mapping relation, (x, y) are the coordinates of a pixel point on the first plane Q1, (x', y') are the coordinates of the corresponding pixel point on the second plane Q2, and s is a scale factor;
step S42, normalizing the projection mapping matrix to create the projection mapping formula, and calculating, according to the projection mapping formula, the position coordinates on the mirror-image face image Q2 of the face feature points of the face image Q1 to be detected;
in a preferred embodiment, the projection mapping matrix is normalized by setting h33 in the above formula (21) to 1, which converts it into the following formula (22); with formula (22), the position coordinates on the second plane Q2 of the face feature points imaged at their positions on the first plane Q1 can be calculated:

x' = (h11·x + h12·y + h13) / (h31·x + h32·y + 1),   y' = (h21·x + h22·y + h23) / (h31·x + h32·y + 1);  (22)
step S43, substituting the position coordinates of the reference points among the face feature points on the face image Q1 to be detected and on the mirror-image face image Q2 into a preset linear formula to obtain a preset number of linear equations;
as a preferred embodiment, 4 feature points, namely the first outer canthus feature point, the second outer canthus feature point, the first mouth corner point and the second mouth corner point, are used as reference points; according to the projection mapping formula, the position coordinates of the 4 reference points on the mirror-image face image Q2 and their position coordinates on the face image Q1 to be detected are substituted into formula (23), yielding 8 linear equations:

x'·(h31·x + h32·y + 1) = h11·x + h12·y + h13,   y'·(h31·x + h32·y + 1) = h21·x + h22·y + h23;  (23)
Step S44, correcting the iris center position according to the projection mapping algorithm, as shown in the following formula (24):

s · [I'_x, I'_y, 1]^T = H_p · [I_x, I_y, 1]^T;  (24)

wherein s is used to represent a calculation parameter (the projective scale factor);
I = (I_x, I_y) is used to represent the iris center position in the face image to be detected;
I' = (I'_x, I'_y) is used to represent the iris center position in the mirror-image face image;
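Steps S41 to S44 can be sketched as follows: the 8 linear equations of formula (23) built from the 4 reference points are solved for the mapping coefficients h11 to h32 (with h33 fixed to 1), and the resulting matrix is applied to the iris center as in formula (24); the function names are assumptions:

import numpy as np

def estimate_homography(src_pts, dst_pts):
    # src_pts / dst_pts: 4 corresponding (x, y) reference points on Q1 and Q2.
    a, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(a, np.float64), np.array(b, np.float64))
    return np.append(h, 1.0).reshape(3, 3)          # H_p with h33 = 1

def map_point(h_p, point):
    # Applies formula (24): maps a point (e.g. the iris center) from Q1 onto Q2.
    u, v, s = h_p @ np.array([point[0], point[1], 1.0])
    return u / s, v / s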
step S45, obtaining the eye movement vector by combining the corrected face feature points and the corrected iris center position, as expressed by formulas (25) to (27) (the original equation images are not reproduced here);

wherein, in formulas (25) to (27), I'_l and I'_r are expressed as the corrected iris center positions of the left eye and the right eye, respectively, C_il and C_ol are expressed as the corrected inner and outer corner point positions of the left eye, respectively, C_ir and C_or are expressed as the corrected inner and outer corner point positions of the right eye, respectively, and v represents the corrected eye movement vector;
in the subscripts, o denotes outer, i denotes inner, l denotes left and r denotes right;
v represents the finally obtained corrected eye movement vector, and (p_x, p_y) represents the horizontal and vertical coordinates of the fixation point obtained by the fixation point mapping formula;
step S46, calculating an initial fixation point coordinate according to a preset fixation point mapping formula and an eye movement vector;
in step S47, the initial fixation point coordinates calculated from the above formula (27) are compensated using the SVR model to obtain the final fixation point coordinates.
Further, in the above embodiment, step S4 further includes a step of creating an SVR model, which specifically includes the following steps:
step A1, acquiring the calibration points of the preset calibration point number in the face image to be detected, and acquiring a training data set corresponding to the calibration points, which is acquired after the user performs the fixation motion on each calibration point;
and step A2, inputting the training data set into the initial model for SVR training to obtain the SVR model.
As a preferred embodiment, while the user corresponding to the face image to be detected gazes at a designated calibration point (since sample acquisition at each point takes a long time, 5 of the 9 points are selected; for example, SVR training can be performed on the five points 0, 2, 4, 6 and 8 in fig. 5) and keeps watching the same point, head movements are performed (first depth movement, then free head movement: rotation, offset, pitch and translation in turn), feature information is acquired in real time, and the input sample vectors V_i are formed. Meanwhile, the eye movement vector is corrected and a fixation point estimate is obtained through the polynomial mapping model; the displacement deviation of the fixation point estimate from the real coordinate value is (Δx, Δy). Two directional training sets are constructed: {(V_1, Δx_1), …, (V_N, Δx_N)} and {(V_1, Δy_1), …, (V_N, Δy_N)}, where N is the number of samples, and the two training sets are trained separately;
the initial model is a support vector regression model with an RBF (radial basis function) kernel, which can regress complex relations. A grid search method is then used to search the parameters of the RBF-kernel model to obtain the initial model, the main search parameters being the balance parameter C, the loss function parameter ε and the kernel parameter γ;
and performing SVR training on the two direction training sets by using the initial model to obtain an optimal regression model, wherein the optimal regression model is an SVR model.
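A minimal sketch of the SVR drop-point compensation of steps A1 and A2 and of formula (29), using scikit-learn's SVR with an RBF kernel and a grid search over C, epsilon and gamma (the grid values and the 3-fold cross-validation are assumptions):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def train_compensation_models(v_samples, dx, dy):
    # v_samples: (N, d) input sample vectors V_i; dx, dy: the offsets of the two training sets.
    grid = {'C': [1, 10, 100], 'epsilon': [0.01, 0.1, 1.0], 'gamma': ['scale', 0.01, 0.1]}
    model_x = GridSearchCV(SVR(kernel='rbf'), grid, cv=3).fit(v_samples, dx)
    model_y = GridSearchCV(SVR(kernel='rbf'), grid, cv=3).fit(v_samples, dy)
    return model_x.best_estimator_, model_y.best_estimator_

def compensate(model_x, model_y, v, px, py):
    # Formula (29): add the predicted offsets (Y_x, Y_y) to the initial fixation point (P_x, P_y).
    v = np.asarray(v, np.float64).reshape(1, -1)
    return px + float(model_x.predict(v)[0]), py + float(model_y.predict(v)[0])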
Further, in the above embodiment, performing the compensation calculation on the initial fixation point coordinates with the SVR model in step S4 specifically comprises the following steps: the offsets of the initial fixation point coordinates in the two directions are calculated by the SVR gaze drop-point compensation model, as shown in formula (28) (the original equation image is not reproduced here);

wherein Y_x is used to represent the offset on the abscissa;
Y_y is used to represent the offset on the ordinate;
and performing compensation calculation on the initial fixation point coordinate according to the offset in the two directions to obtain a final fixation point coordinate.
Further, in the above embodiment, performing compensation calculation on the initial gaze point coordinates according to the offset in two directions to obtain final gaze point coordinates specifically includes the following steps:
step B1: the input vector X is recorded as the head motion information, and the offset of the gaze landing point relative to the calibration position is recorded as (Δx, Δy); the offset changes caused by the different head motions are then handled as follows (a sketch of this feature construction is given after step B6);
step B2: depth motion, namely motion of the head perpendicular to the image plane, appears in the image as a scale change; it is expressed as the ratio of the distance between the inner and outer corner points of the left and right eyes to the corresponding distance at the calibration position;
step B3: translational motion, namely motion of the head parallel to the image plane, appears in the image as a change of the feature point coordinates; it is expressed as the coordinate displacement of the midpoint of the line segment formed by the inner corner points;
step B4: left-right rotation appears in the image as a change of the feature point coordinates and a change of the distances between the left and right eye corner points;
step B5: left-right tilting appears in the image as a change of the inclination angle of the line connecting the inner corner points;
step B6: up-down pitching appears in the image as a change of the feature point coordinates and a change of the distances between the left and right eye corner points;
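The sketch referenced in step B1 is given below; it assembles a head-motion feature vector from the inner and outer eye-corner points, assuming the corresponding corner coordinates at the calibration position are available. The exact composition of the patent's input vector is not reproduced here, so the particular features chosen are assumptions.

import numpy as np

def head_motion_features(inner_l, outer_l, inner_r, outer_r,
                         inner_l0, outer_l0, inner_r0, outer_r0):
    """Current eye-corner points vs. the same points at the calibration position (suffix 0)."""
    def dist(a, b):
        return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

    # Step B2: depth motion -> scale change, expressed as the ratio of the current
    # inner/outer corner distances to the distances at the calibration position.
    scale = (dist(inner_l, outer_l) + dist(inner_r, outer_r)) / \
            (dist(inner_l0, outer_l0) + dist(inner_r0, outer_r0))

    # Step B3: translation -> displacement of the midpoint of the inner-corner segment.
    mid = (np.asarray(inner_l, float) + np.asarray(inner_r, float)) / 2.0
    mid0 = (np.asarray(inner_l0, float) + np.asarray(inner_r0, float)) / 2.0
    tx, ty = (mid - mid0).tolist()

    # Step B5: left-right tilt -> change of the inclination of the inner-corner line.
    def angle(a, b):
        d = np.asarray(b, float) - np.asarray(a, float)
        return float(np.arctan2(d[1], d[0]))
    tilt = angle(inner_l, inner_r) - angle(inner_l0, inner_r0)

    # Steps B4/B6: rotation and pitch are reflected in feature point coordinates and
    # corner-distance changes; a left/right corner-distance ratio is used as a proxy here.
    lr_ratio = dist(inner_l, outer_l) / max(dist(inner_r, outer_r), 1e-6)

    return np.array([scale, tx, ty, tilt, lr_ratio])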
The final fixation point coordinate is obtained by superimposing the offset calculated by the gaze landing point compensation model on the coordinate given by the polynomial mapping equation, as shown in equation (29):
(Sx, Sy) = (Px, Py) + (Yx, Yy);  (29)
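A minimal sketch of this final step follows, assuming the two direction-wise regressors from the training sketch above (svr_x, svr_y) and the head-motion feature vector as input; the function name and arguments are illustrative.

import numpy as np

def compensate_gaze_point(px, py, feature_vec, svr_x, svr_y):
    """Apply equation (29): add the SVR-predicted offsets (Yx, Yy) to the
    fixation point (Px, Py) given by the polynomial mapping formula."""
    X = np.asarray(feature_vec, dtype=float).reshape(1, -1)
    yx = float(svr_x.predict(X)[0])  # offset on the abscissa, Yx
    yy = float(svr_y.predict(X)[0])  # offset on the ordinate, Yy
    return px + yx, py + yy          # final fixation point (Sx, Sy)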
The invention also provides a gaze tracking system combining skin color and iris characteristics, comprising:
the extraction module is used for extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
the characteristic point acquisition module is used for acquiring human face characteristic points with the number of preset characteristic points in the human face region to be detected by an enhanced gradient characteristic method;
the iris center calculation module is used for extracting an eye socket outline according to the characteristic points of the human face, searching and acquiring an iris edge in the eye socket outline by adopting a sliding window, and calculating to obtain an iris center position according to the iris edge;
and the fixation point calculation module corrects the iris center position and the face characteristic points by adopting a projection mapping algorithm, obtains eye movement vectors by combining the corrected face characteristic points and the corrected iris center position, calculates an initial fixation point coordinate according to the eye movement vectors, and performs compensation calculation on the initial fixation point coordinate by using an SVR (support vector regression) model to obtain a final fixation point coordinate.
It should be noted that, the embodiments of the gaze tracking system of the present invention are the same as the embodiments of the gaze tracking method, and are not described herein again.
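For orientation, a skeleton of how the four modules could be composed is sketched below; the class and method names are assumptions, and each method body merely stands in for the corresponding step of the method described above.

class GazeTrackingSystem:
    """Assumed wiring of the four modules; not the patent's actual implementation."""

    def __init__(self, segmentation_model, svr_x, svr_y):
        self.segmentation_model = segmentation_model  # pre-trained skin-color face region model
        self.svr_x, self.svr_y = svr_x, svr_y          # direction-wise gaze compensation models

    def extract_face_region(self, frame):
        raise NotImplementedError("extraction module: skin-color based face region segmentation")

    def detect_feature_points(self, face_region):
        raise NotImplementedError("feature point acquisition module: enhanced gradient features")

    def locate_iris_center(self, face_region, feature_points):
        raise NotImplementedError("iris center calculation module: sliding-window iris edge search")

    def estimate_gaze_point(self, feature_points, iris_center):
        raise NotImplementedError("fixation point calculation module: projection mapping correction, "
                                  "mapping formula and SVR compensation")

    def track(self, frame):
        face_region = self.extract_face_region(frame)
        feature_points = self.detect_feature_points(face_region)
        iris_center = self.locate_iris_center(face_region, feature_points)
        return self.estimate_gaze_point(feature_points, iris_center)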
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A sight tracking method combining skin color and iris characteristics is characterized by comprising the following steps:
step S1, extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
step S2, obtaining human face feature points with the number of preset feature points in the human face area to be detected by an enhanced gradient feature method;
step S3, extracting and obtaining an eye socket outline according to the human face characteristic points, and searching and obtaining an iris edge in the eye socket outline by adopting a sliding window so as to obtain an iris center position according to the iris edge;
and step S4, correcting the iris center position by adopting a projection mapping algorithm, obtaining an eye movement vector by combining the human face characteristic points and the corrected iris center position, correcting the eye movement vector by adopting the projection mapping algorithm, calculating to obtain an initial fixation point coordinate according to the corrected eye movement vector, and performing compensation calculation on the initial fixation point coordinate by using an SVR (support vector regression) model to obtain a final fixation point coordinate.
2. The gaze tracking method according to claim 1, wherein the step S1 specifically comprises a step of creating the face region segmentation model and a step of extracting a face region to be detected:
the step of creating the face region segmentation model specifically includes:
step S11, acquiring a training image set, wherein the training image set comprises a plurality of training face images including training faces;
step S12, preprocessing the training face image to obtain a processed image;
step S13, mapping the RGB color space of the processed image to the YCbCr color space, obtaining a chrominance vector of each pixel of the processed image, obtaining a chrominance vector sample set corresponding to the training image set, where the chrominance vector sample set includes a chrominance vector of each pixel of the processed image corresponding to each of the training images in the training image set;
step S14, calculating the spatial distribution of the chroma vector sample set to obtain the statistical characteristics of the skin color values;
step S15, skin color mean parameters and covariance matrix parameters are obtained according to the statistical characteristic analysis, and the face region segmentation model is obtained according to the skin color mean parameters and the covariance matrix parameters in a fitting mode;
the step of extracting the face region to be detected specifically comprises the following steps:
step S16, calculating to obtain a likelihood value matrix corresponding to the face image to be detected through the face region segmentation model, wherein the likelihood value matrix comprises the similarity between each pixel point in the face image to be detected and skin color;
and step S17, sequentially separating the maximum similarity in the likelihood value matrix to separate a skin color area from a background area so as to extract a face area to be detected.
3. The gaze tracking method according to claim 1, wherein the face feature points in step S2 include: canthus feature points;
the step S2 specifically includes a step of acquiring the canthus feature point:
step S21A: acquiring a face area to be detected, and enhancing the gradient characteristics of the face area to be detected in the horizontal direction;
step S22A: respectively carrying out differential projection mapping on the horizontal direction and the vertical direction of the enhanced human face area to be detected to obtain an orbit contour;
step S23A: carrying out binarization processing on an eye image where the eye socket outline is located;
step S24A: searching for white pixels from the two ends of the binarized eye image towards its center, and setting the first white pixels obtained by the searches as a first external canthus feature point and a second external canthus feature point;
searching for white pixels from the center of the binarized eye image towards its two ends, and setting the first white pixels obtained by the searches as a first inner canthus feature point and a second inner canthus feature point;
the face feature points in step S2 include: a mouth corner feature point;
the step S2 specifically includes a step of acquiring the mouth angle feature point:
step S21B: acquiring a face area to be detected, and enhancing gradient characteristics of the face area to be detected in the face vertical direction;
step S22B: taking the position located below the lower boundary of the orbit contour at a distance of six times the height of the orbit contour as the initial boundary of the mouth region;
step S23B: performing smoothing filtering on the projection curve based on the initial boundary, and taking the position of the lower extreme value as the longitudinal coordinate of the mouth region;
searching an inflection point of the gray level projection curve upwards according to the position, and setting a vertical coordinate of the inflection point as an upper boundary of the mouth area;
searching an inflection point of the gray level projection curve downwards according to the position, and setting a vertical coordinate of the inflection point as a lower boundary of the mouth region;
creating an upper boundary and a lower boundary of the mouth region to obtain the mouth region;
step S24B: calculating a first mouth corner point and a second mouth corner point of the mouth region by adopting an SUSAN algorithm;
the face feature points in step S2 include: contour feature points;
the step S2 specifically includes the step of acquiring the contour feature points:
step S21C, carrying out denoising processing on the human face area to be detected, and calculating to obtain the gradient value of the human face area to be detected after denoising processing;
step S22C, performing enhancement processing on the gradient feature in the vertical direction of the detected face contour;
step S23C, performing binarization processing on the face contour subjected to enhancement processing to obtain an edge curve distributed in the vertical direction;
step S24C, searching from the first mouth corner point to the adjacent edge curve to obtain an edge point, and setting the edge point obtained by searching as a first contour feature point;
and searching from the second mouth corner point to the adjacent edge curve to obtain edge points, and setting the edge points obtained by searching as second contour feature points.
4. The gaze tracking method according to claim 3, wherein the step S24B specifically comprises the steps of:
step S24B1, sliding a preset circular template in the mouth region, acquiring the absolute value of the difference between the gray value of each pixel point covered by the circular template and the gray value of a preset nucleus pixel point, and judging, according to the absolute value and an absolute value threshold, whether the pixel point belongs to the univalue segment assimilating nucleus (USAN) region of the preset nucleus pixel point;
step S24B2, counting, according to the comparison results, all pixel points belonging to the USAN region to obtain the USAN count and USAN area;
step S24B3: comparing the USAN count with a USAN count threshold to obtain an edge response according to the comparison result;
and step S24B4, taking pixel points corresponding to the two maximum edge responses as a first mouth corner point and a second mouth corner point.
5. The gaze tracking method according to claim 3, wherein in step S21C, a Canny operator is used to calculate the gradient values of the denoised face region to be detected; the Canny operator is composed of two convolution templates, one in the x direction and one in the y direction.
6. The gaze tracking method according to claim 1, wherein the step S3 specifically comprises:
step S31, extracting the eye socket contour according to the face feature points;
step S32, image processing is carried out on the orbit contour, and a red component in the orbit contour is reserved to obtain a processed orbit contour;
step S33, taking the pixel point through which the most gradient vectors pass in the orbit contour as the iris gradient center;
step S34, the sliding window takes the iris gradient central point as a starting point to carry out iris edge search so as to obtain an iris outline;
and step S35, performing ellipse fitting on the iris outline by using a least square method, calculating the center coordinate of the ellipse obtained by fitting, and taking the center coordinate as the center position of the iris.
7. The gaze tracking method according to claim 1, wherein the step S4 specifically comprises the steps of:
step S41, establishing a projection mapping matrix according to a projection mapping algorithm, wherein the projection mapping matrix is used for representing, in an image coordinate system, the projection mapping relation between the coordinates of corresponding pixel points on the face image to be detected and on the mirror-image face image, the first plane and the second plane being in a mirror-image relation;
step S42, carrying out normalization processing on the projection mapping matrix to create a projection mapping formula, and calculating the position coordinates of the face feature points of the face image to be detected on the mirror-image face image according to the projection mapping formula;
step S43, substituting the position coordinates of reference points among the face feature points on the face image to be detected and on the mirror-image face image into a preset linear formula to obtain a preset number of linear equations;
step S44, correcting the center position of the iris according to a projection mapping algorithm;
step S45, combining the corrected human face feature points and the corrected iris center position to obtain an eye movement vector;
step S46, calculating an initial fixation point coordinate according to a preset fixation point mapping formula and the eye movement vector;
and step S47, performing compensation calculation on the initial fixation point coordinates by using an SVR model to obtain final fixation point coordinates.
8. The gaze tracking method according to claim 1, wherein the step S4 further comprises the step of creating an SVR model, in particular comprising the steps of:
step A1, acquiring the calibration points of the number of the preset calibration points in the face image to be detected, and acquiring a training data set corresponding to the calibration points, which is acquired after the user performs the fixation motion on each calibration point;
and step A2, inputting the training data set into the initial model for SVR training to obtain the SVR model.
9. The gaze tracking method according to claim 1, wherein the compensation calculation of the initial fixation point coordinates using the SVR model in step S4 specifically comprises the following steps: calculating the offsets of the initial fixation point coordinate in two directions through a gaze landing point compensation model based on the SVR model;
and performing compensation calculation on the initial fixation point coordinate according to the offset in the two directions to obtain a final fixation point coordinate.
10. An eye tracking system that combines skin tone and iris characteristics, comprising:
the extraction module is used for extracting a face region to be detected from a face image to be detected comprising a face to be detected by adopting a pre-trained face region segmentation model;
the characteristic point acquisition module is used for acquiring the human face characteristic points with the preset number of characteristic points in the human face area to be detected by an enhanced gradient characteristic method;
the iris center calculation module is used for extracting an eye socket outline according to the human face characteristic points, searching and acquiring an iris edge in the eye socket outline by adopting a sliding window, and calculating to obtain an iris center position according to the iris edge;
and the fixation point calculation module corrects the iris center position and the face characteristic points by adopting a projection mapping algorithm, obtains eye movement vectors by combining the corrected face characteristic points and the corrected iris center position, calculates to obtain initial fixation point coordinates according to the eye movement vectors, and performs compensation calculation on the initial fixation point coordinates by using an SVR (support vector regression) model to obtain final fixation point coordinates.
CN202110674313.5A 2021-06-17 2021-06-17 Sight tracking method combining skin color and iris characteristics Pending CN113408408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110674313.5A CN113408408A (en) 2021-06-17 2021-06-17 Sight tracking method combining skin color and iris characteristics

Publications (1)

Publication Number Publication Date
CN113408408A true CN113408408A (en) 2021-09-17

Family

ID=77684932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110674313.5A Pending CN113408408A (en) 2021-06-17 2021-06-17 Sight tracking method combining skin color and iris characteristics

Country Status (1)

Country Link
CN (1) CN113408408A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632132A (en) * 2012-12-11 2014-03-12 广西工学院 Face detection and recognition method based on skin color segmentation and template matching
WO2015025103A2 (en) * 2013-08-23 2015-02-26 Morpho Decision device provided to decide whether an eye is real or false
CN106066696A (en) * 2016-06-08 2016-11-02 华南理工大学 The sight tracing compensated based on projection mapping correction and point of fixation under natural light
CN109543518A (en) * 2018-10-16 2019-03-29 天津大学 A kind of human face precise recognition method based on integral projection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李默: "基于双CCD的人眼视线追踪系统研究与设计", 《中国优秀硕士学位论文全文数据库(硕士)•信息科技辑》 *
赵俊宝: "驾驶员头部姿态估计系统的设计与实现", 《中国优秀硕士学位论文全文数据库(硕士)•信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310023 room 414, floor 4, building 3, Xixi Ginza, Xihu District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Jiaxuan Information Technology Co.,Ltd.

Address before: 310023 Room 201, building 8, No. 18, Pingxin Road, Xihu District, Hangzhou City, Zhejiang Province

Applicant before: Hangzhou Jiaxuan Information Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20210917