CN114445267B - Eye movement tracking method and device based on retina image - Google Patents

Eye movement tracking method and device based on retina image

Info

Publication number
CN114445267B
CN114445267B CN202210108603.8A
Authority
CN
China
Prior art keywords
local area
image
affine transformation
reference frame
new
Prior art date
Legal status
Active
Application number
CN202210108603.8A
Other languages
Chinese (zh)
Other versions
CN114445267A (en)
Inventor
殷琪
付鹏
李凯文
刘若阳
徐梦晨
张杰
Current Assignee
Nanjing Boshi Medical Technology Co ltd
Original Assignee
Nanjing Boshi Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Boshi Medical Technology Co ltd filed Critical Nanjing Boshi Medical Technology Co ltd
Priority to CN202210108603.8A priority Critical patent/CN114445267B/en
Publication of CN114445267A publication Critical patent/CN114445267A/en
Application granted granted Critical
Publication of CN114445267B publication Critical patent/CN114445267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/02 Affine transformations
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30041 Eye; Retina; Ophthalmic
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an eye movement tracking method and device based on retinal images, and relates to the field of image processing. A reference frame image and a current frame image are acquired from a video, and a reference frame local area whose size is smaller than that of the reference frame image is selected from the reference frame image. An affine transformation image is generated from the current frame image through an affine transformation matrix, and the area at the same position as the reference frame local area is found in the affine transformation image as a new local area. The offset between the reference frame local area and the new local area is acquired and taken as the tracking result of the current moment. The current frame image at the next moment is then acquired, the affine transformation matrix is updated, and the method returns to the step of generating an affine transformation image from the current frame image through the affine transformation matrix, so that the offset is acquired repeatedly and real-time tracking is realized. Because the local area is far smaller than the original image while still containing the structural features of the image's effective content, the computational complexity is effectively reduced and real-time performance is improved.

Description

Eye movement tracking method and device based on retina image
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an eye tracking method and apparatus based on retinal images.
Background
Ophthalmic devices commonly used for diagnosis and treatment have imaging portions, such as scanning laser ophthalmoscopes; the imaging portion may be used to obtain retinal images from which eye movement is tracked, particularly in therapeutic applications. Some Augmented Reality (AR) and Virtual Reality (VR) application schemes also employ eye tracking based on retinal images to provide real-time feedback to a display system.
In ophthalmic medical systems, to keep tracking real-time, the imaging portion acquires images at a high frame rate while the computation time allowed for post-processing must be sufficiently small. In addition, eye movement affects the illumination received by the retina, resulting in large brightness changes across the image, which makes it difficult to achieve satisfactory confidence during eye tracking or image post-processing and affects the accuracy and reliability of the system.
Tracking methods can be divided into two paths: one computes transformation parameters from the overall similarity between images, for example by template matching, cross-correlation, or information-entropy maximization; the other is based on image registration and first extracts feature points in each image, such as Haar-like features or Scale-Invariant Feature Transform (SIFT) features, then finds matching points and solves for the transformation parameters.
However, such methods tend to recover only a few transformation parameters or to have high computational complexity, and they are susceptible to illumination variation and imaging quality, so their applicability is limited.
In view of the above problems, designing an eye tracking method based on retinal images with low computational complexity is a problem to be solved by those skilled in the art.
Disclosure of Invention
The purpose of the application is to provide an eye movement tracking method and device based on retina images, which can reduce the calculation complexity of eye movement tracking.
In order to solve the above technical problems, the present application provides an eye tracking method based on retina images, including:
s1, acquiring a reference frame image and a current frame image in a video;
s2, selecting a reference frame local area from the reference frame image, wherein the size of the reference frame local area is smaller than that of the reference frame image;
s3, generating an affine transformation image from the current frame image through an affine transformation matrix, and finding out an area with the same position as the local area of the reference frame in the affine transformation image as a new local area;
s4, acquiring the offset between the reference frame local area and the new local area, and taking the offset as a tracking result of the current moment;
s5, updating the affine transformation matrix, acquiring a current frame image at the next moment, and returning to the step S3 to track the eye movement in real time.
Preferably, in step S5, the specific step of updating the affine transformation matrix includes:
acquiring an inverse matrix of the affine transformation matrix;
selecting reference point coordinates in the reference frame local area, overlapping the reference point coordinates with the offset, substituting the reference point coordinates and the inverse matrix into the affine transformation matrix, and calculating to obtain target coordinates of the reference point under the current frame image coordinate system;
and acquiring a new affine transformation matrix according to the reference point coordinates and the target coordinates.
Preferably, steps S3 and S4 are replaced with:
s31, generating an affine transformation image from the reference frame image through an affine transformation matrix to obtain a transformed reference frame local area, and finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area;
s41, obtaining the offset between the transformed reference frame local area and the new local area as a tracking result of the current moment.
Preferably, in step S5, the specific step of updating the affine transformation matrix includes:
selecting reference point coordinates in the reference frame local area, and superposing the coordinates obtained by transforming the reference point coordinates in the step S31 with the offset to obtain new coordinates;
and acquiring a new affine transformation matrix according to the reference point coordinates and the new coordinates.
Preferably, the affine transformation matrix is in the form of:

$$\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

wherein s is the zoom magnification, θ is the rotation angle, t_x is the translation value in the x-axis direction, and t_y is the translation value in the y-axis direction; the number of the new local areas is two or more; when the rotation angle θ is set to 0 and the zoom magnification s is set to 1, the number of the new local areas is one or more.
Preferably, after step S3 or step S31, further comprising:
judging whether the image brightness of the new local area meets a preset condition or not;
if not, rejecting the new local area;
if yes, the flow proceeds to step S4 or step S41.
Preferably, after step S4 or step S41, further includes:
judging whether the tracking result meets the preset requirement;
if not, tracking again;
if yes, the process proceeds to step S5.
Preferably, the specific step of judging whether the tracking result meets the preset requirement is as follows:
judging whether the image confidence of the reference frame local area and the new local area meets a threshold value or whether the difference of the offset of two continuous moments is within a preset range.
Preferably, the specific steps of re-tracking are as follows:
setting the translation parameter of the current frame image to be 0, and carrying out affine transformation on the current frame image by a set of set rotation parameters and scaling parameters to obtain a new current frame image;
acquiring the offset between each reference frame local area and the new current frame image;
determining the corresponding relation between the position of each reference frame local area in the reference frame image and the position of each reference frame local area in the current frame image according to the offset to solve the affine transformation matrix;
judging whether the re-tracking is successful or not;
if yes, returning to the step S3;
if not, affine transformation is carried out on the current frame image by another set of set rotation parameters and scaling parameters to obtain a new current frame image, and the step of obtaining the offset between each reference frame local area and the new current frame image is returned.
In order to solve the above technical problem, the present application further provides an eye tracking device based on a retinal image, including:
the first acquisition module is used for acquiring a reference frame image and a current frame image in the video;
a selecting module, configured to select a reference frame local area from the reference frame image, where the size of the reference frame local area is smaller than the size of the reference frame image;
the generation module is used for generating an affine transformation image of the current frame through an affine transformation matrix; or generating a reference frame affine transformation image by the reference frame image through an affine transformation matrix;
a second obtaining module, configured to find, in an affine transformation image of a current frame, a region with the same position as the reference frame local region as a new local region, and obtain an offset between the reference frame local region and the new local region;
or obtaining a transformed reference frame local area in a reference frame affine transformation image, finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area, and obtaining the offset between the transformed reference frame local area and the new local area;
and the third acquisition module is used for acquiring the current frame image at the next moment, updating the affine transformation matrix, and sequentially triggering the generation module and the second acquisition module so as to track the eye movement in real time.
Compared with the prior art, the eye tracking method based on the retina image has the following beneficial effects:
(1) Because the size of the local area is far smaller than the size of the original image during eye movement tracking, while still being large enough to contain the small structural features of the image's effective content, the method effectively reduces the computational complexity compared with whole-image tracking, lowers the requirements on device hardware, and improves the real-time performance of eye movement tracking.
(2) By selecting a plurality of local areas, more accurate and reliable parameter estimation can be obtained, and the small-range image rotation and scaling caused by eye movement can be covered, achieving a more robust tracking effect.
(3) Meanwhile, by switching between the two states of real-time tracking and re-tracking, the method can relocate the eyeball and resume tracking after image degradation caused by blinking or excessive eyeball movement.
Drawings
For a clearer description of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an eye tracking method based on a retinal image according to an embodiment of the present application;
fig. 2 is a schematic view of selecting a partial region of a reference frame according to an embodiment of the present application;
fig. 3 is a schematic diagram of a current frame image according to an embodiment of the present application;
fig. 4 is a schematic diagram of an affine transformation image provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a position of a new local area in a new current frame image according to an embodiment of the present application;
fig. 6 is a schematic diagram of a location of a local area of an original current frame image according to an embodiment of the present application;
FIG. 7 is a flowchart of another eye tracking method based on retinal images provided in an embodiment of the present application;
FIG. 8 is a flowchart of another eye tracking method based on retinal images provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an eye tracking device based on a retinal image according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments herein without making any inventive effort are intended to fall within the scope of the present application.
The core of the application is to provide an eye movement tracking method and device based on retina images, so as to reduce the calculation complexity of eye movement tracking.
In order to provide a better understanding of the present application, the present application is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of an eye tracking method based on a retinal image according to an embodiment of the present application. It will be appreciated that in ophthalmic diagnostic systems, to keep tracking real-time, the imaging portion acquires images at a high frame rate while the computation time allowed for post-processing must be sufficiently small. In addition, eye movement affects the illumination received by the retina, resulting in large brightness changes across the image, which makes it difficult to achieve satisfactory confidence during eye tracking or image post-processing and affects the accuracy and reliability of the system. In order to solve the above problems, as shown in fig. 1, the eye tracking method includes:
s1, acquiring a reference frame image and a current frame image in a video.
S2, selecting a reference frame local area from the reference frame image, wherein the size of the reference frame local area is smaller than that of the reference frame image.
S3, generating an affine transformation image from the current frame image through an affine transformation matrix, and finding out the area with the same position as the local area of the reference frame in the affine transformation image as a new local area.
And S4, acquiring the offset between the reference frame local area and the new local area, and taking the offset as a tracking result of the current moment.
S5, updating the affine transformation matrix, acquiring a current frame image at the next moment, and returning to the step S3 to track the eye movement in real time.
In this embodiment, to track the eye movement, a reference image of the retina of the eyeball in the eye movement video stream needs to be acquired first, so that subsequent eye movement images can be tracked against it. Therefore, either one frame of the video is acquired, or several consecutive frames are acquired, superimposed, and averaged to serve as the reference frame; the specific way of acquiring the reference frame image is not limited and depends on the implementation. Image processing operations such as filtering and edge enhancement may be performed on the acquired reference frame image to improve image quality, which is likewise not limited.
It should be noted that the reference image selected for comparison with subsequent images in the present application is not the whole reference frame image but a part of it, i.e., a local area. The local area may be a region of interest (Region Of Interest, ROI) obtained with various operators and methods in the prior art, or a specific small area selected manually, which is not limited in this embodiment. Specifically, at least two local areas whose size is far smaller than the overall size of the reference frame image are selected from the reference frame image and are called reference frame local areas. The local areas selected from the reference frame image may be distributed over the various regions of the image. The size of a local area is generally about a few tenths of the overall image size and is able to contain the smaller structural features of the image's effective content, such as twice the width of a large retinal blood vessel or the diameter of the fovea of the macula. In an exemplary embodiment, for a retinal image of 1024x1024 pixels, a local area between 16x16 and 96x96 pixels is selected. The reference frame local areas may be selected manually, choosing areas with high contrast such as blood vessels or the optic disc; or automatically, for example by computing the mean and variance of the pixel values in every sliding window and selecting, according to a set threshold, the sliding windows that meet the conditions as local areas; or by extracting the positions of image feature points, such as Haar-like features or Scale-Invariant Feature Transform (SIFT) features, over the whole image and then selecting local areas where the feature points are densely distributed. Fig. 2 is a schematic view of selecting reference frame local areas (dashed boxes) according to an embodiment of the present application. As shown in fig. 2, the small local areas effectively reduce the amount of computation: for an N x N two-dimensional image, the time complexity of a cross-correlation computed with the Fast Fourier Transform (FFT) is O(N^2·log N), so the cost of the local-area mode relative to the whole-image mode is approximately n/r^2, where n is the number of local areas and r is the ratio of the whole image size to the local area size; with n = 5 and r = 32, the amount of computation is reduced to roughly five thousandths of the original. Moreover, local illumination differences are much smaller than global ones, which improves robustness to illumination changes while the small local areas effectively reduce the amount of computation.
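For illustration only, the following Python sketch shows one way the sliding-window selection described above could look; the function name, window size, stride, and thresholds are assumptions for the example and are not values prescribed by this application.

import numpy as np

def select_local_regions(ref_img, win=64, stride=32, var_thresh=200.0, n_regions=5):
    # Return the top-left corners of the n_regions windows with the highest
    # variance whose mean brightness is neither too dark nor too bright.
    h, w = ref_img.shape
    candidates = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = ref_img[y:y + win, x:x + win].astype(np.float64)
            mean, var = patch.mean(), patch.var()
            if 30.0 < mean < 220.0 and var > var_thresh:
                candidates.append((var, x, y))
    candidates.sort(reverse=True)           # prefer the highest-contrast windows
    return [(x, y) for _, x, y in candidates[:n_regions]]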
Assume that N reference frame local areas are selected, and the central position (or the position of one of the vertices) of each reference frame local area is (rx_i, ry_i), i = 1, …, N. From each subsequent image of the video, called the current frame image, N local areas, i.e., new local areas, are extracted; these may differ in size from the reference frame local areas but are still around a few tenths of the overall image size. The extraction process comprises generating an affine transformation image from the current frame image through an affine transformation matrix and finding, in the affine transformation image, the areas at the same positions as the reference frame local areas as the new local areas. Specifically, assume that the affine transformation matrix from the current frame image to the reference frame image is A_t, where the initial affine transformation is the identity matrix, i.e., A_0 = I.
The tracking of two-dimensional images generally comprises two basic processes: selecting a reference image and calculating a tracking result from the current image. The tracking result can be represented by transformation parameters, ranging, depending on the application, from basic translation parameters to more complex transformations comprising six parameters. In homogeneous coordinates the relationship can be written as Ax = x'; the following formula represents the affine transformation relationship between two images:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}=\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$

wherein (x, y) and (x', y') represent coordinate points of the reference image and the current image, respectively, and the 3x3 matrix A is the affine matrix.
In this embodiment, the affine transformation of the image is realized by specifying the form of the affine transformation matrix. The minimum number of new local areas required to determine the parameters of the affine transformation matrix can be set; each new local area and its corresponding reference frame local area form one group of data that is substituted into the computation, so what is actually set is the minimum number of data groups required to determine the parameters. This minimum number is not limited, and more data groups yield a more accurate optimal solution.
In retina-based eye tracking, the transformations involved in the image are typically translations in the X and Y directions together with small-range rotation and scaling, and these are the transformations mainly considered in this application. The above formula can therefore be simplified to include four parameters. That is, as a preferred embodiment, the affine transformation matrix takes the form:

$$\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

The four parameters are s, θ, t_x, and t_y, wherein s is the zoom magnification, θ is the rotation angle, t_x is the translation value in the x direction, and t_y is the translation value in the y direction. The minimum number of new local areas is two; if rotation and scaling are ignored, i.e., the rotation angle θ is set to 0 and the zoom magnification s is set to 1, the minimum number of new local areas is one. As for the number of new local areas, if there are many pairs of reference frame local area and new local area, a person skilled in the art can solve for the optimal parameters by combining multiple groups of data according to the prior art, which is not limited in this embodiment.
Fig. 3 is a schematic diagram of a current frame image provided in an embodiment of the present application, and fig. 4 is a schematic diagram of an affine transformation image provided in an embodiment of the present application. First, the current frame image (shown in fig. 3) is transformed as a whole by A_t to obtain the affine transformation image (shown in fig. 4), F_T = A_t F, where F and F_T represent the homogeneous coordinates of the current frame before and after the transformation, respectively. After the affine transformation image is obtained, the areas at the same positions as the reference frame local areas are found in the affine transformation image and used as the new local areas. It should be noted that the specific form of the affine transformation matrix is not limited in this embodiment and depends on the specific implementation. Further, the offset (sx_i, sy_i) obtained by solving between each new local area and its corresponding reference frame local area is used as the tracking result of the current moment. It should be noted that the tracking result is represented by the value of the offset; the specific correspondence is not limited in this embodiment and depends on the specific implementation.
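A sketch of this step is given below, assuming OpenCV is available: the current frame is warped by A_t, a window is cut out at each reference-area position, and each offset (sx_i, sy_i) is estimated by FFT-based phase correlation. Phase correlation stands in here for whichever cross-correlation implementation a concrete system uses, and all names, window sizes, and conventions are illustrative.

import cv2
import numpy as np

def track_offsets(cur_img, A_t, ref_regions, corners, win=64):
    # Warp the current frame by the 3x3 matrix A_t, cut out a window at each
    # reference-area corner (x, y), and estimate the per-region offset together
    # with a correlation response usable as a confidence value.
    h, w = cur_img.shape
    warped = cv2.warpAffine(cur_img.astype(np.float32), A_t[:2, :], (w, h))
    offsets, confidences = [], []
    for ref_patch, (x, y) in zip(ref_regions, corners):   # ref_patch: float32, win x win
        new_patch = warped[y:y + win, x:x + win]
        (dx, dy), response = cv2.phaseCorrelate(ref_patch, new_patch)
        offsets.append((dx, dy))                          # sign convention follows OpenCV
        confidences.append(response)
    return offsets, confidences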
In order to continuously track the eye movement, in this embodiment, an affine transformation matrix is also required to be updated, a current frame image at the next moment is acquired, and the step returns to S3, that is, the step returns to the step of generating an affine transformation image from the current frame image through the affine transformation matrix, and an area with the same position as the local area of the reference frame is found in the affine transformation image to be used as a new local area; since the affine transformation matrix is obtained again, returning to the step again can track the eye movement again, and finally real-time tracking of the eye movement is realized. It should be noted that, in this embodiment, the specific updating step of the affine transformation matrix is not limited, and depends on the specific implementation.
Based on the above embodiments:
as a preferred embodiment, in step S5, the specific step of updating the affine transformation matrix includes:
obtaining an inverse matrix of the affine transformation matrix;
selecting reference point coordinates in a local area of a reference frame, overlapping the reference point coordinates with the offset, substituting the reference point coordinates and the offset into an affine transformation matrix together with an inverse matrix, and calculating to obtain target coordinates of the reference point under the current frame image coordinate system;
and acquiring a new affine transformation matrix according to the reference point coordinates and the target coordinates.
In the above embodiment, the specific step of updating the affine transformation matrix is not limited and depends on the specific implementation. In the present embodiment, as a preferred embodiment, when updating the affine transformation matrix, the inverse matrix A_t^(-1) of the affine transformation matrix A_t is first acquired. Reference point coordinates are then selected in the reference frame local areas; assume that the reference point coordinates of each reference frame local area are (rx_i, ry_i) and the offset is (sx_i, sy_i). Fig. 5 is a schematic diagram of the positions of the new local areas in the new current frame image according to an embodiment of the present application; the position obtained by superimposing the reference point coordinates and the offset is (ncx_i, ncy_i), i.e., the position of the dashed box pointed to by the arrow in fig. 5, where ncx_i = rx_i + sx_i and ncy_i = ry_i + sy_i. Fig. 6 is a schematic diagram of the positions of the local areas in the original current frame image according to an embodiment of the present application. The positions (ncx_i, ncy_i) are substituted, together with the inverse matrix, into the affine transformation, and the target coordinates (cx_i, cy_i) of the reference points in the current frame image coordinate system, i.e., the dashed boxes in fig. 6, are calculated. Finally, a new affine transformation matrix A_(t+1) is obtained according to the reference point coordinates (rx_i, ry_i) and the target coordinates (cx_i, cy_i). It should be noted that one group of corresponding points generates two linear equations, and multiple groups of corresponding points give a system of linear equations; to solve the parameters, the number of corresponding points required is at least half the number of degrees of freedom, and more points form an overdetermined system that can be solved by methods such as least squares or Singular Value Decomposition (SVD), which is not limited in this embodiment.
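A minimal sketch of this update, assuming the four-parameter (similarity) form of the matrix; OpenCV's estimateAffinePartial2D is used here in place of an explicit least-squares or SVD solution, and the function and variable names are illustrative.

import cv2
import numpy as np

def update_affine(A_t, ref_points, offsets):
    # ref_points: (rx_i, ry_i) in the reference frame; offsets: (sx_i, sy_i).
    # Map (rx_i + sx_i, ry_i + sy_i) back through the inverse of A_t into the
    # original current-frame coordinates, then fit the new matrix that takes
    # those current-frame points onto the reference points.
    A_inv = np.linalg.inv(A_t)
    ref = np.asarray(ref_points, dtype=np.float64)
    shifted = ref + np.asarray(offsets, dtype=np.float64)        # (ncx_i, ncy_i)
    hom = np.hstack([shifted, np.ones((len(shifted), 1))])
    cur = (A_inv @ hom.T).T[:, :2]                               # (cx_i, cy_i)
    M, _ = cv2.estimateAffinePartial2D(cur.astype(np.float32), ref.astype(np.float32))
    return np.vstack([M, [0.0, 0.0, 1.0]])                       # new 3x3 matrix

With two point pairs the four parameters are determined exactly; with more pairs the fit becomes an overdetermined problem, consistent with the least-squares remark above.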
In this embodiment, the inverse matrix of the affine transformation matrix is acquired, the reference point coordinates in the reference frame local areas are selected and superimposed with the offsets, the superimposed coordinates and the inverse matrix are substituted into the affine transformation, and the target coordinates of the reference points in the current frame image coordinate system are calculated; a new affine transformation matrix is then obtained according to the reference point coordinates and the target coordinates. The continuously obtained new affine transformation matrices realize real-time tracking of the retina during eyeball movement.
Fig. 7 is a flowchart of another eye tracking method based on retinal images according to an embodiment of the present application. As shown in fig. 7, steps S3 and S4 are replaced with:
s31, generating an affine transformation image from the reference frame image through an affine transformation matrix to obtain a transformed reference frame local area, and finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area;
s41, obtaining the offset between the transformed reference frame local area and the new local area, and taking the offset as a tracking result of the current moment.
It will be appreciated that the tracking of eye movement in the present application is achieved by obtaining an offset between a reference frame local region and a new local region, provided that the image is affine transformed by an affine transformation matrix to obtain the new local region. In the method provided in the above embodiment, affine transformation is performed on the current frame image first, and the area with the same position as the reference frame local area is found in the obtained affine transformation image as a new local area. Similarly, affine transformation of the reference frame image can also achieve the acquisition of the offset.
Specifically, generating an affine transformation image from a reference frame image through an affine transformation matrix to obtain a transformed reference frame local area, and finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area; it is to be noted that the affine transformation matrix here may be the same as that in the above-described method, and the specific form thereof is not limited in the present embodiment, depending on the specific implementation. After the new local area is acquired, the offset between the transformed reference frame local area and the new local area is acquired, and the offset is used as the tracking result at the current moment, so that the tracking result for a period of time is continuously obtained, and the eye movement tracking is finally realized. Correspondingly, after the tracking result at the current moment is obtained, the affine transformation matrix is updated, the reference frame image at the next moment is obtained, the process returns to the step S31, and the subsequent steps are continuously executed until the tracking is stopped. The specific step of updating the affine transformation matrix in this embodiment is not limited, and depends on the specific implementation.
In the embodiment, an affine transformation image is generated by an affine transformation matrix of a reference frame image, a transformed reference frame local area is obtained, and an area with the same position as the transformed reference frame local area is found in a current frame image and used as a new local area; and acquiring the offset between the transformed reference frame local area and the new local area as a tracking result at the current moment, and realizing eye movement tracking in another mode.
Based on the above embodiments:
as a preferred embodiment, in step S5, the specific step of updating the affine transformation matrix includes:
selecting reference point coordinates in a local area of a reference frame, and overlapping the coordinates obtained by transforming the reference point coordinates in the step S31 with the offset to obtain new coordinates;
and acquiring a new affine transformation matrix according to the coordinates of the reference points and the new coordinates.
It will be appreciated that the above-described embodiment differs in the manner in which the affine transformation matrix is updated on the basis of affine transformation of the reference frame image by the affine transformation matrix to generate the affine transformation image. Specifically, selecting reference point coordinates in a reference frame local area, transforming the reference point coordinates in the step S31, namely generating an affine transformation image by an affine transformation matrix of a reference frame image to obtain transformed coordinates of the reference frame local area, and overlapping the transformed coordinates with offset to obtain new coordinates; and obtaining a new affine transformation matrix according to the reference point coordinates and the new coordinates. Finally, update of the affine transformation matrix on the basis of steps S31 and S41 is realized.
In this embodiment, in order to update the affine transformation matrix based on steps S31 and S41, the reference point coordinates in the local area of the reference frame are selected, the coordinates obtained by transforming the reference point coordinates in step S31 are superimposed with the offset to obtain new coordinates, the new affine transformation matrix is obtained according to the reference point coordinates and the new coordinates, and then the eye tracking is achieved after returning to step S31.
Fig. 8 is a flowchart of another eye tracking method based on retinal images according to an embodiment of the present application. As shown in fig. 8, after step S3 or step S31, the method further includes:
s6, judging whether the image brightness of the new local area meets the preset condition, if not, entering the step S7, and if yes, entering the step S4 or the step S41.
And S7, eliminating the new local area, and proceeding to step S4 or step S41.
It will be appreciated that the image of the new local area may be checked before the offset is acquired, for example to determine whether the local area is excessively dark due to blinking or excessively bright due to overexposure. When the image brightness does not meet the preset condition, the local area that does not meet the requirement is rejected. In a specific implementation, the judgment is made according to the mean and variance of the gray values over the whole local area, and the method then proceeds to step S4 or step S41. The specific content of the preset condition is not limited in this embodiment and depends on the specific implementation. It will be appreciated that steps S6 and S7 judge whether the image of the new local area satisfies the preset condition, and are therefore applicable whether the new local area is acquired through step S3 or through step S31.
In this embodiment, in order to make the image of the partial area meet the requirements before the offset is acquired, it is determined whether the image meets the preset conditions, and meanwhile, the portion that does not meet the requirements is removed, so that the offset can be acquired later.
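For illustration, a brightness pre-check of this kind could be expressed as follows; the thresholds are assumptions for the example, not values given in this application.

import numpy as np

def region_usable(patch, mean_range=(30.0, 220.0), min_var=50.0):
    # Reject windows that are too dark (e.g. during a blink), too bright
    # (overexposure), or nearly featureless.
    m, v = float(np.mean(patch)), float(np.var(patch))
    return mean_range[0] < m < mean_range[1] and v > min_var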
As shown in fig. 8, in order to determine whether the eye movement tracking is successful, after step S4 or step S41, the method further includes:
s8, judging whether the tracking result meets the preset requirement; if not, tracking again; if yes, the process proceeds to step S5.
It will be appreciated that during eye tracking, there may be instances where the eye tracking fails due to excessive eye movement speed or other disturbance factors. In order to ensure that the movement of the eyeball can be continuously tracked, judging whether a tracking result meets a preset requirement; if not, tracking again; if so, the process proceeds to step S5, in which the affine transformation matrix is updated. It should be noted that, in this embodiment, specific content of the preset requirement is not limited, and depends on specific implementation conditions; the specific step of re-tracking is not limited, as the specific implementation will depend.
In this embodiment, in order to determine whether the eye movement tracking is successful, whether the tracking result meets a preset requirement is determined; if not, tracking again; if so, the process proceeds to step S5 to update the affine transformation matrix so that the eye movement tracking is continued.
Based on the above embodiments:
as a preferred embodiment, the specific steps for determining whether the tracking result meets the preset requirement are as follows:
judging whether the image confidence coefficient of the reference frame local area and the new local area meets a threshold value or judging whether the difference of the offset quantities of two continuous moments is in a preset range.
In the above embodiments, the specific content of the preset requirement is not limited and depends on the specific implementation. As a preferred embodiment, the preset requirement in this embodiment may be to judge whether the image confidence of the reference frame local area and the new local area meets a threshold; specifically, a preferred choice is to set the threshold of the confidence calculated by the cross-correlation method to 0.9 and to determine that the tracking result meets the requirement when the threshold is reached. Alternatively, the preset requirement may be to judge whether the difference between the offsets of two consecutive moments is within a preset range, i.e., the difference between the offsets solved for two consecutive frames must not exceed a set threshold; as an example, at a video frame rate of 30 fps, an image resolution of 1024x1024 pixels, and a field of view of 50°, the threshold may be taken as 15 pixels. Besides the above two preset requirements, other requirements such as additional confidence measures may also be set; the number and content of the preset requirements are not limited in this embodiment and depend on the specific implementation.
In this embodiment, the specific step of setting the judgment whether the tracking result meets the preset requirement is to judge whether the image confidence of the local area of the reference frame and the new local area meet the threshold, or judge whether the difference between the offset amounts of two consecutive moments is within the preset range, so as to effectively judge the tracking result.
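A small sketch of such a check, using the 0.9 confidence threshold and the 15-pixel inter-frame jump from the example above; the interface is an assumption for illustration.

def tracking_ok(confidences, offset, prev_offset, conf_thresh=0.9, max_jump=15.0):
    # Accept the frame only if every region's correlation confidence reaches the
    # threshold and the offset does not jump more than max_jump pixels between
    # two consecutive moments.
    if any(c < conf_thresh for c in confidences):
        return False
    dx, dy = offset[0] - prev_offset[0], offset[1] - prev_offset[1]
    return (dx * dx + dy * dy) ** 0.5 <= max_jump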
Based on the above embodiments:
as a preferred embodiment, the specific steps of re-tracking are:
s9, setting the translation parameter of the current frame image to be 0, and carrying out affine transformation on the current frame image by a set of set rotation parameters and scaling parameters to obtain a new current frame image.
S10, acquiring the offset between each reference frame local area and the new current frame image.
S11, determining the corresponding relation between the position of each reference frame local area in the reference frame image and the position of each reference frame local area in the current frame image according to the offset to solve an affine transformation matrix.
S12, judging whether the re-tracking is successful or not; if yes, returning to the step S3, if not, carrying out affine transformation on the current frame image by using another set of set rotation parameters and scaling parameters to obtain a new current frame image, and returning to the step S10.
It will be appreciated that re-tracking of eye movement may also be required due to excessive eye movement in an implementation or due to other reasons which may result in failure of eye movement tracking.
Specifically, the position of each reference frame local area in the whole current frame image needs to be solved. Considering that the rotation and scaling range of the retinal image is small, and that template matching methods such as cross-correlation have a certain degree of rotation and scaling invariance, the two parameters can be searched with a grid search, for example over small grids such as 3x3 or 5x5: after one set of rotation and scaling parameters is chosen, the current frame image is rotated and scaled to obtain a new current frame image. The cross-correlation between each reference frame local area and the new current frame image is then computed to judge whether the re-tracking is successful; the decision criterion may be based on the confidence of the cross-correlation, depending on the particular implementation. If the re-tracking is judged successful, a one-to-one correspondence between the position of each reference frame local area in the reference frame coordinate system and its position in the current frame coordinate system is determined from the set of rotation and scaling parameters and the corresponding cross-correlation offsets, and the affine transformation matrix is solved in a manner similar to the update of the affine transformation matrix. After the solution succeeds, the affine transformation matrix is updated and the real-time tracking state is re-entered; if the re-tracking is judged to have failed, the above process is repeated with another set of parameters.
In this embodiment, if it is determined that the eye movement tracking fails, the eye movement is re-tracked, and the offset of the local area of the reference frame relative to the new current frame image is calculated after affine transformation is performed on the current frame image, and the affine transformation matrix is solved according to the offset, so that the eye movement re-tracking is finally realized.
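One possible sketch of this re-tracking search, assuming OpenCV: translation is set to 0, a small grid of rotation and scaling values is tried, each reference window is matched against the whole transformed frame by normalized cross-correlation, and the affine matrix is refitted once all matches are confident enough. The grid values, thresholds, and names are illustrative assumptions rather than parameters prescribed by this application.

import cv2
import numpy as np

def retrack(cur_img, ref_regions, ref_points, conf_thresh=0.9):
    # ref_regions: float32 reference windows; ref_points: their (rx_i, ry_i) corners.
    h, w = cur_img.shape
    for theta in np.deg2rad([-2.0, -1.0, 0.0, 1.0, 2.0]):     # small rotation grid
        for s in (0.98, 1.0, 1.02):                           # small scaling grid
            c, si = s * np.cos(theta), s * np.sin(theta)
            A = np.array([[c, -si, 0.0], [si, c, 0.0], [0.0, 0.0, 1.0]])
            warped = cv2.warpAffine(cur_img.astype(np.float32), A[:2, :], (w, h))
            matched, ok = [], True
            for patch in ref_regions:
                res = cv2.matchTemplate(warped, patch, cv2.TM_CCOEFF_NORMED)
                _, conf, _, loc = cv2.minMaxLoc(res)          # loc: best corner in warped frame
                if conf < conf_thresh:
                    ok = False
                    break
                matched.append(loc)
            if ok:
                # Map the matched corners back into the original current frame,
                # then refit the affine matrix from the point correspondences.
                pts = np.hstack([np.asarray(matched, dtype=np.float64),
                                 np.ones((len(matched), 1))])
                cur_pts = (np.linalg.inv(A) @ pts.T).T[:, :2]
                M, _ = cv2.estimateAffinePartial2D(cur_pts.astype(np.float32),
                                                   np.asarray(ref_points, dtype=np.float32))
                return np.vstack([M, [0.0, 0.0, 1.0]])        # new matrix; tracking resumes
    return None                                               # re-tracking failed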
In the above embodiments, detailed descriptions are given to an eye tracking method based on a retinal image, and the present application also provides corresponding embodiments of an eye tracking device based on a retinal image.
Fig. 9 is a schematic structural diagram of an eye tracking device based on a retinal image according to an embodiment of the present application. As shown in fig. 9, the eye tracking apparatus based on the retinal image includes:
the first acquisition module 10 is configured to acquire a reference frame image and a current frame image in a video.
The selecting module 11 is configured to select a reference frame local area from the reference frame image, where the size of the reference frame local area is smaller than the size of the reference frame image.
The generating module 12 is configured to generate a current frame affine transformation image from the current frame image via an affine transformation matrix, or to generate a reference frame affine transformation image from the reference frame image via an affine transformation matrix.
a second obtaining module 13, configured to find, in the affine transformation image of the current frame, a region with the same position as the local region of the reference frame as a new local region, and obtain an offset between the local region of the reference frame and the new local region;
or obtaining a transformed reference frame local area in a reference frame affine transformation image, finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area, and obtaining the offset between the transformed reference frame local area and the new local area.
A third obtaining module 14, configured to obtain the current frame image at the next moment, update the affine transformation matrix, trigger the generating module 12 and the second obtaining module 13, so as to track the eye movement in real time.
Since the embodiments of the apparatus portion and the embodiments of the method portion correspond to each other, the embodiments of the apparatus portion are referred to the description of the embodiments of the method portion, and are not repeated herein.
The above describes in detail an eye tracking method and device based on retinal images provided in the present application. In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section. It should be noted that it would be obvious to those skilled in the art that various improvements and modifications can be made to the present application without departing from the principles of the present application, and such improvements and modifications fall within the scope of the claims of the present application.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (9)

1. An eye movement tracking method based on retina images, comprising:
s1, acquiring a reference frame image and a current frame image in a video;
s2, selecting a reference frame local area from the reference frame image, wherein the size of the reference frame local area is smaller than that of the reference frame image;
s3, generating an affine transformation image from the current frame image through an affine transformation matrix, and finding out an area with the same position as the local area of the reference frame in the affine transformation image as a new local area;
s4, acquiring the offset between the reference frame local area and the new local area, and taking the offset as a tracking result of the current moment;
s5, updating the affine transformation matrix, acquiring a current frame image at the next moment, and returning to the step S3 to track the eye movement in real time;
in step S5, the specific step of updating the affine transformation matrix includes:
acquiring an inverse matrix of the affine transformation matrix;
selecting reference point coordinates in the reference frame local area, overlapping the reference point coordinates with the offset, substituting the reference point coordinates and the inverse matrix into the affine transformation matrix, and calculating to obtain target coordinates of the reference point under the current frame image coordinate system;
and acquiring a new affine transformation matrix according to the reference point coordinates and the target coordinates.
2. The eye tracking method based on retinal images according to claim 1, wherein steps S3 and S4 are replaced with:
s31, generating an affine transformation image from the reference frame image through an affine transformation matrix to obtain a transformed reference frame local area, and finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area;
s41, obtaining the offset between the transformed reference frame local area and the new local area as a tracking result of the current moment.
3. The eye tracking method based on retinal image according to claim 2, wherein in step S5, the specific step of updating the affine transformation matrix includes:
selecting reference point coordinates in the reference frame local area, and superposing the coordinates obtained by transforming the reference point coordinates in the step S31 with the offset to obtain new coordinates;
and acquiring a new affine transformation matrix according to the reference point coordinates and the new coordinates.
4. A retinal image based eye tracking method according to any one of claims 1 to 3, wherein the affine transformation matrix is in the form of:

$$\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

wherein s is the zoom magnification, θ is the rotation angle, t_x is the translation value in the x-axis direction, and t_y is the translation value in the y-axis direction; the number of the new local areas is two or more; when the rotation angle θ is set to 0 and the zoom magnification s is set to 1, the number of the new local areas is one or more.
5. A retinal image based eye tracking method according to any one of claims 1 to 3 further comprising, after step S3 or step S31:
judging whether the image brightness of the new local area meets a preset condition or not;
if not, rejecting the new local area;
if yes, the flow proceeds to step S4 or step S41.
6. A retinal image based eye tracking method according to any one of claims 1 to 3, further comprising, after step S4 or step S41:
judging whether the tracking result meets the preset requirement;
if not, tracking again;
if yes, the process proceeds to step S5.
7. The eye tracking method based on retina image according to claim 6, wherein the specific step of judging whether the tracking result meets the preset requirement is:
judging whether the image confidence of the reference frame local area and the new local area meets a threshold value or whether the difference of the offset of two continuous moments is within a preset range.
8. The eye tracking method based on retinal images according to claim 6, wherein the specific step of re-tracking is:
setting the translation parameter of the current frame image to be 0, and carrying out affine transformation on the current frame image by a set of set rotation parameters and scaling parameters to obtain a new current frame image;
acquiring the offset between each reference frame local area and the new current frame image;
determining the corresponding relation between the position of each reference frame local area in the reference frame image and the position of each reference frame local area in the current frame image according to the offset to solve the affine transformation matrix;
judging whether the re-tracking is successful or not;
if yes, returning to the step S3;
if not, affine transformation is carried out on the current frame image by another set of set rotation parameters and scaling parameters to obtain a new current frame image, and the step of obtaining the offset between each reference frame local area and the new current frame image is returned.
9. An eye tracking device based on retinal images, comprising:
the first acquisition module is used for acquiring a reference frame image and a current frame image in the video;
a selecting module, configured to select a reference frame local area from the reference frame image, where the size of the reference frame local area is smaller than the size of the reference frame image;
the generation module is used for generating an affine transformation image of the current frame through an affine transformation matrix; or generating a reference frame affine transformation image by the reference frame image through an affine transformation matrix;
a second obtaining module, configured to find, in an affine transformation image of a current frame, a region with the same position as the reference frame local region as a new local region, and obtain an offset between the reference frame local region and the new local region;
or obtaining a transformed reference frame local area in a reference frame affine transformation image, finding an area with the same position as the transformed reference frame local area in the current frame image as a new local area, and obtaining the offset between the transformed reference frame local area and the new local area;
the third acquisition module is used for acquiring a current frame image at the next moment, updating the affine transformation matrix and sequentially triggering the generation module and the second acquisition module so as to track eye movement in real time;
the specific step of updating the affine transformation matrix comprises the following steps: acquiring an inverse matrix of the affine transformation matrix; selecting reference point coordinates in the reference frame local area, overlapping the reference point coordinates with the offset, substituting the reference point coordinates and the inverse matrix into the affine transformation matrix, and calculating to obtain target coordinates of the reference point under the current frame image coordinate system; and acquiring a new affine transformation matrix according to the reference point coordinates and the target coordinates.
CN202210108603.8A 2022-01-28 2022-01-28 Eye movement tracking method and device based on retina image Active CN114445267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210108603.8A CN114445267B (en) 2022-01-28 2022-01-28 Eye movement tracking method and device based on retina image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210108603.8A CN114445267B (en) 2022-01-28 2022-01-28 Eye movement tracking method and device based on retina image

Publications (2)

Publication Number Publication Date
CN114445267A CN114445267A (en) 2022-05-06
CN114445267B (en) 2024-02-06

Family

ID=81371685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210108603.8A Active CN114445267B (en) 2022-01-28 2022-01-28 Eye movement tracking method and device based on retina image

Country Status (1)

Country Link
CN (1) CN114445267B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984952B (en) * 2023-03-20 2023-11-24 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130095985A (en) * 2012-02-21 2013-08-29 중앙대학교 산학협력단 Apparatus and method for tracking object
CN103885589A (en) * 2014-03-06 2014-06-25 华为技术有限公司 Eye movement tracking method and device
CN112418296A (en) * 2020-11-18 2021-02-26 中国科学院上海微系统与信息技术研究所 Bionic binocular target recognition and tracking method based on human eye visual attention mechanism

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7798643B2 (en) * 2007-11-02 2010-09-21 Visionetx, Inc. System for analyzing eye responses to automatically track size, location, and movement of the pupil
WO2014054958A2 (en) * 2012-10-05 2014-04-10 Universidade De Coimbra Method for aligning and tracking point regions in images with radial distortion that outputs motion model parameters, distortion calibration, and variation in zoom

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130095985A (en) * 2012-02-21 2013-08-29 중앙대학교 산학협력단 Apparatus and method for tracking object
CN103885589A (en) * 2014-03-06 2014-06-25 华为技术有限公司 Eye movement tracking method and device
CN112418296A (en) * 2020-11-18 2021-02-26 中国科学院上海微系统与信息技术研究所 Bionic binocular target recognition and tracking method based on human eye visual attention mechanism

Also Published As

Publication number Publication date
CN114445267A (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant