WO2022227264A1 - Video interactive operation method based on eyeball tracking - Google Patents

Video interactive operation method based on eyeball tracking

Info

Publication number
WO2022227264A1
WO2022227264A1 (PCT/CN2021/103342)
Authority
WO
WIPO (PCT)
Prior art keywords
pupil
operator
screen
eye
moves
Prior art date
Application number
PCT/CN2021/103342
Other languages
French (fr)
Chinese (zh)
Inventor
赵立
张常华
朱正辉
赵定金
Original Assignee
广州市保伦电子有限公司
Priority date
Filing date
Publication date
Application filed by 广州市保伦电子有限公司
Publication of WO2022227264A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Definitions

  • the invention relates to the technical field of eye tracking, in particular to a video interactive operation method based on eye tracking.
  • Current eye tracking is not yet mature: tracking of the gaze focus that falls on the screen is imprecise, so operations controlled through eye tracking may lag or even fail.
  • a video interaction operation method that can more accurately track eyeballs is required.
  • The purpose of the present invention is to provide a video interactive operation method based on eye tracking that addresses the accuracy problem of eye tracking.
  • the technical solution for realizing the object of the present invention is: a video interaction operation method based on eye tracking, comprising the following steps:
  • Step 1: Determine the coordinates P(x, y) of the operator relative to the screen, where x is the operator's perpendicular distance from the screen and y is the operator's horizontal distance relative to the screen. Judge from P(x, y) whether the operator is within the preset position range; if so, continue with the following steps, otherwise end processing.
  • Step 2: Determine the operator's face image and crop the ROI images of the left and right eyes from it. If both eye ROI images can be cropped, it is judged that the operator's frontal face is toward the screen and the following steps continue; otherwise, it is judged that the operator's side face is toward the screen and processing ends.
  • Step 3: Compare the positions of the left and right pupils in two consecutive left- and right-eye ROI images to determine how the pupils have changed, and perform the corresponding video interaction operation according to that change.
  • determining the preset position range includes the following steps:
  • Step S1: Determine the shooting range of the camera device and the viewing angle range of the screen.
  • The camera device is used to track the operator's eyeballs.
  • The shooting range of the camera device is bounded by its two outermost shooting lines; the area between these two lines (drawn dotted in Figure 1) is the shooting range.
  • The viewing angle range of the screen is bounded by the two outermost display lines; the area between these two lines is the viewing angle range of the screen.
  • Step S2: Take the horizontal line through the intersection of the two outermost display lines of the screen's viewing angle range as the farthest distance from the human eye to the screen; this line is recorded as the first horizontal line.
  • Step S3: Take the horizontal line through the intersection of the two angle bisectors of the screen's viewing angle range as the closest distance from the human eye to the screen; this line is recorded as the second horizontal line.
  • Step S4: Take the trapezoidal area enclosed by the first horizontal line, the second horizontal line, and the camera device's two outermost shooting lines as the preset position range.
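The check in steps S1-S4 amounts to testing whether the operator's point P(x, y) lies inside a trapezoid. A minimal sketch in Python, with illustrative corner coordinates (in a real setup the corners come from the shooting lines and the first and second horizontal lines):

```python
def point_in_convex_polygon(p, vertices):
    """Cross-product test: p is inside a convex polygon whose vertices
    are listed in order if every edge turns the same way around p."""
    sign = 0
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False
    return True

# Illustrative trapezoid: near edge (second horizontal line), far edge
# (first horizontal line), and the two outermost shooting lines.
preset_range = [(-0.5, 1.0), (0.5, 1.0), (1.5, 3.0), (-1.5, 3.0)]
print(point_in_convex_polygon((0.0, 2.0), preset_range))  # True: inside
print(point_in_convex_polygon((5.0, 2.0), preset_range))  # False: outside
```
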
  • The left- and right-eye ROI images are converted to grayscale; the position with the lowest gray value in each grayscale ROI image is taken as the pupil position, which determines the left pupil and the right pupil.
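The rule above (the darkest pixel in the grayscale eye ROI is taken as the pupil) can be sketched as follows; real camera frames would first need smoothing or thresholding, which the text does not detail:

```python
def find_pupil(roi_gray):
    """Return the (x, y) position of the lowest gray value in an eye ROI
    (given as a list of pixel rows); the method takes this as the pupil."""
    best = (0, 0)
    best_val = roi_gray[0][0]
    for y, row in enumerate(roi_gray):
        for x, v in enumerate(row):
            if v < best_val:
                best_val, best = v, (x, y)
    return best

eye = [[200] * 8 for _ in range(6)]  # bright sclera/skin
eye[3][5] = 10                       # dark pupil pixel
print(find_pupil(eye))  # (5, 3)
```
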
  • The area on the screen where the operator's gaze falls is a rectangular visual area Q; its width and height are calculated by the formula given in the description (reproduced there only as an image).
  • A represents the angle between the operator's line of sight and the screen.
  • The pupil changes include the pupil moving left, moving right, moving up, moving down, blinking, and the gaze staying in the visual area longer than a preset time.
  • If the gaze stays in the visual area longer than the preset time, the first operation is performed; if the pupil moves left, the second operation; if it moves right, the third; if it moves up, the fourth; if it moves down, the fifth; on a blink, the sixth. The first through sixth operations are mutually distinct.
  • Pupil changes are found by comparing the pupil positions in two consecutive left- and right-eye ROI images: if the abscissa of the current pupil position is larger than that of the previous one, the pupil has moved right; if smaller, it has moved left. If the abscissas are equal and the ordinate of the current position is larger than that of the previous one, the pupil has moved up; if smaller, it has moved down.
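Assuming, as the text later states, that the abscissa grows to the right and the ordinate grows upward, the comparison can be sketched as:

```python
def pupil_direction(prev, curr):
    """Classify pupil movement between two consecutive ROI images.
    prev and curr are (x, y) pupil positions; x grows rightward,
    y grows upward."""
    (px, py), (cx, cy) = prev, curr
    if cx > px:
        return "right"
    if cx < px:
        return "left"
    if cy > py:
        return "up"
    if cy < py:
        return "down"
    return "still"

print(pupil_direction((10, 4), (12, 4)))  # right
print(pupil_direction((10, 4), (10, 2)))  # down
```
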
  • Before step 3, the method also judges whether the operator has moved; operator movement includes body movement, face movement, and pupil movement.
  • If only the pupils move, the movement direction is determined from the pupil change between the two adjacent left- and right-eye ROI images before and after the movement, and the corresponding one of the first through sixth operations is performed.
  • The present invention can reliably determine the preset position range in which the operator is located, so the eyeball can be tracked more accurately and eyeball positioning and eye-movement judgment are realized well. Video interaction is thus performed better; the operation is simple and convenient, greatly improves control of the system, and reduces the demands on, and cost of, control personnel.
  • FIG. 1 is a schematic diagram of determining a preset position range.
  • A video interaction operation method based on eye tracking comprises the following steps:
  • Step 1: Determine the coordinates P(x, y) of the operator relative to the screen, where x is the operator's perpendicular distance from the screen and y is the operator's horizontal distance relative to the camera device used for eye tracking, which is also the operator's horizontal distance relative to the screen.
  • The camera device can be installed at the center of the top of the screen, and a Cartesian coordinate system is established with one vertex of the screen as the origin, for example the lower-left vertex, from which the coordinates P(x, y) are obtained.
  • the specific value of x can be obtained by measuring the infrared distance measuring device, and the infrared distance measuring device can be installed at the central position of the top of the screen.
  • When the operator is not centered, the actual distance between the operator and the camera device on the screen is a slant distance, likewise measured by the infrared ranging device; the perpendicular distance x is then obtained from the trigonometric relations of the resulting triangle.
  • the vertical distance between the operator's face and the screen is taken as x
  • the horizontal distance between the operator's face and the camera device for eye tracking is taken as y.
  • the coordinates P(x, y) are determined and subsequent steps are performed only when the operator is within a preset position range in front of the screen.
  • the human binocular vision angle is 124°, that is, the human eye vision is 124°
  • the visual focus angle when both eyes are focused is 25°
  • the visual angle when both eyes are focused should be between 50°-124°.
  • If the operator leaves this range, eye tracking is stopped, that is, no video interaction is responded to, and a prompt is issued, such as an on-screen message that the operator has moved out of range.
  • determining the preset position range can be achieved by the following steps:
  • Step S1 Determine the shooting range of the camera and the viewing angle range of the screen.
  • the shooting range of the camera device is determined by the two outermost shooting lines.
  • the two dotted lines in FIG. 1 are the two outermost shooting lines, and the area between the two dotted lines is the shooting range of the camera device.
  • the viewing angle range of the screen is determined by the two outermost display lines.
  • the area between the two display lines is the viewing angle range of the screen.
  • The two arrowed lines in Figure 1 are the two outermost display lines.
  • Step S2: Take the horizontal line through the intersection of the two outermost display lines of the screen's viewing angle range as the farthest distance from the human eye to the screen; this line is recorded as the first horizontal line.
  • The lowermost horizontal line in Figure 1 marks the farthest position from the screen at which a person (eye) may stand.
  • Step S3: Take the horizontal line through the intersection of the two angle bisectors of the screen's viewing angle range as the closest distance from the human eye to the screen; this line is recorded as the second horizontal line.
  • The middle horizontal line in Figure 1 marks the closest position to the screen at which a person (eye) may stand.
  • Step S4: Take the trapezoidal area enclosed by the first horizontal line, the second horizontal line, and the camera device's two outermost shooting lines as the preset position range.
  • The hatched trapezoidal area in Figure 1 is the preset position range; only within this range can the camera device track the human eye well.
  • Step 2: Determine the operator's face image, crop the left- and right-eye ROI images from it, and determine the position of each eye ROI image on the face. If the left- and right-eye ROI images cannot both be captured from the face image, for example only one eye ROI image or neither can be captured, it is judged that the operator's side face is currently toward the screen. If both eye ROI images can be captured, it is judged that the operator's frontal face is toward the screen. While the side face is toward the screen, no interaction of the operator with the video on the screen is answered, that is, eye tracking stops; only when the frontal face is judged to be toward the screen is the operator's video interaction answered.
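The frontal/side decision above reduces to whether both eye ROIs were captured. A trivial sketch (eye detection itself, e.g. with a cascade classifier, is outside this excerpt and is assumed to have run already):

```python
def face_orientation(left_eye_roi, right_eye_roi):
    """Frontal face if both eye ROIs were captured; side face otherwise.
    An ROI is None when it could not be cropped from the face image."""
    if left_eye_roi is not None and right_eye_roi is not None:
        return "frontal"  # respond to eye-tracking interaction
    return "side"         # stop eye tracking, ignore interaction

print(face_orientation("left_roi", "right_roi"))  # frontal
print(face_orientation(None, "right_roi"))        # side
```
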
  • the positions of the ROI images of the left and right eyes on the human face can be determined by coordinates, and the coordinate system is established with the human face as a reference system, which will not be described in detail.
  • The ROI, or region of interest, is usually an area framed by a rectangle of fixed width and height, whose size can be set as needed. Determining an ROI is prior art and is not repeated here.
  • the direction and angle of the face offset can be determined by comparing the size of the left and right eye ROI images.
  • the width and height of the left eye ROI image are denoted as LW and LH respectively
  • the width and height of the right eye ROI image are denoted as RW and RH respectively
  • The upper-left coordinate of the left-eye ROI image relative to the face image is P(LX, LY), and the upper-left coordinate of the right-eye ROI image relative to the face image is P(RX, RY); the width and height of the face image are assumed to be FW and FH respectively (measurable from the captured picture).
  • The distance from the eyes to the top of the head is roughly 0.25 times the face height.
  • When the face is turned by some angle, the width, height, and coordinates of the left- and right-eye ROI images change; from these changes, the direction and angle of the face offset can be judged.
  • grayscale processing is performed on the left and right eye ROI images to obtain grayscale left and right eye ROI images.
  • the respective pupils of the left and right eyes are found from the grayscaled ROI images of the left and right eyes, that is, the left pupil and the right pupil are found.
  • The gray value is lowest at the pupil, so the minimum-gray positions in the grayscale left- and right-eye ROI images are taken as the left pupil and the right pupil. From the left and right pupils, the gaze direction of the eyes is determined, and thus the rectangular visual area Q in which the operator's gaze falls on the screen.
  • The width and height of the visual area Q are denoted width and height respectively and are calculated by the formula given in the description.
  • Step 3: Judge the change in pupil position by comparing two consecutive left- and right-eye ROI images, determine whether and how the pupils have changed, and perform the corresponding interactive operation on the video in the visual area Q according to that change.
  • If the human eye's gaze stays within the visual area on the screen for a certain period (determined by a preset time), the first operation is performed.
  • The pupil moving left also means the eyeball moves left, and likewise for the other pupil movements; if the pupil moves left, the second operation is performed.
  • If the pupil moves right, the third operation is performed.
  • If the pupil moves up, the fourth operation is performed.
  • If the pupil moves down, the fifth operation is performed.
  • The first through fifth operations are mutually distinct.
  • the first operation is to zoom in on the video
  • the second operation is to switch the video
  • the third operation is to play the video
  • the fourth operation is to fast-forward
  • the fifth operation is to pause.
  • the first to fifth operations may also correspond to other operations.
  • The method further includes judging whether the operator blinks; on a blink, a sixth operation is executed, different from the first through fifth operations.
  • The sixth operation is, for example, to close the window (e.g., the webpage) in which the video is located.
  • The pupil movement direction is found by comparing the pupil positions in two consecutive left- and right-eye ROI images: if the abscissa of the current pupil position is larger than that of the previous one, the pupil has moved right; if smaller, left. If the abscissas are equal and the ordinate of the current position is larger than that of the previous one, the pupil has moved up; if smaller, down. This assumes the positive abscissa direction points right and the positive ordinate direction points up.
  • Operator movement includes body movement, face movement, and eyeball (pupil) movement. If the operator's body moves (the face and eyeballs move with it), none of the operator's video interactions are answered, i.e., eye tracking stops. If the body does not move but only the face does, the amount of face movement is calculated from the offset of the nose, and whether the moved face is still within the preset position range is judged; if it is not, none of the operator's video interactions are answered.
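Putting the pieces together, the mapping from a detected pupil change to the first through sixth operations is a simple dispatch table. A sketch using the example operations named above (zoom, switch, play, fast-forward, pause, close); these names are the description's examples, not fixed requirements:

```python
# First through sixth operations, using the example assignments
# given in the description; any set of distinct operations would do.
OPERATIONS = {
    "dwell": "zoom in on the video",   # gaze stays past the preset time
    "left":  "switch the video",
    "right": "play the video",
    "up":    "fast-forward",
    "down":  "pause",
    "blink": "close the video window",
}

def dispatch(pupil_change):
    """Return the video interaction for a detected pupil change."""
    return OPERATIONS.get(pupil_change, "no operation")

print(dispatch("blink"))  # close the video window
print(dispatch("none"))   # no operation
```
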

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Eye Examination Apparatus (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present invention is a video interactive operation method based on eyeball tracking. The method comprises: determining the coordinates P(x, y) of an operator relative to a screen, and judging from P(x, y) whether the operator is within a preset position range; if so, executing the following steps, otherwise ending processing; determining a facial image of the operator and cropping ROI images of the left and right eyes from it; if both eye ROI images can be cropped, determining that the frontal face is toward the screen, otherwise that the side face is toward the screen; determining the left and right pupils and then the visual area of the screen in which the operator's gaze falls; and, in step 3, determining the pupil changes from two consecutive left- and right-eye ROI images and executing the corresponding video interactive operation according to those changes. By means of the present invention, the preset position range within which an operator is located can be determined well and the eyeballs tracked accurately, so a video interactive operation can be executed better.

Description

A Video Interactive Operation Method Based on Eye Tracking

Technical Field

The invention relates to the technical field of eye tracking, and in particular to a video interactive operation method based on eye tracking.

Background Art

Most existing on-screen video interaction (switching, closing, fast-forwarding, and so on) is performed with a mouse, keyboard, or other physical hardware such as a VR handle. This is unsuitable in some scenarios: for a disabled person without arms, mouse-and-keyboard operation is very inconvenient, and such users find it hard to control a screen effectively for long periods. If the screen could instead be controlled by the eyes, the existing video interaction mode could be left behind, screen content could be operated more conveniently, and personnel and hardware costs could be saved.

Current eye tracking is not yet mature: tracking of the gaze focus that falls on the screen is imprecise, so operations controlled through eye tracking may lag or even fail. A video interaction method that tracks the eyeball more accurately is therefore needed.
Summary of the Invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a video interactive operation method based on eye tracking that addresses the accuracy problem of eye tracking.

The technical solution realizing the object of the present invention is a video interaction operation method based on eye tracking, comprising the following steps:

Step 1: Determine the coordinates P(x, y) of the operator relative to the screen, where x is the operator's perpendicular distance from the screen and y is the operator's horizontal distance relative to the screen. Judge from P(x, y) whether the operator is within the preset position range; if so, continue with the following steps, otherwise end processing.

Step 2: Determine the operator's face image and crop the ROI images of the left and right eyes from it. If both eye ROI images can be cropped, it is judged that the operator's frontal face is toward the screen and the following steps continue; otherwise, it is judged that the operator's side face is toward the screen and processing ends.

Determine the left pupil and the right pupil from the eye ROI images, and from them determine the visual area on the screen in which the operator's gaze falls.

Step 3: Compare the positions of the left and right pupils in two consecutive left- and right-eye ROI images to determine how the pupils have changed, and perform the corresponding video interaction operation according to that change.
Further, determining the preset position range includes the following steps:

Step S1: Determine the shooting range of the camera device and the viewing angle range of the screen. The camera device is used to track the operator's eyeballs. The shooting range of the camera device is bounded by its two outermost shooting lines; the area between these two lines (drawn dotted in Figure 1) is the shooting range. The viewing angle range of the screen is bounded by the two outermost display lines; the area between these two lines is the viewing angle range of the screen.

Step S2: Take the horizontal line through the intersection of the two outermost display lines of the screen's viewing angle range as the farthest distance from the human eye to the screen; this line is recorded as the first horizontal line.

Step S3: Take the horizontal line through the intersection of the two angle bisectors of the screen's viewing angle range as the closest distance from the human eye to the screen; this line is recorded as the second horizontal line.

Step S4: Take the trapezoidal area enclosed by the first horizontal line, the second horizontal line, and the camera device's two outermost shooting lines as the preset position range.

Further, the left- and right-eye ROI images are converted to grayscale; the position with the lowest gray value in each grayscale ROI image is the pupil position, which determines the left pupil and the right pupil.
Further, the area on the screen where the operator's gaze falls is a rectangular visual area Q, whose width and height are calculated by the following formula:

[Formula: image PCTCN2021103342-appb-000001 in the original]

where A represents the angle between the operator's line of sight and the screen.
Further, in step 3, the pupil changes include the pupil moving left, moving right, moving up, moving down, blinking, and the gaze staying in the visual area longer than a preset time.

If the gaze stays in the visual area longer than the preset time, the first operation is performed; if the pupil moves left, the second operation; if it moves right, the third; if it moves up, the fourth; if it moves down, the fifth; on a blink, the sixth. The first through sixth operations are mutually distinct.

Further, pupil changes are found by comparing the pupil positions in two consecutive left- and right-eye ROI images: if the abscissa of the current pupil position is larger than that of the previous one, the pupil has moved right; if smaller, left. If the abscissas are equal and the ordinate of the current position is larger than that of the previous one, the pupil has moved up; if smaller, down.

Further, before step 3, the method also judges whether the operator has moved; operator movement includes body movement, face movement, and pupil movement.

If the operator's body moves, none of the operator's video interactions are answered.

If only the operator's face moves, whether the moved face is still within the preset position range is judged; if so, step 3 continues, otherwise processing ends.

If only the operator's pupils move, the movement direction is determined from the pupil change between the two adjacent left- and right-eye ROI images before and after the movement, and the corresponding one of the first through sixth operations is performed.

The beneficial effects of the present invention are as follows: the invention can reliably determine the preset position range in which the operator is located, so the eyeball can be tracked more accurately and eyeball positioning and eye-movement judgment are realized well. Video interaction is thus performed better; the operation is simple and convenient, greatly improves control of the system, and reduces the demands on, and cost of, control personnel.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of determining the preset position range.

Detailed Description

The present invention is described further below with reference to the accompanying drawing and specific embodiments.

As shown in Figure 1, a video interaction operation method based on eye tracking comprises the following steps:
Step 1: Determine the coordinates P(x, y) of the operator relative to the screen, where x is the operator's perpendicular distance from the screen and y is the operator's horizontal distance relative to the camera device used for eye tracking, which is also the operator's horizontal distance relative to the screen. Judge from P(x, y) whether the operator is within the preset position range; if so, continue with the following steps, otherwise end processing. The camera device can be installed at the center of the top of the screen, and a Cartesian coordinate system is established with one vertex of the screen as the origin, for example the lower-left vertex, from which the coordinates P(x, y) are obtained.

The specific value of x can be measured by an infrared ranging device, which can likewise be installed at the center of the top of the screen.
When the operator stands at the horizontal center of the screen, the horizontal distance y of the operator is half the screen width; assuming the screen width is W, then y = W/2. When the operator is not at the center of the screen, the actual distance between the operator and the camera device on the screen is a slant line, and this slant distance is likewise measured by the infrared ranging device; the perpendicular distance x is then obtained from the trigonometric relations of the resulting triangle. Assuming the measured slant distance between the operator and the camera device is b, and the angle between the operator's line of sight and the screen (that is, the angle subtended at the camera device) is A, then x = b·sin A, the horizontal offset from the camera device is b·cos A, and thus y = W/2 ± b·cos A, the sign depending on which side of the camera the operator stands. The coordinates P(x, y) are thus obtained.
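The trigonometric step above can be sketched as follows; the function name, the degree-valued angle, and the assumption that the operator stands to the right of the camera are illustrative choices of this sketch, not stated in the original:

```python
import math

def operator_coordinates(b, angle_deg, screen_width):
    """Compute P(x, y) from the slant distance b measured by the infrared
    ranging device, the angle A between the operator's line of sight and
    the screen, and the screen width W.

    Assumes the camera sits at the centre of the screen's top edge and the
    origin is the lower-left screen vertex, as in the embodiment above."""
    a = math.radians(angle_deg)
    x = b * math.sin(a)            # perpendicular distance to the screen
    offset = b * math.cos(a)       # horizontal offset from the camera
    y = screen_width / 2 + offset  # assumed: operator right of the camera
    return x, y
```

With A = 90° the operator is directly in front of the camera, so x = b and y = W/2, matching the centered case described in the text.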
In an optional embodiment, the perpendicular distance of the operator's face from the screen is taken as x, and the horizontal distance of the operator's face relative to the camera device used for eyeball tracking is taken as y. The advantage of this is that the operator does not always stand upright in front of the screen but may bend over, half-squat, or adopt other postures, in which case the face and the rest of the body are at different distances from the screen; using the distance of the face relative to the screen and the camera device allows the eyeballs to be tracked more accurately.
In an optional embodiment, the coordinates P(x, y) are determined and the subsequent steps are executed only when the operator is within the preset position range in front of the screen. Under normal circumstances, the binocular visual angle of a human is 124°, the visual focusing angle when both eyes focus is 25°, and the visual angle with both eyes focused should lie between 50° and 124°. For this reason, since the human eyes, and in particular the eyeballs, must not fall into the camera's blind zone, the eyes must be confined to the preset range in front of the screen; otherwise the direction and distance of eyeball movement cannot be calculated.
When the operator is not within the preset position range, eyeball tracking is stopped, that is, no video interactive operation is responded to, and a prompt is issued, for example an "out of range" message displayed on the screen.
Referring to FIG. 1, the preset position range can be determined by the following steps:
Step S1: Determine the shooting range of the camera device and the viewing-angle range of the screen. The shooting range of the camera device is bounded by the two outermost shooting lines; the two dotted lines in FIG. 1 are these outermost shooting lines, and the area between them is the shooting range of the camera device. The viewing-angle range of the screen is bounded by the two outermost display lines; the area between the two display lines is the viewing-angle range of the screen, and the two arrowed lines in FIG. 1 are these outermost display lines.
Step S2: Take the horizontal line through the intersection of the two outermost display lines bounding the screen's viewing-angle range as the farthest distance of the human eyes from the screen; this horizontal line is recorded as the first horizontal line. The lowermost horizontal line in FIG. 1 marks the farthest position from the screen at which a person's eyes are allowed to be.
Step S3: Take the horizontal line through the intersection of the two angle bisectors of the screen's viewing-angle range as the closest distance of the human eyes to the screen; this horizontal line is recorded as the second horizontal line. The middle horizontal line in FIG. 1 marks the closest position to the screen at which a person's eyes are allowed to be.
Step S4: Take the trapezoidal area enclosed by the first horizontal line, the second horizontal line, and the two outermost shooting lines of the camera device as the preset position range. The hatched trapezoidal area in FIG. 1 is the preset position range; only when the eyes are within this range can the eyeballs be tracked by the camera device, and within this range the eyeballs are tracked more reliably.
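A minimal sketch of testing whether the operator's point P lies in the trapezoidal region of step S4; the cross-product containment test for a convex polygon and the vertex values in the test are assumptions of this sketch, not stated in the original:

```python
def in_preset_range(p, vertices):
    """Return True if point p = (x, y) lies inside the convex quadrilateral
    whose vertices are listed counter-clockwise, e.g. the trapezoid bounded
    by the first and second horizontal lines and the two shooting lines."""
    def cross(o, a, b):
        # z-component of (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    n = len(vertices)
    # p is inside a convex CCW polygon iff it lies on the left of every edge
    return all(cross(vertices[i], vertices[(i + 1) % n], p) >= 0
               for i in range(n))
```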
Step 2: Acquire the operator's face image, crop the ROI images of the left and right eyes from the face image, and determine the respective positions of the left-eye ROI image and the right-eye ROI image on the face. If the left- and right-eye ROI images cannot both be cropped from the face image, for example only one eye's (left or right) ROI image can be cropped, or neither can, the current operator is judged to be facing the screen with the side of the face. If both the left- and right-eye ROI images can be cropped, the current operator is judged to be facing the screen frontally. When the operator is judged to be facing the screen sideways, no interactive operation by the operator on the video on the screen is responded to, that is, eyeball tracking is stopped. Only when the operator is judged to be facing the screen frontally is the operator's video interactive operation on the screen responded to.
The positions of the left- and right-eye ROI images on the face can be determined by coordinates, the coordinate system being established with the face as the reference frame; details are omitted. An ROI, or region of interest, is usually an area framed by a rectangle of fixed width and height, whose specific size can be chosen according to actual needs. Determining an ROI is prior art and is not elaborated here.
In an optional embodiment, because the sizes and positions of the left- and right-eye ROI images on the face are fixed, the offset direction and angle of the face can be judged by comparing the sizes of the left- and right-eye ROI images. The width and height of the left-eye ROI image are recorded as LW and LH, and those of the right-eye ROI image as RW and RH; the top-left coordinate of the left-eye ROI image relative to the face image is P(LX, LY), and that of the right-eye ROI image is P(RX, RY). Assume the width and height of the face image are FW and FH respectively (measurable from the captured photo). According to the structure of the face, the eyes lie below the top of the head by about 0.25 times the face height, and the eye width is about 0.13 times the face width; therefore LY = RY = 0.25·FH, LX = 0.13·FW, and RX = FW − LW − 0.13·FW. When the face squarely faces the camera device, the left and right eyes are of equal size, that is, LW = RW and LH = RH, and LY = RY, meaning the Y coordinates of the two eyes coincide on the same level. When the face is turned away by some angle, the widths, heights, and coordinates of the left- and right-eye ROI images change, and the offset direction and angle of the face can be judged from these changes.
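The size-and-level comparison above can be sketched as follows; the tolerance parameter is an assumption added for robustness, since the original states exact equality for a perfectly frontal face:

```python
def is_frontal_face(left_roi, right_roi, tol=0.1):
    """Judge whether the face squarely faces the camera by comparing the
    left- and right-eye ROIs. Each ROI is (x, y, w, h) in face-image
    coordinates. Frontal means equal eye sizes (LW = RW, LH = RH) and
    equal vertical positions (LY = RY), up to the tolerance."""
    lx, ly, lw, lh = left_roi
    rx, ry, rw, rh = right_roi
    same_size = (abs(lw - rw) <= tol * max(lw, rw)
                 and abs(lh - rh) <= tol * max(lh, rh))
    same_level = abs(ly - ry) <= tol * max(lh, rh)
    return same_size and same_level
```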
After the current operator is judged to be frontally facing the screen, the left- and right-eye ROI images are converted to grayscale. The pupils of the left and right eyes are then located in the grayscale ROI images, that is, the left pupil and the right pupil are found. The gray value at the pupil position is the lowest, so the lowest-gray-value portion of each grayscale eye ROI image can be taken as the left pupil and the right pupil respectively. The gaze direction of the eyeballs is then determined from the left pupil and the right pupil, and from it the rectangular visual area Q where the operator's vision falls on the screen; the width and height of the visual area Q are width and height respectively, where
Figure PCTCN2021103342-appb-000003
Figure PCTCN2021103342-appb-000004
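The lowest-gray-value rule for locating the pupil can be sketched with NumPy; a practical implementation would typically blur the image first to suppress noise, which is an assumption beyond the original text:

```python
import numpy as np

def find_pupil(gray_eye_roi):
    """Locate the pupil in a grayscale eye ROI as the position of the
    lowest gray value, as described above. Returns (col, row) within
    the ROI, or None if the ROI is empty."""
    if gray_eye_roi.size == 0:
        return None
    row, col = np.unravel_index(np.argmin(gray_eye_roi), gray_eye_roi.shape)
    return int(col), int(row)
```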
Step 3: Compare two temporally adjacent pairs of left- and right-eye ROI images to judge the position change of the pupils, thereby determining whether and how the pupils have changed, and perform the corresponding interactive operation on the video within the visual area Q according to the pupil change. When the pupil change means that the operator's vision has stayed in the visual area on the screen for a certain time (determined by a preset time), a first operation is performed. When the pupils move left (which also means the eyeballs move left; subsequent pupil movements likewise represent corresponding eyeball movements), a second operation is performed. When the pupils move right, a third operation is performed; when the pupils move up, a fourth operation; when the pupils move down, a fifth operation. The first to fifth operations are mutually distinct, that is, the five operations all differ from one another. For example, the first operation may be zooming in on the video, the second switching videos, the third playing the video, the fourth fast-forwarding, and the fifth pausing. Of course, the first to fifth operations may also correspond to other operations.
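The assignment of pupil changes to the first-fifth operations can be sketched as a lookup table; the operation names are the examples given in the text, and the gesture keys are illustrative labels of this sketch:

```python
# Example assignment of the first-fifth operations, using the example
# operations named above; any other assignment of distinct operations works.
OPERATIONS = {
    "dwell": "zoom_in",      # first operation: vision stays in area Q
    "left": "switch_video",  # second operation
    "right": "play_video",   # third operation
    "up": "fast_forward",    # fourth operation
    "down": "pause",         # fifth operation
}

def dispatch(pupil_change):
    """Return the operation for a recognized pupil change, else None."""
    return OPERATIONS.get(pupil_change)
```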
In an optional embodiment, the method further includes judging whether a blink occurs; if so, a sixth operation, distinct from the first to fifth operations, is performed. For example, the sixth operation may be closing the window (for example, a web page) containing the video. A blink is detected by comparing two temporally adjacent left- and right-eye ROI images: if a pupil is present in one and absent in the other, a blink is judged to have occurred. For example, if a pupil exists in the previous eye ROI image but not in the current one, a blink is currently occurring and the sixth operation is to be performed.
The pupil movement direction is obtained by comparing the pupil positions in two temporally adjacent eye ROI images: if the abscissa of the current pupil position is larger than that of the previous pupil position, the pupil has moved right; if smaller, left. If the abscissa of the current pupil position equals that of the previous one and the ordinate of the current pupil position is larger than that of the previous one, the pupil has moved up; if smaller, down. This assumes the positive abscissa points right and the positive ordinate points up.
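The coordinate comparison above, together with the blink rule of the previous paragraph, can be sketched as follows; representing a missing pupil as None is an assumption of this sketch:

```python
def classify_pupil_change(prev, cur):
    """Classify the change between two adjacent frames' pupil positions.
    Coordinates follow the text: the abscissa grows rightward and the
    ordinate grows upward. A pupil present in one frame but absent in
    the other (None) is a blink."""
    if (prev is None) != (cur is None):
        return "blink"
    if prev is None and cur is None:
        return "none"
    px, py = prev
    cx, cy = cur
    if cx > px:
        return "right"
    if cx < px:
        return "left"
    if cy > py:
        return "up"
    if cy < py:
        return "down"
    return "still"
```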
In an optional embodiment, whether the operator has moved is judged; operator movement includes body movement, face movement, and eyeball (pupil) movement. If the operator's body moves (the face and eyeballs moving with it), no video interactive operation by the operator is responded to, that is, eyeball tracking is stopped. If the body does not move and only the face moves, the amount of face movement is calculated from the offset of the nose, and whether the moved face is still within the preset position range is judged; if it is outside the preset position range, no video interactive operation by the operator is responded to either.
If neither the operator's body nor face moves and only the eyeballs move, the pupil changes between two temporally adjacent pairs of eye ROI images are compared to determine the eyeball movement direction, and the corresponding one of the above first to sixth operations is performed.
The present invention reliably determines the preset position range in which the operator is located, so that the eyeballs can be tracked more accurately and eyeball positioning and eye-movement judgment are realized well, allowing video interactive operations to be performed better. The operation method is simple and convenient, greatly improves control over the system, and reduces the requirements on, and the cost of, operating personnel.
The embodiment disclosed in this specification merely illustrates one aspect of the features of the present invention; the protection scope of the present invention is not limited to this embodiment, and any other functionally equivalent embodiment falls within it. Those skilled in the art can make various corresponding changes and modifications according to the technical solutions and concepts described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (7)

  1. A video interactive operation method based on eyeball tracking, characterized by comprising the following steps:
    Step 1: determining coordinates P(x, y) of an operator relative to a screen, x denoting the perpendicular distance of the operator from the screen and y denoting the horizontal distance of the operator relative to the screen; judging, according to P(x, y), whether the operator is within a preset position range; if so, continuing with the following steps, and otherwise ending the processing;
    Step 2: acquiring a face image of the operator and cropping ROI images of the left and right eyes from the face image; if the left- and right-eye ROI images can be cropped, judging that the operator faces the screen frontally and continuing with the following steps, and otherwise judging that the operator faces the screen sideways and ending the processing;
    determining a left pupil and a right pupil from the ROI images of the left and right eyes, and determining, according to the left pupil and the right pupil, a visual area where the operator's vision falls on the screen;
    Step 3: comparing the position changes of the left pupil and the right pupil according to two temporally adjacent pairs of left- and right-eye ROI images to determine the pupil change, and performing a corresponding video interactive operation according to the pupil change.
  2. The video interactive operation method based on eyeball tracking according to claim 1, characterized in that determining the preset position range comprises the following steps:
    Step S1: determining a shooting range of a camera device and a viewing-angle range of the screen, the camera device being used to track the operator's eyeballs; the shooting range of the camera device is bounded by the two outermost shooting lines, the area between the two shooting lines being the shooting range of the camera device, and the viewing-angle range of the screen is bounded by the two outermost display lines, the area between the two display lines being the viewing-angle range of the screen;
    Step S2: taking the horizontal line through the intersection of the two outermost display lines bounding the screen's viewing-angle range as the farthest distance of the human eyes from the screen, this horizontal line being recorded as a first horizontal line;
    Step S3: taking the horizontal line through the intersection of the two angle bisectors of the screen's viewing-angle range as the closest distance of the human eyes to the screen, this horizontal line being recorded as a second horizontal line;
    Step S4: taking the trapezoidal area enclosed by the first horizontal line, the second horizontal line, and the two outermost shooting lines of the camera device as the preset position range.
  3. The video interactive operation method based on eyeball tracking according to claim 1, characterized in that the left- and right-eye ROI images are converted to grayscale to obtain grayscale left- and right-eye ROI images, and the position with the lowest gray value in each grayscale eye ROI image is taken as the pupil position, thereby determining the left pupil and the right pupil.
  4. The video interactive operation method based on eyeball tracking according to claim 1, characterized in that the visual area where the operator's vision falls on the screen is a rectangular visual area Q whose width and height are calculated by the following formula:
    Figure PCTCN2021103342-appb-100001
    where A denotes the angle between the operator's line of sight and the screen.
  5. The video interactive operation method based on eyeball tracking according to claim 1, characterized in that in step 3 the pupil change includes the pupil moving left, moving right, moving up, moving down, blinking, and the vision staying in the visual area for longer than a preset time;
    if the vision stays in the visual area for longer than the preset time, a first operation is performed; if the pupil moves left, a second operation is performed; if the pupil moves right, a third operation is performed; if the pupil moves up, a fourth operation is performed; if the pupil moves down, a fifth operation is performed; and if a blink occurs, a sixth operation is performed; the first to sixth operations being mutually distinct.
  6. The video interactive operation method based on eyeball tracking according to claim 5, characterized in that the pupil change is determined by comparing the pupil positions in two temporally adjacent left- and right-eye ROI images: if the abscissa of the current pupil position is larger than that of the previous pupil position, the pupil has moved right, and if smaller, left; if the abscissa of the current pupil position equals that of the previous pupil position and the ordinate of the current pupil position is larger than that of the previous pupil position, the pupil has moved up, and if smaller, down.
  7. The video interactive operation method based on eyeball tracking according to claim 5, characterized in that, before step 3, the method further comprises judging whether the operator has moved, operator movement including body movement, face movement, and pupil movement;
    if the operator's body moves, no video interactive operation by the operator is responded to;
    if only the operator's face moves, whether the moved face is within the preset position range is judged; if so, step 3 continues to be executed, and otherwise the processing ends;
    if only the operator's pupils move, the pupil movement direction is determined from the pupil changes of two temporally adjacent left- and right-eye ROI images before and after the movement, so as to perform the corresponding one of the first to sixth operations.
PCT/CN2021/103342 2021-04-27 2021-06-30 Video interactive operation method based on eyeball tracking WO2022227264A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110463115.4A CN113918007B (en) 2021-04-27 2021-04-27 Video interactive operation method based on eyeball tracking
CN202110463115.4 2021-04-27

Publications (1)

Publication Number Publication Date
WO2022227264A1 (en) 2022-11-03




Also Published As

Publication number Publication date
CN113918007A (en) 2022-01-11
CN113918007B (en) 2022-07-05

