CN109901716B - Sight point prediction model establishing method and device and sight point prediction method
- Publication number
- CN109901716B (application number CN201910159483.2A)
- Authority
- CN
- China
- Prior art keywords
- eye
- image
- face
- prediction model
- initial image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a gaze point prediction model establishment method, a gaze point prediction model establishment device and a gaze point prediction method, relating to the technical field of human eye gaze point prediction. The model establishment method comprises the following steps: obtaining a plurality of initial images containing human faces captured by a camera; obtaining the position coordinates on a display of the human eye gaze point in each initial image; processing each initial image to obtain sample data comprising an eye image, eye parameters and face parameters; and performing deep network learning based on the sample data obtained from each initial image and the corresponding position coordinates to establish a gaze point prediction model. With this method, the human eye gaze point can be predicted quickly and reliably.
Description
Technical Field
The invention relates to the technical field of human eye gaze point prediction, and in particular to a gaze point prediction model establishment method and device and a gaze point prediction method.
Background
At present, human eye gaze point prediction is mainly applied on terminal devices equipped with additional hardware such as an infrared emitter and a depth camera: a face image captured by the depth camera is used to estimate the gaze direction, and the distance is calculated based on the infrared emitter, thereby obtaining the area on the terminal device where the line of sight falls.
The inventor has found that the existing gaze point prediction method is complex to implement, since it requires the support of additional hardware such as an infrared emitter and a depth camera, and its prediction results are usually not accurate enough. Providing a method that can predict the human eye gaze point accurately and quickly is therefore an urgent technical problem.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for establishing a gaze point prediction model, and a gaze point prediction method, so as to improve the efficiency and accuracy of human eye gaze point prediction.
In order to achieve the above purpose, the embodiments of the invention adopt the following technical solutions:
A gaze point prediction model establishment method is applied to a processor of a terminal device, the terminal device further comprising a camera and a display, and the method comprises the following steps:
acquiring a plurality of initial images containing human faces captured by the camera, and acquiring the position coordinates on the display of the human eye gaze point in each initial image;
processing each initial image to obtain sample data comprising an eye image, eye parameters and face parameters;
and performing deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish a gaze point prediction model.
Optionally, in the above gaze point prediction model establishment method, the step of processing each initial image to obtain sample data comprising an eye image, eye parameters and face parameters includes:
performing face detection on each initial image to obtain a face image, locating the facial features in the face image to obtain facial feature frames, and obtaining an eye image from the face image based on the eye frame among the facial feature frames;
obtaining a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and a correction angle and correction scale of the face in the face image, and taking the first proportion coefficient, the second proportion coefficient, the correction angle and the correction scale as the face parameters;
and acquiring, in the initial image, binocular coordinate data of the two eyes in the eye image, taking the binocular coordinate data as the eye parameters, and taking the eye image, the face parameters and the eye parameters as the sample data.
Optionally, in the above gaze point prediction model establishment method, the binocular coordinate data includes left eye coordinate data and right eye coordinate data, and the step of acquiring, in the initial image, the binocular coordinate data of the two eyes in the eye image includes:
acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the left eye in the eye image and averaging them to obtain the left eye coordinate data, and acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the right eye in the eye image and averaging them to obtain the right eye coordinate data.
Optionally, in the above gaze point prediction model establishment method, the step of obtaining the correction angle and correction scale of the face in the face image includes:
obtaining the horizontal coordinate difference and vertical coordinate difference between the left eye and the right eye from the left eye coordinate data and the right eye coordinate data;
and obtaining the correction angle and the correction scale from the horizontal coordinate difference and the vertical coordinate difference.
Optionally, in the above gaze point prediction model establishment method, the step of performing deep network learning based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image to establish the gaze point prediction model includes:
training with the PyTorch framework, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish the gaze point prediction model.
The application also provides a gaze point prediction method applied to a processor of a terminal device, the terminal device further comprising a camera and a display, wherein the processor stores a gaze point prediction model established by the above gaze point prediction model establishment method. The gaze point prediction method comprises the following steps:
obtaining an image to be detected containing a human face captured by the camera;
processing the image to be detected to obtain data to be detected comprising an eye image, eye parameters and face parameters;
and performing prediction on the data to be detected with the gaze point prediction model to obtain the target position coordinates on the display of the human eye gaze point in the image to be detected.
Optionally, in the above gaze point prediction method, after the step of performing prediction on the data to be detected with the gaze point prediction model to obtain the target position coordinates on the display of the human eye gaze point in the image to be detected, the method further includes:
forming, on the display interface of the display and according to the resolution of the display, a focus frame centered on the pixel corresponding to the target position coordinates, so as to perform processing based on the focus frame.
The application also provides a gaze point prediction model establishment apparatus applied to a processor in a terminal device, the terminal device further comprising a camera and a display, and the apparatus comprises:
an image obtaining module, configured to obtain a plurality of initial images containing human faces captured by the camera, and to obtain the position coordinates on the display of the human eye gaze point in each initial image;
a sample obtaining module, configured to process each initial image to obtain sample data comprising an eye image, eye parameters and face parameters;
and a prediction model obtaining module, configured to perform deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish a gaze point prediction model.
Optionally, in the above gaze point prediction model establishment apparatus, the sample obtaining module includes:
a detection and positioning sub-module, configured to perform face detection on each initial image to obtain a face image, locate the facial features in the face image to obtain facial feature frames, and obtain an eye image from the face image based on the eye frame among the facial feature frames;
a face parameter obtaining sub-module, configured to obtain a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and a correction angle and correction scale of the face in the face image, and to take the first proportion coefficient, the second proportion coefficient, the correction angle and the correction scale as the face parameters;
and a sample data obtaining sub-module, configured to acquire, in the initial image, binocular coordinate data of the two eyes in the eye image, take the binocular coordinate data as the eye parameters, and take the eye image, the face parameters and the eye parameters as the sample data.
Optionally, in the above gaze point prediction model establishment apparatus, the prediction model obtaining module is further configured to train, with the PyTorch framework and based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish the gaze point prediction model.
The invention accordingly provides a gaze point prediction model establishment method and device and a gaze point prediction method.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a connection block diagram of a terminal device according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a method for establishing a gaze point prediction model according to an embodiment of the present invention.
Fig. 3 is a schematic flowchart of step S120 in fig. 2.
Fig. 4 is a flowchart illustrating a gaze point prediction method according to an embodiment of the present invention.
Fig. 5 is a connection block diagram of a gaze point prediction model establishment apparatus according to an embodiment of the present invention.
Fig. 6 is a connection block diagram of a sample obtaining module according to an embodiment of the present invention.
Reference numerals: 10 - terminal device; 12 - memory; 14 - processor; 16 - camera; 18 - display; 100 - gaze point prediction model establishment apparatus; 110 - image obtaining module; 120 - sample obtaining module; 122 - detection and positioning sub-module; 124 - eye data obtaining sub-module; 126 - face parameter obtaining sub-module; 128 - sample obtaining sub-module; 130 - prediction model obtaining module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, unless otherwise expressly specified or limited, the terms "disposed", "connected" and "coupled" are to be construed broadly: for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art on a case-by-case basis.
Referring to fig. 1, the present invention provides a terminal device 10, which may be a device with image capture, image display and data processing functions, such as a mobile phone, a computer or a tablet computer, without limitation here. The terminal device 10 includes a memory 12, a processor 14, a camera 16 and a display 18.
The memory 12, the processor 14, the camera 16 and the display 18 are electrically connected to one another, directly or indirectly, to enable the transfer and interaction of data; for example, these components may be electrically connected to each other via one or more communication buses or signal lines. The memory 12 stores software functional modules in the form of software or firmware, and the processor 14 executes various functional applications and data processing by running the software programs and modules stored in the memory 12, such as the gaze point prediction model establishment apparatus 100 in the embodiment of the present invention, thereby implementing the gaze point prediction model establishment method and the gaze point prediction method in the embodiments of the present invention.
The memory 12 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 12 is used for storing a program, and the processor 14 executes the program after receiving an execution instruction.
The processor 14 may be an integrated circuit chip with signal processing capability. The processor 14 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like. It may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and it may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 2, the present invention provides a gaze point prediction model establishment method that can be applied to the processor 14 of the terminal device 10; the method includes steps S110 to S130.
Step S110: obtaining a plurality of initial images containing human faces captured by the camera 16, and obtaining the position coordinates on the display 18 of the human eye gaze point in each initial image.
The position coordinates on the display 18 of the human eye gaze point may be pixel coordinates on the display 18, or coordinates in a coordinate system established based on the display interface of the display 18; this is not specifically limited here. It is to be understood that, when the position coordinates are coordinates in a coordinate system established based on the display interface of the display 18, the coordinate system may take a fixed point on the display 18 as the origin, such as the lower left corner vertex, the center point or the lower right corner vertex, with horizontal and vertical axes established along the length and width directions of the display 18.
It is to be understood that the plurality of initial images may be face images captured by the camera 16 while different users are gazing at different locations on the display 18, and that each face image includes both eye regions.
Step S120: processing each initial image to obtain sample data comprising the eye image, eye parameters and face parameters.
Processing the initial image to obtain the eye image, eye parameters and face parameters may involve performing face detection, or face recognition and positioning, on the image to obtain a face image, and then obtaining the face parameters based on the face image. The face parameters may be, but are not limited to, the correction angle of the face, the correction scale of the face, the proportion of the face area in the initial image area, and/or the proportion of the facial-feature image in the initial image; the eye parameters may include, but are not limited to, the position coordinates of the eyes in the initial image and/or the proportion of the eye image, within the facial-feature image, in the initial image or in the face image.
In this embodiment, the step S120 includes:
step S122: and carrying out face detection on each initial image to obtain a face image, positioning the five sense organs in the face image to obtain a face five sense organ frame, and obtaining an eye image from the face image based on the eye frame in the face five sense organ frame.
In the present embodiment, the size of each eye diagram obtained is the same for the convenience of the subsequent processing. That is, in the step S122, obtaining the eye image from the face image based on the eye frame in the facial feature frames specifically includes: and obtaining an eye image with a size of a set value from the face image based on the eye frame in the facial five-sense organ frame.
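By way of illustration only, the following is a minimal sketch of extracting a fixed-size eye image around a detected eye frame. The 64x128 crop size, the NumPy-based cropping and the (x, y, w, h) eye-frame format are assumptions made for the example; the description above only requires that every eye image has the same set size.

```python
import numpy as np

def crop_eye_image(face_image: np.ndarray, eye_box: tuple,
                   out_h: int = 64, out_w: int = 128) -> np.ndarray:
    """Crop a fixed-size eye image centered on the eye frame (x, y, w, h)."""
    x, y, w, h = eye_box
    cx, cy = x + w // 2, y + h // 2               # center of the detected eye frame
    top, left = max(cy - out_h // 2, 0), max(cx - out_w // 2, 0)
    crop = face_image[top:top + out_h, left:left + out_w]
    # Pad if the crop runs off the image edge, so the output size is always fixed.
    pad_h, pad_w = out_h - crop.shape[0], out_w - crop.shape[1]
    if pad_h > 0 or pad_w > 0:
        crop = np.pad(crop, ((0, pad_h), (0, pad_w)) + ((0, 0),) * (crop.ndim - 2))
    return crop
```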
Step S124: acquiring, in the initial image, binocular coordinate data of the two eyes in the eye image, and taking the binocular coordinate data as the eye parameters.
This step may specifically acquire the eye-corner coordinate data of both eyes in the eye image as the eye parameters, or acquire the center position coordinates of both eyes in the eye image as the eye parameters; this is not specifically limited here and may be set according to actual requirements.
In this embodiment, the binocular coordinate data includes left eye coordinate data and right eye coordinate data, and step S124 specifically includes:
acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the left eye in the eye image and averaging them to obtain the left eye coordinate data, and acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the right eye in the eye image and averaging them to obtain the right eye coordinate data.
Step S126: obtaining a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and the correction angle and correction scale of the face in the face image, and taking the first proportion coefficient, the second proportion coefficient, the correction angle and the correction scale as the face parameters.
The correction angle and the correction scale can be obtained based on the cheek coordinates, the coordinates of both eyes and/or the eyebrow coordinates in the face image.
In this embodiment, the step S126 includes:
and obtaining a horizontal coordinate difference value and a vertical coordinate difference value of the left eye and the right eye according to the left eye coordinate data and the right eye coordinate data.
And obtaining the correcting angle and the correcting scale according to the horizontal coordinate difference value and the vertical coordinate difference value.
The specific manner of obtaining the correcting angle according to the abscissa difference value and the ordinate difference value may be as follows: the positive rotation angle is obtained by using the atan2 function, the abscissa difference value and the ordinate difference value. The specific way of obtaining the correction scale according to the abscissa difference value and the ordinate difference value may be: and after the sum of the square of the horizontal coordinate difference value and the square of the vertical coordinate difference value is obtained, squaring to obtain a square value, and dividing the square value by a constant (such as 100) to obtain the correction scale.
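A minimal sketch of this computation, reusing the per-eye coordinate data from the previous sketch; the constant of 100 follows the example given above:

```python
import math

def correction_angle_and_scale(left_eye, right_eye, constant: float = 100.0):
    dx = right_eye[0] - left_eye[0]     # horizontal (abscissa) difference
    dy = right_eye[1] - left_eye[1]     # vertical (ordinate) difference
    angle = math.atan2(dy, dx)          # correction angle via the atan2 function
    scale = math.sqrt(dx * dx + dy * dy) / constant   # sqrt of sum of squares, over a constant
    return angle, scale
```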
Step S128: taking the eye image, the face parameters and the eye parameters as the sample data.
Step S130: performing deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display 18 of the human eye gaze point in each initial image, to establish a gaze point prediction model.
In step S130, the sample data and the position coordinates corresponding to the multiple initial images may be divided into multiple groups, and deep network learning is performed on each group of sample data and corresponding position coordinates in a batch manner.
In this embodiment, step S130 includes: training with the PyTorch framework, based on the sample data obtained by processing each initial image and the position coordinates on the display 18 of the human eye gaze point in each initial image, to establish the gaze point prediction model.
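The description does not fix a network architecture, so the following is only a minimal PyTorch sketch of the kind of multi-input regression this step implies: an eye-image branch combined with the eye and face parameters, regressed to an (x, y) position on the display and trained in batches as described above. All layer sizes, the parameter dimensions and the MSE loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GazePointNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Eye-image branch: a small CNN reducing the eye image to a feature vector.
        self.eye_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # 4 eye-parameter values (left/right eye x and y) plus 4 face-parameter
        # values (two proportion coefficients, correction angle, correction scale).
        self.head = nn.Sequential(
            nn.Linear(32 + 4 + 4, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, eye_img, eye_params, face_params):
        feats = self.eye_branch(eye_img)
        return self.head(torch.cat([feats, eye_params, face_params], dim=1))

model = GazePointNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on one batch (the sample data may be split into groups and
# fed batch by batch, as described above).
eye_img = torch.randn(8, 3, 64, 128)        # batch of fixed-size eye images
eye_params = torch.randn(8, 4)
face_params = torch.randn(8, 4)
target_xy = torch.rand(8, 2)                # gaze position coordinates on the display
loss = loss_fn(model(eye_img, eye_params, face_params), target_xy)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```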
With the above arrangement, a gaze point prediction model is established whose predictions of the human eye gaze point are more accurate, and the problems in the prior art of excessive hardware cost and excessive time consumption, caused by requiring an infrared emitter and a depth camera to detect the human eye gaze point, are avoided.
With reference to fig. 4, on the basis of the foregoing, the present application further provides a gaze point prediction method applied to the terminal device 10, where the processor 14 of the terminal device 10 stores a gaze point prediction model established by the above gaze point prediction model establishment method. The gaze point prediction method includes:
Step S210: obtaining an image to be detected containing a human face captured by the camera 16.
Step S220: processing the image to be detected to obtain data to be detected comprising the eye image, eye parameters and face parameters.
The manner of processing the image to be detected is similar to the manner of processing the initial image to obtain sample data, and therefore, the detailed description of step S220 may refer to the detailed description of step S120, which is not repeated herein.
Step S230: performing prediction on the data to be detected with the gaze point prediction model to obtain the target position coordinates on the display 18 of the human eye gaze point in the image to be detected.
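As an illustration, a minimal sketch of this prediction step, assuming the GazePointNet model sketched under step S130 has been trained, with stand-in tensors for the processed data to be detected:

```python
import torch

model = GazePointNet()                     # assumed: trained weights would be loaded here
model.eval()
with torch.no_grad():
    eye_img = torch.randn(1, 3, 64, 128)   # stand-ins for the eye image,
    eye_params = torch.randn(1, 4)         # eye parameters and face parameters
    face_params = torch.randn(1, 4)        # of the image to be detected
    target_xy = model(eye_img, eye_params, face_params)  # (1, 2): position on the display 18
```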
Generally, when an eye-control operation needs to be performed on the terminal device 10, in order to perform the eye-control operation accurately, in this embodiment the method further includes, after step S230:
forming, on the display interface of the display 18 and according to the resolution of the display 18, a focus frame centered on the pixel corresponding to the target position coordinates, so as to perform processing based on the focus frame.
The processing based on the focus frame may work as follows: different operation modes corresponding to different set positions are prestored in the processor 14, and the operation modes may include, but are not limited to, operations such as page turning and selection. The processing then checks whether the area where the focus frame is located includes a set position; when a set position is included, processing is performed according to the operation mode corresponding to that set position, to achieve effects such as page turning and selection.
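A minimal sketch of this focus-frame logic under stated assumptions: the frame half-size of 50 pixels and the table of set positions and operations are made up for illustration; only the centering on the predicted pixel, the clamping to the display resolution and the set-position check come from the description above.

```python
def focus_frame(cx: int, cy: int, res_w: int, res_h: int, half: int = 50):
    """Return (left, top, right, bottom) of a frame centered at the predicted
    pixel (cx, cy), clamped to the display resolution res_w x res_h."""
    left = min(max(cx - half, 0), res_w - 1)
    top = min(max(cy - half, 0), res_h - 1)
    right = min(cx + half, res_w - 1)
    bottom = min(cy + half, res_h - 1)
    return left, top, right, bottom

# Prestored mapping from set positions to operation modes (illustrative values).
operations = {(1800, 540): "page_forward", (120, 540): "page_back"}

def dispatch(frame, operations):
    """Trigger the operation whose set position falls inside the focus frame."""
    left, top, right, bottom = frame
    for (px, py), action in operations.items():
        if left <= px <= right and top <= py <= bottom:
            return action              # e.g. perform the page turn or selection
    return None

action = dispatch(focus_frame(1750, 560, 1920, 1080), operations)  # -> "page_forward"
```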
With reference to fig. 5, on the basis of the foregoing, the present invention further provides a gaze point prediction model establishment apparatus 100 applicable to the processor 14 of the terminal device 10; the gaze point prediction model establishment apparatus 100 includes an image obtaining module 110, a sample obtaining module 120 and a prediction model obtaining module 130.
The image obtaining module 110 is configured to obtain a plurality of initial images containing human faces captured by the camera 16, and to obtain the position coordinates on the display 18 of the human eye gaze point in each initial image. In this embodiment, the image obtaining module 110 may be configured to perform step S110 shown in fig. 2, and for its specific description reference may be made to the foregoing description of step S110.
The sample obtaining module 120 is configured to process each initial image to obtain sample data comprising an eye image, eye parameters and face parameters. In this embodiment, the sample obtaining module 120 may be configured to perform step S120 shown in fig. 2, and for its specific description reference may be made to the foregoing description of step S120.
Referring to fig. 6, in this embodiment the sample obtaining module 120 includes a detection and positioning sub-module 122, an eye data obtaining sub-module 124, a face parameter obtaining sub-module 126 and a sample obtaining sub-module 128.
The detection and positioning sub-module 122 is configured to perform face detection on each initial image to obtain a face image, locate the facial features in the face image to obtain facial feature frames, and obtain an eye image from the face image based on the eye frame among the facial feature frames. In this embodiment, the detection and positioning sub-module 122 may be configured to perform step S122 shown in fig. 3, and for its specific description reference may be made to the foregoing description of step S122.
The eye data obtaining sub-module 124 is configured to acquire, in the initial image, the binocular coordinate data of the two eyes in the eye image, and to take the binocular coordinate data as the eye parameters. In this embodiment, the eye data obtaining sub-module 124 may be configured to perform step S124 shown in fig. 3, and for its specific description reference may be made to the description of step S124.
The face parameter obtaining sub-module 126 is configured to obtain a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and a correction angle and a correction scale of a face in the face image, and use the first proportion coefficient, the second proportion coefficient, the correction angle, and the correction scale as face parameters. In this embodiment, the face parameter obtaining sub-module 126 may be configured to perform step S126 shown in fig. 3, and for the detailed description of the face parameter obtaining sub-module 126, reference may be made to the description of step S126.
The sample obtaining submodule 128 is configured to use the eye image, the face parameters, and the eye parameters as sample data. In this embodiment, the sample obtaining submodule 128 may be configured to perform step S128 shown in fig. 3, and the detailed description about the sample obtaining submodule 128 may refer to the description about step S128.
The prediction model obtaining module 130 is configured to perform deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display 18 of the human eye gaze point in each initial image, to establish a gaze point prediction model. In this embodiment, the prediction model obtaining module 130 may be configured to perform step S130 shown in fig. 2, and for its specific description reference may be made to the foregoing description of step S130.
In this embodiment, the prediction model obtaining module 130 is further configured to train, with the PyTorch framework and based on the sample data obtained by processing each initial image and the position coordinates on the display 18 of the human eye gaze point in each initial image, to establish the gaze point prediction model.
In summary, with the gaze point prediction model establishment method and device and the gaze point prediction method provided by the present invention, a plurality of initial images containing human faces captured by the camera 16 are obtained, together with the position coordinates on the display 18 of the human eye gaze point in each initial image; each initial image is processed to obtain sample data comprising eye images, eye parameters and face parameters; and deep network learning is performed based on this sample data and the position coordinates to establish the gaze point prediction model. As a result, predictions made with the obtained gaze point prediction model are more accurate, and the excessive hardware cost and time consumption of prior art approaches, which require hardware such as an infrared emitter and a depth camera to detect the human eye gaze point, are avoided.
In the embodiments provided in the embodiments of the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or as the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal device, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk. It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a(n) … …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A gaze point prediction model establishment method applied to a processor of a terminal device, the terminal device further comprising a camera and a display, characterized in that the method comprises:
acquiring a plurality of initial images containing human faces captured by the camera, and acquiring the position coordinates on the display of the human eye gaze point in each initial image;
processing each initial image to obtain sample data comprising an eye image, eye parameters and face parameters;
performing deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish a gaze point prediction model;
wherein the eye parameters comprise binocular coordinate data, the binocular coordinate data comprises left eye coordinate data and right eye coordinate data, the face parameters comprise a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and a correction angle and correction scale of the face in the face image, and the correction angle and the correction scale are obtained as follows:
obtaining the horizontal coordinate difference and vertical coordinate difference between the left eye and the right eye from the left eye coordinate data and the right eye coordinate data, and obtaining the correction angle and the correction scale from the horizontal coordinate difference and the vertical coordinate difference.
2. The gaze point prediction model establishment method according to claim 1, wherein the step of processing each initial image to obtain sample data comprising an eye image, eye parameters and face parameters comprises:
performing face detection on each initial image to obtain a face image, locating the facial features in the face image to obtain facial feature frames, and obtaining an eye image from the face image based on the eye frame among the facial feature frames;
acquiring, in the initial image, the binocular coordinate data of the two eyes in the eye image, and taking the binocular coordinate data as the eye parameters;
obtaining the first proportion coefficient of the eye image in the initial image, the second proportion coefficient of the face image in the initial image, and the correction angle and correction scale of the face in the face image, and taking the first proportion coefficient, the second proportion coefficient, the correction angle and the correction scale as the face parameters;
and taking the eye image, the face parameters and the eye parameters as the sample data.
3. The gaze point prediction model establishment method according to claim 2, wherein the binocular coordinate data comprises left eye coordinate data and right eye coordinate data, and the step of acquiring, in the initial image, the binocular coordinate data of the two eyes in the eye image comprises:
acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the left eye in the eye image and averaging them to obtain the left eye coordinate data, and acquiring the position coordinates in the initial image of the upper eyelid, lower eyelid, left eye corner and right eye corner of the right eye in the eye image and averaging them to obtain the right eye coordinate data.
4. The gaze point prediction model establishment method according to claim 1, wherein the step of performing deep network learning based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image to establish the gaze point prediction model comprises:
training with the PyTorch framework, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish the gaze point prediction model.
5. A gaze point prediction method applied to a processor of a terminal device, the terminal device further comprising a camera and a display, wherein the processor stores a gaze point prediction model established by the gaze point prediction model establishment method according to any one of claims 1 to 4, the gaze point prediction method comprising:
obtaining an image to be detected containing a human face captured by the camera;
processing the image to be detected to obtain data to be detected comprising an eye image, eye parameters and face parameters;
and performing prediction on the data to be detected with the gaze point prediction model to obtain the target position coordinates on the display of the human eye gaze point in the image to be detected.
6. The gaze point prediction method according to claim 5, wherein, after the step of performing prediction on the data to be detected with the gaze point prediction model to obtain the target position coordinates on the display of the human eye gaze point in the image to be detected, the method further comprises:
forming, on the display interface of the display and according to the resolution of the display, a focus frame centered on the pixel corresponding to the target position coordinates, so as to perform processing based on the focus frame.
7. A gaze point prediction model establishment apparatus applied to a processor in a terminal device, characterized in that the terminal device further comprises a camera and a display, and the apparatus comprises:
an image obtaining module, configured to obtain a plurality of initial images containing human faces captured by the camera, and to obtain the position coordinates on the display of the human eye gaze point in each initial image;
a sample obtaining module, configured to process each initial image to obtain sample data comprising an eye image, eye parameters and face parameters;
a prediction model obtaining module, configured to perform deep network learning, based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish a gaze point prediction model;
wherein the eye parameters comprise binocular coordinate data, the binocular coordinate data comprises left eye coordinate data and right eye coordinate data, the face parameters comprise a first proportion coefficient of the eye image in the initial image, a second proportion coefficient of the face image in the initial image, and a correction angle and correction scale of the face in the face image, and the sample obtaining module obtains the correction angle and the correction scale as follows:
obtaining the horizontal coordinate difference and vertical coordinate difference between the left eye and the right eye from the left eye coordinate data and the right eye coordinate data, and obtaining the correction angle and the correction scale from the horizontal coordinate difference and the vertical coordinate difference.
8. The gaze point prediction model establishment apparatus according to claim 7, wherein the sample obtaining module comprises:
a detection and positioning sub-module, configured to perform face detection on each initial image to obtain a face image, locate the facial features in the face image to obtain facial feature frames, and obtain an eye image from the face image based on the eye frame among the facial feature frames;
an eye data obtaining sub-module, configured to acquire, in the initial image, the binocular coordinate data of the two eyes in the eye image, and to take the binocular coordinate data as the eye parameters;
a face parameter obtaining sub-module, configured to obtain the first proportion coefficient of the eye image in the initial image, the second proportion coefficient of the face image in the initial image, and the correction angle and correction scale of the face in the face image, and to take the first proportion coefficient, the second proportion coefficient, the correction angle and the correction scale as the face parameters;
and a sample obtaining sub-module, configured to take the eye image, the face parameters and the eye parameters as the sample data.
9. The gaze point prediction model establishment apparatus according to claim 7, wherein the prediction model obtaining module is further configured to train, with the PyTorch framework and based on the sample data obtained by processing each initial image and the position coordinates on the display of the human eye gaze point in each initial image, to establish the gaze point prediction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910159483.2A CN109901716B (en) | 2019-03-04 | 2019-03-04 | Sight point prediction model establishing method and device and sight point prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109901716A CN109901716A (en) | 2019-06-18 |
CN109901716B true CN109901716B (en) | 2022-08-26 |
Family
ID=66946275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910159483.2A Active CN109901716B (en) | 2019-03-04 | 2019-03-04 | Sight point prediction model establishing method and device and sight point prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109901716B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110969084B (en) * | 2019-10-29 | 2021-03-05 | 深圳云天励飞技术有限公司 | Method and device for detecting attention area, readable storage medium and terminal equipment |
CN116030512B (en) * | 2022-08-04 | 2023-10-31 | 荣耀终端有限公司 | Gaze point detection method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793720A (en) * | 2014-02-12 | 2014-05-14 | 北京海鑫科金高科技股份有限公司 | Method and system for positioning eyes |
KR20180014317A (en) * | 2016-07-29 | 2018-02-08 | 씨티아이코리아 주식회사 | A face certifying method with eye tracking using Haar-Like-Feature |
CN108268850A (en) * | 2018-01-24 | 2018-07-10 | 成都鼎智汇科技有限公司 | A kind of big data processing method based on image |
WO2019033569A1 (en) * | 2017-08-17 | 2019-02-21 | 平安科技(深圳)有限公司 | Eyeball movement analysis method, device and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679447A (en) * | 2017-08-17 | 2018-02-09 | 平安科技(深圳)有限公司 | Facial characteristics point detecting method, device and storage medium |
CN108171152A (en) * | 2017-12-26 | 2018-06-15 | 深圳大学 | Deep learning human eye sight estimation method, equipment, system and readable storage medium storing program for executing |
CN108875524B (en) * | 2018-01-02 | 2021-03-02 | 北京旷视科技有限公司 | Sight estimation method, device, system and storage medium |
CN108681699A (en) * | 2018-05-04 | 2018-10-19 | 上海像我信息科技有限公司 | A kind of gaze estimation method and line-of-sight estimation device based on deep learning |
CN109344714B (en) * | 2018-08-31 | 2022-03-15 | 电子科技大学 | Sight estimation method based on key point matching |
Non-Patent Citations (1)
Title |
---|
Improved frontal face synthesis based on binocular stereo vision; Wang Jing, Su Guangda; Journal of Applied Sciences (《应用科学学报》); 2014-02-28; pp. 74-78 *
Also Published As
Publication number | Publication date |
---|---|
CN109901716A (en) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11551377B2 (en) | Eye gaze tracking using neural networks | |
KR20160048140A (en) | Method and apparatus for generating an all-in-focus image | |
WO2017201144A1 (en) | Fixtureless lensmeter and methods of operating same | |
EP3627822B1 (en) | Focus region display method and apparatus, and terminal device | |
US10942567B2 (en) | Gaze point compensation method and apparatus in display device, and display device | |
CN108074237B (en) | Image definition detection method and device, storage medium and electronic equipment | |
AU2019360254B2 (en) | Fixtureless lensmeter system | |
CN105279473B (en) | Face image correction method and device and face recognition method and system | |
CN110298569B (en) | Learning evaluation method and device based on eye movement recognition | |
US11011140B2 (en) | Image rendering method and apparatus, and VR device | |
CN109901716B (en) | Sight point prediction model establishing method and device and sight point prediction method | |
WO2020018338A1 (en) | Image acquisition method, apparatus, system, device and medium | |
US11585724B2 (en) | Fixtureless lensmeter system | |
CN115965653B (en) | Light spot tracking method and device, electronic equipment and storage medium | |
JP2017049426A (en) | Phase difference estimation device, phase difference estimation method, and phase difference estimation program | |
CN113920502A (en) | Cloud deck adjusting method, device, equipment and medium | |
KR20190079503A (en) | Apparatus and method for registering face posture for face recognition | |
US9317770B2 (en) | Method, apparatus and terminal for detecting image stability | |
CN110807403B (en) | User identity identification method and device and electronic equipment | |
CN112114659A (en) | Method and system for determining a fine point of regard for a user | |
EP3142347B1 (en) | Method and device for obtaining high resolution images from low resolution image sensors | |
CN109587403A (en) | A kind of shooting bootstrap technique and device | |
KR20200017576A (en) | Deep neural network based object detecting apparatus and method | |
US20230342976A1 (en) | Interpupillary distance estimation method | |
CN115147413A (en) | Ghost image detection method, device and equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||