CN116012459A - Mouse positioning method based on three-dimensional sight estimation and screen plane estimation

Mouse positioning method based on three-dimensional sight estimation and screen plane estimation

Info

Publication number
CN116012459A
Authority
CN
China
Prior art keywords
target
camera
image
screen
translation matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211675699.2A
Other languages
Chinese (zh)
Inventor
王孝文
张越一
熊志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Advanced Technology University of Science and Technology of China
Original Assignee
Institute of Advanced Technology University of Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Advanced Technology, University of Science and Technology of China
Priority claimed from application CN202211675699.2A
Publication of CN116012459A
Legal status: Pending


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a mouse positioning method based on three-dimensional sight estimation and screen plane estimation, and belongs to the technical field of image processing. The mouse positioning method based on three-dimensional sight estimation and screen plane estimation comprises the following steps: determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera; determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image and a user color image acquired by a first camera; determining first intersection point coordinates of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera; determining a second intersection point coordinate based on the first intersection point coordinate and the target rotation translation matrix; and controlling the movement of the mouse in the target screen based on the second intersection point coordinates.

Description

Mouse positioning method based on three-dimensional sight estimation and screen plane estimation
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a mouse positioning method based on three-dimensional sight estimation and screen plane estimation.
Background
Sight estimation is a technology for acquiring a subject's current gazing direction by means of various detection approaches such as mechanical, optical and camera-based methods, and is widely applied in many fields such as human-computer interaction, assisted driving, psychological research, virtual reality and the military. Existing mouse positioning methods based on sight estimation are mainly realized by 3D modeling of the iris, pupil, Purkinje spot and the like of the human eye in images; such methods require a high-resolution camera and an additional light source and therefore place high demands on the hardware system. In addition, the geometric relationships obtained from the images are not very accurate, which greatly affects positioning precision and accuracy.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a mouse positioning method based on three-dimensional sight line estimation and screen plane estimation, which can improve the precision and accuracy of final positioning.
In a first aspect, the present application provides a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation, the method comprising:
Determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera;
determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, the coordinates of the target point being determined by a user depth image acquired by the first camera and the user color image;
determining a first intersection point coordinate of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinate is a coordinate under a first coordinate system corresponding to the first camera;
determining a second intersection point coordinate based on the first intersection point coordinate and a target rotational translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotational translation matrix is a rotational translation matrix between the first camera and the target screen;
and controlling the mouse in the target screen to move based on the second intersection point coordinates.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, the sight line direction vector is obtained by collecting the user color image, the sight line expression vector is obtained based on the sight line direction vector and the coordinates of the target point, the first intersection point coordinate is determined based on the sight line direction vector and the screen expression vector, the second intersection point coordinate is determined based on the first intersection point coordinate and the target rotation translation matrix, the mouse in the target screen is controlled to move to the second intersection point coordinate, the sight line coordinate can be predicted through the characteristics of the target face in the user color image and the corresponding user depth image in the actual application process, and higher accuracy and precision are achieved, so that the accuracy of a subsequent positioning result is improved; by establishing the target rotation translation matrix between the first camera and the target screen, the method is suitable for coordinate system conversion under any situation so as to realize accurate positioning, widens the application scene, does not need to set other devices such as additional light sources and the like and specific devices, and has lower positioning cost and easy realization.
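For illustration only, the following Python sketch mirrors the five steps above in a single pass. The helper callables estimate_gaze_direction and get_pupil_3d, the plane parameterization and the assumed meaning of the target rotation translation matrix are hypothetical placeholders introduced for this sketch, not limitations of the application.

```python
import numpy as np

def locate_mouse(color_img, depth_img, screen_plane, R_s,
                 estimate_gaze_direction, get_pupil_3d):
    """One hedged pass of the pipeline; helper callables are assumed to exist."""
    # Step 1: gaze direction vector (g_x, g_y, g_z) of the target face.
    g = estimate_gaze_direction(color_img)
    # Step 2: line-of-sight representation = a point on the ray (the pupil) + direction.
    p0 = get_pupil_3d(color_img, depth_img)      # (x0, y0, z0) in the first-camera frame
    # Step 3: first intersection of the gaze ray with the screen plane (camera frame).
    n, d = screen_plane                          # plane written as n . x = d
    lam = (d - n @ p0) / (n @ g)
    x_cam = p0 + lam * g
    # Step 4: second intersection = same point expressed in the screen frame.
    R, T = R_s[:3, :3], R_s[:3, 3]               # assumes R_s is a 4x4 screen->camera transform
    x_screen = R.T @ (x_cam - T)
    # Step 5: the caller moves the mouse to the x-y components of x_screen.
    return x_screen
```

In a deployment, the caller would further convert the returned screen coordinate from metric units to screen pixels before moving the mouse.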
The method for positioning a mouse based on three-dimensional sight estimation and screen plane estimation in one embodiment of the application, wherein the method for determining the sight direction vector of a target face in a user color image based on the user color image comprises the following steps:
image segmentation is carried out on the user color image, and at least one of a face image corresponding to the target face and an eye image corresponding to a target human eye in the target face is obtained; the target human eye includes at least one of a left eye or a right eye;
extracting features of at least one of the face image and the eye image to obtain sight features;
the gaze direction vector is determined based on the gaze feature.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, through image segmentation of the color image of the user and obtaining of the eye image, feature extraction of at least one of the face image and the eye image and obtaining of the sight line feature, and then the sight line direction vector is determined based on the sight line feature, the real-time sight line direction of the user can be comprehensively predicted by combining multiple features such as the user head rotation feature and the eye feature when the sight line estimation is carried out, the precision and the accuracy of the sight line positioning are effectively improved, the mouse positioning method is suitable for any user gesture, and the positioning range is widened.
The method for positioning a mouse based on three-dimensional sight line estimation and screen plane estimation according to one embodiment of the present application, wherein the feature extraction is performed on at least one of the face image and the eye image, and sight line features are obtained, and the method comprises the following steps:
extracting features of the facial image to obtain head rotation features;
extracting features of the eye images to obtain eye features;
and fusing the head rotation feature and the eye feature to obtain the sight feature.
According to the mouse positioning method based on three-dimensional sight line estimation and screen plane estimation, the feature extraction is carried out on the face image and the eye image respectively to obtain the head rotation feature and the eye feature, the head rotation feature and the eye feature are fused to obtain the sight line feature, the real-time sight line direction of a user can be predicted according to the head rotation of the user when the sight line estimation is carried out, and the sight line estimation accuracy is further improved.
In one embodiment of the present application, before the determining, based on the user color image, a line-of-sight direction vector of a target face in the user color image, the method for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation further includes:
Acquiring a first rotation translation matrix between the first camera and the second camera, and acquiring a second rotation translation matrix between the second camera and the target screen;
the target rotational translation matrix is determined based on the first rotational translation matrix and the second rotational translation matrix.
According to the method for positioning the mouse based on three-dimensional sight estimation and screen plane estimation in one embodiment of the application, a first rotation translation matrix RT1 and a second rotation translation matrix RT2 are acquired to obtain a target rotation translation matrix R_s, thereby establishing a coordinate conversion relationship between the first camera and the target screen; in the actual execution process, the predicted first intersection point coordinate can be converted into the mouse position coordinate in the target screen based on the target rotation translation matrix R_s, so the method is suitable for the sight direction vector output by any general model, the model does not need to be trained separately for a specific application scene, and the method has wide application scenarios and high universality.
The method for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation according to one embodiment of the present application, wherein the steps of obtaining a first rotation translation matrix between the first camera and the second camera, and obtaining a second rotation translation matrix between the second camera and the target screen include:
Acquiring a first image corresponding to a first calibration plate acquired by the first camera and a second image corresponding to a second calibration plate acquired by the second camera; the first calibration plate is a first surface of a double-sided calibration plate, the second calibration plate is a second surface of the double-sided calibration plate, and the fields of view of the first camera and the second camera are opposite;
determining a third rotational translation matrix between the first camera and the first calibration plate based on the first image;
determining a fourth rotational translation matrix between the second camera and the second calibration plate based on the second image;
acquiring a first origin position of the first calibration plate based on the first image;
acquiring a second origin position of the second calibration plate based on the second image;
determining a fifth rotational translation matrix between the first calibration plate and the second calibration plate based on the first origin position, the second origin position, and the thickness of the double-sided calibration plate;
the first rotational translation matrix is determined based on the third rotational translation matrix, the fourth rotational translation matrix, and the fifth rotational translation matrix.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, the position relation between the first camera and the second camera can be calibrated by determining the first rotation translation matrix RT1 based on the third rotation translation matrix, the fourth rotation translation matrix and the fifth rotation translation matrix, so that the coordinate conversion relation between the first camera and the target screen can be conveniently established subsequently, the mouse position coordinate in the target screen can be obtained, the mouse positioning method is suitable for sight direction vectors output by any general model, independent training models based on specific application scenes are not needed, and the mouse positioning method has wide application scenes and high universality.
The method for obtaining a second rotation translation matrix between the second camera and the target screen based on three-dimensional line-of-sight estimation and screen plane estimation according to one embodiment of the application comprises the following steps:
acquiring a third image corresponding to a third calibration plate acquired by the second camera, wherein the third calibration plate is a calibration plate displayed by the target screen;
based on the third image, the second rotational translation matrix is determined.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, the position relationship between the second camera and the target screen can be calibrated by acquiring the third image corresponding to the third calibration plate acquired by the second camera and determining the second rotation translation matrix RT2 based on the third image, so that the coordinate conversion relationship between the first camera and the target screen can be conveniently established subsequently, the mouse position coordinate in the target screen can be obtained, the mouse positioning method is suitable for sight direction vectors output by any general model, the model is not required to be trained independently based on specific application scenes, and the mouse positioning method has wide application scenes and high universality.
The method for positioning a mouse based on three-dimensional sight estimation and screen plane estimation in one embodiment of the application, wherein the method for determining the sight direction vector of a target face in a user color image based on the user color image comprises the following steps:
Performing feature recognition on the user color image to obtain at least one face image;
and determining the largest face image in the at least one face image as the target face.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, at least one face image is obtained through feature recognition on the user color image, the largest face image in the at least one face image is determined to be the target face, the target face can be accurately screened out under the condition that a plurality of faces exist in the acquired user color image, interference of a background environment on a follow-up sight prediction result is eliminated, and therefore accuracy of the prediction result is improved.
The method for positioning a mouse based on three-dimensional sight estimation and screen plane estimation in one embodiment of the application, wherein the method for determining the sight direction vector of a target face in a user color image based on the user color image comprises the following steps:
inputting the user color image into a target neural network, and obtaining the sight direction vector output by the target neural network;
the target neural network is obtained by training by taking a sample user color image as a sample and taking a sample sight direction vector corresponding to the sample user color image as a sample label.
According to the mouse positioning method based on three-dimensional sight line estimation and screen plane estimation, the user color image is input into the target neural network trained in advance to obtain the sight line direction vector output by the target neural network, the data can be directly obtained after the pre-training is carried out before the use in practical application, and the calculation efficiency is high and the accuracy is good; the target neural network has strong learning capability, and data in each application process can be used as training data in the next training process, so that the precision and accuracy of the model are improved, the use of a user is facilitated, the positioning range is widened, the universality is higher, and the precision of final sight estimation is improved.
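As one hedged illustration of such a pre-trained target neural network, the sketch below trains a small convolutional regressor from sample user color images to sample gaze direction vector labels. The backbone, loss function and hyperparameters are assumptions made for the example and are not specified by the application.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class GazeNet(nn.Module):
    """Assumed minimal backbone mapping a color image to a unit gaze direction vector."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # (g_x, g_y, g_z)

    def forward(self, x):
        g = self.head(self.backbone(x))
        return g / g.norm(dim=1, keepdim=True)

def train(model, dataset, epochs=10):
    """Sample user color images as samples, sample gaze direction vectors as labels."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for color_img, gaze_label in loader:
            loss = nn.functional.mse_loss(model(color_img), gaze_label)
            opt.zero_grad()
            loss.backward()
            opt.step()
```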
In a second aspect, the present application provides an apparatus for mouse positioning based on three-dimensional gaze estimation and screen plane estimation, the apparatus comprising:
the first processing module is used for determining a sight line direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera;
the second processing module is used for determining a sight line representation vector of a target face based on the sight line direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image acquired by the first camera and the user color image;
The third processing module is used for determining first intersection point coordinates of the sight line representation vector and the screen representation vector of the target screen based on the sight line representation vector and the screen representation vector, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera;
the fourth processing module is used for determining a second intersection point coordinate based on the first intersection point coordinate and a target rotation translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotation translation matrix is a rotation translation matrix between the first camera and the target screen;
and a fifth processing module, configured to control movement of the mouse in the target screen based on the second intersection point coordinate.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, the sight line direction vector is obtained by collecting the user color image, the sight line expression vector is obtained based on the sight line direction vector and the coordinates of the target point, the first intersection point coordinate is determined based on the sight line expression vector and the screen expression vector, the second intersection point coordinate is determined based on the first intersection point coordinate and the target rotation translation matrix, the mouse in the target screen is controlled to move to the second intersection point coordinate, the sight line coordinate can be predicted through the characteristics of the target face in the user color image and the corresponding user depth image in the actual application process, and higher accuracy and precision are achieved, so that the accuracy of a subsequent positioning result is improved; by establishing the target rotation translation matrix between the first camera and the target screen, the method is suitable for coordinate system conversion under any situation so as to realize accurate positioning, widens the application scene, does not need to set other devices such as additional light sources and the like and specific devices, and has lower positioning cost and easy realization.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation as described in the first aspect above when executing the computer program.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation as described in the first aspect above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation as described in the first aspect above.
The above technical solutions in the embodiments of the present application have at least one of the following technical effects:
the method comprises the steps of determining a sight line direction vector by collecting a user color image to obtain the sight line direction vector, obtaining a sight line representation vector based on the sight line direction vector and coordinates of a target point, determining a first intersection point coordinate based on the sight line representation vector and a screen representation vector, determining a second intersection point coordinate based on the first intersection point coordinate and a target rotation translation matrix, and controlling a mouse in a target screen to move to the second intersection point coordinate, wherein the sight line coordinate can be predicted through characteristics of a target face in the user color image and corresponding user depth images in the actual application process, so that higher accuracy and precision are realized, and the accuracy of a follow-up positioning result is improved; by establishing the target rotation translation matrix between the first camera and the target screen, the method is suitable for coordinate system conversion under any situation so as to realize accurate positioning, widens the application scene, does not need to set other devices such as additional light sources and the like and specific devices, and has lower positioning cost and easy realization.
Further, by performing image segmentation on the color image of the user and obtaining an eye image, extracting at least one of the face image and the eye image, obtaining the sight line feature, and determining the sight line direction vector based on the sight line feature, the real-time sight line direction of the user can be comprehensively predicted by combining multiple features such as the user head rotation feature and the eye feature when the sight line estimation is performed, the precision and the accuracy of sight line positioning are effectively improved, the method is suitable for any user gesture, and the positioning range is widened.
Further, a first rotation translation matrix RT1 and a second rotation translation matrix RT2 are acquired to obtain the target rotation translation matrix R_s, thereby establishing a coordinate conversion relationship between the first camera and the target screen; in the actual execution process, the predicted first intersection point coordinate can be converted into the mouse position coordinate in the target screen based on the target rotation translation matrix R_s, so the method is suitable for the sight direction vector output by any general model, the model does not need to be trained separately for a specific application scene, and the method has wide application scenarios and high universality.
Still further, the user color image is input into the pre-trained target neural network to obtain the sight direction vector output by the target neural network; the result can be obtained directly after pre-training before use in practical application, with high calculation efficiency and good accuracy. The target neural network has strong learning capability, and the data from each application can be used as training data for the next round of training, thereby improving the precision and accuracy of the model, facilitating use by the user, widening the positioning range, achieving higher universality, and improving the precision of the final sight estimation.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is one of the flow diagrams of a method for mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by embodiments of the present application;
FIG. 2 is a second flow chart of a method for mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by an embodiment of the present application;
FIG. 3 is a third flow chart of a method for mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by embodiments of the present application;
FIG. 4 is a schematic diagram of a method for mouse positioning based on three-dimensional gaze estimation and screen plane estimation according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a second method for mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a mouse positioning device based on three-dimensional gaze estimation and screen plane estimation provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequence or chronological order. It is to be understood that the terms so used may be interchanged where appropriate, so that embodiments of the present application can be implemented in sequences other than those illustrated or described herein; moreover, the objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited, e.g. the first object may be one or more than one. Furthermore, in the description and claims, "and/or" denotes at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
The method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation of the present application is described below in conjunction with fig. 1 to 5.
It should be noted that, the main body of execution of the mouse positioning method based on the three-dimensional line-of-sight estimation and the screen plane estimation may be a server, or may be a device for mouse positioning based on the three-dimensional line-of-sight estimation and the screen plane estimation, or may also be a terminal of a user, including but not limited to a mobile terminal and a non-mobile terminal.
For example, mobile terminals include, but are not limited to, mobile phones, PDA smart terminals, tablet computers, vehicle-mounted smart terminals, and the like; non-mobile terminals include, but are not limited to, PCs and the like.
As shown in fig. 1, the method for positioning a mouse based on three-dimensional sight line estimation and screen plane estimation comprises the following steps: step 110, step 120, step 130, step 140 and step 150.
Step 110, determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera.
In this step, the first camera may be an RGB-D camera.
A user color image is captured by a first camera.
A user color image is an image that contains user features such as facial features or eye features.
The user color image may include one face or may include a plurality of faces, which is not limited in this application.
The target face is the face used for line-of-sight estimation in the user color image.
In the case where the user color image includes a plurality of faces, the target face may be determined based on a random algorithm; the target face may be determined either by user input to determine the face selected by the user as the target face, or may be determined based on a system default algorithm, or may also be determined based on the following.
In some embodiments, step 110 may include:
performing feature recognition on the color image of the user to obtain at least one face image;
and determining the largest face image in the at least one face image as a target face.
In this step, as shown in fig. 2, when a plurality of faces exist in the acquired user color image, face feature recognition may be performed on the user color image to obtain at least part of face features contained in the user color image, and region images (i.e., face images) where the faces are located are obtained by respectively dividing based on the face features, then the pixel region sizes of the face images are compared, and the face image with the largest pixel region is determined as the target face, so as to eliminate interference of the background environment on the subsequent sight prediction result, thereby improving accuracy of the prediction result.
For example, preselection boxes with different sizes and different aspect ratios can be generated from the user color image, on the order of 10k to 100k preselection boxes in total;
the category score of each preselection box is calculated, and the preselection boxes are corrected;
preselection boxes with category scores below a target threshold are filtered out based on the category scores, thereby determining the preselection boxes corresponding to faces;
when the remaining preselection boxes cover a plurality of faces, non-maximum suppression (NMS) de-duplication is performed on the preselection boxes to determine the preselection box corresponding to the target face, thereby determining the target face; a minimal sketch of this selection procedure is given below.
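The sketch below assumes the detector has already produced candidate boxes and category scores; the score threshold, IoU threshold and the plain NMS routine are illustrative choices, not the application's prescribed detector.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Plain non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter + 1e-9)
        order = order[1:][iou < iou_thr]
    return keep

def select_target_face(boxes, scores, score_thr=0.5):
    """Filter by score, deduplicate with NMS, then keep the face with the largest pixel area."""
    mask = scores >= score_thr
    boxes, scores = boxes[mask], scores[mask]
    keep = nms(boxes, scores)
    areas = (boxes[keep, 2] - boxes[keep, 0]) * (boxes[keep, 3] - boxes[keep, 1])
    return boxes[keep][int(np.argmax(areas))]
```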
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, at least one face image is obtained through feature recognition on the user color image, the largest face image in the at least one face image is determined to be the target face, the target face can be accurately screened out under the condition that a plurality of faces exist in the acquired user color image, interference of a background environment on a follow-up sight prediction result is eliminated, and therefore accuracy of the prediction result is improved.
The gaze direction vector is used to characterize the gaze direction of the eyes in the target face in the user color image.
In the actual execution process, the first camera can be used for shooting the image of the user in front of the target screen in real time, and the color image of the user at each moment is acquired.
For example, the gaze direction vector in the target face may be determined based on the user color image based on modeling or employing a deep learning method, which is not limited in this application.
In some embodiments, step 110 may further comprise:
image segmentation is carried out on the color image of the user, and at least one of a face image corresponding to the target face and an eye image corresponding to the target human eye in the target face is obtained; the target human eye comprises at least one of a left eye or a right eye;
extracting features of at least one of the face image and the eye image to obtain sight features;
based on the gaze characteristics, a gaze direction vector is determined.
In this embodiment, the face image is an image including a target face.
The eye image is an image including the target human eye.
The target human eye may include a left eye, or may include a right eye, or may also include a left eye and a right eye.
The line-of-sight feature is derived based on at least one of facial image features and eye image features.
When the user color image is subjected to image segmentation, a face image corresponding to a target face can be acquired, or an eye image corresponding to a target eye in the target face can be acquired, or a face image corresponding to the target face and an eye image corresponding to the target eye in the target face can be acquired, which is not limited in the application.
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, provided by the embodiment of the application, the user color image is subjected to image segmentation to obtain the eye image, at least one of the face image and the eye image is subjected to feature extraction to obtain the sight feature, and then the sight direction vector is determined based on the sight feature, so that the real-time sight direction of the user can be comprehensively predicted by combining multiple features such as the user head rotation feature and the eye feature when the sight estimation is performed, the precision and the accuracy of sight positioning are effectively improved, the mouse positioning method is suitable for any user gesture, and the positioning range is widened.
Referring now to fig. 3, in some embodiments, feature extraction of at least one of a facial image and an eye image to obtain a gaze feature may include:
extracting features of the facial image to obtain head rotation features;
extracting features of the eye images to obtain eye features;
and fusing the head rotation characteristic and the eye characteristic to obtain the sight line characteristic.
In this embodiment, the head rotation feature is a feature extraction based on the face image, and is used to characterize the head rotation condition of the target face.
The eye features are obtained by extracting features based on eye images and are used for representing the interaction condition of the left eye and the right eye of a user.
The line-of-sight feature is derived based on at least one of facial image features and eye image features.
And when the head rotation characteristic and the eye characteristic are fused, the related information is fused, and redundant information is removed, so that the sight line characteristic is obtained.
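A hedged sketch of one possible fusion operator follows; concatenation followed by a small MLP is only an assumed realization of "fusing related information and removing redundant information", since the application does not fix the fusion operator.

```python
import torch
import torch.nn as nn

class GazeFusion(nn.Module):
    """Assumed fusion scheme: concatenate head-rotation and eye features, then project
    to a joint line-of-sight feature; the MLP can learn to discard redundant information."""
    def __init__(self, head_dim=128, eye_dim=128, out_dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(head_dim + 2 * eye_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, head_feat, left_eye_feat, right_eye_feat):
        return self.fuse(torch.cat([head_feat, left_eye_feat, right_eye_feat], dim=1))
```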
The inventor finds that in the research and development process, a method for determining the direction of the line of sight by performing 3D modeling on the iris, the pupil and the like in the image based on a traditional geometric algorithm exists in the related technology, but the method needs a high-resolution camera and an additional light source, has limitation on the rotation angle of the head of a user, and has high cost and poor accuracy of a prediction result.
In the method, the head rotation characteristics are extracted to determine the sight line characteristics, so that the influence of the rotation angle of the head on the direction of the sight line is effectively considered, and the precision of sight line estimation is improved;
the eye characteristics of the left eye and the right eye are extracted, the viewing directions of the left eye and the right eye are combined with the characteristic of correlation, the viewing directions of the two eyes are determined together, and the accuracy of the viewing estimation is improved by improving the correlation between the left eye characteristics and the right eye characteristics;
In addition, the head rotation feature is used as guide information to obtain more accurate eye features, and then the head rotation feature and the eye features are fused to obtain the sight line feature, so that the accuracy and the accuracy of sight line estimation can be further improved, an additional light source is not required to be arranged, the requirement on hardware is low, and the cost is low.
According to the mouse positioning method based on three-dimensional vision estimation and screen plane estimation, the feature extraction is carried out on the face image and the eye image respectively to obtain the head rotation feature and the eye feature, the head rotation feature and the eye feature are fused to obtain the vision feature, the real-time vision direction of the user can be predicted according to the head rotation of the user when the vision estimation is carried out, and the vision estimation precision is further improved.
In some embodiments, prior to step 110, the method may further comprise: performing normalization processing on the acquired user color image.
In the actual implementation process, the pose of the user and the distance between the user and the first camera are uncontrollable; by performing normalization processing on the user color image, the physical movement of the camera is simulated through an image transformation, so that the transformed user color image approximates the image that would be captured after the first camera had physically moved.
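The sketch below shows one common normalization recipe of this kind: the image is warped so that a virtual camera at a canonical distance looks straight at the face center. The specific rotation, scale and canonical parameters are assumptions made for illustration, as the application does not spell out the transform.

```python
import cv2
import numpy as np

def normalize_face(color_img, camera_matrix, face_center_3d,
                   focal_norm=960.0, distance_norm=600.0, size=(224, 224)):
    """Warp the image so a virtual camera looks at the face center from a fixed distance."""
    dist = np.linalg.norm(face_center_3d)
    z_axis = face_center_3d / dist                      # virtual camera looks at the face
    x_axis = np.cross(np.array([0.0, 1.0, 0.0]), z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis])              # rotation of the virtual camera
    S = np.diag([1.0, 1.0, distance_norm / dist])       # scale to the canonical distance
    C_norm = np.array([[focal_norm, 0, size[0] / 2],
                       [0, focal_norm, size[1] / 2],
                       [0, 0, 1.0]])
    W = C_norm @ S @ R @ np.linalg.inv(camera_matrix)   # image-to-image warp
    return cv2.warpPerspective(color_img, W, size)
```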
And 120, determining a sight line representation vector of the target face based on the sight line direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by the user depth image acquired by the first camera and the user color image.
In this step, the line-of-sight representation vector is obtained based on the line-of-sight direction vector and the coordinates of the target point.
The target point is the point on the ray where the user's line of sight is located.
For example, the target point may be a pupil of the user and the gaze-representative vector may be determined based on the gaze-direction vector and coordinates at the pupil of the user. After the sight line expression vector is obtained, the intersection point coordinate between the sight line expression vector and the screen expression vector is calculated, and the first intersection point coordinate can be determined.
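A hedged sketch of recovering the target point in this way is given below; the pupil pixel is assumed to come from a face landmark detector, and the depth image is assumed to be aligned with the color image.

```python
import numpy as np

def pupil_3d_from_depth(pupil_px, depth_img, camera_matrix):
    """Back-project the detected pupil pixel into the first (RGB-D) camera frame."""
    u, v = pupil_px
    z0 = float(depth_img[int(v), int(u)])          # depth at the pupil (z-axis coordinate)
    fx, fy = camera_matrix[0, 0], camera_matrix[1, 1]
    cx, cy = camera_matrix[0, 2], camera_matrix[1, 2]
    x0 = (u - cx) * z0 / fx                        # pinhole back-projection
    y0 = (v - cy) * z0 / fy
    return np.array([x0, y0, z0])
```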
And 130, determining a first intersection point coordinate of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinate is a coordinate under a first coordinate system corresponding to the first camera.
In this step, the target screen is a screen at the time of human-computer interaction with the user, that is, a screen facing the user.
The screen representation vector is used to characterize the location of the plane in which the target screen lies.
The first coordinate system is a coordinate system taking the first camera as a reference system.
The first intersection point coordinates are coordinates in a first coordinate system corresponding to the first camera, and are obtained based on intersection points of the sight line representation vector and the screen representation vector.
It will be appreciated that, during human-computer interaction, the intersection point can serve as the user's control point on the screen, for example corresponding to the mouse position on the screen.
And 140, determining a second intersection point coordinate based on the first intersection point coordinate and the target rotation translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen.
In this step, the target rotation translation matrix R_s is the rotation translation matrix between the first camera and the target screen.
The second coordinate system is a coordinate system taking the target screen as a reference system.
The second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen and is obtained based on the first intersection point coordinate and the target rotation translation matrix.
The inventor also found during research and development that the related art includes methods that use deep learning to directly obtain the mapping between the eyes in a user image and the viewed position on the screen; however, such methods rely on specific equipment and, when the acquired data differ, a separate model needs to be retrained, so their universality is low and the procedure is cumbersome.
In the present application, the position conversion relationship between the first camera and the target screen is established by constructing the target rotation translation matrix; in actual use, the acquired first intersection point coordinate can be converted into the second intersection point coordinate corresponding to the current actual target screen simply by adapting the target rotation translation matrix to the actual situation, without special equipment and without training a separate model, so the method is general, easy to implement and low in cost.
And 150, controlling the movement of the mouse in the target screen based on the second intersection point coordinates.
In this step, the second intersection point coordinate is the position of the mouse. In an actual implementation process, a first camera acquires a user image, wherein the user image comprises a user color image and a user depth image, and a sight direction vector of a target face in the user color image is determined based on the user color image.
The gaze direction vector may be expressed as (g_x, g_y, g_z), where g_x, g_y and g_z are the x-axis, y-axis and z-axis components of the gaze direction vector.
The pupil position is determined through face feature point detection, which gives the x-axis and y-axis coordinates of the pupil in the first camera coordinate system; after the user depth image and the user color image are aligned, the depth at the pupil (i.e. the z-axis coordinate of the pupil in the first camera coordinate system) is obtained from the user depth image, so that the three-dimensional coordinates of the pupil in the RGB-D camera coordinate system (i.e. the first coordinate system corresponding to the first camera) are obtained as (x_0, y_0, z_0).
Based on the gaze direction vector and the three-dimensional coordinates (x_0, y_0, z_0) at the pupil, the line-of-sight representation vector (the line along which the sight lies) may be expressed as:
(x - x_0) / g_x = (y - y_0) / g_y = (z - z_0) / g_z,
where x, y and z are variables, x_0, y_0 and z_0 are the three-dimensional coordinates at the pupil, and g_x, g_y and g_z are the components of the gaze direction vector.
The normal vector of the target screen is the z-axis of the target screen coordinate system, so the normal vector of the target screen can be expressed as:
n = R_s[:, 2] = (n_x, n_y, n_z),
where n is the normal vector of the target screen, R_s is the target rotation translation matrix, and n_x, n_y and n_z are the x-axis, y-axis and z-axis components of the normal vector of the target screen.
Taking a point T_s = [t_x, t_y, t_z] in the plane of the target screen, the point-normal form of the plane equation gives the equation of the plane in which the target screen lies (i.e. the screen representation vector of the target screen):
n_x x + n_y y + n_z z = n_x t_x + n_y t_y + n_z t_z,
where x, y and z are variables, n_x, n_y and n_z are the components of the normal vector of the target screen, and t_x, t_y and t_z are the coordinates of the point T_s.
As shown in FIG. 4, the line-of-sight expression and the equation of the plane of the target screen are solved simultaneously to obtain the first intersection point coordinate; the second intersection point coordinate is then determined based on the first intersection point coordinate and the target rotation translation matrix R_s; finally, based on the second intersection point coordinate, mouse movement authority is acquired and the mouse is moved to the second intersection point coordinate.
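A hedged numerical sketch of this joint solution is given below. It treats R_s as a 3×3 rotation whose third column is the screen normal, with an accompanying translation T_s giving a point of the screen plane in the first camera frame; that convention is an assumption made for the example, not the application's definition.

```python
import numpy as np

def gaze_screen_point(p0, g, R_s, T_s):
    """Intersect the gaze line (x-x0)/g_x = (y-y0)/g_y = (z-z0)/g_z with the screen
    plane, then express the intersection in the screen coordinate system."""
    n = R_s[:, 2]                        # screen normal = z-axis of the screen frame
    lam = n @ (T_s - p0) / (n @ g)       # solve n . (p0 + lam * g) = n . T_s
    p_cam = p0 + lam * g                 # first intersection point (first-camera frame)
    p_screen = R_s.T @ (p_cam - T_s)     # second intersection point (screen frame)
    return p_cam, p_screen
```

In a deployment, p_screen would additionally be converted from metric units to screen pixels before the mouse is moved.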
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation, the sight line direction vector is obtained by collecting the user color image, the sight line expression vector is obtained based on the sight line direction vector and the coordinates of the target point, the first intersection point coordinate is determined based on the sight line expression vector and the screen expression vector, the second intersection point coordinate is determined based on the first intersection point coordinate and the target rotation translation matrix, the mouse in the target screen is controlled to move to the second intersection point coordinate, the sight line coordinate can be predicted through the characteristics of the target face in the user color image and the corresponding user depth image in the actual application process, and higher accuracy and precision are achieved, so that the accuracy of a subsequent positioning result is improved; by establishing the target rotation translation matrix between the first camera and the target screen, the method is suitable for coordinate system conversion under any situation so as to realize accurate positioning, widens the application scene, does not need to set other devices such as additional light sources and the like and specific devices, and has lower positioning cost and easy realization.
The manner of determining the target rotational translation matrix is described below.
In some embodiments, prior to step 110, the method may further comprise:
acquiring a first rotation translation matrix between the first camera and the second camera, and acquiring a second rotation translation matrix between the second camera and the target screen;
a target rotational translation matrix is determined based on the first rotational translation matrix and the second rotational translation matrix.
In this embodiment, the first rotational translation matrix RT1 is used to convert coordinate systems corresponding to the first camera and the second camera.
The second rotation translation matrix RT2 is used for converting a coordinate system corresponding to the second camera and the target screen.
The target rotation translation matrix R_s is the rotation translation matrix between the first camera and the target screen and is used for converting between the coordinate systems of the first camera and the target screen; in practical application, the position of the mouse is determined based on the target rotation translation matrix R_s.
In the actual implementation process, the target rotation translation matrix R_s can be obtained from the following formula:
R_s = RT1 * RT2,
where R_s is the target rotation translation matrix, RT1 is the first rotation translation matrix, and RT2 is the second rotation translation matrix.
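As a small illustration, the two calibration results can be composed as 4×4 homogeneous transforms; treating RT1 and RT2 as such matrices in compatible conventions is an assumption of this sketch, while the product itself follows the formula above.

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a 3x3 rotation and a translation vector into a 4x4 [R|t] transform."""
    RT = np.eye(4)
    RT[:3, :3] = R
    RT[:3, 3] = np.asarray(t).ravel()
    return RT

def target_rotation_translation(RT1, RT2):
    """R_s = RT1 * RT2, composing the camera-to-camera and camera-to-screen transforms."""
    return RT1 @ RT2
```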
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation provided by the embodiment of the application, the first rotation translation matrix RT1 and the second rotation translation matrix RT2 are acquired to obtain the target rotation translation matrix R_s, thereby establishing the coordinate conversion relationship between the first camera and the target screen; in the actual execution process, the predicted first intersection point coordinate can be converted into the mouse position coordinate in the target screen based on the target rotation translation matrix R_s, so the method is suitable for the sight direction vector output by any general model, the model does not need to be trained separately for a specific application scene, and the method has wide application scenarios and high universality.
Referring now to fig. 5, in some embodiments, acquiring a first rotational translation matrix between a first camera and a second camera and acquiring a second rotational translation matrix between the second camera and a target screen may include:
acquiring a first image corresponding to a first calibration plate acquired by a first camera and a second image corresponding to a second calibration plate acquired by a second camera; the first calibration plate is a first surface of the double-sided calibration plate, the second calibration plate is a second surface of the double-sided calibration plate, and the view fields of the first camera and the second camera are opposite;
determining a third rotational translation matrix between the first camera and the first calibration plate based on the first image;
determining a fourth rotational translation matrix between the second camera and the second calibration plate based on the second image;
Acquiring a first origin position of a first calibration plate based on the first image;
acquiring a second origin position of a second calibration plate based on the second image;
determining a fifth rotational translation matrix between the first calibration plate and the second calibration plate based on the first origin position, the second origin position and the thickness of the double-sided calibration plate;
the first rotational translation matrix is determined based on the third rotational translation matrix, the fourth rotational translation matrix, and the fifth rotational translation matrix.
In this embodiment, the first calibration plate is a first side of a double sided calibration plate.
The second calibration plate is a second surface of the double-surface calibration plate.
The first calibration plate and the second calibration plate may be considered to be placed approximately in parallel.
The first image is an image corresponding to a first calibration plate acquired by the first camera.
The second image is an image corresponding to a second calibration plate acquired by the second camera.
The double-sided calibration plate is placed between the first camera and the second camera.
Third rotation translation matrix
Figure BDA0004017011200000141
For converting the coordinate relationship between the first camera and the first calibration plate.
Fourth rotation translation matrix
Figure BDA0004017011200000142
For converting the coordinate relationship between the second camera and the second calibration plate.
The first origin position is a center position of the first calibration plate, and can be obtained based on calibration results of the first camera and the first calibration plate.
The second origin position is a center position of the second calibration plate and can be obtained based on calibration results of the second camera and the second calibration plate.
Fifth rotation translation matrix
Figure BDA0004017011200000143
The method is obtained based on the first origin position, the second origin position and the thickness of the double-sided calibration plate and is used for converting the coordinate relationship between the first calibration plate and the second calibration plate.
First rotation translation matrix
Figure BDA0004017011200000144
For a third rotational translation matrix->
Figure BDA0004017011200000145
Fourth rotational translation matrix->
Figure BDA0004017011200000146
And a fifth rotational translation matrix->
Figure BDA0004017011200000147
And the obtained coordinate relation is used for converting the coordinate relation between the first camera and the second camera.
In the actual implementation process, the first camera is used to capture the first calibration plate to obtain a first image, and based on the first image, a Zhang Zhengyou calibration method can be used to obtain a third rotational translation matrix between the first camera and the first calibration plate
Figure BDA0004017011200000148
Then a second camera is used to shoot a second calibration plate to obtain a second image, and a Zhang Zhengyou calibration method can be used to obtain a fourth rotary translation matrix between the second camera and the second calibration plate based on the second image
Figure BDA0004017011200000149
Based on the calibration results of the first camera and the first calibration plate, a first origin position of the first calibration plate is obtained; acquiring a second origin position of the second calibration plate based on calibration results of the second camera and the second calibration plate; based on the first origin position, the second origin position and the thickness of the double-sided calibration plate, a fifth rotary translation matrix between the first calibration plate and the second calibration plate is obtained
Figure BDA00040170112000001410
The first rotation translation matrix RT1 between the first camera and the second camera can then be obtained by the following formula, in which each rotation translation matrix is taken as the homogeneous transform from calibration plate coordinates to the corresponding camera coordinates:

RT1 = (RT4 · RT5) · RT3^(-1)

wherein RT4 · RT5 is the rotation translation matrix between the second camera and the first calibration plate, RT4 is the fourth rotation translation matrix, RT5 is the fifth rotation translation matrix, RT3 is the third rotation translation matrix (its inverse maps first camera coordinates back to first calibration plate coordinates), and RT1 is the first rotation translation matrix.
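A minimal NumPy sketch of this composition is given below. It assumes each rotation translation matrix is available as a 4x4 homogeneous transform mapping calibration plate coordinates into the corresponding camera coordinates (the convention produced by Zhang Zhengyou calibration); the helper names and the placeholder values in the usage example are illustrative assumptions, not part of the original method.

```python
import numpy as np

def rt_to_homogeneous(R, t):
    """Stack a 3x3 rotation R and a length-3 translation t into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return T

def first_rotation_translation(RT3, RT4, RT5):
    """Compose RT1 (first camera -> second camera) from RT3, RT4 and RT5.

    RT3: first calibration plate  -> first camera   (Zhang calibration of camera 1)
    RT4: second calibration plate -> second camera  (Zhang calibration of camera 2)
    RT5: first calibration plate  -> second calibration plate
         (built from the two origin positions and the plate thickness)
    """
    RT_c2_b1 = RT4 @ RT5                   # second camera <- first calibration plate
    return RT_c2_b1 @ np.linalg.inv(RT3)   # second camera <- first camera

# Placeholder usage with toy transforms (identity rotations, simple offsets).
RT3 = rt_to_homogeneous(np.eye(3), [0.0, 0.0, 0.8])
RT4 = rt_to_homogeneous(np.eye(3), [0.0, 0.0, 0.6])
RT5 = rt_to_homogeneous(np.eye(3), [0.0, 0.0, 0.02])   # e.g. a 2 cm plate thickness
RT1 = first_rotation_translation(RT3, RT4, RT5)
```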
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation provided by the embodiment of the application, the position relationship between the first camera and the second camera can be calibrated by determining the first rotation translation matrix RT1 based on the third rotation translation matrix, the fourth rotation translation matrix and the fifth rotation translation matrix, so that the coordinate conversion relationship between the first camera and the target screen can be conveniently established subsequently and the mouse position coordinate in the target screen can be obtained. The method is therefore suitable for the sight direction vector output by any general model, does not require a model to be trained separately for a specific application scene, and has wide application scenes and high universality.
In some embodiments, acquiring a second rotational translation matrix between a second camera and a target screen may include:
acquiring a third image corresponding to a third calibration plate acquired by a second camera, wherein the third calibration plate is a calibration plate displayed by a target screen;
A second rotational translation matrix is determined based on the third image.
In this embodiment, the third image is an image corresponding to the third calibration plate acquired by the second camera.
The third calibration plate is a calibration plate displayed on the target screen, such as a checkerboard displayed on the target screen, as shown in fig. 5.
The second rotation translation matrix RT2 is obtained based on the third image and is used for converting the coordinate relationship between the second camera and the target screen.
In an actual implementation, the third calibration plate is photographed using the second camera to obtain a third image, and based on the third image, the second rotational-translational matrix RT2 between the second camera and the target screen may be obtained using the Zhang Zhengyou calibration method.
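As an illustration, the following sketch obtains such a matrix with OpenCV from a single image of the checkerboard displayed on the target screen; the board dimensions, square size, camera intrinsics and distortion coefficients used here are placeholders that would come from the actual setup and from calibrating the second camera, not values taken from the original.

```python
import cv2
import numpy as np

def screen_extrinsics(image_bgr, camera_matrix, dist_coeffs,
                      board_size=(9, 6), square_size_m=0.03):
    """Estimate the rotation translation matrix from the displayed checkerboard
    (the third calibration plate) to the second camera from one image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if not found:
        raise RuntimeError("checkerboard not detected in the third image")
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))

    # 3D corner coordinates in the screen (calibration plate) coordinate system.
    obj = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    obj *= square_size_m

    _, rvec, tvec = cv2.solvePnP(obj, corners, camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)
    RT2 = np.eye(4)
    RT2[:3, :3] = R
    RT2[:3, 3] = tvec.reshape(3)
    return RT2
```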
According to the mouse positioning method based on three-dimensional sight estimation and screen plane estimation provided by the embodiment of the application, the position relationship between the second camera and the target screen can be calibrated by acquiring the third image corresponding to the third calibration plate acquired by the second camera and determining the second rotation translation matrix RT2 based on the third image, so that the coordinate conversion relationship between the first camera and the target screen can be conveniently established subsequently and the mouse position coordinate in the target screen can be obtained. The method is therefore suitable for the sight direction vector output by any general model, does not require a model to be trained separately for a specific application scene, and has wide application scenes and high universality.
In the actual implementation process, the step 110 may be implemented by using a neural network model, and the implementation manner of the step 110 is described below by taking a target neural network as an example.
In some embodiments, step 110 may further comprise:
inputting the color image of the user into a target neural network, and obtaining a sight direction vector output by the target neural network;
the target neural network is obtained by training by taking a sample user color image as a sample and taking a sample sight direction vector corresponding to the sample user color image as a sample label.
In this embodiment, the target neural network may include an image segmentation layer, a feature extraction layer, and a fusion layer connected in sequence.
As shown in fig. 3, in the actual implementation process, the image segmentation layer is configured to segment the color image of the user into a face image and an eye image, and specifically implemented as follows: inputting the user color image into a target neural network to obtain face feature points, and carrying out image segmentation on the user color image based on the face feature points to obtain a face image and an eye image.
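A minimal sketch of this segmentation step is shown below. It assumes the face feature points are already available as pixel coordinates in a 68-point landmark layout; the eye index ranges and the padding margin are illustrative assumptions rather than values taken from the original method.

```python
import numpy as np

def crop_region(image, points, pad=0.25):
    """Crop an axis-aligned region around a set of landmark points, with padding."""
    x0, y0 = points.min(axis=0)
    x1, y1 = points.max(axis=0)
    w, h = x1 - x0, y1 - y0
    x0 = max(int(x0 - pad * w), 0)
    y0 = max(int(y0 - pad * h), 0)
    x1 = min(int(x1 + pad * w), image.shape[1])
    y1 = min(int(y1 + pad * h), image.shape[0])
    return image[y0:y1, x0:x1]

def segment_face_and_eyes(image, landmarks):
    """Split the user color image into a face image and left/right eye images.

    landmarks: (68, 2) array of face feature points; indices 36-41 and 42-47
    are taken as the two eye contours (a common 68-point convention).
    """
    face_img = crop_region(image, landmarks)
    right_eye_img = crop_region(image, landmarks[36:42])
    left_eye_img = crop_region(image, landmarks[42:48])
    return face_img, left_eye_img, right_eye_img
```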
The feature extraction layer is used for extracting features of the face image and the eye image, and is specifically implemented as follows: the face image is input into a facial feature extraction submodule to extract facial feature information, wherein the facial feature extraction submodule comprises a plurality of Conv2d/BatchNorm/ReLU blocks; the eye image is input into an eye feature extraction submodule to extract eye feature information, which may include left-eye feature information and right-eye feature information, wherein the eye feature extraction submodule comprises a Conv2d/BatchNorm/ReLU block and a residual network.
The fusion layer is used for fusing facial feature information and eye feature information, and the specific implementation method comprises the following steps: the facial feature information and the eye feature information are input to a feature fusion sub-module to fuse the facial feature information and the eye feature information.
In some embodiments, the neural network may further include a pooling layer, a linear layer, and an activation layer.
The pooling layer is used for compressing the characteristic information and removing redundant information.
The linear layer is used for carrying out linear transformation on the characteristic information output by the previous layer.
The activation layer is used for carrying out nonlinear transformation on the characteristic information.
In the actual execution process, the feature fusion sub-module is used for splicing the feature information together, the pooling layer is used for removing the redundant information, and the linear layer and the activation layer are used for outputting the processed facial feature information and the processed eye feature information.
Meanwhile, the left eye and right eye characteristic information can be fused through an attention mechanism so as to pay more attention to the correlation between the left eye and right eye characteristic information.
Finally, the fused left-eye feature information and right-eye feature information are respectively input into the linear layers of the left eye and the right eye so as to predict the respective sight direction vectors of the left eye and the right eye.
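The following PyTorch sketch illustrates the overall structure described above (face branch, shared eye branch, feature fusion, attention weighting of the two eyes, and one linear head per eye); the channel widths, the number of blocks and the form of the attention are simplified assumptions and are much smaller than a practical line-of-sight estimation model.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """One Conv2d/BatchNorm/ReLU block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class GazeNet(nn.Module):
    """Toy gaze-direction network: face branch, eye branch, fusion, two heads."""
    def __init__(self):
        super().__init__()
        self.face = nn.Sequential(conv_block(3, 16), conv_block(16, 32),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.eye = nn.Sequential(conv_block(3, 16), conv_block(16, 32),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fuse = nn.Sequential(nn.Linear(96, 64), nn.ReLU(inplace=True))
        self.attn = nn.Linear(64, 2)        # relative weights of the two eyes
        self.head_left = nn.Linear(64, 3)   # left-eye sight direction vector
        self.head_right = nn.Linear(64, 3)  # right-eye sight direction vector

    def forward(self, face_img, left_eye_img, right_eye_img):
        f = self.face(face_img)             # head-rotation related features
        l = self.eye(left_eye_img)          # left-eye features
        r = self.eye(right_eye_img)         # right-eye features
        fused = self.fuse(torch.cat([f, l, r], dim=1))
        w = torch.softmax(self.attn(fused), dim=1)
        return (self.head_left(fused * w[:, 0:1]),
                self.head_right(fused * w[:, 1:2]))

# Example forward pass with random tensors standing in for the cropped images.
g_left, g_right = GazeNet()(torch.randn(2, 3, 112, 112),
                            torch.randn(2, 3, 64, 96),
                            torch.randn(2, 3, 64, 96))
```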
In addition, the target neural network is trained by taking a sample user color image as a sample and taking a sample sight direction vector corresponding to the sample user color image as a sample label.
In the actual execution process, a color image of a sample user is input, an output value which is as close to a sample label as possible is output through a plurality of neurons, and under the condition that the error between the output value and the sample label is large, the weight of the neurons is modified until the error between the output value and the sample label is within a target range.
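A minimal training-loop sketch consistent with this description is shown below, assuming a data loader that yields (face image, left-eye image, right-eye image, sample sight direction vector) batches; the mean-squared-error objective and the hyper-parameters are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

def train_gaze_net(model, loader, epochs=10, lr=1e-4, device="cpu"):
    """Fit the gaze network to sample user color images and their sample labels."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        total = 0.0
        for face, left, right, label in loader:
            face, left, right, label = (t.to(device) for t in (face, left, right, label))
            g_left, g_right = model(face, left, right)
            # Both eye heads are supervised with the sample sight direction vector.
            loss = criterion(g_left, label) + criterion(g_right, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch}: mean batch loss {total / max(len(loader), 1):.4f}")
```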
In the application process, the user color image can be input into a pre-trained target neural network to obtain the sight direction vector output by the target neural network.
According to the mouse positioning method based on three-dimensional sight line estimation and screen plane estimation, the user color image is input into the target neural network trained in advance to obtain the sight line direction vector output by the target neural network, the data can be directly obtained after the pre-training is carried out before the use in practical application, and the calculation efficiency is high and the accuracy is good; the target neural network has strong learning capability, and data in each application process can be used as training data in the next training process, so that the precision and accuracy of the model are improved, the use of a user is facilitated, the positioning range is widened, the universality is higher, and the precision of final sight estimation is improved.
The following specifically describes a flow of the mouse positioning method based on three-dimensional line-of-sight estimation and screen plane estimation in the practical application process, as shown in fig. 2.
Firstly, collecting a user color image by using a first camera;
the user color image is sent to the target neural network, and whether a face exists in the user color image is judged; if a face exists in the user color image, whether a plurality of faces exist is further judged, and if a plurality of faces exist, the largest face image is determined as the target face;
under the condition that no face exists in the user color image, the user color image is acquired again;
then, the position of a face in the color image of the user, namely the coordinates of a face frame, is obtained;
image segmentation is carried out on the color image of the user, and a face image and an eye image are obtained;
performing normalization on the user color image;
inputting the color image of the user into a target neural network, namely a sight line estimation model, so as to acquire a sight line direction vector output by the target neural network; obtaining a gaze-representative vector based on the gaze-direction vector and the three-dimensional coordinates at the pupil; then determining a first intersection point coordinate with the target screen based on the sight line representation vector;
Converting the first intersection point coordinate based on the target rotation translation matrix to obtain a second intersection point coordinate;
and based on the second intersection point coordinates, acquiring the movement authority of the mouse and moving the mouse to the corresponding position.
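The geometric part of this flow, namely intersecting the sight line with the target screen plane in the first camera's coordinate system and converting the intersection into screen pixel coordinates, can be sketched as follows; the plane parameterization, the screen size and resolution values, and the use of pyautogui to move the cursor are illustrative assumptions rather than details taken from the original.

```python
import numpy as np

def gaze_screen_intersection(pupil_xyz, gaze_dir, plane_point, plane_normal):
    """First intersection point: where the sight ray hits the screen plane.

    All quantities are expressed in the first camera's coordinate system.
    pupil_xyz    : 3D coordinates of the target point (pupil)
    gaze_dir     : sight direction vector
    plane_point  : any point on the target screen plane
    plane_normal : unit normal of the target screen plane
    """
    denom = np.dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-9:
        raise ValueError("sight line is parallel to the screen plane")
    s = np.dot(plane_normal, plane_point - pupil_xyz) / denom
    return pupil_xyz + s * gaze_dir

def to_screen_pixels(p_cam1, target_rt,
                     screen_res_px=(3840, 2160), screen_size_m=(0.60, 0.34)):
    """Second intersection point: convert a first-camera point into the screen
    coordinate system with the target rotation translation matrix, then map the
    planar (x, y) position in metres to pixel coordinates (placeholder geometry)."""
    p = target_rt @ np.append(p_cam1, 1.0)
    x_px = p[0] / screen_size_m[0] * screen_res_px[0]
    y_px = p[1] / screen_size_m[1] * screen_res_px[1]
    return x_px, y_px

# Hypothetical usage: move the cursor to the estimated point of gaze.
# import pyautogui
# x, y = to_screen_pixels(gaze_screen_intersection(pupil, gaze, p0, n), RT_s)
# pyautogui.moveTo(x, y)
```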
The device for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation provided by the application is described below, and the device for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation described below and the method for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation described above can be referred to correspondingly each other.
In the method for positioning the mouse based on three-dimensional sight line estimation and screen plane estimation provided by the embodiments of the application, the execution subject may be a device for positioning the mouse based on three-dimensional sight line estimation and screen plane estimation. In the embodiments of the application, the device for positioning the mouse based on three-dimensional sight line estimation and screen plane estimation executing this method is taken as an example to describe the device provided by the embodiments of the application.
The embodiment of the application also provides a device for positioning the mouse based on the three-dimensional sight line estimation and the screen plane estimation.
As shown in fig. 6, the apparatus for positioning a mouse based on three-dimensional line-of-sight estimation and screen plane estimation includes: a first processing module 610, a second processing module 620, a third processing module 630, a fourth processing module 640, and a fifth processing module 650.
In this embodiment, the first processing module 610 is configured to determine a line-of-sight direction vector of a target face in a user color image based on the user color image, where the user color image is acquired by the first camera;
the second processing module 620 is configured to determine a line-of-sight representation vector of the target face based on the line-of-sight direction vector and coordinates of a target point, the coordinates of the target point being determined by the user depth image and the user color image acquired by the first camera;
the third processing module 630 is configured to determine, based on the line-of-sight representation vector and the screen representation vector of the target screen, a first intersection point coordinate of the line-of-sight representation vector and the screen representation vector, where the first intersection point coordinate is a coordinate in a first coordinate system corresponding to the first camera;
a fourth processing module 640, configured to determine a second intersection point coordinate based on the first intersection point coordinate and the target rotational translation matrix, where the second intersection point coordinate is a coordinate in a second coordinate system corresponding to the target screen, and the target rotational translation matrix is a rotational translation matrix between the first camera and the target screen;
And a fifth processing module 650 for controlling the movement of the mouse in the target screen based on the second intersection point coordinates.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, the sight line direction vector is obtained by collecting the user color image, the sight line expression vector is obtained based on the sight line direction vector and the coordinates of the target point, the first intersection point coordinate is determined based on the sight line expression vector and the screen expression vector, the second intersection point coordinate is determined based on the first intersection point coordinate and the target rotation translation matrix, the mouse in the target screen is controlled to move to the second intersection point coordinate, the sight line coordinate can be predicted through the characteristics of the target face in the user color image and the corresponding user depth image in the actual application process, and higher accuracy and precision are achieved, so that the accuracy of a subsequent positioning result is improved; by establishing the target rotation translation matrix between the first camera and the target screen, the method is suitable for coordinate system conversion under any situation so as to realize accurate positioning, widens the application scene, does not need to set other devices such as additional light sources and the like and specific devices, and has lower positioning cost and easy realization.
In some embodiments, the first processing module 610 may also be configured to:
image segmentation is carried out on the color image of the user, and at least one of a face image corresponding to the target face and an eye image corresponding to the target human eye in the target face is obtained; the target human eye comprises at least one of a left eye or a right eye;
extracting features of at least one of the face image and the eye image to obtain sight features;
based on the gaze characteristics, a gaze direction vector is determined.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, provided by the embodiment of the application, the user color image is subjected to image segmentation, the eye image is obtained, at least one of the face image and the eye image is subjected to feature extraction, the sight line feature is obtained, then the sight line direction vector is determined based on the sight line feature, and the user head rotation feature, the eye feature and other various features can be combined when the sight line estimation is performed, so that the real-time sight line direction of the user can be comprehensively predicted, the precision and the accuracy of sight line positioning are effectively improved, the mouse positioning device is suitable for any user gesture, and the positioning range is widened.
In some embodiments, the apparatus may further include a sixth processing module configured to perform feature extraction on the facial image to obtain a head rotation feature;
Extracting features of the eye images to obtain eye features;
and fusing the head rotation characteristic and the eye characteristic to obtain the sight line characteristic.
According to the mouse positioning device based on three-dimensional vision estimation and screen plane estimation, which is provided by the embodiment of the application, the feature extraction is carried out on the face image and the eye image respectively to obtain the head rotation feature and the eye feature, the head rotation feature and the eye feature are fused to obtain the vision feature, the real-time vision direction of a user can be predicted according to the head rotation of the user when the vision estimation is carried out, and the vision estimation precision is further improved.
In some embodiments, the apparatus may further include a seventh processing module to obtain a first rotational translation matrix between the first camera and the second camera and obtain a second rotational translation matrix between the second camera and the target screen;
a target rotational translation matrix is determined based on the first rotational translation matrix and the second rotational translation matrix.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation provided by the embodiment of the application, the target rotation translation matrix R_s is obtained by acquiring the first rotation translation matrix RT1 and the second rotation translation matrix RT2, thereby establishing the coordinate conversion relationship between the first camera and the target screen; in the actual execution process, the predicted first intersection point coordinate can be converted into the mouse position coordinate in the target screen based on the target rotation translation matrix R_s, so the device is suitable for the sight direction vector output by any general model, does not require a model to be trained separately for a specific application scene, and has wide application scenes and high universality.
In some embodiments, the apparatus may further include an eighth processing module configured to acquire a first image corresponding to the first calibration plate acquired by the first camera and a second image corresponding to the second calibration plate acquired by the second camera; the first calibration plate is a first surface of the double-sided calibration plate, the second calibration plate is a second surface of the double-sided calibration plate, and the view fields of the first camera and the second camera are opposite;
determining a third rotational translation matrix between the first camera and the first calibration plate based on the first image;
determining a fourth rotational translation matrix between the second camera and the second calibration plate based on the second image;
acquiring a first origin position of a first calibration plate based on the first image;
acquiring a second origin position of a second calibration plate based on the second image;
Determining a fifth rotational translation matrix between the first calibration plate and the second calibration plate based on the first origin position, the second origin position and the thickness of the double-sided calibration plate;
the first rotational translation matrix is determined based on the third rotational translation matrix, the fourth rotational translation matrix, and the fifth rotational translation matrix.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, the position relation between the first camera and the second camera can be calibrated by determining the first rotation translation matrix RT1 based on the third rotation translation matrix, the fourth rotation translation matrix and the fifth rotation translation matrix, so that the coordinate conversion relation between the first camera and the target screen can be conveniently established subsequently, the mouse position coordinate in the target screen can be obtained, the mouse positioning device is suitable for sight direction vectors output by any general model, independent training models based on specific application scenes are not needed, and the mouse positioning device has wide application scenes and high universality.
In some embodiments, the apparatus may further include a ninth processing module, configured to acquire a third image corresponding to a third calibration plate acquired by the second camera, where the third calibration plate is a calibration plate displayed by the target screen;
A second rotational translation matrix is determined based on the third image.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, the position relationship between the second camera and the target screen can be calibrated by acquiring the third image corresponding to the third calibration plate acquired by the second camera and determining the second rotation translation matrix RT2 based on the third image, so that the coordinate conversion relationship between the first camera and the target screen can be conveniently established subsequently, the mouse position coordinate in the target screen can be obtained, the mouse positioning device is suitable for sight direction vectors output by any general model, the model is not required to be trained independently based on specific application scenes, and the mouse positioning device has wide application scenes and high universality.
In some embodiments, the first processing module 610 may also be configured to:
performing feature recognition on the color image of the user to obtain at least one face image;
and determining the largest face image in the at least one face image as a target face.
According to the mouse positioning device based on three-dimensional sight estimation and screen plane estimation, at least one face image is obtained through feature recognition on the user color image, and the largest face image in the at least one face image is determined to be the target face, so that the target face can be accurately screened out under the condition that a plurality of faces exist in the acquired user color image, interference of a background environment on a follow-up sight prediction result is eliminated, and accuracy of the prediction result is improved.
In some embodiments, the first processing module 610 may also be configured to:
inputting the color image of the user into a target neural network, and obtaining a sight direction vector output by the target neural network;
the target neural network is obtained by training by taking a sample user color image as a sample and taking a sample sight direction vector corresponding to the sample user color image as a sample label.
According to the mouse positioning device based on three-dimensional sight line estimation and screen plane estimation, the user color image is input into the target neural network trained in advance to obtain the sight line direction vector output by the target neural network, the data can be directly obtained after the pre-training is carried out before the use in practical application, and the computing efficiency is high and the accuracy is good; the target neural network has strong learning capability, and data in each application process can be used as training data in the next training process, so that the precision and accuracy of the model are improved, the use of a user is facilitated, the positioning range is widened, the universality is higher, and the precision of final sight estimation is improved.
The mouse positioning device based on the three-dimensional sight line estimation and the screen plane estimation in the embodiment of the application can be electronic equipment, and can also be a component in the electronic equipment, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (PDA), and may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine or self-service machine, etc.; the embodiments of the present application are not specifically limited in this respect.
The mouse positioning device based on the three-dimensional line-of-sight estimation and the screen plane estimation in the embodiment of the application may be a device with an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The mouse positioning device based on three-dimensional line-of-sight estimation and screen plane estimation provided in the embodiments of the present application can implement each process implemented by the method embodiments of fig. 1 to 5, and in order to avoid repetition, a detailed description is omitted here.
Fig. 7 illustrates a physical schematic diagram of an electronic device, as shown in fig. 7, which may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation, the method comprising: determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera; determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image and a user color image acquired by a first camera; determining first intersection point coordinates of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera; determining a second intersection point coordinate based on the first intersection point coordinate and the target rotary translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotary translation matrix is a rotary translation matrix between the first camera and the target screen; and controlling the movement of the mouse in the target screen based on the second intersection point coordinates.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present application also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by the methods described above, the method comprising: determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera; determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image and a user color image acquired by a first camera; determining first intersection point coordinates of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera; determining a second intersection point coordinate based on the first intersection point coordinate and the target rotary translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotary translation matrix is a rotary translation matrix between the first camera and the target screen; and controlling the movement of the mouse in the target screen based on the second intersection point coordinates.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation provided by the methods described above, the method comprising: determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera; determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image and a user color image acquired by a first camera; determining first intersection point coordinates of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera; determining a second intersection point coordinate based on the first intersection point coordinate and the target rotary translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotary translation matrix is a rotary translation matrix between the first camera and the target screen; and controlling the movement of the mouse in the target screen based on the second intersection point coordinates.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation, comprising:
determining a line-of-sight direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera;
determining a line-of-sight representation vector of a target face based on the line-of-sight direction vector and coordinates of a target point, the coordinates of the target point being determined by a user depth image acquired by the first camera and the user color image;
determining a first intersection point coordinate of the sight line representation vector and the screen representation vector based on the sight line representation vector and the screen representation vector of the target screen, wherein the first intersection point coordinate is a coordinate under a first coordinate system corresponding to the first camera;
Determining a second intersection point coordinate based on the first intersection point coordinate and a target rotational translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotational translation matrix is a rotational translation matrix between the first camera and the target screen;
and controlling the mouse in the target screen to move based on the second intersection point coordinates.
2. The method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation of claim 1, wherein said determining a gaze direction vector of a target face in a user color image based on the user color image comprises:
image segmentation is carried out on the user color image, and at least one of a face image corresponding to the target face and an eye image corresponding to a target human eye in the target face is obtained; the target human eye includes at least one of a left eye or a right eye;
extracting features of at least one of the face image and the eye image to obtain sight features;
the gaze direction vector is determined based on the gaze feature.
3. The method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation of claim 2, wherein said feature extracting at least one of said facial image and said eye image, obtaining gaze features, comprises:
Extracting features of the facial image to obtain head rotation features;
extracting features of the eye images to obtain eye features;
and fusing the head rotation feature and the eye feature to obtain the sight feature.
4. A method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation according to any of claims 1-3, wherein prior to said determining a gaze direction vector of a target face in a user color image based on said user color image, the method further comprises:
acquiring a first rotation translation matrix between the first camera and the second camera, and acquiring a second rotation translation matrix between the second camera and the target screen;
the target rotational translation matrix is determined based on the first rotational translation matrix and the second rotational translation matrix.
5. The method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation of claim 4, wherein the obtaining a first rotational translation matrix between the first camera and a second camera and obtaining a second rotational translation matrix between the second camera and the target screen comprises:
Acquiring a first image corresponding to a first calibration plate acquired by the first camera and a second image corresponding to a second calibration plate acquired by the second camera; the first calibration plate is a first surface of a double-sided calibration plate, the second calibration plate is a second surface of the double-sided calibration plate, and the fields of view of the first camera and the second camera are opposite;
determining a third rotational translation matrix between the first camera and the first calibration plate based on the first image;
determining a fourth rotational translation matrix between the second camera and the second calibration plate based on the second image;
acquiring a first origin position of the first calibration plate based on the first image;
acquiring a second origin position of the second calibration plate based on the second image;
determining a fifth rotational translation matrix between the first calibration plate and the second calibration plate based on the first origin position, the second origin position, and the thickness of the double-sided calibration plate;
the first rotational translation matrix is determined based on the third rotational translation matrix, the fourth rotational translation matrix, and the fifth rotational translation matrix.
6. The method of three-dimensional gaze estimation and screen plane estimation based mouse positioning of claim 4, wherein said obtaining a second rotational translation matrix between said second camera and said target screen comprises:
Acquiring a third image corresponding to a third calibration plate acquired by the second camera, wherein the third calibration plate is a calibration plate displayed by the target screen;
based on the third image, the second rotational translation matrix is determined.
7. A method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation according to any of claims 1-3, wherein said determining a gaze direction vector of a target face in a user color image based on said user color image comprises:
performing feature recognition on the user color image to obtain at least one face image;
and determining the largest face image in the at least one face image as the target face.
8. A method of mouse positioning based on three-dimensional gaze estimation and screen plane estimation according to any of claims 1-3, wherein said determining a gaze direction vector of a target face in a user color image based on said user color image comprises:
inputting the user color image into a target neural network, and obtaining the sight direction vector output by the target neural network;
the target neural network is obtained by training by taking a sample user color image as a sample and taking a sample sight direction vector corresponding to the sample user color image as a sample label.
9. An apparatus for mouse positioning based on three-dimensional gaze estimation and screen plane estimation, comprising:
the first processing module is used for determining a sight line direction vector of a target face in a user color image based on the user color image, wherein the user color image is acquired by a first camera;
the second processing module is used for determining a sight line representation vector of a target face based on the sight line direction vector and coordinates of a target point, wherein the coordinates of the target point are determined by a user depth image acquired by the first camera and the user color image;
the third processing module is used for determining first intersection point coordinates of the sight line representation vector and the screen representation vector of the target screen based on the sight line representation vector and the screen representation vector, wherein the first intersection point coordinates are coordinates under a first coordinate system corresponding to the first camera;
the fourth processing module is used for determining a second intersection point coordinate based on the first intersection point coordinate and a target rotation translation matrix, wherein the second intersection point coordinate is a coordinate under a second coordinate system corresponding to the target screen, and the target rotation translation matrix is a rotation translation matrix between the first camera and the target screen;
And a fifth processing module, configured to control movement of the mouse in the target screen based on the second intersection point coordinate.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of three-dimensional gaze estimation and screen plane estimation based mouse positioning of any of claims 1-8 when the program is executed.
CN202211675699.2A 2022-12-26 2022-12-26 Mouse positioning method based on three-dimensional sight estimation and screen plane estimation Pending CN116012459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211675699.2A CN116012459A (en) 2022-12-26 2022-12-26 Mouse positioning method based on three-dimensional sight estimation and screen plane estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211675699.2A CN116012459A (en) 2022-12-26 2022-12-26 Mouse positioning method based on three-dimensional sight estimation and screen plane estimation

Publications (1)

Publication Number Publication Date
CN116012459A true CN116012459A (en) 2023-04-25

Family

ID=86020389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211675699.2A Pending CN116012459A (en) 2022-12-26 2022-12-26 Mouse positioning method based on three-dimensional sight estimation and screen plane estimation

Country Status (1)

Country Link
CN (1) CN116012459A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117912086A (en) * 2024-03-19 2024-04-19 中国科学技术大学 Face recognition method, system, equipment and medium based on broadcast-cut effect driving
CN117912086B (en) * 2024-03-19 2024-05-31 中国科学技术大学 Face recognition method, system, equipment and medium based on broadcast-cut effect driving


Similar Documents

Publication Publication Date Title
US11632537B2 (en) Method and apparatus for obtaining binocular panoramic image, and storage medium
CN108229277B (en) Gesture recognition method, gesture control method, multilayer neural network training method, device and electronic equipment
CN107545302B (en) Eye direction calculation method for combination of left eye image and right eye image of human eye
Itoh et al. Interaction-free calibration for optical see-through head-mounted displays based on 3d eye localization
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN107105333A (en) A kind of VR net casts exchange method and device based on Eye Tracking Technique
WO2014187223A1 (en) Method and apparatus for identifying facial features
CN113366491B (en) Eyeball tracking method, device and storage medium
CN109559332B (en) Sight tracking method combining bidirectional LSTM and Itracker
CN108305321B (en) Three-dimensional human hand 3D skeleton model real-time reconstruction method and device based on binocular color imaging system
CN111209811B (en) Method and system for detecting eyeball attention position in real time
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
CN111046734A (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN113642393A (en) Attention mechanism-based multi-feature fusion sight line estimation method
CN114333046A (en) Dance action scoring method, device, equipment and storage medium
Perra et al. Adaptive eye-camera calibration for head-worn devices
Kang et al. Real-time eye tracking for bare and sunglasses-wearing faces for augmented reality 3D head-up displays
CN113903210A (en) Virtual reality simulation driving method, device, equipment and storage medium
CN107659772A (en) 3D rendering generation method, device and electronic equipment
CN115049819A (en) Watching region identification method and device
CN114898447B (en) Personalized fixation point detection method and device based on self-attention mechanism
CN116012459A (en) Mouse positioning method based on three-dimensional sight estimation and screen plane estimation
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
CN112099330B (en) Holographic human body reconstruction method based on external camera and wearable display control equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination