CN111353429A - Interest degree method and system based on eyeball turning

Interest degree method and system based on eyeball turning

Info

Publication number
CN111353429A
CN111353429A (application number CN202010128432.6A)
Authority
CN
China
Prior art keywords
eyeball
characteristic
video
target
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010128432.6A
Other languages
Chinese (zh)
Inventor
卢宁 (Lu Ning)
徐国强 (Xu Guoqiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010128432.6A priority Critical patent/CN111353429A/en
Publication of CN111353429A publication Critical patent/CN111353429A/en
Priority to PCT/CN2021/071261 priority patent/WO2021169642A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G06V40/193: Preprocessing; Feature extraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013: Eye tracking input arrangements

Abstract

The invention discloses a video-based eyeball turning determination method, which comprises the following steps: acquiring a target video, wherein the target video is a video of a target user watching a target product; inputting the target video into an eyeball turning feature recognition model to obtain an eyeball turning feature queue matrix; and determining a target turning angle of the target user based on the eyeball turning feature queue matrix. The invention also discloses a video-based eyeball turning determination system. By inputting the target video into the eyeball turning feature recognition model to obtain the eyeball turning feature queue matrix, and then deriving from that matrix the target turning angle and the target turning time with respect to the corresponding target product, the invention improves the accuracy of eyeball tracking.

Description

Interest degree method and system based on eyeball turning
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a method and a system for determining eyeball turning based on video.
Background Art
Eye tracking has long been used to study the visual attention of individuals, and the most common eye tracking technique is pupil center corneal reflection (PCCR). The principle of PCCR is that a light source illuminates the eye and produces highly visible reflections that are captured by the camera of the eye-tracking device; the captured images are used to identify the reflections of the light source on the cornea and in the pupil, and the gaze direction is finally obtained by calculating the vector angle formed between the corneal reflection and the pupil, together with other geometric features. However, this scheme depends heavily on the light source, is subject to many interference factors, and its recognition is inaccurate.
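For illustration only (this is background art, not part of the claimed method), a minimal single-glint PCCR-style calculation might look as follows, assuming the pupil center and the corneal reflection have already been located in image coordinates; the function name and the single-glint simplification are assumptions.

```python
import math

def pccr_gaze_direction(pupil_center, glint_center):
    """Estimate a coarse gaze direction from the pupil-center-to-glint vector.

    Single-glint simplification: the vector from the corneal reflection (glint)
    to the pupil center is used as a proxy for gaze direction. Real PCCR systems
    rely on calibrated geometry and usually more than one glint.
    """
    dx = pupil_center[0] - glint_center[0]
    dy = pupil_center[1] - glint_center[1]
    angle_deg = math.degrees(math.atan2(dy, dx))  # direction of the vector, in degrees
    magnitude = math.hypot(dx, dy)                # larger offset ~ larger gaze deviation
    return angle_deg, magnitude

# Example: pupil center at (102, 88), corneal glint at (98, 90), in pixel coordinates
print(pccr_gaze_direction((102, 88), (98, 90)))
```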
Current vision applications in artificial intelligence are mainly based on image processing, or decompose a video into individual frames and process them one by one, so they are essentially single-frame approaches. The relationship between frames is not modeled, and the correlation and continuity between successive images cannot be reflected; as a result, eyeball tracking is not accurate enough.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and a system for determining eyeball steering based on video, so as to improve the accuracy of eyeball tracking.
In order to achieve the above object, an embodiment of the present invention provides a method for determining eyeball steering based on a video, including:
acquiring a target video, wherein the target video is a video of a target product watched by a target user;
performing eyeball feature labeling on the target video to obtain a labeled video;
inputting the marked video into an eyeball turning characteristic recognition model, wherein the eyeball turning characteristic recognition model comprises an eyeball characteristic extraction layer, a frame relation processing layer and an eyeball turning action recognition layer;
converting each frame of image of the marked video into a characteristic matrix through the eyeball characteristic extraction layer, and inputting the characteristic matrix corresponding to each frame of image into the frame relation processing layer;
the frame relation processing layer sorts the feature matrixes of each frame of image according to the video time points corresponding to the feature matrixes to obtain feature queues, and the feature queues are input to the eyeball turning action recognition layer;
and the eyeball turning action recognition layer performs characteristic fusion on the characteristic queue to obtain an eyeball turning characteristic queue matrix, and determines the target turning angle of the target user based on the eyeball turning characteristic queue matrix.
Further, after determining the target steering angle of the target user based on the eyeball steering characteristic queue matrix, the method further includes:
acquiring a video time point corresponding to the eyeball turning characteristic queue matrix;
and calculating the distance from the video time point corresponding to the first characteristic matrix to the video time point corresponding to the last characteristic matrix in the eyeball turning characteristic queue matrix as eyeball turning time.
Further, the performing eyeball feature labeling on the target video to obtain a labeled video includes:
identifying eyeball characteristics of each frame of image in the target video;
and performing frame selection on the area where the eyeball characteristics are located through the marking frame to obtain a marked video.
Further, the converting each frame of image of the annotated video into a feature matrix by the eyeball feature extraction layer comprises:
determining eyeball key points of each frame of image of the annotated video, wherein the eyeball key points comprise 128 key points or 256 key points;
acquiring pixel point coordinates of eyeball key points of each frame of image;
and establishing a characteristic matrix according to the eyeball key points of each frame of image, wherein the characteristic matrix comprises 128 or 256 pixel point coordinates.
Further, the performing feature fusion on the feature queue by the eyeball turning action recognition layer to obtain an eyeball turning feature queue matrix includes:
calculating the differential image characteristics of the adjacent frame images to judge whether the eyeball characteristics corresponding to the adjacent frame images are the same or not;
if the characteristic matrixes are the same, the characteristic matrix corresponding to one frame of image is reserved, and the other same characteristic matrix is deleted from the characteristic queue until the characteristic matrixes in the characteristic queue are different, so that a target characteristic queue is obtained;
and combining the characteristic matrixes in the target characteristic queue to obtain the eyeball turning characteristic queue matrix.
Further, calculating a difference image feature of the target image of the adjacent frame to determine whether eyeball features corresponding to the target image of the adjacent frame are the same includes:
acquiring pixel point coordinates of adjacent frame images;
carrying out differential operation on the pixel point coordinates of the adjacent frame images to obtain differential image characteristics;
and comparing the differential image characteristics with a preset binarization threshold value to judge whether the eyeball characteristics corresponding to the target images of adjacent frames are the same.
Further, determining the target steering angle of the target user based on the eye-steering feature queue matrix comprises:
marking the position of a product in the target video by taking the central position of the eyeball of the target user as an origin;
and calculating a matrix value of the eyeball turning characteristic queue matrix to obtain a target turning angle.
In order to achieve the above object, an embodiment of the present invention further provides a video-based eyeball steering determination system, including:
the acquisition module is used for acquiring a target video, wherein the target video is a video of a target product watched by a target user;
the labeling module is used for labeling eyeball characteristics of the target video to obtain a labeled video;
the input module is used for inputting the marked video into an eyeball turning characteristic identification model, wherein the eyeball turning characteristic identification model comprises an eyeball characteristic extraction layer, a frame relation processing layer and an eyeball turning action identification layer;
the conversion module is used for converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer and inputting the feature matrix corresponding to each frame of image into the frame relation processing layer;
the characteristic sorting module is used for sorting the characteristic matrix of each frame of image by the frame relation processing layer according to the video time point corresponding to the characteristic matrix to obtain a characteristic queue and inputting the characteristic queue to the eyeball turning action identification layer;
and the characteristic fusion and output module is used for performing characteristic fusion on the characteristic queue by the eyeball turning action recognition layer to obtain an eyeball turning characteristic queue matrix, and determining the target turning angle of the target user based on the eyeball turning characteristic queue matrix.
To achieve the above object, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the computer program, when executed by the processor, implements the steps of the video-based eye turning determination method as described above.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor to cause the at least one processor to execute the steps of the video-based eyeball turning determination method as described above.
According to the method and the system for determining the eyeball turning direction based on the video, provided by the embodiment of the invention, the target video is marked to obtain the marked video, the marked video is input into the eyeball turning characteristic identification model to obtain the eyeball turning characteristic queue matrix, and then the target turning angle of the corresponding target user is obtained based on the eyeball turning characteristic queue matrix, so that the accuracy of eyeball tracking is improved.
Drawings
Fig. 1 is a flowchart of a first embodiment of a video-based eyeball-direction determination method according to the present invention.
Fig. 2 is a flowchart of step S102 in fig. 1 according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating step S106 in FIG. 1 according to an embodiment of the present invention.
Fig. 4 is a flowchart of step S110 in fig. 1 according to an embodiment of the present invention.
Fig. 5 is a flowchart of step S110A in fig. 4 according to the embodiment of the present invention.
FIG. 6 is a flowchart illustrating another embodiment of step S110 in FIG. 1 according to the present invention.
Fig. 7 is a flowchart of step S111 and step S112 according to an embodiment of the invention.
Fig. 8 is a schematic diagram of program modules of a second embodiment of a video-based eye-turning determination system according to the present invention.
Fig. 9 is a schematic diagram of a hardware structure of a third embodiment of the computer apparatus according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart illustrating steps of a video-based eye-turning determination method according to a first embodiment of the present invention is shown. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is made by way of example with the computer device 2 as the execution subject. The details are as follows.
Step S100, a target video is obtained, wherein the target video is the video of a target product watched by a target user.
Specifically, the process that the target user watches the target product is shot through the camera to obtain a target video, and the target video is transmitted to the computer device 2 for processing.
Step S102, performing eyeball feature annotation on the target video to obtain an annotated video.
Specifically, each frame of image of the target video is subjected to image segmentation, object detection, image annotation and other processing, so as to obtain an annotated video.
Exemplarily, referring to fig. 2, step S102 further includes:
step S102A, identifying eyeball features of each frame of image in the target video.
Specifically, eyeball features of each frame of image in the target video are identified through eyeball key point detection.
Step S102B, selecting the area where the eyeball characteristics are located through the marking frame to obtain a marked video.
Specifically, the region corresponding to the eyeball key points of each video frame is box-selected through the labeling frame to obtain the annotated video. The orientation of the eyeball is also labeled so as to obtain the turning motion area of the eyeball in the target video.
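The patent does not prescribe a particular key point detector; as one hedged illustration of this step, the sketch below uses the dlib 68-point facial landmark model (eye landmarks at indices 36 to 47) and OpenCV to box-select the eye region in one frame. The model file path is an assumption.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Pretrained 68-point landmark model; the file path is an assumption.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def label_eye_region(frame):
    """Draw a labeling box around the detected eyeball key points of one frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # Indices 36-47 are the left and right eye landmarks in the 68-point model.
        eye_pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(36, 48)],
                           dtype=np.int32)
        x, y, w, h = cv2.boundingRect(eye_pts)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame
```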
And step S104, inputting the marked video into an eyeball turning characteristic identification model, wherein the eyeball turning characteristic identification model comprises an eyeball characteristic extraction layer, a frame relation processing layer and an eyeball turning action identification layer.
Specifically, the eyeball turning characteristic identification model is a pre-trained model and is used for analyzing the marked video and obtaining an eyeball turning characteristic queue matrix. Pre-training an eyeball turning characteristic recognition model based on a deep learning network model:
acquiring a large amount of sample video data, and identifying each frame of sample eyeball characteristic area in each sample video data to obtain a sample image; labeling the sample images according to the time sequence to obtain labeled sample images; inputting the marked sample image into a deep neural network, and extracting a sample characteristic vector of the marked sample image from a CNN convolution layer of the deep neural network; calculating the difference between the marked sample images of the adjacent frames by pixel processing of the sample characteristic vectors to obtain a difference value; deleting the same sample image according to the difference value to obtain a feature queue; and outputting an eyeball steering characteristic queue matrix obtained based on the characteristic queue through the full-connection output layer. The feature extraction method includes, but is not limited to, a facial feature extraction algorithm based on a deep neural network and an eyeball turning feature extraction algorithm based on geometric features.
Illustratively, the eyeball feature extraction layer is used for extracting eyeball features of a target user from each frame of image of the target video and converting the eyeball features into a feature matrix;
the frame relation processing layer is used for determining the frame relation between each frame of image with eyeball characteristics according to the video time point of each frame of image of the target video; and
and the eyeball turning action recognition layer is used for determining an eyeball turning characteristic queue matrix of the target user according to the frame relation and the characteristic matrix.
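The patent names the three layers but not their internal architecture; the following PyTorch sketch is only one possible reading of the eyeball feature extraction layer, the frame relation processing layer (here reduced to dropping near-duplicate frames), and a fully connected output producing the turning feature queue matrix. All layer sizes, the threshold and the class name are assumptions.

```python
import torch
import torch.nn as nn

class EyeTurnFeatureModel(nn.Module):
    """Sketch of the eyeball turning feature recognition model:
    CNN feature extraction -> frame relation (drop duplicates) -> FC output."""

    def __init__(self, feat_dim=128, out_dim=64, diff_threshold=1e-3):
        super().__init__()
        self.extractor = nn.Sequential(              # eyeball feature extraction layer
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(16 * 8 * 8, feat_dim),
        )
        self.output = nn.Linear(feat_dim, out_dim)   # fully connected output layer
        self.diff_threshold = diff_threshold

    def forward(self, frames):                       # frames: (T, 1, H, W), in time order
        feats = self.extractor(frames)               # (T, feat_dim), one vector per frame
        kept = [feats[0]]
        for t in range(1, feats.size(0)):            # frame relation processing layer:
            if (feats[t] - kept[-1]).abs().mean() > self.diff_threshold:
                kept.append(feats[t])                # keep only frames whose features changed
        queue = torch.stack(kept)                    # feature queue in time order
        return self.output(queue)                    # eyeball turning feature queue matrix

# Example: 10 grayscale frames of 64x64 pixels
model = EyeTurnFeatureModel()
print(model(torch.randn(10, 1, 64, 64)).shape)
```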
And step S106, converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame of image into the frame relation processing layer.
Specifically, the eyeball feature extraction layer splits the target video into individual frames and extracts eyeball turning features from each frame of image to obtain the features corresponding to each frame. The eyeball features are composed of a plurality of key points, and can be feature matrices composed of 128 or 256 key points.
Exemplarily, referring to fig. 3, step S106 further includes:
step S106A, determining eyeball key points of each frame of image of the annotation video, where the eyeball key points include 128 key points or 256 key points.
Specifically, the eyeball feature extraction layer splits the annotated video into individual frames and extracts eyeball turning features from each frame of image to obtain a feature matrix corresponding to each frame. The eyeball feature is composed of a plurality of eyeball key points, and can be 128 key points or 256 key points.
Step S106B, obtaining coordinates of pixel points of the eyeball key points of each frame of image.
Specifically, each frame of image is converted to grayscale to obtain a two-dimensional gray image, and the pixel point coordinates of each eyeball key point are then obtained as two-dimensional coordinates in this image.
Step S106C, establishing a feature matrix according to the eyeball key points of each frame of image, wherein the feature matrix comprises 128 or 256 pixel point coordinates.
Specifically, the coordinates of the pixel points are sorted to obtain a feature matrix in the form of 128 rows or 256 rows and 2 columns.
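As a concrete sketch of this step (the helper name is an assumption), the key point coordinates of one frame can be arranged into a 128x2 or 256x2 feature matrix as follows:

```python
import numpy as np

def build_feature_matrix(keypoints):
    """Arrange eyeball key point pixel coordinates into a feature matrix
    of 128 or 256 rows and 2 columns."""
    matrix = np.asarray(keypoints, dtype=np.float32)
    if matrix.shape not in {(128, 2), (256, 2)}:
        raise ValueError(f"expected 128 or 256 (x, y) key points, got shape {matrix.shape}")
    return matrix

# Example with 128 dummy key points taken from a grayscale frame
dummy_keypoints = [(i % 32, i // 32) for i in range(128)]
print(build_feature_matrix(dummy_keypoints).shape)   # (128, 2)
```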
And step S108, the frame relation processing layer sorts the feature matrixes of each frame of image according to the video time points corresponding to the feature matrixes to obtain feature queues, and the feature queues are input to the eyeball turning action recognition layer.
Specifically, the frame relation processing layer calculates the feature matrices corresponding to adjacent video time points to determine whether the frame images need to be processed. The frame relation processing layer performs a difference operation on two adjacent frames to obtain difference image features, and obtains the movement route of the eyeball turning through difference image feature analysis: when the difference image features of two adjacent frames change from changing to remaining unchanged, the eyeball has finished its turning movement at that moment; when the difference image features of two adjacent frames change from unchanged to changing, the eyeball begins its turning motion at that moment, and the feature queue at that moment is obtained. The feature matrices of each frame of image are arranged in the order of their video time points to obtain a feature queue, which facilitates subsequent calculation. The feature queue is taken as the frame relation between the features corresponding to each frame of image.
Step S110, the eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target turning angle of the target user based on the eyeball turning feature queue matrix.
Specifically, the eyeball turning action recognition layer performs duplicate checking on the feature queue and deletes identical features in the queue to obtain a target feature queue in which the eyeball features all differ, and the arrays of the target feature queue are combined in time order to obtain the eyeball turning feature queue matrix.
Exemplarily, referring to fig. 4, step S110 further includes:
in step S110A, difference image features of adjacent frame images are calculated to determine whether corresponding eyeball features of the adjacent frame images are the same.
Specifically, the difference between the feature matrices of the adjacent frame images is calculated through difference operation, and the difference image features are obtained. When the difference image characteristics of two adjacent frames of images are changed to be kept unchanged, indicating that the eyeball turns to complete turning movement at the moment; when the difference image characteristic of two adjacent frame images changes from unchanged to changed, the eyeball begins to perform eyeball turning motion at the moment.
Step S110B, if the feature matrixes are the same, retaining the feature matrix corresponding to one of the frames of images, and deleting another same feature matrix from the feature queue until the feature matrixes in the feature queue are all different, thereby obtaining a target feature queue.
Specifically, if the features are the same, the feature matrix corresponding to the eyeball features of one of the frames is retained; the retained feature may be either the later or the earlier one. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if the retained eyeball feature is not the last of the identical features, the feature queue does not include the turning time. When the feature matrices in the feature queue are all different, that is, the eyeball orientations all differ, the multi-frame images corresponding to the feature queue represent an eyeball motion area.
Step S110C, combining the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
Specifically, the target feature queue comprises a target steering angle, the target steering angle corresponds to the feature queue, and feature matrices of the feature queue are combined according to a time sequence to obtain an eyeball steering feature queue matrix.
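A minimal sketch of the de-duplication and time-ordered combination described above (the function and variable names are assumptions):

```python
import numpy as np

def build_turn_queue_matrix(feature_matrices, time_points, atol=1e-6):
    """Keep only frames whose feature matrix differs from the previously kept one,
    then stack the survivors in time order into the eyeball turning feature queue matrix."""
    order = np.argsort(time_points)
    kept, kept_times = [], []
    for idx in order:
        fm = feature_matrices[idx]
        if not kept or not np.allclose(fm, kept[-1], atol=atol):
            kept.append(fm)
            kept_times.append(time_points[idx])
    return np.stack(kept), kept_times   # shape (num_kept, 128, 2), plus their video time points

# Example: three frames, the first two identical
frames = [np.zeros((128, 2)), np.zeros((128, 2)), np.ones((128, 2))]
matrix, times = build_turn_queue_matrix(frames, [0.0, 0.04, 0.08])
print(matrix.shape, times)   # (2, 128, 2) [0.0, 0.08]
```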
Exemplarily, referring to fig. 5, the step S110A further includes:
step S110a1, obtaining coordinates of pixel points of adjacent frame images.
Specifically, let the current frame image be F_k(x, y) and the previous frame image be F_(k-1)(x, y), where (x, y) are the coordinates of the pixel points in each frame image.
Step S110a2, performing difference operation on the pixel coordinates of the adjacent frame images to obtain a difference image feature.
Specifically, the calculation uses the difference operation formula D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|, where D_k(x, y) is the difference image feature.
Step S110a3, comparing the difference image features with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of adjacent frames are the same.
Specifically, the difference image feature is compared with a preset binarization threshold T according to the formula D_k(x, y) > T: if the difference image feature is larger than the preset binarization threshold, the eyeball features of the two frames are different; if it is not larger than the threshold, they are the same.
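Taken together, steps S110A1 to S110A3 amount to the following hedged sketch (OpenCV is used only for the grayscale conversion and the absolute difference; the threshold value of 25 is an assumption):

```python
import cv2

def eyeball_features_same(frame_k, frame_k_minus_1, threshold=25):
    """Compute D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)| and compare it with a
    preset binarization threshold T.

    Returns True when no pixel difference exceeds T (the eyeball features of the
    adjacent frames are judged the same), and False otherwise."""
    gray_k = cv2.cvtColor(frame_k, cv2.COLOR_BGR2GRAY)
    gray_prev = cv2.cvtColor(frame_k_minus_1, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_k, gray_prev)   # D_k(x, y)
    return int(diff.max()) <= threshold
```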
Exemplarily, referring to fig. 6, step S110 further includes:
step S1101, marking coordinates of the position of the product in the target video with the center position of the eyeball of the target user as an origin.
Specifically, the target video is provided with a plurality of products, and the position of each product is marked by taking the central position of the eyeball of the target user as an origin, so that the target steering angle calculated based on the eyeball steering characteristic queue matrix corresponds to the target product. The angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the products.
Step S1102, calculating a matrix value of the eyeball steering characteristic queue matrix to obtain a target steering angle.
Specifically, a matrix value of an eyeball steering characteristic queue matrix corresponding to the target characteristic queue is calculated, a target steering angle of the target user is obtained, and the target steering angle corresponds to the coordinate, so that the target product is obtained. And if the position corresponding to the target steering angle has deviation, selecting the product closest to the target steering angle as the target product.
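A hedged sketch of mapping the computed target turning angle onto the closest labelled product (the product layout, the helper names and the use of atan2 are assumptions, not the patent's formula):

```python
import math

def nearest_product(target_angle_deg, products):
    """Pick the product whose angle, measured from the eyeball-center origin,
    is closest to the target turning angle.

    `products` maps product name -> (x, y) coordinates with the eyeball center at (0, 0)."""
    def angle_of(pos):
        return math.degrees(math.atan2(pos[1], pos[0]))

    def angular_gap(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)   # wrap-around angular difference

    return min(products, key=lambda name: angular_gap(target_angle_deg, angle_of(products[name])))

# Example: three products laid out around the viewer
layout = {"product_a": (1.0, 0.0), "product_b": (0.0, 1.0), "product_c": (-1.0, 0.0)}
print(nearest_product(80.0, layout))   # product_b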
Illustratively, referring to fig. 7, the method further comprises:
and step S111, acquiring a video time point corresponding to the eyeball turning characteristic queue matrix.
Specifically, the video time points corresponding to the plurality of frame images corresponding to the eyeball turning characteristic queue matrix are obtained, and the frame relationship processing layer can perform time labeling so as to obtain the video time points.
Step S112, calculating a distance between a video time point corresponding to the first feature matrix and a video time point corresponding to the last feature matrix in the eyeball turning feature queue matrix, as eyeball turning time.
Specifically, the video time point of the first frame image is subtracted from that of the last frame image to obtain the target turning time for the target product. The reciprocal of the target turning time is taken as the interest degree: the shorter the turning time, the larger the reciprocal and the higher the interest degree.
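A tiny worked sketch of this last computation, following the reciprocal definition above (times are video time points in seconds; the helper name is an assumption):

```python
def turning_time_and_interest(time_points):
    """Turning time = last video time point minus the first;
    interest degree = its reciprocal (shorter turn, higher interest)."""
    turning_time = time_points[-1] - time_points[0]
    interest = 1.0 / turning_time if turning_time > 0 else float("inf")
    return turning_time, interest

print(turning_time_and_interest([10.0, 10.2, 10.5]))   # (0.5, 2.0)
```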
Example two
Referring to fig. 8, a schematic diagram of program modules of a second embodiment of the video-based eyeball turning determination system according to the invention is shown. In the present embodiment, the video-based eyeball turning determination system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and the above-described video-based eyeball turning determination method. The program modules referred to in the embodiments of the present invention are a series of computer program instruction segments capable of performing specific functions, and are more suitable than the program itself for describing the execution process of the video-based eyeball turning determination system 20 in the storage medium. The following description specifically describes the functions of the program modules of the present embodiment:
the first obtaining module 200 is configured to obtain a target video, where the target video is a video of a target product watched by a target user.
Specifically, the process that the target user watches the target product is shot through the camera to obtain a target video, and the target video is transmitted to the computer device 2 for processing.
And the labeling module 202 is configured to perform eyeball feature labeling on the target video to obtain a labeled video.
Illustratively, the annotation module 202 is further configured to:
and identifying eyeball characteristics of each frame of image in the target video.
Specifically, eyeball features of each frame of image in the target video are identified through eyeball key point detection.
And performing frame selection on the area where the eyeball characteristics are located through the marking frame to obtain a marked video.
Specifically, the region corresponding to the eyeball key points of each video frame is box-selected through the labeling frame to obtain the annotated video. The orientation of the eyeball is also labeled so as to obtain the turning motion area of the eyeball in the target video.
The input module 204 is configured to input the annotation video into an eyeball turning characteristic identification model, where the eyeball turning characteristic identification model includes an eyeball characteristic extraction layer, a frame relationship processing layer, and an eyeball turning action identification layer.
Specifically, the eyeball turning characteristic identification model is a pre-trained model and is used for analyzing the marked video and obtaining an eyeball turning characteristic queue matrix. Pre-training an eyeball turning characteristic recognition model based on a deep learning network model:
acquiring a large amount of sample video data, and identifying each frame of sample eyeball characteristic area in each sample video data to obtain a sample image; labeling the sample images according to the time sequence to obtain labeled sample images; inputting the marked sample image into a deep neural network, and extracting a sample characteristic vector of the marked sample image from a CNN convolution layer of the deep neural network; calculating the difference between the marked sample images of the adjacent frames by pixel processing of the sample characteristic vectors to obtain a difference value; deleting the same sample image according to the difference value to obtain a feature queue; and outputting an eyeball steering characteristic queue matrix obtained based on the characteristic queue through the full-connection output layer. The feature extraction method includes, but is not limited to, a facial feature extraction algorithm based on a deep neural network and an eyeball turning feature extraction algorithm based on geometric features.
Illustratively, the eyeball feature extraction layer is used for extracting eyeball features of a target user from each frame of image of the target video and converting the eyeball features into a feature matrix;
the frame relation processing layer is used for determining the frame relation between each frame of image with eyeball characteristics according to the video time point of each frame of image of the target video; and
and the eyeball turning action recognition layer is used for determining an eyeball turning characteristic queue matrix of the target user according to the frame relation and the characteristic matrix.
The conversion module 206 is configured to convert each frame of image of the annotation video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of image into the frame relationship processing layer.
Specifically, the eyeball feature extraction layer splits the target video into individual frames and extracts eyeball turning features from each frame of image to obtain the features corresponding to each frame. The eyeball features are composed of a plurality of key points, and can be feature matrices composed of 128 or 256 key points.
Illustratively, the conversion module 206 is further configured to:
determining eyeball key points of each frame of image of the annotated video, wherein the eyeball key points comprise 128 key points or 256 key points.
Specifically, the eyeball feature extraction layer splits the annotated video into individual frames and extracts eyeball turning features from each frame of image to obtain a feature matrix corresponding to each frame. The eyeball feature is composed of a plurality of eyeball key points, and can be 128 key points or 256 key points.
And acquiring the pixel point coordinates of the eyeball key points of each frame of image.
Specifically, each frame of image is converted to grayscale to obtain a two-dimensional gray image, and the pixel point coordinates of each eyeball key point are then obtained as two-dimensional coordinates in this image.
And establishing a characteristic matrix according to the eyeball key points of each frame of image, wherein the characteristic matrix comprises 128 or 256 pixel point coordinates.
Specifically, the coordinates of the pixel points are sorted to obtain a feature matrix in the form of 128 rows or 256 rows and 2 columns.
And the feature sorting module 208 is configured to sort, by the frame relation processing layer, the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and input the feature queue to the eyeball turning action identification layer.
Specifically, the frame relation processing layer calculates the feature matrices corresponding to adjacent video time points to determine whether the frame images need to be processed. The frame relation processing layer performs a difference operation on two adjacent frames to obtain difference image features, and obtains the movement route of the eyeball turning through difference image feature analysis: when the difference image features of two adjacent frames change from changing to remaining unchanged, the eyeball has finished its turning movement at that moment; when the difference image features of two adjacent frames change from unchanged to changing, the eyeball begins its turning motion at that moment, and the feature queue at that moment is obtained. The feature matrices of each frame of image are arranged in the order of their video time points to obtain a feature queue, which facilitates subsequent calculation. The feature queue is taken as the frame relation between the features corresponding to each frame of image.
A feature fusion and output module 210, configured to perform feature fusion on the feature queue by the eyeball turning motion recognition layer to obtain an eyeball turning feature queue matrix, and determine a target turning angle of the target user based on the eyeball turning feature queue matrix.
Specifically, the eyeball turning action recognition layer performs duplicate checking on the feature queue and deletes identical features in the queue to obtain a target feature queue in which the eyeball features all differ, and the arrays of the target feature queue are combined in time order to obtain the eyeball turning feature queue matrix.
Illustratively, the feature fusion and output module 210 is further configured to:
and calculating the difference image characteristics of the adjacent frame images to judge whether the eyeball characteristics corresponding to the adjacent frame images are the same.
Specifically, the difference between the feature matrices of the adjacent frame images is calculated through difference operation, and the difference image features are obtained. When the difference image characteristics of two adjacent frames of images are changed to be kept unchanged, indicating that the eyeball turns to complete turning movement at the moment; when the difference image characteristic of two adjacent frame images changes from unchanged to changed, the eyeball begins to perform eyeball turning motion at the moment.
If the characteristic matrixes are the same, the characteristic matrix corresponding to one frame of image is reserved, and the other same characteristic matrix is deleted from the characteristic queue until the characteristic matrixes in the characteristic queue are different, so that the target characteristic queue is obtained.
Specifically, if the features are the same, one of the eyeball features is retained; the retained feature may be either the later or the earlier one. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if it is not the last of the identical features, the feature queue does not include the turning time. When the feature matrices in the target feature queue are all different, that is, the eyeball orientations all differ, the multi-frame images corresponding to the target feature queue represent an eyeball motion area.
And combining the characteristic matrixes in the target characteristic queue to obtain the eyeball turning characteristic queue matrix.
Specifically, the target feature queue comprises a target steering angle, the target steering angle corresponds to the feature queue, and feature matrices of the feature queue are combined according to a time sequence to obtain an eyeball steering feature queue matrix.
Illustratively, the feature fusion and output module 210 is further configured to:
and acquiring the pixel point coordinates of the adjacent frame images.
Specifically, let the current frame image be F_k(x, y) and the previous frame image be F_(k-1)(x, y), where (x, y) are the coordinates of the pixel points in each frame image.
And carrying out differential operation on the pixel point coordinates of the adjacent frame images to obtain the differential image characteristics.
Specifically, the calculation uses the difference operation formula D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|, where D_k(x, y) is the difference image feature.
And comparing the differential image characteristics with a preset binarization threshold value to judge whether the eyeball characteristics corresponding to the target images of adjacent frames are the same.
Specifically, the difference image feature is compared with a preset binarization threshold T according to the formula D_k(x, y) > T: if the difference image feature is larger than the preset binarization threshold, the eyeball features of the two frames are different; if it is not larger than the threshold, they are the same.
Illustratively, the feature fusion and output module 210 is further configured to:
and marking the position of the product in the target video by taking the central position of the eyeball of the target user as an origin.
Specifically, the target video is provided with a plurality of products, and the position of each product is marked by taking the central position of the eyeball of the target user as an origin, so that the target steering angle calculated based on the eyeball steering characteristic queue matrix corresponds to the target product. The angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the products.
And calculating the matrix value of the target characteristic queue to obtain a target steering angle.
Specifically, a matrix value of an eyeball steering characteristic queue matrix corresponding to the target characteristic queue is calculated, a target steering angle of the target user is obtained, and the target steering angle corresponds to the coordinate, so that the target product is obtained. And if the position corresponding to the target steering angle has deviation, selecting the product closest to the target steering angle as the target product.
EXAMPLE III
Fig. 9 is a schematic diagram of a hardware architecture of a computer device according to a third embodiment of the present invention. In the present embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions. The computer device 2 may be a rack server, a blade server or a tower server (whether an independent server or a server cluster composed of a plurality of servers), and the like. As shown in fig. 9, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a video-based eyeball turning determination system 20, which may be communicatively coupled to each other via a system bus. Wherein:
in this embodiment, the memory 21 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the storage 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 2. Of course, the memory 21 may also comprise both internal and external memory units of the computer device 2. In this embodiment, the memory 21 is generally used for storing an operating system installed in the computer device 2 and various types of application software, such as the program codes of the video-based eye-turning determining system 20 of the second embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to execute the program code stored in the memory 21 or process data, for example, execute the video-based eye-turning determining system 20, so as to implement the video-based eye-turning determining method according to the first embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and the network interface 23 is generally used for establishing a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal via a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like. It is noted that fig. 9 only shows the computer device 2 with components 20-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the video-based eye-turning determination system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present invention.
For example, fig. 8 is a schematic diagram of the program modules of the second embodiment of the video-based eyeball turning determination system 20, in which the video-based eyeball turning determination system 20 may be divided into an acquisition module 200, an annotation module 202, an input module 204, a conversion module 206, a feature sorting module 208, and a feature fusion and output module 210. The program modules referred to herein are a series of computer program instruction segments that can perform specific functions, and are more suitable than the program itself for describing the execution of the video-based eyeball turning determination system 20 in the computer device 2. The specific functions of the program modules 200 to 210 have been described in detail in the second embodiment and are not repeated here.
Example four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application mall, etc., on which a computer program is stored, which when executed by a processor implements corresponding functions. The computer-readable storage medium of the present embodiment is used for storing the video-based eye-turning determining system 20, and when being executed by the processor, the computer-readable storage medium implements the video-based eye-turning determining method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video-based eyeball steering determination method is characterized by comprising the following steps:
acquiring a target video, wherein the target video is a video of a target product watched by a target user;
performing eyeball feature labeling on the target video to obtain a labeled video;
inputting the marked video into an eyeball turning characteristic recognition model, wherein the eyeball turning characteristic recognition model comprises an eyeball characteristic extraction layer, a frame relation processing layer and an eyeball turning action recognition layer;
converting each frame of image of the marked video into a characteristic matrix through the eyeball characteristic extraction layer, and inputting the characteristic matrix corresponding to each frame of image into the frame relation processing layer;
the frame relation processing layer sorts the feature matrixes of each frame of image according to the video time points corresponding to the feature matrixes to obtain feature queues, and the feature queues are input to the eyeball turning action recognition layer;
and the eyeball turning action recognition layer performs characteristic fusion on the characteristic queue to obtain an eyeball turning characteristic queue matrix, and determines the target turning angle of the target user based on the eyeball turning characteristic queue matrix.
2. The method of claim 1, after determining the target steering angle of the target user based on the eye-steering feature queue matrix, further comprising:
acquiring a video time point corresponding to the eyeball turning characteristic queue matrix;
and calculating the distance from the video time point corresponding to the first characteristic matrix to the video time point corresponding to the last characteristic matrix in the eyeball turning characteristic queue matrix as eyeball turning time.
3. The method of claim 1, wherein performing eye feature labeling on the target video to obtain a labeled video comprises:
identifying eyeball characteristics of each frame of image in the target video;
and performing frame selection on the area where the eyeball characteristics are located through the marking frame to obtain a marked video.
4. The method of claim 1, wherein converting each frame of image of the annotated video into a feature matrix by the eye feature extraction layer comprises:
determining eyeball key points of each frame of image of the annotated video, wherein the eyeball key points comprise 128 key points or 256 key points;
acquiring pixel point coordinates of eyeball key points of each frame of image;
and establishing a characteristic matrix according to the eyeball key points of each frame of image, wherein the characteristic matrix comprises 128 or 256 pixel point coordinates.
5. The method according to claim 4, wherein the eye-turning action recognition layer performs feature fusion on the feature queue to obtain an eye-turning feature queue matrix, which comprises:
calculating the differential image characteristics of the adjacent frame images to judge whether the eyeball characteristics corresponding to the adjacent frame images are the same or not;
if the characteristic matrixes are the same, the characteristic matrix corresponding to one frame of image is reserved, and the other same characteristic matrix is deleted from the characteristic queue until the characteristic matrixes in the characteristic queue are different, so that a target characteristic queue is obtained;
and combining the characteristic matrixes in the target characteristic queue to obtain the eyeball turning characteristic queue matrix.
6. The method of claim 5, wherein calculating the difference image characteristic of the target image of the adjacent frames to determine whether the eyeball characteristics corresponding to the target image of the adjacent frames are the same comprises:
acquiring pixel point coordinates of adjacent frame images;
carrying out differential operation on the pixel point coordinates of the adjacent frame images to obtain differential image characteristics;
and comparing the differential image characteristics with a preset binarization threshold value to judge whether the eyeball characteristics corresponding to the target images of adjacent frames are the same.
7. The method of claim 5, wherein determining the target steering angle for the target user based on the eye-steering feature queue matrix comprises:
marking the position of a product in the target video by taking the central position of the eyeball of the target user as an origin;
and calculating a matrix value of the eyeball turning characteristic queue matrix to obtain a target turning angle.
8. A video-based eye-steering determination system, comprising:
the acquisition module is used for acquiring a target video, wherein the target video is a video of a target product watched by a target user;
the labeling module is used for labeling eyeball characteristics of the target video to obtain a labeled video;
the input module is used for inputting the marked video into an eyeball turning characteristic identification model, wherein the eyeball turning characteristic identification model comprises an eyeball characteristic extraction layer, a frame relation processing layer and an eyeball turning action identification layer;
the conversion module is used for converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer and inputting the feature matrix corresponding to each frame of image into the frame relation processing layer;
the characteristic sorting module is used for sorting the characteristic matrix of each frame of image by the frame relation processing layer according to the video time point corresponding to the characteristic matrix to obtain a characteristic queue and inputting the characteristic queue to the eyeball turning action identification layer;
and the characteristic fusion and output module is used for performing characteristic fusion on the characteristic queue by the eyeball turning action recognition layer to obtain an eyeball turning characteristic queue matrix, and determining the target turning angle of the target user based on the eyeball turning characteristic queue matrix.
9. A computer device, characterized in that the computer device comprises a memory, a processor, the memory having stored thereon a computer program being executable on the processor, the computer program being executable by the processor to implement the steps of the video-based eye-turning determination method according to any of the claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program which is executable by at least one processor to cause the at least one processor to perform the steps of the video-based eye-turning determination method according to any one of claims 1-7.
CN202010128432.6A 2020-02-28 2020-02-28 Interest degree method and system based on eyeball turning Pending CN111353429A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010128432.6A CN111353429A (en) 2020-02-28 2020-02-28 Interest degree method and system based on eyeball turning
PCT/CN2021/071261 WO2021169642A1 (en) 2020-02-28 2021-01-12 Video-based eyeball turning determination method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010128432.6A CN111353429A (en) 2020-02-28 2020-02-28 Interest degree method and system based on eyeball turning

Publications (1)

Publication Number Publication Date
CN111353429A 2020-06-30

Family

ID=71195806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010128432.6A Pending CN111353429A (en) 2020-02-28 2020-02-28 Interest degree method and system based on eyeball turning

Country Status (2)

Country Link
CN (1) CN111353429A (en)
WO (1) WO2021169642A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353429A (en) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 Interest degree method and system based on eyeball turning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677024A (en) * 2015-12-31 2016-06-15 北京元心科技有限公司 Eye movement detection tracking method and device, and application of eye movement detection tracking method
CN107679448A (en) * 2017-08-17 2018-02-09 平安科技(深圳)有限公司 Eyeball action-analysing method, device and storage medium
US20190294240A1 (en) * 2018-03-23 2019-09-26 Aisin Seiki Kabushiki Kaisha Sight line direction estimation device, sight line direction estimation method, and sight line direction estimation program
CN109359512A (en) * 2018-08-28 2019-02-19 深圳壹账通智能科技有限公司 Eyeball position method for tracing, device, terminal and computer readable storage medium
CN110555426A (en) * 2019-09-11 2019-12-10 北京儒博科技有限公司 Sight line detection method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169642A1 (en) * 2020-02-28 2021-09-02 深圳壹账通智能科技有限公司 Video-based eyeball turning determination method and system
CN112053600A (en) * 2020-08-31 2020-12-08 上海交通大学医学院附属第九人民医院 Orbit endoscope navigation surgery training method, device, equipment and system
CN115544473A (en) * 2022-09-09 2022-12-30 苏州吉弘能源科技有限公司 Photovoltaic power station operation and maintenance terminal login control system
CN115544473B (en) * 2022-09-09 2023-11-21 苏州吉弘能源科技有限公司 Photovoltaic power station operation and maintenance terminal login control system

Also Published As

Publication number Publication date
WO2021169642A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN112597941B (en) Face recognition method and device and electronic equipment
CN109960742B (en) Local information searching method and device
CN110929622A (en) Video classification method, model training method, device, equipment and storage medium
CN109858333B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111797657A (en) Vehicle peripheral obstacle detection method, device, storage medium, and electronic apparatus
CN111626123A (en) Video data processing method and device, computer equipment and storage medium
WO2023010758A1 (en) Action detection method and apparatus, and terminal device and storage medium
WO2021169642A1 (en) Video-based eyeball turning determination method and system
CN115427982A (en) Methods, systems, and media for identifying human behavior in digital video using convolutional neural networks
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN112528974B (en) Distance measuring method and device, electronic equipment and readable storage medium
CN114049512A (en) Model distillation method, target detection method and device and electronic equipment
CN112560796A (en) Human body posture real-time detection method and device, computer equipment and storage medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN111914762A (en) Gait information-based identity recognition method and device
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN114842466A (en) Object detection method, computer program product and electronic device
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN114359787A (en) Target attribute identification method and device, computer equipment and storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN115577768A (en) Semi-supervised model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination