WO2021169642A1 - Method and system for video-based eyeball rotation determination - Google Patents

Method and system for video-based eyeball rotation determination

Info

Publication number
WO2021169642A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
eyeball
video
matrix
turning
Prior art date
Application number
PCT/CN2021/071261
Other languages
English (en)
Chinese (zh)
Inventor
卢宁
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2021169642A1 publication Critical patent/WO2021169642A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements

Definitions

  • the embodiments of the present application relate to the field of computer vision technology, and in particular to a method and system for determining eyeball rotation based on video.
  • PCCR (pupil center corneal reflection) is an existing eye-tracking technique.
  • The inventor has realized that current computer-vision applications in artificial intelligence still mainly process single images, or decompose video into frames and process them one by one, which is essentially single-frame image processing. Such approaches do not model the relationship between frames and cannot reflect the relevance and continuity between successive pictures; for eye tracking, their accuracy is insufficient.
  • the purpose of the embodiments of the present application is to provide a video-based method and system for determining eyeball turning, which improves the accuracy of eyeball tracking.
  • an embodiment of the present application provides a video-based method for determining eyeball rotation, including:
  • obtaining a target video, where the target video is a video of the target user watching a target product;
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning motion recognition layer;
  • the eye turning action recognition layer performs feature fusion on the feature queue to obtain an eye turning feature queue matrix, and determines the target turning angle of the target user based on the eye turning feature queue matrix.
  • an embodiment of the present application also provides a video-based eyeball turning determination system, including:
  • An obtaining module configured to obtain a target video, where the target video is a video of a target user watching a target product;
  • An annotation module configured to annotate the eyeball features of the target video to obtain the annotated video
  • An input module for inputting the annotated video into an eyeball turning feature recognition model wherein the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • a conversion module configured to convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of the image to the frame relationship processing layer;
  • the feature sorting module is used for the frame relation processing layer to sort the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and input the feature queue to the eyeball turning Action recognition layer;
  • the feature fusion and output module is used for the eye turning action recognition layer to perform feature fusion on the feature queue to obtain an eye turning feature queue matrix, and to determine the target turning angle of the target user based on the eye turning feature queue matrix.
  • an embodiment of the present application also provides a computer device, where the computer device includes a memory and a processor, the memory stores a computer program that can run on the processor, and when the computer program is executed by the processor, the video-based eye-turn determination method described above is implemented, the video-based eye-turn determination method including the following steps:
  • obtaining a target video, where the target video is a video of the target user watching a target product;
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning motion recognition layer;
  • the eye turning action recognition layer performs feature fusion on the feature queue to obtain an eye turning feature queue matrix, and determines the target turning angle of the target user based on the eye turning feature queue matrix.
  • an embodiment of the present application also provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium and can be executed by at least one processor to cause the at least one processor to execute the video-based method for determining eyeball rotation described above, the video-based method for determining eyeball rotation including the following steps:
  • obtaining a target video, where the target video is a video of the target user watching a target product;
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning motion recognition layer;
  • the eye turning action recognition layer performs feature fusion on the feature queue to obtain an eye turning feature queue matrix, and determines the target turning angle of the target user based on the eye turning feature queue matrix.
  • The embodiment of the application obtains the annotated video by annotating the target video, inputs the annotated video to the eye-turn feature recognition model to obtain the eye-turn feature queue matrix, and then obtains the target steering angle of the corresponding target user based on the eye-turn feature queue matrix, thereby improving the accuracy of eye tracking.
  • FIG. 1 is a flowchart of Embodiment 1 of a method for determining eyeball rotation based on video in this application.
  • Fig. 2 is a flowchart of step S102 in Fig. 1 of an embodiment of the application.
  • Fig. 3 is a flowchart of step S106 in Fig. 1 of the embodiment of the application.
  • Fig. 4 is a flowchart of step S110 in Fig. 1 of an embodiment of the application.
  • FIG. 5 is a flowchart of step S110A in FIG. 4 according to an embodiment of the application.
  • Fig. 6 is a flowchart of another embodiment of step S110 in Fig. 1 of the embodiment of the application.
  • FIG. 7 is a flowchart of step S111 and step S112 in the first embodiment of this application.
  • FIG. 8 is a schematic diagram of the program modules of the second embodiment of the video-based eyeball turning determination system of this application.
  • FIG. 9 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, smart city and/or blockchain technology.
  • The data involved in this application, such as the video, the feature queue matrix, and/or the steering angle, can be stored in a database or in a blockchain, for example through distributed storage on a blockchain, which is not limited in this application.
  • Referring to FIG. 1, there is shown a flowchart of the video-based method for determining eyeball rotation in the first embodiment of the present application. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed.
  • The following description exemplarily takes the computer device 2 as the execution subject. The details are as follows.
  • Step S100 Obtain a target video, where the target video is a video of a target user watching a target product.
  • the process of the target user watching the target product is captured by the camera to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
  • Step S102 Perform eye feature annotation on the target video to obtain an annotation video.
  • each frame of the target video is processed by image segmentation, object detection, image annotation, etc., to obtain annotated video.
  • step S102 further includes:
  • Step S102A Identify the eyeball feature of each frame of image in the target video.
  • the eyeball feature of each frame of image in the target video is identified.
  • Step S102B: The area where the eyeball feature is located is selected with a marking frame to obtain the annotated video.
  • The area corresponding to the eyeball key points in each frame of the video is selected with the marking frame to obtain the annotated video, and the eyeball direction is marked to obtain the eyeball turning movement area in the target video.
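  • As a purely illustrative sketch (not part of the original disclosure), the following code shows how such a marking frame could be drawn around detected eyeball key points for a single frame, assuming OpenCV and NumPy are available and that the key points are supplied by an upstream eye-landmark detector:
```python
import cv2
import numpy as np

def mark_eyeball_region(frame, eye_keypoints, margin=5):
    """Draw a marking frame (bounding box) around the eyeball key points.

    frame         : H x W x 3 BGR image (one frame of the target video).
    eye_keypoints : N x 2 array of (x, y) pixel coordinates from an
                    upstream eye-landmark detector (assumed available).
    margin        : padding in pixels around the tight bounding box.
    """
    pts = np.asarray(eye_keypoints, dtype=np.int32)
    x_min, y_min = pts.min(axis=0) - margin
    x_max, y_max = pts.max(axis=0) + margin
    annotated = frame.copy()
    cv2.rectangle(annotated, (int(x_min), int(y_min)),
                  (int(x_max), int(y_max)), color=(0, 255, 0), thickness=2)
    return annotated
```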
  • Step S104 Input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • The eyeball turning feature recognition model is a pre-trained model, which is used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The eyeball turning feature recognition model is pre-trained based on a deep learning network model.
  • Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eye turn feature extraction algorithms based on geometric features.
  • the eyeball feature extraction layer is used to extract the eyeball feature of the target user from each frame of the target video, and convert the eyeball feature into a feature matrix;
  • the frame relationship processing layer is configured to determine the frame relationship between images with eyeball features in each frame according to the video time point of each frame of the target video;
  • the eye turning action recognition layer is used to determine the eye turning feature queue matrix of the target user according to the frame relationship and the feature matrix.
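  • The three layers named above could be organized as in the following minimal skeleton (an illustration under assumptions; the disclosure does not specify a concrete implementation, and the key-point detector is deliberately left abstract):
```python
import numpy as np

class EyeballTurningFeatureRecognitionModel:
    """Illustrative skeleton of the three layers named in the disclosure."""

    def feature_extraction_layer(self, frames):
        # Convert each annotated frame into an N x 2 key-point feature matrix.
        return [self._extract_keypoints(f) for f in frames]

    def frame_relationship_layer(self, feature_matrices, time_points):
        # Sort the per-frame feature matrices by their video time points
        # to obtain the feature queue.
        order = np.argsort(time_points)
        return [feature_matrices[i] for i in order]

    def turning_action_recognition_layer(self, feature_queue):
        # Fuse the feature queue into an eyeball turning feature queue
        # matrix (simple stacking used here as a placeholder).
        return np.stack(feature_queue)

    def _extract_keypoints(self, frame):
        raise NotImplementedError("the key-point detector is application specific")
```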
  • Step S106 Convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of image to the frame relationship processing layer.
  • The eyeball feature extraction layer splits the target video into individual frames of images, extracts the eyeball turning feature from each frame of image, and obtains the feature corresponding to each frame of image.
  • The eyeball feature is composed of multiple key points, for example a feature matrix composed of 128 or 512 key points.
  • step S106 further includes:
  • Step S106A Determine the eyeball key points of each frame of the annotated video, where the eyeball key points include 128 key points or 256 key points.
  • the eyeball feature extraction layer splits the annotated video into each frame of image, and extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple eyeball key points, which can be 128 or 512 key points.
  • Step S106B: Obtain the pixel coordinates of the eyeball key points of each frame of image.
  • To obtain each eyeball key point, each frame of image is first converted to grayscale to obtain a two-dimensional grayscale image, from which the two-dimensional pixel coordinates of the key points are then obtained.
  • Step S106C: A feature matrix is established according to the eyeball key points of each frame of image, and the feature matrix includes 128 or 256 pixel coordinates.
  • The pixel coordinates are sorted to obtain a feature matrix with 128 or 256 rows and 2 columns.
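  • A minimal sketch of this frame-to-feature-matrix conversion is given below, assuming OpenCV for the grayscale conversion and a hypothetical detect_eye_keypoints function standing in for the unspecified key-point detector:
```python
import cv2
import numpy as np

def frame_to_feature_matrix(frame, detect_eye_keypoints, n_points=128):
    """Build the feature matrix (n_points rows, 2 columns) for one frame.

    The frame is first converted to a two-dimensional grayscale image,
    the eyeball key points are detected on it, and their pixel
    coordinates are sorted to form the feature matrix.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # 2-D grayscale image
    keypoints = detect_eye_keypoints(gray, n_points)    # hypothetical detector
    coords = np.asarray(keypoints, dtype=np.float32).reshape(n_points, 2)
    # Sort the pixel coordinates (primarily by y, then by x) to obtain a
    # feature matrix with n_points rows and 2 columns.
    order = np.lexsort((coords[:, 0], coords[:, 1]))
    return coords[order]
```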
  • Step S108 the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning action recognition layer .
  • The frame relationship processing layer performs calculations on the feature matrices corresponding to adjacent video time points to determine whether the frame images need to be processed.
  • The frame relationship processing layer performs a difference operation on two adjacent frames of images to obtain difference image features, and obtains the movement route of the eyeball turning by analyzing the difference image features. That is, when the difference image features of two adjacent frames change from changing to remaining unchanged, the eyeball has completed the turning movement at that moment; when the difference image features of two adjacent frames change from remaining unchanged to changing, the eyeball starts the turning movement at that moment, and the feature queue for this period is obtained.
  • the feature matrix of each frame of image is arranged in the order of the video time point to obtain a feature queue, which is convenient for subsequent calculations.
  • the feature queue is regarded as the frame relationship between the corresponding features of each frame of image.
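  • The start/end rule described in step S108, where the turning movement begins when the difference image features start changing and is complete when they stop changing, could be sketched as follows (illustrative only; the per-frame-pair boolean flags are assumed to come from the difference operation):
```python
def find_turning_segment(changed):
    """Locate the eyeball turning movement inside the feature queue.

    changed : list of booleans, one per adjacent frame pair; True means the
              difference image features of that pair changed (eyeball moving).

    Returns (start, end) indices of the frame pairs covering the movement,
    or None if no movement is found.
    """
    start = end = None
    for i, moving in enumerate(changed):
        if moving and start is None:
            start = i            # unchanged -> changing: movement begins
        if not moving and start is not None:
            end = i              # changing -> unchanged: movement complete
            break
    if start is None:
        return None
    return start, end if end is not None else len(changed) - 1
```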
  • step S110 the eye-turning action recognition layer performs feature fusion on the feature queue to obtain an eye-turn feature queue matrix, and determines the target steering angle of the target user based on the eye-turn feature queue matrix.
  • The eyeball turning action recognition layer performs duplicate-check processing on the feature queue and deletes identical features from the queue to obtain a target feature queue in which the eyeball features are all different, and then combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
  • step S110 further includes:
  • Step S110A Calculate the difference image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same.
  • the difference between the feature matrices of adjacent frame images is calculated through a difference operation to obtain the feature of the difference image.
  • When the difference image features of two adjacent frames of images change from changing to remaining unchanged, the eyeball has completed the turning movement; when the difference image features of two adjacent frames change from remaining unchanged to changing, the eyeball starts the turning movement at that moment.
  • Step S110B: If they are the same, the feature matrix corresponding to one of the frames is retained and the other identical feature matrix is deleted from the feature queue, until the feature matrices in the feature queue are all different, and the target feature queue is obtained.
  • The feature matrix corresponding to the eyeball feature of one frame is retained; the retained feature can be either the latter or the former. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if the retained eyeball feature is not the last identical feature, the feature queue does not include the turning time.
  • When the feature matrices in the feature queue are all different, that is, the eyeball turning positions are all different, the multiple frames of images corresponding to the feature queue constitute the eyeball movement area.
  • Step S110C Combine the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
  • the target feature queue includes a target steering angle, and the target steering angle corresponds to the feature queue.
  • the feature matrix of the feature queue is combined in chronological order to obtain the eye-turning feature queue matrix.
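  • A simplified sketch of this duplicate-removal and combination step (illustrative; it keeps the first matrix of each run of identical matrices, and equality is tested with a numerical tolerance that the disclosure does not specify):
```python
import numpy as np

def fuse_feature_queue(feature_queue, tol=1e-6):
    """Remove identical consecutive feature matrices and stack the remainder.

    feature_queue : list of N x 2 feature matrices in video-time order.
    Returns the eyeball turning feature queue matrix of shape (M, N, 2),
    in which every kept matrix differs from its neighbours.
    """
    target_queue = []
    for mat in feature_queue:
        # Keep a matrix only if it differs from the previously kept one;
        # identical matrices (the eyeball did not move) are dropped.
        if not target_queue or not np.allclose(target_queue[-1], mat, atol=tol):
            target_queue.append(mat)
    return np.stack(target_queue)
```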
  • step S110A further includes:
  • Step S110A1: The pixel coordinates of the adjacent frame images are obtained. The current frame of image is denoted F_k(x, y) and the previous frame of image is denoted F_(k-1)(x, y), where (x, y) are the pixel point coordinates in each frame of image.
  • Step S110A2: A difference operation is performed on the adjacent frame images to obtain the difference image feature D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|.
  • Step S110A3: The difference image feature is compared with a preset binarization threshold to determine whether the eyeball features corresponding to the adjacent frames are the same; for example, each pixel of D_k(x, y) is binarized against the threshold, and if no pixel exceeds the threshold the eyeball features of the two frames are regarded as the same.
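  • A minimal sketch of this difference-and-threshold test, assuming grayscale frames and an arbitrarily chosen threshold value (the concrete threshold is not given in the disclosure):
```python
import numpy as np

def eyeball_features_same(frame_k, frame_k_minus_1, threshold=25):
    """Decide whether two adjacent frames carry the same eyeball feature.

    frame_k, frame_k_minus_1 : 2-D grayscale images F_k(x, y), F_(k-1)(x, y).
    threshold                : preset binarization threshold (assumed value).

    D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)| is binarized against the
    threshold; if no pixel exceeds it, the features are treated as identical.
    """
    d_k = np.abs(frame_k.astype(np.int32) - frame_k_minus_1.astype(np.int32))
    r_k = d_k > threshold            # binarized difference image
    return not r_k.any()             # True -> features unchanged between frames
```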
  • step S110 further includes:
  • Step S1101: Using the center position of the target user's eyeball as the origin, mark the position of each product in the target video with coordinates.
  • Each product is marked in a coordinate system whose origin is the center position of the target user's eyeball, so that the target steering angle calculated based on the eyeball turning feature queue matrix can be mapped to a target product.
  • the angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the product.
  • Step S1102 Calculate the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
  • The matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target steering angle of the target user, and the target steering angle is then matched against the product coordinates to obtain the target product. If the position corresponding to the target steering angle does not coincide with any product, the product closest to the target steering angle is selected as the target product.
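  • A sketch of mapping the target steering angle to the nearest product, assuming 2-D product coordinates with the eyeball center as origin and angles measured in degrees (this coordinate convention is an assumption, not taken from the disclosure):
```python
import numpy as np

def select_target_product(product_coords, target_angle_deg):
    """Map a target steering angle to the closest product.

    product_coords   : dict name -> (x, y) position, with the centre of the
                       target user's eyeball as the origin.
    target_angle_deg : target steering angle computed from the eyeball
                       turning feature queue matrix, in degrees.
    """
    best_name, best_gap = None, float("inf")
    for name, (x, y) in product_coords.items():
        angle = np.degrees(np.arctan2(y, x))                     # product angle w.r.t. eyeball centre
        gap = abs((angle - target_angle_deg + 180) % 360 - 180)  # wrap-around angular difference
        if gap < best_gap:
            best_name, best_gap = name, gap
    return best_name
```
  • For example, with product_coords = {"A": (1.0, 0.2), "B": (0.0, 1.0)} and a target steering angle of 80 degrees, product "B" (at roughly 90 degrees) would be selected.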
  • the method further includes:
  • Step S111 Obtain a video time point corresponding to the eye turn feature queue matrix.
  • The video time points corresponding to the multiple frames of images behind the eyeball turning feature queue matrix are acquired. Since the frame relationship has already been established by the frame relationship processing layer, the time points can be acquired from the corresponding time stamps.
  • Step S112 Calculate the distance between the video time point corresponding to the first feature matrix in the eye turning feature queue matrix and the video time point corresponding to the last feature matrix as the eye turning time.
  • The video time point of the first frame of image is subtracted from the video time point of the last frame of image to obtain the target turning time for the target product.
  • The reciprocal of the target turning time is taken as the degree of interest: the shorter the turning time, the larger the reciprocal and the greater the degree of interest.
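  • A small sketch of the turning-time and degree-of-interest computation described above (illustrative; time points are assumed to be in seconds):
```python
def turning_time_and_interest(time_points):
    """Compute the eyeball turning time and the degree of interest.

    time_points : video time points (seconds) of the frames behind the
                  eyeball turning feature queue matrix, in order.

    The turning time is the distance between the first and last time point;
    the degree of interest is its reciprocal, so a shorter turning time
    yields a larger reciprocal and a greater degree of interest.
    """
    turning_time = time_points[-1] - time_points[0]
    interest = 1.0 / turning_time if turning_time > 0 else float("inf")
    return turning_time, interest
```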
  • FIG. 8 shows a schematic diagram of the program modules of the second embodiment of the video-based eyeball turning determination system of the present application.
  • the video-based eyeball turning determination system 20 may include or be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors.
  • the program module referred to in the embodiments of the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable than the program itself to describe the execution process of the video-based eyeball turning determination system 20 in the storage medium. The following description will specifically introduce the functions of each program module in this embodiment:
  • the first obtaining module 200 is configured to obtain a target video, where the target video is a video of a target user watching a target product.
  • the process of the target user watching the target product is captured by the camera to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
  • the tagging module 202 is used to tag the target video with eyeball features to obtain the tagged video.
  • the labeling module 202 is further used for:
  • the eyeball feature of each frame of image in the target video is identified.
  • the area where the eyeball feature is located is selected by the marking frame to obtain the marked video.
  • The area corresponding to the eyeball key points in each frame of the video is selected with the marking frame to obtain the annotated video, and the eyeball direction is marked to obtain the eyeball turning movement area in the target video.
  • the input module 204 is configured to input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • The eyeball turning feature recognition model is a pre-trained model, which is used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The eyeball turning feature recognition model is pre-trained based on a deep learning network model.
  • Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eye turn feature extraction algorithms based on geometric features.
  • the eyeball feature extraction layer is used to extract the eyeball feature of the target user from each frame of the target video, and convert the eyeball feature into a feature matrix;
  • the frame relationship processing layer is configured to determine the frame relationship between images with eyeball features in each frame according to the video time point of each frame of the target video;
  • the eye turning action recognition layer is used to determine the eye turning feature queue matrix of the target user according to the frame relationship and the feature matrix.
  • the conversion module 206 is configured to convert each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of image to the frame relationship processing layer.
  • The eyeball feature extraction layer splits the target video into individual frames of images, extracts the eyeball turning feature from each frame of image, and obtains the feature corresponding to each frame of image.
  • The eyeball feature is composed of multiple key points, for example a feature matrix composed of 128 or 512 key points.
  • the conversion module 206 is also used for:
  • the eyeball key points of each frame of the image of the annotation video are determined, where the eyeball key points include 128 key points or 256 key points.
  • the eyeball feature extraction layer splits the annotated video into each frame of image, and extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple eyeball key points, which can be 128 or 512 key points.
  • To obtain each eyeball key point, each frame of image is first converted to grayscale to obtain a two-dimensional grayscale image, from which the two-dimensional pixel coordinates of the key points are then obtained.
  • a feature matrix is established according to the eyeball key points of each frame of image, and the feature matrix includes 128 or 256 pixel coordinates.
  • the pixel coordinates are sorted to obtain a feature matrix in the form of 128 rows or 256 rows and 2 columns.
  • The feature sorting module 208 is used for the frame relationship processing layer to sort the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and to input the feature queue to the eyeball turning action recognition layer.
  • The frame relationship processing layer performs calculations on the feature matrices corresponding to adjacent video time points to determine whether the frame images need to be processed.
  • The frame relationship processing layer performs a difference operation on two adjacent frames of images to obtain difference image features, and obtains the movement route of the eyeball turning by analyzing the difference image features. That is, when the difference image features of two adjacent frames change from changing to remaining unchanged, the eyeball has completed the turning movement at that moment; when the difference image features of two adjacent frames change from remaining unchanged to changing, the eyeball starts the turning movement at that moment, and the feature queue for this period is obtained.
  • the feature matrix of each frame of image is arranged in the order of the video time point to obtain a feature queue, which is convenient for subsequent calculations.
  • the feature queue is regarded as the frame relationship between the corresponding features of each frame of image.
  • the feature fusion and output module 210 is used for the eye turning action recognition layer to perform feature fusion on the feature queue to obtain an eye turning feature queue matrix, and to determine the target turning angle of the target user based on the eye turning feature queue matrix .
  • The eyeball turning action recognition layer performs duplicate-check processing on the feature queue and deletes identical features from the queue to obtain a target feature queue in which the eyeball features are all different, and then combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
  • the feature fusion and output module 210 is also used for:
  • the difference between the feature matrices of adjacent frame images is calculated through a difference operation to obtain the feature of the difference image.
  • When the difference image features of two adjacent frames of images change from changing to remaining unchanged, the eyeball has completed the turning movement; when the difference image features of two adjacent frames change from remaining unchanged to changing, the eyeball starts the turning movement at that moment.
  • the feature matrix corresponding to one frame of the image is retained, and another identical feature matrix is deleted from the feature queue until the feature matrices in the feature queue are all different, and the target feature queue is obtained.
  • one eyeball feature is retained, and the retained feature can be the latter or the former. If the retained eye feature is the last identical eye feature, it indicates that the feature queue includes the turning time; if the retained eye feature is not the last identical feature, it indicates that the feature queue does not include the turning time.
  • When the feature matrices in the target feature queue are all different, that is, the eyeball turning positions are all different, the multiple frames of images corresponding to the target feature queue constitute the eyeball movement area.
  • the target feature queue includes a target steering angle, and the target steering angle corresponds to the feature queue.
  • the feature matrix of the feature queue is combined in chronological order to obtain the eye-turning feature queue matrix.
  • the feature fusion and output module 210 is further configured to:
  • The current frame of image is set to F_k(x, y) and the previous frame of image is set to F_(k-1)(x, y), where (x, y) are the pixel point coordinates in each frame of image.
  • A difference operation is performed on the adjacent frame images to obtain the difference image feature D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|.
  • The difference image feature is compared with a preset binarization threshold to determine whether the eyeball features corresponding to the adjacent frames are the same.
  • the feature fusion and output module 210 is also used for:
  • The position of each product in the target video is marked with coordinates.
  • Each product is marked in a coordinate system whose origin is the center position of the target user's eyeball, so that the target steering angle calculated based on the eyeball turning feature queue matrix can be mapped to a target product.
  • the angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the product.
  • The matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target steering angle of the target user, and the target steering angle is then matched against the product coordinates to obtain the target product. If the position corresponding to the target steering angle does not coincide with any product, the product closest to the target steering angle is selected as the target product.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 2 at least includes, but is not limited to, a memory and a processor.
  • the computer device 2 may also include a network interface and/or a video-based eyeball turning determination system.
  • the computer device 2 may include a memory 21, a processor 22, a network interface 23, and a video-based eye turn determination system 20.
  • The memory 21, the processor 22, the network interface 23, and the video-based eyeball turning determination system 20 can communicate with each other through a system bus. Wherein:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory ( RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • The memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, such as the program code of the video-based eyeball turning determination system 20 in the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is configured to run the program code or process data stored in the memory 21, for example, to run the video-based eye-turn determination system 20, so as to implement the video-based eye-turn determination method of the first embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the server 2 and other electronic devices.
  • the network interface 23 is used to connect the server 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the server 2 and the external terminal.
  • The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 9 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the video-based eyeball turning determination system 20 stored in the memory 21 may also be divided into one or more program modules, and the one or more program modules are stored in the memory 21, and It is executed by one or more processors (the processor 22 in this embodiment) to complete the application.
  • FIG. 8 shows a schematic diagram of a program module implementing the second embodiment of the video-based eye-turn determination system 20.
  • The video-based eye-turn determination system 20 can be divided into an acquisition module 200, an annotation module 202, an input module 204, a conversion module 206, a feature sorting module 208, and a feature fusion and output module 210.
  • the program module referred to in the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable than a program to describe the execution process of the video-based eyeball turning determination system 20 in the computer device 2.
  • the specific functions of the program modules 200-210 have been described in detail in the second embodiment, and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an application store, on which a computer program is stored; the corresponding function is realized when the program is executed by a processor.
  • the computer-readable storage medium of this embodiment is used to store the video-based eyeball turning determination system 20, and when executed by a processor, realizes the video-based eyeball turning determination method of the first embodiment.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method and system for video-based eyeball rotation determination. The method comprises: acquiring a target video, the target video being a video of a target user viewing a target product (S100); inputting the target video into an eyeball rotation feature recognition model to obtain an eyeball rotation feature queue matrix; and, on the basis of the eyeball rotation feature queue matrix, determining a target rotation angle of the target user. According to the method, the eyeball rotation feature queue matrix is acquired by inputting the target video into the eyeball rotation feature recognition model, and the target rotation angle and the target rotation time for a corresponding target product are then obtained by means of the eyeball rotation feature queue matrix, thereby improving the accuracy of eyeball tracking.
PCT/CN2021/071261 2020-02-28 2021-01-12 Method and system for video-based eyeball rotation determination WO2021169642A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010128432.6A CN111353429A (zh) 2020-02-28 2020-02-28 基于眼球转向的感兴趣度方法与系统
CN202010128432.6 2020-02-28

Publications (1)

Publication Number Publication Date
WO2021169642A1 true WO2021169642A1 (fr) 2021-09-02

Family

ID=71195806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071261 WO2021169642A1 (fr) 2021-01-12 Method and system for video-based eyeball rotation determination

Country Status (2)

Country Link
CN (1) CN111353429A (fr)
WO (1) WO2021169642A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353429A (zh) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 基于眼球转向的感兴趣度方法与系统
CN112053600B (zh) * 2020-08-31 2022-05-03 上海交通大学医学院附属第九人民医院 眼眶内窥镜导航手术训练方法、装置、设备、及系统
CN115544473B (zh) * 2022-09-09 2023-11-21 苏州吉弘能源科技有限公司 一种光伏发电站运维终端登录控制系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677024A (zh) * 2015-12-31 2016-06-15 北京元心科技有限公司 一种眼动检测跟踪方法、装置及其用途
CN107679448A (zh) * 2017-08-17 2018-02-09 平安科技(深圳)有限公司 眼球动作分析方法、装置及存储介质
CN109359512A (zh) * 2018-08-28 2019-02-19 深圳壹账通智能科技有限公司 眼球位置追踪方法、装置、终端及计算机可读存储介质
US20190294240A1 (en) * 2018-03-23 2019-09-26 Aisin Seiki Kabushiki Kaisha Sight line direction estimation device, sight line direction estimation method, and sight line direction estimation program
CN110555426A (zh) * 2019-09-11 2019-12-10 北京儒博科技有限公司 视线检测方法、装置、设备及存储介质
CN111353429A (zh) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 基于眼球转向的感兴趣度方法与系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677024A (zh) * 2015-12-31 2016-06-15 北京元心科技有限公司 一种眼动检测跟踪方法、装置及其用途
CN107679448A (zh) * 2017-08-17 2018-02-09 平安科技(深圳)有限公司 眼球动作分析方法、装置及存储介质
US20190294240A1 (en) * 2018-03-23 2019-09-26 Aisin Seiki Kabushiki Kaisha Sight line direction estimation device, sight line direction estimation method, and sight line direction estimation program
CN109359512A (zh) * 2018-08-28 2019-02-19 深圳壹账通智能科技有限公司 眼球位置追踪方法、装置、终端及计算机可读存储介质
CN110555426A (zh) * 2019-09-11 2019-12-10 北京儒博科技有限公司 视线检测方法、装置、设备及存储介质
CN111353429A (zh) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 基于眼球转向的感兴趣度方法与系统

Also Published As

Publication number Publication date
CN111353429A (zh) 2020-06-30

Similar Documents

Publication Publication Date Title
US11348249B2 (en) Training method for image semantic segmentation model and server
US20240062369A1 (en) Detection model training method and apparatus, computer device and storage medium
WO2021169642A1 (fr) Method and system for video-based eyeball rotation determination
CN108470332B (zh) 一种多目标跟踪方法及装置
CN110929622B (zh) 视频分类方法、模型训练方法、装置、设备及存储介质
WO2020228446A1 (fr) Procédé et appareil d'entraînement de modèles, et terminal et support de stockage
CN109960742B (zh) 局部信息的搜索方法及装置
WO2023010758A1 (fr) Procédé et appareil de détection d'action, dispositif terminal et support de stockage
CN110503076B (zh) 基于人工智能的视频分类方法、装置、设备和介质
US20220148291A1 (en) Image classification method and apparatus, and image classification model training method and apparatus
WO2021238548A1 (fr) Procédé, appareil et dispositif de reconnaissance de région, et support de stockage lisible
WO2021051547A1 (fr) Procédé et système de détection de comportement violent
CN108986137B (zh) 人体跟踪方法、装置及设备
US20230334893A1 (en) Method for optimizing human body posture recognition model, device and computer-readable storage medium
CN113496208B (zh) 视频的场景分类方法及装置、存储介质、终端
CN113706481A (zh) 精子质量检测方法、装置、计算机设备和存储介质
CN115115825B (zh) 图像中的对象检测方法、装置、计算机设备和存储介质
Alkhudaydi et al. Counting spikelets from infield wheat crop images using fully convolutional networks
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
Mar et al. Cow detection and tracking system utilizing multi-feature tracking algorithm
CN112836682B (zh) 视频中对象的识别方法、装置、计算机设备和存储介质
CN113780145A (zh) 精子形态检测方法、装置、计算机设备和存储介质
Liao et al. Multi-scale saliency features fusion model for person re-identification
CN111753766A (zh) 一种图像处理方法、装置、设备及介质
CN113762231B (zh) 端对端的多行人姿态跟踪方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21761730

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21761730

Country of ref document: EP

Kind code of ref document: A1