CN107392120B - Attention intelligent supervision method based on sight line estimation - Google Patents

Attention intelligent supervision method based on sight line estimation

Info

Publication number
CN107392120B
Authority
CN
China
Prior art keywords
area
user
watching
attention
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710546644.4A
Other languages
Chinese (zh)
Other versions
CN107392120A (en)
Inventor
姬艳丽
胡玉晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201710546644.4A priority Critical patent/CN107392120B/en
Publication of CN107392120A publication Critical patent/CN107392120A/en
Application granted granted Critical
Publication of CN107392120B publication Critical patent/CN107392120B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Aiming at the problems in the prior art, the invention provides an attention intelligent supervision method based on sight line estimation. The method adopts the concept of a gazing area and divides it into 9 areas. Face pictures of different collected subjects gazing at each of the 9 areas are acquired, the eyes are manually framed, and the gazing area is labeled; these data are used to train the configured Yolo network, yielding a gazing area estimation model based on the Yolo network. In use, the user's face image is acquired in real time and fed into the trained model to obtain the estimated gazing area, from which it is judged whether the user is focusing on area five. Because the judgment relies only on the division of the gazing area and the positions of the iris and pupil of the eye, the requirements on equipment are greatly reduced, the implementation cost is lowered, no constraint is imposed on the user's position, and the application range is widened, making the invention convenient to popularize and use.

Description

Attention intelligent supervision method based on sight line estimation
Technical Field
The invention belongs to the technical field of computer vision and human-computer interaction, and particularly relates to an intelligent attention supervision method based on sight estimation.
Background
The eyes occupy an important position among the human facial organs: about 80% of the information that a person acquires from the surrounding environment is obtained through the eyes. The eyes not only help people observe and recognize the external world, but also reflect psychological activity; they express a person's desires, needs, emotions and cognitive processes, and can serve as a channel of silent communication between people.
Gaze estimation is a technology that takes pictures of the human eyes as the input medium and, by acquiring gaze information from the eyes, reflects the user's perception, attention and distribution of areas of interest with respect to external devices.
According to the number of cameras used to collect data, sight line estimation can be divided into single-camera and multi-camera estimation. Single-camera estimation is less accurate than multi-camera estimation and allows a smaller range of head movement, but its application range is the widest: many mobile devices in daily life, such as mobile phones and personal notebook computers, carry a single camera, and monocular sight line estimation is also cheaper.
High-precision single-camera sight line estimation is already relatively mature; methods that use external light sources to detect glints on the eye can keep the error below 1°, but their high price limits commercial application. Most current methods also need calibration before use, which is inconvenient when the experimenter changes; because accurate eyeball information is required, the demands on the captured pictures are high, and some methods need the assistance of an external light source, which limits the application scenarios.
With the progress of society, people pay more and more attention to education, and various educational devices and educational robots with intelligent learning functions have been widely released. However, children learning with such equipment often lack supervision and their attention is often not focused; what is needed is a way to supervise the learning state of the user in real time while the user uses a learning interactive robot.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an attention intelligent supervision method based on sight line estimation, which is convenient to use, reduces the requirements on equipment, reduces the implementation cost and enlarges the application range.
In order to achieve the above object, the attention intelligent supervision method based on sight line estimation of the present invention is characterized by comprising the following steps:
(1) Division of the gazing area
Dividing the whole gazing area in front of the user into 9 blocks; the area where the screen of the learning interactive robot is located is set as area five, and area five is set as the attention focusing area for the user learning with the learning interactive robot. The upper left of area five is area one, the upper side is area two, the upper right is area three, the left side is area four, the right side is area six, the lower left is area seven, the lower side is area eight, and the lower right is area nine;
(2) acquisition of training data
2.1) Acquiring training data with a color camera fixed in area five; the face of the collected subject (user) faces the color camera, and the subject then gazes at each of the 9 areas in turn, with the same number n of pictures acquired for each gazing area;
2.2) Repeating step 2.1) for different collected subjects, acquiring n pictures for each gazing area;
2.3) Classifying the collected pictures of all subjects by gazing area, obtaining training data for the 9 gazing areas;
(3) labeling of training data
The training data are labeled manually, and the labeling covers two aspects: the eye position, i.e. framing where in the picture the eyes are located; and the class of the eye information, i.e. which of the divided gazing areas the eye information (iris and pupil positions) selected by the frame corresponds to. In short, the labeling tells the network what an eye is and which gazing area such an eye corresponds to;
(4) construction and training of gaze region estimation model
A Yolo network is adopted as the gazing area estimation model. The input of the GoogleNet network within the Yolo network is adjusted from 224×224 to 448×448; the initial convolutional layer of the network extracts features from the training data, the other convolutional layers further extract features layer by layer, and the final fully-connected layers predict the gazing area class probabilities and the bounding box. As for the activation function, the Yolo network uses a logistic activation function in the last layer and ReLU (Rectified Linear Units) in all other layers;
In the training process, the neuron parameters of the official Yolo network model are selected as initial values and the first 23 layers of the model are retained; the parameters of the neurons of the last 3 layers (the last convolutional layer and the two fully-connected layers) are modified according to the error between the output obtained from the Yolo model on the training data and the labels, and the last convolutional layer is given 70 filters;
and setting the iteration times and the learning rate of training and updating the weight of the trained pictures once, and then sending the training data into the set Yolo network to obtain a fixation area estimation model based on the Yolo network.
(5) Detecting the direction of sight in real time
A kinect color camera (a color camera developed by Microsoft) on the learning interactive robot collects face images of the user in real time; the collected images are fed as input into the gazing area estimation model to obtain the bounding box, i.e. the eye position, and the corresponding gazing area class probability;
(6) eye gaze estimation-based attention detection
When the real-time detection result is not area five for a period of time, i.e. the user's attention has left the screen for a while, the user is considered to have left the learning state; a judgment is then made according to the user's learning time. If the learning time is below a set threshold, the user is prompted to concentrate until the gazing area returns to area five; if the learning time is above the set threshold, the user is prompted to rest, the rest time is recorded, and detection continues after the rest is finished.
The object of the invention is thus achieved.
Aiming at the problems in the prior art, the invention provides an attention intelligent supervision method based on sight line estimation. The method adopts the concept of a gazing area, divides it into 9 areas, acquires face pictures of different collected subjects gazing at each of the 9 areas, manually frames the eyes and labels the gazing area, and uses these data to train the configured Yolo network, namely the gazing area estimation model of the invention, obtaining a gazing area estimation model based on the Yolo network. In use, the user's face image is acquired in real time and fed into the trained model, yielding the estimated gazing area and thus a judgment of whether the user is focusing on area five. Because the judgment relies only on the division of the gazing area and the positions of the iris and pupil of the eye, the requirements on equipment are greatly reduced, the implementation cost is lowered, no constraint is imposed on the user's position, and the application range is widened, which makes the invention convenient to popularize and use.
Drawings
FIG. 1 is a schematic diagram of the principle of intelligent attention supervision;
FIG. 2 is a flow chart of an embodiment of the intelligent attention supervision method based on gaze estimation;
fig. 3 is a schematic diagram of the division of the gaze area;
FIG. 4 is a schematic diagram of a training data acquisition environment;
fig. 5 is a schematic view of the gaze region corresponding to different eye information;
FIG. 6 is a schematic view of a marked picture;
FIG. 7 is a schematic diagram of the structure of the Yolo network;
FIG. 8 is a schematic diagram of gaze direction estimation by the gaze region estimation model;
FIG. 9 is a schematic view of an attention detection process;
FIG. 10 is a schematic view of attention detection in an ideal state;
FIG. 11 is a schematic view of attention detection in a non-ideal state;
FIG. 12 is a schematic diagram of positioning a human body under a kinect coordinate system of a learning interactive robot;
FIG. 13 is a data acquisition schematic of a calibration process;
fig. 14 is a schematic diagram of the trimming process.
Detailed Description
The following description of the embodiments of the present invention is provided in order to better understand the present invention for those skilled in the art with reference to the accompanying drawings. It is to be expressly noted that in the following description, a detailed description of known functions and designs will be omitted when it may obscure the subject matter of the present invention.
The invention addresses these problems with a deep learning approach to sight line estimation for intelligent attention supervision. It is convenient to use, needs only simple equipment, has a low implementation cost, and can run on mobile devices, since results are obtained from nothing more than RGB images of the user's face; it thus overcomes the drawbacks of high cost and narrow application range of existing methods.
With the progress of society, people pay more and more attention to education, and various educational devices and educational robots with intelligent learning functions have been widely released. However, children learning with educational equipment often lack supervision. Aiming at this problem, the invention designs a learning attention supervision method for children's education that takes a sight line estimation algorithm as its core: it monitors whether the user's attention is focused when the user learns with educational equipment or an educational robot, and can supervise the learning state in real time while the user uses the learning interactive robot. The method does not need to obtain a sight line direction accurate to a specific angle; instead, the space is divided into several areas according to actual requirements and the sight line is estimated at the area level.
The invention divides the area facing the human eyes into 9 areas and estimates in which area the user's sight line falls. Since space extends without bound, the areas other than area five are unbounded; if the learning interactive robot were placed in one of them, larger errors would result, so the position of the learning interactive robot and the learning area are set as area five, as shown in FIG. 1. When the user's sight line area is detected to be area five, the user's attention is considered to be focused on the learning interactive robot; when any other area is detected, the user's attention is not focused on the learning interactive robot.
The invention is directed at learning attention supervision for children's education; the users are children. It monitors whether the user's attention is focused while the user learns with the interactive system, and can supervise the learning state in real time while the user uses the learning interactive robot.
The sight line estimation part needs a model trained with the yolo network; there is no ready-made database, so the database must be built by ourselves. In this embodiment, a scene of interactive learning on a relatively large screen is simulated; the sight line area, i.e. the gazing area, is divided into 9 blocks, sight line estimation is carried out over these 9 gazing areas, and data are acquired accordingly. Children generally study in the morning and afternoon, when light is sufficient, and the learning place is generally a well-lit room; the data acquisition time is therefore set to the morning and afternoon, and the place to a well-lit room.
FIG. 2 is a flow chart of an embodiment of the intelligent attention supervision method based on gaze estimation;
in this embodiment, as shown in fig. 2, the attention intelligent supervision method based on gaze estimation of the present invention includes two major parts, namely: off-line training and real-time detection specifically include:
step 101: division of a gaze area
In the present invention, as shown in fig. 3, the whole gazing area in front of the user is divided into 9 blocks; the area where the learning interactive robot screen is located is area five, which is set as the attention focusing area where the user learns with the learning interactive robot. The upper left of area five is area one, the upper side is area two, the upper right is area three, the left side is area four, the right side is area six, the lower left is area seven, the lower side is area eight, and the lower right is area nine.
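For illustration, this 3×3 layout of gazing areas can be sketched as follows; the helper below is hypothetical and not part of the patent.

```python
# Hypothetical helper (not part of the patent): the 3x3 layout of gazing areas,
# with area five at the centre, where the learning interactive robot screen sits.
GAZE_AREAS = [
    [1, 2, 3],   # upper-left, upper, upper-right of area five
    [4, 5, 6],   # left, screen / attention-focusing area, right
    [7, 8, 9],   # lower-left, lower, lower-right
]

def is_focused(area: int) -> bool:
    """Attention is considered focused only when the estimated area is five."""
    return area == 5
```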
Step 102: acquisition of training data
The method trains a model capable of estimating the user's gazing area, i.e. the gazing area estimation model, through the yolo network; training requires a database, and since none is available, it must be built by ourselves. The data acquisition environment is shown in fig. 4. In this embodiment, the color camera of the Kinect is used to acquire data; its position is fixed in area five, the face of the collected subject, i.e. the user, must face the Kinect color camera, and the subject then gazes at each of the 9 areas in turn while the Kinect color camera captures pictures of the subject as training data. The more collected subjects in the database the better, but the amount of data must be the same for different subjects, and each subject should contribute the same amount of data for each of the 9 gazing areas: n pictures are acquired per gazing area, and different subjects are acquired by the same method, n pictures per gazing area.
Then classifying the collected pictures of all the collected objects according to the watching areas to obtain training data of 9 watching areas;
step 103: labeling of training data
After the training data are collected, they need to be labeled manually, and the labeling covers two aspects: the eye position, i.e. framing where in the picture the eyes are located; and the class of the eye information, i.e. which gazing area the eye information (iris and pupil positions) selected by the frame corresponds to, as shown in fig. 5.
In short, the labeling tells the network what an eye is and which gazing area such an eye corresponds to; a labeled picture is shown in fig. 6.
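Purely for illustration (the patent does not prescribe a label file format), such a label could be written in the darknet/YOLO convention of one line per framed object: a class index followed by the box centre and size normalized to the picture dimensions.

```python
def yolo_label(area_index, eye_box, img_w, img_h):
    """Hypothetical label writer: darknet-style line 'class x_c y_c w h' with the
    eye bounding box normalized to the picture size; area_index 0..8 stands for
    gazing areas one..nine."""
    x_min, y_min, x_max, y_max = eye_box
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{area_index} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Example: an eye framed at (300, 220)-(420, 280) in a 640x480 picture,
# labeled as gazing at area five (class index 4).
print(yolo_label(4, (300, 220, 420, 280), 640, 480))
```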
Step 104: construction and training of gaze region estimation model
The invention adopts a Yolo network as the gazing area estimation model. The Yolo network is implemented with a CNN; its structure is shown in fig. 7. The network has 24 convolutional layers and 2 fully-connected layers, the first 20 convolutional layers directly reuse the first 20 layers of GoogleNet, and the whole network makes extensive use of cascaded convolutions. In the invention, to obtain more accurate information from detection, the input of the GoogleNet network within the Yolo network is adjusted from 224×224 to 448×448; the initial convolutional layer extracts features from the training data, the other convolutional layers further extract features layer by layer, and the final fully-connected layers predict the gazing area class probabilities and the bounding box. As for the activation function, the Yolo network uses a logistic activation function in the last layer and ReLU (Rectified Linear Units) in all other layers.
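By way of a rough sketch only (the patent discloses no code; the feature-map size, hidden width and output encoding below are assumptions), the last three trainable layers described here could look as follows in PyTorch:

```python
import torch
import torch.nn as nn

class GazeHead(nn.Module):
    """Illustrative sketch of the last three trainable layers described in the
    patent: one convolutional layer with 70 filters followed by two fully-
    connected layers. The 7x7x1024 feature-map size, the 4096 hidden width and
    the output size are assumptions, not values disclosed in the patent."""
    def __init__(self, in_channels=1024, grid=7, hidden=4096, out_dim=7 * 7 * 70):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 70, kernel_size=3, padding=1)  # 70 filters
        self.relu = nn.ReLU(inplace=True)          # ReLU in all but the last layer
        self.fc1 = nn.Linear(70 * grid * grid, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)      # last layer left linear/logistic

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = torch.flatten(x, 1)
        x = self.relu(self.fc1(x))
        return self.fc2(x)

# Example: backbone features for a 448x448 input image (assumed to be 7x7x1024).
features = torch.randn(1, 1024, 7, 7)
predictions = GazeHead()(features)
```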
During the training process, the neuron parameters of the official yolo network model are selected as initial values and the first 23 layers of the model are retained; the parameters of the neurons of the last 3 layers (i.e. convolutional layer 24 and the two fully-connected layers) are modified according to the error between the output obtained from the model and the labels.
Since the sight line region is divided into 9 blocks, the classification result also has 9 classes, and 70 filters are set in the last convolutional layer.
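One plausible reading of the 70 filters, offered here only as an interpretation and not stated in the patent, is the YOLOv2-style detection-layer convention filters = boxes × (classes + 5): with 5 predicted boxes per grid cell and 9 gazing area classes, 5 × (9 + 5) = 70, each box carrying four coordinates, one confidence score and nine class scores.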
The number of iterations and the learning rate of model training, together with the number of training pictures after which the weights are updated once, are set, and the training data are then fed into the configured Yolo network to obtain the gazing area estimation model based on the Yolo network.
In the present embodiment, the number of iterations of model training is set to 50000 times, the learning rate is set to 0.00001, and the weight value is updated every 64 training images.
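These settings could be recorded, for example, in a small configuration mapping; the key names below are illustrative and not tied to any particular framework's API.

```python
# Illustrative record of the training settings in this embodiment; the key
# names are hypothetical and not tied to any particular framework's API.
TRAIN_CONFIG = {
    "max_iterations": 50000,   # iterations of model training
    "learning_rate": 1e-5,     # 0.00001
    "batch_size": 64,          # weights updated after every 64 training images
    "frozen_layers": 23,       # first 23 layers kept from the official model
}
```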
Training with these settings gives a model with good performance: on the test set the accuracy reaches 100%, and real-time detection is possible on a video stream at 30 frames per second with high accuracy. The offline model training is thus finished.
Step 201: color camera real-time acquisition of user face image
Face images of the user are collected in real time by the color camera; the collected images are fed as input into the gazing area estimation model to obtain the bounding box, i.e. the eye position, and the corresponding gazing area class probability.
Real-time detection of the user's attention requires real-time user data, i.e., a user's facial picture. In the embodiment, the Kinect color camera is used for collecting the face image of the user, and the collection frequency is 30 frames/second.
In this embodiment, face pictures of the user are acquired in real time by the Kinect color camera and, combined with the gazing area estimation model obtained from the previous training, the direction of the user's sight line can be estimated: the picture acquired in real time is used as input, the gazing area estimation model processes each frame, and the detection result for the picture is output: the bounding box, i.e. the eye position, and the corresponding gazing area class probability.
In this embodiment, the gazing area estimation model takes about 0.014 seconds to process one frame; since this is faster than the acquisition interval, the real-time detection task is completed comfortably. The gaze direction estimation process performed by the gazing area estimation model is shown in fig. 8.
Step 202: eye gaze estimation based attention detection
When the real-time detection result is not area five for a period of time, i.e. the user's attention has left the screen for a while, the user is considered to have left the learning state; a judgment is then made according to the user's learning time. If the learning time is below a set threshold, the user is prompted to concentrate until the gazing area returns to area five; if the learning time is above the set threshold, the user is prompted to rest, the rest time is recorded, and detection continues after the rest is finished. The specific attention detection is shown in fig. 9.
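A minimal sketch of this decision logic is given below; the tolerance and threshold values and the detect_area() helper are assumptions introduced only for illustration.

```python
import time

def supervise_attention(detect_area, learning_threshold_s=1200.0,
                        off_screen_tolerance_s=5.0):
    """Illustrative attention-supervision loop. detect_area() is an assumed
    helper returning the estimated gazing area (1-9) for the latest frame;
    the two thresholds are hypothetical values, not taken from the patent."""
    session_start = time.time()
    off_screen_since = None
    while True:
        if detect_area() == 5:                    # attention on the robot screen
            off_screen_since = None
        elif off_screen_since is None:
            off_screen_since = time.time()        # attention just left the screen
        elif time.time() - off_screen_since > off_screen_tolerance_s:
            learning_time = time.time() - session_start
            if learning_time < learning_threshold_s:
                print("Please focus on the screen.")   # prompt until back to area five
            else:
                print("Time for a break.")             # prompt a rest, record its length
                return learning_time
        time.sleep(1 / 30)                        # roughly the 30 fps acquisition rate
```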
Step 203: locating user position using Kinect
The offline-trained model assumes an ideal case, such as user 1 in fig. 10, whose position faces area five, i.e. the Kinect color camera of the learning interactive robot. When the person's position changes and a non-ideal state arises (the position does not face area five), i.e. the position of user 2 in fig. 11, using the model trained for the ideal case would cause large errors, so the user must be calibrated in the non-ideal case. The Kinect is therefore used to locate the user's position and judge whether the user is currently in the ideal or the non-ideal use state.
The user's position is located with the Kinect on the learning interactive robot, whose field of view is cone-shaped, as shown in FIG. 12. By combining the data acquired by the Kinect's depth camera and color camera, the coordinates of the user's joint points in the Kinect coordinate system can be obtained, and from these coordinates the user's position in space can be located.
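For illustration, one way the located head-joint coordinates might be turned into an ideal/non-ideal judgement is sketched below; the lateral tolerance is an assumed value, not specified in the patent.

```python
def is_ideal_position(head_xyz, lateral_tolerance_m=0.3):
    """Hypothetical check: the user is treated as being in the ideal state when
    the head joint, expressed in Kinect camera space (x right, y up, z forward,
    in metres), sits roughly in front of area five. The 0.3 m tolerance and the
    0.5 m minimum distance are assumptions."""
    x, _y, z = head_xyz
    return abs(x) <= lateral_tolerance_m and z > 0.5
```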
Step 204: judging whether the user is in the ideal use state or the non-ideal use state; if the state is ideal, return to step 202 and continue attention detection, otherwise go to step 205.
Step 205: user calibration
When the user is in a non-ideal state, the model trained in the ideal state would produce large errors, so it cannot be used for detection as it is.
In the calibration process, the user gazes at the 9 areas according to prompts from the learning interactive robot, and a small amount of data, i.e. pictures of the user gazing at the 9 areas from the current position, is collected; a schematic diagram of data acquisition during calibration is shown in fig. 13. The gazing area class must be recorded for each picture at collection time.
The collected data are used to modify the gazing area estimation model. The features extracted by the convolutional layers in the ideal state are still applicable in the non-ideal state, so only the fully-connected layers of the model need to be fine-tuned with these data (pictures). The fine-tuning process has two steps. ① Automatic labeling of the data: the data collected during calibration are input into the offline-trained gazing area estimation model, and the gazing area information in the resulting detections is replaced by the area information recorded at collection time, which completes the automatic labeling. ② Fine-tuning of the model: the model is trained with the automatically labeled data, the parameters of the first 24 layers are kept unchanged and only the parameters of the fully-connected layers are modified; the number of training iterations can be set small, and since only the fully-connected layers are fine-tuned, the training time is short and the user only needs to wait a short while. The fine-tuning process is shown in fig. 14.
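A minimal PyTorch-style sketch of this fine-tuning step follows; the attribute names model.backbone and model.head, and the epoch and learning-rate values, are assumptions introduced only for illustration.

```python
import torch

def finetune_fc(model, calibration_loader, loss_fn, epochs=5, lr=1e-5):
    """Illustrative calibration fine-tuning: freeze the convolutional layers
    (model.backbone, an assumed attribute holding the first 24 layers) and
    update only the fully-connected head (model.head, also assumed)."""
    for p in model.backbone.parameters():        # keep the first 24 layers unchanged
        p.requires_grad = False
    optimizer = torch.optim.SGD(model.head.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):                      # few iterations, so the wait is short
        for images, labels in calibration_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```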
The fine-tuned model then replaces the offline-trained gazing area estimation model, so the attention of the user in the non-ideal state can be detected, and the process returns to step 202.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are permissible as long as they remain within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (3)

1. An attention intelligent supervision method based on sight line estimation is characterized by comprising the following steps:
(1) Division of the gazing area
Dividing the whole gazing area in front of the user into 9 blocks; the area where the screen of the learning interactive robot is located is set as area five, and area five is set as the attention focusing area for the user learning with the learning interactive robot. The upper left of area five is area one, the upper side is area two, the upper right is area three, the left side is area four, the right side is area six, the lower left is area seven, the lower side is area eight, and the lower right is area nine;
(2) acquisition of training data
2.1) Acquiring training data with a color camera fixed in area five; the user's face faces the color camera, and the user then gazes at each of the 9 areas in turn, with the same number n of pictures acquired for each gazing area;
2.2) Repeating step 2.1) for different users, acquiring n pictures for each gazing area;
2.3) Classifying the collected pictures of all users by gazing area to obtain training data for the 9 gazing areas;
(3) labeling of training data
The training data are labeled manually, and the labeling covers two aspects: the eye position, i.e. framing where in the picture the eyes are located; and the class of the eye information, i.e. which of the divided gazing areas the eye information selected by the frame corresponds to. In short, the labeling tells the network what an eye is and which gazing area such an eye corresponds to;
(4) construction and training of gaze region estimation model
A Yolo network is adopted as the gazing area estimation model. The input of the GoogleNet network within the Yolo network is adjusted from 224×224 to 448×448; the initial convolutional layer of the network extracts features from the training data, the other convolutional layers further extract features layer by layer, and the final fully-connected layers predict the gazing area class probabilities and the bounding box. As for the activation function, the Yolo network uses a logistic activation function in the last layer and ReLU (Rectified Linear Units) in all other layers;
In the training process, the neuron parameters of the official Yolo network model are selected as initial values and the first 23 layers of the model are retained; the parameters of the neurons of the last 3 layers are modified according to the error between the output obtained from the Yolo model on the training data and the labels, and 70 filters are set in the last convolutional layer;
setting the iteration times and the learning rate of training and updating the weight of the trained pictures once, and then sending training data into a set Yolo network to obtain a fixation area estimation model based on the Yolo network;
(5) detecting the direction of sight in real time
Acquiring face images of the user in real time through a color camera on the learning interactive robot, and feeding the acquired images as input into the gazing area estimation model to obtain the bounding box, i.e. the eye position, and the corresponding gazing area class probability;
(6) eye gaze estimation-based attention detection
When the real-time detection result is not area five for a period of time, i.e. the user's attention has left the screen for a while, the user is considered to have left the learning state; a judgment is then made according to the user's learning time. If the learning time is below a set threshold, the user is prompted to concentrate until the gazing area returns to area five; if the learning time is above the set threshold, the user is prompted to rest, the rest time is recorded, and detection continues after the rest is finished.
2. The intelligent attention supervision method according to claim 1, wherein in step (4) the number of iterations of model training is set to 50000, the learning rate is set to 0.00001, and the weights are updated every 64 training images.
3. The intelligent attention supervision method according to claim 1, further comprising the steps of:
(7) locating the position of the user with a kinect color camera on the learning interactive robot, and judging whether the user is in the ideal use state or the non-ideal use state; if the state is ideal, return to step (6) and continue attention detection, otherwise go to step (8);
(8) user calibration
The user gazes at the 9 areas according to prompts from the learning interactive robot, and pictures of the user gazing at the 9 areas from the user's current position are collected; the collected pictures are used to modify the gazing area estimation model. Since the features extracted by the convolutional layers in the ideal state are still applicable in the non-ideal state, only the fully-connected layers of the model need to be fine-tuned with these data. The fine-tuning process can be divided into two steps: ① automatically labeling the data, in which the data collected during calibration are input into the offline-trained gazing area estimation model and the gazing area information in the resulting detections is replaced by the area information recorded during labeling, completing the automatic labeling;
replacing the offline-trained gazing area estimation model with the fine-tuned model, so that attention detection can be performed on the user in the non-ideal state, and returning to step (6).
CN201710546644.4A 2017-07-06 2017-07-06 Attention intelligent supervision method based on sight line estimation Active CN107392120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710546644.4A CN107392120B (en) 2017-07-06 2017-07-06 Attention intelligent supervision method based on sight line estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710546644.4A CN107392120B (en) 2017-07-06 2017-07-06 Attention intelligent supervision method based on sight line estimation

Publications (2)

Publication Number Publication Date
CN107392120A CN107392120A (en) 2017-11-24
CN107392120B true 2020-04-14

Family

ID=60335468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710546644.4A Active CN107392120B (en) 2017-07-06 2017-07-06 Attention intelligent supervision method based on sight line estimation

Country Status (1)

Country Link
CN (1) CN107392120B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108684B (en) * 2017-12-15 2020-07-17 杭州电子科技大学 Attention detection method integrating sight detection
CN108460700B (en) * 2017-12-28 2021-11-16 北京科教科学研究院 Intelligent student education management regulation and control system
CN108510062A (en) * 2018-03-29 2018-09-07 东南大学 A kind of robot irregular object crawl pose rapid detection method based on concatenated convolutional neural network
CN108595047A (en) * 2018-04-20 2018-09-28 北京硬壳科技有限公司 Touch control object recognition methods and device
CN108961679A (en) * 2018-06-27 2018-12-07 广州视源电子科技股份有限公司 A kind of attention based reminding method, device and electronic equipment
CN110479331A (en) * 2019-08-05 2019-11-22 江苏大学 A kind of preparation method and its usage of 3D printing monolithic catalyst
CN110543828A (en) * 2019-08-08 2019-12-06 南京励智心理大数据产业研究院有限公司 Student attention analysis system based on wearable device and multi-mode intelligent analysis
CN111508142A (en) * 2020-04-17 2020-08-07 深圳爱莫科技有限公司 Sight voice interaction automatic cigarette vending machine
CN111881830A (en) * 2020-07-28 2020-11-03 安徽爱学堂教育科技有限公司 Interactive prompting method based on attention concentration detection
CN112306832A (en) * 2020-10-27 2021-02-02 北京字节跳动网络技术有限公司 User state response method and device, electronic equipment and storage medium
CN113064485A (en) * 2021-03-17 2021-07-02 广东电网有限责任公司 Supervision method and system for training and examination
CN113705349B (en) * 2021-07-26 2023-06-06 电子科技大学 Attention quantitative analysis method and system based on line-of-sight estimation neural network
CN116214522B (en) * 2023-05-05 2023-08-29 中建科技集团有限公司 Mechanical arm control method, system and related equipment based on intention recognition

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593352A (en) * 2009-06-12 2009-12-02 浙江大学 Driving safety monitoring system based on face orientation and visual focus
CN102018519A (en) * 2009-09-15 2011-04-20 由田新技股份有限公司 Staff concentration degree monitoring system
CN103366381A (en) * 2013-08-06 2013-10-23 山东大学 Sight line tracking correcting method based on space position
CN103661375A (en) * 2013-11-25 2014-03-26 同济大学 Lane departure alarming method and system with driving distraction state considered
CN104460185A (en) * 2014-11-28 2015-03-25 小米科技有限责任公司 Automatic focusing method and device
CN104850228A (en) * 2015-05-14 2015-08-19 上海交通大学 Mobile terminal-based method for locking watch area of eyeballs
CN105005788A (en) * 2015-06-25 2015-10-28 中国计量学院 Target perception method based on emulation of human low level vision
CN106796449A (en) * 2014-09-02 2017-05-31 香港浸会大学 Eye-controlling focus method and device
CN106909220A (en) * 2017-02-21 2017-06-30 山东师范大学 A kind of sight line exchange method suitable for touch-control

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5474202B2 (en) * 2009-09-29 2014-04-16 アルカテル−ルーセント Method and apparatus for detecting a gazing point based on face detection and image measurement

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593352A (en) * 2009-06-12 2009-12-02 浙江大学 Driving safety monitoring system based on face orientation and visual focus
CN102018519A (en) * 2009-09-15 2011-04-20 由田新技股份有限公司 Staff concentration degree monitoring system
CN103366381A (en) * 2013-08-06 2013-10-23 山东大学 Sight line tracking correcting method based on space position
CN103661375A (en) * 2013-11-25 2014-03-26 同济大学 Lane departure alarming method and system with driving distraction state considered
CN106796449A (en) * 2014-09-02 2017-05-31 香港浸会大学 Eye-controlling focus method and device
CN104460185A (en) * 2014-11-28 2015-03-25 小米科技有限责任公司 Automatic focusing method and device
CN104850228A (en) * 2015-05-14 2015-08-19 上海交通大学 Mobile terminal-based method for locking watch area of eyeballs
CN105005788A (en) * 2015-06-25 2015-10-28 中国计量学院 Target perception method based on emulation of human low level vision
CN106909220A (en) * 2017-02-21 2017-06-30 山东师范大学 A kind of sight line exchange method suitable for touch-control

Also Published As

Publication number Publication date
CN107392120A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN107392120B (en) Attention intelligent supervision method based on sight line estimation
TWI741512B (en) Method, device and electronic equipment for monitoring driver's attention
WO2020125499A1 (en) Operation prompting method and glasses
US11844608B2 (en) Posture analysis systems and methods
CN109343700B (en) Eye movement control calibration data acquisition method and device
CN101587542A (en) Field depth blending strengthening display method and system based on eye movement tracking
CN106095089A (en) A kind of method obtaining interesting target information
CN104978548A (en) Visual line estimation method and visual line estimation device based on three-dimensional active shape model
CN105516280A (en) Multi-mode learning process state information compression recording method
WO2023011339A1 (en) Line-of-sight direction tracking method and apparatus
CN101383000A (en) Information processing apparatus, information processing method, and computer program
US10884494B1 (en) Eye tracking device calibration
CN112666705A (en) Eye movement tracking device and eye movement tracking method
CN111008542A (en) Object concentration analysis method and device, electronic terminal and storage medium
CN109559332A (en) A kind of sight tracing of the two-way LSTM and Itracker of combination
CN110472546B (en) Infant non-contact eye movement feature extraction device and method
CN110148092A (en) The analysis method of teenager's sitting posture based on machine vision and emotional state
CN106725531A (en) Children's concentration detecting and analysing system and method based on sight line
WO2023041940A1 (en) Gaze-based behavioural monitoring system
WO2020151430A1 (en) Air imaging system and implementation method therefor
CN116453198B (en) Sight line calibration method and device based on head posture difference
JP6819194B2 (en) Information processing systems, information processing equipment and programs
CN112861633A (en) Image recognition method and device based on machine learning and storage medium
CN113491502A (en) Eyeball tracking calibration inspection method, device, equipment and storage medium
CN112651270A (en) Gaze information determination method and apparatus, terminal device and display object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant