CN112766097B - Sight line recognition model training method, sight line recognition device and sight line recognition equipment - Google Patents

Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Info

Publication number
CN112766097B
CN112766097B CN202110015600.5A
Authority
CN
China
Prior art keywords
sight
information
line
sight line
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110015600.5A
Other languages
Chinese (zh)
Other versions
CN112766097A (en)
Inventor
朱冬晨
林敏静
李航
李嘉茂
张晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202110015600.5A priority Critical patent/CN112766097B/en
Publication of CN112766097A publication Critical patent/CN112766097A/en
Application granted granted Critical
Publication of CN112766097B publication Critical patent/CN112766097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention relates to a training method for a sight line recognition model, and to a sight line recognition method, device and equipment. The training method comprises: obtaining a sample image set, wherein the sample image set comprises sample images each containing a first area and a second area; inputting the sample images into a preset machine learning model and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area; determining third sight line information corresponding to the second area based on the first sight line information; determining first loss information based on the second sight line information and the third sight line information; determining loss information according to the first loss information; and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets a preset condition, and taking the preset machine learning model at that point as the sight line recognition model. The invention can improve the accuracy with which the sight line recognition model recognizes the line of sight without increasing the scale of the model.

Description

Sight line recognition model training method, sight line recognition device and sight line recognition equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a training method for a sight line recognition model and to a sight line recognition method, device and equipment.
Background
Research shows that about 80% of the external information a person acquires comes from the eyes. The eyes acquire object information by rotating the eyeballs so that the clearest image of the object falls on the fovea of the retina; the line connecting the fovea and the cornea center is the line of sight. Because the line of sight has characteristics such as directness, naturalness and bidirectionality, it has become a research hotspot in recent years and is applied in many scenarios. For example, in human-computer interaction, a machine can be controlled by the direction of gaze movement, compensating for the physical inconvenience of disabled and elderly users; in safe driving, the driver's state can be estimated by tracking the driver's gaze so as to prevent accidents; and in media scenarios, a user's points of interest can be determined from gaze dwell time for advertising purposes.
Existing gaze estimation methods mainly comprise appearance-based methods and geometric-model-based methods. Both can adopt either 3D or 2D gaze estimation: 3D gaze estimation predicts the three-dimensional direction of the line of sight, while 2D gaze estimation predicts the coordinates of the gaze point on a preset plane.
With the rise of deep learning, appearance-based methods can use a convolutional neural network to regress the line of sight from a single eye, normalizing the original picture according to eye key points, camera intrinsic parameters and a three-dimensional average face model so as to reduce the influence of head pose on gaze estimation. However, when the convolutional neural network takes only the eye region as input and other useful regions of the face are absent, the accuracy of the output gaze estimate is reduced. On this basis, the input of the convolutional neural network can be expanded from the eye region to the full face region for gaze regression, and an attention mechanism can be added to the network to increase the weights of the eye and face regions while weakening the weights of useless regions such as the background, thereby improving the accuracy of gaze estimation. In addition, several convolutional neural networks can be combined and trained jointly, that is, the gaze regression is split into stages: one network extracts features such as the head pose, face key points, a face depth map and the eye regions from the picture, and another network then performs gaze regression based on the extracted features to obtain the gaze estimate.
Gaze estimation methods based on convolutional neural networks can estimate the line of sight without hand-designing eye features and, compared with geometric-model-based methods, have stronger applicability. However, although a convolutional neural network fed with the eye region alone or with the full face region can obtain rich feature information, this introduces a complex architecture, and combining and jointly training multiple convolutional neural networks likewise complicates gaze estimation, making the accuracy of the output gaze information difficult to guarantee.
Disclosure of Invention
The embodiment of the invention provides a training method, a sight line identification method, a device and electronic equipment for a sight line identification model, which can improve the accuracy of the sight line identification model in identifying the sight line on the premise of not increasing the scale of the sight line identification model.
The embodiment of the invention provides a training method of a sight line recognition model, which comprises the following steps:
acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
inputting the sample image into a preset machine learning model, and performing line-of-sight identification processing to obtain first line-of-sight information corresponding to a first region and second line-of-sight information corresponding to a second region;
determining third sight line information corresponding to the second area based on the first sight line information;
determining first loss information based on the second line-of-sight information and the third line-of-sight information;
determining loss information according to the first loss information;
and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as a sight line recognition model.
Further, determining third line-of-sight information corresponding to the second region based on the first line-of-sight information, includes:
based on a preset coordinate system, determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information;
third line-of-sight information is determined from the first center vector, the second center vector, and the first vector.
Further, after determining the first loss information based on the second line-of-sight information and the third line-of-sight information, further includes:
acquiring sight tag information corresponding to a sample image; the sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area;
determining second loss information according to the sight line tag information, the first sight line information and the second sight line information;
determining loss information from the first loss information, comprising:
and determining loss information according to the first loss information and the second loss information.
Further, determining the first loss information based on the second line-of-sight information and the third line-of-sight information includes:
determining distance information corresponding to the second sight line information and the third sight line information based on a preset coordinate system;
and determining first loss information according to the distance information.
Further, after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area, the method further includes:
determining first loss information from the first line-of-sight information and the second line-of-sight information, comprising:
based on a preset coordinate system, determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information, and determining a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area;
based on a preset conversion rule, determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information;
the first loss information is determined based on the first normalized vector, the second normalized vector, the first angle vector, and the second angle vector.
Correspondingly, the embodiment of the invention also provides a sight line identification method, which comprises the following steps:
acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to a first region to be recognized and second target sight line information corresponding to a second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
Correspondingly, the embodiment of the invention also provides a training device of the sight line recognition model, which comprises:
the sample image set acquisition module is used for acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
the recognition processing module is used for inputting the sample image into a preset machine learning model, and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area;
a third sight line information determining module, configured to determine third sight line information corresponding to the second area based on the first sight line information;
a first loss information determination module for determining first loss information based on the second line-of-sight information and the third line-of-sight information;
the loss information determining module is used for determining loss information according to the first loss information;
the line-of-sight recognition model determining module is used for adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as the line-of-sight recognition model.
Correspondingly, the embodiment of the invention also provides a sight line identification device, which comprises:
the image acquisition module to be processed is used for acquiring the image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module is used for taking the image to be processed as the input of the sight recognition model, and performing sight recognition processing to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
Correspondingly, the embodiment of the invention also provides a training device of the sight line recognition model, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the training method of the sight line recognition model.
Accordingly, an embodiment of the present invention further provides a line-of-sight recognition apparatus, where the apparatus includes a processor and a memory, and at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the line-of-sight recognition method of the line-of-sight recognition model described above.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention can improve the accuracy of the sight line recognition model in recognizing the sight line on the premise of not increasing the scale of the sight line recognition model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method of a line-of-sight recognition model according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a sample image provided by an embodiment of the present invention;
fig. 3 is a flowchart of determining third sight line information corresponding to a second area according to an embodiment of the present invention;
FIG. 4 is a flow chart of determining first loss information provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of distance information according to an embodiment of the present invention;
FIG. 6 is a flow chart of determining first loss information provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a training device for a line-of-sight recognition model according to an embodiment of the present invention;
FIG. 8 is a flow chart of a line of sight identification method provided by an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a sight line recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail with reference to the accompanying drawings. It will be apparent that the described embodiments are merely some embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the invention. In the description of embodiments of the present invention, it should be understood that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", or a third "may include one or more of the feature, either explicitly or implicitly. Moreover, the terms "first," "second," "third," and the like, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or modules is not necessarily limited to those steps or modules that are expressly listed or inherent to such process, method, apparatus, article, or device.
The present specification describes method steps as illustrated by an embodiment or flowchart, but the method may include more or fewer steps based on routine, non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent a unique execution order; in actual execution, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multithreaded environment) according to the method shown in the embodiments or drawings.
Referring to the following description of a specific embodiment of a training method of a line-of-sight recognition model according to the present invention, fig. 1 is a flowchart of a training method of a line-of-sight recognition model according to an embodiment of the present invention, and specifically as shown in fig. 1, the method includes:
s101: acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region.
In an alternative embodiment, the sample image set may comprise a plurality of sample images, each of which may comprise a first region and a second region, i.e. each of which may comprise a left-eye corresponding region and a right-eye corresponding region. Fig. 2 is a schematic diagram of a sample image according to an embodiment of the present invention, where the sample image includes a first area and a second area; the shapes of the first area and the second area may be rectangular, circular or triangular, which is not specifically limited in this disclosure.
In another alternative embodiment, the sample image set may include a plurality of sample image groups, each sample image group may include a first sample image and a second sample image, the first sample image may include a first region, and the second sample image may include a second region, so that each sample image group may include a first region and a second region, that is, two sample images in each sample image group may include a left-eye corresponding region and a right-eye corresponding region, respectively.
In practical applications, the sample image may further include other facial regions, for example, corresponding regions of other face key parts such as a forehead corresponding region, a mouth corresponding region, a nose corresponding region, and the like. When the sample image includes the corresponding areas of other face key parts, the sample image at least needs to include the left eye corresponding area and the right eye corresponding area. The richness of the acquired information can be increased by using the corresponding regions of both eyes alone or in combination with the corresponding regions of other face key parts.
S103: and inputting the sample image into a preset machine learning model, and performing line-of-sight recognition processing to obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region.
In the embodiment of the invention, the sample image can be input into a preset machine learning model for line-of-sight recognition processing, and the preset machine learning model can output first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region, that is, the preset machine learning model can output line-of-sight corresponding to the left eye and line-of-sight corresponding to the right eye.
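For illustration only, a minimal stand-in for such a preset machine learning model might look like the following PyTorch sketch. The architecture, layer sizes and the choice of two linear regression heads are all assumptions; the patent does not fix a concrete network:

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Assumed stand-in for the 'preset machine learning model': a shared
    convolutional trunk with one 3D-gaze regression head per eye region."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head1 = nn.Linear(32, 3)   # first sight line information
        self.head2 = nn.Linear(32, 3)   # second sight line information

    def forward(self, x):
        feats = self.trunk(x)
        return self.head1(feats), self.head2(feats)
```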
S105: third line-of-sight information corresponding to the second region is determined based on the first line-of-sight information.
The two eyes move in coordination: a person with normal vision rotates both eyes in linkage so that they fixate the same target point, and even if one eye is occluded, the occluded eye still rotates along with the rotation of the unoccluded eye.
In an embodiment of the present invention, fig. 3 is a flowchart of determining third line-of-sight information corresponding to a second area provided in the embodiment of the present invention, and as shown in fig. 3, the step of determining the third line-of-sight information corresponding to the second area may specifically include:
s301: determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information;
in the embodiment of the present invention, the first center vector may be a vector corresponding to a center of one eye corresponding to the first area, the second center vector may be a vector corresponding to a center of the other eye corresponding to the second area, and the first vector may be a vector corresponding to the eye corresponding to the first area when the eye gazes at the target point.
S303: and determining third sight line information corresponding to the second area according to the first center vector, the second center vector and the first vector.
In an alternative embodiment, assuming that the first region is the left-eye corresponding region and the second region is the right-eye corresponding region, the first center vector P₁ corresponding to the left-eye center, the second center vector P₂ corresponding to the right-eye center, and the first vector V₁ corresponding to the left eye's line of sight when the left eye gazes at the target point may be determined based on a spatial coordinate system pre-stored in the preset machine learning model, and from these the sight line information corresponding to the right eye at that moment is determined, namely the third vector corresponding to the right eye that accords with the binocular coordination characteristic. The formula for determining the sight line information corresponding to the right eye that accords with the binocular coordination characteristic may specifically be:
V₁₂ = P₁ - P₂ + V₁
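The formula is a simple vector identity: the target point seen from the first eye is P₁ + V₁, so the vector from the second eye center to that same point is P₁ - P₂ + V₁. A minimal sketch (all names are illustrative, not from the patent):

```python
import numpy as np

def third_gaze_vector(p1: np.ndarray, p2: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Derive the coordination-consistent gaze for the second eye.

    p1 : center vector of the eye in the first region (e.g. left eye)
    p2 : center vector of the eye in the second region (e.g. right eye)
    v1 : predicted gaze vector of the first eye, pointing from p1
         toward the fixated target point

    The target point is p1 + v1, so the vector from p2 to that same
    point is V12 = P1 - P2 + V1.
    """
    return p1 - p2 + v1

# Usage: both eyes fixate the point (0, 0, 10).
p1, p2 = np.array([0.03, 0.0, 0.0]), np.array([-0.03, 0.0, 0.0])
v1 = np.array([0.0, 0.0, 10.0]) - p1
print(third_gaze_vector(p1, p2, v1))  # equals the vector from p2 to the target
```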
the above example is assumed that the first region is a left-eye corresponding region, the second region is a right-eye corresponding region, and the line-of-sight information corresponding to the right eye that accords with the coordination attribute is determined. To avoid repetition, no further description is provided here.
S107: first loss information is determined based on the second line-of-sight information and the third line-of-sight information.
In an embodiment of the present invention, fig. 4 is a flowchart of determining first loss information provided in the embodiment of the present invention, and as shown in fig. 4, the step of determining the first loss information may specifically include:
s401: and determining distance information corresponding to the second sight line information and the third sight line information based on a preset coordinate system.
In the embodiment of the present invention, the distance information may specifically take the form of a norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or of the spatial angle difference between those two vectors. The representation forms of the second sight line information and the third sight line information include, but are not limited to, Euler angles and Euclidean space vectors.
In the embodiment of the invention, the second vector V₂ corresponding to the second sight line information and the third vector V₁₂ corresponding to the third sight line information may be determined based on a preset coordinate system, and the distance information L₁₂ corresponding to the second sight line information and the third sight line information is then determined from V₂ and V₁₂. The formula for determining the distance information L₁₂ may specifically be:
L₁₂ = Dist(V₁₂, V₂)
In an alternative embodiment, the first region is assumed to be the left-eye corresponding region and the second region the right-eye corresponding region. Specifically, the second vector V₂ corresponding to the right eye's line of sight and the third vector V₁₂ corresponding to the coordination-consistent right-eye line of sight may be determined based on the spatial coordinate system described above, and the distance information corresponding to the second and third sight line information is determined from these two vectors. FIG. 5 is a schematic diagram of this distance information.
s403: and determining first loss information according to the distance information.
In an alternative embodiment, the first loss information may be determined based on the distance information. For example, the first loss value may be determined from the norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or from the spatial angle difference between those two vectors.
In another alternative embodiment, after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area as described above, the first loss information may be further determined according to the first line-of-sight information and the second line-of-sight information. Fig. 6 is a flowchart of determining first loss information according to an embodiment of the present invention, where, as shown in fig. 6, the step of determining a first loss value may specifically include:
s601: based on a preset coordinate system, determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information, and determining a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area.
In an embodiment of the present invention, the first coordinate information (x₁, y₁, z₁) corresponding to the first sight line information and the second coordinate information (x₂, y₂, z₂) corresponding to the second sight line information are determined based on the spatial coordinate system described above, and the first normalized vector P₃ corresponding to the first region and the second normalized vector P₄ corresponding to the second region are likewise determined based on that coordinate system.
S603: and determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information based on a preset conversion rule.
In the embodiment of the present invention, the first angle vector g₁ corresponding to the first coordinate information (x₁, y₁, z₁) and the second angle vector g₂ corresponding to the second coordinate information (x₂, y₂, z₂) may be determined based on a preset conversion rule. In an alternative embodiment, the preset conversion rule may be expressed as follows:
x=cos(φ)*sin(θ)
y=-sin(φ)
z=cos(φ)*cos(θ)
φ=-arcsin(y)
where the first angle vector g₁ and the second angle vector g₂ may each be represented as (θ, φ).
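A hedged sketch of this conversion rule follows. The forward direction implements the equations above verbatim; the recovery of θ via arctan2(x, z) is not stated in the text and is inferred from x = cos(φ)*sin(θ) and z = cos(φ)*cos(θ):

```python
import numpy as np

def angles_to_vector(theta: float, phi: float) -> np.ndarray:
    """(theta, phi) -> gaze vector, per the conversion rule above."""
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def vector_to_angles(v: np.ndarray) -> tuple:
    """Gaze vector -> (theta, phi). phi = -arcsin(y) is from the text;
    theta = arctan2(x, z) is an inferred inverse, not stated in the patent."""
    x, y, z = np.asarray(v, float) / np.linalg.norm(v)
    return float(np.arctan2(x, z)), float(-np.arcsin(y))
```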
S605: the first loss information is determined based on the first normalized vector, the second normalized vector, the first angle vector, and the second angle vector.
In the embodiment of the present invention, the distance information L corresponding to the first sight line information and the second sight line information may be determined from the first normalized vector P₃, the second normalized vector P₄, the first angle vector g₁ and the second angle vector g₂, and the first loss information is then determined according to this distance information. In an alternative embodiment, the distance is computed from P₃, P₄, G(g₁) and G(g₂) together with a factor α, where G represents the conversion of an angle vector into the vector of corresponding coordinates and α represents the relative depth adjustment factor.
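Since the concrete formula appears only in the original drawings, the following is one plausible reading, not the patent's definitive formula: each normalized eye-center vector is pushed out along its converted gaze direction by the relative depth factor α, and the distance between the two resulting endpoints is taken:

```python
import numpy as np

def first_loss_full(p3, p4, g1, g2, alpha=1.0):
    """Hypothetical first-loss form (an assumption): compare the endpoints
    P3 + alpha*G(g1) and P4 + alpha*G(g2)."""
    def G(g):  # angle vector (theta, phi) -> coordinate vector, as above
        theta, phi = g
        return np.array([np.cos(phi) * np.sin(theta),
                         -np.sin(phi),
                         np.cos(phi) * np.cos(theta)])
    return float(np.linalg.norm((p3 + alpha * G(g1)) - (p4 + alpha * G(g2))))
```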
S109: and determining loss information according to the first loss information.
In an alternative embodiment, the loss information may be determined based on a first preset parameter and the first loss information. Specifically, the loss value may be determined from the first preset parameter together with the norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or from the first preset parameter together with the spatial angle difference between those two vectors.
In another alternative embodiment, after the first loss information is determined based on the second sight line information and the third sight line information, sight line tag information corresponding to the sample image may further be acquired, second loss information is determined according to the sight line tag information, the first sight line information and the second sight line information, and the loss information is then determined according to the first loss information and the second loss information. The sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area. That is, the second loss information may be determined from all four quantities together; alternatively, a second loss value may be determined from the tag information of the first area and the first sight line information alone, or from the tag information of the second area and the second sight line information alone.
Next, description will be given taking an example of determining second loss information based on the line-of-sight tag information, the first line-of-sight information, and the second line-of-sight information.
Based on the spatial coordinate system, the sight line tag information corresponding to the first area, the sight line tag information corresponding to the second area, and the vectors corresponding to the first and second sight line information may each be determined. First distance information between the vector corresponding to the tag information of the first area and the vector corresponding to the first sight line information is then determined, second distance information between the vector corresponding to the tag information of the second area and the vector corresponding to the second sight line information is determined, and the second loss information is determined from the first distance information and the second distance information. Here, the first distance information may take the form of a norm of the difference between the vector corresponding to the sight line tag information and the vector corresponding to the first sight line information, or of the spatial angle difference between those vectors; the second distance information may likewise be a norm of the difference or a spatial angle difference with respect to the second sight line information. The representation forms of the sight line tag information, the first sight line information and the second sight line information include, but are not limited to, Euclidean space vectors and Euler angles.
Based on the planar coordinate system, the first distance information may be in a specific form of a distance between a vector end point corresponding to the line-of-sight tag information and a vector end point corresponding to the first line-of-sight information, and the second distance information may be in a specific form of a distance between a vector end point corresponding to the line-of-sight tag information and a vector end point corresponding to the second line-of-sight information.
The first distance information and the second distance information may specifically adopt, for example, the following formula:
L = ||V - V'|| = ||P - P'||
where V represents the vector corresponding to the sight line tag information, V' represents the vector corresponding to the first sight line information or to the second sight line information, P represents the vector end point corresponding to the sight line tag information, and P' represents the vector end point corresponding to the first sight line information or to the second sight line information.
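A small sketch of this label-based second loss, applied per eye; combining the two per-eye terms by summation is an assumption, as the patent only fixes the per-eye form L = ||V - V'||:

```python
import numpy as np

def label_loss(v_label, v_pred):
    """Per-eye second-loss term: L = ||V - V'||."""
    return float(np.linalg.norm(np.asarray(v_label) - np.asarray(v_pred)))

def second_loss(v1_label, v1_pred, v2_label, v2_pred):
    # Summing the two per-eye terms is an assumption.
    return label_loss(v1_label, v1_pred) + label_loss(v2_label, v2_pred)
```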
s111: and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as a sight line recognition model.
In the embodiment of the invention, the model parameters in the preset machine learning model may be adjusted based on the loss information until the loss information meets the preset condition, that is, until the total loss value stays stably below the expected loss value, and the preset machine learning model at that point is taken as the sight line recognition model.
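Putting S103 through S111 together, a minimal training loop might look as follows. This is a sketch under assumptions: the data loader yields eye-center vectors alongside labels, the two loss terms are summed with equal weight, and the "stably smaller" stopping rule is simplified to a single epoch-average threshold:

```python
import torch

def train(model, loader, optimizer, target_loss, max_epochs=100):
    """Sketch of S103-S111: per batch, predict both gazes, derive the
    coordination-consistent third gaze, and combine the two loss terms."""
    for epoch in range(max_epochs):
        running = 0.0
        for images, label1, label2, p1, p2 in loader:
            v1, v2 = model(images)            # first / second sight line info
            v12 = p1 - p2 + v1                # third sight line info (S105)
            loss1 = torch.norm(v12 - v2, dim=-1).mean()          # S107
            loss2 = (torch.norm(v1 - label1, dim=-1)
                     + torch.norm(v2 - label2, dim=-1)).mean()   # label term
            loss = loss1 + loss2              # equal weighting is an assumption
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running += loss.item()
        if running / len(loader) < target_loss:  # simplified stopping rule
            return model
    return model
```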
With the training method for a sight line recognition model provided by the embodiment of the invention, the loss information used to train the model is determined by jointly considering the recognized binocular sight line information and the sight line information derived from the binocular coordination rule, so the accuracy with which the sight line recognition model recognizes the line of sight can be improved without increasing the scale of the model.
The embodiment of the invention also provides a training device for the sight line recognition model, and fig. 7 is a schematic structural diagram of the training device for the sight line recognition model, as shown in fig. 7, where the training device includes:
the sample image set acquisition module 701 is configured to acquire a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
the recognition processing module 703 is configured to input the sample image into a preset machine learning model, perform line-of-sight recognition processing, and obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region;
the third sight line information determining module 705 is configured to determine third sight line information corresponding to the second area based on the binocular coordination rule and the first sight line information;
the first loss information determination module 707 is configured to determine first loss information based on the second line-of-sight information and the third line-of-sight information;
the loss information determining module 709 is configured to determine loss information according to the first loss information;
the line-of-sight recognition model determination module 711 is configured to adjust model parameters in a preset machine learning model based on the loss information until the loss information satisfies a preset condition, and take the preset machine learning model when the preset condition is satisfied as the line-of-sight recognition model.
The apparatus and method embodiments in the embodiments of the present invention are based on the same inventive concept.
The embodiment of the invention also provides training equipment of the sight line recognition model, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the training method of the sight line recognition model.
Referring to the following description of a specific embodiment of a line of sight recognition method according to the present invention, fig. 8 is a flowchart of a line of sight recognition method according to an embodiment of the present invention, and specifically as shown in fig. 8, the method includes:
s801: acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified.
In the embodiment of the invention, the image to be processed may include a first area to be identified and a second area to be identified, that is, the image to be processed may include a left eye corresponding area and a right eye corresponding area. In practical applications, the image to be processed may further include other facial regions, for example, corresponding regions of other face key parts such as a forehead corresponding region, a mouth corresponding region, a nose corresponding region, and the like.
S803: taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to a first region to be recognized and second target sight line information corresponding to a second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
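For illustration, inference with the trained model reduces to a single forward pass; the batch-dimension handling below assumes a PyTorch-style model like the earlier GazeNet sketch:

```python
import torch

def recognize(model, image):
    """Single forward pass on an image containing both regions to be
    recognized; assumes a PyTorch model like the earlier GazeNet sketch."""
    model.eval()
    with torch.no_grad():
        v1, v2 = model(image.unsqueeze(0))   # add a batch dimension
    return v1.squeeze(0), v2.squeeze(0)      # first / second target gaze info
```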
The embodiment of the invention also provides a sight line recognition device, and fig. 9 is a schematic structural diagram of the sight line recognition device provided by the embodiment of the invention, as shown in fig. 9, the device includes:
the image to be processed acquisition module 901 is used for acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module 903 is configured to perform sight recognition processing by using the image to be processed as input of a sight recognition model, to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
The apparatus and method embodiments in the embodiments of the present invention are based on the same inventive concept.
The embodiment of the invention also provides a sight line recognition device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the sight line recognition method of the sight line recognition model.
According to the embodiments of the training method, device and storage medium for the sight line recognition model provided by the invention, the loss information used to train the model is determined by jointly considering the recognized binocular sight line information and the sight line information derived from the binocular coordination rule, so the scale of the sight line recognition model can be reduced and the accuracy with which it recognizes the line of sight can be improved.
It should be noted that the order in which the embodiments of the invention are presented is illustrative only and is not intended to limit the invention to the particular embodiments disclosed; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing may also be possible or advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the embodiments of the device, the description is relatively simple, since it is based on embodiments similar to the method, as relevant see the description of parts of the method embodiments.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (9)

1. A method of training a line-of-sight recognition model, comprising:
acquiring a sample image set; the sample image set includes a sample image; the sample image includes a first region and a second region;
inputting the sample image into a preset machine learning model, and performing line-of-sight identification processing to obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region;
determining third sight line information corresponding to the second area based on the first sight line information;
determining first loss information based on the second line-of-sight information and the third line-of-sight information;
determining loss information according to the first loss information;
adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as the sight line recognition model;
the method comprises the steps of determining third sight line information corresponding to the second area based on the first sight line information, wherein the steps of determining the first center vector corresponding to the first area, the second center vector corresponding to the second area and the first vector corresponding to the first sight line information based on a preset coordinate system, and determining the third sight line information according to the first center vector, the second center vector and the first vector;
the first loss information is determined based on the second sight line information and the third sight line information, the distance information corresponding to the second sight line information and the third sight line information is determined based on the preset coordinate system, and the first loss information is determined according to the distance information.
2. The method of claim 1, wherein the determining the third line-of-sight information from the first center vector, the second center vector, and the first vector comprises:
determining the third sight line information based on the formula V₁₂ = P₁ - P₂ + V₁;
where P₁ is the first center vector, P₂ is the second center vector, V₁ is the first vector, and V₁₂ is the third sight line information.
3. The method of claim 1, wherein after determining first loss information based on the second line-of-sight information and the third line-of-sight information, further comprising:
acquiring sight line label information corresponding to the sample image; the sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area;
determining second loss information according to the sight line tag information, the first sight line information and the second sight line information;
the determining loss information according to the first loss information includes:
and determining loss information according to the first loss information and the second loss information.
4. The method according to claim 1, wherein after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area, further comprises:
determining the first loss information according to the first line-of-sight information and the second line-of-sight information, including:
determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information and a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area based on the preset coordinate system;
determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information based on a preset conversion rule;
and determining the first loss information according to the first standardized vector, the second standardized vector, the first angle vector and the second angle vector.
5. A line-of-sight recognition method, comprising:
acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to the first region to be recognized and second target sight line information corresponding to the second region to be recognized; wherein the line of sight identification model is the line of sight identification model of any one of claims 1-4.
6. A training device for a line-of-sight recognition model, comprising:
the sample image set acquisition module is used for acquiring a sample image set; the sample image set includes a sample image; the sample image includes a first region and a second region;
the recognition processing module is used for inputting the sample image into a preset machine learning model, and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area;
a third sight line information determining module, configured to determine third sight line information corresponding to the second area based on the first sight line information; determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information based on a preset coordinate system, and determining the third sight line information according to the first center vector, the second center vector and the first vector;
a first loss information determination module configured to determine first loss information based on the second line-of-sight information and the third line-of-sight information; determining distance information corresponding to the second sight line information and the third sight line information based on the preset coordinate system, and determining the first loss information according to the distance information;
the loss information determining module is used for determining loss information according to the first loss information;
the line-of-sight recognition model determining module is used for adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model meeting the preset conditions as the line-of-sight recognition model.
7. A line-of-sight recognition apparatus, comprising:
the image acquisition module to be processed is used for acquiring the image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module is used for taking the image to be processed as the input of a sight recognition model, and performing sight recognition processing to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight identification model is the line of sight identification model of any one of claims 1-5.
8. A training device for a line of sight identification model, characterized in that the device comprises a processor and a memory, in which at least one instruction, at least one program, code set or instruction set is stored, which is loaded and executed by the processor to implement the training method for a line of sight identification model according to any of claims 1-4.
9. A line of sight identification apparatus, characterized in that the apparatus comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes or a set of instructions is stored, which is loaded and executed by the processor to implement the line of sight identification method of claim 5.
CN202110015600.5A 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment Active CN112766097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110015600.5A CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110015600.5A CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Publications (2)

Publication Number Publication Date
CN112766097A (en) 2021-05-07
CN112766097B (en) 2024-02-13

Family

ID=75700353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110015600.5A Active CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Country Status (1)

Country Link
CN (1) CN112766097B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113233266A (en) * 2021-06-03 2021-08-10 Duke Kunshan University Non-contact elevator interaction system and method thereof
CN113807330B (en) * 2021-11-19 2022-03-08 Harbin Institute of Technology (Shenzhen) (Harbin Institute of Technology Shenzhen Institute of Science and Technology Innovation) Three-dimensional sight estimation method and device for resource-constrained scenes

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671890B2 (en) * 2018-03-30 2020-06-02 Tobii Ab Training of a neural network for three dimensional (3D) gaze prediction
US11526808B2 (en) * 2019-05-29 2022-12-13 The Board Of Trustees Of The Leland Stanford Junior University Machine learning based generation of ontology for structural and functional mapping

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107065207A (en) * 2017-03-28 2017-08-18 Wen Dong Imaging display method and device capable of achieving a monocular 3D effect, and application thereof
CN108427921A (en) * 2018-02-28 2018-08-21 University of Science and Technology Liaoning Face recognition method based on convolutional neural networks
WO2020063000A1 (en) * 2018-09-29 2020-04-02 Beijing SenseTime Technology Development Co., Ltd. Neural network training and line of sight detection methods and apparatuses, and electronic device
CN110969061A (en) * 2018-09-29 2020-04-07 Beijing SenseTime Technology Development Co., Ltd. Neural network training method and device, line of sight detection method and device, and electronic equipment
CN110008835A (en) * 2019-03-05 2019-07-12 Chengdu Kuangshi Jinzhi Technology Co., Ltd. Sight line prediction method, device, system, and readable storage medium
CN111723828A (en) * 2019-03-18 2020-09-29 Beijing SenseTime Technology Development Co., Ltd. Gaze region detection method and device, and electronic equipment
CN110058694A (en) * 2019-04-24 2019-07-26 Tencent Technology (Shenzhen) Co., Ltd. Gaze tracking model training method, and gaze tracking method and device
CN110321820A (en) * 2019-06-24 2019-10-11 Southeast University Gaze point detection method based on a contactless device
CN111259713A (en) * 2019-09-16 2020-06-09 Zhejiang University of Technology Gaze tracking method based on adaptive weighting
CN111222399A (en) * 2019-10-30 2020-06-02 Tencent Technology (Shenzhen) Co., Ltd. Method and device for identifying object identification information in an image, and storage medium
CN111199189A (en) * 2019-12-18 2020-05-26 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Target object tracking method and system, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Andronicus A. Akinyelu et al., "Convolutional Neural Network-Based Methods for Eye Gaze Estimation: A Survey," IEEE Access, vol. 8, pp. 142581-142605 *
Zhang Xiaolin et al., "Cooperative Movements of Binocular Motor System," 2008 IEEE International Conference on Automation Science and Engineering, pp. 321-327 *
Zhang Xiaolin, "Stereo vision control system of bionic binocular eyes," Electronic Design Engineering, vol. 26, no. 6, pp. 1-6 *
Fang Aiqing, "Research on perception mechanism of human-computer interaction based on gaze tracking," China Masters' Theses Full-text Database, Information Science and Technology, no. 12, p. I138-1364 *
Jiang Guangyi, "Research on fusion technology of head movement and gaze tracking data," China Masters' Theses Full-text Database, Information Science and Technology, no. 2, p. I138-1747 *

Also Published As

Publication number Publication date
CN112766097A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
US11302009B2 (en) Method of image processing using a neural network
CN104978548B (en) Gaze estimation method and device based on a three-dimensional active shape model
USRE42205E1 (en) Method and system for real-time facial image enhancement
CN104794465B (en) Liveness detection method based on posture information
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
WO2009091029A1 (en) Face posture estimating device, face posture estimating method, and face posture estimating program
JP4865517B2 (en) Head position / posture detection device
CN112766097B (en) Sight line recognition model training method, sight line recognition device and sight line recognition equipment
EP3154407B1 (en) A gaze estimation method and apparatus
MX2013002904A (en) Person image processing apparatus and person image processing method.
JP2008194146A (en) Visual line detection apparatus and method thereof
CN110913751A (en) Wearable eye tracking system with slip detection and correction functions
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
CN110531853B (en) Electronic book reader control method and system based on human eye fixation point detection
JP4936491B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
Emery et al. OpenNEEDS: A dataset of gaze, head, hand, and scene signals during exploration in open-ended VR environments
KR101639161B1 (en) Personal authentication method using skeleton information
Sun et al. Real-time gaze estimation with online calibration
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
CN112183200A (en) Eye movement tracking method and system based on video image
CN112633217A (en) Face recognition liveness detection method that calculates the sight direction based on a three-dimensional eyeball model
CN106778576B (en) Motion recognition method based on SEHM feature map sequences
CN113486691A (en) Intelligent device and control method thereof
Zhou et al. Learning a 3D gaze estimator with adaptive weighted strategy
JP2014093006A (en) Head posture estimation device, head posture estimation method and program for making computer execute head posture estimation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant