CN112766097B - Sight line recognition model training method, sight line recognition device and sight line recognition equipment - Google Patents

Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Info

Publication number
CN112766097B
CN112766097B CN202110015600.5A
Authority
CN
China
Prior art keywords
sight
information
line
sight line
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110015600.5A
Other languages
Chinese (zh)
Other versions
CN112766097A (en)
Inventor
朱冬晨
林敏静
李航
李嘉茂
张晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202110015600.5A priority Critical patent/CN112766097B/en
Publication of CN112766097A publication Critical patent/CN112766097A/en
Application granted granted Critical
Publication of CN112766097B publication Critical patent/CN112766097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention relates to a training method for a sight line recognition model, and to a sight line recognition method, device and equipment. The training method comprises: obtaining a sample image set, wherein the sample image set comprises sample images each containing a first area and a second area; inputting the sample images into a preset machine learning model and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area; determining third sight line information corresponding to the second area based on the first sight line information; determining first loss information based on the second sight line information and the third sight line information; determining loss information according to the first loss information; and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets a preset condition, and taking the preset machine learning model at that point as the sight line recognition model. The invention can improve the accuracy with which the sight line recognition model recognizes the line of sight without increasing the scale of the model.

Description

Sight line recognition model training method, sight line recognition device and sight line recognition equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a training method for a sight line recognition model and to a sight line recognition method, device and equipment.
Background
Research shows that about 80% of the external information a person acquires comes from the eyes. The eyes acquire object information by rotating the eyeballs so that the clearest image of the object falls on the fovea of the retina; the line connecting the fovea and the cornea center is the line of sight. Because the line of sight has characteristics such as directness, naturalness and bidirectionality, it has become a research hotspot in recent years and is applied in many scenarios. For example, in human-computer interaction, a machine can be controlled by the direction of gaze movement, compensating for the physical inconvenience of disabled and elderly users; in safe driving, the driver's state can be estimated by tracking the driver's gaze so as to prevent accidents; and in media scenarios, a user's points of interest can be determined from gaze dwell time for advertising purposes.
Existing gaze estimation methods mainly comprise appearance-based methods and geometric-model-based methods. Both can adopt either 3D or 2D gaze estimation: 3D gaze estimation predicts the three-dimensional direction of the line of sight, while 2D gaze estimation predicts the coordinates of the gaze point on a preset plane.
With the rise of deep learning, appearance-based methods can use a convolutional neural network to regress the line of sight from a single eye, normalizing the original picture according to eye key points, camera intrinsic parameters and a three-dimensional average face model so as to reduce the influence of head pose on gaze estimation. However, when the convolutional neural network takes only the eye region as input and other useful regions of the face are absent, the accuracy of the output gaze estimate is reduced. On this basis, the input of the convolutional neural network can be expanded from the eye region to the full face region for gaze regression, and an attention mechanism can be added to the network to increase the weights of the eye and face regions while weakening the weights of useless regions such as the background, thereby improving the accuracy of gaze estimation. In addition, several convolutional neural networks can be combined and trained jointly, that is, the gaze regression is split into stages: one network extracts features such as the head pose, face key points, a face depth map and the eye regions from the picture, and another network then performs gaze regression based on the extracted features to obtain the gaze estimate.
Gaze estimation methods based on convolutional neural networks can estimate the line of sight without hand-designing eye features and, compared with geometric-model-based methods, have stronger applicability. However, although a convolutional neural network fed with the eye region alone or with the full face region can obtain rich feature information, this introduces a complex architecture, and combining and jointly training multiple convolutional neural networks likewise complicates gaze estimation, making the accuracy of the output gaze information difficult to guarantee.
Disclosure of Invention
The embodiment of the invention provides a training method, a sight line identification method, a device and electronic equipment for a sight line identification model, which can improve the accuracy of the sight line identification model in identifying the sight line on the premise of not increasing the scale of the sight line identification model.
The embodiment of the invention provides a training method of a sight line recognition model, which comprises the following steps:
acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
inputting the sample image into a preset machine learning model, and performing line-of-sight identification processing to obtain first line-of-sight information corresponding to a first region and second line-of-sight information corresponding to a second region;
determining third sight line information corresponding to the second area based on the first sight line information;
determining first loss information based on the second line-of-sight information and the third line-of-sight information;
determining loss information according to the first loss information;
and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as a sight line recognition model.
Further, determining third line-of-sight information corresponding to the second region based on the first line-of-sight information, includes:
based on a preset coordinate system, determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information;
third line-of-sight information is determined from the first center vector, the second center vector, and the first vector.
Further, after determining the first loss information based on the second line-of-sight information and the third line-of-sight information, further includes:
acquiring sight tag information corresponding to a sample image; the sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area;
determining second loss information according to the sight line tag information, the first sight line information and the second sight line information;
determining loss information from the first loss information, comprising:
and determining loss information according to the first loss information and the second loss information.
Further, determining the first loss information based on the second line-of-sight information and the third line-of-sight information includes:
determining distance information corresponding to the second sight line information and the third sight line information based on a preset coordinate system;
and determining first loss information according to the distance information.
Further, after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area, the method further includes:
determining first loss information from the first line-of-sight information and the second line-of-sight information, comprising:
based on a preset coordinate system, determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information, and determining a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area;
based on a preset conversion rule, determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information;
the first loss information is determined based on the first normalized vector, the second normalized vector, the first angle vector, and the second angle vector.
Correspondingly, the embodiment of the invention also provides a sight line identification method, which comprises the following steps:
acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to a first region to be recognized and second target sight line information corresponding to a second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
Correspondingly, the embodiment of the invention also provides a training device of the sight line recognition model, which comprises:
the sample image set acquisition module is used for acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
the recognition processing module is used for inputting the sample image into a preset machine learning model, and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area;
a third sight line information determining module, configured to determine third sight line information corresponding to the second area based on the first sight line information;
a first loss information determination module for determining first loss information based on the second line-of-sight information and the third line-of-sight information;
the loss information determining module is used for determining loss information according to the first loss information;
the line-of-sight recognition model determining module is used for adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as the line-of-sight recognition model.
Correspondingly, the embodiment of the invention also provides a sight line identification device, which comprises:
the image acquisition module to be processed is used for acquiring the image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module is used for taking the image to be processed as the input of the sight recognition model, and performing sight recognition processing to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
Correspondingly, the embodiment of the invention also provides a training device of the sight line recognition model, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the training method of the sight line recognition model.
Accordingly, an embodiment of the present invention further provides a line-of-sight recognition apparatus, where the apparatus includes a processor and a memory, and at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the line-of-sight recognition method of the line-of-sight recognition model described above.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention can improve the accuracy of the sight line recognition model in recognizing the sight line on the premise of not increasing the scale of the sight line recognition model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method of a line-of-sight recognition model according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a sample image provided by an embodiment of the present invention;
fig. 3 is a flowchart of determining third sight line information corresponding to a second area according to an embodiment of the present invention;
FIG. 4 is a flow chart of determining first loss information provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of distance information according to an embodiment of the present invention;
FIG. 6 is a flow chart of determining first loss information provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a training device for a line-of-sight recognition model according to an embodiment of the present invention;
FIG. 8 is a flow chart of a line of sight identification method provided by an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a sight line recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail with reference to the accompanying drawings. It will be apparent that the described embodiments are merely some embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the invention. In the description of embodiments of the present invention, it should be understood that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", or a third "may include one or more of the feature, either explicitly or implicitly. Moreover, the terms "first," "second," "third," and the like, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or modules is not necessarily limited to those steps or modules that are expressly listed or inherent to such process, method, apparatus, article, or device.
The present specification describes method steps as illustrated by an embodiment or flowchart, but the method may include more or fewer steps based on routine, non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent a unique execution order; in actual execution, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multithreaded environment) according to the method shown in the embodiments or drawings.
Referring to the following description of a specific embodiment of a training method of a line-of-sight recognition model according to the present invention, fig. 1 is a flowchart of a training method of a line-of-sight recognition model according to an embodiment of the present invention, and specifically as shown in fig. 1, the method includes:
s101: acquiring a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region.
In an alternative embodiment, the sample image set may comprise a plurality of sample images, each of which may comprise a first region and a second region, i.e. each of which may comprise a left-eye corresponding region and a right-eye corresponding region. Fig. 2 is a schematic diagram of a sample image according to an embodiment of the present invention, where the sample image includes a first area and a second area; the shapes of the first area and the second area may be rectangular, circular or triangular, which is not specifically limited in this disclosure.
In another alternative embodiment, the sample image set may include a plurality of sample image groups, each sample image group may include a first sample image and a second sample image, the first sample image may include a first region, and the second sample image may include a second region, so that each sample image group may include a first region and a second region, that is, two sample images in each sample image group may include a left-eye corresponding region and a right-eye corresponding region, respectively.
In practical applications, the sample image may further include other facial regions, for example, corresponding regions of other face key parts such as a forehead corresponding region, a mouth corresponding region, a nose corresponding region, and the like. When the sample image includes the corresponding areas of other face key parts, the sample image at least needs to include the left eye corresponding area and the right eye corresponding area. The richness of the acquired information can be increased by using the corresponding regions of both eyes alone or in combination with the corresponding regions of other face key parts.
S103: and inputting the sample image into a preset machine learning model, and performing line-of-sight recognition processing to obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region.
In the embodiment of the invention, the sample image can be input into a preset machine learning model for line-of-sight recognition processing, and the preset machine learning model can output first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region, that is, the preset machine learning model can output line-of-sight corresponding to the left eye and line-of-sight corresponding to the right eye.
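For illustration only, a minimal stand-in for such a preset machine learning model might look like the following PyTorch sketch. The architecture, layer sizes and the choice of two linear regression heads are all assumptions; the patent does not fix a concrete network:

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Assumed stand-in for the 'preset machine learning model': a shared
    convolutional trunk with one 3D-gaze regression head per eye region."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head1 = nn.Linear(32, 3)   # first sight line information
        self.head2 = nn.Linear(32, 3)   # second sight line information

    def forward(self, x):
        feats = self.trunk(x)
        return self.head1(feats), self.head2(feats)
```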
S105: third line-of-sight information corresponding to the second region is determined based on the first line-of-sight information.
The two eyes move in coordination: a person with normal vision rotates both eyes in linkage so that they fixate the same target point, and even if one eye is occluded, the occluded eye still rotates along with the rotation of the unoccluded eye.
In an embodiment of the present invention, fig. 3 is a flowchart of determining third line-of-sight information corresponding to a second area provided in the embodiment of the present invention, and as shown in fig. 3, the step of determining the third line-of-sight information corresponding to the second area may specifically include:
s301: determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information;
in the embodiment of the present invention, the first center vector may be a vector corresponding to a center of one eye corresponding to the first area, the second center vector may be a vector corresponding to a center of the other eye corresponding to the second area, and the first vector may be a vector corresponding to the eye corresponding to the first area when the eye gazes at the target point.
S303: and determining third sight line information corresponding to the second area according to the first center vector, the second center vector and the first vector.
In an alternative embodiment, assuming that the first region is the left-eye corresponding region and the second region is the right-eye corresponding region, the first center vector P₁ corresponding to the left-eye center, the second center vector P₂ corresponding to the right-eye center, and the first vector V₁ corresponding to the left eye's line of sight when the left eye gazes at the target point may be determined based on a spatial coordinate system pre-stored in the preset machine learning model, and from these the sight line information corresponding to the right eye at that moment is determined, namely the third vector corresponding to the right eye that accords with the binocular coordination characteristic. The formula for determining the sight line information corresponding to the right eye that accords with the binocular coordination characteristic may specifically be:
V₁₂ = P₁ - P₂ + V₁
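The formula is a simple vector identity: the target point seen from the first eye is P₁ + V₁, so the vector from the second eye center to that same point is P₁ - P₂ + V₁. A minimal sketch (all names are illustrative, not from the patent):

```python
import numpy as np

def third_gaze_vector(p1: np.ndarray, p2: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Derive the coordination-consistent gaze for the second eye.

    p1 : center vector of the eye in the first region (e.g. left eye)
    p2 : center vector of the eye in the second region (e.g. right eye)
    v1 : predicted gaze vector of the first eye, pointing from p1
         toward the fixated target point

    The target point is p1 + v1, so the vector from p2 to that same
    point is V12 = P1 - P2 + V1.
    """
    return p1 - p2 + v1

# Usage: both eyes fixate the point (0, 0, 10).
p1, p2 = np.array([0.03, 0.0, 0.0]), np.array([-0.03, 0.0, 0.0])
v1 = np.array([0.0, 0.0, 10.0]) - p1
print(third_gaze_vector(p1, p2, v1))  # equals the vector from p2 to the target
```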
the above example is assumed that the first region is a left-eye corresponding region, the second region is a right-eye corresponding region, and the line-of-sight information corresponding to the right eye that accords with the coordination attribute is determined. To avoid repetition, no further description is provided here.
S107: first loss information is determined based on the second line-of-sight information and the third line-of-sight information.
In an embodiment of the present invention, fig. 4 is a flowchart of determining first loss information provided in the embodiment of the present invention, and as shown in fig. 4, the step of determining the first loss information may specifically include:
s401: and determining distance information corresponding to the second sight line information and the third sight line information based on a preset coordinate system.
In the embodiment of the present invention, the distance information may specifically take the form of a norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or of the spatial angle difference between those two vectors. The representation forms of the second sight line information and the third sight line information include, but are not limited to, Euler angles and Euclidean space vectors.
In the embodiment of the invention, the second vector V₂ corresponding to the second sight line information and the third vector V₁₂ corresponding to the third sight line information may be determined based on a preset coordinate system, and the distance information L₁₂ corresponding to the second sight line information and the third sight line information is then determined from V₂ and V₁₂. The formula for determining the distance information L₁₂ may specifically be:
L₁₂ = Dist(V₁₂, V₂)
In an alternative embodiment, the first region is assumed to be the left-eye corresponding region and the second region the right-eye corresponding region. Specifically, the second vector V₂ corresponding to the right eye's line of sight and the third vector V₁₂ corresponding to the coordination-consistent right-eye line of sight may be determined based on the spatial coordinate system described above, and the distance information corresponding to the second and third sight line information is determined from these two vectors. FIG. 5 is a schematic diagram of this distance information.
s403: and determining first loss information according to the distance information.
In an alternative embodiment, the first loss information may be determined based on the distance information. For example, the first loss value may be determined from the norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or from the spatial angle difference between those two vectors.
In another alternative embodiment, after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area as described above, the first loss information may be further determined according to the first line-of-sight information and the second line-of-sight information. Fig. 6 is a flowchart of determining first loss information according to an embodiment of the present invention, where, as shown in fig. 6, the step of determining a first loss value may specifically include:
s601: based on a preset coordinate system, determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information, and determining a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area.
In an embodiment of the present invention, the first coordinate information (x₁, y₁, z₁) corresponding to the first sight line information and the second coordinate information (x₂, y₂, z₂) corresponding to the second sight line information are determined based on the spatial coordinate system described above, and the first normalized vector P₃ corresponding to the first region and the second normalized vector P₄ corresponding to the second region are likewise determined based on that coordinate system.
S603: and determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information based on a preset conversion rule.
In the embodiment of the present invention, the first angle vector g₁ corresponding to the first coordinate information (x₁, y₁, z₁) and the second angle vector g₂ corresponding to the second coordinate information (x₂, y₂, z₂) may be determined based on a preset conversion rule. In an alternative embodiment, the preset conversion rule may be expressed as follows:
x=cos(φ)*sin(θ)
y=-sin(φ)
z=cos(φ)*cos(θ)
φ=-arcsin(y)
where the first angle vector g₁ and the second angle vector g₂ may each be represented as (θ, φ).
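A hedged sketch of this conversion rule follows. The forward direction implements the equations above verbatim; the recovery of θ via arctan2(x, z) is not stated in the text and is inferred from x = cos(φ)*sin(θ) and z = cos(φ)*cos(θ):

```python
import numpy as np

def angles_to_vector(theta: float, phi: float) -> np.ndarray:
    """(theta, phi) -> gaze vector, per the conversion rule above."""
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def vector_to_angles(v: np.ndarray) -> tuple:
    """Gaze vector -> (theta, phi). phi = -arcsin(y) is from the text;
    theta = arctan2(x, z) is an inferred inverse, not stated in the patent."""
    x, y, z = np.asarray(v, float) / np.linalg.norm(v)
    return float(np.arctan2(x, z)), float(-np.arcsin(y))
```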
S605: the first loss information is determined based on the first normalized vector, the second normalized vector, the first angle vector, and the second angle vector.
In the embodiment of the present invention, the distance information L corresponding to the first sight line information and the second sight line information may be determined from the first normalized vector P₃, the second normalized vector P₄, the first angle vector g₁ and the second angle vector g₂, and the first loss information is then determined according to this distance information. In an alternative embodiment, the distance is computed from P₃, P₄, G(g₁) and G(g₂) together with a factor α, where G represents the conversion of an angle vector into the vector of corresponding coordinates and α represents the relative depth adjustment factor.
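Since the concrete formula appears only in the original drawings, the following is one plausible reading, not the patent's definitive formula: each normalized eye-center vector is pushed out along its converted gaze direction by the relative depth factor α, and the distance between the two resulting endpoints is taken:

```python
import numpy as np

def first_loss_full(p3, p4, g1, g2, alpha=1.0):
    """Hypothetical first-loss form (an assumption): compare the endpoints
    P3 + alpha*G(g1) and P4 + alpha*G(g2)."""
    def G(g):  # angle vector (theta, phi) -> coordinate vector, as above
        theta, phi = g
        return np.array([np.cos(phi) * np.sin(theta),
                         -np.sin(phi),
                         np.cos(phi) * np.cos(theta)])
    return float(np.linalg.norm((p3 + alpha * G(g1)) - (p4 + alpha * G(g2))))
```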
S109: and determining loss information according to the first loss information.
In an alternative embodiment, the loss information may be determined based on a first preset parameter and the first loss information. Specifically, the loss value may be determined from the first preset parameter together with the norm of the difference between the vector corresponding to the second sight line information and the vector corresponding to the third sight line information, or from the first preset parameter together with the spatial angle difference between those two vectors.
In another alternative embodiment, after the first loss information is determined based on the second sight line information and the third sight line information, sight line tag information corresponding to the sample image may further be acquired, second loss information is determined according to the sight line tag information, the first sight line information and the second sight line information, and the loss information is then determined according to the first loss information and the second loss information. The sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area. That is, the second loss information may be determined from all four quantities together; alternatively, a second loss value may be determined from the tag information of the first area and the first sight line information alone, or from the tag information of the second area and the second sight line information alone.
Next, description will be given taking an example of determining second loss information based on the line-of-sight tag information, the first line-of-sight information, and the second line-of-sight information.
Based on the spatial coordinate system, the sight line tag information corresponding to the first area, the sight line tag information corresponding to the second area, and the vectors corresponding to the first and second sight line information may each be determined. First distance information between the vector corresponding to the tag information of the first area and the vector corresponding to the first sight line information is then determined, second distance information between the vector corresponding to the tag information of the second area and the vector corresponding to the second sight line information is determined, and the second loss information is determined from the first distance information and the second distance information. Here, the first distance information may take the form of a norm of the difference between the vector corresponding to the sight line tag information and the vector corresponding to the first sight line information, or of the spatial angle difference between those vectors; the second distance information may likewise be a norm of the difference or a spatial angle difference with respect to the second sight line information. The representation forms of the sight line tag information, the first sight line information and the second sight line information include, but are not limited to, Euclidean space vectors and Euler angles.
Based on the planar coordinate system, the first distance information may be in a specific form of a distance between a vector end point corresponding to the line-of-sight tag information and a vector end point corresponding to the first line-of-sight information, and the second distance information may be in a specific form of a distance between a vector end point corresponding to the line-of-sight tag information and a vector end point corresponding to the second line-of-sight information.
The first distance information and the second distance information may specifically adopt, for example, the following formula:
L = ||V - V'|| = ||P - P'||
where V represents the vector corresponding to the sight line tag information, V' represents the vector corresponding to the first sight line information or to the second sight line information, P represents the vector end point corresponding to the sight line tag information, and P' represents the vector end point corresponding to the first sight line information or to the second sight line information.
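A small sketch of this label-based second loss, applied per eye; combining the two per-eye terms by summation is an assumption, as the patent only fixes the per-eye form L = ||V - V'||:

```python
import numpy as np

def label_loss(v_label, v_pred):
    """Per-eye second-loss term: L = ||V - V'||."""
    return float(np.linalg.norm(np.asarray(v_label) - np.asarray(v_pred)))

def second_loss(v1_label, v1_pred, v2_label, v2_pred):
    # Summing the two per-eye terms is an assumption.
    return label_loss(v1_label, v1_pred) + label_loss(v2_label, v2_pred)
```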
s111: and adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as a sight line recognition model.
In the embodiment of the invention, the model parameters in the preset machine learning model may be adjusted based on the loss information until the loss information meets the preset condition, that is, until the total loss value stays stably below the expected loss value, and the preset machine learning model at that point is taken as the sight line recognition model.
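Putting S103 through S111 together, a minimal training loop might look as follows. This is a sketch under assumptions: the data loader yields eye-center vectors alongside labels, the two loss terms are summed with equal weight, and the "stably smaller" stopping rule is simplified to a single epoch-average threshold:

```python
import torch

def train(model, loader, optimizer, target_loss, max_epochs=100):
    """Sketch of S103-S111: per batch, predict both gazes, derive the
    coordination-consistent third gaze, and combine the two loss terms."""
    for epoch in range(max_epochs):
        running = 0.0
        for images, label1, label2, p1, p2 in loader:
            v1, v2 = model(images)            # first / second sight line info
            v12 = p1 - p2 + v1                # third sight line info (S105)
            loss1 = torch.norm(v12 - v2, dim=-1).mean()          # S107
            loss2 = (torch.norm(v1 - label1, dim=-1)
                     + torch.norm(v2 - label2, dim=-1)).mean()   # label term
            loss = loss1 + loss2              # equal weighting is an assumption
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running += loss.item()
        if running / len(loader) < target_loss:  # simplified stopping rule
            return model
    return model
```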
With the training method for a sight line recognition model provided by the embodiment of the invention, the loss information used to train the model is determined by jointly considering the recognized binocular sight line information and the sight line information derived from the binocular coordination rule, so the accuracy with which the sight line recognition model recognizes the line of sight can be improved without increasing the scale of the model.
The embodiment of the invention also provides a training device for the sight line recognition model, and fig. 7 is a schematic structural diagram of the training device for the sight line recognition model, as shown in fig. 7, where the training device includes:
the sample image set acquisition module 701 is configured to acquire a sample image set; the sample image set includes sample images; the sample image includes a first region and a second region;
the recognition processing module 703 is configured to input the sample image into a preset machine learning model, perform line-of-sight recognition processing, and obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region;
the third sight line information determining module 705 is configured to determine third sight line information corresponding to the second area based on the binocular coordination rule and the first sight line information;
the first loss information determination module 707 is configured to determine first loss information based on the second line-of-sight information and the third line-of-sight information;
the loss information determining module 709 is configured to determine loss information according to the first loss information;
the line-of-sight recognition model determination module 711 is configured to adjust model parameters in a preset machine learning model based on the loss information until the loss information satisfies a preset condition, and take the preset machine learning model when the preset condition is satisfied as the line-of-sight recognition model.
The apparatus and method embodiments in the embodiments of the present invention are based on the same inventive concept.
The embodiment of the invention also provides training equipment of the sight line recognition model, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the training method of the sight line recognition model.
Referring to the following description of a specific embodiment of a line of sight recognition method according to the present invention, fig. 8 is a flowchart of a line of sight recognition method according to an embodiment of the present invention, and specifically as shown in fig. 8, the method includes:
s801: acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified.
In the embodiment of the invention, the image to be processed may include a first area to be identified and a second area to be identified, that is, the image to be processed may include a left eye corresponding area and a right eye corresponding area. In practical applications, the image to be processed may further include other facial regions, for example, corresponding regions of other face key parts such as a forehead corresponding region, a mouth corresponding region, a nose corresponding region, and the like.
S803: taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to a first region to be recognized and second target sight line information corresponding to a second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
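For illustration, inference with the trained model reduces to a single forward pass; the batch-dimension handling below assumes a PyTorch-style model like the earlier GazeNet sketch:

```python
import torch

def recognize(model, image):
    """Single forward pass on an image containing both regions to be
    recognized; assumes a PyTorch model like the earlier GazeNet sketch."""
    model.eval()
    with torch.no_grad():
        v1, v2 = model(image.unsqueeze(0))   # add a batch dimension
    return v1.squeeze(0), v2.squeeze(0)      # first / second target gaze info
```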
The embodiment of the invention also provides a sight line recognition device, and fig. 9 is a schematic structural diagram of the sight line recognition device provided by the embodiment of the invention, as shown in fig. 9, the device includes:
the image to be processed acquisition module 901 is used for acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module 903 is configured to perform sight recognition processing by using the image to be processed as input of a sight recognition model, to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight recognition model is the line of sight recognition model described hereinabove.
The apparatus and method embodiments in the embodiments of the present invention are based on the same inventive concept.
The embodiment of the invention also provides a sight line recognition device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the sight line recognition method of the sight line recognition model.
According to the embodiments of the training method, device and storage medium for the sight line recognition model provided by the invention, the loss information used to train the model is determined by jointly considering the recognized binocular sight line information and the sight line information derived from the binocular coordination rule, so the scale of the sight line recognition model can be reduced and the accuracy with which it recognizes the line of sight can be improved.
It should be noted that the order in which the embodiments of the invention are presented is illustrative only and is not intended to limit the invention to the particular embodiments disclosed; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing may also be possible or advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the embodiments of the device, the description is relatively simple, since it is based on embodiments similar to the method, as relevant see the description of parts of the method embodiments.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (9)

1. A method of training a line-of-sight recognition model, comprising:
acquiring a sample image set; the sample image set includes a sample image; the sample image includes a first region and a second region;
inputting the sample image into a preset machine learning model, and performing line-of-sight identification processing to obtain first line-of-sight information corresponding to the first region and second line-of-sight information corresponding to the second region;
determining third sight line information corresponding to the second area based on the first sight line information;
determining first loss information based on the second line-of-sight information and the third line-of-sight information;
determining loss information according to the first loss information;
adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model when the loss information meets the preset conditions as the sight line recognition model;
the method comprises the steps of determining third sight line information corresponding to the second area based on the first sight line information, wherein the steps of determining the first center vector corresponding to the first area, the second center vector corresponding to the second area and the first vector corresponding to the first sight line information based on a preset coordinate system, and determining the third sight line information according to the first center vector, the second center vector and the first vector;
the first loss information is determined based on the second sight line information and the third sight line information, the distance information corresponding to the second sight line information and the third sight line information is determined based on the preset coordinate system, and the first loss information is determined according to the distance information.
2. The method of claim 1, wherein the determining the third line-of-sight information from the first center vector, the second center vector, and the first vector comprises:
determining the third sight line information based on the formula V₁₂ = P₁ - P₂ + V₁;
where P₁ is the first center vector, P₂ is the second center vector, V₁ is the first vector, and V₁₂ is the third sight line information.
3. The method of claim 1, wherein after determining first loss information based on the second line-of-sight information and the third line-of-sight information, further comprising:
acquiring sight line label information corresponding to the sample image; the sight line tag information comprises sight line tag information corresponding to the first area and sight line tag information corresponding to the second area;
determining second loss information according to the sight line tag information, the first sight line information and the second sight line information;
the determining loss information according to the first loss information includes:
and determining loss information according to the first loss information and the second loss information.
4. The method according to claim 1, wherein after obtaining the first line-of-sight information corresponding to the first area and the second line-of-sight information corresponding to the second area, further comprises:
determining the first loss information according to the first line-of-sight information and the second line-of-sight information, including:
determining first coordinate information corresponding to the first sight line information, second coordinate information corresponding to the second sight line information and a first standardized vector corresponding to the first area and a second standardized vector corresponding to the second area based on the preset coordinate system;
determining a first angle vector corresponding to the first coordinate information and a second angle vector corresponding to the second coordinate information based on a preset conversion rule;
and determining the first loss information according to the first standardized vector, the second standardized vector, the first angle vector and the second angle vector.
5. A line-of-sight recognition method, comprising:
acquiring an image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
taking the image to be processed as input of a sight line recognition model, and performing sight line recognition processing to obtain first target sight line information corresponding to the first region to be recognized and second target sight line information corresponding to the second region to be recognized; wherein the line of sight identification model is the line of sight identification model of any one of claims 1-4.
6. A training device for a line-of-sight recognition model, comprising:
the sample image set acquisition module is used for acquiring a sample image set; the sample image set includes a sample image; the sample image includes a first region and a second region;
the recognition processing module is used for inputting the sample image into a preset machine learning model, and performing sight line recognition processing to obtain first sight line information corresponding to the first area and second sight line information corresponding to the second area;
a third sight line information determining module, configured to determine third sight line information corresponding to the second area based on the first sight line information; determining a first center vector corresponding to the first area, a second center vector corresponding to the second area and a first vector corresponding to the first sight line information based on a preset coordinate system, and determining the third sight line information according to the first center vector, the second center vector and the first vector;
a first loss information determination module configured to determine first loss information based on the second line-of-sight information and the third line-of-sight information; determining distance information corresponding to the second sight line information and the third sight line information based on the preset coordinate system, and determining the first loss information according to the distance information;
the loss information determining module is used for determining loss information according to the first loss information;
the line-of-sight recognition model determining module is used for adjusting model parameters in the preset machine learning model based on the loss information until the loss information meets preset conditions, and taking the preset machine learning model meeting the preset conditions as the line-of-sight recognition model.
7. A line-of-sight recognition apparatus, comprising:
the image acquisition module to be processed is used for acquiring the image to be processed; the image to be processed comprises a first area to be identified and a second area to be identified;
the target sight determining module is used for taking the image to be processed as the input of a sight recognition model, and performing sight recognition processing to obtain first target sight information corresponding to the first region to be recognized and second target sight information corresponding to the second region to be recognized; wherein the line of sight identification model is the line of sight identification model of any one of claims 1-5.
8. A training device for a line of sight identification model, characterized in that the device comprises a processor and a memory, in which at least one instruction, at least one program, code set or instruction set is stored, which is loaded and executed by the processor to implement the training method for a line of sight identification model according to any of claims 1-4.
9. A line of sight identification apparatus, characterized in that the apparatus comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes or a set of instructions is stored, which is loaded and executed by the processor to implement the line of sight identification method of claim 5.
CN202110015600.5A 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment Active CN112766097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110015600.5A CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110015600.5A CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Publications (2)

Publication Number Publication Date
CN112766097A (en) 2021-05-07
CN112766097B (en) 2024-02-13

Family

ID=75700353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110015600.5A Active CN112766097B (en) 2021-01-06 2021-01-06 Sight line recognition model training method, sight line recognition device and sight line recognition equipment

Country Status (1)

Country Link
CN (1) CN112766097B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113233266A (en) * 2021-06-03 2021-08-10 Duke Kunshan University Non-contact elevator interaction system and method thereof
CN113807330B (en) * 2021-11-19 2022-03-08 Harbin Institute of Technology (Shenzhen) (Harbin Institute of Technology Shenzhen Institute of Science and Technology Innovation) Three-dimensional sight estimation method and device for resource-constrained scenes

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671890B2 (en) * 2018-03-30 2020-06-02 Tobii Ab Training of a neural network for three dimensional (3D) gaze prediction
US11526808B2 (en) * 2019-05-29 2022-12-13 The Board Of Trustees Of The Leland Stanford Junior University Machine learning based generation of ontology for structural and functional mapping

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107065207A (en) * 2017-03-28 2017-08-18 Wen Dong Imaging display method and device capable of achieving a monocular 3D effect, and application thereof
CN108427921A (en) * 2018-02-28 2018-08-21 University of Science and Technology Liaoning Face recognition method based on convolutional neural networks
WO2020063000A1 (en) * 2018-09-29 2020-04-02 Beijing SenseTime Technology Development Co., Ltd. Neural network training and line of sight detection methods and apparatuses, and electronic device
CN110969061A (en) * 2018-09-29 2020-04-07 Beijing SenseTime Technology Development Co., Ltd. Neural network training method and device, line of sight detection method and device, and electronic equipment
CN110008835A (en) * 2019-03-05 2019-07-12 Chengdu Kuangshi Jinzhi Technology Co., Ltd. Sight line prediction method, device, system, and readable storage medium
CN111723828A (en) * 2019-03-18 2020-09-29 Beijing SenseTime Technology Development Co., Ltd. Gaze region detection method and device, and electronic equipment
CN110058694A (en) * 2019-04-24 2019-07-26 Tencent Technology (Shenzhen) Co., Ltd. Gaze tracking model training method, and gaze tracking method and device
CN110321820A (en) * 2019-06-24 2019-10-11 Southeast University Gaze point detection method based on a contactless device
CN111259713A (en) * 2019-09-16 2020-06-09 Zhejiang University of Technology Gaze tracking method based on adaptive weighting
CN111222399A (en) * 2019-10-30 2020-06-02 Tencent Technology (Shenzhen) Co., Ltd. Method and device for identifying object identification information in an image, and storage medium
CN111199189A (en) * 2019-12-18 2020-05-26 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Target object tracking method and system, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Andronicus A. Akinyelu et al., "Convolutional Neural Network-Based Methods for Eye Gaze Estimation: A Survey," IEEE Access, vol. 8, pp. 142581-142605 *
Zhang Xiaolin et al., "Cooperative Movements of Binocular Motor System," 2008 IEEE International Conference on Automation Science and Engineering, pp. 321-327 *
Zhang Xiaolin, "Stereo vision control system of bionic binocular eyes," Electronic Design Engineering, vol. 26, no. 6, pp. 1-6 *
Fang Aiqing, "Research on perception mechanism of human-computer interaction based on gaze tracking," China Masters' Theses Full-text Database, Information Science and Technology, no. 12, p. I138-1364 *
Jiang Guangyi, "Research on fusion technology of head movement and gaze tracking data," China Masters' Theses Full-text Database, Information Science and Technology, no. 2, p. I138-1747 *

Also Published As

Publication number Publication date
CN112766097A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
US11302009B2 (en) Method of image processing using a neural network
CN104978548B (en) Gaze estimation method and device based on a three-dimensional active shape model
USRE42205E1 (en) Method and system for real-time facial image enhancement
CN104794465B (en) Liveness detection method based on posture information
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
WO2009091029A1 (en) Face posture estimating device, face posture estimating method, and face posture estimating program
JP4865517B2 (en) Head position / posture detection device
CN112766097B (en) Sight line recognition model training method, sight line recognition device and sight line recognition equipment
EP3154407B1 (en) A gaze estimation method and apparatus
MX2013002904A (en) Person image processing apparatus and person image processing method.
JP2008194146A (en) Visual line detection apparatus and method thereof
CN110913751A (en) Wearable eye tracking system with slip detection and correction functions
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
CN110531853B (en) Electronic book reader control method and system based on human eye fixation point detection
JP4936491B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
Emery et al. OpenNEEDS: A dataset of gaze, head, hand, and scene signals during exploration in open-ended VR environments
KR101639161B1 (en) Personal authentication method using skeleton information
Sun et al. Real-time gaze estimation with online calibration
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
CN112183200A (en) Eye movement tracking method and system based on video image
CN112633217A (en) Face recognition liveness detection method that calculates the sight direction based on a three-dimensional eyeball model
CN106778576B (en) Motion recognition method based on SEHM feature map sequences
CN113486691A (en) Intelligent device and control method thereof
Zhou et al. Learning a 3D gaze estimator with adaptive weighted strategy
JP2014093006A (en) Head posture estimation device, head posture estimation method and program for making computer execute head posture estimation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant