CN108491823B - Method and device for generating a human eye recognition model


Info

Publication number
CN108491823B
Authority
CN
China
Prior art keywords
human eye
eye
sample
eye image
training
Prior art date
Legal status
Active
Application number
CN201810286481.5A
Other languages
Chinese (zh)
Other versions
CN108491823A (en)
Inventor
陈艳琴 (Chen Yanqin)
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201810286481.5A priority Critical patent/CN108491823B/en
Publication of CN108491823A publication Critical patent/CN108491823A/en
Application granted granted Critical
Publication of CN108491823B publication Critical patent/CN108491823B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a method and device for generating a human eye recognition model. One embodiment of the method comprises: acquiring a training sample set, wherein each training sample comprises a sample human eye image and labeled human eye direction information and labeled human eye discrimination information corresponding to the sample human eye image, the labeled human eye direction information is used for indicating the gaze direction of the human eye represented by the sample human eye image, and the labeled human eye discrimination information is used for indicating whether that human eye gazes at a target position; and training, using a machine learning method, a human eye recognition model with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output. This improves the flexibility of generating a human eye recognition model for human eye detection.

Description

Method and device for generating human eye recognition model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a human eye recognition model.
Background
Human eye detection has wide application in the field of computer vision, such as expression recognition, gaze tracking, and face recognition. In practice, human eye detection technology can be applied to unlocking devices such as mobile phones and tablet computers, and to fields such as determining whether a driver is fatigued.
Methods for human eye detection fall mainly into two categories. The first exploits the red-eye effect peculiar to human eyes, comparing a bright-pupil (red-eye) image with a dark-pupil image for further eye detection and tracking. The second operates on color or grayscale images, using recognition algorithms such as template matching, projection, and neural networks.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a human eye recognition model.
In a first aspect, an embodiment of the present application provides a method for generating a human eye recognition model, the method including: acquiring a training sample set, wherein each training sample comprises a sample human eye image and labeled human eye direction information and labeled human eye discrimination information corresponding to the sample human eye image, the labeled human eye direction information is used for indicating the gaze direction of the human eye represented by the sample human eye image, and the labeled human eye discrimination information is used for indicating whether that human eye gazes at a target position; and training, using a machine learning method, a human eye recognition model with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output.
In some embodiments, the training, using a machine learning method, of the human eye recognition model with the sample human eye image of each training sample in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye image as output includes: extracting an initial neural network; and performing the following training steps: inputting at least one sample human eye image in the training sample set into the initial neural network to obtain human eye direction information and human eye discrimination information corresponding to each of the at least one sample human eye image; comparing the human eye direction information and human eye discrimination information corresponding to each sample human eye image with the corresponding labeled human eye direction information and labeled human eye discrimination information, and determining, according to the comparison result, whether the initial neural network has reached a preset optimization target; and in response to determining that the initial neural network has reached the optimization target, taking the initial neural network as the trained human eye recognition model.
In some embodiments, the training of the human eye recognition model further includes: in response to determining that the initial neural network has not reached the optimization target, adjusting the network parameters of the initial neural network, composing a training sample set from unused training samples, and continuing to perform the training steps.
In some embodiments, the comparing and the determining of whether the initial neural network has reached the preset optimization target include: determining a function value of a preset first loss function and a function value of a preset second loss function, wherein the function value of the first loss function characterizes the degree of difference between the human eye direction information corresponding to a sample human eye image input into the initial neural network and the corresponding labeled human eye direction information, and the function value of the second loss function characterizes the degree of difference between the human eye discrimination information corresponding to that sample human eye image and the corresponding labeled human eye discrimination information; and in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold, determining that the initial neural network has reached the optimization target.
In a second aspect, an embodiment of the present application provides a method for recognizing a human eye, the method including: acquiring a to-be-recognized human eye image; and inputting the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information, wherein the human eye recognition model is generated according to any implementation of the first aspect, the human eye direction information is used for indicating the gaze direction of the human eye represented by the to-be-recognized human eye image, and the human eye discrimination information is used for indicating whether that human eye gazes at a target position.
In a third aspect, an embodiment of the present application provides an apparatus for generating a human eye recognition model, the apparatus including: an acquiring unit configured to acquire a training sample set, wherein each training sample includes a sample human eye image and labeled human eye direction information and labeled human eye discrimination information corresponding to the sample human eye image, the labeled human eye direction information is used for indicating the gaze direction of the human eye represented by the sample human eye image, and the labeled human eye discrimination information is used for indicating whether that human eye gazes at a target position; and a training unit configured to train, using a machine learning method, a human eye recognition model with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output.
In some embodiments, the training unit includes: an extraction module configured to extract an initial neural network; and a training module configured to perform the following training steps: inputting at least one sample human eye image in the training sample set into the initial neural network to obtain human eye direction information and human eye discrimination information corresponding to each of the at least one sample human eye image; comparing the human eye direction information and human eye discrimination information corresponding to each sample human eye image with the corresponding labeled human eye direction information and labeled human eye discrimination information, and determining, according to the comparison result, whether the initial neural network has reached a preset optimization target; and in response to determining that the initial neural network has reached the optimization target, taking the initial neural network as the trained human eye recognition model.
In some embodiments, the training unit is further configured to: in response to determining that the initial neural network has not reached the optimization target, adjust the network parameters of the initial neural network, compose a training sample set from unused training samples, and continue to perform the training steps.
In some embodiments, the training module includes: a first determining submodule configured to determine a function value of a preset first loss function and a function value of a preset second loss function, wherein the function value of the first loss function characterizes the degree of difference between the human eye direction information corresponding to a sample human eye image input into the initial neural network and the corresponding labeled human eye direction information, and the function value of the second loss function characterizes the degree of difference between the human eye discrimination information corresponding to that sample human eye image and the corresponding labeled human eye discrimination information; and a second determining submodule configured to determine that the initial neural network has reached the optimization target in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold.
In a fourth aspect, an embodiment of the present application provides an apparatus for recognizing a human eye, the apparatus including: an acquiring unit configured to acquire a to-be-recognized human eye image; and a recognition unit configured to input the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information, wherein the human eye recognition model is generated according to any implementation of the first aspect, the human eye direction information is used for indicating the gaze direction of the human eye represented by the to-be-recognized human eye image, and the human eye discrimination information is used for indicating whether that human eye gazes at a target position.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage means storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first or second aspect.
In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first and second aspects.
According to the method and device for generating a human eye recognition model provided by the embodiments of the present application, a human eye recognition model is trained with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output, thereby improving the flexibility of generating a human eye recognition model for human eye detection.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a human eye recognition model according to the present application;
FIG. 3A is an exemplary schematic diagram of a positional relationship of a human eye to a target for a method for generating a human eye recognition model according to the present application;
FIG. 3B is an exemplary schematic diagram of the region segmentation of a target used to determine labeled human eye direction information in the method for generating a human eye recognition model according to the present application;
FIG. 3C is an exemplary schematic diagram of a camera position on the target used to determine labeled human eye discrimination information in the method for generating a human eye recognition model according to the present application;
FIG. 4 is an exemplary schematic diagram of a multi-task learning based neural network for a method for generating a human eye recognition model according to the present application;
FIG. 5 is a schematic illustration of an application scenario of a method for generating a human eye recognition model according to the present application;
FIG. 6 is a flow diagram of one embodiment of a method for identifying a human eye according to the present application;
FIG. 7 is a schematic diagram illustrating an embodiment of an apparatus for generating a human eye recognition model according to the present application;
FIG. 8 is a schematic diagram of one embodiment of an apparatus for recognizing a human eye according to the present application;
FIG. 9 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating a human eye recognition model or the apparatus for generating a human eye recognition model of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as data processing applications, image processing applications, payment applications, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, or having a camera or a wired or wireless communication connection to a camera, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing support for various applications on the terminal devices 101, 102, 103. The background server may perform training by using the acquired training sample set, and feed back a training result (e.g., an eye recognition model) to the terminal device, or store the training result in the background server.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for generating the human eye recognition model provided in the embodiment of the present application may be executed by the server 105, and may also be executed by the terminal devices 101, 102, and 103. Accordingly, the means for generating the human eye recognition model may be provided in the server 105, or may be provided in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for generating a human eye recognition model in accordance with the present application is shown. The method for generating the human eye recognition model comprises the following steps:
step 201, a training sample set is obtained.
In this embodiment, an execution subject of the method for generating a human eye recognition model (for example, the server or a terminal device shown in fig. 1) may obtain the training sample set from a remote location or locally through a wired or wireless connection. Each training sample may comprise a sample human eye image and the labeled human eye direction information and labeled human eye discrimination information corresponding to that image.
In this embodiment, the labeled human eye direction information may be used to indicate the gaze direction of the human eye represented by the sample human eye image. Gaze directions may include, but are not limited to, at least one of: up, down, left, right, front, and an invalid direction (e.g., eyes closed). As an example, as shown in fig. 3A, 301 is a circular target used when acquiring sample human eye images with a camera; the horizontal plane containing the pupil center of the human eye 302 coincides with the horizontal plane containing the center of the target 301, or the vertical distance between the two is less than or equal to a preset distance. As shown in fig. 3B, the target 301 is divided into regions 3011, 3012, 3013, 3014, and 3015 of predetermined sizes, and the camera for capturing the human eye images is arranged at a preset position. The labeled human eye direction information of an image captured while the human eye 302 gazes at the region 3011 may be set to "up"; at the region 3012, "down"; at the region 3013, "left"; at the region 3014, "right"; and at the region 3015, "front". The labeled human eye direction information of an image captured while the human eye 302 is in an abnormal gazing state (e.g., eyes closed or squinting) may be set to "invalid".
In this embodiment, the labeled human eye discrimination information may be used to indicate whether the human eye represented by the sample human eye image gazes at a target position. The target position may be the position of the camera used to capture the human eye images, or another position designated by a technician (e.g., a certain spatial region). As shown in fig. 3C, the camera 303 is disposed at a predetermined position on the target 301. When the human eye 302 gazes at the placement position of the camera 303 (or a circular area of predetermined radius around it), the labeled human eye discrimination information of the captured image may be set to "gazing"; when the human eye 302 gazes at an area other than the placement position of the camera 303, it may be set to "non-gazing". It should be understood that the camera 303 may be disposed at multiple positions, and a technician may label in advance the human eye images captured at each position.
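As an illustration of this labeling scheme, the following minimal Python sketch shows how one training sample could be recorded; the region-to-direction mapping, field names, and file path are hypothetical and are not prescribed by the present application.

```python
from dataclasses import dataclass

# Hypothetical mapping from the gazed region of FIG. 3B to a direction label.
REGION_TO_DIRECTION = {
    "3011": "up",
    "3012": "down",
    "3013": "left",
    "3014": "right",
    "3015": "front",
}

@dataclass
class TrainingSample:
    image_path: str       # sample human eye image
    direction_label: str  # labeled human eye direction information
    gazing_label: bool    # labeled human eye discrimination information

def make_sample(image_path: str, gazed_region: str, gazes_at_target: bool) -> TrainingSample:
    # Abnormal gazing states (eyes closed, squinting) fall through to "invalid".
    direction = REGION_TO_DIRECTION.get(gazed_region, "invalid")
    return TrainingSample(image_path, direction, gazes_at_target)

sample = make_sample("eye_0001.png", gazed_region="3015", gazes_at_target=True)
```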
It should be noted that the sample human eye image may be various types of images, such as a color image, an infrared image, a grayscale image, and the like.
Step 202, using a machine learning method, train a human eye recognition model with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output.
In this embodiment, based on the training sample set obtained in step 201, the execution subject may use a machine learning method to train a human eye recognition model, taking the sample human eye images of the training samples as input and the corresponding labeled human eye direction information and labeled human eye discrimination information as output. The human eye recognition model may be obtained by a technician performing supervised training based on an existing artificial neural network (e.g., a convolutional neural network or a recurrent neural network). The artificial neural network may have any of various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet).
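As a concrete, purely illustrative example of building on an existing architecture, the PyTorch sketch below attaches two task heads to a ResNet-18 backbone; the six-way direction output and the single gaze logit are assumptions consistent with the labeling scheme above, not a structure fixed by the present application.

```python
import torch.nn as nn
import torchvision

# ResNet-18 with its classifier removed, exposing 512-dimensional features
# (torchvision >= 0.13 API; older versions use pretrained=False instead).
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Identity()

direction_head = nn.Linear(512, 6)  # up/down/left/right/front/invalid
gaze_head = nn.Linear(512, 1)       # logit for "gazing at the target position"
```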
In some optional implementations of this embodiment, the executing entity may train to obtain the human eye recognition model according to the following steps:
first, an initial neural network is extracted. The initial neural network may be various types of untrained or untrained artificial neural networks. Each layer of the initial neural network may be provided with initial parameters, which may be continuously adjusted during the training of the neural network. By way of example, the initial neural network may be an untrained convolutional neural network (e.g., may include convolutional layers, pooling layers, fully-connected layers, etc.).
Then, the following training steps are performed. First, at least one sample human eye image in the training sample set is input into the initial neural network to obtain the human eye direction information and human eye discrimination information corresponding to each of the at least one sample human eye image. Second, the human eye direction information and human eye discrimination information corresponding to each sample human eye image are compared with the corresponding labeled human eye direction information and labeled human eye discrimination information, and whether the initial neural network has reached a preset optimization target is determined according to the comparison result. The preset optimization target may be that the recognition accuracy of the initial neural network (for example, as measured on a preset test sample set) reaches a preset threshold. Third, in response to determining that the initial neural network has reached the optimization target, the initial neural network is taken as the trained human eye recognition model.
In some optional implementations of this embodiment, the step of training to obtain the human eye recognition model may further include the following steps:
in response to determining that the initial neural network does not meet the optimization goal, network parameters of the initial neural network are adjusted, and the training steps continue with composing a set of training samples using unused training samples. As an example, the network parameters of the initial neural network may be adjusted by using a Back Propagation Algorithm (BP Algorithm) and a gradient descent method (e.g., a small batch gradient descent Algorithm). It should be noted that the back propagation algorithm and the gradient descent method are well-known technologies that are currently widely researched and applied, and are not described herein again.
In some optional implementations of the embodiment, the executing entity may determine whether the initial neural network reaches a preset optimization goal according to the following steps:
first, a function value of the first loss function and a function value of the second loss function are determined based on a preset first loss function and a preset second loss function. The function value of the first loss function is used for representing the difference degree between the human eye direction information corresponding to the sample human eye image input into the initial neural network and the corresponding marked human eye direction information. And the function value of the second loss function is used for representing the difference degree between the human eye discrimination information corresponding to the sample human eye image input into the initial neural network and the marked human eye discrimination information. As an example, the first and second loss functions described above may be various types of loss functions, such as softmax loss function, sigmoid loss function, and the like. The first loss function and the second loss function may be of the same type or different types.
Then, in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold, it is determined that the initial neural network has reached the optimization target. In general, a loss function characterizes the difference between the predicted result and the true result of the neural network: the smaller the function value, the smaller that difference.
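The check below sketches this optimization target, assuming the first loss is a softmax cross-entropy over direction classes and the second a sigmoid (binary) cross-entropy over the gaze discrimination; the threshold values are invented for illustration.

```python
import torch
import torch.nn.functional as F

FIRST_THRESHOLD = 0.05   # preset first threshold (assumed value)
SECOND_THRESHOLD = 0.05  # preset second threshold (assumed value)

def reached_optimization_target(dir_logits: torch.Tensor, dir_labels: torch.Tensor,
                                gaze_logits: torch.Tensor, gaze_labels: torch.Tensor) -> bool:
    # First loss: difference between predicted and labeled direction information.
    first_loss = F.cross_entropy(dir_logits, dir_labels)
    # Second loss: difference between predicted and labeled discrimination information.
    second_loss = F.binary_cross_entropy_with_logits(gaze_logits, gaze_labels)
    return first_loss.item() <= FIRST_THRESHOLD and second_loss.item() <= SECOND_THRESHOLD
```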
Optionally, the execution subject may train the initial neural network based on a multi-task machine learning method to obtain the human eye recognition model. Multi-task learning is an inductive transfer method that allows features learned for one task in each layer of the neural network to be used by other tasks. Because the features learned for several different tasks may be correlated with one another, multi-task learning can improve the recognition accuracy of the trained model. In practice, multi-task learning can take many forms, such as joint learning, learning to learn, and learning with auxiliary tasks. As an example, as shown in fig. 4, the initial neural network 401 performs two tasks: task 1 determines the human eye direction information of the input human eye image 402, and task 2 determines its human eye discrimination information. The feature data used by tasks 1 and 2 (e.g., the data represented by each circle in fig. 4) can be shared among the layers (e.g., layers 4011 to 4013 in fig. 4; the connecting lines between circles represent this sharing). The trained human eye recognition model can thereby further improve the accuracy of human eye detection. The multi-task learning method itself is a well-known technology widely studied and applied at present and is not described here again.
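The sketch below shows one common realization of this idea, hard parameter sharing: a single trunk computes features shared by both tasks, and each task keeps its own small head (compare the ResNet variant above). All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskEyeNet(nn.Module):
    """Shared trunk with two task heads, in the spirit of FIG. 4."""
    def __init__(self, num_directions: int = 6):
        super().__init__()
        self.shared = nn.Sequential(            # features used by both tasks
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.direction_head = nn.Linear(32 * 4 * 4, num_directions)  # task 1
        self.gaze_head = nn.Linear(32 * 4 * 4, 1)                    # task 2

    def forward(self, x: torch.Tensor):
        features = self.shared(x)
        return self.direction_head(features), self.gaze_head(features)

# Forward pass on a dummy mini-batch of 32x32 grayscale eye crops.
dir_logits, gaze_logits = MultiTaskEyeNet()(torch.randn(8, 1, 32, 32))
```

Training both heads against a weighted sum of the two losses is what lets gradients from each task shape the shared features.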
With continued reference to fig. 5, fig. 5 is a schematic diagram of an application scenario of the method for generating a human eye recognition model according to this embodiment. In the application scenario of fig. 5, a computer 502 obtains a set of training samples 503 from another computer 501 to which it is communicatively connected. Each training sample comprises a sample human eye image 5031 together with its labeled human eye direction information 5032 and labeled human eye discrimination information 5033. The computer 502 then extracts an initial model 504 (e.g., a convolutional neural network), sequentially inputs the sample human eye images into the initial model 504, and trains it with the corresponding labeled human eye direction information (e.g., "front") and labeled human eye discrimination information (e.g., "gazing") as the desired output. By adjusting the parameters of the initial model 504 over many iterations, the human eye recognition model 505 is finally obtained. The human eye recognition model 505 can recognize an input human eye image, determine the gaze direction of the human eye it represents, and determine whether that human eye gazes at the target position.
In the method provided by the embodiments of the present application, a set of training samples, each including a sample human eye image and the corresponding labeled human eye direction information and labeled human eye discrimination information, is obtained; a machine learning method is then used to train a human eye recognition model with the sample human eye images as input and the corresponding labeled information as output. This improves the flexibility of generating a human eye recognition model for human eye detection.
With further reference to fig. 6, a flow 600 of one embodiment of a method for identifying a human eye provided herein is illustrated. The method for identifying a human eye may comprise the steps of:
step 601, obtaining an eye image to be identified.
In the present embodiment, an execution subject of the method for recognizing a human eye (for example, the server or a terminal device shown in fig. 1) may acquire the to-be-recognized human eye image in various ways. For example, the execution subject may include a camera that captures an image of the user's eye, which is obtained as the to-be-recognized human eye image. As another example, if the execution subject is the server shown in fig. 1, it may acquire the to-be-recognized human eye image from a terminal device shown in fig. 1 through a wired or wireless connection.
In the present embodiment, the human eye represented by the to-be-recognized human eye image may be the eye of any person (e.g., a user of the execution subject when it includes a camera, or a person within the shooting range of a camera communicatively connected to the execution subject).
Step 602, inputting a to-be-recognized eye image into a pre-trained eye recognition model to obtain eye direction information and eye discrimination information.
In this embodiment, based on the to-be-recognized human eye image obtained in step 601, the execution subject may input it into the pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information. The human eye direction information indicates the gaze direction of the human eye represented by the to-be-recognized image, and the human eye discrimination information indicates whether that human eye gazes at the target position.
In this embodiment, the human eye recognition model may be generated by the method described in the embodiment of fig. 2 above. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
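A minimal inference sketch, assuming a model with the two-head output of the training sketches above and an assumed ordering of the direction labels:

```python
import torch

DIRECTIONS = ["up", "down", "left", "right", "front", "invalid"]  # assumed order

@torch.no_grad()
def recognize_eye(model: torch.nn.Module, eye_image: torch.Tensor):
    """Map one to-be-recognized eye image (shape 1x1xHxW) to both outputs."""
    model.eval()
    dir_logits, gaze_logit = model(eye_image)
    direction = DIRECTIONS[dir_logits.argmax(dim=1).item()]  # human eye direction information
    gazing = torch.sigmoid(gaze_logit).item() >= 0.5         # human eye discrimination information
    return direction, gazing
```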
Alternatively, after obtaining the human eye direction information and the human eye discrimination information, the execution subject may generate and output preset information (e.g., warning information) in various forms (e.g., text, image, or audio) based on the obtained information.
As an example, the execution subject may be a mobile phone, and the to-be-recognized human eye image may be an image of the user's eye captured by the phone's front camera. After the phone receives the user's screen-unlock operation signal, the camera captures an image of the user's eye; the phone may input this image into the human eye recognition model to obtain the human eye direction information "front" and the human eye discrimination information "gazing", the latter indicating that the user's eye gazes at the front camera. The phone may then generate an unlocking signal according to a preset unlocking condition (for example, that the human eye direction information is "front" and the human eye discrimination information is "gazing") to unlock the screen.
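Expressed as code, the unlocking decision of this example reduces to a simple predicate over the model's two outputs; the condition is the assumed one from the example above.

```python
def should_unlock(direction: str, gazing: bool) -> bool:
    # Unlock only when the user's eye looks "front" and gazes at the front camera.
    return direction == "front" and gazing

# e.g. should_unlock(*recognize_eye(model, eye_image)) -> True triggers the unlock signal
```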
In the method provided by this embodiment, the to-be-recognized human eye image is acquired and then input into the pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information, which improves the accuracy of determining, from a human eye image, the gaze direction of the eye and whether the eye gazes at the target position.
With further reference to fig. 7, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for generating a human eye recognition model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for generating a human eye recognition model of this embodiment includes: an acquiring unit 701 configured to acquire a training sample set, wherein each training sample includes a sample human eye image and labeled human eye direction information and labeled human eye discrimination information corresponding to the sample human eye image, the labeled human eye direction information indicating the gaze direction of the human eye represented by the sample human eye image, and the labeled human eye discrimination information indicating whether that human eye gazes at a target position; and a training unit 702 configured to train, using a machine learning method, a human eye recognition model with the sample human eye images of the training samples as input and the corresponding labeled human eye direction information and labeled human eye discrimination information as output.
In this embodiment, the acquiring unit 701 may obtain the training sample set from a remote location or locally through a wired or wireless connection. Each training sample may include a sample human eye image and the corresponding labeled human eye direction information and labeled human eye discrimination information. The labeled human eye direction information may indicate the gaze direction of the human eye represented by the sample human eye image; gaze directions may include, but are not limited to, at least one of: up, down, left, right, front, and an invalid direction (e.g., eyes closed). The labeled human eye discrimination information may indicate whether the human eye represented by the sample human eye image gazes at the target position; the target position may be the position of the camera used to capture the human eye images, or another position designated by a technician (e.g., a certain spatial region).
In this embodiment, based on the training sample set acquired by the acquiring unit 701, the training unit 702 may use a machine learning method to train a human eye recognition model, taking the sample human eye images of the training samples as input and the corresponding labeled human eye direction information and labeled human eye discrimination information as output. The human eye recognition model may be obtained by a technician performing supervised training based on an existing artificial neural network (e.g., a convolutional neural network or a recurrent neural network), which may have any of various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet).
In some optional implementations of this embodiment, the training unit 702 may include: an extraction module configured to extract an initial neural network; and a training module configured to perform the following training steps: inputting at least one sample human eye image in the training sample set into the initial neural network to obtain human eye direction information and human eye discrimination information corresponding to each of the at least one sample human eye image; comparing the human eye direction information and human eye discrimination information corresponding to each sample human eye image with the corresponding labeled human eye direction information and labeled human eye discrimination information, and determining, according to the comparison result, whether the initial neural network has reached a preset optimization target; and in response to determining that the initial neural network has reached the optimization target, taking the initial neural network as the trained human eye recognition model.
In some optional implementations of this embodiment, the training unit 702 may be further configured to: in response to determining that the initial neural network has not reached the optimization target, adjust the network parameters of the initial neural network, compose a training sample set from unused training samples, and continue to perform the training steps.
In some optional implementations of this embodiment, the training module may include: a first determining submodule configured to determine a function value of a preset first loss function and a function value of a preset second loss function, wherein the function value of the first loss function characterizes the degree of difference between the human eye direction information corresponding to a sample human eye image input into the initial neural network and the corresponding labeled human eye direction information, and the function value of the second loss function characterizes the degree of difference between the human eye discrimination information corresponding to that sample human eye image and the corresponding labeled human eye discrimination information; and a second determining submodule configured to determine that the initial neural network has reached the optimization target in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold.
It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.
With further reference to fig. 8, as an implementation of the method shown in fig. 6, the present application provides an embodiment of an apparatus for recognizing human eyes, which corresponds to the embodiment of the method shown in fig. 6, and which can be applied to various electronic devices.
As shown in fig. 8, the apparatus 800 for recognizing a human eye of this embodiment includes: an acquiring unit 801 configured to acquire a to-be-recognized human eye image; and a recognition unit 802 configured to input the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information, wherein the human eye recognition model is generated by the method described in the embodiment of fig. 2, the human eye direction information indicates the gaze direction of the human eye represented by the to-be-recognized human eye image, and the human eye discrimination information indicates whether that human eye gazes at a target position.
It will be understood that the elements described in the apparatus 800 correspond to various steps in the method described with reference to fig. 6. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 800 and the units included therein, and are not described herein again.
Referring now to FIG. 9, a block diagram of a computer system 900 suitable for use in implementing an electronic device (e.g., the server or terminal device shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the method of the present application are executed when the computer program is executed by the central processing unit (CPU) 901. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor including an acquiring unit and a training unit. The names of these units do not in some cases constitute a limitation of the unit itself; for example, the acquiring unit may also be described as "a unit that obtains a set of training samples".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a training sample set, wherein each training sample includes a sample human eye image and labeled human eye direction information and labeled human eye discrimination information corresponding to the sample human eye image, the labeled human eye direction information indicating the gaze direction of the human eye represented by the sample human eye image, and the labeled human eye discrimination information indicating whether that human eye gazes at a target position; and train, using a machine learning method, a human eye recognition model with the sample human eye images of the training samples in the training sample set as input and the labeled human eye direction information and labeled human eye discrimination information corresponding to the input sample human eye images as output.
Further, the one or more programs, when executed by the electronic device, may also cause the electronic device to: acquire a to-be-recognized human eye image; and input the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain human eye direction information and human eye discrimination information, wherein the human eye recognition model may be generated by the method for generating a human eye recognition model described in the above embodiments, the human eye direction information indicates the gaze direction of the human eye represented by the to-be-recognized human eye image, and the human eye discrimination information indicates whether that human eye gazes at the target position.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for generating a human eye recognition model, comprising:
acquiring a training sample set, wherein each training sample comprises a sample human eye image and labeled eye direction information and labeled eye discrimination information corresponding to the sample human eye image, the labeled eye direction information is used for indicating the gazing direction of the eye represented by the sample human eye image, and the labeled eye discrimination information is used for indicating whether the eye represented by the sample human eye image gazes at a target position;
training, by using a machine learning method, to obtain a human eye recognition model, with the sample human eye image of each training sample in the training sample set as input and the labeled eye direction information and labeled eye discrimination information corresponding to the input sample human eye image as output; wherein the human eye recognition model is obtained by training an initial neural network, and the tasks of the initial neural network comprise a task of determining eye direction information and a task of determining eye discrimination information.
2. The method of claim 1, wherein the training, by using the machine learning method, with the sample human eye image of each training sample in the training sample set as input and the labeled eye direction information and labeled eye discrimination information corresponding to the input sample human eye image as output, to obtain the human eye recognition model comprises:
extracting an initial neural network;
performing the following training steps: inputting at least one sample human eye image of the training sample set into the initial neural network to obtain eye direction information and eye discrimination information corresponding to each of the at least one sample human eye image; comparing the eye direction information and eye discrimination information corresponding to each of the at least one sample human eye image with the corresponding labeled eye direction information and labeled eye discrimination information, respectively, and determining, according to the comparison results, whether the initial neural network reaches a preset optimization goal; and in response to determining that the initial neural network reaches the optimization goal, determining the initial neural network as the trained human eye recognition model.
3. The method of claim 2, wherein the training, by using the machine learning method, to obtain the human eye recognition model further comprises:
in response to determining that the initial neural network does not reach the optimization goal, adjusting network parameters of the initial neural network, forming a training sample set from unused training samples, and continuing to perform the training steps.
4. The method of claim 2 or 3, wherein the comparing the eye direction information and eye discrimination information corresponding to each of the at least one sample human eye image with the corresponding labeled eye direction information and labeled eye discrimination information, respectively, and determining whether the initial neural network reaches a preset optimization goal according to the comparison results comprises:
determining a function value of a preset first loss function and a function value of a preset second loss function, wherein the function value of the first loss function characterizes the degree of difference between the eye direction information corresponding to a sample human eye image input into the initial neural network and the corresponding labeled eye direction information, and the function value of the second loss function characterizes the degree of difference between the eye discrimination information corresponding to the sample human eye image input into the initial neural network and the corresponding labeled eye discrimination information; and
in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold, determining that the initial neural network reaches the optimization goal.
5. A method for recognizing a human eye, comprising:
acquiring a to-be-recognized human eye image;
inputting the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain eye direction information and eye discrimination information, wherein the human eye recognition model is generated according to the method of any one of claims 1 to 4, the eye direction information is used for indicating the gazing direction of the eye represented by the to-be-recognized human eye image, and the eye discrimination information is used for indicating whether the eye represented by the to-be-recognized human eye image gazes at a target position.
6. An apparatus for generating a human eye recognition model, comprising:
an acquisition unit configured to acquire a training sample set, wherein each training sample comprises a sample human eye image and labeled eye direction information and labeled eye discrimination information corresponding to the sample human eye image, the labeled eye direction information is used for indicating the gazing direction of the eye represented by the sample human eye image, and the labeled eye discrimination information is used for indicating whether the eye represented by the sample human eye image gazes at a target position;
a training unit configured to train, by using a machine learning method, to obtain a human eye recognition model, with the sample human eye image of each training sample in the training sample set as input and the labeled eye direction information and labeled eye discrimination information corresponding to the input sample human eye image as output; wherein the human eye recognition model is obtained by training an initial neural network, and the tasks of the initial neural network comprise a task of determining eye direction information and a task of determining eye discrimination information.
7. The apparatus of claim 6, wherein the training unit comprises:
an extraction module configured to extract an initial neural network;
a training module configured to perform the following training steps: inputting at least one sample human eye image of the training sample set into the initial neural network to obtain eye direction information and eye discrimination information corresponding to each of the at least one sample human eye image; comparing the eye direction information and eye discrimination information corresponding to each of the at least one sample human eye image with the corresponding labeled eye direction information and labeled eye discrimination information, respectively, and determining, according to the comparison results, whether the initial neural network reaches a preset optimization goal; and in response to determining that the initial neural network reaches the optimization goal, determining the initial neural network as the trained human eye recognition model.
8. The apparatus of claim 7, wherein the training unit is further configured to:
in response to determining that the initial neural network does not reach the optimization goal, adjusting network parameters of the initial neural network, forming a training sample set from unused training samples, and continuing to perform the training steps.
9. The apparatus of claim 7 or 8, wherein the training module comprises:
a first determining submodule configured to determine a function value of a preset first loss function and a function value of a preset second loss function, wherein the function value of the first loss function characterizes the degree of difference between the eye direction information corresponding to a sample human eye image input into the initial neural network and the corresponding labeled eye direction information, and the function value of the second loss function characterizes the degree of difference between the eye discrimination information corresponding to the sample human eye image input into the initial neural network and the corresponding labeled eye discrimination information; and
a second determining submodule configured to determine that the initial neural network reaches the optimization goal in response to determining that the function value of the first loss function is less than or equal to a preset first threshold and the function value of the second loss function is less than or equal to a preset second threshold.
10. An apparatus for recognizing a human eye, comprising:
an acquisition unit configured to acquire a to-be-recognized human eye image;
a recognition unit configured to input the to-be-recognized human eye image into a pre-trained human eye recognition model to obtain eye direction information and eye discrimination information, wherein the human eye recognition model is generated according to the method of any one of claims 1 to 4, the eye direction information is used for indicating the gazing direction of the eye represented by the to-be-recognized human eye image, and the eye discrimination information is used for indicating whether the eye represented by the to-be-recognized human eye image gazes at a target position.
11. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201810286481.5A 2018-03-30 2018-03-30 Method and device for generating human eye recognition model Active CN108491823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810286481.5A CN108491823B (en) 2018-03-30 2018-03-30 Method and device for generating human eye recognition model

Publications (2)

Publication Number Publication Date
CN108491823A CN108491823A (en) 2018-09-04
CN108491823B (en) 2021-12-24

Family

ID=63317658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810286481.5A Active CN108491823B (en) 2018-03-30 2018-03-30 Method and device for generating human eye recognition model

Country Status (1)

Country Link
CN (1) CN108491823B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105008A (en) * 2018-10-29 2020-05-05 Fujitsu Ltd Model training method, data recognition method and data recognition device
CN109803450A (en) * 2018-12-12 2019-05-24 平安科技(深圳)有限公司 Wireless device and computer connection method, electronic device and storage medium
CN109784304B (en) * 2019-01-29 2021-07-06 北京字节跳动网络技术有限公司 Method and apparatus for labeling dental images
CN109816589B (en) * 2019-01-30 2020-07-17 北京字节跳动网络技术有限公司 Method and apparatus for generating cartoon style conversion model
CN110136714A (en) * 2019-05-14 2019-08-16 北京探境科技有限公司 Natural interaction sound control method and device
CN110188833B (en) * 2019-06-04 2021-06-18 北京字节跳动网络技术有限公司 Method and apparatus for training a model
CN111414851A (en) * 2020-03-19 2020-07-14 上海交通大学 Single-camera fixation detection method without light supplement and calibration based on iris shape
CN113379644A (en) * 2021-06-30 2021-09-10 北京字跳网络技术有限公司 Training sample obtaining method and device based on data enhancement and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101466305A (en) * 2006-06-11 2009-06-24 Volvo Technology Corp Method and apparatus for determining and analyzing a location of visual interest
CN103067662A (en) * 2013-01-21 2013-04-24 Tianjin Normal University Self-adapting sightline tracking system
CN104951084A (en) * 2015-07-30 2015-09-30 BOE Technology Group Co Ltd Eye-tracking method and device
CN106371566A (en) * 2015-07-24 2017-02-01 Utechzone Co Ltd Correction module, method and computer readable recording medium for eye tracking
CN106909220A (en) * 2017-02-21 2017-06-30 Shandong Normal University Sight line interaction method suitable for touch control
CN107563123A (en) * 2017-09-27 2018-01-09 Baidu Online Network Technology Beijing Co Ltd Method and apparatus for marking medical image
CN107679525A (en) * 2017-11-01 2018-02-09 Tencent Technology Shenzhen Co Ltd Image classification method, device and computer-readable recording medium
CN107679451A (en) * 2017-08-25 2018-02-09 Baidu Online Network Technology Beijing Co Ltd Method, apparatus, device and computer storage medium for establishing a face recognition model

Also Published As

Publication number Publication date
CN108491823A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN108491823B (en) Method and device for generating human eye recognition model
US11244435B2 (en) Method and apparatus for generating vehicle damage information
CN108509915B (en) Method and device for generating face recognition model
US10691928B2 (en) Method and apparatus for facial recognition
CN107909065B (en) Method and device for detecting face occlusion
CN108830235B (en) Method and apparatus for generating information
US10853623B2 (en) Method and apparatus for generating information
CN108280477B (en) Method and apparatus for clustering images
US20190392587A1 (en) System for predicting articulated object feature location
CN108416326B (en) Face recognition method and device
CN109034069B (en) Method and apparatus for generating information
CN109101919B (en) Method and apparatus for generating information
CN109993150B (en) Method and device for identifying age
US20210042504A1 (en) Method and apparatus for outputting data
WO2020062493A1 (en) Image processing method and apparatus
CN108229375B (en) Method and device for detecting face image
CN108509890B (en) Method and device for extracting information
CN108509921B (en) Method and apparatus for generating information
CN111783626B (en) Image recognition method, device, electronic equipment and storage medium
CN108509994B (en) Method and device for clustering character images
CN108133197B (en) Method and apparatus for generating information
CN109345460B (en) Method and apparatus for rectifying image
CN108399401B (en) Method and device for detecting face image
CN111931628B (en) Training method and device of face recognition model and related equipment
CN114140880A (en) Gait recognition method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant