CN111626087A

CN111626087A - Neural network training and eye opening and closing state detection method, device and equipment

Info

Publication number: CN111626087A
Application number: CN201910153463.4A
Authority: CN
Inventors: 王飞; 钱晨
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2020-09-04
Also published as: JP7227385B2; KR20210113621A; WO2020173135A1; JP2022517398A

Abstract

An embodiment of the present disclosure discloses a neural network training method, an eye opening/closing state detection method, an intelligent driving control method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program, wherein the neural network training method includes: carrying out eye opening and closing state detection processing on a plurality of eye images in an image set corresponding to at least two eye opening and closing detection training tasks respectively through an eye opening and closing detection neural network to be trained, and outputting eye opening and closing state detection results; wherein the eye images contained in the different image sets are at least partially different; and determining losses corresponding to the at least two eye opening and closing detection training tasks respectively according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting network parameters of the neural network according to the losses corresponding to the at least two eye opening and closing detection training tasks respectively.

Description

Neural network training and eye opening and closing state detection method, device and equipment

Technical Field

The present disclosure relates to computer vision technology, and in particular, to a neural network training method, a neural network training device, an eye opening/closing state detection method, an eye opening/closing state detection device, an intelligent driving control method, an intelligent driving control device, an electronic device, a computer-readable storage medium, and a computer program.

Background

Eye opening/closing state detection refers to detecting whether the eyes are open or closed. It can be used in fields such as fatigue monitoring, living body recognition, and expression recognition. For example, in driving assistance technology, the eye opening/closing state of the driver needs to be detected so that, based on the detection result, it can be determined whether the driver is in a fatigue driving state and fatigue driving can be monitored. Accurately detecting the opening and closing state of the eyes helps avoid misjudgment as far as possible and improves the safety of vehicle driving.

Disclosure of Invention

The embodiments of the present disclosure provide technical solutions for neural network training, eye opening/closing state detection, and intelligent driving control.

According to an aspect of the disclosed embodiments, there is provided a neural network training method, the method including: carrying out eye opening and closing state detection processing on a plurality of eye images in an image set corresponding to at least two eye opening and closing detection training tasks respectively through an eye opening and closing detection neural network to be trained, and outputting eye opening and closing state detection results; wherein the eye images contained in the different image sets are at least partially different; and determining losses corresponding to the at least two eye opening and closing detection training tasks respectively according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting network parameters of the neural network according to the losses corresponding to the at least two eye opening and closing detection training tasks respectively.

In an embodiment of the present disclosure, the at least two open-closed eye detection training tasks include at least two of the following tasks: an open/close eye detection task for a case where the eyes have attachments, an open/close eye detection task for a case where the eyes have no attachments, an open/close eye detection task in an indoor environment, an open/close eye detection task in an outdoor environment, an open/close eye detection task for a case where the eyes have attachments and the attachments have light spots, and an open/close eye detection task for a case where the eyes have attachments and the attachments have no light spots; the image sets respectively corresponding to the at least two open-closed eye detection training tasks include at least two of the following image sets: an eye image set with attachments on the eyes, an eye image set without attachments on the eyes, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set with attachments on the eyes and light spots on the attachments, and an eye image set with attachments on the eyes and no light spots on the attachments.

In another embodiment of the present disclosure, performing eye opening/closing state detection processing, through the open/closed eye detection neural network to be trained, on a plurality of eye images in the image sets respectively corresponding to the at least two open/closed eye detection training tasks, and outputting the eye opening/closing state detection results, includes: for the different open/closed eye detection training tasks, acquiring a corresponding number of eye images from the different image sets according to a preset image-number ratio of the tasks; and performing eye opening/closing state detection processing on those eye images through the neural network to be trained, and outputting the eye opening/closing state detection result corresponding to each eye image.
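The proportional acquisition described above may be illustrated by the following minimal sketch. It is not the implementation of the disclosure: the function name `sample_batch`, the task names, and the choice of sampling with replacement are assumptions made only for illustration.

```python
import random


def sample_batch(image_sets, proportions, batch_size, rng=None):
    """Draw one training batch that contains images from every task.

    image_sets:  dict mapping task name -> list of (image_id, label) samples
    proportions: dict mapping task name -> relative weight, e.g. 1:1:2
    Note: because of rounding, the batch may not contain exactly
    batch_size samples when the weights do not divide it evenly.
    """
    rng = rng or random.Random()
    total = sum(proportions.values())
    batch = []
    for task, samples in image_sets.items():
        # Number of images this task contributes (at least one per task).
        count = max(1, round(batch_size * proportions[task] / total))
        # Assumption: sampling with replacement keeps the sketch short.
        for sample in rng.choices(samples, k=count):
            batch.append((task, sample))
    return batch
```

With weights 1:1:2 and a batch size of 8, for instance, the batch holds two indoor images, two outdoor images, and four images of the third task, so no task is absent from any iteration.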

In yet another embodiment of the present disclosure, the determining, according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, a loss corresponding to each of at least two opening and closing eye detection training tasks includes: for any eye opening and closing detection training task, determining the loss corresponding to the training task according to the included angle between the maximum probability value in the eye opening and closing state detection results respectively output by the neural network aiming at the plurality of eye images in the image set corresponding to the training task and the interface corresponding to the labeling information of the corresponding eye image in the image set.

In yet another embodiment of the present disclosure, the adjusting the network parameters of the neural network according to the loss corresponding to each of the at least two open-closed eye detection training tasks includes: determining the comprehensive loss of the at least two open-close eye detection training tasks according to the loss corresponding to the at least two open-close eye detection training tasks; and adjusting network parameters of the neural network according to the comprehensive loss.
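As a hedged illustration of combining per-task losses, the sketch below groups detection results by training task, averages a loss per task, and forms a weighted sum. Cross-entropy is used here only as a stand-in for the per-task loss, and the weighted sum is one possible way of forming the comprehensive loss; the names `per_task_losses` and `comprehensive_loss` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Stand-in loss (assumption): negative log-probability of the labeled class."""
    return -math.log(max(probs[label], 1e-12))


def per_task_losses(results):
    """results: list of (task, probs, label); returns dict task -> mean loss."""
    sums, counts = {}, {}
    for task, probs, label in results:
        sums[task] = sums.get(task, 0.0) + cross_entropy(probs, label)
        counts[task] = counts.get(task, 0) + 1
    return {task: sums[task] / counts[task] for task in sums}


def comprehensive_loss(task_losses, weights=None):
    """Weighted sum of the per-task losses (equal weights by default)."""
    weights = weights or {task: 1.0 for task in task_losses}
    return sum(weights[task] * loss for task, loss in task_losses.items())
```

The comprehensive value would then drive the parameter update, for example by back-propagation in whichever training framework is used.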

According to still another aspect of the disclosed embodiments, there is provided an eye-opening/closing state detection method including: acquiring an image to be processed; carrying out eye opening and closing state detection processing on the image to be processed through a neural network, and outputting eye opening and closing state detection results; the neural network is obtained by training by using the neural network training method described in the above embodiment.

In an embodiment of the present disclosure, the eye opening/closing state detection method further includes: determining eye movement and/or facial expression and/or fatigue state and/or interaction control information of a target object at least according to the eye opening and closing state detection result of the same target object in a plurality of images to be processed with time sequence relation.

According to still another aspect of the disclosed embodiments, there is provided a smart driving control method, including: acquiring an image to be processed, which is acquired by a camera device arranged on a vehicle; carrying out eye opening and closing state detection processing on the image to be processed through a neural network, and outputting eye opening and closing state detection results; determining the fatigue state of the target object at least according to the eye opening and closing state detection results of the same target object in a plurality of images to be processed with time sequence relation; forming a corresponding instruction according to the fatigue state of the target object, and outputting the instruction; the neural network is obtained by training by using the neural network training method described in the above embodiment.
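One possible way of turning time-ordered detection results into a fatigue decision and an instruction is sketched below. The disclosure does not specify this rule: the closed-frame ratio criterion, both threshold values, and the instruction strings are assumptions chosen purely for illustration.

```python
def closed_ratio(closed_probs, closed_threshold=0.5):
    """Fraction of frames whose closed-eye probability exceeds the threshold."""
    if not closed_probs:
        return 0.0
    closed = sum(1 for p in closed_probs if p > closed_threshold)
    return closed / len(closed_probs)


def fatigue_instruction(closed_probs, fatigue_ratio=0.4):
    """Map a window of time-ordered results to a hypothetical instruction.

    closed_probs:  closed-eye probabilities of the same target object
                   in consecutive frames
    fatigue_ratio: assumed fraction of closed frames treated as fatigue
    """
    if closed_ratio(closed_probs) >= fatigue_ratio:
        return "WARN_DRIVER"
    return "NO_ACTION"
```

In practice the window length and thresholds would have to be calibrated for the camera frame rate and the application.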

According to still another aspect of the disclosed embodiments, there is provided a neural network training apparatus, the apparatus including: the neural network for detecting the open/close eyes to be trained is used for respectively carrying out eye open/close state detection processing on a plurality of eye images in an image set corresponding to at least two eye open/close detection training tasks and outputting eye open/close state detection results; wherein the eye images contained in the different image sets are at least partially different; and the adjusting module is used for respectively determining the loss corresponding to the at least two eye opening and closing detection training tasks according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting the network parameters of the neural network according to the loss corresponding to the at least two eye opening and closing detection training tasks.

In yet another embodiment of the present disclosure, the apparatus further comprises: the input module is used for respectively acquiring a corresponding number of eye images from different image sets aiming at different open-close eye detection training tasks according to the preset image number proportion of different open-close eye detection training tasks and providing the eye images for the open-close eye detection neural network to be trained; and the open-close eye detection neural network to be trained respectively performs eye open-close state detection processing on the eye images of the corresponding number, and outputs eye open-close state detection results corresponding to the eye images.

In yet another embodiment of the present disclosure, the adjusting module is further configured to: for any eye opening and closing detection training task, determining the loss corresponding to the training task according to the included angle between the maximum probability value in the eye opening and closing state detection results respectively output by the neural network aiming at the plurality of eye images in the image set corresponding to the training task and the interface corresponding to the labeling information of the corresponding eye image in the image set.

In yet another embodiment of the present disclosure, the adjusting module is further configured to: determining the comprehensive loss of the at least two open-close eye detection training tasks according to the loss corresponding to the at least two open-close eye detection training tasks; and adjusting network parameters of the neural network according to the comprehensive loss.

According to still another aspect of the disclosed embodiments, there is provided an eye-opening-and-closing-state detecting device including: the acquisition module is used for acquiring an image to be processed; the neural network is used for carrying out eye opening and closing state detection processing on the image to be processed and outputting eye opening and closing state detection results; the neural network is obtained by training with the neural network training device according to the above embodiment.

In an embodiment of the present disclosure, the eye opening/closing state detection device further includes: the determining module is used for determining the eye movement and/or facial expression and/or fatigue state and/or interaction control information of the target object at least according to the eye opening and closing state detection result of the same target object in the images to be processed with the time sequence relationship.

According to still another aspect of the disclosed embodiments, there is provided an intelligent driving control apparatus, the apparatus including: the acquisition module is used for acquiring an image to be processed, which is acquired by a camera device arranged on a vehicle; the neural network is used for carrying out eye opening and closing state detection processing on the image to be processed and outputting eye opening and closing state detection results; the fatigue state determining module is used for determining the fatigue state of the target object at least according to the eye opening and closing state detection results of the same target object in the images to be processed with the time sequence relationship; the instruction module is used for forming a corresponding instruction according to the fatigue state of the target object and outputting the instruction; the neural network is obtained by training with the neural network training device according to the above embodiment.

According to still another aspect of the disclosed embodiments, there is provided an electronic device including: a memory for storing a computer program; a processor for executing the computer program stored in the memory, and when executed, implementing any of the method embodiments of the present disclosure.

According to yet another aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the method embodiments of the present disclosure.

According to a further aspect of an embodiment of the present disclosure, there is provided a computer program comprising computer instructions for implementing any one of the method embodiments of the present disclosure when the computer instructions are run in a processor of a device.

In the course of practicing the embodiments of the present disclosure, the inventors found that a neural network trained on a single task, that is, on the image set of that task, usually achieves good open/closed eye detection accuracy in the scene corresponding to the task, but its accuracy is hard to guarantee in other scenes. If images collected from several different scenes are simply merged into one image set for training, without distinguishing which scene or which training task each image belongs to, the distribution of the image subset (batch) drawn from the merged set for each training iteration cannot be controlled. A batch may contain many images of one scene and few or none of the others, and the batch distribution differs from iteration to iteration. In other words, the batches are distributed too randomly and no loss is computed specifically for each training task, so the training process cannot be steered to balance learning across the different training tasks. As a result, the open/closed eye detection accuracy of the trained neural network in the different scenes corresponding to the different tasks cannot be guaranteed.

With the neural network training method and device, the eye opening/closing state detection method and device, the intelligent driving control method and device, the electronic device, the computer-readable storage medium, and the computer program provided by the present disclosure, a corresponding image set is determined for each of a plurality of different open/closed eye detection tasks, the eye images used in a single training iteration are drawn from the plurality of image sets, the loss of the neural network's detection results is determined separately for each training task from the eye images of that iteration, and the network parameters are adjusted according to these losses. In this way, the subset of eye images fed to the neural network in each training iteration includes eye images corresponding to every training task, and the loss of every training task is calculated in a targeted manner. The neural network can therefore learn open/closed eye detection for each training task while balancing learning across the tasks, so that the trained network improves detection accuracy for eye images in each of the scenes corresponding to the training tasks. This improves the universality and generalization of neural-network-based open/closed eye detection across different scenes and better meets the needs of multi-scene applications.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and the embodiments.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of one embodiment of a neural network training method of the present disclosure;

FIG. 2 is a schematic diagram of one embodiment of a plurality of open-closed eye detection training tasks according to the present disclosure;

FIG. 3 is a flow chart of one embodiment of a method of eye opening and closing status detection according to the present disclosure;

FIG. 4 is a flow chart of one embodiment of an intelligent driving control method of the present disclosure;

FIG. 5 is a schematic diagram of a neural network training device according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an embodiment of the eye opening and closing state detection apparatus according to the present disclosure;

FIG. 7 is a schematic structural diagram of an embodiment of an intelligent driving control apparatus according to the present disclosure;

FIG. 8 is a block diagram of an exemplary device implementing embodiments of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, and servers, which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, and servers, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.

Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, and data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Exemplary embodiments

FIG. 1 is a flowchart of an embodiment of a neural network training method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes step S100 and step S110. The respective steps in FIG. 1 are described in detail below.

S100, carrying out eye opening and closing state detection processing on a plurality of eye images in an image set corresponding to at least two eye opening and closing detection training tasks through a neural network for eye opening and closing detection to be trained, and outputting eye opening and closing state detection results.

In an alternative example, once successfully trained, the open/closed eye detection neural network of the present disclosure may be used to detect the eye opening/closing state in an image to be processed and to output the detection result. For example, for an image to be processed, the neural network outputs two probability values. One represents the probability that the eye of the target object in the image is open; the larger this value, the closer the eye is to the open state. The other represents the probability that the eye of the target object is closed; the larger this value, the closer the eye is to the closed state. The two probability values may sum to 1.
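Two probability values that sum to 1 are commonly obtained by applying a softmax to the two raw outputs of a classification layer. The following sketch assumes such a layer; the function names `softmax2` and `eye_state` and the tie-breaking rule are hypothetical and are not taken from the disclosure.

```python
import math


def softmax2(logit_open, logit_closed):
    """Convert two raw scores into (p_open, p_closed) with p_open + p_closed = 1."""
    m = max(logit_open, logit_closed)  # subtract the maximum for numerical stability
    e_open = math.exp(logit_open - m)
    e_closed = math.exp(logit_closed - m)
    total = e_open + e_closed
    return e_open / total, e_closed / total


def eye_state(p_open, p_closed):
    """Pick the state with the larger probability (ties resolved as open)."""
    return "open" if p_open >= p_closed else "closed"
```

For example, `eye_state(*softmax2(2.0, 0.0))` yields the open state, because the first score is the larger one.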

In one optional example, the neural network in the present disclosure may be a convolutional neural network. Neural networks in the present disclosure may include, but are not limited to: convolutional layers, ReLU (Rectified Linear Unit) layers (which may also be referred to as activation layers), pooling layers, fully-connected layers, and layers for classification (e.g., binary classification), etc. The greater the number of layers that the neural network contains, the deeper the network. The present disclosure does not limit the specific structure of the neural network.

In an alternative example, the present disclosure may be applied to training a neural network, where at least two open-closed eye detection training tasks are involved, and each open-closed eye detection training task should belong to an overall training task for enabling the neural network to implement open-closed eye state detection. The training targets corresponding to different open-closed eye detection training tasks are not exactly the same. That is, the present disclosure may divide the total training task of the neural network into a plurality of training tasks, each training task is directed to one training target, and training targets corresponding to different training tasks are different.

In one optional example, the at least two open-closed eye detection training tasks in the present disclosure may include at least two of the following tasks: an open/close eye detection task for a case where the eyes have attachments, an open/close eye detection task for a case where the eyes have no attachments, an open/close eye detection task in an indoor environment, an open/close eye detection task in an outdoor environment, an open/close eye detection task for a case where the eyes have attachments and the attachments have light spots, and an open/close eye detection task for a case where the eyes have attachments and the attachments have no light spots. The attachment may be glasses or a transparent plastic sheet. The light spot may be a light spot formed on the attachment due to reflection of light by the attachment. The term "glasses" in this disclosure generally refers to glasses through whose lenses the wearer's eyes can be seen.

Alternatively, the open-closed eye detection task for the case where the eyes have attachments may be an open-closed eye detection task with glasses. This task may be implemented as at least one of: open-closed eye detection indoors with glasses, and open-closed eye detection outdoors with glasses.

Alternatively, the open-closed eye detection task for the case where the eyes have no attachments may be an open-closed eye detection task without glasses. This task may be implemented as at least one of: open-closed eye detection indoors without glasses, and open-closed eye detection outdoors without glasses.

Optionally, the open-closed eye detection task in an indoor environment may be implemented as at least one of: open-closed eye detection indoors without glasses, open-closed eye detection indoors with glasses that reflect light, and open-closed eye detection indoors with glasses that do not reflect light.

Optionally, the open-closed eye detection task in an outdoor environment may be implemented as at least one of: open-closed eye detection outdoors without glasses, open-closed eye detection outdoors with glasses that reflect light, and open-closed eye detection outdoors with glasses that do not reflect light.

Alternatively, the open-closed eye detection task for the case where the eyes have attachments and the attachments have light spots may be an open-closed eye detection task with glasses that reflect light. This task may be implemented as at least one of: open-closed eye detection indoors with glasses that reflect light, and open-closed eye detection outdoors with glasses that reflect light.

Alternatively, the open-closed eye detection task for the case where the eyes have attachments and the attachments have no light spots may be an open-closed eye detection task with glasses that do not reflect light. This task may be implemented as at least one of: open-closed eye detection indoors with glasses that do not reflect light, and open-closed eye detection outdoors with glasses that do not reflect light.

As is apparent from the above description, different open-closed eye detection training tasks in the present disclosure may intersect. For example, the open-closed eye detection task with glasses may intersect with each of the following: the open-closed eye detection task in an indoor environment, the open-closed eye detection task in an outdoor environment, the open-closed eye detection task for the case where the eyes have attachments and the attachments have light spots, and the open-closed eye detection task for the case where the eyes have attachments and the attachments have no light spots. The intersections among the six open-closed eye detection training tasks above are not enumerated one by one. In addition, the present disclosure does not limit the number of open-closed eye detection training tasks involved, which may be determined according to actual needs, nor does it limit the specific form of any one open-closed eye detection training task.

Alternatively, as shown in FIG. 2, the at least two open-closed eye detection training tasks in the present disclosure may include the following three open-closed eye detection training tasks:

open-closed eye detection training task a: an open-closed eye detection task in an indoor environment;

open-closed eye detection training task b: an open-closed eye detection task in an outdoor environment;

open-closed eye detection training task c: an open-closed eye detection task for the case where the eyes have attachments and the attachments have light spots.

There is no intersection between the open-closed eye detection training task a and the open-closed eye detection training task b, there may be an intersection between the training task a and the training task c, and there may be an intersection between the training task b and the training task c.

In one optional example, at least two open-closed eye detection training tasks in the present disclosure each correspond to a set of images. Each image set typically includes a plurality of eye images. The different image sets contain at least partially different eye images. That is, for any image set, at least some of the eye images in that image set do not appear in the other image sets. Alternatively, the eye images contained in different image sets may intersect.

Alternatively, the image sets corresponding to the above-mentioned six open-closed eye detection training tasks may respectively be: an eye image set with attachments on the eyes, an eye image set without attachments on the eyes, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set with attachments on the eyes and light spots on the attachments, and an eye image set with attachments on the eyes and no light spots on the attachments.

Alternatively, all eye images in the eye image set with attachments to the eye may be eye images with glasses, for example, the eye image set may include: an eye image with glasses acquired in an indoor environment and an eye image with glasses acquired in an outdoor environment.

Alternatively, all eye images in the set of eye images with eyes without attachments may be eye images without glasses, for example, the set of eye images may comprise: an eye image without glasses acquired in an indoor environment and an eye image without glasses acquired in an outdoor environment.

Optionally, the set of eye images acquired in an indoor environment may include: an eye image taken in an indoor environment without glasses, and an eye image taken in an indoor environment with glasses.

Optionally, the set of eye images acquired in an outdoor environment may include: an eye image without glasses acquired in an outdoor environment, and an eye image with glasses acquired in an outdoor environment.

Optionally, all eye images in the eye image set in which the eyes have attachments and the attachments have light spots may be eye images with glasses and with light spots on the glasses. For example, the eye image set may include: eye images with glasses and with light spots on the glasses collected in an indoor environment, and eye images with glasses and with light spots on the glasses collected in an outdoor environment.

Alternatively, all eye images in the eye image set with the eye having the attachment and no light spot on the attachment may be eye images with glasses and no light spot on the glasses, for example, the eye image set may include: eye images with glasses and no light spots on the glasses collected in an indoor environment and eye images with glasses and no light spots on the glasses collected in an outdoor environment.

In one optional example, the set of images encompassed by the present disclosure is determined by an open-closed eye detection training task encompassed by the present disclosure. For example, if the present disclosure includes at least two of the above-described six open-closed eye detection training tasks, the present disclosure includes eye image sets corresponding to the at least two open-closed eye detection training tasks, respectively.

In an alternative example, the eye images used in the neural network training process of the present disclosure may also be referred to as eye image samples, and the image content of the eye image samples typically contains eyes. The eye image sample in the present disclosure is typically a monocular-based eye image sample, i.e., the image content of the eye image sample does not contain both eyes, but rather contains one eye. Alternatively, the eye image samples may be eye image samples based on a single side eye, e.g. eye corner image samples based on the left eye. Of course, this disclosure also does not exclude the case where the eye image samples are eye image samples based on both eyes or eye image samples based on any side eye.

In one optional example, the eye image in the present disclosure may generally be an eye image block cut out from an image that includes an eye and is captured by a camera. For example, the process of forming an eye image may include: cutting an eye image block out of the captured image and, optionally, performing scaling and/or image content mapping on the cut-out image block (e.g., converting a right-eye image block into a left-eye image block by image content mapping), so as to form an eye image for training the open-closed eye detection neural network. Of course, the present disclosure does not exclude using the complete image captured by the camera, including the eye, as the eye image. Additionally, the eye images in the present disclosure may be eye images in a corresponding training sample set.
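The formation of an eye image described above can be sketched as follows. This is a minimal illustration and not the disclosure's implementation: the function name `make_eye_image`, the `(top, left, height, width)` box format, the 24x24 output size, and the reading of "image content mapping" as a horizontal mirror are all assumptions introduced here.

```python
import numpy as np

def make_eye_image(frame, box, out_size=(24, 24), is_right_eye=False):
    """Crop an eye block from `frame`, optionally mirror it, and resize it.

    frame: H x W (or H x W x C) array; box: (top, left, height, width).
    """
    top, left, h, w = box
    block = frame[top:top + h, left:left + w]
    if is_right_eye:
        # Assumed form of "image content mapping": horizontal mirroring,
        # so that a right-eye block looks like a left-eye block.
        block = block[:, ::-1]
    # Nearest-neighbour resize, used only to keep the sketch dependency-free.
    rows = np.arange(out_size[0]) * block.shape[0] // out_size[0]
    cols = np.arange(out_size[1]) * block.shape[1] // out_size[1]
    return block[rows][:, cols]
```

A real pipeline would more likely obtain the box from face key point detection and use a library resize with interpolation; the sketch only shows the order of the steps.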

In an alternative example, the eye image for training the neural network for open-closed eye detection in the present disclosure generally has annotation information, and the annotation information may indicate the open-closed state of the eyes in the eye image. That is, the annotation information may indicate whether the eyes in the eye image are in an open state or a closed state. In an alternative example, the label information of the eye image is 1, which indicates that the eyes in the eye image are open, and the label information of the eye image is 0, which indicates that the eyes in the eye image are closed.

In an optional example, the present disclosure generally acquires a corresponding number of eye images from the eye image sets corresponding to the different training tasks. For example, eye images may be acquired from each training task's eye image set according to a preset image number ratio for the different training tasks. In addition, a preset batch processing number (batch size) is generally taken into account when acquiring eye images. For example, in the case where the preset image number ratio is 1:1:1 for the open-closed eye detection training task a, the open-closed eye detection training task b, and the open-closed eye detection training task c, if the preset batch processing number is 600, the present disclosure may acquire 200 eye images from the eye image set corresponding to the open-closed eye detection training task a, 200 eye images from the eye image set corresponding to the open-closed eye detection training task b, and 200 eye images from the eye image set corresponding to the open-closed eye detection training task c.

Alternatively, if the number of eye images in the eye image set corresponding to a certain open-close eye detection training task is less than the corresponding number (e.g., less than 200), a corresponding number of eye images may be acquired from the eye image sets corresponding to other open-close eye detection training tasks, so as to achieve the batch processing number. For example, assuming that only 100 eye images exist in the eye image set corresponding to the open/close eye detection training task c, and the number of eye images in the eye image set corresponding to each of the open/close eye detection training task a and the open/close eye detection training task b exceeds 250, 250 eye images may be acquired from the eye image set corresponding to the open/close eye detection training task a, 250 eye images may be acquired from the eye image set corresponding to the open/close eye detection training task b, and 100 eye images may be acquired from the eye image set corresponding to the open/close eye detection training task c, so that 600 eye images may be acquired in total. In this way, the flexibility of acquiring an eye image may be increased.
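The ratio-based acquisition and the shortfall handling described above can be sketched as a small allocation routine. The function name `allocate_batch` and the one-image-at-a-time redistribution rule are assumptions made for illustration; the disclosure does not prescribe how the shortfall is split among the other tasks.

```python
def allocate_batch(set_sizes, ratio, batch_size):
    """Return the number of eye images to draw from each task's image set.

    set_sizes: {task: number of available images}
    ratio:     {task: integer weight of the preset image number ratio}
    """
    total_weight = sum(ratio.values())
    # Initial quota from the preset ratio, capped by what each set can supply.
    counts = {t: min(batch_size * ratio[t] // total_weight, set_sizes[t])
              for t in ratio}
    shortfall = batch_size - sum(counts.values())
    # Hand the shortfall to tasks that still have spare images, one at a time.
    while shortfall > 0:
        spare = [t for t in ratio if counts[t] < set_sizes[t]]
        if not spare:
            break  # every set is exhausted; the batch stays smaller
        for t in spare:
            if shortfall == 0:
                break
            counts[t] += 1
            shortfall -= 1
    return counts
```

With a 1:1:1 ratio, a batch processing number of 600, and only 100 images available for task c, the routine yields 250, 250, and 100 images for tasks a, b, and c, matching the example in the text.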

It should be noted that the present disclosure may also set the numbers randomly when acquiring a corresponding number of eye images from the eye image sets corresponding to the different training tasks. The present disclosure does not limit the specific implementation of acquiring a corresponding number of eye images from the eye image sets corresponding to the different training tasks. In addition, when acquiring eye images from an eye image set, eye images whose annotation information indicates an uncertain open-closed state should be avoided, which helps improve the detection accuracy of the open-closed eye detection neural network.

In an optional example, the present disclosure may sequentially provide the acquired plurality of eye images to the open-closed eye detection neural network to be trained, and the neural network performs eye open-closed state detection processing on each input eye image, so that it sequentially outputs an eye open-closed state detection result for each eye image. For example, after an eye image input into the open-closed eye detection neural network to be trained is sequentially subjected to convolutional layer processing, fully connected layer processing, and classification layer processing, the neural network outputs two probability values, each of which ranges from 0 to 1, and the sum of the two probability values is 1. One probability value corresponds to the open state: the closer this probability value is to 1, the closer the eye in the eye image is to the open state. The other probability value corresponds to the closed state: the closer this probability value is to 1, the closer the eye in the eye image is to the closed state.
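The two output probabilities described above behave like the output of a two-way softmax. The sketch below assumes that the classification layer is a softmax and that index 0 is the open state; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def open_closed_probabilities(scores):
    """Map two classification scores to (p_open, p_closed) with a softmax."""
    z = np.asarray(scores, dtype=float)
    z = z - z.max()                      # subtract the maximum for stability
    p = np.exp(z) / np.exp(z).sum()
    return float(p[0]), float(p[1])     # assumed order: open, closed
```

Both values lie between 0 and 1 and sum to 1, so either one alone determines the other.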

S110, determining losses corresponding to the at least two eye opening and closing detection training tasks respectively according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting network parameters of the neural network according to the losses corresponding to the at least two eye opening and closing detection training tasks respectively.

In an alternative example, the present disclosure should determine a loss corresponding to each open-closed eye detection training task, determine a combined loss according to the losses corresponding to all the training tasks, and adjust a network parameter of the neural network using the combined loss. Network parameters in the present disclosure may include, but are not limited to: convolution kernel parameters and/or matrix weights, etc. The present disclosure does not limit the specifics of the network parameters involved.

In an optional example, for any open-closed eye detection training task, the present disclosure may determine the loss corresponding to that training task according to the included angle between the maximum probability value in the eye open-closed state detection result that the neural network outputs for each of the plurality of eye images in the image set corresponding to the training task and the decision boundary (interface) corresponding to the annotation information of the corresponding eye image. Optionally, the present disclosure may use an A-Softmax (angular softmax) loss function to determine, according to the eye open-closed annotation information of the eye images and the eye open-closed state detection results output by the neural network, the losses corresponding to the different open-closed eye detection training tasks, determine a combined loss (e.g., the sum of the losses) from those losses, and adjust the network parameters of the neural network using stochastic gradient descent. For example, the present disclosure may calculate the loss corresponding to each open-closed eye detection training task using the A-Softmax loss function and perform back propagation according to the sum of the losses corresponding to all open-closed eye detection training tasks, so that the network parameters of the open-closed eye detection neural network to be trained are updated along the direction in which the loss gradient descends.
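The idea of summing per-task losses and updating the parameters by gradient descent can be sketched on a toy model. To keep the example short and checkable, it substitutes a linear classifier for the convolutional network and plain cross-entropy for the A-Softmax loss; both substitutions, and all function names, are assumptions made only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def task_loss_and_grad(W, x, y):
    """Mean cross-entropy of a linear classifier on one task, and its gradient."""
    p = softmax(x @ W)
    n = len(y)
    loss = -np.log(p[np.arange(n), y]).mean()
    p[np.arange(n), y] -= 1.0            # d(loss)/d(scores) = (p - onehot) / n
    return loss, x.T @ p / n

def train_step(W, task_batches, lr=0.1):
    """One gradient step on the sum of the per-task losses."""
    total_loss, total_grad = 0.0, np.zeros_like(W)
    for x, y in task_batches.values():
        loss, grad = task_loss_and_grad(W, x, y)
        total_loss += loss               # combined loss = sum over tasks
        total_grad += grad
    return W - lr * total_grad, total_loss
```

Because each task contributes its own mean loss, a task with few images in the batch still carries the same weight in the combined loss as a task with many images.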

As can be seen from the above description, in each training iteration all of the eye images provided to the neural network form an eye image subset, and this subset includes eye images corresponding to each training task. Because the present disclosure calculates a separate loss for each training task, the neural network learns open-closed eye detection for every training task during training, and the learning of the different training tasks is balanced. The trained neural network can therefore improve the accuracy of open-closed eye detection simultaneously for eye images in each of the scenes corresponding to the training tasks, which improves the universality and generalization of neural-network-based open-closed eye detection across different scenes and better meets the practical requirements of multi-scene applications.

The A-Softmax loss function in this disclosure may be as shown in equation (1) below:

$$L_{ang} = \frac{1}{N}\sum_{i} -\log\left(\frac{e^{\|x_i\|\cos(m\theta_{y_i,i})}}{e^{\|x_i\|\cos(m\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\|\cos(\theta_{j,i})}}\right) \qquad (1)$$

In the above equation (1), $L_{ang}$ represents the loss corresponding to a training task; $N$ represents the number of eye images for the training task; $\|\cdot\|$ represents a modulus value; $x_i$ represents the i-th eye image corresponding to the training task; $y_i$ represents the annotation value of the i-th eye image corresponding to the training task; $m$ is a constant, and the minimum value of $m$ is usually not less than a predetermined value;

$\theta_{y_i,i}$ represents, for the i-th eye image, the included angle between the maximum probability value in the eye open-closed state detection result output by the neural network and the decision boundary (interface) corresponding to the annotation value, and $\theta_{j,i}$ is the corresponding angle for a class $j$ other than the annotated one;

$m\theta_{y_i,i}$ represents the product of $m$ and the above-mentioned angle.
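For readers who want to experiment with the A-Softmax loss, the following NumPy sketch computes it for one training task. It follows the commonly published formulation, in which the angle is measured between the feature vector of an image and the unit-norm weight vector of each class, and in which cos(mθ) is replaced by the monotonic extension ψ(θ) = (-1)^k cos(mθ) - 2k on [kπ/m, (k+1)π/m]. These choices, the value m = 4, and all names are assumptions and may differ from the exact form used in the disclosure.

```python
import numpy as np

def a_softmax_loss(features, weights, labels, m=4):
    """features: (N, D) non-zero rows; weights: (D, C); labels: (N,) class ids."""
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)  # unit class vectors
    x_norm = np.linalg.norm(features, axis=1)                     # ||x_i||
    cos_theta = np.clip((features @ w) / x_norm[:, None], -1.0, 1.0)
    idx = np.arange(len(labels))
    theta_y = np.arccos(cos_theta[idx, labels])     # angle to the annotated class
    k = np.floor(m * theta_y / np.pi)
    psi = (-1.0) ** k * np.cos(m * theta_y) - 2.0 * k   # monotonic cos(m*theta)
    scores = x_norm[:, None] * cos_theta
    scores[idx, labels] = x_norm * psi              # angular margin on the target
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    log_prob = scores[idx, labels] - np.log(np.exp(scores).sum(axis=1))
    return float(-log_prob.mean())
```

With m = 1 the function reduces to an ordinary softmax loss over normalized class weights; a larger m lowers the score of the annotated class and therefore demands a wider angular margin before the loss becomes small.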

In an optional example, the training process ends when the training of the open-closed eye detection neural network to be trained reaches a predetermined iteration condition. The predetermined iteration condition in the present disclosure may include: the difference between the eye open-closed state detection results that the neural network outputs for the eye images and the annotation information of those eye images meets a predetermined difference requirement. When the difference meets the predetermined difference requirement, the neural network is trained successfully. The predetermined iteration condition may also include: the number of eye images used to train the open-closed eye detection neural network reaches a predetermined number requirement, and the like. If the number of eye images used reaches the predetermined number requirement but the difference does not meet the predetermined difference requirement, the neural network is not trained successfully. A successfully trained neural network may be used for eye open-closed state detection processing.

In the present disclosure, a combined loss is formed from the losses of the different training tasks, and the network parameters of the open-closed eye detection neural network are adjusted using the combined loss. As a result, the neural network learns open-closed eye detection for every training task during training, and the learning of the different training tasks is balanced. The trained neural network can therefore improve the accuracy of open-closed eye detection simultaneously for eye images in each of the scenes corresponding to the training tasks, which improves the universality and generalization of neural-network-based open-closed eye detection across different scenes and better meets the practical requirements of multi-scene applications.

Fig. 3 is a flowchart of an embodiment of the eye-open/close state detection method of the present disclosure.

As shown in fig. 3, the method of this embodiment includes: step S300 and step S310. Each step in fig. 3 is described in detail below.

And S300, acquiring an image to be processed.

In an optional example, the image to be processed in the present disclosure may be a static image such as a picture or a photo, or may be a video frame in a dynamic video, for example, a video frame in a video captured by a camera device disposed on a moving object, or a video frame in a video captured by a camera device disposed at a fixed position. The moving object may be a vehicle, a robot, a robot arm, or the like. The fixed position may be a table top, a wall, or the like. The present disclosure does not limit the specific form of the moving object or the fixed position.

In an optional example, after the image to be processed is acquired, the present disclosure may detect the position area where the eyes are located in the image to be processed. For example, the bounding box of the eyes in the image to be processed may be determined by methods such as face detection or face key point detection. The present disclosure may then cut the eye region out of the image to be processed according to the bounding box, and the cut-out eye image block is provided to the neural network. Of course, the cut-out eye image block may be provided to the neural network after certain pre-processing. For example, the cut-out eye image block is scaled so that its size meets the neural network's size requirement for input images. For another example, after the eye image blocks of the two eyes of the target object are cut out, the eye image block of a predetermined side is mapped so that two eye image blocks of the same side are formed for the target object, and optionally, the two same-side eye image blocks may be scaled. The present disclosure does not limit the specific implementation of cutting an eye image block out of the image to be processed, nor the specific implementation of pre-processing the cut-out eye image block.

And S310, carrying out eye opening and closing state detection processing on the image to be processed through a neural network, and outputting eye opening and closing state detection results. The neural network in the present disclosure is obtained by successfully training using an embodiment of the neural network training method in the present disclosure.

In an optional example, the eye open-closed state detection result that the neural network outputs for an input eye image block may be at least one probability value, for example, a probability value representing that the eye is in the open state and a probability value representing that the eye is in the closed state. Both probability values may range from 0 to 1, and the sum of the two probability values for the same eye image block is 1. The closer the probability value representing the open state is to 1, the closer the eye in the eye image block is to the open state. The closer the probability value representing the closed state is to 1, the closer the eye in the eye image block is to the closed state.

In an optional example, the present disclosure may further evaluate the eye open-closed state detection results that are output by the neural network and have a time-series relationship, so as to determine the eye movement of the target object in the plurality of images to be processed having the time-series relationship, for example, a quick blinking movement, an eye opening and closing movement, an eye squinting movement, or the like.
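One simple way to derive an eye movement from time-ordered detection results is to count blinks, that is, open, closed, open transitions. The sketch below assumes that each frame is summarized by its open-state probability and that 0.5 is a reasonable decision threshold; both are illustrative assumptions, as is the function name.

```python
def count_blinks(open_probs, threshold=0.5):
    """Count open -> closed -> open transitions in a frame-ordered sequence."""
    states = [p >= threshold for p in open_probs]    # True means open
    blinks = 0
    closed_after_open = False
    for prev, cur in zip(states, states[1:]):
        if prev and not cur:
            closed_after_open = True                 # the eye has just closed
        elif not prev and cur and closed_after_open:
            blinks += 1                              # the eye reopened: one blink
            closed_after_open = False
    return blinks
```

A sequence that starts with the eye already closed is not counted as a blink until a full open, closed, open cycle has been observed.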

In an optional example, the present disclosure may determine the facial expression of the target object in the plurality of images to be processed having a time-series relationship, for example, smiling, laughing, crying, or a bitter expression, based on the eye open-closed state detection results that are output by the neural network and have a time-series relationship and on the states of other facial organs of the target object.

In an optional example, the present disclosure may further evaluate the eye open-closed state detection results that are output by the neural network and have a time-series relationship, so as to determine the fatigue state of the target object in the plurality of images to be processed having the time-series relationship, for example, light fatigue, dozing, sound sleep, or the like.

In an optional example, the present disclosure may further evaluate the eye open-closed state detection results that are output by the neural network and have a time-series relationship, so as to determine the eye movement of the target object in the plurality of images to be processed having the time-series relationship, and may then determine, at least according to the eye movement, the interaction control information expressed by the target object in those images.

In an optional example, the eye movements, facial expressions, fatigue states, interaction control information, and the like determined by the present disclosure may be used by a variety of applications. For example, a predetermined special effect in a live broadcast or rebroadcast is triggered, or a corresponding man-machine dialogue is carried out, by using a predetermined eye movement and/or facial expression of the target object, which helps enrich the ways in which applications can be implemented. For another example, in intelligent driving technology, detecting the fatigue state of the driver in real time helps prevent fatigue driving. The present disclosure does not limit the specific application of the eye open-closed state detection results output by the neural network.

Fig. 4 is a flowchart of an embodiment of the intelligent driving control method of the present disclosure. The intelligent driving control method of the present disclosure is applicable to an automatic driving environment and also to a cruising driving environment. The present disclosure does not limit the environment to which the intelligent driving control method is applicable.

As shown in fig. 4, the method of this embodiment includes: step S400, step S410, step S420, and step S430. The steps in fig. 4 will be described in detail below.

And S400, acquiring an image to be processed, which is acquired by a camera device arranged on the vehicle. The specific implementation of this step can be referred to the description of S300 in fig. 3 in the above method embodiment, and is not described in detail here.

And S410, carrying out eye opening and closing state detection processing on the image to be processed through a neural network, and outputting eye opening and closing state detection results. The neural network in this embodiment is obtained by successfully training using the implementation manner of the neural network training method described above. The specific implementation of this step can be referred to the description of the above method embodiment for S310 in fig. 3, and is not described in detail here.

And S420, determining the fatigue state of the target object at least according to the eye opening and closing state detection results of the plurality of images to be processed with the time sequence relationship, which belong to the same target object.

In one optional example, the target object in the present disclosure is typically a driver. Based on the eye open-closed state detection results that belong to the same target object and have a time-series relationship, the present disclosure may determine index parameters of the target object (e.g., the driver) such as the number of blinks per unit time, the duration of a single eye closure, or the duration of a single eye opening, and then evaluate those index parameters against predetermined index requirements to determine whether the target object (e.g., the driver) is in a fatigue state. The fatigue state in the present disclosure may include fatigue states of several different degrees, for example, a mild fatigue state, a moderate fatigue state, or a severe fatigue state. The present disclosure does not limit the specific implementation of determining the fatigue state of the target object.
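The index parameters mentioned above can be computed from a per-frame sequence of open or closed decisions. In the sketch below, the function names, the use of eye closures per minute as the per-unit-time index, and the threshold values are placeholders chosen for illustration; the disclosure does not specify the index requirements.

```python
def eye_indicators(open_flags, fps):
    """open_flags: per-frame booleans (True = open) for one target object."""
    closures, run = [], 0
    for is_open in open_flags:
        if is_open:
            if run:
                closures.append(run)     # a closure of `run` frames has ended
            run = 0
        else:
            run += 1
    if run:
        closures.append(run)             # sequence ended with the eye closed
    duration_min = len(open_flags) / fps / 60.0
    return {
        "closures_per_min": len(closures) / duration_min if duration_min else 0.0,
        "longest_closure_s": max(closures, default=0) / fps,
    }

def is_fatigued(ind, max_closures_per_min=25.0, max_closure_s=1.5):
    # Threshold values are placeholders, not figures from the disclosure.
    return (ind["closures_per_min"] > max_closures_per_min
            or ind["longest_closure_s"] > max_closure_s)
```

For a 6-second clip at 10 frames per second containing one 2-second closure, the indicators are 10 closures per minute and a longest closure of 2.0 seconds, which exceeds the placeholder closure limit.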

And S430, forming a corresponding instruction according to the fatigue state of the target object, and outputting the instruction.

In one optional example, the present disclosure may generate an instruction according to the fatigue state of the target object, and the instruction may include at least one of: an instruction to switch to an intelligent driving state, an instruction to issue a voice warning about fatigue driving, an instruction to wake the driver by vibration, an instruction to report dangerous driving information, and the like. The present disclosure does not limit the specific form of the instruction.
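How a fatigue state might be turned into instructions can be sketched as a lookup table. The state names, the instruction identifiers, and in particular the assignment of instructions to states are hypothetical; the text only lists the kinds of instruction that may be generated.

```python
def instructions_for(fatigue_state):
    """Map a fatigue state to a list of instructions (hypothetical mapping)."""
    table = {
        "none": [],
        "mild": ["voice_warning"],
        "moderate": ["voice_warning", "vibration_wake"],
        "severe": ["switch_to_intelligent_driving", "vibration_wake",
                   "report_dangerous_driving"],
    }
    if fatigue_state not in table:
        raise ValueError(f"unknown fatigue state: {fatigue_state!r}")
    return list(table[fatigue_state])    # copy so callers cannot edit the table
```

Keeping the mapping in a table makes it easy to adjust which instruction is issued at which degree of fatigue without touching the detection logic.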

Because the neural network successfully trained by the neural network training method of the present disclosure helps improve the accuracy of the eye open-closed state detection results, determining the fatigue state from the detection results output by the neural network helps improve the accuracy of fatigue state detection. Forming a corresponding instruction according to the detected fatigue state then helps avoid fatigue driving and improves driving safety.

Fig. 5 is a schematic structural diagram of an embodiment of a neural network training device according to the present disclosure. The neural network training device shown in fig. 5 includes: an open-closed eye detection neural network 500 to be trained, and an adjustment module 510. Optionally, the apparatus may further include: an input module 520.

The open-closed eye detection neural network 500 to be trained is configured to perform eye open-closed state detection processing on a plurality of eye images in an image set corresponding to each of at least two open-closed eye detection training tasks, and output an eye open-closed state detection result. The different image sets contain at least partially different eye images.

In an optional example, after successful training, the open-closed eye detection neural network 500 of the present disclosure may be used to perform eye open-closed state detection on an image to be processed and output the eye open-closed state detection result of that image. For example, for an image to be processed, the neural network 500 outputs two probability values. One probability value represents the probability that the eyes of the target object in the image to be processed are in the open state: the greater this probability value, the closer the eyes are to the open state. The other probability value represents the probability that the eyes of the target object in the image to be processed are in the closed state: the greater this probability value, the closer the eyes are to the closed state. The sum of the two probability values may be 1.

In one optional example, the neural network 500 in the present disclosure may be a convolutional neural network. The neural network 500 in the present disclosure may include, but is not limited to: convolutional layers, ReLU layers (which may also be referred to as activation layers), pooling layers, fully connected layers, and layers for classification (e.g., binary classification), etc. The greater the number of layers that the neural network 500 contains, the deeper the network. The present disclosure does not limit the specific structure of the neural network 500.

In an optional example, training the neural network 500 in the present disclosure involves at least two open-closed eye detection training tasks, and each open-closed eye detection training task belongs to the overall training task of enabling the neural network to perform eye open-closed state detection. The training targets corresponding to different open-closed eye detection training tasks are not exactly the same. That is, the present disclosure may divide the overall training task of the neural network 500 into a plurality of training tasks, each training task is directed to one training target, and the training targets corresponding to different training tasks are different.

In one optional example, the at least two open-closed eye detection training tasks in the present disclosure may include at least two of the following tasks: an open-closed eye detection task for the case where the eyes have attachments, an open-closed eye detection task for the case where the eyes have no attachments, an open-closed eye detection task in an indoor environment, an open-closed eye detection task in an outdoor environment, an open-closed eye detection task for the case where the eyes have attachments and the attachments have light spots, and an open-closed eye detection task for the case where the eyes have attachments and the attachments have no light spots. The attachment may be glasses or a transparent plastic sheet. The light spot may be a light spot formed on the attachment due to reflection of light by the attachment. For a detailed description of the above tasks, refer to the above method embodiments; it is not repeated here.

Optionally, the image sets corresponding to the above-mentioned six open-closed eye detection training tasks may respectively be: an eye image set in which the eyes have attachments, an eye image set in which the eyes have no attachments, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set in which the eyes have attachments and the attachments have light spots, and an eye image set in which the eyes have attachments and the attachments have no light spots. For a detailed description of the image sets listed above, refer to the above method embodiments; it is not repeated here.

In one alternative example, the eye image in the present disclosure may generally be: an eye image block cut out from an image including an eye captured by a camera. The process of forming the eye image in the present disclosure can be referred to the description of the above method embodiments, and will not be described in detail here.

In an optional example, the eye images used for training the open-closed eye detection neural network 500 in the present disclosure generally have annotation information, and the annotation information may indicate the open-closed state of the eyes in the eye image. Optionally, the annotation information may also indicate that the eyes in the eye image are in an uncertain open-closed state. However, the eye images used for training the neural network 500 generally do not include eye images annotated as being in an uncertain open-closed state, which helps avoid the influence of such eye images on the neural network 500 and helps improve the detection accuracy of the open-closed eye detection neural network 500.

The input module 520 is used to acquire a corresponding number of eye images from the different image sets and provide them to the open-closed eye detection neural network 500 to be trained. For example, the input module 520 acquires, according to a preset image number ratio for the different open-closed eye detection training tasks, a corresponding number of eye images from the image set corresponding to each open-closed eye detection training task, and provides the eye images to the open-closed eye detection neural network 500 to be trained. In addition, the input module 520 generally takes a preset batch processing number (batch size) into account when acquiring eye images. For example, in the case where the preset image number ratio is 1:1:1 for the open-closed eye detection training task a, the open-closed eye detection training task b, and the open-closed eye detection training task c, if the preset batch processing number is 600, the input module 520 may acquire 200 eye images from the eye image set corresponding to the open-closed eye detection training task a, 200 eye images from the eye image set corresponding to the open-closed eye detection training task b, and 200 eye images from the eye image set corresponding to the open-closed eye detection training task c.

Alternatively, if the number of eye images in the eye image set corresponding to a certain open-closed eye detection training task is less than the corresponding number (e.g., less than 200), the input module 520 may obtain a corresponding number of eye images from the eye image sets corresponding to other open-closed eye detection training tasks, so as to achieve the batch processing number. For example, assuming that only 100 eye images exist in the eye image set corresponding to the open-closed eye detection training task c, and the number of eye images in the eye image set corresponding to each of the open-closed eye detection training task a and the open-closed eye detection training task b exceeds 250, the input module 520 may acquire 250 eye images from the eye image set corresponding to the open-closed eye detection training task a, acquire 250 eye images from the eye image set corresponding to the open-closed eye detection training task b, and acquire 100 eye images from the eye image set corresponding to the open-closed eye detection training task c, so that the input module 520 acquires 600 eye images in total.
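The allocation described above can be illustrated with a minimal Python sketch. The function name `allocate_batch`, its parameters, and the rule of splitting a shortfall evenly among the tasks that still have spare images are assumptions made for illustration; the disclosure does not prescribe how the missing images are redistributed.

```python
from typing import Dict


def allocate_batch(set_sizes: Dict[str, int],
                   ratio: Dict[str, int],
                   batch_size: int) -> Dict[str, int]:
    """Return how many eye images to draw from each task's image set."""
    total_ratio = sum(ratio.values())
    # Initial quota from the preset ratio, capped by the size of each set.
    quota = {task: min(batch_size * ratio[task] // total_ratio, set_sizes[task])
             for task in ratio}
    shortfall = batch_size - sum(quota.values())
    # Assumed rule: share the shortfall evenly among tasks with spare images.
    while shortfall > 0:
        spare = [task for task in quota if set_sizes[task] > quota[task]]
        if not spare:
            break  # Not enough images overall; the batch stays smaller.
        share = max(shortfall // len(spare), 1)
        for task in spare:
            extra = min(share, set_sizes[task] - quota[task], shortfall)
            quota[task] += extra
            shortfall -= extra
            if shortfall == 0:
                break
    return quota


# Task c holds only 100 images, so tasks a and b each supply 50 more.
print(allocate_batch({"a": 1000, "b": 1000, "c": 100},
                     {"a": 1, "b": 1, "c": 1}, 600))
```

With the set sizes of the example above, the sketch yields 250, 250, and 100 images for tasks a, b, and c respectively, for a total of 600.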

It should be noted that the input module 520 may also use randomly set numbers when acquiring eye images from the eye image sets corresponding to the different training tasks. The present disclosure does not limit the specific implementation by which the input module 520 acquires a corresponding number of eye images from the eye image sets corresponding to the different training tasks. In addition, when acquiring eye images from an eye image set, the input module 520 should avoid acquiring eye images whose labeling information indicates the uncertain open/closed state, which helps improve the detection accuracy of the open-closed eye detection neural network.

In an optional example, the input module 520 may sequentially provide the acquired eye images to the open-closed eye detection neural network 500 to be trained, and the neural network 500 performs eye opening and closing state detection processing on each input eye image, so that it sequentially outputs an eye opening and closing state detection result for each eye image. For example, after an eye image input to the open-closed eye detection neural network 500 to be trained has been processed in turn by the convolutional layers, the fully connected layer, and the classification layer, the neural network 500 outputs two probability values, each in the range 0 to 1, whose sum is 1. One probability value corresponds to the open state: the closer it is to 1, the closer the eye in the eye image is to the open state. The other probability value corresponds to the closed state: the closer it is to 1, the closer the eye in the eye image is to the closed state.
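The two-value output can be illustrated with the following Python sketch. It assumes that the classification layer normalizes two raw scores with a softmax function, which the disclosure does not state explicitly; the function name `two_class_softmax` and the sample scores are hypothetical.

```python
import math
from typing import Tuple


def two_class_softmax(scores: Tuple[float, float]) -> Tuple[float, float]:
    """Turn raw (open, closed) scores into two probabilities that sum to 1."""
    peak = max(scores)
    # Subtracting the largest score keeps exp() numerically stable.
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return exps[0] / total, exps[1] / total


p_open, p_closed = two_class_softmax((2.0, 0.0))
print(p_open, p_closed)  # p_open is the larger value here
```

Because the two values always sum to 1, a result close to 1 for one state implies a result close to 0 for the other.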

The adjusting module 510 is configured to determine respective losses of the at least two eye opening and closing detection training tasks according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network 500, and adjust the network parameters of the neural network 500 according to the respective losses of the at least two eye opening and closing detection training tasks.

In an alternative example, the adjusting module 510 should determine a loss corresponding to each of the open-closed eye detection training tasks, and determine a combined loss according to the losses corresponding to all the training tasks, and the adjusting module 510 uses the combined loss to adjust the network parameters of the neural network. Network parameters in the present disclosure may include, but are not limited to: convolution kernel parameters and/or matrix weights, etc. The present disclosure does not limit the specifics of the network parameters involved.

In an optional example, for any open-closed eye detection training task, the adjusting module 510 may determine the loss corresponding to the training task according to the angle between the maximum probability value in the eye opening and closing state detection results, which the neural network outputs for the plurality of eye images in the image set corresponding to the training task, and the decision boundary (interface) corresponding to the labeling information of the corresponding eye image in the image set.

Optionally, the adjusting module 510 may use an A-Softmax (angular softmax) loss function to determine the respective losses of the different open-closed eye detection training tasks according to the eye opening and closing labeling information of the eye images and the eye opening and closing state detection results output by the neural network, and may determine a combined loss (such as the sum of the respective losses) from those losses. The adjusting module 510 then adjusts the network parameters of the neural network by using a stochastic gradient descent method. For example, the adjusting module 510 may calculate the loss corresponding to each open-closed eye detection training task by using the A-Softmax loss function and perform back propagation according to the sum of the losses corresponding to all the open-closed eye detection training tasks, so that the network parameters of the open-closed eye detection neural network 500 to be trained are updated along the descending gradient of the loss.
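The way the per-task losses are combined can be sketched in Python as follows. To stay short, the sketch replaces the angular-margin loss named above with a plain cross-entropy loss, so it illustrates only the summation over tasks, not the loss function of the disclosure. The names `task_loss` and `combined_loss`, the label encoding (0 for open, 1 for closed), and the sample values are assumptions.

```python
import math
from typing import Dict, List, Tuple

# ((p_open, p_closed), label), where label 0 means open and 1 means closed.
Sample = Tuple[Tuple[float, float], int]


def task_loss(samples: List[Sample]) -> float:
    """Mean cross-entropy of one task (a stand-in for the angular-margin loss)."""
    eps = 1e-12  # Guards against log(0).
    return -sum(math.log(max(probs[label], eps))
                for probs, label in samples) / len(samples)


def combined_loss(per_task: Dict[str, List[Sample]]) -> Tuple[Dict[str, float], float]:
    """Compute one loss per training task and their sum as the combined loss."""
    losses = {task: task_loss(samples) for task, samples in per_task.items()}
    return losses, sum(losses.values())


batch = {
    "task_a": [((0.9, 0.1), 0)],
    "task_b": [((0.2, 0.8), 1)],
}
losses, total = combined_loss(batch)
print(losses, total)
```

In a real training loop, the combined value would be the quantity that is back-propagated, so that a single parameter update reflects all training tasks at once.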

In an optional example, the adjusting module 510 may end the training process when the training of the open-closed eye detection neural network 500 reaches a predetermined iteration condition. The predetermined iteration condition in the present disclosure may include: the difference between the eye opening and closing state detection result that the neural network 500 outputs for an eye image and the labeling information of that eye image satisfies a predetermined difference requirement. When the difference satisfies the predetermined difference requirement, the current training of the neural network 500 is successful.

Optionally, the predetermined iteration condition used by the adjusting module 510 may also include: the number of eye images used for training the open-closed eye detection neural network reaches a predetermined number requirement, and the like. If the number of eye images used reaches the predetermined number requirement but the difference does not satisfy the predetermined difference requirement, the current training of the neural network 500 is not successful. A successfully trained neural network 500 may be used for eye opening and closing state detection processing.
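The two iteration conditions can be summarized in a short Python sketch. The function name `training_status`, its parameters, and the three return values are hypothetical, and expressing the difference requirement as an upper bound on a single number is an assumption.

```python
def training_status(difference: float, max_difference: float,
                    images_used: int, max_images: int) -> str:
    """Decide whether training has succeeded, has failed, or should continue."""
    if difference <= max_difference:
        return "success"   # The difference requirement is met.
    if images_used >= max_images:
        return "failed"    # Image budget used up without meeting it.
    return "continue"


print(training_status(0.02, 0.05, 120000, 500000))  # success
print(training_status(0.20, 0.05, 500000, 500000))  # failed
```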

Fig. 6 is a schematic structural diagram of an embodiment of the eye opening and closing state detection apparatus according to the present disclosure. As shown in fig. 6, the apparatus of this embodiment includes: an acquisition module 600 and a neural network 610. Optionally, the eye opening and closing state detection apparatus may further include: a determination module 620.

The acquisition module 600 is used for acquiring an image to be processed.

In an optional example, the image to be processed acquired by the acquiring module 600 may be an image presenting a static picture or a photo, or may be a video frame presenting a dynamic video, for example, a video frame in a video captured by a camera device disposed on a moving object, or a video frame in a video captured by a camera device disposed at a fixed position. The moving object may be a vehicle, a robot, or a robot arm. The fixed position can be a table top or a wall, etc.

In an optional example, after acquiring the image to be processed, the acquisition module 600 may detect the region where the eyes are located in the image to be processed. For example, the acquisition module 600 may determine the eye bounding box in the image to be processed by using face detection, face key point detection, or a similar method. The acquisition module 600 may then cut the eye region out of the image to be processed according to the eye bounding box, and the resulting eye image block is provided to the neural network 610. Of course, the acquisition module 600 may preprocess the eye image blocks before providing them to the neural network 610. For example, the acquisition module 600 scales the eye image blocks so that their size meets the input size requirement of the neural network 610. For another example, after the eye image blocks of both eyes of the target object have been cut out, the acquisition module 600 performs mapping (e.g., mirroring) processing on the eye image block of a predetermined side, so that both eye image blocks appear to belong to the same side of the target object; optionally, the acquisition module 600 may further scale these two same-side eye image blocks. The present disclosure does not limit the specific implementation by which the acquisition module 600 cuts the eye image blocks out of the image to be processed, nor the specific implementation by which it preprocesses the eye image blocks.
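The preprocessing steps above can be sketched in plain Python. The sketch assumes that the mapping processing is a horizontal mirror, that the eye region is already available as a rectangular box, and that scaling uses nearest-neighbour sampling; none of these choices is fixed by the disclosure, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # A grayscale image stored as rows of pixel values.


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out box = (top, left, bottom, right), with end-exclusive bounds."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def mirror(image: Image) -> Image:
    """Flip horizontally so that one eye looks like the eye of the other side."""
    return [row[::-1] for row in image]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Scale to the network input size with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def prepare_eye_patch(image: Image, box: Tuple[int, int, int, int],
                      out_size: Tuple[int, int], flip: bool) -> Image:
    """Crop the eye region, optionally mirror it, then scale it."""
    patch = crop(image, box)
    if flip:
        patch = mirror(patch)
    return resize_nearest(patch, out_size[0], out_size[1])


frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(prepare_eye_patch(frame, (0, 2, 2, 4), (2, 2), flip=True))
```

Mirroring one eye lets a single network see both eyes in the same orientation, which is one plausible reason for forming two same-side eye image blocks.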

The neural network 610 is configured to perform eye opening/closing state detection processing on an image to be processed, and output an eye opening/closing state detection result.

In an optional example, the eye opening and closing state detection result that the neural network 610 in the present disclosure outputs for an input eye image block may be at least one probability value, for example, a probability value representing that the eye is in the open state and a probability value representing that the eye is in the closed state. Both probability values may range from 0 to 1, and the two probability values for the same eye image block sum to 1. The closer the probability value for the open state is to 1, the closer the eye in the eye image block is to the open state. The closer the probability value for the closed state is to 1, the closer the eye in the eye image block is to the closed state.

The determining module 620 is configured to determine the eye movement and/or facial expression and/or fatigue state and/or interaction control information of the target object according to at least the eye opening and closing state detection result of the same target object in the multiple images to be processed having the time sequence relationship.

In an optional example, the eye movement of the target object may be, for example, a quick blinking movement, an eye opening or closing movement, or a squinting movement. The facial expression of the target object may be, for example, smiling, laughing, crying, or worrying. The fatigue state of the target object may be, for example, light fatigue, dozing, or deep sleep. The interaction control information expressed by the target object may be, for example, confirmation or denial.

Fig. 7 is a schematic structural diagram of an embodiment of the intelligent driving control device of the present disclosure. The apparatus in fig. 7 mainly comprises: an acquisition module 600, a neural network 610, a fatigue state determination module 700, and an instruction module 710.

The acquisition module 600 is configured to acquire an image to be processed that is captured by a camera device disposed on a vehicle.

The operations specifically performed by the acquisition module 600 and the neural network 610 can be referred to in the description of the above device embodiments. The description will not be repeated here.

The fatigue state determining module 700 is configured to determine the fatigue state of the target object at least according to the eye opening and closing state detection results of the same target object in the multiple images to be processed having the time sequence relationship.

In an optional example, the target object in the present disclosure is typically a driver. The fatigue state determining module 700 may determine index parameters of the target object (e.g., the driver), such as the number of blinks per unit time, the duration of a single eye closure, or the duration of a single eye opening, according to a plurality of eye opening and closing state detection results that belong to the same target object and have a time sequence relationship. The fatigue state determining module 700 then evaluates these index parameters against predetermined index requirements to determine whether the target object (e.g., the driver) is in a fatigue state. The fatigue state in the present disclosure may include fatigue states of a plurality of different degrees, for example, a mild fatigue state, a moderate fatigue state, or a severe fatigue state. The present disclosure does not limit the specific implementation by which the fatigue state determining module 700 determines the fatigue state of the target object.
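How index parameters might be derived from a time-ordered sequence of detection results can be sketched in Python as follows. The function names, the per-frame boolean input, the fixed frame interval, and the threshold values are all hypothetical; the disclosure leaves the index requirements open.

```python
from typing import Dict, List


def eye_metrics(closed_flags: List[bool], frame_interval_s: float) -> Dict[str, float]:
    """Summarize per-frame eye states (True means closed) in time order."""
    blinks = 0
    longest_closed = 0
    run = 0  # Length of the current run of consecutive closed frames.
    for closed in closed_flags:
        if closed:
            run += 1
            longest_closed = max(longest_closed, run)
        else:
            if run > 0:
                blinks += 1  # A closed run that ends in reopening is one blink.
            run = 0
    return {"blinks": blinks,
            "longest_closed_s": longest_closed * frame_interval_s}


def fatigue_level(metrics: Dict[str, float],
                  mild_s: float = 0.5, moderate_s: float = 2.0) -> str:
    """Map the longest single eye closure to a level using assumed thresholds."""
    longest = metrics["longest_closed_s"]
    if longest >= moderate_s:
        return "moderate"
    if longest >= mild_s:
        return "mild"
    return "not_fatigued"


flags = [False, True, False, False, True, True, True, False]
summary = eye_metrics(flags, frame_interval_s=0.1)
print(summary, fatigue_level(summary))
```

A practical system would likely combine several index parameters and smooth the per-frame probabilities before thresholding; the sketch uses a single parameter only to keep the logic visible.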

The instruction module 710 is configured to form a corresponding instruction according to the fatigue state of the target object, and output the instruction.

In an optional example, the instruction generated by the instruction module 710 according to the fatigue state of the target object may include at least one of: an instruction to switch to an intelligent driving state, a voice warning instruction for fatigue driving, a vibration instruction for waking the driver, an instruction to report dangerous driving information, and the like. The present disclosure does not limit the specific form of the instruction.

A neural network 610 successfully trained with the neural network training method of the present disclosure helps improve the accuracy of the eye opening and closing state detection result. Because the fatigue state determining module 700 judges the fatigue state from the eye opening and closing state detection result output by the neural network 610, the accuracy of fatigue state detection is improved as well. The instruction module 710 then forms a corresponding instruction according to the detected fatigue state, which helps avoid fatigue driving and thereby improves driving safety.

Exemplary device

Fig. 8 illustrates an exemplary device 800 suitable for implementing the present disclosure. The device 800 may be a control system/electronic system configured in an automobile, a mobile terminal (e.g., a smart mobile phone), a personal computer (PC, e.g., a desktop or laptop computer), a tablet computer, a server, and so forth. In fig. 8, the device 800 includes one or more processors, a communication portion, and the like. The one or more processors may be one or more central processing units (CPU) 801 and/or one or more acceleration units 813; the acceleration unit 813 may be a graphics processing unit (GPU) or the like. The processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 802 or loaded from a storage portion 808 into a random access memory (RAM) 803. The communication portion 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processor may communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, is connected to the communication portion 812 through the bus 804, and communicates with other target devices through the communication portion 812 to accomplish the corresponding steps in the present disclosure.

For the operations performed by the above instructions, reference may be made to the related description in the above method embodiments; they are not described in detail here. In addition, the RAM 803 may store various programs and data necessary for the operation of the device. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via the bus 804.

When the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and these instructions cause the central processing unit 801 to execute the steps included in the methods described above. An input/output (I/O) interface 805 is also connected to the bus 804. The communication portion 812 may be provided integrally, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) that are each connected to the bus.

The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a display device such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card, a modem, or the like. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read from it is installed in the storage portion 808 as necessary.

It should be particularly noted that the architecture shown in fig. 8 is only an optional implementation manner, and in a specific practical process, the number and types of the components in fig. 8 may be selected, deleted, added or replaced according to actual needs; in different functional component arrangements, implementation manners such as a separate arrangement or an integrated arrangement may also be adopted, for example, the acceleration unit 813 and the CPU801 may be separately provided, further, for example, the acceleration unit 813 may be integrated on the CPU801, the communication portion may be separately provided, or may be integrally provided on the CPU801 or the acceleration unit 813, and the like. These alternative embodiments are all within the scope of the present disclosure.

In particular, according to embodiments of the present disclosure, the processes described below with reference to the flowcharts may be implemented as a computer software program, for example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the steps illustrated in the flowcharts, the program code may include instructions corresponding to performing the steps in the methods provided by the present disclosure.

In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the Central Processing Unit (CPU)801, the instructions described in the present disclosure to realize the respective steps described above are executed.

In one or more optional embodiments, the present disclosure also provides a computer program product storing computer readable instructions that, when executed, cause a computer to perform the neural network training method or the eye-open/close state detection method or the smart driving control method described in any of the above embodiments.

The computer program product may be implemented in hardware, software, or a combination thereof. In one alternative, the computer program product is embodied as a computer storage medium; in another alternative, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

In one or more optional embodiments, the embodiments of the present disclosure further provide another neural network training method, eye opening and closing state detection method, and intelligent driving control method, together with corresponding apparatuses, electronic devices, computer storage media, computer programs, and computer program products, wherein the method includes: a first device sending a neural network training indication, an eye opening and closing state detection indication, or an intelligent driving control indication to a second device, the indication causing the second device to execute the neural network training method, the eye opening and closing state detection method, or the intelligent driving control method in any of the above possible embodiments; and the first device receiving a neural network training result, an eye opening and closing state detection result, or an intelligent driving control result sent by the second device.

In some embodiments, the neural network training indication, the eye opening and closing state detection indication, or the intelligent driving control indication may specifically be a call instruction. The first device may instruct the second device, by way of a call, to perform the neural network training operation, the eye opening and closing state detection operation, or the intelligent driving control operation. Accordingly, in response to receiving the call instruction, the second device may perform the steps and/or processes in any embodiment of the above neural network training method, eye opening and closing state detection method, or intelligent driving control method.

It is to be understood that the terms "first," "second," and the like in the embodiments of the present disclosure are used for distinguishing and not limiting the embodiments of the present disclosure. It is also understood that in the present disclosure, "plurality" may refer to two or more and "at least one" may refer to one, two or more. It is also to be understood that any reference to any component, data, or structure in this disclosure is generally to be construed as one or more, unless explicitly stated otherwise or indicated to the contrary hereinafter. It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

The methods and apparatus, electronic devices, and computer-readable storage media of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus, the electronic devices, and the computer-readable storage media of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A neural network training method, comprising:

carrying out eye opening and closing state detection processing on a plurality of eye images in an image set corresponding to at least two eye opening and closing detection training tasks respectively through an eye opening and closing detection neural network to be trained, and outputting eye opening and closing state detection results; wherein the eye images contained in the different image sets are at least partially different;

and determining losses corresponding to the at least two eye opening and closing detection training tasks respectively according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting network parameters of the neural network according to the losses corresponding to the at least two eye opening and closing detection training tasks respectively.

2. The method of claim 1, wherein:

the at least two open-closed eye detection training tasks include at least two of the following tasks: an open-closed eye detection task for a case where the eyes have attachments, an open-closed eye detection task for a case where the eyes have no attachments, an open-closed eye detection task in an indoor environment, an open-closed eye detection task in an outdoor environment, an open-closed eye detection task for a case where the eyes have attachments and the attachments have light spots, and an open-closed eye detection task for a case where the eyes have attachments and the attachments have no light spots;

the image sets corresponding to the at least two open-closed eye detection training tasks respectively comprise at least two of the following image sets: an eye image set in which the eyes have attachments, an eye image set in which the eyes have no attachments, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set in which the eyes have attachments and the attachments have light spots, and an eye image set in which the eyes have attachments and the attachments have no light spots.

3. An eye opening/closing state detection method, comprising:

acquiring an image to be processed;

carrying out eye opening and closing state detection processing on the image to be processed through a neural network, and outputting eye opening and closing state detection results;

wherein the neural network is obtained by training using the method of claim 1 or 2.

4. An intelligent driving control method, comprising:

acquiring an image to be processed, which is acquired by a camera device arranged on a vehicle;

determining the fatigue state of the target object at least according to the eye opening and closing state detection results of the same target object in a plurality of images to be processed with time sequence relation;

forming a corresponding instruction according to the fatigue state of the target object, and outputting the instruction;

5. A neural network training device, comprising:

the neural network for detecting the open/close eyes to be trained is used for respectively carrying out eye open/close state detection processing on a plurality of eye images in an image set corresponding to at least two eye open/close detection training tasks and outputting eye open/close state detection results; wherein the eye images contained in the different image sets are at least partially different;

and the adjusting module is used for respectively determining the loss corresponding to the at least two eye opening and closing detection training tasks according to the eye opening and closing labeling information of the eye image and the eye opening and closing state detection result output by the neural network, and adjusting the network parameters of the neural network according to the loss corresponding to the at least two eye opening and closing detection training tasks.

6. An eye opening/closing state detection device, comprising:

the acquisition module is used for acquiring an image to be processed;

the neural network is used for carrying out eye opening and closing state detection processing on the image to be processed and outputting eye opening and closing state detection results;

wherein the neural network is obtained by training with the device of claim 5.

7. An intelligent driving control device, comprising:

the acquisition module is used for acquiring an image to be processed, which is acquired by a camera device arranged on a vehicle;

the fatigue state determining module is used for determining the fatigue state of the target object at least according to the eye opening and closing state detection results of the same target object in the images to be processed with the time sequence relationship;

the instruction module is used for forming a corresponding instruction according to the fatigue state of the target object and outputting the instruction;

wherein the neural network is obtained by training with the device of claim 5.

8. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing a computer program stored in the memory, and which, when executed, implements the method of any of the preceding claims 1-4.

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 4.

10. A computer program comprising computer instructions for implementing the method of any of claims 1-4 when said computer instructions are run in a processor of a device.