WO2020173135A1

WO2020173135A1 - Neural network training and eye opening and closing state detection method, apparatus, and device

Info

Publication number: WO2020173135A1
Application number: PCT/CN2019/118127
Authority: WO
Inventors: 王飞; 钱晨
Original assignee: 北京市商汤科技开发有限公司
Priority date: 2019-02-28
Filing date: 2019-11-13
Publication date: 2020-09-03
Also published as: JP2022517398A; JP7227385B2; KR20210113621A; CN111626087A

Abstract

Disclosed in the embodiments of the present disclosure are a neural network training method, an eye opening and closing state detection method, a smart driving control method, an apparatus, an electronic device, a computer readable storage medium, and a computer program, the neural network training method comprising: by means of a neural network to be trained for eye opening and closing state detection, respectively performing eye opening and closing state detection processing on multiple eye images in image sets corresponding to each of at least two eye opening and closing detection training tasks, and outputting eye opening and closing state detection results, the eye images contained in different image sets being at least partially different; on the basis of eye opening and closing label information of the eye images and the eye opening and closing state detection results outputted by the neural network, respectively determining the loss corresponding to each of the at least two eye opening and closing detection training tasks and, on the basis of the loss corresponding to each of the at least two eye opening and closing detection training tasks, adjusting the network parameters of the neural network.

Description

Neural network training and eye open and closed state detection method, device and equipment

This disclosure claims priority to Chinese patent application No. 201910153463.4, filed with the Chinese Patent Office on February 28, 2019 and entitled "Neural Network Training and Eye Open and Closed State Detection Method, Apparatus, and Equipment", the entire contents of which are incorporated in this disclosure by reference.

Technical field

The present disclosure relates to computer vision technology, and in particular to a neural network training method, a neural network training device, an eye open and closed state detection method, an eye open and closed state detection device, an intelligent driving control method, an intelligent driving control device, electronic equipment, a computer-readable storage medium, and a computer program.

Background

Eye open and closed state detection determines whether the eyes are open or closed. It can be used in fields such as fatigue monitoring, living body recognition, and facial expression recognition. For example, in assisted driving technology, the driver's eye open and closed state needs to be detected, and whether the driver is driving while fatigued is determined based on the detection result, so as to realize fatigue driving monitoring. Accurately detecting the eye open and closed state, and avoiding misjudgment as far as possible, helps improve the safety of vehicle driving.

Summary of the invention

The embodiments of the present disclosure provide a technical solution for neural network training, eye open and closed state detection, and intelligent driving control.

According to one aspect of the embodiments of the present disclosure, a neural network training method is provided, which includes: performing, via a to-be-trained neural network for open and closed eye detection, eye open and closed state detection processing on multiple eye images in the image set corresponding to each of at least two open and closed eye detection training tasks, and outputting eye open and closed state detection results, wherein the eye images contained in different image sets are at least partially different; determining, according to the eye open and closed annotation information of the eye images and the eye open and closed state detection results output by the neural network, the loss corresponding to each of the at least two open and closed eye detection training tasks; and adjusting the network parameters of the neural network according to the losses corresponding to the at least two open and closed eye detection training tasks.

According to another aspect of the embodiments of the present disclosure, there is provided a method for detecting the open and closed state of eyes, including: acquiring an image to be processed; and performing eye open and closed state detection processing on the image to be processed through a neural network, and outputting an eye open and closed state detection result; wherein the neural network is obtained by training using the neural network training method described in the foregoing embodiment.

According to another aspect of the embodiments of the present disclosure, there is provided an intelligent driving control method, including: acquiring an image to be processed that is collected by a camera device installed on a vehicle; performing eye open and closed state detection processing on the image to be processed via a neural network, and outputting an eye open and closed state detection result; determining the fatigue state of a target object at least according to the eye open and closed state detection results belonging to the same target object in multiple images to be processed that have a time-sequence relationship; and forming a corresponding instruction according to the fatigue state of the target object, and outputting the instruction; wherein the neural network is obtained by training using the neural network training method described in the foregoing embodiment.

According to still another aspect of the embodiments of the present disclosure, a neural network training device is provided, which includes: a to-be-trained neural network for open and closed eye detection, configured to perform eye open and closed state detection processing on multiple eye images in the image set corresponding to each of at least two open and closed eye detection training tasks, and to output eye open and closed state detection results, wherein the eye images contained in different image sets are at least partially different; and an adjustment module, configured to determine, according to the eye open and closed annotation information of the eye images and the eye open and closed state detection results output by the neural network, the loss corresponding to each of the at least two open and closed eye detection training tasks, and to adjust the network parameters of the neural network according to the losses corresponding to the at least two open and closed eye detection training tasks.

According to still another aspect of the embodiments of the present disclosure, there is provided an eye open and closed state detection device, including: an acquisition module, configured to acquire an image to be processed; and a neural network, configured to perform eye open and closed state detection processing on the image to be processed and to output an eye open and closed state detection result; wherein the neural network is obtained by training using the neural network training device described in the foregoing embodiment.

According to another aspect of the embodiments of the present disclosure, there is provided an intelligent driving control device, including: an acquisition module, configured to acquire an image to be processed that is collected by a camera device installed on a vehicle; a neural network, configured to perform eye open and closed state detection processing on the image to be processed and to output an eye open and closed state detection result; a fatigue state determination module, configured to determine the fatigue state of a target object at least according to the eye open and closed state detection results belonging to the same target object in multiple images to be processed that have a time-sequence relationship; and an instruction module, configured to form a corresponding instruction according to the fatigue state of the target object and to output the instruction; wherein the neural network is obtained by training using the neural network training device described in the foregoing embodiment.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein, when the computer program is executed, any method embodiment of the present disclosure is implemented.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, it implements any method embodiment of the present disclosure.

According to another aspect of the embodiments of the present disclosure, there is provided a computer program, including computer instructions, which, when the computer instructions run in a processor of the device, implement any method implementation of the present disclosure.

In practicing the embodiments of the present disclosure, the inventors found that a neural network trained in the traditional single-task manner, on the image set of one task, tends to detect open and closed eyes accurately in the scene corresponding to that task, but its accuracy is difficult to guarantee in other scenes not corresponding to the task. If images collected from multiple different scenes are simply merged into one image set for neural network training, without distinguishing which scene or which training task each image belongs to, the distribution of the image subset (batch) fed to the neural network in each iteration is uncontrollable: a batch may contain many images from one scene and few or none from the others, and the distribution differs from iteration to iteration. In other words, the distribution of the image subset in each iteration is too random, and no targeted loss is calculated for the different training tasks. The training process therefore cannot control how well the neural network balances the different training tasks, and the accuracy of the trained neural network's open and closed eye detection in the different scenes corresponding to the different tasks cannot be guaranteed.

With the neural network training method and device, eye open and closed state detection method and device, intelligent driving control method and device, electronic equipment, computer-readable storage medium, and computer program provided by the present disclosure, a corresponding image set is determined for each of multiple different open and closed eye detection tasks, the eye images used for a single training iteration of the neural network are drawn from the multiple image sets, the loss of the neural network's open and closed eye detection results is determined for each training task according to the eye images from the multiple image sets, and the network parameters of the neural network are adjusted according to each loss. As a result, the eye image subset fed to the neural network in each training iteration includes eye images corresponding to each training task, and the loss of each training task is calculated in a targeted manner. The neural network thus learns open and closed eye detection for every training task during training and balances the learning of the different training tasks, so that the trained neural network can simultaneously improve the accuracy of open and closed eye detection on eye images from each of the scenes corresponding to the training tasks. This helps improve the universality and generalization of neural-network-based open and closed eye detection across different scenes, and helps better meet the practical application requirements of multiple scenes.
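As an illustration only, the per-task loss idea described above can be sketched as follows. The sketch assumes a binary cross-entropy loss, a label convention of 1 for open and 0 for closed, and an equally weighted sum of the task losses; the disclosure does not fix any of these choices here, and the names `cross_entropy`, `per_task_losses`, and `total_loss` are hypothetical.

```python
import math

def cross_entropy(p_open, label):
    """Loss for one eye image (assumed binary cross-entropy; label 1 = open, 0 = closed)."""
    eps = 1e-12  # guard against log(0)
    p = p_open if label == 1 else 1.0 - p_open
    return -math.log(max(p, eps))

def per_task_losses(batch):
    """batch: list of (task_name, predicted_open_probability, label).
    Returns the mean loss of each training task present in the batch."""
    sums, counts = {}, {}
    for task, p_open, label in batch:
        sums[task] = sums.get(task, 0.0) + cross_entropy(p_open, label)
        counts[task] = counts.get(task, 0) + 1
    return {task: sums[task] / counts[task] for task in sums}

def total_loss(task_losses, weights=None):
    """Combine the task losses; equal weights are an assumption, not from the source."""
    weights = weights or {}
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())

# One batch that mixes eye images from three hypothetical training tasks.
batch = [("indoor", 0.9, 1), ("indoor", 0.2, 0), ("outdoor", 0.6, 1), ("glare", 0.3, 0)]
losses = per_task_losses(batch)
combined = total_loss(losses)
```

Because every task contributes its own term to `combined`, a task with few images in the batch is not drowned out by a task with many, which is the balancing effect the paragraph above describes.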

The technical solutions of the present disclosure will be further described in detail below through the drawings and embodiments.

Description of the drawings

The drawings constituting a part of the specification describe the embodiments of the present disclosure, and together with the description, serve to explain the principle of the present disclosure.

With reference to the accompanying drawings, the present disclosure can be understood more clearly according to the following detailed description, in which:

FIG. 1 is a flowchart of an embodiment of the neural network training method of the present disclosure;

FIG. 2 is a schematic diagram of an embodiment of multiple open and closed eye detection training tasks in the present disclosure;

FIG. 3 is a flowchart of an embodiment of the method for detecting the open and closed state of the eyes of the present disclosure;

FIG. 4 is a flowchart of an embodiment of the intelligent driving control method of the present disclosure;

FIG. 5 is a schematic structural diagram of an embodiment of the neural network training device of the present disclosure;

FIG. 6 is a schematic structural diagram of an embodiment of the eye open/close state detection device of the present disclosure;

FIG. 7 is a schematic structural diagram of an embodiment of the intelligent driving control device of the present disclosure;

FIG. 8 is a block diagram of an exemplary device for implementing the embodiments of the present disclosure.

Detailed description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that unless specifically stated otherwise, the relative arrangement, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present disclosure.

At the same time, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn in accordance with actual proportional relationships.

The following description of at least one exemplary embodiment is merely illustrative, and in no way serves as any limitation to the present disclosure or its application or use.

The technologies, methods, and equipment known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, the technologies, methods, and equipment should be regarded as part of the specification.

It should be noted that similar reference numerals and letters indicate similar items in the following drawings, and therefore, once an item is defined in one drawing, it does not need to be discussed further in subsequent drawings.

The embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with many other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, large computer systems, and distributed cloud computing technology environments including any of the above systems.

Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules can include routines, programs, object programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, in which tasks are executed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on storage media of local or remote computing systems that include storage devices.

Exemplary embodiment

FIG. 1 is a flowchart of an embodiment of the neural network training method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes steps S100 and S110. Each step in FIG. 1 is described in detail below.

S100. Via the to-be-trained neural network for open and closed eye detection, perform eye open and closed state detection processing on multiple eye images in the image set corresponding to each of at least two open and closed eye detection training tasks, and output eye open and closed state detection results.

In an optional example, after being successfully trained, the to-be-trained neural network for open and closed eye detection of the present disclosure can be used to detect the eye open and closed state of an image to be processed and to output the corresponding detection result. For example, for an image to be processed, the neural network outputs two probability values. One probability value represents the probability that the eyes of the target object in the image to be processed are in the open state; the larger this value, the closer the eyes are to the open state. The other probability value represents the probability that the eyes of the target object in the image to be processed are in the closed state; the larger this value, the closer the eyes are to the closed state. The sum of the two probability values can be 1.
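One common way to obtain two probability values that sum to 1 is a softmax over two raw scores. The following is a minimal sketch under that assumption; the disclosure does not state that softmax is used, and the function name and the example scores are hypothetical.

```python
import math

def two_class_probabilities(score_open, score_closed):
    """Turn two raw network scores into (p_open, p_closed) with p_open + p_closed = 1.
    Softmax is an assumption here, not something the source specifies."""
    m = max(score_open, score_closed)  # subtract the maximum for numerical stability
    e_open = math.exp(score_open - m)
    e_closed = math.exp(score_closed - m)
    total = e_open + e_closed
    return e_open / total, e_closed / total

p_open, p_closed = two_class_probabilities(2.0, -1.0)
# The larger probability decides the reported state.
state = "open" if p_open >= p_closed else "closed"
```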

In an optional example, the neural network in the present disclosure may be a convolutional neural network. The neural network in the present disclosure may include, but is not limited to: a convolutional layer, a ReLU (Rectified Linear Unit) layer (also called an activation layer), a pooling layer, a fully connected layer, and a layer for classification (such as binary classification). The more layers the neural network contains, the deeper the network. The present disclosure does not limit the specific structure of the neural network.

In an optional example, the process of training the neural network in the present disclosure involves at least two open and closed eye detection training tasks, and each open and closed eye detection training task should be part of the overall training task that enables the neural network to detect open and closed eyes. The training targets corresponding to different open and closed eye detection training tasks are not exactly the same. That is to say, the present disclosure can divide the overall training task of the neural network into multiple training tasks, where each training task is aimed at one type of training target and different training tasks correspond to different training targets.

In an optional example, the at least two open and closed eye detection training tasks in the present disclosure may include at least two of the following tasks: an open and closed eye detection task for the case where the eye has an attachment; an open and closed eye detection task for the case where the eye has no attachment; an open and closed eye detection task in an indoor environment; an open and closed eye detection task in an outdoor environment; an open and closed eye detection task for the case where the eye has an attachment and there is a light spot on the attachment; and an open and closed eye detection task for the case where the eye has an attachment and there is no light spot on the attachment. The attachment may be glasses or a transparent plastic sheet. The light spot may be formed on the attachment by light reflected from the attachment. The glasses in the present disclosure generally refer to glasses through whose lenses the wearer's eyes can be seen.

Optionally, the open and closed eye detection task for the case where the eye has an attachment may be an open and closed eye detection task with glasses. The open and closed eye detection task with glasses can enable at least one of: open and closed eye detection indoors with glasses, and open and closed eye detection outdoors with glasses.

Optionally, the open and closed eye detection task for the case where the eye has no attachment may be an open and closed eye detection task without glasses. The open and closed eye detection task without glasses can enable at least one of: open and closed eye detection indoors without glasses, and open and closed eye detection outdoors without glasses.

Optionally, the open and closed eye detection task in an indoor environment can enable at least one of: open and closed eye detection indoors without glasses, open and closed eye detection indoors with glasses that reflect light, and open and closed eye detection indoors with glasses that do not reflect light.

Optionally, the open and closed eye detection task in an outdoor environment can enable at least one of: open and closed eye detection outdoors without glasses, open and closed eye detection outdoors with glasses that reflect light, and open and closed eye detection outdoors with glasses that do not reflect light.

Optionally, the open and closed eye detection task for the case where the eye has an attachment and there is a light spot on the attachment may be an open and closed eye detection task with glasses that reflect light. This task can enable at least one of: open and closed eye detection indoors with glasses that reflect light, and open and closed eye detection outdoors with glasses that reflect light.

Optionally, the open and closed eye detection task for the case where the eye has an attachment and there is no light spot on the attachment may be an open and closed eye detection task with glasses that do not reflect light. This task can enable at least one of: open and closed eye detection indoors with glasses that do not reflect light, and open and closed eye detection outdoors with glasses that do not reflect light.

It can be seen from the above description that different open and closed eye detection training tasks in the present disclosure may intersect. For example, the open and closed eye detection task with glasses may intersect with each of the following: the open and closed eye detection task in an indoor environment, the open and closed eye detection task in an outdoor environment, the open and closed eye detection task for the case where the eye has an attachment and there is a light spot on the attachment, and the open and closed eye detection task for the case where the eye has an attachment and there is no light spot on the attachment. The intersections among the six open and closed eye detection training tasks mentioned above are not described one by one here. In addition, the present disclosure limits neither the number of open and closed eye detection training tasks, which can be determined according to actual needs, nor the specific form of any open and closed eye detection training task.

Optionally, as shown in FIG. 2, the at least two open and closed eye detection training tasks in the present disclosure may include the following three open and closed eye detection training tasks:

Open and closed eye detection training task a: the open and closed eye detection task in an indoor environment;

Open and closed eye detection training task b: the open and closed eye detection task in an outdoor environment;

Open and closed eye detection training task c: the open and closed eye detection task for the case where the eye has an attachment and there is a light spot on the attachment.

There is no intersection between the open and closed eye detection training task a and the open and closed eye detection training task b, there can be an intersection between the training task a and the training task c, and there can be an intersection between the training task b and the training task c.

In an optional example, each of the at least two open and closed eye detection training tasks in the present disclosure corresponds to an image set. For example, open and closed eye detection training task a, open and closed eye detection training task b, and open and closed eye detection training task c in FIG. 2 each correspond to an image set. Each image set usually includes multiple eye images. The eye images contained in different image sets are at least partially different. That is, for any image set, at least part of the eye images in the image set will not appear in other image sets. Optionally, the eye images contained in different image sets may have an intersection.
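The relationship between training tasks a, b, and c and their image sets can be sketched with ordinary sets. The metadata fields (`environment`, `glasses`, `glare`), the image identifiers, and the function name are hypothetical and serve only to make the intersections concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EyeImage:
    image_id: str
    environment: str  # "indoor" or "outdoor"
    glasses: bool     # the eye has an attachment
    glare: bool       # there is a light spot on the attachment

# Membership rule of each training task (a: indoor, b: outdoor, c: attachment with light spot).
TASKS = {
    "a": lambda im: im.environment == "indoor",
    "b": lambda im: im.environment == "outdoor",
    "c": lambda im: im.glasses and im.glare,
}

def build_image_sets(images):
    """Return one set of image ids per training task."""
    return {name: {im.image_id for im in images if rule(im)}
            for name, rule in TASKS.items()}

images = [
    EyeImage("img1", "indoor", False, False),
    EyeImage("img2", "indoor", True, True),
    EyeImage("img3", "outdoor", True, True),
    EyeImage("img4", "outdoor", True, False),
]
sets = build_image_sets(images)

# Tasks a and b never share an image; task c overlaps with both.
a_and_b = sets["a"] & sets["b"]
a_and_c = sets["a"] & sets["c"]
# Every image set holds at least one image that a given other set lacks.
partially_different = all(sets[x] - sets[y] for x in sets for y in sets if x != y)
```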

Optionally, the image sets corresponding to the six open and closed eye detection training tasks mentioned above can be, respectively: an eye image set in which the eyes have attachments, an eye image set in which the eyes have no attachments, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set in which the eyes have attachments and there are light spots on the attachments, and an eye image set in which the eyes have attachments and there are no light spots on the attachments.

Optionally, all eye images in the eye image set in which the eyes have attachments may be eye images with glasses. For example, the eye image set may include eye images with glasses collected in an indoor environment and eye images with glasses collected in an outdoor environment.

Optionally, all eye images in the eye image set in which the eyes have no attachments may be eye images without glasses. For example, the eye image set may include eye images without glasses collected in an indoor environment and eye images without glasses collected in an outdoor environment.

Optionally, the set of eye images collected in an indoor environment may include: eye images without glasses collected in an indoor environment, and eye images with glasses collected in an indoor environment.

Optionally, the set of eye images collected in an outdoor environment may include: eye images without glasses collected in an outdoor environment, and eye images with glasses collected in an outdoor environment.

Optionally, all eye images in the eye image set in which the eyes have attachments and there are light spots on the attachments may be eye images with glasses and light spots on the glasses. For example, the eye image set may include eye images with glasses and light spots on the glasses collected in an indoor environment, and eye images with glasses and light spots on the glasses collected in an outdoor environment.

Optionally, all eye images in the eye image set in which the eyes have attachments and there are no light spots on the attachments may be eye images with glasses and no light spots on the glasses. For example, the eye image set may include eye images with glasses and no light spots on the glasses collected in an indoor environment, and eye images with glasses and no light spots on the glasses collected in an outdoor environment.

In an optional example, the image set included in the present disclosure is determined by the open and closed eye detection training task included in the present disclosure. For example, if the present disclosure includes at least two of the above-mentioned six open and closed eye detection training tasks, the present disclosure includes respective eye image sets corresponding to the at least two open and closed eye detection training tasks.

In an optional example, the eye image used in the neural network training process of the present disclosure may also be called an eye image sample, and the image content of the eye image sample usually includes an eye. The eye image sample in the present disclosure is usually a monocular eye image sample, that is, its image content includes one eye rather than two. Optionally, the eye image sample may be based on the eye of a single side, for example, an eye image sample based on the left eye. Of course, the present disclosure does not exclude the case where the eye image sample is based on both eyes or on the eye of either side.

In an optional example, the eye image in the present disclosure may generally be an eye image block cut out from an image, captured by a camera device, that contains the eye. For example, the process of forming an eye image in the present disclosure may include: performing eye detection on the image captured by the camera device to determine the eye part in the image, and then segmenting the detected eye part from the image. Optionally, the present disclosure can apply processing such as scaling and/or image content mapping (for example, converting a right-eye image block into a left-eye image block through image content mapping) to the segmented image block, thereby forming an eye image for training the neural network for open and closed eye detection. Of course, the present disclosure does not rule out using the complete image containing the eye captured by the camera device as the eye image. In addition, the eye image in the present disclosure may be an eye image in the corresponding training sample set.
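The cropping and mapping steps above can be sketched on an image stored as a list of pixel rows. The sketch assumes that the image content mapping from a right-eye block to a left-eye block is a horizontal mirror, which the disclosure does not state; the eye box is taken as given from some eye detector, scaling is omitted, and all function names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangular block out of an image stored as a list of pixel rows."""
    return [row[left:left + width] for row in image[top:top + height]]

def mirror_horizontally(block):
    """Flip each pixel row (assumed form of the right-eye to left-eye mapping)."""
    return [row[::-1] for row in block]

def make_eye_sample(image, box, side):
    """box = (top, left, height, width) from a hypothetical eye detector;
    side is "left" or "right". Right-eye blocks are mirrored so that every
    training sample looks like a left eye."""
    block = crop(image, *box)
    if side == "right":
        block = mirror_horizontally(block)
    return block

# A tiny 4x6 grayscale image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
right_sample = make_eye_sample(image, (1, 2, 2, 3), "right")
left_sample = make_eye_sample(image, (1, 2, 2, 3), "left")
```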

In an optional example, the eye image used for training the neural network for detecting open and closed eyes in the present disclosure usually has annotation information, and the annotation information may indicate the open and closed state of the eyes in the eye image. In other words, the annotation information can indicate whether the eyes in the eye image are in an open state or a closed state. As an alternative example, the label information of the eye image is 1, which means that the eyes in the eye image are in the open state, and the label information of the eye image is 0, which means that the eyes in the eye image are in the closed state.

In an optional example, the present disclosure usually obtains a corresponding number of eye images from each of the eye image sets corresponding to the different training tasks. For example, in FIG. 2, a corresponding number of eye images obtained from the image set corresponding to open and closed eye detection training task a, a corresponding number of eye images obtained from the image set corresponding to open and closed eye detection training task b, and a corresponding number of eye images obtained from the image set corresponding to open and closed eye detection training task c are all provided to the to-be-trained neural network for open and closed eye detection.

As an optional example, the present disclosure may obtain a corresponding number of eye images from the eye image set corresponding to each training task according to a preset image number ratio for the different training tasks; in addition, the preset batch size is usually taken into account when obtaining eye images. For example, if the preset image number ratio for open and closed eye detection training task a, open and closed eye detection training task b, and open and closed eye detection training task c is 1:1:1 and the preset batch size is 600, the present disclosure can obtain 200 eye images from the eye image set corresponding to training task a, 200 eye images from the eye image set corresponding to training task b, and 200 eye images from the eye image set corresponding to training task c.
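The ratio-based split can be sketched as a small helper. The function name `batch_quotas` and the rule for handing out any rounding remainder are assumptions made for illustration; the ratio weights are assumed to be positive integers.

```python
def batch_quotas(ratio, batch_size):
    """Split batch_size across training tasks in proportion to ratio
    (a dict mapping task name to a positive integer weight)."""
    total = sum(ratio.values())
    quotas = {task: batch_size * weight // total for task, weight in ratio.items()}
    # Integer division may leave a few images unassigned; give one extra image
    # to the first tasks in order (an arbitrary choice, not from the source).
    remainder = batch_size - sum(quotas.values())
    for task in list(ratio)[:remainder]:
        quotas[task] += 1
    return quotas

# The 1:1:1 ratio and batch size of 600 from the example above.
quotas = batch_quotas({"a": 1, "b": 1, "c": 1}, 600)
```

With these inputs each task receives 200 images, matching the example in the text.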

Optionally, if the number of eye images in the eye image set corresponding to a certain open and closed eye detection training task does not reach the corresponding number (for example, is less than 200), additional eye images can be obtained from the eye image sets corresponding to the other open and closed eye detection training tasks to make up the batch. For example, suppose that there are only 100 eye images in the eye image set corresponding to open and closed eye detection training task c, and that the eye image sets corresponding to training task a and training task b each contain more than 250 eye images. Then 250 eye images can be obtained from the eye image set corresponding to training task a, 250 eye images from the eye image set corresponding to training task b, and 100 eye images from the eye image set corresponding to training task c, so that a total of 600 eye images is obtained. In this way, the flexibility of obtaining eye images can be increased.
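The allocation described above can be sketched as follows. This is only an illustrative sketch: the function name `allocate_batch`, the dictionary-based interface, and the round-robin way of redistributing a shortfall are assumptions, since the disclosure does not limit how the numbers are chosen.

```python
def allocate_batch(set_sizes, ratio, batch_size):
    """Decide how many eye images to draw from each task's image set.

    set_sizes:  {task: number of images available}  (hypothetical interface)
    ratio:      {task: preset image quantity ratio}
    batch_size: preset batch processing quantity
    """
    total_ratio = sum(ratio.values())
    # Initial quota per task from the preset ratio.
    quota = {t: batch_size * r // total_ratio for t, r in ratio.items()}
    # A task cannot contribute more images than its set contains.
    take = {t: min(quota[t], set_sizes[t]) for t in ratio}
    shortfall = batch_size - sum(take.values())
    # Assumed policy: hand the shortfall out one image at a time,
    # round-robin, to the tasks that still have spare images.
    while shortfall > 0:
        spare = [t for t in take if take[t] < set_sizes[t]]
        if not spare:
            break  # not enough images overall to fill the batch
        for t in spare:
            if shortfall == 0:
                break
            take[t] += 1
            shortfall -= 1
    return take


counts = allocate_batch({"a": 1000, "b": 1000, "c": 100},
                        {"a": 1, "b": 1, "c": 1}, 600)
```

With the set sizes from the example (task c holding only 100 images), the sketch assigns 250 images each to tasks a and b and 100 to task c.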

It should be particularly noted that the present disclosure may also use randomly set numbers to obtain a corresponding number of eye images from the eye image sets corresponding to the different training tasks. The present disclosure does not limit the specific implementation of obtaining a corresponding number of eye images from the eye image sets corresponding to different training tasks. In addition, when acquiring eye images from an eye image set, eye images whose annotation information indicates an uncertain open/closed state should be avoided, which helps to improve the detection accuracy of the neural network for open and closed eye detection.

In an optional example, the present disclosure may sequentially provide the acquired eye images to the neural network for open and closed eye detection to be trained, and the neural network performs eye open/closed state detection processing on each input eye image, so that the neural network sequentially outputs the eye open/closed state detection result of each eye image. For example, an eye image input to the neural network to be trained is processed by the convolutional layer, the fully connected layer, and the layer for classification, after which the neural network outputs two probability values. Both probability values range from 0 to 1, and their sum is 1. One of the probability values corresponds to the open state: the closer this probability value is to 1, the closer the eyes in the eye image are to the open state. The other probability value corresponds to the closed state: the closer this probability value is to 1, the closer the eyes in the eye image are to the closed state.
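As an illustration of how a classification layer can yield two probability values that each lie between 0 and 1 and sum to 1, the following sketch applies a softmax to two raw scores. The use of softmax and the function name `open_closed_probabilities` are assumptions; the disclosure only states that a layer for classification produces the two values.

```python
import math


def open_closed_probabilities(scores):
    """Map two raw scores (open, closed) to two probabilities summing to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return tuple(e / total for e in exps)


# Hypothetical raw scores for one eye image: the "open" score dominates.
p_open, p_closed = open_closed_probabilities([2.0, -1.0])
```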

S110. According to the eye open/closed annotation information of the eye images and the eye open/closed state detection results output by the above neural network, respectively determine the loss corresponding to each of the at least two open and closed eye detection training tasks, and adjust the network parameters of the neural network according to the loss corresponding to each of the at least two open and closed eye detection training tasks.

In an optional example, the present disclosure should determine the loss corresponding to each open and closed eye detection training task, determine a comprehensive loss according to the losses corresponding to the individual training tasks, and use the comprehensive loss to adjust the network parameters of the neural network. The network parameters in the present disclosure may include but are not limited to: convolution kernel parameters and/or matrix weights. The present disclosure does not limit the specific content contained in the network parameters.

In an optional example, for any open and closed eye detection training task, the present disclosure may determine the loss corresponding to that training task according to the angle between the maximum probability value in the eye open/closed state detection result output by the neural network for each of multiple eye images in the image set corresponding to the training task and the decision boundary corresponding to the annotation information of the corresponding eye image. Optionally, the present disclosure may use the A-softmax (angular softmax) loss function to determine the loss corresponding to each of the different open and closed eye detection training tasks based on the eye open/closed annotation information of the eye images and the eye open/closed state detection results output by the neural network, determine a comprehensive loss (such as the sum of the individual losses) according to the losses corresponding to the different training tasks, and adjust the network parameters of the neural network using stochastic gradient descent. For example, the present disclosure can use the A-softmax loss function to calculate the loss of each open and closed eye detection training task and perform back propagation based on the sum of the losses of all the training tasks, so that the network parameters of the neural network to be trained are updated by gradient descent on the loss.
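The per-task losses and their sum can be sketched as follows. For brevity the sketch uses a plain negative log-likelihood as a stand-in for the per-image loss rather than A-softmax, averages within each task (an assumption), and omits the back propagation step; the names `nll` and `comprehensive_loss` and the `(task, prob_open, label)` tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def nll(prob_open, label):
    """Stand-in per-image loss: negative log-likelihood of the annotated
    state (label 1 = open, label 0 = closed). Not the A-softmax loss."""
    p = prob_open if label == 1 else 1.0 - prob_open
    return -math.log(max(p, 1e-12))


def comprehensive_loss(samples):
    """samples: iterable of (task, prob_open, label) for one training batch.

    Returns the mean loss of each training task and the comprehensive
    loss, taken here as the sum of the per-task losses."""
    grouped = defaultdict(list)
    for task, prob_open, label in samples:
        grouped[task].append(nll(prob_open, label))
    per_task = {t: sum(v) / len(v) for t, v in grouped.items()}
    return per_task, sum(per_task.values())


batch = [("a", 0.9, 1), ("a", 0.8, 1), ("b", 0.3, 0)]
per_task, total = comprehensive_loss(batch)
```

In a real training loop the comprehensive loss would be the quantity that is back-propagated, so that every training task contributes to each parameter update.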

It can be seen from the above description that, in the process of training the neural network in the present disclosure, all the eye images provided to the neural network in each training iteration form a subset of eye images, and this subset includes eye images corresponding to each training task. By calculating the loss of each training task separately, the present disclosure enables the neural network to learn open and closed eye detection for every training task during training, taking the learning of all the different training tasks into account. The trained neural network can therefore improve the accuracy of open and closed eye detection for eye images from each of the multiple scenes corresponding to the multiple training tasks at the same time, which helps to improve the universality and generalization of the neural-network-based technical solution for accurately detecting open and closed eyes in different scenes, and better meets the actual application requirements of multiple scenes.

The A-softmax loss function in the present disclosure can be represented by the following formula (1):

$$
L_{ang} = \frac{1}{N} \sum_{i=1}^{N} -\log\left( \frac{e^{\|x_i\| \cos(m\theta_{y_i,i})}}{e^{\|x_i\| \cos(m\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\| \cos(\theta_{j,i})}} \right) \tag{1}
$$

In the above formula (1), $L_{ang}$ represents the loss corresponding to a training task; $N$ represents the number of eye images of the training task; $\|*\|$ represents the modulus of $*$; $x_i$ represents the i-th eye image corresponding to the training task; $y_i$ represents the label value of the i-th eye image corresponding to the training task; $m$ is a constant, and the minimum value of $m$ is usually not less than a predetermined value; $\theta_{y_i,i}$ represents, for the i-th eye image, the angle between the maximum probability value in the eye open/closed state detection result output by the neural network and the decision boundary corresponding to the label value; $m\theta_{y_i,i}$ represents the product of $m$ and the above angle; and $\theta_{j,i}$ represents the corresponding angle for a class $j$ other than $y_i$.
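The A-softmax loss can be sketched numerically as follows. The sketch assumes the commonly published form of A-softmax, in which the score of the annotated class is the feature modulus times the cosine of m times the angle, the score of every other class is the feature modulus times the cosine of the plain angle, and the angle is measured between the feature vector and the normalised class weight vector. That reading of the angle, the function name `a_softmax_loss`, the list-based interface, and the default `m=4` are all assumptions rather than details given in this disclosure. The plain cosine form is also only monotonic while the angle stays within 0 to pi/m.

```python
import math


def a_softmax_loss(features, weights, labels, m=4):
    """Mean A-softmax loss over one training task's eye images.

    features: list of feature vectors x_i (one per eye image)
    weights:  list of class weight vectors (normalised inside the function)
    labels:   list of class indices y_i
    m:        angular margin constant (assumed default)
    """
    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    total = 0.0
    for x, y in zip(features, labels):
        x_norm = norm(x)
        scores = []
        for j, w in enumerate(weights):
            cos_theta = sum(a * b for a, b in zip(x, w)) / (x_norm * norm(w))
            cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
            theta = math.acos(cos_theta)
            # Only the annotated class uses the margin-multiplied angle.
            angle = m * theta if j == y else theta
            scores.append(x_norm * math.cos(angle))
        # -log(softmax of the annotated class), computed stably.
        peak = max(scores)
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum - scores[y]
    return total / len(features)


classes = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical "open" and "closed" directions
loss = a_softmax_loss([[1.0, 0.5]], classes, [0], m=2)
```

With m set to 1 the sketch reduces to an ordinary softmax loss over normalised weights; a larger m lowers the score of the annotated class for the same feature, so the loss rises unless the feature moves closer to its class direction.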

In an optional example, the training process ends when the training of the neural network for open and closed eye detection to be trained reaches a predetermined iteration condition. The predetermined iteration condition in the present disclosure may include: the difference between the eye open/closed state detection result output by the neural network to be trained for an eye image and the annotation information of that eye image meets a predetermined difference requirement. When the difference meets the predetermined difference requirement, the neural network has been trained successfully. The predetermined iteration condition in the present disclosure may also include: the number of eye images used to train the neural network to be trained reaches a predetermined number requirement, and so on. If the number of eye images used reaches the predetermined number requirement but the difference does not meet the predetermined difference requirement, the neural network has not been trained successfully. A successfully trained neural network can be used for eye open/closed state detection processing.
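The two iteration conditions can be combined into a simple status check, sketched below. The function name `training_status`, its parameters, and the use of a single scalar to summarise "the difference" are assumptions made for illustration.

```python
def training_status(difference, images_used, max_difference, max_images):
    """Classify the current state of training.

    difference:     measured gap between detection results and annotations
    images_used:    number of eye images consumed by training so far
    max_difference: predetermined difference requirement (assumed scalar)
    max_images:     predetermined number requirement
    """
    if difference <= max_difference:
        return "success"   # difference requirement met: training succeeded
    if images_used >= max_images:
        return "failed"    # image budget used up without meeting the requirement
    return "continue"      # keep iterating
```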

The present disclosure forms a comprehensive loss from the losses of the different training tasks and uses the comprehensive loss to adjust the network parameters of the neural network for open and closed eye detection, so that during training the neural network learns open and closed eye detection for every training task and takes the learning of all the different training tasks into account. The trained neural network can therefore improve the accuracy of open and closed eye detection for eye images from each of the multiple scenes corresponding to the multiple training tasks at the same time, which helps to improve the universality and generalization of the neural-network-based technical solution for accurately detecting open and closed eyes in different scenes, and better meets the actual application requirements of multiple scenes.

FIG. 3 is a flowchart of an embodiment of the method for detecting the open and closed state of the eyes of the present disclosure.

As shown in FIG. 3, the method of this embodiment includes steps: S300 and S310. Each step in Figure 3 will be described in detail below.

S300. Obtain an image to be processed.

In an optional example, the image to be processed in the present disclosure may be a static image such as a picture or a photo, or may be a video frame in a dynamic video, for example, a video frame in a video captured by a camera mounted on a moving object, or a video frame in a video captured by a camera mounted at a fixed position. The above-mentioned moving object may be a vehicle, a robot, or a robotic arm. The above-mentioned fixed position may be a desktop or a wall. The present disclosure does not limit the specific forms of the moving object and the fixed position.

In an optional example, after acquiring the image to be processed, the present disclosure may detect the location area of the eyes in the image to be processed. For example, face detection or face key point detection may be used to determine the circumscribed frame (bounding box) of the eye in the image to be processed. The present disclosure can then segment the image of the eye area from the image to be processed according to the circumscribed frame of the eye, and the segmented eye image block is provided to the neural network. The segmented eye image block can also be provided to the neural network after certain preprocessing. For example, the segmented eye image block is scaled so that its size after scaling meets the neural network's size requirement for input images. For another example, after the eye image blocks of the two eyes of the target object are segmented, the eye image block on a predetermined side is mapped so that the two eye image blocks both appear as eyes on the same side of the target object; optionally, the two same-side eye image blocks can also be scaled. The present disclosure does not limit the specific implementation of segmenting the eye image block from the image to be processed, nor the specific implementation of preprocessing the segmented eye image block.
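The segmentation and preprocessing steps can be sketched on a plain list-of-rows image as follows. The helper names, the `(left, top, right, bottom)` box layout, the interpretation of the mapping step as a horizontal mirror, and the nearest-neighbour scaling are all assumptions chosen to keep the sketch dependency-free.

```python
def crop(image, box):
    """Cut the eye image block out of an image given as a list of pixel rows.

    box = (left, top, right, bottom), with right and bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def mirror(block):
    """Flip a block horizontally (assumed form of the same-side mapping)."""
    return [row[::-1] for row in block]


def resize_nearest(block, out_h, out_w):
    """Scale a block to out_h x out_w using nearest-neighbour sampling."""
    in_h, in_w = len(block), len(block[0])
    return [[block[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical 4x6 image whose pixel value encodes (row, column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
eye_block = crop(image, (1, 1, 4, 3))
network_input = resize_nearest(mirror(eye_block), 4, 6)
```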

S310: Perform an eye open/close state detection process on the above-mentioned image to be processed via a neural network, and output an eye open/close state detection result. The neural network in the present disclosure is obtained through successful training using the implementation of the neural network training method in the present disclosure.

In an optional example, the eye open/closed state detection result output by the neural network in the present disclosure for an input eye image block may be at least one probability value, for example, a probability value indicating that the eye is in the open state and a probability value indicating that the eye is in the closed state. Both probability values may range from 0 to 1, and the sum of the two probability values for the same eye image block is 1. The closer the probability value of the open state is to 1, the closer the eyes in the eye image block are to the open state. The closer the probability value of the closed state is to 1, the closer the eyes in the eye image block are to the closed state.

In an optional example, the present disclosure can make further judgments based on the time-ordered eye open/closed state detection results output by the neural network, so as to determine the eye action of the target object in multiple time-ordered images to be processed, for example, fast blinking, opening one eye while closing the other, or squinting.
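One simple way to work with a time-ordered series of detection results is to extract runs of consecutive closed-eye frames, from which blinks and longer closures can be read off. The sketch below does this; the function name `closed_runs` and the 0.5 decision threshold on the open-state probability are assumptions.

```python
def closed_runs(open_probs, threshold=0.5):
    """Return the length, in frames, of each run of consecutive closed-eye frames.

    open_probs: per-frame probability that the eye is open, in time order
    threshold:  a frame counts as closed when its probability is below this
    """
    runs, current = [], 0
    for p in open_probs:
        if p < threshold:
            current += 1          # still inside a closed-eye run
        elif current:
            runs.append(current)  # the eye reopened: close the run
            current = 0
    if current:
        runs.append(current)      # sequence ended while the eye was closed
    return runs


# Two closures: one lasting two frames, one lasting a single frame.
runs = closed_runs([0.9, 0.2, 0.1, 0.8, 0.9, 0.3, 0.95])
```

Short runs can then be counted as blinks, while long runs indicate sustained eye closure.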

In an optional example, the present disclosure can determine the facial expression of the target object in multiple time-ordered images to be processed, for example, smiling, laughing, crying, or sad, based on the time-ordered eye open/closed state detection results output by the neural network together with the states of other organs of the target object's face.

In an optional example, the present disclosure can make further judgments based on the time-ordered eye open/closed state detection results output by the neural network, so as to determine the fatigue state of the target object in multiple time-ordered images to be processed, for example, mild fatigue, dozing off, or asleep.

In an optional example, the present disclosure can make further judgments based on the time-ordered eye open/closed state detection results output by the neural network, so as to determine the eye action of the target object in multiple time-ordered images to be processed, and can then determine, at least according to the eye action, the interactive control information expressed by the target object in those images.

In an optional example, the eye actions, facial expressions, fatigue states, and interactive control information determined by the present disclosure can be used by various applications. For example, predetermined eye actions and/or facial expressions of the target object can be used to trigger predetermined special effects during live broadcasting or rebroadcasting, or to realize corresponding human-computer interaction, which facilitates rich applications. For another example, in intelligent driving technology, detecting the driver's fatigue state in real time helps to prevent fatigue driving. The present disclosure does not limit the specific application of the eye open/closed state detection results output by the neural network.

FIG. 4 is a flowchart of an embodiment of the intelligent driving control method of the present disclosure. The intelligent driving control method of the present disclosure can be applied in an automatic driving environment and also in a cruise driving environment. The present disclosure does not limit the applicable environment of the intelligent driving control method.

As shown in FIG. 4, the method of this embodiment includes steps: S400, S410, S420, and S430. The steps in Figure 4 will be described in detail below.

S400: Acquire an image to be processed collected by a camera device provided on the vehicle. For the specific implementation manner of this step, reference may be made to the description of S300 in FIG. 3 in the foregoing method implementation, which is not described in detail here.

S410: Perform an eye open/close state detection process on the above-mentioned image to be processed via a neural network, and output an eye open/close state detection result. The neural network in this embodiment is obtained through successful training using the implementation of the neural network training method described above. For the specific implementation manner of this step, reference may be made to the description of S310 in FIG. 3 in the foregoing method implementation, which is not described in detail here.

S420: Determine the fatigue state of the target object at least according to the eye open/closed state detection results that belong to the same target object in multiple time-ordered images to be processed.

In an optional example, the target object in the present disclosure is usually the driver of the vehicle. Based on multiple eye open/closed state detection results that belong to the same target object and have a time sequence relationship, the present disclosure can determine index parameters of the target object (such as the driver), for example, the number of blinks per unit time, the duration of a single eye closure, or the duration of a single eye opening, and can then judge these index parameters against predetermined index requirements to determine whether the target object is in a fatigue state. The fatigue state in the present disclosure may include fatigue states of different degrees, for example, a mild fatigue state, a moderate fatigue state, or a severe fatigue state. The present disclosure does not limit the specific implementation of determining the fatigue state of the target object.
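A judgment of the index parameters against predetermined index requirements could look like the sketch below. Every threshold value, the function name `fatigue_state`, and the particular mapping from index parameters to fatigue degrees are hypothetical; the disclosure does not specify them.

```python
def fatigue_state(closure_durations_s, window_s,
                  blink_rate_limit=0.5, long_closure_s=1.0, severe_closure_s=2.0):
    """Grade fatigue from eye-closure statistics over one observation window.

    closure_durations_s: duration in seconds of each eye closure in the window
    window_s:            length of the observation window in seconds
    The three limits are illustrative thresholds, not values from the source.
    """
    longest = max(closure_durations_s, default=0.0)
    blinks_per_second = len(closure_durations_s) / window_s
    if longest >= severe_closure_s:
        return "severe"
    if longest >= long_closure_s:
        return "moderate"
    if blinks_per_second >= blink_rate_limit:
        return "mild"
    return "none"
```

The closure durations could be derived from per-frame detection results by dividing the length of each closed-eye run by the camera frame rate.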

S430: Form a corresponding instruction according to the fatigue state of the target object, and output the instruction.

In an optional example, the instruction generated by the present disclosure according to the fatigue state of the target object may include at least one of: an instruction to switch to the smart driving state, an instruction to issue a voice alert about fatigue driving, an instruction to wake the driver by vibration, and an instruction to report dangerous driving information. The present disclosure does not limit the specific form of the instruction.
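The step of forming instructions from a fatigue state can be sketched as a lookup table. The instruction identifiers and the assignment of instructions to fatigue degrees below are hypothetical; the disclosure lists the instruction types but not which degree triggers which instruction.

```python
# Hypothetical mapping from fatigue degree to the instructions to output.
INSTRUCTIONS_BY_STATE = {
    "mild": ["voice_alert_fatigue_driving"],
    "moderate": ["voice_alert_fatigue_driving", "vibration_wake_up_driver"],
    "severe": ["switch_to_smart_driving_state",
               "vibration_wake_up_driver",
               "report_dangerous_driving_information"],
}


def instructions_for(state):
    """Return the list of instructions for a fatigue state (empty if not fatigued)."""
    return list(INSTRUCTIONS_BY_STATE.get(state, []))
```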

Since a neural network successfully trained by the neural network training method of the present disclosure helps to improve the accuracy of the eye open/closed state detection results, using those detection results to judge the fatigue state helps to improve the accuracy of fatigue state detection. Forming corresponding instructions according to the detected fatigue state then helps to avoid fatigue driving and thus to improve driving safety.

FIG. 5 is a schematic structural diagram of an embodiment of the neural network training device of the present disclosure. The neural network training device as shown in FIG. 5 includes: a neural network 500 for detecting open and closed eyes to be trained and an adjustment module 510. Optionally, the device may further include: an input module 520.

The neural network 500 for open and closed eye detection to be trained is used to perform eye open/closed state detection processing on multiple eye images in the image sets corresponding to each of at least two open and closed eye detection training tasks, and to output eye open/closed state detection results. The eye images contained in different image sets are at least partially different.

In an optional example, after being successfully trained, the neural network 500 for open and closed eye detection of the present disclosure can be used to detect the eye open/closed state of an image to be processed and to output the eye open/closed state detection result of that image. For example, for an image to be processed, the neural network 500 outputs two probability values. One probability value represents the probability that the eyes of the target object in the image to be processed are in the open state; the larger this probability value, the closer the eyes are to the open state. The other probability value represents the probability that the eyes of the target object in the image to be processed are in the closed state; the larger this probability value, the closer the eyes are to the closed state. The sum of the two probability values can be 1.

In an optional example, the neural network 500 in the present disclosure may be a convolutional neural network. The neural network 500 in the present disclosure may include, but is not limited to: a convolutional layer, a ReLU layer (also referred to as an activation layer), a pooling layer, a fully connected layer, and a layer for classification (such as binary classification). The more layers the neural network 500 contains, the deeper the network. The present disclosure does not limit the specific structure of the neural network 500.

In an optional example, the process of training the neural network 500 in the present disclosure involves at least two open and closed eye detection training tasks, and each open and closed eye detection training task belongs to the total training task of enabling the neural network to detect the open and closed state of eyes. The training targets corresponding to different open and closed eye detection training tasks are not exactly the same. That is to say, the present disclosure can divide the total training task of the neural network 500 into multiple training tasks, each training task is aimed at one type of training target, and different training tasks correspond to different training targets.

In an optional example, the at least two open and closed eye detection training tasks in the present disclosure may include at least two of the following tasks: an open and closed eye detection task for the case where the eye has an attachment, an open and closed eye detection task for the case where the eye has no attachment, an open and closed eye detection task in an indoor environment, an open and closed eye detection task in an outdoor environment, an open and closed eye detection task for the case where the eye has an attachment and there is a light spot on the attachment, and an open and closed eye detection task for the case where the eye has an attachment and there is no light spot on the attachment. The above-mentioned attachment may be glasses or a transparent plastic sheet. The above-mentioned light spot may be a light spot formed on the attachment by light reflected from the attachment. For specific descriptions of the tasks listed above, refer to the descriptions in the foregoing method implementations; detailed descriptions are omitted here.

In an optional example, at least two open and closed eye detection training tasks in the present disclosure each correspond to an image set. Each image set usually includes multiple eye images. The eye images contained in different image sets are at least partially different. That is, for any image set, at least part of the eye images in the image set will not appear in other image sets. Optionally, the eye images contained in different image sets may have an intersection.

Optionally, the image sets corresponding to the six open and closed eye detection training tasks mentioned above can be, respectively: a set of eye images in which the eye has an attachment, a set of eye images in which the eye has no attachment, a set of eye images collected in an indoor environment, a set of eye images collected in an outdoor environment, a set of eye images in which the eye has an attachment and there is a light spot on the attachment, and a set of eye images in which the eye has an attachment and there is no light spot on the attachment. For specific descriptions of the image sets listed above, refer to the descriptions in the foregoing method implementations; detailed descriptions are omitted here.

In an optional example, the eye image in the present disclosure may generally be: an eye image block cut out from an image containing the eye captured by the camera. For the formation process of the eye image in the present disclosure, reference may be made to the description in the foregoing method embodiment, which is not described in detail here.

In an optional example, the eye images used for training the neural network 500 for open and closed eye detection in the present disclosure usually have annotation information, and the annotation information may indicate the open or closed state of the eyes in the eye image. Optionally, the annotation information in the present disclosure may also indicate that the eyes in the eye image are in an uncertain open/closed state. However, the eye images used for training the neural network 500 in the present disclosure generally do not include eye images whose annotation information indicates an uncertain open/closed state, which helps to avoid the influence of such eye images on the neural network 500 and to improve the detection accuracy of the neural network 500 for open and closed eye detection.

The input module 520 is used to obtain a corresponding number of eye images from the different image sets and provide them to the neural network 500 for open and closed eye detection to be trained. For example, the input module 520 obtains a corresponding number of eye images from the image set of each open and closed eye detection training task according to a preset image quantity ratio for the different training tasks, and provides them to the neural network 500 to be trained. In addition, the input module 520 usually takes the preset batch processing quantity into account when acquiring eye images. For example, if the preset image quantity ratio for open and closed eye detection training task a, open and closed eye detection training task b, and open and closed eye detection training task c is 1:1:1, and the preset batch processing quantity is 600, the input module 520 can obtain 200 eye images from the eye image set corresponding to training task a, 200 eye images from the eye image set corresponding to training task b, and 200 eye images from the eye image set corresponding to training task c.

Optionally, if the number of eye images in the eye image set corresponding to a certain open and closed eye detection training task does not reach the corresponding number (for example, is less than 200), the input module 520 may obtain additional eye images from the eye image sets corresponding to the other open and closed eye detection training tasks to make up the batch. For example, suppose that there are only 100 eye images in the eye image set corresponding to open and closed eye detection training task c, and that the eye image sets corresponding to training task a and training task b each contain more than 250 eye images. The input module 520 can then obtain 250 eye images from the eye image set corresponding to training task a, 250 eye images from the eye image set corresponding to training task b, and 100 eye images from the eye image set corresponding to training task c, so that the input module 520 acquires 600 eye images in total.

It should be particularly noted that the input module 520 may also use randomly set numbers to obtain a corresponding number of eye images from the eye image sets corresponding to the different training tasks. The present disclosure does not limit the specific implementation by which the input module 520 obtains a corresponding number of eye images from the eye image sets corresponding to different training tasks. In addition, when acquiring eye images from an eye image set, the input module 520 should avoid acquiring eye images whose annotation information indicates an uncertain open/closed state, which helps to improve the detection accuracy of the neural network for open and closed eye detection.

In an optional example, the input module 520 may sequentially provide the acquired eye images to the neural network 500 for open and closed eye detection to be trained, and the neural network 500 performs eye open/closed state detection processing on each input eye image, so that the neural network 500 sequentially outputs the eye open/closed state detection result of each eye image. For example, an eye image input to the neural network 500 to be trained is processed by the convolutional layer, the fully connected layer, and the layer for classification, after which the neural network 500 outputs two probability values. Both probability values range from 0 to 1, and their sum is 1. One of the probability values corresponds to the open state: the closer this probability value is to 1, the closer the eyes in the eye image are to the open state. The other probability value corresponds to the closed state: the closer this probability value is to 1, the closer the eyes in the eye image are to the closed state.

The adjustment module 510 is configured to determine the loss corresponding to each of the at least two open and closed eye detection training tasks according to the eye open/closed annotation information of the eye images and the eye open/closed state detection results output by the neural network 500, and to adjust the network parameters of the neural network 500 according to the loss corresponding to each of the at least two open and closed eye detection training tasks.

In an optional example, the adjustment module 510 should determine the loss corresponding to each open and closed eye detection training task and determine a comprehensive loss according to the losses of all the training tasks, and the adjustment module 510 uses the comprehensive loss to adjust the network parameters of the neural network. The network parameters in the present disclosure may include but are not limited to: convolution kernel parameters and/or matrix weights. The present disclosure does not limit the specific content contained in the network parameters.

In an optional example, for any open and closed eye detection training task, the adjustment module 510 may determine the loss corresponding to that training task according to the angle between the maximum probability value in the eye open/closed state detection result output by the neural network for each of multiple eye images in the image set corresponding to the training task and the decision boundary corresponding to the annotation information of the corresponding eye image.

Optionally, the adjustment module 510 may use the A-softmax (angular softmax) loss function to determine the loss corresponding to each of the different open and closed eye detection training tasks based on the eye open/closed annotation information of the eye images and the eye open/closed state detection results output by the neural network, and determine a comprehensive loss (such as the sum of the individual losses) according to the losses corresponding to the different training tasks. The adjustment module 510 then uses stochastic gradient descent to adjust the network parameters of the neural network. For example, the adjustment module 510 can use the A-softmax loss function to calculate the loss of each open and closed eye detection training task and perform back propagation based on the sum of the losses of all the training tasks, so that the network parameters of the neural network 500 to be trained are updated by gradient descent on the loss.

In an optional example, when the training of the neural network 500 to be trained for eye open and closed detection reaches a predetermined iteration condition, the adjustment module 510 may end the current training process. The predetermined iteration condition in the present disclosure may include: the difference between the eye open and closed state detection result that the neural network 500 outputs for an eye image and the annotation information of that eye image meets a predetermined difference requirement. When the difference meets the predetermined difference requirement, the current training of the neural network 500 is successfully completed.

Optionally, the predetermined iteration condition used by the adjustment module 510 may also include: the number of eye images used to train the neural network to be trained for eye open and closed detection reaches a predetermined number requirement, and so on. When the number of eye images used reaches the predetermined number requirement but the difference does not meet the predetermined difference requirement, the current training of the neural network 500 is not successful. A successfully trained neural network 500 can be used for eye open and closed state detection processing.
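The two iteration conditions described above can be sketched, purely for illustration, as a small decision function. The sketch assumes that "meeting the predetermined difference requirement" means the difference is no greater than a threshold; the function name training_status and its parameters are hypothetical and are not taken from the present disclosure.

```python
def training_status(difference, images_used, max_difference, max_images):
    """Classify the state of a training run.

    Returns "success" when the difference requirement is met, "failed" when
    the image budget is used up without meeting it, and "continue" otherwise.
    """
    if difference <= max_difference:
        return "success"
    if images_used >= max_images:
        return "failed"
    return "continue"


print(training_status(0.02, 50_000, max_difference=0.05, max_images=100_000))
print(training_status(0.20, 100_000, max_difference=0.05, max_images=100_000))
print(training_status(0.20, 50_000, max_difference=0.05, max_images=100_000))
```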

FIG. 6 is a schematic structural diagram of an embodiment of the eye open and closed state detection device of the present disclosure. As shown in FIG. 6, the device of this embodiment includes: an acquisition module 600 and a neural network 610. Optionally, the eye open and closed state detection device may further include: a determining module 620.

The acquiring module 600 is used to acquire the image to be processed.

In an optional example, the image to be processed obtained by the acquisition module 600 may be a static image such as a picture or a photo, or may be a video frame in a dynamic video, for example, a video frame in a video captured by a camera mounted on a moving object, or a video frame in a video captured by a camera mounted at a fixed position. The moving object may be a vehicle, a robot, or a robotic arm. The fixed position may be a desktop or a wall.

In an optional example, after acquiring the image to be processed, the acquisition module 600 may detect the location area of the eyes in the image to be processed. For example, the acquisition module 600 may use methods such as face detection or face key point detection to determine the eye circumscribed frame in the image to be processed. The acquisition module 600 can then segment the image of the eye area from the image to be processed according to the eye circumscribed frame, and the segmented eye image block is provided to the neural network 610. Of course, the acquisition module 600 may perform certain preprocessing on the segmented eye image blocks before providing them to the neural network 610. For example, the acquisition module 600 scales the segmented eye image blocks so that the size of the scaled eye image blocks meets the size requirement of the neural network 610 for the input image. For another example, after segmenting the eye image blocks of the two eyes of the target object, the acquisition module 600 performs mapping processing on the eye image block of a predetermined side, thereby forming two eye image blocks of the same side for the target object. Optionally, the acquisition module 600 can also scale the two same-side eye image blocks. The present disclosure does not limit the specific implementation manner in which the acquisition module 600 segments the eye image blocks from the image to be processed, nor the specific implementation manner in which the acquisition module 600 preprocesses the segmented eye image blocks.
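For illustration only, the segmentation, mapping, and scaling steps described above might look like the following sketch. It makes several assumptions that are not stated in the present disclosure: images are plain nested lists of pixel values, the eye circumscribed frames are already known as (left, top, right, bottom) boxes, the mapping processing is taken to be a horizontal mirror of the right-eye block, and scaling uses nearest-neighbor sampling. All function names are hypothetical.

```python
def crop(image, box):
    """Cut the region given by (left, top, right, bottom) out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def mirror(block):
    """Flip a block horizontally so that both eyes look like the same side."""
    return [row[::-1] for row in block]


def resize_nearest(block, out_h, out_w):
    """Scale a block to out_h x out_w with nearest-neighbor sampling."""
    in_h, in_w = len(block), len(block[0])
    return [
        [block[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def prepare_eye_blocks(image, left_eye_box, right_eye_box, size):
    """Return two same-side eye blocks sized for the network input."""
    left_block = crop(image, left_eye_box)
    right_block = mirror(crop(image, right_eye_box))  # assumed predetermined side
    return (
        resize_nearest(left_block, size, size),
        resize_nearest(right_block, size, size),
    )


# A hypothetical 4 x 8 image whose pixel value encodes (row, column).
image = [[r * 10 + c for c in range(8)] for r in range(4)]
left, right = prepare_eye_blocks(image, (1, 1, 3, 3), (5, 1, 7, 3), size=4)
print(left)
print(right)
```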

The neural network 610 is used to perform eye open and closed state detection processing on the image to be processed and to output the eye open and closed state detection result.

In an optional example, the eye open and closed state detection result that the neural network 610 in the present disclosure outputs for an input eye image block may be at least one probability value, for example, a probability value indicating that the eye is in the open state and a probability value indicating that the eye is in the closed state. Both probability values may range from 0 to 1, and the two probability values for the same eye image block sum to 1. The closer the probability value of the open state is to 1, the closer the eye in the eye image block is to the open-eye state. The closer the probability value of the closed state is to 1, the closer the eye in the eye image block is to the closed-eye state.
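One common way to obtain two probability values that each lie between 0 and 1 and sum to 1 is a two-class softmax over the raw scores of the network. The sketch below illustrates that property only; the present disclosure does not state that the neural network 610 uses this exact output layer, and the function name and the score arguments are assumptions made for the example.

```python
import math


def open_closed_probabilities(score_open, score_closed):
    """Turn two raw scores into (p_open, p_closed) with p_open + p_closed = 1."""
    m = max(score_open, score_closed)  # subtract the maximum for stability
    e_open = math.exp(score_open - m)
    e_closed = math.exp(score_closed - m)
    total = e_open + e_closed
    return e_open / total, e_closed / total


p_open, p_closed = open_closed_probabilities(2.0, 0.0)
print(p_open, p_closed)
```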

The determining module 620 is configured to determine the eye movements and/or facial expressions and/or fatigue state and/or interactive control information of the target object at least according to the eye open and closed state detection results of the same target object in multiple to-be-processed images having a time sequence relationship.

In an optional example, the eye motion of the target object may be, for example, a quick blinking motion, an eye opening and closing motion, or a squinting motion. The facial expression of the target object may be, for example, smiling, laughing, crying, or sadness. The fatigue state of the target object may be, for example, mild fatigue, dozing off, or deep sleep. The interactive control information expressed by the target object may be, for example, confirmation or denial.

FIG. 7 is a schematic structural diagram of an embodiment of the intelligent driving control device of the present disclosure. The device in FIG. 7 mainly includes: an acquisition module 600, a neural network 610, a fatigue state determination module 700, and an instruction module 710.

The acquisition module 600 is used to acquire the to-be-processed image collected by the camera device installed on the vehicle.

For specific operations performed by the acquisition module 600 and the neural network 610, refer to the description in the foregoing device implementation. The description will not be repeated here.

The fatigue state determining module 700 is configured to determine the fatigue state of the target object at least according to the detection results of the open/closed state of the eyes belonging to the same target object in the plurality of images to be processed with a time series relationship.

In an optional example, the target object in the present disclosure is usually a driver. Based on multiple eye open and closed state detection results that belong to the same target object (such as a driver) and have a time sequence relationship, the fatigue state determining module 700 can determine index parameters of the target object such as the number of blinks per unit time, the duration of a single eye closure, or the duration of a single eye opening. The fatigue state determining module 700 then judges the corresponding index parameters against predetermined index requirements, and can thereby determine whether the target object (such as the driver) is in a fatigue state. The fatigue state in the present disclosure may include fatigue states of different degrees, for example, a mild fatigue state, a moderate fatigue state, or a severe fatigue state. The present disclosure does not limit the specific implementation manner in which the fatigue state determining module 700 determines the fatigue state of the target object.
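By way of a hedged example, the index parameters named above could be derived from a time-ordered sequence of per-frame detection results as sketched below. The sketch assumes one closed/open flag per video frame and a known frame rate, counts each run of consecutive closed frames as one blink or closure, and uses threshold values that are purely hypothetical; none of these choices are specified by the present disclosure.

```python
def closed_runs(closed_flags):
    """Lengths, in frames, of each run of consecutive closed-eye frames."""
    runs, current = [], 0
    for closed in closed_flags:
        if closed:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def index_parameters(closed_flags, fps):
    """Blink rate and longest single eye closure for one target object."""
    runs = closed_runs(closed_flags)
    duration_s = len(closed_flags) / fps
    return {
        "blinks_per_minute": 60.0 * len(runs) / duration_s,
        "longest_closure_s": max(runs, default=0) / fps,
    }


def fatigue_state(params, mild_s=0.5, moderate_s=1.0, severe_s=2.0):
    """Map the longest closure to a fatigue level (hypothetical thresholds)."""
    closure = params["longest_closure_s"]
    if closure >= severe_s:
        return "severe"
    if closure >= moderate_s:
        return "moderate"
    if closure >= mild_s:
        return "mild"
    return "none"


# Six seconds of hypothetical detection results at 10 frames per second.
flags = [False] * 10 + [True] * 3 + [False] * 10 + [True] * 30 + [False] * 7
params = index_parameters(flags, fps=10)
print(params, fatigue_state(params))
```

A real system would likely combine several index parameters rather than the longest closure alone; the single-parameter rule here only keeps the sketch short.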

The instruction module 710 is used to form a corresponding instruction according to the fatigue state of the target object, and output the instruction.

In an optional example, the instructions that the instruction module 710 generates based on the fatigue state of the target object may include at least one of: an instruction to switch to the smart driving state, an instruction to issue a voice alert about fatigue driving, an instruction to wake the driver by vibration, an instruction to report dangerous driving information, and so on. The present disclosure does not limit the specific form of the instructions.
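As one possible illustration, the selection of instructions from a fatigue state can be expressed as a lookup table. The instruction types below follow the examples listed above, but which instructions are issued for which fatigue state is an assumption made for this sketch, as are the names Instruction, INSTRUCTIONS_BY_STATE, and form_instructions.

```python
from enum import Enum


class Instruction(Enum):
    SWITCH_TO_SMART_DRIVING = "switch to smart driving state"
    VOICE_ALERT = "voice alert for fatigue driving"
    VIBRATION_WAKE = "wake the driver by vibration"
    REPORT_DANGER = "report dangerous driving information"


# Hypothetical policy: more severe fatigue triggers more instructions.
INSTRUCTIONS_BY_STATE = {
    "none": [],
    "mild": [Instruction.VOICE_ALERT],
    "moderate": [Instruction.VOICE_ALERT, Instruction.VIBRATION_WAKE],
    "severe": [
        Instruction.VIBRATION_WAKE,
        Instruction.SWITCH_TO_SMART_DRIVING,
        Instruction.REPORT_DANGER,
    ],
}


def form_instructions(fatigue_state):
    """Return the list of instructions to output for a fatigue state."""
    return list(INSTRUCTIONS_BY_STATE.get(fatigue_state, []))


for instruction in form_instructions("severe"):
    print(instruction.value)
```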

Since a neural network 610 successfully trained by the neural network training method of the present disclosure helps improve the accuracy of the eye open and closed state detection results, the fatigue state determining module 700 judging the fatigue state based on the eye open and closed state detection results output by the neural network 610 helps improve the accuracy of fatigue state detection. The instruction module 710 then forms a corresponding instruction according to the detected fatigue state, which helps avoid fatigue driving and thus helps improve driving safety.

Exemplary equipment

FIG. 8 shows an exemplary device 800 suitable for implementing the present disclosure. The device 800 may be a control system/electronic system configured in a car, a mobile terminal (for example, a smart mobile phone), a personal computer (PC, for example, a desktop computer or a notebook computer), a tablet computer, a server, or the like. In FIG. 8, the device 800 includes one or more processors, a communication part, and so on. The one or more processors may be one or more central processing units (CPU) 801 and/or one or more acceleration units 813, where the acceleration unit 813 may be a graphics processing unit (GPU) or the like. The processor can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded from the storage part 808 into a random access memory (RAM) 803. The communication part 812 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (InfiniBand) network card. The processor can communicate with the read-only memory 802 and/or the random access memory 803 to execute the executable instructions, is connected with the communication part 812 through the bus 804, and communicates with other target devices through the communication part 812, thereby completing the corresponding steps in the present disclosure.

For the operations performed by the foregoing instructions, reference may be made to the related descriptions in the foregoing method embodiments, and detailed descriptions are omitted here. In addition, the RAM 803 can also store various programs and data required for device operation. The CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.

Where the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the central processing unit 801 to execute the steps included in the above method. An input/output (I/O) interface 805 is also connected to the bus 804. The communication part 812 may be integrated, or may be configured to have multiple sub-modules (for example, multiple IB network cards) that are each connected to the bus.

The following components are connected to the I/O interface 805: an input part 806 including a keyboard and a mouse; an output part 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage part 808 including a hard disk; and a communication part 809 including a network interface card such as a LAN card or a modem. The communication part 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 810 as needed, so that the computer program read from it is installed in the storage part 808 as needed.

It should be noted that the architecture shown in FIG. 8 is only an optional implementation. In practice, the number and types of components in FIG. 8 can be selected, deleted, added, or replaced according to actual needs. Different functional components can also be arranged separately or in an integrated manner. For example, the acceleration unit 813 and the CPU 801 can be arranged separately. For another example, the acceleration unit 813 can be integrated on the CPU 801, and the communication part can be arranged separately or can be integrated on the CPU 801 or the acceleration unit 813. These alternative embodiments all fall within the protection scope of the present disclosure.

In particular, according to the embodiments of the present disclosure, the process described below with reference to the flowcharts can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly contained on a machine-readable medium. The computer program includes program code for executing the steps shown in the flowchart. The program code may include instructions corresponding to the steps in the method provided by the present disclosure.

In such an embodiment, the computer program may be downloaded and installed from the network through the communication part 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the instructions described in the present disclosure to implement the above-mentioned corresponding steps are executed.

In one or more optional implementation manners, the embodiments of the present disclosure also provide a computer program product for storing computer-readable instructions which, when executed, cause a computer to execute the neural network training method, the eye open and closed state detection method, or the intelligent driving control method described in any of the foregoing embodiments.

The computer program product can be specifically implemented by hardware, software, or a combination thereof. In an optional example, the computer program product is specifically embodied as a computer storage medium. In another optional example, the computer program product is specifically embodied as a software product, such as a software development kit (SDK).

In one or more optional implementation manners, the embodiments of the present disclosure also provide another eye open and closed state detection method, intelligent driving control method, and neural network training method, together with corresponding devices, electronic equipment, computer storage media, computer programs, and computer program products. The method includes: the first device sends a neural network training instruction, an eye open and closed state detection instruction, or an intelligent driving control instruction to the second device, and the instruction causes the second device to perform the neural network training method, the eye open and closed state detection method, or the intelligent driving control method in any of the above possible embodiments; and the first device receives the neural network training result, the eye open and closed state detection result, or the intelligent driving control result sent by the second device.

In some embodiments, the neural network training instruction, the eye open and closed state detection instruction, or the intelligent driving control instruction may specifically be a call instruction. The first device may instruct the second device, by means of a call, to perform the neural network training operation, the eye open and closed state detection operation, or the intelligent driving control operation. Correspondingly, in response to receiving the call instruction, the second device may execute the steps and/or processes in any embodiment of the above-mentioned neural network training method, eye open and closed state detection method, or intelligent driving control method.

It should be understood that terms such as "first" and "second" in the embodiments of the present disclosure are only for distinguishing purposes and should not be construed as limiting the embodiments of the present disclosure. It should also be understood that in the present disclosure, "plurality" can refer to two or more, and "at least one" can refer to one, two, or more than two. It should also be understood that any component, data, or structure mentioned in the present disclosure can generally be understood as one or more unless it is clearly defined otherwise or the context indicates otherwise. It should also be understood that the description of the various embodiments in the present disclosure emphasizes the differences between the embodiments; for parts that are the same or similar, the embodiments can be referred to one another, and for the sake of brevity the details are not repeated one by one.

The method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure may be implemented in many ways. For example, the method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware. The above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

The description of the present disclosure is given for the sake of example and description, and is not exhaustive and does not limit the present disclosure to the disclosed form. Many modifications and changes are obvious to those of ordinary skill in the art. The embodiments are selected and described in order to better explain the principles and practical applications of the present disclosure, and to enable those of ordinary skill in the art to understand the embodiments of the present disclosure and thereby design various embodiments with various modifications suitable for specific purposes.

Claims

A neural network training method is characterized in that it includes:

Performing, via the neural network to be trained for eye open and closed detection, eye open and closed state detection processing on multiple eye images in the image sets respectively corresponding to at least two open and closed eye detection training tasks, and outputting eye open and closed state detection results; wherein the eye images contained in different image sets are at least partially different;

Determining, according to the eye open and closed annotation information of the eye images and the eye open and closed state detection results output by the neural network, the loss corresponding to each of the at least two open and closed eye detection training tasks, and adjusting the network parameters of the neural network according to the loss corresponding to each of the at least two open and closed eye detection training tasks.
The method according to claim 1, wherein:

The at least two open and closed eye detection training tasks include at least two of the following tasks: an open and closed eye detection task in which the eyes have attachments, an open and closed eye detection task in which the eyes have no attachments, an open and closed eye detection task in an indoor environment, an open and closed eye detection task in an outdoor environment, an open and closed eye detection task in which the eyes have attachments and there are spots on the attachments, and an open and closed eye detection task in which the eyes have attachments and there are no spots on the attachments;

The image sets respectively corresponding to the at least two open and closed eye detection training tasks include at least two of the following corresponding image sets: an eye image set in which the eyes have attachments, an eye image set in which the eyes have no attachments, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set in which the eyes have attachments and there are spots on the attachments, and an eye image set in which the eyes have attachments and there are no spots on the attachments.
The method according to claim 1 or 2, wherein the performing, via the neural network to be trained for eye open and closed detection, eye open and closed state detection processing on multiple eye images in the image sets respectively corresponding to at least two open and closed eye detection training tasks, and outputting eye open and closed state detection results, includes:

According to preset image quantity ratios of different open and closed eye detection training tasks, for different open and closed eye detection training tasks, obtain a corresponding number of eye images from different image sets;

Performing, via the neural network to be trained for eye open and closed detection, eye open and closed state detection processing on the corresponding number of eye images, and outputting the eye open and closed state detection result corresponding to each eye image.
The method according to any one of claims 1 to 3, wherein the determining, according to the eye open and closed annotation information of the eye images and the eye open and closed state detection results output by the neural network, the loss corresponding to each of the at least two open and closed eye detection training tasks includes:

For any open and closed eye detection training task, determining the loss corresponding to the training task according to the angle between the maximum probability value in the eye open and closed state detection results, respectively output by the neural network for the multiple eye images in the image set corresponding to the training task, and the interface corresponding to the annotation information of the corresponding eye image in the image set.
The method according to any one of claims 1 to 4, wherein the adjusting the network parameters of the neural network according to the respective losses of the at least two open and closed eye detection training tasks comprises:

Determine the comprehensive loss of the at least two open and closed eye detection training tasks according to the respective losses of the at least two open and closed eye detection training tasks;

According to the comprehensive loss, the network parameters of the neural network are adjusted.
A method for detecting the open and closed state of eyes, characterized in that it comprises:

Obtain the image to be processed;

Through the neural network, perform eye open and closed state detection processing on the image to be processed, and output the eye open and closed state detection result;

Wherein, the neural network is obtained by training using the method described in any one of claims 1-5.
The method according to claim 6, wherein the method for detecting the open and closed state of the eyes further comprises:

Determine the eye movements and/or facial expressions and/or fatigue status and/or interactive control information of the target object at least according to the detection results of the open and closed eyes of the same target object in the multiple to-be-processed images having a time sequence relationship.
An intelligent driving control method, characterized by comprising:

Acquiring the image to be processed collected by the camera device installed on the vehicle;

Through the neural network, perform eye open and closed state detection processing on the image to be processed, and output the eye open and closed state detection result;

Determining the fatigue state of the target object at least according to the detection results of the open and closed eyes of the same target object in the plurality of images to be processed in a time series relationship;

Form a corresponding instruction according to the fatigue state of the target object, and output the instruction;

Wherein, the neural network is obtained by training using the method described in any one of claims 1-5.
A neural network training device is characterized in that it comprises:

The neural network to be trained for eye open and closed detection is used to perform eye open and closed state detection processing on multiple eye images in the image sets respectively corresponding to at least two open and closed eye detection training tasks, and to output the eye open and closed state detection results; wherein the eye images contained in different image sets are at least partially different;

The adjustment module is configured to determine the respective corresponding losses of the at least two eye open and closed detection training tasks according to the eye open and closed annotation information of the eye image and the eye open and closed state detection result output by the neural network, and according to The loss corresponding to each of the at least two open and closed eye detection training tasks adjusts the network parameters of the neural network.
The device according to claim 9, wherein:

The at least two open and closed eye detection training tasks include at least two of the following tasks: an open and closed eye detection task in which the eyes have attachments, an open and closed eye detection task in which the eyes have no attachments, an open and closed eye detection task in an indoor environment, an open and closed eye detection task in an outdoor environment, an open and closed eye detection task in which the eyes have attachments and there are spots on the attachments, and an open and closed eye detection task in which the eyes have attachments and there are no spots on the attachments;

The image sets respectively corresponding to the at least two open and closed eye detection training tasks include at least two of the following corresponding image sets: an eye image set in which the eyes have attachments, an eye image set in which the eyes have no attachments, an eye image set collected in an indoor environment, an eye image set collected in an outdoor environment, an eye image set in which the eyes have attachments and there are spots on the attachments, and an eye image set in which the eyes have attachments and there are no spots on the attachments.
The device according to claim 9 or 10, wherein the device further comprises:

The input module is used to obtain, for different open and closed eye detection training tasks, a corresponding number of eye images from different image sets according to the preset image quantity ratios of the different open and closed eye detection training tasks, and to provide them to the neural network to be trained for eye open and closed detection;

The neural network for detecting open and closed eyes to be trained performs eye open and closed state detection processing on the corresponding number of eye images, and outputs the corresponding eye open and closed state detection results of each eye image.
The device according to any one of claims 9 to 11, wherein the adjustment module is further configured to:

For any open and closed eye detection training task, determine the loss corresponding to the training task according to the angle between the maximum probability value in the eye open and closed state detection results, output by the neural network for the multiple eye images in the image set corresponding to the training task, and the interface corresponding to the annotation information of the corresponding eye image in the image set.
The device according to any one of claims 9 to 12, wherein the adjustment module is further configured to:

Determine the comprehensive loss of the at least two open and closed eye detection training tasks according to the respective losses of the at least two open and closed eye detection training tasks;

According to the comprehensive loss, the network parameters of the neural network are adjusted.
A device for detecting the state of opening and closing eyes, which is characterized in that it comprises:

The acquisition module is used to acquire the image to be processed;

The neural network is used to perform eye open/close state detection processing on the image to be processed, and output the eye open/close state detection result;

Wherein, the neural network is obtained by training with the device according to any one of claims 9 to 13.
The device according to claim 14, wherein the device for detecting the state of opening and closing the eyes further comprises:

The determining module is used to determine the eye movements and/or facial expressions and/or fatigue state and/or interactive control information of the target object at least according to the eye open and closed state detection results of the same target object in multiple to-be-processed images having a time sequence relationship.
An intelligent driving control device, characterized in that it comprises:

The acquisition module is used to acquire the to-be-processed image collected by the camera device installed on the vehicle;

The neural network is used to perform eye open/close state detection processing on the image to be processed, and output the eye open/close state detection result;

The fatigue state determining module is configured to determine the fatigue state of the target object at least according to the detection results of the open and closed state of the eyes belonging to the same target object in the plurality of images to be processed with a time sequence relationship;

The instruction module is used to form a corresponding instruction according to the fatigue state of the target object, and output the instruction;

Wherein, the neural network is obtained by training with the device according to any one of claims 9 to 13.
An electronic device including:

Memory, used to store computer programs;

The processor is configured to execute the computer program stored in the memory, and when the computer program is executed, it implements the method according to any one of claims 1-8.
A computer-readable storage medium with a computer program stored thereon, and when the computer program is executed by a processor, the method according to any one of claims 1-8 is realized.
A computer program comprising computer instructions, when the computer instructions run in the processor of the device, the method according to any one of claims 1-8 is implemented.