WO2021016873A1 - Attention detection method based on cascaded neural network, computer device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2021016873A1
WO2021016873A1 PCT/CN2019/098407 CN2019098407W WO2021016873A1 WO 2021016873 A1 WO2021016873 A1 WO 2021016873A1 CN 2019098407 W CN2019098407 W CN 2019098407W WO 2021016873 A1 WO2021016873 A1 WO 2021016873A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
inattention
face
detection method
convolutional neural
Prior art date
Application number
PCT/CN2019/098407
Other languages
English (en)
French (fr)
Inventor
李晓会
彭刚
南楠
叶丽萍
Original Assignee
珠海全志科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 珠海全志科技股份有限公司
Priority to PCT/CN2019/098407 priority Critical patent/WO2021016873A1/zh
Priority to CN201980001324.8A priority patent/CN110678873A/zh
Priority to US17/631,083 priority patent/US20220277558A1/en
Publication of WO2021016873A1 publication Critical patent/WO2021016873A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris

Definitions

  • the present invention relates to the field of image recognition, and in particular to an attention detection method based on a cascaded neural network, as well as a computer device and a computer-readable storage medium implementing the method.
  • the main purpose of the present invention is to provide a cascaded neural network-based attention detection method with low computational complexity and good computational performance.
  • Another object of the present invention is to provide a computer device that implements the above-mentioned attention detection method based on the cascaded neural network.
  • Another object of the present invention is to provide a computer-readable storage medium that implements the above-mentioned attention detection method based on the cascaded neural network.
  • the attention detection method based on the cascaded neural network includes: acquiring video data, recognizing multiple frames of images, and extracting the face region from the multi-frame images; applying the first convolutional neural network to recognize the face region and determine whether a first situation of inattention occurs; and, if it is confirmed that the first situation of inattention does not occur, applying the second convolutional neural network to recognize the face region and determine whether a second situation of inattention occurs; wherein the computational complexity of the first convolutional neural network is lower than that of the second convolutional neural network.
  • a preferred solution is that applying the second convolutional neural network to recognize the face region includes: cropping multiple regions of interest from the face region, and judging, based on the recognition results of two or more regions of interest, whether the second situation of inattention occurs.
  • a further solution is that the multiple regions of interest include a face frame region and a face supplement region; judging whether the second situation of inattention occurs then means determining it from the image recognition results of the face frame region and the face supplement region.
  • a further solution is that the multiple regions of interest include a face frame region and an eye region; judging whether the second situation of inattention occurs then means determining it from the image recognition results of the face frame region and the eye region.
  • a further solution is that determining whether the first situation of inattention occurs includes: applying the first convolutional neural network to recognize the face region and judging whether the rotation angle of the head in a preset direction is greater than a preset angle; if so, the first situation of inattention is confirmed.
  • the second convolutional neural network includes a first convolutional layer, a depthwise convolutional layer, multiple bottleneck residual layers, a second convolutional layer, a linear global depthwise convolutional layer, Linear convolutional layer, fully connected layer and classification layer.
  • the bottleneck residual layer includes a convolution unit and a depthwise convolution unit that receives the output of the convolution unit, and is also provided with a residual unit; when the stride of the convolution unit is 1, the residual unit implements the residual operation of the bottleneck residual layer.
  • a further solution is that, after the video data is acquired, recognizing the multiple frames of images includes: selecting one frame for recognition out of every consecutive preset number of frames of the video data.
  • the computer device includes a processor and a memory, and the memory stores a computer program.
  • the computer program is executed by the processor, each step of the attention detection method based on the cascaded neural network is realized.
  • the present invention provides a computer readable storage medium with a computer program stored on it, and when the computer program is executed by a processor, each step of the attention detection method based on the cascaded neural network is realized.
  • the solution of the present invention first divides the face region into multiple regions of interest and recognizes each of them separately, then fuses the recognition results of the multiple regions of interest to determine whether the second situation of inattention occurs. In this way the accuracy of the analysis is improved and inattention is recognized more reliably.
  • various attention states of the person can be recognized, such as the driver looking at the left rearview mirror, looking straight ahead, looking at the interior rearview mirror, looking at the right rearview mirror, gazing at the instrument panel, gazing at the center console area, or closing the eyes; from these it can be determined whether the driver is driving distracted or intends to change lanes. When several consecutive face images are classified as looking at the interior rearview mirror, the dashboard, or the center console area, it can be judged that the driver is distracted; when several consecutive face images are classified as looking at the left rearview mirror and straight ahead, it can be judged that the driver intends to change lanes.
  • it can also be judged whether the driver is fatigued, distracted, or intends to change lanes: when several consecutive face images are classified as closed eyes, the driver can be judged to be driving while fatigued; when several consecutive face images are classified as turned to the left or right with the eyes fixed to the left or right, the second situation of inattention can be considered to have occurred.
  • the first situation of inattention is caught by the first convolutional neural network, whose design is very simple and whose computation cost is small. Once the first situation of inattention is detected, the judgment of the second situation need not be performed, which saves overall computation in the inattention judgment.
  • although the computation of the second convolutional neural network is more complex, it can accurately identify whether the driver exhibits other inattention situations, so its judgment is more accurate.
  • from each run of consecutive frames only one image is selected for recognition, which greatly reduces the amount of computation for inattention recognition while still ensuring the accuracy of the recognition result.
  • one frame of image can be selected from every six frames for identification.
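As an illustrative sketch (in Python, with a generic frame iterable assumed; the stride of six follows the example above), the frame-selection step can be written as:

```python
def sample_frames(frames, stride=6):
    """Yield one frame out of every `stride` consecutive frames.

    Recognizing only one of several near-identical consecutive frames
    cuts the per-second inference cost roughly by a factor of `stride`
    while adjacent frames would usually yield the same result anyway.
    """
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield frame
```

Only the selected frames are then passed to face detection and the cascaded networks.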
  • Fig. 1 is a flowchart of an embodiment of an attention detection method based on a cascaded neural network of the present invention.
  • Fig. 2 is the calculation formula of the numerical softmax probability value in the embodiment of the attention detection method based on the cascaded neural network of the present invention.
  • Fig. 3 is a structural block diagram of the first convolutional neural network in the embodiment of the attention detection method based on the cascaded neural network of the present invention.
  • Fig. 4 is a schematic diagram of four regions of interest for image recognition using the embodiment of the attention detection method based on cascaded neural network of the present invention.
  • Fig. 5 is a structural block diagram of the second convolutional neural network in the embodiment of the attention detection method based on the cascaded neural network of the present invention.
  • Fig. 6 is a structural block diagram of the second convolutional neural network in the embodiment of the attention detection method based on the cascaded neural network when the step size of the bottleneck residual layer of the second convolutional neural network is 1.
  • FIG. 7 is a structural block diagram of the second convolutional neural network in the embodiment of the attention detection method based on the cascaded neural network when the step size of the bottleneck residual layer of the second convolutional neural network is 2.
  • the attention detection method based on the cascaded neural network of the present invention is applied to a smart device.
  • the smart device is provided with a camera device, such as a camera, and uses the video data obtained by the camera device to perform image analysis to determine whether a specific person is inattentive.
  • the smart device is provided with a processor and a memory, and a computer program is stored on the memory, and the processor implements the attention detection method based on the cascaded neural network by executing the computer program.
  • this embodiment is mainly based on head posture and eye information and applies a cascaded convolutional neural network to detect the attention of a specific person.
  • the whole method mainly includes three steps of video acquisition, image processing, and attention detection.
  • the camera device is used to capture video data.
  • this embodiment can recognize video data from different scenes (including different shooting angles, external lighting conditions, target positions, etc.); the camera device can therefore capture video data of the target in a variety of postures.
  • in the image processing step, multiple frames of images are obtained from the video data, faces in these frames are detected by a face detection algorithm, and the image of the face region is cropped.
  • in the attention detection step, the first convolutional neural network, with low computational complexity, judges the head posture of the detected object, realizing a primary attention judgment; then the second convolutional neural network, with higher computational complexity, extracts head posture features and eye feature information from the detected regions of interest of the face. By analyzing the person's gaze direction, the person's behavior can be judged.
  • the cascaded convolutional neural network used in this embodiment has good generalization performance, low computational complexity, and is suitable for embedded devices.
  • step S1 is executed to obtain video data, that is, continuous video data is obtained by the camera device of the smart device.
  • the smart device may be a device installed in a car to detect whether the driver's attention wanders; the camera device may be installed directly in front of or diagonally in front of the driving position, for example under the sun visor at the driving position or above the center console.
  • the camera device can start recording video after the car engine is started, and transmit the acquired continuous video data to the processor, and the processor will process the video data.
  • step S2 is executed to recognize the images and extract the face region in each image. Since the video data obtained in step S1 consists of continuous multi-frame images, step S2 identifies the received frames. However, because consecutive frames are very similar, recognizing every frame would not only require a large amount of computation, the recognition results of adjacent frames would usually also be identical. Therefore, in this embodiment, one image is selected for recognition out of each run of consecutive frames; for example, one frame out of every six or eight frames is selected, face detection is performed on that frame, and the detected face region is cropped. Specifically, face detection is the process of determining the position, size, and posture of all faces in an input image, assuming one or more faces are present; it can be implemented with any currently known face detection algorithm and is not described further here.
  • the cascaded convolutional neural network includes a first neural network and a second neural network.
  • the first convolutional neural network is used to determine the head posture of the detected object and thereby make the primary attention judgment, that is, to determine whether the first situation of inattention occurs.
  • the second convolutional neural network is used to extract head posture features and eye feature information. By analyzing the person's gaze direction, the person's behavior can be judged to realize attention detection.
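The two-stage cascade described above can be sketched as plain control flow. Here `fast_head_pose_net` and `detailed_gaze_net` are hypothetical stand-ins for the trained first and second convolutional neural networks, and the 60-degree threshold is the example value used later in this description:

```python
def detect_inattention(face_image, fast_head_pose_net, detailed_gaze_net,
                       max_angle=60.0):
    """Two-stage cascade: a cheap head-pose check first; the more
    expensive gaze/eye network runs only when the first stage passes.

    Returns (inattentive, stage): stage 1 fires on a large head
    rotation; stage 2 delegates to the detailed gaze classifier.
    """
    yaw, pitch = fast_head_pose_net(face_image)   # degrees
    if abs(yaw) > max_angle or abs(pitch) > max_angle:
        return True, 1    # first situation of inattention
    # Only reached when stage 1 found nothing: run the heavier network.
    inattentive = detailed_gaze_net(face_image)   # bool from stage 2
    return inattentive, 2
```

Because most frames are resolved by the small first network, the average cost per frame stays close to the cheap stage.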
  • step S3 is first performed to apply the first convolutional neural network to recognize the face area.
  • this embodiment uses a pre-trained detection model to determine the concentration state of the detected object. For example, when the head of the detected object rotates beyond a certain angle, the first situation of inattention occurs: when the driver turns the head left by more than 60 degrees, turns it right by more than 60 degrees, raises it by more than 60 degrees, or lowers it by more than 60 degrees, the driver is considered inattentive.
  • the first convolutional neural network is a convolutional neural network with a small size and low computational complexity.
  • the first convolutional neural network of this embodiment includes several convolutional layers (convolution), several pooling layers (maxpool), a fully connected layer 16 (fully connected), and a classification layer 17, where each pooling layer is located between two adjacent convolutional layers.
  • the dashed box 11 encloses units composed of convolutional layers and pooling layers; each unit includes one convolutional layer and one pooling layer. The output of the last pooling layer is input to convolutional layer 15, so there is one more convolutional layer than there are pooling layers.
  • there are two types of parameters for the convolutional layers: in one type, the number of filters is m, the convolution kernel size is k₁ × k₁, and the stride is S₁ pixels; in the other, the number of filters is n, the convolution kernel size is k₂ × k₂, and the stride is S₂ pixels.
  • Each pooling layer samples the output of the previous convolutional layer.
  • the fully connected layer 16 is used to implement the process of turning the two-dimensional feature matrix output by the convolution layer 15 into a one-dimensional feature vector.
  • the classification layer 17 uses the softmax function to map the outputs of multiple neurons into the interval (0, 1), which can be interpreted as a probability distribution. Assume the distribution is a vector P in which Pᵢ denotes the i-th value; the softmax probability is then defined as softmax(Pᵢ) = exp(Pᵢ) / Σⱼ exp(Pⱼ), as shown in Fig. 2.
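The softmax mapping of Fig. 2 can be reproduced directly in a few lines of Python (subtracting the maximum score first is a standard numerical-stability step, not part of the definition):

```python
import math

def softmax(scores):
    """Map a vector of raw scores to a probability distribution.

    softmax(p)_i = exp(p_i) / sum_j exp(p_j); shifting by max(scores)
    avoids overflow in exp() without changing the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The classification layer picks the category with the largest resulting probability.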
  • the detection result is whether the rotation angle of the driver's head exceeds a preset angle.
  • step S4 is executed to judge, from the detection result of step S3, whether the rotation angle of the driver's head exceeds the preset angle; if so, it is confirmed that the driver is in the first situation of inattention.
  • step S9 is then executed to issue warning information, such as a voice warning message.
  • step S5 is first performed to crop multiple regions of interest from the face region.
  • the first type of region of interest is obtained directly from the camera's field of view containing the face; the corresponding person-image information spans roughly from the middle rearview mirror to the driver's left mirror.
  • the first region of interest does not need to perform face detection operations, and the image information can be directly used to determine the person's attention.
  • the second type of region of interest uses a known face detection algorithm to detect and crop the face frame as the input image,
  • such as the image area within the solid-line frame 22 in Figure 4; this second region of interest can be called the face frame region.
  • the third type of region of interest is based on the second type of region of interest.
  • the detected face frame is expanded upward, downward, to the left, and to the right, adding supplementary information around the face, as in the image area within the solid-line frame 23; this third type of region of interest can be referred to as the face supplement region.
  • this third way of cropping the region of interest adds auxiliary features, so it can not only determine the position of the head but also offers good robustness.
  • the fourth type of region of interest is based on the second type but keeps only the upper half of the face, such as the image area within the solid-line frame 24 in Figure 4; it is therefore the eye region and is mainly used to judge the driver's attention from eye information.
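The second, third, and fourth regions of interest reduce to simple box arithmetic on a detected face frame given as (x, y, width, height). A minimal sketch follows; the 25% expansion margin and the default image size are illustrative assumptions, since the description does not fix them:

```python
def face_supplement(box, margin=0.25, img_size=(1920, 1080)):
    """Region 3 (face supplement): expand the detected face frame in all
    four directions by `margin` of its size, clipped to the image."""
    x, y, w, h = box
    img_w, img_h = img_size
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)

def eye_region(box):
    """Region 4 (eye region): keep only the upper half of the face frame."""
    x, y, w, h = box
    return (x, y, w, h // 2)
```

Region 2 is the detected box itself, so no helper is needed for it.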
  • step S6 is executed to apply the second convolutional neural network to identify multiple regions of interest.
  • a pre-trained attention detection model is used to identify multiple regions of interest and classify the recognition results.
  • the attention of the detected object is divided into seven categories: looking at the left rearview mirror, looking straight ahead, looking at the interior rearview mirror, looking at the right rearview mirror, gazing at the dashboard, gazing at the center console area, and closing the eyes.
  • the attention of the detected object can be divided into six categories, namely, gaze at the left side, gaze at the right side, gaze straight ahead, gaze above, gaze below, and close eyes.
  • the second convolutional neural network of this embodiment includes a first convolution layer 31 (convolution), a depthwise convolution layer 32 (depthwise convolution), a number of bottleneck residual layers (bottleneck residual), The second convolution layer 35, the linear global depthwise convolution layer 36 (linear GDConv), the linear convolution layer 37 (linear Conv), the fully connected layer 38 (fully connect), and the classification layer 39.
  • the dashed frame in Figure 5 represents a unit composed of multiple bottleneck residual layers.
  • multiple bottleneck residual layers include bottleneck residual layers 33, 34, etc.
  • each bottleneck residual layer is repeated nᵢ times, with an expansion factor of tᵢ and a stride of sᵢ for the i-th layer.
  • the parameters of the first convolutional layer 31 and the second convolutional layer 35 may differ: in one convolutional layer the number of filters is m, the convolution kernel size is k₁ × k₁, and the stride is S₁ pixels; in the other, the number of filters is n, the convolution kernel size is k₂ × k₂, and the stride is S₂ pixels.
  • the depthwise convolutional layer 32 performs the convolution operation on each input channel with that channel's own kernel. Assuming the input has m channels of size w × h, the layer correspondingly has m filters with kernels of size k × k; after the depthwise convolution the output still has m channels, of size w' × h'.
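The channel-preserving behaviour described above can be checked with a small shape calculation (the padding and stride values are assumed parameters, not fixed by the text):

```python
def depthwise_output_shape(in_ch, w, h, k, stride=1, padding=0):
    """Output shape of a depthwise convolution: one k x k filter per
    input channel, so the channel count is preserved while the spatial
    size shrinks according to kernel size, stride, and padding."""
    w_out = (w + 2 * padding - k) // stride + 1
    h_out = (h + 2 * padding - k) // stride + 1
    return in_ch, w_out, h_out
```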
  • Each bottleneck residual layer contains a convolution unit, a depth-wise convolution unit, and a residual unit.
  • the depth-wise convolution unit is used to receive the output of the convolution unit.
  • when the stride of the convolution unit is 1, the residual unit realizes the residual operation of the bottleneck residual layer. As shown in Figure 6, when the stride is 1 and the number of channels is c', the values on corresponding input and output channels are added to realize the residual operation; that is, the input data passes in sequence through the first convolution unit 41, the depthwise convolution unit 42, and the second convolution unit 43, and the residual unit 44 accumulates the input and the output.
  • when the stride is 2, the input dimensions [w, h] and the output dimensions [w', h'] are not equal, so the residual calculation is not performed.
  • the structural block diagram at this time is shown in FIG. 7.
  • the input sequentially passes through the first convolution unit 51, the depthwise convolution unit 52 and the second convolution unit 53 and then outputs.
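The rule of Figs. 6 and 7 — add the shortcut only when the stride is 1 and the input and output channel counts match — can be sketched with the three convolution stages replaced by hypothetical callables and tensors modelled as plain lists:

```python
def bottleneck_residual(x, expand, depthwise, project, stride, in_ch, out_ch):
    """Bottleneck residual layer in the style the text describes.

    `expand`, `depthwise`, and `project` stand in for the first
    convolution unit, the depthwise convolution unit, and the second
    convolution unit. The shortcut is added only when stride == 1 and
    the channel counts match (Fig. 6); otherwise the block is a plain
    cascade with no residual (Fig. 7).
    """
    y = project(depthwise(expand(x)))
    if stride == 1 and in_ch == out_ch:
        return [a + b for a, b in zip(x, y)]  # element-wise residual add
    return y
```

With stride 2 the spatial sizes of input and output differ, so the element-wise addition is simply skipped.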
  • the convolution kernel size of the linear global depthwise convolution layer 36 equals its input size: with n input channels of size k × k and m filters whose kernels are also k × k, the linear global depthwise convolution produces m output channels of size 1 × 1.
  • the linear convolution layer 37 is a special form of convolution layer that uses a linear function as the activation function of the layer.
  • the calculation process of the fully connected layer 38 is a process of converting the two-dimensional feature matrix output by the upper layer into a one-dimensional feature vector, and the output dimension is the same as the number of classifications.
  • the calculation method of the classification layer 39 is the same as the calculation method of the classification layer 17 of the first convolutional neural network, and will not be repeated here.
  • step S7 is executed to perform fusion analysis on the recognition results obtained in step S6 for the four regions of interest, yielding a fused result.
  • for example, the face detection algorithm first detects the face region and crops the corresponding face frame image, realizing the classification of the face frame; the cropped face frame is then expanded in four directions to obtain a new image, which is classified as well. The two classification results together determine whether the driver is driving distracted or intends to change lanes: when several consecutive face images are classified as looking at the interior rearview mirror, the dashboard, or the center console area, the driver can be judged distracted; when several consecutive face images are classified as looking at the left rearview mirror and straight ahead, the driver can be judged to intend a lane change.
  • alternatively, the face detection algorithm first detects the face region and crops the corresponding face frame image to classify the face frame; the upper half of the face frame is then retained to obtain the eye-region information, which is classified as well. Fusing the two classification results determines whether the driver is fatigued, distracted, or intends to change lanes: when several consecutive face images are classified as closed eyes, the driver can be judged to be driving while fatigued.
  • this method can also be applied to other scenarios, such as detecting students' attention in the classroom: combining the recognition results of the second and fourth types of regions of interest, when several consecutive face images are classified as turned to the left or right with the eyes gazing to the left or right, a situation of inattention can be considered to have occurred.
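The fusion analysis in the examples above amounts to a rule over the gaze labels of several consecutive frames. The sketch below paraphrases those rules; the label names, window length, and majority thresholds are illustrative assumptions rather than values fixed by the description:

```python
from collections import Counter

def fuse_window(labels):
    """Map the gaze labels of consecutive frames to a driver state.

    Rules paraphrased from the text: mostly closed eyes -> fatigued;
    sustained glances at in-car targets -> distracted; a mix of left
    mirror and road ahead -> lane-change intent; otherwise attentive.
    """
    counts = Counter(labels)
    n = len(labels)
    if counts["closed_eyes"] > n // 2:
        return "fatigued"
    in_car = (counts["dashboard"] + counts["center_console"]
              + counts["rear_mirror"])
    if in_car > n // 2:
        return "distracted"
    if counts["left_mirror"] > 0 and counts["front"] > 0 and in_car == 0:
        return "lane_change_intent"
    return "attentive"
```

A state such as "lane_change_intent" can then feed downstream logic, for instance the assisted-driving check described below.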
  • next, according to the result of the fusion analysis, it is determined whether a second situation of inattention, such as fatigued driving or distracted driving, has occurred; if so, step S9 is executed to issue warning information; otherwise step S10 is executed to predict the driver's behavior from the fusion result, such as an intention to change lanes to the left.
  • the prediction result can be provided to other algorithms. For example, in the field of assisted driving, when it is judged that the driver wants to change lanes to the left, the situation of vehicles behind on the left can be checked, such as whether there is a moving vehicle within a certain distance behind on the left, and corresponding prompts can be issued to the driver.
  • the second convolutional neural network can be replaced with a lighter network architecture with strong computing power, such as ShuffleNet, or the number of bottleneck residual layers of the convolutional neural network can be reduced and the model retrained.
  • the computer device of this embodiment may be a smart device, such as a vehicle-mounted monitoring instrument with image processing capabilities.
  • the computer device includes a processor, a memory, and a computer program stored in the memory and running on the processor.
  • when the processor executes the computer program, each step of the attention detection method based on the cascaded neural network is implemented.
  • the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to complete the present invention.
  • One or more modules may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal device.
  • the processor referred to in the present invention may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may also be any conventional processor, etc.
  • the processor is the control center of the terminal device, and various interfaces and lines are used to connect various parts of the entire terminal device.
  • the memory may be used to store computer programs and/or modules, and the processor implements various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
  • the memory may mainly include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created according to the use of the device (such as audio data or a phone book).
  • the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • if the computer program stored in the above computer device is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments of the present invention may also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • Computer-readable media may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content contained in computer-readable media may be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunications signals.
  • the present invention is not limited to the above embodiments; changes such as variations in the way the multiple regions of interest are divided, or changes in the specific process and results of the fusion analysis based on the recognition results of the multiple regions of interest, should also fall within the protection scope of the claims of the present invention.
  • the method of the present invention uses a cascaded convolutional neural network for recognition. Because the first-level convolutional neural network has low computational complexity, it can analyze simple scenes and determine whether the first case of driver inattention occurs; this reduces the computational load of the entire convolutional neural network, and the whole model is small in size and low in computational complexity.
  • the method of the present invention first uses head pose information to make a preliminary judgment of whether attention is focused, and then uses head pose and eye information to further detect the driver's attention; before detecting the driver's attention, four methods are used to process the original image to obtain four regions of interest, and the classification results are fused to analyze human behavior and intentions. Therefore, the cascaded convolutional neural network of the present invention has good generalization performance, low computational complexity, and is suitable for embedded devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides an attention detection method based on a cascaded neural network, a computer device, and a computer-readable storage medium. The method includes: acquiring video data, recognizing multiple frames of images, and extracting face regions of the multiple frames of images; applying a first convolutional neural network to recognize the face region and determine whether a first case of inattention occurs; and, if it is confirmed that the first case of inattention does not occur, applying a second convolutional neural network to recognize the face region and determine whether a second case of inattention occurs, wherein the computational complexity of the first convolutional neural network is lower than that of the second convolutional neural network. The present invention also provides a computer device and a computer-readable storage medium implementing the above method.

Description

Attention detection method based on cascaded neural network, computer device and computer-readable storage medium

Technical Field
The present invention relates to the field of image recognition processing, and in particular to an attention detection method based on a cascaded neural network, and to a computer device and a computer-readable storage medium implementing the method.

Background Art
With the development of intelligent technology, detecting human attention through image recognition has become an emerging frontier technology. Human attention detection has long been one of the key and popular research topics in machine learning, with applications mainly in security and driver assistance. Because real environments contain many uncertain factors, such as different lighting conditions (daytime, nighttime, etc.), the diversity of human head poses and expressions, differences in ethnicity, gender, and age, and whether a person wears glasses, detecting a person's attention state in a real environment is quite challenging.

In view of this, improving human attention detection performance has become a hot topic in artificial intelligence research, and various algorithms have been proposed. For example, remote eye tracking is a classic algorithm for detecting human attention. In outdoor environments, this method relies on near-infrared illumination equipment to produce a bright-pupil effect and capture eyeball information; however, the near-infrared illumination equipment is affected by vibration and bumps, is easily damaged, requires long-term maintenance, and is relatively costly.

Therefore, some researchers have proposed detecting human attention based on head pose and eye information. One approach fuses head pose, eye features, and the geometric features of the vehicle to classify the region the eyes focus on and thus detect attention; this method has achieved good results. Another method combining head pose and eye features also classifies the region the eyes focus on. However, these two methods have two main problems. First, they require a series of complex operations such as face detection, face alignment, eyeball detection, and feature extraction; if any sub-module of the algorithm performs poorly, the overall result is inevitably affected. Second, during feature extraction, traditional machine learning methods and traditional feature extraction algorithms generalize poorly; for example, when the camera angle, the external lighting conditions, or the position of the target changes, the performance of the method drops sharply.

Therefore, some researchers have proposed an attention estimation method based on a convolutional neural network, which can automatically learn head pose features and eye feature information from data samples without manually designed feature extraction algorithms and is robust. However, the convolutional neural network model used is large in size and high in computational complexity, making it unsuitable for embedded devices and greatly limiting the use of this method.
Summary of the Invention
Technical Problem
The main objective of the present invention is to provide an attention detection method based on a cascaded neural network with low computational complexity and good computational performance.
Another objective of the present invention is to provide a computer device implementing the above attention detection method based on a cascaded neural network.
A further objective of the present invention is to provide a computer-readable storage medium implementing the above attention detection method based on a cascaded neural network.
Technical Solution
To achieve the main objective of the present invention, the attention detection method based on a cascaded neural network provided by the present invention includes: acquiring video data, recognizing multiple frames of images, and extracting face regions of the multiple frames of images; applying a first convolutional neural network to recognize the face region and determine whether a first case of inattention occurs; and, if it is confirmed that the first case of inattention does not occur, applying a second convolutional neural network to recognize the face region and determine whether a second case of inattention occurs, wherein the computational complexity of the first convolutional neural network is lower than that of the second convolutional neural network.
In a preferred scheme, applying the second convolutional neural network to recognize the face region includes: cropping multiple kinds of regions of interest from the face region, and determining whether the second case of inattention occurs based on the recognition results of two or more of the regions of interest.
In a further scheme, the multiple kinds of regions of interest include a face-frame region and a face supplement region; determining whether the second case of inattention occurs based on the recognition results of two or more regions of interest includes: determining whether the second case of inattention occurs based on the image recognition results of the face-frame region and the face supplement region.
In a still further scheme, the multiple kinds of regions of interest include a face-frame region and an eye region; determining whether the second case of inattention occurs based on the recognition results of two or more regions of interest includes: determining whether the second case of inattention occurs based on the image recognition results of the face-frame region and the eye region.
In a still further scheme, determining whether the first case of inattention occurs includes: applying the first convolutional neural network to recognize the face region and determining whether the rotation angle of the head in a preset direction is greater than a preset angle; if so, confirming that the first case of inattention occurs.
In a still further scheme, the second convolutional neural network includes, cascaded in sequence, a first convolutional layer, a depthwise convolutional layer, multiple bottleneck residual layers, a second convolutional layer, a linear global depthwise convolutional layer, a linear convolutional layer, a fully connected layer, and a classification layer.
In a still further scheme, the bottleneck residual layer includes a convolution unit and a depthwise convolution unit that receives the output of the convolution unit, and is further provided with a residual unit; the residual unit performs the residual operation of the bottleneck residual layer when the stride of the convolution unit is 1.
In a still further scheme, recognizing the multiple frames of images after acquiring the video data includes: selecting one frame for recognition from every consecutive preset number of frames of the video data.
To achieve the above another objective, the computer device provided by the present invention includes a processor and a memory; the memory stores a computer program, and when the computer program is executed by the processor, each step of the above attention detection method based on a cascaded neural network is implemented.
To achieve the above further objective, the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, each step of the above attention detection method based on a cascaded neural network is implemented.
Solution to Problem
Advantageous Effects of the Invention
Advantageous Effects
With the scheme of the present invention, after the face regions of the multiple frames of images are extracted, recognition is first performed by the first-level convolutional neural network to determine whether the first case of inattention occurs; only when it is confirmed that the first case of inattention does not occur is the second convolutional neural network used for recognition to determine whether the second case of inattention occurs. In this way, the computationally complex convolutional neural network need not be used in every situation, which simplifies the overall complexity of attention detection.
In addition, when the scheme of the present invention applies the second convolutional neural network to determine whether the second case of inattention occurs, the face region is first divided into multiple kinds of regions of interest, which are recognized separately; the recognition results of the multiple regions of interest are then combined in a fusion analysis to determine whether the second case of inattention occurs. This improves the accuracy of the analysis and gives better recognition of inattention.
Specifically, by recognizing the face-frame region and the face supplement region separately, multiple attention states of a person can be recognized, such as the driver gazing at the left rearview mirror, gazing straight ahead, gazing at the interior rearview mirror, gazing at the right rearview mirror, gazing at the dashboard, gazing at the center console, or closing the eyes. Combining the recognition results of the face-frame region and the face supplement region, it can be determined whether the driver is driving distractedly or intends to change lanes: when several consecutive face images are classified as gazing at the interior rearview mirror, the dashboard, or the center console, distracted driving can be determined; when several consecutive face images are classified as gazing at the left rearview mirror and gazing straight ahead, it can be determined that the driver intends to change lanes.
Using the recognition results of the face-frame region and the eye region, it can be determined whether the driver is driving fatigued or distracted, or intends to change lanes: when several consecutive face images are classified as eyes closed, fatigued driving can be determined; when several consecutive face images are classified as facing left or right with the eyes gazing to the left or right, the second case of inattention can be considered to have occurred.
To determine whether the first case of driver inattention occurs, it is only necessary to determine whether the driver's head rotation exceeds a preset angle, for example turning up, down, left, or right by more than 60°; if so, the first case of inattention can be considered to have occurred. The design of the first convolutional neural network is therefore very simple and its computational load is small. Once the first case of inattention is determined, the judgment of the second case need not be performed, which saves overall computation in inattention detection.
The computation of the second convolutional neural network is more complex; it can accurately recognize whether other cases of driver inattention occur, and its judgment is more accurate.
In addition, in the acquired video data, consecutive frames are very similar; recognizing every frame would require an enormous amount of computation and many similar calculations with essentially identical results. Therefore, by selecting only one frame for recognition from every several consecutive frames, the computational load of inattention recognition can be greatly reduced while the accuracy of the recognition results is preserved. Preferably, one frame can be selected from every six frames.
Brief Description of the Drawings
Description of the Drawings
Fig. 1 is a flowchart of an embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 2 shows the formula for calculating the softmax probability of a value in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 3 is a structural block diagram of the first convolutional neural network in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 4 is a schematic diagram of the four regions of interest used for image recognition in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 5 is a structural block diagram of the second convolutional neural network in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 6 is a structural block diagram of the bottleneck residual layer of the second convolutional neural network when the stride is 1, in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
Fig. 7 is a structural block diagram of the bottleneck residual layer of the second convolutional neural network when the stride is 2, in the embodiment of the attention detection method based on a cascaded neural network of the present invention.
The present invention is further described below with reference to the drawings and embodiments.
Embodiments of the Invention
Modes for Carrying Out the Invention
The attention detection method based on a cascaded neural network of the present invention is applied to a smart device. Preferably, the smart device is provided with a camera device, such as a camera; the smart device uses the video data acquired by the camera device for image analysis to determine whether a specific person is inattentive. Preferably, the smart device is provided with a processor and a memory; the memory stores a computer program, and the processor implements the attention detection method based on a cascaded neural network by executing the computer program.
Embodiment of the attention detection method based on a cascaded neural network:
This embodiment is mainly based on head pose and eye information and applies a cascaded convolutional neural network to detect the attention of a specific person. The whole method mainly includes three steps: video capture, image processing, and attention detection.
In the video capture step, the camera device captures video data. This embodiment can recognize video data from different scenes (including different shooting angles, external lighting conditions, target positions, etc.), so the camera device can acquire target video data in various poses. In the image processing step, multiple frames of images are obtained from the video data, frames are detected with a face detection algorithm, and the image of the face region is cropped. In the attention detection step, a first convolutional neural network with low computational complexity is first used to judge the head pose of the detected subject, thereby performing a preliminary attention judgment; then the detected face region of interest is further cropped and expanded, and a second convolutional neural network with higher computational complexity is used to extract head pose features and eye feature information, judging the person's behavior by analyzing the gaze direction. The cascaded convolutional neural network used in this embodiment has good generalization performance and low computational complexity, and is suitable for embedded devices.
The specific working method of this embodiment is described below with reference to Fig. 1. First, step S1 is executed to acquire video data, i.e., the camera device of the smart device acquires continuous video data. Specifically, the smart device may be a device installed in a vehicle for detecting whether the driver is inattentive; the camera device may be installed directly in front of or diagonally in front of the driver's seat, for example below the sun visor of the driver's seat or above the center console. The camera device may start recording video after the engine starts and transmit the acquired continuous video data to the processor, which processes the video data.
Then, step S2 is executed to recognize the images and extract the face regions in the images. Since the video data acquired in step S1 includes multiple consecutive frames of images, step S2 recognizes the received frames. However, because consecutive frames are very similar, recognizing every frame would not only require enormous computation but also yield essentially identical recognition results for adjacent frames. Therefore, this embodiment selects one frame for recognition from every several consecutive frames, for example one frame from every six or eight frames; face detection is performed on that frame and the detected face region is cropped. Specifically, face detection is the process of determining the positions, sizes, and poses of all faces in an input image, assuming one or more faces exist in the image; this process can be implemented with currently known face detection algorithms and is not described further here.
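The frame-sampling rule just described (recognize only one frame out of every several consecutive frames) can be sketched as follows. This is an illustrative sketch rather than code from the patent; the stride of 6 is the example value given in the text:

```python
def sample_frames(frames, stride=6):
    """Return one frame out of every `stride` consecutive frames,
    starting from the first frame of the sequence."""
    return [frames[i] for i in range(0, len(frames), stride)]
```

With a 30 fps camera and a stride of 6, roughly five frames per second would reach the face detector instead of thirty.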
Then, attention detection is performed on the extracted face region; specifically, steps S3 to S10 are executed. This embodiment recognizes images through a cascaded convolutional neural network, which includes a first neural network and a second neural network. The first convolutional neural network is used to judge the head pose of the detected subject and thus perform the preliminary attention judgment, i.e., determine whether the first case of inattention occurs. The second convolutional neural network is used to extract head pose features and eye feature information, judging the person's behavior by analyzing the gaze direction and thereby performing attention detection.
Specifically, step S3 is executed first, applying the first convolutional neural network to recognize the face region. This embodiment uses a pre-trained detection model to judge the attention state of the detected subject. For example, it can be set that when the subject's head turns beyond a certain angle, the first case of inattention occurs: when the driver turns the head left (more than 60 degrees), right (more than 60 degrees), up (more than 60 degrees), or down (more than 60 degrees), the driver is considered inattentive.
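The first-stage decision above reduces to a threshold test on head rotation. A minimal sketch, assuming the head pose is available as yaw and pitch angles in degrees (an assumption for illustration; the patent's first network outputs a classification rather than raw angles):

```python
def first_case_inattention(yaw, pitch, threshold=60.0):
    """True when the head is turned left/right (yaw) or up/down (pitch)
    beyond the threshold angle, i.e. the first case of inattention."""
    return abs(yaw) > threshold or abs(pitch) > threshold
```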
Therefore, the recognition task of the first convolutional neural network is relatively simple and easy to distinguish, and the first convolutional neural network is a small network with low computational complexity. Referring to Fig. 3, the first convolutional neural network of this embodiment includes several convolutional layers, several max-pooling layers, one fully connected layer 16, and one classification layer 17, where each pooling layer lies between two adjacent convolutional layers. As shown in Fig. 3, the dashed box 11 contains several units composed of a convolutional layer and a pooling layer; each unit includes one convolutional layer and one pooling layer, and the output of the last pooling layer is fed into convolutional layer 15, so there is one more convolutional layer than there are pooling layers.
In this embodiment, there are two sets of parameters for the convolutional layers: one kind of convolutional layer has m filters, a kernel size of k1 x k1, and a stride of S1 pixels; the other kind has n filters, a kernel size of k2 x k2, and a stride of S2 pixels. Each pooling layer samples the output of the preceding convolutional layer. The fully connected layer 16 converts the two-dimensional feature matrix output by convolutional layer 15 into a one-dimensional feature vector. The classification layer 17, as the last layer of the first convolutional neural network, uses the softmax function to map the outputs of multiple neurons into the interval (0, 1), which can be interpreted as a probability distribution. Let the probability distribution vector be P, with P_i denoting the i-th value of P; the softmax probability of that value is defined by the formula shown in Fig. 2.
The maximum value in P is found, and the class corresponding to the index i with the largest probability is taken as the detection result, which indicates whether the driver's head rotation angle exceeds the preset angle.
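The softmax mapping and the argmax decision of the classification layer can be sketched generically as follows (these are the standard definitions, not code from the patent):

```python
import math

def softmax(logits):
    """Map raw outputs to probabilities: P_i = exp(x_i) / sum_j exp(x_j)."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return the index i whose softmax probability P_i is largest."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])
```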
Then, step S4 is executed to judge whether the detection result of step S3 shows that the driver's head rotation angle exceeds the preset angle. If so, it is confirmed that the first case of driver inattention occurs, and step S9 is executed to issue a warning, for example a voice warning.
If it is confirmed that the first case of inattention does not occur, the second convolutional neural network is applied to judge whether the second case of inattention occurs. Specifically, step S5 is first executed to crop multiple kinds of regions of interest from the face region. Referring to Fig. 4, taking a driver sitting in the driving position as an example: the first kind of region of interest is the directly obtained embedded face field of view, whose corresponding image information is the middle part between the rearview mirror and the driver's left mirror, i.e., the image portion inside dashed box 21; the first region of interest requires no face detection operation, and its image information can be used directly to judge the person's attention. The second kind of region of interest is obtained by detecting and cropping the face frame as the input image using a known face detection algorithm, i.e., the image region inside solid box 22; the second region of interest may be called the face-frame region. The third kind of region of interest is obtained, on the basis of the second kind, by expanding the detected face frame in the four directions (up, down, left, right), adding extra face information, as in the image region inside solid box 23; the third kind of region of interest may be called the face supplement region. This cropping method adds auxiliary features and can not only determine the position of the head but also provide good robustness. The fourth kind of region of interest crops only the upper half of the face on the basis of the second kind, as in the image region inside solid box 24 in Fig. 4; the fourth region of interest is therefore the eye region and is mainly used to judge the driver's attention by focusing on eye information.
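The cropping of the second, third, and fourth regions of interest from a detected face box can be sketched as follows. This is a hypothetical sketch: the expansion ratio `pad` for the face supplement region is an assumption for illustration, as the patent does not specify a value, and images are represented here as nested lists:

```python
def crop(img, x, y, w, h):
    """Cut a w x h rectangle whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]

def regions_of_interest(img, face_box, pad=0.25):
    x, y, w, h = face_box
    H, W = len(img), len(img[0])
    # ROI 2: the detected face box itself (face-frame region).
    face = crop(img, x, y, w, h)
    # ROI 3: face box expanded in all four directions (face supplement region).
    dx, dy = int(w * pad), int(h * pad)
    x2, y2 = max(0, x - dx), max(0, y - dy)
    w2 = min(W, x + w + dx) - x2
    h2 = min(H, y + h + dy) - y2
    supplement = crop(img, x2, y2, w2, h2)
    # ROI 4: upper half of the face box (eye region).
    eyes = crop(img, x, y, w, h // 2)
    return face, supplement, eyes
```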
Then, step S6 is executed, applying the second convolutional neural network to recognize the multiple regions of interest. For example, a pre-trained attention detection model recognizes the regions of interest and classifies the recognition results. Taking a driver sitting in the driving position as an example, the subject's attention is divided into seven classes: gazing at the left rearview mirror, gazing straight ahead, gazing at the interior rearview mirror, gazing at the right rearview mirror, gazing at the dashboard, gazing at the center console, and eyes closed. In other application scenarios, the subject's attention can be divided into six classes: gazing left, gazing right, gazing straight ahead, gazing up, gazing down, and eyes closed.
Since the recognition and classification tasks of step S6 are rather complex (adjacent regions such as straight ahead and the dashboard are particularly hard to distinguish), this embodiment uses a convolutional neural network with strong learning ability and fast computation for recognition, i.e., the second convolutional neural network performs the above recognition. Referring to Fig. 5, the second convolutional neural network of this embodiment includes, cascaded in sequence, a first convolutional layer 31, a depthwise convolutional layer 32, several bottleneck residual layers, a second convolutional layer 35, a linear global depthwise convolutional layer 36 (linear GDConv), a linear convolutional layer 37 (linear Conv), a fully connected layer 38, and a classification layer 39. The dashed box in Fig. 5 represents a unit composed of multiple bottleneck residual layers, for example bottleneck residual layers 33 and 34; the i-th bottleneck residual layer is repeated n_i times, with a channel expansion factor of t_i and a stride of s_i for each layer.
In this embodiment, the parameters of the first convolutional layer 31 and the second convolutional layer 35 may differ: one convolutional layer has m filters, a kernel size of k1 x k1, and a stride of S1 pixels; the other has n filters, a kernel size of k2 x k2, and a stride of S2 pixels.
The depthwise convolutional layer 32 convolves each input channel separately with the kernel of the corresponding channel. Suppose the input has m channels of size w x h, the corresponding convolutional layer has m filters, and the kernel size is k x k; with the depthwise convolution operation, the output has m channels of size w' x h'.
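The channel-wise behavior of the depthwise convolution just described can be sketched in plain Python. Stride 1 and no padding are assumptions for illustration (the patent does not fix these details), so w' = w - k + 1 and h' = h - k + 1, and the channel count m is preserved:

```python
def depthwise_conv(x, kernels):
    """x: list of m channels, each h x w; kernels: list of m k x k filters.
    Each channel is convolved only with its own kernel, so the output
    also has m channels."""
    out = []
    for ch, ker in zip(x, kernels):
        k = len(ker)
        h, w = len(ch), len(ch[0])
        out.append([[sum(ker[i][j] * ch[r + i][c + j]
                         for i in range(k) for j in range(k))
                     for c in range(w - k + 1)]
                    for r in range(h - k + 1)])
    return out
```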
Each bottleneck residual layer contains a convolution unit, a depthwise convolution unit, and a residual unit, where the depthwise convolution unit receives the output of the convolution unit, and the residual unit performs the residual operation of the bottleneck residual layer when the stride of the convolution unit is 1. As shown in Fig. 6, when the stride of the convolution unit is 1 and the number of channels is c', the values on the corresponding channels of the input and the output are added to perform the residual operation: the input data passes through the cascaded first convolution unit 41, depthwise convolution unit 42, and second convolution unit 43, and the residual unit 44 accumulates the input and the output.
When the stride of the convolution unit is 2 and the number of channels is c', the input dimension is [w, h] and the output dimension is [w', h']; since the input and output dimensions are unequal, no residual operation is performed. The structural block diagram in this case is shown in Fig. 7: the input passes through the first convolution unit 51, the depthwise convolution unit 52, and the second convolution unit 53 in sequence and is then output.
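The stride-dependent residual rule of Figs. 6 and 7 can be summarized schematically. Here `block` stands in for the conv, depthwise conv, conv chain and is an abstraction for illustration, not the patent's layer implementation:

```python
def bottleneck_residual(x, block, stride):
    """Apply the convolution chain; add the shortcut only when stride is 1,
    i.e. only when the input and output shapes match."""
    y = block(x)
    if stride == 1:
        # shapes match: element-wise addition of corresponding values
        return [xi + yi for xi, yi in zip(x, y)]
    return y  # stride 2: spatial dimensions change, so no residual addition
```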
The kernel size of the linear global depthwise convolutional layer 36 equals the input size: with m filters and a kernel size of k x k, and an input of n channels also of size k x k, the linear global depthwise convolution produces an output of m channels of size 1 x 1.
The linear convolutional layer 37 is a special form of convolutional layer that uses a linear function as its activation function. The fully connected layer 38 converts the two-dimensional feature matrix output by the previous layer into a one-dimensional feature vector whose dimension equals the number of classes. The computation of the classification layer 39 is the same as that of the classification layer 17 of the first convolutional neural network and is not repeated here.
Then, step S7 is executed, performing a fusion analysis on the recognition results of the four regions of interest obtained in step S6 to obtain the fusion analysis result. Specifically, taking a driver sitting in the driving position as an example, when fusing the recognition results of the second and third kinds of regions of interest, a face detection algorithm first detects the face region and crops the corresponding face-frame image to classify the face frame; then the cropped face frame is expanded in the four directions to obtain a new image, which is classified as well. Using the two classification results, it can be determined whether the driver is driving distractedly or intends to change lanes: when several consecutive face images are classified as gazing at the interior rearview mirror, the dashboard, or the center console, distracted driving can be determined; when several consecutive face images are classified as gazing at the left rearview mirror and gazing straight ahead, it can be determined that the driver intends to change lanes.
As another example, when fusing the recognition results of the second and fourth kinds of regions of interest, a face detection algorithm first detects the face region and crops the corresponding face-frame image to classify the face frame; then the upper half of the face frame is retained, i.e., the eye region information is obtained, and the eye information is classified as well. Fusing the two classification results, it can be determined whether the driver is driving fatigued or distracted, or intends to change lanes; when several consecutive face images are classified as eyes closed, fatigued driving can be determined. Optionally, the method can also be applied in other scenarios, such as detecting students' attention in class: combining the recognition results of the second and fourth kinds of regions of interest, when several consecutive face images are classified as facing left or right with the eyes gazing to the left or right, a case of inattention can be considered to have occurred.
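A rule-based fusion over per-frame classification labels, in the spirit of the analysis above, might look as follows. The label names and the run-length threshold are illustrative assumptions, not values from the patent:

```python
DISTRACTED = {"interior_mirror", "dashboard", "center_console"}

def fuse(labels, min_run=5):
    """Return a coarse judgement from a sequence of per-frame labels:
    a long run of 'eyes_closed' means fatigue, a long run of gazing at
    an in-cabin region means distraction, and alternating glances at the
    left mirror and the road ahead suggest a lane-change intention."""
    run = 1
    for prev, cur in zip(labels, labels[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_run:
            if cur == "eyes_closed":
                return "fatigued"
            if cur in DISTRACTED:
                return "distracted"
    if {"left_mirror", "front"} <= set(labels):
        return "lane_change_intent"
    return "attentive"
```

The lane-change test here is deliberately crude (it only checks that both labels occur); a production rule would likely also require the glances to be recent and interleaved.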
Then, step S8 is executed: based on the analysis result of step S7, it is judged whether the second case of inattention occurs, for example fatigued or distracted driving. If so, step S9 is executed to issue a warning; otherwise, step S10 is executed to predict the driver's behavior based on the analysis result of step S7, for example an intention to change lanes to the left; the prediction result can be provided to other algorithms. For example, in the driver-assistance field, when it is judged from the result of step S7 that the driver intends to change lanes to the left, the situation of vehicles approaching from the left rear can be checked, such as whether there is a moving vehicle within a certain distance behind on the left, so that an indication can be given to the driver.
Optionally, the second convolutional neural network can be replaced with a lighter-weight network architecture with strong computing capability, such as ShuffleNet, or the number of bottleneck residual layers of the convolutional neural network can be reduced and the model retrained.
Computer device embodiment:
The computer device of this embodiment may be a smart device, such as a vehicle-mounted monitoring instrument with image processing capability. The computer device includes a processor, a memory, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, each step of the above attention detection method based on a cascaded neural network is implemented.
For example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of completing specific functions; the instruction segments describe the execution process of the computer program in the terminal device.
The processor referred to in the present invention may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor; the processor is the control center of the terminal device and connects the various parts of the entire terminal device through various interfaces and lines.
The memory may be used to store the computer program and/or modules; the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created according to the use of the device (such as audio data or a phone book). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
Computer-readable storage medium:
If the computer program stored in the above computer device is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when it is executed by a processor, each step of the above attention detection method based on a cascaded neural network can be implemented.
The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content contained in a computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunications signals.
Finally, it should be emphasized that the present invention is not limited to the above embodiments; changes such as variations in the way the multiple regions of interest are divided, or changes in the specific process and results of the fusion analysis based on the recognition results of the multiple regions of interest, should also fall within the protection scope of the claims of the present invention.
工业应用性
The method of the present invention uses a cascaded convolutional neural network for recognition. Because the first-level convolutional neural network has low computational complexity, it can analyze simple scenes and determine whether the first case of driver inattention occurs; this reduces the computational load of the entire convolutional neural network, and the whole model is small in size and low in computational complexity.
In addition, the method of the present invention first uses head pose information to make a preliminary judgment of whether attention is focused, and then uses head pose and eye information to further detect the driver's attention; before detecting the driver's attention, four methods are used to process the original image to obtain four regions of interest, and the classification results are fused to analyze human behavior and intentions. Therefore, the cascaded convolutional neural network of the present invention has good generalization performance and low computational complexity, and is suitable for embedded devices.

Claims (10)

  1. An attention detection method based on a cascaded neural network, comprising:
    acquiring video data, recognizing multiple frames of images, and extracting face regions of the multiple frames of images;
    characterized in that:
    a first convolutional neural network is applied to recognize the face region, and whether a first case of inattention occurs is determined;
    if it is confirmed that the first case of inattention does not occur, a second convolutional neural network is applied to recognize the face region, and whether a second case of inattention occurs is determined;
    wherein the computational complexity of the first convolutional neural network is lower than the computational complexity of the second convolutional neural network.
  2. The attention detection method based on a cascaded neural network according to claim 1, characterized in that:
    applying the second convolutional neural network to recognize the face region comprises: cropping multiple kinds of regions of interest from the face region, and determining whether the second case of inattention occurs based on the recognition results of two or more of the regions of interest.
  3. The attention detection method based on a cascaded neural network according to claim 2, characterized in that:
    the multiple kinds of regions of interest comprise a face-frame region and a face supplement region;
    determining whether the second case of inattention occurs based on the recognition results of two or more of the regions of interest comprises: determining whether the second case of inattention occurs based on the image recognition results of the face-frame region and the face supplement region.
  4. The attention detection method based on a cascaded neural network according to claim 2, characterized in that:
    the multiple kinds of regions of interest comprise a face-frame region and an eye region;
    determining whether the second case of inattention occurs based on the recognition results of two or more of the regions of interest comprises: determining whether the second case of inattention occurs based on the image recognition results of the face-frame region and the eye region.
  5. The attention detection method based on a cascaded neural network according to any one of claims 1 to 4, characterized in that:
    determining whether the first case of inattention occurs comprises: applying the first convolutional neural network to recognize the face region, and determining whether the rotation angle of the head in a preset direction is greater than a preset angle; if so, confirming that the first case of inattention occurs.
  6. The attention detection method based on a cascaded neural network according to any one of claims 1 to 4, characterized in that:
    the second convolutional neural network comprises, cascaded in sequence, a first convolutional layer, a depthwise convolutional layer, multiple bottleneck residual layers, a second convolutional layer, a linear global depthwise convolutional layer, a linear convolutional layer, a fully connected layer, and a classification layer.
  7. The attention detection method based on a cascaded neural network according to claim 6, characterized in that:
    the bottleneck residual layer comprises a convolution unit and a depthwise convolution unit that receives the output of the convolution unit, and is further provided with a residual unit; the residual unit performs the residual operation of the bottleneck residual layer when the stride of the convolution unit is 1.
  8. The attention detection method based on a cascaded neural network according to any one of claims 1 to 4, characterized in that:
    recognizing the multiple frames of images after acquiring the video data comprises: selecting one frame of the images for recognition from every consecutive preset number of frames of the video data.
  9. A computer device, characterized by comprising a processor and a memory, the memory storing a computer program, wherein when the computer program is executed by the processor, each step of the attention detection method based on a cascaded neural network according to any one of claims 1 to 8 is implemented.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, each step of the attention detection method based on a cascaded neural network according to any one of claims 1 to 8 is implemented.
PCT/CN2019/098407 2019-07-30 2019-07-30 Attention detection method based on cascaded neural network, computer device and computer-readable storage medium WO2021016873A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2019/098407 WO2021016873A1 (zh) 2019-07-30 2019-07-30 Attention detection method based on cascaded neural network, computer device and computer-readable storage medium
CN201980001324.8A CN110678873A (zh) 2019-07-30 2019-07-30 Attention detection method based on cascaded neural network, computer device and computer-readable storage medium
US17/631,083 US20220277558A1 (en) 2019-07-30 2019-07-30 Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/098407 WO2021016873A1 (zh) 2019-07-30 2019-07-30 Attention detection method based on cascaded neural network, computer device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021016873A1 true WO2021016873A1 (zh) 2021-02-04

Family

ID=69088299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098407 WO2021016873A1 (zh) 2019-07-30 2019-07-30 Attention detection method based on cascaded neural network, computer device and computer-readable storage medium

Country Status (3)

Country Link
US (1) US20220277558A1 (zh)
CN (1) CN110678873A (zh)
WO (1) WO2021016873A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076884A (zh) * 2021-04-08 2021-07-06 华南理工大学 Cross-modal eye state recognition method from near-infrared light to visible light
CN113408466A (zh) * 2021-06-30 2021-09-17 东风越野车有限公司 Method and device for detecting bad driving behavior of a vehicle driver
CN114067440A (zh) * 2022-01-13 2022-02-18 深圳佑驾创新科技有限公司 Pedestrian detection method, apparatus, device, and medium using a cascaded neural network model
CN114581438A (zh) * 2022-04-15 2022-06-03 深圳市海清视讯科技有限公司 MRI image classification method and apparatus, electronic device, and storage medium
CN117197415A (zh) * 2023-11-08 2023-12-08 四川泓宝润业工程技术有限公司 Target detection method and device for inspection areas of long-distance natural gas pipelines, and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310705A (zh) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device, and storage medium
CN111563468B (zh) * 2020-05-13 2023-04-07 电子科技大学 Driver abnormal behavior detection method based on neural network attention
CN111739027B (zh) * 2020-07-24 2024-04-26 腾讯科技(深圳)有限公司 Image processing method, apparatus, and device, and readable storage medium
US20230290134A1 (en) * 2020-09-25 2023-09-14 Intel Corporation Method and system of multiple facial attributes recognition using highly efficient neural networks
CN112580458B (zh) * 2020-12-10 2023-06-20 中国地质大学(武汉) Facial expression recognition method, apparatus, device, and storage medium
CN114112984B (zh) * 2021-10-25 2022-09-20 上海布眼人工智能科技有限公司 Self-attention-based qualitative method for fabric fiber composition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4967559B2 (ja) * 2006-09-19 2012-07-04 株式会社豊田中央研究所 Drowsy driving prevention device and program
CN108664947A (zh) * 2018-05-21 2018-10-16 五邑大学 Fatigue driving early-warning method based on expression recognition
CN109598174A (zh) * 2017-09-29 2019-04-09 厦门歌乐电子企业有限公司 Driver state detection method, and device and system thereof
CN109740477A (zh) * 2018-12-26 2019-05-10 联创汽车电子有限公司 Driver fatigue detection system and fatigue detection method thereof


Also Published As

Publication number Publication date
CN110678873A (zh) 2020-01-10
US20220277558A1 (en) 2022-09-01

Similar Documents

Publication Publication Date Title
WO2021016873A1 (zh) Attention detection method based on cascaded neural network, computer device and computer-readable storage medium
US20210073953A1 (en) Method for applying bokeh effect to image and recording medium
CN109584507B (zh) Driving behavior monitoring method, apparatus, system, vehicle, and storage medium
US10095927B2 (en) Quality metrics for biometric authentication
CN108038466B (zh) Multi-channel human-eye closure recognition method based on convolutional neural network
Rangesh et al. Driver gaze estimation in the real world: Overcoming the eyeglass challenge
US20220058407A1 (en) Neural Network For Head Pose And Gaze Estimation Using Photorealistic Synthetic Data
JP7305869B2 (ja) Pedestrian detection method and apparatus, computer-readable storage medium, and chip
CN111062292B (zh) Fatigue driving detection apparatus and method
JP2022078338A (ja) Face liveness detection method and apparatus, electronic device, and storage medium
CN111860316B (zh) Driving behavior recognition method, apparatus, and storage medium
CN116758117B (zh) Target tracking method and system under visible light and infrared images
Sadiq et al. FD-YOLOv5: a fuzzy image enhancement based robust object detection model for safety helmet detection
CN113283338A (zh) Driver driving behavior recognition method, apparatus, device, and readable storage medium
CN111325107A (zh) Detection model training method and apparatus, electronic device, and readable storage medium
US20120189161A1 (en) Visual attention apparatus and control method based on mind awareness and display apparatus using the visual attention apparatus
Panicker et al. Open-eye detection using iris–sclera pattern analysis for driver drowsiness detection
Ling et al. Driver eye location and state estimation based on a robust model and data augmentation
CN111178181B (zh) Traffic scene segmentation method and related apparatus
Srivastava et al. Driver’s Face Detection in Poor Illumination for ADAS Applications
Zhou Eye-Blink Detection under Low-Light Conditions Based on Zero-DCE
Horak Fatigue features based on eye tracking for driver inattention system
CN115995142A (zh) Driving training reminder method based on wearable device, and wearable device
CN113537176A (zh) Method, apparatus, and device for determining driver fatigue state
Dahiphale et al. Real-Time Computer Vision System for Continuous Face Detection and Tracking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19939561

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19939561

Country of ref document: EP

Kind code of ref document: A1