CN111723602A - Driver behavior recognition method, device, equipment and storage medium - Google Patents


Info

Publication number: CN111723602A
Application number: CN201910207840.8A
Authority: CN (China)
Prior art keywords: image, behavior, target, area image, driver
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN111723602B (granted publication)
Inventor: 乔梁
Original and current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd

Classifications

    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T7/136: Segmentation; Edge detection involving thresholding
    • G06V40/172: Human faces; Classification, e.g. identification
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06T2207/10024: Color image (image acquisition modality)

Abstract

The application discloses a driver behavior recognition method, device, equipment and storage medium, and belongs to the technical field of intelligent traffic. The method includes: acquiring a target image; acquiring a first area image from the target image, the first area image containing a preset behavior of the driver; in the target image, reducing the peripheral area of the first area image according to a first proportion threshold to obtain a second area image, and expanding the peripheral area of the first area image according to a second proportion threshold to obtain a third area image; and recognizing the behavior of the driver based on the first area image, the second area image, and the third area image. After the inward cropping and outward expansion of the first area image, the resulting second and third area images can contain more information about the peripheral objects related to the preset behavior, so recognizing the driver's behavior based on the three area images improves recognition accuracy.

Description

Driver behavior recognition method, device, equipment and storage medium
Technical Field
The present application relates to the field of intelligent transportation technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a driver's behavior.
Background
With the rapid development of intelligent transportation technology, increasing attention is being paid to solving, by intelligent means, the driving safety problems caused by driver distraction. In some scenarios, the driver's face can be captured, the driver's behavior can be analyzed based on the captured images, and an alarm can be issued when a violation is determined to exist, thereby reminding the driver to drive safely.
At present, a deep-learning network model can be used to detect the driver's behavior. That is, a network model to be trained may be trained in advance on a number of training samples, which may include image samples together with the location information and behavior categories of the regions where behaviors are located in those samples, so that the trained network model can recognize gestures such as making a phone call or smoking based on the captured images.
However, during driving the driver may make gestures similar to violations, such as touching the ear, touching the chin, or covering the mouth. In such cases, the behavior recognition method described above may produce erroneous judgments.
Disclosure of Invention
The embodiments of the present application provide a driver behavior recognition method, apparatus, device, and storage medium, which can solve the misjudgment problem of the driver behavior recognition methods in the related art. The technical solution is as follows:
in a first aspect, a method for identifying driver behavior is provided, the method comprising:
acquiring a target image, wherein the target image comprises a face of a driver;
acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior with similarity to an illegal behavior larger than a preset threshold;
in the target image, carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
recognizing the behavior of the driver based on the first area image, the second area image, and the third area image.
Optionally, before recognizing the behavior of the driver based on the first area image, the second area image, and the third area image, the method further includes:
adjusting the first area image, the second area image and the third area image to be area images of the same size;
accordingly, the recognizing the behavior of the driver based on the first area image, the second area image, and the third area image includes:
calling a target network model, wherein the target network model is used for determining the behavior category of any behavior based on a group of regional images corresponding to the behavior;
and identifying the behavior of the driver through the target network model based on the first area image, the second area image and the third area image after the size adjustment.
Optionally, the target network model includes an input layer, an intermediate layer, a splicing layer, a full connection layer, and an output layer;
the identifying the behavior of the driver through the target network model based on the resized first area image, second area image, and third area image includes:
performing channel superposition processing on image data in the first area image, the second area image and the third area image after the size adjustment through the input layer based on the resolution and the number of channels of the area image after the size adjustment to obtain a first feature map;
performing convolution sampling processing on the first characteristic diagram through the intermediate layer to obtain a plurality of second characteristic diagrams, wherein the second characteristic diagrams are the same in size and different in channel number;
performing channel superposition processing on the plurality of second feature maps through the splicing layer, and performing feature fusion on the feature maps after channel superposition through the convolution layer in the deep network layer to obtain a third feature map;
determining, by the fully-connected layer, behavior of the driver based on the third feature map;
outputting the behavior through the output layer.
Optionally, the intermediate layer includes N groups of convolutional layers and N groups of sampling layers, and each group of convolutional layers corresponds to each group of sampling layers one to one;
the obtaining a plurality of second feature maps by performing convolution sampling processing on the first feature map through the intermediate layer includes:
determining the first feature map as a target feature map with i set to 1; performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of a reference size, and acquiring the 2^i-times feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-times feature map;
when i is smaller than N, letting i = i + 1 and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of the reference size, and acquiring the 2^i-times feature map as the target feature map;
and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolution layers, performing sampling processing once on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
Optionally, before the invoking the target network model, the method further includes:
obtaining a plurality of training samples, wherein the plurality of training samples comprise a plurality of groups of images and the behavior category in each group of images, and each group of images comprises a region image of a behavior, a reduced region image determined after reduction processing of the region image, and an expanded region image determined after expansion processing of the region image;
and training the network model to be trained based on the plurality of training samples to obtain the target network model.
Optionally, the acquiring a first region image from the target image includes:
calling a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, wherein the target detection model is used for identifying the face and the preset behavior in the image based on any image;
when the number of the face detection frames is one, determining the face detection frames as target face detection frames; when the number of the face detection frames is multiple, acquiring a face detection frame with the largest area from the multiple face detection frames, and determining the acquired face detection frame as a target face detection frame;
and determining the first area image based on the target face detection frame.
Optionally, the determining the first region image based on the target face detection frame includes:
filtering out the preset behavior detection frames which do not overlap the target face detection frame or whose distance to the target face detection frame is greater than a preset distance threshold;
and cropping out, from the target image, the regions corresponding to the preset behavior detection frames remaining after filtering to obtain the first region image.
Optionally, the acquiring the target image includes:
detecting a working mode of a camera for shooting the face of the driver;
when the working mode of the camera is an infrared shooting mode, obtaining a gray image obtained by shooting;
calling an image pseudo-color conversion model, inputting the gray level image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the gray level image, wherein the image pseudo-color conversion model is used for converting any gray level image into the three-channel color image corresponding to the gray level image;
and acquiring the output three-channel color image as the target image.
Optionally, before detecting an operating mode of a camera for shooting the face of the driver, the method further includes:
acquiring the current vehicle speed and the current illumination intensity;
and when the vehicle speed is greater than a preset vehicle speed threshold value and the illumination intensity is lower than an illumination intensity threshold value, switching the working mode of the camera to the infrared shooting mode.
Optionally, after detecting the behavior of the driver based on the first area image, the second area image, and the third area image, the method further includes:
when the behavior of the driver belongs to the violation behavior, counting the number of the violation behaviors of the driver in a preset time length;
and when the number of illegal behaviors of the driver reaches a preset number threshold within the preset time, giving an alarm for illegal driving.
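By way of illustration only, the following Python sketch implements the counting-and-alarm logic described above with a sliding time window; the window length, count threshold, and timing source are assumptions rather than values taken from this application.

    import time
    from collections import deque

    class ViolationAlarm:
        def __init__(self, window_seconds=60.0, count_threshold=3):
            self.window_seconds = window_seconds    # preset time length (assumed value)
            self.count_threshold = count_threshold  # preset number threshold (assumed value)
            self.timestamps = deque()               # times at which violations were recognized

        def report(self, is_violation, now=None):
            """Record one recognition result and return True when an alarm should be raised."""
            now = time.time() if now is None else now
            if is_violation:
                self.timestamps.append(now)
            # Discard violations that fall outside the preset time window.
            while self.timestamps and now - self.timestamps[0] > self.window_seconds:
                self.timestamps.popleft()
            return len(self.timestamps) >= self.count_threshold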
In a second aspect, there is provided a behavior recognition apparatus for a driver, the apparatus including:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a target image, and the target image comprises the face of a driver;
the second acquisition module is used for acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior with similarity between the behavior and an illegal behavior larger than a preset threshold;
the image processing module is used for carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value in the target image to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
an identification module to identify a behavior of the driver based on the first area image, the second area image, and the third area image.
Optionally, the apparatus further comprises:
a size adjustment module, configured to adjust the first area image, the second area image, and the third area image into area images of the same size;
the identification module is used for calling a target network model, and the target network model is used for determining the behavior category of any behavior based on a group of regional images corresponding to the behavior;
and identifying the behavior of the driver through the target network model based on the first area image, the second area image and the third area image after the size adjustment.
Optionally, the identification module is configured to:
the target network model comprises an input layer, an intermediate layer, a splicing layer, a full connection layer and an output layer; performing channel superposition processing on image data in the first area image, the second area image and the third area image after the size adjustment through the input layer based on the resolution and the number of channels of the area image after the size adjustment to obtain a first feature map;
performing convolution sampling processing on the first characteristic diagram through the intermediate layer to obtain a plurality of second characteristic diagrams, wherein the second characteristic diagrams are the same in size and different in channel number;
performing channel superposition processing on the plurality of second feature maps through the splicing layer, and performing feature fusion on the feature maps after channel superposition through the convolution layer in the deep network layer to obtain a third feature map;
determining, by the fully-connected layer, behavior of the driver based on the third feature map;
outputting the behavior through the output layer.
Optionally, the identification module is configured to:
the intermediate layer comprises N groups of convolution layers and N groups of sampling layers, and each group of convolution layers corresponds to each group of sampling layers one to one; determining the first feature map as a target feature map with i set to 1; performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of a reference size, and acquiring the 2^i-times feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-times feature map;
when i is smaller than N, letting i = i + 1 and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of the reference size, and acquiring the 2^i-times feature map as the target feature map;
and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolution layers, performing sampling processing once on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
Optionally, the apparatus further comprises a training module configured to:
obtaining a plurality of training samples, wherein the plurality of training samples comprise a plurality of groups of images and behavior categories in each group of images, each group of images comprise regional images of behaviors, reduced regional images determined after the regional images are subjected to reduction processing, and expanded regional images determined after the regional images are subjected to expansion processing;
and training the network model to be trained based on the plurality of training samples to obtain the target network model.
Optionally, the second obtaining module is configured to:
calling a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, wherein the target detection model is used for identifying the face and the preset behavior in the image based on any image;
when the number of the face detection frames is one, determining the face detection frames as target face detection frames; when the number of the face detection frames is multiple, acquiring a face detection frame with the largest area from the multiple face detection frames, and determining the acquired face detection frame as a target face detection frame;
and determining the first area image based on the target face detection frame.
Optionally, the second obtaining module is configured to:
filtering out the preset behavior detection frames which do not overlap the target face detection frame or whose distance to the target face detection frame is greater than a preset distance threshold;
and cropping out, from the target image, the regions corresponding to the preset behavior detection frames remaining after filtering to obtain the first region image.
Optionally, the first obtaining module is configured to:
detecting a working mode of a camera for shooting the face of the driver;
when the working mode of the camera is an infrared shooting mode, obtaining a gray image obtained by shooting;
calling an image pseudo-color conversion model, inputting the gray level image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the gray level image, wherein the image pseudo-color conversion model is used for converting any gray level image into the three-channel color image corresponding to the gray level image;
and acquiring the output three-channel color image as the target image.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring the current vehicle speed and the illumination intensity;
and the switching module is used for switching the working mode of the camera to the infrared shooting mode when the vehicle speed is greater than a preset vehicle speed threshold value and the illumination intensity is lower than an illumination intensity threshold value.
Optionally, the apparatus further comprises:
the counting module is used for counting the number of illegal behaviors of the driver within a preset time length when the behavior of the driver belongs to the illegal behaviors;
and the alarm module is used for giving an alarm for illegal driving when the number of illegal behaviors of the driver reaches a preset number threshold within the preset time length.
In a third aspect, a smart device is provided, the smart device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the behavior recognition method for a driver according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, which stores instructions that, when executed by a processor, implement the behavior recognition method for a driver according to the first aspect.
In a fifth aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the method for driver behavior recognition as described above in the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the method comprises the steps of obtaining a target image including a face of a driver, obtaining a first area image including a preset behavior of the driver from the target image, wherein the preset behavior is similar to an illegal behavior, namely obtaining the first area image of the preset behavior close to the illegal behavior from the target image. And then, performing inner truncation and outer expansion of different scales on the first region image in the target image to obtain a second region image and a third region image. Because the first area image may only include the preset behavior, but does not include or include a very small part of the peripheral object related to the preset behavior, and whether the preset behavior is really an illegal behavior cannot be accurately determined based on the first area image, after the first area image is subjected to internal truncation and external expansion, the obtained second area image and third area image can include more information of the peripheral object related to the preset behavior, so that the behavior of the driver can be detected based on the three area images, and the detection accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic illustration of a captured image of a driver's face, shown in accordance with an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of driver behavior recognition according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a driver's face detection box in accordance with an exemplary embodiment;
FIG. 4 is a schematic illustration of a region image shown in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating a process flow for convolution sampling of an intermediate layer in accordance with an exemplary embodiment;
FIG. 6 is a schematic structural diagram of a driver behavior recognition apparatus according to an exemplary embodiment;
FIG. 7 is a schematic structural diagram of a driver behavior recognition apparatus according to another exemplary embodiment;
FIG. 8 is a schematic structural diagram of a driver behavior recognition apparatus according to another exemplary embodiment;
FIG. 9 is a schematic structural diagram of a driver behavior recognition apparatus according to another exemplary embodiment;
FIG. 10 is a schematic structural diagram of a driver behavior recognition apparatus according to another exemplary embodiment;
FIG. 11 is a block diagram of a terminal 1000 according to an exemplary embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before describing the behavior recognition method of the driver provided by the embodiment of the present application in detail, the application scenario and the implementation environment related to the embodiment of the present application are briefly described.
First, a brief description is given of an application scenario related to an embodiment of the present application.
Image recognition is currently widely used in the field of intelligent transportation and, in some embodiments, may be used to detect driver behavior. While the driver is driving, the driver's face can be captured, and the captured images are then detected and analyzed by a trained network model to determine whether the driver exhibits a violation. However, some behaviors of the driver may resemble violations. For example, referring to FIG. 1, the driver's face-touching gesture is similar to the phone-call gesture; in this case, when detection and analysis are performed by the trained network model, the face-touching gesture is easily misjudged as a violation, resulting in a false alarm that not only disturbs passengers but may also distract the driver. Therefore, the embodiments of the present application provide a driver behavior recognition method that can accurately determine whether the driver's behavior is a violation; for a specific implementation, refer to the embodiment shown in FIG. 2 below.
Next, a brief description will be given of an implementation environment related to the embodiments of the present application.
The driver behavior recognition method provided by the embodiments of the present application may be executed by a smart device. The smart device may be configured with or connected to a camera, and the camera may be installed in an area in front of the driver such as the center console, the instrument panel, or the A pillar, so that the driver's face can be captured by the camera and clear images around the driver can be obtained in real time. In some embodiments, the smart device may be a terminal such as a mobile phone, a tablet computer, or a computer device, or the smart device may be a smart camera device, which is not limited in the embodiments of the present application.
After the application scenarios and the implementation environments related to the embodiments of the present application are described, the driver behavior recognition method provided by the embodiments of the present application will be described in detail with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for identifying a driver's behavior according to an exemplary embodiment, where the method is performed by the above-mentioned smart device in the embodiment of the present application, and the method for identifying a driver's behavior may include the following implementation steps:
step 201: a target image is acquired, the target image including a face of a driver.
In a possible implementation, the smart device may obtain one video image captured by the camera every duration threshold to obtain the target image. Equivalently, while the camera is capturing video, the smart device may take one video image frame every preset number of frames as the target image.
The duration threshold value can be set by a user according to actual needs in a self-defined manner, and can also be set by the intelligent device in a default manner, which is not limited in the embodiment of the application.
In addition, the preset number can be set by a user according to actual requirements in a self-defined mode, and can also be set by the intelligent device in a default mode. For example, the predetermined number may be 5, and the target image may be a first video image frame, a sixth video image frame, an eleventh video image frame, and the like.
Further, the target image may be a three-channel color image, where the three channels are R, G, and B, representing the red, green, and blue values of each pixel. In implementation, the smart device detects the working mode of the camera used to capture the driver's face. When the working mode is the infrared shooting mode, the smart device obtains the captured grayscale image, calls an image pseudo-color conversion model, inputs the grayscale image into the model, and outputs the corresponding three-channel color image; the image pseudo-color conversion model is used to convert any grayscale image into its corresponding three-channel color image. The output three-channel color image is then taken as the target image.
That is, the camera can autonomously switch its working mode according to the day and night light intensity, and the working modes may include an infrared shooting mode and a non-infrared shooting mode. In the non-infrared shooting mode, the image captured by the camera is a three-channel color image, which can be used directly as the target image. In the infrared shooting mode, however, the captured image is a grayscale image; because the grayscale image loses color information, it may increase the difficulty of subsequent processing for the smart device. Therefore, when the camera is detected to be in the infrared shooting mode, the grayscale image can be converted into a three-channel color image, that is, its pseudo-color information is restored, and the converted three-channel color image is taken as the target image.
In some embodiments, the grayscale image may be converted to a corresponding three-channel color image by an image pseudo-color conversion model. That is, the grayscale image may be input to a pseudo-color conversion model that outputs a three-channel color image of the same size as the grayscale image.
Further, before the image pseudo color conversion model is called, a plurality of gray scale image samples and color image samples corresponding to each gray scale image sample can be obtained, and the image pseudo color conversion model is obtained after a network to be trained is trained on the basis of the gray scale image samples and the color image samples corresponding to each gray scale image sample.
For example, more than 500,000 color image samples cropped from natural scenes may be obtained and converted into corresponding grayscale image samples by the following formula (1), and the color image samples and grayscale image samples are then input into the network to be trained. Formula (1) is:
H=R*0.299+G*0.587+B*0.114 (1)
where H represents the gray value of a pixel, R the red value, G the green value, and B the blue value of that pixel.
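As a concrete illustration of formula (1), the short NumPy sketch below converts a color sample to a grayscale sample; the H x W x 3 array layout with RGB channel order is an assumption.

    import numpy as np

    def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
        # H = R*0.299 + G*0.587 + B*0.114, applied per pixel (formula (1)).
        r = rgb[..., 0].astype(np.float32)
        g = rgb[..., 1].astype(np.float32)
        b = rgb[..., 2].astype(np.float32)
        gray = r * 0.299 + g * 0.587 + b * 0.114
        return gray.astype(np.uint8)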
In a possible implementation, the network to be trained may be, but is not limited to, a conditional generative adversarial network. In this case, during training the network includes a generator and a discriminator: the generator produces a color image of the same size from the grayscale image sample through parameter superposition, and the discriminator produces a training loss by discriminating the difference between the generated color image and the color image sample corresponding to that grayscale image sample, so that the network is trained iteratively.
Further, before detecting the working mode of the camera for shooting the face of the driver, the intelligent device acquires the current vehicle speed and the illumination intensity, and when the vehicle speed is greater than a preset vehicle speed threshold value and the illumination intensity is lower than an illumination intensity threshold value, the working mode of the camera is switched to the infrared shooting mode.
That is to say, only when the vehicle speed is greater than a certain preset vehicle speed threshold value and the illumination intensity is low, the camera automatically switches to the infrared shooting mode, otherwise, the camera performs video shooting in the non-infrared shooting mode, that is, performs video shooting in the conventional mode.
The preset vehicle speed threshold value can be set by a user according to actual requirements, and can also be set by the intelligent device in a default mode, and the embodiment of the application is not limited.
In addition, the illumination intensity threshold may be set by a user according to actual needs, or may be set by default by the smart device, which is not limited in the embodiment of the present application.
Further, the above description only takes as an example the camera being controlled to switch automatically to the infrared shooting mode under the light-intensity condition. In another embodiment, in order to warn the driver against fatigued driving, it may be necessary to detect whether the driver's eyes are closed; in this case, if the driver wears sunglasses, the camera also needs to be switched to the infrared shooting mode. The user may then manually trigger an infrared shooting switching instruction, and the smart device switches the camera to the infrared shooting mode based on that instruction.
It should be noted that, the above description is only given by taking the target image as a three-channel color image as an example, and in another embodiment, the target image may also be a grayscale image, which is not limited in this embodiment of the present application.
It should be further noted that the above description only takes as an example automatically switching the camera to the infrared shooting mode when the vehicle speed is greater than the preset vehicle speed threshold and the illumination intensity is low, and otherwise shooting in the non-infrared shooting mode. In another embodiment, the camera may not be turned on when the vehicle speed is below the preset vehicle speed threshold; that is, the driver's behavior may not be detected if the vehicle speed is below the preset vehicle speed threshold. Therefore, before the target image is acquired, it can be judged whether the vehicle speed is greater than the preset vehicle speed threshold; when it is, the operation of acquiring the target image is executed, and otherwise the camera is not started, that is, the operation of acquiring the target image is not executed.
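The acquisition logic of step 201 can be sketched in Python as below; get_vehicle_speed, get_illumination, pseudo_color_model, and the camera interface are hypothetical helpers, and the threshold values are placeholders rather than values from this application.

    SPEED_THRESHOLD = 30.0   # preset vehicle speed threshold (placeholder)
    LUX_THRESHOLD = 50.0     # illumination intensity threshold (placeholder)
    FRAME_INTERVAL = 5       # one frame taken every 5 video frames (example from the text)

    def acquire_target_image(camera, frame_index):
        """Return the target image for this frame, or None if no image should be taken."""
        if get_vehicle_speed() <= SPEED_THRESHOLD:     # hypothetical speed sensor helper
            return None                                # camera not started below the speed threshold
        if get_illumination() < LUX_THRESHOLD:         # hypothetical light sensor helper
            camera.set_mode("infrared")                # low light at speed: infrared shooting mode
        else:
            camera.set_mode("normal")
        if frame_index % FRAME_INTERVAL != 0:
            return None                                # sample one frame per preset number of frames
        frame = camera.read()
        if camera.mode == "infrared":
            frame = pseudo_color_model(frame)          # hypothetical grayscale -> three-channel model
        return frame                                   # the target image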
Step 202: and acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior with similarity larger than a preset threshold value with the illegal behavior.
The preset behavior can be set by a user according to actual needs in a self-defined mode, and can also be set by the intelligent device in a default mode.
For example, the preset behavior may include making a phone call, smoking, touching the chin, covering the mouth, touching the side of the face, touching the ears, and so on. Here, making a phone call refers to the gesture of holding a mobile phone close to the ear.
In addition, the preset threshold may be set by a user according to actual requirements in a self-defined manner, or may be set by default by the computer device, which is not limited in the embodiment of the present application.
That is to say, the intelligent device determines the region where the preset behavior similar to the violation behavior of the driver is located from the target image, and then obtains the determined region from the target image, so that the preset behavior can be further analyzed in detail later, and whether the preset behavior is really the violation behavior is determined.
In some embodiments, the specific implementation of acquiring the first region image from the target image may include: and calling a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, wherein the target detection model is used for identifying the face and the preset behavior in the image based on any image. When the number of the face detection frames is one, determining the face detection frames as target face detection frames; when the number of the face detection frames is multiple, the face detection frame with the largest area is obtained from the multiple face detection frames, the obtained face detection frame is determined as a target face detection frame, and the first area image is determined based on the target face detection frame.
That is, the object detection model can not only detect a human face, but also detect a region corresponding to a predetermined action, such as a call, a face touch, a smoke, and the like. In general, the number of the detected predetermined behavior detection frames is plural, and the predetermined behavior corresponding to one of the predetermined behavior detection frames may not be of the driver, for example, may be of a passenger standing beside the driver, and the smart device may filter out the predetermined behavior detection frames unrelated to the driver. In an implementation, the face detection box of the driver may be determined, and then the filtering operation is performed based on the face detection box of the driver.
Since the camera is used to capture the driver's face, when only one face detection frame is output, that face detection frame can be determined to correspond to the driver's face and is determined as the target face detection frame. In other embodiments, a passenger may be standing near the driver, for example behind the driver's side and facing the camera; in this case the target detection model may detect multiple face detection frames. Since the camera is aimed at the driver's face, the face detection frame with the largest area can be determined to correspond to the driver's face, for example as shown in FIG. 3, and that face detection frame is determined as the target face detection frame.
Further, the specific implementation of determining the first region image based on the target face detection frame may include: filtering out the preset behavior detection frames which do not overlap the target face detection frame or whose distance to the target face detection frame is greater than a preset distance threshold, and cropping out, from the target image, the regions corresponding to the preset behavior detection frames remaining after filtering to obtain the first region image.
The smart device may filter the preset behavior detection box that does not belong to the preset behavior of the driver based on the target face detection box. It is understood that if the preset behavior detection frame is not overlapped with the target face detection frame, it indicates that the preset behavior detection frame is far away from the face of the driver, so that it can be determined that the preset behavior corresponding to the preset behavior detection frame is not the behavior of the driver. In addition, the preset behavior detection frame may also be determined according to a distance between the preset behavior detection frame and the target face detection frame, and when the distance between the preset behavior detection frame and the target face detection frame is greater than a preset distance threshold, it is also indicated that the preset behavior detection frame is farther from the face of the driver, so that it may be determined that the preset behavior corresponding to the preset behavior detection frame is not the behavior of the driver.
The preset distance threshold value can be set by a user according to actual needs in a self-defined mode, and can also be set by the intelligent device in a default mode.
After the intelligent device filters the preset behavior detection frames, the remaining preset behavior detection frames can be determined to be detection frames corresponding to the preset behavior of the driver, and the intelligent device cuts out the corresponding region from the target image to obtain a first region image.
It should be noted that if none of the currently output preset behavior detection frames overlaps the target face detection frame, or the distances between them and the target face detection frame are all greater than the preset distance threshold, it may be determined that the driver currently has no violation in the target image.
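A small Python sketch of the selection and filtering rules in step 202, reading the rule literally (a behavior box is dropped if it does not overlap the driver's face box or lies farther away than the threshold); the (x1, y1, x2, y2) box format and the distance threshold are assumptions.

    def boxes_overlap(a, b):
        # Axis-aligned boxes overlap unless they are separated on some axis.
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    def center_distance(a, b):
        ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
        bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

    def filter_behavior_boxes(face_boxes, behavior_boxes, dist_threshold=200.0):
        if not face_boxes:
            return []
        # The largest face box is taken as the driver's (target) face detection frame.
        target_face = max(face_boxes, key=lambda f: (f[2] - f[0]) * (f[3] - f[1]))
        return [b for b in behavior_boxes
                if boxes_overlap(b, target_face)
                and center_distance(b, target_face) <= dist_threshold]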
Further, when the fact that the driver wears sunglasses is detected through the target detection model, the camera can be automatically controlled to be switched to the infrared shooting mode. That is, if it is detected by the object detection model that the driver wears sunglasses during the daytime, the camera may be switched to the infrared photographing mode in order to detect whether there is eye-closing behavior of the driver.
Further, before the target detection model is called, a plurality of image samples and the face frame and the preset behavior frame in each image sample can be obtained, and the plurality of image samples and the face frame and the preset behavior frame in each image sample are input to the detection model to be trained for training to obtain the target detection model.
The face frame and the preset behavior frame in each image sample can be calibrated in advance, that is, based on the plurality of image samples and the calibrated face frame and the preset behavior frame in the plurality of image samples, deep learning and training are performed on the detection model to be trained, so that the obtained target detection model can automatically detect the face and the preset behavior in any image.
In a possible implementation, the detection model to be trained may be a YOLO (You Only Look Once) network, an SSD (Single Shot Detector), or the like, which is not limited in this embodiment.
Step 203: in the target image, the peripheral area of the first area image is subjected to reduction processing according to a first proportion threshold value to obtain a second area image, and the peripheral area of the first area image is subjected to expansion processing according to a second proportion threshold value to obtain a third area image.
The first area image may contain only the preset behavior and little or none of the object related to it, such as a mobile phone or a cigarette; that is, the mobile phone or cigarette may not be fully contained in the first area image or may occupy only a small part of it. In addition, such objects may appear at different sizes depending on the camera installation and their own dimensions. In order to further determine whether the preset behavior is really a violation, the smart device performs inward cropping and outward expansion of different scales on the region corresponding to the first area image; that is, in the target image, the peripheral area of the first area image is reduced according to a first proportion threshold and expanded according to a second proportion threshold to obtain the second area image and the third area image.
For example, referring to fig. 4, assuming that the first scale threshold is 0.8 and the second scale threshold is 1.2, the second area image and the third area image shown in fig. 4 at 41 and 42 can be obtained after performing the inner truncation and the outer extension on the first area image from the target image, wherein 43 in fig. 4 is the first area image.
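The inward cropping and outward expansion of step 203 can be sketched in Python as below; the (x1, y1, x2, y2) box format is an assumption, and the 0.8/1.2 defaults simply mirror the example scale thresholds above.

    def rescale_box(box, scale, img_w, img_h):
        # Scale a box about its centre and clamp it to the image border.
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        w, h = (x2 - x1) * scale, (y2 - y1) * scale
        nx1, ny1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
        nx2, ny2 = min(img_w, int(cx + w / 2)), min(img_h, int(cy + h / 2))
        return nx1, ny1, nx2, ny2

    def crop_three_regions(image, first_box, inner_scale=0.8, outer_scale=1.2):
        img_h, img_w = image.shape[:2]
        x1, y1, x2, y2 = first_box
        first = image[y1:y2, x1:x2]                   # first region image
        b2 = rescale_box(first_box, inner_scale, img_w, img_h)
        b3 = rescale_box(first_box, outer_scale, img_w, img_h)
        second = image[b2[1]:b2[3], b2[0]:b2[2]]      # second region image (reduced)
        third = image[b3[1]:b3[3], b3[0]:b3[2]]       # third region image (expanded)
        # Step 204 then resizes all three crops to one common size, e.g. 256 x 256.
        return first, second, third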
It is worth mentioning that the first area image, the second area image and the third area image are obtained from the target image, and redundant information in the whole target image is eliminated, so that compared with behavior detection based on the whole target image, subsequent behavior discrimination processing based on the first area image, the second area image and the third area image can improve the detection accuracy.
The first proportional threshold may be set by self-definition according to actual requirements, or may be set by default by the intelligent device, which is not limited in the embodiment of the present application.
The second proportional threshold value may be set by self-definition according to actual requirements, or may be set by default by the intelligent device, which is not limited in the embodiment of the present application.
Step 204: and adjusting the first area image, the second area image and the third area image into area images with the same size.
In this embodiment of the application, in order to facilitate subsequent recognition of the behavior of the driver based on the first area image, the second area image, and the third area image, the smart device resizes the three area images, that is, scales the area images in different scales to the same size.
Step 205: and calling a target network model, wherein the target network model is used for determining the behavior category of any behavior based on a group of regional images corresponding to the behavior.
The group of area images corresponding to any behavior comprises an area image, a reduced area image and an expanded area image of the behavior, wherein the reduced area image is an area image determined after the periphery of the area where the behavior is located is reduced from a shot image where the behavior is located, and the expanded area image is an area image determined after the periphery of the area where the behavior is located is expanded from the shot image where the behavior is located. Further, the size of the area image, the reduced area image, and the expanded area image of the behavior are the same.
Further, before the target network model is called, a plurality of training samples can be obtained, the plurality of training samples include a plurality of groups of images and behavior categories in each group of images, each group of images includes a region image of a behavior, a reduced region image determined after the region image is reduced, an extended region image determined after the region image is extended, and the target network model is obtained after the network model to be trained is trained based on the plurality of training samples.
That is, the target network model may be obtained by training in advance. In the training process, the multiple groups of images and the behavior categories in each group of images can be input into a network model to be trained for training. The images of each group include a region image of a behavior, a reduced region image and an expanded region image, that is, the region image of each group of images can be obtained by capturing a region where the behavior is located from a captured image where the behavior is located, the reduced region image can be obtained by reducing a region around the region where the behavior is located from the captured image where the behavior is located, and the expanded region image can be obtained by expanding a region around the region where the behavior is located from the captured image where the behavior is located. Further, the three area images in each group of images have the same size, that is, after the reduction processing and the expansion processing, the size of the obtained area image and the area image corresponding to the behavior can be adjusted, so that the sizes of the three area images corresponding to the behavior are consistent.
In addition, because different channels contain different information, in order to enable the network model to be trained to learn the weight of each channel for the recognition task, a squeeze-and-excitation (compression-activation) module can be added to the network model to be trained to weight its channels, so that the learning of the network model focuses more on the regions of interest and the detection accuracy of the target network model is improved.
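A minimal PyTorch sketch of such a channel-weighting (squeeze-and-excitation style) module is shown below; the reduction ratio and layer sizes are assumptions, not parameters disclosed in this application.

    import torch
    import torch.nn as nn

    class SqueezeExcitation(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.squeeze = nn.AdaptiveAvgPool2d(1)          # global average pool per channel
            self.excite = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),                               # per-channel weights in [0, 1]
            )

        def forward(self, x):
            b, c, _, _ = x.shape
            weights = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
            return x * weights                              # re-weight the feature map channels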
Step 206: and identifying the behavior of the driver through the target network model based on the resized first area image, second area image and third area image.
Further, the target network model includes an input layer, an intermediate layer, a splicing layer, a full-link layer, and an output layer, and the specific implementation of identifying the behavior of the driver through the target network model based on the resized first area image, second area image, and third area image may include: and carrying out channel superposition processing on image data in the first area image, the second area image and the third area image after the size adjustment through the input layer based on the resolution and the number of channels of the area image after the size adjustment to obtain a first characteristic diagram. And carrying out convolution sampling processing on the first characteristic diagram through the middle layer to obtain a plurality of second characteristic diagrams, wherein the second characteristic diagrams are the same in size and different in channel number, carrying out channel superposition processing on the second characteristic diagrams through the splicing layer, and carrying out characteristic fusion on the characteristic diagrams after channel superposition through the convolution layer in the deep layer of the network to obtain a third characteristic diagram. And determining the behavior of the driver based on the third characteristic diagram through the full connection layer, and outputting the behavior through the output layer.
For example, assuming that the resolution of each region image is 256 × 256 and the number of channels of each region image is 3, the three resized region images (the first, second, and third area images) are input into the target network model and stacked along the channel dimension by the input layer, yielding a three-dimensional matrix of 256 × 256 × 9, where 9 is the number of channels after stacking; this three-dimensional matrix is the first feature map.
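A minimal sketch of the channel stacking performed by the input layer, using NumPy placeholders for the three resized region images (channel-last layout is an assumption):

    import numpy as np

    first_region = np.zeros((256, 256, 3), dtype=np.uint8)   # resized first region image
    second_region = np.zeros((256, 256, 3), dtype=np.uint8)  # resized second region image
    third_region = np.zeros((256, 256, 3), dtype=np.uint8)   # resized third region image

    # Stack the three 3-channel images along the channel axis: 3 + 3 + 3 = 9 channels.
    first_feature_map = np.concatenate([first_region, second_region, third_region], axis=-1)
    assert first_feature_map.shape == (256, 256, 9)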
Then, convolution sampling processing is performed on the first feature map through the intermediate layer of the target network model. In a possible implementation, the intermediate layer includes N groups of convolution layers and N groups of sampling layers, each group of convolution layers corresponding one-to-one to a group of sampling layers, where N is an integer greater than 1. Further, each group of convolution layers may include at least one convolution layer, and each group of sampling layers may include at least one sampling layer. In this case, obtaining the plurality of second feature maps by performing convolution sampling processing on the first feature map through the intermediate layer may include: determining the first feature map as a target feature map with i set to 1; performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of a reference size, and acquiring the 2^i-times feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-times feature map; when i is smaller than N, letting i = i + 1 and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-times feature map and an i-th second feature map of the reference size, and acquiring the 2^i-times feature map as the target feature map; and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolution layers, performing sampling processing once on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
The reference size may be set according to actual requirements, or may be set by default by the smart device; this is not limited in the embodiments of the present application.
Convolution by the convolutional layers extracts image features useful for identifying the behavior category, and sampling by the sampling layers yields multi-scale feature maps. Referring to Fig. 5, in implementation, the first feature map output by the input layer is fed into the first group of convolutional layers to obtain a multi-channel feature map, which is then downsampled at two different rates by the 2 × 2 and 8 × 8 sampling layers of the first group to obtain a 2-fold feature map and a second feature map of the reference size. The 2-fold feature map is then fed into the second group of convolutional layers, and the resulting multi-channel feature map is downsampled by the 2 × 2 and 4 × 4 sampling layers of the second group to obtain a 4-fold feature map and another second feature map of the reference size. Convolution and sampling continue in this manner until the N-th group of convolutional layers performs its convolution and the N-th group of sampling layers applies a single 2 × 2 sampling to obtain the N-th second feature map of the reference size, at which point the convolution-sampling operation ends.
Still referring to Fig. 5, after the plurality of second feature maps are obtained, they are channel-stacked by the splicing layer and then fused by a convolutional layer to obtain the third feature map. In other words, before the final prediction, the feature maps from the shallow layers of the network are sampled at different rates to a common size, stacked along the channel dimension, and fused by a convolutional layer; the fused feature map is then passed to the fully connected layer.
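Continuing the same sketch, the splicing layer and the fusion convolution can be expressed as a channel concatenation followed by a 1 × 1 convolution; the kernel size and the number of fused channels are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SpliceAndFuse(nn.Module):
    """Sketch of the splicing layer plus the deep fusion convolution: the second
    feature maps (same spatial size, different channel counts) are stacked along
    the channel axis and mixed into the third feature map."""

    def __init__(self, total_in_channels, fused_channels=128):
        super().__init__()
        self.fuse = nn.Conv2d(total_in_channels, fused_channels, kernel_size=1)

    def forward(self, second_feature_maps):
        stacked = torch.cat(second_feature_maps, dim=1)  # channel stacking
        return torch.relu(self.fuse(stacked))            # third feature map
```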
It is worth noting that fusing feature maps at multiple scales allows the target network model to learn the feature distribution of objects at different scales, which improves the accuracy of behavior detection.
It should be noted that the convolution kernel size of each convolutional layer may be preset according to actual requirements, while the parameters within each convolution kernel are learned during training. In addition, parameters such as input image size, feature map sizes, channel numbers, number of downsampling operations, and network depth may differ between the target network models used for different behaviors.
After the third feature map is obtained, the fully connected layer determines the behavior of the driver based on it, and the output layer then outputs the behavior, for example whether or not the driver is making a call. Further, the behavior may be represented by a behavior identifier, for example outputting "1" for making a call and "0" for not making a call. In some embodiments, a confidence value in the interval [0, 1] may be attached to the output; when the confidence is greater than 0.5, the behavior is classified as a violation, otherwise it is not.
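A corresponding sketch of the fully connected layer and the thresholded output, assuming a single confidence score per sample (the feature sizes and single-output design are assumptions; the embodiment only fixes the 0.5 threshold and the 0/1 behavior identifiers):

```python
import torch
import torch.nn as nn

class BehaviorHead(nn.Module):
    """Maps the third feature map to a confidence in [0, 1] and a 0/1 behavior
    identifier using the 0.5 decision threshold described above."""

    def __init__(self, channels=128, spatial=32):
        super().__init__()
        self.fc = nn.Linear(channels * spatial * spatial, 1)

    def forward(self, third_feature_map):
        flat = third_feature_map.flatten(start_dim=1)
        confidence = torch.sigmoid(self.fc(flat))
        # "1" (e.g. making a call, a violation) if confidence > 0.5, else "0".
        behavior_id = (confidence > 0.5).long()
        return behavior_id, confidence
```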
It should be noted that the above description uses a target network model consisting of an input layer, an intermediate layer, a splicing layer, a fully connected layer, and an output layer only as an example. In other embodiments, the target network model may further include other layers that perform additional operations on the feature maps, for example a BN (Batch Normalization) layer, which is not limited in this embodiment.
Further, when the behavior of the driver is a violation, the number of violations by the driver within a preset time period is counted, and when that number reaches a preset count threshold within the preset time period, an illegal-driving alarm is issued.
That is, when the behavior of the driver is determined to be a violation, the number of consecutive violations within a certain time period is counted: among all video image frames within the preset time period, the frames in which the detected driver behavior is a violation are counted, and if that count reaches the preset count threshold, the driver is determined to be driving dangerously. For example, if the detection result indicates a violation in 75% of all video image frames within 3 seconds, an illegal-driving alarm prompt may be issued.
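The counting rule can be sketched as a sliding window over per-frame detection results; the 90-frame window stands in for 3 seconds at an assumed 30 fps, and 0.75 mirrors the 75% example:

```python
from collections import deque

def should_alarm(frame_results, window_size=90, ratio_threshold=0.75):
    """frame_results: iterable of booleans, True when the frame's detection
    result is a violation. Returns True once the proportion of violation
    frames within the window reaches the threshold."""
    window = deque(maxlen=window_size)
    for is_violation in frame_results:
        window.append(is_violation)
        if len(window) == window_size and sum(window) / window_size >= ratio_threshold:
            return True  # issue the illegal-driving alarm prompt
    return False
```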
In some embodiments, the illegal-driving alarm prompt may be issued through a buzzer, or by voice broadcast, so as to remind the driver to drive safely and to allow passengers to supervise the driver's behavior.
In this way, the illegal-driving alarm is issued only when the number of violations reaches the preset count threshold, which avoids false alarms caused by detection errors in individual video image frames and improves alarm accuracy.
The preset time period may be customized by the user according to actual needs, or set by default by the smart device.
The preset count threshold may likewise be customized by the user according to actual needs, or set by default by the smart device.
In the embodiment of the present application, a target image including the face of the driver is acquired, and a first area image is obtained from the target image, the first area image containing a preset behavior of the driver whose similarity to a violation exceeds a preset threshold. Inner truncation and outer expansion at different scales are then applied to the first area image within the target image to obtain a second area image and a third area image. Because the first area image may contain only a small part of the preset behavior, and little or none of the surrounding objects related to it, the preset behavior cannot be reliably judged as a violation from the first area image alone. After inner truncation and outer expansion, the second and third area images contain more information about the surrounding objects related to the preset behavior, so detecting the driver's behavior based on all three area images improves detection accuracy.
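The inner-truncation and outer-expansion step can be sketched as two center-preserving crops of the first region's bounding box; the 0.8 and 1.2 ratios are illustrative stand-ins for the first and second proportion thresholds, and clamping the expanded crop to the image bounds is an assumption:

```python
import numpy as np

def shrink_and_expand(image, box, shrink_ratio=0.8, expand_ratio=1.2):
    """Given the first region's bounding box (x1, y1, x2, y2) inside the target
    image, return an inner-truncated crop and an outer-expanded crop around the
    same center."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    bw, bh = x2 - x1, y2 - y1

    def crop(ratio):
        half_w, half_h = bw * ratio / 2.0, bh * ratio / 2.0
        nx1, ny1 = int(max(0, cx - half_w)), int(max(0, cy - half_h))
        nx2, ny2 = int(min(w, cx + half_w)), int(min(h, cy + half_h))
        return image[ny1:ny2, nx1:nx2]

    second_region = crop(shrink_ratio)  # inner truncation
    third_region = crop(expand_ratio)   # outer expansion, clamped to the image
    return second_region, third_region
```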
In addition, channel fusion of the three area images lets the target network model focus on information within a certain range around the object, so that the behavior category of the driver can be determined accurately, which also improves the robustness of the target network model.
Fig. 6 is a schematic structural diagram illustrating a driver's behavior recognizing apparatus according to an exemplary embodiment, which may be implemented by software, hardware, or a combination of both. The behavior recognizing device of the driver may include:
a first obtaining module 501, configured to obtain a target image, where the target image includes a face of a driver;
a second obtaining module 502, configured to obtain a first area image from the target image, where the first area image includes a preset behavior of the driver, and the preset behavior is a behavior in which a similarity between the preset behavior and an illegal behavior is greater than a preset threshold;
an image processing module 503, configured to perform reduction processing on the peripheral area of the first area image according to a first proportion threshold value in the target image to obtain a second area image, and perform expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
an identifying module 504 configured to identify a behavior of the driver based on the first area image, the second area image, and the third area image.
Optionally, referring to fig. 7, the apparatus further includes:
a resizing module 505, configured to resize the first area image, the second area image, and the third area image into area images of the same size;
the identification module 504 is configured to invoke a target network model, where the target network model is configured to determine a behavior category of any behavior based on a set of region images corresponding to the behavior;
and identifying the behavior of the driver through the target network model based on the first area image, the second area image and the third area image after the size adjustment.
Optionally, the identifying module 504 is configured to:
the target network model comprises an input layer, an intermediate layer, a splicing layer, a full connection layer and an output layer; performing channel superposition processing on image data in the first area image, the second area image and the third area image after the size adjustment through the input layer based on the resolution and the number of channels of the area image after the size adjustment to obtain a first feature map;
performing convolution sampling processing on the first characteristic diagram through the intermediate layer to obtain a plurality of second characteristic diagrams, wherein the second characteristic diagrams are the same in size and different in channel number;
performing channel superposition processing on the plurality of second feature maps through the splicing layer, and performing feature fusion on the feature maps after channel superposition through the convolution layer in the deep network layer to obtain a third feature map;
determining, by the fully-connected layer, behavior of the driver based on the third feature map;
outputting the behavior through the output layer.
Optionally, the identifying module 504 is configured to:
the intermediate layer comprises N groups of convolution layers and N groups of sampling layers, and each group of convolution layers corresponds one-to-one to a group of sampling layers; determining the first feature map as a target feature map by taking i as 1; performing convolution processing on the target feature map through the i-th group of convolution layers, and performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-fold feature map and an i-th second feature map of a reference size, and obtaining the 2^i-fold feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-fold feature map;
when i is smaller than N, making i equal to i + 1, and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map according to two different multiples through the i-th group of sampling layers to obtain a 2^i-fold feature map and an i-th second feature map of the reference size, and obtaining the 2^i-fold feature map as the target feature map;
and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolution layers, performing sampling processing once on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
Optionally, referring to fig. 8, the apparatus further includes a training module 506, where the training module 506 is configured to:
obtaining a plurality of training samples, wherein the plurality of training samples comprises a plurality of groups of images and the behavior category of each group of images; each group of images comprises a region image of a behavior, a reduced region image obtained by applying reduction processing to the region image, and an expanded region image obtained by applying expansion processing to the region image;
and training the network model to be trained based on the plurality of training samples to obtain the target network model.
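A minimal training sketch consistent with the module above, assuming each batch pairs the channel-stacked region-image groups with 0/1 behavior labels; the optimizer, loss, epoch count, and learning rate are illustrative choices, not part of this embodiment:

```python
import torch
import torch.nn as nn

def train_target_model(model, loader, epochs=10, lr=1e-3):
    """model: maps a stacked region-image batch to one raw score (logit) per
    sample; loader: yields (stacked_regions, labels) with labels in {0, 1}."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for stacked_regions, labels in loader:
            logits = model(stacked_regions)
            loss = criterion(logits.squeeze(1), labels.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```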
Optionally, the second obtaining module 502 is configured to:
calling a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, wherein the target detection model is used for identifying the face and the preset behavior in the image based on any image;
when the number of the face detection frames is one, determining the face detection frames as target face detection frames; when the number of the face detection frames is multiple, acquiring a face detection frame with the largest area from the multiple face detection frames, and determining the acquired face detection frame as a target face detection frame;
and determining the first area image based on the target face detection frame.
Optionally, the second obtaining module 502 is configured to:
filtering out the preset behavior detection frames which are not overlapped with the target face detection frame or have the distance with the target face detection frame larger than a preset distance threshold value;
and cutting out the region corresponding to the filtered residual preset behavior detection frame from the target image to obtain the first region image.
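A sketch of this filtering step, reading it as discarding behavior boxes that neither overlap the target face box nor lie within the distance threshold; the (x1, y1, x2, y2) box format and the center-to-center distance are assumptions for illustration:

```python
def filter_behavior_boxes(face_box, behavior_boxes, max_distance):
    """Keep only preset-behavior boxes that overlap the face box or whose
    center distance to it does not exceed max_distance."""

    def overlaps(a, b):
        return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

    def center_distance(a, b):
        ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
        bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

    return [box for box in behavior_boxes
            if overlaps(face_box, box) or center_distance(face_box, box) <= max_distance]
```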
Optionally, the first obtaining module 501 is configured to:
detecting a working mode of a camera for shooting the face of the driver;
when the working mode of the camera is an infrared shooting mode, obtaining a gray image obtained by shooting;
calling an image pseudo-color conversion model, inputting the gray level image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the gray level image, wherein the image pseudo-color conversion model is used for converting any gray level image into the three-channel color image corresponding to the gray level image;
and acquiring the output three-channel color image as the target image.
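The embodiment uses a learned image pseudo-color conversion model; as a simple stand-in for experimentation, a fixed OpenCV colormap also turns an infrared grayscale frame into a three-channel color image:

```python
import cv2
import numpy as np

def gray_to_pseudo_color(gray_image):
    """Map a single-channel grayscale frame to a three-channel (BGR) image via a
    fixed colormap. This is only a stand-in for the learned conversion model."""
    if gray_image.dtype != np.uint8:
        gray_image = cv2.normalize(gray_image, None, 0, 255,
                                   cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(gray_image, cv2.COLORMAP_JET)
```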
Optionally, referring to fig. 9, the apparatus further includes:
a third obtaining module 507, configured to obtain a current vehicle speed and illumination intensity;
and the switching module 508 is configured to switch the working mode of the camera to the infrared shooting mode when the vehicle speed is greater than a preset vehicle speed threshold and the illumination intensity is lower than an illumination intensity threshold.
Optionally, referring to fig. 10, the apparatus further includes:
the counting module 509 is configured to count the number of violations of the driver within a preset time period when the behavior of the driver belongs to the violations;
and the alarm module 510 is configured to perform an illegal driving alarm prompt when the number of illegal behaviors of the driver reaches a preset number threshold within the preset time period.
In the embodiment of the present application, a target image including the face of the driver is acquired, and a first area image is obtained from the target image, the first area image containing a preset behavior of the driver whose similarity to a violation exceeds a preset threshold. Inner truncation and outer expansion at different scales are then applied to the first area image within the target image to obtain a second area image and a third area image. Because the first area image may contain only a small part of the preset behavior, and little or none of the surrounding objects related to it, the preset behavior cannot be reliably judged as a violation from the first area image alone. After inner truncation and outer expansion, the second and third area images contain more information about the surrounding objects related to the preset behavior, so detecting the driver's behavior based on all three area images improves detection accuracy.
It should be noted that: when the driver behavior recognition device provided in the above embodiment implements the driver behavior recognition method, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the behavior recognition device for the driver provided by the embodiment and the behavior recognition method for the driver belong to the same concept, and specific implementation processes of the behavior recognition device for the driver are detailed in the method embodiment and are not described again.
Fig. 11 shows a block diagram of a terminal 1000 according to an exemplary embodiment of the present application. The terminal 1000 may be a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Terminal 1000 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement a method of driver behavior recognition as provided by method embodiments herein.
In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, touch screen display 1005, camera 1006, audio circuitry 1007, positioning components 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over the surface of the display screen 1005. The touch signal may be input to the processor 1001 as a control signal for processing. At this point, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, display screen 1005 can be one, providing a front panel of terminal 1000; in other embodiments, display 1005 can be at least two, respectively disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, display 1005 can be a flexible display disposed on a curved surface or on a folded surface of terminal 1000. Even more, the display screen 1005 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display screen 1005 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For stereo sound collection or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1007 may also include a headphone jack.
A positioning component 1008 is used to locate the current geographic location of terminal 1000 for navigation or LBS (Location Based Service). The positioning component 1008 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1001 may control the touch display screen 1005 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 may cooperate to acquire a 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1013 may be disposed on a side frame of terminal 1000 and/or on a lower layer of touch display 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the touch display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Fingerprint sensor 1014 can be disposed on the front, back, or side of terminal 1000. When a physical key or vendor Logo is provided on terminal 1000, fingerprint sensor 1014 can be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the touch display screen 1005 according to the intensity of the ambient light collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1005 is turned down. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.
Proximity sensor 1016, also known as a distance sensor, is typically disposed on the front panel of terminal 1000 and is used to measure the distance between the user and the front face of the terminal. In one embodiment, when proximity sensor 1016 detects that this distance is gradually decreasing, processor 1001 controls touch display 1005 to switch from the screen-on state to the screen-off state; when proximity sensor 1016 detects that the distance is gradually increasing, processor 1001 controls touch display 1005 to switch from the screen-off state back to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in FIG. 11 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.
The embodiment of the application also provides a non-transitory computer readable storage medium, and when instructions in the storage medium are executed by a processor of the mobile terminal, the mobile terminal is enabled to execute the behavior recognition method of the driver provided by the embodiment.
Embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method for identifying a behavior of a driver provided in the above embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for identifying a driver's behavior, the method comprising:
acquiring a target image, wherein the target image comprises a face of a driver;
acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior with similarity to an illegal behavior larger than a preset threshold;
in the target image, carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
recognizing the behavior of the driver based on the first area image, the second area image, and the third area image.
2. The method of claim 1, wherein prior to identifying the behavior of the driver based on the first region image, the second region image, and the third region image, further comprising:
adjusting the first area image, the second area image and the third area image to be area images of the same size;
accordingly, the identifying the behavior of the driver based on the first area image, the second area image, and the third area image includes:
calling a target network model, wherein the target network model is used for determining the behavior category of any behavior based on a group of regional images corresponding to the behavior;
and identifying the behavior of the driver through the target network model based on the first area image, the second area image and the third area image after the size adjustment.
3. The method of claim 2, wherein the target network model comprises an input layer, an intermediate layer, a splice layer, a fully-connected layer, and an output layer;
the identifying the behavior of the driver through the target network model based on the resized first area image, second area image, and third area image includes:
performing channel superposition processing on image data in the first area image, the second area image and the third area image after the size adjustment through the input layer based on the resolution and the number of channels of the area image after the size adjustment to obtain a first feature map;
performing convolution sampling processing on the first characteristic diagram through the intermediate layer to obtain a plurality of second characteristic diagrams, wherein the second characteristic diagrams are the same in size and different in channel number;
performing channel superposition processing on the plurality of second feature maps through the splicing layer, and performing feature fusion on the feature maps after channel superposition through the convolution layer in the deep network layer to obtain a third feature map;
determining, by the fully-connected layer, behavior of the driver based on the third feature map;
outputting the behavior through the output layer.
4. The method of claim 3, wherein the intermediate layer comprises N sets of convolutional layers and N sets of sampling layers, each set of convolutional layers corresponding one-to-one to each set of sampling layers;
the obtaining a plurality of second feature maps by performing convolution sampling processing on the first feature map through the intermediate layer includes:
determining the first feature map as a target feature map by taking i as 1; performing convolution processing on the target feature map through the i-th group of convolution layers, and performing sampling processing on the obtained feature map through the i-th group of sampling layers according to two different multiples to obtain a 2^i-fold feature map and an i-th second feature map of a reference size, and obtaining the 2^i-fold feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-fold feature map;
when i is smaller than N, making i equal to i + 1, and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, performing sampling processing on the obtained feature map according to two different multiples through the i-th group of sampling layers to obtain a 2^i-fold feature map and an i-th second feature map of the reference size, and obtaining the 2^i-fold feature map as the target feature map;
and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolution layers, performing sampling processing once on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
5. The method of claim 2, wherein prior to invoking the target network model, further comprising:
obtaining a plurality of training samples, wherein the plurality of training samples comprise a plurality of groups of images and behavior categories in each group of images, each group of images comprise regional images of behaviors, reduced regional images determined after the regional images are subjected to reduction processing, and expanded regional images determined after the regional images are subjected to expansion processing;
and training the network model to be trained based on the plurality of training samples to obtain the target network model.
6. The method of claim 1, wherein said acquiring a first region image from said target image comprises:
calling a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, wherein the target detection model is used for identifying the face and the preset behavior in the image based on any image;
when the number of the face detection frames is one, determining the face detection frames as target face detection frames; when the number of the face detection frames is multiple, acquiring a face detection frame with the largest area from the multiple face detection frames, and determining the acquired face detection frame as a target face detection frame;
and determining the first area image based on the target face detection frame.
7. The method of claim 6, wherein the determining the first region image based on the target face detection box comprises:
filtering out the preset behavior detection frames which are not overlapped with the target face detection frame or have the distance with the target face detection frame larger than a preset distance threshold value;
and cutting out the region corresponding to the filtered residual preset behavior detection frame from the target image to obtain the first region image.
8. The method of claim 1, wherein said acquiring a target image comprises:
detecting a working mode of a camera for shooting the face of the driver;
when the working mode of the camera is an infrared shooting mode, obtaining a gray image obtained by shooting;
calling an image pseudo-color conversion model, inputting the gray level image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the gray level image, wherein the image pseudo-color conversion model is used for converting any gray level image into the three-channel color image corresponding to the gray level image;
and acquiring the output three-channel color image as the target image.
9. The method of claim 8, wherein prior to detecting an operating mode of a camera used to capture the driver's face, further comprising:
acquiring the current vehicle speed and the current illumination intensity;
and when the vehicle speed is greater than a preset vehicle speed threshold value and the illumination intensity is lower than an illumination intensity threshold value, switching the working mode of the camera to the infrared shooting mode.
10. The method of claim 1, wherein after identifying the behavior of the driver based on the first region image, the second region image, and the third region image, further comprising:
when the behavior of the driver belongs to the violation behavior, counting the number of the violation behaviors of the driver in a preset time length;
and when the number of illegal behaviors of the driver reaches a preset number threshold within the preset time, giving an alarm for illegal driving.
11. A behavior recognition apparatus for a driver, characterized in that the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a target image, and the target image comprises the face of a driver;
the second acquisition module is used for acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior with similarity between the behavior and an illegal behavior larger than a preset threshold;
the image processing module is used for carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value in the target image to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
an identification module to identify a behavior of the driver based on the first area image, the second area image, and the third area image.
12. A smart device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of any of the methods of claims 1-10.
13. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of any of the methods of claims 1-10.
CN201910207840.8A 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior Active CN111723602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910207840.8A CN111723602B (en) 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910207840.8A CN111723602B (en) 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior

Publications (2)

Publication Number Publication Date
CN111723602A true CN111723602A (en) 2020-09-29
CN111723602B CN111723602B (en) 2023-08-08

Family

ID=72562943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207840.8A Active CN111723602B (en) 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior

Country Status (1)

Country Link
CN (1) CN111723602B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926510A (en) * 2021-03-25 2021-06-08 深圳市商汤科技有限公司 Abnormal driving behavior recognition method and device, electronic equipment and storage medium
CN112950505A (en) * 2021-03-03 2021-06-11 西安工业大学 Image processing method, system and medium based on generation countermeasure network
CN113052026A (en) * 2021-03-12 2021-06-29 北京经纬恒润科技股份有限公司 Method and device for positioning smoking behavior in cabin
CN113313012A (en) * 2021-05-26 2021-08-27 北京航空航天大学 Dangerous driving behavior identification method based on convolution generation countermeasure network
CN113705427A (en) * 2021-08-26 2021-11-26 成都上富智感科技有限公司 Fatigue driving monitoring and early warning method and system based on vehicle gauge chip SoC
CN115346363A (en) * 2022-06-27 2022-11-15 西安电子科技大学 Driver violation prediction method based on neural network
WO2024001617A1 (en) * 2022-06-30 2024-01-04 京东方科技集团股份有限公司 Method and apparatus for identifying behavior of playing with mobile phone

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012085A1 (en) * 2016-07-07 2018-01-11 Ants Technology (Hk) Limited. Computer Vision Based Driver Assistance Devices, Systems, Methods and Associated Computer Executable Code
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN108009475A (en) * 2017-11-03 2018-05-08 东软集团股份有限公司 Driving behavior analysis method, apparatus, computer-readable recording medium and electronic equipment
CN108205649A (en) * 2016-12-20 2018-06-26 浙江宇视科技有限公司 Driver drives to take the state identification method and device of phone
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN109389068A (en) * 2018-09-28 2019-02-26 百度在线网络技术(北京)有限公司 The method and apparatus of driving behavior for identification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012085A1 (en) * 2016-07-07 2018-01-11 Ants Technology (Hk) Limited. Computer Vision Based Driver Assistance Devices, Systems, Methods and Associated Computer Executable Code
CN108205649A (en) * 2016-12-20 2018-06-26 浙江宇视科技有限公司 Driver drives to take the state identification method and device of phone
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN108009475A (en) * 2017-11-03 2018-05-08 东软集团股份有限公司 Driving behavior analysis method, apparatus, computer-readable recording medium and electronic equipment
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN109389068A (en) * 2018-09-28 2019-02-26 百度在线网络技术(北京)有限公司 The method and apparatus of driving behavior for identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SIDDHARTH,ET AL.: "Driver Hand Localization and Grasp Analysis: A Vision-based Real-time Approach" *
倪志平 等: "基于驾驶行为识别的车载交通信息数据实时发布系统", vol. 18, no. 08 *
张雯;康冰;: "驾驶员疲劳状态检测方法研究", no. 03 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950505A (en) * 2021-03-03 2021-06-11 西安工业大学 Image processing method, system and medium based on generation countermeasure network
CN112950505B (en) * 2021-03-03 2024-01-23 西安工业大学 Image processing method, system and medium based on generation countermeasure network
CN113052026A (en) * 2021-03-12 2021-06-29 北京经纬恒润科技股份有限公司 Method and device for positioning smoking behavior in cabin
CN113052026B (en) * 2021-03-12 2023-07-18 北京经纬恒润科技股份有限公司 Method and device for positioning smoking behavior in cabin
CN112926510A (en) * 2021-03-25 2021-06-08 深圳市商汤科技有限公司 Abnormal driving behavior recognition method and device, electronic equipment and storage medium
CN113313012A (en) * 2021-05-26 2021-08-27 北京航空航天大学 Dangerous driving behavior identification method based on convolution generation countermeasure network
CN113705427A (en) * 2021-08-26 2021-11-26 成都上富智感科技有限公司 Fatigue driving monitoring and early warning method and system based on vehicle gauge chip SoC
CN115346363A (en) * 2022-06-27 2022-11-15 西安电子科技大学 Driver violation prediction method based on neural network
WO2024001617A1 (en) * 2022-06-30 2024-01-04 京东方科技集团股份有限公司 Method and apparatus for identifying behavior of playing with mobile phone

Also Published As

Publication number Publication date
CN111723602B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN111723602B (en) Method, device, equipment and storage medium for identifying driver behavior
CN109829456B (en) Image identification method and device and terminal
CN111126182A (en) Lane line detection method, lane line detection device, electronic device, and storage medium
CN110839128B (en) Photographing behavior detection method and device and storage medium
CN109886208B (en) Object detection method and device, computer equipment and storage medium
CN112084811B (en) Identity information determining method, device and storage medium
CN112406707B (en) Vehicle early warning method, vehicle, device, terminal and storage medium
CN111027490A (en) Face attribute recognition method and device and storage medium
CN110874905A (en) Monitoring method and device
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN111754386A (en) Image area shielding method, device, equipment and storage medium
CN112749590B (en) Object detection method, device, computer equipment and computer readable storage medium
CN111010537B (en) Vehicle control method, device, terminal and storage medium
CN111325701B (en) Image processing method, device and storage medium
CN115497082A (en) Method, apparatus and storage medium for determining subtitles in video
CN111931712B (en) Face recognition method, device, snapshot machine and system
CN111127541A (en) Vehicle size determination method and device and storage medium
CN111860064B (en) Video-based target detection method, device, equipment and storage medium
CN113591514B (en) Fingerprint living body detection method, fingerprint living body detection equipment and storage medium
CN112882094B (en) First-arrival wave acquisition method and device, computer equipment and storage medium
CN111583669B (en) Overspeed detection method, overspeed detection device, control equipment and storage medium
CN111741266B (en) Image display method and device, vehicle-mounted equipment and storage medium
CN111723615B (en) Method and device for judging matching of detected objects in detected object image
CN110728275B (en) License plate recognition method, license plate recognition device and storage medium
CN113824902A (en) Method, device, system, equipment and medium for determining time delay of infrared camera system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant