CN111931639A

CN111931639A - Driver behavior detection method and device, electronic equipment and storage medium

Info

Publication number: CN111931639A
Application number: CN202010790208.3A
Authority: CN
Inventors: 王飞; 钱晨
Original assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date: 2020-08-07
Filing date: 2020-08-07
Publication date: 2020-11-13
Anticipated expiration: 2040-08-07
Also published as: CN111931639B; KR20220032074A; JP2023500218A; WO2022027894A1

Abstract

The present disclosure provides a driver behavior detection method, device, electronic device and storage medium, the method comprising: acquiring an image to be detected of a driving position area in a cabin; detecting the image to be detected to obtain a target detection result, wherein the target detection result comprises a steering wheel detection result and a human hand detection result; determining the driving behavior category of the driver according to the target detection result; and when the driving behavior category of the driver is dangerous driving, warning information is sent out.

Description

Driver behavior detection method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of deep learning, in particular to a driver behavior detection method and device, electronic equipment and a storage medium.

Background

With the rapid development of vehicles, vehicles become important transportation means for users to travel, so that safe driving of vehicles becomes one of the important subjects in the current automobile industry. The safe driving of the vehicle is determined by various factors, such as the driving behavior of the driver, the road conditions of the road, the weather conditions, and the like.

In general, dangerous driving behavior is one of the major factors responsible for most traffic accidents. Therefore, the driving behavior of the driver can be detected in order to improve the driving safety and ensure the safety of the passengers and the driver.

Disclosure of Invention

In view of this, the disclosed embodiments at least provide a driver behavior detection method, device, electronic device and storage medium.

In a first aspect, the present disclosure provides a driver behavior detection method, including:

acquiring an image to be detected of a driving position area in a cabin;

detecting the image to be detected to obtain a target detection result, wherein the target detection result comprises a steering wheel detection result and a human hand detection result;

determining the driving behavior category of the driver according to the target detection result;

and when the driving behavior category of the driver is dangerous driving, warning information is sent out.

By adopting the method, the target detection result is obtained by detecting the image to be detected corresponding to the acquired driving position area, the target detection result comprises the steering wheel detection result and the hand detection result, the driving behavior type of the driver is determined according to the target detection result, and the warning information is sent when the driving behavior type of the driver is dangerous driving, so that the driving behavior of the driver is detected, the driver is conveniently and safely reminded, and the safety of vehicle driving is improved.

In one possible embodiment, when the steering wheel is included in the steering wheel detection result and the human hand is included in the human hand detection result, determining the driving behavior category of the driver according to the target detection result includes:

determining the position relation between the steering wheel and the hand according to the steering wheel detection result and the hand detection result;

and determining the driving behavior category of the driver according to the position relation.

In one possible implementation, determining the driving behavior category of the driver according to the position relationship includes:

and determining the driving behavior class of the driver as safe driving under the condition that the position relation indicates that the driver holds the steering wheel by hand.

and under the condition that the position relation indicates that the two hands of the driver are separated from the steering wheel, determining that the driving behavior category of the driver is dangerous driving.

In one possible embodiment, when the steering wheel is included in the steering wheel detection result and the human hand is not included in the human hand detection result, determining the driving behavior category of the driver according to the target detection result includes:

and determining the driving behavior type of the driver as dangerous driving according to the target detection result.

In a possible implementation manner, detecting the image to be detected to obtain a target detection result includes:

generating a middle characteristic diagram corresponding to the image to be detected based on the image to be detected;

performing at least one target convolution processing on the intermediate feature map to generate detection feature maps of a plurality of channels corresponding to the intermediate feature map;

performing characteristic value conversion processing on each characteristic value of a target channel characteristic diagram at a characteristic position in the detection characteristic diagrams of the channels by using an activation function to generate a converted target channel characteristic diagram;

performing maximum pooling processing on the converted target channel characteristic diagram according to a preset pooling size and a pooling step length to obtain a plurality of pooling values and a position index corresponding to each pooling value in the plurality of pooling values; the position index is used for identifying the position of the pooled value in the converted target channel feature map;

generating target detection frame information based on the plurality of pooled values and a position index corresponding to each of the plurality of pooled values;

and determining the target detection result according to the target detection frame information.

In the above embodiment, the maximum pooling is performed on the target channel feature map, so that the plurality of pooled values and the position index corresponding to each of the plurality of pooled values are obtained, the target detection frame information is generated, and data support is provided for generating the target detection result.

In one possible embodiment, the generating the target detection frame information based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values includes:

determining a target pooling value belonging to a center point of the target detection box from the plurality of pooling values based on the plurality of pooling values and a pooling threshold value if at least one of the plurality of pooling values is greater than the set pooling threshold value;

and generating the target detection frame information based on the position index corresponding to the target pooling value.

In the above embodiment, the pooling value larger than the pooling threshold value among the plurality of pooling values is determined as the target pooling value belonging to the center point of the target detection frame of the steering wheel or the driver's hand, and at least one piece of target detection frame information of the steering wheel or the driver's hand is generated more accurately based on the position index corresponding to the target pooling value.

determining that the target detection frame information is empty if the plurality of pooling values are less than or equal to a set pooling threshold.

In one possible embodiment, determining the position relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:

under the condition that the hand detection result comprises one hand, if a detection frame corresponding to the hand in the hand detection result and a detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapped area, determining the position relationship between the steering wheel and the hand as the driver holds the steering wheel; and if the detection frame corresponding to the hands in the hand detection result does not have an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determining that the steering wheel and the position relation between the hands are that the two hands of the driver are separated from the steering wheel.

under the condition that the hand detection result comprises two hands, if detection frames corresponding to the two hands in the hand detection result do not have overlapping areas with detection frames corresponding to a steering wheel in the steering wheel detection result, determining the position relation between the steering wheel and the hands as that the two hands of the driver are separated from the steering wheel; and if the detection frame corresponding to at least one hand in the hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determining the position relationship between the steering wheel and the hand as that the driver holds the steering wheel by hand.

performing convolution processing on the intermediate feature map for at least one time to generate a classification feature map of a two-channel corresponding to the intermediate feature map; each channel feature map in the classification feature maps of the two channels corresponds to a category of a human hand;

extracting two feature values at feature positions matched with the central point position information from the classification feature map based on the central point position information indicated by the detection frame information corresponding to the human hand in the human hand detection result; selecting a maximum characteristic value from the two characteristic values, and determining the category of the channel characteristic diagram corresponding to the maximum characteristic value in the classification characteristic diagram as the category corresponding to the central point position information;

and determining the position relation between the steering wheel and the hand of the person based on the category corresponding to the position information of each central point indicated by the detection frame information corresponding to the hand of the person.

In the above embodiment, the steering wheel detection result is determined, the intermediate feature map is convolved at least once to generate the classification feature map, and the generated position information of the central point of the driver's hand is combined to more accurately determine the position relationship between the steering wheel and the human hand.

In a possible implementation manner, determining the position relationship between the steering wheel and the human hand based on the category corresponding to each piece of central point position information indicated by the detection frame information corresponding to the human hand includes:

determining the category corresponding to the central point position information as the position relation between the steering wheel and the human hand under the condition that the detection frame information corresponding to the human hand comprises the central point position information;

determining the position relation between the steering wheel and the hands of the driver as the two hands of the driver are separated from the steering wheel under the condition that the detection frame information corresponding to the hands of the driver comprises two central point position information and the category corresponding to the two central point position information is that the hands of the driver are separated from the steering wheel; and determining the position relationship between the steering wheel and the hands of the driver when at least one category corresponding to the central point position information exists in the categories corresponding to the two pieces of central point position information is that the driver holds the steering wheel by hand.

The following descriptions of the effects of the apparatus, the electronic device, and the like refer to the description of the above method, and are not repeated here.

In a second aspect, the present disclosure provides a driver behavior detection apparatus comprising:

the acquisition module is used for acquiring an image to be detected of a driving position area in the cabin;

the detection module is used for detecting the image to be detected to obtain a target detection result, and the target detection result comprises a steering wheel detection result and a human hand detection result;

the determining module is used for determining the driving behavior category of the driver according to the target detection result;

and the warning module is used for sending warning information when the driving behavior category of the driver is dangerous driving.

In a third aspect, the present disclosure provides an electronic device comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the driver behavior detection method according to the first aspect or any one of the embodiments.

In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the driver behavior detection method according to the first aspect or any one of the embodiments described above.

In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without the benefit of the inventive faculty.

FIG. 1 is a flow chart illustrating a driver behavior detection method provided by an embodiment of the present disclosure;

fig. 2 is a schematic flow chart illustrating a specific method for detecting an image to be detected to obtain a target detection result in a driver behavior detection method provided by the embodiment of the present disclosure;

fig. 3 is a schematic diagram illustrating an architecture of a driver behavior detection apparatus provided in an embodiment of the present disclosure;

fig. 4 shows a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.

It is considered that dangerous driving behavior is one of the major factors responsible for most traffic accidents. Therefore, the driving behavior of the driver can be detected in order to improve the driving safety and ensure the safety of the passengers and the driver. Therefore, in order to solve the above problem, embodiments of the present disclosure provide a method for detecting driver behavior.

For the convenience of understanding the embodiment of the present disclosure, a detailed description will be given to a driver behavior detection method disclosed in the embodiment of the present disclosure.

Referring to fig. 1, a flow chart of a driver behavior detection method provided in an embodiment of the present disclosure is schematically illustrated, and the method includes S101-S104, where:

s101, obtaining an image to be detected of a driving position area in the cabin.

S102, detecting the image to be detected to obtain a target detection result, wherein the target detection result comprises a steering wheel detection result and a human hand detection result.

And S103, determining the driving behavior type of the driver according to the target detection result.

And S104, when the driving behavior category of the driver is dangerous driving, sending out warning information.

According to the method, the obtained image to be detected of the driving position area is detected to obtain a target detection result, the target detection result comprises a steering wheel detection result and a hand detection result, the driving behavior type of the driver is determined according to the target detection result, and warning information is sent when the driving behavior type of the driver is dangerous driving, so that the driving behavior of the driver is detected, the driver is conveniently and safely reminded, and the safety of vehicle driving is improved.

For S101:

here, a camera device may be provided in the vehicle cabin, and an image to be detected of the driving position area is acquired in real time by the camera device provided in the vehicle cabin. Wherein the mounting position of the image pickup apparatus may be a position where a steering wheel and a driver seat area in the driver seat area can be photographed.

For S102 and S103:

here, the images to be detected may be input into the trained neural network, and the images to be detected are respectively detected to obtain target detection results, where the target detection results include a steering wheel detection result and a human hand detection result. The detection result of the steering wheel comprises information whether the steering wheel exists in the image to be detected, and when the steering wheel exists, the detection result of the steering wheel comprises detection frame information of the steering wheel; the human hand detection result comprises information of whether a detection frame of the human hand exists in the image to be detected; when the human hand exists, the human hand detection result comprises detection frame information of the human hand.

In an alternative embodiment, referring to fig. 2, detecting an image to be detected to obtain a target detection result may include:

s201, generating a middle characteristic diagram corresponding to the image to be detected based on the image to be detected.

And S202, performing target convolution processing on the intermediate feature map at least once to generate detection feature maps of a plurality of channels corresponding to the intermediate feature map.

And S203, performing characteristic value conversion processing on each characteristic value of the target channel characteristic diagram representing the position in the detection characteristic diagrams of the channels by using an activation function to generate a converted target channel characteristic diagram.

S204, performing maximum pooling processing on the converted target channel characteristic diagram according to a preset pooling size and a pooling step length to obtain a plurality of pooling values and a position index corresponding to each pooling value in the plurality of pooling values; the location index is used to identify the location of the pooled value in the transformed target channel feature map.

S205, target detection frame information is generated based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values.

And S206, determining a target detection result according to the target detection frame information.

The above embodiment obtains a plurality of pooling values and a position index corresponding to each of the plurality of pooling values by performing maximum pooling on the target channel feature map, generates target detection frame information, and provides data support for generating a target detection result.

The image to be detected can be input into the trained neural network, and the backbone network in the trained neural network performs convolution processing on the image to be detected for multiple times to generate an intermediate characteristic diagram corresponding to the image to be detected. The structure of the backbone network in the neural network can be set according to actual needs.

Here, the intermediate feature maps may be input to a steering wheel detection branch network and a hand detection branch network of the neural network, respectively, to generate a steering wheel detection result and a human hand detection result. The generation of the steering wheel detection result will be described in detail below.

Here, the intermediate feature map may be first subjected to at least one first convolution process (i.e., a target convolution process), so as to generate a detection feature map of a plurality of channels corresponding to the steering wheel, where the number of channels corresponding to the detection feature map may be three. The detection feature map comprises a first channel feature map (the first channel feature map is a target channel feature map) representing a position, a second channel feature map representing detection frame length information, and a third channel feature map representing detection frame width information.

And then, performing characteristic value conversion processing on the target channel characteristic diagram at the characteristic position in the detection characteristic diagrams of the plurality of channels by using an activation function to generate a converted target channel characteristic diagram, wherein each characteristic value in the converted target channel characteristic diagram is a numerical value between 0 and 1. Wherein, the activation function can be a sigmoid function. For the feature value of any feature point in the converted target channel feature map, if the feature value tends to 1, the probability that the feature point corresponding to the feature value belongs to the center point of the detection frame of the steering wheel is higher.

Then, performing maximum pooling processing on the converted target channel characteristic diagram according to a preset pooling size and a pooling step length to obtain a pooling value corresponding to each characteristic position in the target channel characteristic diagram and a position index corresponding to each pooling value; the location index may be used to identify the location of the pooled value in the transformed target channel feature map. Furthermore, the same position index in the position index corresponding to each feature position may be merged to obtain a plurality of pooled values corresponding to the target channel feature map and a position index corresponding to each pooled value in the plurality of pooled values. The preset pooling size and the pooling step length may be set according to actual needs, for example, the preset pooling size may be 3 × 3, and the preset pooling step length may be 1.

Further, first detection frame information (i.e., target detection frame information) corresponding to the steering wheel may be generated based on the plurality of pooling values and the position index corresponding to each of the plurality of pooling values.

For example, the maximum pooling process with a step size of 1 may be performed on the target channel profile by 3 × 3; in pooling, for each feature value of 3 × 3 feature points in the target channel feature map, the maximum response value (i.e., pooled value) of the 3 × 3 feature points and the position index of the maximum response value on the target channel feature map are determined. At this time, the number of the maximum response values is related to the size of the target channel feature map; for example, if the size of the target channel feature map is 80 × 60 × 3, the maximum response values obtained after performing the maximum pooling process on the target channel feature map are 80 × 60 in total; and for each maximum response value there may be at least one other maximum response value that is the same as its position index. And then combining the maximum response values with the same position index to obtain M maximum response values and a position index corresponding to each maximum response value in the M maximum response values. Finally, first detection frame information corresponding to the steering wheel is generated based on the M maximum response values (pooled values) and the position index corresponding to each maximum response value.

The determining process of the second detection frame information corresponding to the human hand may refer to the determining process of the first detection frame information corresponding to the steering wheel, and is not repeated here.

After obtaining the first detection frame information corresponding to the steering wheel, the first detection frame information may be determined as a steering wheel detection result. And when the first detection frame information corresponding to the steering wheel is not obtained, determining that the steering wheel detection result does not include the steering wheel. And after the second detection frame information corresponding to the human hand is obtained, determining the second detection frame information as a human hand detection result. And when the second detection frame information corresponding to the human hand is not obtained, determining that the human hand detection result does not include the human hand.

In an alternative embodiment, generating the target detection frame information based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values may include:

a1, when at least one of the pooling values is larger than the set pooling threshold, determining a target pooling value of the center point of the target detection box from the pooling values based on the pooling values and the pooling threshold.

A2, generating object detection frame information based on the position index corresponding to the object pooling value.

Continuing with the example of the steering wheel, here, a pooling threshold may be set, and when at least one of the pooling values is greater than the set pooling threshold, the pooling values may be filtered based on the set pooling threshold to obtain a target pooling value of the pooling values that is greater than the pooling threshold. In the case where each of the plurality of pooling values is less than or equal to the set pooling threshold, then there is no target pooling value, i.e., there is no first detection frame information of the steering wheel.

Further, the center point position information of the first detection frame corresponding to the steering wheel may be generated based on the position index corresponding to the target pooling value. The pooling threshold value corresponding to the steering wheel may be the same as or different from the pooling threshold value corresponding to the driver's hand. Specifically, the pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the hands of the driver may be determined according to actual conditions. For example, a plurality of frames of sample images acquired by the camera device corresponding to the image to be detected can be acquired, and the pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the driver's hand are respectively generated according to the acquired plurality of frames of sample images by using a self-adaptive algorithm.

Continuing with the above example, after obtaining M maximum response values and the position index corresponding to each of the M maximum response values, each of the M maximum response values may be compared with the pooling threshold; when a certain maximum response value is greater than the pooling threshold value, the maximum response value is determined as a target pooling value. And the position index corresponding to the target pooling value is the central point position information of the first detection frame of the steering wheel.

Here, the maximum pooling processing may also be directly performed on the target channel feature map before the conversion, so as to obtain the central point position information of the first detection frame of the steering wheel.

For example, after the center point position information of the first detection frame of the steering wheel is obtained, based on the center point position information, a second feature value at a feature position matched with the center point position information may be selected from the second channel feature map, the selected second feature value may be determined as a length corresponding to the first detection frame of the steering wheel, a third feature value at a feature position matched with the center point position information may be selected from the third channel feature map, and the selected third feature value may be determined as a width corresponding to the first detection frame of the steering wheel, so as to obtain the size information of the first detection frame of the steering wheel.

When the driver is faced with the hand, one or two pieces of second detection frame information can be obtained, namely the second detection frame information corresponding to the left hand and/or the right hand respectively can be obtained. Specifically, the process of determining the second detection frame information corresponding to the hand of the driver may refer to the process of determining the first detection frame information of the steering wheel, and is not repeated here.

In an alternative embodiment, generating the target detection frame information based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values includes: and determining that the target detection frame information is empty when the plurality of pooling values are less than or equal to the set pooling threshold.

When a plurality of pooling values corresponding to the steering wheel are smaller than or equal to the set pooling threshold, determining that the first detection frame information of the steering wheel is empty; and when at least one pooling value in a plurality of pooling corresponding to the steering wheel is larger than the set pooling threshold, determining that the first detection frame information of the steering wheel is not empty.

After the steering wheel detection result and the human hand detection result are obtained, the driving behavior category of the driver may be determined based on the steering wheel detection result and the human hand detection result.

determining the position relation between the steering wheel and the hand according to the detection result of the steering wheel and the detection result of the hand;

Here, the positional relationship between the steering wheel and the human hand may be determined based on the result of the detection of the steering wheel and the result of the detection of the human hand, and the type of the driving behavior of the driver, that is, whether the driver is driving safely or driving dangerously, may be determined based on the determined positional relationship.

In one possible implementation, determining the driving behavior category of the driver according to the position relationship includes: and determining the driving behavior class of the driver as safe driving under the condition that the position relation indicates that the driver holds the steering wheel by hand.

Here, when it is detected that the positional relationship indicates that the driver holds the steering wheel, the behavior category of the driver is determined as safe driving. The driver's hand holding the steering wheel includes the driver holding the steering wheel in the left hand, the driver holding the steering wheel in the right hand, or both the driver's hands holding the steering wheel.

In one possible implementation, determining the driving behavior category of the driver according to the position relationship includes: and under the condition that the position relation indicates that the two hands of the driver are separated from the steering wheel, determining the driving behavior category of the driver as dangerous driving.

Here, when the positional relationship is determined such that the driver's both hands are off the steering wheel, the type of the driving behavior of the driver is determined as dangerous driving.

In one possible embodiment, when the steering wheel is included in the steering wheel detection result and the human hand is not included in the human hand detection result, determining the driving behavior category of the driver according to the target detection result includes: and determining the driving behavior type of the driver as dangerous driving according to the target detection result.

Here, if the steering wheel is detected in the steering wheel detection result and the human hand is not detected in the human hand detection result, it is represented that both hands of the driver are separated from the steering wheel, and the driving behavior type of the driver is determined to be dangerous driving.

For example, if the steering wheel is not detected in the steering wheel detection result, the image to be detected is determined to be an abnormal image, so that the driving behavior category of the driver is determined to be abnormal.

In one possible embodiment, determining the positional relationship between the steering wheel and the human hand according to the result of the detection of the steering wheel and the result of the detection of the human hand includes: under the condition that the hand detection result comprises one hand, if the detection frame corresponding to the hand in the hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapped area, determining the position relationship between the steering wheel and the hand to enable a driver to hold the steering wheel; and if the detection frame corresponding to the hand in the hand detection result does not have an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determining that the position relationship between the steering wheel and the hand is that the two hands of the driver are separated from the steering wheel.

In one possible embodiment, determining the positional relationship between the steering wheel and the human hand according to the result of the detection of the steering wheel and the result of the detection of the human hand includes: under the condition that the hand detection result comprises two hands, if the detection frames corresponding to the two hands in the hand detection result do not have overlapping areas with the detection frames corresponding to the steering wheel in the steering wheel detection result, determining the position relationship between the steering wheel and the hands to enable the two hands of the driver to be separated from the steering wheel; and if the detection frame corresponding to at least one hand in the hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determining the position relationship between the steering wheel and the hands as that the driver holds the steering wheel by hand.

Here, the positional relationship between the steering wheel and the human hand may be determined using a detection frame corresponding to the steering wheel in the steering wheel detection result and a detection frame corresponding to the human hand in the human hand detection result.

And determining that the position relation between the steering wheel and the hand is the hand-held steering wheel when the detection frame corresponding to the hand of the person and the detection frame corresponding to the steering wheel have the overlapped area under the condition that the hand detection result comprises one hand. And when the detection frame corresponding to the hand of the person and the detection frame corresponding to the steering wheel have the non-overlapping area, determining the position relation between the steering wheel and the hand of the person as that the hand is separated from the steering wheel.

And under the condition that the hand detection result comprises two hands, when the detection frame corresponding to at least one hand of the person and the detection frame corresponding to the steering wheel have an overlapped area, determining that the position relation between the steering wheel and the hands is the hand-holding steering wheel. And when the detection frames respectively corresponding to the two hands of the user and the detection frames corresponding to the steering wheel have non-coincident areas, determining the position relationship between the steering wheel and the hands of the user to be that the hands are separated from the steering wheel.

In an alternative embodiment, determining the positional relationship between the steering wheel and the human hand according to the result of the detection of the steering wheel and the result of the detection of the human hand may include:

generating an intermediate characteristic diagram corresponding to the image to be detected based on the image to be detected;

performing convolution processing on the intermediate characteristic diagram at least once to generate a classification characteristic diagram of the two channels corresponding to the intermediate characteristic diagram; each channel feature map in the classification feature maps of the two channels corresponds to one type of human hand.

Extracting two characteristic values at characteristic positions matched with the central point position information from the classification characteristic diagram based on the central point position information indicated by the detection frame information corresponding to the human hand in the human hand detection result; and selecting the maximum characteristic value from the two characteristic values, and determining the category of the channel characteristic diagram corresponding to the maximum characteristic value in the classification characteristic diagrams as the category corresponding to the central point position information.

And determining the position relation between the steering wheel and the hands of the person on the basis of the category corresponding to each piece of central point position information indicated by the detection frame information corresponding to the hands of the person.

Here, when the second detection frame information indicated by the human hand detection result is not empty, the intermediate feature map may be subjected to convolution processing at least once to generate a classification feature map of two channels corresponding to the intermediate feature map. Each channel feature map in the classification feature maps of the two channels corresponds to one type of human hand. For example, in the classification feature map, the category corresponding to the channel feature map of the 0 th channel may be that the driver's hand is separated from the steering wheel; the channel characteristic map of the 1 st channel corresponds to a category that a driver holds a steering wheel.

Further, two feature values at a feature position matched with the center point position information may be extracted from the classification feature map based on the center point position information indicated by the checkbox information corresponding to the human hand, a maximum feature value may be selected from the two feature values, and a category of the channel feature map corresponding to the maximum feature value in the classification feature map may be determined as a category corresponding to the center point position information.

When the detection frame information corresponding to the human hand includes two pieces of central point position information (that is, includes central point position information corresponding to the left hand and central point position information corresponding to the right hand), the category corresponding to the central point position information is determined for each piece of central point position information.

For example, in the classification feature map, the category corresponding to the channel feature map of the 0 th channel may be that the driver's hand is separated from the steering wheel; the category corresponding to the channel feature map of the 1 st channel may be that a driver holds a steering wheel with hands, two feature values, that is, 0.8 and 0.2, are extracted from the classification feature map for the center point position information corresponding to the left hand, and the category of the 0 th channel feature map corresponding to 0.8 in the classification feature map is determined as the category of the center point position information corresponding to the left hand, that is, the category of the center point position information corresponding to the left hand is that the driver's hand is separated from the steering wheel. Meanwhile, the category of the central point position information corresponding to the right hand can be obtained.

In an alternative embodiment, determining the position relationship between the steering wheel and the human hand based on the category corresponding to each piece of central point position information indicated by the detection frame information corresponding to the human hand comprises:

in the first mode, when the detection frame information corresponding to the human hand includes one piece of center point position information, the category corresponding to the center point position information is determined as the position relationship between the steering wheel and the human hand.

Determining the position relation between the steering wheel and the hands of the driver to be that the hands of the driver are separated from the steering wheel under the condition that the detection frame information corresponding to the hands of the driver comprises two pieces of central point position information and the category corresponding to the two pieces of central point position information is that the hands of the driver are separated from the steering wheel; and determining the position relation between the steering wheel and hands as the hand holding of the steering wheel by the driver under the condition that at least one category corresponding to the central point position information exists in the categories corresponding to the two pieces of central point position information.

In the first mode, when the detection frame information corresponding to the hand includes center point position information, that is, the detection frame information corresponding to the hand includes center point position information corresponding to the left hand or center point position information corresponding to the right hand, the category corresponding to the center point position information in the detection frame information corresponding to the hand may be determined as the position relationship between the steering wheel and the hand. For example, the detection frame information corresponding to the human hand includes center point position information corresponding to a left hand, and when the type of the center point position information corresponding to the left hand is that the driver holds the steering wheel, the position relationship between the steering wheel and the human hand is that the driver holds the steering wheel.

According to the second mode, the detection frame information corresponding to the hands of the driver comprises two central point position information, namely the detection frame information corresponding to the hands of the driver comprises central point position information corresponding to a left hand and central point position information corresponding to a right hand, and when the categories corresponding to the two central point position information are that the hands of the driver are separated from the steering wheel, the position relationship between the steering wheel and the hands of the driver is determined to be that the hands of the driver are separated from the steering wheel; and when the type of the central point position information corresponding to the left hand is that the driver holds the steering wheel, and/or the type of the central point position information corresponding to the right hand is that the driver holds the steering wheel, determining the position relationship between the steering wheel and the hands of the driver as that the driver holds the steering wheel.

For S104:

here, when it is determined that the driving behavior type of the driver is dangerous driving, warning information for the driver may be generated based on the driving behavior type of the driver. Wherein, the warning information can be played in a voice mode. For example, the generated warning message may be "danger, please grip the steering wheel".

It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.

Based on the same concept, an embodiment of the present disclosure further provides a driver behavior detection apparatus, as shown in fig. 3, which is an architecture schematic diagram of the driver behavior detection apparatus provided in the embodiment of the present disclosure, and includes an obtaining module 301, a detecting module 302, a determining module 303, and an alerting module 304, specifically:

the acquisition module 301 is configured to acquire an image to be detected corresponding to a driving position area in a cabin;

the detection module 302 is configured to detect the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;

the determining module 303 is configured to determine a driving behavior category of the driver according to the target detection result;

and the warning module 304 is configured to send warning information when the driving behavior category of the driver is dangerous driving.

In one possible implementation manner, when the steering wheel is included in the steering wheel detection result and the human hand is included in the human hand detection result, the determining module 303, when determining the driving behavior category of the driver according to the target detection result, is configured to:

In one possible embodiment, the determining module 303, when determining the driving behavior category of the driver according to the position relationship, is configured to:

In one possible implementation, when the steering wheel is included in the steering wheel detection result, and the human hand is not included in the human hand detection result, the determining module 303, when determining the driving behavior category of the driver according to the target detection result, is configured to:

In a possible implementation manner, the detecting module 302, when detecting the image to be detected to obtain a target detection result, is configured to:

In one possible implementation, the detecting module 302, when generating the target detection frame information based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values, is configured to:

determining a target pooling value of a center point of the target detection box from the plurality of pooling values based on the plurality of pooling values and a pooling threshold value if there is at least one of the plurality of pooling values that is greater than a set pooling threshold value;

In a possible implementation, the detecting module 302, when generating the target detection frame information based on a plurality of pooling values and a position index corresponding to each of the plurality of pooling values, is configured to:

In one possible implementation, the determining module 303, when determining the position relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:

In a possible implementation manner, the determining module 303, when determining the position relationship between the steering wheel and the human hand based on the category corresponding to each central point position information indicated by the detection frame information corresponding to the human hand, is configured to:

determining the position relation between the steering wheel and the hands of the driver to enable the hands of the driver to be separated from the steering wheel under the condition that the detection frame information corresponding to the hands of the driver comprises two central point position information and the category corresponding to the two central point position information is that the hands of the driver are separated from the steering wheel; and determining the position relationship between the steering wheel and the hands of the driver when at least one category corresponding to the central point position information exists in the categories corresponding to the two pieces of central point position information is that the driver holds the steering wheel by hand.

In some embodiments, the functions of the apparatus provided in the embodiments of the present disclosure or the included templates may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, no further description is provided here.

Based on the same technical concept, the embodiment of the disclosure also provides an electronic device. Referring to fig. 4, a schematic structural diagram of an electronic device provided in the embodiment of the present disclosure includes a processor 401, a memory 402, and a bus 403. The memory 402 is used for storing execution instructions and includes a memory 4021 and an external memory 4022; the memory 4021 is also referred to as an internal memory, and is configured to temporarily store operation data in the processor 401 and data exchanged with the external memory 4022 such as a hard disk, the processor 401 exchanges data with the external memory 4022 through the memory 4021, and when the electronic device 400 operates, the processor 401 communicates with the memory 402 through the bus 403, so that the processor 401 executes the following instructions:

acquiring an image to be detected of a driving position area in a cabin;

Furthermore, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the driver behavior detection method described in the above method embodiments.

The computer program product of the driver behavior detection method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the driver behavior detection method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above are only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A driver behavior detection method, characterized by comprising:

acquiring an image to be detected of a driving position area in a cabin;

2. The method according to claim 1, wherein when the steering wheel is included in the steering wheel detection result and the human hand is included in the human hand detection result, determining the driving behavior category of the driver according to the target detection result comprises:

3. The method according to claim 2, wherein determining the driving behavior category of the driver from the positional relationship comprises:

4. The method according to claim 2, wherein determining the driving behavior category of the driver from the positional relationship comprises:

5. The method according to claim 1, wherein when a steering wheel is included in the steering wheel detection result and a human hand is not included in the human hand detection result, determining the driving behavior category of the driver according to the target detection result comprises:

6. The method according to any one of claims 1 to 5, wherein detecting the image to be detected to obtain a target detection result comprises:

7. The method of claim 6, wherein generating the target detection box information based on the plurality of pooled values and a position index corresponding to each of the plurality of pooled values comprises:

8. The method of claim 6, wherein generating the target detection box information based on the plurality of pooled values and a position index corresponding to each of the plurality of pooled values comprises:

9. The method of claim 2, wherein determining the positional relationship between the steering wheel and the human hand based on the steering wheel detection result and the human hand detection result comprises:

10. The method of claim 2, wherein determining the positional relationship between the steering wheel and the human hand based on the steering wheel detection result and the human hand detection result comprises:

11. The method of claim 2, wherein determining the positional relationship between the steering wheel and the human hand based on the steering wheel detection result and the human hand detection result comprises:

12. The method according to claim 11, wherein determining the position relationship between the steering wheel and the human hand based on the category corresponding to each central point position information indicated by the detection frame information corresponding to the human hand comprises:

13. A driver behavior detection device characterized by comprising:

14. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the driver behavior detection method according to any one of claims 1 to 12.

15. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the driver behavior detection method according to one of claims 1 to 12.