WO2022027894A1 - Driver behavior detection method and apparatus, electronic device, storage medium and program - Google Patents

Driver behavior detection method and apparatus, electronic device, storage medium and program Download PDF

Info

Publication number
WO2022027894A1
Authority
WO
WIPO (PCT)
Prior art keywords
steering wheel
detection result
human hand
driver
pooling
Prior art date
Application number
PCT/CN2020/135501
Other languages
French (fr)
Chinese (zh)
Inventor
王飞
钱晨
Original Assignee
上海商汤临港智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤临港智能科技有限公司
Priority to KR1020227003906A priority Critical patent/KR20220032074A/en
Priority to JP2022523602A priority patent/JP2023500218A/en
Publication of WO2022027894A1 publication Critical patent/WO2022027894A1/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • B60W40/09 Driving style or behaviour
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • B60W50/14 Means for informing the driver, warning the driver or prompting a driver intervention
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W2050/0004 In digital systems, e.g. discrete-time systems involving sampling
    • B60W2050/0005 Processor details or data handling, e.g. memory registers or chip architecture
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • B60W50/14 Means for informing the driver, warning the driver or prompting a driver intervention
    • B60W2050/143 Alarm means

Definitions

  • the present application relates to the technical field of deep learning, and in particular, to a driver behavior detection method, device, electronic device, computer storage medium and computer program.
  • the safe driving of a vehicle is determined by many factors, such as the driver's driving behavior, road conditions, and weather conditions.
  • the embodiments of the present application provide at least one driver behavior detection method, device, electronic device, computer storage medium, and computer program.
  • the embodiment of the present application provides a driver behavior detection method, including:
  • A target detection result is obtained by detecting the acquired to-be-detected image of the driving position area; the target detection result includes a steering wheel detection result and a human hand detection result. The driving behavior category of the driver is determined from the target detection result, and when the category is dangerous driving, a warning message is issued. This realizes detection of the driver's driving behavior, facilitates safety reminders to the driver, and improves the safety of vehicle driving.
  • the driving behavior category of the driver is determined according to the target detection result, including:
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result; and
  • determining the driving behavior category of the driver according to the positional relationship.
  • the driving behavior category of the driver is determined according to the positional relationship, including:
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result including:
  • in the case where the human hand detection result includes one human hand: if the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, it is determined that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel; if there is no overlapping area between the two detection frames, it is determined that the positional relationship is that the driver's hand is off the steering wheel.
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result including:
  • in the case where the human hand detection result includes two human hands: if neither detection frame corresponding to a human hand overlaps the detection frame corresponding to the steering wheel, it is determined that the positional relationship between the steering wheel and the human hands is that the driver's hands are off the steering wheel; if the detection frame corresponding to at least one human hand overlaps the detection frame corresponding to the steering wheel, it is determined that the positional relationship is that the driver holds the steering wheel.
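The overlap test described in the one-hand and two-hand cases above can be sketched as follows. This is a minimal illustration only: the (x1, y1, x2, y2) box format, the function names, and the relationship labels are our assumptions, not terms from the application.

```python
def boxes_overlap(box_a, box_b):
    """True if two axis-aligned boxes (x1, y1, x2, y2) share any area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def positional_relationship(wheel_box, hand_boxes):
    """'holding' if at least one detected hand box overlaps the steering
    wheel box; otherwise 'hands_off'. Covers both the one-hand and the
    two-hand cases described in the text."""
    if any(boxes_overlap(wheel_box, hand) for hand in hand_boxes):
        return "holding"
    return "hands_off"
```

For example, a hand box (120, 90, 180, 140) overlapping a wheel box (100, 100, 300, 250) yields "holding", while hands detected entirely outside the wheel box yield "hands_off".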
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result including:
  • each channel feature map in the two-channel classification feature map corresponds to one human hand category;
  • two feature values at the feature position matching the center point position information are extracted from the classification feature map;
  • the maximum feature value is selected from the two feature values, and the category of the channel feature map in which the maximum feature value lies is determined as the category corresponding to the center point position information;
  • the positional relationship between the steering wheel and the human hand is then determined from these categories.
  • the classification feature map is generated by performing at least one convolution process on the intermediate feature map; combined with the generated center point position information of the driver's hand, the positional relationship between the steering wheel and the hands can then be determined more accurately.
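The per-hand classification step can be sketched as below: at a hand's center point, the two channel scores are compared and the larger one decides the category. The channel-to-category ordering and all names here are illustrative assumptions.

```python
# Channel-to-category mapping for the two-channel classification feature
# map; the ordering is an assumption made for illustration.
CATEGORIES = ("holding_wheel", "hand_off_wheel")

def classify_center(cls_map, center):
    """cls_map: two channel feature maps, each a 2-D list of scores.
    center: (row, col) center-point position of a hand detection frame.
    Extract the two feature values at that position and return the
    category of the channel holding the larger value."""
    row, col = center
    values = [channel[row][col] for channel in cls_map]
    return CATEGORIES[values.index(max(values))]
```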
  • determining the positional relationship between the steering wheel and the human hand based on the category corresponding to each center point position information indicated by the detection frame information corresponding to the human hand including:
  • in the case where the detection frame information corresponding to the human hand includes one piece of center point position information, the category corresponding to that center point position information is determined as the positional relationship between the steering wheel and the human hand;
  • in the case where the detection frame information corresponding to the human hand includes two pieces of center point position information: if the category corresponding to both pieces of center point position information is that the driver's hand is off the steering wheel, the positional relationship is that the driver's hands are off the steering wheel; if at least one of the two categories is that the driver is holding the steering wheel, the positional relationship is that the driver holds the steering wheel.
  • the steering wheel detection result includes a steering wheel
  • determining the driving behavior category of the driver according to the target detection result including:
  • according to the target detection result, it is determined that the driving behavior category of the driver is dangerous driving.
  • the image to be detected is detected to obtain a target detection result, including:
  • the converted target channel feature map is subjected to maximum pooling processing to obtain multiple pooling values and a position index corresponding to each of the multiple pooling values; the position index is used to identify the position of the pooling value in the converted target channel feature map;
  • target detection frame information is generated based on the multiple pooling values and the position index corresponding to each of the multiple pooling values;
  • the target detection result is determined according to the target detection frame information.
  • generating the target detection frame information based on multiple pooling values and a position index corresponding to each of the multiple pooling values includes:
  • in the case where at least one of the multiple pooling values is greater than the set pooling threshold, the target pooling value belonging to the center point of the target detection frame is determined from the multiple pooling values based on the multiple pooling values and the pooling threshold;
  • the target detection frame information is generated based on the position index corresponding to the target pooling value.
  • the pooling value greater than the pooling threshold among the multiple pooling values is determined as the target pooling value belonging to the center point of the target detection frame of the steering wheel or the driver's hand, and, based on the position index corresponding to the target pooling value, at least one piece of target detection frame information of the steering wheel or the driver's hand can be generated more accurately.
  • generating the target detection frame information based on multiple pooling values and a position index corresponding to each of the multiple pooling values includes:
  • in the case where all of the multiple pooling values are less than or equal to the set pooling threshold, the target detection frame information is empty.
  • the embodiment of the present application provides a driver behavior detection device, including:
  • an acquisition module configured to acquire a to-be-detected image of the driving position area in the vehicle cabin
  • a detection module configured to detect the to-be-detected image to obtain a target detection result, where the target detection result includes a steering wheel detection result and a hand detection result;
  • a determining module configured to determine the driving behavior category of the driver according to the target detection result
  • a warning module configured to issue warning information when the driving behavior category of the driver is dangerous driving.
  • An embodiment of the present application provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and the processor executes the machine-readable instructions to perform the driver behavior detection method according to any one of the above embodiments.
  • An embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the driver behavior detection method described in any of the foregoing embodiments is executed.
  • Embodiments of the present application further provide a computer program, including computer-readable codes; when the computer-readable codes run in an electronic device, the processor in the electronic device executes the program for implementing the methods in any of the foregoing embodiments.
  • The accompanying drawings used in the description of the embodiments are briefly introduced below:
  • FIG. 1a shows a schematic diagram of an application scenario of an embodiment of the present application
  • Fig. 1b shows a schematic flowchart of a driver behavior detection method provided by an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of a specific method for detecting an image to be detected and obtaining a target detection result in a driver behavior detection method provided by an embodiment of the present application
  • FIG. 3 shows a schematic structural diagram of a driver behavior detection device provided by an embodiment of the present application
  • FIG. 4 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides a driver behavior detection method.
  • the driver behavior detection method may be performed by a driver behavior detection device, and the driver behavior detection device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the method can also be implemented by the processor calling the computer-readable instructions stored in the memory.
  • the method can be performed by a server.
  • FIG. 1a is a schematic diagram of an application scenario of the embodiment of the present application. As shown in FIG. 1a, the driver behavior detection device 10 acquires an image to be detected 11 of the driving position area, and the image to be detected 11 includes a steering wheel 110 and a human hand 111. By performing the driver behavior detection described in the foregoing embodiments, the device 10 obtains the detection results of the steering wheel 110 and the human hand 111 and determines the driving behavior category of the driver. When the driving behavior category is dangerous driving, the driver behavior detection device 10 issues a warning message, realizing detection of the driver's driving behavior, facilitating safety reminders to the driver, and improving the safety of vehicle driving.
  • Fig. 1b is a schematic flowchart of a driver behavior detection method provided by an embodiment of the present application, the method includes S101-S104, wherein:
  • S102 Detect the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result.
  • The target detection result is obtained by detecting the acquired image to be detected of the driving position area; the target detection result includes the steering wheel detection result and the human hand detection result. The driving behavior category of the driver is determined from the target detection result, and when the category is dangerous driving, a warning message is issued, realizing detection of the driver's driving behavior, facilitating safety reminders to the driver, and improving the safety of vehicle driving.
  • a camera device may be provided in the vehicle cabin, and an image to be detected of the driving position area may be acquired in real time through the camera device provided in the vehicle cabin.
  • the installation position of the imaging device may be a position where the steering wheel and the driver's seat area in the driving position area can be photographed.
  • the images to be detected may be input into the trained neural network, and the images to be detected are detected respectively to obtain target detection results, wherein the target detection results include steering wheel detection results and human hand detection results.
  • the steering wheel detection result includes the information of whether there is a steering wheel in the image to be detected.
  • the steering wheel detection result includes the detection frame information of the steering wheel
  • the human hand detection result includes information on whether there is a human hand in the image to be detected.
  • the hand detection result includes the detection frame information of the hand.
  • the image to be detected is detected to obtain a target detection result, which may include:
  • S202 Perform at least one target convolution process on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the intermediate feature map.
  • S204: Perform maximum pooling processing on the converted target channel feature map to obtain multiple pooling values and a position index corresponding to each of the multiple pooling values; the position index is used to identify the position of the pooling value in the converted target channel feature map.
  • The above embodiment obtains multiple pooling values and a position index corresponding to each of the multiple pooling values by performing maximum pooling processing on the target channel feature map, and generates target detection frame information, which provides data support for generating the target detection result.
  • the image to be detected can be input into the trained neural network, and the backbone network in the trained neural network performs multiple convolution processing on the image to be detected to generate an intermediate feature map corresponding to the image to be detected.
  • the structure of the backbone network in the neural network can be set according to actual needs.
  • the intermediate feature map can be input into the steering wheel detection branch network and the hand detection branch network of the neural network respectively, to generate the steering wheel detection result and the human hand detection result.
  • the generation of the steering wheel detection result will be described in detail below.
  • At least one first convolution process may be performed on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the steering wheel, and the number of channels corresponding to the detection feature map may be three channels.
  • the detection feature map includes a first channel feature map representing position (the first channel feature map is the target channel feature map), a second channel feature map representing the length information of the detection frame, and a third channel feature map representing the width information of the detection frame.
  • the activation function can be used to transform the feature values of the target channel feature map representing position in the detection feature maps of multiple channels, to generate the converted target channel feature map. Each feature value in the converted target channel feature map is a value between 0 and 1.
  • the activation function may be a sigmoid function. For the feature value of any feature point in the converted target channel feature map, if the feature value is closer to 1, the probability that the feature point corresponding to the feature value belongs to the center point of the detection frame of the steering wheel is greater.
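As an illustration of this transformation, assuming (as the text suggests) a sigmoid activation applied element-wise; the function names are ours:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def transform_target_channel(channel):
    """Apply a sigmoid to every feature value of the target channel map,
    so each transformed value falls between 0 and 1; values close to 1
    mark likely detection-frame center points."""
    return [[sigmoid(v) for v in row] for row in channel]
```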
  • the maximum pooling process can be performed on the converted target channel feature map to obtain the pooling value corresponding to each feature position in the target channel feature map and the position index corresponding to each pooling value; the position index can be used to identify the position of the pooling value in the converted target channel feature map.
  • identical position indexes among those obtained at the feature positions can be merged, to obtain the multiple pooling values corresponding to the target channel feature map and the position index corresponding to each of the multiple pooling values.
  • the preset pooling size and pooling step size may be set according to actual needs. For example, the preset pooling size may be 3×3, and the preset pooling step size may be 1.
  • the first detection frame information (ie, target detection frame information) corresponding to the steering wheel may be generated based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values.
  • a 3×3 maximum pooling process with a step size of 1 may be performed on the target channel feature map; during pooling, for every 3×3 group of feature points in the target channel feature map, the maximum response value (that is, the pooling value) of the 3×3 feature points and the position index of that maximum response value on the target channel feature map are determined.
  • the number of maximum response values is related to the size of the target channel feature map; for example, if the size of the target channel feature map is 80×60×3, 80×60 maximum response values in total are obtained after the maximum pooling process is performed on the target channel feature map; and for any maximum response value, there may be at least one other maximum response value with the same position index.
  • the maximum response values with the same position index are combined to obtain M maximum response values and a position index corresponding to each of the M maximum response values.
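The pooling-with-index and duplicate-merging steps above can be sketched in plain Python as follows: a same-padded 3×3, stride-1 sliding maximum whose per-window argmax positions are merged when they coincide, then filtered by a threshold. The function names and the dict-based merging are our assumptions, not the application's wording.

```python
def maxpool_with_index(heatmap, k=3):
    """Slide a k x k window (stride 1, same padding) over a 2-D heatmap
    given as a list of lists. For each window, take the maximum response
    value together with its (row, col) position index in the heatmap;
    windows that yield the same position index are merged into one entry."""
    h, w = len(heatmap), len(heatmap[0])
    pad = k // 2
    peaks = {}
    for i in range(h):
        for j in range(w):
            best_val, best_idx = None, None
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < h and 0 <= c < w:
                        if best_val is None or heatmap[r][c] > best_val:
                            best_val, best_idx = heatmap[r][c], (r, c)
            peaks[best_idx] = best_val  # identical position indexes merge here
    return peaks

def detect_centers(heatmap, threshold):
    """Keep only merged pooling values above the threshold: these are the
    candidate detection-frame center points. An empty dict means no target."""
    return {idx: val for idx, val in maxpool_with_index(heatmap).items()
            if val > threshold}
```

On a small heatmap with a single strong response at its center, every 3×3 window reports that same position, so merging leaves exactly one candidate center.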
  • the first detection frame information corresponding to the steering wheel is generated.
  • the process of determining the information of the second detection frame corresponding to the human hand may refer to the process of determining the information of the first detection frame corresponding to the steering wheel, which will not be repeated here.
  • After the first detection frame information corresponding to the steering wheel is obtained, it may be determined as the steering wheel detection result; when no first detection frame information is obtained, it is determined that the steering wheel detection result does not include a steering wheel. Likewise, after the second detection frame information corresponding to the human hand is obtained, it may be determined as the human hand detection result; when no second detection frame information is obtained, it is determined that the human hand detection result does not include a human hand.
  • generating target detection frame information based on a plurality of pooled values and a position index corresponding to each of the plurality of pooled values may include:
  • Step A1: In the case where at least one of the multiple pooling values is greater than the set pooling threshold, determine, from the multiple pooling values and based on the pooling threshold, the target pooling value belonging to the center point of the target detection frame.
  • Step A2: Generate the target detection frame information based on the position index corresponding to the target pooling value.
  • a pooling threshold can be set, and the multiple pooling values are filtered based on the set pooling threshold to obtain the target pooling values greater than the pooling threshold among the multiple pooling values. If no pooling value is greater than the threshold, there is no target pooling value, that is, there is no first detection frame information of the steering wheel.
  • the center point position information of the first detection frame corresponding to the steering wheel may be generated based on the position index corresponding to the target pooling value.
  • the pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the driver's hand may be the same or different. Specifically, the pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the driver's hand may be determined according to the actual situation. For example, multi-frame sample images collected by the camera device corresponding to the image to be detected can be obtained, and an adaptive algorithm can be used to generate a pooling threshold corresponding to the steering wheel and a pooling threshold corresponding to the driver's hand according to the collected multi-frame sample images.
  • each of the M maximum response values can be compared with the pooling threshold; when a maximum response value is greater than the pooling threshold, that maximum response value is determined as the target pooling value, and the position index corresponding to the target pooling value is the center point position information of the first detection frame of the steering wheel.
  • a second feature value at the feature position matching the center point position information may be selected from the second channel feature map and determined as the length corresponding to the first detection frame of the steering wheel; likewise, a third feature value at the feature position matching the center point position information may be selected from the third channel feature map and determined as the width corresponding to the first detection frame of the steering wheel, thereby obtaining the size information of the first detection frame of the steering wheel.
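The box-assembly step just described, reading the length and width channels at the selected center position, can be sketched as below; the dict layout and function name are illustrative assumptions.

```python
def build_detection_frame(center, length_map, width_map):
    """Given the center-point position index (row, col) selected from the
    target channel map, read the feature values at the same position from
    the second (length) and third (width) channel maps to obtain the
    detection frame's size information."""
    row, col = center
    return {"center": center,
            "length": length_map[row][col],
            "width": width_map[row][col]}
```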
  • one or two second detection frame information can be obtained, that is, the second detection frame information corresponding to the left hand and/or the right hand respectively can be obtained.
  • for the process of determining the second detection frame information corresponding to the driver's hand, reference may be made to the above process of determining the first detection frame information of the steering wheel, which will not be repeated here.
  • the pooling value greater than the pooling threshold among the multiple pooling values is determined as the target pooling value belonging to the center point of the target detection frame of the steering wheel or the driver's hand, and, based on the position index corresponding to the target pooling value, at least one piece of target detection frame information of the steering wheel or the driver's hand can be generated more accurately.
  • generating target detection frame information based on multiple pooling values and a position index corresponding to each of the multiple pooling values may include: when all of the multiple pooling values are less than or equal to the set pooling threshold, determining that the target detection frame information is empty.
  • the multiple pooling values corresponding to the steering wheel are less than or equal to the set pooling threshold, it is determined that the first detection frame information of the steering wheel is empty; there is at least one pooling value greater than the set pooling value among the multiple pooling values corresponding to the steering wheel
  • the pooling threshold is , it is determined that the information of the first detection frame of the steering wheel is not empty.
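The pooling-and-threshold step can be illustrated with a small sketch. It is only one interpretation of the description above: the non-overlapping 3×3 windows, the function name, and the threshold value are assumptions, not details fixed by the patent.

```python
import numpy as np

def extract_peaks(heatmap, pool_size=3, threshold=0.5):
    """Max-pool a single-channel heatmap with non-overlapping windows,
    keeping each pooled value together with its position index in the
    original map. Pooled values above `threshold` are treated as
    detection-frame center points; if no value exceeds the threshold
    the result is empty, i.e. the detection frame information is empty."""
    h, w = heatmap.shape
    peaks = []
    for y0 in range(0, h - pool_size + 1, pool_size):
        for x0 in range(0, w - pool_size + 1, pool_size):
            window = heatmap[y0:y0 + pool_size, x0:x0 + pool_size]
            dy, dx = np.unravel_index(np.argmax(window), window.shape)
            value = float(window[dy, dx])
            if value > threshold:
                peaks.append((value, (y0 + dy, x0 + dx)))
    return peaks
```

An all-zero heatmap yields an empty list, matching the "detection frame information is empty" case.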
  • the driving behavior category of the driver can be determined based on the steering wheel detection result and the human hand detection result.
  • the driving behavior category of the driver is determined according to the target detection result, which may include:
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result;
  • determining the driving behavior category of the driver according to the positional relationship.
  • the positional relationship between the steering wheel and the human hand can be determined first according to the steering wheel detection result and the human hand detection result, and the driver's driving behavior category can be determined according to the determined positional relationship, that is, whether the driver is driving safely or dangerously.
  • determining the driving behavior category of the driver according to the positional relationship may include: when the positional relationship indicates that the driver is holding the steering wheel, determining that the driving behavior category of the driver is safe driving.
  • the driver's behavior category is safe driving.
  • the case where the driver holds the steering wheel includes the driver holding the steering wheel with the left hand, the driver holding the steering wheel with the right hand, or the driver holding the steering wheel with both hands.
  • determining the driving behavior category of the driver according to the positional relationship may include: when the positional relationship indicates that the driver's hands are off the steering wheel, determining that the driver's driving behavior category is dangerous driving.
  • the category of the driving behavior of the driver is determined to be dangerous driving.
  • determining the driving behavior category of the driver according to the target detection result may include: determining, according to the target detection result, that the driving behavior category of the driver is dangerous driving.
  • the driving behavior category of the driver is determined to be abnormal.
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result may include: in the case where the human hand detection result includes one human hand, if the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determining that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel; if there is no overlapping area between the detection frame corresponding to the human hand and the detection frame corresponding to the steering wheel, determining that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel.
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result may include: in the case where the human hand detection result includes two human hands, if neither of the detection frames corresponding to the two human hands has an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determining that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if the detection frame corresponding to at least one human hand in the human hand detection result has an overlapping area with the detection frame corresponding to the steering wheel, determining that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
  • the positional relationship between the steering wheel and the human hand can be determined by using the detection frame corresponding to the steering wheel in the steering wheel detection result and the detection frame corresponding to the human hand in the human hand detection result.
  • the positional relationship between the steering wheel and the human hand is determined as holding the steering wheel.
  • if no overlapping area exists between the detection frame corresponding to the human hand and the detection frame corresponding to the steering wheel, it is determined that the positional relationship between the steering wheel and the human hand is that the hand is off the steering wheel.
  • the human hand detection result includes two human hands
  • the detection frame corresponding to at least one human hand and the detection frame corresponding to the steering wheel have an overlapping area
  • the positional relationship between the steering wheel and the human hand is determined as holding the steering wheel.
  • if neither of the detection frames corresponding to the two hands overlaps the detection frame corresponding to the steering wheel, it is determined that the positional relationship between the steering wheel and the human hand is that the hands are off the steering wheel.
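The overlap test driving these decisions can be sketched as an axis-aligned rectangle intersection check. A minimal sketch under assumed conventions: the (x1, y1, x2, y2) box format and the function names are illustrative, not from the patent.

```python
def boxes_overlap(box_a, box_b):
    """Axis-aligned rectangle intersection; boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def driver_holds_wheel(wheel_box, hand_boxes):
    """True when at least one hand detection frame overlaps the steering
    wheel detection frame (driver holds the wheel); False when every
    hand frame is disjoint from the wheel frame (hands off the wheel).
    Works for one or two hand boxes, covering both cases above."""
    return any(boxes_overlap(hand, wheel_box) for hand in hand_boxes)
```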
  • determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result may include:
  • an intermediate feature map corresponding to the image to be detected is generated
  • two feature values at the feature position matching the center point position information are extracted from the classification feature map; the largest feature value is selected from the two feature values, and the category of the channel feature map corresponding to the largest feature value in the classification feature map is determined as the category corresponding to the center point position information.
  • the positional relationship between the steering wheel and the human hand is determined.
  • each channel feature map in the two-channel classification feature map corresponds to a category of human hands.
  • the category corresponding to the channel feature map of the 0th channel may be that the driver's hand is off the steering wheel; the category corresponding to the channel feature map of the first channel may be that the driver is holding the steering wheel.
  • two feature values at the feature position matching the center point position information can be extracted from the classification feature map, and the largest feature value is selected from the two feature values.
  • the category of the channel feature map corresponding to the maximum feature value in the classification feature map is determined as the category corresponding to the center point position information.
  • the detection frame information corresponding to the human hand includes two center point position information (that is, including the center point position information corresponding to the left hand and the center point position information corresponding to the right hand), for each center point position information, determine the center point position information corresponding category.
  • for example, the category corresponding to the channel feature map of the 0th channel may be that the driver's hand is off the steering wheel, and the category corresponding to the channel feature map of the 1st channel may be that the driver is holding the steering wheel;
  • for the center point position information corresponding to the left hand, two feature values are extracted from the classification feature map, namely 0.8 and 0.2; the category of the 0th channel feature map, which corresponds to the larger value 0.8, is determined as the category of the center point position information corresponding to the left hand,
  • that is, the category of the center point position information corresponding to the left hand is that the driver's hand is off the steering wheel.
  • the category of the position information of the center point corresponding to the right hand can be obtained.
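The per-hand classification step (pick the channel with the larger feature value at the hand's center point) can be sketched as follows. The array layout and the label strings are assumptions for illustration; the text only fixes that channel 0 maps to "hand off the wheel" and channel 1 to "holding the wheel" in its example.

```python
import numpy as np

# Labels are assumptions matching the example in the text:
# channel 0 -> hand off the wheel, channel 1 -> hand holding the wheel.
CATEGORIES = ("hand off wheel", "hand holding wheel")

def classify_hand(cls_map, center_yx):
    """cls_map: (2, H, W) two-channel classification feature map.
    Reads both channel values at the hand's center point and returns
    the category of the channel with the larger feature value."""
    y, x = center_yx
    return CATEGORIES[int(np.argmax(cls_map[:, y, x]))]
```

With values 0.8 (channel 0) and 0.2 (channel 1) at the left hand's center point, this yields "hand off wheel", matching the worked example above.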
  • the steering wheel detection result is determined, the classification feature map is generated by performing at least one convolution process on the intermediate feature map, and, combined with the generated center point position information of the driver's hand, the positional relationship between the steering wheel and the human hand can be determined more accurately.
  • determining the positional relationship between the steering wheel and the human hand may include:
  • Manner 1: when the detection frame information corresponding to the human hand includes one piece of center point position information, the category corresponding to that center point position information is determined as the positional relationship between the steering wheel and the human hand.
  • Manner 2: when the detection frame information corresponding to the human hand includes two pieces of center point position information, and the categories corresponding to both pieces of center point position information are that the driver's hand is off the steering wheel, it is determined that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if, among the categories corresponding to the two pieces of center point position information, at least one category is that the driver holds the steering wheel, the positional relationship between the steering wheel and the human hand is determined to be that the driver holds the steering wheel.
  • when the detection frame information corresponding to the human hand includes one piece of center point position information, the category corresponding to that center point position information can be determined as the positional relationship between the steering wheel and the human hand.
  • for example, when the detection frame information corresponding to the human hand includes only the center point position information corresponding to the left hand, and the category of that center point position information is that the driver holds the steering wheel, the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
  • when the detection frame information corresponding to the human hand includes two pieces of center point position information, that is, the center point position information corresponding to the left hand and the center point position information corresponding to the right hand: if the categories corresponding to both are that the driver's hand is off the steering wheel, the positional relationship between the steering wheel and the human hand is determined to be that the driver's hands are off the steering wheel; if the category of the center point position information corresponding to the left hand and/or the right hand is that the driver holds the steering wheel, the positional relationship is determined to be that the driver holds the steering wheel.
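Manner 1 and Manner 2 reduce to a simple rule over the per-hand categories: with one hand, its category is the relationship; with two hands, the driver holds the wheel as soon as at least one hand's category is "holding". A sketch with assumed label strings:

```python
def wheel_hand_relationship(hand_categories):
    """hand_categories: one or two per-hand categories, each either
    'hand holding wheel' or 'hand off wheel' (labels assumed).
    One hand: its category decides the relationship. Two hands: the
    driver holds the wheel when at least one hand holds it; only when
    every hand is off the wheel is the relationship 'hands off'."""
    if "hand holding wheel" in hand_categories:
        return "driver holds the steering wheel"
    return "driver's hands are off the steering wheel"
```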
  • warning information for the driver may be generated based on the driving behavior category of the driver.
  • the warning information can be played in the form of voice.
  • the generated warning message can be "Danger, please hold the steering wheel".
  • the writing order of each step does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • an embodiment of the present application also provides a driver behavior detection device.
  • a schematic structural diagram of a driver behavior detection device provided by an embodiment of the present application includes an acquisition module 301, a detection module 302, a determination module 303, and a warning module 304; specifically:
  • the obtaining module 301 is configured to obtain the to-be-detected image corresponding to the driving position area in the vehicle cabin;
  • the detection module 302 is configured to detect the to-be-detected image to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;
  • a determination module 303, configured to determine the driving behavior category of the driver according to the target detection result;
  • the warning module 304 is configured to issue warning information when the driving behavior category of the driver is dangerous driving.
  • when the determining module 303 determines the driving behavior category of the driver according to the target detection result, it is configured to:
  • determine the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result;
  • determine the driving behavior category of the driver according to the positional relationship.
  • the determining module 303 when determining the driving behavior category of the driver according to the positional relationship, is configured as follows:
  • when the determining module 303 determines the driving behavior category of the driver according to the target detection result, it is configured to:
  • determine, according to the target detection result, that the driving behavior category of the driver is dangerous driving.
  • the detection module 302 when detecting the to-be-detected image to obtain a target detection result, is configured as follows:
  • the converted target channel feature map is subjected to maximum pooling processing to obtain multiple pooling values and the position corresponding to each pooling value in the multiple pooling values. index; the position index is used to identify the position of the pooled value in the converted target channel feature map;
  • generate target detection frame information based on a plurality of pooling values and the position index corresponding to each of the plurality of pooling values;
  • the target detection result is determined according to the target detection frame information.
  • when the detection module 302 generates target detection frame information based on a plurality of pooling values and the position index corresponding to each of the plurality of pooling values, it is configured to:
  • determine, from the plurality of pooling values and based on a pooling threshold, the target pooling value belonging to the center point of the target detection frame;
  • the target detection frame information is generated based on the position index corresponding to the target pooling value.
  • when the detection module 302 generates the target detection frame information based on a plurality of pooling values and the position index corresponding to each pooling value, it is configured to:
  • determine that the target detection frame information is empty when all of the pooling values are less than or equal to the set pooling threshold.
  • the determining module 303 when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:
  • in the case where the human hand detection result includes one human hand:
  • if the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determine that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel; if there is no overlapping area between the detection frame corresponding to the human hand and the detection frame corresponding to the steering wheel, determine that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel.
  • the determining module 303 when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:
  • in the case where the human hand detection result includes two human hands:
  • if neither of the detection frames corresponding to the two human hands has an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determine that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if the detection frame corresponding to at least one human hand in the human hand detection result has an overlapping area with the detection frame corresponding to the steering wheel, determine that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
  • the determining module 303 when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:
  • each channel feature map in the two-channel classification feature map corresponds to a category of the human hand;
  • two feature values at the feature positions matching the center point position information are extracted from the classification feature map;
  • select the maximum feature value from the two feature values, and determine the category of the channel feature map corresponding to the maximum feature value in the classification feature map as the category corresponding to the center point position information;
  • the positional relationship between the steering wheel and the human hand is determined.
  • the determining module 303 determines the positional relationship between the steering wheel and the human hand based on the category corresponding to each center point position information indicated by the detection frame information corresponding to the human hand, Configured as:
  • the category corresponding to the center point position information is determined as the positional relationship between the steering wheel and the human hand;
  • when the detection frame information corresponding to the human hand includes two pieces of center point position information:
  • if the categories corresponding to both pieces of center point position information are that the driver's hand is off the steering wheel, determine that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if, among the categories corresponding to the two pieces of center point position information, at least one category is that the driver holds the steering wheel, determine that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
  • the functions or modules included in the apparatuses provided in the embodiments of the present application may be used to execute the methods described in the above method embodiments.
  • a schematic structural diagram of an electronic device provided in an embodiment of the present application includes a processor 401 , a memory 402 , and a bus 403 .
  • the memory 402 is configured to store execution instructions and includes the internal memory 4021 and the external memory 4022; the internal memory 4021 is configured to temporarily store operation data in the processor 401 and data exchanged with the external memory 4022, such as a hard disk;
  • the processor 401 exchanges data with the external memory 4022 through the internal memory 4021.
  • the processor 401 communicates with the memory 402 through the bus 403, so that the processor 401 executes the following instructions:
  • the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the driver behavior detection method described in the above method embodiments are executed.
  • the computer program product of the driver behavior detection method provided by the embodiments of the present application includes a computer-readable storage medium storing program codes, and the instructions included in the program codes can be used to execute the steps of the driver behavior detection method described in the above method embodiments;
  • for details, reference may be made to the foregoing method embodiments, which will not be repeated here.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program codes.
  • the embodiment of the present application discloses a driver behavior detection method, device, electronic device, computer storage medium and computer program.

Abstract

A driver behavior detection method and apparatus, an electronic device, a computer storage medium and a computer program, the method comprising: acquiring an image to be detected of the driving position area in a vehicle cabin (S101); detecting the image to be detected and obtaining target detection results, which comprise a steering wheel detection result and a human hand detection result (S102); determining a driving behavior type of the driver according to the target detection results (S103); and sending warning information when the driving behavior type of the driver is dangerous driving (S104).

Description

Driver behavior detection method, device, electronic device, storage medium and program
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on the Chinese patent application with application number 202010790208.3, filed on August 7, 2020, and claims the priority of that Chinese patent application, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the technical field of deep learning, and in particular, to a driver behavior detection method, apparatus, electronic device, computer storage medium and computer program.
BACKGROUND
With the rapid development of vehicles, vehicles have become an important means of transportation for users, making the safe driving of vehicles one of the important issues in the current automotive industry. The safe driving of a vehicle is determined by many factors, such as the driver's driving behavior, road conditions, and weather conditions.
In general, dangerous driving behavior is one of the main factors causing most traffic accidents. Therefore, in order to improve driving safety and ensure the safety of passengers and the driver, the driving behavior of the driver can be detected.
SUMMARY OF THE INVENTION
The embodiments of the present application provide at least a driver behavior detection method, device, electronic device, computer storage medium, and computer program.
An embodiment of the present application provides a driver behavior detection method, including:
acquiring a to-be-detected image of the driving position area in the vehicle cabin;
detecting the to-be-detected image to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;
determining the driving behavior category of the driver according to the target detection result;
issuing warning information when the driving behavior category of the driver is dangerous driving.
With the above method, a target detection result is obtained by detecting the acquired to-be-detected image corresponding to the driving position area, where the target detection result includes a steering wheel detection result and a human hand detection result; the driving behavior category of the driver is determined from the target detection result, and warning information is issued when the driving behavior category of the driver is dangerous driving. This realizes the detection of the driver's driving behavior, facilitating safety reminders to the driver and improving the safety of vehicle driving.
In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the human hand detection result includes a human hand, determining the driving behavior category of the driver according to the target detection result includes:
determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result;
determining the driving behavior category of the driver according to the positional relationship.
In some embodiments of the present application, determining the driving behavior category of the driver according to the positional relationship includes:
when the positional relationship indicates that the driver is holding the steering wheel, determining that the driving behavior category of the driver is safe driving.
In some embodiments of the present application, determining the driving behavior category of the driver according to the positional relationship includes:
when the positional relationship indicates that the driver's hands are off the steering wheel, determining that the driving behavior category of the driver is dangerous driving.
In some embodiments of the present application, determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
in the case where the human hand detection result includes one human hand: if the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determining that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel; if there is no overlapping area between the detection frame corresponding to the human hand and the detection frame corresponding to the steering wheel, determining that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel.
In some embodiments of the present application, determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
in the case where the human hand detection result includes two human hands: if neither of the detection frames corresponding to the two human hands has an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determining that the positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if the detection frame corresponding to at least one human hand has an overlapping area with the detection frame corresponding to the steering wheel, determining that the positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
在本申请一些实施例中，根据所述方向盘检测结果和所述人手检测结果，确定方向盘与人手之间的位置关系，包括：In some embodiments of the present application, determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
基于所述待检测图像,生成所述待检测图像对应的中间特征图;generating an intermediate feature map corresponding to the to-be-detected image based on the to-be-detected image;
对所述中间特征图进行至少一次卷积处理，生成所述中间特征图对应的二通道的分类特征图；其中，所述二通道的分类特征图中的每个通道特征图对应一种人手的类别；Performing at least one convolution process on the intermediate feature map to generate a two-channel classification feature map corresponding to the intermediate feature map, wherein each channel feature map in the two-channel classification feature map corresponds to one category of human hand;
基于所述人手检测结果中人手对应的检测框信息指示的中心点位置信息，从所述分类特征图中提取与所述中心点位置信息匹配的特征位置处的两个特征值；从两个特征值中选取最大特征值，将所述分类特征图中，与所述最大特征值对应的通道特征图的类别，确定为所述中心点位置信息对应的类别；Based on the center point position information indicated by the detection frame information corresponding to the human hand in the human hand detection result, extracting, from the classification feature map, two feature values at the feature position matching the center point position information; selecting the maximum feature value from the two feature values, and determining the category of the channel feature map in the classification feature map that corresponds to the maximum feature value as the category corresponding to the center point position information;
基于所述人手对应的检测框信息指示的每个中心点位置信息对应的类别,确定所述方向盘与人手之间的位置关系。Based on the category corresponding to each center point position information indicated by the detection frame information corresponding to the human hand, the positional relationship between the steering wheel and the human hand is determined.
上述实施方式中，确定了方向盘检测结果，以及通过对中间特征图进行至少一次卷积处理，生成分类特征图，再结合生成的驾驶员手部的中心点位置信息，可以较准确的确定方向盘与人手之间的位置关系。In the above embodiment, the steering wheel detection result is determined, a classification feature map is generated by performing at least one convolution process on the intermediate feature map, and, in combination with the generated center point position information of the driver's hand, the positional relationship between the steering wheel and the human hand can be determined relatively accurately.
在本申请一些实施例中，基于所述人手对应的检测框信息指示的每个中心点位置信息对应的类别，确定所述方向盘与人手之间的位置关系，包括：In some embodiments of the present application, determining the positional relationship between the steering wheel and the human hand based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the human hand includes:
在所述人手对应的检测框信息中包括一个中心点位置信息的情况下，将所述中心点位置信息对应的类别，确定为方向盘与人手之间的位置关系；In the case where the detection frame information corresponding to the human hand includes one piece of center point position information, determining the category corresponding to the center point position information as the positional relationship between the steering wheel and the human hand;
在所述人手对应的检测框信息中包括两个中心点位置信息、且所述两个中心点位置信息对应的类别为驾驶员手脱离方向盘的情况下，确定所述方向盘与所述人手之间的位置关系为驾驶员双手脱离方向盘；在所述两个中心点位置信息对应的类别中，存在至少一个中心点位置信息对应的类别为驾驶员手握方向盘的情况下，确定所述方向盘与所述人手之间的位置关系为驾驶员手握方向盘。In the case where the detection frame information corresponding to the human hand includes two pieces of center point position information and the category corresponding to both pieces of center point position information is that the driver's hand is off the steering wheel, determining that the positional relationship between the steering wheel and the human hands is that the driver's hands are off the steering wheel; in the case where, among the categories corresponding to the two pieces of center point position information, the category corresponding to at least one piece of center point position information is that the driver is holding the steering wheel, determining that the positional relationship between the steering wheel and the human hand is that the driver is holding the steering wheel.
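The category lookup described above can be sketched with NumPy as follows. The (2, H, W) channel layout and the convention that channel 0 means "hand on wheel" are assumptions chosen for illustration, not details fixed by the disclosure.

```python
import numpy as np

ON_WHEEL, OFF_WHEEL = 0, 1  # assumed channel semantics

def center_category(cls_map, center):
    """Extract the two feature values at the detection-frame center point
    and return the channel index of the larger one, i.e. the category."""
    y, x = center
    return int(np.argmax(cls_map[:, y, x]))

def wheel_hand_relation(cls_map, centers):
    """One center: its category is the relation. Two centers: the driver
    holds the wheel if at least one center is classified ON_WHEEL."""
    categories = [center_category(cls_map, c) for c in centers]
    return ON_WHEEL if ON_WHEEL in categories else OFF_WHEEL
```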
在本申请一些实施例中，在所述方向盘检测结果中包括方向盘，所述人手检测结果中未包括人手时，根据所述目标检测结果，确定驾驶员的驾驶行为类别，包括：In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the human hand detection result does not include a human hand, determining the driving behavior category of the driver according to the target detection result includes:
根据所述目标检测结果,确定所述驾驶员的驾驶行为类别为危险驾驶。According to the target detection result, it is determined that the driving behavior category of the driver is dangerous driving.
在本申请一些实施例中，对所述待检测图像进行检测，得到目标检测结果，包括：In some embodiments of the present application, detecting the image to be detected to obtain the target detection result includes:
基于所述待检测图像,生成所述待检测图像对应的中间特征图;generating an intermediate feature map corresponding to the to-be-detected image based on the to-be-detected image;
对所述中间特征图进行至少一次目标卷积处理,生成所述中间特征图对应的多个通道的检测特征图;Perform at least one target convolution process on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the intermediate feature map;
利用激活函数对多个通道的所述检测特征图中表征位置的目标通道特征图的每个特征值进行特征值转换处理,生成转换后的目标通道特征图;Using the activation function to perform feature value conversion processing on each feature value of the target channel feature map representing the position in the detection feature maps of multiple channels, to generate a converted target channel feature map;
按照预设的池化尺寸和池化步长，对转换后的目标通道特征图进行最大池化处理，得到多个池化值以及与多个池化值中的每个池化值对应的位置索引；所述位置索引用于标识所述池化值在所述转换后的目标通道特征图中的位置；Performing, according to a preset pooling size and pooling step size, maximum pooling on the converted target channel feature map to obtain a plurality of pooling values and a position index corresponding to each of the plurality of pooling values, where the position index is used to identify the position of the pooling value in the converted target channel feature map;
基于多个池化值以及与多个池化值中的每个池化值对应的位置索引,生成目标检测框信息;generating target detection frame information based on a plurality of pooling values and a position index corresponding to each of the plurality of pooling values;
根据所述目标检测框信息,确定所述目标检测结果。The target detection result is determined according to the target detection frame information.
上述实施方式下，通过对目标通道特征图进行最大池化处理，得到了多个池化值以及与多个池化值中的每个池化值对应的位置索引，并生成了目标检测框信息，为生成目标检测结果提供了数据支持。In the above embodiment, by performing maximum pooling on the target channel feature map, a plurality of pooling values and a position index corresponding to each of the plurality of pooling values are obtained, and target detection frame information is generated, which provides data support for generating the target detection result.
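The pooling-with-index step above can be sketched in NumPy as a stride-1 sliding maximum over the converted target channel map, recording for each window the flat position index of its maximum and merging duplicate indexes. Shapes, names, and the dict-based merge are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def maxpool_with_index(feat, size=3, stride=1):
    """Return {flat position index: pooled value} for a 2-D feature map.
    Windows whose maxima fall on the same feature position are merged,
    mirroring the index-merging step described above."""
    h, w = feat.shape
    pooled = {}
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            window = feat[y:y + size, x:x + size]
            dy, dx = np.unravel_index(np.argmax(window), window.shape)
            idx = (y + dy) * w + (x + dx)      # position in the full map
            pooled[idx] = float(window[dy, dx])
    return pooled
```

A single strong response dominates every window that contains it, so after merging it contributes one pooling value with one position index.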
在本申请一些实施例中,所述基于多个池化值以及与多个池化值中的每个池化值对应的位置索引,生成所述目标检测框信息,包括:In some embodiments of the present application, generating the target detection frame information based on multiple pooling values and a position index corresponding to each of the multiple pooling values includes:
在所述多个池化值中存在至少一个池化值大于设置的池化阈值的情况下，基于所述多个池化值以及池化阈值，从所述多个池化值中确定属于所述目标检测框的中心点的目标池化值；In the case where at least one of the plurality of pooling values is greater than a set pooling threshold, determining, based on the plurality of pooling values and the pooling threshold, a target pooling value belonging to the center point of the target detection frame from among the plurality of pooling values;
基于所述目标池化值对应的位置索引,生成所述目标检测框信息。The target detection frame information is generated based on the position index corresponding to the target pooling value.
上述实施方式中，将多个池化值中大于池化阈值的池化值，确定为属于方向盘或驾驶员手部的目标检测框的中心点的目标池化值，基于目标池化值对应的位置索引，较准确的生成了方向盘或驾驶员手部的至少一个目标检测框信息。In the above embodiment, a pooling value greater than the pooling threshold among the plurality of pooling values is determined as a target pooling value belonging to the center point of a target detection frame of the steering wheel or the driver's hand, and, based on the position index corresponding to the target pooling value, at least one piece of target detection frame information of the steering wheel or the driver's hand is generated relatively accurately.
在本申请一些实施例中,所述基于多个池化值以及与多个池化值中的每个池化值对应的位置索引,生成所述目标检测框信息,包括:In some embodiments of the present application, generating the target detection frame information based on multiple pooling values and a position index corresponding to each of the multiple pooling values includes:
在所述多个池化值小于或等于设置的池化阈值的情况下，确定所述目标检测框信息为空。In the case where each of the plurality of pooling values is less than or equal to the set pooling threshold, determining that the target detection frame information is empty.
以下装置、电子设备等的效果描述参见上述方法的说明,这里不再赘述。For descriptions of the effects of the following apparatuses, electronic devices, etc., reference may be made to the descriptions of the above-mentioned methods, which will not be repeated here.
本申请实施例提供了一种驾驶员行为检测装置,包括:The embodiment of the present application provides a driver behavior detection device, including:
获取模块,配置为获取车舱内的驾驶位置区域的待检测图像;an acquisition module, configured to acquire a to-be-detected image of the driving position area in the vehicle cabin;
检测模块,配置为对所述待检测图像进行检测,得到目标检测结果,所述目标检测 结果包括方向盘检测结果和人手检测结果;a detection module, configured to detect the to-be-detected image to obtain a target detection result, where the target detection result includes a steering wheel detection result and a hand detection result;
确定模块,配置为根据所述目标检测结果,确定驾驶员的驾驶行为类别;a determining module, configured to determine the driving behavior category of the driver according to the target detection result;
警示模块,配置为在所述驾驶员的驾驶行为类别为危险驾驶时,发出警示信息。The warning module is configured to issue warning information when the driving behavior category of the driver is dangerous driving.
本申请实施例提供一种电子设备，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当电子设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行如上述任一实施方式所述的驾驶员行为检测方法。An embodiment of the present application provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the driver behavior detection method according to any one of the above embodiments.
本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如上述任一实施方式所述的驾驶员行为检测方法。An embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the driver behavior detection method described in any of the foregoing embodiments is executed.
本申请实施例还提供了一种计算机程序，包括计算机可读代码，当所述计算机可读代码在电子设备中运行时，所述电子设备中的处理器执行用于实现上述任一实施方式所述的驾驶员行为检测方法。An embodiment of the present application further provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the code to implement the driver behavior detection method described in any one of the above embodiments.
为使本申请的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。In order to make the above-mentioned objects, features and advantages of the present application more obvious and easy to understand, the preferred embodiments are exemplified below, and are described in detail as follows in conjunction with the accompanying drawings.
附图说明Description of drawings
为了更清楚地说明本申请实施例的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，此处的附图被并入说明书中并构成本说明书中的一部分，这些附图示出了符合本申请的实施例，并与说明书一起用于说明本申请的技术方案。应当理解，以下附图仅示出了本申请的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。In order to describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present application and, together with the description, serve to explain the technical solutions of the present application. It should be understood that the following drawings only show some embodiments of the present application and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
图1a示出了本申请实施例的一个应用场景的示意图;FIG. 1a shows a schematic diagram of an application scenario of an embodiment of the present application;
图1b示出了本申请实施例所提供的一种驾驶员行为检测方法的流程示意图;Fig. 1b shows a schematic flowchart of a driver behavior detection method provided by an embodiment of the present application;
图2示出了本申请实施例所提供的一种驾驶员行为检测方法中,对待检测图像进行检测,得到目标检测结果的具体方法的流程示意图;2 shows a schematic flowchart of a specific method for detecting an image to be detected and obtaining a target detection result in a driver behavior detection method provided by an embodiment of the present application;
图3示出了本申请实施例所提供的一种驾驶员行为检测装置的架构示意图;FIG. 3 shows a schematic structural diagram of a driver behavior detection device provided by an embodiment of the present application;
图4示出了本申请实施例所提供的一种电子设备的结构示意图。FIG. 4 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application.
具体实施方式Detailed Description
为使本申请实施例的目的、技术方案和优点更加清楚，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。因此，以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围，而是仅仅表示本申请的选定实施例。基于本申请的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例，都属于本申请保护的范围。To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the application as claimed, but merely represents selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.
考虑到，危险的驾驶行为是造成大多数交通事故的主要因素之一。故为了提高行驶的安全性，保障乘客和驾驶员的安全，可以对驾驶员的驾驶行为进行检测。故为了解决上述问题，本申请实施例提供了一种驾驶员行为检测方法。Considering that dangerous driving behavior is one of the main factors causing most traffic accidents, the driving behavior of the driver can be detected in order to improve driving safety and ensure the safety of passengers and the driver. Therefore, to solve the above problem, the embodiments of the present application provide a driver behavior detection method.
为便于对本申请实施例进行理解,首先对本申请实施例所公开的一种驾驶员行为检测方法进行详细介绍。To facilitate understanding of the embodiments of the present application, a driver behavior detection method disclosed in the embodiments of the present application is first introduced in detail.
在本申请的一些实施例中，所述驾驶员行为检测方法可以由驾驶员行为检测装置执行，驾驶员行为检测装置可以是用户设备（User Equipment，UE）、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理（Personal Digital Assistant，PDA）、手持设备、计算设备、车载设备、可穿戴设备等，所述方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。或者，可通过服务器执行该方法。In some embodiments of the present application, the driver behavior detection method may be performed by a driver behavior detection device, and the driver behavior detection device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like; the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
下面结合一个应用场景对本申请进行进一步说明。The present application will be further described below with reference to an application scenario.
本申请实施例的驾驶员行为检测方法，能够应用于进行驾驶员行驶等应用场景中；图1a为本申请实施例的一个应用场景的示意图，如图1a所示，可以通过驾驶员行为检测装置10获取驾驶位置区域的待检测图像11，待检测图像11包括方向盘110和人手111；在驾驶员行为检测装置10中，通过前述实施例记载的驾驶员行为检测进行处理，可以得到方向盘110和人手111的检测结果；进一步的，根据方向盘110和人手111的检测结果，可以确定驾驶员的驾驶行为类别，并在驾驶员的驾驶行为类别为危险驾驶时，驾驶员行为检测装置10发出警示信息，实现了对驾驶员的驾驶行为的检测，从而方便对驾驶员进行安全提醒，提高车辆驾驶的安全性。The driver behavior detection method of the embodiments of the present application can be applied in application scenarios such as driver driving. FIG. 1a is a schematic diagram of an application scenario of an embodiment of the present application. As shown in FIG. 1a, an image to be detected 11 of the driving position area may be acquired by a driver behavior detection device 10, where the image to be detected 11 includes a steering wheel 110 and a human hand 111. In the driver behavior detection device 10, through the driver behavior detection processing described in the foregoing embodiments, the detection results of the steering wheel 110 and the human hand 111 can be obtained. Further, according to the detection results of the steering wheel 110 and the human hand 111, the driving behavior category of the driver can be determined, and when the driving behavior category of the driver is dangerous driving, the driver behavior detection device 10 issues warning information, realizing detection of the driver's driving behavior, thereby facilitating safety reminders to the driver and improving the safety of vehicle driving.
参见图1b所示,为本申请实施例所提供的一种驾驶员行为检测方法的流程示意图,该方法包括S101-S104,其中:Referring to Fig. 1b, which is a schematic flowchart of a driver behavior detection method provided by an embodiment of the present application, the method includes S101-S104, wherein:
S101,获取车舱内的驾驶位置区域的待检测图像。S101, acquiring an image to be detected of a driving position area in the vehicle cabin.
S102,对待检测图像进行检测,得到目标检测结果,目标检测结果包括方向盘检测结果和人手检测结果。S102: Detect the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result.
S103,根据所述目标检测结果,确定驾驶员的驾驶行为类别。S103: Determine the driving behavior category of the driver according to the target detection result.
S104,在驾驶员的驾驶行为类别为危险驾驶时,发出警示信息。S104, when the driving behavior category of the driver is dangerous driving, a warning message is issued.
上述方法中,通过对获取的驾驶位置区域的待检测图像进行检测,得到目标检测结果,该目标检测结果中包括方向盘检测结果和人手检测结果,通过目标检测结果,确定驾驶员的驾驶行为类别,并在驾驶员的驾驶行为类别为危险驾驶时,发出警示信息,实现了对驾驶员的驾驶行为的检测,从而方便对驾驶员进行安全提醒,提高车辆驾驶的安全性。In the above method, the target detection result is obtained by detecting the acquired image to be detected in the driving position area, and the target detection result includes the steering wheel detection result and the hand detection result, and the driving behavior category of the driver is determined by the target detection result, And when the driver's driving behavior category is dangerous driving, a warning message is issued to realize the detection of the driver's driving behavior, so as to facilitate the safety reminder to the driver and improve the safety of vehicle driving.
针对S101:For S101:
这里,可以在车舱内设置摄像设备,通过车舱内设置的摄像设备实时获取驾驶位置区域的待检测图像。其中,该摄像设备的安装位置可以为能够拍摄到驾驶位置区域中方向盘和驾驶员座位区域的位置。Here, a camera device may be provided in the vehicle cabin, and an image to be detected of the driving position area may be acquired in real time through the camera device provided in the vehicle cabin. Wherein, the installation position of the imaging device may be a position where the steering wheel and the driver's seat area in the driving position area can be photographed.
针对S102以及S103:For S102 and S103:
这里，可以将待检测图像输入至训练后的神经网络中，对待检测图像分别进行检测，得到目标检测结果，其中，目标检测结果包括方向盘检测结果和人手检测结果。方向盘检测结果包括待检测图像中是否存在方向盘的信息，在存在方向盘时，方向盘检测结果中包括方向盘的检测框信息；人手检测结果包括待检测图像中是否存在人手的信息，在存在人手时，人手检测结果中包括人手的检测框信息。Here, the image to be detected may be input into a trained neural network and detected to obtain the target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result. The steering wheel detection result includes information on whether a steering wheel exists in the image to be detected; when a steering wheel exists, the steering wheel detection result includes the detection frame information of the steering wheel. The human hand detection result includes information on whether a human hand exists in the image to be detected; when a human hand exists, the human hand detection result includes the detection frame information of the human hand.
一种可选实施方式中,参见图2所示,对待检测图像进行检测,得到目标检测结果,可以包括:In an optional embodiment, as shown in FIG. 2 , the image to be detected is detected to obtain a target detection result, which may include:
S201,基于待检测图像,生成待检测图像对应的中间特征图。S201 , based on the image to be detected, generate an intermediate feature map corresponding to the image to be detected.
S202,对中间特征图进行至少一次目标卷积处理,生成中间特征图对应的多个通道的检测特征图。S202: Perform at least one target convolution process on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the intermediate feature map.
S203,利用激活函数对多个通道的检测特征图中表征位置的目标通道特征图的每个特征值进行特征值转换处理,生成转换后的目标通道特征图。S203 , using an activation function to perform feature value conversion processing on each feature value of the target channel feature map representing the position in the detection feature maps of the multiple channels, to generate a converted target channel feature map.
S204，按照预设的池化尺寸和池化步长，对转换后的目标通道特征图进行最大池化处理，得到多个池化值以及与多个池化值中的每个池化值对应的位置索引；位置索引用于标识池化值在转换后的目标通道特征图中的位置。S204: According to a preset pooling size and pooling step size, perform maximum pooling on the converted target channel feature map to obtain a plurality of pooling values and a position index corresponding to each of the plurality of pooling values; the position index is used to identify the position of the pooling value in the converted target channel feature map.
S205,基于多个池化值以及与多个池化值中的每个池化值对应的位置索引,生成目标检测框信息。S205 , generating target detection frame information based on the plurality of pooling values and a position index corresponding to each of the plurality of pooling values.
S206,根据目标检测框信息,确定目标检测结果。S206: Determine the target detection result according to the target detection frame information.
上述实施方式通过对目标通道特征图进行最大池化处理，得到了多个池化值以及与多个池化值中的每个池化值对应的位置索引，并生成了目标检测框信息，为生成目标检测结果提供了数据支持。In the above embodiment, by performing maximum pooling on the target channel feature map, a plurality of pooling values and a position index corresponding to each of the plurality of pooling values are obtained, and target detection frame information is generated, which provides data support for generating the target detection result.
可以将待检测图像输入至训练后的神经网络中,训练后的神经网络中的骨干网络对待检测图像进行多次卷积处理,生成待检测图像对应的中间特征图。其中,神经网络中骨干网络的结构可以根据实际需要进行设置。The image to be detected can be input into the trained neural network, and the backbone network in the trained neural network performs multiple convolution processing on the image to be detected to generate an intermediate feature map corresponding to the image to be detected. Among them, the structure of the backbone network in the neural network can be set according to actual needs.
这里,可以将中间特征图分别输入至神经网络的方向盘检测分支网络和手部检测分支网络中,生成方向盘检测结果和人手检测结果。下述对生成方向盘检测结果进行详细说明。Here, the intermediate feature map can be input into the steering wheel detection branch network and the hand detection branch network of the neural network respectively, to generate the steering wheel detection result and the human hand detection result. The generation of the steering wheel detection result will be described in detail below.
这里，可以先对中间特征图进行至少一次第一卷积处理（即目标卷积处理），生成方向盘对应的多个通道的检测特征图，该检测特征图对应的通道数可以为三通道。其中，该检测特征图中包括表征位置的第一通道特征图（该第一通道特征图即为目标通道特征图）、表征检测框长度信息的第二通道特征图、以及表征检测框宽度信息的第三通道特征图。Here, at least one first convolution process (i.e., the target convolution process) may first be performed on the intermediate feature map to generate a multi-channel detection feature map corresponding to the steering wheel; the number of channels of the detection feature map may be three. The detection feature map includes a first channel feature map representing position (this first channel feature map is the target channel feature map), a second channel feature map representing detection frame length information, and a third channel feature map representing detection frame width information.
再可以利用激活函数对多个通道的检测特征图中表征位置的目标通道特征图进行特征值转换处理，生成转换后的目标通道特征图，转换后的目标通道特征图中每个特征值均为0-1之间的数值。其中，该激活函数可以为sigmoid函数。针对转换后的目标通道特征图中的任一特征点的特征值，若该特征值越趋向于1，则该特征值对应的特征点属于方向盘的检测框的中心点的概率也就越大。Then, an activation function may be used to perform feature value conversion on the target channel feature map representing position in the multi-channel detection feature map, generating a converted target channel feature map in which every feature value is a number between 0 and 1. The activation function may be a sigmoid function. For the feature value of any feature point in the converted target channel feature map, the closer the feature value is to 1, the greater the probability that the corresponding feature point belongs to the center point of the detection frame of the steering wheel.
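The conversion step can be illustrated with a small NumPy sketch; the raw values below are arbitrary and only show that a sigmoid maps every feature value into (0, 1).

```python
import numpy as np

def sigmoid(x):
    """Map raw feature values into (0, 1); values near 1 then indicate
    likely detection-frame center points, as described above."""
    return 1.0 / (1.0 + np.exp(-x))

target_channel = np.array([[-2.0, 0.0],
                           [ 4.0, -0.5]])   # arbitrary raw feature values
converted = sigmoid(target_channel)          # every entry now lies in (0, 1)
```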
接着，可以按照预设的池化尺寸和池化步长，对转换后的目标通道特征图进行最大池化处理，得到目标通道特征图中每个特征位置处对应的池化值和每个池化值对应的位置索引；位置索引可以用于标识池化值在转换后的目标通道特征图中的位置。进而可以将每个特征位置处对应的位置索引中，相同的位置索引进行合并处理，得到该目标通道特征图对应多个池化值和多个池化值中每个池化值对应的位置索引。其中，预设的池化尺寸和池化步长可以根据实际需要进行设置，比如，预设的池化尺寸可以为3×3，预设的池化步长可以为1。Next, according to the preset pooling size and pooling step size, maximum pooling may be performed on the converted target channel feature map to obtain a pooling value corresponding to each feature position in the target channel feature map and a position index corresponding to each pooling value; the position index can be used to identify the position of the pooling value in the converted target channel feature map. Then, identical position indexes among the position indexes corresponding to the feature positions can be merged to obtain a plurality of pooling values corresponding to the target channel feature map and a position index corresponding to each of the plurality of pooling values. The preset pooling size and pooling step size can be set according to actual needs; for example, the preset pooling size may be 3×3 and the preset pooling step size may be 1.
进而,可以基于多个池化值以及多个池化值中每个池化值对应的位置索引,生成方向盘对应的第一检测框信息(即目标检测框信息)。Furthermore, the first detection frame information (ie, target detection frame information) corresponding to the steering wheel may be generated based on the plurality of pooled values and the position index corresponding to each of the plurality of pooled values.
在本申请的一些实施例中，可以对目标通道特征图进行3×3，且步长为1的最大池化处理；在池化时，针对每3×3个特征点在目标通道特征图中的特征值，确定3×3个特征点的最大响应值（即池化值）及最大响应值在目标通道特征图上的位置索引。此时，最大响应值的数量与目标通道特征图的尺寸相关；例如若目标通道特征图的尺寸为80×60×3，则在对目标通道特征图进行最大池化处理后，得到的最大响应值共80×60个；且对于每个最大响应值，都可能存在至少一个其他最大响应值与其位置索引相同。然后将位置索引相同的最大响应值合并，得到M个最大响应值，以及M个最大响应值中每个最大响应值对应的位置索引。最后，基于M个最大响应值（池化值）以及每个最大响应值对应的位置索引，生成方向盘对应的第一检测框信息。In some embodiments of the present application, a 3×3 maximum pooling process with a stride of 1 may be performed on the target channel feature map. During pooling, for the feature values of every 3×3 feature points in the target channel feature map, the maximum response value (i.e., the pooling value) of the 3×3 feature points and the position index of the maximum response value on the target channel feature map are determined. The number of maximum response values is related to the size of the target channel feature map; for example, if the size of the target channel feature map is 80×60×3, then 80×60 maximum response values are obtained after maximum pooling, and for each maximum response value there may be at least one other maximum response value with the same position index. Then the maximum response values with the same position index are merged to obtain M maximum response values and the position index corresponding to each of the M maximum response values. Finally, based on the M maximum response values (pooling values) and the position index corresponding to each maximum response value, the first detection frame information corresponding to the steering wheel is generated.
其中,人手对应的第二检测框信息的确定过程可参考方向盘对应的第一检测框信息的确定过程,此处不再赘述。The process of determining the information of the second detection frame corresponding to the human hand may refer to the process of determining the information of the first detection frame corresponding to the steering wheel, which will not be repeated here.
在得到方向盘对应的第一检测框信息之后,可以将第一检测框信息确定为方向盘检测结果。在未得到方向盘对应的第一检测框信息时,确定方向盘检测结果为不包括方向盘。以及在得到人手对应的第二检测框信息之后,可以将第二检测框信息确定为人手检测结果。在未得到人手对应的第二检测框信息时,确定人手检测结果为不包括人手。After obtaining the first detection frame information corresponding to the steering wheel, the first detection frame information may be determined as the steering wheel detection result. When the first detection frame information corresponding to the steering wheel is not obtained, it is determined that the steering wheel detection result does not include the steering wheel. And after obtaining the second detection frame information corresponding to the human hand, the second detection frame information may be determined as the human hand detection result. When the second detection frame information corresponding to the human hand is not obtained, it is determined that the human hand detection result does not include the human hand.
一种可选实施方式中,基于多个池化值以及与多个池化值中的每个池化值对应的位置索引,生成目标检测框信息,可以包括:In an optional embodiment, generating target detection frame information based on a plurality of pooled values and a position index corresponding to each of the plurality of pooled values may include:
步骤A1：在多个池化值中存在至少一个池化值大于设置的池化阈值的情况下，基于多个池化值以及池化阈值，从多个池化值中确定目标检测框的中心点的目标池化值。Step A1: In the case where at least one of the plurality of pooling values is greater than a set pooling threshold, determine, based on the plurality of pooling values and the pooling threshold, a target pooling value belonging to the center point of the target detection frame from among the plurality of pooling values.
步骤A2:基于目标池化值对应的位置索引,生成目标检测框信息。Step A2: Generate target detection frame information based on the position index corresponding to the target pooling value.
继续以方向盘为例进行说明，这里，可以设置池化阈值，在多个池化值中存在至少一个池化值大于设置的池化阈值的情况下，基于设置的池化阈值对多个池化值进行筛选，得到多个池化值中大于池化阈值的一个目标池化值。在多个池化值中每个池化值小于或等于设置的池化阈值的情况下，则不存在目标池化值，即不存在方向盘的第一检测框信息。Continuing with the steering wheel as an example, a pooling threshold may be set here. In the case where at least one of the plurality of pooling values is greater than the set pooling threshold, the plurality of pooling values are filtered based on the set pooling threshold to obtain a target pooling value that is greater than the pooling threshold. In the case where each of the plurality of pooling values is less than or equal to the set pooling threshold, no target pooling value exists, i.e., there is no first detection frame information for the steering wheel.
进一步的,可以基于目标池化值对应的位置索引,生成方向盘对应的第一检测框的中心点位置信息。其中,方向盘对应的池化阈值与驾驶员手部对应的池化阈值可以相同,也可以不同。具体的,方向盘对应的池化阈值与驾驶员手部对应的池化阈值,可以根据实际情况进行确定。比如,可以获取待检测图像对应的摄像设备采集到的多帧样本图像,根据采集的多帧样本图像利用自适应算法,分别生成方向盘对应的池化阈值和驾驶员手部对应的池化阈值。Further, the center point position information of the first detection frame corresponding to the steering wheel may be generated based on the position index corresponding to the target pooling value. The pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the driver's hand may be the same or different. Specifically, the pooling threshold corresponding to the steering wheel and the pooling threshold corresponding to the driver's hand may be determined according to the actual situation. For example, multi-frame sample images collected by the camera device corresponding to the image to be detected can be obtained, and an adaptive algorithm can be used to generate a pooling threshold corresponding to the steering wheel and a pooling threshold corresponding to the driver's hand according to the collected multi-frame sample images.
承接上述示例继续说明,在得到M个最大响应值,以及M个最大响应值中每个最大响应值对应的位置索引之后,可以将M个最大响应值中的每个最大响应值与池化阈值进行比对;在某最大响应值大于该池化阈值时,将该最大响应值确定为目标池化值。目标池化值对应的位置索引,即方向盘的第一检测框的中心点位置信息。Continuing with the above example, after obtaining the M maximum response values and the position index corresponding to each of the M maximum response values, each of the M maximum response values can be combined with the pooling threshold. Compare; when a certain maximum response value is greater than the pooling threshold, the maximum response value is determined as the target pooling value. The position index corresponding to the target pooling value, that is, the position information of the center point of the first detection frame of the steering wheel.
这里,还可以直接对转换之前的目标通道特征图进行最大池化处理,得到方向盘的第一检测框的中心点位置信息。Here, it is also possible to directly perform maximum pooling processing on the feature map of the target channel before conversion to obtain the center point position information of the first detection frame of the steering wheel.
在本申请的一些实施例中,在得到方向盘的第一检测框的中心点位置信息之后,还可以基于该中心点位置信息,从第二通道特征图中选取与该中心点位置信息匹配的特征位置处的第二特征值,将选取的第二特征值确定为方向盘的第一检测框对应的长度,并从第三通道特征图中选取与该中心点位置信息匹配的特征位置处的第三特征值,将选取的第三特征值确定为方向盘的第一检测框对应的宽度,得到了方向盘的第一检测框的尺寸信息。In some embodiments of the present application, after obtaining the center point position information of the first detection frame of the steering wheel, based on the center point position information, a feature matching the center point position information may be selected from the second channel feature map The second feature value at the position, the selected second feature value is determined as the length corresponding to the first detection frame of the steering wheel, and the third channel feature map at the feature position matching the center point position information is selected from the third channel. feature value, and the selected third feature value is determined as the width corresponding to the first detection frame of the steering wheel, and the size information of the first detection frame of the steering wheel is obtained.
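The pooling-and-threshold procedure above can be sketched as follows. This is a minimal illustrative implementation, not the patented method itself: the function name `detect_boxes`, the default pooling size/stride/threshold, and the use of NumPy arrays are all assumptions, and the heatmap is assumed to already be activation-converted (e.g., by a sigmoid). It slides a pooling window over the position-channel feature map, keeps the position indices whose pooling values exceed the threshold, and reads the box length/width at those positions from the size channels; it returns `(cx, cy, length, width, score)` tuples.

```python
import numpy as np

def detect_boxes(heatmap, size_w, size_h, pool_size=3, stride=1, threshold=0.5):
    """Sketch of center-point detection via max pooling (illustrative names).

    heatmap : 2D activation-converted target channel feature map.
    size_w, size_h : 2D channel feature maps holding box length / width.
    """
    H, W = heatmap.shape
    centers = set()
    for y in range(0, H - pool_size + 1, stride):
        for x in range(0, W - pool_size + 1, stride):
            window = heatmap[y:y + pool_size, x:x + pool_size]
            # position index of this window's pooling value
            dy, dx = np.unravel_index(np.argmax(window), window.shape)
            value = window[dy, dx]
            # only pooling values above the threshold become target pooling values
            if value > threshold:
                centers.add((y + dy, x + dx))
    # an empty result corresponds to "the detection frame information is empty"
    return [(cx, cy, float(size_w[cy, cx]), float(size_h[cy, cx]), float(heatmap[cy, cx]))
            for cy, cx in sorted(centers)]
```

The same routine would be run once on the steering wheel channel and once on the hand channel, each with its own threshold, matching the text above.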
For the driver's hands, one or two pieces of second detection frame information may be obtained, i.e., second detection frame information corresponding to the left hand and/or the right hand. The process of determining the second detection frame information corresponding to the driver's hands may refer to the process of determining the first detection frame information of the steering wheel described above, and is not repeated here.

In the above implementation, the pooling values greater than the pooling threshold are determined to be target pooling values belonging to the center points of the target detection frames of the steering wheel or the driver's hands, and at least one piece of target detection frame information for the steering wheel or the driver's hands is generated relatively accurately based on the position indices corresponding to the target pooling values.

In an optional implementation, generating the target detection frame information based on the multiple pooling values and the position index corresponding to each of them may include: when all of the multiple pooling values are less than or equal to the set pooling threshold, determining that the target detection frame information is empty.

Here, when all of the pooling values corresponding to the steering wheel are less than or equal to the set pooling threshold, the first detection frame information of the steering wheel is determined to be empty; when at least one pooling value corresponding to the steering wheel is greater than the set pooling threshold, the first detection frame information of the steering wheel is determined to be non-empty.
After the steering wheel detection result and the hand detection result are obtained, the driving behavior category of the driver can be determined based on them.

In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the hand detection result includes a hand, determining the driving behavior category of the driver according to the target detection result may include:

determining the positional relationship between the steering wheel and the hands according to the steering wheel detection result and the hand detection result;

determining the driving behavior category of the driver according to the positional relationship.

Here, the positional relationship between the steering wheel and the hands can first be determined according to the steering wheel detection result and the hand detection result, and the driving behavior category of the driver can then be determined according to that positional relationship, i.e., whether the driver is driving safely or dangerously.

In some embodiments of the present application, determining the driving behavior category of the driver according to the positional relationship may include: when the positional relationship indicates that the driver is holding the steering wheel, determining that the driving behavior category of the driver is safe driving.

Here, when the positional relationship is detected to indicate that the driver is holding the steering wheel, the behavior category of the driver is determined to be safe driving. Holding the steering wheel includes the driver holding it with the left hand, with the right hand, or with both hands.
In some embodiments of the present application, determining the driving behavior category of the driver according to the positional relationship may include: when the positional relationship indicates that both of the driver's hands are off the steering wheel, determining that the driving behavior category of the driver is dangerous driving.

Here, when the positional relationship is determined to be that both of the driver's hands are off the steering wheel, the driving behavior category of the driver is determined to be dangerous driving.

In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the hand detection result does not include a hand, determining the driving behavior category of the driver according to the target detection result may include: determining, according to the target detection result, that the driving behavior category of the driver is dangerous driving.

Here, if a steering wheel is detected in the steering wheel detection result but no hand is detected in the hand detection result, this indicates that both of the driver's hands are off the steering wheel, and the driving behavior category of the driver is determined to be dangerous driving.

In some embodiments of the present application, if no steering wheel is detected in the steering wheel detection result, the image to be detected is determined to be an abnormal image, and the driving behavior category of the driver is therefore determined to be an abnormal state.
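The category logic described above can be summarized as a small decision function. This is an illustrative sketch; the function name, the string labels, and the boolean inputs are assumptions, not part of the original disclosure.

```python
def driving_behavior(wheel_detected, hands_on_wheel):
    """Sketch of the driving-behavior decision described above.

    wheel_detected : whether the steering wheel detection result includes a wheel.
    hands_on_wheel : one boolean per detected hand, True if that hand's
                     detection frame overlaps the steering wheel's.
    """
    if not wheel_detected:
        return "abnormal"      # no steering wheel detected: abnormal image
    if not hands_on_wheel:
        return "dangerous"     # wheel detected but no hands: hands off the wheel
    if any(hands_on_wheel):
        return "safe"          # at least one hand holds the wheel
    return "dangerous"         # all detected hands are off the wheel
```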
In some embodiments of the present application, determining the positional relationship between the steering wheel and the hands according to the steering wheel detection result and the hand detection result may include: when the hand detection result includes one hand, determining that the positional relationship between the steering wheel and the hand is that the driver is holding the steering wheel if the detection frame corresponding to the hand in the hand detection result overlaps the detection frame corresponding to the steering wheel in the steering wheel detection result; and determining that the positional relationship is that the driver's hands are off the steering wheel if the two detection frames do not overlap.

In some embodiments of the present application, determining the positional relationship between the steering wheel and the hands according to the steering wheel detection result and the hand detection result may include: when the hand detection result includes two hands, determining that the positional relationship is that the driver's hands are off the steering wheel if neither of the detection frames corresponding to the two hands overlaps the detection frame corresponding to the steering wheel; and determining that the positional relationship is that the driver is holding the steering wheel if the detection frame of at least one hand overlaps the detection frame of the steering wheel.

Here, the positional relationship between the steering wheel and the hands can be determined using the detection frame corresponding to the steering wheel in the steering wheel detection result and the detection frames corresponding to the hands in the hand detection result.

When the hand detection result includes one hand, the positional relationship is determined to be a hand holding the steering wheel if the detection frame of that hand overlaps the detection frame of the steering wheel, and to be the hand being off the steering wheel if the two frames do not overlap.

When the hand detection result includes two hands, the positional relationship is determined to be a hand holding the steering wheel if the detection frame of at least one hand overlaps the detection frame of the steering wheel, and to be the hands being off the steering wheel if neither hand's detection frame overlaps the steering wheel's.
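The overlap test above can be sketched for axis-aligned detection frames given as center point plus size, matching the detection frame information format described earlier. The `(center_x, center_y, width, height)` representation, the function names, and the string labels are illustrative assumptions.

```python
def boxes_overlap(box_a, box_b):
    """Axis-aligned overlap test for frames given as (cx, cy, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # frames overlap iff the center distance is smaller than the half-size sum
    return abs(ax - bx) * 2 < aw + bw and abs(ay - by) * 2 < ah + bh

def hands_wheel_relation(wheel_box, hand_boxes):
    """Holding if at least one hand frame overlaps the wheel frame; off otherwise."""
    if any(boxes_overlap(hand, wheel_box) for hand in hand_boxes):
        return "holding"
    return "off"
```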
In an optional implementation, determining the positional relationship between the steering wheel and the hands according to the steering wheel detection result and the hand detection result may include:

generating, based on the image to be detected, an intermediate feature map corresponding to the image to be detected;

performing at least one convolution on the intermediate feature map to generate a two-channel classification feature map corresponding to the intermediate feature map, where each channel feature map of the two-channel classification feature map corresponds to one hand category;

extracting, based on the center point position information indicated by the detection frame information corresponding to the hand in the hand detection result, the two feature values at the feature position matching the center point position information from the classification feature map; selecting the maximum of the two feature values, and determining the category of the channel feature map corresponding to the maximum feature value as the category corresponding to the center point position information;

determining the positional relationship between the steering wheel and the hands based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hands.

Here, when the second detection frame information indicated by the hand detection result is not empty, at least one convolution may be performed on the intermediate feature map to generate a two-channel classification feature map, each channel feature map of which corresponds to one hand category. For example, in the classification feature map, channel 0 may correspond to the category that the driver's hand is off the steering wheel, and channel 1 may correspond to the category that the driver's hand is holding the steering wheel.

Further, based on the center point position information indicated by the detection frame information corresponding to the hand, the two feature values at the feature position matching the center point position information may be extracted from the classification feature map; the maximum of the two feature values is selected, and the category of the channel feature map corresponding to that maximum feature value is determined as the category corresponding to the center point position information.

When the detection frame information corresponding to the hands includes two pieces of center point position information (i.e., the center point position information of the left hand and that of the right hand), the category is determined separately for each piece of center point position information.

For example, suppose channel 0 of the classification feature map corresponds to the category that the driver's hand is off the steering wheel, and channel 1 corresponds to the category that the driver's hand is holding the steering wheel. If the two feature values extracted from the classification feature map for the left hand's center point position information are 0.8 and 0.2, the category of the channel 0 feature map, which corresponds to 0.8, is determined as the category of the left hand's center point position information, i.e., the driver's left hand is off the steering wheel. The category of the right hand's center point position information can be obtained in the same way.
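The per-hand category selection in the example above reduces to comparing the two channel values at the hand's center point. The sketch below assumes the classification feature map is a nested list of shape 2 x H x W; the function name and the label strings are illustrative, with channel 0 mapped to "off the wheel" and channel 1 to "on the wheel" as in the example.

```python
def hand_category(cls_map, cy, cx, labels=("hand_off_wheel", "hand_on_wheel")):
    """Pick the category whose channel has the larger value at the center point.

    cls_map : 2 x H x W classification feature map (nested lists or arrays).
    cy, cx  : center point position of the hand's detection frame.
    """
    v0 = cls_map[0][cy][cx]  # channel 0: driver's hand off the steering wheel
    v1 = cls_map[1][cy][cx]  # channel 1: driver's hand holding the steering wheel
    return labels[0] if v0 >= v1 else labels[1]
```

With the 0.8 / 0.2 values from the example, channel 0 wins and the left hand is classified as off the steering wheel.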
In the above implementation, the steering wheel detection result is determined, a classification feature map is generated by performing at least one convolution on the intermediate feature map, and, combined with the generated center point position information of the driver's hands, the positional relationship between the steering wheel and the hands can be determined relatively accurately.

In an optional implementation, determining the positional relationship between the steering wheel and the hands based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hands may include:

Mode 1: when the detection frame information corresponding to the hands includes one piece of center point position information, determining the category corresponding to that center point position information as the positional relationship between the steering wheel and the hands.

Mode 2: when the detection frame information corresponding to the hands includes two pieces of center point position information and the categories corresponding to both are that the driver's hand is off the steering wheel, determining that the positional relationship between the steering wheel and the hands is that the driver's hands are off the steering wheel; when at least one of the two categories is that the driver's hand is holding the steering wheel, determining that the positional relationship is that the driver is holding the steering wheel.

For Mode 1, when the detection frame information corresponding to the hands includes one piece of center point position information, i.e., the center point position information of the left hand or of the right hand, the category corresponding to that center point position information is determined as the positional relationship between the steering wheel and the hands. For example, if the detection frame information includes the center point position information of the left hand and its category is that the driver is holding the steering wheel, the positional relationship between the steering wheel and the hands is that the driver is holding the steering wheel.

For Mode 2, when the detection frame information corresponding to the hands includes two pieces of center point position information, i.e., those of the left hand and the right hand: if the categories corresponding to both are that the driver's hand is off the steering wheel, the positional relationship is determined to be that the driver's hands are off the steering wheel; if the category of the left hand's and/or the right hand's center point position information is that the driver is holding the steering wheel, the positional relationship is determined to be that the driver is holding the steering wheel.
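Mode 1 and Mode 2 above can be sketched as a single combination rule over the per-hand categories. The function name and the label strings are illustrative assumptions; the list holds one category per detected hand (one entry for Mode 1, two for Mode 2).

```python
def wheel_hand_relation(categories):
    """Combine per-hand categories into the wheel-hand positional relationship."""
    if len(categories) == 1:
        # Mode 1: a single hand's category is the positional relationship
        return categories[0]
    # Mode 2: holding if at least one hand holds the wheel, else hands off
    if "hand_on_wheel" in categories:
        return "hand_on_wheel"
    return "hand_off_wheel"
```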
For S104:

Here, when the driving behavior category of the driver is determined to be dangerous driving, warning information for the driver may be generated based on the driving behavior category. The warning information may be played in the form of speech, for example, "Danger, please hold the steering wheel."

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

Based on the same concept, an embodiment of the present application further provides a driver behavior detection apparatus. Referring to FIG. 3, a schematic structural diagram of a driver behavior detection apparatus provided by an embodiment of the present application, the apparatus includes an acquisition module 301, a detection module 302, a determination module 303, and a warning module 304. Specifically:
the acquisition module 301 is configured to acquire an image to be detected corresponding to the driving position area in the vehicle cabin;

the detection module 302 is configured to detect the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a hand detection result;

the determination module 303 is configured to determine the driving behavior category of the driver according to the target detection result;

the warning module 304 is configured to issue warning information when the driving behavior category of the driver is dangerous driving.
In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the hand detection result includes a hand, the determination module 303, when determining the driving behavior category of the driver according to the target detection result, is configured to:

determine the positional relationship between the steering wheel and the hands according to the steering wheel detection result and the hand detection result;

determine the driving behavior category of the driver according to the positional relationship.

In some embodiments of the present application, the determination module 303, when determining the driving behavior category of the driver according to the positional relationship, is configured to:

determine that the driving behavior category of the driver is safe driving when the positional relationship indicates that the driver is holding the steering wheel.

In some embodiments of the present application, the determination module 303, when determining the driving behavior category of the driver according to the positional relationship, is configured to:

determine that the driving behavior category of the driver is dangerous driving when the positional relationship indicates that the driver's hands are off the steering wheel.

In some embodiments of the present application, when the steering wheel detection result includes a steering wheel and the hand detection result does not include a hand, the determination module 303, when determining the driving behavior category of the driver according to the target detection result, is configured to:

determine, according to the target detection result, that the driving behavior category of the driver is dangerous driving.
In some embodiments of the present application, the detection module 302, when detecting the image to be detected to obtain the target detection result, is configured to:

generate, based on the image to be detected, an intermediate feature map corresponding to the image to be detected;

perform at least one target convolution on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the intermediate feature map;

perform, using an activation function, feature value conversion on each feature value of the target channel feature map that represents position among the detection feature maps of the multiple channels, to generate a converted target channel feature map;

perform maximum pooling on the converted target channel feature map according to a preset pooling size and pooling stride, to obtain multiple pooling values and a position index corresponding to each of the multiple pooling values, where the position index identifies the position of the pooling value in the converted target channel feature map;

generate target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values;

determine the target detection result according to the target detection frame information.

In some embodiments of the present application, the detection module 302, when generating the target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values, is configured to:

when at least one of the multiple pooling values is greater than a set pooling threshold, determine, based on the multiple pooling values and the pooling threshold, the target pooling value of the center point of the target detection frame from among the multiple pooling values;

generate the target detection frame information based on the position index corresponding to the target pooling value.

In some embodiments of the present application, the detection module 302, when generating the target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values, is configured to:

when all of the multiple pooling values are less than or equal to the set pooling threshold, determine that the target detection frame information is empty.
在本申请的一些实施例中,所述确定模块303,在根据所述方向盘检测结果和所述人手检测结果,确定方向盘与人手之间的位置关系时,配置为:In some embodiments of the present application, the determining module 303, when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:
在所述人手检测结果中包括一只人手的情况下,若所述人手检测结果中人手对应的检测框、与所述方向盘检测结果中方向盘对应的检测框存在重合区域,确定所述方向盘 与所述人手之间的位置关系为驾驶员手握方向盘;若所述人手检测结果中人手对应的检测框与所述方向盘检测结果中方向盘对应的检测框不存在重合区域,确定所述方向盘与所述人手之间的位置关系为驾驶员双手脱离方向盘。In the case where the human hand detection result includes a human hand, if the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, it is determined that the steering wheel and the detection frame corresponding to the steering wheel in the steering wheel detection result overlap. The positional relationship between the human hands is that the driver holds the steering wheel; if there is no overlapping area between the detection frame corresponding to the human hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result, it is determined that the steering wheel and the The positional relationship between the human hands is that the driver's hands are off the steering wheel.
在本申请的一些实施例中,所述确定模块303,在根据所述方向盘检测结果和所述人手检测结果,确定方向盘与人手之间的位置关系时,配置为:In some embodiments of the present application, the determining module 303, when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, is configured to:
在所述人手检测结果中包括两只人手的情况下,若所述人手检测结果中两只人手对应的检测框分别与所述方向盘检测结果中方向盘对应的检测框不存在重合区域,确定所述方向盘与所述人手之间的位置关系为驾驶员双手脱离方向盘;若所述人手检测结果中存在至少一只人手对应的检测框与所述方向盘检测结果中方向盘对应的检测框存在重合区域,确定所述方向盘与所述人手之间的位置关系为驾驶员手握方向盘。In the case where the human hand detection result includes two human hands, if there is no overlapping area between the detection frames corresponding to the two human hands in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result, determine the The positional relationship between the steering wheel and the human hand is that the driver's hands are off the steering wheel; if there is an overlapping area between the detection frame corresponding to at least one human hand in the detection result of the human hand and the detection frame corresponding to the steering wheel in the detection result of the steering wheel, determine The positional relationship between the steering wheel and the human hand is that the driver holds the steering wheel.
In some embodiments of the present application, when determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result, the determining module 303 is configured to:
generate an intermediate feature map corresponding to the image to be detected based on the image to be detected;
perform at least one convolution on the intermediate feature map to generate a two-channel classification feature map corresponding to the intermediate feature map, where each channel feature map in the two-channel classification feature map corresponds to one hand category;
based on the center point position information indicated by the detection frame information corresponding to the hand in the human hand detection result, extract from the classification feature map the two feature values at the feature position matching the center point position information; select the maximum of the two feature values, and determine the category of the channel feature map corresponding to the maximum feature value in the classification feature map as the category corresponding to the center point position information;
determine the positional relationship between the steering wheel and the hand based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hand.
In some embodiments of the present application, when determining the positional relationship between the steering wheel and the human hand based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hand, the determining module 303 is configured to:
when the detection frame information corresponding to the hand includes one piece of center point position information, determine the category corresponding to that center point position information as the positional relationship between the steering wheel and the hand;
when the detection frame information corresponding to the hand includes two pieces of center point position information and the category corresponding to both is that the driver's hand is off the steering wheel, determine the positional relationship between the steering wheel and the hands to be that the driver's hands are off the steering wheel; when the category corresponding to at least one of the two pieces of center point position information is that the driver's hand is holding the steering wheel, determine the positional relationship to be that the driver is holding the steering wheel.
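The per-hand classification described above (pick the larger of the two channel values at each hand's center point, then combine the per-hand categories) can be sketched as follows. The channel ordering and category strings are assumptions of this sketch; the disclosure only fixes that each channel corresponds to one hand category.

```python
import numpy as np

# Hypothetical channel order for this sketch: channel 0 = holding, channel 1 = off wheel.
CATEGORIES = ("hand holding wheel", "hand off wheel")

def category_at(cls_map: np.ndarray, center: tuple) -> str:
    """cls_map: (2, H, W) two-channel classification feature map. Extract the
    two feature values at the hand's center point and keep the channel with
    the maximum value as that hand's category."""
    y, x = center
    values = cls_map[:, y, x]                 # two feature values at the matched position
    return CATEGORIES[int(np.argmax(values))]

def wheel_hand_relation(cls_map: np.ndarray, centers: list) -> str:
    """One center point: its category is the relation. Two center points:
    'holding' if at least one hand holds the wheel, else hands off."""
    categories = [category_at(cls_map, c) for c in centers]
    if "hand holding wheel" in categories:
        return "driver holding steering wheel"
    return "driver's hands off steering wheel"

cls_map = np.zeros((2, 8, 8))
cls_map[0, 2, 3] = 0.9   # hand centered at (2, 3): "holding" channel dominates
cls_map[1, 5, 6] = 0.7   # hand centered at (5, 6): "off wheel" channel dominates
print(wheel_hand_relation(cls_map, [(2, 3), (5, 6)]))
```

Note that this path classifies directly from feature-map values at the box centers, as an alternative to the box-overlap rule of the preceding embodiments.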
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present application may be used to perform the methods described in the above method embodiments. For specific implementation, reference may be made to the description of the above method embodiments, which is not repeated here for brevity.
Based on the same technical concept, an embodiment of the present application further provides an electronic device. Referring to FIG. 4, a schematic structural diagram of the electronic device provided by an embodiment of the present application includes a processor 401, a memory 402 and a bus 403. The memory 402 is configured to store execution instructions and includes an internal memory 4021 and an external memory 4022; the internal memory 4021 is configured to temporarily store operation data in the processor 401 and data exchanged with the external memory 4022 such as a hard disk, and the processor 401 exchanges data with the external memory 4022 through the internal memory 4021. When the electronic device 400 runs, the processor 401 communicates with the memory 402 through the bus 403, so that the processor 401 executes the following instructions:
acquiring an image to be detected of the driving position area in a vehicle cabin;
detecting the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;
determining the driving behavior category of the driver according to the target detection result;
issuing warning information when the driving behavior category of the driver is dangerous driving.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the driver behavior detection method described in the above method embodiments are executed.
The computer program product of the driver behavior detection method provided by the embodiments of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the steps of the driver behavior detection method described in the above method embodiments; for details, reference may be made to the above method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the system and apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here. In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that can readily occur to those skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
The embodiments of the present application disclose a driver behavior detection method and apparatus, an electronic device, a computer storage medium and a computer program. The method includes: acquiring an image to be detected of the driving position area in a vehicle cabin; detecting the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result; determining the driving behavior category of the driver according to the target detection result; and issuing warning information when the driving behavior category of the driver is dangerous driving. With the technical solutions of the embodiments of the present application, dangerous driving behavior such as the driver's hands leaving the steering wheel can be detected from in-cabin images and the driver can be warned in time.

Claims (16)

  1. A driver behavior detection method, comprising:
    acquiring an image to be detected of the driving position area in a vehicle cabin;
    detecting the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;
    determining the driving behavior category of the driver according to the target detection result;
    issuing warning information when the driving behavior category of the driver is dangerous driving.
  2. The method according to claim 1, wherein when the steering wheel detection result includes a steering wheel and the human hand detection result includes a human hand, determining the driving behavior category of the driver according to the target detection result includes:
    determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result;
    determining the driving behavior category of the driver according to the positional relationship.
  3. The method according to claim 2, wherein determining the driving behavior category of the driver according to the positional relationship includes:
    determining that the driving behavior category of the driver is safe driving when the positional relationship indicates that the driver is holding the steering wheel.
  4. The method according to claim 2, wherein determining the driving behavior category of the driver according to the positional relationship includes:
    determining that the driving behavior category of the driver is dangerous driving when the positional relationship indicates that the driver's hands are off the steering wheel.
  5. The method according to claim 2, wherein determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
    when the human hand detection result includes one hand: if the detection frame corresponding to the hand in the human hand detection result and the detection frame corresponding to the steering wheel in the steering wheel detection result have an overlapping area, determining the positional relationship between the steering wheel and the hand to be that the driver is holding the steering wheel; if the two detection frames have no overlapping area, determining the positional relationship to be that the driver's hands are off the steering wheel.
  6. The method according to claim 2, wherein determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
    when the human hand detection result includes two hands: if neither of the detection frames corresponding to the two hands has an overlapping area with the detection frame corresponding to the steering wheel in the steering wheel detection result, determining the positional relationship between the steering wheel and the hands to be that the driver's hands are off the steering wheel; if the detection frame corresponding to at least one hand has an overlapping area with the detection frame corresponding to the steering wheel, determining the positional relationship to be that the driver is holding the steering wheel.
  7. The method according to claim 2, wherein determining the positional relationship between the steering wheel and the human hand according to the steering wheel detection result and the human hand detection result includes:
    generating an intermediate feature map corresponding to the image to be detected based on the image to be detected;
    performing at least one convolution on the intermediate feature map to generate a two-channel classification feature map corresponding to the intermediate feature map, where each channel feature map in the two-channel classification feature map corresponds to one hand category;
    based on the center point position information indicated by the detection frame information corresponding to the hand in the human hand detection result, extracting from the classification feature map the two feature values at the feature position matching the center point position information; selecting the maximum of the two feature values, and determining the category of the channel feature map corresponding to the maximum feature value in the classification feature map as the category corresponding to the center point position information;
    determining the positional relationship between the steering wheel and the hand based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hand.
  8. The method according to claim 7, wherein determining the positional relationship between the steering wheel and the hand based on the category corresponding to each piece of center point position information indicated by the detection frame information corresponding to the hand includes:
    when the detection frame information corresponding to the hand includes one piece of center point position information, determining the category corresponding to that center point position information as the positional relationship between the steering wheel and the hand;
    when the detection frame information corresponding to the hand includes two pieces of center point position information and the category corresponding to both is that the driver's hand is off the steering wheel, determining the positional relationship between the steering wheel and the hands to be that the driver's hands are off the steering wheel; when the category corresponding to at least one of the two pieces of center point position information is that the driver's hand is holding the steering wheel, determining the positional relationship to be that the driver is holding the steering wheel.
  9. The method according to claim 1, wherein when the steering wheel detection result includes a steering wheel and the human hand detection result does not include a human hand, determining the driving behavior category of the driver according to the target detection result includes:
    determining, according to the target detection result, that the driving behavior category of the driver is dangerous driving.
  10. The method according to any one of claims 1 to 9, wherein detecting the image to be detected to obtain the target detection result includes:
    generating an intermediate feature map corresponding to the image to be detected based on the image to be detected;
    performing at least one target convolution on the intermediate feature map to generate detection feature maps of multiple channels corresponding to the intermediate feature map;
    performing, by using an activation function, feature value conversion on each feature value of the target channel feature map representing position among the detection feature maps of the multiple channels, to generate a converted target channel feature map;
    performing max pooling on the converted target channel feature map according to a preset pooling size and pooling stride, to obtain multiple pooling values and a position index corresponding to each of the multiple pooling values, where the position index is used to identify the position of the pooling value in the converted target channel feature map;
    generating target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values;
    determining the target detection result according to the target detection frame information.
  11. The method according to claim 10, wherein generating the target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values includes:
    when at least one of the multiple pooling values is greater than a set pooling threshold, determining, based on the multiple pooling values and the pooling threshold, a target pooling value of the center point of the target detection frame from among the multiple pooling values;
    generating the target detection frame information based on the position index corresponding to the target pooling value.
  12. The method according to claim 10, wherein generating the target detection frame information based on the multiple pooling values and the position index corresponding to each of the multiple pooling values includes:
    determining that the target detection frame information is empty when each of the multiple pooling values is less than or equal to the set pooling threshold.
  13. A driver behavior detection apparatus, comprising:
    an acquisition module, configured to acquire an image to be detected of the driving position area in a vehicle cabin;
    a detection module, configured to detect the image to be detected to obtain a target detection result, where the target detection result includes a steering wheel detection result and a human hand detection result;
    a determining module, configured to determine the driving behavior category of the driver according to the target detection result;
    a warning module, configured to issue warning information when the driving behavior category of the driver is dangerous driving.
  14. An electronic device, comprising a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the driver behavior detection method according to any one of claims 1 to 12 is executed.
  15. A computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, executes the driver behavior detection method according to any one of claims 1 to 12.
  16. A computer program, comprising computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device is configured to implement the driver behavior detection method according to any one of claims 1 to 12.
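The pooling-based detection of claims 10 to 12 (activation-function conversion of the position channel, max pooling that keeps a position index per pooled value, and thresholding to obtain center points or an empty result) can be sketched as follows. The sigmoid activation, the 3x3 pooling size, the stride and the 0.5 threshold are illustrative assumptions; the claims leave these parameters open.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect_centers(target_map, pool_size=3, stride=3, threshold=0.5):
    """Apply an activation function to the target channel feature map, max-pool
    it while recording each pooled value's position index, and keep the indices
    whose pooled value exceeds the threshold as detection-frame center points.
    Returns an empty list when no pooled value exceeds the threshold."""
    act = sigmoid(target_map)                       # feature value conversion
    h, w = act.shape
    centers = []
    for y in range(0, h - pool_size + 1, stride):
        for x in range(0, w - pool_size + 1, stride):
            window = act[y:y + pool_size, x:x + pool_size]
            idx = np.unravel_index(np.argmax(window), window.shape)
            value = window[idx]                     # pooled (maximum) value
            position = (int(y + idx[0]), int(x + idx[1]))  # its position index
            if value > threshold:
                centers.append(position)            # target pooling value kept
    return centers                                  # [] -> detection frame info empty

heat = np.full((9, 9), -4.0)   # sigmoid(-4) is far below the threshold everywhere
heat[4, 4] = 3.0               # one confident response at (4, 4)
print(detect_centers(heat))
```

Each surviving position index marks a detection-frame center in the converted target channel feature map, from which the target detection frame information is assembled; with no value above the threshold, the detection frame information is empty, matching claim 12.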
PCT/CN2020/135501 2020-08-07 2020-12-10 Driver behavior detection method and apparatus, electronic device, storage medium and program WO2022027894A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020227003906A KR20220032074A (en) 2020-08-07 2020-12-10 Driver behavior detection method, apparatus, electronic device, storage medium and program
JP2022523602A JP2023500218A (en) 2020-08-07 2020-12-10 DRIVER ACTION DETECTION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND PROGRAM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010790208.3 2020-08-07
CN202010790208.3A CN111931639A (en) 2020-08-07 2020-08-07 Driver behavior detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022027894A1 true WO2022027894A1 (en) 2022-02-10

Family

ID=73307528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135501 WO2022027894A1 (en) 2020-08-07 2020-12-10 Driver behavior detection method and apparatus, electronic device, storage medium and program

Country Status (4)

Country Link
JP (1) JP2023500218A (en)
KR (1) KR20220032074A (en)
CN (1) CN111931639A (en)
WO (1) WO2022027894A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471826A (en) * 2022-08-23 2022-12-13 中国航空油料集团有限公司 Method and device for judging safe driving behavior of aircraft refueling truck and safe operation and maintenance system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931639A (en) * 2020-08-07 2020-11-13 上海商汤临港智能科技有限公司 Driver behavior detection method and device, electronic equipment and storage medium
CN112528910B (en) * 2020-12-18 2023-04-07 上海高德威智能交通系统有限公司 Hand-off steering wheel detection method and device, electronic equipment and storage medium
CN113486759B (en) * 2021-06-30 2023-04-28 上海商汤临港智能科技有限公司 Dangerous action recognition method and device, electronic equipment and storage medium
CN115171082B (en) * 2022-06-29 2024-01-19 北京百度网讯科技有限公司 Driving behavior detection method and device, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102263937A (en) * 2011-07-26 2011-11-30 华南理工大学 Driver's driving behavior monitoring device and monitoring method based on video detection
CN102289660A (en) * 2011-07-26 2011-12-21 华南理工大学 Method for detecting illegal driving behavior based on hand gesture tracking
CN109034111A (en) * 2018-08-17 2018-12-18 北京航空航天大学 A kind of driver's hand based on deep learning is from steering wheel detection method and system
CN109086662A (en) * 2018-06-19 2018-12-25 浙江大华技术股份有限公司 A kind of anomaly detection method and device
US20200143563A1 (en) * 2017-11-22 2020-05-07 Beijing Sensetime Technology Development Co., Ltd. Methods and apparatuses for object detection, and devices
CN111931639A (en) * 2020-08-07 2020-11-13 上海商汤临港智能科技有限公司 Driver behavior detection method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5018926B2 (en) * 2010-04-19 2012-09-05 株式会社デンソー Driving assistance device and program
CN104276080B (en) * 2014-10-16 2016-04-06 北京航空航天大学 Bus man hand detects forewarn system and method for early warning from bearing circle
CN107766865A (en) * 2017-11-06 2018-03-06 北京旷视科技有限公司 Pond method, object detecting method, device, system and computer-readable medium
CN111439170B (en) * 2020-03-30 2021-09-17 上海商汤临港智能科技有限公司 Child state detection method and device, electronic equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471826A (en) * 2022-08-23 2022-12-13 中国航空油料集团有限公司 Method and device for judging safe driving behavior of aircraft refueling truck and safe operation and maintenance system
CN115471826B (en) * 2022-08-23 2024-03-26 中国航空油料集团有限公司 Method and device for judging safe driving behavior of aviation fueller and safe operation and maintenance system

Also Published As

Publication number Publication date
CN111931639A (en) 2020-11-13
KR20220032074A (en) 2022-03-15
JP2023500218A (en) 2023-01-05

Similar Documents

Publication Publication Date Title
WO2022027894A1 (en) Driver behavior detection method and apparatus, electronic device, storage medium and program
US20220084316A1 (en) Method and electronic device for recognizing abnormal sitting posture, and storage medium
KR20190060817A (en) Image based vehicle damage determination method and apparatus, and electronic device
CN111259737B (en) Method and device for predicting failure of steering wheel of vehicle, electronic equipment and storage medium
JP7288097B2 (en) Seat belt wearing detection method, device, electronic device, storage medium and program
CN111523489A (en) Generation method of age classification network, and vehicle-mounted person detection method and device
CN111582272A (en) Double-row license plate recognition method, device and equipment and computer readable storage medium
CN110598562A (en) Vehicle image acquisition guiding method and device
CN107867295B (en) Accident early warning method based on vehicle risk probability, storage device and vehicle-mounted terminal
CN112818839A (en) Method, device, equipment and medium for identifying violation behaviors of driver
CN114373189A (en) Behavior detection method and apparatus, terminal device and storage medium
CN111310650A (en) Vehicle riding object classification method and device, computer equipment and storage medium
CN112926544A (en) Driving state determination method, device, equipment and storage medium
CN112883843A (en) Driver visual salient region detection method and device and computer equipment
US10266132B2 (en) Method for operating driver assistance systems in a motor vehicle, and motor vehicle
CN113051958A (en) Driver state detection method, system, device and medium based on deep learning
CN112277956B (en) Method and device for detecting drunk driving behavior by using steering wheel
CN115115531A (en) Image denoising method and device, vehicle and storage medium
CN113920310A (en) Method and device for detecting hands off the steering wheel
JP2021096710A (en) Agent device, agent system, and agent program
CN110751008B (en) Processing method, device, equipment and storage medium based on driving behavior
CN114241453B (en) Driver distraction monitoring method using keypoint attention
CN111797784B (en) Driving behavior monitoring method and device, electronic equipment and storage medium
CN112215840A (en) Image detection method, image detection device, driving control method, driving control device, electronic equipment and storage medium
CN113569844A (en) License plate detection method and device

Legal Events

Date Code Title Description
ENP Entry into the national phase
Ref document number: 20227003906; Country of ref document: KR; Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20947943; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase
Ref document number: 2022523602; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase
Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.07.2023)
122 Ep: pct application non-entry in european phase
Ref document number: 20947943; Country of ref document: EP; Kind code of ref document: A1