WO2023005275A1 - Traffic behavior recognition method and apparatus, electronic device, and storage medium - Google Patents

Traffic behavior recognition method and apparatus, electronic device, and storage medium Download PDF

Info

Publication number
WO2023005275A1
WO2023005275A1 (PCT/CN2022/087745)
Authority
WO
WIPO (PCT)
Prior art keywords
rider
area
vehicle
passengers
identification result
Prior art date
Application number
PCT/CN2022/087745
Other languages
French (fr)
Chinese (zh)
Inventor
范佳柔
甘伟豪
武伟
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023005275A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to computer technology, and in particular to a traffic behavior recognition method and apparatus, an electronic device, and a storage medium.
  • Traffic behavior recognition can include identifying the passenger-carrying behavior of non-motorized vehicle riders. If illegal passenger-carrying behavior is found, penalties and safety education are required.
  • In related methods, the number of passengers is mainly determined by counting the heads or bodies appearing in the image; if the count is too large, the passenger-carrying behavior is determined to be illegal.
  • The method may include: acquiring an image to be recognized that includes one or more rider areas; and, for any rider area among the one or more rider areas, performing operations that include: determining, in the image to be recognized, the associated vehicle area associated with the rider area, where the rider area includes a vehicle and at least one human body; identifying the number of passengers in the rider area to obtain a passenger count identification result, and identifying the vehicle type in the associated vehicle area to obtain a vehicle type identification result; and determining, according to the passenger count identification result and the vehicle type identification result, whether the target rider in the rider area exhibits illegal passenger-carrying behavior.
  • The present application also proposes a traffic behavior recognition apparatus, which includes: an acquisition module, configured to acquire an image to be recognized that includes one or more rider areas; a first determination module, configured to determine, for any rider area among the one or more rider areas, the associated vehicle area associated with the rider area in the image to be recognized, where the rider area includes a vehicle and at least one human body; a recognition module, configured to identify the number of passengers in the rider area to obtain a passenger count identification result, and to identify the vehicle type in the associated vehicle area to obtain a vehicle type identification result; and a second determination module, configured to determine, according to the passenger count identification result and the vehicle type identification result, whether the target rider in the rider area exhibits illegal passenger-carrying behavior.
  • The present application also proposes an electronic device, which includes: a processor; and a memory for storing processor-executable instructions, where the processor executes the executable instructions to implement the traffic behavior recognition method described in any of the foregoing embodiments.
  • The present application also proposes a computer-readable storage medium storing a computer program, where the computer program is used to cause a processor to execute the traffic behavior recognition method described in any of the foregoing embodiments.
  • The present application also proposes a computer program product, including a computer program stored in a memory, where the traffic behavior recognition method described in any of the foregoing embodiments is implemented when the computer program instructions are executed by a processor.
  • FIG. 1 is a flowchart of a traffic behavior recognition method shown in the present application.
  • FIG. 2 is a flowchart of a method for determining an associated vehicle area shown in the present application.
  • FIG. 3 is a schematic diagram of an object detection process shown in the present application.
  • FIG. 4 is a flowchart of another method for determining an associated vehicle area shown in the present application.
  • FIG. 5 is a schematic flowchart of a method for identifying passenger-carrying behavior shown in the present application.
  • FIG. 6 is a schematic diagram of a rule for judging illegal passenger-carrying behavior shown in the present application.
  • FIG. 7 is a schematic structural diagram of a traffic behavior recognition apparatus shown in the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
  • The purpose of this application is to propose a traffic behavior recognition method (hereinafter referred to as the recognition method).
  • In related methods for identifying illegal passenger-carrying on non-motor vehicles, the number of passengers is mainly determined by counting the heads or bodies appearing in the image; if the count is too large, the behavior is determined to be illegal.
  • However, riders and passengers sit close to one another, so bodies and heads are easily occluded. An accurate count of bodies or heads therefore cannot be obtained, which leads to incorrect identification of the number of passengers.
  • Moreover, different types of non-motorized vehicles have different limits on the number of passengers, and existing methods cannot identify the legality of passenger-carrying behavior for different vehicle types.
  • FIG. 1 is a flowchart of a traffic behavior recognition method shown in this application. The method may include steps S102 to S108.
  • The method uses a neural network model to identify the number of passengers in the rider area and can learn that number adaptively through the model, so that an accurate passenger count can be identified even when there is occlusion in the image to be recognized, thereby improving the accuracy of traffic behavior recognition.
  • The legality of passenger-carrying behavior can be identified based on the passenger count identification result and the vehicle type identification result, so that the vehicle type and the number of passengers are considered together during legality identification, achieving traffic behavior recognition for different vehicle types.
  • The recognition method shown in FIG. 1 can be applied to an electronic device.
  • The electronic device may execute the method by running software logic corresponding to the method.
  • The electronic device may be a notebook computer, a desktop computer, a server, a mobile phone, a tablet (PAD) terminal, or the like.
  • The type of the electronic device is not particularly limited in this application.
  • The electronic device may also be a client device or a server device, which is not specifically limited here. It can be understood that the recognition method can be executed by the client device alone, by the server device alone, or by the client device and the server device in cooperation.
  • The server may be a cloud built from a single server or a server cluster.
  • an electronic device (hereinafter referred to as device) is taken as an example for description.
  • The device may acquire the image to be recognized from an image acquisition device deployed at a road site.
  • The image acquisition device can capture images of a preset field of view of the road scene at a fixed or adjustable angle, and can send the acquired image to be recognized to the electronic device.
  • the image to be recognized may include one or more vehicles and one or more riders.
  • the vehicle may be a non-motorized vehicle.
  • the non-motorized vehicles may be motorcycles, tricycles, electric vehicles and the like.
  • the rider may refer to a person with driving behavior.
  • the device may execute S104.
  • the rider area shown in this application refers to the area enclosed by the detection frame of the target rider in the image to be recognized.
  • the target rider can be specified according to business needs.
  • the target rider may be a rider randomly selected from the riders included in the image to be recognized.
  • the target rider may be the rider with the highest definition among the riders included in the image to be recognized.
  • the target rider may include a rider who is about to leave the viewing area.
  • each rider included in the image to be recognized may be designated as a target rider respectively.
  • the rider area may include a vehicle and at least one human body.
  • the vehicle area shown in this application refers to the area enclosed by the detection frames of the vehicles in the image to be recognized.
  • The associated vehicle area associated with the rider area can be determined at least through the association prediction score or the degree of overlap between the rider area and the vehicle area.
  • The association prediction score or degree of overlap may characterize how closely the rider area and the associated vehicle area are spatially connected.
  • The degree of overlap and the association prediction score are described below respectively.
  • The associated vehicle area can be determined by the degree of overlap between the rider area and the vehicle area.
  • Fig. 2 is a flowchart of a method for determining an associated vehicle area shown in the present application.
  • S104 further includes S202 and S204.
  • In S202, object detection is performed on the image to be recognized to obtain one or more vehicle areas and the rider area.
  • In S204, among the one or more vehicle areas, a target vehicle area having the greatest overlap with the rider area is determined, and the target vehicle area is determined as the associated vehicle area associated with the rider area.
  • In this way, the spatial association between the vehicle and the rider can be used to accurately determine the associated vehicle area, which helps to accurately determine the type of vehicle driven by the target rider and thus to improve the accuracy of traffic behavior recognition.
  • An object detection model can be used to perform object detection and obtain the detection frames corresponding to the riders and vehicles in the image to be recognized; the area enclosed by the target detection frame corresponding to the target rider is determined as the rider area, and the area enclosed by the detection frame corresponding to a vehicle is determined as a vehicle area.
  • The object detection model may be a model built based on Region-based Convolutional Neural Networks (R-CNN), Fast R-CNN, or Faster R-CNN.
  • FIG. 3 is a schematic diagram of an object detection process shown in the present application. It should be noted that, FIG. 3 only schematically illustrates the object detection process, and does not specifically limit the present application.
  • the object detection model 30 shown in FIG. 3 may be a model constructed based on the FASTER-RCNN network.
  • the model may at least include a backbone network (backbone) 31, a candidate frame generation network (Region Proposal Network, RPN) 32, and a region-based convolutional neural network (Region-based Convolutional Neural Network, RCNN) 33.
  • The backbone network 31 can be used to perform several convolution operations on the image to be recognized to obtain a target feature map of the image.
  • The RPN 32 is used to process the target feature map to obtain anchor boxes (anchors) corresponding to each rider and each vehicle in the image to be recognized.
  • The RCNN 33 is used to perform bounding box (bbox) regression and classification based on the anchor boxes output by the RPN 32 and the target feature map output by the backbone network 31, to obtain the rider frame corresponding to each rider and the vehicle frame corresponding to each vehicle in the image to be recognized.
  • Position information and/or size information of each rider frame or vehicle frame may also be obtained.
  • The position and/or size information of a rider frame or vehicle frame can be represented by its four vertex coordinates.
  • The object detection model may first be trained under supervision with several training samples.
  • The position and size of the object frame corresponding to each object (including riders and vehicles) in several sample images may be annotated to obtain the training samples. The model can then be trained under supervision in a conventional manner using these samples until it converges.
  • The trained object detection model can then perform object detection on the image to be recognized to obtain the rider frame corresponding to each rider and the vehicle frame corresponding to each vehicle included in the image. If the image includes multiple riders and/or multiple vehicles, the different rider frames and/or vehicle frames may also be numbered in the detection result.
  • The target rider frame corresponding to the target rider can then be selected, the area enclosed by the target rider frame in the image to be recognized determined as the rider area, and the area enclosed by each vehicle frame determined as a vehicle area.
  • The degree of overlap between each vehicle area and the rider area may be calculated respectively.
  • The vehicle areas may be sorted in descending order of the calculated degrees of overlap, and the top-ranked vehicle area determined as the target vehicle area.
  • The target vehicle area may then be determined as the associated vehicle area associated with the rider area.
  • The degree of overlap may be the ratio of the area of the intersection of the vehicle area and the rider area to the area of their union; that is, the intersection over union (IoU) between the vehicle area and the rider area is used to characterize their overlap.
  • For example, taking the vehicle area as region 1 and the rider area as region 2, and denoting the corner coordinates of region 2 as (h_x1, h_y1) and (h_x2, h_y2), the area of region 2 is S(h) = (h_y2 - h_y1) * (h_x2 - h_x1).
  • the degree of overlap between the vehicle area and the rider area can be determined.
  • the degree of overlap between the vehicle area and the rider area can be accurately calculated, thereby accurately determining the associated vehicle area associated with the rider area, which helps to improve the accuracy of traffic behavior recognition.
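The overlap-based association described above can be sketched as follows. This is a minimal illustration (function and variable names are hypothetical, not from the patent): IoU between two boxes given as (x1, y1, x2, y2) corner coordinates, and selection of the vehicle box with the greatest overlap, as in S204.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) corner coordinates, like the rider/vehicle frames.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associated_vehicle_box(rider_box, vehicle_boxes):
    # The vehicle box with the greatest overlap with the rider box is the
    # target vehicle box (equivalent to sorting by overlap and taking the top one).
    return max(vehicle_boxes, key=lambda v: iou(rider_box, v))
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7, since the two 2x2 boxes intersect in a 1x1 square and their union has area 7.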
  • The target vehicle area can also be determined by the association prediction score between the rider and the vehicle.
  • FIG. 4 is a flowchart of another method for determining an associated vehicle area shown in the present application.
  • S104 may include S402 to S406.
  • In S402, object detection is performed on the image to be recognized to obtain the one or more vehicle areas and the rider area.
  • In S404, the association score between each of the one or more vehicle areas and the rider area is determined by using a pre-trained association score prediction model.
  • In S406, among the one or more vehicle areas, the vehicle area with the highest association score with the rider area is determined as the target vehicle area, and the target vehicle area is determined as the associated vehicle area associated with the rider area.
  • The association score can accurately represent the degree of correlation between the rider area and a vehicle area, so the associated vehicle area most strongly correlated with the rider area can be determined. This helps to accurately determine the type of vehicle driven by the target rider and, in turn, to improve the accuracy of traffic behavior recognition.
  • For S402, reference may be made to the description of S202; details are not repeated here.
  • The association score prediction model may be a network constructed based on a deep learning network.
  • To train the association score prediction model, images including multiple pairs of vehicle areas and rider areas can first be obtained, and the association score between each pair annotated to obtain several training samples: if a rider area is associated with a vehicle area, the score is labeled 1; otherwise, it is labeled 0. The network can then be trained under supervision with these samples until it converges, after which it can be used to predict the association score between a vehicle area and a rider area in the image to be recognized.
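The 0/1-labeled training scheme above can be illustrated with a deliberately simplified stand-in: a logistic-regression scorer trained by gradient descent on hand-crafted pair features (e.g. the overlap between the two regions). The patent's model is a deep network; everything here, including the feature choice and names, is a hypothetical sketch.

```python
import math

def train_assoc_scorer(pair_feats, labels, lr=0.5, epochs=500):
    # pair_feats: list of feature vectors, one per vehicle/rider region pair
    # labels: 1.0 if the pair is associated, 0.0 otherwise (as annotated above)
    dim = len(pair_feats[0])
    w, b, n = [0.0] * dim, 0.0, len(labels)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(pair_feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # predicted association score in [0, 1]
            for i, xi in enumerate(x):
                gw[i] += (p - y) * xi           # binary cross-entropy gradient
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def assoc_score(w, b, feat):
    # Association score for one vehicle/rider pair feature vector.
    z = sum(wi * xi for wi, xi in zip(w, feat)) + b
    return 1.0 / (1.0 + math.exp(-z))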
  • the device may continue to execute S106.
  • the object area in this step (including the rider area and the vehicle area) may be the area surrounded by the object frame corresponding to the object in the image to be recognized.
  • the object region may carry image features related to the object.
  • The rider area described in this application may include first image features related to the rider's passenger-carrying behavior.
  • The first image features may include image features corresponding to the vehicle driven by the rider and to the human bodies carried on it; the number of passengers can be judged from these features.
  • the vehicle area described in the present application may include the second image feature related to the vehicle type.
  • the second image feature may include an image feature corresponding to the vehicle, such as a feature of the number of wheels, a feature of the wheel structure, a feature of the body structure, and the like.
  • the vehicle type can be determined by means of the second image feature.
  • S106 may include S1062 and S1064.
  • In S1062, passenger count identification is performed on the rider area to obtain a passenger count identification result.
  • In S1064, vehicle type identification is performed on the associated vehicle area to obtain a vehicle type identification result.
  • the execution order of S1062 and S1064 is not limited in this application.
  • A rider area map corresponding to the rider area may first be acquired.
  • The rider frame corresponding to the target rider, together with the image to be recognized (or the target feature map obtained by performing feature extraction on it with the backbone network 31), can be input into a region feature extraction unit to obtain the rider area map of the target rider.
  • The rider area map may be a feature map, or an image of the rider area.
  • The region feature extraction unit may include a region of interest feature alignment (ROI Align) unit or a region of interest feature pooling (ROI Pooling) unit.
  • The region feature extraction unit can perform processing such as pooling and convolution on the rider area enclosed by the rider frame to obtain the rider area map.
  • the rider area map may include high-dimensional or low-dimensional image features.
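What an ROI Pooling unit does can be sketched with a fixed-grid max-pooling over the boxed region of a 2-D feature map. This is a simplification for illustration only (real ROI Align additionally uses bilinear sampling, and feature maps have channels); the function names are hypothetical.

```python
def roi_pool(feature_map, box, out_size=2):
    # feature_map: 2-D list (H rows x W cols); box: (x1, y1, x2, y2) in
    # feature-map coordinates, like a rider or vehicle frame.
    x1, y1, x2, y2 = box
    region = [row[x1:x2] for row in feature_map[y1:y2]]

    def split(n, parts):
        # Split range(n) into `parts` nearly equal contiguous bins.
        q, r = divmod(n, parts)
        bins, start = [], 0
        for i in range(parts):
            end = start + q + (1 if i < r else 0)
            bins.append(range(start, end))
            start = end
        return bins

    h_bins = split(len(region), out_size)
    w_bins = split(len(region[0]), out_size)
    # Max-pool each (row-bin, col-bin) cell into one output value, giving a
    # fixed out_size x out_size map regardless of the box size.
    return [[max(region[r][c] for r in hb for c in wb) for wb in w_bins]
            for hb in h_bins]
```

The fixed output size is the point: rider frames of different sizes all yield an area map of the same shape, which the downstream classifier requires.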
  • Passenger count identification can then be performed on the rider area map to obtain the passenger count identification result.
  • The number of passengers can be identified by using a pre-trained passenger count recognition model.
  • The passenger count recognition model may include a classifier built based on a neural network.
  • The recognition results output by the model may include a first recognition result, a second recognition result, and a third recognition result, together with the confidence corresponding to each result.
  • The first recognition result indicates that the number of passengers has reached a first preset number.
  • The second recognition result indicates that the number of passengers has reached a second preset number.
  • The third recognition result indicates that the number of passengers has reached a third preset number.
  • The first, second, and third preset numbers can be set according to business requirements. For example, the first preset number may be 3 or more, the second preset number may be 2, and the third preset number may be 1.
  • The recognition result corresponding to the highest confidence can be selected. For example, suppose the passenger count recognition model classifies the rider area map and the first, second, and third recognition results correspond to confidences of 0.7, 0.2, and 0.1, respectively. The passenger count identification result is then the first recognition result, corresponding to the highest confidence of 0.7.
  • When training the passenger count recognition model, training samples annotated with passenger count information can be obtained, and multiple rounds of supervised training iterations performed with them until the model converges. The trained model can then be used to identify the number of passengers. The adaptive learning capability of the neural network thus improves the accuracy of passenger count identification in various situations, including occlusion.
  • In some scenarios, traffic behavior recognition may be unnecessary or impossible to perform normally. Such scenarios may be referred to as invalid scenarios.
  • For example, although a scene in which a rider pushes a cart, or stands next to the vehicle, includes both a rider and a vehicle, the rider is not driving, so passenger-carrying behavior does not need to be detected in this type of scene.
  • As another example, when the rider or vehicle has low identifiability in the image, the rider or vehicle may not be recognized normally, so traffic behavior recognition may not be performed normally.
  • Therefore, a fourth recognition result indicating that the current identification is invalid may be added to the possible passenger count identification results. If the passenger count identification result for the rider area is the fourth recognition result, the scene in the rider area is an invalid scenario in which traffic behavior identification is unnecessary or impossible, and no traffic behavior identification need be performed for that rider area.
  • The passenger count recognition model may accordingly output the first, second, third, and fourth recognition results, together with the confidence corresponding to each.
  • The fourth recognition result indicates that at least one of the following appears in the image to be recognized: a scene where a rider pushes a cart, a scene where a rider stands beside the vehicle, a scene where multiple riders are close to each other, a scene with low definition, or a scene where the vehicle is occluded.
  • The recognition result corresponding to the highest confidence can be selected.
  • For example, suppose classification of the rider area map yields confidences of 0.1, 0.2, 0.1, and 0.6 for the first, second, third, and fourth recognition results, respectively. The passenger count identification result is then the invalid (fourth) recognition result, corresponding to the highest confidence of 0.6.
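The highest-confidence selection over the four candidate results can be sketched as follows; the result labels are hypothetical stand-ins for the first through fourth recognition results.

```python
def passenger_count_result(confidences):
    # confidences: mapping from candidate recognition result to confidence,
    # covering the four results described above.
    return max(confidences, key=confidences.get)

scores = {"3 or more": 0.1, "2 persons": 0.2, "1 person": 0.1, "invalid": 0.6}
# With these confidences, the "invalid" (fourth) result is selected, so no
# traffic behavior identification is performed for this rider area.
```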
  • Training the passenger count recognition network may include S11 to S13.
  • In S11, first training samples are acquired.
  • The first training samples include a plurality of sample images of riders and corresponding first annotation information on the number of passengers. The first annotation information includes one of the following labels: 1 person, 2 persons, 3 persons, or invalid, where the invalid label covers at least one of the following: a rider pushing a cart, a rider standing next to the vehicle, multiple riders close to each other, low definition, or an occluded vehicle.
  • The first initial network may be any type of neural network.
  • The first initial network may output a passenger count identification result.
  • A first loss may be determined according to the first annotation information, and the parameters of the first initial network updated through a backpropagation operation to complete one parameter iteration.
  • The number of parameter iterations can be preset, and the passenger count recognition network is obtained after the first initial network completes the preset number of iterations.
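The loss-then-update loop of S11 to S13 can be illustrated with a deliberately small stand-in: softmax regression over the four label classes, trained by the cross-entropy gradient for a preset number of iterations. The patent's first initial network is a neural network; this sketch, including all names and the toy features, is hypothetical.

```python
import math

def train_count_classifier(samples, labels, num_classes=4, lr=0.5, iters=300):
    # samples: list of feature vectors for rider areas
    # labels: class indices 0..3 ("1 person", "2 persons", "3 persons", "invalid")
    dim = len(samples[0])
    w = [[0.0] * num_classes for _ in range(dim)]      # dim x classes weights
    for _ in range(iters):                             # preset number of iterations
        for x, y in zip(samples, labels):
            z = [sum(x[d] * w[d][c] for d in range(dim)) for c in range(num_classes)]
            m = max(z)
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            p = [v / s for v in e]                     # softmax confidences
            for c in range(num_classes):
                g = p[c] - (1.0 if c == y else 0.0)    # cross-entropy gradient
                for d in range(dim):
                    w[d][c] -= lr * g * x[d]           # one parameter update
    return w

def predict_count(w, x):
    dim, num_classes = len(w), len(w[0])
    z = [sum(x[d] * w[d][c] for d in range(dim)) for c in range(num_classes)]
    return z.index(max(z))
```

The structure mirrors the description: a loss is computed against the first annotation information, parameters are updated from its gradient, and the trained classifier is taken after the preset iteration count.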
  • In response to the confidence corresponding to the number of passengers reaching a first confidence threshold, that number of passengers is determined as the passenger count identification result of the rider area.
  • The first confidence threshold can be set according to business conditions. For example, if the passenger count determined by the model is 1 person with a corresponding confidence of 0.7, and the first confidence threshold is 0.7, then the result of 1 person is credible and the passenger count identification result can be output. Here, the confidence represents the degree of credibility of the count of 1 person.
  • the credibility of the output recognition result can be guaranteed, thereby ensuring the accuracy of traffic behavior recognition.
  • Various traffic behavior recognition scenarios can be flexibly adapted to by adjusting the first confidence threshold.
  • The first confidence threshold may be set to a higher value (such as 0.9), so that the reliability of the output passenger count identification results is sufficiently high, improving the accuracy of traffic behavior identification.
  • Alternatively, the first confidence threshold can be set to a lower value (for example, 0.6), increasing the number of passenger count identification results that are output and thereby improving the sensitivity of traffic behavior recognition.
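The confidence-threshold gate can be sketched as follows (names hypothetical): the highest-confidence result is output only when its confidence reaches the threshold, so raising the threshold trades sensitivity for accuracy.

```python
def gated_count_result(confidences, threshold=0.7):
    # confidences: mapping from candidate passenger count result to confidence.
    result, conf = max(confidences.items(), key=lambda kv: kv[1])
    # Output the result only when its confidence reaches the first confidence
    # threshold; otherwise report nothing for this rider area.
    return result if conf >= threshold else None
```

With `threshold=0.9` few but highly reliable results are output; with `threshold=0.6` more results are output at lower reliability.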
  • For vehicle type identification, the corresponding associated vehicle area map may first be obtained according to the associated vehicle area.
  • The vehicle frame corresponding to the associated vehicle area, together with the image to be recognized (or the target feature map obtained by performing feature extraction on it with the backbone network 31), can be input into the ROI Pooling unit to obtain the associated vehicle area map.
  • the associated vehicle area map may be a feature map, or an image of the associated vehicle area.
  • vehicle type identification may be performed on the associated vehicle area map to obtain a vehicle type identification result.
  • Vehicle type recognition can be performed by a pre-trained vehicle recognition network.
  • The vehicle recognition network may include a classifier built based on a neural network.
  • The results output by the vehicle recognition network may include the confidence (for example, the probability) with which the vehicle in the vehicle area map is identified as each preset vehicle type.
  • The vehicle type corresponding to the highest confidence may be selected and determined as the vehicle type identification result.
  • Training the vehicle recognition network may include S21 to S23.
  • In S21, second training samples are acquired.
  • The second training samples include a plurality of sample images of vehicles and corresponding second annotation information on vehicle type.
  • The second initial network may be any type of neural network.
  • The second initial network may output a vehicle type identification result.
  • A second loss may be determined according to the second annotation information, and the parameters of the second initial network updated through a backpropagation operation to complete one parameter iteration.
  • The number of parameter iterations can be preset, and the vehicle recognition network is obtained after the second initial network completes the preset number of iterations.
  • the characteristics of neural network self-adaptive learning can be utilized to improve the accuracy of vehicle type identification.
  • In response to the confidence corresponding to the vehicle type reaching a second confidence threshold, that vehicle type is determined as the vehicle type identification result of the vehicle area.
  • The second confidence threshold can be set according to business conditions.
  • Various traffic behavior recognition scenarios can be flexibly adapted to by adjusting the second confidence threshold.
  • The second confidence threshold may be set to a higher value (such as 0.9), so that the reliability of the output vehicle type recognition results is sufficiently high, improving the accuracy of traffic behavior recognition.
  • Alternatively, the second confidence threshold can be set to a lower value (for example, 0.6), increasing the number of vehicle type recognition results that are output and thereby improving the sensitivity of traffic behavior recognition.
  • the device may execute S108.
  • actual recognition results can be output for different vehicle types.
  • it may be determined that the target rider in the rider area is illegally carrying passengers in response to the identification result of the number of passengers being a first identification result; the first identification result indicates that the number of passengers reaches a first preset number.
  • the first preset number may be an empirical value. For example, in a non-motor vehicle scenario, regardless of vehicle type, the number of people including the driver must be fewer than 3; a non-motor vehicle carrying 3 or more people can be considered non-compliant. In this case, the first preset number can be set to 3, and if the number of passengers reaches 3 or more, an illegal passenger-carrying behavior can be determined.
  • it may be determined that the target rider is illegally carrying passengers in response to the identification result of the number of passengers being a second identification result and the vehicle type represented by the type identification result being a preset non-motor vehicle type;
  • the second identification result indicates that the number of people on board reaches a second preset number, and the second preset number is smaller than the first preset number.
  • when the number of passengers reaches the second preset number, the passenger-carrying behavior may be either compliant or illegal. If the vehicle driven by the target rider (the associated vehicle) is of a preset non-motor vehicle type, the behavior can be determined to be illegal; otherwise it can be determined to be compliant.
  • the second preset quantity may be an empirical value.
  • the preset non-motor vehicle type may refer to a vehicle type that is not permitted to carry as many as the second preset number of people.
  • the second preset number is 2
  • the preset non-motor vehicle type is a utility vehicle such as a tricycle.
  • such a utility tricycle can legally carry only 1 person. If the recognized number of passengers is 2, it can be determined to be illegally carrying passengers. If the vehicle is not such a utility vehicle, for example a motorcycle or an electric bicycle, a recognized passenger count of 2 can be determined to be compliant.
  • it may be determined that the target rider has not illegally carried passengers in response to the identification result of the number of passengers being a third identification result;
  • the third identification result indicates that the number of passengers is a third preset number, and the third preset number is smaller than the second preset number.
  • the third preset number may be an empirical value. For example, in a non-motor vehicle scenario, regardless of vehicle type, if the number of people including the driver is one, the behavior can be considered compliant. In this case, the third preset number may be set to 1.
  • it may be determined that the traffic behavior identification for the target rider is invalid in response to the identification result of the number of passengers being a fourth identification result; no further traffic behavior recognition is then required.
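The four cases above form a small decision table over the passenger count and the vehicle type. A sketch using the example values from the text (first preset number 3, second 2, third 1; "tricycle" as the preset non-motor vehicle type — all of these are the document's examples, not fixed requirements):

```python
RESTRICTED_TYPES = {"tricycle"}  # preset non-motor vehicle types (example)

def judge_passenger_behavior(passenger_result, vehicle_type):
    """Map a passenger-count recognition result and a vehicle type to a verdict.

    passenger_result: 1, 2, 3 (a recognized count) or "invalid" (fourth result).
    """
    if passenger_result == "invalid":
        return "invalid"            # fourth result: skip further recognition
    if passenger_result >= 3:       # first result: a violation for any vehicle
        return "illegal"
    if passenger_result == 2:       # second result: depends on the vehicle type
        return "illegal" if vehicle_type in RESTRICTED_TYPES else "compliant"
    return "compliant"              # third result: driver only, always compliant
```

For example, a count of 2 on a tricycle is judged illegal while the same count on an electric bicycle is compliant, matching the distinction drawn above.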
  • a warning message may be issued.
  • the device may be connected to an interactive terminal held by a traffic policeman.
  • when the device recognizes an illegal passenger-carrying behavior, it can package the information corresponding to the target rider, the information of the vehicle driven, and the reason for the violation as alarm information, and send it to the interactive terminal held by the traffic police.
  • after receiving the alarm, the traffic police can carry out corresponding processing. In this way, alarms for illegal passenger-carrying behavior can be issued automatically and in a timely manner, which facilitates the handling of violations.
  • Embodiments will be described below in conjunction with a passenger-carrying behavior recognition scenario for non-motor vehicles.
  • the camera can send the to-be-recognized images collected in the predetermined area to the recognition device for rider behavior detection.
  • the identification device can be equipped with a pre-trained rider-vehicle identification network (hereinafter referred to as network 1), a passenger identification network (hereinafter referred to as network 2) and a vehicle identification network (hereinafter referred to as network 3).
  • the network 1 is used to detect the riders and vehicles appearing in the image to be recognized, and to obtain the corresponding vehicle areas and rider areas.
  • the network 2 can be used to identify the number of people on board.
  • the network 3 can be used to identify the type of vehicle.
  • the identification device can also perform multi-target tracking on each rider appearing in the image to be identified according to the identification results of network 1, obtaining the driving track of each rider, so as to distinguish riders who newly appear in the predetermined area, riders who are still active in the predetermined area, and riders who are about to leave the predetermined area. A rider who is about to leave the predetermined area can be determined as a target rider.
  • FIG. 5 is a schematic flow chart of a method for identifying manned behavior shown in the present application.
  • the rider frames corresponding to the riders and the vehicle frames corresponding to the vehicles appearing in the image to be recognized are identified through network 1. For the target rider frame corresponding to the target rider, the area enclosed by the target rider frame in the image to be recognized is determined as the rider area, and the area enclosed by a vehicle frame is determined as a vehicle area.
  • the overlap degree between each vehicle area and the rider area is determined using IoU, and the target vehicle area corresponding to the maximum overlap degree is determined as the associated vehicle area spatially associated with the rider area.
  • the spatial overlap relationship between the target rider and the vehicle he drives can thus be used to accurately determine the associated vehicle area, which helps to improve the accuracy of vehicle type recognition and obtain accurate passenger-carrying behavior recognition results.
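The overlap degree here is intersection-over-union (IoU) of the two boxes, and the vehicle box with the highest IoU against the rider box becomes the associated vehicle area. A sketch with boxes as `(x1, y1, x2, y2)` tuples (the coordinates below are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # intersection area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associated_vehicle_area(rider_box, vehicle_boxes):
    """Pick the vehicle box with the maximum overlap degree (IoU)."""
    return max(vehicle_boxes, key=lambda v: iou(rider_box, v))
```

A rider box at `(0, 0, 10, 20)` would be associated with a vehicle box at `(0, 10, 10, 30)` (IoU 1/3) rather than a distant box with zero overlap.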
  • FIG. 6 is a schematic diagram of a judgment rule for illegal passenger-carrying behavior shown in the present application.
  • first, the identification result of the number of passengers is judged. If the identification result is invalid, traffic behavior identification for the target rider may be skipped.
  • S604 may be executed to determine whether the vehicle type represented by the vehicle type recognition result is a tricycle.
  • if the vehicle type is a tricycle, it is determined that the target rider is illegally carrying passengers; otherwise it is determined that the target rider is not illegally carrying passengers.
  • the alarm information can be generated based on the rider information, the vehicle information, and the cause of the violation, and sent to the handheld device of the corresponding traffic police in time, so that the traffic police can process it promptly.
  • the present application also proposes a traffic behavior recognition device.
  • FIG. 7 is a schematic structural diagram of a traffic behavior recognition device shown in the present application.
  • the device 70 may include: an acquisition module 71, configured to acquire an image to be recognized including one or more rider areas; a first determination module 72, configured to, for any one of the rider areas, determine from the image to be recognized an associated vehicle area associated with the rider area, where the rider area includes a vehicle and at least one human body; a recognition module 73, configured to perform passenger-count recognition on the rider area to obtain a passenger-count recognition result, and to perform vehicle type recognition on the associated vehicle area to obtain a vehicle type recognition result; and a second determination module 74, configured to determine, based on the passenger-count recognition result and the vehicle type recognition result, whether the target rider in the rider area has an illegal passenger-carrying behavior.
  • the first determination module 72 is configured to: detect the image to be recognized to obtain one or more vehicle areas and the rider area; and, among the one or more vehicle areas, determine the target vehicle area that overlaps the rider area to the greatest degree, and determine the target vehicle area as the associated vehicle area associated with the rider area.
  • alternatively, the first determination module 72 is configured to: detect the image to be recognized to obtain one or more vehicle areas and the rider area; determine the correlation score between each of the one or more vehicle areas and the rider area; and, among the one or more vehicle areas, determine the target vehicle area with the highest correlation score with the rider area, and determine the target vehicle area as the associated vehicle area associated with the rider area.
  • the identification module 73 is configured to: identify the number of passengers in the rider area to obtain the number of passengers and a corresponding first confidence level; in response to the first confidence level reaching a first confidence threshold, determine the number of passengers as the identification result of the number of passengers in the rider area; perform vehicle type identification on the associated vehicle area to obtain the vehicle type corresponding to the associated vehicle area and a corresponding second confidence level; and, in response to the second confidence level reaching a second confidence threshold, determine the vehicle type as the vehicle type identification result of the vehicle area.
  • the second determining module 74 is configured to: determine that the target rider is illegally carrying passengers in response to the identification result of the number of passengers being a first identification result, where the first identification result indicates that the number of passengers reaches a first preset number; or, in response to the identification result of the number of passengers being a second identification result and the vehicle type represented by the type identification result being a preset non-motor vehicle type, determine that the target rider is illegally carrying passengers, where the second identification result indicates that the number of passengers reaches a second preset number smaller than the first preset number; or, in response to the identification result of the number of passengers being the second identification result and the vehicle type represented by the type identification result not being the preset non-motor vehicle type, determine that the target rider is not illegally carrying passengers; or, in response to the identification result of the number of passengers being a third identification result, determine that the target rider has not illegally carried passengers, where the third identification result indicates that the number of passengers is a third preset number smaller than the second preset number; or, in response to the identification result of the number of passengers being a fourth identification result, determine that the traffic behavior identification for the target rider is invalid.
  • the fourth recognition result indicates that the image to be recognized includes at least one of the following scenes: a scene of a rider pushing a cart; a scene of a rider standing next to the car; a scene of multiple riders close to each other; low-resolution scene; or the scene where the vehicle is blocked.
  • the device 70 further includes: an alarm module, configured to send out an alarm message in response to the target rider in the rider area carrying passengers illegally.
  • the identification result of the number of passengers is obtained by detecting the rider area through the occupancy identification network; the device 70 also includes: a training module of the occupancy identification network, configured to obtain a first training sample, where the first training sample includes a plurality of sample images of riders and first annotation information of the corresponding number of passengers, and the first annotation information includes one of the following labels: 1 person, 2 people, 3 people, or an invalid label, the invalid label including at least one of the following: a rider pushing a cart, a rider standing next to the vehicle, multiple riders close to each other, low definition, and the vehicle being blocked; the first training sample is input into a preset first initial network to obtain the identification result of the number of people in the sample corresponding to each sample image; a first loss is determined based on the identification result of the number of people in the sample and the first annotation information, and the first initial network is optimized based on the first loss to obtain the occupancy identification network.
  • the vehicle type recognition result is obtained by detecting the associated vehicle area through a vehicle recognition network; the device 70 further includes: a training module of the vehicle recognition network, configured to obtain a second training sample, where the second training sample includes a plurality of sample images of vehicles and second label information of the corresponding vehicle types; the second training sample is input into a preset second initial network to obtain the sample vehicle type recognition result of each sample image; a second loss is determined based on the sample vehicle type recognition result and the second label information, and the second initial network is optimized based on the second loss to obtain the vehicle recognition network.
  • Embodiments of the traffic behavior recognition device shown in this application can be applied to electronic equipment.
  • the present application discloses an electronic device, which may include: a processor, and a memory for storing instructions executable by the processor.
  • the processor is configured to invoke the executable instructions stored in the memory to implement the traffic behavior recognition method shown in any one of the foregoing embodiments.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
  • the electronic device may include a processor 801 for executing instructions, a network interface 802 for connecting to a network, a memory 803 for storing operation data for the processor, and a non-volatile memory 804 for storing instructions corresponding to the traffic behavior recognition device; the processor 801, the network interface 802, the memory 803 and the non-volatile memory 804 are coupled through an internal bus 805.
  • the embodiment of the device may be implemented by software, or by hardware or a combination of software and hardware.
  • taking software implementation as an example, a device in the logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for execution.
  • depending on the actual functions of the electronic device where the device of the embodiment is located, the electronic device may also include other hardware, which will not be detailed here.
  • the instructions corresponding to the traffic behavior recognition device may also be directly stored in the memory, which is not limited here.
  • the present application proposes a computer-readable storage medium, the storage medium stores a computer program, and the computer program can be used to make a processor execute the traffic behavior recognition method as shown in any one of the foregoing embodiments.
  • one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
  • each embodiment in the present application is described in a progressive manner; the same and similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments.
  • in particular, for the device embodiments, which substantially correspond to the method embodiments, the description is relatively simple; for relevant parts, please refer to the description of the method embodiments.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this application can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier to be executed by, or to control the operation of, a data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded in an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver device for execution by the data processing apparatus.
  • a computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • Computers suitable for the execution of a computer program may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • a central processing unit will receive instructions and data from a read only memory and/or a random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transfer data to them, or both.
  • a computer is not required to have such a device.
  • a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard drives or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present application provides a traffic behavior recognition method and apparatus, an electronic device, and a storage medium. The method may comprise: obtaining an image to be recognized comprising one or more rider areas; for any one of the one or more rider areas, executing operations comprising: determining an associated vehicle area associated with the rider area from said image, the rider area comprising a vehicle and at least one human body; performing manned quantity recognition on the rider area to obtain a manned quantity recognition result, and performing vehicle type recognition on the associated vehicle area to obtain a vehicle type recognition result; and determining, according to the manned quantity recognition result and the vehicle type recognition result, whether a target rider in the rider area has an illegal manned behavior.

Description

Traffic Behavior Recognition Method and Apparatus, Electronic Device and Storage Medium
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application with application number 202110873586.2, filed on July 30, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
This application relates to computer technology, and in particular to a traffic behavior recognition method and apparatus, an electronic device, and a storage medium.
Background
As regulatory authorities strengthen supervision, traffic behaviors need to be identified. In some scenarios, traffic behavior recognition can include the recognition of passenger-carrying behaviors of non-motor vehicles; if an illegal passenger-carrying behavior is found, penalties and safety education are required.
At present, the identification of illegal passenger-carrying behaviors of non-motor vehicles mainly determines the number of passengers by recognizing the number of human heads or human bodies appearing in an image; if the number of passengers is too large, the passenger-carrying behavior is determined to be illegal.
Summary of the Invention
This application proposes a traffic behavior recognition method. The method may include: acquiring an image to be recognized including one or more rider areas; for any one of the one or more rider areas, performing operations including: determining, in the image to be recognized, an associated vehicle area associated with the rider area, the rider area including a vehicle and at least one human body; performing passenger-count recognition on the rider area to obtain a passenger-count recognition result, and performing vehicle type recognition on the associated vehicle area to obtain a vehicle type recognition result; and determining, according to the passenger-count recognition result and the vehicle type recognition result, whether the target rider in the rider area has an illegal passenger-carrying behavior.
This application also proposes a traffic behavior recognition apparatus, including: an acquisition module, configured to acquire an image to be recognized including one or more rider areas; a first determination module, configured to, for any one of the one or more rider areas, determine in the image to be recognized an associated vehicle area associated with the rider area, the rider area including a vehicle and at least one human body; a recognition module, configured to perform passenger-count recognition on the rider area to obtain a passenger-count recognition result, and perform vehicle type recognition on the associated vehicle area to obtain a vehicle type recognition result; and a second determination module, configured to determine, according to the passenger-count recognition result and the vehicle type recognition result, whether the target rider in the rider area has an illegal passenger-carrying behavior.
This application also proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor executes the executable instructions to implement the traffic behavior recognition method of any of the foregoing embodiments.
This application also proposes a computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the traffic behavior recognition method of any of the foregoing embodiments.
This application also proposes a computer program product, including a computer program stored in a memory; when the computer program instructions are executed by a processor, the traffic behavior recognition method of any of the foregoing embodiments is implemented.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of this application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a flowchart of a traffic behavior recognition method shown in this application;
FIG. 2 is a flowchart of a method for determining an associated vehicle area shown in this application;
FIG. 3 is a schematic diagram of an object detection process shown in this application;
FIG. 4 is a flowchart of another method for determining an associated vehicle area shown in this application;
FIG. 5 is a schematic flowchart of a passenger-carrying behavior recognition method shown in this application;
FIG. 6 is a schematic diagram of a judgment rule for illegal passenger-carrying behavior shown in this application;
FIG. 7 is a schematic structural diagram of a traffic behavior recognition apparatus shown in this application;
FIG. 8 is a schematic diagram of the hardware structure of an electronic device shown in this application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of devices and methods consistent with some aspects of this application as recited in the appended claims.
The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if", as used herein, may be interpreted as "at the time of", "when", or "in response to determining", depending on the context.
This application aims to propose a traffic behavior recognition method (hereinafter referred to as the recognition method). At present, the identification of illegal passenger-carrying behaviors of non-motor vehicles mainly determines the number of passengers by recognizing the number of human heads or human bodies appearing in an image; if the number of passengers is too large, the passenger-carrying behavior is determined to be illegal. On the one hand, in a non-motor vehicle passenger-carrying scene, people are close to each other, and human bodies and heads are easily occluded, so an accurate count of bodies or heads cannot be obtained, leading to incorrect identification of the number of passengers. On the other hand, different types of non-motor vehicles have different requirements for the number of passengers, and existing methods cannot identify the legality of passenger-carrying behaviors for different types of non-motor vehicles.
FIG. 1 is a flow chart of a traffic behavior recognition method shown in this application. The method may include steps S102 to S108.
S102: Acquire an image to be recognized that includes one or more rider areas.
S104: For any one of the one or more rider areas, determine, from the image to be recognized, an associated vehicle area associated with the rider area, where the rider area includes one vehicle and at least one human body.
S106: Identify the number of passengers in the rider area to obtain a passenger-count identification result, and identify the vehicle type of the associated vehicle area to obtain a vehicle-type identification result.
S108: Determine, according to the passenger-count identification result and the vehicle-type identification result, whether the target rider in the rider area is carrying passengers illegally.
In the traffic behavior recognition method shown in this application, on the one hand, a neural network model is used to identify the number of passengers in the rider area. Because the model adaptively learns the passenger count from the rider area, an accurate count can be obtained even when occlusion occurs in the image to be recognized, which improves the accuracy of traffic behavior recognition.
On the other hand, the legality of the passenger-carrying behavior can be judged from both the passenger-count identification result and the vehicle-type identification result, so that the vehicle type and the number of passengers are considered together during the legality judgment. This achieves traffic behavior recognition tailored to different vehicle types.
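The legality decision of step S108 can be sketched as a simple rule lookup. The vehicle type names and per-type passenger limits below are illustrative assumptions for the sketch; the application itself does not fix specific limits:

```python
# Minimal sketch of step S108: judging legality from the passenger-count and
# vehicle-type identification results. The per-type limits are illustrative
# assumptions, not values specified in this application.
PASSENGER_LIMITS = {
    "motorcycle": 2,        # driver plus at most one passenger
    "electric_bicycle": 1,  # driver only
    "tricycle": 3,
}

def is_illegal_carrying(passenger_count: int, vehicle_type: str) -> bool:
    """Return True if the rider area shows illegal passenger carrying."""
    limit = PASSENGER_LIMITS.get(vehicle_type)
    if limit is None:
        return False  # unknown vehicle type: no rule to apply
    return passenger_count > limit
```

Because the limit is looked up per vehicle type, the same passenger count can be legal on one vehicle type and illegal on another, which is exactly the distinction the existing head-counting methods cannot make.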
The recognition method shown in FIG. 1 can be applied to an electronic device, which executes the method by running software logic corresponding to the method. The electronic device may be a notebook computer, a desktop computer, a server, a mobile phone, a PAD terminal, or the like; this application does not specifically limit the type of the electronic device. The electronic device may be a client device or a server device, which is not specifically limited here. It can be understood that the recognition method may be executed by the client device or the server device alone, or by the client device and the server device in cooperation. The server may be a cloud built from a single server or a cluster of servers. In the following description, the execution subject is an electronic device (hereinafter referred to as the device).
In some embodiments, the device may acquire the image to be recognized from an image acquisition device deployed at a road site. The image acquisition device captures images of a preset field of view of the road site at a fixed or adjustable angle and sends the captured images to be recognized to the device.
The image to be recognized may include one or more vehicles and one or more riders. The vehicles may be non-motor vehicles such as motorcycles, tricycles, and electric bicycles. A rider refers to a person exhibiting driving behavior.
After acquiring the image to be recognized, the device may execute S104. The rider area in this application refers to the region enclosed by the detection frame of a target rider in the image to be recognized. The target rider can be specified according to business needs. For example, the target rider may be a rider randomly selected from the riders included in the image; the rider with the highest definition among them; a rider about to leave the field of view; or each rider included in the image may be designated as a target rider in turn. The rider area may include one vehicle and at least one human body.
The vehicle area in this application refers to the region enclosed by the detection frame of a vehicle in the image to be recognized.
In this application, the associated vehicle area associated with a rider area can be determined at least from a correlation prediction score or a degree of overlap between the rider area and a vehicle area. The correlation prediction score or degree of overlap characterizes how closely the rider area and the associated vehicle area are connected in space. The degree of overlap and the correlation prediction score are described separately below.
In some embodiments, the associated vehicle area can be determined from the degree of overlap between the rider area and the vehicle areas.
FIG. 2 is a flow chart of a method for determining an associated vehicle area shown in this application.
As shown in FIG. 2, S104 further includes S202 and S204. In S202, the image to be recognized is detected to obtain one or more vehicle areas and the rider area. In S204, among the one or more vehicle areas, the target vehicle area with the greatest overlap with the rider area is determined and taken as the associated vehicle area associated with the rider area.
By determining the vehicle area with the greatest overlap with the rider area as the associated vehicle area, the spatial relationship between vehicle and rider is exploited to find the correct associated vehicle area. This helps determine accurately the type of vehicle the target rider is driving and thus improves the accuracy of traffic behavior recognition. In some embodiments, in S202, object detection may be performed with an object detection model to obtain the detection frames corresponding to the riders and the vehicles in the image to be recognized; the region enclosed in the image by the target detection frame corresponding to the target rider is determined as the rider area, and the region enclosed by a vehicle's detection frame is determined as a vehicle area.
The object detection model may be built on Region Convolutional Neural Networks (RCNN), Fast Region Convolutional Neural Networks (FAST-RCNN), or Faster Region Convolutional Neural Networks (FASTER-RCNN). This application does not specifically limit the network structure of the object detection model.
FIG. 3 is a schematic diagram of an object detection process shown in this application. It should be noted that FIG. 3 only schematically illustrates the object detection process and does not specifically limit this application.
The object detection model 30 shown in FIG. 3 may be built on the FASTER-RCNN network. The model may include at least a backbone network 31, a Region Proposal Network (RPN) 32, and a Region-based Convolutional Neural Network (RCNN) 33.
The backbone network 31 performs several convolution operations on the image to be recognized to obtain a target feature map of the image. The RPN 32 processes the target feature map to obtain anchor boxes (anchors) corresponding to each rider and each vehicle in the image. The RCNN 33 performs bounding box (bbox) regression and classification based on the anchors output by the RPN 32 and the target feature map output by the backbone network 31, obtaining a rider frame for each rider and a vehicle frame for each vehicle in the image to be recognized. In some examples, position information and/or size information of each rider frame or vehicle frame can be obtained. In some embodiments, the position and/or size of a rider frame or vehicle frame can be represented by the coordinates of its four vertices.
In some examples, the object detection model may first be trained with supervision on a number of training samples. In other examples, the position and size of the object frame corresponding to each object (riders and vehicles) in a number of sample images are annotated to obtain the training samples. The model is then trained with supervision on these samples in a conventional manner until it converges.
After training, the object detection model can be used to perform object detection on the image to be recognized, obtaining a rider frame for each rider and a vehicle frame for each vehicle included in the image. If the image includes multiple riders and/or multiple vehicles, the different rider frames and/or vehicle frames may also be numbered in the detection result.
After obtaining the rider frames and vehicle frames included in the image to be recognized, the target rider frame corresponding to the target rider can be selected; the region it encloses in the image is determined as the rider area, and the region enclosed by each vehicle frame is determined as a vehicle area.
In S204, for any one of the one or more rider areas, the degree of overlap between each vehicle area and the rider area can be calculated. The vehicle areas can be sorted in descending order of the calculated overlap, and the first-ranked vehicle area is determined as the target vehicle area, which is then taken as the associated vehicle area associated with the rider area.
In some embodiments, the degree of overlap may be the ratio of the region where the vehicle area and the rider area intersect to the region of their union, i.e., the Intersection over Union (IoU) between the vehicle area and the rider area characterizes their overlap.
When computing the IoU, it may first be determined whether the vehicle area (hereinafter area 1) and the rider area (hereinafter area 2) overlap at all. If they do, the area of the region where area 1 and area 2 intersect is divided by the area of the region of their union to obtain the IoU between area 1 and area 2, denoted IoU(area 1, area 2).
Assume the coordinates of the upper-left corner of area 1 are (p_x1, p_y1) and of its lower-right corner are (p_x2, p_y2), and that the upper-left corner of area 2 is (h_x1, h_y1) and its lower-right corner is (h_x2, h_y2).
If the condition p_x1 > h_x2 || p_x2 < h_x1 || p_y1 > h_y2 || p_y2 < h_y1 evaluates to 1 (true), area 1 and area 2 do not overlap; in other words, the vehicle corresponding to area 1 and the target rider corresponding to area 2 are not spatially associated.
If the condition evaluates to 0 (false), the length of the intersecting region can be determined as Len = min(p_x2, h_x2) − max(p_x1, h_x1), and its width as Wid = min(p_y2, h_y2) − max(p_y1, h_y1).
After determining the length Len and width Wid of the intersecting region, its area is S1 = Len * Wid.
Afterwards, the area of the union of area 1 and area 2 can be determined as S2 = S(p) + S(h) − S1, where:
the area of area 1 is S(p) = (p_y2 − p_y1) * (p_x2 − p_x1);
the area of area 2 is S(h) = (h_y2 − h_y1) * (h_x2 − h_x1).
Finally, the degree of overlap between the vehicle area and the rider area is IoU = S1 / S2. The overlap between a vehicle area and the rider area can thus be computed accurately, so that the associated vehicle area is determined correctly, which helps improve the accuracy of traffic behavior recognition.
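The overlap computation above can be sketched in Python. A box is given as (x1, y1, x2, y2) upper-left and lower-right corner coordinates, as in the formulas; `pick_associated_vehicle` is a hypothetical helper implementing the max-overlap selection of S204:

```python
def iou(area1, area2):
    """IoU between two axis-aligned boxes (x1, y1, x2, y2), following the
    formulas above: S1 = Len * Wid and S2 = S(p) + S(h) - S1."""
    px1, py1, px2, py2 = area1
    hx1, hy1, hx2, hy2 = area2
    # Non-overlap check: the two regions are not spatially associated.
    if px1 > hx2 or px2 < hx1 or py1 > hy2 or py2 < hy1:
        return 0.0
    length = min(px2, hx2) - max(px1, hx1)   # Len
    width = min(py2, hy2) - max(py1, hy1)    # Wid
    s1 = length * width                      # intersection area
    s2 = (px2 - px1) * (py2 - py1) + (hx2 - hx1) * (hy2 - hy1) - s1  # union
    return s1 / s2

def pick_associated_vehicle(rider_area, vehicle_areas):
    """S204: index of the vehicle area with the greatest overlap (IoU)."""
    return max(range(len(vehicle_areas)),
               key=lambda i: iou(vehicle_areas[i], rider_area))
```

The descending sort described in S204 reduces to a single argmax, since only the first-ranked vehicle area is used.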
In some embodiments, the target vehicle can also be determined from a correlation prediction score between the rider and a vehicle.
FIG. 4 is a flow chart of another method for determining an associated vehicle area shown in this application.
As shown in FIG. 4, S104 may include S402 to S406. In S402, the image to be recognized is detected to obtain the one or more vehicle areas and the rider area. In S404, the correlation scores between the one or more vehicle areas and the rider area are determined with a pre-trained correlation score prediction model. In S406, among the one or more vehicle areas, the target vehicle area with the highest correlation score with the rider area is determined and taken as the associated vehicle area associated with the rider area.
The correlation score thus accurately characterizes the degree of association between the rider area and each vehicle area, so that the vehicle area most strongly associated with the rider area can be determined. This helps determine accurately the type of vehicle the target rider is driving and in turn improves the accuracy of traffic behavior recognition. Note that the description of S402 parallels that of S202 and is not repeated here.
The correlation score prediction network may be built on a deep learning network. To train it, images each containing multiple pairs of vehicle areas and rider areas are first acquired, and the correlation score of each vehicle-area/rider-area pair is annotated to obtain training samples: if a rider area is associated with a vehicle area, the score is annotated as 1; otherwise, 0. The network is then trained with supervision on these samples until it converges. Once trained, the network can predict the correlation score between a vehicle area and the rider area in the image to be recognized.
After determining the associated vehicle area, the device may continue with S106. An object area in this step (a rider area or a vehicle area) may be the region enclosed in the image to be recognized by the object frame corresponding to the object, and carries image features related to the object.
The rider area described in this application may cover first image features related to the rider's passenger-carrying behavior, for example, image features corresponding to the vehicle the rider is driving and the human bodies carried on that vehicle. The number of passengers can be judged from the first image features.
The vehicle area described in this application may cover second image features related to the vehicle type, for example, image features corresponding to the vehicle such as the number of wheels, the wheel structure, and the body structure. The vehicle type can be judged from the second image features.
In some embodiments, S106 may include S1062 and S1064. In S1062, the number of passengers in the rider area is identified to obtain a passenger-count identification result. In S1064, the vehicle type of the associated vehicle area is identified to obtain a vehicle-type identification result. This application does not limit the execution order of S1062 and S1064.
In some embodiments, when executing S1062, a rider area map corresponding to the rider area may be acquired. In some embodiments, the rider frame corresponding to the target rider and the image to be recognized (or the target feature map obtained by extracting features from the image with the backbone network 31) may be input to a region feature extraction unit to obtain the rider area map corresponding to the target rider. The rider area map may be a feature map or an image of the rider area.
The region feature extraction unit may include a Region of Interest Align (ROI Align) unit or a Region of Interest Pooling (ROI Pooling) unit. The region feature extraction unit can apply processing such as pooling and convolution to the rider area enclosed by the rider frame to obtain the rider area map, which may include high-dimensional or low-dimensional image features.
After the rider area map is obtained, the number of passengers can be identified from it to obtain the passenger-count identification result.
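As a much simplified stand-in for the region feature extraction unit, the sketch below merely crops the rider frame out of an image or feature map stored as nested lists indexed [row][column]; real ROI Align or ROI Pooling would additionally resample the region to a fixed output size (with bilinear interpolation or max pooling, respectively), which is omitted here:

```python
# Simplified region extraction: crop the rider frame from a 2-D map.
# The box is (x1, y1, x2, y2) with upper-left and lower-right corners,
# matching the four-vertex frame representation used in the detection step.
def crop_region(feature_map, box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in feature_map[y1:y2]]
```

The crop keeps only the features inside the rider frame, which is the input the passenger-count classifier operates on.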
In some embodiments, the passenger count can be identified with a pre-trained passenger-count recognition model, which may include a classifier built on a neural network. The passenger-count identification result output by the model may include a first identification result, a second identification result, and a third identification result, together with a confidence for each. The first identification result indicates that the number of passengers reaches a first preset number; the second identification result, a second preset number; the third identification result, a third preset number. The preset numbers can be set according to business needs: for example, the first preset number may be 3 or more, the second may be 2, and the third may be 1.
When determining the passenger-count identification result, the identification result with the highest confidence can be selected. For example, if classifying the passenger count in the rider area map with the above model yields confidences of 0.7, 0.2, and 0.1 for the first, second, and third identification results respectively, the passenger-count identification result is determined to be the first identification result, corresponding to the highest confidence of 0.7.
To train the passenger-count recognition model, training samples annotated with the number of passengers are first acquired, and the model is trained with supervision over multiple rounds of iteration until it converges. Once trained, the model can be used to identify the number of passengers. The adaptive learning capability of the neural network can thus be exploited to improve the accuracy of passenger-count identification in various situations, including occlusion.
In many scenarios, traffic behavior recognition may be unnecessary or impossible; such scenarios can be called invalid scenarios. For example, although a scene of a rider pushing a vehicle or standing beside it contains both a rider and a vehicle, the rider is not driving, so passenger-carrying behavior need not be detected in such scenes. As another example, in scenes where multiple riders are close together, in low-definition scenes, or in scenes where the vehicle is occluded, the rider or vehicle may be too indistinct to recognize, so traffic behavior recognition may not be possible.
A fourth identification result indicating that the current identification is invalid may be added to the passenger-count identification results obtained for the rider area. If the passenger-count identification result for a rider area is the fourth identification result, the scene in that rider area is an invalid scenario in which traffic behavior recognition is unnecessary or impossible, so traffic behavior recognition can be skipped for that rider area.
In this case, the passenger-count identification result output by the passenger-count recognition model may include the first, second, third, and fourth identification results, together with a confidence for each.
The fourth identification result indicates that the image to be recognized contains at least one of the following scenes: a rider pushing a vehicle, a rider standing beside a vehicle, multiple riders close together, a low-definition scene, or a vehicle being occluded.
When determining the final passenger-count identification result, the result with the highest confidence can be selected. For example, if classifying the passenger count in the rider area map with the above model yields confidences of 0.1, 0.2, 0.1, and 0.6 for the first, second, third, and fourth identification results respectively, the passenger-count identification result is determined to be the invalid identification result, corresponding to the highest confidence of 0.6.
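Selecting the final identification result reduces to an argmax over the classifier confidences. The class names below are illustrative labels for the sketch; the first three entries correspond to the preset passenger counts described above and the fourth to the invalid identification result:

```python
# Illustrative class order: first, second, third, and fourth identification
# results (e.g. 3+ passengers, 2 passengers, 1 passenger, invalid scenario).
RESULTS = ["3_or_more", "2_persons", "1_person", "invalid"]

def select_result(confidences):
    """Return (result, confidence) for the highest-confidence class."""
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return RESULTS[best], confidences[best]
```

A downstream step would skip traffic behavior recognition entirely when the selected result is "invalid".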
In some embodiments, training the passenger-count recognition network may include S11 to S13.
S11: Acquire first training samples.
The first training samples include multiple sample images of riders and corresponding first annotation information on the number of passengers. The first annotation information includes one of the following labels: 1 person, 2 persons, 3 persons, or an invalid label, where the invalid label covers at least one of the following: a rider pushing a vehicle, a rider standing beside a vehicle, multiple riders close together, low definition, or a vehicle being occluded.
S12: Input the first training samples into a preset first initial network to obtain a sample passenger-count identification result for each sample image.
The first initial network may be any type of neural network that outputs a passenger-count identification result.
S13: Determine a first loss based on the sample passenger-count identification results and the first annotation information, and optimize the first initial network based on the first loss to obtain the passenger-count recognition network.
After obtaining the sample passenger-count identification result for each sample image in the first training samples, the first loss can be determined from the first annotation information, and the parameters of the first initial network are updated by backpropagation, completing one parameter iteration. In some embodiments, the number of parameter iterations can be preset, and the passenger-count recognition network is obtained after the first initial network has completed the preset number of iterations.
With this training method, when identifying the number of passengers, on the one hand, invalid scenarios in which passenger-carrying detection and traffic behavior recognition are impossible or unnecessary can be filtered out, improving the efficiency of traffic behavior recognition; on the other hand, the number of passengers can be identified accurately, improving the effect of traffic behavior recognition.
In some embodiments, after passenger-count recognition is performed on the rider area to obtain a passenger count and a corresponding first confidence, the passenger count may be determined as the passenger-count recognition result of the rider area in response to the first confidence reaching a first confidence threshold.
The first confidence threshold can be set according to the business scenario. For example, suppose the model determines a passenger count of 1 with a corresponding confidence of 0.7, and the first confidence threshold is 0.7; the result of 1 passenger is then considered credible and can be output. Here, the confidence represents the degree of credibility of the count of 1 passenger.
By setting a confidence threshold and outputting the passenger-count recognition result only when the confidence reaches it, the credibility of the output result can be guaranteed, thereby ensuring the accuracy of traffic behavior recognition.
In some embodiments, the first confidence threshold can be adjusted to flexibly adapt to a variety of traffic behavior recognition scenarios.
For example, in a scenario that prioritizes the accuracy of violation detection, the first confidence threshold can be set to a higher value (such as 0.9), which makes the output passenger-count results sufficiently credible and improves the accuracy of traffic behavior recognition. Conversely, in a scenario that prioritizes the sensitivity of passenger-carrying recognition, the threshold can be set to a lower value (such as 0.6), which increases the number of passenger-count results that are output and improves the sensitivity of traffic behavior recognition.
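The thresholding described above reduces to a single comparison; a minimal sketch (function name assumed) shows how raising or lowering the threshold trades accuracy against sensitivity:

```python
def accept_count_result(count, confidence, threshold=0.7):
    """Output the passenger-count result only when its confidence reaches
    the first confidence threshold; otherwise report no usable result."""
    return count if confidence >= threshold else None
```

With `threshold=0.9`, fewer but more credible results are output (accuracy-first); with `threshold=0.6`, more results pass through (sensitivity-first).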
In some embodiments, when executing S1064, a corresponding associated-vehicle area map may first be obtained from the associated vehicle area. In some embodiments, the vehicle box corresponding to the associated vehicle area, together with the image to be recognized (or the target feature map obtained by extracting features from that image with the backbone network 31), may be input into an ROI Pooling unit to obtain the associated-vehicle area map. The associated-vehicle area map may be a feature map or an image of the associated vehicle area.
Vehicle type recognition may then be performed on the associated-vehicle area map to obtain a vehicle type recognition result. In some embodiments, vehicle type recognition can be performed with a pre-trained vehicle recognition network, which may include a classifier built on a neural network. The output of the vehicle recognition network may include the confidence (for example, a probability) of recognizing the vehicle in the vehicle area map as each preset vehicle type. When determining the final vehicle type, the type with the highest confidence can be selected; that is, the vehicle type corresponding to the highest confidence is determined as the vehicle type recognition result.
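Selecting the highest-confidence type amounts to an argmax over the per-type confidences; a sketch follows (the type names are illustrative, not the patent's preset list):

```python
def pick_vehicle_type(type_confidences):
    """Given the network's confidence for each preset vehicle type, return
    the highest-confidence type as the vehicle type recognition result."""
    best = max(type_confidences, key=type_confidences.get)
    return best, type_confidences[best]
```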
In some embodiments, training the vehicle recognition network may include S21-S23.
S21. Acquire a second training sample.
The second training sample includes multiple sample images of vehicles and second annotation information indicating the corresponding vehicle types.
S22. Input the second training sample into a preset second initial network to obtain a sample vehicle type recognition result for each sample image.
The second initial network may be any type of neural network and can output a vehicle type recognition result.
S23. Determine a second loss based on the sample vehicle type recognition results and the second annotation information, and optimize the second initial network based on the second loss to obtain the vehicle recognition network.
After the results are computed for the second training sample, the second loss can be determined according to the second annotation information, and the parameters of the second initial network can be updated through a backpropagation operation to complete one parameter iteration. In some embodiments, the number of parameter iterations can be preset; after the second initial network has completed the preset number of iterations, the vehicle recognition network is obtained.
With this training method, the self-adaptive learning capability of neural networks can be leveraged to improve the accuracy of vehicle type recognition.
In some embodiments, after vehicle type recognition is performed on the associated vehicle area to obtain the vehicle type corresponding to the associated vehicle area and a corresponding second confidence, the vehicle type may be determined as the vehicle type recognition result of the vehicle area in response to the second confidence reaching a second confidence threshold.
The second confidence threshold can be set according to the business scenario.
By setting a confidence threshold and outputting the vehicle type recognition result only when the confidence reaches it, the credibility of the output result can be guaranteed, thereby ensuring the accuracy of traffic behavior recognition.
In some embodiments, the second confidence threshold can be adjusted to flexibly adapt to a variety of traffic behavior recognition scenarios. For example, in a scenario that emphasizes the accuracy of passenger-carrying recognition, the second confidence threshold can be set to a higher value (such as 0.9), which makes the output vehicle type results sufficiently credible and improves the accuracy of traffic behavior recognition. Conversely, in a scenario that emphasizes sensitivity, the threshold can be set to a lower value (such as 0.6), which increases the number of vehicle type results that are output and improves the sensitivity of traffic behavior recognition.
After obtaining the rider's passenger-count recognition result and the vehicle type recognition result, the device may execute S108.
In some embodiments, recognition results appropriate to each vehicle type can be output.
When executing S108, in a first aspect, it may be determined that the target rider in the rider area is illegally carrying passengers in response to the passenger-count recognition result being a first recognition result, where the first recognition result indicates that the passenger count has reached a first preset number.
The first preset number may be an empirical value. For example, in a non-motor-vehicle scenario, regardless of vehicle type, the number of people carried, including the driver, must be fewer than 3; a non-motor vehicle carrying 3 or more people can be regarded as non-compliant. In that case the first preset number can be set to 3, and a passenger-carrying violation can be determined whenever the count reaches 3 or more.
In a second aspect, it may be determined that the target rider is illegally carrying passengers in response to the passenger-count recognition result being a second recognition result and the vehicle type indicated by the type recognition result being a preset non-motor-vehicle type, where the second recognition result indicates that the passenger count has reached a second preset number smaller than the first preset number.
In a third aspect, it may be determined that the target rider is not illegally carrying passengers in response to the passenger count indicated by the passenger-count recognition result being the second recognition result and the vehicle type indicated by the type recognition result not being the preset non-motor-vehicle type.
When the passenger count equals the second preset number, the corresponding passenger-carrying behavior may be compliant or non-compliant depending on the vehicle type. If the vehicle driven by the target rider (the associated vehicle) is of the preset non-motor-vehicle type, a violation can be determined; otherwise, the behavior can be determined to be compliant.
The second preset number may be an empirical value. The preset non-motor-vehicle type may refer to vehicles whose approved passenger capacity must not reach the second preset number.
For example, suppose the second preset number is 2 and the preset non-motor-vehicle type is a utility vehicle such as a tricycle. Such a vehicle may legally carry only 1 person, so if a passenger count of 2 is recognized, a violation can be determined. If the vehicle is not such a utility vehicle, for example a motorcycle or an electric bicycle, a recognized passenger count of 2 can be determined to be compliant.
In a fourth aspect, it may be determined that the target rider is not illegally carrying passengers in response to the passenger-count recognition result being a third recognition result, where the third recognition result indicates that the passenger count is a third preset number smaller than the second preset number.
The third preset number may be an empirical value. For example, in a non-motor-vehicle scenario, regardless of vehicle type, a count of 1 person including the driver can be regarded as compliant, in which case the third preset number can be set to 1.
In a fifth aspect, it may be determined that traffic behavior recognition for the target rider is invalid in response to the passenger-count recognition result being a fourth recognition result, so that no further traffic behavior recognition is needed.
Through the five-aspect legality judgment logic above, recognition results that match reality can be output for different vehicle types and scenarios, improving the applicability of legality recognition.
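The five-aspect logic above can be sketched as a single decision function. The preset numbers (3, 2, 1) and the restricted type ("tricycle") are taken from the examples in the text, not from a normative list, and the function name is assumed:

```python
FIRST_PRESET = 3        # count at or above which carrying always violates
SECOND_PRESET = 2       # count that violates only for restricted vehicle types
RESTRICTED_TYPES = {"tricycle"}  # assumed preset non-motor-vehicle types

def judge_carrying(count_result, vehicle_type):
    """Five-branch legality logic; count_result is the recognized passenger
    count, or None for the invalid fourth recognition result."""
    if count_result is None:
        return "invalid"                 # fifth aspect: skip recognition
    if count_result >= FIRST_PRESET:
        return "violation"               # first aspect
    if count_result == SECOND_PRESET:    # second and third aspects
        return "violation" if vehicle_type in RESTRICTED_TYPES else "compliant"
    return "compliant"                   # fourth aspect: single rider
```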
In some embodiments, when it is determined that the target rider is illegally carrying passengers, an alert can be issued.
In some embodiments, the device can be connected to an interactive terminal held by a traffic officer. When the device recognizes a passenger-carrying violation, it can package the information corresponding to the target rider, the information of the vehicle being driven, and the reason for the violation into an alert message and send it to the officer's interactive terminal. The officer can then handle the case after receiving the alert. Violations can thus be flagged automatically and promptly, facilitating enforcement.
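Packaging the violation details into one alert message might look like the following sketch; every field name here is an assumption for illustration:

```python
def build_alert(rider_info, vehicle_info, reason):
    """Bundle the rider, vehicle, and violation-reason details into a
    single alert message for the traffic officer's terminal."""
    return {"rider": rider_info, "vehicle": vehicle_info, "reason": reason}
```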
An embodiment is described below in the context of recognizing passenger-carrying behavior on non-motor vehicles.
Several cameras are deployed in the road scene. A camera can send images captured within a predetermined area to a recognition device for rider behavior detection.
The recognition device can be equipped with a pre-trained rider-vehicle recognition network (network 1), an occupancy recognition network (network 2), and a vehicle recognition network (network 3).
Network 1 is used to detect the riders and vehicles appearing in the image to be recognized, along with the corresponding vehicle areas and rider areas. Network 2 can be used to recognize the passenger count. Network 3 can be used to recognize the vehicle type.
The recognition device can also perform multi-target tracking on each rider appearing in the image according to the recognition results of network 1, obtaining each rider's trajectory and thereby distinguishing riders newly entering the predetermined area, riders still moving within it, and riders about to leave it. A rider about to leave the predetermined area can be determined as the target rider.
FIG. 5 is a schematic flowchart of a passenger-carrying behavior recognition method shown in the present application.
As shown in FIG. 5, after the recognition device receives the image to be recognized, in S501 it uses network 1 to recognize the rider boxes and vehicle boxes corresponding to the riders and vehicles appearing in the image, determines the target rider box corresponding to the target rider, determines the region enclosed by the target rider box in the image as the rider area, and determines the regions enclosed by the vehicle boxes as the vehicle areas.
In S502, IoU is used to determine the degree of overlap between each vehicle area and the rider area, and the target vehicle area with the greatest overlap is determined as the associated vehicle area spatially associated with the rider area. The spatial overlap between the target rider and the vehicle being driven can thus be used to accurately determine the associated vehicle area, which helps improve the accuracy of vehicle type recognition and yields accurate passenger-carrying recognition results.
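The overlap-based association in S502 amounts to computing IoU between the rider box and every vehicle box and keeping the maximum; a minimal sketch with `(x1, y1, x2, y2)` boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def associate_vehicle(rider_box, vehicle_boxes):
    """Return the vehicle box with the greatest overlap with the rider box."""
    return max(vehicle_boxes, key=lambda box: iou(rider_box, box))
```

A rider and the vehicle being ridden overlap almost entirely, so the argmax over IoU typically selects the correct vehicle even in crowded scenes.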
In S503, the rider area map corresponding to the rider area is obtained, and network 2 produces the passenger-count recognition result. In S504, the associated-vehicle area map corresponding to the associated vehicle area is obtained, and network 3 produces the vehicle type recognition result. In this example, it can be checked whether the confidences of the passenger-count recognition result and the vehicle type recognition result each reach 0.8, so that only credible results are retained, improving recognition accuracy.
In S505, whether the target rider's passenger-carrying behavior is a violation is recognized according to the passenger-count recognition result and the type recognition result.
FIG. 6 is a schematic flowchart of judging passenger-carrying violations shown in the present application.
As shown in FIG. 6, in S602 the result indicated by the passenger-count recognition result is judged. If the passenger-count recognition result is invalid, traffic behavior recognition for the target rider may be skipped.
If the passenger count indicated by the recognition result reaches 3, it is determined that the target rider is illegally carrying passengers.
If the passenger count indicated by the recognition result is 2, S604 may be executed to determine whether the vehicle type indicated by the vehicle type recognition result is a tricycle.
If the vehicle type is a tricycle, it is determined that the target rider is illegally carrying passengers; otherwise, it is determined that the target rider is not.
If the passenger count indicated by the recognition result is 1, it is determined that the target rider is not illegally carrying passengers.
Thus, on the one hand, passenger-carrying recognition need not be performed for invalid scenes, improving its efficiency and effectiveness; on the other hand, recognition results matching reality can be output for different vehicle types and scenarios, improving the applicability of the method.
If a passenger-carrying violation is recognized, alert information can be generated based on the corresponding rider information, vehicle information, and violation reason, and sent promptly to the traffic officer's handheld device, facilitating timely handling.
Corresponding to any of the above embodiments, the present application further proposes a traffic behavior recognition apparatus.
FIG. 7 is a schematic structural diagram of a traffic behavior recognition apparatus shown in the present application.
As shown in FIG. 7, the apparatus 70 may include: an acquisition module 71, configured to acquire an image to be recognized that includes one or more rider areas; a first determination module 72, configured to determine, for any one of the one or more rider areas, an associated vehicle area associated with that rider area from the image to be recognized, the rider area including one vehicle and at least one human body; a recognition module 73, configured to perform passenger-count recognition on the rider area to obtain a passenger-count recognition result, and to perform vehicle type recognition on the associated vehicle area to obtain a vehicle type recognition result; and a second determination module 74, configured to determine, according to the passenger-count recognition result and the vehicle type recognition result, whether the target rider in the rider area is illegally carrying passengers.
In some embodiments, the first determination module 72 is configured to: detect the image to be recognized to obtain one or more vehicle areas and the rider area; and, among the one or more vehicle areas, determine the target vehicle area with the greatest overlap with the rider area and determine that target vehicle area as the associated vehicle area associated with the rider area.
In some embodiments, the first determination module 72 is configured to: detect the image to be recognized to obtain one or more vehicle areas and the rider area; determine association scores between the one or more vehicle areas and the rider area through a pre-trained association-score prediction model; and, among the one or more vehicle areas, determine the target vehicle area with the highest association score with the rider area and determine that target vehicle area as the associated vehicle area associated with the rider area.
In some embodiments, the recognition module 73 is configured to: perform passenger-count recognition on the rider area to obtain a passenger count and a corresponding first confidence; in response to the first confidence reaching a first confidence threshold, determine the passenger count as the passenger-count recognition result of the rider area; perform vehicle type recognition on the associated vehicle area to obtain the corresponding vehicle type and a second confidence; and, in response to the second confidence reaching a second confidence threshold, determine the vehicle type as the vehicle type recognition result of the vehicle area.
In some embodiments, the second determination module 74 is configured to: in response to the passenger-count recognition result being a first recognition result, determine that the target rider is illegally carrying passengers, the first recognition result indicating that the passenger count has reached a first preset number; or, in response to the passenger-count recognition result being a second recognition result and the vehicle type indicated by the type recognition result being a preset non-motor-vehicle type, determine that the target rider is illegally carrying passengers, the second recognition result indicating that the passenger count has reached a second preset number smaller than the first preset number; or, in response to the passenger count indicated by the passenger-count recognition result being the second recognition result and the vehicle type indicated by the type recognition result not being the preset non-motor-vehicle type, determine that the target rider is not illegally carrying passengers; or, in response to the passenger-count recognition result being a third recognition result, determine that the target rider is not illegally carrying passengers, the third recognition result indicating that the passenger count is a third preset number smaller than the second preset number; or, in response to the passenger-count recognition result being a fourth recognition result, determine that traffic behavior recognition for the target rider is invalid.
In some embodiments, the fourth recognition result indicates that the image to be recognized includes at least one of the following scenes: a rider pushing the vehicle; a rider standing beside the vehicle; multiple riders close together; a low-definition scene; or a scene in which the vehicle is occluded.
In some embodiments, the apparatus 70 further includes: an alert module, configured to issue alert information in response to the target rider in the rider area illegally carrying passengers.
In some embodiments, the passenger-count recognition result is obtained by detecting the rider area with an occupancy recognition network, and the apparatus 70 further includes a training module for the occupancy recognition network, configured to: acquire a first training sample including multiple sample images of riders and first annotation information of the corresponding passenger counts, the first annotation information including one of the following labels: 1 person, 2 people, 3 people, or an invalid label, where the invalid label covers at least one of the following: a rider pushing the vehicle, a rider standing beside the vehicle, multiple riders close together, low definition, or the vehicle being occluded; input the first training sample into a preset first initial network to obtain a sample passenger-count recognition result for each sample image; determine a first loss based on the sample passenger-count recognition results and the first annotation information; and optimize the first initial network based on the first loss to obtain the occupancy recognition network.
In some embodiments, the vehicle recognition result is obtained by detecting the associated vehicle area with a vehicle recognition network, and the apparatus 70 further includes a training module for the vehicle recognition network, configured to: acquire a second training sample including multiple sample images of vehicles and second annotation information of the corresponding vehicle types; input the second training sample into a preset second initial network to obtain a sample vehicle type recognition result for each sample image; determine a second loss based on the sample vehicle type recognition results and the second annotation information; and optimize the second initial network based on the second loss to obtain the vehicle recognition network.
The embodiments of the traffic behavior recognition apparatus shown in this application can be applied to electronic devices. Accordingly, this application discloses an electronic device, which may include a processor and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the executable instructions stored in the memory to implement the traffic behavior recognition method shown in any of the foregoing embodiments.
FIG. 8 is a schematic diagram of the hardware structure of an electronic device shown in the present application.
As shown in FIG. 8, the electronic device may include a processor 801 for executing instructions, a network interface 802 for network connections, a memory 803 for storing runtime data for the processor, and a non-volatile storage 804 for storing instructions corresponding to the behavior recognition apparatus; the processor 801, network interface 802, memory 803, and non-volatile storage 804 are coupled through an internal bus 805.
The apparatus embodiments may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the apparatus, as a logical entity, is formed by the processor of its host electronic device reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, in addition to the processor, memory, network interface, and non-volatile storage shown in FIG. 8, the electronic device hosting the apparatus in the embodiments may also include other hardware according to its actual functions, which will not be detailed here.
It can be understood that, to increase processing speed, the instructions corresponding to the traffic behavior recognition apparatus may also be stored directly in memory, which is not limited here.
This application proposes a computer-readable storage medium storing a computer program that can be used to cause a processor to execute the traffic behavior recognition method shown in any of the foregoing embodiments.
本领域技术人员应明白，本申请一个或多个实施例可提供为方法、系统或计算机程序产品。因此，本申请一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本申请一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（可以包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。Those skilled in the art should understand that one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
本申请中的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”可以包括三种方案:A、B、以及“A和B”。"And/or" in this application means at least one of the two, for example, "A and/or B" may include three options: A, B, and "A and B".
本申请中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in the present application is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
以上对本申请特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下，在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外，在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中，多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
本申请中描述的主题及功能操作的实施例可以在以下中实现：数字电子电路、有形体现的计算机软件或固件、可以包括本申请中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本申请中描述的主题的实施例可以实现为一个或多个计算机程序，即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地，程序指令可以被编码在人工生成的传播信号上，例如机器生成的电、光或电磁信号，该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。Embodiments of the subject matter and the functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded in an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
本申请中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
适合用于执行计算机程序的计算机可以包括，例如通用和/或专用微处理器，或任何其他类型的中央处理单元。通常，中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件可以包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常，计算机还将可以包括用于存储数据的一个或多个大容量存储设备，例如磁盘、磁光盘或光盘等，或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据，抑或两种情况兼而有之。然而，计算机不是必须具有这样的设备。此外，计算机可以嵌入在另一设备中，例如移动电话、个人数字助理（PDA）、移动音频或视频播放器、游戏操纵台、全球定位系统（GPS）接收机、或例如通用串行总线（USB）闪存驱动器的便携式存储设备，仅举几例。Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer may include a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
适合于存储计算机程序指令和数据的计算机可读介质可以包括所有形式的非易失性存储器、媒介和存储器设备，例如可以包括半导体存储器设备（例如EPROM、EEPROM和闪存设备）、磁盘（例如内部硬盘或可移动盘）、磁光盘以及CD-ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
虽然本申请包含许多具体实施细节，但是这些不应被解释为限制任何公开的范围或所要求保护的范围，而是主要用于描述特定公开的具体实施例的特征。本申请内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面，在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外，虽然特征可以如所述在某些组合中起作用并且甚至最初如此要求保护，但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除，并且所要求保护的组合可以指向子组合或子组合的变型。While this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as descriptions of features of particular disclosed embodiments. Certain features that are described in this application in multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,所述实施例中的各种系统模块和组件的分散不应被理解为在所有实施例中均需要这样的分散,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。Similarly, while operations are depicted in the figures in a particular order, this should not be construed as requiring that those operations be performed in the particular order shown, or sequentially, or that all illustrated operations be performed, to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of various system modules and components in the described embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems can often be integrated together in a single software product, or packaged into multiple software products.
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。Thus, certain embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
以上所述仅为本申请一个或多个实施例而已，并不用以限制本申请，凡在本申请一个或多个实施例的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本申请一个或多个实施例保护的范围之内。The above descriptions are merely one or more embodiments of the present application and are not intended to limit the present application. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of one or more embodiments of the present application shall fall within the protection scope of one or more embodiments of the present application.

Claims (13)

  1. 一种交通行为识别方法,包括:A traffic behavior recognition method, comprising:
    获取包括一个或多个骑手区域的待识别图像;Obtain an image to be identified including one or more rider areas;
    针对所述一个或多个骑手区域中的任一个骑手区域，执行操作包括：For any one of the one or more rider areas, performing operations comprising:
    在所述待识别图像中,确定与该骑手区域关联的关联车辆区域,该骑手区域包括一辆车辆以及至少一个人体;In the image to be recognized, determine an associated vehicle area associated with the rider area, where the rider area includes a vehicle and at least one human body;
    对该骑手区域进行载人数量识别,得到载人数量识别结果,以及对所述关联车辆区域进行车辆类型识别,得到车辆类型识别结果;Identifying the number of passengers in the rider area to obtain the identification result of the number of passengers, and identifying the vehicle type in the associated vehicle area to obtain the identification result of the vehicle type;
    根据所述载人数量识别结果和所述车辆类型识别结果，确定该骑手区域中的目标骑手是否存在违规载人行为。Determining, according to the identification result of the number of passengers and the vehicle type identification result, whether the target rider in the rider area has the behavior of illegally carrying passengers.
  2. 根据权利要求1所述的方法,其中,在所述待识别图像中,确定与该骑手区域关联的所述关联车辆区域,包括:The method according to claim 1, wherein, in the image to be recognized, determining the associated vehicle area associated with the rider area comprises:
    对所述待识别图像进行检测,得到一个或多个车辆区域以及该骑手区域;Detecting the image to be recognized to obtain one or more vehicle areas and the rider area;
    在所述一个或多个车辆区域中，确定与该骑手区域重合度最大的目标车辆区域，并将所述目标车辆区域确定为与该骑手区域关联的关联车辆区域。Determining, among the one or more vehicle areas, a target vehicle area having the greatest degree of overlap with the rider area, and determining the target vehicle area as the associated vehicle area associated with the rider area.
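The overlap-based association recited in claim 2 can be sketched as an intersection-over-union (IoU) comparison between the rider box and each candidate vehicle box. The following minimal sketch is illustrative only and not part of the claimed subject matter; the `(x1, y1, x2, y2)` box convention and the function names are assumptions:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associate_vehicle(rider_box, vehicle_boxes):
    """Return the detected vehicle box with the greatest overlap with the rider box."""
    if not vehicle_boxes:
        return None
    return max(vehicle_boxes, key=lambda vb: iou(rider_box, vb))
```

A production system would typically also apply a minimum-overlap cutoff so that a rider with no nearby vehicle is left unassociated.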
  3. 根据权利要求1所述的方法,其中,在所述待识别图像中,确定与该骑手区域关联的所述关联车辆区域,包括:The method according to claim 1, wherein, in the image to be recognized, determining the associated vehicle area associated with the rider area comprises:
    对所述待识别图像进行检测,得到一个或多个车辆区域以及该骑手区域;Detecting the image to be recognized to obtain one or more vehicle areas and the rider area;
    通过预先训练的关联性分数预测模型,确定所述一个或多个车辆区域与该骑手区域之间的关联分数;determining an association score between the one or more vehicle areas and the rider area via a pre-trained association score prediction model;
    在所述一个或多个车辆区域中,确定与该骑手区域关联分数最高的目标车辆区域,并将所述目标车辆区域确定为与该骑手区域关联的关联车辆区域。Among the one or more vehicle areas, a target vehicle area with the highest associated score with the rider area is determined, and the target vehicle area is determined as an associated vehicle area associated with the rider area.
  4. 根据权利要求1所述的方法，其中，对该骑手区域进行载人数量识别，得到载人数量识别结果，包括：The method according to claim 1, wherein performing passenger number identification on the rider area to obtain the identification result of the number of passengers comprises:
    对该骑手区域进行载人数量识别,得到载人数量以及对应的第一置信度;Identify the number of passengers in the rider area to obtain the number of passengers and the corresponding first confidence level;
    响应于所述第一置信度达到第一置信度阈值,将所述载人数量确定为该骑手区域的载人数量识别结果;In response to the first confidence level reaching a first confidence level threshold, determining the number of occupants as an identification result of the number of occupants of the rider area;
    对所述关联车辆区域进行车辆类型识别,得到车辆类型识别结果,包括:Performing vehicle type identification on the associated vehicle area to obtain a vehicle type identification result, including:
    对所述关联车辆区域进行车辆类型识别,得到所述关联车辆区域对应的车辆类型以及对应的第二置信度;Performing vehicle type identification on the associated vehicle area to obtain the vehicle type corresponding to the associated vehicle area and the corresponding second degree of confidence;
    响应于所述第二置信度达到第二置信度阈值,将所述车辆类型确定为所述关联车辆区域的车辆类型识别结果。In response to the second confidence level reaching a second confidence level threshold, the vehicle type is determined as a vehicle type identification result for the associated vehicle area.
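The confidence gating recited in claim 4 amounts to accepting each prediction only when its score reaches the corresponding threshold. The sketch below is illustrative only; the threshold values and the convention of returning `None` for a rejected prediction are assumptions, not part of the application:

```python
def gate(label, confidence, threshold):
    """Accept a predicted label only when its confidence reaches the threshold."""
    return label if confidence >= threshold else None

def fuse_recognitions(passenger_pred, vehicle_pred,
                      passenger_thresh=0.8, vehicle_thresh=0.8):
    """Apply the first/second confidence thresholds to the passenger-count and
    vehicle-type predictions; each prediction is a (label, confidence) pair."""
    count, conf1 = passenger_pred
    vtype, conf2 = vehicle_pred
    return gate(count, conf1, passenger_thresh), gate(vtype, conf2, vehicle_thresh)
```

A rejected (below-threshold) prediction would typically cause the frame to be skipped rather than trigger a possibly wrong violation decision.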
  5. 根据权利要求1-4中任一项所述的方法，其中，根据所述载人数量识别结果和所述车辆类型识别结果，确定该骑手区域中的目标骑手是否存在违规载人行为，包括以下任一项：The method according to any one of claims 1-4, wherein determining, according to the identification result of the number of passengers and the vehicle type identification result, whether the target rider in the rider area has the behavior of illegally carrying passengers comprises any one of the following:
    响应于所述载人数量识别结果为第一识别结果,确定所述目标骑手违规载人;所述第一识别结果表征载人数量达到第一预设数量;In response to the identification result of the number of passengers being a first identification result, it is determined that the target rider is illegally carrying passengers; the first identification result indicates that the number of passengers has reached a first preset number;
    响应于所述载人数量识别结果为第二识别结果，并且所述类型识别结果表征的车辆类型为预设的非机动车类型，确定所述目标骑手违规载人；所述第二识别结果表征载人数量达到第二预设数量，所述第二预设数量小于所述第一预设数量；In response to the identification result of the number of passengers being a second identification result and the vehicle type represented by the type identification result being a preset non-motor vehicle type, determining that the target rider is illegally carrying passengers; the second identification result indicates that the number of passengers reaches a second preset number, the second preset number being smaller than the first preset number;
    响应于所述载人数量识别结果表征的载人数量为所述第二识别结果，并且所述类型识别结果表征的车辆类型不是所述预设的非机动车类型，确定所述目标骑手未违规载人；In response to the number of passengers represented by the identification result being the second identification result and the vehicle type represented by the type identification result not being the preset non-motor vehicle type, determining that the target rider is not illegally carrying passengers;
    响应于所述载人数量识别结果为第三识别结果，确定所述目标骑手未违规载人；所述第三识别结果表征载人数量为第三预设数量，所述第三预设数量小于所述第二预设数量；或者，In response to the identification result of the number of passengers being a third identification result, determining that the target rider is not illegally carrying passengers; the third identification result indicates that the number of passengers is a third preset number, the third preset number being smaller than the second preset number; or,
    响应于所述载人数量识别结果为第四识别结果,确定针对所述目标骑手的交通行为识别无效。In response to the passenger number identification result being the fourth identification result, it is determined that the traffic behavior identification for the target rider is invalid.
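The four branches of claim 5 form a simple decision table over the passenger count and vehicle type. A minimal illustrative sketch follows; the preset numbers (3/2/1), the non-motor type names, the string outcomes, and the use of `None` for the fourth (invalid) identification result are all hypothetical conventions, not values stated in the application:

```python
def check_carrying(passenger_result, vehicle_type,
                   non_motor_types=("bicycle", "electric_bicycle"),
                   first_preset=3, second_preset=2):
    """Map the claim-5 branches to 'illegal', 'legal', or 'invalid'."""
    if passenger_result is None:            # fourth result: recognition is invalid
        return "invalid"
    if passenger_result >= first_preset:    # first result: always illegal
        return "illegal"
    if passenger_result == second_preset:   # second result: depends on vehicle type
        return "illegal" if vehicle_type in non_motor_types else "legal"
    return "legal"                          # third result: below the second preset
```

The design point here is that the same passenger count (e.g., two people) can be legal or illegal depending on the associated vehicle's type, which is why claims 1-4 recognize both quantities before this decision is made.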
  6. 根据权利要求5所述的方法,其中,所述第四识别结果表征所述待识别图像包括以下至少一种场景:The method according to claim 5, wherein the fourth recognition result indicates that the image to be recognized includes at least one of the following scenes:
    骑手推车的场景、A scene in which a rider pushes the vehicle,
    骑手站立在车旁的场景、A scene in which a rider stands beside the vehicle,
    多个骑手相互紧靠的场景、Scenes with multiple riders leaning against each other,
    低清晰度的场景、或low-resolution scenes, or
    车辆被遮挡的场景。A scene where the vehicle is occluded.
  7. 根据权利要求1-6中任一项所述的方法,还包括:The method according to any one of claims 1-6, further comprising:
    响应于该骑手区域中的目标骑手违规载人,发出告警信息。In response to the illegal loading of the target rider in the rider area, a warning message is issued.
  8. 根据权利要求1-6中任一项所述的方法，其中，所述载人数量识别结果通过载人识别网络对该骑手区域进行检测获得，其中，训练所述载人识别网络包括：The method according to any one of claims 1-6, wherein the identification result of the number of passengers is obtained by detecting the rider area through a passenger recognition network, and wherein training the passenger recognition network comprises:
    获取第一训练样本,所述第一训练样本包括多张骑手的样本图像以及对应的载人数量的第一标注信息,所述第一标注信息包括以下标签中的一种:Obtain a first training sample, the first training sample includes a plurality of rider sample images and the corresponding first annotation information of the number of passengers, and the first annotation information includes one of the following labels:
    1人、2人、3人、或无效标签,1 person, 2 persons, 3 persons, or invalid label,
    所述无效标签包括以下中的至少一种:The invalid label includes at least one of the following:
    骑手推车、骑手站立在车旁、多个骑手相互紧靠、低清晰度、或车辆被遮挡;Rider pushing the cart, rider standing next to the cart, multiple riders close to each other, low resolution, or the cart is obscured;
    将所述第一训练样本输入预设的第一初始网络,得到各张样本图像对应的样本载人数量识别结果;Inputting the first training sample into the preset first initial network to obtain the recognition result of the number of people in the sample corresponding to each sample image;
    基于所述样本载人数量识别结果与所述第一标注信息确定第一损失，基于所述第一损失优化所述第一初始网络，得到所述载人识别网络。Determining a first loss based on the sample passenger number identification results and the first annotation information, and optimizing the first initial network based on the first loss to obtain the passenger recognition network.
  9. 根据权利要求1-6中任一项所述的方法,其中,所述车辆识别结果通过车辆识别网络对所述关联车辆区域进行检测获得,其中,训练所述车辆识别网络包括:The method according to any one of claims 1-6, wherein the vehicle recognition result is obtained by detecting the associated vehicle area through a vehicle recognition network, wherein training the vehicle recognition network includes:
    获取第二训练样本,所述第二训练样本包括多张车辆的样本图像以及对应的车辆类型的第二标注信息;Obtaining a second training sample, the second training sample includes a plurality of sample images of vehicles and second labeling information corresponding to the vehicle type;
    将所述第二训练样本输入预设的第二初始网络,得到每张样本图像的样本车辆类型识别结果;Inputting the second training sample into a preset second initial network to obtain a sample vehicle type recognition result of each sample image;
    基于所述样本车辆类型识别结果与所述第二标注信息确定第二损失,基于所述第二损失优化所述第二初始网络,得到所述车辆识别网络。A second loss is determined based on the sample vehicle type identification result and the second label information, and the second initial network is optimized based on the second loss to obtain the vehicle identification network.
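Claims 8 and 9 both describe standard supervised training: compute a loss between the network's prediction and the annotation, then optimize the network with that loss. The numpy sketch below shows one softmax cross-entropy SGD step on a toy linear classifier; the four-class label set (mirroring the 1/2/3-person and invalid labels of claim 8), the feature size, and the learning rate are illustrative assumptions, not the networks described in the application:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, x, label, lr=0.5):
    """One SGD step of softmax cross-entropy on a linear classifier W (classes x features)."""
    probs = softmax(W @ x)
    loss = -np.log(probs[label] + 1e-12)
    one_hot = np.eye(W.shape[0])[label]
    W = W - lr * np.outer(probs - one_hot, x)  # gradient of cross-entropy w.r.t. W
    return W, loss

# Toy run: 4 classes (e.g. 1/2/3 people or invalid), 3 features.
W = np.zeros((4, 3))
x = np.array([1.0, 0.0, 1.0])
losses = []
for _ in range(10):
    W, loss = train_step(W, x, label=2)
    losses.append(loss)
```

With zero-initialized weights the first loss is ln 4 (uniform probabilities over four classes), and repeated steps on the same sample drive it down, which is the behavior the first/second loss in claims 8 and 9 is optimized to achieve at scale.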
  10. 一种交通行为识别装置,包括:A traffic behavior recognition device, comprising:
    获取模块,用于获取包括一个或多个骑手区域的待识别图像;An acquisition module, configured to acquire images to be identified including one or more rider areas;
    第一确定模块,用于针对所述一个或多个骑手区域中的任一个骑手区域,在所述待识别图像中,确定与该骑手区域关联的关联车辆区域,该骑手区域包括一辆车辆以及至少一个人体;The first determination module is configured to, for any one of the one or more rider areas, in the image to be recognized, determine an associated vehicle area associated with the rider area, the rider area includes a vehicle and at least one human body;
    识别模块,用于对该骑手区域进行载人数量识别,得到载人数量识别结果,以及对所述关联车辆区域进行车辆类型识别,得到车辆类型识别结果;The identification module is used to identify the number of passengers in the rider area to obtain the identification result of the number of passengers, and to identify the vehicle type in the associated vehicle area to obtain the vehicle type identification result;
    第二确定模块,用于根据所述载人数量识别结果和所述车辆类型识别结果,确定该骑手区域中的目标骑手是否存在违规载人行为。The second determining module is configured to determine whether the target rider in the rider area has illegal passenger behavior according to the identification result of the number of passengers and the identification result of the vehicle type.
  11. 一种电子设备,包括:An electronic device comprising:
    处理器;processor;
    用于存储处理器可执行指令的存储器;memory for storing processor-executable instructions;
    其中,所述处理器通过运行所述可执行指令以实现如权利要求1-9中任一项所述的交通行为识别方法。Wherein, the processor implements the traffic behavior recognition method according to any one of claims 1-9 by running the executable instructions.
  12. 一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于使处理器执行如权利要求1-9中任一项所述的交通行为识别方法。A computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to make a processor execute the traffic behavior recognition method according to any one of claims 1-9.
  13. 一种计算机程序产品,包括存储于存储器中的计算机程序,所述计算机程序指令被处理器执行时实现如权利要求1-9中任一项所述的交通行为识别方法。A computer program product, comprising a computer program stored in a memory, when the computer program instructions are executed by a processor, the traffic behavior recognition method according to any one of claims 1-9 is realized.
PCT/CN2022/087745 2021-07-30 2022-04-19 Traffic behavior recognition method and apparatus, electronic device, and storage medium WO2023005275A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110873586.2A CN113516099A (en) 2021-07-30 2021-07-30 Traffic behavior recognition method and device, electronic equipment and storage medium
CN202110873586.2 2021-07-30

Publications (1)

Publication Number Publication Date
WO2023005275A1 true WO2023005275A1 (en) 2023-02-02

Family

ID=78068130

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087745 WO2023005275A1 (en) 2021-07-30 2022-04-19 Traffic behavior recognition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113516099A (en)
WO (1) WO2023005275A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516099A (en) * 2021-07-30 2021-10-19 浙江商汤科技开发有限公司 Traffic behavior recognition method and device, electronic equipment and storage medium
CN114419329B (en) * 2022-03-30 2022-08-09 浙江大华技术股份有限公司 Method and device for detecting number of people carried in vehicle
CN116665140A (en) * 2023-03-08 2023-08-29 深圳市旗扬特种装备技术工程有限公司 Method, device, equipment and storage medium for detecting shared single vehicle-mounted human behavior

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131935A (en) * 2020-08-13 2020-12-25 浙江大华技术股份有限公司 Motor vehicle carriage manned identification method and device and computer equipment
CN112395976A (en) * 2020-11-17 2021-02-23 杭州海康威视系统技术有限公司 Motorcycle manned identification method, device, equipment and storage medium
CN112614102A (en) * 2020-12-18 2021-04-06 浙江大华技术股份有限公司 Vehicle detection method, terminal and computer readable storage medium thereof
CN113516099A (en) * 2021-07-30 2021-10-19 浙江商汤科技开发有限公司 Traffic behavior recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113516099A (en) 2021-10-19

Similar Documents

Publication Publication Date Title
WO2023005275A1 (en) Traffic behavior recognition method and apparatus, electronic device, and storage medium
US10963709B2 (en) Hierarchical machine-learning network architecture
WO2020042984A1 (en) Vehicle behavior detection method and apparatus
WO2020048265A1 (en) Methods and apparatuses for multi-level target classification and traffic sign detection, device and medium
US11232350B2 (en) System and method for providing road user classification training using a vehicle communications network
US9971934B2 (en) System and method for partially occluded object detection
JP5127392B2 (en) Classification boundary determination method and classification boundary determination apparatus
US20170220874A1 (en) Partially occluded object detection using context and depth ordering
WO2019223655A1 (en) Detection of non-motor vehicle carrying passenger
US20170259814A1 (en) Method of switching vehicle drive mode from automatic drive mode to manual drive mode depending on accuracy of detecting object
US11460856B2 (en) System and method for tactical behavior recognition
CN111178286B (en) Gesture track prediction method and device and electronic equipment
Raja et al. SPAS: Smart pothole-avoidance strategy for autonomous vehicles
CN113673533A (en) Model training method and related equipment
CN109383519A (en) Information processing method, information processing system and program
CN111081045A (en) Attitude trajectory prediction method and electronic equipment
US20200160059A1 (en) Methods and apparatuses for future trajectory forecast
He et al. Towards C-V2X Enabled Collaborative Autonomous Driving
JP7269694B2 (en) LEARNING DATA GENERATION METHOD/PROGRAM, LEARNING MODEL AND EVENT OCCURRENCE ESTIMATING DEVICE FOR EVENT OCCURRENCE ESTIMATION
CN116052189A (en) Text recognition method, system and storage medium
WO2021193103A1 (en) Information processing device, information processing method, and program
CN111723601A (en) Image processing method and device
Abou El-Seoud et al. A framework of Malicious Vehicles Recognition in Real Time Foggy Weather
US20230391366A1 (en) System and method for detecting a perceived level of driver discomfort in an automated vehicle
US20230109171A1 (en) Operator take-over prediction

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE