CN113052048B - Traffic event detection method and device, road side equipment and cloud control platform - Google Patents


Info

Publication number: CN113052048B
Authority: CN (China)
Prior art keywords: image, traffic event, target, image area, determining
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202110290826.6A
Other languages: Chinese (zh)
Other versions: CN113052048A
Inventors: 董子超, 董洪义, 时一峰
Current assignee: Apollo Zhilian Beijing Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Apollo Zhilian Beijing Technology Co Ltd
Legal events: application filed by Apollo Zhilian Beijing Technology Co Ltd; priority to CN202110290826.6A; publication of CN113052048A; application granted; publication of CN113052048B; status: active.


Abstract

The application discloses a traffic event detection method and device, road side equipment, and a cloud control platform, relating to data processing and in particular to the fields of artificial intelligence, intelligent traffic, computer vision, deep learning, and cloud computing. The specific implementation scheme is as follows: acquiring a scene image of a road scene; determining a target image area contained in the scene image, the target image area being an image area where the occurrence probability of a traffic event is greater than or equal to a first threshold value; and detecting the traffic event in the target image area to obtain a detection result of whether a traffic event occurs in the target image area. In this way, traffic event detection in the road scene is realized based on the scene image, and the accuracy of traffic event detection is improved.

Description

Traffic event detection method and device, road side equipment and cloud control platform
Technical Field
The application relates to the field of data processing, in particular to a traffic event detection method, a traffic event detection device, road side equipment and a cloud control platform, which can be used in the fields of artificial intelligence, intelligent traffic, computer vision, deep learning and cloud computing.
Background
With the gradual improvement of urban road construction and advances in vehicle technology, people travel over longer distances, and road traffic conditions have a great influence on their daily life and work.
Traffic conditions on roads are mainly affected by traffic events such as traffic jams, vehicles failing to comply with traffic signal indications, and traffic accidents. Currently, traffic events are detected by inputting images from a surveillance video into a deep neural network.
Disclosure of Invention
The application provides a traffic event detection method, a traffic event detection device, road side equipment and a cloud control platform.
According to a first aspect of the present application, there is provided a traffic event detection method comprising:
acquiring a scene image of a road scene;
determining a target image area contained in the scene image, wherein the target image area is an image area with the occurrence probability of traffic events being greater than or equal to a first threshold value;
and detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area.
According to a second aspect of the present application, there is provided a model training method comprising:
Acquiring sample data;
training a feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features.
According to a third aspect of the present application, there is provided a traffic event detection device comprising:
an acquisition unit configured to acquire a scene image of a road scene;
The determining unit is used for determining a target image area contained in the scene image, wherein the target image area is an image area with the occurrence probability of the traffic event being greater than or equal to a first threshold value;
and the detection unit is used for detecting the traffic event in the target image area and obtaining a detection result of whether the traffic event occurs in the target image area.
According to a fourth aspect of the present application, there is provided a model training apparatus comprising:
An acquisition unit configured to acquire sample data;
the training unit is used for training the feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features.
According to a fifth aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the traffic event detection method of the first aspect described above or the model training method of the second aspect described above.
According to a sixth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the traffic event detection method according to the first aspect described above, or the model training method according to the second aspect described above.
According to a seventh aspect of the present application, there is provided a computer program product comprising a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program, causing the electronic device to perform the traffic event detection method described in the first aspect above or the model training method described in the second aspect above.
According to an eighth aspect of the present application, there is provided a roadside apparatus comprising the electronic apparatus as described in the fifth aspect above.
According to a ninth aspect of the present application, there is provided a cloud control platform comprising an electronic device as described in the fifth aspect above.
The technology according to the application improves the accuracy of traffic event detection.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a first embodiment according to the present application;
FIG. 3 is a schematic diagram of a second embodiment according to the present application;
FIG. 4 is a schematic diagram of a third embodiment according to the present application;
FIG. 5 (a) is an exemplary graph of the distribution of target image regions in a scene image at a second threshold of 1;
FIG. 5 (b) is an exemplary graph of the distribution of target image regions in a scene image at a second threshold of 2;
FIG. 6 is a schematic diagram of a fourth embodiment according to the present application;
FIG. 7 is a schematic diagram of a fifth embodiment according to the application;
fig. 8 is a schematic view of a sixth embodiment according to the present application;
Fig. 9 is a schematic view of a seventh embodiment according to the present application;
FIG. 10 is a schematic diagram according to an eighth embodiment of the application;
FIG. 11 is a block diagram of an electronic device for implementing a traffic event detection method and/or a model training method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Detecting traffic events helps improve the efficiency and intelligence of traffic management. Traffic events include, for example, traffic jams, vehicles or pedestrians failing to comply with traffic lights, and traffic accidents.
In general, the entire monitoring image is input into a deep neural network, image features are extracted from the entire monitoring image by the deep neural network, and a detection result of a traffic event is obtained based on the extracted image features. This approach enables detection of traffic events, but does not account for the fact that traffic events tend to occur only in a certain small area of the road scene. Inputting the entire monitoring image of the road scene into the deep neural network makes it difficult for the network to focus on the image features associated with the traffic event, increases the learning difficulty of the deep learning network, and results in low detection accuracy of the traffic event.
In order to solve these problems, the application provides a traffic event detection method, a traffic event detection device, road side equipment, and a cloud control platform, applied to the fields of artificial intelligence, intelligent traffic, computer vision, deep learning, and cloud computing within the field of data processing. In the application, an image area with a higher occurrence probability of traffic events is determined in the scene image of the road scene, and traffic event detection is performed based on the determined image area, so as to improve the accuracy and efficiency of traffic event detection.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application, where the application scenario is a road scenario. As shown in fig. 1, taking a road scene as an example of a road intersection, the road scene includes one or more image capturing devices 101 and one or more roadside devices 102 (two image capturing devices 101 and two roadside devices 102 are taken as examples in the figure), and the image capturing devices 101 and the roadside devices 102 are in communication connection through wired or wireless modes.
In this application scenario, the image capturing device 101 captures a scene image of a road scene, and transmits the captured scene image to the roadside device 102, and the roadside device 102 processes the scene image.
Optionally, the image capturing device 101 is a roadside camera.
Optionally, as shown in fig. 1, the application scenario further includes a remote server 103. The server 103 communicates with the image pickup apparatus 101 and/or the roadside apparatus 102 via a network. The remote server 103 may receive the scene image sent by the image acquisition device 101, or may receive the scene image sent by the roadside device 102, and process the scene image.
In one system architecture for intelligent vehicle-road collaboration, the road side device includes a road side sensing device (e.g., a road side camera) connected to a road side computing device (e.g., a road side computing unit, RSCU), and the road side computing device is connected to a server device that can communicate with autonomous or assisted-driving vehicles in various ways. In another system architecture, the road side sensing device itself includes a computing function, and in this case the road side sensing device is directly connected to the server device. The above connections may be wired or wireless. The server device in the application is, for example, a cloud control platform, a vehicle-road collaborative management platform, a central subsystem, an edge computing platform, or a cloud computing platform.
For example, the execution body of each method embodiment of the present application may be an electronic device, which may be a road side device (such as road side device 102 in fig. 1), or a terminal device, or a server device (such as server 103 in fig. 1), or a traffic event detection apparatus or device, or other apparatus or device that may execute each method embodiment of the present application.
For example, the execution body of each method embodiment of the present application may be a road side device that includes a road side sensing device with a computing function and a road side computing device connected to the road side sensing device. Alternatively, the execution body of the embodiments of the application may be a server device connected to the road side computing device, or a server device directly connected to the road side sensing device, etc.
Fig. 2 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 2, the traffic event detection method provided in this embodiment includes:
S201, acquiring a scene image of the road scene.
The road scene may be that of a road intersection, where two or more roads intersect and vehicles gather, or vehicles and pedestrians gather; the traffic conditions there are complex, and it is necessary to detect traffic events at the road intersection. Road intersections include, for example, crossroads, T-junctions, roundabout entrances/exits, and the like.
In the step, a scene image acquired in real time by an image acquisition device arranged at a road scene can be acquired so as to detect traffic events in real time aiming at the road scene. Or scene images which are stored in a preset database and are acquired at historical moments by an image acquisition device arranged at the road scene can be acquired so as to detect traffic events occurring at the historical moments at the road scene.
S202, determining a target image area contained in the scene image, wherein the target image area is an image area with the occurrence probability of the traffic event being greater than or equal to a first threshold value.
The first threshold is a preset probability threshold, which can be set by a professional based on experience and experiments. For example, if the first threshold is set to 80%, the target image area is an image area where the probability of occurrence of a traffic event is greater than or equal to 80%. Further, the first threshold may be adjusted by the user as desired. For example, the user may adjust the first threshold to 95% in order to determine image areas in the scene image with a traffic event occurrence probability greater than or equal to 95%.
In this step, one or more image areas having a traffic event occurrence probability greater than or equal to the first threshold are determined in the scene image; for convenience of description, such an image area is referred to as a target image area. It can be appreciated that the higher the probability of occurrence of a traffic event in an image area, the higher the degree of association of the image area with the traffic event, and the easier it is to extract image features related to the traffic event in that image area. Therefore, when the traffic event occurrence probability of the target image area is greater than or equal to the first threshold, the degree of association between the target image area and the traffic event can be considered high, and during detection the image features associated with the traffic event can be extracted from the target image area more easily, so that the detection accuracy of the traffic event can be improved.
In some embodiments, one possible implementation of S202 includes: and determining a target image area with the traffic event occurrence probability larger than or equal to a first threshold in the scene image according to a mapping relation between the preconfigured image area types and the traffic event occurrence probability.
The mapping relationship between the multiple image area types and the traffic event occurrence probability is configured by a professional according to experience and experiments, or in multiple sample images serving as training data, the traffic event occurrence times of the image area types are counted, and the traffic event occurrence probability of the image area types is determined according to the traffic event occurrence times of the image area types.
In the mapping relationship between the image area types and the traffic event occurrence probability, one image area type may correspond to one traffic event occurrence probability. The image region types include, for example, a sidewalk region, a motor vehicle lane region, a non-motor vehicle lane region, an intersection center region, and the like.
Specifically, in the scene image, the sub-image area belonging to each image area type is identified according to that image area type. In the mapping relationship between image area types and traffic event occurrence probabilities, the traffic event occurrence probability corresponding to the image area type to which an image area belongs is determined as the traffic event occurrence probability of that image area. The traffic event occurrence probability of the image area is then compared with the first threshold, and target image areas whose traffic event occurrence probability is greater than or equal to the first threshold are determined according to the comparison result, thereby improving the accuracy of the traffic event occurrence probability of the image area.
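The mapping-based selection described above can be sketched as a simple lookup followed by a threshold comparison. In the following sketch, the region types, probability values, and threshold are all illustrative assumptions, not values taken from the patent:

```python
# Hypothetical mapping between image area types and traffic event
# occurrence probabilities; the numbers are illustrative only.
REGION_TYPE_PROBABILITY = {
    "sidewalk": 0.30,
    "motor_vehicle_lane": 0.85,
    "non_motor_vehicle_lane": 0.60,
    "intersection_center": 0.90,
}

def select_target_regions(regions, first_threshold=0.8):
    """Return regions whose mapped traffic-event probability meets the threshold.

    `regions` is a list of (region_type, bounding_box) pairs identified
    in the scene image; unknown types default to probability 0.
    """
    targets = []
    for region_type, box in regions:
        probability = REGION_TYPE_PROBABILITY.get(region_type, 0.0)
        if probability >= first_threshold:
            targets.append((region_type, box, probability))
    return targets
```

With the illustrative values above, a sidewalk region would be filtered out while an intersection-center region would be kept as a target image area.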
Alternatively, in addition to this possible implementation, the target image area may be determined based on the probability of occurrence of a traffic event for each pixel, as described in the subsequent embodiment shown in fig. 3.
And S203, detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area.
In this step, after determining one or more target image areas whose traffic event occurrence probability is greater than or equal to the first threshold, traffic event detection is performed for each target image area to obtain a detection result of whether a traffic event occurs in that target image area. For example, for each target image area, traffic event detection is performed on the target image area through a deep neural network model to obtain a detection result of whether a traffic event occurs in the target image area. Compared with detecting traffic events over the whole scene image, each target image area has a higher degree of association with traffic events, so detection restricted to the target image areas can effectively improve the detection accuracy of traffic events.
In the embodiment of the application, in a road scene, a target image area with the occurrence probability of traffic events being greater than or equal to a first threshold value is determined, and traffic event detection is carried out on the target image area. Therefore, the process of detecting the traffic event on the scene image is divided into two stages, wherein the first stage is to screen out a target image area with higher association degree with the traffic event before the traffic event is detected, the second stage is to detect the traffic event in the target image area, and the image features related to the traffic event are more easily extracted in the target image area, so that the accuracy of detecting the traffic event is improved.
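The two-stage process described above can be sketched as a short pipeline. Here `propose_areas` (stage one) and `detect_in_area` (stage two) are placeholder callables introduced for illustration, not APIs defined by the patent, and the threshold value is illustrative:

```python
def detect_traffic_events(scene_image, propose_areas, detect_in_area,
                          first_threshold=0.8):
    """Two-stage traffic event detection sketch.

    Stage one: `propose_areas(scene_image)` yields (area, probability)
    pairs; areas with probability >= first_threshold become target areas.
    Stage two: `detect_in_area(scene_image, area)` is run only on those
    target areas, returning whether a traffic event occurs there.
    """
    target_areas = [area for area, prob in propose_areas(scene_image)
                    if prob >= first_threshold]
    return {area: detect_in_area(scene_image, area) for area in target_areas}
```

The design point the patent makes is precisely this restriction: stage two never sees low-probability areas, so its model can focus on image features associated with traffic events.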
Fig. 3 is a schematic diagram according to a second embodiment of the application. As shown in fig. 3, the traffic event detection method provided in this embodiment includes:
s301, acquiring a scene image of a road scene.
The implementation principle and technical effect of S301 may refer to the foregoing embodiments, and will not be described herein.
S302, determining the occurrence probability of traffic events corresponding to all pixels in the scene image.
The traffic event occurrence probability corresponding to a pixel is the probability that a traffic event occurs at the scene position corresponding to that pixel. The greater the probability of occurrence of a traffic event corresponding to a pixel, the greater the probability that a traffic event occurs at the scene position corresponding to the pixel; in other words, the greater the probability that the scene position corresponding to the pixel lies in an area where a traffic event occurs. Therefore, the probability of occurrence of a traffic event corresponding to a pixel can be understood as the probability that the pixel is located in an image area where a traffic event occurs.
In the step, the traffic event occurrence probability is predicted for the scene image, and the traffic event occurrence probability corresponding to each pixel in the scene image is obtained.
In some embodiments, one possible implementation of S302 includes: and performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image, determining each pixel value in the feature image as the occurrence probability of the traffic event corresponding to the corresponding pixel in the scene image, and realizing the prediction of the occurrence probability of the traffic event of the pixel.
The image size of the feature image is the same as that of the scene image, and pixels in the feature image correspond to pixels in the scene image one by one. The feature image is a single-channel image, i.e. each pixel in the feature image corresponds to a pixel value. The pixel value in the feature image is the probability of occurrence of a traffic event corresponding to the pixel in the scene image at the same position as the pixel value.
Further, the value range of the pixel values in the feature image is 0-1. For example, if the pixel value at pixel position (0, 0) in the feature image is 1, it indicates that the traffic event occurrence probability corresponding to the pixel at position (0, 0) in the scene image is 1, that is, 100%. As another example, if the pixel value at pixel position (2, 1) in the feature image is 0.5, it indicates that the probability of occurrence of a traffic event corresponding to the pixel at position (2, 1) in the scene image is 50%.
Before the convolutional encoding process and the feature recovery process are performed on the scene image, the convolutional encoding process and the feature recovery process can be trained by sample data serving as training data, so that pixel values of the feature image obtained by the convolutional encoding process and the feature recovery process on the scene image can accurately reflect the occurrence probability of traffic events corresponding to all pixels in the scene image.
Wherein the sample data includes a plurality of sample images. The training process of the convolutional encoding process and the feature recovery process may be performed by the execution body of the present embodiment, or may be performed by another device, for example, by another server or a computer.
Further, the sample data includes a plurality of sample images marked with traffic event occurrence areas. For example, a plurality of historical scene images of a road scene are taken as a plurality of sample images, whether traffic events occur in the sample images is manually determined, and traffic event occurrence areas of the sample images are marked, for example, the traffic event occurrence areas are selected on the sample images in a frame mode.
Therefore, during training, for each sample image, a feature image corresponding to the sample image is determined according to the traffic event occurrence area marked on the sample image, and the image size of this feature image is the same as the image size of the sample image. For example, on the feature image of the sample image, the pixel value corresponding to each pixel inside the marked traffic event occurrence area is set to 1, and the pixel value corresponding to each pixel in the remaining area outside the traffic event occurrence area is set to 0, thereby obtaining the feature image of the sample image. After the sample image and its corresponding feature image are obtained, the sample image can be used as input, the corresponding feature image can be used as a label, and the convolutional encoding processing and feature recovery processing can undergo supervised training. In the training process, the feature image corresponding to the sample image is compared with the feature image output by the feature extraction model, and the feature extraction model is adjusted based on the per-pixel difference between the two. The feature image output by the feature extraction model represents the traffic event occurrence probability corresponding to each pixel in the sample image as predicted by the model.
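The label construction described above (1 inside annotated areas, 0 elsewhere) can be sketched directly. The box format and the squared-error loss below are illustrative assumptions; the patent only speaks of a per-pixel difference without naming a specific loss:

```python
def label_mask(height, width, event_boxes):
    """Build the supervision target: a single-channel map the same size as
    the sample image, 1.0 inside each annotated traffic-event occurrence
    area and 0.0 elsewhere. Boxes are hypothetical (top, left, bottom,
    right) tuples with exclusive lower-right corners."""
    mask = [[0.0] * width for _ in range(height)]
    for top, left, bottom, right in event_boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 1.0
    return mask

def per_pixel_loss(predicted, target):
    """Mean squared per-pixel difference between the model's output feature
    image and the label mask (one possible form of the difference-based
    adjustment mentioned in the text)."""
    height, width = len(target), len(target[0])
    total = sum((predicted[y][x] - target[y][x]) ** 2
                for y in range(height) for x in range(width))
    return total / (height * width)
```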
In the process of performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image, convolutional encoding processing can first be performed on the scene image to obtain an encoded image, where the encoded image contains image features related to the occurrence probability of traffic events. Convolution extracts image features at different levels of abstraction and can lose fine-grained details of the scene image, so feature recovery is performed on the encoded image after the convolutional encoding processing, finally yielding the feature image. In this way, the accuracy of the obtained feature image is improved through the combination of convolutional encoding processing and feature recovery processing.
Optionally, there are a plurality of encoded images, and the image size of each encoded image is smaller than the image size of the scene image, so as to better extract image features from different image areas of the scene image.
By way of example, feature extraction is performed on a scene image by convolution to obtain an encoded image representing the result of feature encoding of the scene image, the image size of the encoded image being, for example, one sixteenth of the image size of the scene image. And decoding the coded image through deconvolution to realize feature recovery processing of the coded image and obtain a feature image with the same size as the image of the scene image.
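The size bookkeeping in this example can be checked with the standard output-size formulas for strided convolution and deconvolution (transposed convolution). The sketch below assumes "one sixteenth" refers to each spatial dimension, i.e., four stride-2 stages; the kernel sizes and paddings are illustrative, not specified by the patent:

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output side length of one strided convolution stage."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of one strided deconvolution stage."""
    return (size - 1) * stride - 2 * padding + kernel

def encode_decode_sizes(side, stages=4):
    """Trace a square image's side length through encoding and recovery."""
    encoded = side
    for _ in range(stages):
        encoded = conv_out(encoded)
    recovered = encoded
    for _ in range(stages):
        recovered = deconv_out(recovered)
    return encoded, recovered
```

With these assumed hyperparameters, a 512-pixel side shrinks to 32 (one sixteenth per dimension) during encoding and is restored to 512 during feature recovery, so the feature image has the same size as the scene image.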
Further, the feature extraction model is used for carrying out convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image. The feature extraction model is a convolutional neural network model, for example, the feature extraction model comprises a plurality of convolutional layers and a plurality of pooling layers, in the convolutional layers, convolutional encoding processing is performed on the scene image to obtain a plurality of encoded images, and in the pooling layers, feature recovery processing is performed on the encoded images to obtain feature images of the scene image.
The feature extraction model may be trained before it is used to perform the convolutional encoding processing and feature recovery processing on the scene image. The feature extraction model may be trained with sample data serving as training data. In this way, when the traffic event occurrence probability corresponding to each pixel in the scene image is predicted by the feature extraction model, the prediction accuracy of the per-pixel traffic event occurrence probability is improved.
The sample data for training the feature extraction model may refer to the aforementioned sample data for training the convolutional encoding process and the feature recovery process. When the sample data comprises a plurality of sample images marked with traffic event occurrence areas, determining characteristic images corresponding to the sample images according to the traffic event occurrence areas marked on the sample images for each sample image, wherein the image size of the characteristic images corresponding to the sample images is the same as the image size of the sample images. The sample image can be used as input, the characteristic image corresponding to the sample image is used as a label, and the characteristic extraction model is subjected to supervised training. The training algorithm of the feature extraction model is, for example, a gradient descent algorithm.
The training process of the feature extraction model may be performed on the execution body of the present embodiment, or may be performed on other devices, for example, on other servers or computers.
S303, determining a target image area according to the traffic event occurrence probability corresponding to each pixel and the first threshold value.
In this step, the larger the traffic event occurrence probabilities corresponding to the pixels, the larger the traffic event occurrence probability of the image area made up of those pixels. Therefore, after the traffic event occurrence probability corresponding to each pixel is determined, the target image area whose traffic event occurrence probability is greater than or equal to the first threshold can be determined in the scene image according to the per-pixel probabilities and the first threshold, so that the accuracy of the target image area is improved by grounding it in the per-pixel traffic event occurrence probabilities.
In some embodiments, one possible implementation of S303 includes:
Determining a plurality of image areas contained in the scene image; determining the occurrence probability of the traffic event corresponding to each image area according to the occurrence probability of the traffic event corresponding to each pixel in the image area aiming at each image area contained in the scene image; and determining the image area with the traffic event occurrence probability being greater than or equal to the first threshold value as a target image area.
Specifically, a plurality of image areas may be determined in the scene image according to a preset area size. There may be one or more preset area sizes, for example one or more of 3×3, 9×9 or 16×16, in units of pixels or of length (millimeters, centimeters, etc.). When determining the plurality of image areas in the scene image, the image areas of the preset area size may be determined in a preset order, for example from left to right and from top to bottom; alternatively, the image areas may be determined at random positions in the scene image. After the plurality of image areas are determined, for each image area, for example the mean, mode or median of the traffic event occurrence probabilities corresponding to all pixels in the image area is taken as the traffic event occurrence probability of that image area.
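The per-region probability described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the non-overlapping left-to-right/top-to-bottom tiling, and the use of the mean as the reduction are assumptions for the example.

```python
import numpy as np

def region_probabilities(pixel_probs, size, reduce=np.mean):
    """Slide a non-overlapping size x size window over the per-pixel
    probability map, left to right and top to bottom, and reduce each
    window (mean by default) to one per-region event probability."""
    h, w = pixel_probs.shape
    regions = {}
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            window = pixel_probs[top:top + size, left:left + size]
            regions[(top, left)] = reduce(window)
    return regions

def target_regions(pixel_probs, size, first_threshold):
    """Keep only the regions whose reduced probability reaches the first threshold."""
    return {pos: p for pos, p in region_probabilities(pixel_probs, size).items()
            if p >= first_threshold}
```

Swapping `reduce=np.median` (or a mode function) in place of the mean gives the other two reductions mentioned above without changing the region-tiling logic.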
Optionally, in addition to the above implementation, S303 may determine the target image area based on a comparison between the traffic event occurrence probability corresponding to each pixel and the first threshold, which may be seen in the subsequent embodiment shown in fig. 4.
S304, detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area.
The implementation principle and technical effect of S304 may refer to the foregoing embodiments, and will not be described herein.
In the embodiment of the application, the target image area whose traffic event occurrence probability is greater than or equal to the first threshold is determined based on the traffic event occurrence probability corresponding to each pixel in the scene image of the road scene, so that the accuracy of determining the target image area in the scene image is improved, and the accuracy of traffic event detection is further improved when the traffic event is detected in the target image area.
Fig. 4 is a schematic view of a third embodiment according to the present application. As shown in fig. 4, the traffic event detection method provided in this embodiment includes:
S401, acquiring a scene image of a road scene.
S402, determining the occurrence probability of traffic events corresponding to all pixels in the scene image.
The implementation process and the technical principle of S401 and S402 may refer to the foregoing embodiments, and are not repeated.
S403, determining a target pixel with the traffic event occurrence probability being greater than or equal to a first threshold value.
In this step, after the traffic event occurrence probability corresponding to each pixel in the scene image is determined, the probability of each pixel is compared with the first threshold, and any pixel whose traffic event occurrence probability is greater than or equal to the first threshold is determined as a target pixel. In this way, the target pixels with a high traffic event occurrence probability are screened out from all pixels in the scene image.
Optionally, if no pixel in the scene image has a traffic event occurrence probability greater than or equal to the first threshold, it is determined that the scene image contains no target image area whose traffic event occurrence probability is greater than or equal to the first threshold; in other words, the probability of detecting a traffic event in the scene image is low, so it may be directly determined that no traffic event occurs in the scene image, which improves detection efficiency. For example, a plurality of road side cameras are usually set up in a road scene, and the area monitored by some of these cameras may be an area with a low traffic event occurrence probability; in that case, the scene images captured by those cameras may contain no pixel whose traffic event occurrence probability is greater than or equal to the first threshold.
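The screening step of S403, including the early exit when no pixel reaches the first threshold, can be sketched as below. The `None` return convention is an assumption made for the example, not part of the described method.

```python
import numpy as np

def detect_target_pixels(pixel_probs, first_threshold):
    """Return a boolean mask of target pixels, or None to signal the
    early exit: no pixel reaches the first threshold, so no traffic
    event detection is needed for this scene image."""
    mask = pixel_probs >= first_threshold
    if not mask.any():
        return None
    return mask
```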
S404, determining a target image area according to the distribution of the target pixels in the scene image.
In this step, the target pixels are the pixels in the scene image whose traffic event occurrence probability is greater than or equal to the first threshold. A distribution area of the target pixels can therefore be determined according to their distribution in the scene image, and that distribution area is determined as a target image area whose traffic event occurrence probability is greater than or equal to the first threshold, improving the accuracy of determining the target image area in the scene image.
In some embodiments, one possible implementation of S404 includes: and in the scene image, carrying out aggregation processing on the target pixels to obtain a target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
Specifically, in the scene image, the target pixels with the distance smaller than or equal to the second threshold value are determined to be positioned in the same image area, so that aggregation processing of the target pixels is realized. In one or more image areas obtained after the aggregation processing, the distance between adjacent target pixels is smaller than or equal to a second threshold value. It can be seen that the one or more image areas are distribution areas of target pixels, and that a substantial majority of the pixels in the one or more image areas are target pixels. Thus, one or more image areas obtained after the aggregation process can be determined as target image areas.
Wherein the unit of distance is a pixel. There may be overlapping areas for different target image areas.
Optionally, the second threshold is a preset constant value.
Further, when the second threshold is 1, the distance between adjacent target pixels in the target image area is less than or equal to 1 pixel; in other words, the target image area contains only mutually adjacent target pixels. In this case, the traffic event occurrence probability of every pixel in the target image area is greater than or equal to the first threshold, which improves the accuracy of determining the target image area in the scene image.
Further, when the second threshold is greater than 1, the distance between adjacent target pixels in the target image area may be greater than 1 pixel; in other words, besides a large number of target pixels, the target image area also contains pixels whose traffic event occurrence probability is less than the first threshold. This accounts for the fact that, in an image area highly associated with a traffic event, not every pixel necessarily has a high traffic event occurrence probability, and thereby improves the recall rate of target image areas in the scene image.
Optionally, the second threshold is a variable related to a total number of pixels of the scene image. In other words, the second threshold may be adjusted according to the total number of pixels of the scene image to improve the rationality of the second threshold, thereby determining the accuracy of the target image region in the scene image.
Further, the second threshold increases with the total number of pixels of the scene image, i.e. the more pixels the scene image has, the larger the second threshold. This fully accounts for the fact that, as the pixel count of the scene image grows, the number of pixels whose traffic event occurrence probability is below the first threshold inside an image area highly associated with a traffic event grows correspondingly, which improves the accuracy of determining the target image area in the scene image.
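One possible schedule satisfying the property above is sketched here. The square-root scaling, the reference resolution, and the function name are all assumptions chosen for illustration; the document only requires that the threshold grow with the pixel count.

```python
import math

def second_threshold_for(total_pixels, base=1, ref_pixels=1_000_000):
    """One possible schedule: grow the second threshold with image
    resolution so higher-resolution images tolerate wider gaps
    between adjacent target pixels."""
    return max(base, round(base * math.sqrt(total_pixels / ref_pixels)))
```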
By way of example, fig. 5 (a) is an exemplary graph of the distribution of the target image region in the scene image when the second threshold is 1, and fig. 5 (b) is an exemplary graph of the distribution of the target image region in the scene image when the second threshold is 2. In fig. 5 (a) and fig. 5 (b), each square indicates a pixel, where a value of 1 in the square indicates that the corresponding pixel is a target pixel, and a value of 0 in the square indicates that the corresponding pixel is a non-target pixel (i.e., a pixel in the scene image having a probability of occurrence of a traffic event less than a first threshold value). The diagonally marked region represents the target image region.
As can be seen from fig. 5 (a), when the second threshold is 1, the target image area contains only target pixels, and the distance between adjacent target pixels is equal to 1. As can be seen from fig. 5 (b), when the second threshold is 2, the target image area contains both target pixels and non-target pixels, with target pixels in the majority.
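The aggregation of S404 can be sketched as a flood fill in which two target pixels join the same region when their Chebyshev distance is at most the second threshold. The breadth-first traversal and the Chebyshev metric are assumptions for this example; the document does not fix a particular distance metric or labeling algorithm.

```python
import numpy as np
from collections import deque

def aggregate_target_pixels(mask, second_threshold=1):
    """Label target pixels into regions: two target pixels belong to the
    same region when their Chebyshev distance <= second_threshold.
    Returns (label map, number of regions)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    t = second_threshold
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                n_regions += 1
                labels[sy, sx] = n_regions
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    # scan the (2t+1) x (2t+1) neighbourhood of (y, x)
                    for ny in range(max(0, y - t), min(h, y + t + 1)):
                        for nx in range(max(0, x - t), min(w, x + t + 1)):
                            if mask[ny, nx] and labels[ny, nx] == 0:
                                labels[ny, nx] = n_regions
                                queue.append((ny, nx))
    return labels, n_regions
```

As in fig. 5, a one-column gap between two target-pixel clusters keeps them separate when the second threshold is 1 and merges them when it is 2.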
In some embodiments, another possible implementation of S404 includes: determining a plurality of image areas contained in the scene image; determining the number ratio of target pixels in each image area contained in the scene image; and determining any image area in which the number ratio of target pixels is greater than or equal to a third threshold as a target image area.
Specifically, the plurality of image areas may be determined in the scene image according to a preset area size, where there may be one or more preset area sizes; reference may be made to the description of the foregoing embodiments. When determining the plurality of image areas in the scene image, the image areas of the preset area size may be determined in a preset order, for example from left to right and from top to bottom; alternatively, the image areas may be determined at random positions in the scene image.
After the plurality of image areas contained in the scene image are determined, the number ratio of target pixels is determined for each image area; if the number ratio of target pixels in an image area is greater than or equal to the third threshold, the traffic event occurrence probability corresponding to that image area is determined to be greater than or equal to the first threshold, and the image area is determined as a target image area. Here, the number ratio of target pixels in an image area is the ratio of the number of target pixels in the image area to the total number of pixels in the image area.
The third threshold is a preset constant value. The accuracy of determining the target image area in the scene image can be improved by setting a larger third threshold. The user can adjust the number of target image areas obtained by screening by adjusting the third threshold: the smaller the third threshold, the greater the number of target image areas obtained.
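The number-ratio screening above can be sketched as follows. The non-overlapping tiling and the function name are assumptions for the example.

```python
import numpy as np

def screen_regions_by_ratio(mask, size, third_threshold):
    """Tile the target-pixel mask into size x size image areas and keep
    the areas whose fraction of target pixels reaches the third threshold."""
    h, w = mask.shape
    kept = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            block = mask[top:top + size, left:left + size]
            ratio = block.sum() / block.size  # target pixels / total pixels
            if ratio >= third_threshold:
                kept.append((top, left))
    return kept
```

Lowering `third_threshold` keeps more regions, matching the screening behaviour described above.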
And S405, detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area.
The implementation principle and technical effect of S405 may refer to the description of the foregoing embodiment.
In the embodiment of the application, the traffic event occurrence probability corresponding to each pixel in the scene image of the road scene is determined, each pixel whose traffic event occurrence probability is greater than or equal to the first threshold is determined as a target pixel, and the target image area is determined according to the distribution of the target pixels, so that the accuracy of determining the target image area in the scene image is improved, and the accuracy of traffic event detection is further improved when the traffic event is detected in the target image area.
In some embodiments, when the traffic event detection is performed on the target image area, and a detection result of whether the traffic event occurs in the target image area is obtained, one possible implementation manner (that is, one possible implementation manner of S203, S304, or S405) includes: and detecting the traffic event in the target image area through an event detection model to obtain a detection result of whether the traffic event occurs in the target image area, wherein the event detection model is used for detecting whether the traffic event occurs in the image.
The event detection model may be a semantic understanding model for semantic understanding of a video or image; for example, a Temporal Segment Networks (TSN) model may be employed.
Specifically, after one or more target image areas are obtained, the scene image may be cropped according to the target image areas to obtain one or more sub-images. The sub-images can be input into the event detection model, which extracts image features related to traffic events from the sub-images and determines, according to the extracted features, whether a traffic event occurs in each sub-image, i.e. whether a traffic event occurs in the target image area corresponding to that sub-image, improving the accuracy of event detection in the target image area.
Optionally, different target image areas may have different sizes. Before the sub-images corresponding to the target image areas are input into the event detection model, normalization processing may be performed on them so that their sizes are consistent, which further improves the accuracy of traffic event detection performed on the sub-images by the event detection model. The normalization processing consists, for example, of enlarging or reducing the sub-images so that their image sizes are consistent.
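A minimal sketch of the normalization step, using a dependency-free nearest-neighbour resize. The interpolation method is an assumption for the example; the document only requires that all sub-images end up the same size.

```python
import numpy as np

def normalize_sub_images(sub_images, out_h, out_w):
    """Nearest-neighbour resize so crops of different target image areas
    share one size before being fed to the event detection model."""
    out = []
    for img in sub_images:
        h, w = img.shape[:2]
        ys = np.arange(out_h) * h // out_h  # source row per output row
        xs = np.arange(out_w) * w // out_w  # source column per output column
        out.append(img[ys][:, xs])
    return np.stack(out)
```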
The trained event detection model may be obtained, or trained, prior to performing traffic event detection on the target image area. During training, sample images marked with traffic event occurrence results are obtained; the sample images are taken as input and their traffic event occurrence results as labels, and the event detection model is trained in a supervised manner to obtain the trained event detection model. The training algorithm of the event detection model is, for example, a gradient descent algorithm, and the traffic event occurrence result of a sample image indicates whether a traffic event occurs in the sample image.
Optionally, the process of determining the target image area of the scene image by the feature extraction model (that is, performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image, determining the traffic event occurrence probability corresponding to each pixel in the scene image based on the feature image, and determining the target image area based on those probabilities) and the process of performing event detection on the target image area by the event detection model may be combined with each other. In this case, the training processes of the feature extraction model and the event detection model are independent of each other; for example, the two models may be trained on the same device or on different devices, and the sample images used to train them may be the same images or different images. In this way, traffic event detection in the scene image is divided into two stages by the feature extraction model and the event detection model: one stage screens out of the scene image the target image areas with a higher traffic event occurrence probability, and the other performs event detection on those target image areas, so that the image features related to the traffic event are detected in the event detection stage and the accuracy of traffic event detection is improved.
In some embodiments, based on the traffic event detection method according to any one of the foregoing method embodiments, after obtaining a detection result of whether a traffic event occurs in the target image area, the traffic event detection method further includes at least one of the following: displaying a target image area where the traffic event occurs; displaying the number of target image areas where traffic events occur; transmitting a target image area where a traffic event occurs to a target server and/or a target terminal; and sending the number of the target image areas where the traffic event occurs to the target server and/or the target terminal.
In an example, when the current device has a display device, for example, the roadside camera is directly connected with the display screen, after determining that the traffic event occurs in the target image area, the target image area in which the traffic event occurs (for example, displaying a sub-image corresponding to the target image area in which the traffic event occurs, or displaying a scene image and marking the target image area in which the traffic event occurs in the scene image) and/or the number of the target image areas in which the traffic event occurs may be displayed on the display device.
Therefore, compared with only outputting whether a traffic event occurs in the scene image, the method and device can display the target image areas where the traffic event occurs and their number, improving the richness and understandability (or interpretability) of the traffic event detection result, making it convenient for the user to learn the detailed area of the road scene in which the traffic event occurred, and improving the user experience.
Optionally, the current device may also store the target image area in the scene image where the traffic event occurred, and/or the number of target image areas where the traffic event occurred.
In another example, after the detection result of whether a traffic event occurs in the target image area is obtained, the target image area where the traffic event occurs and/or the number of such target image areas may further be transmitted to a target server and/or a target terminal, so that the target server stores them and/or the target terminal stores or displays them. In this way, a remote user can learn of the traffic event occurrence in the road scene by accessing the server or checking the target terminal.
The target server is, for example, a cloud control platform, a vehicle-road collaborative management platform, a center subsystem, an edge computing platform, or a cloud computing platform. The target terminal is, for example, a cell phone, a computer, or a tablet computer.
In some embodiments, based on the traffic event detection method shown in any one of the foregoing method embodiments, after obtaining a detection result of whether a traffic event occurs in the target image area, if it is determined that a traffic event occurs in the target image area, a geographic position corresponding to the target image area where the traffic event occurs may also be determined, so that the traffic event detection result is assisted by the geographic position corresponding to the target image area where the traffic event occurs, thereby further improving richness and interpretability of the traffic event detection result, and facilitating a user to accurately locate a detailed position where the traffic event occurs.
Further, after the geographic position corresponding to the target image area where the traffic event occurs is determined, the geographic position can be displayed, and/or the geographic position can be sent to the target server and/or the target terminal, so that a user can conveniently obtain the detailed position where the traffic event occurs from the current device or from the target server and the target terminal, and user experience is effectively improved.
According to an embodiment of the present application, there is provided a model training method for training a feature extraction model in the foregoing embodiment.
Fig. 6 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 6, the model training method provided in this embodiment includes:
S601, acquiring sample data.
S602, training the feature extraction model according to sample data.
The sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features.
In this step, for each sample image, a feature image corresponding to the sample image is determined according to the traffic event occurrence area marked on the sample image, the image size of the feature image being the same as the image size of the sample image. The sample image can be used as input and its corresponding feature image as a label to perform supervised training on the feature extraction model. In the training process, the feature image corresponding to the sample image is compared with the feature image output by the feature extraction model, and the model is adjusted based on the per-pixel differences between the two. The feature image output by the feature extraction model represents the traffic event occurrence probability, predicted by the model, corresponding to each pixel in the sample image.
The training algorithm of the feature extraction model is, for example, a gradient descent algorithm.
In some embodiments, one possible implementation of S602 includes: for each sample image, determining a characteristic image corresponding to the sample image according to the traffic event occurrence area marked on the sample image; and performing supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
Specifically, when the sample data includes a plurality of sample images marked with traffic event occurrence areas, for each sample image, determining a feature image corresponding to the sample image according to the traffic event occurrence areas marked on the sample image, where the image size of the feature image corresponding to the sample image is the same as the image size of the sample image. For example, the pixel value corresponding to each pixel in the traffic event occurrence area marked on the sample image on the characteristic image of the sample image is determined to be 1, and the pixel value corresponding to each pixel in the rest area except the traffic event occurrence area on the characteristic image of the sample image is determined to be 0, so as to obtain the characteristic image of the sample image. After the feature images are obtained, taking the sample images as input, taking the feature images corresponding to the sample images as labels, and performing supervised training on the feature extraction model.
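The label construction described above can be sketched as below, assuming for illustration that the annotated traffic event occurrence areas are given as rectangular `(top, left, bottom, right)` boxes; the document does not specify the annotation format.

```python
import numpy as np

def make_feature_label(image_h, image_w, event_boxes):
    """Build the supervision target: pixels inside an annotated traffic
    event occurrence area get value 1, all remaining pixels get 0.
    The label has the same size as the sample image."""
    label = np.zeros((image_h, image_w), dtype=np.float32)
    for top, left, bottom, right in event_boxes:
        label[top:bottom, left:right] = 1.0
    return label
```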
In the embodiment of the application, the feature extraction model for extracting image features is trained based on the sample images, so as to improve the accuracy with which the feature extraction model predicts the traffic event occurrence probability of each pixel in an image, and thereby further improve the accuracy of traffic event detection.
According to the embodiment of the application, the application further provides a traffic event detection device.
Fig. 7 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 7, the traffic event detection device provided in this embodiment includes:
an acquiring unit 701, configured to acquire a scene image of a road scene;
A determining unit 702, configured to determine a target image area included in the scene image, where the target image area is an image area with a traffic event occurrence probability greater than or equal to a first threshold value;
the detecting unit 703 is configured to detect a traffic event in the target image area, and obtain a detection result of whether the traffic event occurs in the target image area.
In one possible implementation, the determining unit 702 includes a first determining module and a second determining module. The first determining module is used for determining the occurrence probability of the traffic event corresponding to each pixel in the scene image, and the second determining module is used for determining the target image area according to the occurrence probability of the traffic event corresponding to each pixel and the first threshold value.
In one possible implementation, the first determination module includes an image processing module and a first determination sub-module. The image processing module is used for performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image, and the first determining submodule is used for determining each pixel value in the feature image as the occurrence probability of the traffic event corresponding to the corresponding pixel in the scene image.
In one possible implementation, the image processing module is specifically configured to: performing convolutional encoding processing on the scene image to obtain an encoded image, wherein the image size of the encoded image is smaller than that of the scene image; and carrying out feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
In one possible implementation, the image processing module is specifically configured to: and performing convolutional encoding processing and feature recovery processing on the scene image through the feature extraction model to obtain a feature image.
In one possible implementation, the second determination module includes a comparison module and a second determination sub-module. The comparison module is used for determining target pixels with the occurrence probability of the traffic event being greater than or equal to a first threshold value, and the second determination submodule is used for determining a target image area according to distribution of the target pixels in the scene image.
In one possible implementation, the second determining submodule is specifically configured to: and in the scene image, carrying out aggregation processing on the target pixels to obtain a target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
In one possible implementation, the second determining submodule is specifically configured to: determining a plurality of sub-image areas contained in the scene image; determining a number duty cycle of the target pixels in the sub-image region; an image region in which the number of target pixels is greater than or equal to a third threshold is determined as a target image region.
In one possible implementation, the detection unit 703 comprises a detection module. The detection module is used for detecting traffic events in the target image area through the event detection model to obtain a detection result of whether the traffic events occur in the target image area, and the event detection model is used for detecting whether the traffic events occur in the image.
Fig. 8 is a schematic diagram according to a sixth embodiment of the present application. As shown in fig. 8, the traffic event detection device provided in this embodiment includes:
An acquiring unit 801, configured to acquire a scene image of a road scene;
a determining unit 802, configured to determine a target image area included in the scene image, where the target image area is an image area with a traffic event occurrence probability greater than or equal to a first threshold value;
and the detection unit 803 is used for detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area.
In one possible implementation, the traffic event detection device further includes at least one of:
a first display unit 804 for displaying a target image area where a traffic event occurs;
a second display unit 805 for displaying the number of target image areas in which traffic events occur;
A first sending unit 806, configured to send, to a target server and/or a target terminal, a target image area where a traffic event occurs;
A second transmitting unit 807 configured to transmit the number of the target image areas in which the traffic event occurs to the target server and/or the target terminal.
Fig. 9 is a schematic diagram according to a seventh embodiment of the present application. As shown in fig. 9, the traffic event detection device provided in this embodiment includes:
an acquiring unit 901, configured to acquire a scene image of a road scene;
a determining unit 902, configured to determine a target image area included in the scene image, where the target image area is an image area with a traffic event occurrence probability greater than or equal to a first threshold value;
the detecting unit 903 is configured to detect a traffic event in the target image area, and obtain a detection result of whether the traffic event occurs in the target image area.
And the positioning unit 904 is used for determining the geographic position corresponding to the target image area where the traffic event occurs.
In one possible implementation, the traffic event detection device further includes at least one of:
a third display unit 905 for displaying a geographical location;
and a third sending unit 906, configured to send the geographic location to the target server and/or the target terminal.
The traffic event detection devices provided in fig. 7, 8 and 9 are used to execute the corresponding foregoing method embodiments; their implementation principles and technical effects are similar and are not repeated here.
Fig. 10 is a schematic diagram according to an eighth embodiment of the present application. As shown in fig. 10, the model training apparatus provided in this embodiment includes:
an acquisition unit 1001 for acquiring sample data;
A training unit 1002, configured to train the feature extraction model according to the sample data;
The sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features.
In one possible implementation, the training unit 1002 includes:
the determining module is used for determining characteristic images corresponding to the sample images according to the traffic event occurrence areas marked on the sample images;
and the training module is used for performing supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
The model training device provided in fig. 10 is used for executing the corresponding foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
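The training flow above — deriving a feature image from each sample's labeled traffic event occurrence area, then supervising the feature extraction model against it — can be sketched as below. The axis-aligned box format for the labeled area and the binary cross-entropy loss are assumptions for illustration; the patent fixes neither.

```python
import numpy as np

def make_feature_image(image_shape, event_boxes):
    """Build the supervision target: a feature image whose pixel values
    are 1 inside labeled traffic event occurrence areas, 0 elsewhere."""
    target = np.zeros(image_shape, dtype=np.float32)
    for top, left, bottom, right in event_boxes:
        target[top:bottom, left:right] = 1.0
    return target

def pixelwise_bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy: one plausible loss for supervising
    the feature extraction model against the feature image."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred)
                   + (1 - target) * np.log(1 - pred)).mean())

# One 6x6 sample image with a labeled event area in rows 1-2, cols 2-4.
feat = make_feature_image((6, 6), [(1, 2, 3, 5)])
print(feat.sum())  # 6 labeled pixels (2 rows x 3 cols)
```

At inference time, the trained model's output plays the role of the per-pixel traffic event occurrence probability map used to select target image areas.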
According to an embodiment of the present application, there is also provided an electronic device including: at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the technical solution provided in any one of the embodiments described above.
According to an embodiment of the present application, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the technical solution provided in any one of the embodiments described above.
According to an embodiment of the present application, there is also provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.
According to an embodiment of the present application, there is also provided a roadside device including the electronic device provided in the above embodiments.
According to an embodiment of the present application, there is also provided a cloud control platform including the electronic device provided in the above embodiments.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the electronic device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the various methods and processes described above, such as the traffic event detection method and/or the model training method. For example, in some embodiments, the traffic event detection method and/or the model training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the traffic event detection method and/or the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the traffic event detection method and/or the model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical host and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (26)

1. A traffic event detection method, comprising:
acquiring a scene image of a road scene;
determining a target image area contained in the scene image, wherein the target image area is an image area with the occurrence probability of traffic events being greater than or equal to a first threshold value;
detecting traffic events in the target image area to obtain a detection result of whether the traffic events occur in the target image area;
determining whether a traffic event occurs in the scene image according to the detection result of whether the traffic event occurs in the target image area;
wherein the detecting the traffic event in the target image area to obtain a detection result of whether the traffic event occurs in the target image area includes:
detecting the traffic event in the target image area through an event detection model to obtain the detection result of whether the traffic event occurs in the target image area, wherein the event detection model is used for detecting whether a traffic event occurs in an image;
and wherein the determining a target image area contained in the scene image includes:
determining a plurality of image areas in the scene image according to a preset area size;
for each image area, determining the proportion of target pixels in the image area, and if the proportion of target pixels in the image area is greater than or equal to a third threshold value, determining that the traffic event occurrence probability corresponding to the image area is greater than or equal to the first threshold value and determining the image area as a target image area, wherein a target pixel is a pixel in the scene image whose traffic event occurrence probability is greater than or equal to the first threshold value; and
if no pixel in the scene image has a traffic event occurrence probability greater than or equal to the first threshold value, determining that no target image area with a traffic event occurrence probability greater than or equal to the first threshold value exists in the scene image.
2. The traffic event detection method according to claim 1, wherein the determining a target image area contained in the scene image includes:
determining the occurrence probability of traffic events corresponding to pixels in the scene image;
And determining the target image area according to the traffic event occurrence probability corresponding to each pixel and the first threshold value.
3. The traffic event detection method according to claim 2, wherein the determining the probability of occurrence of the traffic event corresponding to each pixel in the scene image includes:
performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image;
And determining each pixel value in the characteristic image as the occurrence probability of the traffic event corresponding to the corresponding pixel in the scene image.
4. The traffic event detection method according to claim 3, wherein the performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image includes:
performing convolutional encoding processing on the scene image to obtain an encoded image, wherein the image size of the encoded image is smaller than that of the scene image;
And carrying out feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
5. The traffic event detection method according to claim 3, wherein the performing convolutional encoding processing and feature recovery processing on the scene image to obtain a feature image includes:
And performing convolutional encoding processing and feature recovery processing on the scene image through a feature extraction model to obtain a feature image.
6. The traffic event detection method according to any one of claims 2 to 5, wherein the determining the target image area according to the traffic event occurrence probability corresponding to each pixel and the first threshold value includes:
Determining a target pixel with the traffic event occurrence probability being greater than or equal to the first threshold;
And determining the target image area according to the distribution of the target pixels in the scene image.
7. The traffic event detection method according to claim 6, wherein the determining the target image area according to the distribution of the target pixels in the scene image comprises:
And in the scene image, performing aggregation processing on the target pixels to obtain the target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
8. The traffic event detection method according to any one of claims 1 to 5, further comprising at least one of:
displaying a target image area where the traffic event occurs;
displaying the number of target image areas where traffic events occur;
transmitting a target image area where a traffic event occurs to a target server and/or a target terminal;
And sending the number of the target image areas where the traffic event occurs to the target server and/or the target terminal.
9. The traffic event detection method according to any one of claims 1 to 5, further comprising:
And determining the geographic position corresponding to the target image area where the traffic event occurs.
10. The traffic event detection method according to claim 9, further comprising at least one of:
Displaying the geographic position;
And sending the geographic position to a target server and/or a target terminal.
11. The method of claim 1, further comprising:
Acquiring sample data;
training a feature extraction model according to the sample data;
The sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features;
the training the feature extraction model according to the sample data comprises the following steps:
For each sample image, determining a characteristic image corresponding to the sample image according to a traffic event occurrence area marked on the sample image;
and performing supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
12. A traffic event detection device, comprising:
an acquisition unit configured to acquire a scene image of a road scene;
The determining unit is used for determining a target image area contained in the scene image, wherein the target image area is an image area with the occurrence probability of the traffic event being greater than or equal to a first threshold value;
The detection unit is used for detecting traffic events in the target image area to obtain a detection result of whether the traffic events occur in the target image area; determining whether a traffic event occurs in the scene image according to a detection result of whether the traffic event occurs in the target image area;
the detection unit includes:
The detection module is used for detecting traffic events in the target image area through an event detection model to obtain a detection result of whether the traffic events occur in the target image area, and the event detection model is used for detecting whether the traffic events occur in the image;
The determining unit is specifically configured to:
determining a plurality of image areas in the scene image according to a preset area size;
for each image area, determining the proportion of target pixels in the image area, and if the proportion of target pixels in the image area is greater than or equal to a third threshold value, determining that the traffic event occurrence probability corresponding to the image area is greater than or equal to the first threshold value and determining the image area as a target image area, wherein a target pixel is a pixel in the scene image whose traffic event occurrence probability is greater than or equal to the first threshold value; and
if no pixel in the scene image has a traffic event occurrence probability greater than or equal to the first threshold value, determining that no target image area with a traffic event occurrence probability greater than or equal to the first threshold value exists in the scene image.
13. The traffic event detection device according to claim 12, wherein the determination unit includes:
the first determining module is used for determining the occurrence probability of the traffic event corresponding to each pixel in the scene image;
And the second determining module is used for determining the target image area according to the traffic event occurrence probability corresponding to each pixel and the first threshold value.
14. The traffic event detection device of claim 13, wherein the first determination module comprises:
The image processing module is used for carrying out convolutional encoding processing and characteristic recovery processing on the scene image to obtain a characteristic image;
And the first determining submodule is used for determining each pixel value in the characteristic image as the occurrence probability of the traffic event corresponding to the corresponding pixel in the scene image.
15. The traffic event detection device according to claim 14, wherein the image processing module is specifically configured to:
performing convolutional encoding processing on the scene image to obtain an encoded image, wherein the image size of the encoded image is smaller than that of the scene image;
And carrying out feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
16. The traffic event detection device according to claim 14, wherein the image processing module is specifically configured to:
And performing convolutional encoding processing and feature recovery processing on the scene image through a feature extraction model to obtain a feature image.
17. The traffic event detection device of any of claims 13-16, wherein the second determination module comprises:
The comparison module is used for determining a target pixel with the traffic event occurrence probability being greater than or equal to the first threshold value;
And the second determining submodule is used for determining the target image area according to the distribution of the target pixels in the scene image.
18. The traffic event detection device of claim 17, wherein the second determination submodule is specifically configured to:
And in the scene image, performing aggregation processing on the target pixels to obtain the target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
19. The traffic event detection device according to any one of claims 12-16, further comprising at least one of:
the first display unit is used for displaying a target image area where the traffic event occurs;
A second display unit for displaying the number of target image areas in which traffic events occur;
The first sending unit is used for sending the target image area where the traffic event occurs to the target server and/or the target terminal;
and the second sending unit is used for sending the number of the target image areas where the traffic event occurs to the target server and/or the target terminal.
20. The traffic event detection device according to any one of claims 12 to 16, further comprising:
And the positioning unit is used for determining the geographic position corresponding to the target image area where the traffic event occurs.
21. The traffic event detection device of claim 20, further comprising at least one of:
A third display unit for displaying the geographic location;
and the third sending unit is used for sending the geographic position to the target server and/or the target terminal.
22. The apparatus as recited in claim 12, further comprising:
An acquisition unit configured to acquire sample data;
the training unit is used for training the feature extraction model according to the sample data;
The sample data comprises a plurality of sample images marked with traffic event occurrence areas, and the feature extraction model is used for extracting image features;
wherein, training unit includes:
the determining module is used for determining characteristic images corresponding to the sample images according to the traffic event occurrence areas marked on the sample images;
And the training module is used for performing supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
23. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-11.
25. A roadside apparatus comprising: the electronic device of claim 23.
26. A cloud control platform, comprising: the electronic device of claim 23.
CN202110290826.6A 2021-03-18 Traffic event detection method and device, road side equipment and cloud control platform Active CN113052048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110290826.6A CN113052048B (en) 2021-03-18 Traffic event detection method and device, road side equipment and cloud control platform


Publications (2)

Publication Number Publication Date
CN113052048A CN113052048A (en) 2021-06-29
CN113052048B true CN113052048B (en) 2024-05-10

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678267A (en) * 2016-01-08 2016-06-15 浙江宇视科技有限公司 Scene recognition method and device
CN106997466A (en) * 2017-04-12 2017-08-01 百度在线网络技术(北京)有限公司 Method and apparatus for detecting road
CN108877208A (en) * 2017-05-09 2018-11-23 高德信息技术有限公司 A kind of image capturing method and device
CN109784265A (en) * 2019-01-09 2019-05-21 银河水滴科技(北京)有限公司 A kind of rail level semantic segmentation method and device
WO2019145018A1 (en) * 2018-01-23 2019-08-01 Siemens Aktiengesellschaft System, device and method for detecting abnormal traffic events in a geographical location
CN110363153A (en) * 2019-07-16 2019-10-22 广州图普网络科技有限公司 It paddles detection method, device, server and computer readable storage medium
CN110969166A (en) * 2019-12-04 2020-04-07 国网智能科技股份有限公司 Small target identification method and system in inspection scene
WO2020103893A1 (en) * 2018-11-21 2020-05-28 北京市商汤科技开发有限公司 Lane line property detection method, device, electronic apparatus, and readable storage medium
CN111209779A (en) * 2018-11-21 2020-05-29 北京市商汤科技开发有限公司 Method, device and system for detecting drivable area and controlling intelligent driving
CN111369792A (en) * 2019-11-22 2020-07-03 杭州海康威视系统技术有限公司 Traffic incident analysis method and device and electronic equipment
CN111598902A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN111695483A (en) * 2020-06-05 2020-09-22 腾讯科技(深圳)有限公司 Vehicle violation detection method, device and equipment and computer storage medium
CN111753634A (en) * 2020-03-30 2020-10-09 上海高德威智能交通系统有限公司 Traffic incident detection method and device
CN111753579A (en) * 2019-03-27 2020-10-09 杭州海康威视数字技术股份有限公司 Detection method and device for designated walk-substituting tool
CN111784712A (en) * 2020-07-17 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, device, equipment and computer readable medium
CN111931582A (en) * 2020-07-13 2020-11-13 中国矿业大学 Image processing-based highway traffic incident detection method
CN112434627A (en) * 2020-11-30 2021-03-02 浙江大华技术股份有限公司 Method and device for detecting pedestrian crossing road guardrail and storage medium
CN112507813A (en) * 2020-11-23 2021-03-16 北京旷视科技有限公司 Event detection method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification; Zichao Dong et al.; Computer Science; 1-8 *
A survey of automatic traffic incident detection algorithms based on video images; Xu Yang; Wu Chengdong; Chen Dongyue; Application Research of Computers (No. 04); 12-16 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211019

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant