CN113255439A - Obstacle identification method, device, system, terminal and cloud

Obstacle identification method, device, system, terminal and cloud

Info

Publication number
CN113255439A
Authority
CN
China
Prior art keywords
video segment
obstacle
learning model
deep learning
key video
Prior art date
Legal status
Granted
Application number
CN202110395709.6A
Other languages
Chinese (zh)
Other versions
CN113255439B (en)
Inventor
高翔
何洪刚
王磊
方昌銮
黄凯明
Current Assignee
Streamax Technology Co Ltd
Original Assignee
Streamax Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Streamax Technology Co Ltd filed Critical Streamax Technology Co Ltd
Priority to CN202110395709.6A
Publication of CN113255439A
Application granted
Publication of CN113255439B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/16 Anti-collision systems
    • G08G1/165 Anti-collision systems for passive traffic, e.g. including static obstacles, trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses an obstacle identification method, device, system, terminal, cloud and computer-readable storage medium. In the obstacle identification method applied to the terminal, the terminal is mounted on a vehicle on which a camera is also mounted, and the method comprises the following steps: detecting in real time, based on a preset first deep learning model, whether obstacle information exists in the road video acquired by the camera; if obstacle information is detected, generating an alarm signal; cutting the road video based on the alarm signal to obtain a key video segment; and uploading the key video segment to a cloud to instruct the cloud to verify, based on a preset second deep learning model, whether the key video segment is valid, wherein the computing power of the second deep learning model is superior to that of the first deep learning model. Through this scheme, obstacles on the road can be identified timely and accurately.

Description

Obstacle identification method, device, system, terminal and cloud
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an obstacle identification method, an obstacle identification device, an obstacle identification system, a terminal, a cloud and a computer-readable storage medium.
Background
As car ownership increases, people pay more and more attention to road safety. Drivers vary in conduct, and some may throw garbage onto the road at any time; in addition, objects transported by vehicles may fall off on bumpy roads during transportation. Currently, most cities still rely on sanitation workers actively patrolling to find obstacles on roads. Constrained by traffic flow and the patrol speed of sanitation workers, obstacles cannot be identified timely and accurately, leaving hidden dangers for road safety.
Disclosure of Invention
The application provides an obstacle identification method, an obstacle identification device, an obstacle identification system, a terminal, a cloud and a computer readable storage medium, which can timely and accurately identify obstacles appearing in a road.
In a first aspect, the present application provides an obstacle recognition method, where the obstacle recognition method is applied to a terminal, the terminal is mounted on a vehicle, and a camera is further mounted on the vehicle, and the obstacle recognition method includes:
detecting in real time, based on a preset first deep learning model, whether obstacle information exists in a road video acquired by the camera;
if the obstacle information is detected to exist, generating an alarm signal;
cutting the road video based on the alarm signal to obtain a key video segment;
uploading the key video segment to a cloud end to indicate the cloud end to verify whether the key video segment is effective or not based on a preset second deep learning model, wherein the computing power of the second deep learning model is superior to that of the first deep learning model.
In a second aspect, the present application provides an obstacle identification method, where the obstacle identification method is applied to a cloud, and the obstacle identification method includes:
receiving a key video segment uploaded by a terminal, wherein the key video segment is obtained by cutting a road video by the terminal based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated by detecting obstacle information of the road video based on a preset first deep learning model;
verifying whether the key video segment is valid or not based on a preset second deep learning model, wherein the computing power of the second deep learning model is superior to that of the first deep learning model;
and if the key video segment is confirmed to be effective, generating an evidence chain based on the key video segment for storage.
In a third aspect, the present application provides an obstacle recognition device, where the obstacle recognition device is applied to a terminal, the terminal is mounted on a vehicle, and a camera is further mounted on the vehicle, and the obstacle recognition device includes:
the detection unit is used for detecting in real time, based on a preset first deep learning model, whether obstacle information exists in the road video acquired by the camera;
a generation unit configured to generate an alarm signal if the presence of the obstacle information is detected;
the cutting unit is used for cutting the road video based on the alarm signal to obtain a key video segment;
and the uploading unit is used for uploading the key video segment to a cloud end so as to indicate the cloud end to verify whether the key video segment is effective or not based on a preset second deep learning model, wherein the computing power of the second deep learning model is superior to that of the first deep learning model.
In a fourth aspect, the present application provides an obstacle recognition device, where the obstacle recognition device is applied to a cloud, and the obstacle recognition device includes:
the receiving unit is used for receiving a key video segment uploaded by a terminal, wherein the key video segment is obtained by cutting a road video by the terminal based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated by detecting obstacle information of the road video based on a preset first deep learning model;
a verification unit, configured to verify whether the key video segment is valid based on a preset second deep learning model, where a computation power of the second deep learning model is better than a computation power of the first deep learning model;
and a storage unit, configured to generate an evidence chain based on the key video segment and store the evidence chain if the key video segment is confirmed to be valid.
In a fifth aspect, the present application provides an obstacle recognition system, where the obstacle recognition system includes a terminal, a cloud, and a camera, where the terminal and the camera are installed in a same vehicle;
the terminal includes:
the detection unit is used for detecting in real time, based on a preset first deep learning model, whether obstacle information exists in the road video acquired by the camera;
a generation unit configured to generate an alarm signal if the presence of the obstacle information is detected;
the cutting unit is used for cutting the road video based on the alarm signal to obtain a key video segment;
the uploading unit is used for uploading the key video segment to the cloud end;
the cloud comprises:
the receiving unit is used for receiving the key video segments uploaded by the terminal;
the verification unit is used for verifying whether the key video segment is effective or not based on a preset second deep learning model;
a storage unit, configured to generate an evidence chain based on the key video segment for storage if it is determined that the key video segment is valid;
wherein the computational power of the second deep learning model is superior to the computational power of the first deep learning model.
In a sixth aspect, the present application provides a terminal, where the terminal includes a first memory, a first processor, and a first computer program stored in the first memory and executable on the first processor, and the first processor implements the steps of the method according to the first aspect when executing the first computer program.
In a seventh aspect, the present application provides a cloud, where the cloud includes a second memory, a second processor, and a second computer program stored in the second memory and executable on the second processor, and the second processor implements the steps of the method according to the second aspect when executing the second computer program.
In an eighth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method of the first aspect; alternatively, the computer program, when executed by a processor, performs the steps of the method of the second aspect.
In a ninth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method as described in the first aspect above; alternatively, the computer program as described above, when executed by one or more processors, performs the steps of the method as described above in the second aspect.
Compared with the prior art, the application has the following beneficial effects. The vehicle is provided with a terminal and a camera. The terminal can detect in real time, based on a preset first deep learning model, whether obstacle information exists in the road video collected by the camera; when obstacle information exists, an alarm signal is generated, and the road video is cut based on the alarm signal to obtain a key video segment; finally, the key video segment can be uploaded to a cloud to instruct the cloud to verify, based on a preset second deep learning model, whether the key video segment is valid, where the computing power of the second deep learning model is superior to that of the first deep learning model. Since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain a road video, road video of every road the vehicle passes is obtained as the vehicle travels, and the terminal identifies this video in real time to judge whether obstacles exist on those roads, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads key video segments containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road. It is understood that the beneficial effects of the second to ninth aspects can be seen from the description of the first aspect and are not repeated herein.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an implementation of an obstacle identification method provided in an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of another obstacle identification method provided in the embodiment of the present application;
fig. 3 is a schematic diagram of an architecture of an obstacle identification system according to an embodiment of the present application;
fig. 4 is a schematic architecture diagram of a terminal provided in an embodiment of the present application;
fig. 5 is a block diagram of a structure of an obstacle recognition device according to an embodiment of the present application;
fig. 6 is a block diagram of another obstacle identification device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a cloud terminal provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution proposed in the present application, the following description will be given by way of specific examples.
The following describes an obstacle identification method provided by an embodiment of the present application, which is applied to a terminal. The terminal is mounted on a vehicle on which a camera is also mounted. For example only, the camera may be mounted at the front of the vehicle body (e.g., at the front windshield) or at the rear of the vehicle body (e.g., at the rear windshield). If the camera is mounted at the front of the vehicle body, it faces the front of the vehicle and can acquire road video in front of the vehicle; if it is mounted at the rear of the vehicle body, it faces the rear of the vehicle and can acquire road video behind the vehicle. Referring to fig. 1, the obstacle identification method includes:
and 101, detecting whether barrier information exists in the road video acquired by the camera in real time based on a preset first deep learning model.
In an embodiment of the present application, the first deep learning model may be a target detection model. For example only, the target detection model may employ the SSD (Single Shot MultiBox Detector) algorithm, the YOLO (You Only Look Once) algorithm, or another low-computation target detection algorithm, which is not limited herein. Through the first deep learning model, the terminal can detect in real time whether obstacle information exists in the road video.
It should be noted that the detection rate of the first deep learning model should be kept high, at least equal to the output rate of the camera, so as to improve the recognition efficiency of the terminal and ensure real-time recognition. For example, if the camera outputs 20 video frames per second to form a smoothly playing road video, the terminal should also keep its detection rate above 20 frames per second.
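By way of illustration only, this real-time constraint can be sketched as follows (Python; the camera API and the detector callable are assumptions for illustration, not part of the patent):

```python
import time

def run_realtime_detection(camera, detect, fps=20):
    """Per-frame obstacle detection that must keep up with the camera rate.

    camera: assumed to expose read() returning the next video frame.
    detect: the preset first deep learning model (e.g., an SSD- or
            YOLO-style detector) returning suspected obstacle detections.
    """
    frame_budget = 1.0 / fps  # e.g., 50 ms per frame for a 20 fps camera
    while True:
        frame = camera.read()
        start = time.monotonic()
        detections = detect(frame)
        if detections:
            yield frame, detections  # candidate obstacle information
        if time.monotonic() - start > frame_budget:
            # Detection fell behind the camera output rate; the real-time
            # requirement described above is no longer met.
            print("warning: detection slower than camera output rate")
```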
Step 102, if the obstacle information is detected, generating an alarm signal.
In the embodiment of the application, when the terminal detects that obstacle information appears in the road video, an alarm signal is generated. Optionally, the alarm signal may carry a target time point, i.e., the time point at which the obstacle information was detected.
Step 103, cutting the road video based on the alarm signal to obtain a key video segment.
In the embodiment of the application, although the first deep learning model can quickly detect whether obstacle information exists in the video, its low computing power means false detections may occur; that is, the first deep learning model may mistake background information for obstacle information. Based on this, the terminal can cut out of the road video, based on the alarm signal, the video segment judged by the first deep learning model to contain obstacle information. For convenience of explanation, this video segment is denoted as the key video segment.
Step 104, uploading the key video segment to a cloud to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model.
In the embodiment of the present application, as shown in step 103, the key video segment is a video segment determined by the first deep learning model to contain obstacle information; however, since the first deep learning model may produce false detections, it cannot be fully concluded that obstacle information objectively exists in the key video segment. Based on this, the key video segment can be uploaded to the cloud, and the cloud verifies whether the key video segment is valid based on the preset second deep learning model; that is, the cloud further judges whether obstacle information exists in the key video segment. It should be noted that the computing power of the second deep learning model adopted by the cloud is superior to that of the first deep learning model adopted by the terminal; that is, the accuracy and precision of the second deep learning model are higher than those of the first deep learning model.
For example only, the second deep learning model may be a semantic segmentation model, through which the key video segment is semantically segmented frame by frame, and the frame-by-frame semantic segmentation results are analyzed; for example, whether the track obtained from the segmentation result (i.e., the obstacle region) of each frame is reasonable is judged by a frame difference method over preceding and following video frames, so as to determine whether the key video segment is valid. A key video segment verified as valid by the second deep learning model is considered highly likely to have actually captured an obstacle on the road.
In some embodiments, in order to ensure the accuracy of the alarm signal, the step 101 specifically includes:
and A1, detecting whether the obstacle information exists in each frame of video frame in the road video in real time based on a preset first deep learning model.
A2, when the target video frame is detected, performing track analysis on the obstacle information existing in the target video frame.
A3, if the result of the track analysis meets a preset track condition, determining that obstacle information exists in the road video acquired by the camera.
The following scenario can be imagined. An obstacle lies at a certain position on the road. As the vehicle drives toward it, the obstacle gets ever closer until it enters the field of view of the camera; finally, the vehicle passes over the obstacle and drives away, and the obstacle leaves the camera's field of view. That is, the camera goes through the following stages: the obstacle is not captured; the obstacle is captured and follows a stable track in the picture; the obstacle disappears. Reflected in the road video: no obstacle information exists in the first few video frames; obstacle information exists in the middle frames, and the image positions corresponding to the obstacle information in these frames form a continuous track; no obstacle information exists in the last few video frames.
Based on this, the terminal can, following the temporal order of the video frames in the road video, detect in real time whether obstacle information exists in each frame using the first deep learning model. Once the first video frame containing obstacle information (also called the target video frame) is detected, track analysis can be performed on the obstacle information, specifically: extract the obstacle information in each of the N consecutive video frames after the target video frame, where N is a positive integer (for example only, N may be set to 30 or 50, which is not limited herein); obtain the position information of each piece of obstacle information in its video frame, where the position information can be understood as the center coordinates of the area occupied by the obstacle information in that frame; and fit an obstacle track based on these positions, the obstacle track being the result of the track analysis. It will be appreciated that the obstacle track should at least satisfy the following track conditions: the track is continuous and no track point shows a sudden change, i.e., no track point deviates too far from the track; and the route of the track matches the target route corresponding to the driving direction of the vehicle. For example, if the vehicle drives forward and the camera shoots the road ahead, a captured obstacle roughly follows a top-to-bottom route in the road video, and that route is the target route. After the obstacle track is determined to meet all track conditions, it can be determined that obstacle information exists in the road video acquired by the camera.
It should be noted that, since obstacles may be encountered at multiple places while the vehicle is running, the first video frame containing obstacle information (i.e., the target video frame) proposed in the present application does not mean the first such frame in the road video since the camera was powered on. In fact, any video frame in which obstacle information is detected can be regarded as a first frame containing an obstacle, as long as no obstacle information exists in the M consecutive video frames preceding it. M is a positive integer; for example only, M may be set to 10, 20 or 50, which is not limited herein.
For example, if no obstacle information exists in video frames 1 to 100 and obstacle information exists in frame 101, frame 101 can be used as the target video frame and the above steps A2 and A3 are performed; if afterwards no obstacle information exists in frames 150 to 200 and obstacle information exists from frame 201, frame 201 can in turn be used as the target video frame and steps A2 and A3 are performed again.
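A minimal sketch of the track conditions checked in steps A2 and A3 might look as follows (Python; the jump threshold and the top-to-bottom direction test are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

def track_is_valid(centers, max_jump=50.0):
    """Check the two track conditions on per-frame obstacle positions.

    centers: (x, y) center coordinates of the obstacle region in each of
             the N consecutive frames after the target video frame.
    """
    pts = np.asarray(centers, dtype=float)
    if len(pts) < 2:
        return False
    # Condition 1: the track is continuous, with no track point showing a
    # sudden change relative to its predecessor.
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    if np.any(steps > max_jump):
        return False
    # Condition 2: the route matches the target route for the driving
    # direction; for a forward-facing camera the obstacle moves roughly
    # top to bottom, i.e., its y coordinate increases overall.
    return pts[-1, 1] > pts[0, 1]
```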
In some embodiments, in order to improve the processing efficiency of the cloud for the key video segment, the step 103 specifically includes:
and B1, determining the video starting and ending time based on the generation time of the alarm signal and a preset clipping period.
And B2, cutting the road video according to the video start-stop time to obtain the key video segment.
The preset clipping period may be set by a user, for example, to 10 seconds. The terminal takes the generation time of the alarm signal as the center time point of the key video segment and extends half of the clipping period forward and half backward. For example, if the alarm signal is generated at 10:25:48 on a given day and the clipping period is 10 seconds, the video start-stop time runs from 10:25:43 to 10:25:53 on that day. Each video frame of the road video carries a timestamp representing its output time, so the terminal can extract all video frames whose output time falls within the video start-stop time, thereby obtaining the cut key video segment.
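Steps B1 and B2 can be sketched as follows (Python; the frame representation and timestamp handling are assumptions for illustration):

```python
from datetime import timedelta

def clip_key_video_segment(frames, alarm_time, clip_period_s=10):
    """Keep the frames whose timestamps fall within the video start-stop time.

    frames: iterable of (timestamp: datetime, frame) pairs, where the
            timestamp is the output time stamped on each video frame.
    """
    half = timedelta(seconds=clip_period_s / 2)
    start, end = alarm_time - half, alarm_time + half          # step B1
    return [(t, f) for t, f in frames if start <= t <= end]    # step B2

# Usage: an alarm at 10:25:48 with a 10-second clipping period keeps
# frames stamped between 10:25:43 and 10:25:53.
```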
In some embodiments, in order to enable the cloud to learn more about the obstacle and locate it in time, before step 104 the obstacle identification method further includes:
acquiring vehicle running information corresponding to the key video segment, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information;
accordingly, the step 104 includes:
and packaging the key video segment and the vehicle driving information into a verification data packet and uploading the verification data packet to the cloud.
Although the position of the vehicle changes in real time while driving, the vehicle does not move instantaneously; that is, within a short time (e.g., 10 seconds) the vehicle can generally travel only a short distance (e.g., several tens of meters). Based on this, the vehicle position information may include the position of the vehicle at the time the alarm signal is generated, and likewise the vehicle speed information may include the speed of the vehicle at that time.
Of course, the vehicle position information may also include position information of the vehicle at the start time of the key video segment and position information of the vehicle at the end time of the key video segment; alternatively, the position information of the vehicle at each time of the key video segment may be included, which is not limited herein. Similarly, the vehicle speed information may include average speed information of the vehicle during the video start-stop time; alternatively, the speed information of the vehicle at each time of the key video segment may be included, which is not limited herein.
The vehicle identification information may be the license plate number of the vehicle or, alternatively, the Vehicle Identification Number (VIN), which is not limited herein. In fact, it is sufficient that the vehicle identification information uniquely identifies the vehicle.
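For illustration, the verification data packet could be packaged along these lines (Python; all field names are hypothetical, as the patent does not define a packet format):

```python
import json

def build_verification_packet(key_video_bytes, position, speed_kmh, vehicle_id):
    """Package the key video segment with the vehicle driving information."""
    header = {
        "vehicle_driving_info": {
            "position": position,      # e.g., (lat, lon) at alarm time
            "speed_kmh": speed_kmh,    # speed at alarm time
            "vehicle_id": vehicle_id,  # license plate or VIN
        },
        # The video itself travels as a binary attachment; only its size
        # is recorded in the JSON header here for brevity.
        "video_size_bytes": len(key_video_bytes),
    }
    return json.dumps(header).encode("utf-8"), key_video_bytes
```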
As can be seen from the above, in the embodiment of the application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain a road video, road video of every road the vehicle passes is obtained as the vehicle travels, and the terminal identifies this video in real time to judge whether obstacles exist on those roads, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads key video segments containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
The following describes another obstacle identification method provided by an embodiment of the present application, which is applied to the cloud. Referring to fig. 2, the obstacle identification method includes:
step 201, receiving a key video segment uploaded by a terminal.
In this embodiment of the application, the key video segment is obtained by the terminal cutting a road video based on an alarm signal; the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model. The cloud only needs to receive the key video segment uploaded by the terminal.
In some embodiments, the terminal packages the key video segment and the vehicle driving information corresponding to the key video segment into a verification data package for uploading. Therefore, the cloud end can receive the verification data packet uploaded by the terminal, and then analyze the verification data packet to obtain the key video segment carried in the verification data packet and the vehicle driving information corresponding to the key video segment.
Step 202, verifying whether the key video segment is valid based on a preset second deep learning model.
In the embodiment of the present application, although the key video segment is a video segment determined by the first deep learning model to contain obstacle information, the first deep learning model may produce false detections, so it cannot be fully concluded that obstacle information objectively exists in the key video segment. Based on this, the cloud can verify whether the key video segment is valid based on the preset second deep learning model; that is, the cloud further judges whether obstacle information exists in the key video segment. It should be noted that the computing power of the second deep learning model adopted by the cloud is superior to that of the first deep learning model adopted by the terminal; that is, the accuracy and precision of the second deep learning model are higher than those of the first deep learning model.
For example only, the second deep learning model may be a semantic segmentation model, through which the key video segment is semantically segmented frame by frame, and the frame-by-frame semantic segmentation results are analyzed; for example, whether the track obtained from the segmentation result (i.e., the obstacle region) of each frame is reasonable is judged by a frame difference method over preceding and following video frames, so as to determine whether the key video segment is valid. A key video segment still determined to be valid after verification by the second deep learning model is considered highly likely to have actually captured an obstacle on the road.
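A sketch of this cloud-side check, assuming per-frame segmentation masks have already been produced by the semantic segmentation model (the centroid-based frame-difference test below is one plausible reading of the description, not the patent's fixed method):

```python
import numpy as np

def segment_is_valid(masks, max_jump=30.0):
    """Verify a key video segment from its frame-by-frame segmentation masks.

    masks: list of binary arrays (1 = obstacle region, 0 = background),
           one per video frame of the key video segment.
    """
    centroids = []
    for m in masks:
        ys, xs = np.nonzero(m)
        if len(xs) == 0:            # a frame with no obstacle region
            continue
        centroids.append((xs.mean(), ys.mean()))
    if len(centroids) < 2:
        return False
    # Frame-difference check on consecutive frames: the obstacle region's
    # centroid should move smoothly, i.e., form a reasonable track.
    diffs = np.linalg.norm(np.diff(np.asarray(centroids), axis=0), axis=1)
    return bool(np.all(diffs <= max_jump))
```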
Step 203, if the key video segment is confirmed to be valid, generating an evidence chain based on the key video segment for storage.
In the embodiment of the application, if the key video segment is verified as valid by the second deep learning model, it is highly likely to have actually captured an obstacle on the road, and an evidence chain may then be generated based on the key video segment for storage. Since the verification data packet can also carry the vehicle driving information corresponding to the key video segment, the cloud can store the key video segment and the vehicle driving information into a preset database as an evidence chain. Supervisors can subsequently screen related evidence chains in the database using the obstacle position, license plate, vehicle speed or the like as search terms.
In some embodiments, the cloud may further judge the key video segment in combination with big data; in this case the obstacle identification method further includes:
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle driving information;
accordingly, step 203 includes:
and if the key video segment is confirmed to be effective and the weight of the obstacle is higher than a preset weight threshold value, generating an evidence chain based on the key video segment and the vehicle running information and storing the evidence chain.
Based on the contents stored in the database, the cloud can assign, in the location dimension, a location weight to each location according to the frequency of throwing events at different locations; in the time dimension, a time weight to each time period according to the frequency of throwing events in different time periods; and in the vehicle type dimension, a vehicle type weight to each vehicle type according to the frequency of throwing events for different vehicle types. Then, based on the vehicle position information, vehicle speed information and vehicle identification information in the vehicle driving information of the key video segment, the corresponding target location weight, target time weight and target vehicle type weight are determined; the normalized value of their sum is the obstacle occurrence weight.
For example, suppose 100 evidence chains are stored in the database, of which 50 relate to location A (that is, the vehicle position information of 50 key video segments is location A); the obstacle occurrence weight of location A is then 0.5. By analogy, the obstacle occurrence weight of location B is 0.3 and that of location C is 0.1; the weights of other locations D, E, F, etc. are all below 0.1 and are not listed individually. Meanwhile, the weight threshold is set to 0.15. For a new key video segment acquired by the cloud, assume that it is determined to be valid by the second deep learning model and that the vehicle position information in its vehicle driving information is location A, so its target location weight is 0.5. By analogy, suppose its target time weight is 0.3 and its target vehicle type weight is 0.1; the normalized value of the sum of the target location weight, target time weight and target vehicle type weight is then 0.25. This normalized value is the obstacle occurrence weight, which is higher than the weight threshold of 0.15, so the cloud can generate an evidence chain based on the key video segment and the corresponding vehicle driving information for storage.
On the contrary, if a key video segment is valid but its obstacle occurrence weight is not higher than the weight threshold, the key video segment can be marked as a video to be manually verified, awaiting manual verification by a supervisor. If the key video segment is confirmed to be valid after manual verification, an evidence chain can likewise be generated and stored based on the key video segment and the corresponding vehicle driving information.
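The weight computation described above might be sketched as follows (Python). Note the patent only states that the obstacle occurrence weight is "the normalized value of the sum" of the three weights; the divisor used below is an assumption and is exposed as a parameter:

```python
def obstacle_occurrence_weight(evidence_db, location, time_period, vehicle_type,
                               normalize=lambda s: s / 3.0):
    """Combine location, time and vehicle-type weights into one score.

    Each dimension's weight is the fraction of stored evidence chains that
    share the given attribute (e.g., 50 of 100 chains at location A -> 0.5).
    The exact normalization of the sum is not specified by the patent; the
    mean (sum / 3) used here is an assumption.
    """
    total = len(evidence_db)
    if total == 0:
        return 0.0
    w_loc = sum(e["location"] == location for e in evidence_db) / total
    w_time = sum(e["time_period"] == time_period for e in evidence_db) / total
    w_type = sum(e["vehicle_type"] == vehicle_type for e in evidence_db) / total
    return normalize(w_loc + w_time + w_type)

def should_store_directly(weight, threshold=0.15):
    """Store an evidence chain directly only if the weight clears the threshold;
    otherwise the segment is queued for manual verification."""
    return weight > threshold
```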
In some embodiments, invalid key video segments can be pushed to the terminal, and the terminal uses them as negative samples to optimize the first deep learning model it adopts, so as to improve the accuracy and precision of the first deep learning model. The invalid key video segments include not only those confirmed invalid after manual verification but also those determined invalid by the second deep learning model. In addition, the cloud can use the key video segments confirmed invalid after manual verification as negative samples to optimize the second deep learning model, so as to improve its accuracy and precision.
In some embodiments, taking the second deep learning model as a semantic segmentation model as an example, the semantic segmentation model may be trained on a training sample set in combination with an attention mechanism.
The training sample set comprises at least one training image and a mask image associated with each training image. That is, the training sample set includes at least one image pair, and one image pair is composed of one training image and a mask image corresponding to the training image. Considering that the obstacle recognition method according to the embodiment of the present application is mainly used for recognizing obstacles on a road, the situation of obstacles that may appear on the road may be simulated first, and then a plurality of training images may be obtained by shooting a simulated scene. Then, for each training image, obtaining a mask image based on the obstacle region label in the training image; that is, in the training image, the pixel value of the pixel point of the obstacle region is labeled as 1, and the pixel value of the pixel point of the background region (that is, other region outside the obstacle region) is labeled as 0, so that the mask image uniquely corresponding to each training image can be obtained. Of course, the image collected by the camera during the driving process of the vehicle may also be used as a training image, and the corresponding mask image is obtained through manual labeling, so as to form a new image pair, thereby enriching the training sample set.
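Constructing a mask image from an obstacle-region label is straightforward; a minimal sketch (Python; a rectangular obstacle region is assumed purely for illustration):

```python
import numpy as np

def make_mask(height, width, obstacle_box):
    """Mask image: obstacle pixels labeled 1, background pixels labeled 0.

    obstacle_box: (y0, x0, y1, x1) bounds of the labeled obstacle region;
    a rectangular region is assumed here purely for illustration.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    y0, x0, y1, x1 = obstacle_box
    mask[y0:y1, x0:x1] = 1
    return mask
```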
The following introduces the training process of the semantic segmentation model. The semantic segmentation model comprises at least one convolution-pooling structure, where a convolution-pooling structure comprises at least one convolutional layer and a pooling layer. The overall training procedure of the semantic segmentation model is similar to that of the prior art; only, for each convolution-pooling structure, the following steps combining an attention mechanism are added:
and C1, acquiring a first characteristic diagram output by the convolution-pooling structure.
In the embodiment of the present application, if the convolution-pooling structure is the first convolution-pooling structure in the semantic segmentation model, its input is the training image input in the current training round; otherwise, its input is the feature map passed on from the output of the previous convolution-pooling structure. For convenience of explanation, the training image input in the current training round is referred to as the target training image in the embodiments of the present application.
For any convolution-pooling structure, the input of the convolution-pooling structure is first convolved by each convolution layer of the convolution-pooling structure, and then pooled by a pooling layer to obtain the output feature map of the convolution-pooling structure. It should be noted that the pooling operation performed by the pooling layer is typically maximum pooling. For convenience of explanation, the embodiment of the present application will be described with reference to the feature map output by the convolution-pooling structure as a first feature map. In fact, in the semantic segmentation model, no matter where the convolution-pooling structure is located, the output first feature map is directly or indirectly obtained based on the target training image.
C2, splicing the first feature map and the target mask image to obtain a spliced feature map.
In the embodiment of the application, the attention mechanism is adopted mainly so that the semantic segmentation model pays more attention to the obstacle information in the image; that is, relative to the background information in the image, the obstacle information is the target to be detected. Based on this, a target mask image can be obtained from the training sample set, the target mask image being the mask image associated with the target training image in the training sample set. Through a direct splicing operation, the target mask image and the first feature map are spliced together, and the resulting new image can be denoted as the spliced feature map.
In some embodiments, when the first feature map and the target mask image are spliced, the target mask image is first adjusted to a target size, where the target size is the size of the first feature map; that is, the sizes of the first feature map and the target mask image are unified. The target mask image is then copied so that the number of copies equals the number of channels of the first feature map, and finally the first feature map and all the target mask images are spliced in the channel dimension to obtain the spliced feature map. The spliced feature map obtained through this process keeps the target size, while its number of channels becomes twice that of the first feature map. For example only, the target mask image may be scaled by a linear interpolation operation to unify the sizes of the first feature map and the target mask image, and the linear interpolation operation is not limited herein.
C3, fusing the spliced feature map to obtain a second feature map.
In the embodiment of the application, a simple splicing operation does not let the target mask image directly influence the first feature map; yet it is desirable that the information in the target mask image directly influence the first feature map, so that the semantic segmentation model can relatively focus on the data related to obstacle information in the first feature map and relatively ignore the data related to background information. Based on this, the spliced feature map can be fused; that is, the data of pixels at the same position across the multiple channels are fused together, thereby obtaining the second feature map.
In some embodiments, the second feature map may be obtained by fusing the above spliced feature map in the channel dimension using a 1 × 1 convolution kernel.
C4, transferring the second feature map to the next network layer of the convolution-pooling structure in the semantic segmentation model.
In the embodiment of the present application, after the second feature map is obtained, it may be passed through the activation function to the next network layer as that layer's input; alternatively, the second feature map may first undergo mean pooling for a further dimensionality reduction and then be passed through an activation function to the next network layer as its input, which is not limited herein. If the current convolution-pooling structure is not the last one, the next network layer is typically a convolutional layer (the first convolutional layer of the next convolution-pooling structure); if it is the last one, the next network layer is typically the fully-connected layer.
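Steps C1 to C4 map naturally onto a small PyTorch module. The following sketch is one possible realization; the ReLU activation and the bilinear interpolation are assumptions, since the patent fixes neither:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskAttentionFusion(nn.Module):
    """Splice the target mask onto a feature map and fuse with a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        # C3: 1x1 convolution fusing 2C channels back down to C channels.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat, mask):
        # feat: first feature map of shape (B, C, H', W')          -- step C1
        # mask: target mask image of shape (B, 1, H, W)
        b, c, h, w = feat.shape
        # C2: adjust the mask to the target size (H', W') ...
        mask = F.interpolate(mask.float(), size=(h, w), mode="bilinear",
                             align_corners=False)
        # ... replicate it to C channels and splice along the channel dim.
        spliced = torch.cat([feat, mask.expand(b, c, h, w)], dim=1)
        # C3: fuse per-pixel data across channels -> second feature map.
        fused = self.fuse(spliced)
        # C4: pass through an activation function to the next network layer.
        return F.relu(fused)
```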
It can be appreciated that the embodiment of the present application improves the training process of the existing semantic segmentation model by adding the above steps C1 to C4 after each convolution-pooling structure in combination with the attention mechanism. During training, the semantic segmentation model still outputs the training result corresponding to the target training image (the training result is expressed as a mask); the training result is compared with the target mask image to calculate the loss, the parameters of the semantic segmentation model are adjusted based on the loss, and training ends once the loss converges.
The above process is described below by way of a simple specific example:
assume that the semantic segmentation model has a plurality of convolution-pooling structures, convolution-pooling structures 1, 2, 3, … …, n, respectively.
For convolution-pooling structure 1, the input is a target training image of H × W (e.g., 300 × 300), where H is the height and W is the width. After passing through convolutional layer 11, convolutional layer 12 and pooling layer 13 in convolution-pooling structure 1, a first feature map F11 of H′ × W′ × C′ (e.g., 32 × 32 × 64) is obtained, where C′ is the number of channels.
The target mask image of H × W is adjusted to the size H′ × W′ and copied to obtain C′ target mask images of H′ × W′; these C′ target mask images are spliced onto the first feature map F11 of H′ × W′ × C′ in the channel dimension to obtain a spliced feature map F12 of H′ × W′ × (C′ + C′) (i.e., H′ × W′ × 2C′).
Fusing the spliced feature map F12 of H′ × W′ × 2C′ through a 1 × 1 convolution kernel, on the one hand, reduces the computing power required by the subsequent semantic segmentation model and, on the other hand, achieves channel dimensionality reduction; the spliced feature map F12 is thus fused in the channel dimension into a second feature map F13 of H′ × W′ × C″.
The second feature map F13 of H′ × W′ × C″ is passed through the activation function to convolution-pooling structure 2 as its input; the operation there is substantially the same as for convolution-pooling structure 1 and is not repeated here.
In some embodiments, the value at each position of the second feature map effectively represents the degree of attention at that position. Therefore, after the second feature map is obtained, the feature map under a randomly chosen channel of the second feature map can be extracted as the feature map to be displayed, and a thermodynamic diagram (heat map) can be generated based on it. In mathematical terms, the three-dimensional feature map (i.e., the second feature map of H′ × W′ × C″) is effectively converted into a two-dimensional matrix (i.e., the feature map to be displayed of H′ × W′). Generating the thermodynamic diagram from the feature map to be displayed may specifically be: adjusting the feature map to be displayed to the original size, where the original size is the size of the target training image, and then generating the thermodynamic diagram based on the adjusted feature map. For a pixel point with coordinates (x, y) in the adjusted feature map to be displayed, the larger its pixel value, the brighter the pixel point at the same position (i.e., coordinates (x, y)) in the generated thermodynamic diagram.
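Generating the thermodynamic diagram from a single channel of the second feature map can be sketched with NumPy and OpenCV (the colormap choice is an assumption):

```python
import cv2
import numpy as np

def feature_heatmap(second_feature_map, original_size, channel=None):
    """Render one channel of an (H', W', C'') feature map as a heat map.

    original_size: (width, height) of the target training image.
    Brighter pixels correspond to larger feature values, i.e., positions
    the model attends to more strongly.
    """
    h, w, c = second_feature_map.shape
    if channel is None:
        channel = np.random.randint(c)  # a random channel, as described above
    fmap = second_feature_map[:, :, channel].astype(np.float32)  # 3-D -> 2-D
    fmap = cv2.resize(fmap, original_size)   # back to the training image size
    fmap = cv2.normalize(fmap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(fmap, cv2.COLORMAP_JET)
```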
In some embodiments, the semantic segmentation model generally adopts a UNet network structure. For the later convolution-pooling structures, the second feature map describes the attention paid to the target training image more truly. That is, among the convolution-pooling structures 1, 2, ..., n of the semantic segmentation model, the thermodynamic diagram generated from the second feature map obtained via convolution-pooling structure n is relatively the most informative.
It is understood that the first deep learning model can also be trained in the manner described above in conjunction with the attention mechanism, which is not limited herein.
As can be seen from the above, in the embodiment of the application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain a road video, road video of every road the vehicle passes is obtained as the vehicle travels, and the terminal identifies this video in real time to judge whether obstacles exist on those roads, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads key video segments containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
The embodiment of the present application further provides an obstacle recognition system. As shown in fig. 3, the obstacle recognition system 3 includes at least one terminal 31, a cloud 32, and at least one camera 33. A terminal can be connected with one or more cameras, and a terminal and the camera(s) connected to it are installed in the same vehicle. Each terminal 31 implements the steps of the obstacle identification method applied to the terminal (i.e., steps 101 to 104 above), and the cloud 32 implements the steps of the obstacle identification method applied to the cloud (i.e., steps 201 to 203 above).
In some embodiments, as shown in fig. 4, one terminal 31 may be split into multiple devices, for example, into a first device and a second device, where the first device establishes a connection with the second device and the camera 33, and the second device establishes a connection with the first device, the camera 33, and the cloud 32. The first device may perform the operations of steps 101 and 102 and transmit the generated alarm signal to the second device; the second device executes the operations of steps 103 and 104 after receiving the alarm signal transmitted by the first device.
As can be seen from the above, in the embodiment of the application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain a road video, road video of every road the vehicle passes is obtained as the vehicle travels, and the terminal identifies this video in real time to judge whether obstacles exist on those roads, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads key video segments containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
In correspondence to the obstacle identification method applied to the terminal provided in the foregoing, an embodiment of the present application provides an obstacle identification device, where the obstacle identification device is applied to the terminal, and as shown in fig. 5, an obstacle identification device 500 in the embodiment of the present application includes:
the detection unit 501 is configured to detect whether obstacle information exists in a road video acquired by the camera in real time based on a preset first deep learning model;
a generation unit 502 for generating an alarm signal if the presence of the obstacle information is detected;
a clipping unit 503, configured to clip the road video based on the alarm signal to obtain a key video segment;
an uploading unit 504, configured to upload the key video segment to a cloud, so as to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, where a computing power of the second deep learning model is better than a computing power of the first deep learning model.
Optionally, the detecting unit 501 includes:
the video frame detection subunit is used for detecting in real time, based on a preset first deep learning model, whether obstacle information exists in each video frame of the road video;
the track analysis subunit is used for carrying out track analysis on obstacle information existing in a target video frame when the target video frame is detected, wherein the target video frame is a first frame video frame with the obstacle information;
and the information determining subunit is used for determining that the obstacle information exists in the road video acquired by the camera if the result of the track analysis meets a preset track condition.
Optionally, the clipping unit 503 includes:
the time determining subunit is used for determining the video starting and ending time based on the generation time of the alarm signal and a preset cutting period;
and the road cutting subunit is used for cutting the road video according to the video start-stop time to obtain the key video segment.
Optionally, the obstacle recognition device 500 further includes:
the acquiring unit is used for acquiring vehicle running information corresponding to the key video segment, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information;
accordingly, the uploading unit 504 is specifically configured to package the key video segment and the vehicle driving information into a verification data packet and upload the verification data packet to the cloud.
As can be seen from the above, in the embodiment of the application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain a road video, road video of every road the vehicle passes is obtained as the vehicle travels, and the terminal identifies this video in real time to judge whether obstacles exist on those roads, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads key video segments containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
Corresponding to the obstacle identification method applied to the cloud end provided in the foregoing, an embodiment of the present application provides an obstacle identification device, where the obstacle identification device is applied to the cloud end, as shown in fig. 6, an obstacle identification device 600 in an embodiment of the present application includes:
a receiving unit 601, configured to receive a key video segment uploaded by a terminal, where the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
a verification unit 602, configured to verify whether the key video segment is valid based on a preset second deep learning model, where the computational capability of the second deep learning model is superior to that of the first deep learning model;
a storage unit 603, configured to generate and store an evidence chain based on the key video segment if the key video segment is confirmed to be valid.
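A minimal sketch of the cloud-side flow follows, under assumptions: second_model is presumed to expose a per-frame detection predicate, and both the majority-vote validity rule and the content of the stored record are illustrative, since the text fixes neither.

```python
# Assumed cloud-side verification: re-check the segment with the second
# (more capable) model and store an evidence record if confirmed valid.
import json
import time

def handle_key_video_segment(frames, second_model, store):
    hits = sum(1 for frame in frames if second_model.detect(frame))
    valid = hits >= max(1, len(frames)) / 2  # assumed validity criterion
    if valid:
        # The evidence chain is reduced to one stored record for illustration.
        store.save(json.dumps({"verified_at": time.time(),
                               "hit_frames": hits}))
    return valid
```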
Optionally, the receiving unit 601 includes:
a data packet receiving subunit, configured to receive a verification data packet uploaded by the terminal, where the verification data packet carries the key video segment;
a data packet parsing subunit, configured to parse the verification data packet to obtain the key video segment.
Optionally, the verification data packet further carries vehicle driving information corresponding to the key video segment, where the vehicle driving information includes vehicle position information, vehicle speed information, and vehicle identification information; the obstacle identification device 600 further includes:
a weight determination unit, configured to determine a corresponding obstacle occurrence weight based on the vehicle driving information if the key video segment is determined to be valid based on the second deep learning model;
accordingly, the storage unit 603 is specifically configured to generate and store an evidence chain based on the key video segment and the vehicle driving information if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold.
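How the driving information maps to an obstacle occurrence weight is left open by the text; the sketch below assumes one simple heuristic purely for illustration (a slowly moving reporting vehicle is treated as a more reliable observer), and the threshold value is likewise assumed.

```python
# Assumed weight heuristic: slower passage past the scene -> higher weight.
WEIGHT_THRESHOLD = 0.4  # preset weight threshold (value assumed)

def obstacle_occurrence_weight(vehicle_speed_kmh):
    return max(0.0, 1.0 - vehicle_speed_kmh / 120.0)

def should_store_evidence(segment_is_valid, vehicle_speed_kmh):
    # Store the evidence chain only for valid segments whose weight
    # clears the preset threshold.
    weight = obstacle_occurrence_weight(vehicle_speed_kmh)
    return segment_is_valid and weight > WEIGHT_THRESHOLD
```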
As can be seen from the above, this embodiment achieves the same beneficial effects as the terminal-side embodiment described above: real-time identification on the terminal greatly improves the efficiency of identifying obstacles on the road, and further verification of the key video segment at the cloud greatly improves the identification accuracy; details are not repeated here.
Corresponding to the obstacle identification method applied to the terminal provided above, an embodiment of the present application provides a terminal. Referring to fig. 7, the terminal 7 in this embodiment of the present application includes: a first memory 701, one or more first processors 702 (only one is shown in fig. 7), and a first computer program stored in the first memory 701 and executable on the first processor. The first memory 701 stores software programs and modules; the first processor 702 executes various functional applications and performs data processing by running the software programs and units stored in the first memory 701, so as to acquire resources corresponding to preset events. Specifically, when the first processor 702 runs the first computer program stored in the first memory 701, the steps of the obstacle identification method applied to the terminal, for example steps 101 to 104 shown in fig. 1, can be implemented; details are not repeated here.
It should be understood that, in this embodiment of the application, the first processor 702 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The first memory 701 may include a read-only memory and a random access memory, and provides instructions and data to the first processor 702. Part or all of the first memory 701 may further include a non-volatile random access memory. For example, the first memory 701 may also store device type information.
Corresponding to the obstacle identification method applied to the cloud provided above, an embodiment of the present application provides a cloud. Referring to fig. 8, the cloud 8 in this embodiment of the present application includes: a second memory 801, one or more second processors 802 (only one is shown in fig. 8), and a second computer program stored in the second memory 801 and executable on the second processor. The second memory 801 stores software programs and modules; the second processor 802 executes various functional applications and performs data processing by running the software programs and units stored in the second memory 801, so as to acquire resources corresponding to preset events. Specifically, when the second processor 802 runs the second computer program stored in the second memory 801, the steps of the obstacle identification method applied to the cloud, for example steps 201 to 203 shown in fig. 2, can be implemented; details are not repeated here.
It should be understood that, in this embodiment of the application, the second processor 802 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The second memory 801 may include a read-only memory and a random access memory, and provides instructions and data to the second processor 802. Part or all of the second memory 801 may further include a non-volatile random access memory. For example, the second memory 801 may also store device type information.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are merely for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis. For parts that are not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division of the modules or units is only one kind of logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable storage medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and shall all be included within the protection scope of the present application.

Claims (13)

1. An obstacle identification method, applied to a terminal, wherein the terminal is mounted on a vehicle, a camera is further mounted on the vehicle, and the obstacle identification method comprises:
detecting in real time, based on a preset first deep learning model, whether obstacle information exists in a road video acquired by the camera;
if the obstacle information is detected to exist, generating an alarm signal;
clipping the road video based on the alarm signal to obtain a key video segment; and
uploading the key video segment to a cloud to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational capability of the second deep learning model is superior to that of the first deep learning model.
2. The obstacle identification method according to claim 1, wherein the detecting in real time, based on a preset first deep learning model, whether obstacle information exists in the road video acquired by the camera comprises:
detecting in real time, based on the preset first deep learning model, whether obstacle information exists in each video frame of the road video;
when a target video frame is detected, performing track analysis on the obstacle information in the target video frame, wherein the target video frame is the first video frame in which the obstacle information appears; and
if the result of the track analysis meets a preset track condition, determining that obstacle information exists in the road video acquired by the camera.
3. The obstacle identification method according to claim 1, wherein the clipping the road video based on the alarm signal to obtain a key video segment comprises:
determining the video start and end times based on the generation time of the alarm signal and a preset clipping period; and
clipping the road video according to the video start and end times to obtain the key video segment.
4. The obstacle identification method according to claim 1, wherein before the uploading the key video segment to a cloud, the obstacle identification method further comprises:
acquiring vehicle driving information corresponding to the key video segment, wherein the vehicle driving information comprises vehicle position information, vehicle speed information, and vehicle identification information;
correspondingly, the uploading the key video segment to a cloud comprises:
packaging the key video segment and the vehicle driving information into a verification data packet and uploading the verification data packet to the cloud.
5. An obstacle identification method, applied to a cloud, the obstacle identification method comprising:
receiving a key video segment uploaded by a terminal, wherein the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
verifying whether the key video segment is valid based on a preset second deep learning model, wherein the computational capability of the second deep learning model is superior to that of the first deep learning model; and
if the key video segment is confirmed to be valid, generating and storing an evidence chain based on the key video segment.
6. The obstacle identification method according to claim 5, wherein the receiving a key video segment uploaded by a terminal comprises:
receiving a verification data packet uploaded by the terminal, wherein the verification data packet carries the key video segment; and
parsing the verification data packet to obtain the key video segment.
7. The obstacle identification method according to claim 6, wherein the verification data packet further carries vehicle driving information corresponding to the key video segment, the vehicle driving information comprising vehicle position information, vehicle speed information, and vehicle identification information; after the verifying whether the key video segment is valid based on a preset second deep learning model, the obstacle identification method further comprises:
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle driving information;
correspondingly, the if the key video segment is confirmed to be valid, generating and storing an evidence chain based on the key video segment comprises:
if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, generating and storing an evidence chain based on the key video segment and the vehicle driving information.
8. An obstacle identification device, applied to a terminal, wherein the terminal is mounted on a vehicle, a camera is further mounted on the vehicle, and the obstacle identification device comprises:
a detection unit, configured to detect in real time, based on a preset first deep learning model, whether obstacle information exists in a road video acquired by the camera;
a generation unit, configured to generate an alarm signal if the obstacle information is detected to exist;
a clipping unit, configured to clip the road video based on the alarm signal to obtain a key video segment; and
an uploading unit, configured to upload the key video segment to a cloud, so as to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational capability of the second deep learning model is superior to that of the first deep learning model.
9. An obstacle identification device, applied to a cloud, the obstacle identification device comprising:
a receiving unit, configured to receive a key video segment uploaded by a terminal, wherein the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
a verification unit, configured to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational capability of the second deep learning model is superior to that of the first deep learning model; and
a storage unit, configured to generate and store an evidence chain based on the key video segment if the key video segment is confirmed to be valid.
10. An obstacle identification system, comprising a terminal, a cloud, and a camera, wherein the terminal and the camera are mounted on the same vehicle;
the terminal comprises:
a detection unit, configured to detect in real time, based on a preset first deep learning model, whether obstacle information exists in a road video acquired by the camera;
a generation unit, configured to generate an alarm signal if the obstacle information is detected to exist;
a clipping unit, configured to clip the road video based on the alarm signal to obtain a key video segment; and
an uploading unit, configured to upload the key video segment to the cloud;
the cloud comprises:
a receiving unit, configured to receive the key video segment uploaded by the terminal;
a verification unit, configured to verify whether the key video segment is valid based on a preset second deep learning model; and
a storage unit, configured to generate and store an evidence chain based on the key video segment if the key video segment is confirmed to be valid;
wherein the computational capability of the second deep learning model is superior to that of the first deep learning model.
11. A terminal, comprising a first memory, a first processor, and a first computer program stored in the first memory and executable on the first processor, wherein the first processor implements the method according to any one of claims 1 to 4 when executing the first computer program.
12. A cloud, comprising a second memory, a second processor, and a second computer program stored in the second memory and executable on the second processor, wherein the second processor implements the method according to any one of claims 5 to 7 when executing the second computer program.
13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4, or, when executed by a processor, implements the method according to any one of claims 5 to 7.
CN202110395709.6A 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud Active CN113255439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110395709.6A CN113255439B (en) 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud

Publications (2)

Publication Number Publication Date
CN113255439A 2021-08-13
CN113255439B CN113255439B (en) 2024-01-12

Family

ID=77220653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110395709.6A Active CN113255439B (en) 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud

Country Status (1)

Country Link
CN (1) CN113255439B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04118360A (en) * 1990-09-06 1992-04-20 Daido Signal Co Ltd Crossing obstruction detector
US20110246471A1 (en) * 2010-04-06 2011-10-06 Selim Shlomo Rakib Retrieving video annotation metadata using a p2p network
US20140118551A1 (en) * 2011-06-16 2014-05-01 Keigo IKEDA Vehicle surrounding-area monitoring apparatus
CN102945603A (en) * 2012-10-26 2013-02-27 青岛海信网络科技股份有限公司 Method for detecting traffic event and electronic police device
CN106297278A (en) * 2015-05-18 2017-01-04 杭州海康威视数字技术股份有限公司 A kind of method and system shedding thing vehicle for inquiry
CN106599832A (en) * 2016-12-09 2017-04-26 重庆邮电大学 Method for detecting and recognizing various types of obstacles based on convolution neural network
CN111723597A (en) * 2019-03-18 2020-09-29 深圳市速腾聚创科技有限公司 Precision detection method and device of tracking algorithm, computer equipment and storage medium
CN110097109A (en) * 2019-04-25 2019-08-06 湖北工业大学 A kind of road environment obstacle detection system and method based on deep learning
CN110430389A (en) * 2019-06-21 2019-11-08 万翼科技有限公司 Image data acquiring method, apparatus, computer equipment and storage medium
CN110889350A (en) * 2019-11-18 2020-03-17 四川西南交大铁路发展股份有限公司 Line obstacle monitoring and alarming system and method based on three-dimensional imaging
CN111563474A (en) * 2020-05-18 2020-08-21 北京茵沃汽车科技有限公司 Obstacle detection method and system based on vehicle-mounted fisheye lens under motion background
CN112424793A (en) * 2020-10-14 2021-02-26 深圳市锐明技术股份有限公司 Object identification method, object identification device and electronic equipment
CN112633176A (en) * 2020-12-24 2021-04-09 广西大学 Rail transit obstacle detection method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IDAN NADAV et al.: "Off-road Path and Obstacle Detection using Monocular Camera", 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE) *
张建勋 et al.: "Obstacle Detection Based on Multi-Feature Image Fusion", 《重庆理工大学学报(自然科学)》 (Journal of Chongqing University of Technology (Natural Science)) *
张欣欣: "Research on Obstacle Distance Measurement for UAVs Based on Binocular Vision", 《中国优秀硕士学位论文全文数据库 (工程科技Ⅱ辑)》 (China Master's Theses Full-text Database, Engineering Science and Technology II) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920728A (en) * 2021-10-11 2022-01-11 南京微达电子科技有限公司 Detection and early warning method and system for obstacles thrown on expressway
CN113920728B (en) * 2021-10-11 2022-08-12 南京微达电子科技有限公司 Detection and early warning method and system for obstacles thrown on highway
CN114882452A (en) * 2022-05-17 2022-08-09 张弛 Track line safety monitoring method, train operation control method and control system
CN115988641A (en) * 2023-03-20 2023-04-18 深圳市美力高集团有限公司 Personnel wireless positioning auxiliary system based on depth data

Also Published As

Publication number Publication date
CN113255439B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN113255439A (en) Obstacle identification method, device, system, terminal and cloud
Uke et al. Moving vehicle detection for measuring traffic count using opencv
CN110532916B (en) Motion trail determination method and device
Kim et al. Crash to not crash: Learn to identify dangerous vehicles using a simulator
CN108460968A (en) A kind of method and device obtaining traffic information based on car networking
CN112424793A (en) Object identification method, object identification device and electronic equipment
KR20210052031A (en) Deep Learning based Traffic Flow Analysis Method and System
CN113128386B (en) Obstacle recognition method, obstacle recognition device and electronic equipment
US20220237919A1 (en) Method, Apparatus, and Computing Device for Lane Recognition
CN114005074B (en) Traffic accident determination method and device and electronic equipment
Zhang et al. A graded offline evaluation framework for intelligent vehicle’s cognitive ability
CN114821421A (en) Traffic abnormal behavior detection method and system
CN115240148A (en) Vehicle behavior detection method and device, storage medium and electronic device
Liu et al. Temporal shift and spatial attention-based two-stream network for traffic risk assessment
Kejriwal et al. Vehicle detection and counting using deep learning based YOLO and deep SORT algorithm for urban traffic management system
CN112447060A (en) Method and device for recognizing lane and computing equipment
WO2024098992A1 (en) Vehicle reversing detection method and apparatus
CN113723273A (en) Vehicle track information determination method and device and computer equipment
CN113160272A (en) Target tracking method and device, electronic equipment and storage medium
Dinh et al. Development of a tracking-based system for automated traffic data collection for roundabouts
CN112700653A (en) Method, device and equipment for judging illegal lane change of vehicle and storage medium
Zhang et al. Smart-rain: A degradation evaluation dataset for autonomous driving in rain
CN111627224A (en) Vehicle speed abnormality detection method, device, equipment and storage medium
Yeh et al. Detection and Recognition of Arrow Traffic Signals using a Two-stage Neural Network Structure.
CN114241373A (en) End-to-end vehicle behavior detection method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant