CN113255439B - Obstacle identification method, device, system, terminal and cloud

Info

Publication number: CN113255439B
Application number: CN202110395709.6A (priority)
Authority: CN (China)
Prior art keywords: obstacle, video segment, vehicle, key video, information
Legal status: Active (application granted)
Other languages: Chinese (zh)
Other versions: CN113255439A (publication of application)
Inventors: 高翔, 何洪刚, 王磊, 方昌銮, 黄凯明
Current assignee: Streamax Technology Co Ltd
Original assignee: Streamax Technology Co Ltd
Application filed by Streamax Technology Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/16 Anti-collision systems
    • G08G1/165 Anti-collision systems for passive traffic, e.g. including static obstacles, trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The application discloses an obstacle identification method, apparatus, system, terminal, cloud, and computer-readable storage medium. In the obstacle identification method applied to the terminal, the terminal is mounted on a vehicle on which a camera is also mounted, and the method includes: detecting in real time, based on a preset first deep learning model, whether obstacle information exists in the road video captured by the camera; generating an alarm signal if the presence of obstacle information is detected; clipping the road video based on the alarm signal to obtain a key video segment; and uploading the key video segment to a cloud to instruct the cloud to verify, based on a preset second deep learning model, whether the key video segment is valid, where the computational power of the second deep learning model is superior to that of the first deep learning model. Through the scheme of the application, obstacles appearing on the road can be identified promptly and accurately.

Description

Obstacle identification method, device, system, terminal and cloud
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an obstacle recognition method, an obstacle recognition device, an obstacle recognition system, a terminal, a cloud end and a computer readable storage medium.
Background
As car ownership increases, road safety is an increasing concern. Given that the quality of drivers varies, littering from vehicles can occur at any time, and during transportation some vehicles may drop the goods they carry due to road bumps. Currently, most cities still rely on sanitation workers actively patrolling to find obstacles on roads. Limited by traffic conditions and patrol speed, obstacles cannot be identified promptly and accurately, leaving hidden dangers for road safety.
Disclosure of Invention
The application provides an obstacle recognition method, an obstacle recognition device, an obstacle recognition system, a terminal, a cloud end and a computer readable storage medium, which can timely and accurately recognize obstacles appearing in a road.
In a first aspect, the present application provides an obstacle identifying method, where the obstacle identifying method is applied to a terminal, where the terminal is mounted on a vehicle, and where a camera is further mounted on the vehicle, the obstacle identifying method includes:
detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
if the presence of obstacle information is detected, generating an alarm signal;
cutting the road video based on the alarm signal to obtain a key video segment;
and uploading the key video segment to a cloud to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational power of the second deep learning model is superior to that of the first deep learning model.
In a second aspect, the present application provides an obstacle identifying method, where the obstacle identifying method is applied to a cloud, and the obstacle identifying method includes:
receiving a key video segment uploaded by a terminal, wherein the key video segment is obtained by cutting a road video by the terminal based on an alarm signal, the road video is collected by a camera installed on a vehicle, and the alarm signal is generated by detecting obstacle information of the road video based on a preset first deep learning model;
verifying whether the key video segment is valid or not based on a preset second deep learning model, wherein the computational power of the second deep learning model is superior to that of the first deep learning model;
and if the key video segment is confirmed to be valid, generating an evidence chain based on the key video segment for storage.
In a third aspect, the present application provides an obstacle recognition device, where the obstacle recognition device is applied to a terminal, where the terminal is mounted on a vehicle, and where a camera is further mounted on the vehicle, the obstacle recognition device includes:
the detection unit is used for detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
a generation unit for generating an alarm signal if the presence of the obstacle information is detected;
the clipping unit is used for clipping the road video based on the alarm signal to obtain a key video segment;
and the uploading unit is used for uploading the key video segment to a cloud to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational power of the second deep learning model is superior to that of the first deep learning model.
In a fourth aspect, the present application provides an obstacle identifying apparatus, where the obstacle identifying apparatus is applied to a cloud end, and the obstacle identifying apparatus includes:
the receiving unit is used for receiving the key video segment uploaded by the terminal, wherein the key video segment is obtained by cutting a road video by the terminal based on an alarm signal, the road video is collected by a camera installed on a vehicle, and the alarm signal is generated by detecting obstacle information of the road video based on a preset first deep learning model;
a verification unit, configured to verify whether the key video segment is valid based on a preset second deep learning model, where the computational power of the second deep learning model is superior to that of the first deep learning model;
and the storage unit is used for generating and storing an evidence chain based on the key video segment if the key video segment is confirmed to be valid.
In a fifth aspect, the present application provides an obstacle recognition system, where the obstacle recognition system includes a terminal, a cloud end, and a camera, where the terminal and the camera are installed on the same vehicle;
the terminal comprises:
the detection unit is used for detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
a generation unit for generating an alarm signal if the presence of the obstacle information is detected;
the clipping unit is used for clipping the road video based on the alarm signal to obtain a key video segment;
the uploading unit is used for uploading the key video segment to the cloud;
the cloud comprises:
the receiving unit is used for receiving the key video segment uploaded by the terminal;
the verification unit is used for verifying whether the key video segment is valid or not based on a preset second deep learning model;
the storage unit is used for generating and storing an evidence chain based on the key video segment if the key video segment is confirmed to be valid;
wherein the computational power of the second deep learning model is superior to the computational power of the first deep learning model.
In a sixth aspect, the present application provides a terminal comprising a first memory, a first processor and a first computer program stored in the first memory and executable on the first processor, the first processor implementing the steps of the method according to the first aspect when executing the first computer program.
In a seventh aspect, the present application provides a cloud end, where the cloud end includes a second memory, a second processor, and a second computer program stored in the second memory and executable on the second processor, where the second processor implements the steps of the method according to the second aspect when executing the second computer program.
In an eighth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect described above; alternatively, the computer program, when executed by a processor, implements the steps of the method of the second aspect described above.
In a ninth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, implements the steps of the method of the first aspect as described above; alternatively, the computer program described above, when executed by one or more processors, implements the steps of the method of the second aspect described above.
Compared with the prior art, the present application has the following beneficial effects: a terminal and a camera are mounted on the vehicle; the terminal can detect in real time, based on a preset first deep learning model, whether obstacle information exists in the road video captured by the camera; when obstacle information exists, an alarm signal is generated and the road video is clipped based on the alarm signal to obtain a key video segment; finally, the key video segment can be uploaded to the cloud to instruct the cloud to verify, based on a preset second deep learning model, whether the key video segment is valid, where the computational power of the second deep learning model is superior to that of the first deep learning model. Since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain road video, road video of the roads the vehicle travels is obtained as the vehicle drives, and the terminal identifies these road videos in real time to judge whether an obstacle exists on the road, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads the key video segment containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road. It will be appreciated that the advantageous effects of the second to ninth aspects can be found in the relevant description of the first aspect and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic implementation flow chart of an obstacle recognition method provided in an embodiment of the present application;
fig. 2 is a schematic implementation flow chart of another obstacle identifying method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an obstacle recognition system according to an embodiment of the present application;
fig. 4 is a schematic architecture diagram of a terminal provided in an embodiment of the present application;
fig. 5 is a block diagram of a structure of an obstacle identifying apparatus provided in an embodiment of the present application;
fig. 6 is a block diagram of another obstacle identifying apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a cloud terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solutions proposed in the present application, the following description is made by specific embodiments.
The following describes an obstacle identification method provided by an embodiment of the present application; this obstacle identification method is applied to a terminal. The terminal is mounted on a vehicle, and a camera is also mounted on the vehicle. By way of example only, the camera may be mounted at the front of the vehicle body (e.g., at the front windshield) or at the rear of the vehicle body (e.g., at the rear windshield). If the camera is mounted at the front of the vehicle body, it faces forward and can capture road video ahead of the vehicle; if it is mounted at the rear, it faces backward and can capture road video behind the vehicle. Referring to fig. 1, the obstacle identification method includes:
Step 101, detecting whether obstacle information exists in the road video captured by the camera in real time based on a preset first deep learning model.
In an embodiment of the present application, the first deep learning model may be an object detection model. For example only, the object detection model may employ the SSD (Single Shot MultiBox Detector) object detection algorithm, the YOLO (You Only Look Once) object detection algorithm, or another lightweight object detection algorithm, which is not limited herein. Through the first deep learning model, the terminal can detect in real time whether obstacle information exists in the road video.
It should be noted that the detection rate of the first deep learning model should be kept at a relatively high level, at least equal to the output frame rate of the camera, so as to improve the identification efficiency of the terminal and ensure real-time identification. For example, if the camera outputs 20 video frames per second to form a smoothly playing road video, the terminal should also keep its detection rate above 20 frames per second.
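For illustration only, the real-time constraint can be sketched as follows in Python; `detect_obstacles` and `on_alarm` are hypothetical placeholders standing in for the first deep learning model and the alarm path, not functions named in the patent:

```python
import time

import cv2

def run_realtime_detection(video_source, detect_obstacles, on_alarm, camera_fps=20):
    """Run the lightweight detector on every frame the camera outputs."""
    cap = cv2.VideoCapture(video_source)  # road video from the vehicle camera
    while cap.isOpened():
        t0 = time.monotonic()
        ok, frame = cap.read()
        if not ok:
            break
        detections = detect_obstacles(frame)  # first model (SSD/YOLO-style)
        if detections:
            on_alarm(time.time(), detections)  # alarm carries the target time point
        # the per-frame budget must stay within the camera's output rate
        if time.monotonic() - t0 > 1.0 / camera_fps:
            print("warning: detection is falling behind the camera frame rate")
    cap.release()
```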
Step 102, if the existence of obstacle information is detected, an alarm signal is generated.
In the embodiment of the present application, when the terminal detects that obstacle information appears in the road video, an alarm signal is generated. Optionally, the alarm signal may carry a target time point, which is the time point at which the presence of the obstacle information was detected.
Step 103, clipping the road video based on the alarm signal to obtain a key video segment.
In the embodiment of the present application, although the first deep learning model can quickly detect whether obstacle information exists in the video, false detections may occur due to its limited computational power. That is, the first deep learning model may mistake background information for obstacle information. Based on this, the terminal may clip out of the road video the video clip in which the first deep learning model determined that obstacle information exists. For ease of description, this video clip is referred to as the key video segment.
Step 104, uploading the key video segment to a cloud end to instruct the cloud end to verify whether the key video segment is valid based on a preset second deep learning model.
In the embodiment of the present application, as can be seen from step 103, the key video segment is a video segment in which the first deep learning model determined that obstacle information exists; but considering that the first deep learning model may produce false detections, it cannot be fully confirmed that obstacle information objectively exists in the key video segment. Based on this, the key video segment may be uploaded to the cloud, and the cloud verifies whether the key video segment is valid based on a preset second deep learning model; that is, the cloud further judges whether obstacle information exists in the key video segment. It should be noted that the computational power of the second deep learning model adopted by the cloud is superior to that of the first deep learning model adopted by the terminal; that is, the accuracy and precision of the second deep learning model are higher than those of the first deep learning model.
By way of example only, the second deep learning model may be a semantic segmentation model, through which the key video segment is semantically segmented frame by frame and the frame-by-frame segmentation results are analyzed; for example, whether the trajectory obtained from the per-frame segmentation results (i.e., the obstacle regions) is reasonable may be judged by frame differencing between adjacent video frames, so as to confirm whether the key video segment is valid. A key video segment verified as valid by the second deep learning model is highly likely to have actually captured an obstacle on the road.
In some embodiments, in order to ensure the accuracy of the alarm signal, the step 101 specifically includes:
A1, detecting in real time, based on a preset first deep learning model, whether obstacle information exists in each video frame of the road video.
A2, when a target video frame is detected, performing track analysis on the obstacle information existing in the target video frame, where the target video frame is the first video frame in which obstacle information exists.
A3, if the track analysis result meets a preset track condition, determining that obstacle information exists in the road video captured by the camera.
The following scenario can be envisioned: there is an obstacle at a certain spot on the road. As the vehicle drives toward that spot, the obstacle gets closer and closer to the vehicle until it enters the camera's field of view; finally, the vehicle passes the obstacle and drives away, and the obstacle leaves the camera's field of view. That is, the camera goes through the following process: the obstacle is not captured; the obstacle is captured and has a stable trajectory in the picture; the obstacle disappears. Expressed in the road video, this means: no obstacle information exists in the first few video frames; obstacle information exists in the middle video frames, and the image positions corresponding to the obstacle information in these frames form a continuous trajectory; and no obstacle information exists in the last few video frames.
Based on this, the terminal may detect in real time, in the temporal order of the video frames in the road video, whether each video frame contains obstacle information based on the first deep learning model. Once the first video frame containing obstacle information (i.e., the target video frame) is detected, track analysis may be performed on that obstacle information, specifically: extracting the obstacle information corresponding to each of the N consecutive video frames after the target video frame, where N is a positive integer that may be set to, for example, 30 or 50, without limitation; obtaining the position information of each piece of obstacle information in its corresponding video frame, where the position information may be understood as the center coordinates of the region occupied by the obstacle information in that frame; and fitting an obstacle trajectory based on these positions, the obstacle trajectory being the track analysis result. It will be appreciated that the obstacle trajectory should at least meet the following track conditions: the trajectory is continuous, with no abrupt changes, i.e., no track point deviates too far from the trajectory; and the trajectory route matches the target route corresponding to the vehicle's driving direction. For example, if the vehicle drives forward and the camera captures the road ahead of it, the captured obstacle traces a top-to-bottom route in the road video, and this route is the target route. After the obstacle trajectory is determined to meet all track conditions, it can be determined that obstacle information exists in the road video captured by the camera.
It should be noted that, considering that obstacles may be encountered at multiple spots while the vehicle is driving, the first video frame containing obstacle information (i.e., the target video frame) in the present application refers to the first such video frame in the road video obtained from the camera's current power-on to the current moment. In fact, any video frame in which obstacle information is detected can be regarded as the first frame containing the obstacle, as long as none of the M video frames preceding it contains that obstacle information. M is a positive integer and may be, for example, 10, 20, or 50, without limitation.
For example, if no obstacle information exists in any of the 1st to 100th video frames and obstacle information exists in the 101st video frame, the 101st video frame may be used as the target video frame and steps A2 and A3 are performed; afterwards, if no obstacle information exists in the 150th to 200th video frames and obstacle information exists in the 201st video frame, the 201st video frame may again be used as the target video frame and steps A2 and A3 are performed.
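A minimal sketch of the track analysis in steps A2 and A3, assuming one obstacle center per frame has already been extracted; the jump threshold is an illustrative assumption, not a value from the patent:

```python
import numpy as np

def track_is_valid(centers, max_jump=40.0, expect_downward=True):
    """centers: (cx, cy) of the obstacle region in N consecutive video frames."""
    pts = np.asarray(centers, dtype=float)
    if len(pts) < 2:
        return False
    # track condition 1: the trajectory is continuous, with no abrupt jumps
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    if steps.max() > max_jump:
        return False
    # track condition 2: the route matches the driving direction; a front-facing
    # camera on a forward-moving vehicle sees the obstacle move top to bottom
    dy = pts[-1, 1] - pts[0, 1]
    return dy > 0 if expect_downward else dy < 0
```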
In some embodiments, in order to improve the processing efficiency of the cloud for the key video segment, the step 103 specifically includes:
B1, determining the video start-stop time based on the generation time of the alarm signal and a preset clipping period.
B2, clipping the road video according to the video start-stop time to obtain the key video segment.
The preset clipping period may be set by the user, for example, to 10 seconds. Taking the generation time of the alarm signal as the center point of the key video segment, the terminal may reserve half the clipping period before it and half after it. For example, if the alarm signal is generated at 10:25:48 on a given day and the clipping period is 10 seconds, the video start-stop time is 10:25:43 to 10:25:53. Each video frame of the road video corresponds to a timestamp indicating its output time. Therefore, the terminal can extract all video frames whose output times fall within the video start-stop time, obtaining the clipped key video segment.
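A minimal sketch of steps B1 and B2, assuming the terminal buffers (timestamp, frame) pairs with timestamps in seconds:

```python
def clip_key_video(frames, alarm_time, clip_period=10.0):
    """frames: iterable of (timestamp, frame) pairs buffered by the terminal."""
    start = alarm_time - clip_period / 2  # half the clipping period before the alarm
    stop = alarm_time + clip_period / 2   # half the clipping period after the alarm
    return [(t, f) for t, f in frames if start <= t <= stop]
```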
In some embodiments, in order for the cloud to further understand the obstacle and to locate in time the place where the obstacle appears, the obstacle identification method further includes, before step 104:
acquiring vehicle driving information corresponding to the key video segment, where the vehicle driving information includes vehicle position information, vehicle speed information, and vehicle identification information;
accordingly, the step 104 includes:
and packaging the key video segment and the vehicle driving information into a verification data packet, and uploading the verification data packet to the cloud.
Although the position of the vehicle changes in real time while driving, the vehicle cannot move instantaneously; that is, the vehicle can typically travel only a short distance (e.g., tens of meters) within a short time (e.g., 10 seconds). Based on this, the vehicle position information may include the position of the vehicle at the time the alarm signal is generated. Likewise, the vehicle speed information may include the speed of the vehicle at the time the alarm signal is generated.
Of course, the vehicle position information may instead include the position of the vehicle at the start time and at the stop time of the key video segment, or the position of the vehicle at each time point of the key video segment, which is not limited herein. Similarly, the vehicle speed information may include the average speed of the vehicle during the video start-stop time, or the speed of the vehicle at each time point of the key video segment, which is not limited herein.
The vehicle identification information may be the license plate number of the vehicle; alternatively, it may be the vehicle identification number (Vehicle Identification Number, VIN), which is not limited herein. In practice, the vehicle identification information uniquely identifies the vehicle.
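By way of illustration, packaging could look like the following sketch; the JSON field names are assumptions, not a format defined by the patent:

```python
import json

def build_verification_packet(video_ref, lat, lon, speed_kmh, vehicle_id):
    packet = {
        "key_video": video_ref,                        # the clipped key video segment
        "vehicle_position": {"lat": lat, "lon": lon},  # e.g., at alarm generation time
        "vehicle_speed_kmh": speed_kmh,
        "vehicle_id": vehicle_id,                      # license plate number or VIN
    }
    return json.dumps(packet).encode("utf-8")
```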
As can be seen from the above, in the embodiment of the present application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain road video, road video of the roads the vehicle travels is obtained as the vehicle drives. The terminal identifies these road videos in real time to judge whether an obstacle exists on the road the vehicle travels, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads the key video segment containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
The following describes an obstacle recognition method provided by the embodiment of the present application, where the obstacle recognition method provided by the embodiment of the present application is applied to a cloud. Referring to fig. 2, the obstacle identifying method includes:
Step 201, receiving the key video segment uploaded by the terminal.
In this embodiment of the present application, the key video segment is obtained by the terminal clipping the road video based on an alarm signal; the road video is captured by a camera mounted on a vehicle; and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model. For details, reference may be made to the previous embodiment, which is not repeated here. The cloud only needs to receive the key video segment uploaded by the terminal.
In some embodiments, the terminal packages the key video segment and its corresponding vehicle driving information together into a verification data packet for uploading. In this case, the cloud first receives the verification data packet uploaded by the terminal and then parses it to obtain the key video segment and the corresponding vehicle driving information carried within.
Step 202, verifying whether the key video segment is valid based on a preset second deep learning model.
In the embodiment of the present application, although the key video segment is a video segment in which the first deep learning model determined that obstacle information exists, considering that the first deep learning model may produce false detections, it cannot be fully confirmed that obstacle information objectively exists in the key video segment. Based on this, the cloud verifies whether the key video segment is valid based on a preset second deep learning model; that is, the cloud further judges whether obstacle information exists in the key video segment. It should be noted that the computational power of the second deep learning model adopted by the cloud is superior to that of the first deep learning model adopted by the terminal; that is, the accuracy and precision of the second deep learning model are higher than those of the first deep learning model.
By way of example only, the second deep learning model may be a semantic segmentation model, through which the key video segment is semantically segmented frame by frame and the frame-by-frame segmentation results are analyzed; for example, whether the trajectory obtained from the per-frame segmentation results (i.e., the obstacle regions) is reasonable may be judged by frame differencing between adjacent video frames, so as to confirm whether the key video segment is valid. A key video segment verified as valid by the second deep learning model is highly likely to have actually captured an obstacle on the road.
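A minimal sketch of this cloud-side check, assuming a hypothetical `segment(frame)` helper that returns a binary obstacle mask per frame; the centroid-continuity rule and its threshold are illustrative assumptions standing in for the patent's frame-difference reasonableness test:

```python
import numpy as np

def verify_key_video(frames, segment, max_jump=40.0):
    """Judge a key video segment valid if per-frame obstacle regions form a track."""
    centers = []
    for frame in frames:
        mask = segment(frame)  # binary obstacle mask from the second model
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            continue  # no obstacle region segmented in this frame
        centers.append((xs.mean(), ys.mean()))  # centroid of the obstacle region
    if len(centers) < 2:
        return False
    steps = np.linalg.norm(np.diff(np.asarray(centers), axis=0), axis=1)
    return float(steps.max()) <= max_jump  # the trajectory must be continuous
```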
Step 203, if the key video segment is confirmed to be valid, generating an evidence chain based on the key video segment and storing the evidence chain.
In this embodiment of the present application, if the key video segment is verified as valid by the second deep learning model, it is highly likely to have actually captured an obstacle on the road, and at this time an evidence chain may be generated and stored based on the key video segment. Considering that the verification data packet may also carry the vehicle driving information corresponding to the key video segment, the cloud may store the key video segment and the vehicle driving information together as an evidence chain in a preset database. Supervisors can later retrieve relevant evidence chains from the database using the obstacle location, license plate number, vehicle speed, and the like as search terms.
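By way of illustration only, evidence-chain storage could look like the following sqlite3 sketch; the table schema and field names are assumptions chosen so that supervisors can search by location, license plate, or speed:

```python
import sqlite3

def store_evidence_chain(db_path, video_blob, lat, lon, speed_kmh, vehicle_id):
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS evidence_chain (
               id INTEGER PRIMARY KEY,
               video BLOB, lat REAL, lon REAL, speed_kmh REAL, vehicle_id TEXT)"""
    )
    con.execute(
        "INSERT INTO evidence_chain (video, lat, lon, speed_kmh, vehicle_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (video_blob, lat, lon, speed_kmh, vehicle_id),
    )
    con.commit()
    con.close()
```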
In some embodiments, the cloud may further judge the key video segment in combination with big data; the obstacle identification method further includes:
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle driving information;
accordingly, the step 203 includes:
and if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, generating and storing an evidence chain based on the key video segment and the vehicle driving information.
According to the stored content in the database, the cloud may assign, from the location dimension, a location weight to each location based on the frequency of throwing events at different locations; from the time dimension, a time weight to each time period based on the frequency of throwing events in different time periods; and from the vehicle-type dimension, a vehicle-type weight to each vehicle type based on the frequency of throwing events for different vehicle types. Then, based on the vehicle position information, vehicle speed information, and vehicle identification information in the vehicle driving information of the key video segment, the corresponding target location weight, target time weight, and target vehicle-type weight are determined; the normalized value of the sum of the three is the obstacle occurrence weight.
For example, suppose the database stores 100 evidence chains in total, of which 50 relate to location A (i.e., the vehicle position information of 50 key video segments is location A); then the location weight of location A is 0.5. By analogy, the location weight of location B is 0.3 and that of location C is 0.1; the weights of other locations D, E, F, etc. are all less than 0.1 and are not listed one by one; and the weight threshold is set to 0.15. For a new key video segment received by the cloud, suppose it is determined to be valid by the second deep learning model and the vehicle position information in its vehicle driving information is location A; its target location weight is then 0.5. Similarly, suppose its target time weight is 0.3 and its target vehicle-type weight is 0.1; the normalized value of the sum of the three is then calculated to be 0.25, which is the obstacle occurrence weight. Since this is higher than the weight threshold of 0.15, the cloud may generate an evidence chain based on the key video segment and the corresponding vehicle driving information for storage.
Otherwise, if a key video segment is valid but its obstacle occurrence weight is not higher than the weight threshold, the key video segment may be marked as a video to be manually reviewed, awaiting manual review by a supervisor. If the key video segment is confirmed to be valid after manual review, an evidence chain may be generated and stored based on the key video segment and the corresponding vehicle driving information.
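For illustration, the weight computation described above can be sketched as follows; since the patent does not specify the normalization rule (its worked example normalizes a sum of 0.9 to 0.25), the simple mean used here is only a stand-in:

```python
def obstacle_occurrence_weight(place_counts, time_counts, type_counts,
                               place, time_slot, vehicle_type):
    """place_counts etc.: evidence-chain counts per location/time period/vehicle type."""
    def freq_weight(counts, key):
        total = sum(counts.values())
        return counts.get(key, 0) / total if total else 0.0
    w_place = freq_weight(place_counts, place)  # e.g., 50/100 = 0.5 for location A
    w_time = freq_weight(time_counts, time_slot)
    w_type = freq_weight(type_counts, vehicle_type)
    return (w_place + w_time + w_type) / 3  # illustrative normalization of the sum
```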
In some embodiments, invalid key video segments may be pushed back to the terminal, which uses them as negative samples to optimize the first deep learning model and thereby improve its accuracy and precision. Invalid key video segments include not only those confirmed invalid after manual review, but also those determined invalid by the second deep learning model. In addition, the cloud may use the key video segments confirmed invalid after manual review as negative samples to optimize the second deep learning model, so as to improve its accuracy and precision.
In some embodiments, taking the second deep learning model as an example of a semantic segmentation model, the semantic segmentation model may be trained based on a training sample set in combination with an attention mechanism.
The training sample set includes at least one training image and the mask image associated with each training image. That is, the training sample set includes at least one image pair, each consisting of a training image and its corresponding mask image. Considering that the obstacle identification method in the embodiment of the present application is mainly used to identify obstacles on roads, possible obstacle situations on a road may first be simulated, and the simulated scenes photographed to obtain a number of training images. Then, for each training image, a mask image is obtained based on the marked obstacle region: in the training image, the pixels of the obstacle region are marked with the value 1 and the pixels of the background region (i.e., the region other than the obstacle region) are marked with the value 0, whereby a mask image uniquely corresponding to each training image is obtained. Of course, images captured by the camera while the vehicle is driving can also be used as training images, with the corresponding mask images obtained by manual labeling, forming new image pairs and enriching the training sample set.
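A minimal sketch of the mask construction just described, assuming the marked obstacle region is given as an axis-aligned box; the helper name is hypothetical:

```python
import numpy as np

def make_mask(height, width, obstacle_box):
    """obstacle_box: (x, y, w, h) of the marked obstacle region in the training image."""
    mask = np.zeros((height, width), dtype=np.uint8)  # background pixels marked 0
    x, y, w, h = obstacle_box
    mask[y:y + h, x:x + w] = 1                        # obstacle pixels marked 1
    return mask
```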
The training process of the semantic segmentation model is described below. The semantic segmentation model includes at least one convolution-pooling structure, where a convolution-pooling structure includes at least one convolutional layer and one pooling layer. The overall training procedure of the semantic segmentation model is similar to the prior art, but for each convolution-pooling structure, the following steps may be added in combination with an attention mechanism:
C1, acquiring the first feature map output by the convolution-pooling structure.
In the embodiment of the present application, if the convolution-pooling structure is the first convolution-pooling structure in the semantic segmentation model, the input of the convolution-pooling structure is the training image input in the present training; otherwise, if the convolution-pooling structure is not the first convolution-pooling structure in the semantic segmentation model, the input of the convolution-pooling structure is the feature map transferred based on the output of the last convolution-pooling structure. For convenience of explanation, the training image input in this training is referred to as a target training image in this embodiment of the present application.
For any convolution-pooling structure, the input of the convolution-pooling structure firstly carries out convolution operation through each convolution layer of the convolution-pooling structure, and then carries out pooling operation on the result of the convolution operation through a pooling layer, so that a characteristic diagram output by the convolution-pooling structure can be obtained. It should be noted that the pooling operation performed by the pooling layer is usually maximum pooling. For ease of illustration, the embodiment of the present application refers to the feature map output by the convolution-pooling structure as the first feature map. In fact, in the semantic segmentation model, the first feature map output by the convolution-pooling structure is derived directly or indirectly based on the target training image, regardless of the location it is in.
C2, stitching the first feature map and the target mask image to obtain a stitched feature map.
In the embodiment of the present application, the attention mechanism is adopted mainly so that the semantic segmentation model pays more attention to the obstacle information in the image; that is, the obstacle information, relative to the background information in the image, is the object to be detected. Based on this, a target mask image may be obtained from the training sample set; the target mask image is the mask image associated with the target training image in the training sample set. Through a direct stitching operation, the target mask image and the first feature map are stitched together, and the resulting new image may be recorded as the stitched feature map.
In some embodiments, when the first feature map and the target mask image are stitched, the target mask image is first adjusted to a target size, where the target size is the size of the first feature map; that is, the sizes of the first feature map and the target mask image are unified. The target mask image is then duplicated so that the number of target mask images equals the number of channels of the first feature map, and finally the first feature map is stitched with all the target mask images in the channel dimension to obtain the stitched feature map. The stitched feature map obtained through this process keeps the target size, while its number of channels becomes twice that of the first feature map. For example only, the target mask image may be scaled using linear interpolation to unify the sizes of the first feature map and the target mask image; the interpolation method is not limited herein.
C3, fusing the stitched feature map to obtain a second feature map.
In this embodiment of the present application, a simple stitching operation does not let the target mask image directly influence the first feature map; what is desired is for the information in the target mask image to act directly on the first feature map, so that the semantic segmentation model pays relatively more attention to the data related to obstacle information in the first feature map and relatively ignores the data related to background information. Based on this, the stitched feature map may be fused, that is, the data of pixels at the same position across the channels are fused together, so as to obtain the second feature map.
In some embodiments, the second feature map may be obtained by fusing the stitched feature map in the channel dimension using a 1×1 convolution kernel.
C4, passing the second feature map to the next network layer after the convolution-pooling structure in the semantic segmentation model.
In the embodiment of the present application, after the second feature map is obtained, the second feature map may be transferred to the next network layer through the activation function, and used as an input of the next network layer; or, the second feature map may be subjected to mean pooling to perform secondary dimension reduction, and the second feature map after the secondary dimension reduction may be transferred to a next network layer through an activation function, which is used as an input of the next network layer, which is not limited herein. Assuming that the current convolution-pooling structure is not the last convolution-pooling structure, the next network layer is typically the convolution layer (which is the first convolution layer of the next convolution-pooling structure); assuming that the current convolution-pooling structure is the last convolution-pooling structure, the next network layer is typically the fully connected layer.
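As an illustration only, the following PyTorch sketch assembles steps C1 to C4 into a single module; the two 3×3 convolutional layers, the activation choices, and the module name are illustrative assumptions rather than the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskAttentionConvPool(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(2)                   # max pooling (C1)
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)  # 1x1 fusion kernel (C3)

    def forward(self, x, target_mask):
        """target_mask: (B, 1, H, W) float tensor with obstacle pixels = 1."""
        f1 = self.pool(self.conv(x))                  # first feature map (C1)
        # resize the mask to the feature-map size, then replicate across channels (C2)
        m = F.interpolate(target_mask, size=f1.shape[-2:], mode="bilinear",
                          align_corners=False)
        m = m.expand(-1, f1.shape[1], -1, -1)
        stitched = torch.cat([f1, m], dim=1)          # channel-dim stitching (C2)
        f2 = self.fuse(stitched)                      # second feature map (C3)
        return torch.relu(f2)                         # passed to the next layer (C4)
```

In a UNet-style model, one such block could stand in for each plain convolution-pooling structure, with the same target mask image passed to every block during training.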
It will be appreciated that the embodiment of the present application actually improves the training process of the existing semantic segmentation model by incorporating an attention mechanism, adding steps C1 to C4 after each convolution-pooling structure. During training, the semantic segmentation model finally outputs a training result corresponding to the target training image (the training result is in mask form); the training result is compared with the target mask image to calculate a loss, and the parameters of the semantic segmentation model are adjusted based on the loss until the loss converges, at which point training ends.
The above procedure is described below by way of a simple specific example:
the semantic segmentation model is assumed to have a plurality of convolution-pooling structures, namely convolution-pooling structures 1, 2, 3, … … and n respectively.
For convolution pooling structure 1, the input is a target training image of h×w (e.g., 300×300), where H is high and W is wide. After passing through the convolutional layer 11, the convolutional layer 12 and the pooling layer 13 in the convolutional-pooling structure 1, a first feature map F11 of H ' ×w ' ×c ' (for example, 32×32×64) is obtained, where C is the number of channels.
The target mask images of H ' W are duplicated after being adjusted to the size of H ' W ', and C target mask images of H ' W ' are obtained; the C target mask images of H 'W' are stitched with the first feature map F11 of H 'W' C 'in the channel dimension to obtain a stitched feature map F12 of H' W '(C' +c ') (i.e., H' W '×2c').
The fusion of the concatenated feature map F12 of H ' ×w ' +c ' (C ' +c ') is performed by a 1×1 convolution kernel, so that on the one hand, the computational effort required for the subsequent semantic segmentation model can be reduced, and on the other hand, the channel dimension reduction is realized, so that the concatenated feature map F12 is fused into the second feature map F13 of H ' ×w ' ×c″ in the channel dimension.
The second feature map F13 of H '×w' ×c″ is transferred to the convolution pooling structure 2 through an activation function, and is used as an input of the convolution pooling structure 2, and the working process thereof is substantially the same as that of the convolution pooling structure 1, which is not described herein.
In some embodiments, the value at each position of the second feature map actually represents the degree of attention at that position. Therefore, after the second feature map is obtained, the feature map under a random channel of the second feature map may be extracted as the feature map to be displayed, and a heat map may be generated based on it. Mathematically, this converts a three-dimensional feature map (the H′×W′×C″ second feature map) into a corresponding two-dimensional matrix (the H′×W′ feature map to be displayed). Generating the heat map based on the feature map to be displayed may specifically be: adjusting the feature map to be displayed to the original size, where the original size is the size of the target training image, and then generating the heat map from the adjusted feature map to be displayed. For a pixel with coordinates (x, y) in the adjusted feature map to be displayed, the larger its value, the hotter the point at the same position (x, y) in the generated heat map.
In some embodiments, the semantic segmentation model typically adopts a UNet network structure. The later the convolution-pooling structure, the more faithfully the second feature map obtained from it describes the attention over the target training image. That is, among the convolution-pooling structures 1, 2, ..., n of the semantic segmentation model, the heat map generated from the second feature map of convolution-pooling structure n is relatively the best.
It will be appreciated that the first deep learning model may also be trained in the manner described above in connection with the attention mechanism, and is not limited herein.
As can be seen from the above, in the embodiment of the present application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain road video, road video of the roads the vehicle travels is obtained as the vehicle drives. The terminal identifies these road videos in real time to judge whether an obstacle exists on the road the vehicle travels, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads the key video segment containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
The embodiment of the present application further provides an obstacle identification system, as shown in fig. 3, which includes at least one terminal 31, a cloud 32, and at least one camera 33. One terminal can be connected with one or more cameras, and a connected terminal and camera are mounted on the same vehicle. Each terminal 31 performs the steps of the obstacle identification method applied to the terminal (i.e., steps 101-104), and the cloud 32 performs the steps of the obstacle identification method applied to the cloud (i.e., steps 201-203).
In some embodiments, as shown in fig. 4, one terminal 31 may be split into multiple devices, for example, into a first device and a second device, where the first device establishes a connection with the second device and the camera 33, and the second device establishes a connection with the first device, the camera 33, and the cloud 32. The first device may perform the operations of steps 101 and 102 described above and transmit the generated alarm signal to the second device; the second device performs the operations of steps 103 and 104 after receiving the alarm signal transmitted by the first device.
As can be seen from the above, in the embodiment of the present application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain road video, road video of the roads the vehicle travels is obtained as the vehicle drives. The terminal identifies these road videos in real time to judge whether an obstacle exists on the road the vehicle travels, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads the key video segment containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
Corresponding to the above-provided obstacle identifying method applied to the terminal, the embodiment of the present application provides an obstacle identifying apparatus, where the obstacle identifying apparatus is applied to the terminal, as shown in fig. 5, an obstacle identifying apparatus 500 in the embodiment of the present application includes:
a detection unit 501, configured to detect, in real time, whether obstacle information exists in the road video acquired by the camera based on a preset first deep learning model;
a generating unit 502, configured to generate an alarm signal if it is detected that there is obstacle information;
a clipping unit 503, configured to clip the road video based on the alarm signal to obtain a key video segment;
and an uploading unit 504, configured to upload the key video segment to a cloud end, so as to instruct the cloud end to verify whether the key video segment is valid based on a preset second deep learning model, where the computational power of the second deep learning model is better than the computational power of the first deep learning model.
Optionally, the detecting unit 501 includes:
the video frame detection subunit is used for detecting whether obstacle information exists in each frame of video frame in the road video in real time based on a preset first deep learning model;
a track analysis subunit, configured to perform track analysis on obstacle information existing in a target video frame when the target video frame is detected, where the target video frame is the first video frame in which the obstacle information exists;
and the information determination subunit is used for determining that barrier information exists in the road video acquired by the camera if the track analysis result meets the preset track condition.
Optionally, the clipping unit 503 includes:
the time determining subunit is used for determining the video start-stop time based on the generation time of the alarm signal and a preset cutting period;
and the road clipping subunit is used for clipping the road video according to the video start-stop time to obtain the key video segment.
Optionally, the obstacle identifying apparatus 500 further includes:
an obtaining unit, configured to obtain vehicle driving information corresponding to the key video segment, where the vehicle driving information includes vehicle position information, vehicle speed information, and vehicle identification information;
accordingly, the uploading unit 504 is specifically configured to package the key video segment and the vehicle driving information into a verification data packet, and upload the verification data packet to the cloud.
As can be seen from the above, in the embodiment of the present application, since the terminal and the camera are both mounted on the vehicle and the camera captures the road to obtain road video, road video of the roads the vehicle travels is obtained as the vehicle drives. The terminal identifies these road videos in real time to judge whether an obstacle exists on the road the vehicle travels, which greatly improves the efficiency of identifying obstacles on the road. In addition, the terminal uploads the key video segment containing obstacle information to the cloud for further verification, which greatly improves the accuracy of identifying obstacles on the road.
Corresponding to the above-mentioned obstacle recognition method applied to the cloud, the embodiment of the present application provides an obstacle recognition device, where the obstacle recognition device is applied to the cloud, as shown in fig. 6, an obstacle recognition device 600 in the embodiment of the present application includes:
a receiving unit 601, configured to receive a key video segment uploaded by a terminal, where the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
a verification unit 602, configured to verify whether the key video segment is valid based on a preset second deep learning model, where the computational power of the second deep learning model is better than the computational power of the first deep learning model;
and the storage unit 603 is configured to generate and store an evidence chain based on the key video segment if the key video segment is confirmed to be valid.
Optionally, the receiving unit 601 includes:
a data packet receiving subunit, configured to receive an authentication data packet uploaded by the terminal, where the authentication data packet carries the key video segment;
and the data packet analysis subunit is used for analyzing the verification data packet to obtain the key video segment.
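On the cloud side, the parsing subunit could be sketched as the inverse of the packaging step sketched earlier for the terminal side; again, the packet format is an assumption made for illustration:

import base64
import json

def parse_verification_packet(packet_str):
    """Hypothetical data packet analysis subunit: recover the key video
    segment and the vehicle running information from the packet."""
    data = json.loads(packet_str)
    segment_bytes = base64.b64decode(data["key_video_segment"])
    vehicle_running_info = {
        "vehicle_position": data["vehicle_position"],
        "vehicle_speed": data["vehicle_speed"],
        "vehicle_id": data["vehicle_id"],
    }
    return segment_bytes, vehicle_running_info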
Optionally, the verification data packet further carries vehicle running information corresponding to the key video segment, where the vehicle running information includes vehicle position information, vehicle speed information and vehicle identification information; the obstacle identifying apparatus 600 further includes:
a weight determining unit, configured to determine a corresponding obstacle occurrence weight based on the vehicle running information if the key video segment is determined to be valid based on the second deep learning model;
accordingly, the storage unit 603 is specifically configured to generate and store an evidence chain based on the key video segment and the vehicle running information if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold.
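The claims below define the obstacle occurrence weight as the normalized value of the sum of a target place weight, a target time weight and a target vehicle type weight derived from the vehicle running information. The following is a sketch under that definition, where averaging as the normalization and the threshold value are assumptions:

def obstacle_occurrence_weight(place_weight, time_weight, vehicle_type_weight):
    """Normalized value of the sum of the three component weights; plain
    averaging is one assumed normalization, given components in [0, 1]."""
    return (place_weight + time_weight + vehicle_type_weight) / 3.0

def should_store_evidence_chain(segment_valid, occurrence_weight, threshold=0.6):
    # The 0.6 threshold is an illustrative assumption for the preset
    # weight threshold, not a value given in this application.
    return segment_valid and occurrence_weight > threshold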
As with the terminal-side apparatus described above, this arrangement lets the terminal recognize road videos in real time while the cloud further verifies the uploaded key video segments, which greatly improves both the efficiency and the accuracy of identifying obstacles on the road.
Corresponding to the obstacle identifying method applied to the terminal provided above, an embodiment of the present application provides a terminal. Referring to fig. 7, the terminal 7 in the embodiment of the present application includes: a first memory 701, one or more first processors 702 (only one is shown in fig. 7), and a first computer program stored in the first memory 701 and executable on the first processor 702. The first memory 701 is used for storing software programs and modules, and the first processor 702 executes various functional applications and data processing by running the software programs and units stored in the first memory 701, so as to obtain resources corresponding to preset events. Specifically, the first processor 702 may implement the steps of the obstacle identifying method applied to the terminal, such as steps 101 to 104 shown in fig. 1, by executing the first computer program stored in the first memory 701, which will not be described again herein.
It should be understood that in the embodiments of the present application, the first processor 702 may be a central processing unit (Central Processing Unit, CPU); the first processor 702 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or any conventional processor or the like.
The first memory 701 may include read-only memory and random access memory, and provides instructions and data to the first processor 702. Part or all of the first memory 701 may also include nonvolatile random access memory. For example, the first memory 701 may also store device type information.
Corresponding to the obstacle recognition method applied to the cloud provided above, an embodiment of the present application provides a cloud. Referring to fig. 8, the cloud 8 in the embodiment of the present application includes: a second memory 801, one or more second processors 802 (only one is shown in fig. 8), and a second computer program stored in the second memory 801 and executable on the second processor 802. The second memory 801 is used for storing software programs and modules, and the second processor 802 executes various functional applications and data processing by running the software programs and units stored in the second memory 801, so as to obtain resources corresponding to preset events. Specifically, the second processor 802 may implement the steps of the obstacle recognition method applied to the cloud, for example, steps 201 to 203 shown in fig. 2, by executing the second computer program stored in the second memory 801, which will not be described again herein.
It should be understood that in the embodiments of the present application, the second processor 802 may be a central processing unit (Central Processing Unit, CPU); the second processor 802 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or any conventional processor or the like.
The second memory 801 may include read-only memory and random access memory, and provides instructions and data to the second processor 802. Part or all of the second memory 801 may also include nonvolatile random access memory. For example, the second memory 801 may also store device type information.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of the functional units and modules is illustrated. In practical applications, the above functions may be distributed among different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which will not be described again herein.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functions are performed in hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered as going beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of modules or units described above is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing associated hardware. The computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer readable memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (13)

1. An obstacle recognition method, wherein the obstacle recognition method is applied to a terminal, the terminal is mounted on a vehicle, and a camera is further mounted on the vehicle, the obstacle recognition method comprising:
detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
if obstacle information is detected, generating an alarm signal;
clipping the road video based on the alarm signal to obtain a key video segment;
uploading the key video segment to a cloud end to instruct the cloud end to verify whether the key video segment is valid or not based on a preset second deep learning model, wherein the computational power of the second deep learning model is better than that of the first deep learning model;
wherein before uploading the key video segment to the cloud, the obstacle recognition method further comprises:
acquiring vehicle running information corresponding to the key video segment, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information;
after verifying whether the key video segment is valid based on the preset second deep learning model, the obstacle recognition method further comprises:
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle running information; determining a corresponding target place weight, a target time weight and a target vehicle type weight based on the vehicle position information, the vehicle speed information and the vehicle identification information in the vehicle running information of the key video segment, wherein the normalized value of the sum of the three is the obstacle occurrence weight;
and if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, instructing the cloud to generate and store an evidence chain based on the key video segment and the vehicle running information.
2. The obstacle recognition method as claimed in claim 1, wherein the detecting whether the obstacle information exists in the road video acquired by the camera in real time based on the preset first deep learning model comprises:
detecting, in real time, whether obstacle information exists in each video frame of the road video based on the preset first deep learning model;
when a target video frame is detected, performing track analysis on the obstacle information present in the target video frame, wherein the target video frame is the first video frame in which the obstacle information appears;
and if the track analysis result meets a preset track condition, determining that obstacle information exists in the road video acquired by the camera.
3. The obstacle recognition method according to claim 1, wherein the clipping the road video based on the alarm signal to obtain a key video segment comprises:
determining the video start and stop times based on the generation time of the alarm signal and a preset clipping period;
and clipping the road video according to the video start and stop times to obtain the key video segment.
4. The obstacle recognition method according to claim 1, wherein
the uploading the key video segment to the cloud end comprises:
and packaging the key video segment and the vehicle running information into a verification data packet, and uploading the verification data packet to the cloud.
5. An obstacle recognition method, wherein the obstacle recognition method is applied to a cloud, and the obstacle recognition method comprises the following steps:
receiving a key video segment uploaded by a terminal, wherein the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
verifying whether the key video segment is valid based on a preset second deep learning model, wherein the computational power of the second deep learning model is better than that of the first deep learning model;
if the key video segment is confirmed to be valid, generating an evidence chain based on the key video segment for storage;
receiving vehicle running information corresponding to the key video segment acquired by the terminal, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information; after verifying whether the key video segment is valid based on the preset second deep learning model, the obstacle recognition method further comprises:
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle running information; determining a corresponding target place weight, a target time weight and a target vehicle type weight based on the vehicle position information, the vehicle speed information and the vehicle identification information in the vehicle running information of the key video segment, wherein the normalized value of the sum of the three is the obstacle occurrence weight;
correspondingly, if the key video segment is confirmed to be valid, generating an evidence chain based on the key video segment for storage, including:
and if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, generating and storing an evidence chain based on the key video segment and the vehicle running information.
6. The obstacle recognition method according to claim 5, wherein the receiving the key video segment uploaded by the terminal comprises:
receiving a verification data packet uploaded by the terminal, wherein the verification data packet carries the key video segment;
and analyzing the verification data packet to obtain the key video segment.
7. The obstacle recognition method according to claim 6, wherein the verification data packet further carries vehicle running information corresponding to the key video segment.
8. An obstacle recognition device, wherein the obstacle recognition device is applied to a terminal, the terminal is mounted on a vehicle, and a camera is further mounted on the vehicle, the obstacle recognition device comprising:
the detection unit is used for detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
a generation unit for generating an alarm signal if the presence of the obstacle information is detected;
the clipping unit is used for clipping the road video based on the alarm signal to obtain a key video segment;
the uploading unit is used for uploading the key video segment to a cloud to instruct the cloud to verify whether the key video segment is valid based on a preset second deep learning model, wherein the computational power of the second deep learning model is superior to that of the first deep learning model;
the obstacle recognition device is further configured to:
acquiring vehicle running information corresponding to the key video segment, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information;
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle running information; determining a corresponding target place weight, a target time weight and a target vehicle type weight based on the vehicle position information, the vehicle speed information and the vehicle identification information in the vehicle running information of the key video segment, wherein the normalized value of the sum of the three is the obstacle occurrence weight;
and if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, instructing the cloud to generate and store an evidence chain based on the key video segment and the vehicle running information.
9. An obstacle recognition device, wherein the obstacle recognition device is applied to the cloud, the obstacle recognition device comprising:
the receiving unit is used for receiving the key video segment uploaded by the terminal, wherein the key video segment is obtained by the terminal clipping a road video based on an alarm signal, the road video is collected by a camera mounted on a vehicle, and the alarm signal is generated upon detecting obstacle information in the road video based on a preset first deep learning model;
the verification unit is used for verifying whether the key video segment is valid based on a preset second deep learning model, wherein the computational power of the second deep learning model is superior to that of the first deep learning model;
the storage unit is used for generating an evidence chain based on the key video segment for storage if the key video segment is confirmed to be valid;
the receiving unit is further configured to receive vehicle running information corresponding to the key video segment acquired by the terminal, where the vehicle running information includes vehicle position information, vehicle speed information and vehicle identification information;
the obstacle recognition device further includes:
the weight determining unit is used for determining a corresponding obstacle occurrence weight based on the vehicle running information if the key video segment is determined to be valid based on the second deep learning model; and for determining a corresponding target place weight, a target time weight and a target vehicle type weight based on the vehicle position information, the vehicle speed information and the vehicle identification information in the vehicle running information of the key video segment, wherein the normalized value of the sum of the three is the obstacle occurrence weight;
the storage unit is specifically configured to generate and store an evidence chain based on the key video segment and the vehicle running information if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold.
10. An obstacle recognition system, comprising a terminal, a cloud and a camera, wherein the terminal and the camera are mounted on the same vehicle;
the terminal comprises:
the detection unit is used for detecting whether obstacle information exists in the road video acquired by the camera in real time based on a preset first deep learning model;
a generation unit for generating an alarm signal if the presence of the obstacle information is detected;
the clipping unit is used for clipping the road video based on the alarm signal to obtain a key video segment;
the uploading unit is used for uploading the key video segment to the cloud;
the cloud comprises:
the receiving unit is used for receiving the key video segment uploaded by the terminal;
the verification unit is used for verifying whether the key video segment is valid based on a preset second deep learning model;
the storage unit is used for generating an evidence chain based on the key video segment for storage if the key video segment is confirmed to be valid;
wherein the computational power of the second deep learning model is superior to the computational power of the first deep learning model;
the terminal is also configured to:
acquiring vehicle running information corresponding to the key video segment, wherein the vehicle running information comprises vehicle position information, vehicle speed information and vehicle identification information;
if the key video segment is determined to be valid based on the second deep learning model, determining a corresponding obstacle occurrence weight based on the vehicle running information; determining a corresponding target place weight, a target time weight and a target vehicle type weight based on the vehicle position information, the vehicle speed information and the vehicle identification information in the vehicle running information of the key video segment, wherein the normalized value of the sum of the three is the obstacle occurrence weight;
and if the key video segment is confirmed to be valid and the obstacle occurrence weight is higher than a preset weight threshold, instructing the cloud to generate and store an evidence chain based on the key video segment and the vehicle running information.
11. A terminal comprising a first memory, a first processor and a first computer program stored in the first memory and executable on the first processor, characterized in that the first processor implements the method according to any of claims 1 to 4 when executing the first computer program.
12. A cloud device comprising a second memory, a second processor and a second computer program stored in the second memory and executable on the second processor, wherein the second processor implements the method of any of claims 5 to 7 when executing the second computer program.
13. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method of any one of claims 1 to 4; alternatively, the computer program, when executed by a processor, implements the method of any of claims 5 to 7.
CN202110395709.6A 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud Active CN113255439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110395709.6A CN113255439B (en) 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud


Publications (2)

Publication Number Publication Date
CN113255439A CN113255439A (en) 2021-08-13
CN113255439B (en) 2024-01-12

Family

ID=77220653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110395709.6A Active CN113255439B (en) 2021-04-13 2021-04-13 Obstacle identification method, device, system, terminal and cloud

Country Status (1)

Country Link
CN (1) CN113255439B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920728B (en) * 2021-10-11 2022-08-12 南京微达电子科技有限公司 Detection and early warning method and system for obstacles thrown on highway
CN114882452B (en) * 2022-05-17 2022-12-30 张弛 Track line safety monitoring method, train operation control method and control system
CN115988641B (en) * 2023-03-20 2023-06-23 深圳市美力高集团有限公司 Personnel wireless positioning auxiliary system based on depth data


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110246471A1 (en) * 2010-04-06 2011-10-06 Selim Shlomo Rakib Retrieving video annotation metadata using a p2p network
US9013579B2 (en) * 2011-06-16 2015-04-21 Aisin Seiki Kabushiki Kaisha Vehicle surrounding-area monitoring apparatus

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04118360A (en) * 1990-09-06 1992-04-20 Daido Signal Co Ltd Crossing obstruction detector
CN102945603A (en) * 2012-10-26 2013-02-27 青岛海信网络科技股份有限公司 Method for detecting traffic event and electronic police device
CN106297278A (en) * 2015-05-18 2017-01-04 杭州海康威视数字技术股份有限公司 A kind of method and system shedding thing vehicle for inquiry
CN106599832A (en) * 2016-12-09 2017-04-26 重庆邮电大学 Method for detecting and recognizing various types of obstacles based on convolution neural network
CN111723597A (en) * 2019-03-18 2020-09-29 深圳市速腾聚创科技有限公司 Precision detection method and device of tracking algorithm, computer equipment and storage medium
CN110097109A (en) * 2019-04-25 2019-08-06 湖北工业大学 A kind of road environment obstacle detection system and method based on deep learning
CN110430389A (en) * 2019-06-21 2019-11-08 万翼科技有限公司 Image data acquiring method, apparatus, computer equipment and storage medium
CN110889350A (en) * 2019-11-18 2020-03-17 四川西南交大铁路发展股份有限公司 Line obstacle monitoring and alarming system and method based on three-dimensional imaging
CN111563474A (en) * 2020-05-18 2020-08-21 北京茵沃汽车科技有限公司 Obstacle detection method and system based on vehicle-mounted fisheye lens under motion background
CN112424793A (en) * 2020-10-14 2021-02-26 深圳市锐明技术股份有限公司 Object identification method, object identification device and electronic equipment
CN112633176A (en) * 2020-12-24 2021-04-09 广西大学 Rail transit obstacle detection method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Off-road Path and Obstacle Detection using Monocular Camera; Idan Nadav et al.; 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE); full text *
Obstacle detection based on multi-feature image fusion; Zhang Jianxun et al.; Journal of Chongqing University of Technology (Natural Science), Vol. 29, No. 3; full text *
Research on obstacle distance measurement for UAVs based on binocular vision; Zhang Xinxin; China Masters' Theses Full-text Database (Engineering Science and Technology II), No. 1; full text *


Similar Documents

Publication Publication Date Title
CN113255439B (en) Obstacle identification method, device, system, terminal and cloud
US11380105B2 (en) Identification and classification of traffic conflicts
CN113646772A (en) Predicting three-dimensional features for autonomous driving
CN110349405A (en) It is monitored using the real-time traffic of networking automobile
CN110532916B (en) Motion trail determination method and device
KR102453627B1 (en) Deep Learning based Traffic Flow Analysis Method and System
CN108460968A (en) A kind of method and device obtaining traffic information based on car networking
WO2021227586A1 (en) Traffic accident analysis method, apparatus, and device
US20150324653A1 (en) Vehicle occupancy detection using passenger to driver feature distance
CN112424793A (en) Object identification method, object identification device and electronic equipment
CN113674523A (en) Traffic accident analysis method, device and equipment
CN110379172A (en) The generation method and device of traffic rules, storage medium, electronic device
US20220237919A1 (en) Method, Apparatus, and Computing Device for Lane Recognition
CN117011816A (en) Trace segment cleaning of trace objects
CN112447060A (en) Method and device for recognizing lane and computing equipment
CN113128386B (en) Obstacle recognition method, obstacle recognition device and electronic equipment
CN114771548A (en) Data logging for advanced driver assistance system testing and verification
CN109300313B (en) Illegal behavior detection method, camera and server
CN111429723B (en) Communication and perception data fusion method based on road side equipment
Dinh et al. Development of a tracking-based system for automated traffic data collection for roundabouts
CN113112813A (en) Illegal parking detection method and device
CN114078319A (en) Method and device for detecting potential hazard site of traffic accident
CN109344776B (en) Data processing method
CN111627224A (en) Vehicle speed abnormality detection method, device, equipment and storage medium
CN115240148A (en) Vehicle behavior detection method and device, storage medium and electronic device

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant