CN112784725B - Pedestrian anti-collision early warning method, device, storage medium and stacker - Google Patents

Pedestrian anti-collision early warning method, device, storage medium and stacker

Info

Publication number
CN112784725B
Authority
CN
China
Prior art keywords
tracker
detection
image
result
matching
Prior art date
Legal status
Active
Application number
CN202110052548.0A
Other languages
Chinese (zh)
Other versions
CN112784725A (en)
Inventor
郑智辉
唐波
闫威
李飞
王硕
张聪
张伯川
刘燕欣
高仕博
肖利平
闫涛
徐安盛
郭宸瑞
龚任杰
Current Assignee
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute
Priority to CN202110052548.0A
Publication of CN112784725A
Application granted
Publication of CN112784725B
Active legal status
Anticipated expiration legal status


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/80 Camera processing pipelines; Components thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection


Abstract

The embodiments of this application provide a pedestrian anti-collision early warning method, a device, a storage medium and a stacker, relating to the field of driver assistance for container-port stackers. They address the problem in the related art that a stacker easily collides with nearby pedestrians because the vehicle is tall and large, the driver's field of view is narrow and there are many blind spots. The method comprises the following steps: acquiring video data collected by a looking-around camera; decoding the video data and placing the decoded data into an image data queue; detecting the latest image frame in the image data queue with a pre-trained pedestrian target detection model to obtain a detection result; judging whether the image obtained from the image data queue is the first frame image; if it is not the first frame image, performing target prediction with a Kalman filter according to the tracking result of the previous frame image to obtain a prediction result for the current frame image; matching the prediction result with the corresponding detection result to obtain a matching result; and judging whether to issue an early warning according to the matching result.

Description

Pedestrian anti-collision early warning method, device, storage medium and stacker
Technical Field
The application relates to the field of auxiliary driving of container port stacking machines, in particular to a pedestrian anti-collision early warning method, device, storage medium and stacking machine.
Background
The container stacker is simple in structure, flexible, fast at loading and unloading, and easy to operate. It has become an important piece of handling equipment for logistics enterprises and is widely used at ports, railways, freight yards and warehouses for loading, unloading, stacking and carrying. However, because the stacker is tall and large and the driver's view is narrow with many blind areas, it is difficult for a driver to judge the surrounding environment accurately, and the stacker can very easily collide with nearby pedestrians.
Disclosure of Invention
The embodiments of this application provide a pedestrian anti-collision early warning method, a device, a storage medium and a stacker, used to solve the problem in the related art that the stacker easily collides with nearby pedestrians because the vehicle is tall and large, the view is narrow and there are many blind areas.
An embodiment of a first aspect of the present application provides a pedestrian anti-collision early warning method, including:
acquiring video data acquired by a looking-around camera;
decoding the video data, and placing the decoded data into an image data queue;
Detecting the latest image frames in the image data queue based on a pre-trained pedestrian target detection model to obtain a detection result; judging whether the latest image obtained from the image data queue is a first frame image or not; if the image is not the first frame image, performing target prediction by using a Kalman filter according to the tracking result in the previous frame image to obtain a prediction result of the current frame image;
matching the detection result with a prediction result to obtain a matching result;
And judging whether to perform early warning according to the matching result.
Embodiments of the second aspect of the present application provide an apparatus comprising:
a memory;
A processor; and
A computer program;
Wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of the preceding claims.
An embodiment of the third aspect of the present application provides a computer-readable storage medium, characterized in that a computer program is stored thereon; the computer program being executable by a processor to implement the method as claimed in any one of the preceding claims.
An embodiment of a fourth aspect of the present application provides a stacker comprising a vehicle body, a looking-around camera, a buzzer and the device described above; the looking-around camera, the device and the buzzer are mounted on the vehicle body; and the looking-around camera and the buzzer are communicatively connected to the device.
The embodiments of this application provide a pedestrian anti-collision early warning method, a device, a storage medium and a stacker. The video data collected by the looking-around camera provides 360-degree environmental information around the stacker; by detecting pedestrian targets and tracking their motion tracks, an early warning is triggered when a pedestrian comes within the safe distance of the stacker, prompting the driver to take timely measures. This improves safety and reduces or even avoids stacker collision accidents.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a pedestrian anti-collision early warning method according to an exemplary embodiment;
fig. 2 is a flow chart of a pedestrian anti-collision early warning method according to another exemplary embodiment;
Fig. 3 is a flowchart of a pedestrian anti-collision early warning method according to still another exemplary embodiment;
FIG. 4 is a flow chart of a pedestrian anti-collision pre-warning method according to yet another exemplary embodiment;
FIG. 5 is a flow chart of a four-thread multi-pedestrian target detection and tracking algorithm provided in accordance with yet another exemplary embodiment;
fig. 6a to 6d are images at times T1, T2, T3, T4, respectively;
FIG. 7 is a schematic diagram of a deep convolution feature extraction network architecture provided by an exemplary embodiment;
fig. 8 is a block diagram of a stacker provided in an exemplary embodiment.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
In recent years, for container port stacking machines, due to the fact that the stacking machines are too high and too large, the view is too narrow, the view blind areas are too many, and the like, drivers are difficult to accurately judge the surrounding environment, and the stacking machines are extremely prone to collision with nearby personnel, containers and the like. Therefore, it is necessary to configure the anti-collision early warning function for the stacker to reduce or even avoid the occurrence of the collision accident of the stacker.
When conventional automobile driver-assistance and blind-spot monitoring systems are applied to the stacker, they usually rely on radar-based detection and warning, camera-based image monitoring, or a combination of the two. Because the stacker's working environment, the container-port yard, is cramped, radar detection produces many false alarms and serious mis-alarms. When camera image monitoring is used, the blind-spot monitoring system only captures images, and a driver can hardly attend to all the monitoring screens during operation, so the anti-collision warning effect is poor.
To overcome these problems, this embodiment provides a pedestrian anti-collision early warning method. The video data collected by the looking-around camera provides 360-degree environmental information around the stacker; by detecting pedestrian targets and tracking their motion tracks, an early warning is triggered when a pedestrian comes within the safe distance of the stacker, prompting the driver to take timely measures. This improves safety and reduces or even avoids stacker collision accidents. The method runs fast, is suitable for edge computing, meets the embedded processing and real-time requirements of edge computing, handles occlusion well in multi-target tracking, and avoids frequent switching of target IDs.
The function and implementation procedure of the method provided in this embodiment are illustrated in the following with reference to the accompanying drawings.
The pedestrian anti-collision early warning method provided in this embodiment, as shown in fig. 1, includes the following steps (a high-level sketch in code follows the list):
S101, acquiring video data collected by a looking-around camera;
S102, decoding the video data, and placing the decoded data into an image data queue;
S103, detecting the latest image frame in the image data queue based on a pre-trained pedestrian target detection model to obtain a detection result; judging whether the image obtained from the image data queue is the first frame image; if it is not the first frame image, performing target prediction with a Kalman filter according to the tracking result of the previous frame image to obtain a prediction result for the current frame image;
S104, matching the prediction result with the corresponding detection result to obtain a matching result;
S105, judging whether to perform early warning according to the matching result.
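To make the data flow of steps S101 to S105 concrete, the following Python sketch shows one per-frame iteration. It is an illustrative reconstruction, not the patent's implementation; the Tracker fields and the in_alert_region callable are assumptions introduced here, and the Kalman prediction and matching steps are sketched in detail later.

```python
# Illustrative per-frame skeleton of S101-S105 (hypothetical names; matching
# and Kalman prediction are sketched in detail later in this description).
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)

@dataclass
class Tracker:
    box: Box
    confirmed: bool = False     # unconfirmed until enough consecutive hits
    hits: int = 0

def process_frame(detections: List[Box], trackers: List[Tracker],
                  is_first_frame: bool,
                  in_alert_region: Callable[[Box], bool]) -> Tuple[List[Tracker], bool]:
    if is_first_frame:
        # S103 branch for the first frame: one unconfirmed tracker per detection
        return [Tracker(box=d) for d in detections], False
    # S103/S104: Kalman prediction and matching would update `trackers` here.
    # S105: warn if any confirmed tracker's box lies inside the alert region.
    warn = any(t.confirmed and in_alert_region(t.box) for t in trackers)
    return trackers, warn
```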
For step S101, the looking-around camera may include a plurality of cameras whose imaging ranges jointly cover the entire field of view around the vehicle; the cameras may include wide-angle cameras. In addition, the looking-around camera may further comprise a multi-channel video processing module for integrating and processing the video images collected by each camera, so as to obtain video data corresponding to the environmental information in the 360-degree range around the stacker.
In step S102, the acquired video data is decoded, and the decoded image data is placed into an image data queue; the image data in the queue may be arranged in time order. In addition, an alert region is defined in the decoded image according to a preset safety distance; for example, if the safety distance is 8 meters, the area within 8 meters of the stacker is designated as the alert region.
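A minimal sketch of step S102 using OpenCV is shown below; the stream source, the bounded queue size and the policy of dropping the oldest frame when the queue is full are illustrative assumptions, not requirements of the method.

```python
# Sketch of S102: decode frames and keep them in time order in a bounded queue.
import queue
import cv2

image_queue: queue.Queue = queue.Queue(maxsize=64)

def decode_into_queue(source: str) -> None:
    cap = cv2.VideoCapture(source)      # e.g. one stream of the looking-around camera
    while cap.isOpened():
        ok, frame = cap.read()          # read and decode one frame
        if not ok:
            break
        if image_queue.full():          # keep only the freshest frames
            image_queue.get_nowait()
        image_queue.put(frame)          # frames are enqueued in time order
    cap.release()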
Before step S103, pedestrian data needs to be collected in advance, and a network training sample and a test sample are obtained to train a pedestrian target detection model. The pedestrian target detection model can be a lightweight MobileNet-YOLOv3 pedestrian target detection model.
The MobileNet-YOLOv3 pedestrian target detection model has few parameters and a high operation speed; it is suitable for edge computing and can meet both the embedded processing requirement of edge computing and the real-time requirement.
The MobileNet-YOLOv3 pedestrian target detection model supports human-shape detection in various postures, including standing, lying or on the side, squatting, bending over, walking and running. It also supports recognition when only part of the body enters the frame: a pedestrian can be identified from feature information of the head, an arm or a leg alone. The model is no more than 5 MB; for a single 640 x 480 frame, recognizing a pedestrian target takes no more than 25 ms; and at a recall of 0.96 the detection accuracy is 99%, which ensures accurate early warning.
In step S103, the image frames in the image data queue are detected in sequence based on the pre-trained pedestrian target detection model; specifically, the latest image frame in the image data queue is detected to obtain a detection result. As shown in fig. 2, this may specifically include:
s1031, acquiring the latest image frame in an image data queue;
S1032, detecting the latest image frame based on a pre-trained target detection model to obtain M detection frames in the latest image frame, wherein M is an integer greater than or equal to 0;
S1033, screening detection frames with the height larger than a minimum height threshold and the confidence coefficient larger than a minimum confidence coefficient threshold from the M detection frames;
s1034, combining the screened detection frames by using a non-maximum suppression NMS method to obtain N detection results, wherein N is less than or equal to M.
That is, only detection frames whose height h is greater than min_height and whose confidence is greater than min_confidence are kept, and the kept frames are merged by non-maximum suppression to obtain N detection results. The minimum height threshold min_height and the minimum confidence threshold min_confidence are preset values; their specific values are not limited in this embodiment.
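The screening and merging of steps S1033 and S1034 can be sketched as follows. The values min_height = 30 and min_confidence = 0.7 follow the embodiment described later; the NMS overlap threshold of 0.5 is an assumption of this sketch.

```python
# Sketch of S1033-S1034: keep boxes with h > min_height and confidence >
# min_confidence, then merge by greedy non-maximum suppression.
import numpy as np

def iou_xywh(a: np.ndarray, b: np.ndarray) -> float:
    """IOU of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def filter_and_merge(boxes: np.ndarray, scores: np.ndarray,
                     min_height: float = 30.0, min_confidence: float = 0.7,
                     nms_iou: float = 0.5) -> np.ndarray:
    """boxes: (M, 4) array of (x, y, w, h); returns indices of the N kept boxes."""
    keep = (boxes[:, 3] > min_height) & (scores > min_confidence)
    idx = np.where(keep)[0][np.argsort(-scores[keep])]   # best score first
    kept = []
    while idx.size:
        i, idx = idx[0], idx[1:]
        kept.append(int(i))
        idx = idx[[iou_xywh(boxes[i], boxes[j]) < nms_iou for j in idx]]
    return np.array(kept, dtype=int)
```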
After step S1034, the method further includes:
extracting, based on a convolutional neural network (CNN), a 128-dimensional feature vector for the image block corresponding to each detection frame in the detection result.
The 128-dimensional feature vector is used for representing a pedestrian target in the image block for subsequent calculation or matching.
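A sketch of this appearance branch is given below; embedding_model is a placeholder for the network of fig. 7 (any callable mapping an image patch to 128 floats), and the L2 normalisation is an assumption made so that cosine distance can later be computed as 1 minus a dot product.

```python
# Sketch of the appearance branch: crop each detection, embed it, L2-normalise.
import numpy as np

def extract_features(image: np.ndarray, boxes: np.ndarray, embedding_model) -> np.ndarray:
    feats = []
    for x, y, w, h in boxes.astype(int):
        patch = image[max(y, 0):y + h, max(x, 0):x + w]           # pedestrian crop
        f = np.asarray(embedding_model(patch), dtype=np.float32)  # 128-D vector
        feats.append(f / (np.linalg.norm(f) + 1e-12))
    return np.stack(feats) if feats else np.zeros((0, 128), np.float32)
```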
Step S103, as shown in fig. 3, further includes:
S1036, judging whether the image acquired from the image data queue is the first frame image;
Specifically, the judgment can be made according to the order in which images are acquired. For example, the first image frame read or acquired after the anti-collision early warning system, the looking-around camera or the video processor is switched on may be taken as the first frame image: if the acquired image is the first image frame read in, it is determined to be the first frame image; if not, it is determined not to be the first frame image.
S1037, if the image is determined not to be the first frame image, performing target prediction by using a Kalman filter according to the tracking result in the previous frame image to obtain a prediction result of the current frame image;
That is, if it is determined that the image is not the first frame image, the K trackers of the current frame image are obtained by performing target prediction with a Kalman filter according to the K trackers of the previous frame image.
S1038, if the image is determined to be the first frame image, creating a Kalman tracker for each detection frame of the first frame image, setting the created tracker to be in an unacknowledged state, and generating a display signal according to the detection result of the first frame image, wherein the display signal is used for triggering and displaying the detection result.
In a specific implementation, the display signal is used for sending to the display, and the display is used for performing corresponding display according to the display signal. At this time, it can be determined whether the detection frame of the first frame image is located within a predetermined guard range, and if the detection frame of the first frame image is located within the predetermined guard range, an early warning signal can be generated. Of course, only the detection result may be displayed.
It should be noted that: in the above steps, this embodiment does not limit the order in which the detection result and the prediction result are obtained; that is, the order of steps S1031 to S1035 and steps S1036 to S1038 is not specifically limited. Likewise, unless otherwise specified, this embodiment does not limit the execution order of the steps or of the operations within a step.
In step S104, the detection result (including N detection frames, i.e., detections) is matched with the tracking prediction result (including K trackers, i.e., tracking frames) to obtain a matching result. The matching process specifically includes cascade matching (steps S1042 to S1045 below) and IOU matching (steps S1046 to S1049 below). Cascade matching cyclically traverses, at most P times, the trackers ordered from those matched most recently to those that have gone longest without a match.
Specifically, as shown in fig. 4, step S104 includes:
s1041, acquiring a confirmed state tracker in a plurality of trackers of a prediction result;
S1042, determining the minimum cosine distance matrix between the depth convolution feature of each new detection result in the current frame and the feature set stored by each tracker at this cascade level;
S1043, determining the Mahalanobis distance between the prediction result and the detection result for each tracker, and setting, in the minimum cosine distance matrix, the cosine distance values of trackers whose Mahalanobis distance exceeds the corresponding threshold to infinity, to obtain a processed minimum cosine distance matrix;
s1044, taking the processed minimum cosine distance matrix as input of a Hungary algorithm to obtain a linear matching result;
S1045, removing the matching pairs with differences meeting preset conditions in the linear matching results to obtain matching pairs, unmatched trackers and unmatched detection frames after primary processing;
s1046, determining an IOU tracker candidate set according to the unmatched trackers and the trackers in the unacknowledged state, and determining the IOU distance between the candidate trackers in the IOU tracker candidate set and the unmatched detection frame to obtain a cost matrix;
s1047, setting a combination which is larger than a corresponding reset threshold in the cost matrix as the reset threshold, and deleting the combination to obtain a processed cost matrix;
s1048, matching based on the Hungary algorithm and the processed cost matrix to obtain a linear matching result;
s1049, deleting the combination with the IOU smaller than the corresponding deletion threshold value in the obtained linear matching result to obtain a reprocessed matching pair, a non-matching tracker and a non-matching detection frame; wherein the deletion threshold is less than or equal to the reset threshold.
In step S1041, a tracker newly added in the current frame is recorded as an unconfirmed-state tracker unconfirmed tracker; the remaining trackers already existed in the previous frame, and some of them may keep the same state as in the previous frame. Specifically, a tracker list may be preset, recording the state information of each tracker, and the confirmed-state trackers in the current frame can be obtained from the tracker list; alternatively, the confirmed-state trackers in the current frame can be obtained from the feature sets of confirmed-state trackers. Of course, the specific implementation of acquiring the confirmed-state trackers is not limited to these; this embodiment is merely illustrative here.
In step S1042, the cosine distance between the depth convolution feature of each new detection result in the current frame and the feature set stored by each tracker at this level is calculated, and the minimum cosine distance matrix cost_matrix is obtained from these distance values. The minimum cosine distance value can be taken as the calculated distance between a tracker and a detection result.
In step S1043, a motion information constraint is applied in the minimum cosine distance matrix. Specifically, for each tracker, the Mahalanobis distance between the prediction result and the detection result is calculated, and the entries of the minimum cosine distance matrix for trackers whose Mahalanobis distance exceeds the corresponding threshold are set to infinity.
In step S1044, taking the minimum cosine distance matrix after the above processing as an input of the hungarian algorithm, to obtain a linear matching result;
In step S1045, the matching pairs with larger differences in the linear matching result are removed, and in specific implementation, the matching pairs with differences larger than the corresponding threshold in the matching result may be removed, so as to obtain the matching pair matches, the unmatched tracker unmatched tracker and the unmatched detection box unmatched detection after the primary processing.
Next, the unconfirmed-state trackers unconfirmed tracker and the unmatched trackers unmatched tracker from step S1045 together form the IOU tracker candidate set IOU_track_candidate; the candidate trackers in IOU_track_candidate are then matched against the unmatched detection boxes unmatched_detection by IOU (Intersection over Union) matching.
Specifically, in step S1046, the iou_distance between the candidate tracker in the iou_track_candidate and the unmatched detection box unmatched _detection is determined, so as to obtain a cost matrix.
In step S1047, any entry of the cost matrix whose IOU_distance exceeds the corresponding threshold is clamped to that threshold, so that the combination can be discarded later, giving the processed cost matrix.
In step S1048, the cost matrix obtained in step S1047 is used as input of the hungarian algorithm to perform matching again, and a linear matching result is obtained again;
In step S1049, filtering is performed in the linear matching result obtained in step S1048, deleting the combination with the IOU smaller than the corresponding threshold, and reserving the combination with the larger IOU, so as to obtain the reprocessed matching pair matches, the unmatched tracker unmatched tracker and the unmatched detection box unmatched detection.
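Steps S1042 to S1045 can be sketched with numpy and scipy as follows. The gating value 9.48 and the 0.2 distance cut-off follow the embodiment described later; the matrix layout, the stand-in value for infinity and the assumption that each tracker's feature set is non-empty and unit-normalised are details of this reconstruction, not of the patent.

```python
# Sketch of S1042-S1045: appearance cost matrix, Mahalanobis gating,
# Hungarian assignment, and rejection of weak matches.
import numpy as np
from scipy.optimize import linear_sum_assignment

INF = 1e5  # stands in for "infinity" in the cost matrix

def cascade_match(track_feature_sets, det_features, mahalanobis,
                  gate: float = 9.48, max_dist: float = 0.2):
    """track_feature_sets: list of (n_i, 128) arrays; det_features: (N, 128);
    mahalanobis: (K, N) motion distances between trackers and detections."""
    K, N = len(track_feature_sets), det_features.shape[0]
    cost = np.zeros((K, N), np.float32)
    for k, feats in enumerate(track_feature_sets):
        # minimum cosine distance to the tracker's stored feature set
        cost[k] = (1.0 - det_features @ feats.T).min(axis=1)
    cost[np.asarray(mahalanobis) > gate] = INF       # motion information constraint
    rows, cols = linear_sum_assignment(cost)         # Hungarian algorithm
    matches = [(int(k), int(n)) for k, n in zip(rows, cols) if cost[k, n] <= max_dist]
    unmatched_t = [k for k in range(K) if k not in {m[0] for m in matches}]
    unmatched_d = [n for n in range(N) if n not in {m[1] for m in matches}]
    return matches, unmatched_t, unmatched_d
```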
And carrying out the following processing according to the obtained matching result:
for the matched pair obtained from the two treatments:
updating parameters of the corresponding Kalman filter according to the detection result corresponding to the matched pair;
storing the depth characteristics of the detection results corresponding to the matched pairs into the characteristic sets of the corresponding trackers;
wherein the parameters of the Kalman filter include the hit count; an unconfirmed-state tracker in a matching pair whose consecutive hit count reaches the count threshold is updated to a confirmed-state tracker;
For unmatched trackers that are again processed:
if the unmatched tracker is an unacknowledged state tracker, deleting the unacknowledged state tracker from the tracker list;
if the unmatched tracker is a confirmed state tracker and the corresponding detection result is not matched in all continuous preset frames, confirming that the unmatched tracker is invalid, and deleting the unmatched tracker from the tracker list;
for the unmatched detection frame obtained by the reprocessing:
A new tracker is created for the unmatched detection box.
That is to say:
For the successfully matched pairs matches: the parameters of the corresponding Kalman filter are updated with the detection result, including a series of motion variables and the hit count, and time_since_update is reset. The depth convolution features of the detection results in the matching pairs are stored in the feature sets of the corresponding trackers. The state of each tracker is then checked: if a tracker is in the unconfirmed state and its consecutive hit count reaches the count threshold, for example 3 hits, its state changes from unconfirmed to confirmed.
For unmatched _tracker that did not match successfully: if the tracker is in an unacknowledged state, directly deleting the tracker from the tracker list; if this tracker is in a validated state, but the corresponding detection result cannot be matched by the continuous preset frames, the tracker is determined to be invalid, and the tracker is deleted from the tracker list.
For the unmatched detection boxes unmatched_detection: a new tracker is created for each unmatched detection box and added to the tracker list.
The feature sets of the confirmed-state trackers are then updated according to the results of the above operations. Each confirmed-state tracker keeps at most the deep convolution network features of a predetermined number of recently matched frame detection results; for example, at most the features of the 100 most recently matched frames.
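The tracker bookkeeping described above can be sketched as follows. The class and method names are hypothetical; the 100-feature budget, confirmation after 3 consecutive hits and max_age = 3 follow this embodiment.

```python
# Sketch of tracker lifecycle: hits, confirmation, deletion, bounded features.
from collections import deque

class Track:
    def __init__(self, box, budget=100, n_init=3, max_age=3):
        self.box = box
        self.hits = 0
        self.time_since_update = 0
        self.confirmed = False
        self.n_init, self.max_age = n_init, max_age
        self.features = deque(maxlen=budget)   # oldest features drop out first

    def on_match(self, det_box, det_feature):
        self.box = det_box                     # the Kalman update refines this
        self.hits += 1
        self.time_since_update = 0
        self.features.append(det_feature)
        if self.hits >= self.n_init:
            self.confirmed = True              # unconfirmed -> confirmed

    def on_miss(self):
        self.time_since_update += 1

    def should_delete(self):
        """Call only for tracks left unmatched in the current frame."""
        return (not self.confirmed) or self.time_since_update > self.max_age
```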
In step S105, if the matching result includes a confirmed-state tracker and the detection frame corresponding to that tracker lies within the preset alert range, an early warning signal is generated; the early warning signal triggers the early warning. Specifically, the early warning signal can be sent to the buzzer to trigger an audible prompt from the stacker's buzzer, and/or sent to the indicator lamp to trigger a visual prompt from the stacker's indicator lamp.
In addition, the prediction result of the confirmed-state tracker and the obtained detection result can be displayed for the driver to review. If there is no confirmed-state tracker in the matching result, the obtained detection result is displayed. Specifically, a display signal can be generated from the detection result of an unmatched detection frame to trigger its display; for the first frame image, whose detection frames are all unmatched, the display signal is generated from the first frame's detection result. In a specific implementation, the display signal is sent to the display, which displays the corresponding content. Early warning judgment can also be performed on an unmatched detection frame: if it lies within the pre-defined early warning range, an early warning signal is generated.
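A sketch of the warning decision follows. Modelling the alert region as a polygon and testing the bottom centre of the detection frame are assumptions of this sketch; the patent only requires checking whether a confirmed tracker's detection frame lies within the preset alert range.

```python
# Sketch of the warning decision for S105.
import numpy as np
import cv2

def in_alert_region(box, alert_polygon: np.ndarray) -> bool:
    """alert_polygon: (N, 1, 2) float32 contour in image coordinates."""
    x, y, w, h = box
    foot = (float(x + w / 2.0), float(y + h))    # bottom centre ~ ground contact
    return cv2.pointPolygonTest(alert_polygon, foot, False) >= 0

def should_warn(tracks, alert_polygon) -> bool:
    return any(t.confirmed and in_alert_region(t.box, alert_polygon) for t in tracks)
```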
In this example, to increase processing speed and guarantee timely warning, the above steps may be implemented with multiple threads; that is, pedestrian target detection and motion-track association run in separate threads, as shown in fig. 5 (a minimal threading sketch follows the list below). Optionally:
Decoding the video data by using the first thread, and placing the decoded data into an image data queue;
detecting the latest image frames in the image data queue by using a second thread and based on a pre-trained pedestrian target detection model to obtain a detection result;
judging, in a third thread, whether the latest image acquired from the image data queue is the first frame image; if it is not the first frame image, performing target prediction with a Kalman filter according to the tracking result of the previous frame image to obtain a prediction result for the current frame image;
matching the detection result with the prediction result in the third thread to obtain a matching result;
and outputting the result in a fourth thread, and judging whether to give an early warning according to the matching result.
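The four-thread structure of fig. 5 can be sketched as follows; the queue sizes and the callables passed to each thread are placeholders for the decoder, detector, tracker and warning logic sketched elsewhere in this description.

```python
# A minimal threading sketch of fig. 5.
import queue
import threading

image_q: queue.Queue = queue.Queue(maxsize=64)
detection_q: queue.Queue = queue.Queue(maxsize=64)
result_q: queue.Queue = queue.Queue(maxsize=64)

def decode_thread(read_frame):           # thread 1: decode into the image queue
    while True:
        image_q.put(read_frame())

def detect_thread(detect):               # thread 2: detect on the latest frame
    while True:
        detection_q.put(detect(image_q.get()))

def track_thread(predict_and_match):     # thread 3: Kalman prediction + matching
    while True:
        result_q.put(predict_and_match(detection_q.get()))

def output_thread(handle_result):        # thread 4: display / early warning
    while True:
        handle_result(result_q.get())

def start(read_frame, detect, predict_and_match, handle_result):
    workers = [(decode_thread, read_frame), (detect_thread, detect),
               (track_thread, predict_and_match), (output_thread, handle_result)]
    for target, fn in workers:
        threading.Thread(target=target, args=(fn,), daemon=True).start()
```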
It will be appreciated that: in the above examples, the portions not illustrated or defined may be conventional arrangements in the art, and this embodiment will not be described herein.
The implementation procedure of the method of the present embodiment is exemplified below.
First, a pedestrian dataset is collected, a network training sample and a test sample are obtained, and MobileNet-YOLOv3 pedestrian detection model is trained.
And starting the thread 1, reading and decoding video image data, and placing the decoded data into an image data queue. In this embodiment, four frames of images T1, T2, T3, and T4 are sequentially obtained by decoding from the video stream, and are respectively stored in the data queue. Four frames of images T1, T2, T3 and T4 are shown in FIG. 6a, FIG. 6b, FIG. 6c and FIG. 6d respectively; in fig. 6a to 6d, it can be appreciated that: in order to clearly show the output detection result and the prediction result, the image content is hidden.
Processing based on the image at time T1:
Thread 2 calls the MobileNet-YOLOv3 pedestrian detection model to detect pedestrians in the image at time T1. Each detection is represented by a rectangular frame {x, y, w, h}, where (x, y) are the coordinates of the frame's upper-left corner, w is the width of the detection frame and h is its height. The detection frames are screened: only frames whose height h is greater than min_height and whose confidence is greater than min_confidence are kept, and the kept frames are merged by non-maximum suppression to obtain N detection frames out of the original M (N ≤ M).
In this example, min_height is 30 and min_confidence is 0.7. Merging the detection frames by non-maximum suppression yields 18 detection results. A convolutional neural network (CNN) then extracts a 128-dimensional feature vector for the image block corresponding to each of the N detection frames. The deep convolution feature extraction network structure used in this embodiment is shown in fig. 7.
Thread 3 creates 18 trackers, one for each of the 18 detection boxes. Because this is the first frame image, each tracker is set to the unconfirmed state.
Because there is no confirmed-state tracking result in the image at time T1, thread 4 outputs and displays only the detection results (indicated by solid rectangles), as shown in fig. 6a.
Processing based on the image at time T2:
Thread 2 detects a total of 12 pedestrian target detection boxes with the pedestrian detection model. Thread 3 uses a Kalman filter to obtain, for the 18 trackers in the T1-moment image, 18 tracking predictions in the T2-moment image, and counts the number of prediction updates for each tracker.
Matching is carried out on 12 pedestrian target detection results and 18 tracker prediction results:
Existing tracker trackers are divided into a validated state tracker confirmed tracker and an unconfirmed state tracker unconfirmed tracker. In this embodiment, all 18 trackers are in an unacknowledged state.
For the confirmed-state trackers confirmed tracker, cascade matching with the current detection results would be performed; since there is not yet any confirmed tracker in the image at time T2, no cascade matching takes place.
The unconfirmed-state trackers unconfirmed tracker and the unmatched trackers unmatched tracker together form the IoU_track_candidate set; the trackers in the IoU_track_candidate set perform IOU (Intersection over Union) matching with the unmatched detection boxes unmatched_detection.
In this embodiment, the number of unconfirmed state trackers is 18, and the number of trackers that do not match is 0, so the number of trackers in IoU _track_candidate is 18, and the number of detection boxes that do not match is 12.
The IOU matching flow is as follows (a code sketch follows the list):
calculating the IOU_distance (1 − IOU) between every tracker in IoU_track_candidate and every unmatched detection box unmatched_detection to obtain the cost matrix cost_matrix;
setting the combination of the cost matrix greater than the threshold (in this embodiment, the threshold is 0.7) to 0.7;
Matching by using a Hungary algorithm;
screening the matching result and deleting the combinations whose IOU is too small (the threshold is 0.5 in this embodiment), i.e., removing combinations whose IOU_distance exceeds 0.5;
Obtaining 12 pairs of matched pairs matches, 6 unmatched trackers unmatched tracker and 0 unmatched detection boxes unmatched detection;
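The flow above can be sketched as follows, reusing the iou_xywh() helper from the earlier detection sketch; the 0.7 clamp and the 0.5 post-match filter follow this embodiment.

```python
# Sketch of the IOU matching flow (IOU_distance = 1 - IOU).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_match(cand_boxes, det_boxes, clamp: float = 0.7, drop: float = 0.5):
    if not len(cand_boxes) or not len(det_boxes):
        return []
    # IOU_distance between every candidate tracker and unmatched detection
    cost = np.array([[1.0 - iou_xywh(t, d) for d in det_boxes] for t in cand_boxes])
    cost[cost > clamp] = clamp                    # clamp, then discard below
    rows, cols = linear_sum_assignment(cost)      # Hungarian algorithm
    # keep only pairs whose IOU_distance does not exceed the deletion threshold
    return [(int(k), int(n)) for k, n in zip(rows, cols) if cost[k, n] <= drop]
```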
For 12 matched pairs which are successfully matched, the parameters of the corresponding Kalman filter, including the motion variable and hit times, are updated by using the detection result. The convolutional neural network features of the detection boxes in matches are stored in the feature sets of the corresponding trackers. If the state of the tracker in the matching pair is unacknowledged and the tracker has hit X times in succession (in this embodiment, X takes 3), then the tracker state changes to the acknowledged state.
A Kalman filter motion variable updating step:
Based on the detection frame detected at time t, the state of the tracker associated with that detection frame is corrected to obtain a more accurate result.
Kalman filter prediction equations (1) and (2):
x(k) = A x(k-1)    (1)
P(k) = A P(k-1) Aᵀ + Q    (2)
wherein x(k-1) is the object's state information in the previous frame, [center x, center y, aspect ratio, height] together with their velocities (initialized to 0); P(k-1) is the object's estimation error (covariance); A is the state transition matrix; and Q is the system error.
y = z − H x′    (3)
S = H P′ Hᵀ + R    (4)
K = P′ Hᵀ S⁻¹    (5)
x = x′ + K y    (6)
P = (I − K H) P′    (7)
In equation (3), z is the mean vector of the detection; it contains no velocity components, i.e., z = [cx, cy, r, h]. H is the measurement matrix, which maps the track's mean vector x′ into the detection space; the equation computes the residual between the detection and the track.
In equation (4), R is the noise matrix of the detection frame, a 4x4 diagonal matrix whose diagonal entries are the noise of the two center-point coordinates and of the width and height, initialized with arbitrary values; the width and height noise is generally set larger than the center-point noise. The equation first maps the covariance matrix P′ into the detection space and then adds the noise matrix R.
Equation (5) calculates the Kalman gain K, which weights the importance of the residual.
Equations (6) and (7) yield the updated mean vector x and covariance matrix P.
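Equations (1) to (7) transcribe directly into numpy. The sketch below is a generic Kalman predict/update pair; the concrete A, H, Q, R matrices and the 8-D state layout are left to the caller, since the text does not fix their numeric values.

```python
# Equations (1)-(7) as a generic Kalman predict/update pair.
import numpy as np

def kalman_predict(x, P, A, Q):
    x = A @ x                              # eq. (1): x(k) = A x(k-1)
    P = A @ P @ A.T + Q                    # eq. (2): P(k) = A P(k-1) A^T + Q
    return x, P

def kalman_update(x, P, z, H, R):
    y = z - H @ x                          # eq. (3): residual detection vs. track
    S = H @ P @ H.T + R                    # eq. (4): covariance in detection space
    K = P @ H.T @ np.linalg.inv(S)         # eq. (5): Kalman gain
    x = x + K @ y                          # eq. (6): corrected mean
    P = (np.eye(P.shape[0]) - K @ H) @ P   # eq. (7): corrected covariance
    return x, P
```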
For 6 unmatched trackers unmatched tracker: if this tracker is still unacknowledged, it is directly deleted from the tracker list; if this tracker is in the acknowledged state, but the consecutive max_age frames (set to 3 in this embodiment) have failed to match, it is removed from the tracker list. In this embodiment, 6 unmatched trackers will be deleted from the tracker list.
For the unmatched detection result unmatched _detection, a new tracker is created for it. In this embodiment, there is no unmatched detection frame in the image at time T2.
The feature set of the confirmed-state trackers is then updated; in this embodiment, since there is no confirmed-state tracker in the image at time T2, this operation is not performed.
Because there is still no confirmed-state tracking result in the image at time T2, thread 4 outputs and displays only the detection results (solid rectangles), as shown in fig. 6b.
Processing based on the image at time T3:
Thread 2 invokes the pedestrian detection model and detects a total of 14 pedestrian target detection boxes; currently there are a total of 12 trackers (all in the unconfirmed state).
Thread 3 uses a kalman filter to obtain 12 tracking predictions in the T3 moment image for 12 trackers in the T2 moment image.
Matching 14 pedestrian target detection frames with 12 tracker prediction results; the method comprises the following steps:
Existing tracker trackers are divided into a validated state tracker confirmed tracker and an unconfirmed state tracker unconfirmed tracker. In this embodiment, all 12 trackers are in an unacknowledged state.
For the confirmed-state trackers confirmed tracker, cascade matching with the current detection results would be performed; since there is not yet any confirmed tracker in the image at time T3, no cascade matching takes place.
For IOU matching, the unconfirmed-state trackers unconfirmed tracker and the unmatched trackers unmatched tracker together form the IoU_track_candidate set, and the trackers in the IoU_track_candidate set are matched with the unmatched detection boxes unmatched_detection.
In this embodiment, the number of unconfirmed state trackers is 12, and the number of trackers that do not match is 0, so the number of IoU _track_candidates is 12, and the number of detection boxes that do not match is 14.
The end result is 11 pairs of matched pair matches, 1 non-matched tracker unmatched tracker, and 3 non-matched detection boxes unmatched detection.
For the 11 successfully matched pairs, the parameters of the corresponding Kalman filters, including the motion variables and hit counts, are updated with the detection frames in the matched pairs, and the convolutional neural network features of the detection frames are stored in the feature sets of the corresponding trackers. In this embodiment, the 11 trackers have now hit X times in succession (X = 3), so their states are changed to the confirmed state.
For 1 unmatched tracker unmatched tracker, if this tracker is also in an unacknowledged state, it is directly deleted from the tracker list. In this embodiment, 1 unmatched tracker will be deleted from the tracker list.
For the unmatched detection result unmatched _detection, a new tracker is created for it. In this embodiment, 3 unmatched detection results create 3 new trackers.
Updating the feature set of the identified tracker. In this embodiment, 11 trackers have acknowledged status. The tracker feature set holds at most the deep convolution network features of the most recently matched 100 frame detection results.
Thread 4 outputs and displays the detection results (solid rectangles) and the 11 confirmed-state trackers (dotted rectangles), as shown in fig. 6c. When a detection frame corresponding to a confirmed-state tracker lies within the preset alert range, an early warning signal is generated; the signal can trigger the stacker's buzzer and/or indicator lamp to issue a prompt.
Processing based on the image at time T4:
Thread 2 invokes the pedestrian detection model to detect a total of 14 pedestrian target detection boxes, currently there are 14 trackers (11 acknowledged states, 3 unacknowledged states).
Thread 3 uses a Kalman filter to obtain, for the 14 trackers in the T3-moment image, 14 tracking prediction results in the T4-moment image.
Matching is carried out on 14 pedestrian target detection results and 14 tracker prediction results. First, the existing tracker is divided into a validated state tracker confirmed tracker and an unconfirmed state tracker unconfirmed tracker. In this embodiment, there are 11 acknowledged states for 14 trackers, and 3 are unacknowledged states.
For the 11 trackers confirmed tracker in the confirmed state, they are cascade matched with the current detection result. The cascade matching step is as follows:
The cascade matching operation cycles at most P times (P = 30 in this embodiment), from the trackers matched most recently through to those that have gone longest without a successful match.
The traversal procedure is as follows:
calculating the minimum cosine distance matrix between the depth feature of each new detection result of the current frame and the feature set held by each tracker at this level, and using it as the cost matrix (the minimum value serves as the calculated distance between a tracker and a detection result);
applying the motion information constraint in the cost matrix: for each tracker, the Mahalanobis distance between the prediction result and the detection result is calculated, and the cost matrix entries of trackers whose Mahalanobis distance exceeds the threshold (9.48 in this embodiment) are set to infinity.
And taking the cost matrix after the processing as input of the Hungary algorithm to obtain a linear matching result, and removing matching pairs with larger difference (the threshold value is 0.2 in the embodiment). 11 pairs of matched pair matches, 0 non-matched trackers unmatched tracker, and 3 non-matched detection boxes unmatched detection are obtained.
For IOU matching, the unconfirmed-state trackers unconfirmed tracker and the unmatched trackers unmatched tracker together form the IoU_track_candidate set, and the trackers in the IoU_track_candidate set are matched with the unmatched detection boxes unmatched_detection.
In this embodiment, the number of unconfirmed state trackers is 3, and the number of trackers that do not match is 0, so the number of IoU _track_candidates is 3, and the number of detection boxes that do not match is 3.
In this embodiment, after the IOU is matched, 12 pairs of matched pairs matches, 2 unmatched trackers unmatched tracker, and 2 unmatched detection boxes unmatched detection are finally obtained.
For the 12 successfully matched pairs, the parameters of the corresponding Kalman filters, including the motion variables and hit counts, are updated with the detection results, and the convolutional neural network features of the detection frames are stored in the feature sets of the corresponding trackers. In this embodiment, 11 trackers have hit X times in succession (X = 3) and hold the confirmed state; 1 tracker has hit only 2 times, so it is kept in the unconfirmed state.
For 2 unmatched trackers unmatched tracker, if this is also an unacknowledged state, it is directly deleted from the tracker list. In this embodiment, 2 unmatched trackers will be deleted from the tracker list.
For the unmatched detection result unmatched _detection, a new tracker is created for it. In this embodiment, 2 unmatched detection results create 2 new trackers.
Updating the feature set of the identified tracker. In this embodiment, 11 trackers have acknowledged status. The tracker feature set holds at most the deep convolution network features of the most recently matched 100 frame detection results.
Thread 4 outputs and displays the detection results (solid rectangles) and the 11 confirmed-state trackers (dashed rectangles), as shown in fig. 6d. When a detection frame corresponding to a confirmed-state tracker lies within the preset alert range, an early warning signal is generated; the signal can trigger the stacker's buzzer and/or indicator lamp to issue a prompt.
The steps of the method disclosed in this embodiment may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software units in the decoding processor. The software unit may be in a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, a register, etc. which are mature in the art; the storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
In practical application, the pedestrian anti-collision early warning method can be realized through a computer program, such as application software and the like; or the method may also be implemented as a medium storing a related computer program, e.g., a usb disk, a cloud disk, etc.; still alternatively, the method may be implemented by a physical device, e.g., a chip, a mobile smart device, etc., integrated with or installed with a related computer program.
The present embodiment also provides an apparatus, including:
a memory;
A processor; and
A computer program;
Wherein the computer program is stored in the memory and configured to be executed by the processor to implement a method as in any of the preceding examples.
The specific implementation of the computer program in the device may refer to method embodiments, which are not described herein.
The memory is used for storing a computer program, and the processor executes the computer program after receiving the execution instruction, and the method executed by the apparatus for defining a flow disclosed in the foregoing corresponding embodiment may be applied to the processor or implemented by the processor.
The memory may comprise high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface (wired or wireless), which may use the Internet, a wide area network, a local network, a metropolitan area network, etc.
The processor may be an integrated circuit chip with signal processing capability. In implementation, each step of the method disclosed in the first embodiment may be completed by an integrated hardware logic circuit in the processor or by instructions in software form. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic diagrams disclosed in the embodiments of the present invention may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In this example, the processor may alternatively be a video processor.
The present embodiment also provides a computer-readable storage medium having a computer program stored thereon; the computer program is executed by a processor to implement a method as in any of the preceding examples.
The embodiment also provides a stacker, as shown in fig. 8, comprising a vehicle body, a looking-around camera 1, a buzzer 3 and the device 2 as in the previous example; the looking around camera 1, the equipment 2 and the buzzer 3 are mounted to the vehicle body; the looking around camera 1 and the buzzer 3 are in communication connection with the equipment 2.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A pedestrian anti-collision early warning method, characterized by comprising the following steps:
acquiring video data captured by a surround-view camera;
decoding the video data, and placing the decoded data into an image data queue;
detecting the latest image frame in the image data queue based on a pre-trained pedestrian target detection model to obtain a detection result; judging whether the image acquired from the image data queue is a first frame image; if the image is not the first frame image, performing target prediction by using a Kalman filter according to the tracking result in the previous frame image to obtain a prediction result for the current frame image;
matching the prediction result with a corresponding detection result to obtain a matching result;
judging whether to perform early warning according to the matching result;
wherein detecting the latest image frame in the image data queue based on the pre-trained pedestrian target detection model to obtain the detection result comprises:
acquiring the latest image frame in the image data queue;
detecting the latest image frame based on the pre-trained target detection model to obtain detection frames in the latest image frame;
screening, from the obtained detection frames, the detection frames whose height is greater than a minimum height threshold and whose confidence is greater than a minimum confidence threshold;
merging the screened detection frames by non-maximum suppression to obtain the detection result;
wherein matching the prediction result with the corresponding detection result to obtain the matching result comprises:
acquiring the confirmed-state trackers among the plurality of trackers of the prediction result;
performing cascade matching between the confirmed-state trackers and the detection frames corresponding to the detection result to obtain matched pairs, unmatched trackers and unmatched detection frames;
wherein the cascade matching between the confirmed-state trackers and the detection frames corresponding to the detection result comprises:
determining a minimum cosine distance matrix between the deep convolutional feature of each new detection result of the current frame and the feature set stored by each tracker in the layer;
determining the Mahalanobis distance between the prediction result and the detection result of each tracker, and setting, in the minimum cosine distance matrix, the cosine distance values of trackers whose Mahalanobis distance is larger than the corresponding threshold to infinity, to obtain a processed minimum cosine distance matrix;
taking the processed minimum cosine distance matrix as the input of the Hungarian algorithm to obtain a linear matching result;
removing, from the linear matching result, matched pairs whose difference meets a preset rejection condition, to obtain the matched pairs, unmatched trackers and unmatched detection frames after primary processing;
wherein after obtaining the matched pairs, unmatched trackers and unmatched detection frames after the primary processing, the method further comprises:
determining an IOU tracker candidate set from the unmatched trackers and the unconfirmed-state trackers;
determining the IOU distances between the candidate trackers in the IOU tracker candidate set and the unmatched detection frames to obtain a cost matrix;
setting each entry of the cost matrix that is larger than a corresponding reset threshold to the reset threshold, to obtain a processed cost matrix;
performing matching based on the Hungarian algorithm and the processed cost matrix;
deleting, from the obtained matching result, combinations whose IOU is smaller than a corresponding deletion threshold, to obtain the reprocessed matched pairs, unmatched trackers and unmatched detection frames;
wherein judging whether to perform early warning according to the matching result comprises:
if a confirmed-state tracker exists in the matching result and the detection frame of the confirmed-state tracker is located within a preset warning range, generating an early warning signal, the early warning signal being used for triggering the early warning.
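For illustration only, the following minimal Python sketch shows one plausible implementation of the detection post-processing recited in claim 1: screening detection frames by height and confidence, then merging the survivors with non-maximum suppression. The function name, array layout and threshold values are assumptions made for the example, not values taken from the application.

```python
import numpy as np

def filter_and_merge_detections(boxes, scores, min_height=40, min_conf=0.5, iou_thresh=0.45):
    """Screen detection boxes by height and confidence, then merge with greedy NMS.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    All thresholds here are illustrative placeholders, not the patent's values.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Keep boxes taller than the minimum height and above the confidence floor.
    heights = boxes[:, 3] - boxes[:, 1]
    keep = (heights > min_height) & (scores > min_conf)
    boxes, scores = boxes[keep], scores[keep]

    # Greedy non-maximum suppression on the surviving boxes.
    order = scores.argsort()[::-1]
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # IOU of the current top-scoring box against the remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]
    return boxes[kept], scores[kept]
```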
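The cascade matching layer of claim 1 can likewise be sketched: a minimum cosine distance matrix over each tracker's stored feature gallery, gated by the Mahalanobis distance, then solved with the Hungarian algorithm (here scipy's `linear_sum_assignment`). The gate values are the usual DeepSORT-style defaults and are assumptions, as is the large constant standing in for "infinity" so the solver stays numerically well-behaved.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

GATE = 1e5  # large stand-in for "infinity" so the assignment cost stays finite

def cascade_match(trackers, det_feats, maha_dists, maha_gate=9.4877, cos_gate=0.2):
    """One cascade-matching layer: appearance cost gated by motion plausibility.

    trackers: objects with a `features` gallery of shape (K, 128), L2-normalised;
    det_feats: (D, 128) L2-normalised detection embeddings;
    maha_dists: (T, D) Mahalanobis distances between each tracker's Kalman
    prediction and each detection. Gate values are assumed defaults, not the
    thresholds of the application.
    """
    n_trk, n_det = len(trackers), det_feats.shape[0]

    # Minimum cosine distance between each tracker's gallery and each detection.
    cost = np.zeros((n_trk, n_det))
    for t, trk in enumerate(trackers):
        cost[t] = (1.0 - trk.features @ det_feats.T).min(axis=0)

    # Mahalanobis gating: motion-implausible pairs become effectively unmatchable.
    cost[maha_dists > maha_gate] = GATE

    rows, cols = linear_sum_assignment(cost)
    matches = []
    um_trk, um_det = set(range(n_trk)), set(range(n_det))
    for r, c in zip(rows, cols):
        if cost[r, c] < cos_gate:  # reject pairs whose appearance cost is too high
            matches.append((r, c))
            um_trk.discard(r)
            um_det.discard(c)
    return matches, sorted(um_trk), sorted(um_det)
```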
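The second matching stage of claim 1 (the IOU fallback over leftover trackers and detections) could then look as follows; `reset_thresh` and `delete_iou` are hypothetical names for the claim's reset and deletion thresholds, with assumed values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou(a, b):
    """IOU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iou_match(cand_boxes, det_boxes, reset_thresh=0.7, delete_iou=0.3):
    """Associate leftover trackers with leftover detections by IOU distance.

    cand_boxes: predicted boxes of the IOU candidate trackers;
    det_boxes: unmatched detection boxes. Threshold values are assumptions.
    """
    if len(cand_boxes) == 0 or len(det_boxes) == 0:
        return []
    cost = np.array([[1.0 - box_iou(a, b) for b in det_boxes] for a in cand_boxes])
    cost[cost > reset_thresh] = reset_thresh  # clamp hopeless pairs to the reset threshold
    rows, cols = linear_sum_assignment(cost)
    # Delete matches whose IOU falls below the deletion threshold.
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= delete_iou]
```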
2. The pedestrian anti-collision early warning method of claim 1, further comprising, after obtaining the detection result: extracting a 128-dimensional feature vector of the image block corresponding to each detection frame in the detection result based on a convolutional neural network.
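A minimal sketch of the 128-dimensional appearance embedding of claim 2, assuming a toy PyTorch backbone: the application only fixes the output dimensionality, so the architecture, input crop size and layer widths below are illustrative stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceEmbedder(nn.Module):
    """Toy re-identification backbone producing a 128-d unit-norm embedding."""

    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # global pooling to a 64-d descriptor
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, crops):              # crops: (N, 3, 128, 64) pedestrian patches
        feats = self.conv(crops).flatten(1)
        # L2-normalise so downstream cosine distances are well-defined.
        return F.normalize(self.fc(feats), dim=1)
```

In use, each detection frame would be cropped from the image, resized to the assumed 64x128 patch, and batched through the network; the resulting unit vectors feed the minimum cosine distance matrix of claim 1.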
3. The pedestrian anti-collision early warning method of claim 1, wherein after judging whether the image acquired from the image data queue is the first frame image, if the image is the first frame image, a tracker is created for each detection frame of the first frame image, and each created tracker is set to an unconfirmed state.
4. The pedestrian anti-collision early warning method of claim 1, further comprising:
for the matched pairs obtained from the two matching stages:
updating the parameters of the corresponding Kalman filter according to the detection result corresponding to the matched pair;
storing the deep features of the detection results corresponding to the matched pairs into the feature sets of the corresponding trackers;
wherein the parameters of the Kalman filter comprise a hit count; updating an unconfirmed-state tracker among the matched pairs whose consecutive hit count reaches a count threshold to a confirmed-state tracker;
for the unmatched trackers obtained from the reprocessing:
if the unmatched tracker is an unconfirmed-state tracker, deleting it from the tracker list;
if the unmatched tracker is a confirmed-state tracker and no corresponding detection result has been matched for a preset number of consecutive frames, determining that the unmatched tracker is invalid and deleting it from the tracker list;
for the unmatched detection frames obtained from the reprocessing:
creating a new tracker for each unmatched detection frame.
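Claim 3, claim 4 and the feature-set rule of claim 5 below together describe a DeepSORT-style track lifecycle. A minimal sketch, assuming typical values for the hit threshold (`n_init`), the miss budget (`max_age`) and the gallery length (the "preset number of frames" of claim 5); the Kalman-filter state update that claim 4 also requires is omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass, field

TENTATIVE, CONFIRMED = 0, 1

@dataclass
class Track:
    track_id: int
    hits: int = 1        # consecutive frames with a matched detection
    misses: int = 0      # consecutive frames without a match
    state: int = TENTATIVE
    # Appearance gallery of recent matched detections; length is assumed.
    features: deque = field(default_factory=lambda: deque(maxlen=100))

def update_tracks(tracks, matches, unmatched_track_idxs, unmatched_det_idxs,
                  det_feats, next_id, n_init=3, max_age=30):
    """Per-frame lifecycle update over the matching results."""
    for t_idx, d_idx in matches:
        trk = tracks[t_idx]
        trk.hits += 1
        trk.misses = 0
        trk.features.append(det_feats[d_idx])   # grow the tracker's feature set
        if trk.state == TENTATIVE and trk.hits >= n_init:
            trk.state = CONFIRMED               # promote after n_init straight hits
    survivors = []
    for idx, trk in enumerate(tracks):
        if idx in unmatched_track_idxs:
            trk.misses += 1
            # Unconfirmed tracks are dropped immediately; confirmed tracks only
            # after max_age consecutive unmatched frames.
            if trk.state == TENTATIVE or trk.misses > max_age:
                continue
        survivors.append(trk)
    for d_idx in unmatched_det_idxs:            # each leftover detection seeds a track
        survivors.append(Track(track_id=next_id))
        next_id += 1
    return survivors, next_id
```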
5. The pedestrian anti-collision early warning method of claim 4, wherein the feature set of each confirmed-state tracker is updated according to the updated confirmed-state trackers and the tracker list;
and the feature set stores the deep convolutional network features of the detection results of a preset number of frames matched with the corresponding tracker.
6. The pedestrian anti-collision early warning method of claim 1, further comprising: generating a display signal according to the confirmed-state tracker, the display signal being used for triggering display of the corresponding detection result or tracking result.
7. The pedestrian anti-collision early warning method of claim 1, wherein the method comprises:
decoding the video data by using a first thread, and placing the decoded data into the image data queue;
detecting the latest image frame in the image data queue by using a second thread, based on the pre-trained pedestrian target detection model, to obtain the detection result;
extracting, by using the second thread and based on a convolutional neural network, the 128-dimensional feature vectors of the image blocks corresponding to the detection frames in the detection result;
judging, by using a third thread, whether the latest image acquired from the image data queue is a first frame image; if the image is not the first frame image, performing target prediction by using the Kalman filter according to the tracking result in the previous frame image to obtain the prediction result for the current frame image;
matching the detection result with the prediction result by using the third thread to obtain the matching result;
and judging, by using a fourth thread, whether to perform early warning according to the matching result.
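A minimal sketch of the four-thread split in claim 7, assuming OpenCV for decoding and standard-library queues as the image-data and detection buffers. `detect_fn`, `embed_fn`, the tracker and `alarm_fn` are hypothetical placeholders for the components sketched alongside the earlier claims; the warning-range check of claim 1 is folded into `alarm_fn`.

```python
import queue
import threading
import cv2  # assumed decoder; any frame source with a read() method would do

frames = queue.Queue(maxsize=8)      # image data queue (thread 1 -> thread 2)
detections = queue.Queue(maxsize=8)  # detections + embeddings (thread 2 -> thread 3)

def decode_worker(src=0):
    """Thread 1: decode the surround-view stream into the image data queue."""
    cap = cv2.VideoCapture(src)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if frames.full():            # keep only the latest frames, drop stale ones
            try:
                frames.get_nowait()
            except queue.Empty:
                pass
        frames.put(frame)
    cap.release()

def detect_worker(detect_fn, embed_fn):
    """Thread 2: detect pedestrians on the newest frame and embed each crop."""
    while True:
        frame = frames.get()
        boxes = detect_fn(frame)
        detections.put((frame, boxes, embed_fn(frame, boxes)))

def track_worker(tracker, alarm_fn):
    """Threads 3 and 4: Kalman predict + two-stage matching, then trigger the
    buzzer when a confirmed track lies inside the preset warning range."""
    while True:
        frame, boxes, feats = detections.get()
        tracks = tracker.step(boxes, feats)  # predict -> cascade match -> IOU match
        alarm_fn(tracks)                     # warning-range check + buzzer trigger

# Trivial stubs so the sketch runs end to end; replace with real components.
detect_fn = lambda frame: []
embed_fn = lambda frame, boxes: []
class _NullTracker:
    def step(self, boxes, feats):
        return []

threading.Thread(target=decode_worker, daemon=True).start()
threading.Thread(target=detect_worker, args=(detect_fn, embed_fn), daemon=True).start()
threading.Thread(target=track_worker, args=(_NullTracker(), lambda t: None), daemon=True).start()
```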
8. An apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-7.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon; the computer program, when executed by a processor, implements the method of any of claims 1-7.
10. A stacker, comprising a vehicle body, a surround-view camera, a buzzer and the apparatus of claim 8; the surround-view camera, the apparatus and the buzzer are mounted on the vehicle body; and the surround-view camera and the buzzer are communicatively connected to the apparatus.
CN202110052548.0A 2021-01-15 2021-01-15 Pedestrian anti-collision early warning method, device, storage medium and stacker Active CN112784725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110052548.0A CN112784725B (en) 2021-01-15 2021-01-15 Pedestrian anti-collision early warning method, device, storage medium and stacker

Publications (2)

Publication Number Publication Date
CN112784725A (en) 2021-05-11
CN112784725B (en) 2024-06-07

Family

ID=75756785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110052548.0A Active CN112784725B (en) 2021-01-15 2021-01-15 Pedestrian anti-collision early warning method, device, storage medium and stacker

Country Status (1)

Country Link
CN (1) CN112784725B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256689B (en) * 2021-06-08 2021-10-12 南京甄视智能科技有限公司 High-altitude falling object detection method and device
CN114313728B (en) * 2021-12-20 2024-06-07 北京东土科技股份有限公司 Anti-collision system of tunnel stacker
CN117834942A (en) * 2023-12-15 2024-04-05 山东爱特云翔信息技术有限公司 Motion detection video recording method based on the ViBe algorithm
CN117671296A (en) * 2023-12-19 2024-03-08 珠海市欧冶半导体有限公司 Target tracking method, apparatus, computer device, and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204644A (en) * 2016-07-01 2016-12-07 屈桢深 Video-based long-term target tracking method
CN106986272A (en) * 2017-02-24 2017-07-28 北京航天自动控制研究所 Container truck anti-hoisting method and system based on machine vision tracking
CN108932479A (en) * 2018-06-06 2018-12-04 上海理工大学 Human body anomaly detection method
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on deep features
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on deep appearance features and adaptive aggregation network
CN109919981A (en) * 2019-03-11 2019-06-21 南京邮电大学 Multi-object tracking method based on Kalman-filtering-assisted multi-feature fusion
CN110110649A (en) * 2019-05-02 2019-08-09 西安电子科技大学 Alternative human face detection method based on velocity direction
CN110414447A (en) * 2019-07-31 2019-11-05 京东方科技集团股份有限公司 Pedestrian tracking method, device and equipment
CN111488795A (en) * 2020-03-09 2020-08-04 天津大学 Real-time pedestrian tracking method applied to unmanned vehicle

Also Published As

Publication number Publication date
CN112784725A (en) 2021-05-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant