WO2023065395A1 - Work vehicle detection and tracking method and system - Google Patents

Work vehicle detection and tracking method and system

Info

Publication number
WO2023065395A1
Authority
WO
WIPO (PCT)
Prior art keywords
work vehicle
tracking
matching
image
detection
Prior art date
Application number
PCT/CN2021/127840
Other languages
French (fr)
Chinese (zh)
Inventor
刘世望
袁希文
林军
康高强
游俊
王泉东
丁驰
袁浩
徐阳翰
岳伟
熊群芳
Original Assignee
中车株洲电力机车研究所有限公司
Priority date
Filing date
Publication date
Application filed by 中车株洲电力机车研究所有限公司
Publication of WO2023065395A1

Classifications

    • G06N 3/0464 — Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/246 — Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 — Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V 10/42 — Extraction of image or video features; global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/762 — Image or video recognition using pattern recognition or machine learning; clustering, e.g. of similar faces in social networks
    • G06V 10/764 — Image or video recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 — Image or video recognition using pattern recognition or machine learning; using neural networks
    • G06V 20/54 — Scene-specific elements; surveillance or monitoring of traffic, e.g. cars on the road, trains or boats
    • Y02T 10/40 — Climate change mitigation technologies related to transportation; internal combustion engine [ICE] based road transport; engine management systems

Definitions

  • The invention relates to the technical field of visual detection and tracking, and in particular to a detection and tracking method and system for a work vehicle.
  • The roadside sensing module is required to support traffic flow statistics and detection of vehicle intrusion, stopping, wrong-way driving, deceleration, and lane changes.
  • Radar, however, cannot obtain visual information such as environmental color and texture, which limits its ability to judge target types.
  • Vision-based multi-target real-time detection and tracking technology can be applied to roadside traffic flow statistics and to vehicle intrusion, stopping, and wrong-way detection, and can also compensate for the limited perception of on-board radar. Owing to its low cost, rich perceptual information, and vision comparable to a human driver's, it has become an important component of the mining-truck unmanned driving system and an indispensable core technology for the system's intelligent perception.
  • Multi-target real-time detection and tracking has long been a hot research topic in autonomous driving, industrial inspection, and related fields, and researchers have studied it extensively.
  • Taking the 2012 introduction of convolutional neural networks as a watershed, multi-target real-time detection and tracking technology can be divided into two main directions: traditional visual analysis and visual deep learning.
  • In the traditional visual analysis direction, multi-target detection and tracking is carried out by manually selecting or designing image features, combined with machine learning and other methods.
  • The main methods are: 1) target-model-based methods, which model the target's appearance and then search for it in subsequent frames; examples include region matching, feature-point tracking, active contours, and optical flow. The most commonly used is feature matching: target features (e.g., SIFT, SURF, Harris corners) are extracted, and the most similar features are then located in subsequent frames for target positioning. 2) Search-based methods: researchers found that model-based methods must process the entire frame, resulting in poor real-time performance.
  • a prediction algorithm is added to search for targets close to the predicted value, narrowing the search range, and improving the real-time tracking performance.
  • Commonly used prediction algorithms include Kalman filter and particle filter.
  • Another method to narrow the search range is the kernel method: it applies the principle of steepest descent, iterating over the target template along the gradient-descent direction until the optimal position is reached; mean-shift and CamShift are examples.
  • However, manually selected or designed image features have poor robustness, and machine learning methods themselves have inherent defects.
  • Traditional visual analysis is easily affected by image quality, occlusion, target rotation, and many other factors, and has limited practical use; in complex mine environments in particular, vehicle targets closely resemble the image background, and traditional visual analysis technologies cannot effectively distinguish vehicles from it.
  • In the visual deep learning direction, convolutional neural networks are generally used to extract image features, which largely overcomes the shortcomings of manual feature selection.
  • Optimizing the network parameters with the backpropagation algorithm and training the deep network model on massive image data can effectively reduce the impact of image quality, occlusion, and target rotation.
  • To overcome the adverse effects of occlusion, target rotation, and camera shake, researchers proposed a tracking method based on multi-level convolution filtering features.
  • The algorithm uses principal component analysis feature vectors obtained by hierarchical learning, evaluates the similarity between features with the Bhattacharyya distance, and finally performs target tracking with a particle filter. However, lacking real-time detection information, its tracking error cannot be corrected in time and gradually grows, resulting in poor tracking stability and persistence.
  • The deep-learning "detect first, then track" framework has therefore gradually become mainstream:
  • a detection model first obtains target bounding boxes, and trajectory prediction and tracking are then performed from the relationship between consecutive frames.
  • The classic representative of this framework is DeepSORT, which performs detection with a candidate-region-based framework, adds deep-learning features on top of SORT's fast IOU matching, and performs target tracking with a similarity metric computed as the cosine distance between detection and tracking features.
  • However, the detection network it uses is structurally complex and deep, so its real-time performance is poor.
  • To improve real-time tracking, researchers proposed, on the basis of the end-to-end detection framework YOLO, a multi-vehicle detection and tracking method that incorporates appearance features: Kalman filtering tracks each target's motion state, and association matching is completed by computing position and feature losses. This method improves tracking speed to a certain extent, but real-time performance still cannot meet the requirements.
  • Meanwhile, current deep-learning target tracking methods mainly track multiple targets of a single type, or multiple targets without distinguishing categories, whereas the mining-truck unmanned driving system requires simultaneous tracking of multiple categories of work vehicles, placing even higher demands on the tracker.
  • The purpose of the present invention is to solve the above problems by providing a work vehicle detection and tracking method and system (hereinafter sometimes abbreviated as the work vehicle detection and tracking method and system), aimed at unstructured roads with complex surfaces, small differences between targets and the image background, and vehicles of variable size and diverse type.
  • Through gamma image enhancement, multi-scale fusion prediction, multi-source information cascade matching, and other means, it realizes real-time detection and tracking of work vehicles and obtains information on vehicle category, size, position, quantity, and trajectory.
  • The present invention provides a work vehicle detection and tracking method: an image is acquired, and an image enhancement method performs image enhancement processing on it;
  • the enhanced image is input into a pre-trained work vehicle detection model for target detection, obtaining target detection results.
  • The work vehicle detection model adopts a deep learning target detection framework, extracting work vehicle image features through a convolutional neural network to obtain multi-type work vehicle detection results, which are input into the work vehicle tracking model;
  • the work vehicle tracking model obtains tracking targets and tracking trajectories through a work vehicle tracking method based on cascade matching of motion information and appearance features, realizing multi-type work vehicle tracking.
  • When analyzing images of the complex mine environment, the image enhancement method is used to enhance images in which the work vehicle resembles the mine background. This effectively solves the problem of low image recognition accuracy caused by high target-background similarity and inconspicuous grayscale contrast in complex mine environments, improving the efficiency of work vehicle detection and recognition and the real-time performance of work vehicle tracking.
  • When detecting work vehicles, the detection model uses a convolutional neural network to extract work vehicle image features from the image, which avoids the errors introduced by manual feature selection; moreover, the model can be trained on massive image data, so that the trained detection model better matches work vehicle image features, improving detection efficiency and accuracy.
  • During target tracking, the work vehicle tracking model can perform cascade matching of motion information and appearance features for multiple vehicle types according to their detection results, realizing multi-type target tracking and ultimately enabling the present invention to track work vehicles of different sizes and types in a complex mine environment.
  • The work vehicle detection model uses the YOLO framework as its deep learning target detection framework, applies a genetic algorithm to optimize the network hyperparameters, and outputs multi-layer prediction modules; the model constructs its regression loss function with DIOU and obtains work vehicle detection boxes through the K-means clustering algorithm.
  • Through the end-to-end YOLO deep learning framework, the work vehicle detection model can directly output the position and type information of detected targets, improving detection speed and hence the real-time performance of work vehicle tracking.
  • The genetic algorithm optimizes the network hyperparameters and multi-layer prediction modules are output for work vehicles of different sizes, which meets the detection requirements of differently sized vehicles and improves detection efficiency; gradient changes are integrated into the work vehicle feature maps, reducing the model weights and greatly improving the accuracy of work vehicle image feature recognition.
  • The work vehicle tracking model uses a Kalman filter to predict and update work vehicle tracking trajectories; on the basis of IOU matching, the cascade matching of motion information and appearance features performs work vehicle motion-information association and appearance-feature association. Motion-information association uses the Mahalanobis distance to evaluate the degree of motion-state correlation, appearance association uses the cosine distance to evaluate the degree of appearance-feature correlation, and the two distances are combined by a comprehensive metric formula into a cascade matching metric that evaluates the overall association.
  • the tracking trajectory of the working vehicle is predicted and updated through the Kalman filter, and the tracking trajectory is matched with the working vehicle through IOU matching.
  • The appearance feature vector of the work vehicle is introduced for matching, and a work vehicle is judged correctly associated with its tracking track only when the similarity measures of both the Mahalanobis distance and the cosine distance are satisfied. This reduces incorrect matches between tracking tracks and occluders when a work vehicle is occluded for a long time in a complex mine environment, greatly improving the accuracy of association between detection targets and their tracking trajectories.
  • IOU matching can solve the short-term occlusion problem of the working vehicle during the tracking process.
  • If the work vehicle tracking model fails to match a predicted work vehicle trajectory within the predefined maximum frame-count threshold, tracking of that vehicle is terminated: the lost vehicle's track is deleted, improving work vehicle tracking efficiency.
  • The work vehicle tracking model stores the feature maps of successfully associated work vehicles in the corresponding work vehicle feature image library, and uses the work vehicle feature network to extract feature vectors from these successfully associated feature maps; the feature image library has a fixed storage threshold, and feature maps are updated according to the data association time.
  • In this way, cosine distances can be calculated from the feature vectors of work vehicles successfully matched to tracking trajectories, which speeds up the matching of trajectories to vehicles and improves the real-time performance of work vehicle tracking.
  • the work vehicle tracking method based on cascade matching of motion information and appearance features further includes:
  • the image enhancement method is gamma transformation or histogram equalization.
  • Gamma transformation is used to enhance the image, mapping low gray values in a narrow range to high gray values in a wide range, so that the gray distribution of the enhanced image is more balanced and the details of dark regions are richer.
  • This mitigates the high similarity between image targets and background and the weak gray contrast in complex mine images, improving the recognition rate of image feature extraction.
  • the present invention also provides a work vehicle detection and tracking system, including a work vehicle detection module for obtaining work vehicle detection results and a work vehicle tracking module for tracking multiple types of work vehicles;
  • The work vehicle detection module includes an image processing unit, an image feature extraction unit, and a work vehicle detection unit; the vehicle tracking module includes a trajectory tracking unit, a data association unit, and a feature image storage unit;
  • the image processing unit applies an image enhancement method to the input image and transmits the enhanced image to the image feature extraction unit;
  • the image feature extraction unit extracts work vehicle image features from the image through a convolutional neural network and transmits them to the work vehicle detection unit;
  • the work vehicle detection unit performs target detection on these image features and transmits the obtained work vehicle detection results to the trajectory tracking unit;
  • the trajectory tracking unit uses the detection results to predict and update the tracking trajectory of the corresponding work vehicle and transmits the trajectories to the data association unit for cascade matching;
  • the feature image storage unit has work vehicle feature image libraries of different types of work vehicles, and the work vehicle feature image library is provided with a fixed storage threshold, according to data association time The feature map of the work vehicle is updated.
  • The vehicle tracking module further includes a work vehicle feature vector extraction unit, which uses the work vehicle feature network to extract work vehicle feature vectors from the feature maps held in the feature image storage unit.
  • The work vehicle tracking method based on cascade matching of motion information and appearance features further includes: obtaining the target detection results and predicting trajectories with a Kalman filter;
  • the present invention has the following beneficial effects: it provides a method and system for detecting and tracking an operating vehicle that combines an operating vehicle detection model based on deep learning with an operating vehicle tracking model based on cascade matching of motion information and appearance features.
  • The image enhancement method enhances images of the complex mine environment to improve image clarity and resolution, and in turn the accuracy and timeliness of work vehicle detection.
  • The work vehicle detection model uses an improved end-to-end YOLO as its deep learning target detection framework, constructs the vehicle box loss function with DIOU, and clusters work vehicle predicted boxes through K-means to obtain detection boxes that conform to work vehicle image characteristics.
  • Real-time detection performance of work vehicles is further improved by optimizing the network hyperparameters through a genetic algorithm.
  • The work vehicle tracking model in this application combines the work vehicle's motion information with multi-layer deep appearance features to perform cascade matching, and uses Kalman filtering and the IOU-matching-based Hungarian algorithm to associate work vehicles with tracking trajectories, thereby realizing real-time tracking of multiple types of work vehicles.
  • The proposed method can adapt to the image scene and effectively complete real-time detection and tracking of multiple types of work vehicles in complex mine environments.
  • FIG. 1 is a system configuration diagram showing an embodiment of a work vehicle detection and tracking system of the present invention.
  • Fig. 2 is a flow chart showing an embodiment of the work vehicle detection and tracking method of the present invention.
  • FIG. 3 is a comparison diagram showing the effects of gamma transform enhanced images.
  • FIG. 4 is a network configuration diagram showing a work vehicle detection model.
  • FIG. 5 is a flow chart illustrating a work vehicle tracking method.
  • FIG. 6 is a network parameter table showing a characteristic network of a work vehicle.
  • An embodiment of a work vehicle detection and tracking system is disclosed herein; as shown in FIG. 1, it includes a work vehicle detection module and a work vehicle tracking module.
  • the work vehicle detection module includes an image processing unit, an image feature extraction unit and a work vehicle detection unit for obtaining work vehicle detection results;
  • the work vehicle tracking module includes a trajectory tracking unit, a data association unit, and a feature image storage unit for tracking detected work vehicles.
  • the working vehicle detection module and the working vehicle tracking module cooperate with each other, so as to realize the detection and tracking of the working vehicle in the complex mine environment.
  • FIG. 2 is a flow chart showing an embodiment of the method for detecting and tracking a work vehicle of the present invention. This embodiment will be further described below in conjunction with FIG. 1 and FIG. 2 .
  • the image processing unit uses gamma transformation to enhance the image in the complex mine environment, and maps the low gray value in a narrow range to the high gray value in a wide range.
  • Figure 3 is a comparison of images before and after gamma-transform enhancement. Comparing the grayscale distribution and pixel distribution before and after the transformation in Figure 3, it is clear that after gamma transformation the grayscale distribution of the image is more balanced, the pixel distribution is denser, and dark-region details are richer, thereby reducing the similarity between the work vehicle and the background image and improving the accuracy of work vehicle detection.
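As an illustrative sketch of this enhancement step (Python with OpenCV; the gamma value 0.5 and the file name are assumptions, not values given in the source), the transform can be implemented with a per-pixel lookup table:

```python
import cv2
import numpy as np

def gamma_enhance(image_bgr, gamma=0.5):
    """Gamma-transform enhancement: s = r**gamma on normalized gray values.

    gamma < 1 maps a narrow range of low gray values onto a wider range
    of high gray values, brightening dark regions of mine images.
    """
    # Build a 256-entry lookup table once; cv2.LUT applies it per pixel.
    table = np.array([((i / 255.0) ** gamma) * 255.0 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(image_bgr, table)

# Example: enhance one mine-area frame (path is illustrative).
frame = cv2.imread("mine_frame.jpg")
enhanced = gamma_enhance(frame, gamma=0.5)
```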
  • The enhanced mine-area image can be stored not only on the vehicle side but also on the ground server for image feature extraction and model training.
  • The image feature extraction unit uses the convolutional neural network to extract work vehicle image features from the enhanced image; the work vehicle detection unit then performs target detection based on the extracted features, obtaining the work vehicle detection results and outputting them to the work vehicle tracking model.
  • The image feature extraction unit is involved in both model training and actual operation: the module is first trained to learn the characteristics of work vehicles, and the trained module is then used in actual operation.
  • Figure 4 shows a schematic diagram of the network structure of the work vehicle detection model, which is further described below in conjunction with FIG. 4.
  • The work vehicle detection model includes a backbone network (Backbone) and a neck (Neck).
  • The input image is resized to 608×608×3.
  • The Focus structure then slices the adjusted image into a feature map of size 304×304×12.
  • This feature map is further processed to obtain feature maps at three different scales.
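A minimal sketch of Focus-style slicing, assuming the standard sample-every-second-pixel layout (the patent does not spell out the slice order):

```python
import numpy as np

def focus_slice(img):
    """Focus-style slicing: (H, W, C) -> (H/2, W/2, 4C).

    Samples every second pixel in a 2x2 grid and stacks the four
    sub-images on the channel axis, e.g. 608x608x3 -> 304x304x12.
    """
    return np.concatenate(
        [img[0::2, 0::2], img[1::2, 0::2], img[0::2, 1::2], img[1::2, 1::2]],
        axis=-1)

x = np.zeros((608, 608, 3), dtype=np.float32)
assert focus_slice(x).shape == (304, 304, 12)
```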
  • The neck of the work vehicle detection model performs convolution and concatenation operations on these feature maps to extract work vehicle image features.
  • A cross-stage partial network is used to alleviate the heavy computation, improving the real-time performance of image recognition, and gradient changes are integrated into the feature map, reducing the deep-learning weights while maintaining accuracy.
  • the neck of the work vehicle detection model adopts the structure of FPN and PAN.
  • The feature maps at three scales are convolved and concatenated to obtain prediction outputs of sizes 76×76×33, 38×38×33, and 19×19×33, and this three-scale model realizes the detection of different types of work vehicles.
  • In order to perform stable regression to the work vehicle ground-truth boxes and avoid training divergence of the detection model, DIOU is used to construct the regression loss function of the work vehicle.
  • The K-means clustering algorithm is used to cluster the predicted box sizes to obtain detection boxes that conform to work vehicle characteristics.
  • The ground-truth box is the box annotated on the image by a human; the predicted box is the box predicted by the network model.
  • The regression loss function for the work vehicle ground-truth boxes is constructed using DIOU (Distance-IoU loss).
  • DIOU can still provide a movement direction for the predicted box even when it does not overlap the ground-truth box.
  • DIOU loss can directly minimize the distance between two vehicle boxes, so it converges faster than GIOU loss.
  • A non-maximum suppression algorithm is used to filter the predicted boxes to obtain the final position and category of the work vehicle.
  • The formula for DIOU is as follows:

    DIOU = IOU − ρ²(b, b_gt) / c²

  • where b and b_gt represent the center points of the predicted box and the ground-truth box respectively, ρ(·) represents the Euclidean distance between the two center points, and c represents the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box.
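The DIOU computation can be sketched as follows — a minimal Python implementation of the formula above, where the (x1, y1, x2, y2) box format and the epsilon guards are assumptions:

```python
def diou(pred, gt):
    """DIOU between two boxes given as (x1, y1, x2, y2).

    DIOU = IOU - rho^2(b, b_gt) / c^2, where rho is the distance between
    box centers and c the diagonal of the smallest enclosing box.
    """
    # Intersection and union areas for the IOU term.
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # Squared center distance rho^2.
    cpx, cpy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cgx, cgy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2

    # Squared diagonal c^2 of the smallest enclosing box.
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return iou - rho2 / (c2 + 1e-9)

# The DIOU regression loss for a box pair is then 1 - diou(pred, gt).
```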
  • The regression loss function of the work vehicle detection model consists of a prediction-box loss (first row), target confidence losses (second and third rows), and a classification loss (fourth row).
  • Binary cross entropy with logits loss is used for the confidence and classification terms.
  • The prediction module has size S×S×B, where S×S is the number of prediction grid cells and B is the module depth.
  • The DIOU regression loss function is used to measure the error between the predicted box and the ground-truth box, and predicted boxes that do not conform to work vehicle image characteristics are filtered out. Specifically, a threshold is set and the error between each predicted box and the ground-truth box is calculated: if the error exceeds the threshold the box is filtered out, otherwise it is retained. During model training, training is completed once it meets the expected standard; for example, training ends when the model loss falls below an expected value (e.g., 1). The threshold is chosen according to the actual situation and experience.
  • Predicted boxes that meet the conditions are used as sample data for learning, so as to obtain detection boxes that conform to work vehicle characteristics; these detection boxes are output as the target detection result.
  • The target detection results include information such as the center coordinates, width, height, and category of the work vehicle in the detection box.
  • Preset vehicle category numbers identify the type of work vehicle: for example, an output of 1 indicates a truck and 2 indicates a command vehicle, and the type of the work vehicle in the detection box is identified by the output category number.
  • Conventionally, predicted boxes are obtained by multi-scale sliding-window traversal or selective search followed by positioning, or their sizes are set manually for position regression, but these approaches are often inefficient and ineffective.
  • In this embodiment, the K-means clustering algorithm is applied to mine-area work vehicle images, clustering with the IOU (Intersection over Union, obtained by dividing the overlap of two regions by their union; here the intersection ratio of the ground-truth box and the candidate box) of the ground-truth boxes as the "distance", so as to obtain predicted box sizes that conform to the feature distribution of mining vehicle images.
  • the K-means clustering steps are as follows:
  • Step 1: Take the width and height of each sample ground-truth box as a sample point (w_n, h_n), n ∈ {1, 2, …, N}, with box center (x_n, y_n), n ∈ {1, 2, …, N}; all sample points form the data set.
  • Step 2 Randomly select K sample points in the data set as cluster centers.
  • Step 3: Calculate the distance d between every sample point in the data set and each of the K cluster centers, and assign each sample point to the cluster center with the smallest d, yielding K point clusters; that is, all ground-truth boxes are classified into K categories. The distance d is computed from the intersection over union:

    d(box, centroid) = 1 − IOU(box, centroid)
  • Step 4: Recalculate the K cluster centers of the K point clusters, where N_m represents the number of sample points (i.e., ground-truth boxes) in the m-th cluster; the new center is the mean size of that cluster:

    W_m = (1/N_m) Σ w_i,  H_m = (1/N_m) Σ h_i

    Steps 3 and 4 are repeated until the cluster centers no longer change.
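A compact sketch of the clustering steps above, using d = 1 − IOU on width/height pairs; k, the iteration cap, and the random seed are illustrative choices, not values from the source:

```python
import numpy as np

def wh_iou(boxes, centroids):
    """IOU between (N, 2) width/height samples and (K, 2) centroids,
    with boxes aligned at a common corner (only sizes matter)."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / (union + 1e-9)

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """K-means over ground-truth box sizes with d = 1 - IOU as distance."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]   # step 2
    for _ in range(iters):
        d = 1.0 - wh_iou(wh, centroids)           # step 3: distance d
        assign = d.argmin(axis=1)                 # nearest cluster center
        new = np.array([wh[assign == m].mean(axis=0) if np.any(assign == m)
                        else centroids[m] for m in range(k)])   # step 4
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```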
  • In this embodiment, the work vehicle detection and tracking system detects the input mine video images through the work vehicle detection module and, after obtaining the work vehicle detection results, inputs them into the work vehicle tracking module, which tracks the detected work vehicles.
  • The trajectory tracking unit predicts the work vehicle's next motion trajectory through Kalman filtering according to the vehicle's current motion state, described by eight parameters (u, v, r, h, x′, y′, r′, h′): (u, v) are the center coordinates of the detection box, r is the ratio of the target's ordinate to its abscissa, h is the height, and x′, y′, r′, h′ are the corresponding velocities in image coordinates.
  • The Kalman filter takes the four parameters u, v, r, and h as observed variables and, adopting a uniform-velocity model and a linear observation model, observes the detected work vehicles and predicts their trajectories in the next frame of images.
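A minimal constant-velocity Kalman filter over the eight-parameter state, assuming unit time steps and illustrative noise covariances (the patent does not specify Q and R):

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over (u, v, r, h) plus
    their velocities; only (u, v, r, h) is observed."""

    def __init__(self):
        self.F = np.eye(8)                 # state transition
        self.F[:4, 4:] = np.eye(4)         # x_{t+1} = x_t + v_t (dt = 1)
        self.H = np.eye(4, 8)              # observe (u, v, r, h) only
        self.Q = np.eye(8) * 1e-2          # process noise (assumed)
        self.R = np.eye(4) * 1e-1          # measurement noise (assumed)

    def initiate(self, z):
        x = np.zeros(8)
        x[:4] = z                          # zero initial velocity
        return x, np.eye(8)

    def predict(self, x, P):
        x = self.F @ x
        P = self.F @ P @ self.F.T + self.Q
        return x, P

    def update(self, x, P, z):
        S = self.H @ P @ self.H.T + self.R           # innovation covariance
        K = P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (z - self.H @ x)
        P = (np.eye(8) - K @ self.H) @ P
        return x, P
```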
  • When the system tracks a work vehicle, the tracked vehicle serves as the detection target; a tracking track is defined as track k, a parameter a_k counts the number of image frames since track k was last matched to a detection target, and a maximum frame-count threshold A_max serves as the maximum life cycle of a track.
  • While the Kalman filter tracks the work vehicle in real time, all tracks k are kept in a track set, and each a_k is incremented as frames pass without a match for the corresponding track k. If track k successfully matches a detection target, track k is set to the confirmed state and a_k is reset to 0.
  • If track k fails to match a detection target, it is set to the unconfirmed state; when a_k exceeds the predefined maximum frame-count threshold A_max, track k is deleted from the track set and the Kalman filter re-predicts the work vehicle's trajectory. A newly predicted motion trajectory is treated as tentative during its first three frames; if no detection target is successfully matched within those three frames, the tentative tracks are deleted and tracking of the work vehicle is terminated.
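The track life-cycle bookkeeping described above might look like the following sketch; A_MAX = 30 is an assumed value, while the three-frame tentative window follows the text:

```python
from enum import Enum

class TrackState(Enum):
    TENTATIVE = 0   # first three frames after a new prediction
    CONFIRMED = 1
    DELETED = 2

class Track:
    """Track bookkeeping: a_k counts frames since the last match,
    A_MAX bounds the track life cycle, N_INIT confirms after 3 hits."""
    A_MAX, N_INIT = 30, 3   # A_MAX value is illustrative

    def __init__(self, track_id):
        self.track_id, self.a_k, self.hits = track_id, 0, 0
        self.state = TrackState.TENTATIVE

    def on_frame(self, matched):
        if matched:
            self.a_k, self.hits = 0, self.hits + 1
            if self.hits >= self.N_INIT:
                self.state = TrackState.CONFIRMED
        else:
            self.a_k += 1
            if self.state is TrackState.TENTATIVE or self.a_k > self.A_MAX:
                self.state = TrackState.DELETED
```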
  • The data association unit introduces vehicle depth-feature metrics when matching motion trajectories to detection targets, performs cascade matching combining the work vehicle's motion information and appearance features, and stores all confirmed matched work vehicle feature vectors in the feature image storage unit.
  • In cascade matching, the cosine distance between the detection target and the corresponding work vehicle feature vectors is calculated as the appearance correlation measure. When a detection target is occluded for a period of time, the uncertainty of the Kalman filter prediction grows greatly and observability in the state space becomes very low; the Mahalanobis distance then tends to favor the trajectory with greater uncertainty.
  • Fig. 5 is a flow chart of the working vehicle tracking method.
  • The work vehicle tracking method cascades the matching of motion state and appearance features in series with IOU matching and Kalman filtering to track the work vehicle.
  • the cascading matching of motion information and appearance features includes the association of motion information of the work vehicle and the association of appearance features of the work vehicle.
  • The data association unit determines that a detection target is correctly associated with a predicted trajectory only when both the cosine metric and the Mahalanobis metric are satisfied. If trajectory k successfully matches the detection target, the tracking model marks trajectory k as confirmed, outputs the tracked work vehicle and the corresponding trajectory k, and then updates the parameters of trajectory k.
  • Otherwise, the tracking model sets trajectory k to the unconfirmed state, re-performs the cascade matching of motion state and appearance features, and performs IOU matching among the unconfirmed trajectory k, unmatched trajectories, and unmatched detection targets, using the Hungarian algorithm again for assignment confirmation.
  • The work vehicle motion information association computes, through the Kalman filter, the Mahalanobis distance between the predicted motion state and the motion state of the work vehicle detected at the current moment, with the formula:

    l_m(i, j) = (d_j − y_i)ᵀ S_i⁻¹ (d_j − y_i)

  • where l_m represents the Mahalanobis distance, T the matrix transpose, i the i-th trajectory, S_i the covariance matrix of the Kalman filter's observation space at the current moment, y_i the predicted value for trajectory i at the current moment, and d_j the j-th detection; the motion state of the work vehicle is (u, v, r, h).
  • The Mahalanobis distance expresses the uncertainty of the state estimate by measuring how many standard deviations the detected position lies from the mean track, and the 0.95 quantile of the inverse chi-square distribution (that is, the quantile at a probability of 0.95) is used as a threshold t_m to filter out weak associations, with the filter function:

    g_m(i, j) = 1 if l_m(i, j) ≤ t_m, and 0 otherwise

  • Here the mean track is the mean of each Kalman-filter trajectory, and the Mahalanobis distance between the mean track and the actually detected vehicle box determines whether the box and the trajectory coincide: the farther the detection lies in standard deviations, the greater the uncertainty of the state estimate.
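A sketch of the Mahalanobis association and its 0.95-quantile gate, using scipy's chi-square quantile with 4 degrees of freedom for the (u, v, r, h) measurement — the usual reading of this gate, stated here as an assumption:

```python
import numpy as np
from scipy.stats import chi2

# 0.95 quantile of the chi-square distribution with 4 degrees of
# freedom (the (u, v, r, h) measurement space) as the gate threshold.
T_M = chi2.ppf(0.95, df=4)   # ~9.4877

def mahalanobis_gate(y_i, S_i, d_j):
    """l_m(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i) and its gate g_m."""
    diff = d_j - y_i
    l_m = float(diff.T @ np.linalg.inv(S_i) @ diff)
    return l_m, l_m <= T_M   # (distance, g_m(i, j))
```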
  • When associating detection targets with motion trajectories, the Mahalanobis distance is a good correlation measure provided the target's motion uncertainty is low.
  • However, camera motion during vehicle movement can invalidate the Mahalanobis distance measurement. Therefore, in this embodiment, when the Mahalanobis distance is used to associate work vehicle motion information, the vehicle's appearance features are also introduced, with the cosine distance serving as a similarity measure of appearance correlation; the two jointly measure the association between the detection target and the tracking track.
  • In summary, the Mahalanobis distance measures the position deviation from the mean track in standard deviations, representing the motion-state correlation between the work vehicle detection box and the tracking box, while the cosine distance computes the cosine values between the appearance feature vectors of the predicted trajectory and the detection result and takes the minimum as the degree of appearance relevance.
  • The feature image storage unit constructs a work vehicle appearance feature library for each tracked vehicle, storing the most recent successfully associated work vehicle feature vectors r_k^i for each vehicle, where k indexes the frame and at most the latest 125 frames are kept.
  • The stored work vehicle feature vectors are used to calculate the appearance correlation between the i-th predicted trajectory and the j-th detection result of the current frame, where the detection result refers to the appearance feature vector f_j of the detection target; the appearance correlation function and the corresponding filter function are:

    l_f(i, j) = min{ 1 − f_jᵀ r_k^i | r_k^i ∈ R_i }

    g_f(i, j) = 1 if l_f(i, j) ≤ t_f, and 0 otherwise

  • Here the cosine distance 1 − f_jᵀ r_k^i is computed between the two appearance feature vectors corresponding to the i-th predicted trajectory and the j-th detection result, and the appearance correlation function extracts the minimum cosine distance as the degree of appearance relevance; f denotes an appearance feature vector and l_f the appearance-feature distance.
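A sketch of the appearance metric — the minimum cosine distance between a detection's feature vector and a track's stored gallery; the gate threshold value is illustrative, since the patent leaves thresholds scene-dependent:

```python
import numpy as np

T_F = 0.2   # appearance gate threshold, chosen per scene (illustrative)

def appearance_distance(f_j, gallery_R_i):
    """l_f(i, j): minimum cosine distance between detection feature f_j
    and the stored gallery {r_k^i} of track i (vectors unit-normalized)."""
    f = f_j / np.linalg.norm(f_j)
    R = gallery_R_i / np.linalg.norm(gallery_R_i, axis=1, keepdims=True)
    l_f = float((1.0 - R @ f).min())
    return l_f, l_f <= T_F   # (distance, g_f(i, j))
```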
  • the feature vector of the work vehicle is extracted using the feature network of the work vehicle
  • FIG. 6 is a network parameter table of the feature network of the work vehicle.
  • the work vehicle feature network uses a residual network, which includes a convolutional layer, a maximum pooling layer, and six residual modules.
  • A global feature vector of dimension 128 is computed by the dense layer.
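A PyTorch sketch consistent with this description — one convolution, one max-pooling layer, six residual blocks, and a 128-dimensional dense embedding. Channel widths and strides are assumptions; FIG. 6's exact parameters are not reproduced here:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Basic residual block; channel or stride changes use a 1x1 projection."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_out))
        self.skip = (nn.Identity() if c_in == c_out and stride == 1 else
                     nn.Conv2d(c_in, c_out, 1, stride, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class VehicleFeatureNet(nn.Module):
    """Conv + max-pool + six residual blocks + 128-d dense embedding."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.blocks = nn.Sequential(
            Residual(32, 32), Residual(32, 32),
            Residual(32, 64, 2), Residual(64, 64),
            Residual(64, 128, 2), Residual(128, 128))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, 128))

    def forward(self, x):
        z = self.head(self.blocks(self.stem(x)))
        return nn.functional.normalize(z, dim=1)   # unit-norm embedding
```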
  • The Mahalanobis distance and the cosine distance are integrated to obtain the cascade matching metric I_{i,j}; the cascade matching metric function and the corresponding filter function are:

    I_{i,j} = λ · l_m(i, j) + (1 − λ) · l_f(i, j)

    g_{i,j} = Π_{x ∈ {m, f}} g_x(i, j)

  • where λ is the weight coefficient and x in the filter function ranges over the Mahalanobis metric m and the cosine metric f.
  • The threshold of the filter function is set according to the actual scene and experience; it is not a fixed value.
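Combining the two gated distances into the cascade metric I_{i,j}; λ = 0.5 is an illustrative weight, which the text leaves unspecified:

```python
LAMBDA = 0.5   # weight coefficient, tuned per scene (illustrative)

def cascade_metric(l_m, g_m, l_f, g_f, lam=LAMBDA):
    """I_{i,j} = lam * l_m + (1 - lam) * l_f, admissible only when both
    the Mahalanobis gate g_m and the cosine gate g_f pass."""
    admissible = g_m and g_f               # g_{i,j} = g_m * g_f
    cost = lam * l_m + (1.0 - lam) * l_f
    return cost, admissible
```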
  • The modules described herein may be implemented with a DSP (digital signal processor), an ASIC (application-specific integrated circuit), or an FPGA (field-programmable gate array).
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.
  • a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integrated into the processor.
  • the processor and storage medium can reside in an ASIC.
  • the ASIC may reside in a user terminal.
  • the processor and storage medium may reside as discrete components in the user terminal.
  • The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage media may be any available media that can be accessed by a computer.
  • Such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can carry or store desired program code and can be accessed by a computer. Any connection is also properly termed a computer-readable medium.
  • Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where a disk usually reproduces data magnetically, while a disc reproduces data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Abstract

A work vehicle detection and tracking method and system. An image enhancement method performs enhancement processing on images of a complex mine environment; a work vehicle detection model based on a deep-learning target detection framework obtains multi-type work vehicle detection results; and a work vehicle tracking model applies a tracking method based on cascade matching of motion information and appearance features to perform multi-type target tracking according to the detection results.

Description

Work vehicle detection and tracking method and system

Field of the Invention

The invention relates to the technical field of visual detection and tracking, and in particular to a detection and tracking method and system for a work vehicle.
Background

In 2020, eight ministries and commissions including the National Development and Reform Commission and the Ministry of Industry and Information Technology issued the "Guiding Opinions on Accelerating the Intelligent Development of Coal Mines", which explicitly calls for developing unmanned driving systems for open-pit mining trucks and strives to achieve "intelligent perception, intelligent decision-making, and automatic execution" by 2035. In the intelligent perception link of the mining-truck unmanned driving system, the roadside sensing module is required to support traffic flow statistics and detection of vehicle intrusion, stopping, wrong-way driving, deceleration, and lane changes. Meanwhile, in the on-board perception module, radar cannot obtain visual information such as environmental color and texture, which limits its ability to judge target types. Vision-based multi-target real-time detection and tracking technology can be applied to roadside traffic flow statistics and to vehicle intrusion, stopping, and wrong-way detection, and can also compensate for the limited perception of on-board radar. Owing to its low cost, rich perceptual information, and vision comparable to a human driver's, it has become an important component of the mining-truck unmanned driving system and an indispensable core technology for the system's intelligent perception.

Multi-target real-time detection and tracking has long been a hot research topic in autonomous driving, industrial inspection, and related fields, and researchers have studied it extensively. Taking the 2012 introduction of convolutional neural networks as a watershed, the technology can be divided into two main directions: traditional visual analysis and visual deep learning.

In the traditional visual analysis direction, multi-target detection and tracking is carried out by manually selecting or designing image features, combined with machine learning and other methods. The main methods are: 1) target-model-based methods, which model the target's appearance and then search for it in subsequent frames; examples include region matching, feature-point tracking, active contours, and optical flow. The most commonly used is feature matching: target features (e.g., SIFT, SURF, Harris corners) are extracted, and the most similar features are then located in subsequent frames for target positioning. 2) Search-based methods: researchers found that model-based methods must process the entire frame, resulting in poor real-time performance. A prediction algorithm is therefore added to search only for targets near the predicted value, narrowing the search range and improving real-time tracking; commonly used prediction algorithms include the Kalman filter and the particle filter. Another way to narrow the search is the kernel method, which applies the principle of steepest descent, iterating over the target template along the gradient direction until the optimal position is reached; mean-shift and CamShift are examples. However, manually selected or designed image features have poor robustness, and machine learning methods themselves have inherent defects. Traditional visual analysis is easily affected by image quality, occlusion, target rotation, and many other factors, and has limited practical use; in complex mine environments in particular, vehicle targets closely resemble the image background, and traditional methods cannot effectively distinguish vehicles from it.

In the visual deep learning direction, convolutional neural networks are generally used to extract image features, which largely overcomes the shortcomings of manual feature selection. Optimizing network parameters with backpropagation and training the deep model on massive image data effectively reduces the impact of image quality, occlusion, and target rotation. To overcome the adverse effects of occlusion, target rotation, and camera shake, researchers proposed a tracking method based on multi-level convolution filtering features: principal component analysis feature vectors obtained by hierarchical learning are compared with the Bhattacharyya distance to evaluate feature similarity, and a particle filter then performs target tracking. However, lacking real-time detection information, its tracking error cannot be corrected in time and gradually grows, resulting in poor tracking stability and persistence. For this reason, the deep-learning "detect first, then track" framework has gradually become mainstream: a detection model first obtains target bounding boxes, and trajectories are then predicted and tracked from the relationship between consecutive frames. The classic representative is DeepSORT, which performs detection with a candidate-region-based framework, adds deep-learning features on top of SORT's fast IOU matching, and tracks targets with a similarity metric computed as the cosine distance between detection and tracking features. Its detection network is structurally complex and deep, so real-time performance is poor. To improve real-time tracking, researchers proposed, based on the end-to-end detection framework YOLO, a multi-vehicle detection and tracking method that incorporates appearance features: Kalman filtering tracks each target's motion state, and association matching is completed by computing position and feature losses. This improves tracking speed to some extent, but real-time performance still cannot meet the requirements. Moreover, current deep-learning tracking methods mainly track multiple targets of a single type, or multiple targets without distinguishing categories, whereas the mining-truck unmanned driving system requires simultaneous tracking of multiple categories of work vehicles, placing even higher demands on the tracker.

At present there is no mature method for detecting and tracking work vehicles in complex mine environments. Mine scenes are complex: road surfaces are unstructured, vehicles vary widely in size and type, and targets differ little from the image background. Traditional visual analysis methods therefore struggle with work vehicle detection and tracking in complex mine environments, while existing deep-learning multi-target detection and tracking methods have complex network structures and low real-time performance.
Summary of the Invention

A brief summary of one or more aspects is presented below to provide a basic understanding of these aspects. This summary is not an exhaustive overview of all contemplated aspects; it is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description presented later.
本发明的目的在于解决上述问题,提供了一种作业车辆检测与跟踪方法和系统(以下有时简称作业车辆检测与跟踪方法和系统),针对非结构化道路路面复杂、目标与图像背景差异较小、车辆尺寸多变种类多样问题,通过伽马图像增强、多尺度融合预测、多源信息级联匹配等手段,实现了作业车辆的实时检测与跟踪,并获取了车辆类别、尺寸、位置、数量、轨迹等信息。The purpose of the present invention is to solve the above problems, and provides a method and system for detection and tracking of work vehicles (hereinafter sometimes referred to as a method and system for detection and tracking of work vehicles), which is aimed at unstructured roads with complex pavement and small differences between targets and image backgrounds. 、Variable size and variety of vehicles. Through gamma image enhancement, multi-scale fusion prediction, multi-source information cascade matching and other means, the real-time detection and tracking of operating vehicles are realized, and the vehicle type, size, location, and quantity are obtained. , trajectory and other information.
The technical solution of the present invention is as follows. The present invention provides a work vehicle detection and tracking method: an image is acquired and enhanced with an image enhancement method; the enhanced image is input to a pre-trained work vehicle detection model for target detection to obtain target detection results. The work vehicle detection model adopts a deep learning target detection framework, extracts work vehicle image features through a convolutional neural network, obtains detection results for multiple types of work vehicles, and inputs them to the work vehicle tracking model; the work vehicle tracking model obtains tracking targets and tracking trajectories through a work vehicle tracking method based on cascade matching of motion information and appearance features, realizing multi-type work vehicle tracking.
According to the present invention, when analyzing images of a complex mine environment, an image enhancement method is applied to images in which the work vehicles resemble the mine background. This effectively addresses the low recognition accuracy caused by high target-background similarity and weak grayscale contrast in such images, improving detection and recognition efficiency as well as the real-time performance of tracking. During detection, the work vehicle detection model uses a convolutional neural network to extract work vehicle image features, which not only avoids the errors introduced by hand-picked features but also allows the detection model to be trained on massive image data, so that the trained model better fits the image characteristics of work vehicles and detection efficiency and accuracy improve. During target tracking, the work vehicle tracking model can cascade-match motion information and appearance features across multiple vehicle types according to the detection results for each type, realizing multi-class target tracking, so that the invention can track work vehicles of various sizes and types in a complex mine environment.
According to an embodiment of the work vehicle detection and tracking method of the present invention, the work vehicle detection model uses the YOLO framework as its deep learning target detection framework, optimizes network hyperparameters with a genetic algorithm, and outputs multi-layer prediction modules; the model constructs its regression loss function with DIOU and obtains the work vehicle detection boxes through the K-means clustering algorithm. With this, the detection model can directly output the position and class information of each target through the end-to-end YOLO framework, improving detection speed and hence the real-time performance of tracking. When analyzing images of complex mine environments, building the regression loss with DIOU yields stable regression against the ground-truth boxes of work vehicles and avoids divergence during training. Qualified prediction boxes are taken as sample data and clustered with the K-means algorithm, yielding detection box sizes that match the feature distribution of work vehicle images and improving detection accuracy. In addition, optimizing network hyperparameters with a genetic algorithm and outputting multi-layer prediction modules for different vehicle sizes not only meets the detection requirements of vehicles of different sizes and improves detection efficiency, but also integrates gradient changes into the work vehicle feature maps, reducing the weights and greatly improving the accuracy of work vehicle image feature recognition.
According to an embodiment of the work vehicle detection and tracking method of the present invention, the work vehicle tracking model predicts and updates the tracking trajectory of each work vehicle with a Kalman filter, and the cascade matching of motion information and appearance features performs work vehicle motion information association and feature information association on the basis of IOU matching. Motion information association evaluates the motion state association degree with the Mahalanobis distance, feature information association evaluates the appearance feature association degree with the cosine distance, and the two distances are combined through a comprehensive metric formula into a cascade matching metric that evaluates the overall association. With this, when a work vehicle is tracked, its trajectory is predicted and updated by the Kalman filter and matched to the vehicle through IOU matching. When the Mahalanobis distance is used to evaluate the correlation between the motion state of the vehicle in the detection box and the vehicle's motion trajectory, the vehicle's appearance feature vector is also brought into the match, and a vehicle is only considered correctly associated with a trajectory when the similarity criteria of both the Mahalanobis distance and the cosine distance are satisfied. This reduces the impact of trajectories being wrongly matched to occluders when vehicles are occluded for long periods in complex mine environments, and greatly improves the accuracy with which detected targets are associated with their trajectories. In addition, IOU matching handles short-term occlusion during tracking: when the tracking model fails to match a vehicle's predicted trajectory within a predefined maximum frame threshold, tracking of that vehicle is terminated and the lost vehicle is deleted, improving tracking efficiency.
According to an embodiment of the work vehicle detection and tracking method of the present invention, the work vehicle tracking model stores successfully associated work vehicle feature maps in the corresponding work vehicle feature image library, and extracts work vehicle feature vectors from these feature maps through a work vehicle feature network; the feature image library has a fixed storage threshold, and the feature maps are updated according to the data association time. By computing the cosine distance with feature vectors of vehicles that have already been successfully matched to a trajectory, the matching speed between trajectories and vehicles improves, and with it the real-time performance of tracking.
According to an embodiment of the work vehicle detection and tracking method of the present invention, the work vehicle tracking method based on cascade matching of motion information and appearance features further comprises:
obtaining target detection results and predicting trajectories with a Kalman filter;
performing cascade matching that combines motion information and appearance features;
judging whether the cascade matching succeeds: if the trajectory is matched successfully, updating and tracking the trajectory with the Kalman filter; if the trajectory match fails or the target match fails, performing IOU matching;
judging whether the IOU matching succeeds: if the trajectory is matched successfully or the target match fails, updating the tracking trajectory with the Kalman filter; if the trajectory match fails, judging whether to delete the trajectory;
judging whether the trajectory is in the confirmed state: if not, deleting the trajectory; if so, judging whether the trajectory exceeds the maximum frame number threshold: if not, deleting the trajectory; if so, updating and tracking the trajectory with the Kalman filter;
judging whether the trajectory updated by the Kalman filter is in the confirmed state: if not, performing IOU matching; if so, performing cascade matching that combines motion information and appearance features, or outputting the target and the trajectory.
According to an embodiment of the work vehicle detection and tracking method of the present invention, the image enhancement method is gamma transformation or histogram equalization. Using the gamma transform to enhance an image maps low gray values in a narrow range to high gray values over a wide range, so that the enhanced image has a more balanced gray distribution and richer dark detail. This alleviates the high target-background similarity and weak grayscale contrast of images in complex mine environments and improves the feature extraction and recognition rate.
The present invention also provides a work vehicle detection and tracking system, including a work vehicle detection module for obtaining work vehicle detection results and a work vehicle tracking module for tracking multiple types of work vehicles. The detection module includes an image processing unit, an image feature extraction unit, and a work vehicle detection unit; the tracking module includes a trajectory tracking unit, a data association unit, and a feature image storage unit. The image processing unit enhances the input image with an image enhancement method and passes the enhanced image to the image feature extraction unit; the image feature extraction unit extracts work vehicle image features from the image through a convolutional neural network and passes them to the work vehicle detection unit; the work vehicle detection unit performs target detection on these features and passes the obtained detection results to the trajectory tracking unit; the trajectory tracking unit predicts and updates the tracking trajectory of the corresponding work vehicle from the detection results and passes the trajectory to the data association unit for cascade matching; the data association unit performs cascade matching based on the work vehicle tracking method that cascade-matches motion information and appearance features, and the trajectory tracking unit performs target tracking according to the matching result; the feature image storage unit stores the feature maps of successfully matched work vehicles.
According to an embodiment of the work vehicle detection and tracking system of the present invention, the feature image storage unit has work vehicle feature image libraries for different work vehicle types; each library has a fixed storage threshold, and the work vehicle feature maps are updated according to the data association time.
According to an embodiment of the work vehicle detection and tracking system of the present invention, the vehicle tracking module further includes a work vehicle feature vector extraction unit, which extracts work vehicle feature vectors from the feature maps stored in the feature image storage unit through the work vehicle feature network.
According to an embodiment of the work vehicle detection and tracking system of the present invention, the work vehicle tracking method based on cascade matching of motion information and appearance features further comprises: obtaining target detection results and predicting trajectories with a Kalman filter;
performing cascade matching that combines motion information and appearance features;
judging whether the cascade matching succeeds: if the trajectory is matched successfully, updating and tracking the trajectory with the Kalman filter; if the trajectory match fails or the target match fails, performing IOU matching;
judging whether the IOU matching succeeds: if the trajectory is matched successfully or the target match fails, updating the tracking trajectory with the Kalman filter; if the trajectory match fails, judging whether to delete the trajectory;
judging whether the trajectory is in the confirmed state: if not, deleting the trajectory; if so, judging whether the trajectory exceeds the maximum frame number threshold: if not, deleting the trajectory; if so, updating and tracking the trajectory with the Kalman filter;
judging whether the trajectory updated by the Kalman filter is in the confirmed state: if not, performing IOU matching; if so, performing cascade matching that combines motion information and appearance features, or outputting the target and the trajectory.
Compared with the prior art, the present invention has the following beneficial effects. It provides a work vehicle detection and tracking method and system that combine a deep-learning-based detection model with a tracking model based on cascade matching of motion information and appearance features. To address the high similarity between work vehicles and the background and the weak grayscale contrast in complex mine environments, an image enhancement method is applied to the images, improving their clarity and resolution and thereby the accuracy and real-time performance of detection. The detection model uses an improved end-to-end YOLO as its deep learning framework, builds the vehicle box loss function with DIOU, clusters the work vehicle prediction boxes with K-means to obtain detection boxes that match work vehicle image characteristics, and optimizes network hyperparameters with a genetic algorithm, improving real-time detection performance. In addition, the tracking model of the present application combines vehicle motion information with multi-layer deep appearance features for cascade matching, and associates vehicles with trajectories using a Kalman filter and the Hungarian algorithm based on IOU matching, realizing real-time tracking of multiple types of work vehicles. The proposed method adapts to the image scene and effectively accomplishes real-time detection and tracking of multiple types of work vehicles in complex mine environments.
Brief Description of the Drawings
The above features and advantages of the present invention can be better understood after reading the detailed description of embodiments of the present disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar properties or features may share the same or similar reference numerals.
Fig. 1 is a system structure diagram showing an embodiment of the work vehicle detection and tracking system of the present invention.
Fig. 2 is a flowchart showing an embodiment of the work vehicle detection and tracking method of the present invention.
Fig. 3 is a comparison diagram showing the effect of gamma-transform image enhancement.
Fig. 4 is a network structure diagram of the work vehicle detection model.
Fig. 5 is a flowchart of the work vehicle tracking method.
Fig. 6 is a network parameter table of the work vehicle feature network.
Detailed Description of the Invention
The present invention is described in detail below in conjunction with the drawings and specific embodiments. Note that the aspects described below in conjunction with the drawings and specific embodiments are merely exemplary and should not be construed as limiting the protection scope of the present invention.
In recent years, convolutional neural networks, the basic structure of target detection models in most scenes, have achieved results comparable to human vision. Mainstream detection algorithms fall into direct detection and indirect detection. The representative of the direct approach is YOLO (You Only Look Once, a one-step target detection algorithm that needs only a single glance at the image to identify the objects in it), and the representative of the indirect approach is Faster RCNN. Faster RCNN uses a two-step structure that extracts object candidate regions for localization and recognition, while YOLO directly outputs position and class information without candidate regions. Research shows that the indirect approach is more time-consuming while the direct approach is more real-time and better suited to practical engineering needs, so this application chooses end-to-end, one-step direct detection as its detection algorithm to improve the system's detection speed.
An embodiment of a work vehicle detection and tracking system is disclosed here. As shown in Fig. 1, it includes a work vehicle detection module and a work vehicle tracking module. The detection module includes an image processing unit, an image feature extraction unit, and a work vehicle detection unit, and is used to obtain work vehicle detection results; the tracking module includes a trajectory tracking unit, a data association unit, and a feature image storage unit, and is used to track the detected work vehicles. The two modules cooperate to detect and track work vehicles in a complex mine environment. Fig. 2 is a flowchart of an embodiment of the work vehicle detection and tracking method of the present invention; this embodiment is further detailed below in conjunction with Figs. 1 and 2.
In this embodiment, after the work vehicle detection and tracking system acquires an image, the image processing unit enhances the image of the complex mine environment with a gamma transform, mapping low gray values in a narrow range to high gray values over a wide range. Fig. 3 compares the grayscale and pixel distributions of the image before and after the gamma transform: after the transform, the gray distribution is visibly more balanced, the pixel distribution denser, and the dark detail richer, which reduces the similarity between work vehicles and the background image and improves the accuracy of work vehicle detection.
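As a concrete illustration, the following is a minimal sketch of this kind of gamma enhancement, assuming an 8-bit input image; the exponent value 0.5 is an illustrative assumption rather than a value specified by this document.

import numpy as np

def gamma_enhance(image: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    # Build a 256-entry lookup table for s = 255 * (r / 255) ** gamma;
    # gamma < 1 stretches the narrow dark (low-gray) range into a wider
    # bright range, as described above.
    table = np.round(255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return table[image]  # apply the mapping per pixel via fancy indexing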
After the image processing unit completes enhancement, the enhanced mine-area image can be stored both on the vehicle side and on a ground server for feature extraction and model training. The image feature extraction unit extracts work vehicle image features from the enhanced image with a convolutional neural network, and the work vehicle detection unit then performs target detection on the extracted features, obtaining detection results that are output to the work vehicle tracking model. The image feature extraction unit participates in both model training and actual operation: the feature extraction module is first trained to learn work vehicle features, and the trained module is then used at run time. Fig. 4 shows the network structure of the work vehicle detection model; this embodiment is further explained below in conjunction with Fig. 4.
Specifically, as shown in Fig. 4, the work vehicle detection model includes a backbone and a neck. After a 514×640×3 image is input to the model, it is resized to 608×608×3 and then sliced by the Focus structure into a 304×304×12 feature map. To detect work vehicles of different sizes and types, the feature map is further sliced into feature maps of three different sizes, on which the neck performs convolution and concatenation operations to extract work vehicle image features. In addition, in this embodiment, when feature maps are passed between the backbone and the neck for feature extraction, a cross-stage partial network is used to relieve the heavy computation, improving the real-time performance of image recognition, and gradient changes are integrated into the feature maps, which reduces the deep learning weights while maintaining accuracy. The neck adopts the FPN and PAN structure; for large, medium, and small work vehicles, the three feature maps of different sizes are convolved and concatenated into outputs of 76×76×33, 38×38×33, and 19×19×33, realizing the detection of different types of work vehicles.
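The Focus slicing step can be read as a space-to-depth rearrangement; the following is a minimal sketch under that reading, with an assumed channels-last layout.

import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    # Take the four pixel phases of each 2x2 neighborhood and stack them on
    # the channel axis: H x W x C -> (H/2) x (W/2) x 4C,
    # e.g. 608 x 608 x 3 -> 304 x 304 x 12 as described above.
    return np.concatenate(
        [x[::2, ::2], x[1::2, ::2], x[::2, 1::2], x[1::2, 1::2]], axis=2)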
Further, in this embodiment, to achieve stable regression against the ground-truth boxes of work vehicles and avoid divergence during training of the detection model, DIOU is used to construct the regression loss function of the work vehicle. Meanwhile, to speed up the training of the convolutional neural network and improve its detection accuracy, the K-means clustering algorithm clusters the prediction box sizes to obtain detection boxes matching work vehicle characteristics. Here the ground-truth box is the box annotated manually on the image, and the prediction box is the box predicted by the network model.
Specifically, in this embodiment, DIOU is used to construct the regression loss against the ground-truth boxes of work vehicles. DIOU (Distance-IoU loss) considers the distance, overlap rate, and scale factor between the ground-truth and predicted boxes; like GIOU, DIOU can still provide a moving direction for the predicted box even when it does not overlap the ground-truth box. The DIOU loss directly minimizes the distance between the two vehicle boxes, so it converges faster than the GIOU loss. Finally, non-maximum suppression filters the prediction boxes to obtain the final positions and classes of the work vehicles. The DIOU formula is as follows:
$$\mathrm{DIOU} = \mathrm{IoU} - \frac{\rho^{2}\!\left(b,\, b^{gt}\right)}{c^{2}}$$
where b and b^gt denote the center points of the predicted box and the ground-truth box respectively, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest closed region that contains both the predicted box and the ground-truth box. After the DIOU value is computed, it is substituted into the regression loss function of the work vehicle detection model to evaluate the model's accuracy in detecting work vehicles in complex mine environments.
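As a concrete reading of this formula, the following is a minimal sketch that computes DIOU for two boxes given as (center x, center y, width, height); the helper names are illustrative, and the corresponding box loss would be 1 − DIOU.

def diou(box_pred, box_gt):
    # Corner coordinates of a (cx, cy, w, h) box.
    def corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    x1a, y1a, x2a, y2a = corners(box_pred)
    x1b, y1b, x2b, y2b = corners(box_gt)

    # Intersection and union areas for the plain IoU term.
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = iw * ih
    union = (x2a - x1a) * (y2a - y1a) + (x2b - x1b) * (y2b - y1b) - inter
    iou = inter / union

    # rho^2: squared distance between the two centers;
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    rho2 = (box_pred[0] - box_gt[0]) ** 2 + (box_pred[1] - box_gt[1]) ** 2
    cw = max(x2a, x2b) - min(x1a, x1b)
    ch = max(y2a, y2b) - min(y1a, y1b)
    c2 = cw ** 2 + ch ** 2
    return iou - rho2 / c2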
Specifically, the regression loss function of the work vehicle detection model consists of the prediction box loss (first row), the target confidence losses (second and third rows), and the classification loss (fourth row), with the following formula:
$$
\begin{aligned}
L ={}& \lambda_{box} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(1 - \mathrm{DIOU}_{ij}\right) \\
&+ \lambda_{obj} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\, \mathrm{BCE}\!\left(C_{ij}, \hat{C}_{ij}\right) \\
&+ \lambda_{noobj} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}\, \mathrm{BCE}\!\left(C_{ij}, \hat{C}_{ij}\right) \\
&+ \sum_{i=0}^{S^{2}} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \mathrm{BCE}\!\left(p_{i}(c), \hat{p}_{i}(c)\right)
\end{aligned}
$$
where binary cross entropy with logits is used for the confidence and classification terms, and the prediction module has size S×S×B, with S×S the number of prediction grids and B the module depth.
During actual operation, the work vehicle detection model uses the DIOU regression loss to measure the error between the predicted and ground-truth boxes and filters out prediction boxes that do not match work vehicle image characteristics. Specifically, a threshold is set and the error between the predicted and ground-truth boxes is computed: if the error exceeds the set threshold, the prediction box is filtered out; if it is below the threshold, the prediction box is kept. During model training, training completes when the expected standard is met, i.e., training ends once the model loss falls below the expected level (for example, 1). The threshold is chosen according to the actual situation and personal experience. Qualified prediction boxes are used as sample data for machine learning, yielding detection boxes that match the characteristics of the work vehicle, and the detection boxes are output as the target detection results, which include the center coordinates, width and height, and class of the work vehicle in each box. Specifically, in this embodiment, preset vehicle class numbers identify the work vehicle class: for example, an output of 1 indicates a truck and 2 a command vehicle, and the type of work vehicle in the detection box is identified from the output class number.
In traditional target detection methods, prediction boxes are generally obtained by multi-scale sliding windows or selective search and then localized, or prediction box sizes are set manually for position regression, but these methods are often inefficient and ineffective. In this embodiment, the K-means clustering algorithm analyzes the work vehicle images of the mining area, clustering with the IOU (Intersection over Union, the overlap of two regions divided by their union, i.e., the intersection-over-union of the ground-truth and candidate boxes) of the ground-truth boxes as the "distance", so as to obtain prediction box sizes that match the feature distribution of mining-area work vehicle images. The K-means clustering steps are as follows:
Step 1: take the height and width of each ground-truth box as a sample point (w_n, h_n), n ∈ {1, 2, …, N}, with ground-truth box center (x_n, y_n), n ∈ {1, 2, …, N}, and form all sample points into a data set.
Step 2: randomly select K sample points from the data set as cluster centers.
Step 3: compute the distance value d from every sample point in the data set to each of the K cluster centers, and assign each sample point to the cluster center with the smallest d, obtaining K clusters, i.e., classifying all ground-truth boxes into K classes, where d is computed as:
$$d = 1 - \mathrm{IOU}\!\left[(x_{n}, y_{n}, w_{n}, h_{n}),\ (x_{n}, y_{n}, W_{m}, H_{m})\right]$$
Step 4: recompute the K cluster centers. Let N_m denote the number of sample points (i.e., ground-truth boxes) in the m-th cluster S_m; the new center is computed as:

$$W_{m} = \frac{1}{N_{m}} \sum_{(w_{n}, h_{n}) \in S_{m}} w_{n}, \qquad H_{m} = \frac{1}{N_{m}} \sum_{(w_{n}, h_{n}) \in S_{m}} h_{n}$$

Finally, repeat Step 3 and Step 4 until the K cluster centers stop moving; the resulting K centers are taken as the widths and heights of the work vehicle prediction boxes, thereby obtaining the prediction boxes of the work vehicles.
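A minimal sketch of Steps 1 to 4 under the stated d = 1 − IOU distance follows, assuming boxes are compared with aligned centers (so only widths and heights matter); the iteration cap and seed are illustrative assumptions.

import numpy as np

def iou_wh(wh: np.ndarray, centers: np.ndarray) -> np.ndarray:
    # IoU between N (w, h) samples and K (W, H) centers for center-aligned boxes.
    inter = (np.minimum(wh[:, None, 0], centers[None, :, 0])
             * np.minimum(wh[:, None, 1], centers[None, :, 1]))
    union = ((wh[:, 0] * wh[:, 1])[:, None]
             + (centers[:, 0] * centers[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(wh: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]  # Step 2
    for _ in range(iters):
        # Step 3: assign each sample to the nearest center under d = 1 - IoU.
        assign = np.argmin(1.0 - iou_wh(wh, centers), axis=1)
        # Step 4: recompute each center as the mean (W_m, H_m) of its cluster.
        new_centers = np.array([wh[assign == m].mean(axis=0)
                                if np.any(assign == m) else centers[m]
                                for m in range(k)])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return centers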
In this embodiment, the work vehicle detection and tracking system detects the input mine video images through the work vehicle detection module; after the detection results are obtained, they are input to the work vehicle tracking module, which tracks the detected work vehicles according to their motion information and appearance features.
Specifically, after the detection module obtains the work vehicle detection results, the trajectory tracking unit predicts each work vehicle's subsequent motion trajectory with a Kalman filter based on the vehicle's current motion state. In this embodiment, eight parameters (u, v, r, h, x', y', r', h') describe the motion state of a trajectory at a given moment: (u, v) are the center coordinates of the work vehicle detection box, r is the ratio of the box's vertical dimension to its horizontal dimension, h is the height, and x', y', r', h' are the corresponding velocities of the work vehicle in image coordinates. As the work vehicle moves, these state parameters change continuously; based on the parameters at a given moment, the Kalman filter takes the four parameters u, v, r, h as observed variables, observes the detected vehicle with a constant-velocity model and a linear observation model, and predicts the vehicle's motion trajectory in the next frame of the image.
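A minimal sketch of such a filter over the eight-dimensional state follows; dt and the noise covariances Q and R are assumptions, not values fixed by this document.

import numpy as np

dt = 1.0
F = np.eye(8)
F[:4, 4:] = dt * np.eye(4)                    # position += velocity * dt
H = np.hstack([np.eye(4), np.zeros((4, 4))])  # only (u, v, r, h) is observed

def kf_predict(x, P, Q):
    # Propagate the state mean and covariance one frame forward.
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, R):
    # Correct the prediction with a detection z = (u, v, r, h).
    S = H @ P @ H.T + R                # innovation covariance (also reused by
    K = P @ H.T @ np.linalg.inv(S)     # the Mahalanobis gate further below)
    x = x + K @ (z - H @ x)
    P = (np.eye(8) - K @ H) @ P
    return x, P, S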
In this embodiment, when the work vehicle tracking module tracks a work vehicle, the tracked vehicle is taken as the detection target and the tracking trajectory is defined as track k. A parameter a_k records, for each track k, the number of image frames since the track was last matched to the detection target, and the maximum frame number threshold A_max serves as the track's maximum lifetime. When the Kalman filter tracks work vehicles in real time, all tracks k are kept in a track set, and a_k increments for every frame in which the corresponding track k goes unmatched. If track k is successfully matched to the detection target, it is set to the confirmed state, and whenever the track matches the detection target again, a_k is reset to 0. If track k fails to match the detection target, it is set to the unconfirmed state; once its a_k exceeds the predefined maximum frame number threshold A_max, the track is deleted from the track set and the Kalman filter re-predicts the vehicle's motion trajectory. Newly predicted trajectories are classified as tentative during their first three frames; if they are not successfully matched to a detection target within those three frames, the tentative trajectories are deleted and tracking of that work vehicle is terminated.
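A minimal sketch of this track lifecycle follows; the state names and the value A_MAX = 30 are illustrative assumptions, while the three-frame tentative period comes from the text above.

TENTATIVE, CONFIRMED, DELETED = range(3)
A_MAX = 30   # assumed maximum number of frames a track may go unmatched
N_INIT = 3   # frames a new track stays tentative, per the text

class Track:
    def __init__(self, track_id):
        self.track_id = track_id
        self.state = TENTATIVE
        self.a_k = 0    # frames since the last successful match
        self.hits = 0   # consecutive matched frames while tentative

    def mark_matched(self):
        self.a_k = 0                     # reset on every re-match
        self.hits += 1
        if self.state == TENTATIVE and self.hits >= N_INIT:
            self.state = CONFIRMED

    def mark_missed(self):
        self.a_k += 1
        if self.state == TENTATIVE or self.a_k > A_MAX:
            self.state = DELETED         # drop lost or unconfirmed tracks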
Further, in this embodiment, for more stable target tracking, the data association unit introduces a vehicle deep-feature metric while matching motion trajectories to detection targets, performs cascade matching that combines work vehicle motion information with work vehicle appearance features, and stores all confirmed, successfully matched work vehicle feature vectors in the feature image storage unit. During cascade matching, the cosine distance between the detection target and the corresponding work vehicle feature vectors is computed as the appearance association metric. After a detection target has been occluded for some time, the uncertainty of the Kalman prediction increases greatly and the observability of the state space becomes very low, while the Mahalanobis distance tends to favor trajectories with greater uncertainty. Therefore, in the cascade matching of motion state and appearance features, when IOU matching assigns confirmed trajectories, higher priority is given to the most recently matched trajectories, and lower priority to trajectories whose feature maps have gone unmatched against the detection target for many consecutive frames.
Fig. 5 is a flowchart of the work vehicle tracking method, which tracks work vehicles through cascade matching of motion state and appearance features chained with IOU matching and Kalman filtering. The cascade matching of motion information and appearance features comprises work vehicle motion information association and work vehicle appearance feature association; the data association unit only considers a detection target correctly associated with a predicted trajectory when both the cosine metric and the Mahalanobis metric are satisfied. If track k is successfully matched to the detection target, the tracking model treats track k as a confirmed trajectory, outputs the tracked work vehicle and its corresponding track k, and then updates the track's parameters; whenever the track matches the detection target again, its lifetime counter a_k is reset to 0. If the match between track k and the detection target fails, the tracking model sets track k to the unconfirmed state, re-runs the cascade matching of motion state and appearance features, performs IOU matching on the unconfirmed track k, the unmatched trajectories, and the unmatched detection targets, and again uses the Hungarian algorithm to assign confirmed tracking trajectories.
Specifically, work vehicle motion information association computes the Mahalanobis distance between the motion state predicted by the Kalman filter and the detected motion state of the work vehicle observed at the current moment, as follows:
$$l_{m}(i, j) = \left(d_{j} - y_{i}\right)^{T} S_{i}^{-1} \left(d_{j} - y_{i}\right)$$
where l_m denotes the Mahalanobis distance, T denotes matrix transposition, i indexes the i-th trajectory, S_i is the covariance matrix of the Kalman filter's observation space at the current moment, y_i is the predictor at the current moment, and d_j is the motion state (u, v, r, h) of the j-th detection. The Mahalanobis distance expresses the uncertainty of the state estimate by measuring how many standard deviations a position lies from the mean track; the 0.95 quantile of the chi-squared distribution (the inverse chi-squared CDF evaluated at 0.95) is used as the threshold to filter out weak associations, with the following filter function:
$$b_{i,j}^{(1)} = \mathbb{1}\!\left[\, l_{m}(i, j) \le t^{(1)} \right]$$
Specifically, the mean track is the average of each Kalman-filtered trajectory; the Mahalanobis distance between the track average and the actually detected vehicle box determines whether the box and the trajectory coincide. Coincidence indicates a match; the farther apart they are, the worse the match, and the larger the standard deviation, the greater the uncertainty of the state estimate.
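A minimal sketch of this gating step follows, assuming SciPy is available for the chi-squared quantile; with four observed dimensions (u, v, r, h) the 0.95 quantile is roughly 9.4877.

import numpy as np
from scipy.stats import chi2  # assumed available for the quantile

T1 = chi2.ppf(0.95, df=4)  # 0.95 chi-squared quantile, df = 4, ~9.4877

def mahalanobis_gate(y_i: np.ndarray, S_i: np.ndarray, d_j: np.ndarray):
    # l_m(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i), gated against T1.
    diff = d_j - y_i
    l_m = float(diff @ np.linalg.inv(S_i) @ diff)
    return l_m, l_m <= T1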
When associating detection targets with motion trajectories, the Mahalanobis distance is a good association metric if the motion uncertainty of the target is low. In practice, however, camera motion during vehicle movement can invalidate the Mahalanobis measurement, so in this embodiment, when the Mahalanobis distance is used to associate work vehicle motion information, work vehicle appearance features are also introduced, and a cosine-distance similarity measure expresses the association degree of the appearance features; together the two measures quantify the association between the detection target and the tracking trajectory.
Here the Mahalanobis distance expresses the association of motion states between the work vehicle detection box and the tracking box by measuring standard deviations from the mean track position, while the cosine distance is obtained by computing the cosine values of the two appearance feature vectors corresponding to a predicted trajectory and a detection result and taking the minimum cosine value as the appearance association degree. Specifically, the feature image storage unit builds an appearance feature library for each tracked work vehicle, storing the 125 most recently associated work vehicle feature vectors r_k^(i) for that vehicle, where k indexes the frame with a maximum of 125. The stored feature vectors are used to compute the appearance association between the i-th predicted trajectory and the j-th detection result of the current frame, where the detection result refers to the appearance feature vector of the detection target. The appearance association function and its corresponding filter function are as follows:
$$l_{f}(i, j) = \min\left\{\, 1 - r_{j}^{T} r_{k}^{(i)} \;\middle|\; r_{k}^{(i)} \in R_{i} \right\}$$

$$b_{i,j}^{(2)} = \mathbb{1}\!\left[\, l_{f}(i, j) \le t^{(2)} \right]$$
Specifically, the cosine distance is the cosine value between the two appearance feature vectors corresponding to the i-th predicted trajectory and the j-th detection result, and the appearance association function extracts the minimum cosine value as the appearance association degree. In the corresponding filter function, f denotes the appearance feature vector and l the distance over the appearance feature vectors f; this filter removes trajectories that do not reach the appearance association threshold.
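A minimal sketch of this appearance metric follows, assuming L2-normalized feature vectors (so cosine similarity reduces to a dot product); the threshold value is an illustrative assumption.

import numpy as np

def appearance_metric(track_gallery: np.ndarray, r_j: np.ndarray, t2: float = 0.2):
    # track_gallery: up to 125 stored unit feature vectors of track i,
    # shape (K, D); r_j: unit appearance feature of detection j, shape (D,).
    # l_f(i, j) is the minimum cosine distance over the gallery, gated by t2.
    l_f = float(np.min(1.0 - track_gallery @ r_j))
    return l_f, l_f <= t2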
Further, in this embodiment, a work vehicle feature network extracts the work vehicle feature vectors; Fig. 6 is the network parameter table of this feature network. As shown in Fig. 6, the feature network is a residual network comprising one convolutional layer, one max-pooling layer, and six residual modules; the 128-dimensional global feature map is finally computed in a dense layer, and normalization projects the features into the vehicle feature vector.
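The parameter table of Fig. 6 is not reproduced here, so the following PyTorch sketch only mirrors the stated topology (one convolution, one max-pool, six residual modules, a dense 128-dimensional head with L2 normalization); all channel counts and strides are assumptions.

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # A plain 3x3-3x3 residual block; channel/stride choices are assumptions.
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.down = (nn.Conv2d(c_in, c_out, 1, stride, bias=False)
                     if (stride != 1 or c_in != c_out) else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.down(x))

class VehicleFeatureNet(nn.Module):
    # One conv, one max-pool, six residual blocks, then a dense layer whose
    # 128-d output is L2-normalized into the appearance feature vector.
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, 1, 1, bias=False), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(3, 2, 1))
        self.blocks = nn.Sequential(
            ResidualBlock(32, 32), ResidualBlock(32, 32),
            ResidualBlock(32, 64, 2), ResidualBlock(64, 64),
            ResidualBlock(64, 128, 2), ResidualBlock(128, 128))
        self.head = nn.Linear(128, 128)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = x.mean(dim=(2, 3))                    # global average pool to 128-d
        return F.normalize(self.head(x), dim=1)   # unit-length feature vector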
In this embodiment, after the data association unit obtains the Mahalanobis metric for motion information association and the cosine metric for appearance feature association, it combines the Mahalanobis and cosine distances into the cascade matching metric I_{i,j}. The cascade matching metric function and its corresponding filter function are as follows:
$$I_{i,j} = \lambda\, l_{m}(i, j) + (1 - \lambda)\, l_{f}(i, j)$$
$$b_{i,j} = \prod_{g \in \{m,\, f\}} \mathbb{1}\!\left[\, l_{g}(i, j) \le t_{g} \right]$$
where λ is the weight coefficient, and g in the filter function ranges over the Mahalanobis metric m and the cosine metric f; the thresholds of the filter function are set according to the actual scene and personal experience rather than being fixed values.
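A short sketch combining the two gated metrics into the cascade cost exactly as in the formulas above; λ and the thresholds are scene-dependent assumptions.

def cascade_cost(l_m, l_f, lam=0.3, t1=9.4877, t2=0.2):
    # I_{i,j} = lambda * l_m + (1 - lambda) * l_f
    cost = lam * l_m + (1.0 - lam) * l_f
    # b_{i,j}: admissible only if both the Mahalanobis and cosine gates pass.
    admissible = (l_m <= t1) and (l_f <= t2)
    return cost, admissible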
Although the methods above are illustrated and described as a series of acts for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the acts, because according to one or more embodiments some acts may occur in a different order and/or concurrently with other acts illustrated and described herein, or not illustrated and described herein but understandable by those skilled in the art.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the present disclosure is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the present disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

  1. A work vehicle detection and tracking method, characterized in that the method comprises:
    acquiring an image, and performing image enhancement processing on the image using an image enhancement method;
    inputting the enhanced image into a work vehicle detection model for target detection to obtain a target detection result, wherein the work vehicle detection model uses a deep learning object detection framework, extracts work vehicle image features through a convolutional neural network, and inputs the obtained multi-type work vehicle detection results into a work vehicle tracking model for target tracking;
    the work vehicle tracking model obtaining a tracking target and a tracking trajectory through a work vehicle tracking method based on cascade matching of motion information and appearance features, and outputting the tracking target and the target motion trajectory.
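For orientation, the claimed enhance-detect-track loop could be wired together as in the following Python sketch. The names `enhance`, `detector`, and `tracker` (and the `box`/`track_id` attributes on track objects) are hypothetical callables standing in for the units the claim names, not an API defined by this disclosure.

```python
import cv2

def run_pipeline(video_path, enhance, detector, tracker):
    """One pass of the claimed loop: enhance each frame, detect work
    vehicles, then hand the detections to the cascade-matching tracker."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        enhanced = enhance(frame)               # claim 7: gamma or histogram equalization
        detections = detector(enhanced)         # per-frame boxes, scores, vehicle types
        tracks = tracker.update(detections, enhanced)  # claim 6: cascade + IOU matching
        for t in tracks:                        # draw IDs for inspection
            x1, y1, x2, y2 = map(int, t.box)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f"id={t.track_id}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cap.release()
```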
  2. The work vehicle detection and tracking method according to claim 1, wherein the work vehicle detection model uses YOLO as the deep learning object detection framework, optimizes the grid hyperparameters with a genetic algorithm, and outputs a multi-layer prediction module; the work vehicle detection model constructs its regression loss function with DIOU, and obtains the detection boxes of the work vehicle through a K-means clustering algorithm.
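The claim names DIOU (DIoU) as the regression loss. A minimal sketch of that loss for axis-aligned boxes follows; the `1e-9` stabilizers are illustrative, and the K-means anchor selection step is not shown.

```python
def diou_loss(box, gt):
    """DIoU regression loss for boxes given as (x1, y1, x2, y2):
    L = 1 - IoU + rho^2 / c^2, where rho is the distance between box
    centers and c is the diagonal of the smallest enclosing box."""
    # intersection area
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_b + area_g - inter + 1e-9)
    # squared distance between box centers
    cb = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
    cg = ((gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2)
    rho2 = (cb[0] - cg[0]) ** 2 + (cb[1] - cg[1]) ** 2
    # squared diagonal of the smallest enclosing box
    ex1, ey1 = min(box[0], gt[0]), min(box[1], gt[1])
    ex2, ey2 = max(box[2], gt[2]), max(box[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return 1.0 - iou + rho2 / c2
```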
  3. The work vehicle detection and tracking method according to claim 1, wherein the work vehicle tracking model uses a Kalman filter to predict and update the tracking trajectory of the work vehicle, and the cascade matching of motion information and appearance features performs work vehicle motion information association and work vehicle feature information association on the basis of IOU matching.
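A minimal constant-velocity Kalman filter over a bounding-box state, in the spirit of the predict/update cycle the claim describes. The 8-dimensional state layout and the noise magnitudes below are assumptions, not prescribed by the claim.

```python
import numpy as np

class BoxKalman:
    """Constant-velocity filter over a 4-dim box observation z = (cx, cy, a, h),
    with state x = [z, z_dot]; noise settings are illustrative."""
    def __init__(self, box):
        self.x = np.zeros(8); self.x[:4] = box  # initial state from first detection
        self.P = np.eye(8) * 10.0               # state covariance
        self.F = np.eye(8)                      # transition: position += velocity
        self.F[:4, 4:] = np.eye(4)
        self.H = np.eye(4, 8)                   # we observe the box components only
        self.Q = np.eye(8) * 1e-2               # process noise
        self.R = np.eye(4) * 1e-1               # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(8) - K @ self.H) @ self.P
```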
  4. The work vehicle detection and tracking method according to claim 3, wherein the work vehicle motion information association uses the Mahalanobis distance to evaluate the motion state association degree, the work vehicle feature information association uses the cosine distance to evaluate the appearance feature association degree, and a combined metric of the Mahalanobis distance and the cosine distance is computed, via a combined metric formula, to evaluate the cascade matching association degree.
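A sketch of this combined metric: Mahalanobis distance for the motion term, cosine distance for the appearance term, and a weighted sum as the cascade-matching cost. The claim specifies the combination only abstractly, so the weight `lam` below is an assumption.

```python
import numpy as np

def mahalanobis2(z, mean, cov):
    """Squared Mahalanobis distance between a detection z and a predicted state."""
    d = z - mean
    return float(d @ np.linalg.inv(cov) @ d)

def cosine_distance(a, b):
    """Cosine distance between L2-normalised appearance embeddings."""
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return 1.0 - float(a @ b)

def combined_cost(z, mean, cov, feat_det, feat_track, lam=0.5):
    """Cascade-matching cost c = lam * d_mahalanobis + (1 - lam) * d_cosine;
    lam is illustrative, not taken from the claim."""
    return lam * mahalanobis2(z, mean, cov) + (1.0 - lam) * cosine_distance(feat_det, feat_track)
```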
  5. The work vehicle detection and tracking method according to claim 4, wherein the work vehicle tracking model stores the feature maps of successfully associated work vehicles in a corresponding work vehicle feature image library, and extracts work vehicle feature vectors from the work vehicle feature maps through a work vehicle feature network; the work vehicle feature image library is provided with a fixed storage threshold and a work vehicle type, and the work vehicle feature maps are updated according to the data association time and the work vehicle type.
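The fixed-threshold feature image library could be modelled as a bounded per-track, per-type gallery. The FIFO eviction (oldest association first) and the budget of 100 entries below are assumptions standing in for the claim's "fixed storage threshold" and time-based update.

```python
from collections import deque

class FeatureGallery:
    """Appearance gallery keyed by (track id, work vehicle type) with a
    fixed storage budget; the deque drops the oldest entry when full."""
    def __init__(self, budget=100):
        self.budget = budget
        self.store = {}  # (track_id, vehicle_type) -> deque of feature maps

    def add(self, track_id, vehicle_type, feature):
        key = (track_id, vehicle_type)
        if key not in self.store:
            self.store[key] = deque(maxlen=self.budget)
        self.store[key].append(feature)  # newest association last

    def features(self, track_id, vehicle_type):
        return list(self.store.get((track_id, vehicle_type), []))
```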
  6. The work vehicle detection and tracking method according to claim 1, wherein the work vehicle tracking method based on cascade matching of motion information and appearance features further comprises:
    obtaining a target detection result, and predicting a trajectory using a Kalman filter;
    performing cascade matching by combining motion information and appearance features;
    judging whether the cascade matching succeeds: if the trajectory matching succeeds, updating the tracked trajectory using the Kalman filter; if the trajectory matching or the target matching fails, performing IOU matching;
    judging whether the IOU matching succeeds: if the trajectory matching succeeds or the target matching fails, updating the tracked trajectory using the Kalman filter; if the trajectory matching fails, judging whether to delete the trajectory;
    judging whether the trajectory is in a confirmed state: if not, deleting the trajectory; if so, judging whether the trajectory exceeds the maximum frame number threshold: if not, deleting the trajectory; if so, updating the tracked trajectory using the Kalman filter;
    judging whether the trajectory updated by the Kalman filter is in a confirmed state: if not, performing IOU matching; if so, performing cascade matching by combining motion information and appearance features, or outputting the target and the trajectory.
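Each matching stage in the flow above (cascade matching first, then IOU matching for the leftovers) reduces to a gated linear-assignment problem. The sketch below solves one such stage with SciPy's Hungarian solver; the gate value is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, gate=1.0):
    """Solve one matching stage as gated linear assignment.

    cost: (n_tracks, n_detections) matrix from the combined metric, or
    1 - IOU for the fallback stage. Pairs whose cost exceeds `gate` are
    rejected, mirroring the success/failure branches of the claim."""
    if cost.size == 0:  # nothing to match: everything is unmatched
        return [], list(range(cost.shape[0])), list(range(cost.shape[1]))
    rows, cols = linear_sum_assignment(cost)
    matches = []
    un_tracks, un_dets = set(range(cost.shape[0])), set(range(cost.shape[1]))
    for r, c in zip(rows, cols):
        if cost[r, c] <= gate:
            matches.append((r, c))
            un_tracks.discard(r)
            un_dets.discard(c)
    return matches, sorted(un_tracks), sorted(un_dets)
```

Matched pairs would feed the Kalman update; unmatched tracks and detections fall through to the IOU stage or to the confirmation/deletion checks.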
  7. The work vehicle detection and tracking method according to any one of claims 1 to 6, wherein the image enhancement method is gamma transformation or histogram equalization.
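Both named enhancement options are short in OpenCV terms; in the sketch below, the gamma value and the choice to equalize only the luma channel are illustrative, not prescribed by the claim.

```python
import cv2
import numpy as np

def gamma_transform(img, gamma=1.5):
    """Gamma transformation via a 256-entry lookup table."""
    table = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(img, table)

def hist_equalize(img):
    """Histogram equalization on the luma channel so colour is preserved."""
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```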
  8. A work vehicle detection and tracking system, characterized by comprising a work vehicle detection module for obtaining work vehicle detection results and a work vehicle tracking module for tracking multiple types of work vehicles;
    the work vehicle detection module comprising an image processing unit, an image feature extraction unit and a work vehicle detection unit;
    the vehicle tracking module comprising a trajectory tracking unit, a data association unit and a feature image storage unit; wherein
    the image processing unit performs image enhancement processing on an input image using an image enhancement method, and transmits the enhanced image to the image feature extraction unit;
    the image feature extraction unit extracts work vehicle image features from the image through a convolutional neural network, and transmits the work vehicle image features to the work vehicle detection unit;
    the work vehicle detection unit performs target detection using the work vehicle image features, and transmits the obtained work vehicle detection results to the trajectory tracking unit;
    the trajectory tracking unit predicts and updates the trajectory of the work vehicle according to the work vehicle detection results, and transmits the trajectory to the data association unit for cascade matching;
    the data association unit performs cascade matching through the work vehicle tracking method based on cascade matching of motion information and appearance features, and the trajectory tracking unit performs target tracking according to the cascade matching result;
    the feature image storage unit is configured to store the feature maps of work vehicles for which cascade matching succeeded.
  9. The work vehicle detection and tracking system according to claim 8, wherein the feature image storage unit holds work vehicle feature image libraries for different work vehicle types, each work vehicle feature image library is provided with a fixed storage threshold, and the work vehicle feature maps are updated according to the data association time.
  10. The work vehicle detection and tracking system according to claim 9, wherein the vehicle tracking module further comprises a work vehicle feature vector extraction unit, which extracts work vehicle feature vectors, through a work vehicle feature network, from the work vehicle feature maps stored in the feature image storage unit.
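Read together, claims 8 to 10 describe a data path that could be wired as the following skeleton. All class, attribute, and method names are hypothetical, chosen only to mirror the claimed units.

```python
class WorkVehicleSystem:
    """Hypothetical wiring of the claimed modules."""
    def __init__(self, enhancer, feature_net, detector, tracker, gallery):
        self.enhancer = enhancer        # image processing unit
        self.feature_net = feature_net  # image feature extraction unit
        self.detector = detector        # work vehicle detection unit
        self.tracker = tracker          # trajectory tracking + data association units
        self.gallery = gallery          # feature image storage unit

    def step(self, frame):
        enhanced = self.enhancer(frame)
        feats = self.feature_net(enhanced)      # CNN features for detection
        detections = self.detector(feats)
        tracks = self.tracker.update(detections, enhanced)
        for t in tracks:                        # store successfully matched vehicles
            self.gallery.add(t.track_id, t.vehicle_type, t.feature)
        return tracks
```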
  11. A work vehicle tracking method based on cascade matching of motion information and appearance features, comprising the following steps:
    obtaining a target detection result, and predicting a trajectory using a Kalman filter;
    performing cascade matching by combining motion information and appearance features;
    judging whether the cascade matching succeeds: if the trajectory matching succeeds, updating the tracked trajectory using the Kalman filter; if the trajectory matching or the target matching fails, performing IOU matching;
    judging whether the IOU matching succeeds: if the trajectory matching succeeds or the target matching fails, updating the tracked trajectory using the Kalman filter; if the trajectory matching fails, judging whether to delete the trajectory;
    judging whether the trajectory is in a confirmed state: if not, deleting the trajectory; if so, judging whether the trajectory exceeds the maximum frame number threshold: if not, deleting the trajectory; if so, updating the tracked trajectory using the Kalman filter;
    judging whether the trajectory updated by the Kalman filter is in a confirmed state: if not, performing IOU matching; if so, performing cascade matching by combining motion information and appearance features, or outputting the target and the trajectory.
PCT/CN2021/127840 2021-10-18 2021-11-01 Work vehicle detection and tracking method and system WO2023065395A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111208735.X 2021-10-18
CN202111208735.XA CN115995063A (en) 2021-10-18 2021-10-18 Work vehicle detection and tracking method and system

Publications (1)

Publication Number Publication Date
WO2023065395A1 true WO2023065395A1 (en) 2023-04-27

Family

ID=85990657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127840 WO2023065395A1 (en) 2021-10-18 2021-11-01 Work vehicle detection and tracking method and system

Country Status (2)

Country Link
CN (1) CN115995063A (en)
WO (1) WO2023065395A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993776B (en) * 2023-06-30 2024-02-13 中信重工开诚智能装备有限公司 Personnel track tracking method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160063330A1 (en) * 2014-09-03 2016-03-03 Sharp Laboratories Of America, Inc. Methods and Systems for Vision-Based Motion Estimation
CN111768430A (en) * 2020-06-23 2020-10-13 重庆大学 Expressway outfield vehicle tracking method based on multi-feature cascade matching
CN111914664A (en) * 2020-07-06 2020-11-10 同济大学 Vehicle multi-target detection and track tracking method based on re-identification
CN112686923A (en) * 2020-12-31 2021-04-20 浙江航天恒嘉数据科技有限公司 Target tracking method and system based on double-stage convolutional neural network
CN113160274A (en) * 2021-04-19 2021-07-23 桂林电子科技大学 Improved deep sort target detection tracking method based on YOLOv4

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI-SHENG JIN; HUA QIANG; GUO BAI-CANG; XIE XIAN-YI; YAN FU-GANG; QU BO-TAO: "基于优化DeepSort的前方车辆多目标跟踪 (Multi-target tracking of vehicles based on optimized DeepSort)", Journal of Zhejiang University (Engineering Science), vol. 55, no. 6, 30 June 2021, pages 1056-1064, ISSN: 1008-973X, DOI: 10.3785/j.issn.1008.973X.2021.06.005 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703983A (en) * 2023-06-14 2023-09-05 石家庄铁道大学 Combined shielding target detection and target tracking method
CN116703983B (en) * 2023-06-14 2023-12-19 石家庄铁道大学 Combined shielding target detection and target tracking method
CN116469059A (en) * 2023-06-20 2023-07-21 松立控股集团股份有限公司 Parking lot entrance and exit vehicle backlog detection method based on DETR
CN116612642A (en) * 2023-07-19 2023-08-18 长沙海信智能系统研究院有限公司 Vehicle continuous lane change detection method and electronic equipment
CN116612642B (en) * 2023-07-19 2023-10-17 长沙海信智能系统研究院有限公司 Vehicle continuous lane change detection method and electronic equipment
CN116828398A (en) * 2023-08-29 2023-09-29 中国信息通信研究院 Tracking behavior recognition method and device, electronic equipment and storage medium
CN116828398B (en) * 2023-08-29 2023-11-28 中国信息通信研究院 Tracking behavior recognition method and device, electronic equipment and storage medium
CN117437261A (en) * 2023-10-08 2024-01-23 南京威翔科技有限公司 Tracking method suitable for edge-end remote target
CN117456407A (en) * 2023-10-11 2024-01-26 中国人民解放军军事科学院系统工程研究院 Multi-target image tracking method and device
CN117456407B (en) * 2023-10-11 2024-04-19 中国人民解放军军事科学院系统工程研究院 Multi-target image tracking method and device
CN117523379A (en) * 2023-11-20 2024-02-06 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117523379B (en) * 2023-11-20 2024-04-30 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117689907A (en) * 2024-02-04 2024-03-12 福瑞泰克智能系统有限公司 Vehicle tracking method, device, computer equipment and storage medium
CN117689907B (en) * 2024-02-04 2024-04-30 福瑞泰克智能系统有限公司 Vehicle tracking method, device, computer equipment and storage medium
CN117746304A (en) * 2024-02-21 2024-03-22 浪潮软件科技有限公司 Refrigerator food material identification and positioning method and system based on computer vision
CN117746304B (en) * 2024-02-21 2024-05-14 浪潮软件科技有限公司 Refrigerator food material identification and positioning method and system based on computer vision

Also Published As

Publication number Publication date
CN115995063A (en) 2023-04-21

Similar Documents

Publication Publication Date Title
WO2023065395A1 (en) Work vehicle detection and tracking method and system
CN108304798B (en) Street level order event video detection method based on deep learning and motion consistency
CN109360226B (en) Multi-target tracking method based on time series multi-feature fusion
Tsintotas et al. Assigning visual words to places for loop closure detection
CN108470354B (en) Video target tracking method and device and implementation device
Jana et al. YOLO based Detection and Classification of Objects in video records
WO2020215492A1 (en) Multi-bernoulli multi-target video detection and tracking method employing yolov3
CN103295242B (en) A kind of method for tracking target of multiple features combining rarefaction representation
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN110781262B (en) Semantic map construction method based on visual SLAM
Lin et al. Integrating graph partitioning and matching for trajectory analysis in video surveillance
CN104303193A (en) Clustering-based object classification
CN112052802B (en) Machine vision-based front vehicle behavior recognition method
CN104424634A (en) Object tracking method and device
CN112836639A (en) Pedestrian multi-target tracking video identification method based on improved YOLOv3 model
CN111046856B (en) Parallel pose tracking and map creating method based on dynamic and static feature extraction
CN112241969A (en) Target detection tracking method and device based on traffic monitoring video and storage medium
CN112651995A (en) On-line multi-target tracking method based on multifunctional aggregation and tracking simulation training
Sharma et al. Vehicle identification using modified region based convolution network for intelligent transportation system
Doulamis Coupled multi-object tracking and labeling for vehicle trajectory estimation and matching
CN116402850A (en) Multi-target tracking method for intelligent driving
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
Dadgar et al. Multi-view data fusion in multi-object tracking with probability density-based ordered weighted aggregation
Yang et al. Probabilistic projective association and semantic guided relocalization for dense reconstruction
Zhu et al. (Retracted) Transfer learning-based YOLOv3 model for road dense object detection