Personnel counting method applied to cart early warning system
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a personnel counting method applied to a cart early warning system.
Background
Machine vision is an interdisciplinary field, and computer vision is now one of the most prominent applications of complex machine learning algorithms. In recent years, machine learning methods based on deep learning have been widely applied to target detection, tracking, understanding and the like in images and videos, and the car handling automation system of the auxiliary vertical shaft of a coal mine also involves machine learning. However, the cart early warning system within the car handling system suffers from a low degree of automation, incomplete functions, inconvenient data uploading and the like, and most index monitoring still has to be carried out by workers in person. The cart is a narrow-gauge railway vehicle for conveying bulk materials such as coal, ore and waste rock, so safety accidents are very likely when personnel linger at the movable safety gate of the cart or when the number of personnel beside the track exceeds the permitted limit; even where manual monitoring is in place, oversights still occur. Therefore, a personnel counting method with high real-time performance is needed in the cart early warning system to eliminate this hidden risk.
Disclosure of Invention
To address the above technical problems, the invention combines a deep learning algorithm, a multi-target tracking algorithm and the application scene, and provides a personnel counting method applied to a cart early warning system.
In order to solve the problems, the invention adopts the following technical scheme:
A personnel counting method applied to a cart early warning system comprises the following steps:
S1, collecting historical images to obtain a sample set to be detected;
S2, labeling the sample set to be detected: classifying and labeling the target personnel to be positioned in the sample set to be detected to generate the required data set;
S3, training a model: training the deep learning model with the data set to obtain a target model for personnel identification and positioning;
S4, identifying and positioning target personnel: each frame of image acquired by the camera is passed into the target model in real time, the target personnel are identified and positioned, and the positioning result of each person and the corresponding confidence are obtained; the positioning result is then confirmed a second time according to the confidence: whether the obtained confidence lies within a preset confidence interval is judged; if so, the personnel positioning is judged successful; otherwise the personnel positioning has failed and positioning continues;
S5, personnel tracking: tracking and positioning the same person across consecutive frames with a multi-target tracking algorithm according to the positioning results to obtain the corresponding action track;
S6, personnel counting and alarming: judging from the action track whether a person has entered the monitoring area, counting the person when the action track enters the monitoring area, uploading the count to a PLC (programmable logic controller) end to realize linkage, and raising an alarm if the number of personnel exceeds the permitted limit.
The technical scheme of the invention is further improved as follows: the cart early warning system comprises, but is not limited to, a mining auxiliary well intelligent cart early warning system.
The technical scheme of the invention is further improved as follows: in step S3, the deep learning model includes, but is not limited to, a YOLOv3 model or a CNN model.
The technical scheme of the invention is further improved as follows: in step S3, the data set is divided into a training set and a test set, with the training set containing more samples than the test set; the training set is used to train the deep learning model, the test set is then used to test the trained deep learning model, and the target model is obtained once the test is passed.
The technical scheme of the invention is further improved as follows: in step S4, when the target personnel are identified and positioned by the machine vision method based on deep learning, TensorRT is called to accelerate inference of the target model. NVIDIA's TensorRT is a high-performance deep learning inference optimizer that can provide low-latency, high-throughput deployment inference for deep learning applications.
The technical scheme of the invention is further improved as follows: in step S5, the multi-target tracking algorithm comprises the Hungarian algorithm and a Kalman filter; after the target model has been used to obtain the personnel targets in each frame of image, the same person is tracked across consecutive frames: the Kalman filter predicts the position of the person in the next frame, and the Hungarian algorithm then performs the data association of the personnel positioning results, so as to obtain the corresponding action track.
The technical scheme of the invention is further improved as follows: the specific processes of the Hungarian algorithm and the Kalman filter are as follows:
S51, initializing Kalman filters with the plurality of personnel targets identified in the first frame image, and predicting the positions and sizes of the plurality of personnel in the next frame image to obtain prediction results;
S52, reading in the next frame image, identifying and positioning the personnel in it to obtain detection results, establishing an association matrix between the detection results and the prediction results according to the area intersection-over-union ratio of the rectangular bounding boxes, finding the best match with the Hungarian algorithm, and iteratively optimizing the Kalman filters with the matched personnel target data, wherein:
if, after matching is complete, a remaining detection result cannot be matched to any prediction result, a new Kalman filter is initialized for that person;
if a remaining prediction result cannot be matched to any detection result, the target is considered occluded or lost by the tracker, and the number of frames for which this persists is recorded; when the number of frames exceeds a set threshold, the target is considered to have disappeared or been lost, and its corresponding Kalman filter is removed;
S53, recording the position information of the personnel identified in the image, and generating the corresponding action track.
The technical scheme of the invention is further improved as follows: in step S51, the specific process is as follows:
the position of a person in the first frame image is represented as BoundingBox $(x_{center}, y_{center}, h_{rect}, w_{rect})$, where $(x_{center}, y_{center})$ is the center point of the rectangular labeling frame marking the person, and $(h_{rect}, w_{rect})$ are the height and width of the rectangular labeling frame;
Kalman filters are initialized with the plurality of personnel targets identified in the first frame image; assuming that the state of the i-th person in the k-th frame image is $\hat{x}_k^i = (p_k^i, v_k^i)$, where $p_k^i$ and $v_k^i$ are respectively the position and the moving speed of the i-th person in the k-th frame image, and that the mean and covariance matrix of the state are denoted $\hat{x}_k$ and $P_k$, the state prediction equation of the Kalman filter for the (k+1)-th frame image is:
$$\hat{x}_{k+1} = F_k \hat{x}_k, \qquad P_{k+1} = F_k P_k F_k^{T}$$
where $\hat{x}_{k+1}$ and $P_{k+1}$ are the mean and covariance matrix of the corresponding prediction for the (k+1)-th frame image, $F_k$ is the motion coefficient matrix of the k-th frame image, and $F_k^{T}$ is the transpose of $F_k$; k is initialized to 1, $p_1^i$ is initialized to the positioning result $(x_{center}, y_{center})$ of the person, and $v_1^i$ is initialized to 0; the initialized Kalman filters are used to predict the positions and sizes of the plurality of personnel in the next frame image, giving the prediction result $\hat{x}_{k+1}$, which is the prediction of BoundingBox $(x_{center}, y_{center}, h_{rect}, w_{rect})$ for each person in the next frame.
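The prediction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes a constant-velocity form for the motion coefficient matrix $F_k$ over a state (x, y, vx, vy), tracks only the box center (the box height and width are omitted for brevity), and applies no process noise because the text mentions none. The helper names `mat_mul`, `transpose`, `constant_velocity_matrix` and `kalman_predict` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def constant_velocity_matrix(dt=1.0):
    """Assumed form of F_k for a state (x, y, vx, vy); dt is one frame."""
    return [[1.0, 0.0, dt, 0.0],
            [0.0, 1.0, 0.0, dt],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def kalman_predict(x, P, F):
    """Propagate mean x and covariance P one frame: F x and F P F^T."""
    x_next = [row[0] for row in mat_mul(F, [[v] for v in x])]
    P_next = mat_mul(mat_mul(F, P), transpose(F))
    return x_next, P_next


F = constant_velocity_matrix()
# First frame: position from the detector, velocity initialized to 0.
x1 = [320.0, 240.0, 0.0, 0.0]
P1 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x2, P2 = kalman_predict(x1, P1, F)
```

With the velocity initialized to 0, the first prediction leaves the position unchanged while the covariance grows, which reflects the added uncertainty about where the person will be in the next frame.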
The technical scheme of the invention is further improved as follows: in step S52, the specific process is:
the next frame image is read in, and the personnel in it are identified and positioned to obtain the detection result $z_{k+1}$, i.e. the detected BoundingBox $(x_{center}, y_{center}, h_{rect}, w_{rect})$ of each person in the next frame;
the area intersection-over-union ratio $R_{area}$ of the rectangular bounding boxes is calculated in turn for each pair of prediction result and detection result, by the formula:
$$R_{area} = \begin{cases} \dfrac{W \cdot H}{S_{pred} + S_{det} - W \cdot H}, & \text{if Judge1 and Judge2 both hold} \\ 0, & \text{otherwise} \end{cases}$$
where Judge1 and Judge2 are the conditions for judging whether the rectangular bounding boxes intersect, W and H are respectively the width and height of the rectangle formed by the intersecting part, and $S_{pred}$ and $S_{det}$ are the areas $h_{rect} \cdot w_{rect}$ of the predicted and detected bounding boxes; an association matrix of the two is established according to the area intersection-over-union ratio $R_{area}$, the optimal match is then found with the Hungarian algorithm, and the Kalman filter is iteratively optimized with the matched personnel target data;
two Gaussian distributions $(\mu_i, \Sigma_i)$ (i = 0, 1) can be obtained from the prediction result and the detection result, where $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of each Gaussian distribution; the Gaussian distributions of the prediction result and the detection result are respectively $(\mu_0, \Sigma_0) = (Q_{k+1}\hat{x}_{k+1},\ Q_{k+1} P_{k+1} Q_{k+1}^{T})$ and $(\mu_1, \Sigma_1) = (z_{k+1}, R_{k+1})$, where $Q_{k+1}$ is the matrix that maps the prediction result into the Gaussian distribution compared with the detection result, $\hat{x}_{k+1}$ and $P_{k+1}$ are the mean and covariance matrix of the corresponding prediction for the (k+1)-th frame image, $Q_{k+1}^{T}$ is the transpose of $Q_{k+1}$, and $z_{k+1}$ and $R_{k+1}$ are respectively the state mean and covariance matrix of the detection result, so the update equations of the Kalman filter are:
$$K = P_{k+1} Q_{k+1}^{T} \left( Q_{k+1} P_{k+1} Q_{k+1}^{T} + R_{k+1} \right)^{-1}$$
$$\hat{x}'_{k+1} = \hat{x}_{k+1} + K \left( z_{k+1} - Q_{k+1} \hat{x}_{k+1} \right)$$
$$P'_{k+1} = P_{k+1} - K Q_{k+1} P_{k+1}$$
where K is the Kalman gain; the updated optimal estimate $\hat{x}'_{k+1}$ and the covariance matrix of the optimal estimate $P'_{k+1}$ are used together for the (k+2)-th prediction and iterative optimization;
if, after matching is complete, a remaining detection result cannot be matched to any prediction result, a new Kalman filter is initialized for that person;
if a remaining prediction result cannot be matched to any detection result, the target is considered occluded or lost by the tracker, and the number of frames for which this persists is recorded; when the number of frames exceeds a set threshold, the target is considered to have disappeared or been lost, and its corresponding Kalman filter is removed.
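As a rough illustration of how the association matrix might be built, the sketch below computes the area intersection-over-union ratio for boxes in the $(x_{center}, y_{center}, h_{rect}, w_{rect})$ format. The intersection test used here (positive overlap width W and height H) is an assumed stand-in for the conditions Judge1 and Judge2, whose exact form is not given in the text; the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, h, w) boxes."""
    xa, ya, ha, wa = box_a
    xb, yb, hb, wb = box_b
    # Width and height of the overlapping rectangle (W and H in the text).
    W = min(xa + wa / 2, xb + wb / 2) - max(xa - wa / 2, xb - wb / 2)
    H = min(ya + ha / 2, yb + hb / 2) - max(ya - ha / 2, yb - hb / 2)
    # Assumed intersection test: the boxes overlap on both axes.
    if W <= 0 or H <= 0:
        return 0.0
    inter = W * H
    return inter / (wa * ha + wb * hb - inter)


def association_matrix(predictions, detections):
    """Rows are predicted boxes, columns are detected boxes."""
    return [[iou(p, d) for d in detections] for p in predictions]


predictions = [(50.0, 50.0, 40.0, 20.0), (200.0, 80.0, 40.0, 20.0)]
detections = [(52.0, 50.0, 40.0, 20.0), (400.0, 300.0, 40.0, 20.0)]
matrix = association_matrix(predictions, detections)
```

In this example only the first prediction and the first detection overlap, so a single entry of the matrix is non-zero and the remaining prediction and detection would be handled as unmatched.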
The technical scheme of the invention is further improved as follows: in step S6, the personnel count data are uploaded to and displayed at the PLC end in real time through the TCP/IP protocol, so that linkage with the PLC end is realized.
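The text specifies only that the count is uploaded over TCP/IP; it does not define a message format. The sketch below therefore uses a purely hypothetical newline-terminated JSON payload with the fields `count` and `alarm`. A real PLC link would more likely use the controller vendor's own protocol, so this should be read as an illustration of the upload step only.

```python
import json
import socket


def encode_count(count, alarm):
    """Encode the count as one JSON line (hypothetical message format)."""
    return (json.dumps({"count": count, "alarm": alarm}) + "\n").encode("utf-8")


def send_count(sock, count, alarm):
    """Send one count message over an already connected socket."""
    sock.sendall(encode_count(count, alarm))


def upload_count(host, port, count, alarm, timeout=2.0):
    """Open a TCP connection to the PLC-side endpoint and send one message."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        send_count(sock, count, alarm)
```

Separating `encode_count` from the connection handling keeps the payload format easy to swap once the actual PLC interface is known.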
Due to the adoption of the technical scheme, the beneficial effects obtained by the invention are as follows:
1. The invention does not require personnel in the running area of the mine car to be monitored and counted by eye, avoids errors caused by human factors, reduces potential safety hazards in car handling while improving operational safety and production efficiency, moves the car handling operation of the auxiliary vertical shaft of the coal mine from automation toward intelligence, effectively raises the level of application, management and maintenance of intelligent coal mine equipment, raises the level of safety and production management informatization and of intelligent decision-making in the mine hoisting and transportation link, and achieves the aim of replacing manual labor with intelligent equipment.
2. The Hungarian algorithm used in the invention takes the area intersection-over-union ratio of the rectangular bounding boxes as the matching objective and performs the corresponding calculation on that ratio, which differs markedly from the conventional use of the minimum distance sum in the Hungarian algorithm.
3. The invention can support various models, including but not limited to a YOLOv3 model, a CNN model and the like, and is widely applicable and flexible in use. Compared with traditional image recognition and positioning methods, the method adopts a deep learning algorithm, suits a wide range of application scenes, and reduces the shortcomings introduced by hand-designed features.
4. The invention is not disturbed by complex backgrounds, and has strong robustness and high portability.
5. The invention innovatively introduces deep learning into the cart early warning system, uses a convolutional neural network to judge the number of personnel around the cart accurately and in real time, and gives an early warning when the number exceeds the permitted limit, which raises the intelligence level of the mine, further increases the degree of automation of the cart early warning system, greatly reduces the manpower consumed, and achieves the aims of reducing staff, increasing efficiency and lowering production cost.
Drawings
FIG. 1 is a flowchart of an algorithm employing a YOLOv3 model in accordance with an embodiment of the present invention;
FIG. 2 is a graph showing the results of the identification positioning and statistics of the personnel;
FIG. 3 is a graph showing the results of the identification, positioning and statistics of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples.
The invention discloses a personnel counting method applied to a cart early warning system, which in this embodiment is applied to a mining auxiliary well intelligent cart early warning system; the cart early warning system includes, but is not limited to, the mining auxiliary well intelligent cart early warning system and may also be another cart early warning system, with no special requirement on the application occasion. The general procedure of the method is as follows:
S1, collecting historical images to obtain a sample set to be detected.
S2, labeling the sample set to be detected: the target personnel to be positioned in the sample set to be detected are classified and labeled to generate the required data set. No specific labeling software is required; a label is added to the workers in each picture using a rectangular frame format, for example the label "safety helmet" for a safety helmet and the label "staff" for a human face, so that the deep learning model can recognize the labels.
S3, training a model: the deep learning model is trained with the data set to obtain a target model for personnel identification and positioning; for example, 6000 to 8000 images are selected for training the deep learning model. Images that are too bright or too dark need to be preprocessed beforehand to meet the processing requirements, which is not described further here. The deep learning model includes, but is not limited to, a YOLOv3 model or a CNN model, and may also be another model. This embodiment is described using the YOLOv3 model as a typical example.
In this embodiment, a YOLOv3 network is used as the target detection model. YOLOv3 is a version of the YOLO series that uses Darknet-53 as its backbone network, consisting mainly of a series of 3×3 and 1×1 convolutional layers and residual structures. The convolutional layers extract image features, and the output size of the feature map is controlled by adjusting the convolution stride. Batch normalization and a Leaky ReLU activation layer are added after each convolutional layer, which reduces variability, accelerates convergence and helps avoid overfitting. The network also borrows the multi-scale idea of the feature pyramid network: deep features are up-sampled with a stride of 2 and concatenated with the feature layer to be fused once their dimensions match, so that richer and more comprehensive image features are obtained and passed to three feature maps of different sizes for predicting the confidence and region coordinates of the corresponding objects. A schematic of the specific YOLOv3 network architecture is not provided here.
It should be noted in particular that the data set is divided into a training set and a test set according to a ratio, with the training set containing more samples than the test set; for example, the ratio of training set to test set is 7:3, or it can be 8:2, 9:1 or another ratio, chosen according to habit, experience or calculation. The deep learning model YOLOv3 is trained with the training set, the trained model is then tested with the test set, and the target model is obtained once the test is passed; this target model is referred to as YOLOv3 model 1.
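The split described above can be sketched as follows. This is a minimal example assuming the data set is a list of labeled image file names; the function name `split_dataset`, the fixed random seed and the file names are illustrative choices, not part of the original method.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle samples and split them into (training set, test set)."""
    if not 0.5 < train_ratio < 1.0:
        # The text requires the training set to be the larger of the two.
        raise ValueError("train_ratio must be between 0.5 and 1.0")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


image_names = ["img_%04d.jpg" % i for i in range(10)]
train_set, test_set = split_dataset(image_names, train_ratio=0.7)
```

Changing `train_ratio` to 0.8 or 0.9 gives the other ratios mentioned in the text.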
S4, identifying and positioning target personnel: each frame of image acquired by the camera is passed into the target model in real time, the target personnel are identified and positioned, the positioning result of each person and the corresponding confidence are obtained, and the positioning result is confirmed a second time according to the confidence. The confidence is the value output by the model alongside the recognition result and reflects the recognition accuracy: whether the obtained confidence lies within the preset confidence interval is judged; if so, the personnel positioning is judged successful; otherwise it has failed, and positioning continues until the confidence interval is satisfied. Images are captured from the camera by calling the official SDK (software development kit) of the camera brand.
Meanwhile, when the target personnel are identified and positioned by the machine vision method based on deep learning, TensorRT is called to accelerate inference of the target model YOLOv3 model 1. NVIDIA's TensorRT is a high-performance deep learning inference optimizer that can provide low-latency, high-throughput deployment inference for deep learning applications. The collected images are input into YOLOv3 model 1, which outputs the identification and positioning results of the personnel.
S5, personnel tracking: the same person is tracked and positioned across consecutive frames with a multi-target tracking algorithm according to the positioning results to obtain the corresponding action track. The multi-target tracking algorithm in this embodiment comprises the Hungarian algorithm and a Kalman filter: after the target model YOLOv3 model 1 has been used to obtain the personnel targets in each frame of image, the same person is tracked across consecutive frames, the Kalman filter predicts the position of the person in the next frame, and the Hungarian algorithm then performs the data association of the personnel positioning results.
The specific processes of the Hungarian algorithm and the Kalman filter are as follows:
S51, initializing Kalman filters with the plurality of personnel targets identified in the first frame image, and predicting the positions and sizes of the plurality of personnel in the next frame image to obtain prediction results;
S52, reading in the next frame image, identifying and positioning the personnel in it to obtain detection results, establishing an association matrix between the detection results and the prediction results according to the area intersection-over-union ratio of the rectangular bounding boxes, finding the best match with the Hungarian algorithm, and iteratively optimizing the Kalman filters with the matched personnel target data, wherein:
if, after matching is complete, a remaining detection result cannot be matched to any prediction result, a new Kalman filter is initialized for that person;
if a remaining prediction result cannot be matched to any detection result, the target is considered occluded or lost by the tracker, and the number of frames for which this persists is recorded;
when the number of frames exceeds a set threshold, the target is considered to have disappeared or been lost, and its corresponding Kalman filter is removed;
S53, recording the position information of the personnel identified in the image, and generating the corresponding action track.
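The matching and track bookkeeping in S52 can be illustrated with the sketch below. To stay self-contained it does not implement the Hungarian algorithm itself: it searches all assignments exhaustively, which yields the same optimal total for the handful of people in one frame but scales factorially, so a production system would substitute a real Hungarian solver. The minimum acceptable ratio `min_iou`, the threshold `max_misses` and all function names are assumptions made for illustration.

```python
from itertools import permutations


def associate(iou_matrix, n_pred, n_det, min_iou=0.3):
    """Return (matches, unmatched predictions, unmatched detections).

    iou_matrix[p][d] is the overlap ratio of prediction p and detection d.
    Exhaustive search stands in for the Hungarian algorithm here.
    """
    size = max(n_pred, n_det)
    best_pairs, best_score = [], -1.0
    for cols in permutations(range(size), n_pred):
        # Column indices >= n_det are padding and mean "no detection".
        pairs = [(p, d) for p, d in enumerate(cols) if d < n_det]
        score = sum(iou_matrix[p][d] for p, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    matches = [(p, d) for p, d in best_pairs if iou_matrix[p][d] >= min_iou]
    matched_p = {p for p, _ in matches}
    matched_d = {d for _, d in matches}
    lost = [p for p in range(n_pred) if p not in matched_p]
    new = [d for d in range(n_det) if d not in matched_d]
    return matches, lost, new


def update_miss_counts(miss_counts, matches, lost, max_misses=5):
    """Reset counters of matched tracks, age unmatched ones, list removals."""
    counts = list(miss_counts)
    for p, _ in matches:
        counts[p] = 0
    for p in lost:
        counts[p] += 1
    removed = [p for p, c in enumerate(counts) if c > max_misses]
    return counts, removed


iou_matrix = [
    [0.80, 0.05, 0.00],  # track 0 overlaps detection 0 strongly
    [0.10, 0.00, 0.00],  # track 1 has no good candidate
]
matches, lost, new = associate(iou_matrix, n_pred=2, n_det=3)
counts, removed = update_miss_counts([0, 5], matches, lost, max_misses=5)
```

Here track 0 is matched and would be fed to the Kalman update, detections 1 and 2 would each start a new Kalman filter, and track 1 exceeds the missed-frame threshold and would be removed.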
S6, personnel counting and alarming: whether a person has entered the monitoring area is judged from the action track, and the person is counted when the action track enters the monitoring area; the personnel count data are uploaded to and displayed at the PLC end in real time through the TCP/IP protocol to realize linkage, and an alarm is raised if the number of personnel exceeds the permitted limit.
The general operation of the invention is: historical images are collected, the target personnel are labeled, and the resulting data set is used for training to obtain YOLOv3 model 1; the camera collects images and transmits them to the vision server, where the target personnel are positioned by the machine vision method based on deep learning; the same person is tracked and positioned across consecutive frames with the multi-target tracking algorithm; whether the person has entered the monitoring area is judged from the action track, the personnel are counted, the count is uploaded to the PLC end to realize linkage, and an alarm is raised if the number of personnel exceeds the permitted limit.
The procedure using the YOLOv3 network model is described in detail below with reference to fig. 1.
The personnel counting method applied to the cart early warning system combines a deep learning algorithm, a multi-target tracking algorithm and the application scene; as shown in fig. 1, the specific process using the YOLOv3 network model comprises the following steps:
Step one: historical images of the personnel to be detected in the cart early warning system are collected and labeled as an image sample set, giving the labeling category Label and the rectangular labeling frame data BoundingBox, where Label has the single value person and the BoundingBox structure is $(x_{min}, y_{min}, x_{max}, y_{max})$, in which $(x_{min}, y_{min})$ are the pixel coordinates of the upper-left vertex of the rectangular labeling frame in the image and $(x_{max}, y_{max})$ are the pixel coordinates of its lower-right vertex. To facilitate subsequent computation, the BoundingBox structure is converted into $(x_{center}, y_{center}, h_{rect}, w_{rect})$, where $(x_{center}, y_{center})$ is the center point of the rectangular labeling frame marking the person and $(h_{rect}, w_{rect})$ are the height and width of the labeling frame. The image sample set and the personnel labeling results are then fused to generate the data set required for training the model. In view of the small scale of the data set used, the data set obtained by combining the image sample set and the labeling results is divided into a training set and a test set at a ratio of 7:3, and deep learning model training based on the YOLOv3 framework is carried out to obtain YOLOv3 model 1 for personnel positioning. The relevant interface is shown below, where bddbox is the BoundingBox, i.e. the upper-left and lower-right corners of the labeling frame.
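The conversion between the two BoundingBox layouts in step one amounts to simple arithmetic, sketched below. The function names are hypothetical, and the example coordinates are made up for illustration.

```python
def corners_to_center(box):
    """Convert (x_min, y_min, x_max, y_max) to (x_center, y_center, h, w)."""
    x_min, y_min, x_max, y_max = box
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0,
            y_max - y_min, x_max - x_min)


def center_to_corners(box):
    """Convert (x_center, y_center, h, w) back to corner coordinates."""
    x_center, y_center, h_rect, w_rect = box
    return (x_center - w_rect / 2.0, y_center - h_rect / 2.0,
            x_center + w_rect / 2.0, y_center + h_rect / 2.0)


labeled = (100, 50, 180, 250)          # upper-left and lower-right pixels
converted = corners_to_center(labeled)  # center point, then height, width
```

Note that the converted layout lists the height before the width, following the order $(h_{rect}, w_{rect})$ used in the text.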
Step two: the camera collects real-time images of the cart scene and uploads them to the vision server, giving the first-frame color image $Img_1$; $Img_1$ is input into the YOLOv3 model 1 obtained in step one, which identifies and positions the target personnel and outputs the corresponding confidence; whether the obtained confidence lies within the preset confidence interval is judged, and the personnel positioning is judged successful when it does. TensorRT is called to generate a corresponding engine file according to the GPU computing power of the vision server and YOLOv3 model 1, thereby accelerating inference for personnel identification and positioning.
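The secondary confirmation by confidence in step two can be sketched as a simple filter. The detection record layout (a dict with `box` and `confidence` keys) and the interval bounds 0.5 to 1.0 are assumptions for illustration; the text does not state the actual interval or the output format of the model.

```python
def confirm_detections(detections, interval=(0.5, 1.0)):
    """Keep only detections whose confidence lies inside the interval."""
    low, high = interval
    return [d for d in detections if low <= d["confidence"] <= high]


# Hypothetical model output for one frame: (x_center, y_center, h, w) boxes.
frame_detections = [
    {"box": (140.0, 150.0, 200.0, 80.0), "confidence": 0.92},
    {"box": (400.0, 160.0, 190.0, 75.0), "confidence": 0.31},
]
confirmed = confirm_detections(frame_detections)
```

Detections that fall outside the interval are simply dropped for this frame, and positioning continues with the next frame.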
Step three: Kalman filters are initialized with the plurality of personnel targets identified in the first frame image; assuming that the state of the i-th person in the k-th frame is $\hat{x}_k^i = (p_k^i, v_k^i)$, where $p_k^i$ and $v_k^i$ are respectively the position and the moving speed of the i-th person in the k-th frame image, and that the mean and covariance matrix of the state are denoted $\hat{x}_k$ and $P_k$, the state prediction equation of the Kalman filter is:
$$\hat{x}_{k+1} = F_k \hat{x}_k, \qquad P_{k+1} = F_k P_k F_k^{T}$$
where $F_k$ is the motion coefficient matrix of the k-th frame image and $F_k^{T}$ is the transpose of $F_k$; k is initialized to 1, $p_1^i$ is initialized to the first-frame personnel positioning result $(x_{center}, y_{center})$, and $v_1^i$ is initialized to 0. The initialized Kalman filters are used to predict the positions and sizes of the plurality of personnel in the next frame image, giving the prediction result $\hat{x}_{k+1}$, i.e. the prediction of BoundingBox $(x_{center}, y_{center}, h_{rect}, w_{rect})$ for each person in the next frame.
Step four: the next frame image is read in and step two is repeated to identify and position the personnel in the image, giving the detection result $z_{k+1}$, i.e. the detected BoundingBox $(x_{center}, y_{center}, h_{rect}, w_{rect})$ of each person in the next frame. The area intersection-over-union ratio $R_{area}$ of the rectangular bounding boxes is calculated in turn for each pair of prediction result and detection result, by the formula:
$$R_{area} = \begin{cases} \dfrac{W \cdot H}{S_{pred} + S_{det} - W \cdot H}, & \text{if Judge1 and Judge2 both hold} \\ 0, & \text{otherwise} \end{cases}$$
where Judge1 and Judge2 are the conditions for judging whether the rectangular bounding boxes intersect, W and H are respectively the width and height of the rectangle formed by the intersecting part, and $S_{pred}$ and $S_{det}$ are the areas $h_{rect} \cdot w_{rect}$ of the predicted and detected bounding boxes. An association matrix of the two is established according to the area intersection-over-union ratio $R_{area}$, the optimal match is found with the Hungarian algorithm, and the Kalman filter is iteratively optimized with the matched personnel target data. Two Gaussian distributions $(\mu_i, \Sigma_i)$ (i = 0, 1) are obtained from the prediction result and the detection result, where $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of each Gaussian distribution; the Gaussian distributions of the prediction result and the detection result are respectively $(\mu_0, \Sigma_0) = (Q_{k+1}\hat{x}_{k+1},\ Q_{k+1} P_{k+1} Q_{k+1}^{T})$ and $(\mu_1, \Sigma_1) = (z_{k+1}, R_{k+1})$, where $Q_{k+1}$ is the matrix that maps the prediction result into the Gaussian distribution compared with the detection result, $Q_{k+1}^{T}$ is the transpose of $Q_{k+1}$, and $z_{k+1}$ and $R_{k+1}$ are respectively the state mean and covariance matrix of the detection result, so the update equations of the Kalman filter are:
$$K = P_{k+1} Q_{k+1}^{T} \left( Q_{k+1} P_{k+1} Q_{k+1}^{T} + R_{k+1} \right)^{-1}$$
$$\hat{x}'_{k+1} = \hat{x}_{k+1} + K \left( z_{k+1} - Q_{k+1} \hat{x}_{k+1} \right)$$
$$P'_{k+1} = P_{k+1} - K Q_{k+1} P_{k+1}$$
where K is the Kalman gain; the updated optimal estimate $\hat{x}'_{k+1}$ and the covariance matrix of the optimal estimate $P'_{k+1}$ are used for the (k+2)-th prediction and iterative optimization. That is, after the filter has been updated with the detection result, the updated parameters $\hat{x}'_{k+1}$ and $P'_{k+1}$ of the (k+1)-th frame image are substituted into the state prediction equation of step three to predict the (k+2)-th frame image. The motion coefficient matrices $F_k$ and $F_{k+1}$ of the k-th and (k+1)-th frame images can both be obtained directly.
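A minimal sketch of the update step follows, written in the standard Kalman form with the Kalman gain K. It assumes a state (x, y, vx, vy), a detection that measures only the box center, and a matrix Q that picks the position out of the state; the detection vector `z`, its covariance `R` and all helper names are illustrative assumptions rather than the notation of the original filing.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_add(a, b, sign=1.0):
    """Return a + sign * b element by element."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inv_2x2(m):
    """Invert a 2x2 matrix (enough for a two-component measurement)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def kalman_update(x_pred, P_pred, z, R, Q):
    """Fuse the prediction (x_pred, P_pred) with the detection (z, R)."""
    Qt = transpose(Q)
    S = mat_add(mat_mul(mat_mul(Q, P_pred), Qt), R)
    K = mat_mul(mat_mul(P_pred, Qt), inv_2x2(S))          # Kalman gain
    x_col = [[v] for v in x_pred]
    residual = mat_add([[v] for v in z], mat_mul(Q, x_col), sign=-1.0)
    x_new = mat_add(x_col, mat_mul(K, residual))
    P_new = mat_add(P_pred, mat_mul(mat_mul(K, Q), P_pred), sign=-1.0)
    return [row[0] for row in x_new], P_new


Q = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]           # assumed: only position is observed
identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
R = [[1.0, 0.0], [0.0, 1.0]]
x_upd, P_upd = kalman_update([0.0, 0.0, 0.0, 0.0], identity4, [2.0, 4.0], R, Q)
```

With equal uncertainty in the prediction and the detection, the updated position lands halfway between them and the position variance is halved, while the unobserved velocity terms are left unchanged.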
If, after matching is complete, a remaining detection result cannot be matched to any prediction result, a new Kalman filter is initialized for that person; the remaining persons here are those who were identified and positioned in the (k+1)-th frame but whose detection results could not be matched to any prediction result.
If a remaining prediction result cannot be matched to any detection result, the target is considered occluded or lost by the tracker, and the number of frames for which this persists is recorded; when the number of frames exceeds a set threshold, the target is considered to have disappeared or been lost, and its corresponding Kalman filter is removed.
In short, there are two cases: a detection result that cannot be matched to a prediction result leads to initialization of a new filter, and a prediction result that cannot be matched to a detection result leads to the number of frames being recorded.
Step five: the position information of the personnel identified in the image is recorded, and the corresponding tracking track is generated; the camera successively transmits a plurality of consecutive real-time images, and step four is repeated for each; when the track of a person enters the predetermined monitoring area, the person is counted, as shown in fig. 2 or fig. 3.
Step six: the vision server uploads the counted number of personnel to the PLC end for linkage, and an alarm is raised if the count exceeds the permitted number of personnel.
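The counting rule in step five can be illustrated as below. The sketch assumes an axis-aligned rectangular monitoring area, counts each track identifier once when its center first falls inside the area, and treats the count as cumulative; the text does not specify the shape of the area or whether people leaving it should be subtracted, so these are assumptions, as are the class name `RegionCounter` and the limit `max_people`.

```python
class RegionCounter:
    """Count distinct tracks that have entered a rectangular area."""

    def __init__(self, region, max_people):
        self.region = region          # (x_min, y_min, x_max, y_max)
        self.max_people = max_people  # permitted number of people
        self.counted_ids = set()

    def _inside(self, point):
        x, y = point
        x_min, y_min, x_max, y_max = self.region
        return x_min <= x <= x_max and y_min <= y <= y_max

    def observe(self, track_id, center):
        """Register the latest center of a track and return the count."""
        if track_id not in self.counted_ids and self._inside(center):
            self.counted_ids.add(track_id)
        return len(self.counted_ids)

    @property
    def alarm(self):
        """True when more people have entered than are permitted."""
        return len(self.counted_ids) > self.max_people


counter = RegionCounter(region=(100, 100, 300, 300), max_people=1)
counter.observe(1, (50, 50))      # track 1 is still outside
counter.observe(1, (150, 150))    # track 1 enters and is counted once
counter.observe(1, (160, 150))    # further points do not count again
total = counter.observe(2, (200, 200))
```

After the second track enters, the count exceeds the permitted limit of one, so `counter.alarm` becomes true and the alarm of step six would be triggered.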
Through the above six steps, the deep learning algorithm and the multi-target tracking algorithm are combined with the application scene to provide the personnel counting method applied to the cart early warning system.
The invention does not require personnel in the running area of the mine car to be monitored and counted by eye, avoids errors caused by human factors, reduces potential safety hazards in car handling while improving operational safety and production efficiency, moves the car handling operation of the auxiliary vertical shaft of the coal mine from automation toward intelligence, effectively raises the level of application, management and maintenance of intelligent coal mine equipment, raises the level of safety and production management informatization and of intelligent decision-making in the mine hoisting and transportation link, and achieves the aim of replacing manual labor with intelligent equipment.