CN115995053A - Target speed detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115995053A
CN115995053A (application CN202310025801.2A)
Authority
CN
China
Prior art keywords
target
time
detection
track
speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310025801.2A
Other languages
Chinese (zh)
Inventor
牟骏杰
邓博文
陈昌金
罗凡程
肖杰
王鑫
李小兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China South Industries Group Automation Research Institute
Original Assignee
China South Industries Group Automation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China South Industries Group Automation Research Institute
Priority to CN202310025801.2A
Publication of CN115995053A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Abstract

The invention discloses a target speed detection method, apparatus, device, and storage medium. The method detects a target with the YOLOX algorithm, tracks the detection results with the multi-target tracking algorithm DeepSORT, sets a speed detection area within the camera's field of view, acquires the times at which the target enters and leaves the area, acquires the target's track within the area, and calculates the target's travel speed from the track and the times. The method is real-time and accurate, adapts to different detection environments and different detection targets, and has strong generalization capability.

Description

Target speed detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of speed detection technologies, and in particular to a method, apparatus, device, and storage medium for detecting target speed with a video recognition scheme in a factory scene.
Background
In a factory scene, running by pedestrians is prohibited in certain areas, and in both indoor and outdoor environments, targets such as forklifts and automobiles are subject to speed regulations. The traditional speed-detection scheme installs infrared speed-measuring cameras to detect vehicles, but this cannot achieve full coverage of the many kinds of targets in a factory, and installing the infrared cameras adds cost. Alternatively, speed can be measured everywhere with dedicated sensors, but such a scheme cannot set a separate threshold for each region (only a single global threshold), lacks generalization capability, and requires purchasing a large number of sensors, which raises hardware cost.
In recent years, artificial-intelligence image detection has reached a high level of efficiency and accuracy, and its use in factory safety management and production management is no longer a novelty; as detection capability improves, functions that previously could not be realized from videos and images can now be implemented smoothly. Against this background, there is also a demand for speed monitoring of targets such as personnel, articles, and vehicles in factories. Monitoring the speed of these targets is an important aspect of safe and stable production, because events such as pedestrians or transport vehicles moving at excessive speed in indoor and outdoor factory environments are very dangerous.
Meanwhile, as image detection and image recognition technologies continue to advance, the accuracy and efficiency of dangerous-behavior detection have reached a fairly satisfactory state. Because the factory environment is complex, dangerous behaviors such as an employee running in a factory scene can have serious consequences if managers and colleagues do not notice them in time, so applying image detection technology to dangerous-behavior detection is important work.
Therefore, how to provide a non-contact method for indirectly detecting target speed is an urgent technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a target speed detection method, apparatus, device, and storage medium to overcome, or at least partially solve, the above problems. A speed detection scheme for objects and pedestrians in a factory scene is realized through target detection and multi-target tracking techniques from image detection, together with prior knowledge. The scheme detects target speed indirectly and without contact, and realizes the speed detection function effectively and in real time without adding hardware or labor cost.
The invention provides the following scheme:
a target speed detection method comprising:
acquiring a plurality of continuous frame images contained in video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
detecting targets in the continuous frame images by using a target detection network to obtain a target detection frame;
obtaining a tracking track by using a target tracking network in combination with the target detection frame;
acquiring the entering time and the leaving time of the target entering and leaving the speed detection area, and acquiring the time difference between the entering time and the leaving time; the speed detection area is a part of the acquisition area;
acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating an actual track distance contained in the actual track;
and calculating the actual speed of the target from the actual track distance and the time difference using prior knowledge.
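The claimed steps amount to a distance-over-time computation. The following minimal Python sketch is illustrative only — the function name, the track format, and the pixel-to-meter prior are assumptions, not defined by the claims:

```python
import math

def estimate_speed(track, fps, meters_per_pixel):
    """track: list of (frame_index, x_px, y_px) positions of one target
    while it is inside the speed-detection area (e.g. produced by a
    detector plus tracker). Returns speed in m/s, i.e. v = s / t."""
    pts = [(x, y) for _, x, y in track]
    # actual track distance: sum of consecutive segment lengths (pixels)
    dist_px = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    # time difference between entering and leaving, from frame indices
    dt = (track[-1][0] - track[0][0]) / fps
    return dist_px * meters_per_pixel / dt

# 25-fps video: the target moves 100 px over 50 frames; prior knowledge
# says 1 px corresponds to 0.02 m in this area (illustrative value)
track = [(i, 2.0 * i, 0.0) for i in range(51)]
speed = estimate_speed(track, fps=25, meters_per_pixel=0.02)
# 100 px * 0.02 m/px = 2 m over 50/25 = 2 s -> 1.0 m/s
```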
Preferably: the target detection network comprises a modified YOLOX detection network; the improved YOLOX detection network comprises:
using CSPDarknet as the backbone network framework, combined with an attention Focus module, to extract features from the decoded image;
enhancing and fusing the extracted features by adopting an FPN-PAN structure;
and outputting the target detection frame, the target category, and the confidence by using a decoupled Head module.
Preferably: the target tracking network comprises a DeepSORT network; obtaining a tracking track by combining the DeepSORT network with the target detection frame comprises the following steps:
generating Detections, and predicting tracking track frames (Tracks) with Kalman filtering;
performing cascade matching and IOU matching between the prediction results and the Detections of the current frame with the Hungarian algorithm;
and updating by using Kalman filtering to obtain the tracking track.
Preferably: after a new detection is detected, predicting the new detection, and matching with the Hungarian algorithm to determine the matching pairs at that moment;
and performing the cost-matrix operation with motion features and appearance features, wherein the motion features use the Mahalanobis distance between the Kalman-filter predicted state and the newly obtained detection.
Preferably: processing appearance features using a small residual network, comprising:
extracting targets in the detections as input, and obtaining a multidimensional feature vector after network processing;
and storing the appearance features of each track as a gallery; when a new detection arrives, matching its appearance against the gallery of each track to obtain a minimum cosine distance, and distinguishing different appearance categories by setting the minimum-cosine-distance threshold to 0.45.
Preferably: the acquiring the entering time and the exiting time of the target entering and exiting the speed detection area comprises the following steps:
and determining the times at which the target's center of gravity crosses the start line and the end line of the speed detection area as the entering time and the leaving time, respectively.
Preferably: and obtaining the time difference by combining the entering time and the leaving time with a frame-rate ratio.
A target speed detection apparatus comprising:
a continuous frame image acquisition unit for acquiring a plurality of continuous frame images contained in the video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
the target detection unit is used for detecting targets in the continuous frame images by utilizing a target detection network to obtain a target detection frame;
the target tracking unit is used for acquiring a tracking track by combining a target tracking network with the target detection frame;
an entry/exit time acquisition unit for acquiring the entering time and the leaving time of the target entering and leaving the speed detection area, and acquiring the time difference between the entering time and the leaving time; the speed detection area is a part of the acquisition area;
the actual track determining unit is used for acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating and obtaining an actual track distance contained in the actual track;
and the actual speed calculation unit is used for calculating the actual speed of the target from the actual track distance and the time difference using prior knowledge.
A detection apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the target speed detection method according to the instructions in the program code.
A computer-readable storage medium for storing a program code for executing the target speed detection method described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the target speed detection method, device and equipment and storage medium, the method can detect a target by means of a YOLOX algorithm, then target tracking is achieved on a detection result by means of a multi-target tracking algorithm DeepSort, a speed detection area is arranged in a camera area, time for the target to enter the area and leave the area is obtained, meanwhile, track of the target in the area is obtained, and the advancing speed of the target is calculated by means of the track and time. The method has real-time performance, detection accuracy, adaptability to different detection environments and different detection targets, and has stronger generalization capability.
In addition, the method can measure the speed of targets in any area of the factory without contact, using existing cameras and without increasing factory hardware cost; it can monitor speed continuously around the clock, is not affected by factors such as environment and illumination, and can measure the speed of different moving targets without being limited by target type. The method has been deployed in practice and plays an important pioneering role in bringing deep-learning image detection algorithms into industrial use.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flow chart of a target speed detection method provided by an embodiment of the present invention;
FIG. 2 is a block flow diagram of a detection method provided by an embodiment of the present invention;
FIG. 3 is a diagram of a network of a Yolox detection framework provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a laboratory environment pedestrian detection result provided by an embodiment of the present invention;
FIG. 5 is a first schematic diagram of a target track update result provided by an embodiment of the present invention;
FIG. 6 is a second schematic diagram of a target track update result provided by an embodiment of the present invention;
FIG. 7 is a third schematic diagram of a target track update result according to an embodiment of the present invention;
FIG. 8 is a fourth schematic diagram of a target track update result provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of a speed monitoring area provided by an embodiment of the present invention;
FIG. 10 is a first schematic view of an access monitoring area provided by an embodiment of the present invention;
FIG. 11 is a second schematic view of an access monitoring area provided by an embodiment of the present invention;
FIG. 12 is a third schematic view of an access monitoring area provided by an embodiment of the present invention;
FIG. 13 is a schematic view of an exit monitoring area provided by an embodiment of the present invention;
FIG. 14 is a schematic diagram of a target speed detecting apparatus according to an embodiment of the present invention;
fig. 15 is a schematic diagram of a target speed detecting apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Referring to fig. 1, a target speed detection method provided in an embodiment of the present invention, as shown in fig. 1, may include:
S101: acquiring a plurality of continuous frame images contained in video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
S102: detecting targets in the continuous frame images by using a target detection network to obtain a target detection frame; specifically, the target detection network comprises an improved YOLOX detection network, which comprises:
using CSPDarknet as the backbone network framework, combined with an attention Focus module, to extract features from the decoded image;
enhancing and fusing the extracted features by adopting an FPN-PAN structure;
and outputting the target detection frame, the target category, and the confidence by using a decoupled Head module.
S103: obtaining a tracking track by using a target tracking network in combination with the target detection frame; specifically, the target tracking network comprises a DeepSORT network, and obtaining the tracking track by combining the DeepSORT network with the target detection frame comprises the following steps:
generating Detections, and predicting tracking track frames (Tracks) with Kalman filtering;
performing cascade matching and IOU matching between the prediction results and the Detections of the current frame with the Hungarian algorithm;
and updating by using Kalman filtering to obtain the tracking track.
Further, after a new detection is detected, the new detection is predicted, and the Hungarian algorithm is used for matching to determine the matching pairs (track, detection) at that moment;
and the cost-matrix operation is performed with motion features and appearance features, wherein the motion features use the Mahalanobis distance between the Kalman-filter predicted state and the newly obtained detection.
Still further, processing the appearance features using a small residual network, comprising:
extracting targets in the detections as input, and obtaining a multidimensional feature vector after network processing;
and storing the appearance features of each track as a gallery; when a new detection arrives, matching its appearance against the gallery of each track to obtain a minimum cosine distance, and distinguishing different appearance categories by setting the minimum-cosine-distance threshold to 0.45.
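The gallery matching with the 0.45 threshold can be sketched in pure Python; the toy two-dimensional vectors below stand in for the re-ID network's feature embeddings and are not real values:

```python
import math

def min_cosine_distance(gallery, feature):
    """gallery: list of past appearance-feature vectors of one track;
    feature: embedding of a new detection. Returns the DeepSORT-style
    minimum cosine distance; 0.45 is the threshold named in the text."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return 1.0 - max(cos(g, feature) for g in gallery)

gallery = [[1.0, 0.0], [0.8, 0.6]]           # stored track appearances
same = min_cosine_distance(gallery, [1.0, 0.1])    # similar appearance
other = min_cosine_distance(gallery, [-1.0, 0.2])  # dissimilar appearance
matched = same < 0.45 and other >= 0.45
```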
S104: acquiring the entering time and the leaving time of the target entering and leaving the speed detection area, and acquiring the time difference between the entering time and the leaving time; the speed detection area is a part of the acquisition area; specifically, the times at which the target's center of gravity crosses the start line and the end line of the speed detection area are determined as the entering time and the leaving time.
Further, the time difference is obtained from the entering time and the leaving time using a frame-rate ratio.
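A minimal sketch of recovering the time difference from frame indices and the frame rate (the function name and values are illustrative, not from the patent):

```python
def frame_time_difference(enter_frame, leave_frame, video_fps):
    """Convert the frame indices at which the target enters and leaves
    the speed-detection area into a wall-clock time difference."""
    return (leave_frame - enter_frame) / video_fps

# target enters at frame 120 and leaves at frame 195 of a 25-fps video
dt = frame_time_difference(enter_frame=120, leave_frame=195, video_fps=25)
# 75 frames at 25 fps -> 3.0 seconds
```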
S105: acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating an actual track distance contained in the actual track;
S106: and calculating the actual speed of the target from the actual track distance and the time difference using prior knowledge.
According to the target speed detection method, a speed detection scheme for objects and pedestrians in a factory scene is realized through target detection and multi-target tracking from image detection technology together with prior knowledge. In implementation, the camera's RTSP address is obtained and parsed with the FFmpeg tool to acquire the video stream captured by the camera; the target is then detected with the YOLOX algorithm; the detection results are tracked with the multi-target tracking algorithm DeepSORT; a speed detection area is set within the camera's field of view; the times at which the target enters and leaves the area are obtained together with the target's actual track within the area; and the target's travel speed is calculated from the actual track and the time difference between entering and leaving the speed detection area using the distance-speed formula. The method detects target speed indirectly and without contact, and realizes the speed detection function effectively and in real time without adding hardware or labor cost.
The method reads video from a factory monitoring camera, sets a speed monitoring area, performs target detection on the targets in the camera's field of view with the YOLOX algorithm, and then performs target tracking with the DeepSORT multi-target tracking algorithm. After a target enters the speed monitoring area, the method records the times at which the target enters and leaves the area and the track within the area, converts between actual time and algorithm detection time, and obtains the final target speed detection result with the speed formula.
Implementation of the method is accomplished mainly through the following three major steps.
the method comprises the steps of firstly collecting video information and carrying out target detection on the video information, firstly utilizing an existing camera of a factory to carry out data reading, adopting an improved YOLOX scheme to carry out target detection in order to achieve real-time performance and accuracy, dividing an integral frame into three parts, firstly utilizing CSPDarknet to carry out feature extraction on decoded images, then utilizing an FPN-PAN structure to carry out enhancement and fusion on the extracted features, and then utilizing a decoupling HEAD HEAD module to realize output of a target detection frame, a target category and confidence, wherein the method comprises the steps of adding an attention module in a data reading and feature extraction part, and utilizing CIOU_Loss in the target detection frame to obtain target detection frame information. And taking the finally obtained target detection frame as a detection result, and using the target detection frame as a subsequent use.
The second step tracks the target detection results in real time, tracking the detection-result frames with the DeepSORT multi-target tracking scheme: first generate Detections; then predict tracking track frames (Tracks) with Kalman filtering; then perform cascade matching and IOU matching between the prediction results and the Detections of the current frame with the Hungarian algorithm; and then update with Kalman filtering to obtain the final multi-target tracking result.
The third step calculates the target's speed using prior knowledge: first set a speed detection area and record the times at which the target enters and leaves it; then obtain the track within the speed detection area from the target-tracking result; then compute the real entry and exit timestamps using the difference between the frame rate at which the algorithm runs and the video's frame rate; and calculate the target's real-time interval speed with the speed formula v = s / t.
Thus the method detects the target with the YOLOX algorithm, tracks the detection results with the multi-target tracking algorithm DeepSORT, sets a speed detection area within the camera's field of view, obtains the times at which the target enters and leaves the area together with the target's track within the area, and calculates the target's travel speed from the track and the times. The method is real-time and accurate, adapts to different detection environments and different detection targets, and has strong generalization capability.
For ease of experimental description, pedestrians are taken as the example, but the method applies equally to moving targets such as vehicles and objects. It monitors speed effectively in real time, can return data to an intelligent monitoring system to realize alarm functions such as overspeed warnings, offers good accuracy and real-time performance, and generalizes well to a variety of scenes.
In the target speed detection method provided by the embodiment of the application, as shown in fig. 2, the target detection and multi-target tracking technologies of image processing are combined with prior knowledge to realize multi-target, multi-area real-time speed detection in a factory area; implementing the method requires the following steps.
step one: firstly, a target detection data set is established, and sample collection work is carried out on a target needing speed detection. Sample collection is carried out in a factory, then pedestrian samples in COCO and VOC data sets are combined, sample marking work is carried out by taking VOC2007 as a marking format, sample collection is completed, then training is carried out by utilizing an improved YOLOX detection network, the whole network framework is divided into three parts, firstly, feature extraction is carried out on decoded images by utilizing CSPDarknet, then the extracted features are enhanced and fused by adopting an FPN-PAN structure, and then output of a target detection frame, a target category and confidence coefficient is realized by utilizing a decoupling HEAD HEAD module. In the training model as a speed detection scheme, the input of the target detection part processes the video stream information acquired by the camera to obtain a target detection frame.
In a specific implementation, continuous frames of video are read from a camera in the factory area with the FFmpeg tool, converted into RGB pictures, and then target detection is performed with the YOLOX target detection algorithm to obtain pedestrian target detection frames.
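A minimal sketch of this decoding step. OpenCV's `VideoCapture` (which decodes video via FFmpeg) is used here as a stand-in for the FFmpeg pipeline; the RTSP URL is illustrative and `opencv-python` is an assumed third-party dependency:

```python
def read_rgb_frames(source, max_frames=100):
    """Yield decoded frames of `source` (a file path or RTSP URL) as
    RGB arrays, as described in the step above. Illustrative sketch;
    requires the third-party package opencv-python."""
    import cv2  # lazy import: only needed when frames are actually read
    cap = cv2.VideoCapture(source)
    try:
        for _ in range(max_frames):
            ok, bgr = cap.read()
            if not ok:  # frame could not be decoded, or stream ended
                break
            # OpenCV decodes to BGR; the pipeline above expects RGB
            yield cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    finally:
        cap.release()

frames = read_rgb_frames("rtsp://camera-host/stream")  # illustrative URL
```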
In the first step of the scheme, a high-definition camera is added at the entrance of the factory area; the camera's video stream is acquired and converted into RGB images with the FFmpeg tool; the image information is used both as training-set input and as detection input; and pedestrian detection is performed with the target detection network. The specific implementation steps are as follows.
step 101, firstly, a target detection data set of various speed measurement targets in a factory environment needs to be established, and the images acquired by cameras at various positions of the factory are acquired, so that balanced acquisition work is required to be performed under different indoor and outdoor environments, different time periods and different weather illumination conditions. And then screening and de-duplicating the collected samples, finally leaving 2000 samples as samples, and then using a VOC marking tool to mark the samples of the speed measuring target for the 2000 samples.
Step 102: establish the YOLOX target detection network; the network structure, shown in fig. 3, is divided into four parts:
step1, wherein CSPDarknet is used as a basic network framework, and the combination attention Focus module converts abstract information of the image into vector feature information of a high-dimensional space. ,
step2, performing feature fusion by using an FPN module and a PAN module, wherein the FPN is responsible for transmitting and fusing the top-down high-level feature information, and the PAN is responsible for transmitting and fusing the low-level feature information in the down sampling process.
Step 3: the fused feature information is decoupled with the decoupled Head module and then concatenated and fused. Taking a 720P image as the input example, after the above steps the output becomes a two-dimensional feature vector of size 85 x 8400, where 8400 is the number of predicted frames and 85 is the per-frame information, comprising position and category information.
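The 85 x 8400 shape can be derived arithmetically: YOLOX predicts one box per cell of three strided grids, and 8400 corresponds to a 640 x 640 network input over strides 8/16/32 — so we assume the 720P frame is first resized to the network resolution (this resize is our assumption, not stated in the text):

```python
# Sanity check of the 85 x 8400 prediction shape.
strides = (8, 16, 32)
input_size = 640                      # assumed network input resolution
num_preds = sum((input_size // s) ** 2 for s in strides)  # 6400+1600+400
channels = 4 + 1 + 80                 # box coords + objectness + 80 classes
shape = (channels, num_preds)         # -> (85, 8400)
```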
Step 4: the prediction results are obtained with an anchor-free scheme and comprise the prediction frame and the prediction category. The offset between the prediction frame and the GT frame is updated with CIOU_Loss instead of GIOU_Loss; on the basis of GIOU_Loss, CIOU_Loss additionally considers the center-point distance and the aspect-ratio scale information of the bounding box, so regression performs better. CIOU_Loss is shown in formula (1), and the category information is updated with BCE_Loss, shown in formula (3).
CIOU_Loss = 1 − IoU + ρ²(b, b_gt)/c² + αv        (1)
where v measures the aspect-ratio consistency, ρ² is the squared center-point distance between the prediction frame and the GT frame, c is the diagonal length of the smallest box enclosing both frames, and α is the weight of v.
v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²        (2)
where w is width and h is height.
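The CIoU terms can be sanity-checked with a small reference implementation. This is a standard CIoU computation reconstructed from the description; the (x1, y1, x2, y2) box format and the weight α = v/((1 − IoU) + v) are our assumptions:

```python
import math

def ciou_loss(b1, b2):
    """CIoU loss between predicted box b1 and GT box b2, each
    (x1, y1, x2, y2): 1 - IoU + rho^2/c^2 + alpha*v."""
    # intersection-over-union
    iw = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    ih = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = iw * ih
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    iou = inter / (a1 + a2 - inter)
    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((b1[0] + b1[2] - b2[0] - b2[2]) / 2) ** 2 \
         + ((b1[1] + b1[3] - b2[1] - b2[3]) / 2) ** 2
    c2 = (max(b1[2], b2[2]) - min(b1[0], b2[0])) ** 2 \
       + (max(b1[3], b2[3]) - min(b1[1], b2[1])) ** 2
    # aspect-ratio consistency term v (formula 2) and its weight alpha
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

loss_same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))   # perfect overlap
loss_off = ciou_loss((0, 0, 10, 10), (5, 5, 15, 15))    # shifted box
```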
BCE_Loss = −(y_n · log z_n + (1 − y_n) · log(1 − z_n))        (3)
where z_n is the probability of predicting the n-th sample as a positive sample, and y_n is the label value of the n-th sample.
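Formula (3) in code, for a single sample (z is the predicted positive-class probability, y the 0/1 label; the values are illustrative):

```python
import math

def bce_loss(z, y):
    """Binary cross-entropy of formula (3) for one sample."""
    return -(y * math.log(z) + (1 - y) * math.log(1 - z))

good = bce_loss(0.99, 1)  # confident correct prediction -> small loss
bad = bce_loss(0.01, 1)   # confident wrong prediction -> large loss
```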
Step 103: the YOLOX target detection framework of step 102 is used as the training framework, the factory pedestrian target detection data set generated in step 101 as the training data, and the corresponding label information as the training ground truth (samples are added when new speed-measurement target classes are needed). The training hardware is an H3C R4900 server with two NVIDIA Tesla T4 graphics cards and 128 GB of Kingston memory. Once software and hardware are ready, training runs for 200 epochs; training is complete when accuracy on the test set reaches 99.8% and recall reaches 99.9%, and the best model is saved for subsequent use.
Step 104: prediction starts with the model trained in step 103. The RTSP address of the corresponding camera is first obtained through a Hikvision convergence platform, then the FFmpeg tool parses it to obtain a segment of the video stream, and target detection is performed on the stream's pictures: the YOLOX target detection framework of step 102 serves as the prediction basis, the target detection model of step 103 is input as the initialization model, and frame-by-frame target detection prediction begins, as shown in fig. 4. When 15 consecutive frames cannot be decoded into images, the monitoring system is prompted to reopen the RTSP address.
Step two: the YOLOX detection results are taken as input, and the target detection results are tracked in real time with the DeepSORT multi-target tracking scheme: first generate Detections; then predict tracking track frames (Tracks) with Kalman filtering; then perform cascade matching and IOU matching between the prediction results and the Detections of the current frame with the Hungarian algorithm; and then update with Kalman filtering to obtain the final multi-target tracking result.
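The predict-match-update cycle of this step can be caricatured in a few lines. This is purely illustrative: a greedy nearest-neighbour stand-in replaces the Hungarian assignment, Kalman machinery, and re-ID features of real DeepSORT:

```python
def track_step(tracks, detections, gate=50.0):
    """One tracking step. tracks: {track_id: (x, y)} last known centers;
    detections: list of (x, y) detection centers for the current frame.
    Matched tracks are updated; unmatched detections start new tracks."""
    unmatched = list(range(len(detections)))
    for tid, pos in list(tracks.items()):
        if not unmatched:
            break
        # greedy nearest-detection match (stand-in for Hungarian matching)
        j = min(unmatched, key=lambda k: (detections[k][0] - pos[0]) ** 2
                                       + (detections[k][1] - pos[1]) ** 2)
        d2 = (detections[j][0] - pos[0]) ** 2 + (detections[j][1] - pos[1]) ** 2
        if d2 <= gate ** 2:          # gated match -> update the track
            tracks[tid] = detections[j]
            unmatched.remove(j)
    for j in unmatched:              # unmatched detections become new tracks
        tracks[max(tracks, default=0) + 1] = detections[j]
    return tracks

tracks = {1: (10.0, 10.0)}
tracks = track_step(tracks, [(12.0, 11.0), (200.0, 200.0)])
# track 1 follows the nearby detection; the far detection spawns track 2
```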
In a specific implementation, the target detection frames from the pedestrian detection in step one are taken as detections, and target tracking is realized with the DeepSORT algorithm.
In the second step, a multi-target tracking module is established, the target detection frame in the pedestrian detection frame in the first step is used as detection, target tracking is realized by using a deepsort algorithm, and the specific implementation steps of the second step are divided into the following steps.
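The IOU matching used alongside cascade matching in step two relies on the standard intersection-over-union of two boxes. A minimal sketch follows; representing boxes as `(x1, y1, x2, y2)` corners is an assumption of this illustration, not a convention stated in the patent.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In DeepSORT-style pipelines, `1 - iou(track_box, detection_box)` fills the cost matrix handed to the Hungarian assignment step for tracks that lack appearance history.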
Step 201, an initialization operation is performed on the bbox detection boxes obtained in step 104 to generate detections, which are initialized with a Kalman filter. If no tracking track exists when a detection from step 104 arrives, the detection is initialized as the initial track (tracks). Denoting the video sequence input in step 106 as V and the detection result of each frame of V as D, the Kalman filter estimates two states for each track, a mean and a covariance, where the mean represents the positional state of the target and is given by formula 4.
M_x = [x, y, r, h, x′, y′, r′, h′]  (4)
where x, y are the centre-of-gravity coordinates of the bbox, r is the aspect ratio, h is the height, and x′, y′, r′, h′ are the corresponding velocity components.
In addition, the covariance expresses the uncertainty of the position and is represented by an 8×8 matrix: the larger a value, the larger the uncertainty, and the smaller a value, the smaller the uncertainty. From these two quantities of a track at time t−1, the Kalman filter predicts the state at time t.
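A minimal sketch of the track state of formula 4: an 8-dimensional mean with zero initial velocities and a diagonal 8×8 covariance whose entries encode the position uncertainty described above. The specific standard deviations are illustrative assumptions (DeepSORT-style scaling by the box height), not values given in the patent.

```python
def init_track_state(x, y, r, h):
    """Build the initial Kalman state for one track:
    mean M_x = [x, y, r, h, x', y', r', h'] with velocities set to 0,
    and a diagonal 8x8 covariance (larger entry = larger uncertainty)."""
    mean = [x, y, r, h, 0.0, 0.0, 0.0, 0.0]
    # Velocity components start more uncertain than the observed position;
    # scaling by the box height h is an assumed convention here.
    std = [h / 10, h / 10, 1e-2, h / 10, h / 16, h / 16, 1e-5, h / 16]
    cov = [[(std[i] ** 2 if i == j else 0.0) for j in range(8)]
           for i in range(8)]
    return mean, cov
```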
Step 202, after the initialization and Kalman filtering prediction of step 201 are completed, if step 104 produces a new detection, prediction is first performed on it and the Hungarian algorithm is then used for matching, determining the matching pairs (track, detection) at that moment. The method combines motion features and appearance features into a cost matrix to complete target matching; the motion feature is the Mahalanobis distance between the predicted Kalman state of step 201 and the newly obtained detection, given by formula 5.
D(i, j) = (D_i − M_i)^T · S_i^(−1) · (D_i − M_i)  (5)
where M_i is the mean and S_i is the covariance. The Mahalanobis distance therefore estimates the uncertainty between the two states through the covariance of the Kalman tracking position means. Associated targets are determined by setting a threshold on the Mahalanobis distance and eliminating targets that are not associated; in this method the threshold interval is set above 0.9, so that unassociated targets are excluded more appropriately.
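Formula 5 together with the gating threshold can be sketched as follows. Restricting S_i to a diagonal covariance is a simplification of this illustration (the full form uses the matrix inverse S_i^(−1)); the function names are hypothetical.

```python
def mahalanobis_sq(detection, mean, cov_diag):
    """Squared Mahalanobis distance (D - M)^T S^-1 (D - M) for a
    diagonal covariance S given as a list of per-dimension variances."""
    return sum((d - m) ** 2 / s
               for d, m, s in zip(detection, mean, cov_diag))

def is_associated(distance, threshold=0.9):
    """Gate an association: targets whose distance exceeds the
    threshold are eliminated, as described in step 202."""
    return distance <= threshold
```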
Appearance features are processed through a small residual network that takes the target crops from the detections as input and, after processing, outputs a 128-dimensional feature vector. The method stores the appearance features of each track as a gallery; when a new detection arrives, its appearance feature is matched against the gallery of each track to obtain the minimum cosine distance, and different appearance categories are distinguished by setting the minimum-cosine-distance threshold to 0.45.
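The minimum-cosine-distance test against a track's appearance gallery can be sketched in pure Python as below; in the method the feature vectors would be the 128-dimensional outputs of the small residual network, and the helper names here are illustrative.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def min_gallery_distance(feature, gallery):
    """Smallest cosine distance between a new detection's feature and
    the stored appearance gallery of one track."""
    return min(cosine_distance(feature, g) for g in gallery)

def same_appearance(feature, gallery, threshold=0.45):
    """Appearance match per step 202's 0.45 minimum-cosine-distance rule."""
    return min_gallery_distance(feature, gallery) <= threshold
```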
The combination of motion features and appearance features provides the prediction information of the target and determines the matched tracking positions. Tracks that have just been initialized, and therefore have no previous motion or appearance information, are associated by IOU matching. Finally, all matching pairs, the unmatched tracks, and the unmatched detections of the current frame are obtained.
Step 203, the tracks successfully matched in step 202 are updated with their corresponding detections; updating the appearance and motion features prepares for the next round of matching.
Step 204, detections left unmatched in step 202 for more than 5 frames without a corresponding track are initialized as new tracks; the remaining detections are discarded.
Step 205, tracks left unmatched in step 202, which may correspond to detection boxes lost through occlusion or motion blur, are retained for 60 frames; if a track is still unmatched after 60 frames it is discarded. Track update results are shown in figs. 5, 6, 7 and 8.
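The 60-frame retention rule of step 205 amounts to a per-track miss counter. A minimal sketch, with illustrative names not taken from the patent:

```python
class Track:
    def __init__(self, track_id):
        self.track_id = track_id
        self.time_since_update = 0  # consecutive frames without a match

def age_tracks(tracks, matched_ids, max_age=60):
    """Reset the miss counter for matched tracks; drop a track once it
    has gone more than `max_age` consecutive frames unmatched."""
    kept = []
    for t in tracks:
        if t.track_id in matched_ids:
            t.time_since_update = 0
        else:
            t.time_since_update += 1
        if t.time_since_update <= max_age:
            kept.append(t)
    return kept
```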
Step three: the target speed is calculated with prior knowledge. A speed detection area is first set, and the tracking result obtained in step two gives the times at which the target enters and leaves the area. The route inside the speed detection area is then determined from the target's tracking trace. The difference between the FPS at which the algorithm runs and the playback FPS of the video is used to determine the timestamps at which the real person enters and leaves the area in the final video. Finally, the real-time interval speed of the target is calculated with the speed formula v = S/t.
In the specific implementation, prior knowledge is matched with the target tracking trace of step two to calculate the entry and exit times of the speed-measurement area and the trace within it, from which the target speed is calculated. Step three is divided into the following sub-steps.
Step 301, calculating the target speed from prior knowledge requires the target's trace and the time taken to traverse it, so the method first designs a target speed detection area; as shown in fig. 9, the region between two straight lines is used as the speed monitoring area.
Step 302, from the position of the speed monitoring area, the times at which the target enters and leaves the area are calculated, the target's position being determined by its centre of gravity, as shown in figs. 10, 11, 12 and 13. The entry and exit times are denoted t_in and t_out respectively.
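Determining t_in and t_out from the centre-of-gravity trace can be sketched as a line-crossing test over the per-frame centroids. Horizontal boundary lines and motion in increasing y are assumptions of this illustration, not constraints stated in the patent.

```python
def crossing_times(centroids, line_in_y, line_out_y):
    """Return the frame indices at which the target's centre of gravity
    first crosses the entry line and, afterwards, the exit line.
    `centroids` is a per-frame list of (x, y) positions."""
    t_in = t_out = None
    for frame_idx, (x, y) in enumerate(centroids):
        if t_in is None and y >= line_in_y:
            t_in = frame_idx          # first frame inside the area
        if t_in is not None and t_out is None and y >= line_out_y:
            t_out = frame_idx         # first frame past the exit line
    return t_in, t_out
```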
Step 303, because the recorded entry and exit times are algorithm running times, their frame rate differs from the actual frame rate of the video; the final time difference is therefore obtained by rescaling with the frame-rate ratio.
Using the track information obtained in step 205, the traces inside the detection area between entry and exit are connected to give the trace distance S for this interval, and the final target speed is calculated with the prior-knowledge formula 6.
v = S / (t_out − t_in)  (6)
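Formula 6 combined with the frame-rate rescaling of step 303 can be sketched as follows. The units (metres, seconds) and the direction of the rescaling are assumptions of this illustration.

```python
def target_speed(distance_m, t_in, t_out, algo_fps, video_fps):
    """v = S / (t_out - t_in).  t_in and t_out are timestamps measured
    while the algorithm ran; since the algorithm FPS differs from the
    video playback FPS, the interval is rescaled by the frame-rate
    ratio before dividing (step 303)."""
    algo_dt = t_out - t_in                      # seconds of algorithm time
    real_dt = algo_dt * algo_fps / video_fps    # seconds of real video time
    return distance_m / real_dt
```

For example, if processing 50 frames took 5 s of algorithm time at 10 FPS while the video plays at 25 FPS, those frames span 2 s of real time, so a 10 m trace gives 5 m/s.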
Compared with traditional schemes, the method offers strong real-time performance, flexible deployment, and high detection accuracy. As an indirect speed-detection scheme, it imposes a strong constraint on behaviours such as overspeed inside the factory and can run in real time 24 hours a day. Speed detection of other object types can be realized simply by changing the target detection class, so one deployed scheme adapts to many different scenes, gives good generalization ability, and offers a useful reference for applying deep-learning image-processing algorithms to practical factory problems.
In a word, the target speed detection method can, without increasing the factory's hardware cost, use existing cameras to detect targets in any area of the factory contactlessly and around the clock; it is not affected by factors such as environment and illumination, and can measure the speed of different moving targets without being limited by target type. Putting the method into practice has an important pioneering effect on deploying deep-learning image-detection algorithms in the industrial field.
Referring to fig. 14, the embodiment of the present application may further provide a target speed detection apparatus, as shown in fig. 14, which may include:
a continuous frame image acquisition unit 1401 for acquiring a plurality of continuous frame images contained in the video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
a target detection unit 1402, configured to detect a target in the continuous frame images by using a target detection network, to obtain a target detection frame;
a target tracking unit 1403 for obtaining a tracking track by combining the target detection frame with a target tracking network;
an entry and exit speed detection area time acquisition unit 1404 configured to acquire entry time and exit time of the target entering and exiting the speed detection area, and obtain a time difference between the entry time and the exit time; the speed detection area is a part of the acquisition area;
an actual track determining unit 1405, configured to obtain an actual track included in the tracking track in a time period including the entry time and the exit time, and calculate an actual track distance included in the actual track;
an actual speed calculation unit 1406, configured to obtain an actual speed of the target by using a priori knowledge in combination with the actual track distance and the time difference calculation.
The embodiment of the application can also provide a target speed detection device, which comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the steps of the target speed detection method described above according to instructions in the program code.
As shown in fig. 15, a target speed detection apparatus provided in an embodiment of the present application may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In the present embodiment, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in an embodiment of the target speed detection method.
The memory 11 is used for storing one or more programs, and the programs may include program codes, where the program codes include computer operation instructions, and in this embodiment, at least the programs for implementing the following functions are stored in the memory 11:
acquiring a plurality of continuous frame images contained in video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
detecting targets in the continuous frame images by using a target detection network to obtain a target detection frame;
a target tracking network is utilized to combine the target detection frame to obtain a tracking track;
acquiring the entering time and the leaving time of the target entering and leaving the speed detection area, and acquiring the time difference between the entering time and the leaving time; the speed detection area is a part of the acquisition area;
acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating an actual track distance contained in the actual track;
and calculating to obtain the actual speed of the target by combining the actual track distance with the time difference through priori knowledge.
In one possible implementation, the memory 11 may include a stored program area and a stored data area, where the stored program area may store an operating system and the application programs required for at least one function (such as a file creation function or a data read-write function), and the stored data area may store data created during use, such as initialization data.
In addition, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic-disk storage device or another non-volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in fig. 15 does not limit the target speed detection apparatus in the embodiment of the present application, and the target speed detection apparatus may include more or fewer components than those shown in fig. 15, or may combine some components in practical applications.
Embodiments of the present application may also provide a computer readable storage medium for storing program code for performing the steps of the above-described target speed detection method.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus the necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A target speed detection method, characterized by comprising:
acquiring a plurality of continuous frame images contained in video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
detecting targets in the continuous frame images by using a target detection network to obtain a target detection frame;
a target tracking network is utilized to combine the target detection frame to obtain a tracking track;
acquiring the entering time and the leaving time of the target entering and leaving the speed detection area, and acquiring the time difference between the entering time and the leaving time; the speed detection area is a part of the acquisition area;
acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating an actual track distance contained in the actual track;
and calculating to obtain the actual speed of the target by combining the actual track distance with the time difference through priori knowledge.
2. The target speed detection method according to claim 1, wherein the target detection network comprises a modified YOLOX detection network; the improved YOLOX detection network comprises:
using CSPDarknet as a basic network frame and combining an attention Focus module to extract characteristics of the decoded image;
enhancing and fusing the extracted features by adopting an FPN-PAN structure;
and outputting the target detection frame, the target category and the confidence coefficient by using a decoupling HEAD HEAD module.
3. The target speed detection method according to claim 1, wherein the target tracking network comprises a deepsort network; obtaining a tracking track by combining the deepsort network with the target detection frame, wherein the method comprises the following steps of:
generating Detections, and carrying out tracking track frame Tracks prediction by using Kalman filtering;
cascade matching and IOU matching are carried out on the prediction result and the Detections of the current frame by using the Hungarian algorithm;
and updating by using Kalman filtering to obtain the tracking track.
4. A method of detecting a target speed according to claim 3, wherein after a new detection is detected, the new detection is predicted and matched using the Hungarian algorithm to determine a matching pair at that time;
and a cost matrix operation is carried out with motion characteristics and appearance characteristics, wherein the motion characteristic uses the Mahalanobis distance to predict the distance between the Kalman filtering state and the newly obtained detection.
5. The target speed detection method according to claim 4, wherein the processing of the appearance features using a small residual network comprises:
extracting targets in the detections as input, and obtaining a multidimensional feature vector after network processing;
and storing the appearance characteristics of each track as a gallery, performing appearance matching with the gallery of each track when a new detection arrives, obtaining the minimum cosine distance, and distinguishing different appearance categories by setting the minimum-cosine-distance threshold to 0.45.
6. The target speed detection method according to claim 1, wherein the acquiring the entry time and the exit time of the target into and out of the speed detection area includes:
and determining the time when the gravity center position of the target is positioned at the start line and the end line of the speed detection area as the entering time and the leaving time.
7. The target speed detection method according to claim 6, wherein the time difference is obtained by using a frame rate ratio in combination with the entry time and the exit time.
8. A target speed detection apparatus, characterized by comprising:
a continuous frame image acquisition unit for acquiring a plurality of continuous frame images contained in the video stream information; the video stream information is a video stream of an acquisition area acquired by image pickup equipment;
the target detection unit is used for detecting targets in the continuous frame images by utilizing a target detection network to obtain a target detection frame;
the target tracking unit is used for acquiring a tracking track by combining a target tracking network with the target detection frame;
an in-out speed detection area time acquisition unit for acquiring the in-out time and the out-out time of the target in-out speed detection area and acquiring the time difference between the in-out time and the out-out time; the speed detection area is a part of the acquisition area;
the actual track determining unit is used for acquiring an actual track contained in the tracking track in a time period contained in the entering time and the leaving time, and calculating and obtaining an actual track distance contained in the actual track;
and the actual speed calculation unit is used for calculating and obtaining the actual speed of the target by combining the actual track distance and the time difference by using priori knowledge.
9. A detection apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the target speed detection method of any one of claims 1-7 according to instructions in the program code.
10. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a program code for executing the target speed detection method according to any one of claims 1-7.
CN202310025801.2A 2023-01-09 2023-01-09 Target speed detection method, device, equipment and storage medium Pending CN115995053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310025801.2A CN115995053A (en) 2023-01-09 2023-01-09 Target speed detection method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115995053A true CN115995053A (en) 2023-04-21

Family

ID=85989989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310025801.2A Pending CN115995053A (en) 2023-01-09 2023-01-09 Target speed detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115995053A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination