CN114419565A - Special vehicle operation collision early warning method and system based on YOLOv4 - Google Patents

Special vehicle operation collision early warning method and system based on YOLOv4

Info

Publication number
CN114419565A
Authority
CN
China
Prior art keywords
training
human body
model
target
yolov4
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111598811.2A
Other languages
Chinese (zh)
Inventor
刘娜
郭肖勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Shuimu Brothers Technology Co ltd
Original Assignee
Tianjin Shuimu Brothers Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Shuimu Brothers Technology Co ltd filed Critical Tianjin Shuimu Brothers Technology Co ltd
Priority to CN202111598811.2A
Publication of CN114419565A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a special vehicle operation collision early warning method and system based on YOLOv4, comprising the following steps: training on a public data set with a YOLOv4-tiny network to obtain a pre-training model; acquiring a surveillance video of special vehicle operation, capturing image information from the surveillance video to form a data set, and dividing the data set into a training set and a verification set; forming a training model, and importing the training set into the pre-training model for training and fine-tuning to obtain a human body detection model; loading the human body target detection model of workers and a target tracking model, generating an inference engine, and carrying out target identification and distance detection; screening the target identifications of workers by setting thresholds, customizing the human targets and their alarm distances, and judging whether an excessive approaching behavior exists; and when excessive approaching behavior is determined, identifying the specific human body and issuing a collision behavior early warning.

Description

Special vehicle operation collision early warning method and system based on YOLOv4
Technical Field
The invention relates to the technical field of computer vision, in particular to a special vehicle operation collision early warning method and system based on YOLOv4.
Background
Every year a great number of traffic accidents are caused by vehicle collisions and speeding, especially of special vehicles such as forklifts, trucks, excavators, cranes and transport vehicles inside mines. The vehicle bodies are huge and the blind areas are large, so people and small vehicles approaching the vehicle body are easily overlooked by the driver, while existing collision early warning systems suffer from high price, inability to identify human bodies and short detection distance. Therefore, aiming at the many potential safety hazards of special vehicles, a collision early warning system that is low in cost, low in energy consumption, long in detection distance, high in accuracy and free of visual dead angles urgently needs to be developed. The advent and development of intelligent video surveillance technology has become an effective means of solving this problem: useful information is automatically extracted from massive data, the content of the surveillance video is automatically analyzed and processed, and the targets in the surveillance video are automatically detected.
Disclosure of Invention
The object of the present invention is to solve at least one of the technical drawbacks mentioned above.
Therefore, the invention aims to provide a special vehicle operation collision early warning method and system based on YOLOv4, so as to solve the problems mentioned in the background section and overcome the defects of the prior art.
In order to achieve the above object, an embodiment of the present invention provides a special vehicle operation collision warning method based on YOLOv4, including the following steps:
step S1, training on a public data set by adopting a YOLOv4-tiny network to obtain a pre-training model;
step S2, acquiring a surveillance video of special vehicle operation, intercepting image information from the surveillance video to form a data set, and dividing the data set into a training set and a verification set;
step S3, based on the YOLOv4-tiny network, using CIoU_Loss as the YOLOv4-tiny network loss function of the target detection task to form a training model, and importing the training set into the pre-training model for training and fine-tuning to obtain a human body detection model;
step S4, acquiring the human body detection model, realizing task parallelism through a plurality of unified computing device architecture CUDA streams, deploying an inference engine on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, completing the whole calculation process through GPU and CPU heterogeneous parallelism, and loading the human body target detection model of workers and a target tracking model by calling TensorRT to generate an inference engine for target identification and distance detection;
step S5, screening the target identifications of workers by setting thresholds, customizing the human targets and their alarm distances, and judging whether an excessive approaching behavior exists; and when excessive approaching behavior is determined, identifying the specific human body and issuing a collision behavior early warning.
Preferably, in step S2, image information is obtained from the collected video surveillance data, YOLO-format labeling of the human body target is completed by using the YOLO-Mark image labeling tool, and the data are expanded by data augmentation to form a data set, which is divided into a training set and a verification set.
Preferably, according to any of the above schemes, the YOLOv4-tiny network loss function formula is:
Loss_CIoU = 1 - IoU(A, B) + ρ²(A, B) / c² + a · v
wherein, a is a weight function,
a = v / ((1 - IoU) + v)
v is a parameter that measures the uniformity of the aspect ratio,
v = (4 / π²) · (arctan(w_B / h_B) - arctan(w_A / h_A))²
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax - xmin) / width, h = (ymax - ymin) / height
wherein width and height are the original width and height of the picture, (xmin, ymin) and (xmax, ymax) are the upper-left-corner and lower-right-corner position information of the original sample bounding box, respectively, and (x, y), (w, h) are the center point coordinates and the width and height after target normalization, respectively;
a and B represent the center point positions of the predicted frame and the actual frame, respectively, ρ (A, B) is the Euclidean distance of the center point coordinates of the A frame and the B frame, and c is the diagonal distance of the minimum frame enclosing them.
In any of the above embodiments, preferably, in step S3, the human body detection model outputs the coordinates of the rectangular region in which a human body is detected.
Preferably, in any of the above schemes, the target-to-vehicle distance is calculated from the recognized coordinates of the rectangular region of the human body based on YOLOv4-tiny and Deep_Sort.
Preferably, in step S5, a human target is confirmed when the similarity is greater than or equal to a preset similarity threshold, and sound and image warnings are issued when the distance to the human target falls below the respective preset distance thresholds.
The embodiment of the invention also provides a special vehicle operation collision early warning system based on YOLOv4, which comprises: an image acquisition module, a data exchange module, an intelligent analysis module and a display and alarm module, wherein,
the image acquisition module is used for acquiring a monitoring video of special vehicle operation, intercepting image information from the monitoring video and sending the image information to the intelligent analysis module through the data exchange module;
the data exchange module is used for realizing data and instruction communication transmission among the image acquisition module, the intelligent analysis module and the display and alarm module;
the intelligent analysis module is used for training on a public data set with a YOLOv4-tiny network to obtain a pre-training model, forming a data set from the image information provided by the image acquisition module, dividing the data set into a training set and a verification set, forming a training model by taking CIoU_Loss as the YOLOv4-tiny network loss function of the target detection task, and importing the training set into the pre-training model for training and fine-tuning to obtain a human body detection model; it realizes task parallelism through a plurality of unified computing device architecture CUDA streams, deploys the inference engine on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, completes the whole computing process through GPU and CPU heterogeneous parallelism, and loads the human body target detection model of workers and a target tracking model by calling TensorRT to generate the inference engine for target identification and distance detection; it screens the target identifications of workers by setting thresholds, customizes the human targets and their alarm distances, and judges whether an excessive approaching behavior exists;
and the display and alarm module is used for identifying a specific human body and carrying out collision behavior early warning when the intelligent analysis module determines that excessive approaching behavior exists.
Preferably, in any of the above schemes, the image acquisition module adopts industrial cameras or other cameras supporting the RTSP format, is composed of four cameras mounted on the special vehicle and facing different directions, and is configured to acquire monitoring information around the special vehicle during operation and upload the images to the intelligent analysis module through the data exchange module.
Preferably, in any of the above schemes, the intelligent analysis module uses a GPU development board Jetson nano, which loads the human body detection model fine-tuned based on YOLOv4-tiny, the Deep_Sort target tracking model and the like through TensorRT to generate an inference engine, and obtains correct human body detection information from the image information input by the image acquisition module; the Jetson nano sends the detection results to a Raspberry Pi development board, which estimates the distance from each human figure to the camera; if the human body target distance is below the threshold, a signal is sent to the display and alarm module to generate a light alarm and a sound alarm; if the human target distance is above the threshold, no signal is sent.
Preferably, in any one of the above schemes, the display and alarm module is configured such that, when the intelligent analysis module finds that the distance of a human body target is smaller than the threshold, it sends a signal to the display and alarm module, the human body target closer than the threshold is highlighted on the screen, and the buzzer sounds an alarm to inform the driver that a potential danger exists.
According to the special vehicle operation collision early warning method and system based on the YOLOv4, the used model selects the YOLOv4-tiny network with a good detection effect, and a network structure is improved by using a transfer learning method aiming at the problem of a small number of samples, so that the detection effect of the special vehicle operation collision early warning method and system in a complex environment is improved. And the TensorRT is used for realizing model optimization and deployment during model deployment, so that the model reasoning speed is accelerated. Meanwhile, a programming model based on CUDA (unified computing device architecture) is used in software design, multi-task parallel computing is achieved, and hardware computing resources are fully utilized.
Compared with the prior art, the invention has the advantages and beneficial effects that:
(1) The accuracy is greatly improved by adopting a deep learning algorithm. Products on the market usually adopt traditional image processing algorithms, which are low in accuracy, poor in robustness and easily influenced by the environment.
(2) The model can accurately detect up to 50 human-shaped targets within 200 meters, and its mean average precision (mAP) reaches over 90% while the intersection over union (IoU) reaches over 85%.
(3) Parallel processing of multiple video streams and models is achieved using TensorRT and CUDA. After optimization, four to eight video streams can be processed on one Jetson nano development board at about 10 fps per stream, reducing the operation risk.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a special vehicle operation collision warning method based on YOLOv4 according to an embodiment of the invention;
FIG. 2 is a graph of a comparison of pre-trained model accuracy according to an embodiment of the present invention;
FIGS. 3a and 3b are video image data for monitoring a special vehicle operation site according to an embodiment of the invention;
FIG. 4 is a diagram of cfg parameters for model tuning according to an embodiment of the invention;
FIG. 5 is a diagram of a Yolov4-tiny network structure training model according to an embodiment of the present invention;
FIG. 6 is a graph of a modeled mAP-Loss curve according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the alarm system configuration according to an embodiment of the present invention;
fig. 8 is a structural diagram of a special vehicle operation collision warning system based on YOLOv4 according to an embodiment of the present invention;
fig. 9 is a diagram of a job scene detection result according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The embodiment of the invention provides a special vehicle operation collision early warning method and system based on YOLOv4, belongs to the technical field of computer vision, and mainly relates to deep learning target detection. The invention mainly involves pre-training a model, preparing surveillance video data and picture information, training and fine-tuning the pre-training model to obtain the improved model, and deploying it on hardware. The software environment is the Darknet deep learning framework with the YOLOv4-tiny network, and the operating system is Windows 10.
The invention provides a special vehicle operation collision early warning method based on YOLOv4-tiny, the human body identification and distance detection method is an effective technical means of collision early warning, can provide powerful support and guarantee for collision detection and crisis prevention in special vehicle operation, and solves a series of technical problems of special vehicle collision early warning.
As shown in fig. 1, the special vehicle operation collision early warning method based on YOLOv4 in the embodiment of the present invention includes the following steps:
and step S1, training on the public data set by adopting a YOLOv4-tiny network to obtain a pre-training model.
Firstly, pre-training a model by using an MS COCO data set through a YOLOv4-tiny framework to obtain a pre-training model, wherein the precision of the pre-training model is shown in FIG. 2.
And step S2, acquiring the monitoring video of the special vehicle operation, intercepting image information from the monitoring video to form a data set, and dividing the data set into a training set and a verification set.
Referring to fig. 3a and 3b, which show images of different scenes, the invention uses the vehicle-mounted cameras to acquire video monitoring data around the vehicle from the surveillance video of actual special vehicle operation. Fig. 9 is a diagram of a job scene detection result according to an embodiment of the present invention. Image data are captured from the surveillance video data, manually labeled with the YOLO-Mark software, and normalized to form a data set. The data set is then divided into a training set and a test set by a Python program. In other words, image information is acquired from the collected video monitoring data, YOLO-format labeling of the human body targets is completed with the YOLO-Mark image labeling tool, and the data are expanded by data augmentation to form a data set, which is divided into a training set and a verification set.
In the step, aiming at the category of people frequently appearing in the monitoring video around the actual special vehicle, images of different scenes are selected, manually labeled, and data are expanded in a data augmentation mode to form a data set, and the data set is divided into a training set and a verification set.
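For illustration only, the division of the captured frames into the two list files referenced later (train_personv1.txt and val_personv1.txt) can be sketched in Python roughly as follows; the 80/20 split ratio and the image directory data/obj are assumptions, not part of the recorded embodiment:

import glob
import random

random.seed(0)
frames = sorted(glob.glob("data/obj/*.jpg"))  # frames captured from the surveillance video
random.shuffle(frames)

split = int(0.8 * len(frames))                # assumed 80/20 train/validation split
with open("data/train_personv1.txt", "w") as f:
    f.write("\n".join(frames[:split]))
with open("data/val_personv1.txt", "w") as f:
    f.write("\n".join(frames[split:]))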
Step S3, based on the YOLOv4-tiny network, CIoU_Loss is used as the YOLOv4-tiny network loss function of the target detection task to form a training model, and the training set is imported into the pre-training model for training and fine-tuning to obtain the human body detection model.
In the embodiment of the invention, the human body target detection model of workers detects the coordinates of the rectangular region of the human body. Fine-tuning training and verification are carried out on the acquired data set on the basis of the YOLOv4-tiny pre-training model to obtain a human-shape detection model for the surroundings of the vehicle body.
In the embodiment of the present invention, the YOLOv4-tiny network loss function formula is:
Loss_CIoU = 1 - IoU(A, B) + ρ²(A, B) / c² + a · v
wherein, a is a weight function,
a = v / ((1 - IoU) + v)
v is a parameter that measures the uniformity of the aspect ratio,
v = (4 / π²) · (arctan(w_B / h_B) - arctan(w_A / h_A))²
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax - xmin) / width, h = (ymax - ymin) / height
wherein width and height are the original width and height of the picture, (xmin, ymin) and (xmax, ymax) are the upper-left-corner and lower-right-corner position information of the original sample bounding box, respectively, and (x, y), (w, h) are the center point coordinates and the width and height after target normalization, respectively.
A and B represent the center point positions of the predicted frame and the actual frame, respectively, ρ (A, B) is the Euclidean distance of the center point coordinates of the A frame and the B frame, and c is the diagonal distance of the minimum frame enclosing them.
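As a minimal illustration of the loss terms defined above (the standard CIoU formulation; the function name and the (center x, center y, width, height) box convention are assumptions made for this sketch), the computation can be written in Python as:

import math

def ciou_loss(box_a, box_b):
    # boxes are (center x, center y, width, height) in normalized coordinates;
    # box_a is the predicted frame A, box_b the actual frame B
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ax1, ay1, ax2, ay2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / union if union > 0 else 0.0
    # squared center distance rho^2(A, B) and diagonal c^2 of the smallest enclosing box
    rho2 = (ax - bx) ** 2 + (ay - by) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # aspect-ratio consistency term v and weight a
    v = (4.0 / math.pi ** 2) * (math.atan(bw / bh) - math.atan(aw / ah)) ** 2
    a = v / ((1.0 - iou) + v + 1e-9)
    return 1.0 - iou + rho2 / c2 + a * v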
The method comprises the following training steps:
(1) From the surveillance video data, 5,548 images were captured, forming a training set of 4,439 images and a test set of 1,109 images.
(2) On the basis of the COCO data set pre-training model, the training set from step (1) is used for training on a single GeForce RTX 2080 Super GPU with 8 GB of video memory to obtain the final model. In this step, the network parameters are set as follows: the initial learning rate is 0.0013, the momentum is 0.949, the weight decay term is 0.0005, training uses the stochastic gradient descent algorithm, and the final model is obtained after 15000 iterations.
The following describes the training of the human detection model:
the method comprises the following steps: modifying a configuration file
(1) Create a new obj.names file under the darknet-master\build\darknet\x64\data folder; the file name can be modified according to the project, e.g. renamed to: personv1.names. Its content is changed to the classes to be detected, e.g.: person.
(2) Create a new obj.data file under the darknet-master\build\darknet\x64\data folder; the file name can likewise be modified according to the project, e.g. renamed to: personv1.data. The following content is entered into it: classes = 1
train=data/train_personv1.txt
valid=data/val_personv1.txt
names=data/personv1.names
backup=backup/
classes is the number of categories to be detected, train is the training set list file, valid is the test set list file, names is the location of the class names file, and backup is the location where the output model files are stored.
(3) Compute anchors: the anchors can be computed quickly by typing the command ./darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416 in Windows PowerShell (an illustrative clustering sketch is given after this list). The anchors used in the present invention are (6,20), (11,39), (16,60), (27,94), (46,139), (182,396).
(4) Copy the file darknet-master\build\darknet\x64\cfg\yolov4-tiny.cfg to the directory darknet-master\build\darknet\x64, rename it Yolov4-tiny-personv1.cfg, and, referring to FIG. 4, set the parameters as follows:
batch=64
subdivisions=16
momentum=0.949
learning_rate=0.0013
max_batches=15000
steps=12000,13500
The anchors and the convolutional layer immediately preceding each of the three yolo layers are modified at the same time, wherein the modified mask is 0,1,2, the anchors are 6,20, 11,39, 16,60, 27,94, 46,139, 182,396, and classes is 1 in each yolo layer. filters in the convolutional layer before each of the three yolo layers is set to 18.
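As referenced in step (3) above, the anchor sizes come from clustering the labelled box widths and heights. A simplified sketch of such clustering is given below; it uses plain k-means from scikit-learn on (w, h) pairs scaled to the 416x416 network input, whereas darknet's calc_anchors uses an IoU-based distance, so the resulting numbers will differ slightly; all names here are illustrative:

import numpy as np
from sklearn.cluster import KMeans

def estimate_anchors(box_wh, num_clusters=6):
    # box_wh: N x 2 array of labelled box widths and heights in pixels at 416 x 416
    # (YOLOv4-tiny uses six anchor pairs, as in the list above)
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(box_wh)
    centers = km.cluster_centers_
    order = np.argsort(centers.prod(axis=1))   # sort anchors from small to large
    return np.round(centers[order]).astype(int)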
Step two: picture normalization
Reading the marked data, carrying out normalization processing on the marked information, and expanding a data set in a data augmentation mode; wherein, the normalization formula is:
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax - xmin) / width, h = (ymax - ymin) / height
wherein width and height are the original width and height of the picture, (xmin, ymin) and (xmax, ymax) are the upper-left-corner and lower-right-corner position information of the original sample bounding box, respectively, and (x, y), (w, h) are the center point coordinates and the width and height after target normalization, respectively.
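A direct Python transcription of this normalization (the function name is chosen here only for illustration) is:

def normalize_box(xmin, ymin, xmax, ymax, width, height):
    # converts a pixel-coordinate box into YOLO-format (x, y, w, h) as defined above
    x = (xmin + xmax) / (2.0 * width)
    y = (ymin + ymax) / (2.0 * height)
    w = (xmax - xmin) / float(width)
    h = (ymax - ymin) / float(height)
    return x, y, w, h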
Step three: training of models
The training of the model is based on the YOLOv4-tiny algorithm and mainly uses CIoU_Loss as the bounding box regression loss function to form the training model; the main structure of the YOLOv4-tiny network into which the data set is imported for training and testing is shown in FIG. 5. The loss function is formulated as:
Loss_CIoU = 1 - IoU(A, B) + ρ²(A, B) / c² + a · v
wherein (width, height) are the original width and height of the picture, (xmin, ymin) and (xmax, ymax) are the upper-left-corner and lower-right-corner position information of the original sample bounding box, respectively, and (x, y), (w, h) are the center point coordinates and the width and height after target normalization, respectively.
The calculation formulas of a and v in the loss function are respectively:
a = v / ((1 - IoU) + v)
v = (4 / π²) · (arctan(w_B / h_B) - arctan(w_A / h_A))²
in the invention, a Darknet deep learning frame is adopted to train the model, the system environment is windows10, the GPU is RTX2080 super video memory 8G, and the software environments are CUDA10.1, cuDNN7.6.5, opencv4.1 and Python 3.9. Wherein the training settings are as follows: the training sample number batch is 64 and the batch subdivisions is 16 for each iteration, so that the picture iteration of the training input comprises 4 groups, each group comprising 16 pictures. The impulse momentum is set to be 0.949, the learning rate is 0.0013, and the maximum iteration number is set to be 15000; the learning rate decay is set to be.1 when the number of iterations is 12000 and 13500, respectively.
Training can be started by entering ./darknet.exe detector train data/personv1.data Yolov4-tiny-personv1.cfg Yolov4-tiny.conv.137 -map in Windows PowerShell. In the training process, the Average Precision (AP) and the mean Average Precision (mAP) are generally used as evaluation indexes of the target detection algorithm; these two indexes take both Precision (P) and Recall (R) into account.
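For a single class such as person, AP is the area under the precision-recall curve and mAP is the AP averaged over classes (with only the person class here, mAP equals the AP). A sketch of the two underlying quantities:

def precision_recall(true_positives, false_positives, false_negatives):
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall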
Since the YOLOv4-tiny framework supports loss and mAP visualization, the system automatically reads the precision and loss value of each iteration from the training log and plots the mAP line graph and the loss curve. The training process of the model is shown in fig. 6; it can be seen that the training time of the final model is 10 hours, the mAP of the model is 91%, and the loss is 0.6.
Step S4, after the human body detection model is obtained, task parallelism is achieved through a plurality of unified computing device architecture CUDA streams, the inference engine is deployed on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, the GPU and CPU complete the whole computing process in heterogeneous parallel, and the human body target detection model of workers and the target tracking model are loaded by calling TensorRT to generate an inference engine for target identification and distance detection.
Specifically, task parallelism is realized by invoking a plurality of CUDA (unified computing device architecture) streams, and the model is deployed in a vehicle-mounted alarm system built around the Jetson nano; TensorRT is used for model optimization and runtime deployment. The fine-tuned YOLOv4-tiny human-shape detection model, the Deep_Sort target tracking model and the like are loaded to generate the inference engine, and the human-shape information and the distance information are acquired respectively.
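A condensed sketch of this deployment pattern is given below, assuming a serialized engine file (here called yolov4_tiny_person.engine) has already been built from the trained weights; it follows the binding-style TensorRT 7/8 Python API together with PyCUDA, with one execution context and one CUDA stream per camera so that transfers and inference for different video streams can overlap; the class, file and buffer names are illustrative only:

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and manages a CUDA context

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

class CameraWorker:
    # one execution context, one CUDA stream and one buffer set per camera stream
    def __init__(self, engine):
        self.context = engine.create_execution_context()
        self.stream = cuda.Stream()
        self.host, self.device, self.bindings = [], [], []
        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            h = cuda.pagelocked_empty(size, dtype)
            d = cuda.mem_alloc(h.nbytes)
            self.host.append(h)
            self.device.append(d)
            self.bindings.append(int(d))

    def infer(self, frame_chw):
        # frame_chw: preprocessed 416x416 frame as a flat float32 array
        np.copyto(self.host[0], frame_chw.ravel())
        cuda.memcpy_htod_async(self.device[0], self.host[0], self.stream)
        self.context.execute_async_v2(self.bindings, self.stream.handle)
        cuda.memcpy_dtoh_async(self.host[1], self.device[1], self.stream)
        self.stream.synchronize()
        return self.host[1]  # raw detections, decoded and passed to the tracker afterwards

engine = load_engine("yolov4_tiny_person.engine")
workers = [CameraWorker(engine) for _ in range(4)]  # one worker per on-board camera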
Step S5, the target identifications of workers are screened by setting thresholds, the human targets and their alarm distances are customized, and it is judged whether an excessive approaching behavior exists; when excessive approaching behavior is determined, the specific human body is identified and a collision behavior early warning is issued.
In this step, a human target is confirmed when the similarity is greater than or equal to a preset similarity threshold, and sound and image warnings are issued when the distance to the human target falls below the respective preset distance thresholds.
Preferably, a human target is confirmed when the similarity is greater than 75%, and sound and image warnings are issued when the distance to the human target is less than 15 m, 10 m and 5 m, respectively.
Based on the multi-model recognition results, a collision early warning study-and-judgment rule is user-defined, and person targets that approach too closely are recognized according to this rule; when an excessively close person is determined, a collision behavior early warning is issued. The method provided by the invention can identify the specific approaching individual when multiple persons are present, can monitor multiple angles simultaneously, and offers high model accuracy and a good collision early warning effect.
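A minimal sketch of such a study-and-judgment rule, using the 75% similarity threshold and the 15 m / 10 m / 5 m alarm distances mentioned above (function and variable names are illustrative), could look like this:

SIMILARITY_THRESHOLD = 0.75
ALARM_DISTANCES_M = (15.0, 10.0, 5.0)   # warning tiers from far to near

def warning_level(similarity, distance_m):
    # returns 0 for no warning, 1..3 for increasingly urgent sound/image warnings
    if similarity < SIMILARITY_THRESHOLD:
        return 0
    level = 0
    for tier, limit in enumerate(ALARM_DISTANCES_M, start=1):
        if distance_m < limit:
            level = tier
    return level

def check_frame(tracked_targets):
    # tracked_targets: list of (track_id, similarity, distance_m) from detector and tracker
    alarms = []
    for tid, sim, dist in tracked_targets:
        level = warning_level(sim, dist)
        if level > 0:
            alarms.append((tid, level))
    return alarms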
The embodiment of the invention also provides a special vehicle operation collision early warning system based on YOLOv4, which comprises: an image acquisition module, a data exchange module, an intelligent analysis module and a display and alarm module. The human body detection model trained by the invention is mainly applied in the intelligent analysis module: it is deployed into the embedded system through the task-parallel computing mode of a plurality of CUDA streams and the model optimization and deployment functions of TensorRT, and serves as the intelligent analysis module. The specific connections are shown in figs. 7 and 8.
Specifically, the image acquisition module is used for acquiring a monitoring video of special vehicle operation, intercepting image information from the monitoring video, and sending the image information to the intelligent analysis module through the data exchange module.
In the embodiment of the invention, the image acquisition module adopts industrial cameras or other cameras supporting the RTSP format, is composed of four cameras mounted on the special vehicle and facing different directions, and is used for acquiring monitoring information around the special vehicle during operation and uploading the images to the intelligent analysis module through the data exchange module.
The data exchange module is used for realizing data and instruction communication transmission among the image acquisition module, the intelligent analysis module and the display and alarm module.
In the embodiment of the invention, a common network switch is adopted to realize data and instruction communication among all the modules.
The intelligent analysis module is used for training on a public data set with a YOLOv4-tiny network to obtain a pre-training model, forming a data set from the image information provided by the image acquisition module, dividing the data set into a training set and a verification set, forming a training model by taking CIoU_Loss as the YOLOv4-tiny network loss function of the target detection task, and importing the training set into the pre-training model for training and fine-tuning to obtain the human body detection model; it realizes task parallelism through a plurality of unified computing device architecture CUDA streams, deploys the inference engine on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, completes the whole computing process through GPU and CPU heterogeneous parallelism, and loads the human body target detection model of workers and the target tracking model by calling TensorRT to generate the inference engine for target identification and distance detection. The target identifications of workers are screened by setting thresholds, the human targets and their alarm distances are customized, and whether excessive approaching behavior exists is judged.
The intelligent analysis module is built around a Jetson nano development board as a heterogeneous system that cooperatively processes the data streams; the GPU and CPU work in heterogeneous parallel to complete the whole computation.
The embedded GPU development board Jetson nano developed by Nvidia is used to deploy the deep learning model, and each development board can process 4-8 video streams. A Raspberry Pi 3B+ development board is adopted in place of a common embedded industrial personal computer, which enables more functions while guaranteeing computing power and further reducing cost and energy consumption.
In the deployment of the model, parallel computing is first realized by executing a plurality of CUDA streams in parallel, making full use of the hardware computing resources. TensorRT is then used for model optimization and deployment to accelerate model inference. TensorRT achieves high throughput, low latency and a low device memory footprint by combining high-level APIs that abstract away specific hardware details with optimized inference implementations. The intelligent analysis module mainly comprises two parts: in the first part, the GPU development board Jetson nano uses TensorRT to load the human body detection model fine-tuned based on YOLOv4-tiny, the Deep_Sort target tracking model and the like to generate the inference engine, and obtains correct human body detection information from the image information input by the image acquisition module. In the second part, the Jetson nano sends the detection results to a Raspberry Pi development board, which estimates the distance from each human figure to the camera. If the human body target distance is below the threshold, a signal is sent to the display and alarm module to generate a light alarm and a sound alarm; if the human target distance is above the threshold, no signal is sent.
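The embodiment does not spell out how the Raspberry Pi estimates the distance of each detected figure from the camera; purely as an assumed illustration, a common monocular approximation uses the pinhole model with an assumed average person height and the camera focal length in pixels:

ASSUMED_PERSON_HEIGHT_M = 1.7   # assumption for illustration only
FOCAL_LENGTH_PX = 1000.0        # camera-specific value, obtained from calibration

def estimate_distance_m(bbox_height_px):
    # rough monocular distance from the pixel height of a person bounding box
    if bbox_height_px <= 0:
        return float("inf")
    return ASSUMED_PERSON_HEIGHT_M * FOCAL_LENGTH_PX / bbox_height_px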
And the display and alarm module is used for identifying a specific human body and carrying out collision behavior early warning when the intelligent analysis module determines that excessive approaching behavior exists.
In the embodiment of the invention, the display and alarm module mainly comprises a screen and a buzzer. When the intelligent analysis module finds that the distance of a human body target is below the threshold, it sends a signal to the display and alarm module: the human body target closer than the threshold is highlighted on the screen, and the buzzer sounds an alarm to inform the driver that a potential danger exists.
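By way of illustration, the highlight-and-buzz reaction on the display and alarm side might be sketched as follows; the GPIO pin number and the drawing details are assumptions, not part of the recorded embodiment:

import cv2
import RPi.GPIO as GPIO

BUZZER_PIN = 18                         # assumed wiring of the buzzer
GPIO.setmode(GPIO.BCM)
GPIO.setup(BUZZER_PIN, GPIO.OUT)

def show_alarm(frame, box, distance_m):
    # box: (x1, y1, x2, y2) of the human target closer than the threshold
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 3)        # highlight in red
    cv2.putText(frame, "%.1f m" % distance_m, (x1, max(0, y1 - 10)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
    GPIO.output(BUZZER_PIN, GPIO.HIGH)                              # sound the buzzer
    return frame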
According to the special vehicle operation collision early warning method and system based on the YOLOv4, the used model selects the YOLOv4-tiny network with a good detection effect, and a network structure is improved by using a transfer learning method aiming at the problem of a small number of samples, so that the detection effect of the special vehicle operation collision early warning method and system in a complex environment is improved. And the TensorRT is used for realizing model optimization and deployment during model deployment, so that the model reasoning speed is accelerated. Meanwhile, a programming model based on CUDA (unified computing device architecture) is used in software design, multi-task parallel computing is achieved, and hardware computing resources are fully utilized.
Compared with the prior art, the invention has the advantages and beneficial effects that:
(1) The accuracy is greatly improved by adopting a deep learning algorithm. Products on the market usually adopt traditional image processing algorithms, which are low in accuracy, poor in robustness and easily influenced by the environment.
(2) The model can accurately detect up to 50 human-shaped targets within 200 meters, and its mean average precision (mAP) reaches over 90% while the intersection over union (IoU) reaches over 85%.
(3) Parallel processing of multiple video streams and models is achieved using TensorRT and CUDA. After optimization, four to eight video streams can be processed on one Jetson nano development board at about 10 fps per stream, reducing the operation risk.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It will be understood by those skilled in the art that the present invention includes any combination of the summary and detailed description of the invention described above and those illustrated in the accompanying drawings, which is not intended to be limited to the details and which, for the sake of brevity of this description, does not describe every aspect which may be formed by such combination. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A special vehicle operation collision early warning method based on YOLOv4 is characterized by comprising the following steps:
step S1, training on a public data set by adopting a YOLOv4-tiny network to obtain a pre-training model;
step S2, acquiring a surveillance video of special vehicle operation, intercepting image information from the surveillance video to form a data set, and dividing the data set into a training set and a verification set;
step S3, based on the YOLOv4-tiny network, using CIoU_Loss as the YOLOv4-tiny network loss function of the target detection task to form a training model, and importing the training set into the pre-training model for training and fine-tuning to obtain a human body detection model;
step S4, acquiring the human body detection model, realizing task parallelism through a plurality of unified computing device architecture CUDA streams, deploying an inference engine on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, completing the whole calculation process through GPU and CPU heterogeneous parallelism, and loading the human body target detection model of workers and a target tracking model by calling TensorRT to generate an inference engine for target identification and distance detection;
step S5, screening the target identifications of workers by setting thresholds, customizing the human targets and their alarm distances, and judging whether an excessive approaching behavior exists; and when excessive approaching behavior is determined, identifying the specific human body and issuing a collision behavior early warning.
2. The YOLOv4-based special vehicle operation collision warning method as claimed in claim 1, wherein in step S2, image information is obtained from the collected video surveillance data, YOLO-format labeling of the human body target is performed by using the YOLO-Mark image labeling tool, and the data are expanded by data augmentation to form a data set, which is divided into a training set and a verification set.
3. The YOLOv4-based special vehicle operation collision warning method as claimed in claim 1, wherein the YOLOv4-tiny network loss function formula is:
Loss_CIoU = 1 - IoU(A, B) + ρ²(A, B) / c² + a · v
wherein, a is a weight function,
a = v / ((1 - IoU) + v)
v is a parameter that measures the uniformity of the aspect ratio,
v = (4 / π²) · (arctan(w_B / h_B) - arctan(w_A / h_A))²
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax - xmin) / width, h = (ymax - ymin) / height
wherein width and height are the original width and height of the picture, (xmin, ymin) and (xmax, ymax) are the upper-left-corner and lower-right-corner position information of the original sample bounding box, respectively, and (x, y), (w, h) are the center point coordinates and the width and height after target normalization, respectively;
a and B represent the center point positions of the predicted frame and the actual frame, respectively, ρ (A, B) is the Euclidean distance of the center point coordinates of the A frame and the B frame, and c is the diagonal distance of the minimum frame enclosing them.
4. The YOLOv4-based special vehicle operation collision warning method as claimed in claim 1, wherein in step S3, the human body detection model outputs the coordinates of the rectangular region in which a human body is detected.
5. The YOLOv4-based special vehicle operation collision warning method as claimed in claim 4, wherein the target-to-vehicle distance is calculated from the recognized coordinates of the rectangular region of the human body based on YOLOv4-tiny and Deep_Sort.
6. The YOLOv4-based special vehicle operation collision warning method as claimed in claim 1, wherein in step S5, a human target is confirmed when the similarity is greater than or equal to a preset similarity threshold, and sound and image warnings are issued when the distance to the human target falls below the respective preset distance thresholds.
7. A special vehicle operation collision early warning system based on YOLOv4 is characterized by comprising: an image acquisition module, a data exchange module, an intelligent analysis module and a display and alarm module, wherein,
the image acquisition module is used for acquiring a monitoring video of special vehicle operation, intercepting image information from the monitoring video and sending the image information to the intelligent analysis module through the data exchange module;
the data exchange module is used for realizing data and instruction communication transmission among the image acquisition module, the intelligent analysis module and the display and alarm module;
the intelligent analysis module is used for training on a public data set with a YOLOv4-tiny network to obtain a pre-training model, forming a data set from the image information provided by the image acquisition module, dividing the data set into a training set and a verification set, forming a training model by taking CIoU_Loss as the YOLOv4-tiny network loss function of the target detection task, and importing the training set into the pre-training model for training and fine-tuning to obtain a human body detection model; realizing task parallelism through a plurality of unified computing device architecture CUDA streams, deploying the inference engine on a Jetson nano development board to form a heterogeneous system cooperatively processing the data streams, completing the whole computing process through GPU and CPU heterogeneous parallelism, and loading the human body target detection model of workers and a target tracking model by calling TensorRT to generate the inference engine for target identification and distance detection; and screening the target identifications of workers by setting thresholds, customizing the human targets and their alarm distances, and judging whether an excessive approaching behavior exists;
and the display and alarm module is used for identifying a specific human body and carrying out collision behavior early warning when the intelligent analysis module determines that excessive approaching behavior exists.
8. The YOLOv4-based special vehicle operation collision warning system as claimed in claim 7, wherein the image acquisition module adopts industrial cameras or other cameras supporting the RTSP format, is composed of four cameras mounted on the special vehicle and facing different directions, and is used for acquiring monitoring information around the special vehicle during operation and uploading the images to the intelligent analysis module through the data exchange module.
9. The YOLOv4-based special vehicle operation collision early warning system as claimed in claim 7, wherein the intelligent analysis module employs a GPU development board Jetson nano, which uses TensorRT to load the human body detection model fine-tuned based on YOLOv4-tiny, the Deep_Sort target tracking model and the like to generate an inference engine, and obtains correct human body detection information from the image information input by the image acquisition module; the Jetson nano sends the detection results to a Raspberry Pi development board, which estimates the distance from each human figure to the camera; if the human body target distance is below the threshold, a signal is sent to the display and alarm module to generate a light alarm and a sound alarm; if the human target distance is above the threshold, no signal is sent.
10. The YOLOv4-based special vehicle operation collision warning system of claim 7, wherein the display and warning module is configured such that, when the intelligent analysis module finds that the distance of a human body target is smaller than the threshold, it sends a signal to the display and warning module, the human body target closer than the threshold is highlighted on the screen, and the buzzer sounds an alarm to inform the driver that a potential danger exists.
CN202111598811.2A 2021-12-24 2021-12-24 Special vehicle operation collision early warning method and system based on YOLOv4 Pending CN114419565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111598811.2A CN114419565A (en) 2021-12-24 2021-12-24 Special vehicle operation collision early warning method and system based on YOLOv4

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111598811.2A CN114419565A (en) 2021-12-24 2021-12-24 Special vehicle operation collision early warning method and system based on YOLOv4

Publications (1)

Publication Number Publication Date
CN114419565A true CN114419565A (en) 2022-04-29

Family

ID=81269483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111598811.2A Pending CN114419565A (en) 2021-12-24 2021-12-24 Special vehicle operation collision early warning method and system based on YOLOv4

Country Status (1)

Country Link
CN (1) CN114419565A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943693A (en) * 2022-05-09 2022-08-26 盐城工学院 Jetson Nano bridge crack detection method and system
CN116189098A (en) * 2023-04-23 2023-05-30 四川弘和通讯集团有限公司 Method and device for identifying whether engine cover of vehicle is opened or not


Similar Documents

Publication Publication Date Title
CN108038424B (en) Visual automatic detection method suitable for high-altitude operation
CN109871799B (en) Method for detecting mobile phone playing behavior of driver based on deep learning
CN112818768B (en) Transformer substation reconstruction and extension violation behavior intelligent identification method based on meta-learning
CN114419565A (en) Special vehicle operation collision early warning method and system based on YOLOv4
CN102855500A (en) Haar and HoG characteristic based preceding car detection method
CN109506628A (en) Object distance measuring method under a kind of truck environment based on deep learning
CN109672863A (en) A kind of construction personnel's safety equipment intelligent monitoring method based on image recognition
CN101751744A (en) Detection and early warning method of smoke
EP3696725A1 (en) Tool detection method and device
CN108805016A (en) A kind of head and shoulder method for detecting area and device
CN109190488A (en) Front truck car door opening detection method and device based on deep learning YOLOv3 algorithm
CN112184773A (en) Helmet wearing detection method and system based on deep learning
CN112613454A (en) Electric power infrastructure construction site violation identification method and system
CN116311078A (en) Forest fire analysis and monitoring method and system
CN116846059A (en) Edge detection system for power grid inspection and monitoring
WO2023104557A1 (en) Machine-learning for safety rule violation determination
CN116092115A (en) Real-time lightweight construction personnel safety dressing detection method
CN117523437A (en) Real-time risk identification method for substation near-electricity operation site
CN104331708B (en) A kind of zebra crossing automatic detection analysis method and system
CN116863271A (en) Lightweight infrared flame detection method based on improved YOLO V5
CN109961073A (en) The acquisition methods and device of a kind of transmission line of electricity and shaft tower information
Li et al. An identification method of dangerous driving behavior in rush hour based on apriori algorithm.
CN109977844A (en) One kind giving precedence to pedestrian's control system and control method based on image recognition motor vehicle
CN115578455A (en) Method for positioning reserved hole in concrete structure room
CN116052035A (en) Power plant personnel perimeter intrusion detection method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination