CN117612074A - Logistics safe driving detection method and device based on lightweight improved YOLOv5 - Google Patents

Logistics safe driving detection method and device based on lightweight improved YOLOv5

Info

Publication number
CN117612074A
CN117612074A (application CN202310216193.3A)
Authority
CN
China
Prior art keywords
logistics
module
yolov5
target detection
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216193.3A
Other languages
Chinese (zh)
Inventor
赵集民
张雪白
Current Assignee
Xiamen Borui Intelligent Manufacturing Iot Technology Co ltd
Original Assignee
Xiamen Borui Intelligent Manufacturing Iot Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Borui Intelligent Manufacturing Iot Technology Co ltd filed Critical Xiamen Borui Intelligent Manufacturing Iot Technology Co ltd
Priority to CN202310216193.3A priority Critical patent/CN117612074A/en
Publication of CN117612074A publication Critical patent/CN117612074A/en
Pending legal-status Critical Current


Classifications

    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V10/20 Image preprocessing
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06V2201/07 Target detection
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a logistics safe driving detection method and device based on lightweight improved YOLOv5. A logistics image dataset is collected and labeled to obtain training data. A YOLO-Log model is constructed by improving YOLOv5; it comprises an input module, a feature extraction module, a feature fusion module and a prediction module. The input module performs data enhancement on the input logistics image; in the feature extraction module, a Ghost Bottleneck structure replaces the Bottleneck structure of the C3 module in YOLOv5; in the feature fusion module, a multi-scale feature fusion BiFPN structure replaces the PANet structure in YOLOv5. The YOLO-Log model is trained on the training data to obtain a logistics target detection model. A logistics image is then acquired and fed into the logistics target detection model to obtain a target detection result, and a safety pre-warning is issued according to that result, so as to solve the occlusion problem in complex logistics scenes and the missed detection of distant small targets.

Description

Logistics safe driving detection method and device based on lightweight improved YOLOv5
Technical Field
The invention relates to the field of target detection, and in particular to a logistics safe driving detection method and device based on lightweight improved YOLOv5.
Background
Safety pre-warning has received considerable attention in production and daily life, especially in the logistics field: because logistics operating environments are complex, forklift accidents occur frequently, endangering personal safety and causing great harm. The main causes are as follows: the forklift driver often cannot see pedestrians ahead because the cargo is stacked too high, and the tall forklift body creates a rear blind spot; drivers may also be fatigued, inattentive, or operate in a dangerous, rule-violating manner. Because of these safety hazards, forklifts need safety pre-warning during operation. In the traditional safety pre-warning method, a base station and an audible-visual alarm are installed on the forklift and each operator wears a tag; when the distance between an operator and the forklift enters the pre-warning zone, the alarm on the forklift emits a warning signal. However, this approach cannot warn about pedestrians who wear no tag, and its 360-degree ranging lacks accuracy, producing false alarms for pedestrians far to the side. With the development of deep learning, machine-vision-based safety pre-warning has become an important application of target detection algorithms in complex logistics scenes: the forklift carries a low-cost depth camera, and the position and distance of targets are predicted in real time from RGB-D images. Machine-vision-based safety pre-warning can effectively overcome the shortcomings of the traditional method and provide assisted driving for forklifts.
There are two main families of target detection methods: one-stage and two-stage. Classical two-stage methods include R-CNN and Fast R-CNN; they are based on region proposal generation and are characterized by higher detection accuracy but slower detection speed. One-stage methods, such as the YOLO series, SSD and RetinaNet, skip region proposals and directly perform object classification and bounding-box regression. Because they generate class scores and box coordinates directly, one-stage detectors can meet real-time requirements, though their accuracy is lower than that of two-stage detectors. Although one-stage detectors perform excellently on devices with ample computing resources such as GPUs, they still suffer from high computational cost and reduced detection accuracy on edge and mobile devices with limited resources. Yet for safe driving in complex logistics scenes, real-time performance is paramount. On edge or mobile devices with insufficient computing power, YOLOv5 cannot meet the practical application scene because of the large computational cost of the C3 modules in its backbone, and ordinary convolutional neural networks (CNNs) are very difficult to run on embedded devices with constrained memory and compute.
Disclosure of Invention
To solve the technical problems mentioned in the Background section, embodiments of the present application provide a logistics safe driving detection method and device based on lightweight improved YOLOv5, together with a more effective feature fusion method and a lightweight residual structure, so that the target detection model achieves a good trade-off between accuracy and speed.
In a first aspect, the invention provides a logistics safe driving detection method based on lightweight improved YOLOv5, which comprises the following steps:
S1, collecting a logistics image dataset, and labeling it to obtain training data;
S2, constructing a YOLO-Log model based on an improved YOLOv5, wherein the YOLO-Log model comprises an input module, a feature extraction module, a feature fusion module and a prediction module; the input module performs data enhancement on the input logistics image, the feature extraction module is a backbone network obtained by replacing the Bottleneck structure in the C3 module of YOLOv5 with a Ghost Bottleneck structure, and the feature fusion module is a neck network obtained by replacing the PANet structure of YOLOv5 with a multi-scale feature fusion BiFPN structure;
S3, training the YOLO-Log model with the training data to obtain a logistics target detection model;
S4, acquiring a logistics image, inputting it into the logistics target detection model to obtain a target detection result, and issuing a safety pre-warning according to the target detection result.
Preferably, step S2 replaces the Bottleneck structure in the C3 module of YOLOv5 with a Ghost Bottleneck structure, specifically: the Bottleneck structure in the stride-1 C3 module of YOLOv5 is replaced by a Ghost Bottleneck structure with stride 1, and the Bottleneck structure in the stride-2 C3 module is replaced by a Ghost Bottleneck structure with stride 2.
Preferably, a learnable weight is added to each input of the multi-scale feature fusion BiFPN structure so that the network can learn the information carried by different feature layers; the fusion weights of the different inputs are adjusted, and feature layers of different resolutions are fused by weighted fusion.
Preferably, the multi-scale feature fusion BiFPN structure uses the following fusion formula:

O = Σᵢ ( wᵢ / (ε + Σⱼ wⱼ) ) · Iᵢ

where wᵢ is the weight of the i-th input, wⱼ is the weight of the j-th input, Iᵢ is the i-th input, O is the fused output feature, and ε is a small positive constant; a ReLU function guarantees that every wᵢ satisfies wᵢ ≥ 0.
Preferably, the prediction module uses CIOU Loss as a Loss function of the bounding box.
Preferably, the data enhancement includes HSV color gamut enhancement, mosaic, mirror flip, and horizontal flip.
Preferably, the target detection result includes the position and coordinates of the logistics vehicle and/or the person, and in step S4, the safety precaution is performed according to the target detection result, which specifically includes:
calculating the distance between the logistics vehicle and the person in the logistics image according to the target detection result, and converting the distance into the distance between the logistics vehicle and the person in the Cartesian coordinate system;
the grade of the safety early warning area is distinguished according to the distance between the logistics vehicle and the person in the Cartesian coordinate system;
and carrying out safety early warning according to the grade of the safety early warning area.
In a second aspect, the present invention provides a logistic safe driving detection device based on lightweight improved YOLOv5, comprising:
the data collection module is configured to collect and obtain a logistics image data set, and label the logistics image data set to obtain training data;
the model construction module is configured to construct a YOLO-Log model based on YOLOv5 improvement, the YOLO-Log model comprises an input module, a feature extraction module, a feature fusion module and a prediction module, the input module is used for carrying out data enhancement on an input logistics image, the feature extraction module is a backbone network obtained by adopting a Ghost Bottleneck structure to replace a Bottleneck structure in a C3 module in YOLOv5, and the feature fusion module is a neck network obtained by adopting a multi-scale feature fusion BiFPN structure to replace a PANet structure in YOLOv 5;
the model training module is configured to train the YOLO-Log model by training data to obtain a logistics target detection model;
the detection module is configured to acquire a logistics image, input the logistics image into a logistics target detection model to obtain a target detection result, and perform safety early warning according to the target detection result.
In a third aspect, the present invention provides an electronic device comprising one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the YOLO-Log model improved from YOLOv5, a Ghost Bottleneck structure replaces the Bottleneck structure inside the C3 module of the original YOLOv5, and a multi-scale feature fusion BiFPN module is introduced, so that the model can extract features at multiple scales and identify objects in the image more accurately.
(2) Tailored to the complex backgrounds and uneven lighting of logistics scenes, the dataset undergoes data enhancement such as HSV color gamut enhancement, Mosaic, mirror flipping and horizontal flipping. This lets the model detect better under the uneven lighting, cloudy and sunny conditions of real indoor and outdoor logistics environments, improves the dataset's adaptability to the environment, and gives the model better generalization in practical applications.
(3) In the YOLOv5-based improved model, changing the convolution scheme greatly reduces the number of model parameters, making the model lightweight while further improving target detection accuracy in complex logistics environments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary device architecture diagram to which an embodiment of the present application may be applied;
FIG. 2 is a schematic flow chart of a method for detecting safe driving of a logistics based on lightweight improved YOLOv5 according to an embodiment of the present application;
FIG. 3 is a sample view of the logistics image dataset of the logistics safe driving detection method based on lightweight improved YOLOv5 according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a YOLO-Log model of a logistics safe driving detection method based on lightweight improved YOLOv5 according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the Ghost Bottleneck structure of the logistics safe driving detection method based on lightweight improved YOLOv5 according to an embodiment of the present application;
fig. 6 is a schematic diagram of a neck network of a logistics safe driving detection method based on light-weight improved YOLOv5 according to an embodiment of the present application, wherein 6 (a) is a structure of PANet, and 6 (b) is a structure of BiFPN;
FIG. 7 is a schematic diagram of target detection results of the logistics safe driving detection method based on lightweight improved YOLOv5 according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a light-weight improved YOLOv 5-based logistic safe driving detection device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device suitable for use in implementing the electronic device of the embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 illustrates an exemplary device architecture 100 to which the lightweight modified YOLOv 5-based logistics safe driving detection method or the lightweight modified YOLOv 5-based logistics safe driving detection device of embodiments of the present application may be applied.
As shown in fig. 1, the apparatus architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various applications, such as a data processing class application, a file processing class application, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablets, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server providing various services, such as a background data processing server processing files or data uploaded by the terminal devices 101, 102, 103. The background data processing server can process the acquired file or data to generate a processing result.
It should be noted that, the method for detecting the safe driving of the logistics based on the light-weight improved YOLOv5 provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, 103, and accordingly, the device for detecting the safe driving of the logistics based on the light-weight improved YOLOv5 may be provided in the server 105, or may be provided in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.
Fig. 2 shows a logistic safe driving detection method based on light-weight improved YOLOv5 according to an embodiment of the present application, which includes the following steps:
s1, collecting and obtaining a logistics image data set, and labeling the logistics image data set to obtain training data.
Specifically, the lack of a public dataset for logistics target detection has hampered the development of this field. The embodiment of the application therefore collects and builds a logistics target detection dataset, Logistics-3k, as training data for the logistics target detection model. The dataset contains 3,342 logistics images covering 2 target categories: forklift (4,572 instances) and person (4,190 instances). The images were captured by an Intel RealSense D435i depth camera and mobile devices, all at 1920×1080 resolution, as shown in fig. 3. In the embodiment of the application, the logistics image dataset was annotated on a labeling platform to obtain the Logistics-3k dataset, which was divided into training, validation and test sets in a …:2:2 ratio.
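A random split of the kind described above can be sketched as follows. Note that the leading figure of the split ratio is illegible in the source, so the 6:2:2 ratio used here is an assumption, as is the function name:

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle and split a list of samples into train/val/test subsets.
    The 6:2:2 default is an assumption -- only the trailing ':2:2' of the
    split ratio is legible in the source text."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Splitting before any augmentation keeps the validation and test sets free of transformed copies of training images.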
S2, constructing a YOLO-Log model based on an improved YOLOv5, wherein the YOLO-Log model comprises an input module, a feature extraction module, a feature fusion module and a prediction module; the input module performs data enhancement on the input logistics image, the feature extraction module is a backbone network obtained by replacing the Bottleneck structure in the C3 module of YOLOv5 with a Ghost Bottleneck structure, and the feature fusion module is a neck network obtained by replacing the PANet structure of YOLOv5 with a multi-scale feature fusion BiFPN structure.
In particular embodiments, the data enhancements include HSV gamut enhancement, mosaic, mirror flip, horizontal flip.
In a specific embodiment, step S2 replaces the Bottleneck structure in the C3 module of YOLOv5 with a Ghost Bottleneck structure, specifically: the Bottleneck structure in the stride-1 C3 module of YOLOv5 is replaced by a Ghost Bottleneck structure with stride 1, and the Bottleneck structure in the stride-2 C3 module is replaced by a Ghost Bottleneck structure with stride 2.
In a specific embodiment, a variable learning weight is added to each input in the multi-scale feature fusion BiFPN structure to learn information of different feature layers, fusion weights of different inputs are adjusted, and feature layers with different resolutions are fused in a weighted fusion mode.
In a specific embodiment, the multi-scale feature fusion BiFPN structure uses the following fusion formula:

O = Σᵢ ( wᵢ / (ε + Σⱼ wⱼ) ) · Iᵢ

where wᵢ is the weight of the i-th input, wⱼ is the weight of the j-th input, Iᵢ is the i-th input, O is the fused output feature, and ε is a small positive constant; a ReLU function guarantees that every wᵢ satisfies wᵢ ≥ 0.
In a specific embodiment, the prediction module uses CIOU Loss as the loss function of the bounding box; CIOU Loss is optimized according to the overlapping area, the center-point distance and the aspect ratio.
Specifically, the structure of the YOLO-Log model proposed in the embodiment of the present application is shown in fig. 4, and the YOLO-Log model is structurally divided into four modules based on YOLOv 5: the device comprises an input module, a feature extraction module, a feature fusion module and a prediction module.
Specifically, a better detection effect in logistics scenes can be achieved by choosing suitable data enhancement methods. Given the complex backgrounds and uneven lighting of logistics scenes, HSV color gamut enhancement, Mosaic, mirror flipping and horizontal flipping are applied to the logistics images in the input module. HSV color gamut enhancement mainly addresses picture overexposure caused by weather and lighting; it enriches the color gamut of the logistics images so that the logistics target detection model copes better with the uneven lighting, cloudy and sunny conditions of real indoor and outdoor logistics environments. Mosaic augmentation stitches several pictures into a single composite image, which enriches the dataset and improves the detection of small targets in logistics scenes. Mirror and horizontal flipping are applied with a configurable random probability, for example flipping 50% of the images, and these operations improve the dataset's adaptability to the environment. Combining these data enhancement operations with the characteristics of logistics scenes gives the Logistics-3k dataset better generalization in practical applications.
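As a minimal sketch of the flip-style augmentations above (the function name and the pixel-coordinate [x1, y1, x2, y2] box format are illustrative assumptions, not from the patent), a horizontal flip must mirror the bounding boxes along with the pixels:

```python
import numpy as np

def random_horizontal_flip(image, boxes, p=0.5, rng=None):
    """Flip an HxWxC image left-right with probability p and mirror the
    bounding boxes, given as [x1, y1, x2, y2] in pixel coordinates."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return image, boxes            # leave the sample unchanged
    w = image.shape[1]
    flipped = image[:, ::-1].copy()    # mirror the columns
    new_boxes = boxes.astype(float).copy()
    new_boxes[:, 0] = w - boxes[:, 2]  # old right edge becomes new left
    new_boxes[:, 2] = w - boxes[:, 0]  # old left edge becomes new right
    return flipped, new_boxes
```

Setting p to 0.5 reproduces the "flip 50% of the images" behaviour mentioned above.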
In the input module, the embodiment of the application performs data enhancement on the dataset according to the characteristics of the logistics scene: the logistics image is first resized to 640×640 as input, and the following online data enhancement operations are applied in sequence: HSV color gamut enhancement, Mosaic, mirror flipping, horizontal flipping. Feature extraction is done in the backbone of the model, where the embodiment uses a Ghost Bottleneck structure instead of the Bottleneck structure in the C3 module of the original YOLOv5. Feature fusion is done in the neck network, where a multi-scale feature fusion BiFPN module is introduced; with bidirectional connections and a feature fusion algorithm, BiFPN lets the model extract features at multiple scales so that objects in the logistics image can be identified accurately. Finally, for prediction, CIOU Loss is used as the loss function of the bounding box; CIOU Loss is optimized according to the overlapping area, the center-point distance and the aspect ratio.
CIOU Loss measures the difference between the predicted box and the ground truth. Its formula is as follows:

CIOU Loss = 1 − IOU + d²/c² + α·v

where IOU is the intersection-over-union between the predicted target and the real target, d²/c² is the distance term of DIOU, d is the Euclidean distance between the center points of the predicted and true boxes, c is the diagonal length of the smallest box enclosing both boxes (a normalizing factor that keeps d²/c² within (0, 1)), v is a term measuring the consistency of the aspect ratios, and α is a balance factor.
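The loss above can be sketched in plain Python for a single box pair. The patent text does not spell out v and α, so the standard CIoU definitions are assumed here: v = (4/π²)(arctan(w_g/h_g) − arctan(w_p/h_p))² and α = v/((1 − IOU) + v):

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss = 1 - IOU + d^2/c^2 + alpha*v for boxes [x1, y1, x2, y2]."""
    # intersection and union
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # squared centre distance d^2 over squared enclosing-box diagonal c^2
    d2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
       + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
       + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2 + 1e-9
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + d2 / c2 + alpha * v
```

Identical boxes yield a loss of (nearly) zero, while disjoint boxes are still penalized by the distance term even when their IOU is zero, which is why CIOU converges better than a pure IOU loss.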
Finally, the prediction module outputs the position and coordinates of the object.
Specifically, referring to fig. 5, the Ghost Bottleneck structure stacks two Ghost modules: a first Ghost module and a second Ghost module. In the Ghost Bottleneck with stride 1, the first Ghost module applies batch normalization and a ReLU activation after each output layer, and the second Ghost module applies batch normalization after each output layer and is connected to the input through a shortcut to obtain the final output. The Ghost Bottleneck with stride 2 builds on the stride-1 structure by connecting the first and second Ghost modules through a depthwise convolution with stride 2, whose outputs are also batch normalized. A Ghost Bottleneck thus consists mainly of two Ghost modules and, like the Basic Residual Block in ResNet, integrates several convolutional layers and shortcuts. The backbone of the proposed YOLO-Log model replaces the four Bottleneck structures of the C3 modules in YOLOv5 with Ghost Bottlenecks built from Ghost modules; by changing the convolution scheme, the number of model parameters is greatly reduced and the model becomes lightweight.
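The parameter savings from the Ghost module can be illustrated with a small counter. The kernel sizes and the ratio s = 2 below are typical GhostNet defaults, assumed here rather than taken from the patent:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_module_params(c_in, c_out, k=1, d=3, s=2):
    """Weight count of a Ghost module: a primary k x k convolution makes
    c_out/s 'intrinsic' maps, then cheap d x d depthwise operations
    generate the remaining (s - 1) * c_out/s 'ghost' maps."""
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k   # ordinary convolution
    cheap = intrinsic * (s - 1) * d * d  # one depthwise filter per ghost map
    return primary + cheap
```

For a 128-to-128 pointwise convolution, `conv_params` gives 16,384 weights while `ghost_module_params` gives 8,768, roughly the s-fold compression that makes the backbone lightweight.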
Specifically, the feature fusion module of the YOLO-Log model adopts a multi-scale feature fusion structure, the weighted bi-directional feature pyramid network (BiFPN), which aims to fuse feature information across feature maps of different resolutions. YOLOv5 uses PANet (Path Aggregation Network) to fuse features of different scales; its structure is shown in fig. 6(a). PANet fuses feature layers of different scales through bidirectional propagation of multi-scale features.
Although PANet can effectively fuse different feature layers, it is still essentially a simple addition of different features. Because the detected objects in different images have different sizes, features of different scales are generated during training. If the feature information is simply added up, as in PANet, different-scale features of the same type contribute unequally to the fused feature: large-scale features blend more into the model while small-scale features contribute less. To address this problem, the embodiment of the present application introduces a weighted bi-directional feature pyramid network (Bi-directional Feature Pyramid Network, BiFPN) to improve the PANet, whose structure is shown in fig. 6(b). The network introduces learnable weights to learn the information of different feature layers. The embodiment of the application uses a weighted fusion method to fuse feature layers with different resolutions: a weight is added for each input, and the network adjusts the fusion weights of the different inputs. In one embodiment, in the fusion formula, ε = 0.0001, where ε is a small positive constant that avoids numerical instability.
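The weighted fusion described here — BiFPN's fast normalized fusion — can be sketched in plain Python (an illustrative sketch; feature maps are flattened to plain lists, and the function name is an assumption):

```python
def weighted_fusion(inputs, weights, eps=1e-4):
    """Fast normalized fusion: O = sum_i (w_i * I_i) / (eps + sum_j w_j),
    with ReLU applied to each learnable weight so that w_i >= 0."""
    w = [max(0.0, wi) for wi in weights]   # ReLU keeps the weights non-negative
    total = eps + sum(w)                   # eps avoids numerical instability
    return [
        sum(wi * feat[k] for wi, feat in zip(w, inputs)) / total
        for k in range(len(inputs[0]))
    ]
```

Each fused element is a normalized combination of the inputs: the learnable weights are clamped non-negative by ReLU and divided by their sum plus ε = 0.0001, so no single scale can dominate the fusion unchecked.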
And S3, training the YOLO-Log model by using training data to obtain a logistics target detection model.
Specifically, considering the computing resources required by the logistics scene and the existing laboratory conditions, the experimental environment adopted by the embodiment of the application is based on a Windows 11 operating system, an i9-12900K CPU, an NVIDIA GeForce RTX 4090 GPU (24 GB video memory), and 64 GB of RAM. In this application, the experimental hyperparameters are set as follows: the number of training epochs is 200, the batch size is set to 80, the optimizer is SGD, the learning rate is set to 0.0005, the weight decay is set to 0.0005, and the momentum is set to 0.937.
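The reported training configuration can be collected into a single mapping (an illustrative sketch; the key names are assumptions and follow no particular framework):

```python
# Hyperparameters as reported for the experiments in this embodiment.
HYPERPARAMS = {
    "epochs": 200,           # number of training rounds
    "batch_size": 80,
    "optimizer": "SGD",
    "learning_rate": 0.0005,
    "weight_decay": 0.0005,
    "momentum": 0.937,
}
```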
S4, acquiring a logistics image, inputting the logistics image into a logistics target detection model to obtain a target detection result, and carrying out safety early warning according to the target detection result.
In a specific embodiment, the target detection result includes the position and coordinates of the logistics vehicle and/or the person, and in step S4, the safety precaution is performed according to the target detection result, which specifically includes:
calculating the distance between the logistics vehicle and the person in the logistics image according to the target detection result, and converting the distance into the distance between the logistics vehicle and the person in the Cartesian coordinate system;
distinguishing the grade of the safety early-warning area according to the distance between the logistics vehicle and the person in the Cartesian coordinate system;
and carrying out safety early warning according to the grade of the safety early warning area.
Specifically, taking logistics forklift safety early warning as an example, whether the target ahead is in a set safety early-warning area is judged according to information such as its position and distance, and whether to issue a safety warning is decided accordingly. The safety early-warning area is mainly divided into a primary safety early-warning area and a secondary safety early-warning area.
When the logistics forklift and the person are in the primary safety early-warning area, the distance between the vehicle and the person ahead is still relatively large, but a collision remains possible; a warning signal is therefore sent to prompt the person ahead to move aside, while also alerting the driver to the target ahead and leaving time to decelerate. When the logistics forklift and the person are in the secondary safety early-warning area, a collision with the person ahead is likely; a warning signal is sent to instruct the driver to brake immediately, and if the driver does not brake, a braking signal is sent to force the forklift to brake, so that a collision with the person is avoided. According to practical engineering experience, and taking into account the reaction times and relative speed of the driver and the person ahead so as to ensure operation safety while affecting logistics operation as little as possible, the primary safety early-warning area is within 5 m ahead of the vehicle, and the secondary safety early-warning area is within 1.5 m ahead.
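The two-level early-warning logic just described can be sketched as follows (an illustrative sketch; the pixel-to-metre conversion is assumed to be a simple linear scale, and the function and constant names are assumptions):

```python
import math

# Zone thresholds from the description: primary warning within 5 m ahead,
# secondary (braking) warning within 1.5 m ahead of the forklift.
PRIMARY_ZONE_M = 5.0
SECONDARY_ZONE_M = 1.5

def cartesian_distance(vehicle_xy, person_xy, metres_per_pixel):
    """Convert the pixel distance between two detected centres into a
    Cartesian distance in metres (a simple linear scale is assumed)."""
    dx = (vehicle_xy[0] - person_xy[0]) * metres_per_pixel
    dy = (vehicle_xy[1] - person_xy[1]) * metres_per_pixel
    return math.hypot(dx, dy)

def warning_level(distance_m):
    """Map a distance to a warning grade: 2 = brake, 1 = alert, 0 = none."""
    if distance_m <= SECONDARY_ZONE_M:
        return 2
    if distance_m <= PRIMARY_ZONE_M:
        return 1
    return 0
```

With these assumptions, `warning_level` returns 2 (immediate or forced braking) inside 1.5 m, 1 (audible alert) inside 5 m, and 0 otherwise.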
Specifically, the detection result of the YOLO-Log model in a complex logistics scene is shown in fig. 7; as shown in the figure, the logistics scene has a complex background, uneven lighting, and a large number of occluded and distant small targets. The target frames in fig. 7 show that the YOLO-Log model can detect complex targets in the logistics scene. Because BiFPN can integrate the advantages of feature information of different scales across layers, the YOLO-Log model provided by the embodiment of the application can effectively detect the position of the target. This visually demonstrates the effectiveness of the method in logistics target detection; the method can further be applied to the logistics driving safety early-warning process to monitor logistics driving safety problems in real time, improving logistics safety and reducing losses.
Further, classical algorithms in the current state-of-the-art YOLO series, including the YOLOv3, YOLOv3-spp, YOLOv3-tiny, YOLOv4, and YOLOv5s algorithms, were selected for comparison with the YOLO-Log model presented in the examples of the present application. YOLOv5s is the latest lightweight version of YOLOv5. The performance results of the different algorithms are shown in table 1. As a lightweight algorithm, the YOLO-Log model has obvious advantages. For example, the YOLO-Log model has only 6,010,752 parameters, a 14.2% reduction compared with the YOLOv5s baseline; in terms of detection time, the compared YOLO algorithms range from 8.1 ms at the fastest to 12.9 ms at the slowest. Compared with these algorithms, the YOLO-Log model achieves good detection accuracy together with good detection speed. Although YOLOv3-tiny is superior to the YOLO-Log model in detection time, its detection accuracy (mAP) is 10.1% lower than that of the YOLO-Log model and cannot meet the industrial requirements of a logistics scene. BiFPN allows the YOLO-Log model to perform weighted fusion of multi-scale features, so occluded targets in complex logistics scenes can be effectively detected. This experiment clearly demonstrates the effectiveness of the YOLO-Log model proposed in the examples of the present application for logistics target detection. The YOLO-Log model was validated on the Logistics-3k dataset: its detection accuracy reaches 91.4%, its model parameters are reduced by 14.2% compared with YOLOv5s, and its detection time is only 10.2 ms. Compared with similar algorithms, the YOLO-Log model is further improved in terms of model lightweighting and accuracy on the Logistics-3k dataset in complex logistics environments.
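The reported 14.2% parameter reduction can be cross-checked with one line of arithmetic (the implied YOLOv5s baseline below is derived from the figures in the text, not a number stated in the original):

```python
yolo_log_params = 6_010_752
reduction = 0.142                                     # 14.2% fewer parameters than YOLOv5s
implied_baseline = yolo_log_params / (1 - reduction)  # roughly 7.0 million parameters
```

This lands close to the roughly seven-million-parameter size commonly quoted for YOLOv5s, which is consistent with the comparison in table 1.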
TABLE 1
With further reference to fig. 8, as an implementation of the methods shown in the foregoing drawings, the present application provides an embodiment of a logistics safe driving detection device based on lightweight improved YOLOv5; the device embodiment corresponds to the method embodiment shown in fig. 2, and the device may be applied to various electronic devices.
The embodiment of the application provides a logistics safe driving detection device based on lightweight improved YOLOv5, comprising:
the training data acquisition module 1 is configured to collect and obtain a logistics image data set, and perform data enhancement on the logistics image data set to obtain training data;
a model construction module 2 configured to construct a YOLO-Log model based on YOLOv5 improvement, the YOLO-Log model including an input module, a feature extraction module, a feature fusion module and a prediction module, wherein the feature extraction module is a backbone network obtained by adopting a Ghost Bottleneck structure to replace a Bottleneck structure in a C3 module in YOLOv5, and the feature fusion module is a neck network obtained by adopting a multi-scale feature fusion BiFPN structure to replace a PANet structure in YOLOv 5;
the model training module 3 is configured to train the YOLO-Log model by adopting training data to obtain a logistics target detection model;
and the detection module 4 is configured to acquire a logistics image, input the logistics image into a logistics target detection model, obtain a target detection result and perform safety early warning according to the target detection result.
Referring now to fig. 9, there is illustrated a schematic diagram of a computer apparatus 900 suitable for use in implementing an electronic device (e.g., a server or terminal device as illustrated in fig. 1) of an embodiment of the present application. The electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 9, the computer apparatus 900 includes a central processing unit (CPU) 901 and a graphics processor (GPU) 902, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 903 or a program loaded from a storage section 909 into a random access memory (RAM) 904. In the RAM 904, various programs and data required for the operation of the apparatus 900 are also stored. The CPU 901, GPU 902, ROM 903, and RAM 904 are connected to each other by a bus 905. An input/output (I/O) interface 906 is also connected to the bus 905.
The following components are connected to the I/O interface 906: an input section 907 including a keyboard, a mouse, and the like; an output section 908 including a display such as a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 909 including a hard disk or the like; and a communication section 910 including a network interface card such as a LAN card, a modem, or the like. The communication section 910 performs communication processing via a network such as the Internet. A drive 911 may also be connected to the I/O interface 906 as needed. A removable medium 912, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 911 as needed, so that a computer program read therefrom is installed into the storage section 909 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 910, and/or installed from the removable medium 912. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 901 and a Graphics Processor (GPU) 902.
It should be noted that the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments described in the present application may be implemented by software, or may be implemented by hardware. The described modules may also be provided in a processor.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: collecting and obtaining a logistics image data set, and labeling the logistics image data set to obtain training data; constructing a YOLO-Log model based on YOLOv5 improvement, wherein the YOLO-Log model comprises an input module, a feature extraction module, a feature fusion module and a prediction module, the input module is used for carrying out data enhancement on an input logistics image, the feature extraction module is a backbone network obtained by adopting a Ghost Bottleneck structure to replace a Bottleneck structure in a C3 module in YOLOv5, and the feature fusion module is a neck network obtained by adopting a multi-scale feature fusion BiFPN structure to replace a PANet structure in YOLOv 5; training the YOLO-Log model by adopting training data to obtain a logistics target detection model; and acquiring a logistics image, inputting the logistics image into a logistics target detection model to obtain a target detection result, and carrying out safety early warning according to the target detection result.
The foregoing description is only of the preferred embodiments of the present application and is a description of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present application (but not limited thereto).

Claims (10)

1. The logistics safe driving detection method based on the lightweight improved YOLOv5 is characterized by comprising the following steps of:
s1, collecting a logistics image data set, and labeling the logistics image data set to obtain training data;
s2, constructing a YOLO-Log model based on YOLOv5 improvement, wherein the YOLO-Log model comprises an input module, a feature extraction module, a feature fusion module and a prediction module, wherein the input module is used for carrying out data enhancement on an input logistics image, the feature extraction module is a backbone network obtained by adopting a Ghost Bottleneck structure to replace a Bottleneck structure in a C3 module in YOLOv5, and the feature fusion module is a neck network obtained by adopting a multi-scale feature fusion BiFPN structure to replace a PANet structure in YOLOv 5;
s3, training the YOLO-Log model by using the training data to obtain a logistics target detection model;
s4, acquiring a logistics image, inputting the logistics image into the logistics target detection model to obtain a target detection result, and carrying out safety early warning according to the target detection result.
2. The method for detecting the safe driving of the logistics based on the lightweight improved YOLOv5 according to claim 1, wherein the step S2 adopts a Ghost Bottleneck structure instead of the Bottleneck structure in the C3 module in YOLOv5, and specifically comprises the following steps: replacing the Bottleneck structure in the C3 module with the step length of 1 in YOLOv5 with the Ghost Bottleneck structure with the step length of 1; and replacing the Bottleneck structure in the C3 module with the step length of 2 in YOLOv5 with the Ghost Bottleneck structure with the step length of 2.
3. The logistics safe driving detection method based on lightweight improved YOLOv5 of claim 1, wherein a learnable weight is added to each input in the multi-scale feature fusion BiFPN structure to learn the information of different feature layers, the fusion weights of different inputs are adjusted, and feature layers with different resolutions are fused in a weighted fusion mode.
4. The logistics safe driving detection method based on lightweight improved YOLOv5 of claim 3, wherein the multi-scale feature fusion BiFPN structure adopts the following fusion formula:

O = Σ_i (w_i · I_i) / (ε + Σ_j w_j)

wherein w_i represents the weight of the i-th input, w_j represents the weight of the j-th input, I_i represents the i-th input, O is the fused output feature, ε is a small positive constant that avoids numerical instability, and a ReLU function guarantees that each weight satisfies w_i ≥ 0.
5. The method for detecting the safe driving of the logistics based on the lightweight improved YOLOv5 according to claim 1, wherein the prediction module adopts CIOU Loss as the loss function of the bounding box.
6. The lightweight improved YOLOv 5-based logistic safe driving detection method of claim 1, wherein the data enhancement comprises HSV gamut enhancement, mosaic, mirror flip, horizontal flip.
7. The method for detecting the safe driving of the logistics based on the lightweight improved YOLOv5 according to claim 1, wherein the target detection result comprises the position and the coordinates of the logistics vehicle and/or the person, and the step S4 of performing the safety precaution according to the target detection result specifically comprises:
calculating the distance between the logistics vehicle and the person in the logistics image according to the target detection result, and converting the distance into the distance between the logistics vehicle and the person in a Cartesian coordinate system;
distinguishing the grade of the safety early-warning area according to the distance between the logistics vehicle and the person in the Cartesian coordinate system;
and carrying out safety early warning according to the grade of the safety early warning area.
8. Logistics safe driving detection device based on lightweight improvement YOLOv5, characterized by comprising:
the data collection module is configured to collect and obtain a logistics image data set, and label the logistics image data set to obtain training data;
the model construction module is configured to construct a YOLO-Log model based on YOLOv5 improvement, and comprises an input module, a feature extraction module, a feature fusion module and a prediction module, wherein the input module is used for carrying out data enhancement on an input logistics image, the feature extraction module is a backbone network obtained by adopting a Ghost Bottleneck structure to replace a Bottleneck structure in a C3 module in YOLOv5, and the feature fusion module is a neck network obtained by adopting a multi-scale feature fusion BiFPN structure to replace a PANet structure in YOLOv 5;
the model training module is configured to train the YOLO-Log model by adopting the training data to obtain a logistics target detection model;
the detection module is configured to acquire a logistics image, input the logistics image into the logistics target detection model to obtain a target detection result, and perform safety precaution according to the target detection result.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202310216193.3A 2023-03-08 2023-03-08 Logistics safe driving detection method and device based on lightweight improved YOLOv5 Pending CN117612074A (en)

Publication: CN117612074A, published 2024-02-27.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination