CN113592825A - YOLO algorithm-based real-time coal gangue detection method

Info

Publication number: CN113592825A
Application number: CN202110880702.3A
Authority: CN
Inventors: 郭永存; 刘普壮; 何磊; 王爽; 赵艳秋
Original assignee: Anhui University of Science and Technology
Current assignee: Anhui University of Science and Technology
Priority date: 2021-08-02
Filing date: 2021-08-02
Publication date: 2021-11-02

Abstract

The invention provides a method for real-time detection of coal and gangue based on a YOLO algorithm, and relates to the field of computer vision. The method comprises: step S1, obtaining a coal and gangue detection data set and expanding it; step S2, training the original YOLOv4 network with the data set produced in step S1 to obtain the weights of the original network and the parameters of the convolution kernels in each convolutional layer; step S3, constructing an improved YOLOv4 network structure based on the complexity of the coal and gangue features, to obtain an improved YOLO algorithm suited to coal and gangue target detection; and step S4, performing transfer learning from the weights of the original network obtained in step S2 and training the improved YOLOv4 algorithm with the data set produced in step S1. The method improves average precision, reduces model complexity, shortens detection time, significantly reduces the number of model parameters, runs faster, and improves the ability to detect moving coal and gangue in real time.

Description

YOLO algorithm-based real-time coal gangue detection method

Technical Field

The invention relates to the technical field of computer vision, in particular to a method for detecting coal gangue in real time based on a YOLO algorithm.

Background

With the continuous development of deep learning in the field of real-time target detection, real-time target detection with convolutional neural networks has become practical. Real-time target detection means finding the category and the specific position of each target in a given video or scene, so it generally has to solve two problems: classifying the target and locating the target. The category and position information fed back by target detection allows an actuating mechanism to grasp and separate multiple targets. Compared with traditional target detection methods, deep-learning-based methods can improve detection efficiency and precision, and they are already applied in the relevant industrial fields.

Coal and gangue separation is an essential process in coal mine production, and is the most economical and effective technical approach for raising the use value of coal and making reasonable use of coal resources; separating gangue reduces washing cost, improves coal quality and improves enterprise benefit. Research shows that intelligent robots for sorting gangue and other non-coal sundries during coal transportation have become an important subject in coal preparation. The coal and gangue detection system is a part of the gangue sorting robot, and the accuracy and processing speed of its target detection, as well as its adaptability to the harsh surrounding environment, have a great influence on the working performance of the robot. A reliable coal and gangue detection technology allows the robot to work stably for a long time, thereby improving production benefit. Therefore, a method for real-time detection of coal and gangue based on the YOLO algorithm is provided, which can be applied at a coal and gangue separation site to detect coal and gangue in real time.

Disclosure of Invention

(I) Technical problem to be solved

In view of the shortcomings of the prior art, the invention provides a method for real-time detection of coal and gangue based on a YOLO algorithm. It addresses the problem that the coal and gangue detection system, as a part of a gangue sorting robot, strongly affects the working performance of the robot through the accuracy and processing speed of its target detection and through its adaptability to the harsh surrounding environment.

(II) technical scheme

In order to achieve the purpose, the invention is realized by the following technical scheme: a method for detecting coal gangue in real time based on a YOLO algorithm comprises the following steps:

step S1: acquiring a coal and gangue detection data set, and expanding the coal and gangue detection data set by a data enhancement method to obtain a data set for training a model;

step S2: training the original Yolov4 network by adopting the data set manufactured in the step S1 to obtain the weight of original network training and the parameters of convolution kernels in each convolution layer;

step S3: constructing an improved YOLOv4 network structure based on the complexity of the coal and gangue characteristics to obtain an improved YOLO algorithm suitable for coal and gangue target detection;

step S4: performing transfer learning from the weights of the original network obtained in step S2, training the improved YOLOv4 algorithm with the data set produced in step S1, and obtaining the trained weights by screening, wherein, during model training, the number of iterations is set to 10000 and one model is saved every 1000 iterations, giving 10 models in total; the optimal training model, namely the training weights, is selected by analyzing and comparing the average precision and the detection rate of the 10 models, and the training weights are loaded into the coal and gangue target detection network flow;

step S5: inputting the not-yet-detected coal and gangue images from the moving belt into the coal and gangue target detection network flow based on the improved YOLOv4 algorithm, and outputting the coal and gangue target detection result.

Preferably, the improved YOLO network structure mainly includes a backbone network (backbone), a neck network (neck) and a head; the backbone is CSPDarknet27 and is used for extracting image features and generating feature maps; the feature maps extracted by the backbone are input into the neck for further feature integration; and the head performs classification and target detection based on the integrated features.

Preferably, the neck network (neck) includes a spatial pyramid pooling (SPP) module and a feature fusion module; the SPP module receives the feature map generated by CSPDarknet27 feature extraction and obtains a pooled feature map through spatial pyramid pooling and concatenation; and the feature fusion module fuses the pooled feature map with the feature map output by the CSPDarknet27 network structure.

Preferably, the pooled feature map is obtained by applying multi-scale max pooling to the feature map generated by the CSPDarknet27 network to produce several sets of feature maps, and then fusing those sets.

Preferably, the feature fusion module includes a first upsampling layer1, a second upsampling layer2, a first splicing layer3 and a second splicing layer4 which are connected in sequence according to the data flow direction, and fuses low-level and high-level features by constructing a feature pyramid of gradually increasing depth. Low-level features focus on surface information, while high-level features focus on deep semantic information; as deeper features are extracted from an image, low-level feature information may be lost, so the low-level features are passed to the higher levels through feature fusion to prevent feature loss.

Preferably, the head includes a first scale classifier and a second scale classifier, each of which receives a feature map output by the feature fusion module; in the embodiment, the first scale classifier receives the 19 × 19 feature map and the second scale classifier receives the 38 × 38 feature map.

Preferably, in step S4, a transfer learning strategy is adopted when the improved YOLOv4 algorithm is trained: transfer learning starts from the weights of the original network trained in step S2, the weights obtained by the original network on the data set and the parameters of the convolution kernels in each convolutional layer are used as the reference for training the improved YOLOv4 algorithm, and the knowledge that the original network has learned from the coal and gangue data set is transferred to the improved algorithm.

Preferably, an electronic device is provided, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor, when executing the program stored in the memory, implements the method steps of any of claims 1-8.

Preferably, a computer-readable storage medium is provided, in which a computer program is stored which, when executed by a processor, carries out the method steps of any one of claims 1 to 8.

(III) advantageous effects

In the improved YOLOv4 algorithm for coal and gangue detection, the feature maps of three different scales output by the original network are replaced by feature maps of two different scales, namely 19 × 19 and 38 × 38. This improves average precision, reduces model complexity, shortens detection time, significantly reduces the number of model parameters, increases running speed, and improves the ability to detect moving coal and gangue in real time.

Drawings

FIG. 1 is a flow chart of a target detection method of the modified YOLOv4 algorithm;

FIG. 2 is a network architecture diagram of the modified YOLOv4 algorithm;

FIG. 3 is a diagram of a spatial pyramid pooled SPP structure;

FIG. 4 is a diagram of a feature fusion module architecture;

FIG. 5 is a diagram of the actual detection effect of the improved YOLOv4 algorithm.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

As shown in FIGS. 1-5, an embodiment of the invention provides a coal and gangue target detection method based on an improved YOLOv4 algorithm.

As shown in FIG. 1, the flow of the coal and gangue target detection method of the improved YOLOv4 algorithm in this embodiment specifically includes:

Step S1: acquiring coal and gangue images on the moving belt at the coal and gangue sorting site to produce a coal and gangue detection data set, and expanding the data set by a data enhancement method to obtain a data set for training the model.

The coal and gangue detection data set is a self-made data set, produced from coal and gangue images captured on the moving belt at a coal and gangue separation site. It contains 200 images with about 6000 coal and gangue targets in total, and the region and category of every coal and gangue target in every image can be annotated with the labelimg software.

In the training process of this embodiment, the original coal and gangue data set can be expanded by mosaic data enhancement: four images are drawn at random and stitched into a new image using random scaling, random cropping and random arrangement. To retain more of the original image information, mosaic stitching should not be applied excessively; the coal and gangue detection data are expanded to 4 times the original data, giving 800 images, and the number of coal and gangue targets grows accordingly.
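The mosaic step described above can be illustrated with a minimal sketch in plain Python. It only computes where four source images would be placed on a 608 × 608 canvas and how a labelled box moves with its image; the function names, the 0.3 to 0.7 range for the split point and the use of pure rescaling instead of cropping are assumptions made for illustration, not details given in this document.

```python
import random

def mosaic_layout(canvas=608, rng=None, lo=0.3, hi=0.7):
    """Pick a random split point and return four tiles (x0, y0, x1, y1)
    that together cover the whole canvas. lo/hi are assumed bounds."""
    rng = rng or random.Random()
    cx = int(canvas * rng.uniform(lo, hi))
    cy = int(canvas * rng.uniform(lo, hi))
    return [
        (0, 0, cx, cy),            # top-left tile
        (cx, 0, canvas, cy),       # top-right tile
        (0, cy, cx, canvas),       # bottom-left tile
        (cx, cy, canvas, canvas),  # bottom-right tile
    ]

def place_box(box, src_size, tile):
    """Rescale a labelled box from its source image into the given tile.
    Assumes the whole source image is stretched to fill the tile."""
    src_w, src_h = src_size
    tx0, ty0, tx1, ty1 = tile
    sx = (tx1 - tx0) / src_w
    sy = (ty1 - ty0) / src_h
    x0, y0, x1, y1 = box
    return (tx0 + x0 * sx, ty0 + y0 * sy, tx0 + x1 * sx, ty0 + y1 * sy)

# Random arrangement: shuffle which of the four images goes to which tile.
rng = random.Random(0)
tiles = mosaic_layout(rng=rng)
order = list(range(4))
rng.shuffle(order)
for image_index, tile in zip(order, tiles):
    print("image", image_index, "-> tile", tile)
```

A real pipeline would also crop and copy pixel data and discard boxes that fall outside the canvas; those steps are omitted here.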

In the invention, it is easy to understand that the proportion of expanding and enhancing the coal and gangue detection data set can be adjusted according to the actual situation, for example, the proportion can be expanded to 8 times, 10 times and the like of the original data set.

In the invention, the sample data of the coal and gangue data set can be labelled with the labelimg software. The detection targets comprise two classes, coal and gangue; targets of both classes are boxed in labelimg and their category information is marked, that is, the category of each target region is recorded as coal or gangue.

It should be noted that the data set described above for step S1 belongs to this embodiment; in other embodiments, the method of obtaining and labelling the sample set may be adjusted as appropriate, provided that a sufficiently effective sample set is ensured.

Step S2: training the original YOLOv4 network with the data set produced in step S1 to obtain the weights of the original network and the parameters of the convolution kernels in each convolutional layer.

The original YOLOv4 network is trained, without modification, on the data set created in step S1 to obtain a coal and gangue detection model of the original network, from which the weights of the original network, the parameters of the convolution kernels in each convolutional layer, and the like can be obtained.

Step S3: constructing a lightweight YOLOv4 network structure based on the complexity of the coal and gangue features to obtain an improved YOLO algorithm suited to coal and gangue target detection.

The original YOLOv4 detection network offers good detection speed and detection precision in various industrial application sites. Taking several characteristics of the coal and gangue targets into account, this embodiment further optimizes and simplifies the existing YOLOv4 network structure and constructs a lightweight YOLOv4 network structure for coal and gangue target detection. The number of model parameters is significantly reduced and the running speed is higher, which improves the detection of coal and gangue targets.

As shown in FIG. 2, the network structure of the improved YOLOv4 algorithm in this embodiment includes three modules, namely a backbone network backbone, a neck network neck, and a head. "Inputs" in the network structure diagram denotes the network input, which in this embodiment is an image of size 608 × 608.

The backbone network backbone is CSPDarknet27 and is used for extracting image features; feature extraction on the input image through the CSPDarknet27 network structure outputs three feature maps with sizes of 19 × 19, 38 × 38 and 76 × 76. The feature maps extracted by the backbone network are input into the neck network neck for further feature integration; the neck network neck comprises an SPP module, a feature fusion module and the like. The 19 × 19 feature map is input into the SPP module, a pooled feature map is obtained through spatial pyramid pooling and concatenation, and the pooled feature map is fused with the feature map output by the CSPDarknet27 network structure. As shown in FIG. 3, the input of the SPP module, namely the feature map output by the CSPDarknet27 network structure, is pooled at three different scales to generate three feature maps, and the three feature maps are fused with the input to generate the output feature map. The head multi-classifier module performs classification and target detection on the 19 × 19 and 38 × 38 fused features output by the feature fusion module, and outputs the final coal and gangue target detection result.

Specifically, the detailed information of each module of the network structure of the improved YOLOv4 algorithm is as follows:

1. backbone network backbone

The backbone network backbone in this embodiment is CSPDarknet27, and the CSPDarknet27 network structure includes, connected in sequence, a CBM block (convolutional layer Conv + Batch Normalization + Mish activation function), a first CSPRes1 module, a second CSPRes2 module, a third CSPRes2 module, a fourth CSPRes4 module, and a fifth CSPRes1 module;

a CSPResn module (n being 1, 2, 3, etc.) denotes a cascade of n Res_unit modules; the first CSPRes1 module outputs a feature map of size 304 × 304, the second CSPRes2 module outputs a feature map of size 152 × 152, the third CSPRes2 module outputs a feature map of size 76 × 76, the fourth CSPRes4 module outputs a feature map of size 38 × 38, and the fifth CSPRes1 module outputs a feature map of size 19 × 19;

unlike the prior art, on the basis of the existing YOLOv4 algorithm, the original backbone network CSPDarknet53 for feature extraction is further reduced to the CSPDarknet27 network structure, and the feature maps of three different scales output by the original network are replaced by feature maps of two different scales, namely 19 × 19 and 38 × 38.
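The feature map sizes listed for the five CSPRes modules follow from halving the spatial resolution at each stage, starting from the 608 × 608 input. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the assumption that each stage downsamples by exactly a factor of 2 is inferred from the listed sizes.

```python
def stage_sizes(input_size=608, blocks=(1, 2, 2, 4, 1)):
    """Return (module name, output side length) for each CSPRes stage,
    assuming every stage halves the spatial resolution."""
    sizes = []
    size = input_size
    for n in blocks:
        size //= 2
        sizes.append((f"CSPRes{n}", size))
    return sizes

for name, size in stage_sizes():
    print(f"{name}: {size} x {size}")
```

The last two entries are the 38 × 38 and 19 × 19 maps that the two detection scales use.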

2. Neck network neck

The neck module in this embodiment mainly includes a spatial pyramid pooling SPP module and a feature fusion module.

As shown in FIG. 3, in the spatial pyramid pooling SPP module, the input image passes through the CSPDarknet27 network for feature extraction to generate a feature map (19 × 19 × 512), which serves as the input of the SPP structure. The feature map is subjected to multi-scale max pooling with pooling kernels of (5 × 5), (9 × 9) and (13 × 13) to generate three sets of feature maps, and feature fusion (concatenation) with the input yields the pooled feature map output (19 × 19 × 2048).
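The channel arithmetic of the SPP module (512 input channels plus three pooled copies, giving 2048) can be checked with a small sketch. It is written in plain Python on nested lists and assumes stride-1 pooling with implicit border padding, which is what keeps every pooled map at 19 × 19; the stride and padding are not stated in this document, and the function names are hypothetical.

```python
def max_pool_same(grid, k):
    """Stride-1 max pooling over a k x k window, clipped at the borders,
    so the output has the same height and width as the input."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [grid[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

def spp(channels, kernels=(5, 9, 13)):
    """Concatenate the input channels with one pooled copy per kernel size."""
    out = list(channels)
    for k in kernels:
        out.extend(max_pool_same(c, k) for c in channels)
    return out

# Two toy 19 x 19 channels stand in for the 512 real ones.
toy = [[[(x * y) % 7 for x in range(19)] for y in range(19)] for _ in range(2)]
pooled = spp(toy)
print(len(toy), "channels in ->", len(pooled), "channels out")
```

With 512 input channels the same concatenation yields 512 × 4 = 2048 channels, matching the 19 × 19 × 2048 output stated above.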

As shown in FIG. 4, the feature fusion module structure includes a first upsampling layer1, a second upsampling layer2, a first splicing layer3 and a second splicing layer4 which are connected in sequence according to the data flow. The 608 × 608 input image passes through the feature extraction network for deep feature extraction to generate a feature map layer1 (38 × 38) and a feature map layer2 (19 × 19); layer2 is reduced in dimension by a 1 × 1 convolution to generate layer3 (19 × 19), and the predicted feature map Predict1 is generated from layer3. An upsampling operation is applied to the layer3 features to generate a new 38 × 38 feature layer of the same size as layer1; this new layer is fused with layer1 (38 × 38) to generate layer4 (38 × 38), and the predicted feature map Predict2 is generated from layer4. By constructing a deep feature pyramid and fusing the low-level features with the high-level features, stronger semantic information and more accurate position information can be obtained.
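The upsample-and-concatenate step that produces layer4 can be sketched as follows. The sketch uses nearest-neighbour upsampling and toy channel counts, both of which are assumptions for illustration; this document does not specify the interpolation mode, and the function names are hypothetical.

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of one channel (list of rows)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(deep_channels, shallow_channels):
    """Upsample the deep (19 x 19) channels and concatenate them with the
    shallow (38 x 38) channels along the channel axis."""
    up = [upsample2x(c) for c in deep_channels]
    if len(up[0]) != len(shallow_channels[0]):
        raise ValueError("spatial sizes do not match after upsampling")
    return list(shallow_channels) + up

layer3 = [[[1] * 19 for _ in range(19)] for _ in range(4)]   # toy: 4 channels
layer1 = [[[0] * 38 for _ in range(38)] for _ in range(2)]   # toy: 2 channels
layer4 = fuse(layer3, layer1)
print(len(layer4), "channels of size", len(layer4[0]), "x", len(layer4[0][0]))
```

Concatenation keeps the shallow channels untouched, which is how low-level information is carried forward next to the upsampled deep features.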

3. Head network head

The output head of the lightweight YOLO network in this embodiment includes a first scale classifier and a second scale classifier; the first scale classifier receives the 19 × 19 feature map output by the feature fusion module, and the second scale classifier receives the 38 × 38 feature map output by the feature fusion module.

The input image passes through the feature extraction network CSPDarknet27, which extracts the deepest-level features and outputs a 19 × 19 feature map; the neck network neck further integrates these features and outputs them directly to the first scale classifier for target detection. At the same time, the 19 × 19 features are upsampled by a factor of 2 and concatenated with the 38 × 38 feature map, and the concatenated feature map is output directly to the second scale classifier for target detection.

This embodiment uses the fused features at the 19 × 19 and 38 × 38 scales to detect targets in the image, and each scale classifier uses independent logistic classifiers. Taking the 19 × 19 feature map of the first scale classifier as an example, the input image is divided into 19 × 19 cells; if the center of a detection target falls within a cell, that cell is responsible for the target. Each cell outputs three prediction boxes, for a total of 19 × 19 × 3 = 1083 prediction boxes. When the confidence of a detection exceeds the threshold set in the network parameters, the prediction boxes generated in that cell are retained, and finally the optimal bounding box is selected by non-maximum suppression (NMS).
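The prediction count and the final filtering step can be sketched in plain Python. The greedy, class-agnostic NMS below is one common formulation; the threshold values 0.5 and 0.45 and all function names are assumptions, since this document only says that a confidence threshold is set in the network parameters.

```python
def prediction_count(grid, boxes_per_cell=3):
    """Number of prediction boxes produced by one detection scale."""
    return grid * grid * boxes_per_cell

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.45):
    """Greedy NMS over (x0, y0, x1, y1, score) tuples.
    Both thresholds are assumed values for illustration."""
    candidates = sorted((d for d in detections if d[4] > conf_thresh),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        if all(iou(d, k) <= iou_thresh for k in kept):
            kept.append(d)
    return kept

print(prediction_count(19), prediction_count(38))  # boxes per scale
```

A per-class variant would run the same loop separately for coal and for gangue detections.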

Prediction at two scales is chosen according to the features of the coal and gangue targets. The backbone network CSPDarknet27 extracts the deepest feature map, 19 × 19, which has a large receptive field and is suited to detecting large targets; the shallower 38 × 38 feature map has a medium receptive field and is suited to detecting medium-sized targets; small targets can be handled by other sorting processes. The improved YOLOv4 algorithm of this embodiment is therefore suited to the coal and gangue features and has a good detection effect on targets of these sizes.

Step S4: performing transfer learning from the weights of the original network obtained in step S2, training the improved YOLOv4 algorithm with the data set produced in step S1, and obtaining the trained weights by screening. During model training the number of iterations is set to, for example, 10000, and one model is saved every 1000 iterations, giving 10 models in total; the optimal training model, namely the training weights, is selected by analyzing and comparing the average precision and the detection rate of the 10 models, and the training weights are loaded into the coal and gangue target detection network flow.
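The checkpoint schedule and the screening step can be sketched as below. This document does not state how average precision and detection rate are weighed against each other, so the ranking rule used here (highest mAP first, detection rate as tie-breaker) is purely an assumption, as are the function and field names.

```python
def checkpoint_iterations(total=10000, every=1000):
    """Iterations at which a model is saved during training."""
    return list(range(every, total + 1, every))

def pick_best(results):
    """Select one checkpoint from a list of dicts with keys
    'iteration', 'map' and 'fps'. Assumed rule: highest mAP wins,
    and detection rate (fps) breaks ties."""
    return max(results, key=lambda r: (r["map"], r["fps"]))

saved = checkpoint_iterations()
print(len(saved), "checkpoints:", saved)
```

In practice the `results` list would be filled by evaluating each saved weight file on the validation set.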

In the training process, a transfer learning strategy is adopted: transfer learning starts from the weights of the original network trained in step S2, the weights obtained by the original network on the data set and the parameters of the convolution kernels in each convolutional layer are used as the reference for training the improved YOLOv4 algorithm, and the knowledge that the original network has learned from the coal and gangue data set is transferred to the improved algorithm, which avoids a huge waste of time and computing resources.

In the training process of this embodiment, 20% of the data set is set aside as the validation set and the remaining 80% forms the training set. With the corresponding parameters and the number of iteration steps set, the sample images of the training set are input into the improved YOLOv4 algorithm, and the optimal weight file for coal and gangue detection is obtained through training. The CIoU bounding-box loss function is used to continuously adjust the direction of network training, and the mAP value on the validation set is calculated to verify whether training has achieved the expected effect. The optimal weight file is selected by screening the trained weight files and is loaded into the improved YOLOv4 algorithm to serve as the model for detecting coal and gangue.
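The 80/20 division can be sketched with the standard library. Shuffling before the split and the fixed seed are assumptions made so the example is reproducible; this document does not say how the validation images are chosen, and the function name is hypothetical.

```python
import random

def split_dataset(items, val_fraction=0.2, seed=0):
    """Shuffle the items and return (training set, validation set)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

# Image indices stand in for the 800 augmented images.
train, val = split_dataset(range(800))
print(len(train), "training images,", len(val), "validation images")
```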

In this embodiment, a two-scale training method is adopted, which lightens the original network to increase the detection speed without losing detection precision; the following strategies are adopted in the training process to ensure the effectiveness of the training:

1. setting batch to 16 and subdivisions to 8, that is, during training 16 images are loaded into memory at a time and forward propagation is completed in 8 passes of 2 images each; after the forward propagation of all 16 images, one backward propagation is performed;

2. introducing a momentum parameter (momentum) of 0.949, which influences the speed at which gradient descent approaches the optimal value, and introducing a weight decay regularization term (decay) of 0.0005 to prevent overfitting;

3. adopting a label smoothing strategy, which introduces a penalty factor so that the model does not become over-confident during training, thereby preventing overfitting; the learning rate is decayed by cosine annealing, first increasing and then decreasing;

4. using a bounding-box loss function based on CIoU: the detection result of this embodiment outputs a prediction box for each target, and the amount of overlap between the prediction box and the ground-truth box reflects the quality of the detection result. IoU (Intersection over Union), the ratio of the intersection to the union of the prediction box and the ground-truth box, is generally used as the evaluation index for bounding-box regression in the target detection field, but IoU has an obvious defect as a loss function: it cannot accurately reflect the degree of coincidence of the two boxes (for example, it is zero for any pair of non-overlapping boxes, however far apart they are). Therefore, this embodiment adopts CIoU (Complete Intersection over Union) as the bounding-box regression loss function;

5. using the Dropout approach: during the training phase, each neuron node in the network, together with all input and output edges connected to it, is randomly removed with probability p; during the testing phase all neurons are active, but their activation values are scaled by a factor of (1-p) to compensate for the activations dropped during training, which helps prevent overfitting;

6. setting multi-scale anchors: the sizes of the labelled boxes in the produced coal and gangue data set are clustered with the k-means method; because two scales are used with 3 anchors per scale, there are 6 anchors in total, and the anchor sizes best suited to the coal and gangue data set are, from small to large, (44,63), (54,84), (58,57), (74,84), (76,66), (78,108).
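Several of the strategies listed above reduce to short formulas, sketched here in plain Python. The batch and subdivision values come from the text; the smoothing factor, the base learning rate, the warm-up length and the linear warm-up shape are assumptions for illustration, and the CIoU function follows the commonly published definition rather than anything spelled out in this document.

```python
import math

def mini_batch_size(batch=16, subdivisions=8):
    """Images per forward pass when a batch is split into subdivisions."""
    return batch // subdivisions

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform prior.
    eps is an assumed value."""
    k = len(one_hot)
    return [y * (1 - eps) + eps / k for y in one_hot]

def learning_rate(step, total_steps=10000, base_lr=0.001, warmup=1000):
    """Linear warm-up followed by cosine annealing (rise, then fall).
    base_lr, warmup and the linear ramp are assumed values."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

def ciou_loss(pred, truth, eps=1e-9):
    """CIoU loss for boxes (x0, y0, x1, y1): 1 - IoU + rho^2/c^2 + alpha*v."""
    px0, py0, px1, py1 = pred
    tx0, ty0, tx1, ty1 = truth
    pw, ph = px1 - px0, py1 - py0
    tw, th = tx1 - tx0, ty1 - ty0
    iw = max(0.0, min(px1, tx1) - max(px0, tx0))
    ih = max(0.0, min(py1, ty1) - max(py0, ty0))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Squared distance between the two box centres.
    rho2 = ((px0 + px1 - tx0 - tx1) ** 2 + (py0 + py1 - ty0 - ty1) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(px1, tx1) - min(px0, tx0)) ** 2 + (max(py1, ty1) - min(py0, ty0)) ** 2 + eps
    # Aspect-ratio consistency term and its weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

print(mini_batch_size(), "images per forward pass")
print(smooth_labels([1.0, 0.0]))
```

Unlike a plain IoU loss, `ciou_loss` keeps growing as two non-overlapping boxes move further apart, so it still provides a useful training signal in that case.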

Step S5: inputting the coal and gangue images on the undetected moving belt into a coal and gangue target detection network flow based on an improved YOLOv4 algorithm, and outputting coal and gangue target detection results including position information of coal and gangue target areas, category information corresponding to each target area and confidence degree of target detection.

The trained model is embedded into the coal and gangue sorting detection module; the acquired images of the moving belt are transmitted to the detection model in real time, and the coal and gangue target detection result is output in real time, including the position information of the coal and gangue target areas, the category information corresponding to each target area, the confidence of target detection, and the like.

The training and detection flow of the improved YOLOv4 algorithm comprises four stages, namely image preprocessing, feature extraction, feature fusion and target detection. In the training stage, the images of the labelled coal and gangue data set are resized uniformly to 608 × 608 and input into the feature extraction network CSPDarknet27 to extract image features; the extracted features are then input into the two-scale feature fusion network, and the final detection result is output through non-maximum suppression (NMS); when the maximum number of iterations is reached, the optimal coal and gangue detection model is output. In the detection stage, the unlabelled coal and gangue images are resized uniformly to 608 × 608 and passed through the optimal detection model, which outputs the detection result.
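Because detection runs on images resized to 608 × 608, the predicted boxes have to be mapped back to the original frame before they can be used downstream. The sketch below assumes a plain stretch to 608 × 608 (no letterbox padding), which this document does not specify; the 1920 × 1080 frame size and the function name are hypothetical.

```python
def box_to_original(box, orig_size, net=608):
    """Map a box (x0, y0, x1, y1) from the net x net network input back to
    the original frame, assuming the frame was stretched to net x net."""
    w, h = orig_size
    x0, y0, x1, y1 = box
    return (x0 * w / net, y0 * h / net, x1 * w / net, y1 * h / net)

# Hypothetical camera frame of 1920 x 1080 pixels.
print(box_to_original((304, 304, 608, 608), (1920, 1080)))
```

If letterbox padding were used instead, the padding offsets would have to be subtracted before rescaling.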

As shown in FIG. 5, the actual detection effect diagram of the improved YOLOv4 algorithm, an unlabelled image is input into the optimal detection model and the detection result is output; the bounding boxes located by the network model tightly enclose the coal and gangue targets with high class probability, and no missed or false detections occur. The lightweight YOLOv4-Light network model does not lose detection performance because of its simplified structure: its detection effect is equivalent to that of YOLOv4, there are no missed or false detections, and the class probability is high.

It should be noted that, because a plurality of coal and gangue targets may exist in one frame of image, the target detection result output by the target detection network may include a plurality of target areas. How each detected target area is rendered on the image is not limited in the present application; for example, target areas of different categories may be drawn with boxes of different colors.

Based on the existing YOLOv4 algorithm, the improved YOLOv4 algorithm for coal and gangue detection further reduces the original backbone network CSPDarknet53 for feature extraction into a CSPDarknet27 network structure, and changes feature maps of three different scales output by the original network into feature maps of two different scales, which are 19 × 19 and 38 × 38 respectively. The application provides a YOLO network improvement method for real-time detection of moving coal and gangue, which improves average precision, reduces complexity of a model, shortens detection time, remarkably reduces the number of model parameters, is faster in running speed, and has the capability of real-time detection of moving coal and gangue.

It should be understood that, although the steps in the flowchart of FIG. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in FIG. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the specification merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed.

Claims

1. A real-time detection method for coal and gangue, comprising the step of constructing an improved YOLOv4 network structure, wherein: the improved YOLOv4 network structure comprises a backbone network (backbone), a neck network (neck) and a head; the backbone network is CSPDarknet27 and is used for extracting image features and generating a feature map; the feature map extracted by the backbone network is input into the neck network for further feature integration; and the head performs classification and target detection based on the integrated features.

2. The real-time coal gangue detection method as claimed in claim 1, wherein: the improved YOLOv4 network is trained at two scales, the trained weights are obtained through screening, and the trained weights are loaded into the improved YOLOv4 network.

3. The real-time coal gangue detection method as claimed in claim 2, wherein: the improved YOLOv4 network adopts transfer learning during training, so that the weights of the original YOLOv4 network are transferred to it.

4. The real-time coal gangue detection method as claimed in claim 1, wherein: the neck network (neck) comprises an SPP module and a feature fusion module; the SPP module is used for spatial pyramid pooling and receives the feature map generated by feature extraction in the CSPDarknet27 network; a pooling feature map is obtained through spatial pyramid pooling and splicing; and the feature fusion module fuses the pooling feature map with a feature map output by the CSPDarknet27 network structure.

5. The real-time coal gangue detection method as claimed in claim 4, wherein: the pooling feature map is obtained by performing multi-scale maximum pooling on feature maps generated by a CSPDarknet27 network and performing feature fusion.

6. The real-time coal gangue detection method as claimed in claim 5, wherein: the feature fusion module comprises a first up-sampling layer (layer1), a second up-sampling layer (layer2), a first splicing layer (layer3) and a second splicing layer (layer4), which are connected in sequence along the data flow direction; and the low-level features and high-level features of the spatial pyramid pool are fused by constructing a spatial pyramid pool with progressively deeper layers.

7. The real-time coal gangue detection method as claimed in claim 1, wherein: the head comprises a first scale classifier and a second scale classifier, wherein the first scale classifier is used for receiving the feature map output by the feature fusion module, and the second scale classifier is used for receiving the feature map output by the feature fusion module.

8. The real-time coal gangue detection method as claimed in claim 9, wherein: the head adopts a first scale classifier and a second scale classifier, the first scale classifier being suited to detecting larger targets and the second scale classifier being suited to detecting medium-sized targets.

9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor, when executing the program stored in the memory, implements the method steps of any one of claims 1 to 8.

10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method steps of any one of claims 1 to 8.
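By way of illustration only, the multi-scale maximum pooling and splicing recited in claims 4 and 5 may be sketched as follows. The sketch is not the claimed implementation: the kernel sizes 5, 9 and 13 are an assumption taken from the commonly published YOLOv4 SPP block, the feature map is represented as a plain list of 2-D channels, and the names `max_pool_same` and `spp` are hypothetical.

```python
def max_pool_same(channel, k):
    """Stride-1 max pooling over a k x k window; output size equals input size.

    Positions outside the map are ignored, which matches padding with -infinity.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                channel[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp(feature_map, kernel_sizes=(5, 9, 13)):
    """Splice the input channels with their pooled versions along the channel axis.

    kernel_sizes is an assumed default, not a value stated in the claims.
    """
    pooled = list(feature_map)  # identity branch keeps the original channels
    for k in kernel_sizes:
        pooled.extend(max_pool_same(ch, k) for ch in feature_map)
    return pooled


# Usage: a single-channel 19 x 19 map with one activated cell at the centre.
feature_map = [[[0.0] * 19 for _ in range(19)]]
feature_map[0][9][9] = 1.0
pooled_map = spp(feature_map)
print(len(pooled_map), len(pooled_map[0]), len(pooled_map[0][0]))  # 4 19 19
```

Because every pooling branch uses stride 1 and preserves the spatial size, the branches can be spliced channel-wise without resampling; only the channel count grows, here from 1 to 4.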