CN113158829A - Deep learning ore size measuring method and early warning system based on EfficientDet network - Google Patents

Deep learning ore size measuring method and early warning system based on EfficientDet network

Info

Publication number
CN113158829A
Authority
CN
China
Prior art keywords
ore
network
size
efficientdet
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110343840.8A
Other languages
Chinese (zh)
Inventor
段章领
周行云
盛一帆
朱明杰
徐岳
杨富超
胡倩凝
汪志敏
马腾
张馨雨
周明祎
熊天乐
潘悦靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202110343840.8A priority Critical patent/CN113158829A/en
Publication of CN113158829A publication Critical patent/CN113158829A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a deep learning ore size measuring method based on an EfficientDet network, together with an application system. The method comprises the following steps: acquiring ore images on a moving belt with a high-frame-rate camera; preprocessing the ore images by labeling all ores in each image, removing abnormal data, and applying data augmentation; dividing the labeled samples into a training set, a validation set and a test set in a 7:2:1 ratio; training an EfficientDet network to obtain a network model; at test time, locating each ore with the prediction box produced by the EfficientDet model and calculating the ore size from the camera focal length and the image pixel size; and issuing a warning whenever the ore size exceeds a preset system threshold. The invention detects ore size efficiently, uses fewer parameters and runs faster than comparable network models, and thus greatly reduces the dependence on manual labor.

Description

Deep learning ore size measuring method and early warning system based on EfficientDet network
Technical Field
The invention relates to an image target identification method, in particular to a deep learning ore size measuring method and an early warning system based on an EfficientDet network.
Background
Mineral resources are aggregates of minerals or useful elements, formed by geological mineralization and buried underground or exposed at the surface, whose content has industrial value. They are important natural resources and a material basis for the development of social production; modern production and daily life cannot do without them. Mineral resources are non-renewable and their reserves are limited, so the extent and depth of their utilization must be improved. According to their characteristics and uses they are generally divided into three categories: metal minerals, non-metal minerals and energy minerals. Metal ore is mined by blasting, and when ore of uneven size is conveyed to the primary crusher by a belt conveyor, oversized blocks easily damage the crusher body. At present, large ore blocks are mainly monitored and screened manually to reduce damage to the crusher. Manual screening has two main problems. First, safety: the hoisting-shaft belt conveyor works in a harsh environment with severe dust and noise, which harms workers over long exposure. Second, long monitoring shifts cause visual fatigue, so oversized blocks are easily missed and the crusher body is damaged. Some developed countries have already applied machine vision to ore mining to improve efficiency, while machine vision is still rarely used domestically. This patent applies machine vision from the deep learning field, building an EfficientDet detection network that detects all ore blocks in real time, locates them, and identifies their size; when a block exceeding the system threshold size is encountered, a real-time warning is raised and the belt control system is notified so that the belt can be stopped in time. The abnormal signal is also transmitted to an actuator, which removes the oversized ore. This greatly reduces manual labor, improves detection efficiency while ensuring worker safety, and sharply reduces missed detections. Model efficiency matters in computer vision: compared with earlier deep learning networks such as Mask R-CNN, the EfficientDet network achieves higher accuracy with fewer parameters and less computation, improving detection efficiency.
In summary, traditional ore size detection requires a large amount of manual work, while existing neural-network-based models have too many parameters and demand substantial computing power; these are the technical problems to be solved.
Disclosure of Invention
To address the problems in the prior art, a deep learning ore size measuring method and an early warning system based on an EfficientDet network are provided, solving the technical problems that ore size detection relies heavily on manual labor, has low efficiency and low accuracy, and that existing models have excessive parameters and require strong computing power. The method comprises: capturing video streams of ore transported on a belt with a high-frame-rate camera, converting the video streams into pictures, manually labeling the images, and removing abnormal ore pictures; dividing the obtained ore pictures into training, validation and test pictures in a 7:2:1 ratio; applying data augmentation to the ore pictures to improve generalization; creating an EfficientDet object detection network with an EfficientNet network as backbone and training it on the training data set to obtain a network model; and using the trained EfficientDet model at test time, where the resulting prediction boxes locate the ore and the ore size is calculated from the camera focal length and the image pixel size. According to a preset system threshold, when the ore size exceeds the threshold an early warning is issued, an abnormal signal is simultaneously transmitted to an actuator, and the actuator removes the oversized ore.
The invention adopts the following technical scheme to solve the above technical problems: a deep learning ore size measuring method and early warning system based on an EfficientDet network, used for detecting ore size in the mining industry and intelligently handling ore blocks that exceed a threshold size, comprising the following specific steps:
s1, data acquisition stage: shooting the ore on the moving belt with a high-speed camera and saving key frames of the video stream as pictures;
s2, data preprocessing stage: manually labeling the images and detecting and removing abnormal ore pictures; dividing the obtained ore pictures into training, validation and test pictures in a 7:2:1 ratio; applying data augmentation to the ore pictures to improve generalization;
s3, network creation and training stage: using an EfficientNet network as the backbone network and a BiFPN (bi-directional feature pyramid network) as the feature extraction network, selecting one of EfficientDet-D0 to D7 to create the EfficientDet object detection network, setting parameters such as the learning rate, batch size, number of training epochs and optimizer, and training on the ore training data set to obtain a network model;
s4, testing stage: testing with the trained EfficientDet network model, using the resulting prediction boxes to locate the ore and calculating the ore size from the camera focal length and the image pixel size;
s5, reminding stage: according to a preset system threshold, issuing an early warning when the ore size exceeds the threshold, simultaneously transmitting an abnormal signal to the actuator, and having the actuator remove the oversized ore.
The above scheme is further described:
the data acquisition in step S1 includes the following steps:
(1) acquiring ore picture video streams through a plurality of high frame rate cameras with different angles, which are arranged near a belt;
(2) intercepting key frames from the acquired picture video stream as image data;
the data preprocessing in step S2 includes the following steps:
(i) marking the position and size of each ore in the obtained picture data with the image annotation tool Labelme; because ore shapes are irregular, polygon annotation is required;
(ii) detecting and rejecting ore picture data that do not meet the requirements, mainly checking whether annotations extend beyond the picture boundary and whether coordinate positions are reversed;
(iii) applying data augmentation to the training pictures (a minimal sketch is given below): randomly flipping the ore pictures horizontally and vertically at different angles; scaling the ore pictures; adjusting pixel values by histogram equalization so that their distribution becomes uniform; adding random noise; and converting the images from the RGB color space to the HSV color space, adjusting brightness, normalizing the images, and handling noise, thereby improving the generalization ability of the network.
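A minimal sketch of the augmentations in step (iii), using OpenCV and NumPy, is given below; the parameter ranges are illustrative assumptions, and in a real pipeline the polygon labels must be transformed together with the image.

```python
# Illustrative augmentation sketch for step (iii); parameter ranges are assumptions.
import cv2
import numpy as np

def augment(img_bgr):
    """Return a randomly augmented copy of an ore picture (BGR, uint8)."""
    out = img_bgr.copy()
    # Random horizontal / vertical flips (matching labels must be flipped too).
    if np.random.rand() < 0.5:
        out = cv2.flip(out, 1)          # horizontal
    if np.random.rand() < 0.5:
        out = cv2.flip(out, 0)          # vertical
    # Random scaling of the picture.
    s = np.random.uniform(0.8, 1.2)
    out = cv2.resize(out, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    # Histogram equalization on the luminance channel so pixel values spread out.
    ycrcb = cv2.cvtColor(out, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    out = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    # Additive random (Gaussian) noise.
    noise = np.random.normal(0, 5, out.shape).astype(np.float32)
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # Brightness jitter in HSV space, then back to BGR.
    hsv = cv2.cvtColor(out, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(0.7, 1.3), 0, 255)
    out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Normalize to [0, 1] float for training.
    return out.astype(np.float32) / 255.0
```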
The step S3 of creating and training a network stage is composed of the following steps:
using an EfficientNet network as the backbone network, fusing the features extracted by EfficientNet with a BiFPN network, and having the Head of EfficientDet perform classification and regression prediction on the fused features;
(II) EfficientNet has multiple versions B0-B7, where EfficientNet-B0 consists of 1 Conv (3×3), 1 MBConv1 (3×3), 2 MBConv6 (3×3), 2 MBConv6 (5×5), 3 MBConv6 (3×3), 3 MBConv6 (5×5), 4 MBConv6 (5×5), one MBConv6 (3×3), one Conv (1×1), one pooling layer and one FC layer. Each MBConv block contains a residual structure: a 1×1 convolution first expands the channel dimension, a 3×3 or 5×5 convolution follows, a channel attention mechanism is then applied, a 1×1 convolution reduces the dimension again, and the result is stacked with the residual connection (a sketch of such a block is given below). The activation function of MBConv is the Swish function, and Batch Normalization is used for normalization;
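The MBConv structure described above can be sketched as a small PyTorch module. This is an illustrative approximation (the channel counts, expansion factor and attention ratio are assumptions), not the patent's or the official EfficientNet implementation.

```python
# Minimal MBConv-style block: 1x1 expansion -> depthwise conv -> channel
# attention -> 1x1 projection -> residual. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MBConv(nn.Module):
    def __init__(self, in_ch, out_ch, expand=6, kernel=3, se_ratio=0.25):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = (in_ch == out_ch)
        self.expand = nn.Sequential(                      # 1x1 dimension increase
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())               # SiLU is the Swish activation
        self.depthwise = nn.Sequential(                   # 3x3 or 5x5 depthwise conv
            nn.Conv2d(mid, mid, kernel, padding=kernel // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        se_ch = max(1, int(in_ch * se_ratio))             # channel attention branch
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, se_ch, 1), nn.SiLU(),
            nn.Conv2d(se_ch, mid, 1), nn.Sigmoid())
        self.project = nn.Sequential(                     # 1x1 dimension reduction
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        y = self.expand(x)
        y = self.depthwise(y)
        y = y * self.se(y)                                # re-weight channels
        y = self.project(y)
        return x + y if self.use_residual else y          # stack with the residual
```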
the definition of Swish function is
Figure 100002_DEST_PATH_IMAGE001
Wherein
Figure 100002_DEST_PATH_IMAGE002
Is a constant or trainable parameter;
Figure 365017DEST_PATH_IMAGE003
the function is expressed as follows:
Figure 100002_DEST_PATH_IMAGE004
Meanwhile, EfficientNet-B0 performs compound scaling of the network depth, width and resolution with the coefficients:
depth d = α^φ, width w = β^φ, resolution r = γ^φ
where α, β and γ are constants that can be determined by grid search and φ is the compound coefficient. Under the constraints
α·β²·γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1
the optimal values for EfficientNet-B0 are α = 1.2, β = 1.1, γ = 1.15.
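As a quick illustration of the two definitions above, the sketch below evaluates the Swish activation and the compound-scaling multipliers; the helper names are illustrative assumptions, not part of the patent.

```python
# Sketch of the Swish activation and the compound-scaling rule quoted above.
import math

def swish(x, beta=1.0):
    """swish(x) = x * sigmoid(beta * x)."""
    return x * (1.0 / (1.0 + math.exp(-beta * x)))

def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return the (depth, width, resolution) multipliers alpha^phi, beta^phi, gamma^phi."""
    # Constraint from the text: alpha * beta^2 * gamma^2 should stay close to 2,
    # so increasing phi by one roughly doubles the computation.
    assert abs(alpha * beta ** 2 * gamma ** 2 - 2.0) < 0.1
    return alpha ** phi, beta ** phi, gamma ** phi

print(swish(1.0))            # about 0.731
print(compound_scale(1))     # (1.2, 1.1, 1.15)
```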
The step S3 of creating a BiFPN includes the following steps:
(A) the BiFPN network learns the importance of different input features through learnable weights while repeatedly applying top-down and bottom-up multi-scale feature fusion. The enhanced extraction network is composed of several stacked BiFPN layers; EfficientDet-D0 through D7 are composed of 3, 4, 5, 6, 7, 7, 8 and 8 BiFPN layers respectively;
(B) EfficientDet comprises the EfficientNet backbone extraction network and the BiFPN enhanced extraction network, and the EfficientHead converts the extracted features into prediction results. The EfficientNet backbone first repeatedly downsamples the input picture; the original EfficientNet downsamples 5 times, so the feature layers P1, P2, P3, P4, P5 are obtained through EfficientNet, where P1 is the result of compressing the input picture's length and width once, P2 the result of compressing them twice, P3 the result of compressing them three times, and so on. Because P1 and P2 do not carry high-level semantic information, they are not used in the enhanced extraction network BiFPN; P3, P4 and P5 do carry high-level semantic information and are used as three of the 5 effective feature layers. P5 is downsampled twice more to obtain the higher-semantic layers P6 and P7, yielding the 5 effective feature layers P3, P4, P5, P6, P7.
(C) these 5 feature layers are passed into the enhanced extraction network BiFPN for further feature extraction, where P_l denotes a feature level whose resolution is 1/2^l of the input image. For example, with an input resolution of 640×640, P3 denotes feature level 3 with resolution 80×80 (640/2^3 = 80), and P7 denotes feature level 7 with a resolution of 5×5. The specific steps are as follows:
① the channel numbers are first adjusted to obtain P3_in, P4_in, P5_in, P6_in, P7_in; if entering the BiFPN for the first time, P4_in is split by channel reduction into P4_in_1 and P4_in_2, and P5_in is split by channel reduction into P5_in_1 and P5_in_2;
② having obtained P3_in, P4_in_1, P4_in_2, P5_in_1, P5_in_2, P6_in and P7_in, P7_in is upsampled and fused with P6_in, with the attention mechanism judging whether more attention is paid to P7_in or to P6_in; the result is then activated by the swish function and convolved to obtain P6_td;
③ P6_td is upsampled and fused with P4_in... no, with P5_in_1, with the attention mechanism judging whether more attention is paid to P6_td or to P5_in_1; the result is then activated by the swish function and convolved to obtain P5_td;
④ P5_td is upsampled and fused with P4_in_1, with the attention mechanism judging whether more attention is paid to P5_td or to P4_in_1; the result is then activated by the swish function and convolved to obtain P4_td;
⑤ P4_td is upsampled and fused with P3_in, with the attention mechanism judging whether more attention is paid to P4_td or to P3_in; the result is then activated by the swish function and convolved to obtain P3_out;
⑥ having obtained P3_out, P4_td, P4_in_2, P5_td, P5_in_2, P6_in, P6_td and P7_in, P3_out is downsampled and fused with P4_td and P4_in_2, with the attention mechanism judging whether more attention is paid to P3_out, to P4_td or to P4_in_2; the result is then activated by the swish function and convolved to obtain P4_out;
⑦ P4_out is downsampled and fused with P5_td and P5_in_2, with the attention mechanism judging whether more attention is paid to P4_out, to P5_td or to P5_in_2; the result is activated by the swish function and convolved to obtain P5_out; P5_out is then downsampled and fused with P6_in and P6_td, with the attention mechanism judging whether more attention is paid to P5_out, to P6_in or to P6_td; the result is activated by the swish function and convolved to obtain P6_out; P6_out is then downsampled and fused with P7_in, with the attention mechanism judging whether more attention is paid to P6_out or to P7_in; the result is activated by the swish function and convolved to obtain P7_out;
⑧ P3_out, P4_out, P5_out, P6_out and P7_out are then taken as the new P3_in, P4_in, P5_in, P6_in, P7_in and the previous steps are repeated to stack BiFPN layers; for EfficientDet-D0 they are repeated 2 more times, and in these repetitions P4_in need not be split into P4_in_1 and P4_in_2, nor P5_in into P5_in_1 and P5_in_2. The fused features above can be described briefly at level 6 as:
P6_td = Conv( (w1·P6_in + w2·Resize(P7_in)) / (w1 + w2 + ε) )
P6_out = Conv( (w1'·P6_in + w2'·P6_td + w3'·Resize(P5_out)) / (w1' + w2' + w3' + ε) )
where P_l denotes the feature of level l, P6_td is the intermediate feature of level 6 on the top-down path, and P6_out is the output feature of level 6 on the bottom-up path;
when fusing features of different resolutions, a common approach is to first resize them to the same resolution and then sum them; this earlier approach treats all input features identically;
since different input features have different resolutions, their contributions to the output features are usually unequal;
BiFPN adds an extra weight to each input so that the network learns the importance of each input feature. BiFPN fuses with fast normalized fusion:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i
where each w_i is a learnable weight that can be a scalar (per feature), a vector (per channel) or a multidimensional tensor (per pixel). To avoid numerical instability, ε is set to the small value ε = 0.0001;
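The fast normalized fusion above can be sketched as a small PyTorch module; the module and variable names are illustrative assumptions rather than the patent's code.

```python
# Fast normalized fusion: O = sum_i w_i * I_i / (eps + sum_j w_j), w_i >= 0.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one learnable weight per input
        self.eps = eps

    def forward(self, inputs):
        w = F.relu(self.weights)              # keep the weights non-negative
        w = w / (self.eps + w.sum())          # fast normalized fusion
        return sum(wi * xi for wi, xi in zip(w, inputs))

# Example: fuse the upsampled P7 with P6 (both already at the same resolution);
# the result would then go through the swish activation and a convolution.
fuse = FastNormalizedFusion(num_inputs=2)
p6_in, p7_up = torch.randn(1, 64, 10, 10), torch.randn(1, 64, 10, 10)
p6_td = fuse([p6_in, p7_up])
```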
The width and depth of the BiFPN are scaled with the following formulas:
W_bifpn = 64 · (1.35^φ), D_bifpn = 3 + φ
where 1.35 is the BiFPN width scaling factor and φ is the compound coefficient that controls all the other scaling dimensions. The width of the prediction network is the same as that of the BiFPN:
W_pred = W_bifpn
and the depth of the prediction network is increased linearly with:
D_box = D_class = 3 + ⌊φ/3⌋
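To make the scaling rules concrete, the short loop below evaluates them for φ = 0 to 7. It only illustrates the formulas as quoted; published EfficientDet configurations may round some of these values differently.

```python
# Evaluate the BiFPN scaling formulas quoted above for EfficientDet-D0 ... D7.
for phi in range(8):
    w_bifpn = 64 * 1.35 ** phi        # BiFPN / prediction-network width
    d_bifpn = 3 + phi                 # number of BiFPN layers
    d_head = 3 + phi // 3             # depth of the class/box prediction network
    print(f"D{phi}: width {w_bifpn:.1f}, BiFPN depth {d_bifpn}, head depth {d_head}")
```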
the step S3 is to construct a loss function, which includes the following steps:
(P1) the difference between the network output and the ground truth is calculated with the following loss function:
L = L_cls + L_reg
where L_cls is the object classification loss and L_reg is the regression loss. L_reg uses the Smooth-L1 loss function:
smooth_L1(x) = 0.5·x² if |x| < 1, and |x| − 0.5 otherwise.
(P2) an image has many candidate boxes; those containing an object are positive samples and those containing no object are negative samples. If sample 1 belongs to class 1 with probability 0.9 and sample 2 belongs to class 1 with probability 0.6, the former is an easy sample and the latter a hard sample. The classification loss in EfficientDet is the focal loss, which can control the weighting of positive versus negative samples as well as of easy versus hard samples;
(P3) the focal loss is derived from the cross-entropy loss function. The binary cross-entropy loss is:
CE(p, y) = −log(p) if y = 1, and −log(1 − p) otherwise.
Using p_t = p if y = 1 and p_t = 1 − p otherwise, the cross-entropy loss simplifies to:
CE(p_t) = −log(p_t)
The weighting of positive and negative samples is controlled by a coefficient α_t placed before the cross-entropy loss:
CE(p_t) = −α_t · log(p_t)
The weighting of easy and hard samples is controlled by the factor (1 − p_t)^γ, which is referred to as the modulation factor; when γ = 0 the focal loss reduces to the cross-entropy loss, and changing γ changes the modulation factor. Combining the two kinds of weights gives:
FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t)
where α = 0.25 and γ = 1.5.
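A compact PyTorch sketch of this focal loss is shown below, assuming sigmoid outputs per anchor and class; it illustrates the formula rather than reproducing any particular EfficientDet implementation.

```python
# Focal loss with alpha = 0.25 and gamma = 1.5, applied element-wise.
import torch

def focal_loss(pred_logits, targets, alpha=0.25, gamma=1.5):
    """targets is 0/1 with the same shape as pred_logits."""
    p = torch.sigmoid(pred_logits)
    p_t = torch.where(targets == 1, p, 1 - p)                 # p_t as defined in the text
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))      # positive/negative weighting
    modulation = (1 - p_t) ** gamma                            # easy/hard weighting
    loss = -alpha_t * modulation * torch.log(p_t.clamp(min=1e-6))
    return loss.mean()

logits = torch.randn(8, 9)                 # e.g. 8 anchors x 9 classes
labels = torch.randint(0, 2, (8, 9)).float()
print(focal_loss(logits, labels))
```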
the testing stage in the step S4 is composed of the following steps:
(Q1) to distinguish them from ordinary feature layers, the feature layers extracted by the BiFPN are called effective feature layers; the five effective feature layers are passed into ClassNet (the classification prediction network) and BoxNet (the regression prediction network) to obtain the prediction results;
(Q2) for EfficientDet-D0, ClassNet applies 3 depthwise-separable convolutions of 64 channels followed by 1 convolution whose number of filters equals the number of prior boxes of the feature layer multiplied by the number of object classes of the network; the default number of prior boxes is 9. BoxNet applies 3 convolutions of 64 channels followed by 1 convolution whose number of filters equals the number of prior boxes multiplied by four, where the four values are the adjustments of the prior box: its center position, width and height. EfficientDet uses 9 prior boxes by default, and their aspect ratios can be adjusted to suit the detected targets. EfficientDet judges whether each prior box contains an object and of which class, adjusts the prior box, and uses soft non-maximum suppression (soft-NMS) to keep, within a given area, only the box of the same class with the highest confidence, yielding the final prediction box;
(Q3) a reserved ore test picture is fed into the EfficientDet network for prediction to obtain the prediction boxes of the ore in the picture; the size of each prediction box is calculated, and the real ore size is then calculated from the prediction box size and the error;
(Q4) the real ore size is calculated from the camera installation angle, the camera-to-belt distance, the size of the network model prediction box and the image pixels:
D = (d · h) / (f · cos θ)
where θ is the angle between the camera and the vertical to the belt, D is the real size of the ore, h is the vertical distance between the camera and the belt, f is the focal length of the camera, and d is the size of the EfficientDet network model prediction box.
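A hedged sketch of this calculation follows; it assumes the pinhole-camera reading of the variables defined above (the original formula image is not reproduced in this text), so the exact expression should be treated as an assumption.

```python
# Ore size from camera geometry under a pinhole-camera assumption.
import math

def ore_size_mm(box_px, h_mm, focal_px, theta_deg):
    """box_px:    prediction-box size in pixels (d)
       h_mm:      vertical camera-to-belt distance (h)
       focal_px:  camera focal length expressed in pixels (f)
       theta_deg: angle between the camera axis and the vertical (theta)."""
    distance_along_axis = h_mm / math.cos(math.radians(theta_deg))
    return box_px * distance_along_axis / focal_px

# Example: 120 px box, camera 1.5 m above the belt, f = 1400 px, 20 degree tilt.
print(ore_size_mm(120, 1500.0, 1400.0, 20.0))
```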
The reminding stage in the step S5 comprises the following steps:
(T1) the system threshold is set as the sum of the ore-size alarm threshold and the error threshold, and the calculated real ore size is compared with the system threshold;
(T2) when the ore size exceeds the system threshold, an early warning is issued, an abnormal signal is simultaneously transmitted to the actuator, and the actuator removes the oversized ore.
The early warning system comprises the following parts:
(D1) the user module is responsible for registration, login and management of users, displays personal user information and provides management authority of super users;
(D2) the ore video stream acquisition module is used for acquiring ore video streams through cameras arranged at various angles near the ore flow belt and sending acquired images to the real-time early warning module;
(D3) the real-time early warning module receives the ore images sent by the ore video stream acquisition module, feeds them into the trained EfficientDet network to obtain the position and size of the ore, and compares the calculated real ore size with the system threshold; if the real size is larger than the system threshold, the abnormal information is displayed on the page and written into the database;
(D4) the setting module is used for setting system related parameters and log related parameters, such as the size of the alarm ore block and the like;
(D5) and the log module is used for receiving the abnormal record of the real-time early warning system, displaying the abnormal record and synchronizing the abnormal record to the cloud server.
The user module of the early warning system Web end comprises the following parts:
(U1) providing registration and login functions for new administrators, who can operate and use the whole system; system information is stored on the cloud server and corresponding permissions are granted;
(U2) providing a super administrator with high-level permissions to operate the system and to manage other users while operating and using the whole system.
The real-time early warning module of the early warning system Web end comprises the following parts:
(R1) adding, deleting, modifying corresponding industrial personal computer devices;
(R2) displaying the live belt view of each added industrial personal computer device in real time, and supporting starting and stopping this display;
(R3) when the ore size at an industrial personal computer is detected to be larger than the system threshold, the abnormal information is displayed; the user can decide from this information whether to stop that industrial personal computer, and the industrial personal computer identity, the abnormal time and the image of the abnormal record are stored in the database and written into the log module; the abnormal records in the database are synchronized to the cloud server at intervals.
The setting module of the early warning system Web end comprises the following parts:
the related information of the early warning system is set, and the ore block size of the early warning of the system, the size of the system error, the focal length of the camera, the cloud server port and the like are mainly set.
The log module of the early warning system Web end comprises the following parts:
(Z1) receiving the abnormal record of the real-time alarm system, displaying the abnormal record on a page in real time, and writing the abnormal record into a database;
(Z2) selecting whether to synchronize the log to the cloud server;
(Z3) providing a search function over alarm records, for example by industrial personal computer information, time or threshold.
The user module of the early warning system APP end comprises the following parts:
(M1) keeping synchronized with the Web-end database, providing registration and login functions for new administrators, who can operate and use the whole system; the information is stored on the cloud server and corresponding permissions are granted;
(M2) providing a super administrator with high-level permissions to operate the system and to manage other users while operating and using the whole system.
Real-time early warning module of early warning system APP end comprises following part:
(X1) synchronizing the content of the industrial personal computer equipment in the database, and adding, deleting and modifying the corresponding industrial personal computer equipment;
(X2) displaying the live belt view of each added industrial personal computer device in real time, and supporting starting and stopping this display;
(X3) when the ore size at an industrial personal computer is detected to be larger than the system threshold, the abnormal information is displayed; the user can decide from this information whether to stop that industrial personal computer, and the industrial personal computer identity, the abnormal time and the image of the abnormal record are stored in the database and written into the log module; the abnormal records in the database are synchronized to the cloud server at intervals.
The log module of the APP end of the early warning system comprises the following parts:
(Y1) real-time querying the database to display the abnormal information on the page;
(Y2) providing a search function over alarm records, for example by industrial personal computer information, time or threshold.
As described above, the invention provides a deep learning ore size measuring method and early warning system based on the EfficientDet network, and through the EfficientDet network model solves the problems of heavy reliance on manual labor, low efficiency, low accuracy and excessive model parameters in ore size detection.
Drawings
FIG. 1 is a schematic diagram of the steps of the ore size measurement method of the present invention;
FIG. 2 is a schematic diagram of an EfficientDet framework network EfficientNet according to the present invention;
FIG. 3 is a schematic diagram of the architecture of the EfficientDet feature extraction network BiFPN of the present invention;
FIG. 4 is a schematic diagram illustrating the overall structure of the EfficientDet network according to the present invention;
FIG. 5 is a flowchart illustrating step S1 in FIG. 1 in one embodiment;
FIG. 6 is a flowchart illustrating a specific example of step S2 shown in FIG. 1;
FIG. 7 is a flowchart illustrating a specific example of step S3 shown in FIG. 1;
FIG. 8 is a flowchart illustrating a specific example of step S4 shown in FIG. 1;
FIG. 9 is a flowchart illustrating a specific example of step S5 shown in FIG. 1;
FIG. 10 is a schematic diagram of a deep learning ore size measurement method and an early warning system module based on the EfficientDet network according to the present invention;
FIG. 11 is a block diagram illustrating an embodiment of the user module M3 of FIG. 10;
FIG. 12 is a block diagram of the real-time warning module M2 of FIG. 10 in an embodiment;
FIG. 13 is a block diagram illustrating the setting module M4 of FIG. 10 in one embodiment;
FIG. 14 is a block diagram of the log module M5 of FIG. 10 in one embodiment;
reference numerals: m1, a video stream acquisition module; m2 real-time early warning module; an M21APP terminal; an M22Web end; an M3 user module; m4 setup module; an M5 log module; an M6 database module; S1-S5 are method steps; S11-S14 are method steps; S21-S25 are method steps; S31-S36 are method steps; S41-S43 are method steps; S51-S54 are the steps of the method.
Detailed Description
According to fig. 1, 2 and 3, which show the steps of the ore size measuring method, the architecture of the EfficientDet backbone network EfficientNet and the architecture of the feature extraction network BiFPN of the invention, the invention aims to solve, through an EfficientDet network model, the problems of heavy reliance on manual labor, low efficiency, low accuracy and excessive model parameters in ore size detection. EfficientDet has 8 versions, EfficientDet-D0 to EfficientDet-D7; as the version number increases, the model parameters and computation increase, and so does the accuracy. An appropriate network version is selected for the application scenario; EfficientDet-D0 is taken as the example here. The method comprises the following steps:
s1, data acquisition stage: shooting ores on a flowing belt through a high-speed camera, and storing key frames of a video stream as pictures;
s2, data preprocessing stage: manually labeling the images and detecting and removing abnormal ore pictures; dividing the obtained ore pictures into training, validation and test pictures in a 7:2:1 ratio; applying data augmentation to the ore pictures to improve generalization;
s3, network creating and training stage: using an EfficientNet network as a backbone, creating an EfficientDet target detection network, and obtaining a network model by using a training data set;
s4, testing: testing by using the trained EfficientDet network model, using the obtained prediction frame for positioning the position of ore discharge, and calculating the size of the ore through the focal length of the camera, the distance between the camera and the belt and the size of the image;
s5, reminding: and sending out a prompt when the ore size exceeds the system threshold value according to the preset threshold value of the system.
Referring to fig. 5, it is a specific implementation step of S1, including the following steps:
s11, installing the camera at vertical height L above the moving belt so that its field of view covers the full transverse width of the belt and a complete ore video stream is obtained;
s12, setting the camera parameters, including resolution, frame rate and color space, to obtain a clear ore video;
s13, selecting a suitable storage format for the acquired video data and saving the video as a video stream, here in MJPEG format;
and s14, extracting key frame data from the video stream to serve as ore pictures for the subsequent training, validation and testing (a minimal frame-extraction sketch is given below).
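A minimal OpenCV sketch of steps S13-S14 follows; the file paths and sampling interval are illustrative assumptions.

```python
# Read the MJPEG stream and save every N-th frame as an ore picture.
import cv2

def save_key_frames(stream="ore_belt.mjpeg", out_dir="frames", every_n=30):
    cap = cv2.VideoCapture(stream)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:                         # end of stream
            break
        if idx % every_n == 0:             # simple key-frame sampling
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

if __name__ == "__main__":
    print(save_key_frames())
```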
Referring to fig. 6, it is a specific implementation step of S2, including the following steps:
and s21, marking the position and size of each ore in the acquired image data with the image annotation tool Labelme; because ore shapes are irregular, polygon annotation is required and is saved as a json file. The json file has the same name as the annotated image, and the picture is stored inside it as a Base64-encoded string (a sketch of reading such a file is given after this list);
s22, processing the ore pictures according to the marked data, detecting the pictures which do not meet the requirements, and deleting the pictures;
s23, dividing the obtained ore pictures into training, validation and test pictures in a 7:2:1 ratio;
s24, performing data enhancement on the training picture: horizontally and vertically reversing the ore picture, zooming the ore picture and changing the corresponding labeling position at the same time;
s25, adding random noise; converting the images from the RGB color space to the HSV color space and adjusting brightness to adapt to different illumination changes; and normalizing the images to improve the training effect.
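As referenced in step s21, a minimal sketch for reading one Labelme json annotation follows; the field names follow the usual Labelme format, and the polygon-to-box conversion is an illustrative assumption.

```python
# Decode the Base64 image from a Labelme json file and turn each polygon into
# an axis-aligned box for EfficientDet-style training.
import base64
import json
import numpy as np
import cv2

def load_labelme(path):
    with open(path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    img_bytes = base64.b64decode(ann["imageData"])             # Base64-encoded picture
    image = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
    boxes = []
    for shape in ann["shapes"]:                                 # one polygon per ore
        pts = np.array(shape["points"], dtype=np.float32)
        x1, y1 = pts.min(axis=0)
        x2, y2 = pts.max(axis=0)
        boxes.append((shape.get("label", "ore"), x1, y1, x2, y2))
    return image, boxes
```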
Referring to fig. 7, the specific implementation step of S3 includes the following steps:
s31, selecting the EfficientNet-B0 backbone type and setting the hyper-parameters of the training network: learning rate, batch size, number of training rounds (epochs), optimizer (such as SGD or Adam), and so on (a configuration sketch is given after this step). EfficientNet-B0 consists of 1 Conv (3×3), 1 MBConv1 (3×3), 2 MBConv6 (3×3), 2 MBConv6 (5×5), 3 MBConv6 (3×3), 3 MBConv6 (5×5), 4 MBConv6 (5×5), one MBConv6 (3×3), one Conv (1×1), one pooling layer and one FC layer. Each MBConv block contains a residual structure: a 1×1 convolution first expands the channel dimension, a 3×3 or 5×5 convolution follows, a channel attention mechanism is then applied, a 1×1 convolution reduces the dimension again, and the result is stacked with the residual connection. The activation function of MBConv is the Swish function, and Batch Normalization is used for normalization;
the Swish function is defined as:
swish(x) = x · σ(β·x)
where β is a constant or trainable parameter and σ is the sigmoid function:
σ(z) = 1 / (1 + e^(−z))
performing network scaling, and determining that the optimal value of EfficientNet-B0 scaling is alpha =1.2, beta =1.1 and gamma = 1.15;
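An illustrative configuration for step s31 is sketched below; the specific values are assumptions for a typical EfficientDet-D0 run and are not fixed by the patent.

```python
# Example hyper-parameter configuration and optimizer construction for s31.
train_config = {
    "backbone": "efficientnet-b0",
    "image_size": 512,
    "learning_rate": 1e-3,
    "batch_size": 8,
    "epochs": 100,
    "optimizer": "SGD",          # SGD or Adam, as noted in s31
    "momentum": 0.9,
    "weight_decay": 4e-5,
    "num_anchors": 9,            # default prior boxes per location
}

def build_optimizer(params, cfg=train_config):
    import torch
    if cfg["optimizer"] == "SGD":
        return torch.optim.SGD(params, lr=cfg["learning_rate"],
                               momentum=cfg["momentum"],
                               weight_decay=cfg["weight_decay"])
    return torch.optim.Adam(params, lr=cfg["learning_rate"],
                            weight_decay=cfg["weight_decay"])
```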
s32, the BiFPN network learns the importance of different input features while repeatedly applying top-down and bottom-up multi-scale feature fusion. The enhanced extraction network consists of several stacked BiFPN layers; EfficientDet-D0 consists of 3 BiFPN layers;
according to fig. 4, EfficientDet includes the EfficientNet backbone extraction network and the BiFPN enhanced extraction network, and the EfficientHead converts the extracted features into prediction results. The EfficientNet backbone first repeatedly downsamples the input picture; the original EfficientNet downsamples 5 times, so the feature layers P1, P2, P3, P4, P5 are obtained through EfficientNet, where P1 is the result of compressing the input picture's length and width once, P2 the result of compressing them twice, P3 the result of compressing them three times, and so on. Because P1 and P2 do not carry high-level semantic information, they are not used in the enhanced extraction network BiFPN; P3, P4 and P5 do carry high-level semantic information and are used as three of the 5 effective feature layers. P5 is downsampled twice more to obtain the higher-semantic layers P6 and P7, yielding the 5 effective feature layers P3, P4, P5, P6, P7. These 5 feature layers are passed into the enhanced extraction network BiFPN for further feature extraction, where P_l denotes a feature level whose resolution is 1/2^l of the input image; for example, with an input resolution of 640×640, P3 denotes feature level 3 with resolution 80×80 (640/2^3 = 80), and P7 denotes feature level 7 with a resolution of 5×5;
the two fusion operations shown in fig. 3, taking level 6 as the example, are:
P6_td = Conv( (w1·P6_in + w2·Resize(P7_in)) / (w1 + w2 + ε) )
P6_out = Conv( (w1'·P6_in + w2'·P6_td + w3'·Resize(P5_out)) / (w1' + w2' + w3' + ε) )
where P_l denotes the feature of level l, P6_td is the intermediate feature of level 6 on the top-down path, and P6_out is the output feature of level 6 on the bottom-up path;
when fusing features of different resolutions, a common approach is to first resize them to the same resolution and then sum them; this earlier approach treats all input features identically;
since different input features have different resolutions, their contributions to the output features are usually unequal;
BiFPN adds an extra weight to each input so that the network learns the importance of each input feature. BiFPN uses fast normalized fusion:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i
where each w_i is a learnable weight that can be a scalar (per feature), a vector (per channel) or a multidimensional tensor (per pixel). To avoid numerical instability, ε is set to the small value ε = 0.0001;
The width and depth of the BiFPN are scaled with the following formulas:
W_bifpn = 64 · (1.35^φ), D_bifpn = 3 + φ
where 1.35 is the BiFPN width scaling factor and φ is the compound coefficient that controls all the other scaling dimensions;
the width of the prediction network is the same as that of the BiFPN:
W_pred = W_bifpn
and the depth of the prediction network is increased linearly with:
D_box = D_class = 3 + ⌊φ/3⌋
s33, the difference between the network output and the ground truth is calculated with the following loss function:
L = L_cls + L_reg
where L_cls is the object classification loss and L_reg is the regression loss. L_reg uses the Smooth-L1 loss function:
smooth_L1(x) = 0.5·x² if |x| < 1, and |x| − 0.5 otherwise.
L_cls uses the focal loss, shown below, where (1 − p_t)^γ is referred to as the modulation factor; when γ = 0 the focal loss reduces to the cross-entropy loss, and changing γ changes the modulation factor. Combining the two kinds of weights gives:
FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t)
where α = 0.25 and γ = 1.5;
s34, visualizing the whole training process, judging whether the network is under-fit or over-fit by judging the performance of the EfficientDet-D0 network in a training set and a verification set, and adjusting parameters in a targeted manner, such as adjusting the learning rate, increasing the number of training rounds, replacing an optimizer, adjusting the size of an anchor and the like;
s35, selecting whether to finish training in advance according to the loss function and the network performance;
s36, finishing the training to obtain the EfficientDet-D0 network model.
Referring to fig. 8, it is a specific implementation step of S4, including the following steps:
s41, predicting the pre-reserved test picture by using the previously trained EfficientDet-D0 network model;
s42, obtaining a prediction frame of the ore in the picture, and calculating the size of the prediction frame;
s43, calculating the real ore size from the camera installation angle, the camera-to-belt distance, the size of the network model prediction box and the image pixels:
D = (d · h) / (f · cos θ)
where θ is the angle between the camera and the vertical to the belt, D is the real size of the ore, h is the vertical distance between the camera and the belt, f is the focal length of the camera, and d is the size of the EfficientDet-D0 network model prediction box.
Referring to fig. 9, it is a specific implementation step of S5, including the following steps:
s51, setting the ore-size alarm threshold and the error threshold at the Web end of the system; the system threshold is the sum of the ore-size alarm threshold and the error threshold;
s52, comparing the real ore size calculated by the system with the system threshold;
s53, displaying abnormal information when the ore size at an industrial personal computer is detected to be larger than the system threshold;
and s54, the user can decide from the abnormal information whether to stop the industrial personal computer, and the industrial personal computer identity, the abnormal time and the abnormal picture are stored in the database (a threshold-check sketch is given below).
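A minimal sketch of steps s51-s54 follows; the threshold values, field names and stop-belt decision are illustrative assumptions, not the patent's implementation.

```python
# System threshold = alarm threshold + error threshold; oversized detections
# are logged and may stop the belt.
from datetime import datetime

ALARM_THRESHOLD_MM = 300.0
ERROR_THRESHOLD_MM = 20.0
SYSTEM_THRESHOLD_MM = ALARM_THRESHOLD_MM + ERROR_THRESHOLD_MM

def check_ore(device_id, ore_size_mm, image_path, log):
    """Compare one measured ore size with the system threshold (s52/s53)."""
    if ore_size_mm <= SYSTEM_THRESHOLD_MM:
        return False
    log.append({                        # abnormal record written to the database (s54)
        "device": device_id,
        "time": datetime.now().isoformat(),
        "size_mm": ore_size_mm,
        "image": image_path,
    })
    return True                         # caller may stop the belt / notify the actuator

records = []
if check_ore("ipc-01", 335.0, "frames/frame_000042.jpg", records):
    print("early warning:", records[-1])
```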
According to fig. 10, the system consists of the following modules:
the video stream acquisition module (M1) acquires the video stream in real time and sends key frames from it to the real-time early warning module, which then decides whether to issue an early warning. The real-time early warning module (M2) comprises a Web end and an App end for convenient use; the Web end contains a user module, a real-time early warning module, a setting module and a log module, while the App contains a user module, a real-time early warning module and a log module. The Web end is mainly implemented on the Django framework (a hedged sketch of one of its records is given below). According to fig. 11, the user module (M3) mainly comprises: 1. a registration function, 2. a login function, 3. a permission management function. According to fig. 12, the real-time early warning module (M2) mainly comprises: 1. adding, deleting, modifying and listing industrial personal computer devices, 2. displaying the belt view of each industrial personal computer device in real time, 3. starting or stopping the display of an industrial personal computer's view, 4. displaying abnormal information and sending it to the actuator, 5. storing abnormal records in the database. According to fig. 13, the setting module mainly comprises: 1. setting the ore block size for system early warning, 2. setting the system error size, 3. setting the cloud server port to be synchronized, 4. setting the camera parameter information. According to fig. 14, the log module (M5) mainly comprises: 1. querying logs, 2. exporting logs, 3. displaying recent abnormal records, 4. choosing whether to synchronize logs from the local machine to the cloud server. Database module (M6): local data can be synchronized to the cloud database, and data records in the cloud database can be downloaded locally.
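Since the Web end is described as Django-based, one possible model for the abnormal records written by the real-time early warning module is sketched below; the field names are assumptions, not taken from the patent.

```python
# Hedged Django sketch of an abnormal-record table for the Web end.
from django.db import models

class AbnormalRecord(models.Model):
    device = models.CharField(max_length=64)           # industrial personal computer id
    occurred_at = models.DateTimeField(auto_now_add=True)
    ore_size_mm = models.FloatField()                   # calculated real ore size
    threshold_mm = models.FloatField()                  # system threshold at that time
    image = models.ImageField(upload_to="abnormal/")    # captured picture
    synced_to_cloud = models.BooleanField(default=False)

    def __str__(self):
        return f"{self.device} {self.ore_size_mm:.0f} mm at {self.occurred_at}"
```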

Claims (9)

1. A deep learning ore size measuring method and an early warning system based on an EfficientDet network are used for detecting ore size in the mining industry and intelligently processing ore blocks exceeding a threshold size, and are characterized by comprising the following specific steps:
s1, data acquisition stage: shooting ores on a flowing belt through a high-speed camera, and storing key frames of a video stream as pictures;
s2, data preprocessing stage: manually labeling the images and detecting and removing abnormal ore pictures; dividing the obtained ore pictures into training, validation and test pictures in a 7:2:1 ratio; applying data augmentation to the ore pictures to improve generalization;
s3, network creating and training stage: using an EfficientNet network as a backbone, creating an EfficientDet target detection network, and obtaining a network model by using a training data set;
s4, testing: testing by using the trained EfficientDet network model, using the obtained prediction frame for positioning the position of ore discharge, and calculating the size of the ore according to the focal length of the camera, the distance between the camera and the belt and the size of the image;
s5, reminding: and sending out a prompt when the ore size exceeds the system threshold value according to the preset threshold value of the system.
2. The method for measuring the ore size for deep learning based on the EfficientDet network and the early warning system according to claim 1, wherein the data acquisition stage specifically comprises:
(1) acquiring video data streams of ore pictures through a plurality of cameras arranged near the belt and at different angles;
(2) and intercepting the key frame from the acquired video data stream as ore picture data.
3. The method for measuring the ore size for deep learning based on the EfficientDet network and the early warning system according to claim 1, wherein the data preprocessing stage specifically comprises:
(i) marking the position and size of each ore in the obtained picture data with the image annotation tool Labelme; because ore shapes are irregular, polygon annotation is required;
(ii) detecting ore picture data which do not meet the requirements, and rejecting the ore picture data;
(iii) and performing data enhancement on the training picture: randomly carrying out horizontal and vertical reversal at different angles on the ore picture; zooming the ore picture; adjusting pixel values of the image so that the values thereof become uniformly distributed through histogram equalization; adding random noise; the method comprises the steps of converting an image from an RGB color space to an HSV color space, adjusting the brightness of the image, normalizing the image and processing noise.
4. The method for measuring the size of the deep learning ore based on the efficientDet network and the early warning system according to claim 1, wherein the network creating and training stage specifically comprises:
using an EfficientNet network as a backbone network, fusing network characteristics extracted by the EfficientNet by adopting a BiFPN network, and classifying and performing regression prediction on the extracted characteristics by the Head of the EfficientDet;
(ii) the entire EfficientNet has versions B0-B7, wherein EfficientNet-B0 consists of 1 Conv (3 × 3), 1 MBConv1(3 × 3), 2 MBConv6(3 × 3), 2 MBConv6(5 × 5), 3 MBConv6(3 × 3), 3 MBConv6(5 × 5), 4 MBConv6(5 × 5), one MBConv6(3 × 3), one Conv (1 × 1), one Pooling layer, one FC layer;
wherein MBConv contains a residual structure;
performing dimension increasing operation by using convolution of 1 × 1, performing convolution of 3 × 3 or 5 × 5, adding an attention mechanism about a channel, performing dimension reducing operation by using convolution of 1 × 1, and stacking with a residual error structure;
the activation function of MBConv uses the Swish function and is normalized using Batch Normalization;
the Swish function is defined as:
swish(x) = x · σ(β·x)
where β is a constant or trainable parameter and σ is the sigmoid function:
σ(z) = 1 / (1 + e^(−z));
meanwhile, EfficientNet-B0 performs compound scaling of the network depth, width and resolution according to:
depth d = α^φ, width w = β^φ, resolution r = γ^φ
where α, β and γ are constants that can be determined by grid search;
under the constraints α·β²·γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1, the optimal values for EfficientNet-B0 are α = 1.2, β = 1.1, γ = 1.15;
5. The method for measuring the ore size for deep learning based on the EfficientDet network and the early warning system according to claim 1, wherein the BiFPN is constructed and specifically comprises the following steps:
(A) the BiFPN (bi-directional feature pyramid) network learns the importance of different input features through learnable weights while repeatedly applying top-down and bottom-up multi-scale feature fusion;
the enhanced extraction network is composed of several stacked BiFPN layers; the eight versions EfficientDet-D0 to D7 are composed of 3, 4, 5, 6, 7, 7, 8 and 8 BiFPN layers respectively;
(B) the EfficientDet comprises an EfficientNet backbone extraction network and a BiFPN reinforced extraction network, and the EfficientHead converts the extracted features into a prediction result;
firstly, EfficientNet repeatedly downsamples the input picture; the original EfficientNet downsamples 5 times, so the input picture yields the feature layers P1, P2, P3, P4 and P5, where P1 is the result of compressing the length and width of the input picture once, P2 is the result of compressing them twice, P3 is the result of compressing them three times, and so on;
because P1 and P2 do not carry high-level semantic information, they are not used in the enhanced extraction network BiFPN; P3, P4 and P5 carry high-level semantic information and are therefore used in the enhanced extraction network BiFPN as three of the 5 effective feature layers;
P5 is downsampled twice more to obtain the higher-semantic P6 and P7, giving the 5 effective feature layers P3, P4, P5, P6 and P7, which are passed into the enhanced extraction network BiFPN for further feature extraction, where P_i denotes a feature level whose resolution is 1/2^i of the input image;
for example, if the input resolution is 640 × 640, P3 denotes feature level 3 with a resolution of 80 × 80 (640 / 2³ = 80), and P7 denotes feature level 7 with a resolution of 5 × 5;
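A tiny sketch of the feature-level convention described above (level i has 1/2^i of the input resolution):

```python
def level_resolution(input_size, level):
    """Spatial size of feature level P_level for a square input."""
    return input_size // (2 ** level)

for lvl in range(3, 8):                       # the five effective layers P3-P7
    print(f"P{lvl}: {level_resolution(640, lvl)}x{level_resolution(640, lvl)}")
# P3: 80x80, P4: 40x40, P5: 20x20, P6: 10x10, P7: 5x5
```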
the method comprises the following specific steps:
① firstly, the number of channels is adjusted to obtain P3_in, P4_in, P5_in, P6_in and P7_in; if the features enter the BiFPN for the first time, P4_in is split, by channel reduction, into P4_in_1 and P4_in_2, and P5_in is split, by channel reduction, into P5_in_1 and P5_in_2;
② after P3_in, P4_in_1, P4_in_2, P5_in_1, P5_in_2, P6_in and P7_in are obtained, P7_in is upsampled; the upsampled result and P6_in are fused through an attention mechanism that judges whether more attention should be paid to P7_in or to P6_in, then activated by the swish function and convolved to obtain P6_td;
③ P6_td is upsampled; the upsampled result and P5_in_1 are fused through the attention mechanism that judges whether more attention should be paid to P6_td or to P5_in_1, then activated by the swish function and convolved to obtain P5_td;
④ P5_td is upsampled; the upsampled result and P4_in_1 are fused through the attention mechanism that judges whether more attention should be paid to P5_td or to P4_in_1, then activated by the swish function and convolved to obtain P4_td;
⑤ P4_td is upsampled; the upsampled result and P3_in are fused through the attention mechanism that judges whether more attention should be paid to P4_td or to P3_in, then activated by the swish function and convolved to obtain P3_out;
⑥ after P3_out, P4_td, P4_in_2, P5_td, P5_in_2, P6_in, P6_td and P7_in are obtained, P3_out is downsampled; the downsampled result, P4_td and P4_in_2 are fused through the attention mechanism that judges whether more attention should be paid to P3_out, P4_td or P4_in_2, then activated by the swish function and convolved to obtain P4_out; P4_out is then downsampled, and the downsampled result, P5_td and P5_in_2 are fused through the attention mechanism, then activated by the swish function and convolved to obtain P5_out; P5_out is then downsampled, and the downsampled result, P6_in and P6_td are fused through the attention mechanism, then activated by the swish function and convolved to obtain P6_out; P6_out is then downsampled, and the downsampled result and P7_in are fused through the attention mechanism, then activated by the swish function and convolved to obtain P7_out;
⑦ the obtained P3_out, P4_out, P5_out, P6_out and P7_out are taken as the new P3_in, P4_in, P5_in, P6_in and P7_in, and the previous steps are repeated for stacking; for EfficientDet-D0 this is repeated 2 times, and in the repeated rounds P4_in no longer needs to be split into P4_in_1 and P4_in_2, nor does P5_in need to be split into P5_in_1 and P5_in_2;
for example, the above fusion can be summarized at layer 6 as:
P6_td = Conv((w1 · P6_in + w2 · Resize(P7_in)) / (w1 + w2 + ε))   (4)
P6_out = Conv((w1' · P6_in + w2' · P6_td + w3' · Resize(P5_out)) / (w1' + w2' + w3' + ε))   (5)
where P_i denotes the feature of the i-th layer, P6_td is the intermediate feature of level 6 in the top-down path, and P6_out is the output feature of level 6 in the bottom-up path;
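A minimal NumPy sketch of the fast-normalized weighted fusion in formulas (4) and (5); the learnable weights, the ε term and the nearest-neighbour resize are assumptions consistent with the description above, and the final convolution is omitted for brevity:

```python
import numpy as np

def resize_to(x, target_hw):
    """Nearest-neighbour resize of an (H, W, C) feature map (stand-in for up/downsampling)."""
    h, w = target_hw
    ys = np.arange(h) * x.shape[0] // h
    xs = np.arange(w) * x.shape[1] // w
    return x[ys][:, xs]

def fuse(features, weights, eps=1e-4):
    """Fast normalized fusion: sum(w_i * f_i) / (sum(w_i) + eps), then swish (conv omitted)."""
    w = np.maximum(np.array(weights, dtype=np.float32), 0)   # keep the learned weights non-negative
    fused = sum(wi * fi for wi, fi in zip(w, features)) / (w.sum() + eps)
    return fused * (1.0 / (1.0 + np.exp(-fused)))             # swish activation

# Level-6 top-down step, formula (4): fuse P6_in with upsampled P7_in
p6_in = np.random.rand(10, 10, 64).astype(np.float32)
p7_in = np.random.rand(5, 5, 64).astype(np.float32)
p6_td = fuse([p6_in, resize_to(p7_in, (10, 10))], weights=[1.0, 1.0])
```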
the width and depth of the BiFPN are scaled using the following formulas:
W_bifpn = 64 · (1.35^φ)   (6)
D_bifpn = 3 + φ   (7)
where 1.35 is the BiFPN width scaling factor and φ is the compound coefficient that controls all the other scaling dimensions;
the width of the prediction network is the same as that of the BiFPN:
W_pred = W_bifpn   (8)
the depth of the prediction network increases linearly using equation (9):
D_box = D_class = 3 + ⌊φ / 3⌋   (9)
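A short sketch of the BiFPN and prediction-head scaling in formulas (6)-(9) as reconstructed above; real implementations additionally round the width to hardware-friendly channel counts, which is not part of the claim:

```python
def scale_bifpn(phi):
    """Width/depth of BiFPN and prediction head for compound coefficient phi."""
    w_bifpn = 64 * (1.35 ** phi)   # formula (6): BiFPN (and prediction-net) width
    d_bifpn = 3 + phi              # formula (7): number of stacked BiFPN layers
    d_head = 3 + phi // 3          # formula (9): depth of class/box prediction nets
    return w_bifpn, d_bifpn, d_head

print(scale_bifpn(0))   # (64.0, 3, 3) for EfficientDet-D0
print(scale_bifpn(1))   # (86.4, 4, 3); widths are rounded in practice
```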
6. The deep learning ore size measuring method and early warning system based on the EfficientDet network according to claim 1, wherein constructing the loss function specifically comprises:
(P1) the difference between the network output and the true value is calculated with the following loss function:
L = L_cls + L_reg   (10)
where L_cls is the object classification loss and L_reg is the regression loss; L_reg uses the Smooth-L1 loss function, which is as follows:
Smooth_L1(x) = 0.5 · x², if |x| < 1;  |x| − 0.5, otherwise   (11)
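A NumPy sketch of the Smooth-L1 regression loss of formula (11), averaged here over all box offsets:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth-L1: 0.5*x^2 if |x| < 1 else |x| - 0.5, averaged over the regression targets."""
    x = np.abs(pred - target)
    return np.mean(np.where(x < 1.0, 0.5 * x ** 2, x - 0.5))
```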
(P2) an image has a plurality of candidate boxes; those that contain the target are positive samples and those that do not are negative samples;
if the probability that a first sample belongs to the first class is 0.9 and the probability that a second sample belongs to the first class is 0.6, the former is an easy-to-classify sample and the latter is a hard-to-classify sample;
the classification loss of EfficientDet is the Focal loss;
the Focal loss can control the weight of positive and negative samples, and can also control the weight of easy-to-classify and hard-to-classify samples;
(P3) the Focal loss is derived from the cross-entropy loss function; the binary cross-entropy loss is:
CE(p, y) = −log(p), if y = 1;  −log(1 − p), otherwise   (12)
using p_t = p if y = 1 and p_t = 1 − p otherwise, the cross-entropy loss simplifies to:
CE(p_t) = −log(p_t)   (13)
α_t controls the weight of positive and negative samples; the coefficient α_t can be added in front of the cross-entropy loss:
CE(p_t) = −α_t · log(p_t)   (14)
the factor (1 − p_t)^γ controls the weights of easy-to-classify and hard-to-classify samples:
FL(p_t) = −(1 − p_t)^γ · log(p_t)   (15)
(1 − p_t)^γ is called the modulating factor; when γ = 0 the Focal loss degenerates into the cross-entropy loss, and the modulating factor is changed by adjusting γ;
combining the two kinds of weights gives:
FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t)   (16)
where α = 0.25 and γ = 1.5.
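A NumPy sketch of the Focal loss of formula (16) with the parameter values quoted above (α = 0.25, γ = 1.5); clipping of the probabilities is a numerical-safety detail added here, not part of the claim:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=1.5, eps=1e-7):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for binary labels y in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)            # formula (13): unify positives and negatives
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```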
7. the method for measuring the ore size for deep learning based on the EfficientDet network and the early warning system according to claim 1, wherein the prediction result specifically comprises the following steps:
(Q1) in order to distinguish from the common feature layer, the feature layer extracted by BiFPN is called as an effective feature layer, and the five effective feature layers are transmitted into ClassNet (classification prediction network) and BoxNet (regression prediction network) to obtain a prediction result;
(Q2) for Efficientdet-B0, classsnet employs 3 convolutions of depth separable volume of 64 channels and 1 convolution of the number of prior boxes possessed by the feature layer times how many classes of objects the network has in common, the number of prior boxes being by default 9;
BoxNet adopts convolution of 3 times of 64 channels and convolution of 1 time of multiplying the prior frame number of the characteristic layer by four, wherein the four refers to the adjustment condition of the prior frame, and the adjustment of the central position and the width and the height are increased;
the EfficientDet defaults to have 9 prior frames, and the length-width ratio of the prior frames can be adjusted according to actual conditions so as to be suitable for a detected target;
and the EfficientDet judges the object in the prior frame and the type of the object, adjusts the prior frame, screens out the frame which belongs to the same type in a certain area and has the maximum confidence coefficient by using non-maximum suppression (soft-NMS), and obtains a final prediction frame.
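A minimal NumPy sketch of the soft-NMS screening mentioned in the step above, using a Gaussian decay of the scores of overlapping boxes; the decay parameter sigma and the score threshold are illustrative assumptions, not values fixed by the claim:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: decay the scores of boxes overlapping the current best box."""
    boxes = boxes.astype(np.float32).copy()
    scores = scores.astype(np.float32).copy()
    keep = []
    idx = np.arange(len(scores))
    while len(idx) > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(int(best))
        idx = idx[idx != best]
        if len(idx) == 0:
            break
        decay = np.exp(-(iou(boxes[best], boxes[idx]) ** 2) / sigma)
        scores[idx] *= decay
        idx = idx[scores[idx] > score_thresh]
    return keep
```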
8. The deep learning ore size measuring method and early warning system based on the EfficientDet network according to claim 1, wherein the network model test specifically comprises the following steps:
(T1) transmitting a reserved ore test picture into an EfficientDet network for prediction to obtain a prediction frame of ore in the picture, calculating the size of the prediction frame, and calculating the size of the real ore according to the size of the prediction frame and an error;
(T2) calculating the true size of the ore from the mounting angle of the camera, the distance from the camera to the belt, the size of the prediction box of the network model, and the image pixels:
[formula reproduced as an image in the original publication]
where θ is the included angle between the camera and the vertical line of the belt, W denotes the true size of the ore, H denotes the vertical distance between the camera and the belt, f denotes the focal length of the camera, and w denotes the size of the EfficientDet network model prediction box.
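A sketch of the size calculation in step (T2); since the formula itself is reproduced only as an image in the original publication, the pinhole-camera relation used below (optical-axis distance H/cosθ, focal length expressed in pixel units) is an assumption consistent with the variables the claim lists, not the claim's exact formula:

```python
import math

def ore_size_from_box(box_pixels, H, f_pixels, theta_deg):
    """Estimate the real ore size W from the prediction-box size in pixels.

    Assumed relation: W = box_pixels * (H / cos(theta)) / f_pixels, where H is the
    camera-to-belt vertical distance, theta the angle from the vertical, and
    f_pixels the focal length expressed in pixels.
    """
    distance = H / math.cos(math.radians(theta_deg))
    return box_pixels * distance / f_pixels

# Illustrative numbers only
print(ore_size_from_box(box_pixels=120, H=1.5, f_pixels=1000, theta_deg=20))
```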
9. The deep learning ore size measuring method and early warning system based on the EfficientDet network according to claim 1, wherein sending out a prompt when the ore size exceeds a system threshold specifically comprises the following steps:
setting an ore size alarm threshold, setting the size of a system threshold as the sum of the alarm threshold and an error threshold, and comparing the actual size of the ore obtained through calculation with the system threshold;
if the real size is larger than the system threshold, generating abnormal information and sending the abnormal information to the real-time alarm module.
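A small sketch of this threshold check: the system threshold is the sum of the alarm threshold and the error threshold, and an exception record is produced when the estimated ore size exceeds it; the record fields are illustrative:

```python
def check_ore_size(real_size, alarm_threshold, error_threshold):
    """Return an exception record if the estimated ore size exceeds the system threshold."""
    system_threshold = alarm_threshold + error_threshold
    if real_size > system_threshold:
        return {"type": "oversize_ore", "size": real_size, "threshold": system_threshold}
    return None

record = check_ore_size(real_size=0.42, alarm_threshold=0.30, error_threshold=0.05)
if record is not None:
    print("send to real-time alarm module:", record)
```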
The early warning system according to the claims, characterized by comprising:
(D1) the user module is responsible for registration, login and management of users, displays personal user information and provides management authority of super users;
(D2) the ore video stream acquisition module is used for acquiring ore video streams through cameras arranged at various angles near the ore flow belt and sending acquired images to the real-time alarm module;
(D3) the real-time alarm module is used for receiving the ore image sent by the ore video stream acquisition module, transmitting the ore image into a trained EfficientDet network to obtain the position and the size of the ore, and comparing the actual size of the ore obtained by calculation with a system threshold;
if the real size is larger than the system threshold, displaying the abnormal information on a page and writing the abnormal information into a database;
(D4) the setting module is used for setting system related parameters and log related parameters, such as the size of the alarm ore block and the like;
(D5) the log module is used for receiving the abnormal record of the real-time warning system, displaying the abnormal record and synchronizing the abnormal record to the cloud server;
the user module of the warning system specifically comprises:
(U1) providing registration and login functions for new administrators so that they can operate and use the whole system; the information of the whole system is stored on a cloud server and corresponding permissions are granted;
(U2) providing a super administrator who, in addition to operating and using the whole system, has high-level authority to operate the system and manage other users;
the real-time warning module of the warning system specifically comprises:
(R1) industrial personal computer device management, including adding, deleting, modifying, listing corresponding industrial personal computer devices;
(R2) displaying in real time the belt running picture of the added industrial personal computer devices; supporting starting and stopping the display of the belt running picture of an industrial personal computer device;
(R3) when it is detected that the ore size at an industrial personal computer is larger than the system threshold, abnormal information is displayed, and the user can decide according to the abnormal information whether to stop the operation of that industrial personal computer; the industrial personal computer information, the time and the image of the abnormal record are stored in the database and written into the log module; the abnormal records in the database are synchronized to the cloud server at intervals;
the setting module of the warning system specifically comprises:
setting the related information of the alarm system, mainly including the ore block size for system early warning, the system error, the camera focal length, the cloud server port, and the like;
the log module of the warning system specifically comprises:
(Z1) receiving the abnormal record of the real-time alarm system, displaying the abnormal record on a page in real time, and writing the abnormal record into a database;
(Z2) selecting whether to synchronize the log to the cloud server;
(Z3) providing a search function for alarm records based on industrial personal computer device information, time, threshold, and the like.