CN110569843A - Intelligent detection and identification method for mine target - Google Patents

Intelligent detection and identification method for mine target Download PDF

Info

Publication number
CN110569843A
CN110569843A (application CN201910847253.5A; granted as CN110569843B)
Authority
CN
China
Prior art keywords
mine
network
text
target
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910847253.5A
Other languages
Chinese (zh)
Other versions
CN110569843B (en)
Inventor
张帆
栾佳星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB filed Critical China University of Mining and Technology Beijing CUMTB
Priority to CN201910847253.5A priority Critical patent/CN110569843B/en
Publication of CN110569843A publication Critical patent/CN110569843A/en
Application granted granted Critical
Publication of CN110569843B publication Critical patent/CN110569843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods
    • G06V20/63 — Scene text, e.g. street names
    • G06V30/10 — Character recognition


Abstract

The invention discloses an intelligent detection and identification method for mine targets, implemented in five steps. First, a mine data set is made: mine images are captured by video image acquisition equipment to build a training set and a test set. Second, a mine network model is established, comprising a public feature extraction network, a mine target detection network and a mine text recognition network. Third, the mine network model is trained with an end-to-end training method. Fourth, the parameters of the mine network model are optimized by a batch stochastic gradient descent method. Fifth, the trained mine network model detects the test set and outputs personnel, equipment and text recognition results. By adopting a convolutional network model with a residual structure, the invention accelerates network training, reduces training complexity, and can quickly and effectively perform intelligent detection and accurate identification of mine targets.

Description

Intelligent detection and identification method for mine target
Technical Field
The invention belongs to computer vision and deep learning technology in the field of artificial intelligence, and particularly relates to an intelligent detection and identification method for mine targets.
Background
Intelligent mining is an effective measure for realizing safe, efficient and intensive mine production and unmanned or few-person mining. Research on the accurate detection and real-time tracking and identification of moving targets such as underground operators, operating equipment and special robots is of great significance for guaranteeing intelligent safe mining and improving intelligent early warning of coal mine disasters. Existing underground positioning systems based on radio-frequency identification can locate mine personnel in real time, but can hardly monitor, track and identify the working states of mine personnel and equipment in real time. In recent years, with the rapid development of computer vision, target detection algorithms based on deep learning can classify and locate detected targets with high speed and recognition accuracy, and have become a research hotspot in intelligent monitoring, moving-target detection and recognition, visual navigation and other fields. Therefore, the invention provides an intelligent detection and identification method for mine targets, which overcomes the defects of existing mine target detection and identification technology and realizes rapid detection, tracking and identification of mine targets such as mine operators and equipment through machine vision and intelligent recognition technology.
Disclosure of Invention
The invention provides an intelligent detection and identification method for mine targets. The system comprises a public feature extraction network, a mine target detection network and a mine text recognition network; it detects mine personnel and equipment, recognizes their identity information from the text information carried on them, and thereby realizes intelligent detection and recognition of mine targets.
The invention provides an intelligent detection and identification method for mine targets, which comprises the following steps:
Step 1: making a mine data set comprising a training set and a test set, wherein the mine data set consists of mine images acquired by video image acquisition equipment; each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image;
Step 2: establishing a mine network model, wherein the mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network; the public feature extraction network extracts feature maps and inputs them into the mine target detection network and the mine text recognition network respectively; the mine target detection network outputs personnel, equipment and text prediction boxes; and the mine text recognition network takes the text prediction box information output by the mine target detection network and the feature maps output by the public feature extraction network as input and outputs text recognition results;
Step 3: network training, wherein the mine network model adopts an end-to-end training method: the public feature extraction network, the mine target detection network and the mine text recognition network of step 2 are trained with the training set of step 1, and the personnel, equipment and text prediction boxes and the text recognition results of the text prediction boxes are output;
Step 4: optimizing the network, namely optimizing the parameters of the mine network model by a batch stochastic gradient descent method;
Step 5: detecting and identifying targets, namely detecting the test set with the trained mine network model and outputting the personnel, equipment and text prediction boxes and the text recognition results of the text prediction boxes;
In the mine network model of step 2, the public feature extraction network adopts a convolutional network with a residual structure to generate feature maps of 3 different scales; and,
In the mine network model of step 2, the mine target detection network comprises a candidate frame generation module, a candidate frame category and position prediction module, and a non-maximum value suppression module, takes the feature maps generated by the public feature extraction network as input, and further comprises the substeps of:
2.1 the candidate frame generation module respectively generates 3 candidate frames with different sizes on each grid of the feature maps with 3 different scales by adopting a dimension clustering method;
2.2 the candidate frame type and position prediction module calculates the target existence possibility of different candidate frames on each grid, and carries out type and position prediction on the candidate frame with the maximum target existence possibility on each grid;
2.3 the non-maximum value suppression module calculates the overlap ratio between the real frame and the candidate frames subjected to category and position prediction, and retains the prediction frame closest to the real frame to obtain the target detection result; and,
In the mine network model of step 2, the mine text recognition network comprises a text feature map extraction module, a feature sequence extraction module, a sequence labeling module and a sequence transcription module, and, taking the feature maps extracted by the public feature extraction network as input, further comprises the following substeps:
2.4 the text feature map extraction module extracts all text feature maps from the feature maps extracted by the public feature extraction network according to the text prediction information obtained by the mine target detection network, and scales the text regions in the original image corresponding to the text feature maps to the same height at a fixed aspect ratio;
2.5 the feature sequence extraction module extracts features from the text feature map obtained in step 2.4 through convolutional layers and outputs a feature map of size (1, x, 512), where x denotes the unfixed length of the text feature map and 512 denotes the 512 channels obtained by convolution with 512 convolution kernels; each column of the extracted feature map is taken as one element of a feature vector sequence;
2.6 the network structure of the sequence labeling module adopts a recurrent network of gated recurrent units; the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output;
2.7 the sequence transcription module transforms the sequence prediction information and deletes redundant information to obtain the final text recognition result.
The network optimization method of step 4 comprises the following steps:
Iteratively updating the parameters of the mine network model by a batch stochastic gradient descent method, and calculating the loss function of the mine network model, which is computed as:

Loss(W, b) = (1/N) · Σ_i [ α · Σ_j 1_ij^obj · ( L_coord(b_i, g_i) + L_conf(C_i, Ĉ_i) + L_cls(p_i(c), p̂_i(c)) ) + β · Σ_j 1_ij^noobj · L_conf(C_i, Ĉ_i) + γ · 1_i^obj · L_seq(l_k, y_k) ]

wherein W and b are the parameters of the mine network model to be adjusted; N denotes the number of mine target samples used in a single batch stochastic gradient descent step; α denotes the weight of the loss terms computed when a cell contains a target; β denotes the weight of the loss terms computed when a cell contains no target; γ denotes the weight of the text-sequence loss; 1_ij^obj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when a target is present; 1_ij^noobj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when no target is present; 1_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i denotes the coordinate information of the prediction box; g_i denotes the coordinate information of the real frame; C_i and Ĉ_i denote the true and predicted class confidence values respectively; p_i(c) and p̂_i(c) denote the true and predicted class probabilities respectively; L_coord is the position-error loss function; L_conf is the confidence-error loss function; L_cls is the class-probability loss function; L_seq denotes the text-sequence loss function; l_k denotes the word sequence in a text box; y_k denotes the predicted sequence.
The text information embedded in the mine image is a valid sequence identifier carrying the identity of the recognition target.
The method performs specific identity recognition of personnel and equipment according to the recognized valid sequence identifier, realizing intelligent detection and recognition of mine personnel, equipment and text.
The invention has the beneficial effects that:
The invention provides an intelligent detection and identification method for mine targets, which realizes rapid detection, tracking and identification of mine targets such as mine operators and equipment through machine vision and intelligent recognition technology. The invention adopts an end-to-end neural network model in which the mine target detection network and the mine text recognition network share a network layer, accelerating network training and reducing its complexity. The invention effectively combines target detection with text recognition, realizing intelligent detection and identity recognition of mine targets carrying text information, as well as specific identity recognition and real-time detection of the operating state of mine personnel and equipment.
Drawings
FIG. 1 is a diagram of a mine network model architecture according to an embodiment of the present invention
FIG. 2 is a flow chart of mine network model training according to an embodiment of the present invention
FIG. 3(a) is a diagram of a public feature extraction network architecture according to an embodiment of the present invention
FIG. 3(b) is a diagram of a residual unit structure of a common feature extraction network according to an embodiment of the present invention
FIG. 4 is a flow chart of a mine target detection network according to an embodiment of the invention
FIG. 5 is a flow chart of a mine text recognition network in accordance with an embodiment of the present invention
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples.
Fig. 1 is a diagram of the mine network model architecture of the present invention. The collected mine images are input into the public feature extraction network, which outputs feature maps of 3 different scales; these feature maps are input into the mine target detection network and the mine text recognition network respectively, and the mine text recognition network performs text recognition on the feature maps according to the text position information provided by the mine target detection network. The mine target detection network outputs the position coordinates and classification information of the detected personnel, equipment and text prediction boxes; the mine text recognition network outputs the text recognition result of each detected text box, and the recognized text is attached as a label to the corresponding personnel or equipment prediction box. The predicted box positions and character recognition results of personnel, equipment and text are overlaid on the original mine image, and the final output is a mine image annotated with detection boxes and text labels.
FIG. 2 is a flow chart of mine network model training of the present invention.
Step 1: making a mine data set. Mine images are acquired by video image acquisition equipment, and text labels corresponding to mine personnel and equipment are embedded into the mine images to obtain a mine data set carrying text labels. Each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image. Finally, the mine data set is divided into a training set and a test set.
Step 2: error calculation. The mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network. The mine target detection network generates a series of candidate frames on the feature maps extracted by the public feature extraction network, and selects the candidate frame whose region has the highest target probability score, together with the real frame, to calculate a classification error and a position regression error. The mine text recognition network predicts text content on the feature maps generated by the public feature extraction layer according to the text box position information provided by the mine target detection network, and calculates a character error between the predicted text and the real text. The classification error, position regression error and character error are added with different weights to form the loss function for mine network model training, which is computed as:

Loss(W, b) = (1/N) · Σ_i [ α · Σ_j 1_ij^obj · ( L_coord(b_i, g_i) + L_conf(C_i, Ĉ_i) + L_cls(p_i(c), p̂_i(c)) ) + β · Σ_j 1_ij^noobj · L_conf(C_i, Ĉ_i) + γ · 1_i^obj · L_seq(l_k, y_k) ]

wherein W and b are the parameters of the mine network model to be adjusted; N denotes the number of mine target samples used in a single batch stochastic gradient descent step; α denotes the weight of the loss terms computed when a cell contains a target; β denotes the weight of the loss terms computed when a cell contains no target; γ denotes the weight of the text-sequence loss; 1_ij^obj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when a target is present; 1_ij^noobj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when no target is present; 1_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i denotes the coordinate information of the prediction box; g_i denotes the coordinate information of the real frame; C_i and Ĉ_i denote the true and predicted class confidence values respectively; p_i(c) and p̂_i(c) denote the true and predicted class probabilities respectively; L_coord is the position-error loss function; L_conf is the confidence-error loss function; L_cls is the class-probability loss function; L_seq denotes the text-sequence loss function; l_k denotes the word sequence in a text box; y_k denotes the predicted sequence.
Step 3: updating the parameters. The mine network model continuously updates its parameters through back-propagation with the batch stochastic gradient descent method until the loss function reaches the training threshold, then stops iterating and outputs the trained mine network model parameters.
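The batch stochastic gradient descent loop described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: `grad_fn`, `loss_fn` and all other names are hypothetical stand-ins for the real model's gradient and loss computations.

```python
import random

def sgd_step(params, grad_fn, batch, lr=0.01):
    """One mini-batch stochastic-gradient-descent update.

    params  : list of floats (flattened model parameters W, b)
    grad_fn : callable(params, sample) -> per-parameter gradient list
    batch   : list of training samples drawn at random
    """
    # Average the gradients over the batch, then step against them.
    grads = [0.0] * len(params)
    for sample in batch:
        for i, g in enumerate(grad_fn(params, sample)):
            grads[i] += g / len(batch)
    return [p - lr * g for p, g in zip(params, grads)]

def train(params, grad_fn, data, batch_size=4, steps=100, lr=0.01,
          loss_threshold=None, loss_fn=None):
    """Iterate SGD steps; stop early once the loss falls below the threshold."""
    for _ in range(steps):
        batch = random.sample(data, min(batch_size, len(data)))
        params = sgd_step(params, grad_fn, batch, lr)
        if loss_threshold is not None and loss_fn(params, data) < loss_threshold:
            break
    return params
```

On a toy one-parameter problem (minimizing (p - 3)^2, gradient 2(p - 3)) the loop converges to p ≈ 3, which is all this sketch is meant to show.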
Fig. 3(a) is a diagram of the public feature extraction network structure. The invention adopts a convolutional network with a residual structure as the basic building unit of the public feature extraction network, using convolution kernels of sizes 1 × 1 and 3 × 3 and applying a stride-2 convolution before each residual unit to gradually reduce the scale of the feature map. As shown in Fig. 3(a), three feature maps of different scales, of sizes 52 × 52, 26 × 26 and 13 × 13, are extracted from residual units three, four and five respectively. The 13 × 13 feature map from residual unit five is upsampled and concatenated with the 26 × 26 feature map from residual unit four to obtain the final 26 × 26 feature map; the resulting 26 × 26 feature map is then upsampled and concatenated with the 52 × 52 feature map from residual unit three to obtain the final 52 × 52 feature map. A series of candidate frames is generated over the 3 feature maps.
Fig. 3(b) is a diagram of the residual unit structure in the public feature extraction network. The feature map from the previous layer is convolved by a 1 × 1 kernel, a 3 × 3 kernel and a 1 × 1 kernel in turn; the result is added element-wise to the feature map before convolution, and the summed feature map is output.
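In the spirit of Fig. 3(b), a residual unit computes F(x) + x. Below is a minimal sketch, with a generic `transform` callable standing in for the 1 × 1 → 3 × 3 → 1 × 1 convolution stack; the function name is illustrative only.

```python
def residual_unit(x, transform):
    """Residual unit: output = F(x) + x (element-wise shortcut addition).

    x         : list of floats (one flattened channel of a feature map)
    transform : callable F standing in for the convolution stack
    """
    fx = transform(x)
    # Shortcut connection: add the transformed features back onto the input.
    return [a + b for a, b in zip(fx, x)]
```

Because the shortcut passes the input straight through the addition, gradients can flow directly to earlier layers, which is what makes deep networks of such units easier and faster to train.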
Fig. 4 is a flow chart of a mine target detection network.
Step 1: generating candidate frames. The public feature extraction network generates 3 feature maps with scales 13 × 13, 26 × 26 and 52 × 52, and candidate frames are generated by a dimension clustering method, with 3 candidate frames per grid cell on each feature map. At scale 13 × 13 the receptive field of each grid cell is largest, suited to detecting large targets, and the 3 candidate frame sizes per cell are (116, 90), (156, 198) and (373, 326). At scale 26 × 26 the receptive field is moderate, suited to detecting medium-sized targets, with candidate frame sizes (30, 61), (62, 45) and (59, 119). At scale 52 × 52 the receptive field is smallest, suited to detecting small targets, with candidate frame sizes (10, 13), (16, 30) and (33, 23). Each candidate frame predicts three pieces of information: 1. the position of the frame, (t_x, t_y, b_h, b_w), where t_x, t_y are the predicted frame-centre coordinates and b_h, b_w its height and width; 2. a target-presence likelihood score; 3. the class information C_i. The mine target detection network performs convolutional prediction on the three feature maps with (4 + 1 + c) × k convolution kernels of size 1 × 1, where k is the number of candidate frames (k = 3, since 3 candidate frames are generated at each anchor point of each feature map) and c is the number of predicted target classes: 4k parameters predict the candidate-frame offsets, k parameters predict the probability that a candidate frame contains a target, and ck parameters predict the probabilities of the c target classes for the k preset bounding frames.
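The anchor sizes and channel count quoted above can be illustrated with a small sketch. `size_iou` is the dimension-clustering distance (IoU of width/height pairs treated as concentric boxes), `best_anchor` picks the scale and anchor that best fit a ground-truth box, and `head_channels` reproduces the (4 + 1 + c) × k channel count of the 1 × 1 prediction convolution; all function names are illustrative, not from the patent.

```python
# The 3 x 3 anchor sizes quoted in the text, grouped by feature-map scale.
ANCHORS = {
    13: [(116, 90), (156, 198), (373, 326)],   # large targets
    26: [(30, 61), (62, 45), (59, 119)],       # medium targets
    52: [(10, 13), (16, 30), (33, 23)],        # small targets
}

def size_iou(wh_a, wh_b):
    """IoU of two boxes assumed to share the same centre (dimension-cluster metric)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def best_anchor(gt_wh):
    """Return (scale, anchor) with the highest size-IoU for a ground-truth box."""
    candidates = [(s, a) for s, anchors in ANCHORS.items() for a in anchors]
    return max(candidates, key=lambda sa: size_iou(gt_wh, sa[1]))

def head_channels(num_classes, k=3):
    """Channels of the 1 x 1 prediction convolution: (4 offsets + 1 objectness + c classes) * k."""
    return (4 + 1 + num_classes) * k
```

For example, a wide 115 × 90 ground-truth box matches the (116, 90) anchor on the coarse 13 × 13 map, while a tiny 10 × 12 box matches (10, 13) on the fine 52 × 52 map, mirroring the large/small-target split described above.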
Step 2: candidate frame screening. The invention adopts a logistic regression method to score the target-presence likelihood of the region enclosed at each anchor point of a feature map, i.e. how likely that location is to contain a target. Logistic regression finds, among the 9 candidate frames of different sizes, the one with the highest target likelihood score, and only that candidate frame undergoes classification and position prediction; a non-optimal candidate frame is not predicted even if its score exceeds the set threshold. Because this step is performed before the classification and position prediction task, unnecessary candidate frames are removed and the amount of subsequent prediction computation is reduced.
Step 3: classification and position prediction. For classification prediction, a fully connected layer followed by softmax is adopted as the classification network structure; the number of neurons in the last fully connected layer equals the number of classes, and the probability of each class is output. For position prediction, the invention directly predicts relative positions: with (c_x, c_y) the top-left corner of the grid cell containing the candidate frame and (p_w, p_h) the candidate frame width and height, the predicted centre is expressed relative to the top-left corner of the grid cell, and the centre coordinates and width and height (b_x, b_y, b_w, b_h) of the prediction box are given by:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
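The position decoding can be sketched directly. The sigmoid centre offsets follow the formulas in the text; the exponential scaling of the prior width and height is the usual companion formulation and is an assumption of this sketch.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw offsets (tx, ty, tw, th) into a predicted box (bx, by, bw, bh).

    (cx, cy) is the top-left corner of the grid cell; (pw, ph) is the
    width and height of the prior (candidate frame).
    """
    bx = sigmoid(tx) + cx          # centre x, constrained inside the cell
    by = sigmoid(ty) + cy          # centre y, constrained inside the cell
    bw = pw * math.exp(tw)         # width scales the prior exponentially
    bh = ph * math.exp(th)         # height scales the prior exponentially
    return bx, by, bw, bh
```

With zero offsets the decoded box sits at the cell centre with exactly the prior's size, which is a handy sanity check on the formulas.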
Step 4: final screening. The prediction boxes obtained from classification prediction and position regression are screened by a non-maximum suppression method: redundant prediction boxes are removed and the prediction box closest to the real frame is retained, giving the final prediction result.
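Step 4 above corresponds to greedy non-maximum suppression, which can be sketched as follows; the 0.5 overlap threshold is an illustrative default, not a value from the patent.

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)           # keep the highest-scoring remaining box
        keep.append(i)
        # Drop every remaining box that overlaps it too strongly.
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

Two heavily overlapping detections of the same person collapse to the higher-scoring one, while a detection elsewhere in the image survives untouched.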
Fig. 5 is a flow chart of the mine text recognition network.
Step 1: extracting text feature maps. According to the text category prediction information obtained by the mine target detection network, all text feature maps are extracted from the feature maps produced by the public feature extraction network, and the text regions in the original image corresponding to the text feature maps are scaled to the same height at a fixed aspect ratio.
Step 2: extracting the feature sequence. The text feature map, scaled to a fixed height, is passed through convolutional layers for feature extraction, and a feature map of size (1, x, 512) is output, where x denotes the unfixed length of the text feature map and 512 denotes the 512 channels obtained by convolution with 512 convolution kernels. Each column of the feature map is extracted as one element of a feature vector sequence.
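Taking each column of the feature map as a feature vector can be sketched as follows; the channel-first nested-list layout (C, 1, x) and the function name are assumptions of this sketch, not from the patent.

```python
def to_feature_sequence(fmap):
    """Split a channel-first feature map of shape (C, 1, x) into column vectors.

    fmap : nested lists, fmap[c][0][j] = value of channel c at width position j.
    Returns a length-x sequence of C-dimensional vectors, in the left-to-right
    order in which columns are fed to the recurrent labelling module.
    """
    num_channels = len(fmap)
    width = len(fmap[0][0])
    return [[fmap[c][0][j] for c in range(num_channels)]
            for j in range(width)]
```

Each output vector gathers all channels at one horizontal position, so the recurrent network sees the text image as a left-to-right sequence of per-column descriptors.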
Step 3: sequence labeling. The network structure of the sequence labeling module adopts a recurrent network of gated recurrent units; the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output.
Step 4: sequence transcription. The sequence prediction information is transformed and redundant information is deleted to obtain the final text recognition result.
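"Deleting redundant information" in sequence transcription matches the usual CTC-style decoding: merge consecutive repeated labels, then drop the blank symbol. Below is a sketch under that assumption; the '-' blank character is illustrative.

```python
def ctc_collapse(labels, blank="-"):
    """CTC-style transcription: merge repeated labels, then remove blanks.

    labels : iterable of per-column predicted symbols (best label per step)
    blank  : the separator symbol emitted between characters (assumption)
    """
    out = []
    prev = None
    for ch in labels:
        # Keep a symbol only when it differs from its predecessor
        # and is not the blank separator.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

For instance, a per-column prediction like "--mm-ii-nn-ee--" transcribes to the text "mine", with the repeated frames and blanks removed as redundant.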
Obviously, it should be understood by those skilled in the art that, besides being applied to underground coal mine environments for the detection and identification of mine targets, the intelligent detection and identification method of the present invention and the above embodiments is applicable to the detection and identification of moving targets in non-coal and open-pit mines, such as metal and non-metal mines, and to the intelligent detection and identification of fully mechanized coal mining face operation equipment. The invention is not limited to the detection and identification of moving targets in coal mines; it also covers related technical fields such as the monitoring of moving targets in non-coal mines, open-pit mines, fully mechanized mining areas and working-face operation equipment, and the accurate identification of mine Internet-of-Things equipment.
While the invention has been described in detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A mine target intelligent detection and identification method is characterized by comprising the following steps:
Step 1: making a mine data set comprising a training set and a test set, wherein the mine data set consists of mine images acquired by video image acquisition equipment; each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image;
Step 2: establishing a mine network model, wherein the mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network; the public feature extraction network extracts feature maps and inputs them into the mine target detection network and the mine text recognition network respectively; the mine target detection network outputs personnel, equipment and text prediction boxes; and the mine text recognition network takes the text prediction box information output by the mine target detection network and the feature maps output by the public feature extraction network as input and outputs text recognition results;
Step 3: network training, wherein the mine network model adopts an end-to-end training method, the public feature extraction network, the mine target detection network and the mine text recognition network of step 2 are trained using the training set of step 1, and prediction boxes for the personnel, equipment and text and a text recognition result for each text prediction box are output;
Step 4: network optimization, namely optimizing the parameters of the mine network model by a batch stochastic gradient descent method;
Step 5: target detection and identification, detecting the test set by using the trained mine network model, and outputting the personnel, equipment and text prediction boxes and the text recognition result of each text prediction box; the method is further characterized in that,
in the mine network model of step 2, the public feature extraction network adopts a convolutional network with a residual structure to generate feature maps of 3 different scales; and,
in the mine network model of step 2, the mine target detection network comprises a candidate box generation module, a candidate box category and position prediction module and a non-maximum suppression module, takes the feature maps generated by the public feature extraction network as input, and further comprises the following substeps:
2.1 the candidate box generation module generates 3 candidate boxes of different sizes on each grid cell of the 3 feature maps of different scales by a dimension clustering method;
2.2 the candidate box category and position prediction module calculates the target existence probability of the different candidate boxes on each grid cell, and performs category and position prediction for the candidate box with the highest target existence probability on each grid cell;
2.3 the non-maximum suppression module calculates the overlap ratio between the real box and each candidate box subjected to category and position prediction, and retains the prediction box closest to the real box to obtain the target detection result; and,
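The overlap computation and suppression described in substeps 2.1 to 2.3 can be sketched as follows. This is a minimal illustration assuming axis-aligned boxes in (x1, y1, x2, y2) form; the greedy suppression strategy and the function names are assumptions for exposition, not the patent's exact implementation.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2):
    # overlap area divided by union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression: repeatedly keep the
    # highest-scoring box and drop boxes overlapping it by more
    # than `thresh`, so only one prediction survives per target.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

With three candidate boxes where the first two overlap heavily, `nms` keeps only the higher-scoring of the pair plus the isolated third box.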
in the mine network model of step 2, the mine text recognition network comprises a text feature map extraction module, a feature sequence extraction module, a sequence labeling module and a sequence transcription module, takes the feature maps extracted by the public feature extraction network as input, and further comprises the following substeps:
2.4 the text feature map extraction module extracts all text feature maps from the feature maps extracted by the public feature extraction network according to the text prediction information obtained by the mine target detection network, and scales the text regions of the original image corresponding to the text feature maps to the same height while keeping a fixed aspect ratio;
2.5 the feature sequence extraction module performs convolutional feature extraction on the text feature maps obtained in step 2.4 and outputs a feature map of size (1, x, 512), where x represents the variable length of the text feature map and 512 represents the 512 channels obtained by convolution with 512 convolution kernels; each column of the extracted feature map is taken as one vector of the feature sequence;
2.6 the sequence labeling module adopts a recurrent network of gated recurrent units (GRU); the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output;
2.7 the sequence transcription module transforms the sequence prediction information and deletes redundant information to obtain the final text recognition result.
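The deletion of redundant information in substep 2.7 can be sketched as CTC-style greedy decoding: merge consecutive repeated per-column labels, then drop a reserved blank label. The claim does not name the transcription rule, so the CTC-style scheme, the `BLANK` index and the function name below are assumptions for illustration.

```python
BLANK = 0  # assumed index reserved for the "no character" label

def greedy_transcribe(frame_labels):
    """Collapse per-column label predictions into a final sequence:
    merge consecutive repeats, then drop the blank label (CTC-style)."""
    out, prev = [], None
    for lab in frame_labels:
        # emit a label only when it changes AND is not the blank,
        # so "h-ee-l-ll-o" style repetitions collapse to one symbol
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

A blank between two identical labels keeps them distinct, which is how a doubled character in the text survives the repeat-merging step.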
2. The method of claim 1, wherein the network optimization method of step 4 comprises:
iteratively updating the parameters of the mine network model by a batch stochastic gradient descent method and calculating the loss function of the mine network model, the loss function of the mine network model being calculated as:

L(W, b) = (1/N) Σ_i Σ_j [ α·I_ij^obj·L_coord(b_i, g_i) + α·I_ij^obj·L_conf(C_i, Ĉ_i) + β·I_ij^noobj·L_conf(C_i, Ĉ_i) ] + (1/N) Σ_i I_i^obj·L_cls(p_i(c), p̂_i(c)) + (γ/N) Σ_k L_seq(l_k, y_k)

wherein W and b are the parameters of the mine network model to be adjusted; N represents the number of mine target samples used in a single batch stochastic gradient descent step; α represents the weight of the loss terms computed when a cell contains a target; β represents the weight of the loss terms computed when a cell contains no target; γ represents the weight of the text sequence loss term; I_ij^obj takes the value 0 or 1 and indicates that the loss is computed for the j-th candidate box of the i-th cell only when that cell contains a target; I_ij^noobj takes the value 0 or 1 and indicates that the loss is computed for the j-th candidate box of the i-th cell only when that cell contains no target; I_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i represents the coordinate information of the prediction box; g_i represents the coordinate information of the real box; C_i and Ĉ_i respectively represent the true class value and the predicted class value; p_i(c) and p̂_i(c) respectively represent the true class probability and the predicted class probability; L_coord is the position error loss function; L_conf is the class error loss function; L_cls is the class probability loss function; L_seq represents the text sequence loss function; l_k represents the true text sequence in a text box; and y_k represents the predicted sequence.
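The batch stochastic gradient descent of step 4 reduces to: average the per-sample loss gradients over a batch of N samples, then step each parameter against that average. The sketch below is a toy illustration under that reading; `sgd_step`, `grad_fn` and the scalar-parameter setup are assumptions, not the patent's implementation.

```python
def sgd_step(params, grad_fn, batch, lr=0.01):
    """One update of batch stochastic gradient descent.

    params  -- list of scalar parameters (stand-in for W and b)
    grad_fn -- grad_fn(params, sample) -> per-parameter gradient list
    batch   -- the N samples drawn for this step
    """
    grads = [grad_fn(params, sample) for sample in batch]
    n = len(batch)
    # average each parameter's gradient over the batch, then step downhill
    return [p - lr * sum(g[k] for g in grads) / n
            for k, p in enumerate(params)]
```

For example, with a single parameter w, samples (x, y) and squared-error loss (w*x - y)**2, `grad_fn` returns [2*(w*x - y)*x], and repeated `sgd_step` calls drive w toward the least-squares solution.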
3. The method of claim 1, wherein the text information embedded in the mine image is a valid sequence identifier carrying the identity of a recognition target.
4. The method of claim 1, wherein the method identifies personnel and equipment according to the recognized valid sequence identifiers, thereby realizing intelligent detection and identification of mine personnel, equipment and text.
CN201910847253.5A 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target Active CN110569843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910847253.5A CN110569843B (en) 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target

Publications (2)

Publication Number Publication Date
CN110569843A true CN110569843A (en) 2019-12-13
CN110569843B CN110569843B (en) 2022-02-08

Family

ID=68778447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910847253.5A Active CN110569843B (en) 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target

Country Status (1)

Country Link
CN (1) CN110569843B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024899A1 (en) * 2014-06-19 2017-01-26 Bae Systems Information & Electronic Systems Integration Inc. Multi-source multi-modal activity recognition in aerial video surveillance
CN108182413A (en) * 2017-12-29 2018-06-19 中国矿业大学(北京) A kind of mine movable object detecting and tracking recognition methods
CN109241870A (en) * 2018-08-16 2019-01-18 中国矿业大学(北京) Coal mine down-hole personnel personal identification method based on Gait Recognition
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109685066A (en) * 2018-12-24 2019-04-26 中国矿业大学(北京) A kind of mine object detection and recognition method based on depth convolutional neural networks
CN109934279A (en) * 2019-03-07 2019-06-25 东喜和仪(珠海市)数据科技有限公司 The image-recognizing method of text sequence based on artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAOGUANG SHI ET AL.: "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
ZHANG FAN ET AL.: "Sparsity-adaptive image reconstruction method for mine intelligent surveillance", 《JOURNAL OF CHINA COAL SOCIETY (MEITAN XUEBAO)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992361A (en) * 2019-12-25 2020-04-10 创新奇智(成都)科技有限公司 Engine fastener detection system and detection method based on cost balance
CN111160255A (en) * 2019-12-30 2020-05-15 成都数之联科技有限公司 Fishing behavior identification method and system based on three-dimensional convolutional network
CN111160255B (en) * 2019-12-30 2022-07-29 成都数之联科技股份有限公司 Fishing behavior identification method and system based on three-dimensional convolution network
CN111553349A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN111553349B (en) * 2020-04-26 2023-04-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN113762292A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN113762292B (en) * 2020-06-03 2024-02-02 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN112686340A (en) * 2021-03-12 2021-04-20 成都点泽智能科技有限公司 Dense small target detection method based on deep neural network
CN113191273A (en) * 2021-04-30 2021-07-30 西安聚全网络科技有限公司 Oil field well site video target detection and identification method and system based on neural network
CN113569672A (en) * 2021-07-16 2021-10-29 国网电力科学研究院有限公司 Lightweight target detection and fault identification method, device and system

Also Published As

Publication number Publication date
CN110569843B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110569843B (en) Intelligent detection and identification method for mine target
CN110598736B (en) Power equipment infrared image fault positioning, identifying and predicting method
CN109492581B (en) Human body action recognition method based on TP-STG frame
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
US20180114071A1 (en) Method for analysing media content
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
CN108875624A (en) Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade
CN105512640A (en) Method for acquiring people flow on the basis of video sequence
CN113283282B (en) Weak supervision time sequence action detection method based on time domain semantic features
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN115719475B (en) Three-stage trackside equipment fault automatic detection method based on deep learning
Wu et al. Single shot multibox detector for vehicles and pedestrians detection and classification
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN115880529A (en) Method and system for classifying fine granularity of birds based on attention and decoupling knowledge distillation
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111626197B (en) Recognition method based on human behavior recognition network model
CN117423157A (en) Mine abnormal video action understanding method combining migration learning and regional invasion
Zhu et al. (Retracted) Transfer learning-based YOLOv3 model for road dense object detection
CN112487920B (en) Convolution neural network-based crossing behavior identification method
Rong et al. Weakly supervised text attention network for generating text proposals in scene images
CN114943873A (en) Method and device for classifying abnormal behaviors of construction site personnel
CN114663835A (en) Pedestrian tracking method, system, equipment and storage medium
KR20230060214A (en) Apparatus and Method for Tracking Person Image Based on Artificial Intelligence
Zhihao et al. Object detection algorithm based on dense connection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant