CN110569843A - Intelligent detection and identification method for mine target - Google Patents

Intelligent detection and identification method for mine target Download PDF

Info

Publication number
CN110569843A
CN110569843A (application CN201910847253.5A; granted as CN110569843B)
Authority
CN
China
Prior art keywords
mine
network
text
target
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910847253.5A
Other languages
Chinese (zh)
Other versions
CN110569843B (en)
Inventor
张帆
栾佳星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB filed Critical China University of Mining and Technology Beijing CUMTB
Priority to CN201910847253.5A priority Critical patent/CN110569843B/en
Publication of CN110569843A publication Critical patent/CN110569843A/en
Application granted granted Critical
Publication of CN110569843B publication Critical patent/CN110569843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods
    • G06V20/63 — Scene text, e.g. street names
    • G06V30/10 — Character recognition


Abstract

The invention discloses an intelligent detection and identification method for mine targets, implemented in five steps. First, a mine data set is made: mine images are captured by video image acquisition equipment to build a training set and a test set. Second, a mine network model is established, comprising a public feature extraction network, a mine target detection network and a mine text recognition network. Third, the mine network model is trained with an end-to-end training method. Fourth, the parameters of the mine network model are optimized by a batch stochastic gradient descent method. Fifth, the trained mine network model detects the test set and outputs personnel, equipment and text recognition results. By adopting a convolutional network model with a residual structure, the invention accelerates network training, reduces training complexity, and can quickly and effectively perform intelligent detection and accurate identification of mine targets.

Description

Intelligent detection and identification method for mine target
Technical Field
The invention belongs to computer vision and deep learning technology in the field of artificial intelligence, and particularly relates to an intelligent detection and identification method for mine targets.
Background
Intelligent mining is an effective measure for realizing safe, efficient and intensive mine production and unmanned or few-person mining. Research on the accurate detection and real-time tracking and identification of moving targets such as underground operators, operating equipment and special robots is of great significance for guaranteeing intelligent safe mining and improving intelligent early warning of coal mine disasters. Existing underground positioning systems based on radio-frequency identification can locate mine personnel in real time, but can hardly monitor, track and identify the working states of mine personnel and equipment in real time. In recent years, with the rapid development of computer vision, target detection algorithms based on deep learning can classify and locate detected targets with high speed and recognition accuracy, and have become a research hotspot in intelligent monitoring, moving-target detection and recognition, visual navigation and other fields. Therefore, the invention provides an intelligent detection and identification method for mine targets, which overcomes the defects of existing mine target detection and identification technology and realizes rapid detection, tracking and identification of mine targets such as mine operators and equipment through machine vision and intelligent recognition technology.
Disclosure of Invention
The invention provides an intelligent detection and identification method for mine targets. The system comprises a public feature extraction network, a mine target detection network and a mine text recognition network; it detects mine personnel and equipment, recognizes their identity information from the text information carried on them, and thereby realizes intelligent detection and recognition of mine targets.
The invention provides an intelligent detection and identification method for mine targets, which comprises the following steps:
Step 1: making a mine data set comprising a training set and a test set, wherein the mine data set consists of mine images acquired by video image acquisition equipment; each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image;
Step 2: establishing a mine network model, wherein the mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network; the public feature extraction network extracts feature maps and inputs them into the mine target detection network and the mine text recognition network respectively; the mine target detection network outputs personnel, equipment and text prediction boxes; and the mine text recognition network takes the text prediction box information output by the mine target detection network and the feature maps output by the public feature extraction network as input and outputs text recognition results;
Step 3: network training, wherein the mine network model adopts an end-to-end training method: the public feature extraction network, the mine target detection network and the mine text recognition network of step 2 are trained with the training set of step 1, and the personnel, equipment and text prediction boxes and the text recognition results of the text prediction boxes are output;
Step 4: optimizing the network, namely optimizing the parameters of the mine network model by a batch stochastic gradient descent method;
Step 5: detecting and identifying targets, namely detecting the test set with the trained mine network model and outputting the personnel, equipment and text prediction boxes and the text recognition results of the text prediction boxes;
In the mine network model of step 2, the public feature extraction network adopts a convolutional network with a residual structure to generate feature maps of 3 different scales; and,
In the mine network model of step 2, the mine target detection network comprises a candidate frame generation module, a candidate frame category and position prediction module, and a non-maximum value suppression module, takes the feature maps generated by the public feature extraction network as input, and further comprises the substeps of:
2.1 the candidate frame generation module respectively generates 3 candidate frames with different sizes on each grid of the feature maps with 3 different scales by adopting a dimension clustering method;
2.2 the candidate frame type and position prediction module calculates the target existence possibility of different candidate frames on each grid, and carries out type and position prediction on the candidate frame with the maximum target existence possibility on each grid;
2.3 the non-maximum value suppression module calculates the overlap ratio between the real frame and the candidate frames subjected to category and position prediction, and retains the prediction frame closest to the real frame to obtain the target detection result; and,
In the mine network model of step 2, the mine text recognition network comprises a text feature map extraction module, a feature sequence extraction module, a sequence labeling module and a sequence transcription module, and, taking the feature maps extracted by the public feature extraction network as input, further comprises the following substeps:
2.4 the text feature map extraction module extracts all text feature maps from the feature maps extracted by the public feature extraction network according to the text prediction information obtained by the mine target detection network, and scales the text regions in the original image corresponding to the text feature maps to the same height at a fixed aspect ratio;
2.5 the feature sequence extraction module extracts features from the text feature map obtained in step 2.4 through convolutional layers and outputs a feature map of size (1, x, 512), where x denotes the unfixed length of the text feature map and 512 denotes the 512 channels obtained by convolution with 512 convolution kernels; each column of the extracted feature map is taken as one element of a feature vector sequence;
2.6 the network structure of the sequence labeling module adopts a recurrent network of gated recurrent units; the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output;
2.7 the sequence transcription module transforms the sequence prediction information and deletes redundant information to obtain the final text recognition result.
The network optimization method of step 4 comprises the following steps:
Iteratively updating the parameters of the mine network model by a batch stochastic gradient descent method, and calculating the loss function of the mine network model, which is computed as:

Loss(W, b) = (1/N) · Σ_i [ α · Σ_j 1_ij^obj · ( L_coord(b_i, g_i) + L_conf(C_i, Ĉ_i) + L_cls(p_i(c), p̂_i(c)) ) + β · Σ_j 1_ij^noobj · L_conf(C_i, Ĉ_i) + γ · 1_i^obj · L_seq(l_k, y_k) ]

wherein W and b are the parameters of the mine network model to be adjusted; N denotes the number of mine target samples used in a single batch stochastic gradient descent step; α denotes the weight of the loss terms computed when a cell contains a target; β denotes the weight of the loss terms computed when a cell contains no target; γ denotes the weight of the text-sequence loss; 1_ij^obj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when a target is present; 1_ij^noobj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when no target is present; 1_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i denotes the coordinate information of the prediction box; g_i denotes the coordinate information of the real frame; C_i and Ĉ_i denote the true and predicted class confidence values respectively; p_i(c) and p̂_i(c) denote the true and predicted class probabilities respectively; L_coord is the position-error loss function; L_conf is the confidence-error loss function; L_cls is the class-probability loss function; L_seq denotes the text-sequence loss function; l_k denotes the word sequence in a text box; y_k denotes the predicted sequence.
The text information embedded in the mine image is a valid sequence identifier carrying the identity of the recognition target.
The method performs specific identity recognition of personnel and equipment according to the recognized valid sequence identifier, realizing intelligent detection and recognition of mine personnel, equipment and text.
The invention has the beneficial effects that:
The invention provides an intelligent detection and identification method for mine targets, which realizes rapid detection, tracking and identification of mine targets such as mine operators and equipment through machine vision and intelligent recognition technology. The invention adopts an end-to-end neural network model in which the mine target detection network and the mine text recognition network share a network layer, accelerating network training and reducing its complexity. The invention effectively combines target detection with text recognition, realizing intelligent detection and identity recognition of mine targets carrying text information, as well as specific identity recognition and real-time detection of the operating state of mine personnel and equipment.
Drawings
FIG. 1 is a diagram of a mine network model architecture according to an embodiment of the present invention
FIG. 2 is a flow chart of mine network model training according to an embodiment of the present invention
FIG. 3(a) is a diagram of a public feature extraction network architecture according to an embodiment of the present invention
FIG. 3(b) is a diagram of a residual unit structure of a common feature extraction network according to an embodiment of the present invention
FIG. 4 is a flow chart of a mine target detection network according to an embodiment of the invention
FIG. 5 is a flow chart of a mine text recognition network in accordance with an embodiment of the present invention
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples.
Fig. 1 is a diagram of the mine network model architecture of the present invention. The collected mine images are input into the public feature extraction network, which outputs feature maps of 3 different scales; these feature maps are input into the mine target detection network and the mine text recognition network respectively, and the mine text recognition network performs text recognition on the feature maps according to the text position information provided by the mine target detection network. The mine target detection network outputs the position coordinates and classification information of the detected personnel, equipment and text prediction boxes; the mine text recognition network outputs the text recognition result of each detected text box, and the recognized text is attached as a label to the corresponding personnel or equipment prediction box. The predicted box positions and character recognition results of personnel, equipment and text are overlaid on the original mine image, and the final output is a mine image annotated with detection boxes and text labels.
FIG. 2 is a flow chart of mine network model training of the present invention.
Step 1: making a mine data set. Mine images are acquired by video image acquisition equipment, and text labels corresponding to mine personnel and equipment are embedded into the mine images to obtain a mine data set carrying text labels. Each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image. Finally, the mine data set is divided into a training set and a test set.
Step 2: error calculation. The mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network. The mine target detection network generates a series of candidate frames on the feature maps extracted by the public feature extraction network, and selects the candidate frame whose region has the highest target probability score, together with the real frame, to calculate a classification error and a position regression error. The mine text recognition network predicts text content on the feature maps generated by the public feature extraction layer according to the text box position information provided by the mine target detection network, and calculates a character error between the predicted text and the real text. The classification error, position regression error and character error are added with different weights to form the loss function for mine network model training, which is computed as:

Loss(W, b) = (1/N) · Σ_i [ α · Σ_j 1_ij^obj · ( L_coord(b_i, g_i) + L_conf(C_i, Ĉ_i) + L_cls(p_i(c), p̂_i(c)) ) + β · Σ_j 1_ij^noobj · L_conf(C_i, Ĉ_i) + γ · 1_i^obj · L_seq(l_k, y_k) ]

wherein W and b are the parameters of the mine network model to be adjusted; N denotes the number of mine target samples used in a single batch stochastic gradient descent step; α denotes the weight of the loss terms computed when a cell contains a target; β denotes the weight of the loss terms computed when a cell contains no target; γ denotes the weight of the text-sequence loss; 1_ij^obj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when a target is present; 1_ij^noobj takes the value 0 or 1 and means that the loss is computed only for the j-th candidate frame of the i-th cell when no target is present; 1_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i denotes the coordinate information of the prediction box; g_i denotes the coordinate information of the real frame; C_i and Ĉ_i denote the true and predicted class confidence values respectively; p_i(c) and p̂_i(c) denote the true and predicted class probabilities respectively; L_coord is the position-error loss function; L_conf is the confidence-error loss function; L_cls is the class-probability loss function; L_seq denotes the text-sequence loss function; l_k denotes the word sequence in a text box; y_k denotes the predicted sequence.
Step 3: updating the parameters. The mine network model continuously updates its parameters through back-propagation with the batch stochastic gradient descent method until the loss function reaches the training threshold, then stops iterating and outputs the trained mine network model parameters.
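The batch stochastic gradient descent loop described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: `grad_fn`, `loss_fn` and all other names are hypothetical stand-ins for the real model's gradient and loss computations.

```python
import random

def sgd_step(params, grad_fn, batch, lr=0.01):
    """One mini-batch stochastic-gradient-descent update.

    params  : list of floats (flattened model parameters W, b)
    grad_fn : callable(params, sample) -> per-parameter gradient list
    batch   : list of training samples drawn at random
    """
    # Average the gradients over the batch, then step against them.
    grads = [0.0] * len(params)
    for sample in batch:
        for i, g in enumerate(grad_fn(params, sample)):
            grads[i] += g / len(batch)
    return [p - lr * g for p, g in zip(params, grads)]

def train(params, grad_fn, data, batch_size=4, steps=100, lr=0.01,
          loss_threshold=None, loss_fn=None):
    """Iterate SGD steps; stop early once the loss falls below the threshold."""
    for _ in range(steps):
        batch = random.sample(data, min(batch_size, len(data)))
        params = sgd_step(params, grad_fn, batch, lr)
        if loss_threshold is not None and loss_fn(params, data) < loss_threshold:
            break
    return params
```

On a toy one-parameter problem (minimizing (p - 3)^2, gradient 2(p - 3)) the loop converges to p ≈ 3, which is all this sketch is meant to show.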
Fig. 3(a) is a diagram of the public feature extraction network structure. The invention adopts a convolutional network with a residual structure as the basic building unit of the public feature extraction network, using convolution kernels of sizes 1 × 1 and 3 × 3 and applying a stride-2 convolution before each residual unit to gradually reduce the scale of the feature map. As shown in Fig. 3(a), three feature maps of different scales, of sizes 52 × 52, 26 × 26 and 13 × 13, are extracted from residual units three, four and five respectively. The 13 × 13 feature map from residual unit five is upsampled and concatenated with the 26 × 26 feature map from residual unit four to obtain the final 26 × 26 feature map; the resulting 26 × 26 feature map is then upsampled and concatenated with the 52 × 52 feature map from residual unit three to obtain the final 52 × 52 feature map. A series of candidate frames is generated over the 3 feature maps.
Fig. 3(b) is a diagram of the residual unit structure in the public feature extraction network. The feature map from the previous layer is convolved by a 1 × 1 kernel, a 3 × 3 kernel and a 1 × 1 kernel in turn; the result is added element-wise to the feature map before convolution, and the summed feature map is output.
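In the spirit of Fig. 3(b), a residual unit computes F(x) + x. Below is a minimal sketch, with a generic `transform` callable standing in for the 1 × 1 → 3 × 3 → 1 × 1 convolution stack; the function name is illustrative only.

```python
def residual_unit(x, transform):
    """Residual unit: output = F(x) + x (element-wise shortcut addition).

    x         : list of floats (one flattened channel of a feature map)
    transform : callable F standing in for the convolution stack
    """
    fx = transform(x)
    # Shortcut connection: add the transformed features back onto the input.
    return [a + b for a, b in zip(fx, x)]
```

Because the shortcut passes the input straight through the addition, gradients can flow directly to earlier layers, which is what makes deep networks of such units easier and faster to train.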
Fig. 4 is a flow chart of a mine target detection network.
Step 1: generating candidate frames. The public feature extraction network generates 3 feature maps with scales 13 × 13, 26 × 26 and 52 × 52, and candidate frames are generated by a dimension clustering method, with 3 candidate frames per grid cell on each feature map. At scale 13 × 13 the receptive field of each grid cell is largest, suited to detecting large targets, and the 3 candidate frame sizes per cell are (116, 90), (156, 198) and (373, 326). At scale 26 × 26 the receptive field is moderate, suited to detecting medium-sized targets, with candidate frame sizes (30, 61), (62, 45) and (59, 119). At scale 52 × 52 the receptive field is smallest, suited to detecting small targets, with candidate frame sizes (10, 13), (16, 30) and (33, 23). Each candidate frame predicts three pieces of information: 1. the position of the frame, (t_x, t_y, b_h, b_w), where t_x, t_y are the predicted frame-centre coordinates and b_h, b_w its height and width; 2. a target-presence likelihood score; 3. the class information C_i. The mine target detection network performs convolutional prediction on the three feature maps with (4 + 1 + c) × k convolution kernels of size 1 × 1, where k is the number of candidate frames (k = 3, since 3 candidate frames are generated at each anchor point of each feature map) and c is the number of predicted target classes: 4k parameters predict the candidate-frame offsets, k parameters predict the probability that a candidate frame contains a target, and ck parameters predict the probabilities of the c target classes for the k preset bounding frames.
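The anchor sizes and channel count quoted above can be illustrated with a small sketch. `size_iou` is the dimension-clustering distance (IoU of width/height pairs treated as concentric boxes), `best_anchor` picks the scale and anchor that best fit a ground-truth box, and `head_channels` reproduces the (4 + 1 + c) × k channel count of the 1 × 1 prediction convolution; all function names are illustrative, not from the patent.

```python
# The 3 x 3 anchor sizes quoted in the text, grouped by feature-map scale.
ANCHORS = {
    13: [(116, 90), (156, 198), (373, 326)],   # large targets
    26: [(30, 61), (62, 45), (59, 119)],       # medium targets
    52: [(10, 13), (16, 30), (33, 23)],        # small targets
}

def size_iou(wh_a, wh_b):
    """IoU of two boxes assumed to share the same centre (dimension-cluster metric)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def best_anchor(gt_wh):
    """Return (scale, anchor) with the highest size-IoU for a ground-truth box."""
    candidates = [(s, a) for s, anchors in ANCHORS.items() for a in anchors]
    return max(candidates, key=lambda sa: size_iou(gt_wh, sa[1]))

def head_channels(num_classes, k=3):
    """Channels of the 1 x 1 prediction convolution: (4 offsets + 1 objectness + c classes) * k."""
    return (4 + 1 + num_classes) * k
```

For example, a wide 115 × 90 ground-truth box matches the (116, 90) anchor on the coarse 13 × 13 map, while a tiny 10 × 12 box matches (10, 13) on the fine 52 × 52 map, mirroring the large/small-target split described above.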
Step 2: candidate frame screening. The invention adopts a logistic regression method to score the target-presence likelihood of the region enclosed at each anchor point of a feature map, i.e. how likely that location is to contain a target. Logistic regression finds, among the 9 candidate frames of different sizes, the one with the highest target likelihood score, and only that candidate frame undergoes classification and position prediction; a non-optimal candidate frame is not predicted even if its score exceeds the set threshold. Because this step is performed before the classification and position prediction task, unnecessary candidate frames are removed and the amount of subsequent prediction computation is reduced.
Step 3: classification and position prediction. For classification prediction, a fully connected layer followed by softmax is adopted as the classification network structure; the number of neurons in the last fully connected layer equals the number of classes, and the probability of each class is output. For position prediction, the invention directly predicts relative positions: with (c_x, c_y) the top-left corner of the grid cell containing the candidate frame and (p_w, p_h) the candidate frame width and height, the predicted centre is expressed relative to the top-left corner of the grid cell, and the centre coordinates and width and height (b_x, b_y, b_w, b_h) of the prediction box are given by:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
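The position decoding can be sketched directly. The sigmoid centre offsets follow the formulas in the text; the exponential scaling of the prior width and height is the usual companion formulation and is an assumption of this sketch.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw offsets (tx, ty, tw, th) into a predicted box (bx, by, bw, bh).

    (cx, cy) is the top-left corner of the grid cell; (pw, ph) is the
    width and height of the prior (candidate frame).
    """
    bx = sigmoid(tx) + cx          # centre x, constrained inside the cell
    by = sigmoid(ty) + cy          # centre y, constrained inside the cell
    bw = pw * math.exp(tw)         # width scales the prior exponentially
    bh = ph * math.exp(th)         # height scales the prior exponentially
    return bx, by, bw, bh
```

With zero offsets the decoded box sits at the cell centre with exactly the prior's size, which is a handy sanity check on the formulas.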
Step 4: final screening. The prediction boxes obtained from classification prediction and position regression are screened by a non-maximum suppression method: redundant prediction boxes are removed and the prediction box closest to the real frame is retained, giving the final prediction result.
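Step 4 above corresponds to greedy non-maximum suppression, which can be sketched as follows; the 0.5 overlap threshold is an illustrative default, not a value from the patent.

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)           # keep the highest-scoring remaining box
        keep.append(i)
        # Drop every remaining box that overlaps it too strongly.
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

Two heavily overlapping detections of the same person collapse to the higher-scoring one, while a detection elsewhere in the image survives untouched.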
Fig. 5 is a flow chart of the mine text recognition network.
Step 1: extracting text feature maps. According to the text category prediction information obtained by the mine target detection network, all text feature maps are extracted from the feature maps produced by the public feature extraction network, and the text regions in the original image corresponding to the text feature maps are scaled to the same height at a fixed aspect ratio.
Step 2: extracting the feature sequence. The text feature map, scaled to a fixed height, is passed through convolutional layers for feature extraction, and a feature map of size (1, x, 512) is output, where x denotes the unfixed length of the text feature map and 512 denotes the 512 channels obtained by convolution with 512 convolution kernels. Each column of the feature map is extracted as one element of a feature vector sequence.
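Taking each column of the feature map as a feature vector can be sketched as follows; the channel-first nested-list layout (C, 1, x) and the function name are assumptions of this sketch, not from the patent.

```python
def to_feature_sequence(fmap):
    """Split a channel-first feature map of shape (C, 1, x) into column vectors.

    fmap : nested lists, fmap[c][0][j] = value of channel c at width position j.
    Returns a length-x sequence of C-dimensional vectors, in the left-to-right
    order in which columns are fed to the recurrent labelling module.
    """
    num_channels = len(fmap)
    width = len(fmap[0][0])
    return [[fmap[c][0][j] for c in range(num_channels)]
            for j in range(width)]
```

Each output vector gathers all channels at one horizontal position, so the recurrent network sees the text image as a left-to-right sequence of per-column descriptors.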
Step 3: sequence labeling. The network structure of the sequence labeling module adopts a recurrent network of gated recurrent units; the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output.
Step 4: sequence transcription. The sequence prediction information is transformed and redundant information is deleted to obtain the final text recognition result.
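"Deleting redundant information" in sequence transcription matches the usual CTC-style decoding: merge consecutive repeated labels, then drop the blank symbol. Below is a sketch under that assumption; the '-' blank character is illustrative.

```python
def ctc_collapse(labels, blank="-"):
    """CTC-style transcription: merge repeated labels, then remove blanks.

    labels : iterable of per-column predicted symbols (best label per step)
    blank  : the separator symbol emitted between characters (assumption)
    """
    out = []
    prev = None
    for ch in labels:
        # Keep a symbol only when it differs from its predecessor
        # and is not the blank separator.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

For instance, a per-column prediction like "--mm-ii-nn-ee--" transcribes to the text "mine", with the repeated frames and blanks removed as redundant.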
Obviously, it should be understood by those skilled in the art that, besides being applied to underground coal mine environments for the detection and identification of mine targets, the intelligent detection and identification method of the present invention and the above embodiments is applicable to the detection and identification of moving targets in non-coal and open-pit mines, such as metal and non-metal mines, and to the intelligent detection and identification of fully mechanized coal mining face operation equipment. The invention is not limited to the detection and identification of moving targets in coal mines; it also covers related technical fields such as the monitoring of moving targets in non-coal mines, open-pit mines, fully mechanized mining areas and working-face operation equipment, and the accurate identification of mine Internet-of-Things equipment.
While the invention has been described in detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A mine target intelligent detection and identification method is characterized by comprising the following steps:
Step 1: making a mine data set comprising a training set and a test set, wherein the mine data set consists of mine images acquired by video image acquisition equipment; each mine image contains several personnel, equipment and text objects; each personnel or equipment object is represented by 2 parameters (g_i, C_i), where g_i denotes the coordinate information of the object's real frame and C_i denotes the true category of the object; each text object is represented by 3 parameters (g_i, C_i, l_k), where l_k denotes the text sequence in the text target; real frames are generated at a certain step length over the whole image;
Step 2: establishing a mine network model, wherein the mine network model comprises a public feature extraction network, a mine target detection network and a mine text recognition network; the public feature extraction network extracts feature maps and inputs them into the mine target detection network and the mine text recognition network respectively; the mine target detection network outputs personnel, equipment and text prediction boxes; and the mine text recognition network takes the text prediction box information output by the mine target detection network and the feature maps output by the public feature extraction network as input and outputs text recognition results;
Step 3: network training, wherein the mine network model adopts an end-to-end training method, the public feature extraction network, the mine target detection network and the mine text recognition network of step 2 are trained using the training set of step 1, and prediction boxes for the personnel, equipment and text and a text recognition result for each text prediction box are output;
Step 4: network optimization, namely optimizing the parameters of the mine network model by a batch stochastic gradient descent method;
Step 5: target detection and identification, detecting the test set by using the trained mine network model, and outputting the personnel, equipment and text prediction boxes and the text recognition result of each text prediction box; the method is further characterized in that,
in the mine network model of step 2, the public feature extraction network adopts a convolutional network with a residual structure to generate feature maps of 3 different scales; and,
in the mine network model of step 2, the mine target detection network comprises a candidate box generation module, a candidate box category and position prediction module and a non-maximum suppression module, takes the feature maps generated by the public feature extraction network as input, and further comprises the following substeps:
2.1 the candidate box generation module generates 3 candidate boxes of different sizes on each grid cell of the 3 feature maps of different scales by a dimension clustering method;
2.2 the candidate box category and position prediction module calculates the target existence probability of the different candidate boxes on each grid cell, and performs category and position prediction for the candidate box with the highest target existence probability on each grid cell;
2.3 the non-maximum suppression module calculates the overlap ratio between the real box and each candidate box subjected to category and position prediction, and retains the prediction box closest to the real box to obtain the target detection result; and,
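The overlap computation and suppression described in substeps 2.1 to 2.3 can be sketched as follows. This is a minimal illustration assuming axis-aligned boxes in (x1, y1, x2, y2) form; the greedy suppression strategy and the function names are assumptions for exposition, not the patent's exact implementation.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2):
    # overlap area divided by union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression: repeatedly keep the
    # highest-scoring box and drop boxes overlapping it by more
    # than `thresh`, so only one prediction survives per target.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

With three candidate boxes where the first two overlap heavily, `nms` keeps only the higher-scoring of the pair plus the isolated third box.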
in the mine network model of step 2, the mine text recognition network comprises a text feature map extraction module, a feature sequence extraction module, a sequence labeling module and a sequence transcription module, takes the feature maps extracted by the public feature extraction network as input, and further comprises the following substeps:
2.4 the text feature map extraction module extracts all text feature maps from the feature maps extracted by the public feature extraction network according to the text prediction information obtained by the mine target detection network, and scales the text regions of the original image corresponding to the text feature maps to the same height while keeping a fixed aspect ratio;
2.5 the feature sequence extraction module performs convolutional feature extraction on the text feature maps obtained in step 2.4 and outputs a feature map of size (1, x, 512), where x represents the variable length of the text feature map and 512 represents the 512 channels obtained by convolution with 512 convolution kernels; each column of the extracted feature map is taken as one vector of the feature sequence;
2.6 the sequence labeling module adopts a recurrent network of gated recurrent units (GRU); the feature sequence is input into the sequence labeling module column by column, and sequence prediction information is output;
2.7 the sequence transcription module transforms the sequence prediction information and deletes redundant information to obtain the final text recognition result.
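The deletion of redundant information in substep 2.7 can be sketched as CTC-style greedy decoding: merge consecutive repeated per-column labels, then drop a reserved blank label. The claim does not name the transcription rule, so the CTC-style scheme, the `BLANK` index and the function name below are assumptions for illustration.

```python
BLANK = 0  # assumed index reserved for the "no character" label

def greedy_transcribe(frame_labels):
    """Collapse per-column label predictions into a final sequence:
    merge consecutive repeats, then drop the blank label (CTC-style)."""
    out, prev = [], None
    for lab in frame_labels:
        # emit a label only when it changes AND is not the blank,
        # so "h-ee-l-ll-o" style repetitions collapse to one symbol
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

A blank between two identical labels keeps them distinct, which is how a doubled character in the text survives the repeat-merging step.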
2. The method of claim 1, wherein the network optimization method of step 4 comprises:
iteratively updating the parameters of the mine network model by a batch stochastic gradient descent method and calculating the loss function of the mine network model, the loss function of the mine network model being calculated as:

L(W, b) = (1/N) Σ_i Σ_j [ α·I_ij^obj·L_coord(b_i, g_i) + α·I_ij^obj·L_conf(C_i, Ĉ_i) + β·I_ij^noobj·L_conf(C_i, Ĉ_i) ] + (1/N) Σ_i I_i^obj·L_cls(p_i(c), p̂_i(c)) + (γ/N) Σ_k L_seq(l_k, y_k)

wherein W and b are the parameters of the mine network model to be adjusted; N represents the number of mine target samples used in a single batch stochastic gradient descent step; α represents the weight of the loss terms computed when a cell contains a target; β represents the weight of the loss terms computed when a cell contains no target; γ represents the weight of the text sequence loss term; I_ij^obj takes the value 0 or 1 and indicates that the loss is computed for the j-th candidate box of the i-th cell only when that cell contains a target; I_ij^noobj takes the value 0 or 1 and indicates that the loss is computed for the j-th candidate box of the i-th cell only when that cell contains no target; I_i^obj takes the value 0 or 1 and indicates whether the i-th cell contains a target; b_i represents the coordinate information of the prediction box; g_i represents the coordinate information of the real box; C_i and Ĉ_i respectively represent the true class value and the predicted class value; p_i(c) and p̂_i(c) respectively represent the true class probability and the predicted class probability; L_coord is the position error loss function; L_conf is the class error loss function; L_cls is the class probability loss function; L_seq represents the text sequence loss function; l_k represents the true text sequence in a text box; and y_k represents the predicted sequence.
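The batch stochastic gradient descent of step 4 reduces to: average the per-sample loss gradients over a batch of N samples, then step each parameter against that average. The sketch below is a toy illustration under that reading; `sgd_step`, `grad_fn` and the scalar-parameter setup are assumptions, not the patent's implementation.

```python
def sgd_step(params, grad_fn, batch, lr=0.01):
    """One update of batch stochastic gradient descent.

    params  -- list of scalar parameters (stand-in for W and b)
    grad_fn -- grad_fn(params, sample) -> per-parameter gradient list
    batch   -- the N samples drawn for this step
    """
    grads = [grad_fn(params, sample) for sample in batch]
    n = len(batch)
    # average each parameter's gradient over the batch, then step downhill
    return [p - lr * sum(g[k] for g in grads) / n
            for k, p in enumerate(params)]
```

For example, with a single parameter w, samples (x, y) and squared-error loss (w*x - y)**2, `grad_fn` returns [2*(w*x - y)*x], and repeated `sgd_step` calls drive w toward the least-squares solution.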
3. The method of claim 1, wherein the text information embedded in the mine image is a valid sequence identifier carrying the identity of a recognition target.
4. The method of claim 1, wherein the method identifies personnel and equipment according to the recognized valid sequence identifiers, thereby realizing intelligent detection and identification of mine personnel, equipment and text.
CN201910847253.5A 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target Active CN110569843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910847253.5A CN110569843B (en) 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target

Publications (2)

Publication Number Publication Date
CN110569843A true CN110569843A (en) 2019-12-13
CN110569843B CN110569843B (en) 2022-02-08

Family

ID=68778447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910847253.5A Active CN110569843B (en) 2019-09-09 2019-09-09 Intelligent detection and identification method for mine target

Country Status (1)

Country Link
CN (1) CN110569843B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024899A1 (en) * 2014-06-19 2017-01-26 Bae Systems Information & Electronic Systems Integration Inc. Multi-source multi-modal activity recognition in aerial video surveillance
CN108182413A (en) * 2017-12-29 2018-06-19 中国矿业大学(北京) A kind of mine movable object detecting and tracking recognition methods
CN109241870A (en) * 2018-08-16 2019-01-18 中国矿业大学(北京) Coal mine down-hole personnel personal identification method based on Gait Recognition
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109685066A (en) * 2018-12-24 2019-04-26 中国矿业大学(北京) A kind of mine object detection and recognition method based on depth convolutional neural networks
CN109934279A (en) * 2019-03-07 2019-06-25 东喜和仪(珠海市)数据科技有限公司 The image-recognizing method of text sequence based on artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAOGUANG SHI ET AL.: "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
ZHANG FAN ET AL.: "Sparsity-adaptive image reconstruction method for mine intelligent surveillance", 《JOURNAL OF CHINA COAL SOCIETY (MEITAN XUEBAO)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992361A (en) * 2019-12-25 2020-04-10 创新奇智(成都)科技有限公司 Engine fastener detection system and detection method based on cost balance
CN111160255A (en) * 2019-12-30 2020-05-15 成都数之联科技有限公司 Fishing behavior identification method and system based on three-dimensional convolutional network
CN111160255B (en) * 2019-12-30 2022-07-29 成都数之联科技股份有限公司 Fishing behavior identification method and system based on three-dimensional convolution network
CN111553349A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN111553349B (en) * 2020-04-26 2023-04-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN113762292A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN113762292B (en) * 2020-06-03 2024-02-02 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN112686340A (en) * 2021-03-12 2021-04-20 成都点泽智能科技有限公司 Dense small target detection method based on deep neural network
CN113191273A (en) * 2021-04-30 2021-07-30 西安聚全网络科技有限公司 Oil field well site video target detection and identification method and system based on neural network
CN113569672A (en) * 2021-07-16 2021-10-29 国网电力科学研究院有限公司 Lightweight target detection and fault identification method, device and system

Also Published As

Publication number Publication date
CN110569843B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110569843B (en) Intelligent detection and identification method for mine target
CN110598736B (en) Power equipment infrared image fault positioning, identifying and predicting method
CN109492581B (en) Human body action recognition method based on TP-STG frame
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
US20180114071A1 (en) Method for analysing media content
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
CN108875624A (en) Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade
CN105512640A (en) Method for acquiring people flow on the basis of video sequence
CN113283282B (en) Weak supervision time sequence action detection method based on time domain semantic features
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN115719475B (en) Three-stage trackside equipment fault automatic detection method based on deep learning
Wu et al. Single shot multibox detector for vehicles and pedestrians detection and classification
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN115880529A (en) Method and system for classifying fine granularity of birds based on attention and decoupling knowledge distillation
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111626197B (en) Recognition method based on human behavior recognition network model
CN117423157A (en) Mine abnormal video action understanding method combining migration learning and regional invasion
Zhu et al. (Retracted) Transfer learning-based YOLOv3 model for road dense object detection
CN112487920B (en) Convolution neural network-based crossing behavior identification method
Rong et al. Weakly supervised text attention network for generating text proposals in scene images
CN114943873A (en) Method and device for classifying abnormal behaviors of construction site personnel
CN114663835A (en) Pedestrian tracking method, system, equipment and storage medium
KR20230060214A (en) Apparatus and Method for Tracking Person Image Based on Artificial Intelligence
Zhihao et al. Object detection algorithm based on dense connection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant