Disclosure of Invention
An embodiment of the invention provides a method for automatically detecting the quantity of pulp cargo during wharf loading and unloading.
To achieve the purpose of the invention, the following technical scheme is adopted:
The application relates to a method for automatically detecting the quantity of pulp cargo during wharf loading and unloading, characterized by comprising the following steps:
S1: extracting video images of a transport vehicle loaded with pulp cargo from a real-time video stream at a certain frequency;
S2: processing the extracted video image to segment a feature map of the pulp cargo portion;
S3: processing the feature map to obtain candidate connection points;
S4: extracting a series of candidate line segment samples from the candidate connection points;
S5: filtering the candidate line segment samples;
S6: counting the number of remaining candidate line segment samples to obtain the quantity of pulp cargo.
In the present application, step S2 includes the following:
S21: inputting the extracted video image into a residual network to obtain a feature image;
S22: processing the feature image to extract rectangular candidate boxes distinguishing foreground from background;
S23: mapping the extracted rectangular candidate boxes into the feature image, and unifying the window size of the rectangular candidate boxes using a region feature aggregation technique;
S24: segmenting the feature image into a feature map of the pulp cargo portion according to the coordinate information of the rectangular candidate boxes.
In the present application, S2 further includes the following step after S23 and before S24: S23': performing bounding-box regression on the size-unified rectangular candidate boxes of step S23 to correct their coordinate information.
In the present application, step S3 includes the steps of:
S31: dividing the feature map into M grid cells of size Wx × Hx;
S32: inputting the feature map sequentially into a convolutional layer and a classification layer to calculate the confidence of each grid cell, and converting the convolutional-layer output into a connection point offset feature map O(x):

O(x) = l_i − c_x,

where V denotes the set of connection points, l_i denotes the position of connection point i ∈ V within grid cell x, and c_x denotes the center position of grid cell x;
S33: applying a threshold to the calculated confidence of each grid cell to obtain a probability feature map P(x), and classifying whether a connection point exists in each grid cell;
S34: using the connection point offset feature map O(x) to predict the relative position of the connection point within the corresponding grid cell;
S35: optimizing the relative positions of the connection points in the corresponding grid cells using linear regression.
In the present application, step S3 further includes the following step: obtaining precise relative position information of the connection points within the grid cells by a non-maximum suppression technique.
In the present application, step S4 includes:
S41: outputting endpoint coordinate information for a series of line segment samples using a mixed positive and negative sampling mechanism;
S42: performing fixed-length vectorization on each line segment sample according to its endpoint coordinates to obtain a feature vector for each sample, thereby extracting a series of line segment samples.
In the present application, step S5 may specifically be: filtering the line segment samples using the intersection-over-union (IoU) between the areas of the rectangular frames formed by taking each line segment sample as a diagonal.
Alternatively, step S5 may specifically be: filtering the line segment samples using the Euclidean distance between line segment samples.
The automatic detection method for the quantity of pulp cargo has the following advantages and beneficial effects:
After the pulp cargo arrives at port, it is loaded onto a transport vehicle whose video stream is monitored in real time in a surveillance picture. Video images are extracted from the real-time stream, the pulp cargo portion is detected in each image, and, exploiting the characteristic that pulp cargo is bound into individual packages, line segment detection is then performed on the detected cargo portion, thereby detecting the cargo quantity. The whole process requires no manual participation, reducing the manual workload; it is intelligent and automated, and because the operation is carried out by computer, detection is fast and detection efficiency is improved.
Other features and advantages of the present invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
In order to avoid manually counting pulp cargo packages during wharf loading and unloading, the present application provides an automatic detection method for the quantity of pulp cargo. Its main task is to detect the pulp cargo packages on every vehicle passing through the surveillance video and calculate the number of packages they contain.
This task, automatic detection of the quantity of pulp cargo on a transport vehicle, is achieved by the combined use of target detection and line segment detection.
Target detection has two main tasks, classification and localization of the target: foreground objects must be separated from the background, and the category and position information of the foreground must be determined.
Current target detection algorithms can be divided into candidate-region-based and end-to-end methods, with candidate-region-based methods being superior in detection accuracy and localization precision. Meanwhile, because pulp cargo is placed irregularly, auxiliary information such as volume and midpoint is difficult to obtain, which makes further determining the cargo quantity harder.
However, pulp cargo has the characteristic of being bound bundle by bundle; combining this characteristic with current line segment detection technology, the quantity information can be extracted effectively by a line segment detection method.
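Under the assumption of already-trained detection and line-segment models, the combination of the two techniques over steps S1–S6 can be sketched as follows; every name here (`detector`, `junction_model`, `segment_model`, `overlap`) is a hypothetical placeholder, not part of the original disclosure:

```python
# Hypothetical high-level sketch of the S1-S6 pipeline described in the text.
def count_pulp_packages(frame, detector, junction_model, segment_model,
                        iou_threshold=0.5):
    feature_map = detector.segment_cargo(frame)        # S2: target detection
    junctions = junction_model.predict(feature_map)    # S3: candidate connection points
    segments = segment_model.propose(junctions)        # S4: candidate line segments
    kept = filter_overlaps(segments, iou_threshold)    # S5: filtering
    return len(kept)                                   # S6: count = cargo quantity

def filter_overlaps(segments, threshold):
    """Greedy duplicate removal: drop a segment overlapping an already-kept one."""
    kept = []
    for seg in segments:
        if all(overlap(seg, k) < threshold for k in kept):
            kept.append(seg)
    return kept

def overlap(s1, s2):
    """Placeholder overlap measure (e.g. the rectangle IoU of step S5)."""
    return 0.0  # assumption: the real measure is defined later in the text
```

The greedy filter mirrors the role of step S5: with a real overlap measure substituted, duplicate line segments are suppressed before counting.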
Referring to figs. 1 to 7, the implementation of the automatic pulp cargo quantity detection method is described in detail as follows.
S1: the video stream of the vehicle loaded with pulp cargo is extracted in real time at a certain frequency.
After arrival of the pulp cargo, it is loaded onto a transport means such as a truck, which is monitored in a surveillance picture; video images are captured by extracting frames from the monitored video stream at a certain number of frames per second, see fig. 2.
It should be noted that each package of pulp cargo loaded on the transport vehicle is substantially rectangular with substantially uniform height and cross-sectional area, and is bound into a package by straps (e.g., steel wires).
S2: and processing the extracted video image to segment a characteristic diagram of the pulp cargo part.
The task of step S2 is to implement target detection, and is performed in a plurality of sections as follows.
S21: the video image is input into a residual network to obtain a feature image.
To extract semantic features of the video image and improve the ability to capture fine image details, the video image is input into a residual network to obtain a feature map.
The residual network is an asymmetric encoder-decoder structure incorporating dilated convolution.
For convenience of subsequent processing, input video images of different sizes are first resized to a uniform 512 × 512 square and then passed sequentially through several encoding and decoding modules.
In each encoding-decoding module, the encoding part comprises 3 convolution operations with stride 2, each followed by 6 residual blocks, in which the convolution kernels use a dilation rate of 2 to obtain a larger receptive field. The decoding part restores the image to the input size.
This asymmetric encoder-decoder structure fully extracts details while fusing a larger receptive field, so the residual module provides rich detail features for subsequent processing.
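The receptive-field growth produced by the stride-2 convolutions and dilation-rate-2 residual blocks can be sketched numerically. This is an illustrative calculation only; it assumes 3 × 3 kernels and two convolutions per residual block, neither of which is specified in the text:

```python
# Sketch: receptive-field growth of the encoder described above.
# Assumptions (not stated in the text): 3x3 kernels, 2 convolutions per block.
def receptive_field(layers):
    """layers: list of (kernel, stride, dilation) tuples, input to output."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1      # effective kernel size under dilation
        rf += (k_eff - 1) * jump     # field grows by (k_eff - 1) * output spacing
        jump *= s                    # spacing of adjacent outputs, in input pixels
    return rf

# One encoding stage: a stride-2 convolution followed by 6 residual blocks
# (two dilated 3x3 convolutions each, dilation rate 2).
stage = [(3, 2, 1)] + [(3, 1, 2)] * 12
encoder = stage * 3                  # 3 stages, as stated in the text

print(receptive_field(encoder))
```

Under these assumptions the encoder's receptive field already exceeds the 512-pixel input side, which is consistent with the text's claim of a large fused receptive field.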
S22: and processing the characteristic image, and extracting a rectangular candidate frame for distinguishing the foreground from the background.
Firstly, generating 9 rectangular frames containing 3 shapes (the length-width ratio belongs to {1, 2 }), traversing each point in the characteristic image, matching the 9 rectangular frames for each point, judging the rectangular frame belonging to the foreground through a Softmax classifier, and solving a two-classification problem; and meanwhile, the coordinate information of the rectangular frame is corrected by using border regression of the frame to form a more accurate rectangular candidate frame.
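Generating the 9 rectangular boxes at one feature-map point can be sketched in the style of candidate-region detectors such as Faster R-CNN; the scale values below are illustrative assumptions, not taken from the original disclosure:

```python
import numpy as np

# Illustrative sketch of 9-anchor generation at one feature-map point.
# The scales are assumed values; the aspect ratios follow the text.
def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return 9 boxes (x1, y1, x2, y2) centred on point (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)       # width grows with the aspect ratio
            h = s / np.sqrt(r)       # height shrinks so that w * h == s * s
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = make_anchors(256.0, 256.0)
print(anchors.shape)  # (9, 4): 3 scales x 3 aspect ratios
```

Each of the 3 scales keeps a constant box area across the 3 ratios, so the classifier compares foreground fit over shape as well as size.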
Note that the foreground refers to the pulp cargo, and the background refers to a portion other than the pulp cargo portion.
S23: and mapping the extracted rectangular candidate frame into a feature image, and unifying the window size of the rectangular candidate frame by using a regional feature aggregation technology.
The region feature aggregation technique is proposed when used in Mask RCNN to generate a fixed-size feature map from the generated candidate box region pro-apparent map, which is a technique commonly used in the existing example segmentation architecture.
S24: and segmenting the characteristic image into a characteristic image of the pulp cargo part according to the coordinate information of the rectangular candidate frame.
And finding out the rectangular candidate frames belonging to the foreground according to the coordinate information of each rectangular candidate frame, thereby segmenting the characteristic diagram of the pulp cargo part.
In order to ensure the accuracy of the rectangular candidate frame and improve the automatic detection precision of the number of the paper pulp cargos, before the feature map of the paper pulp cargo part is segmented, the rectangular candidate frame obtained in the step S23 is subjected to sequential boundary frame regression, the coordinate information of the rectangular candidate frame is corrected, and the accuracy of the coordinate information is realized.
According to the rectangular candidate frame with accurate coordinate information, the characteristic image is divided into a characteristic diagram of the pulp cargo part according to the coordinate information of the rectangular candidate frame.
And (5) segmenting a characteristic diagram of the pulp cargo part, namely completing target detection.
Then, the line segment detection is needed to be carried out on the characteristic diagram of the pulp cargo part so as to detect the quantity of the pulp cargo in the video image.
The specific line segment detection section is described below with reference to fig. 1 to 7.
S3: and processing the characteristic graph to obtain candidate connection points.
The characteristic diagram here refers to the characteristic diagram of the pulp cargo portion divided in S24.
S31: the feature map is subjected to grid division to formMA grid cellWx×Hx。
Performing gridding treatment on the feature mapW×HIs divided intoMA grid cell having a grid area ofWx×HxWhere V represents a set of points.
In a certain grid cellxIn the method, it is required to predict whether a candidate connection point exists, and if a connection point exists, predict that the connection point exists in the grid cellxRelative position in (2).
S32: and sequentially inputting the feature maps into a convolutional layer and a classification layer for processing so as to calculate the confidence coefficient of each grid unit, and converting the feature maps processed by the convolutional layer into a connection point offset feature map O (x).
Specifically, the feature maps are processed using a network comprising 1 x 1 convolutional layers and classification layers in which confidence is calculated by a softmax classification function as to whether there is a connection point in each grid cell.
And converting the characteristic diagram into a characteristic diagram O (with offset of connection points) by using a network containing 1 × 1 convolution layersx) The following:
wherein,
lirepresenting a connection point in the set of points V
iIn the grid cell
xIn the position (a) of (b),
representing grid cells
xThe center position of (a).
S33: performing threshold value limitation on the calculated confidence of each grid unit to obtain a probability feature map P (x) And for each grid cell isWhether a connection point exists is classified.
Probability feature map P (x) The following were used:
that is, whether there is a connection point in the grid cell is a two-classification problem.
Limiting the calculated confidence of each grid unit by a threshold value p if the grid unit isxIf the confidence of (2) is greater than the threshold, then P (3) is satisfiedx) =1, consider the grid cellxIn (b) there is a connection point, otherwise P: (b)x) =0, consider the grid cellxThere are no connection points.
If the presence of a connection point in a grid cell is predicted, the relative location of the connection point in the grid cell continues to be predicted (this section is described in detail below).
S34: using a connection point offset profile O (x) The relative position of the connection point in the corresponding grid cell is predicted.
O(x) Arranged as the center point and connection point of the grid celliFor predicting the connection pointiIn the grid cellxRelative position in (2).
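The construction of the probability map P(x) and offset map O(x) from a set of connection points can be sketched as follows; the grid size, cell size and point coordinates are illustrative values, not taken from the disclosure:

```python
import numpy as np

# Illustrative sketch of building P(x) and O(x) targets on a grid.
def junction_maps(points, grid=(8, 8), cell=16.0):
    """points: (N, 2) array of (x, y) junction coordinates in pixels.
    Returns P (per-cell existence map) and O (offset of each junction
    from its cell centre, in pixels)."""
    P = np.zeros(grid)
    O = np.zeros(grid + (2,))
    for x, y in points:
        j, i = int(x // cell), int(y // cell)   # cell indices (col, row)
        cx = (j + 0.5) * cell                   # cell centre c_x
        cy = (i + 0.5) * cell
        P[i, j] = 1.0                           # P(x): a junction exists here
        O[i, j] = (x - cx, y - cy)              # O(x) = l_i - c_x
    return P, O

P, O = junction_maps(np.array([[20.0, 36.0]]))
print(P[2, 1], O[2, 1])   # cell (row 2, col 1) has centre (24, 40): offset (-4, -4)
```

At inference time the same relationship is applied in reverse: a cell with P(x) = 1 recovers its junction position as cell centre plus the predicted offset.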
S35: the relative positions of the connection points in the corresponding grid cells are optimized using linear regression.
If the grid unit x comprises the connection point i, an L2 linear regression is selected to optimize the relative position of the connection point, and the objective function of the L2 linear regression is as follows:
whereinNvIndicating the number of connection points.
In addition, a non-maximum suppression technique is employed to further eliminate non-connection points in each grid cell, i.e., to obtain more accurate relative position information of the connection points in the grid cells.
In the procedure, this may be implemented by a max-pooling operation, which keeps only locally maximal responses and thereby yields more accurate relative position information of the connection points in the grid cells.
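A minimal sketch of this max-pooling form of non-maximum suppression on a confidence map, with illustrative values: a cell survives only if it is the maximum of its 3 × 3 neighbourhood.

```python
import numpy as np

# Sketch: non-maximum suppression via 3x3 max pooling on a score map.
def nms_maxpool(scores):
    H, W = scores.shape
    padded = np.pad(scores, 1, constant_values=-np.inf)
    # 3x3 max pooling with stride 1: max over the 9 shifted views
    pooled = np.max(
        [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    # keep a score only where it equals its neighbourhood maximum
    return np.where(scores == pooled, scores, 0.0)

scores = np.array([[0.1, 0.9, 0.8],
                   [0.2, 0.3, 0.1],
                   [0.7, 0.1, 0.0]])
print(nms_maxpool(scores))
```

Here 0.8 is suppressed because its neighbour 0.9 is larger, while 0.9 and 0.7 survive as local maxima.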
After the processing of S31 to S35 described above, the relative positions of the K candidate connection points with the highest confidence are finally output; refer to fig. 3.
It should be noted that before step S3 is actually executed, the whole model in S3 must be trained with a cross-entropy loss function. In actual use the trained model can be used directly: with the feature map as input, it outputs the candidate connection points as described above.
S4: and extracting a series of candidate line segment samples according to the candidate connecting points acquired in the S3.
The purpose of this step is to select K candidate connection points based on the K candidate connection points obtained in S3
Obtaining T candidate line segment samples
Wherein
And
is shown as
zThe coordinates of the endpoints of the sample of candidate line segments.
S41: and acquiring the endpoint coordinate information of the T candidate line segment samples by adopting a positive and negative sample mixed sampling mechanism.
It should be noted that the mixed sampling of positive and negative samples is a preparation work for model training, in the training process, the difference between the number of positive and negative samples of the K candidate connection points is large, the number of positive and negative samples needs to be balanced, and a mixed training mode of positive and negative samples is adopted, wherein the positive samples come from the labeled true line segment samples, and the negative samples are unreal line segment samples generated randomly through heuristic learning.
When there are few accurate positive samples or training is saturated in the extracted K candidate connection points, quantitative positive/negative samples are added to help start training. Moreover, the added positive samples help the prediction points to adjust the positions, and the prediction performance is improved.
S41: and performing fixed-length vectorization processing on each line sample according to the endpoint coordinate information of each line sample to obtain a feature vector of each line sample so as to extract a series of candidate line samples.
Based on some sample of candidate line segments, e.g. of
zTwo endpoint coordinates of candidate line segment sample
And
vectorization processing of fixed length for line segment sample, i.e. calculation by two end point coordinates
N l Uniformly distributing points, and obtaining coordinates of intermediate points on the feature map output in the step S2 through bilinear interpolation:
thus, a feature vector q of line segment samples is extracted, which isC×N l In whichCThe number of channels of the feature map output in step S2.
At this time, a sample of candidate line segments is extracted.
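The fixed-length vectorization above can be sketched as follows: N_l points are sampled uniformly between the two endpoints and the feature map is read at each point by bilinear interpolation. The feature map and endpoints below are illustrative values:

```python
import numpy as np

# Sketch: fixed-length vectorization of one line segment sample.
def segment_vector(fmap, p1, p2, n_points=8):
    """fmap: (C, H, W) feature map; p1, p2: (x, y) segment endpoints.
    Returns a (C, n_points) feature vector q for the segment."""
    C, H, W = fmap.shape
    t = np.linspace(0.0, 1.0, n_points)
    xs = p1[0] + t * (p2[0] - p1[0])              # N_l uniformly spaced points
    ys = p1[1] + t * (p2[1] - p1[1])
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    fx, fy = xs - x0, ys - y0                     # fractional offsets
    # bilinear interpolation over the four surrounding pixels
    q = (fmap[:, y0, x0] * (1 - fx) * (1 - fy)
         + fmap[:, y0, x0 + 1] * fx * (1 - fy)
         + fmap[:, y0 + 1, x0] * (1 - fx) * fy
         + fmap[:, y0 + 1, x0 + 1] * fx * fy)
    return q

fmap = np.arange(64, dtype=float).reshape(1, 8, 8)
q = segment_vector(fmap, (0.0, 0.0), (7.0, 0.0), n_points=8)
print(q)   # samples the linear ramp 0..7 along the top row
```

Because the number of sampled points N_l is fixed, every segment, whatever its length, yields a feature vector of the same size C × N_l for the later classifier.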
It should be noted that before a series of candidate line segment samples can be obtained from the endpoint coordinate information, the model must be trained.
During training, the feature map output in step S2 and the candidate line segment samples output in step S4 are required as input.
The training process is briefly described as follows:
first, based on a sample of candidate line segments, e.g. the first
zCoordinates of two endpoints of each candidate line segment sample
And
vectorization processing of fixed length for line segment sample, i.e. calculation by two end point coordinates
N l Uniformly distributing the points, and obtaining the coordinates of the intermediate points on the characteristic diagram output in the step S2 through bilinear interpolation:
then, the feature vector q is dimensionality reduced by a one-dimensional maximum pooling operation of step size s to becomeC×N l And/s and is expanded into a one-dimensional feature vector.
Inputting the one-dimensional feature vector into a full-connection layer for convolution processing to obtain a logic value, specifically, after performing full-connection convolution twice on the one-dimensional feature vector, taking a log value and returning to the full-connection layer
。
And true value
ySigmoid loss calculation and model optimization are carried out together to improve the prediction accuracy, wherein the loss function is as follows:
wherein the true valueyThat is, after the feature values in the feature map output in S2 by the true labels of the line segment samples are convolved, the log is taken and the value is returned.
The penalty is the log of the calculated prediction (i.e., the
) And the error between the log values (namely y) corresponding to the real labels of the line segment samples is used for model training and optimization.
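A numeric sketch of this sigmoid (binary cross-entropy) loss; the logits and labels below are made-up illustrative values:

```python
import numpy as np

# Sketch: sigmoid / binary cross-entropy loss over candidate segments.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(logits, labels):
    p = sigmoid(logits)                       # predicted probability per segment
    return float(np.mean(-(labels * np.log(p) + (1 - labels) * np.log(1 - p))))

logits = np.array([4.0, -3.0, 0.0])   # predicted logit per candidate segment
labels = np.array([1.0, 0.0, 1.0])    # 1 = true line segment, 0 = negative sample
print(round(bce_loss(logits, labels), 4))
```

Confident correct predictions (the first two entries) contribute little loss, while the uncertain third entry dominates and drives the gradient during training.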
Since repeated detection inevitably occurs, i.e., two line segment samples overlap, the line segment samples output in S4 need to be filtered in order to improve detection accuracy.
The filtering is implemented by step S5: each line segment sample is filtered either using the Intersection-over-Union (IoU) between the areas of the rectangular frames formed by taking each line segment as a diagonal, or using the Euclidean distance between line segment samples.
The filtering method is not limited here, as long as the purpose of filtering is achieved.
Filtering line segment samples by IoU is taken as an example.
Suppose the coordinates of two coincident line segment samples are L1 [(x11, y11), (x12, y12)] and L2 [(x21, y21), (x22, y22)].
The length H1, width W1 and area A1 of the rectangular frame R1 formed by L1, and the length H2, width W2 and area A2 of the rectangular frame R2 formed by L2, are respectively:

H1 = |y12 − y11|, W1 = |x12 − x11|, A1 = H1 · W1;
H2 = |y22 − y21|, W2 = |x22 − x21|, A2 = H2 · W2.

The length H, width W and area A of the rectangle where R1 and R2 intersect are respectively:

W = min(max(x11, x12), max(x21, x22)) − max(min(x11, x12), min(x21, x22)),
H = min(max(y11, y12), max(y21, y22)) − max(min(y11, y12), min(y21, y22)),
A = H · W.

If W ≤ 0 or H ≤ 0, then IoU = 0. Otherwise, the IoU is calculated using the following equation:

IoU = A / (A1 + A2 − A).

A threshold on the IoU is set and adjusted, filtering is performed, and the filtered line segment samples are finally output, as shown in fig. 4.
S6: and counting the number of the line segment samples after filtering to obtain the number of the pulp cargos.
The number of line segment samples after filtering is the number of pulp acquisitions.
Fig. 5 is a final detection diagram of the line segment candidate samples in fig. 4 mapped onto the original paper pulp goods image, and it can be seen from the detection effect that the automatic detection of the number of the paper pulp goods is accurate.
Referring to fig. 6 and 7, fig. 6 is a truncated original video image; fig. 7 is a diagram showing the effect of the pulp cargo field inspection applied to the video image in fig. 6 by using the automatic pulp cargo quantity detection method proposed in the present application, which outputs the quantity of line segment samples, i.e., the pulp cargo quantity.
As can be seen from the detection effect graphs of FIG. 5 and FIG. 7, the automatic detection method for the quantity of the paper pulp cargos, which is provided by the application, has high detection accuracy.
The automatic detection method for the quantity of the paper pulp cargos directly extracts the video images from the video stream for detection, is carried out automatically, does not need manual participation, reduces the manual task amount, and is high in detection speed due to intelligent computer calculation, and the detection efficiency is improved.
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.