CN113269188B - Mark point and pixel coordinate detection method thereof


Info

Publication number
CN113269188B
CN113269188B (application CN202110674621.8A)
Authority
CN
China
Prior art keywords
mark point
detection
detection frame
branch
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110674621.8A
Other languages
Chinese (zh)
Other versions
CN113269188A (en)
Inventor
赵祚喜
黄渊
朱裕昌
黎源鸿
邱志
罗阳帆
谢超世
张壮壮
曹阳阳
林旭
项波瑞
杨厚城
罗舒元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN202110674621.8A
Publication of CN113269188A
Application granted
Publication of CN113269188B
Legal status: Active
Anticipated expiration: not listed

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/09 Recognition of logos

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a general method for detecting mark points and their pixel coordinates, comprising the following steps. Step one: acquire mark-point images with a camera and label them to produce a data set. Step two: build a mark-point detection model and train it on the data set. Step three: capture a detection image, add a detection frame to the image, and let the classification branch of the mark-point detection model judge whether the detection frame contains a mark point. Step four: if the classification branch judges that a mark point is present in the detection frame, the regression branch of the detection model predicts and outputs the pixel coordinates of the mark-point center; if the classification branch judges that no mark point is present, the detection frame is enlarged and the regression branch then predicts the pixel coordinates of the mark-point center. The invention can detect a variety of simple mark points and obtain the pixel coordinates of their centers, has strong universality, and can be combined with various target-tracking algorithms to detect moving mark points. It belongs to the technical field of image processing.

Description

Mark point and pixel coordinate detection method thereof
Technical Field
The invention relates to the technical field of image processing, and in particular to a mark point and a method for detecting its pixel coordinates.
Background
Identifying mark points and detecting their pixel coordinates are important technologies that are widely applied in production, manufacturing, engineering measurement, and other industries.
Traditional image-based mark-point identification algorithms for inertial navigation information reduce the influence of noise to a certain extent, but they are inefficient, see little industrial application, are costly, and generalize poorly. Digital mark-point identification methods identify mark points mainly through digital calculation; they are simple, fast, and highly noise-resistant, but they impose strict specifications on the mark points, require mark points made of special materials, apply only to specific occasions, and have poor universality and practicability. Identification algorithms based on pattern matching involve a complicated and time-consuming recognition process.
Convolutional neural networks have made breakthrough progress in classification and identification in recent years, and the inventors found that they are well suited to mark-point identification and pixel-coordinate detection, offering strong universality, high detection speed, high accuracy, strong robustness, and easy combination with other algorithms.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides a mark point and a method for detecting its pixel coordinates, intended to solve the problems of speed, accuracy, robustness, and universality of mark-point identification in engineering, as well as the accuracy of extracting the corresponding pixel coordinates.
To achieve this purpose, the invention adopts the following technical scheme:
a mark point and a pixel coordinate detection method thereof comprise the following steps:
the method comprises the following steps: acquiring a mark point image and marking a label by a camera to manufacture a data set;
step two: constructing a mark point detection model based on a convolutional neural network, and training by a data set;
step three: collecting a detection image, adding a detection frame in the image, and judging whether the detection frame contains a mark point or not by a mark point detection model classification branch;
step four: if the mark point detection model classification branch judges that the mark point exists in the detection frame, the mark point detection model regresses the pixel coordinate of the branch prediction mark point center and outputs the pixel coordinate; if the classified branch of the mark point detection model judges that no mark point exists in the detection frame, the detection frame is expanded, and then the mark point detection model is used for returning the pixel coordinates of the center of the branch prediction mark point.
Preferably, in step one, the label format of each image in the data set is (0/1, u, v), where 0/1 takes the value 0 or 1: 0 means no mark point is present, 1 means a mark point is present, and (u, v) are the pixel coordinates of the mark-point center. If the image contains no mark point, (u, v) is set to (0, 0).
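As an illustration, a few labels in this format might look as follows (a minimal sketch in Python; the coordinate values are invented placeholders, not from the patent):

    # (present, u, v): present = 1 with the centre coordinates; 0 with (u, v) = (0, 0)
    labels = [
        (1, 17.5, 14.2),  # patch contains a mark point centred at pixel (17.5, 14.2)
        (0, 0.0, 0.0),    # patch contains no mark point
    ]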
Preferably, in step two, the mark-point detection model is based on the LeNet-5 model: it adopts a feature extractor consistent with LeNet-5 and is provided with two branches, a classification branch and a regression branch. The two branches share the same feature extractor, and each branch consists of 2 fully connected layers with the same number of neurons: the first fully connected layer has 84 neurons and the second has 2.
Preferably, the position and size of the added detection frame are represented by an array of 4 elements (u_ROI, v_ROI, h_ROI, w_ROI), where (u_ROI, v_ROI) are the pixel coordinates of the top-left corner of the detection frame in the whole image, and h_ROI and w_ROI are the height and width of the detection frame, respectively, in pixels.
Preferably, in step three, the classification branch of the mark-point detection model is used to judge whether a mark point is present in the detection frame. A Softmax operation is applied before the branch output so that the result conforms to a probability distribution. The output is $(\hat{y}_1, \hat{y}_2)$, where $\hat{y}_1$ and $\hat{y}_2$ are the probabilities that a mark point is present or absent in the detection frame, respectively, with $\hat{y}_1 + \hat{y}_2 = 1$; the larger of the two is taken as the model prediction, $\hat{y} = \max(\hat{y}_1, \hat{y}_2)$.
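A minimal sketch of this decision rule in Python (NumPy assumed; the logit values and the class ordering are placeholders):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())     # subtract the max for numerical stability
        return e / e.sum()

    logits = np.array([2.3, -1.1])  # raw output of the classification branch
    y_hat = softmax(logits)         # two probabilities summing to 1
    present = y_hat.argmax() == 0   # larger probability taken as the prediction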
Preferably, in step four, the regression branch of the mark-point detection model is used to predict the pixel coordinates of the mark-point center inside the detection frame. The output of the regression branch is $(\hat{u}, \hat{v})$, the pixel coordinates of the mark-point center in the detection frame, and the model prediction result is $(\hat{u}, \hat{v})$.
Preferably, in step four, the detection frame is enlarged along its height and width directions with its top-left corner as the origin; the processed image inside the detection frame is passed directly to the regression branch of the mark-point detection model, and after the regression branch completes its prediction, the detection frame is restored to its pre-enlargement size, centered on the predicted pixel coordinates.
Preferably, the mark-point pattern in the detection image of step three is the same as the mark-point pattern used in step one.
Preferably, in step three, a camera captures video of a moving object with mark points pasted on its surface; several detection frames are added manually in the 1st frame of the video, the position of each detection frame in every subsequent frame is updated by the MOSSE target-tracking algorithm, and the classification branch of the mark-point detection model judges whether each detection frame contains a mark point.
Preferably, in step four, the detection frame is enlarged; if no mark point is found after one enlargement, detection is stopped and a detection frame is added manually to the detection image, after which the method returns to step three and the classification branch of the mark-point detection model judges whether the detection frame contains a mark point.
The principle of the invention is as follows: first, mark-point images are captured with a camera and labeled to produce a data set; a mark-point detection model is then built and trained on the data set; next, a detection image is captured and a detection frame is added to it; finally, the mark-point detection model judges whether a mark point is present in the detection frame and outputs the pixel coordinates of the mark-point center.
In general, the present invention has the following advantages:
1. The algorithm has strong universality: mark points are detected with a convolutional neural network, the method still applies when the mark points differ, and dynamic mark-point detection can be achieved by combining it with various target-tracking algorithms.
2. The algorithm has strong robustness: mark points that are occluded, poorly imaged, or shot from different angles can still be detected, and detection is fast.
3. The method extracts mark-point pixel coordinates with high accuracy, and can extract pixel coordinates within the mark point as required.
Drawings
Fig. 1 is a schematic view of the main flow of the method of the present invention.
Fig. 2 is a diagram showing the structure of the mark-point detection model.
Fig. 3 shows the mark-point pattern used in this example.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments.
A mark point and a pixel coordinate detection method thereof comprise the following steps:
Step one: acquire mark-point images with a camera and label them to produce a data set.
In this step, a number of images containing mark points are shot with a camera (the mark-point pattern is shown in Fig. 3); 32 × 32 grayscale images are then cropped with a screenshot tool and their labels are prepared. In this example, 1300 grayscale images of size 32 × 32 were collected as the data set.
Step two: build a mark-point detection model based on a convolutional neural network and train it on the data set.
In this step, the input layer takes a 32 × 32 grayscale image, the activation functions of the convolutional and fully connected layers are ReLU functions, and the optimizer used for training is the Adam optimizer. Batch normalization is added after every convolutional layer, and a Dropout layer that discards 20% of its activations is added after every pooling layer to prevent overfitting during training and improve the model's generalization ability. During training, the classification branch uses a sparse cross-entropy function as its loss function, given by:
$$C = -\frac{1}{n}\sum_{x}\left[\,y \ln a + (1-y)\ln(1-a)\,\right]$$
the regression branch uses the mean square error function as a loss function in order to minimize the distance between the predicted pixel coordinates and the true pixel coordinates. The formula is as follows:
$$C = \frac{1}{2n}\sum_{x}\lVert y - a\rVert^{2}$$
In the two equations above, C is the loss function, y is the true value, a is the predicted value, x denotes a sample, and n is the total number of samples.
As shown in Fig. 2, the feature extractor of the mark-point detection model has 5 layers in total: 3 convolutional layers and 2 pooling layers. All convolution kernels are 5 × 5 and all pooling windows are 2 × 2. The C1 feature map is 28 × 28 with 6 channels, the S2 feature map is 14 × 14 with 6 channels, the C3 feature map is 10 × 10 with 16 channels, the S4 feature map is 5 × 5 with 16 channels, and the output of the C5 layer is 1 × 120. The classification branch and the regression branch of the detection model each consist of 2 fully connected layers with the same number of neurons: the first fully connected layer has 84 neurons and the second has 2.
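To make the layer sizes concrete, the following is a minimal sketch of the two-branch model in PyTorch; the framework choice and module wiring are assumptions, since the patent fixes only the architecture described above:

    import torch
    import torch.nn as nn

    class MarkPointNet(nn.Module):
        def __init__(self):
            super().__init__()
            # LeNet-5-style feature extractor: C1 -> S2 -> C3 -> S4 -> C5
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, 5),                # C1: 1x32x32 -> 6x28x28
                nn.BatchNorm2d(6), nn.ReLU(),
                nn.MaxPool2d(2), nn.Dropout(0.2),  # S2: -> 6x14x14
                nn.Conv2d(6, 16, 5),               # C3: -> 16x10x10
                nn.BatchNorm2d(16), nn.ReLU(),
                nn.MaxPool2d(2), nn.Dropout(0.2),  # S4: -> 16x5x5
                nn.Conv2d(16, 120, 5), nn.ReLU(),  # C5: -> 120x1x1
                nn.Flatten(),
            )
            # The shared extractor feeds two 84 -> 2 heads
            self.cls_head = nn.Sequential(nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, 2))
            self.reg_head = nn.Sequential(nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, 2))

        def forward(self, x):
            f = self.features(x)
            return self.cls_head(f), self.reg_head(f)

    model = MarkPointNet()
    logits, coords = model(torch.randn(1, 1, 32, 32))  # one 32x32 grayscale patch

In this sketch, nn.CrossEntropyLoss on the classification logits and nn.MSELoss on the predicted coordinates would play the roles of the two loss functions given above.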
Step three: capture a detection video, add detection frames in the first frame of the video, and let the classification branch of the mark-point detection model judge whether each detection frame contains a mark point.
In this step, a camera captures video of a moving object; the video resolution is 1280 × 800, the frame rate is 30 fps, and the video is 80 frames long. The moving object moves irregularly, and 10 mark points are pasted on its surface. The mark-point pattern is the same as in the training set: as shown in Fig. 3, circular mark points are used, each circle divided into four equal sectors filled alternately with black and white. In Python, the video of the moving object is opened with the relevant functions of the OpenCV library, and 10 detection frames are added manually in the 1st frame of the video. Each detection frame added in the first frame must contain a mark-point center, and its height and width must both be larger than 32 pixels; the image inside each detection frame is then compressed to a height and width of 32 pixels. The position of each detection frame in every subsequent frame is updated by the MOSSE target-tracking algorithm, and the classification branch of the mark-point detection model judges whether each detection frame contains a mark point.
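The tracking loop described here could be sketched as follows; the video file name and the initial box are illustrative placeholders, and the MOSSE tracker requires the opencv-contrib-python package:

    import cv2

    cap = cv2.VideoCapture("moving_object.avi")           # hypothetical video file
    ok, frame = cap.read()                                # frame 1: boxes added manually
    boxes = [(100, 120, 40, 40)]                          # (x, y, w, h), one per mark point
    trackers = []
    for box in boxes:
        tracker = cv2.legacy.TrackerMOSSE_create()
        tracker.init(frame, box)
        trackers.append(tracker)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for tracker in trackers:
            found, (x, y, w, h) = tracker.update(frame)   # MOSSE updates the box position
            if not found:
                continue
            roi = frame[int(y):int(y + h), int(x):int(x + w)]
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            patch = cv2.resize(gray, (32, 32))            # 32x32 input for the detection model
            # patch is then passed to the classification branch of the model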
Step four: if the classification branch of the mark-point detection model judges that a mark point is present in the detection frame, the regression branch predicts and outputs the pixel coordinates of the mark-point center; if the classification branch judges that no mark point is present in the detection frame, the detection frame is enlarged and the regression branch then predicts the pixel coordinates of the mark-point center.
If no mark point is found in a detection frame in this step, the detection frame is enlarged to 2 times its size along both the height and width directions, with its top-left corner as the origin, i.e. its area is enlarged to 4 times; the height and width of the detection frame then become 74 pixels. If a mark point still cannot be found after one enlargement, detection is stopped and a detection frame is added manually in that frame. The regression branch of the mark-point detection model then predicts the pixel coordinates of the mark-point center. After the regression branch outputs the pixel coordinates, the detection frame is restored to its pre-enlargement size, centered on the predicted coordinates. Let the predicted mark-point center be (u, v), and let h_ROI and w_ROI be the height and width of the detection frame before enlargement; the pixel coordinates (u', v') of the top-left corner of the restored detection frame are then:

$$u' = u - \frac{w_{ROI}}{2}, \qquad v' = v - \frac{h_{ROI}}{2}$$
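A short sketch of this enlarge-and-restore logic, assuming the reconstruction of the formula above:

    def enlarge_box(u_roi, v_roi, h_roi, w_roi, factor=2):
        # enlarge along height and width, keeping the top-left corner as the origin
        return u_roi, v_roi, h_roi * factor, w_roi * factor

    def restore_box(u_pred, v_pred, h_roi, w_roi):
        # restore the pre-enlargement size, centred on the predicted coordinates
        return u_pred - w_roi / 2, v_pred - h_roi / 2, h_roi, w_roi

    box = enlarge_box(100, 120, 37, 37)      # -> (100, 120, 74, 74)
    box = restore_box(118.0, 141.0, 37, 37)  # -> (99.5, 122.5, 37, 37)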
Following the operations of the previous steps, the mark-point detection model extracts the pixel coordinates of the mark-point centers, yielding data for 80 frames in total, with 10 mark points per frame. From the 80 frames of predicted pixel coordinates, 20 frames were selected at random; these are the predicted values, while the true values were obtained with a mature high-speed 3D photogrammetry system. The average relative error of each mark point in the 20 frames in the u and v directions was then computed; as shown in Table 1, the relative error of most predictions of the method is below 0.87%.
Table 1. Mean relative error of the method of the invention (data table not reproduced)
The average Euclidean distance between the predicted and true mark-point centers is computed with the following formula, where (u, v) are the true pixel coordinates of the mark-point center and $(\hat{u}, \hat{v})$ are the center coordinates predicted by the mark-point detection model:

$$d = \sqrt{(u - \hat{u})^{2} + (v - \hat{v})^{2}}$$
With this formula, the average Euclidean distance between the predicted and true values of each mark point in the 20 frames was obtained; for most predictions the Euclidean distance to the true value is smaller than 4 pixels, and the overall average Euclidean distance in this example is 2.74 pixels.
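The two error measures used here could be computed as follows (a sketch with NumPy; the arrays are placeholders, not the measured data behind Tables 1 and 2):

    import numpy as np

    true_uv = np.array([[640.0, 400.0], [512.0, 300.0]])   # photogrammetry ground truth
    pred_uv = np.array([[643.2, 398.5], [510.9, 302.1]])   # model predictions

    rel_err = np.abs(pred_uv - true_uv) / np.abs(true_uv)  # relative error in u and v
    print(rel_err.mean(axis=0) * 100)                      # mean relative error (%) per axis

    dist = np.linalg.norm(pred_uv - true_uv, axis=1)       # per-point Euclidean distance
    print(dist.mean())                                     # average distance in pixels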
Table 2. Mean Euclidean distance of the method of the invention (data table not reproduced)
In summary, in most cases the relative error of the mark-point center pixel coordinates detected by the invention is below 0.87%, and the Euclidean distance is below 4 pixels.
Mark points are identified with a convolutional neural network: a mark-point data set can be produced and the model trained as required, enabling detection of the desired mark points, and mark-point tracking can be achieved by combining the method with various target-tracking algorithms, so the method has strong universality. The identification algorithm of the invention is highly robust: mark points that are occluded, poorly imaged, or shot from different angles can still be detected, detection is fast, the pixel coordinates of the mark points are extracted with high accuracy, and pixel coordinates within the mark point can be extracted as required.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention should be construed as an equivalent thereof and is intended to fall within the scope of the present invention.

Claims (2)

1. A mark point and a pixel coordinate detection method thereof, characterized in that the method comprises the following steps:
Step one: acquiring mark-point images with a camera and labeling them to produce a data set;
Step two: building a mark-point detection model based on a convolutional neural network and training it on the data set;
Step three: capturing a detection image, adding a detection frame to the image, and letting the classification branch of the mark-point detection model judge whether the detection frame contains a mark point;
Step four: if the classification branch of the mark-point detection model judges that a mark point is present in the detection frame, the regression branch of the detection model predicting and outputting the pixel coordinates of the mark-point center; if the classification branch judges that no mark point is present in the detection frame, enlarging the detection frame and then predicting the pixel coordinates of the mark-point center with the regression branch;
in step one, the label format of each image in the data set is (0/1, u, v), where 0/1 takes the value 0 or 1: 0 means no mark point is present, 1 means a mark point is present, and (u, v) are the pixel coordinates of the mark-point center;
in step two, the mark-point detection model is based on the LeNet-5 model, adopts a feature extractor consistent with LeNet-5, and is provided with two branches, a classification branch and a regression branch; the two branches share the same feature extractor and each consist of 2 fully connected layers with the same number of neurons, the first fully connected layer having 84 neurons and the second having 2;
in step three, the position and size of the added detection frame are represented by an array of 4 elements (u_ROI, v_ROI, h_ROI, w_ROI), where (u_ROI, v_ROI) are the pixel coordinates of the top-left corner of the detection frame in the whole image, and h_ROI and w_ROI are the height and width of the detection frame, respectively, in pixels;
in step four, the detection frame is enlarged along its height and width directions with its top-left corner as the origin, the processed image inside the detection frame is passed directly to the regression branch of the mark-point detection model, and after the regression branch completes its prediction, the detection frame is restored to its pre-enlargement size, centered on the predicted pixel coordinates;
in the third step, the mark point detection model classification branch is used for judging whether mark points exist in the detection frame or not, the Softmax operation is needed before the output result of the mark point detection model classification branch, so that the output result accords with the probability distribution, and the output result is
Figure FDA0003995569600000011
Respectively representing the probability of the presence or absence of the mark point in the detection frame, and
Figure FDA0003995569600000012
wherein the maximum value of the two is regarded as the model prediction result of
Figure FDA0003995569600000013
in step four, the regression branch of the mark-point detection model is used to predict the pixel coordinates of the mark-point center in the detection frame, the output of the regression branch being $(\hat{u}, \hat{v})$, the pixel coordinates of the mark-point center in the detection frame, and the model prediction result being $(\hat{u}, \hat{v})$;
in step three, a camera captures video of a moving object with mark points pasted on its surface; several detection frames are added manually in the 1st frame of the video, the position of each detection frame in every frame is then updated by the MOSSE target-tracking algorithm, and the classification branch of the mark-point detection model judges whether each detection frame contains a mark point;
and in step four, the detection frame is enlarged; if no mark point is found after one enlargement, detection is stopped and a detection frame is added manually to the detection image, after which the method returns to step three and the classification branch of the mark-point detection model judges whether the detection frame contains a mark point.
2. A mark point and a pixel coordinate detection method thereof according to claim 1, characterized in that: the mark-point pattern in the detection image of step three is the same as the mark-point pattern used in step one.
CN202110674621.8A 2021-06-17 2021-06-17 Mark point and pixel coordinate detection method thereof Active CN113269188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110674621.8A 2021-06-17 2021-06-17 CN113269188B (en) Mark point and pixel coordinate detection method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110674621.8A 2021-06-17 2021-06-17 CN113269188B (en) Mark point and pixel coordinate detection method thereof

Publications (2)

Publication Number Publication Date
CN113269188A CN113269188A (en) 2021-08-17
CN113269188B (en) 2023-03-14

Family

ID=77235279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110674621.8A Active CN113269188B (en) 2021-06-17 2021-06-17 Mark point and pixel coordinate detection method thereof

Country Status (1)

Country Link
CN (1) CN113269188B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment
CN111666998A (en) * 2020-06-03 2020-09-15 电子科技大学 Endoscope intelligent intubation decision-making method based on target point detection

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198476B (en) * 2013-03-12 2015-07-01 西北工业大学 Image detection method of thick line type cross ring mark
CN110348460B (en) * 2019-07-04 2021-10-22 成都旷视金智科技有限公司 Angle-based target detection training method, target detection method and device
CN111861978B (en) * 2020-05-29 2023-10-31 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN112489081B (en) * 2020-11-30 2022-11-08 北京航空航天大学 Visual target tracking method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment
CN111666998A (en) * 2020-06-03 2020-09-15 电子科技大学 Endoscope intelligent intubation decision-making method based on target point detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Visual Object Tracking using Adaptive Correlation Filters; David S. Bolme et al.; IEEE; 2010-12-31; pp. 2544-2550 *

Also Published As

Publication number Publication date
CN113269188A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109685066B (en) Mine target detection and identification method based on deep convolutional neural network
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
Jiao et al. A configurable method for multi-style license plate recognition
CN105046196B (en) Front truck information of vehicles structuring output method based on concatenated convolutional neutral net
CN110287826B (en) Video target detection method based on attention mechanism
CN107038416B (en) Pedestrian detection method based on binary image improved HOG characteristics
CN114155527A (en) Scene text recognition method and device
CN112085024A (en) Tank surface character recognition method
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN114581782A (en) Fine defect detection method based on coarse-to-fine detection strategy
CN113989604B (en) Tire DOT information identification method based on end-to-end deep learning
CN113393426A (en) Method for detecting surface defects of rolled steel plate
CN115830359A (en) Workpiece identification and counting method based on target detection and template matching in complex scene
CN112818905A (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN107609509A (en) A kind of action identification method based on motion salient region detection
CN110516527B (en) Visual SLAM loop detection improvement method based on instance segmentation
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN109615610B (en) Medical band-aid flaw detection method based on YOLO v2-tiny
CN113269188B (en) Mark point and pixel coordinate detection method thereof
CN111597939A (en) High-speed rail line nest defect detection method based on deep learning
CN116758545A (en) Paper medicine packaging steel seal character recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant