CN115424009A - Automatic reading method for pointer instrument data based on Yolact network - Google Patents

Automatic reading method for pointer instrument data based on Yolact network

Info

Publication number
CN115424009A
Authority
CN
China
Prior art keywords
point
pointer
image
matching
instrument
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211113978.XA
Other languages
Chinese (zh)
Inventor
卢业康
方黎勇
孙俊男
王思维
谌小彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Shidao Information Technology Co ltd
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Nanjing Shidao Information Technology Co ltd
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Shidao Information Technology Co ltd, Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Nanjing Shidao Information Technology Co ltd
Priority to CN202211113978.XA priority Critical patent/CN115424009A/en
Publication of CN115424009A publication Critical patent/CN115424009A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/02 Recognising information on displays, dials, clocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for automatically reading pointer instrument data based on the Yolact network, comprising the following steps: S1, correct the acquired instrument image; S2, perform instance segmentation on the corrected instrument image with a Yolact instance segmentation network, dividing the image into dial and pointer regions to obtain a pointer mask and a dial mask; S3, numerical matching: perform FGINN matching between the corrected image and the template image to obtain a number of matching point pairs; S4, line fitting and reading: binarize the corrected dial, obtain the graduation-line contours from the dial region, fit straight lines to the graduation-line contours and take their intersections to obtain a first central point, then take the projection of the first central point onto the pointer's line equation as the central point; the reading is obtained by the angle method from the angular difference between the pointer and any matching point. By combining deep learning with image processing, the invention performs reading detection on pointer instruments and effectively improves the accuracy of the read values.

Description

Automatic reading method for pointer instrument data based on Yolact network
Technical Field
The invention relates to the technical field of computer image processing, in particular to a method for automatically reading pointer instrument data based on a Yolact network.
Background
Currently, the main means of instrument inspection is manual inspection, and it is costly to train and employ skilled inspection workers. The data monitored by pointer instruments must still be read by human eyes and recorded by hand. This mode of acquisition and entry consumes a great deal of manpower and material resources; the heavy workload leads to low working efficiency and frequent errors, which reduce the accuracy of data acquisition, and the quality of the acquired data is strongly affected by subjective factors. Moreover, many instruments operate under severe environmental conditions such as high temperature, high pressure, high radiation, or even toxicity; if data are acquired and recorded manually under such conditions, the life safety of the data collector is seriously threatened. In addition, manual acquisition and entry take too long and seriously affect the efficiency of industrial production, while replacing and upgrading pointer instruments suffers from excessive cost, complex operation, and wasted resources.
In addition to the manual method described above, there are also conventional pointer-instrument reading recognition methods that acquire data from captured instrument images. However, because shooting angles and times differ, the captured meter image is easily affected by the natural environment, such as illumination, angle, and scale. Traditional reading recognition methods are easily influenced by the environment: under different environments, the processed instrument images differ greatly, resulting in low recognition rates and large reading errors.
Disclosure of Invention
The invention aims to solve at least the technical problems in the prior art, and in particular creatively provides a method for automatically reading pointer instrument data based on a Yolact network.
In order to achieve the above object, the present invention provides a method for automatically reading data of a pointer instrument based on a Yolact network, comprising the following steps:
s1, correcting an acquired instrument image;
s2, carrying out example segmentation on the corrected instrument image by adopting a Yolact example segmentation network, and segmenting the instrument image into a dial and a pointer area to obtain a pointer mask and a dial mask; the division of dial plate is carried out in order to reject the external environmental disturbance factor of appearance for follow-up reading detection is more accurate.
By locating the dial plate using the yloact algorithm based on deep learning, the dial plate position can be quickly and accurately obtained and the pointer mask can be segmented. Compared with the traditional method, the method has strong anti-interference performance, can be suitable for positioning and identifying in different environments, and can continuously improve the algorithm efficiency along with the improvement of hardware conditions.
S3, numerical matching: FGINN matching is carried out on the corrected image and the template image, a threshold value with inconsistent geometry in FGINN is set to be 1 pixel, matching precision is further improved, a plurality of matching point pairs are obtained, then the center of the dial plate is taken as a central point, a horizontal right straight line is taken as a polar coordinate axis, so that the difference value of the angle of each matching point in the corrected image and the angle of the pointer is calculated, and the maximum negative value and the minimum positive value are taken to obtain the matching point with the minimum difference value on two sides of the pointer;
the template image is marked with a central point, a zero point, a numerical point and a numerical value, which are called marking information. The index value can be calculated according to the marking information, and the reading value of any point can be calculated according to the angle difference value. Since the matching points in the template image and the corrected image are strictly matched, the reading values of the matching points corresponding to the template image and the corrected image are equal, and the values in the corrected image can be accurately read.
S4, straight line fitting and reading: the method comprises the steps of binarizing a correction dial plate, obtaining a graduation line outline from a dial plate area, fitting straight lines on the graduation line outline to obtain a straight line intersection point to obtain a first central point, fitting a least square straight line on a pointer mask to obtain a pointer straight line equation, wherein the pointer mask is accurate due to the fact that accuracy of pointer detection is high, a pointer can be considered to pass through the central point, and a projection point of the first central point on the pointer straight line equation is used as the central point; and obtaining a corresponding numerical value of the matching point according to the matching point with the minimum difference value on the two sides of the pointer and the template image obtained in the step S3, then obtaining a division value according to the angle difference value of the two matching points, and obtaining a reading by using an angle method according to the difference value of the pointer and any one matching point.
A center positioning method based on the intersection point of the scale mark fitting straight line is provided, and the reading precision is improved by using a scattered point fitting center point.
Further, the S1 includes the steps of:
s1-1, detecting feature points of the collected original image and the collected template image by adopting an SIFT algorithm to obtain key points;
s1-2, matching and screening the key points by improving a matching algorithm FGINN to obtain a plurality of pairs of matching points, and then solving homography matrixes of an original image and a template image by a random sampling consensus algorithm;
and S1-3, performing affine transformation by using the homography matrix to obtain a corrected image.
The FGINN algorithm is an improved SIFT image matching algorithm, realizes image correction by utilizing the FGINN algorithm, has a good correction effect on severely inclined images, can finish reading under the condition of not marking a starting point by utilizing key point matching, and has high expansibility.
Further, the improved matching algorithm FGINN includes:
any key point of the original image is point1, the nearest key point of the key point1 on the template image is point21, and the distance between the point1 and the point21 is dis1; then, selecting a key point22 point which is inconsistent in geometry and is closest to the point on the template image, wherein the distance between the point1 and the point21 is dis2; if dis1/dis2< sigma, point1 and point21 are regarded as correctly matched point pairs, otherwise, the mismatched point pairs are removed; where σ represents a threshold ratio;
the point21 is a key point which is close to the point1 for the first time, and the point21 is a key point which is close to the point1 for the second time;
the geometric inconsistency includes: the Euclidean distance between the centers of the regions is larger than or equal to n pixels.
The center of the area is as follows: when a key point is specified in the image space, it is a region.
The use of the matching points obtained by the improved matching algorithm FGINN avoids the detection of keypoints with multiple directions or too close to each other.
Further, n is 10.
Further, the threshold ratio σ ranges from 0.6 to 0.8.
Further, the Yolact instance segmentation network includes: ResNet101 is used as the backbone of the Yolact model, and the pre-trained model weights are fine-tuned on the meter images.
Further, the method also includes: before S1, performing illumination-removal processing on the acquired instrument image.
Because the acquired instrument image may be affected by light, over-bright areas or shadow areas can form and affect the reading; therefore an illumination-removal processing operation is performed before the image is corrected, comprising the following steps:
s100, judging whether to perform shadow compensation or brightness compensation according to the histogram distribution characteristic information of the image;
s200, obtaining an average brightness value of the image, judging whether the brightness value of each pixel is in an x 1-x 2 times average brightness value interval, if not, executing the next step, and if not, not processing;
s300, dividing the connected pixels in the same interval brightness interval into areas with the same brightness: obtaining the brightness value h of the current pixel c Calculating the brightness value h of the current pixel c Average value h of brightness of pixels connected in periphery z Judgment of h c Whether an overexposure threshold is met, if so, h z The luminance value h assigned to the current pixel c (ii) a Judgment h c Whether an underexposure threshold is met, if so, h z Luminance value h assigned to the current pixel c . Then judge h c Whether or not to satisfy α × h z ≤h c ≤β×h z If yes, the pixels are in the same interval brightness interval; α is a first luminance coefficient, and β is a first luminance coefficient.
S400, determining the compensation parameters: classify the regions of similar brightness into regions needing brightness compensation and regions needing shadow compensation; then calculate the compensation parameter. (The compensation-parameter formula appears in the source only as an image, BDA0003844725130000041, and is not reproduced here.)
Here Comp is the compensation parameter of the region to be compensated, h_i denotes the brightness of the i-th pixel in the similar-brightness region, h_aver denotes the average brightness of the acquired image, λ_i denotes the tone-adjustment parameter of the i-th pixel, N denotes the number of pixels of the current similar-brightness region, and D(i) denotes the compensation weight of the current similar-brightness region.
And S500, processing the image according to the compensation parameters to obtain the illumination-corrected image.
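A minimal sketch of S100-S500 follows. The patent's compensation formula survives only as an image in the source, so the `comp` computation below, a tone-weighted pull toward the global mean brightness, as well as the thresholds `x1` and `x2` and the unit values of `lam` and `D`, are assumptions for illustration only:

```python
import numpy as np
from scipy import ndimage

def remove_illumination(gray, x1=0.5, x2=1.5):
    """Hypothetical sketch of S100-S500; the real compensation formula
    is shown only as an image in the source and is NOT reproduced here."""
    gray = gray.astype(np.float64)
    h_aver = gray.mean()                      # S200: average brightness of the image
    abnormal = (gray < x1 * h_aver) | (gray > x2 * h_aver)

    # S300: group connected abnormal pixels into similar-brightness regions.
    labels, n_regions = ndimage.label(abnormal)

    out = gray.copy()
    for r in range(1, n_regions + 1):
        region = labels == r
        h_i = gray[region]                    # pixel brightnesses of the region
        lam, D = 1.0, 1.0                     # assumed tone parameter and weight
        # S400: assumed compensation parameter (stand-in for the patent formula).
        comp = lam * D * (h_aver - h_i.mean())
        # S500: brighten shadow regions, darken over-bright regions.
        out[region] = np.clip(h_i + comp, 0.0, 255.0)
    return out.astype(np.uint8)
```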
In conclusion, with the above technical scheme, reading detection of pointer instruments can be realized by combining deep learning with image processing, effectively improving the accuracy of the read values. By locating the dial with the deep-learning-based Yolact algorithm, the dial position can be obtained quickly and accurately and the pointer mask can be segmented. Compared with traditional methods, the method has extremely strong anti-interference performance, is applicable to positioning and recognition in different environments, and its efficiency will keep improving as hardware improves.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a schematic view of the process of image rectification of the meter of the present invention.
FIG. 3 is a process diagram of matching key points in accordance with the present invention.
FIG. 4 is a schematic diagram of the process of line fitting and reading according to the present invention.
Fig. 5 is a schematic diagram of the recognition effect of the present invention.
FIG. 6 is a schematic diagram of a part of the key point screening method of the present invention; FIG. 6 (a) is a nearest neighbor strategy; FIG. 6 (b) is a mutual nearest neighbor strategy; fig. 6 (c) shows the nearest neighbor ratio check SNN. Where circles represent keypoints of image 1 and squares represent keypoints of image 2.
FIG. 7 is a schematic diagram of the geometrically inconsistent nearest neighbor ratio algorithm FGINN of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.
In order to achieve the above object, the present invention provides a method for automatically reading data of a pointer instrument based on a Yolact network, as shown in fig. 1, including the following steps:
s1, instrument image correction: the dial correction first requires a template image with labels, as shown in fig. 2, where the template image labels a zero point, a center point, a position of a data point, and a reading value thereof. Carrying out SIFT operator detection on the acquired original image and the template image to obtain key points; and then, matching and screening the key points by improving a matching algorithm FGINN to obtain a plurality of pairs of matching points, and then solving homography matrixes of the two images by a random sample consensus algorithm (RANSAC algorithm). Affine transformation is performed by using the obtained homography matrix, and a corrected image is obtained.
The SIFT-based improved matching algorithm FGINN comprises:
scale Invariant Feature Transform (SIFT) is a computer vision algorithm that detects and describes local features of an image by finding the extreme points in the spatial scale and extracting their position, scale and rotation invariants. The SIFT algorithm is decomposed into the following four steps: the method comprises the steps of scale space extremum detection, key point positioning, direction determination and key point description. Points detected by the SIFT algorithm are all called key points.
When the key points are detected, the key points need to be matched, and the following method is generally adopted:
the nearest neighbor strategy is shown in fig. 6 (a): for the key points in any two images, taking the Euclidean distance of the key point vectors as the measure of the similarity of the key points, taking a certain key point in the image 1, and finding two key points which are closest to the key point in the image 2, wherein the pair of key points which are closest to the key point is a matching point. Although the algorithm of the nearest neighbor strategy is simple, the matching accuracy is low, and most matches are wrong.
To overcome this, a mutual nearest neighbor or cross-consistency check may be performed, as shown in fig. 6 (b). This means that two matches need to be made: the key point in the image 2 closest to the key point is found from the image 1, the key point in the image 1 closest to the key point is found from the image 2, and the matching point is stored as long as the distance between the key points is the shortest.
In addition, the ratio of the distance between the two points can be used for screening, and if the ratio of the nearest distance to the next nearest distance is smaller than a certain threshold value, the point is considered to be a correct point pair. This method is called nearest neighbor ratio checking (SNN), and as shown in fig. 6 (c), the threshold ratio σ is generally 0.8. That is, for a certain SIFT key point1 in the image 1, finding the SIFT key point21 (taking the distance from the key point21 to the point1 as dis 1) nearest to the point1 and the next key point22 (taking the distance from the key point22 to the point1 as dis 2) by searching all SIFT key points on the image 2, if dis1/dis2< σ, the point is regarded as a correctly matched point pair, otherwise, the point pair which is mismatched is removed. And the SIFT key points are key points obtained by adopting SIFT.
However, if keypoints with multiple directions or too close to each other are detected, this may result in a violation of the SNN check. To solve this problem, we propose an improved matching strategy, the geometrically inconsistent nearest neighbor ratio algorithm — FGINN. The SNN is used for checking and screening, for each key point1 of the image 1, the nearest key point on the image 2 is a point21 point (the nearest key point) and the distance is dis1, and then a key point22 point (also the 2 nd nearest key point) which is inconsistent in geometry and is the nearest in distance is found and the distance is dis2. Wherein geometrically inconsistent means if the Euclidean distance between the centers of the regions is ≧ n pixels (default: n = 10). Similar to SNN, dis1/dis2< sigma, we regard it as a correctly matched point pair, otherwise, it is eliminated as a mismatched point pair.
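The following sketch illustrates the FGINN selection rule just described: for each descriptor of image 1, point21 is the nearest key point of image 2, point22 is the nearest key point lying at least n pixels from point21, and the SNN-style ratio test is applied. It assumes the `kp`/`des` arrays come from `cv2.SIFT_create()` as in the previous sketch; the brute-force loop is a simplification, not an optimized implementation.

```python
import numpy as np

def fginn_match(kp1, des1, kp2, des2, n=10, sigma=0.8):
    """First Geometrically Inconsistent Nearest Neighbor ratio test (sketch)."""
    pts2 = np.float32([k.pt for k in kp2])
    matches = []
    for i, d in enumerate(des1):
        dists = np.linalg.norm(des2 - d, axis=1)    # descriptor distances to image 2
        nn1 = int(np.argmin(dists))                 # point21: nearest neighbor
        dis1 = dists[nn1]
        # geometrically inconsistent candidates: >= n pixels from point21
        far = np.linalg.norm(pts2 - pts2[nn1], axis=1) >= n
        if not far.any():
            continue
        dis2 = dists[far].min()                     # point22: nearest inconsistent
        if dis1 / dis2 < sigma:                     # SNN-style ratio check
            matches.append((i, nn1))
    return matches
```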
S2, instance segmentation: a Yolact instance segmentation network is applied to the corrected output image to segment the instrument image into dial and pointer. The dial is segmented to eliminate environmental interference factors outside the instrument, so that subsequent reading detection is more accurate; the pointer is segmented to obtain the pointer mask, used for line detection and for judging the pointer direction.
Yolact is a simple, fully convolutional model for real-time instance segmentation, implemented mainly through two parallel sub-networks. Yolact is a single-stage model: faster, but slightly less accurate, than two-stage models such as Mask R-CNN.
ResNet101 is used in this work as the backbone of the Yolact model, and the pre-trained model weights are fine-tuned on the meter images.
We constructed a meter detection data set for instance segmentation containing 523 high-precision labeled images in total, with the dial region and the pointer region annotated as polygons, as shown in fig. 7. The input image size of the network is 512 x 512. Yolact splits instance segmentation into two parallel tasks: first, generating a set of prototype masks for the whole image; second, predicting a series of mask coefficients for each instance.
The instance segmentation of the whole image can be generated by the two steps above: for each instance, the prototype masks and the corresponding predicted coefficients are linearly combined and then cropped with the predicted bounding box. The model realizes instance segmentation mainly through two parallel sub-networks:
(1) the Prediction Head branch generates the class confidence, position regression parameters, and mask coefficients of each anchor;
(2) the Protonet branch generates a set of prototype masks; the prototype masks are then multiplied by the mask coefficients to obtain the mask of each target object in the image.
The more detailed concrete steps are as follows:
s10, preprocessing; the preprocessing steps include resizing the image to the appropriate size, converting the bbox coordinates and converting the mask matrix, etc.
S11, Backbone input: the preprocessed image is fed into the backbone network, here ResNet101. ResNet has five convolution modules, conv1 and conv2_x to conv5_x, whose outputs correspond to C1 to C5 of Yolact, respectively. Like the SSD model, Yolact also uses feature maps at multiple scales. One benefit is that objects of different scales can be detected: small objects on large feature maps and large objects on small feature maps.
S12, FPN input: P3-P7 of Yolact form an FPN network. P5 is generated from C5 through a convolutional layer. Next, P5 is upsampled once by bilinear interpolation and added to the convolved C4 to obtain P4, and P3 is obtained in the same way. Further, P6 is obtained by convolving P5, and P7 by convolving P6. FPN is used because deeper feature maps generate more robust masks, while larger prototype masks make the final masks higher quality and better at detecting small objects; the FPN provides features that are both deep and spatially large. Parallel operation follows: P3 is sent to Protonet, and P3-P7 are simultaneously sent to the Prediction Head.
S13, Protonet input: the design of Protonet is inspired by Mask R-CNN, and it consists of several convolutional layers. Its input is P3 and its output dimensions are 138 x 138 x 32, i.e., 32 prototype masks, each of size 138 x 138.
S14, Prediction Head input: the inputs of this branch are the five feature maps P3-P7, and the Prediction Head has five parameter-sharing prediction layers corresponding to them one to one. The input feature map first generates anchors: each pixel generates 3 anchors, with aspect ratios 1, 1/2, and 2. The anchor base side lengths for the five feature maps are 24, 48, 96, 192, and 384, respectively; the base side length is adjusted per aspect ratio so that all anchors keep equal area.
Taking P3 as an example, denote its dimensions W3 x H3 x 256; its anchor count is then a3 = W3 x H3 x 3. Next, the Prediction Head generates three outputs for it: class confidence, with dimension a3 x 2; position offsets, with dimension a3 x 4; mask coefficients, with dimension a3 x 32.
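The anchor geometry above can be made concrete with a small sketch; under the equal-area convention, an anchor of base side s and aspect ratio r gets width s*sqrt(r) and height s/sqrt(r), so width x height = s^2 for every ratio (the function name is illustrative):

```python
import math

def make_anchors(w, h, base, ratios=(1.0, 0.5, 2.0)):
    """Anchors for a w x h feature map; 3 per pixel, equal area per ratio."""
    anchors = []
    for y in range(h):
        for x in range(w):
            for r in ratios:
                aw = base * math.sqrt(r)   # width scales with sqrt(ratio)
                ah = base / math.sqrt(r)   # height scales inversely: aw * ah == base**2
                anchors.append((x + 0.5, y + 0.5, aw, ah))
    return anchors                          # len(anchors) == w * h * 3, i.e. a3 for P3
```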
The operations performed on P4-P7 are identical, and finally the results are concatenated. Writing a = a3 + a4 + a5 + a6 + a7, this gives: all class confidences, with dimension a x 2; all position offsets, with dimension a x 4; all mask coefficients, with dimension a x 32.
S15, NMS: after the position offsets are obtained, the RoI positions are obtained by adding the offsets to the anchor positions. The RoIs overlap, and NMS is the screening algorithm commonly used to remove the overlap.
S16, Crop and Threshold: matrix multiplication of the mask coefficients with the prototype masks yields the masks in the image. Crop means clearing the mask outside the boundary: during training the boundary is the ground-truth bounding box, and during evaluation it is the predicted bounding box. Threshold means that during evaluation only results whose confidence exceeds a threshold are output.
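A minimal numpy sketch of this assembly step, assuming arrays shaped as in S13/S14 (prototype masks 138 x 138 x 32, one 32-vector of coefficients per instance) and boxes already scaled to prototype-mask coordinates:

```python
import numpy as np

def assemble_masks(protos, coeffs, boxes, thresh=0.5):
    """protos: (138, 138, 32); coeffs: (k, 32), one row per instance;
    boxes: (k, 4) as (x1, y1, x2, y2) in prototype-mask coordinates."""
    # Linear combination of prototypes per instance, then a sigmoid.
    masks = 1.0 / (1.0 + np.exp(-(protos @ coeffs.T)))   # (138, 138, k)
    out = []
    for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
        m = np.zeros(protos.shape[:2], dtype=bool)
        m[y1:y2, x1:x2] = masks[y1:y2, x1:x2, i] > thresh  # Crop + Threshold
        out.append(m)
    return out
```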
With 523 labeled instrument images, after 10000 iterations the Yolact model achieves 98.11% accuracy, 98.13% precision, 98.05% recall, and 98.09% F1-score on the instance segmentation task. The recognition effect is shown in fig. 5: the Yolact instance segmentation network accurately segments the pointer and dial regions, removes background interference, refines the pointer region, and improves the stability and accuracy of detection.
S3, numerical matching: FGINN matching is performed between the corrected image and the template image to obtain several matching points. The geometric-inconsistency threshold in FGINN is set here to 1 pixel, which helps further improve matching precision; in fig. 3 (a)(b), the connecting lines between the corrected image and the template image are the corresponding matching point pairs. Taking the center of the dial as the pole and the horizontal rightward ray as the polar axis, the angle of each matching point in fig. 3 (b) is calculated and differenced with the pointer angle; the maximum negative value and the minimum positive value are taken, corresponding to matching points a and b in fig. 3 (b). In fig. 3 (a) the template image is annotated with a center point (center_point), a zero point (zero_point), a value point (value_point), and a value (value); the division value and the reading at any point can be calculated from this annotation information and the angle differences. Since the matching points in the template image and the corrected image correspond exactly, the readings of corresponding matching points are equal, i.e., the readings of the corresponding matching points in fig. 3 (c) and fig. 3 (d) are the same.
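A short sketch of the angle bookkeeping in this step, assuming the dial center, a point on the pointer, and the matched points are already available as (x, y) pixel coordinates; the function and variable names are illustrative, and angle wraparound at the polar axis is ignored for brevity:

```python
import math

def bracket_pointer(center, pointer_tip, matched_pts):
    """Find the matched points with the smallest angular difference on each
    side of the pointer: max negative difference (a), min positive (b)."""
    cx, cy = center

    def angle(p):
        # Polar angle about the dial center, horizontal-right axis;
        # the y term is negated because image rows grow downward.
        return math.atan2(-(p[1] - cy), p[0] - cx)

    theta_ptr = angle(pointer_tip)
    diffs = [(angle(p) - theta_ptr, p) for p in matched_pts]
    neg = [d for d in diffs if d[0] < 0]
    pos = [d for d in diffs if d[0] > 0]
    a = max(neg, key=lambda d: d[0])[1] if neg else None
    b = min(pos, key=lambda d: d[0])[1] if pos else None
    return a, b
```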
S4, straight line fitting and reading: after the three steps above, all the data are integrated to perform reading recognition of the pointer instrument. The dial is image-processed as shown in fig. 4. First, the corrected dial is binarized, as in fig. 4 (a); the dial mask and pointer mask are removed from the dial region to obtain the graduation-line contours, as in fig. 4 (b); straight lines are then fitted to the graduation-line contours and their intersections are taken to obtain the first central point, and a least-squares line is fitted to the binarized pointer mask to obtain the pointer's line equation, as in fig. 4 (c). Because pointer detection is highly accurate and the pointer mask is reliable, the pointer can be considered to pass through the central point, so the projection of the first central point onto the pointer's line equation is taken as the central point. Comprehensive data processing is then performed with the numerical matching information obtained in S3. Specifically, the values of the matching points with the smallest angular difference on either side of the pointer, point a (x1, y1) and point b (x2, y2), are obtained from the template image; the division value is obtained from the angular difference between point a and point b; and the reading is obtained by the angle method from the angular difference between the pointer and point a or point b.
The central point obtained from the intersections of the scale marks is only preliminary and contains error; the true central point is therefore obtained after the projection processing described above.
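To make the last two operations concrete, a sketch of the center projection and the angle-method reading follows; it reuses the `angle` convention of the previous sketch, and `value_a`/`value_b` stand for the template readings of points a and b (all names are illustrative):

```python
import numpy as np

def project_onto_line(pt, line_pt, line_dir):
    """Project the first central point onto the fitted pointer line."""
    d = np.asarray(line_dir, dtype=float)
    d /= np.linalg.norm(d)
    v = np.asarray(pt, dtype=float) - np.asarray(line_pt, dtype=float)
    return np.asarray(line_pt, dtype=float) + (v @ d) * d

def read_by_angle(theta_a, theta_b, theta_ptr, value_a, value_b):
    """Angle method: interpolate the pointer angle between two matched
    points of known value (angles in radians)."""
    division = (value_b - value_a) / (theta_b - theta_a)  # value per radian
    return value_a + (theta_ptr - theta_a) * division

# Example: pointer halfway between a (0.0 at 0.0 rad) and b (0.4 at 1.0 rad)
print(read_by_angle(0.0, 1.0, 0.5, 0.0, 0.4))  # -> 0.2
```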
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (5)

1. A method for automatically reading pointer instrument data based on a Yolact network, characterized by comprising the following steps:
s1, correcting an acquired instrument image;
s2, carrying out instance segmentation on the corrected instrument image by adopting a Yoract instance segmentation network, and segmenting the instrument image into a dial and a pointer area to obtain a pointer mask and a dial mask;
s3, numerical matching: FGINN matching is carried out on the corrected image and the template image to obtain a plurality of matching point pairs, then the center of the dial plate is taken as a central point, and a horizontal right straight line is taken as a polar coordinate axis, so that the difference value of the angle of each matching point in the corrected image and the angle of the pointer is calculated, and the maximum negative value and the minimum positive value are taken to obtain the matching point with the minimum difference value on the two sides of the pointer;
s4, straight line fitting and reading: the method comprises the steps of binarizing a correction dial plate, obtaining a graduation line outline from a dial plate area, fitting straight lines on the graduation line outline to obtain a straight line intersection point to obtain a first central point, fitting least square straight lines on a pointer mask to obtain a pointer straight line equation, and taking a projection point of the first central point on the pointer straight line equation as the central point; and obtaining a corresponding numerical value of the matching point according to the matching point with the minimum difference value on the two sides of the pointer and the template image obtained in the step S3, then obtaining a division value according to the angle difference value of the two matching points, and obtaining a reading by using an angle method according to the difference value of the pointer and any one matching point.
2. The method for automatically reading data of the pointer instrument based on the Yolact network as claimed in claim 1, wherein the S1 comprises the following steps:
s1-1, detecting feature points of the acquired original image and the template image by adopting an SIFT algorithm to obtain key points;
s1-2, matching and screening the key points by improving a matching algorithm FGINN to obtain a plurality of pairs of matching points, and then solving homography matrixes of an original image and a template image by a random sampling consensus algorithm;
and S1-3, performing affine transformation by using the homography matrix to obtain a corrected image.
3. The method as claimed in claim 2, wherein the improved matching algorithm FGINN comprises:
any key point of the original image is point1; the key point nearest to point1 on the template image is point21, and the distance between point1 and point21 is dis1; then the geometrically inconsistent key point closest to point1 on the template image, point22, is selected, and the distance between point1 and point22 is dis2; if dis1/dis2 < σ, point1 and point21 are regarded as a correctly matched point pair, otherwise the pair is removed as a mismatch; where σ represents the threshold ratio;
the point21 is the key point nearest to point1, and the point22 is the second-nearest key point to point1;
the geometric inconsistency includes: the Euclidean distance between the centers of the regions is more than or equal to n pixels.
4. The method for automatically reading data of the pointer instrument based on the Yolact network as claimed in claim 3, wherein the threshold ratio σ ranges from 0.6 to 0.8.
5. The method of claim 1, wherein the Yolact instance segmentation network comprises: ResNet101 is used as the backbone of the Yolact model, and the pre-trained model weights are fine-tuned on the instrument images.
CN202211113978.XA 2022-09-14 2022-09-14 Automatic reading method for pointer instrument data based on Yolact network Pending CN115424009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211113978.XA CN115424009A (en) 2022-09-14 2022-09-14 Automatic reading method for pointer instrument data based on Yolact network

Publications (1)

Publication Number Publication Date
CN115424009A 2022-12-02

Family

ID=84202019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211113978.XA Pending CN115424009A (en) 2022-09-14 2022-09-14 Automatic reading method for pointer instrument data based on Yolact network

Country Status (1)

Country Link
CN (1) CN115424009A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274966A (en) * 2023-11-23 2023-12-22 江西小马机器人有限公司 Outdoor industrial instrument monitoring method, system, readable storage medium and computer
CN117274966B (en) * 2023-11-23 2024-04-19 江西小马机器人有限公司 Outdoor industrial instrument monitoring method, system, readable storage medium and computer


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination