CN110543878A - Pointer instrument reading identification method based on neural network - Google Patents

Info

Publication number
CN110543878A
CN110543878A (application CN201910724076.1A; granted as CN110543878B)
Authority
CN
China
Prior art keywords
layer
convolution
loss
dial
training
Prior art date
Legal status
Granted
Application number
CN201910724076.1A
Other languages
Chinese (zh)
Other versions
CN110543878B (en)
Inventor
田联房 (Tian Lianfang)
郭月阳 (Guo Yueyang)
杜启亮 (Du Qiliang)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910724076.1A priority Critical patent/CN110543878B/en
Publication of CN110543878A publication Critical patent/CN110543878A/en
Application granted granted Critical
Publication of CN110543878B publication Critical patent/CN110543878B/en
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/63 - Scene text, e.g. street names
    • G06V 30/00 - Character recognition; recognising digital ink; document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/14 - Image acquisition
    • G06V 30/148 - Segmentation of character regions
    • G06V 30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pointer instrument reading identification method based on a neural network, which comprises the following steps: 1) construct and preprocess a dial positioning data set; 2) design a matched neural network model for the characteristics of the recognized object and the application scene; 3) load training parameters into the neural network model and train it to obtain a dial positioning model; 4) input the image to be recognized into the trained dial positioning model to obtain the position and category of the dial in the image, construct a dial information extraction data set from the positioning model's output and preprocess it, then load training parameters into the neural network model and train it on that data set to obtain a dial information extraction model; 5) feed the dial positioning model's output into the trained dial information extraction model to obtain the dial range information; 6) extract the pointer position and dial center position from the model outputs by image processing; 7) calculate the meter reading. The invention obtains high-precision meter readings while guaranteeing real-time performance.

Description

pointer instrument reading identification method based on neural network
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, in particular to a pointer instrument reading recognition method based on a neural network.
Background
Pointer instruments are widely used in industrial production, social life, environmental monitoring and many other fields, and bring great convenience to production work. At present, manual reading is the most widely adopted method, but it suffers from low efficiency, large error and poor stability; in places unsuitable for people to enter, such as the high-voltage environment of a substation, manual reading becomes infeasible. In recent years, with the rapid development of machine vision and artificial intelligence, automatic meter reading identification based on imaging equipment has become possible; automatic identification avoids the scene limitations of the manual method and improves efficiency while reducing error. Research on automatic reading identification for pointer instruments is therefore of great significance.
Existing methods for automatic pointer-instrument reading identification fall mainly into two groups: those based on traditional image processing and those based on deep learning. The traditional image-processing methods mainly include the subtraction method, Hough-transform detection and template matching, each with drawbacks: the subtraction method requires a zero-scale image of the instrument in advance and an unchanged background; Hough-transform detection can only detect dials and pointers of relatively standard shape; and template matching requires a single-scale template image with fairly high contrast against the background. The robustness of these traditional methods is easily degraded by illumination, shooting angle and imaging quality, which limits their application scenarios. Deep-learning methods, once trained on large data sets, can locate and identify many types of instruments in different scenes; they are highly robust, place few demands on the application scenario, and generalize well. However, mainstream deep-learning methods require large amounts of data and training time, and their complex structures cannot guarantee real-time performance on limited hardware. Structural optimization of existing deep-learning techniques is therefore needed to reduce training cost and improve runtime performance.
In view of the above discussion, a pointer instrument reading identification method that is both real-time and highly accurate has high practical application value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a pointer instrument reading identification method based on a neural network. It uses deep learning to realize dial positioning, classification and range-information extraction, then uses image processing to obtain the pointer and circle-center positions, and finally applies an angle method to obtain a high-precision instrument reading while guaranteeing real-time performance.
in order to achieve the purpose, the technical scheme provided by the invention is as follows: a pointer instrument reading identification method based on a neural network comprises the following steps:
1) constructing and preprocessing a dial positioning data set;
2) designing a matched neural network model for the characteristics of the recognized object and the application scene;
3) loading training parameters into the designed neural network model and training it, performing online data enhancement in the training process, and obtaining the dial positioning model after training;
4) inputting an image to be recognized into the trained dial positioning model to obtain the position and category of the dial in the image, constructing a dial information extraction data set from the positioning model's output and preprocessing it, loading training parameters into the neural network model designed in step 2) and training it on the dial information extraction data set with online data enhancement, and obtaining the dial information extraction model after training;
5) feeding the output of the dial positioning model into the trained dial information extraction model to obtain the dial range information;
6) obtaining the pointer position and the dial center position from the outputs of the dial positioning model and the dial information extraction model by image processing;
7) obtaining the maximum-range and minimum-range positions from the output of the dial information extraction model to compute the total range angle, obtaining the reading angle from the pointer position and the minimum-range position, and obtaining the instrument reading by proportionality, i.e. the angle method; wherein the formula of the angle method is:

num = angle2 / angle1 * (max - min) + min
where num is the pointer meter reading, angle1 is the angle between the maximum range and the minimum range, angle2 is the angle between the minimum range and the pointer position, max is the maximum range reading, and min is the minimum range reading.
In step 1), pointer instrument image data in different scenes are collected by image acquisition equipment to construct an original data set; interference data affecting neural network training and recognition, including blurred data, extreme-angle data and dial-missing data, are then removed, and the remaining data are labelled with the position and category of the instrument dial.
In step 2), a matched neural network is constructed by combining the characteristics of the recognition object and the application scene, and the method comprises the following steps:
2.1) constructing a feature extraction network
A feature extraction network is constructed according to the real-time and high-precision requirements; it consists mainly of combined convolution modules, with the following structure:
The first layer is combined convolution module A, which consists of a zero-padding layer, a convolution layer, a batch normalization layer and an activation layer;
The second layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The third layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fifth layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The sixth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The seventh layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The eighth layer is combined convolution module D, which consists of five combined convolution modules B;
The ninth layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The tenth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
2.2) building a prediction network
according to the output of different layers of the feature extraction network, a prediction network for outputting and predicting targets with different sizes is constructed, and the method comprises the following steps:
a. large-size target prediction network
the input is the tenth layer output of the feature extraction network, the large-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the structure of the large-size target prediction network is as follows:
The first layer is a combined convolution module D which consists of five combined convolution modules B;
The second layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The third layer is a convolution layer;
b. medium size target prediction network
The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the structure of the medium-size target prediction network is as follows:
The first layer is an input fusion module which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module D which consists of five combined convolution modules B;
The third layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is a convolution layer;
c. Small size target prediction network
the input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, the small-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the small-size target prediction network has the structure that:
The first layer is an input fusion module which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module D which consists of five combined convolution modules B;
The third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is a convolution layer;
Finally, the outputs of the large-size, medium-size and small-size target prediction networks pass through a non-maximum suppression layer to obtain the predicted target positions and categories;
2.3) setting the loss function
A loss function is set as the summed mean of a center-coordinate loss, a width-height loss, a confidence loss and a category loss:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss, Loss_xy the center-coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the category loss, and num_f the total number of inputs as a floating-point number. The individual loss functions are:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where mark_object is the flag indicating whether an anchor box contains an object, w and h are the anchor box width and height, Loss_log is the binary cross-entropy loss, xy_true and xy_predict are the true and predicted center coordinates, wh_true and wh_predict the true and predicted width-height values, c_predict the confidence value of the prediction box, mark_ignore the flag for anchor boxes whose IOU is below the threshold, and cls_true and cls_predict the true and predicted categories.
in step 3), training the designed neural network model, comprising the following steps:
3.1) setting training parameters
Set the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 500 and the batch size to 8; K-means clustering over all labels generates the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569);
3.2) Online data enhancement
The data enhancement is carried out on the input image, the data set is expanded, and the main method of the data enhancement is as follows:
a. random mirror inversion
carrying out random mirror image overturning on an input image;
b. random additive noise
Adding a continuous single noise mask to the input image;
c. randomly adjusting contrast
Modifying hue and saturation to realize contrast conversion;
3.3) setting training completion flag
Set a validation set to check training accuracy at intervals; the training completion flag requires that the maximum number of iterations be reached and the accuracy requirement be met; after training completes, save the model structure and parameters.
In step 4), the output images obtained from the dial positioning model are integrated into a dial information extraction training set and labelled, the labelled content being the range information; the neural network model designed in step 2) is then trained on the dial information extraction data set, comprising the following steps:
4.1) setting training parameters
Set the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 600 and the batch size to 8; K-means clustering over all labels generates the initial prior boxes (8, 12), (10, 16), (11, 20), (12, 12), (14, 16), (15, 26), (21, 19), (24, 34), (45, 55);
4.2) Online data enhancement
the data enhancement is carried out on the input image, the data set is expanded, and the main method of the data enhancement is as follows:
a. Random mirror inversion
Carrying out random mirror image overturning on an input image;
b. random addition of rectangular noise
Randomly erase the information in several rectangular regions of the input image, with selectable region size and random position; erasing the information in all channels produces black rectangular blocks, while erasing the information in only some channels produces color noise;
c. Randomly adjusting contrast
modifying hue and saturation to realize contrast conversion;
4.3) setting training completion flag
Set a validation set to check training accuracy at intervals; the training completion flag requires that the maximum number of iterations be reached and the accuracy requirement be met; after training completes, save the model structure and parameters.
In the step 5), a dial area image is obtained by the output of the dial positioning model, and the dial area image is input into the dial information extraction model to obtain dial range information.
in step 6), obtaining the pointer position and the dial center position by the dial positioning model output and the dial information extraction model output through an image processing technology, and the method comprises the following steps:
6.1) Dial correction
obtaining a transformation matrix by using dial plate central range position information and central range real position information which are obtained by the output of the dial plate information extraction model, and correcting a dial plate image through affine transformation;
6.2) pointer location segmentation
A binarized dial image is obtained by applying Otsu adaptive threshold segmentation; the largest contour satisfying the specified aspect-ratio and area ranges is searched on the binary image to obtain the initial pointer position, and the pointer position is then obtained by Hough line detection and least-squares fitting;
6.3) Dial centering
And obtaining the center of the dial to be selected by using Hough circle detection, calculating the Euclidean distance between each center and the pointer, and obtaining the center position of the dial according to the minimum distance principle.
compared with the prior art, the invention has the following advantages and beneficial effects:
1. The dial positioning and the dial information extraction are completed by using the neural network, the positions and the range information of the pointer instruments with different backgrounds and different types can be accurately identified in a complex environment, and the accuracy is higher than that of a method based on the traditional image processing technology.
2. The designed neural network adopts depthwise separable convolution, so the parameter count is small and the network structure is simple; a high frame rate can be obtained while maintaining high accuracy, guaranteeing real-time performance with low hardware requirements.
3. The target prediction network can predict targets with different sizes, comprehensively select a prediction frame with the most suitable size, accurately detect instruments with different sizes, and is free from the limitation of collection angle and distance.
4. The designed loss function comprises the confidence loss of the background class, and the influence of unbalance of positive and negative samples on training can be effectively eliminated.
drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2a is a schematic diagram of the combined convolution module a.
fig. 2B is a schematic diagram of the combined convolution module B.
fig. 2C is a schematic diagram of the combined convolution module C.
fig. 2D is a schematic diagram of the combined convolution module D.
FIG. 3 is a schematic diagram of an input fusion module.
Fig. 4 is a schematic view of an angle reading identification method.
where num is the pointer meter reading, angle1 is the angle between the maximum range and the minimum range, angle2 is the angle between the pointer and the minimum range, max is the maximum range reading, and min is the minimum range reading.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the method for identifying a reading of a pointer instrument based on a neural network provided in this embodiment includes the following steps:
1) Pointer instrument images shot in different actual scenes are collected to construct an original data set. Interference data affected by blur, extreme angles, missing dials and the like, which would hinder neural network training and recognition, are removed, and the dial positions and categories in the remaining data are labelled with the open-source tool labelImg to construct the dial positioning training set.
2) A neural network meeting the actual requirements is designed according to the specific application scene and the characteristics of the recognition object. Unless otherwise stated, all activation layers below use the Leaky ReLU activation function. The design comprises the following steps:
2.1) constructing a feature extraction network
A feature extraction network is constructed according to the requirements of real-time performance and high precision. It is composed mainly of combined convolution modules.
the feature extraction network structure is as follows:
The input image is 416 × 416 × 3.
The first layer is combined convolution module A, as shown in fig. 2a. The input first passes through the zero-padding layer, output 418 × 418 × 3, then through a convolution layer, batch normalization layer and activation layer with kernel (3, 3), stride 2 and 32 filters; the output is 208 × 208 × 32.
The second layer is combined convolution module B, as shown in fig. 2b. The input first passes through a depthwise convolution, batch normalization layer and activation layer with kernel (3, 3) and stride 1, padded so input and output sizes match; the output is 208 × 208 × 32. It then passes through a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 64 filters, padded; the output is 208 × 208 × 64.
The third layer is combined convolution module C, as shown in fig. 2c. The input first passes through the zero-padding layer, output 210 × 210 × 64, then through a depthwise convolution, batch normalization and activation layer with kernel (3, 3) and stride 2, output 104 × 104 × 64, and finally through a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 128 filters, padded; the output is 104 × 104 × 128.
The fourth layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 104 × 104 × 128; then convolution with kernel (1, 1), stride 1, 128 filters, padded, output 104 × 104 × 128.
The fifth layer is combined convolution module C: zero padding to 106 × 106 × 128; depthwise convolution with kernel (3, 3), stride 2, output 52 × 52 × 128; convolution with kernel (1, 1), stride 1, 256 filters, padded, output 52 × 52 × 256.
The sixth layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 52 × 52 × 256; convolution with kernel (1, 1), stride 1, 256 filters, padded, output 52 × 52 × 256.
The seventh layer is combined convolution module C: zero padding to 54 × 54 × 256; depthwise convolution with kernel (3, 3), stride 2, output 26 × 26 × 256; convolution with kernel (1, 1), stride 1, 512 filters, padded, output 26 × 26 × 512.
The eighth layer is combined convolution module D, as shown in fig. 2d, which passes through five identical combined convolution modules B in sequence. In each module B, the input first passes through a depthwise convolution, batch normalization and activation layer with kernel (3, 3), stride 1, padded, output 26 × 26 × 512, then through a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 512 filters, padded, output 26 × 26 × 512. After the five modules the output is 26 × 26 × 512.
The ninth layer is combined convolution module C: zero padding to 28 × 28 × 512; depthwise convolution with kernel (3, 3), stride 2, output 13 × 13 × 512; convolution with kernel (1, 1), stride 1, 1024 filters, padded, output 13 × 13 × 1024.
The tenth layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 13 × 13 × 1024; convolution with kernel (1, 1), stride 1, 1024 filters, padded, output 13 × 13 × 1024.
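For illustration, a minimal sketch of the four combined convolution modules follows, assuming a Keras/TensorFlow implementation; the helper names, the framework choice and the LeakyReLU slope are assumptions for this sketch and are not specified by the patent.

```python
# Illustrative Keras sketch of combined convolution modules A-D; names and
# the LeakyReLU slope (0.1) are assumptions, not taken from the patent.
from tensorflow.keras import layers

def module_a(x, filters=32):
    # Zero padding -> convolution -> batch normalization -> activation
    x = layers.ZeroPadding2D(1)(x)               # e.g. 416 -> 418
    x = layers.Conv2D(filters, 3, strides=2)(x)  # e.g. 418 -> 208
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=0.1)(x)

def module_b(x, filters, dw_kernel=3):
    # Depthwise conv + BN + activation, then 1x1 conv + BN + activation
    x = layers.DepthwiseConv2D(dw_kernel, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(alpha=0.1)(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=0.1)(x)

def module_c(x, filters):
    # Zero padding, strided depthwise conv (downsampling), then 1x1 conv
    x = layers.ZeroPadding2D(1)(x)
    x = layers.DepthwiseConv2D(3, strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(alpha=0.1)(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=0.1)(x)

def module_d(x, filters):
    # Five combined convolution modules B in sequence (eighth-layer usage)
    for _ in range(5):
        x = module_b(x, filters)
    return x
```

Note that module D is shown here in its eighth-layer form, where the five modules B share one parameterization; in the prediction networks below, module D alternates two parameterizations.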
2.2) building a prediction network
Prediction networks for targets of different sizes are constructed from the outputs of different layers of the feature extraction network.
a. Large-size target prediction network
The input is the tenth-layer output of the feature extraction network, of size 13 × 13 × 1024. The large-size target prediction network consists mainly of combined convolution modules and convolution layers, with the following structure:
The first layer is combined convolution module D, as shown in fig. 2d, which passes through five combined convolution modules B in sequence, alternating between two parameterizations. In the first, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (1, 1) and stride 1, padded, output 13 × 13 × 1024, then through a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 512 filters, padded, output 13 × 13 × 512. In the second, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (3, 3) and stride 1, padded, output 13 × 13 × 512, then through a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 1024 filters, padded, output 13 × 13 × 1024. After alternating through the five modules, the output is 13 × 13 × 512.
The second layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 13 × 13 × 512; convolution with kernel (1, 1), stride 1, 1024 filters, padded, output 13 × 13 × 1024.
The third layer is a convolution layer with kernel (1, 1), stride 1 and 255 filters; the output is 13 × 13 × 255.
b. Medium size target prediction network
The inputs are the eighth-layer output of the feature extraction network (26 × 26 × 512) and the first-layer output of the large-size target prediction network (13 × 13 × 512). The medium-size target prediction network consists mainly of combined convolution modules and convolution layers, with the following structure:
The first layer is the input fusion module, as shown in FIG. 3. The 13 × 13 × 512 input first passes through combined convolution module B: depthwise convolution with kernel (1, 1), stride 1, padded, output 13 × 13 × 512; convolution with kernel (1, 1), stride 1, 512 filters, padded, output 13 × 13 × 512. It then passes through an up-sampling layer with factor 2, output 26 × 26 × 512. Finally this output and the 26 × 26 × 512 input pass through a tensor splicing layer; the output is 26 × 26 × 1024.
The second layer is combined convolution module D, which passes through five combined convolution modules B in sequence, alternating between two parameterizations. In the first, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (1, 1), stride 1, padded, output 26 × 26 × 1024, then a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 256 filters, padded, output 26 × 26 × 256. In the second, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (3, 3), stride 1, padded, output 26 × 26 × 256, then a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 512 filters, padded, output 26 × 26 × 512. After alternating through the five modules, the output is 26 × 26 × 256.
The third layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 26 × 26 × 256; convolution with kernel (1, 1), stride 1, 512 filters, padded, output 26 × 26 × 512.
The fourth layer is a convolution layer with kernel (1, 1), stride 1 and 255 filters; the output is 26 × 26 × 255.
c. small size target prediction network
The inputs are the sixth-layer output of the feature extraction network (52 × 52 × 256) and the second-layer output of the medium-size target prediction network (26 × 26 × 256). The small-size target prediction network consists mainly of combined convolution modules and convolution layers, with the following structure:
The first layer is the input fusion module, as shown in FIG. 3. The 26 × 26 × 256 input first passes through combined convolution module B: depthwise convolution with kernel (1, 1), stride 1, padded, output 26 × 26 × 256; convolution with kernel (1, 1), stride 1, 256 filters, padded, output 26 × 26 × 256. It then passes through an up-sampling layer with factor 2, output 52 × 52 × 256. Finally this output and the 52 × 52 × 256 input pass through a tensor splicing layer; the output is 52 × 52 × 512.
The second layer is combined convolution module D, which passes through five combined convolution modules B in sequence, alternating between two parameterizations. In the first, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (1, 1), stride 1, padded, output 52 × 52 × 512, then a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 128 filters, padded, output 52 × 52 × 128. In the second, the input passes through a depthwise convolution, batch normalization and activation layer with kernel (3, 3), stride 1, padded, output 52 × 52 × 128, then a convolution, batch normalization and activation layer with kernel (1, 1), stride 1 and 256 filters, padded, output 52 × 52 × 256. After alternating through the five modules, the output is 52 × 52 × 128.
The third layer is combined convolution module B: depthwise convolution with kernel (3, 3), stride 1, padded, output 52 × 52 × 128; convolution with kernel (1, 1), stride 1, 256 filters, padded, output 52 × 52 × 256.
The fourth layer is a convolution layer with kernel (1, 1), stride 1 and 255 filters; the output is 52 × 52 × 255.
Finally, the 13 × 13 × 255 output of the large-size target prediction network, the 26 × 26 × 255 output of the medium-size target prediction network and the 52 × 52 × 255 output of the small-size target prediction network pass through a non-maximum suppression layer to obtain the predicted target positions and categories.
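Continuing the Keras sketch above, the input fusion module of FIG. 3 can be expressed as a reduce, upsample and concatenate step; this assumes the module_b helper defined earlier.

```python
# Illustrative sketch of the input fusion module (FIG. 3): the deeper feature
# map is reduced by combined convolution module B, upsampled by a factor of 2,
# and spliced with the shallower feature map along the channel axis.
def input_fusion(deep, shallow, filters):
    x = module_b(deep, filters, dw_kernel=1)          # e.g. 13x13x512
    x = layers.UpSampling2D(2)(x)                     # -> 26x26x512
    return layers.Concatenate(axis=-1)([x, shallow])  # -> 26x26x1024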
2.3) setting the loss function
The loss function is set as the summed mean of the center-coordinate loss, the width-height loss, the confidence loss and the category loss:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss, Loss_xy the center-coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the category loss, and num_f the total number of inputs as a floating-point number. The individual loss functions are:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where mark_object is the flag indicating whether an anchor box contains an object, w and h are the anchor box width and height, Loss_log is the binary cross-entropy loss, xy_true and xy_predict are the true and predicted center coordinates, wh_true and wh_predict the true and predicted width-height values, c_predict the confidence value of the prediction box, mark_ignore the flag for anchor boxes whose IOU is below the threshold, and cls_true and cls_predict the true and predicted categories.
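A minimal NumPy sketch of this composite loss is given below; the squared width-height term and the elementwise binary cross entropy are assumptions consistent with the formulas above, and tensor shapes are simplified for illustration.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # Binary cross entropy, elementwise (Loss_log in the text)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def total_loss(mark_object, mark_ignore, w, h, xy_true, xy_pred,
               wh_true, wh_pred, c_pred, cls_true, cls_pred, num_f):
    scale = mark_object * (2.0 - w * h)       # weights small boxes more
    loss_xy = scale * bce(xy_true, xy_pred).sum(-1)
    loss_wh = 0.5 * scale * ((wh_true - wh_pred) ** 2).sum(-1)
    loss_conf = (mark_object * bce(mark_object, c_pred)
                 + (1 - mark_object) * bce(mark_object, c_pred) * mark_ignore)
    loss_cls = mark_object * bce(cls_true, cls_pred).sum(-1)
    return (loss_xy.sum() + loss_wh.sum()
            + loss_conf.sum() + loss_cls.sum()) / num_f
```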
3) training the designed neural network model, comprising the steps of:
3.1) setting training parameters
Set Adam as the training optimizer, 0.001 as the initial learning rate, 500 as the number of iterations and 8 as the batch size; K-means clustering over all labels generates the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569).
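For illustration, a sketch of this training setup follows, assuming Keras and scikit-learn; since the text specifies plain K-means over the labels, Euclidean distance is used here rather than the IOU distance common in YOLO-style detectors.

```python
# Illustrative training setup; library choices are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.optimizers import Adam

def cluster_anchors(wh, k=9):
    # wh: (N, 2) array of labelled box (width, height) pairs in pixels
    km = KMeans(n_clusters=k, random_state=0).fit(wh)
    centers = km.cluster_centers_
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by box area

optimizer = Adam(learning_rate=0.001)  # initial learning rate from the text
EPOCHS, BATCH_SIZE = 500, 8
```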
3.2) Online data enhancement
The data enhancement is carried out on the input image, the data set is expanded, and the data enhancement method comprises the following steps:
a. Random mirror inversion
and carrying out random mirror image inversion on the input image.
b. Random additive noise
A continuous single noise mask is added to the input image.
c. Randomly adjusting contrast
Modifying hue and saturation realizes the contrast transformation.
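A minimal OpenCV/NumPy sketch of these three augmentations follows; the application probabilities and parameter ranges are illustrative assumptions.

```python
import cv2
import numpy as np

def augment(img, rng=np.random):
    # a. random mirror flip (box labels must be flipped too, omitted here)
    if rng.rand() < 0.5:
        img = cv2.flip(img, 1)
    # b. continuous additive noise mask
    if rng.rand() < 0.5:
        noise = rng.normal(0.0, 8.0, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # c. contrast transformation via hue and saturation
    if rng.rand() < 0.5:
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 0] = (hsv[..., 0] + rng.uniform(-10, 10)) % 180
        hsv[..., 1] = np.clip(hsv[..., 1] * rng.uniform(0.7, 1.3), 0, 255)
        img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    return img
```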
3.3) setting training completion flag
A validation set is used to check training accuracy at intervals; the training completion flag requires reaching the maximum of 500 iterations with the accuracy meeting the 99% requirement; after training completes, the model structure and parameters are saved.
4) The output images obtained from the dial positioning model are integrated into a dial information extraction training set. Interference data affected by blur, extreme angles, missing dials and the like, which would hinder neural network training and recognition, are removed, and the positions and categories of the range (scale) information in the remaining data are labelled with the open-source tool labelImg to construct the dial information extraction training set. The neural network model designed in step 2) is then trained on this data set, comprising the following steps:
4.1) setting training parameters
Set the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 600 and the batch size to 8; K-means clustering over all labels generates the initial prior boxes (8, 12), (10, 16), (11, 20), (12, 12), (14, 16), (15, 26), (21, 19), (24, 34), (45, 55);
4.2) Online data enhancement
The data enhancement is carried out on the input image, the data set is expanded, and the main method of the data enhancement is as follows:
a. Random mirror inversion
carrying out random mirror image overturning on an input image;
b. random addition of rectangular noise
Randomly erase the information in several rectangular regions of the input image, with selectable region size and random position; erasing all channels produces black rectangular blocks, while erasing only some channels produces color noise (see the sketch after this list);
c. Randomly adjusting contrast
Modifying hue and saturation to realize contrast conversion;
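The sketch below illustrates the rectangular-noise augmentation of item b, assuming NumPy; the region-size bounds and rectangle count are illustrative assumptions.

```python
import numpy as np

def random_rectangles(img, n_max=4, rng=np.random):
    h, w, c = img.shape
    out = img.copy()
    for _ in range(rng.randint(1, n_max + 1)):
        rh = rng.randint(h // 20, h // 5)    # selectable region size
        rw = rng.randint(w // 20, w // 5)
        y, x = rng.randint(0, h - rh), rng.randint(0, w - rw)
        if rng.rand() < 0.5:
            out[y:y + rh, x:x + rw, :] = 0   # all channels -> black block
        else:
            ch = rng.randint(c)              # one channel -> color noise
            out[y:y + rh, x:x + rw, ch] = rng.randint(0, 256, size=(rh, rw))
    return out
```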
4.3) setting training completion flag
A validation set is used to check training accuracy at intervals; the training completion flag requires reaching the maximum of 600 iterations with the accuracy meeting the 99% requirement; after training completes, the model structure and parameters are saved.
5) The dial area image output by the dial positioning model is input into the dial information extraction model to obtain the dial range information.
6) the dial positioning model output and the dial information extraction model output are used for obtaining the pointer position and the dial center position through an image processing technology, and the method comprises the following steps:
6.1) Dial correction
And obtaining a transformation matrix by using the dial plate central range position information and the central range real position information which are output by the dial plate information extraction model, and correcting the dial plate image through affine transformation.
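For illustration, a minimal OpenCV sketch of this correction step follows; the choice of three correspondence points is an assumption, since an affine transform is determined by three point pairs.

```python
import cv2
import numpy as np

def correct_dial(dial_img, detected_pts, true_pts, out_size=(416, 416)):
    # detected_pts and true_pts are three corresponding (x, y) points,
    # e.g. the detected min-range, max-range and center marks versus their
    # canonical positions; shape (3, 2), dtype float32.
    M = cv2.getAffineTransform(np.float32(detected_pts), np.float32(true_pts))
    return cv2.warpAffine(dial_img, M, out_size)
```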
6.2) pointer location segmentation
A binarized dial image is obtained by applying Otsu adaptive threshold segmentation; the largest contour satisfying the specified aspect-ratio and area ranges is searched on the binary image to obtain the initial pointer position, and the pointer position is then obtained by Hough line detection and least-squares fitting.
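A simplified OpenCV sketch of this segmentation step follows; the aspect-ratio and area thresholds are illustrative assumptions, and the Hough-line refinement is omitted here, keeping only the contour search and the least-squares fit.

```python
import cv2
import numpy as np

def find_pointer(gray):
    # Otsu adaptive threshold -> binarized dial image
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    best, best_area = None, 0.0
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        ratio = max(w, h) / max(min(w, h), 1)
        area = cv2.contourArea(cnt)
        # assumed aspect-ratio and area ranges for a pointer-like contour
        if ratio > 2.0 and 100.0 < area < 10000.0 and area > best_area:
            best, best_area = cnt, area
    if best is None:
        return None
    # Least-squares line fit over the contour points gives the pointer axis
    vx, vy, x0, y0 = cv2.fitLine(best.reshape(-1, 2), cv2.DIST_L2,
                                 0, 0.01, 0.01).ravel()
    return float(vx), float(vy), float(x0), float(y0)
```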
6.3) Dial centering
And obtaining the center of the dial to be selected by using Hough circle detection, calculating the Euclidean distance between each center and the pointer, and screening according to the minimum distance principle to obtain the center position of the dial.
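A minimal OpenCV sketch of the center selection follows; the Hough-circle parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def find_center(gray, pointer_xy):
    # Hough circle detection gives candidate dial centers
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.0, minDist=20,
                               param1=100, param2=30)
    if circles is None:
        return None
    centers = circles[0, :, :2]                  # candidate (x, y) centers
    dists = np.linalg.norm(centers - np.float32(pointer_xy), axis=1)
    return tuple(centers[np.argmin(dists)])      # minimum-distance principle
```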
7) The total range angle is obtained from the maximum-range and minimum-range positions in the output of the dial information extraction model, the reading angle is obtained from the pointer position and the minimum-range position, and the meter reading is obtained by proportionality, i.e. the angle method, as shown in fig. 4; the formula of the angle method is:

num = angle2 / angle1 * (max - min) + min
Where num is the pointer meter reading, angle1 is the angle between the maximum range and the minimum range, angle2 is the angle between the minimum range and the pointer position, max is the maximum range reading, and min is the minimum range reading.
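In code, the angle method reduces to one proportional expression; the following sketch assumes angles measured in degrees from the minimum-range mark.

```python
def angle_reading(angle1, angle2, max_reading, min_reading):
    # num = angle2 / angle1 * (max - min) + min
    return angle2 / angle1 * (max_reading - min_reading) + min_reading

# e.g. a 0-1.6 gauge spanning 270 degrees, pointer 135 degrees past the
# minimum mark, reads 0.8
print(angle_reading(270.0, 135.0, 1.6, 0.0))
```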
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (7)

1. a pointer instrument reading identification method based on a neural network is characterized by comprising the following steps:
1) Constructing and preprocessing a dial positioning data set;
2) Designing a matched neural network model aiming at the characteristics of the identified object and the application scene;
3) training the designed neural network model with loaded training parameters, performing online data enhancement in the training process, and obtaining the dial positioning model after training;
4) inputting an image to be recognized into the trained dial positioning model to obtain the position and category of the dial in the image, constructing a dial information extraction data set from the positioning model's output and preprocessing it, loading training parameters into the neural network model designed in step 2) and training it on the dial information extraction data set with online data enhancement, and obtaining the dial information extraction model after training;
5) feeding the output of the dial positioning model into the trained dial information extraction model to obtain the dial range information;
6) obtaining the pointer position and the dial center position from the outputs of the dial positioning model and the dial information extraction model by image processing;
7) obtaining the maximum-range and minimum-range positions from the output of the dial information extraction model to compute the total range angle, obtaining the reading angle from the pointer position and the minimum-range position, and obtaining the instrument reading by proportionality, i.e. the angle method; wherein the formula of the angle method is:

num = angle2 / angle1 * (max - min) + min
where num is the pointer meter reading, angle1 is the angle between the maximum range and the minimum range, angle2 is the angle between the minimum range and the pointer position, max is the maximum range reading, and min is the minimum range reading.
2. The method for recognizing the reading of a pointer instrument based on a neural network as claimed in claim 1, wherein in step 1), pointer instrument image data in different scenes are collected by image acquisition equipment to construct an original data set; interference data affecting neural network training and recognition, including blurred data, extreme-angle data and dial-missing data, are then removed, and the remaining data are labelled with the position and category of the instrument dial.
3. The method for recognizing the reading of the pointer instrument based on the neural network as claimed in claim 1, wherein in the step 2), a matched neural network is constructed by combining the characteristics of the recognition object and the application scene, and the method comprises the following steps:
2.1) constructing a feature extraction network
A feature extraction network is constructed according to the real-time and high-precision requirements; it consists mainly of combined convolution modules, with the following structure:
The first layer is combined convolution module A, which consists of a zero-padding layer, a convolution layer, a batch normalization layer and an activation layer;
The second layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The third layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fifth layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The sixth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The seventh layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The eighth layer is combined convolution module D, which consists of five combined convolution modules B;
The ninth layer is combined convolution module C, which consists of a zero-padding layer, a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The tenth layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
2.2) building a prediction network
according to the output of different layers of the feature extraction network, a prediction network for outputting and predicting targets with different sizes is constructed, and the method comprises the following steps:
a. large-size target prediction network
The input is the tenth layer output of the feature extraction network, the large-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the structure of the large-size target prediction network is as follows:
The first layer is a combined convolution module D which consists of five combined convolution modules B;
The second layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The third layer is a convolution layer;
b. Medium size target prediction network
The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the structure of the medium-size target prediction network is as follows:
the first layer is an input fusion module which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;
The second layer is a combined convolution module D which consists of five combined convolution modules B;
The third layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
c. small size target prediction network
The input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, the small-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers, and the small-size target prediction network has the structure that:
The first layer is an input fusion module which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;
The second layer is a combined convolution module D which consists of five combined convolution modules B;
The third layer is combined convolution module B, which consists of a depthwise convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is a convolution layer;
Finally, the outputs of the large-size, medium-size and small-size target prediction networks pass through a non-maximum suppression layer to obtain the predicted target positions and categories;
2.3) setting the loss function
Setting the loss function as the averaged sum of a center coordinate loss, a width-height loss, a confidence loss and a class loss, with the formula:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

wherein Loss represents the total loss, Loss_xy the center coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the class loss, and num_f the total number of inputs as a floating-point number; the respective loss functions are formulated as follows:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)

Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2

Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore

Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

wherein mark_object is the flag indicating whether an anchor box contains an object, w and h are the width and height of the anchor box, Loss_log is the binary cross-entropy loss, xy_true and xy_predict are the true and predicted center coordinate values, wh_true and wh_predict are the true and predicted width-height values, c_predict is the confidence value of the predicted box, mark_ignore is the flag for anchor boxes whose IOU is below the threshold, and cls_true and cls_predict are the true and predicted classes.
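For illustration: a hedged PyTorch sketch of the reconstructed loss terms above. Tensor shapes, post-sigmoid predictions and broadcastable mask tensors are assumptions not fixed by the claim.

```python
import torch.nn.functional as F

def yolo_style_loss(xy_true, xy_pred, wh_true, wh_pred, c_pred,
                    cls_true, cls_pred, mark_object, mark_ignore,
                    w, h, num_f):
    """All tensors are assumed broadcastable; predictions are post-sigmoid."""
    scale = mark_object * (2.0 - w * h)  # (2 - w*h) factor up-weights small boxes
    loss_xy = scale * F.binary_cross_entropy(xy_pred, xy_true, reduction="none")
    loss_wh = 0.5 * scale * (wh_true - wh_pred) ** 2
    # confidence: BCE against the objectness flag, with ignored anchors masked
    bce_conf = F.binary_cross_entropy(c_pred, mark_object, reduction="none")
    loss_conf = mark_object * bce_conf + (1.0 - mark_object) * bce_conf * mark_ignore
    loss_cls = mark_object * F.binary_cross_entropy(cls_pred, cls_true,
                                                    reduction="none")
    return (loss_xy.sum() + loss_wh.sum()
            + loss_conf.sum() + loss_cls.sum()) / num_f
```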
4. The method for recognizing the reading of a pointer instrument based on a neural network as claimed in claim 1, wherein in step 3), training the designed neural network model comprises the following steps:
3.1) setting training parameters
setting the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 500 and the batch size to 8, and applying K-means clustering to all labels to generate the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569);
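For illustration: a minimal sketch of generating prior boxes by K-means over the labeled box sizes. The YOLO-style 1 − IoU distance is an assumption; the claim only states that K-means is applied to all labels.

```python
import numpy as np

def iou_wh(wh, centers):
    """IoU between each (w, h) pair and each cluster center, anchored at the origin."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh[:, 0] * wh[:, 1]
    union = union[:, None] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centers), axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)
    # sort by box area so the smallest priors come first
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]
```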
3.2) Online data enhancement
Data enhancement is performed on the input images to expand the data set; the main methods, sketched in code after this list, are as follows:
a. Random mirror flipping
Randomly mirror-flipping the input image;
b. Random additive noise
Adding a continuous single noise mask to the input image;
c. Random contrast adjustment
Modifying hue and saturation to realize contrast conversion;
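For illustration: a hedged OpenCV/NumPy sketch of the three augmentations above. The flip probability, noise amplitude and HSV gain ranges are assumptions; the claim does not specify them.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, rng=np.random.default_rng()) -> np.ndarray:
    # a. random mirror flipping
    if rng.random() < 0.5:
        img = cv2.flip(img, 1)
    # b. random additive noise: one continuous mask applied to all channels
    if rng.random() < 0.5:
        noise = rng.normal(0, 10, img.shape[:2])[..., None]
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # c. random contrast adjustment via hue/saturation in HSV space
    if rng.random() < 0.5:
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 0] = (hsv[..., 0] + rng.uniform(-10, 10)) % 180          # hue
        hsv[..., 1] = np.clip(hsv[..., 1] * rng.uniform(0.7, 1.3), 0, 255)  # saturation
        img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    return img
```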
3.3) setting training completion flag
Setting interval checks of the training accuracy on a validation set; the training completion flag is set once the maximum iteration count and the accuracy requirements are met, and the model structure and parameters are saved after training is completed.
5. The method for recognizing the reading of a pointer instrument based on a neural network as claimed in claim 1, wherein in step 4), the output images obtained from the dial positioning model are integrated into a dial information extraction training set and labeled, the labeled content being the range information, and the neural network model designed in step 2) is trained on the dial information extraction data set, comprising the following steps:
4.1) setting training parameters
Setting the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 600 and the batch size to 8, and applying K-means clustering to all labels to generate the initial prior boxes (8, 12), (10, 16), (11, 20), (12, 12), (14, 16), (15, 26), (21, 19), (24, 34), (45, 55);
4.2) Online data enhancement
Data enhancement is performed on the input images to expand the data set; the main methods are as follows:
a. Random mirror flipping
Randomly mirror-flipping the input image;
b. Random rectangular noise
Discarding the information in several rectangular regions of selectable size and random position within the input image (see the sketch after this list); losing all channels produces black rectangular blocks, while losing only some channels produces color noise;
c. Random contrast adjustment
Modifying hue and saturation to realize contrast conversion;
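For illustration: a hedged sketch of the rectangular-noise augmentation in item b. The number of rectangles and their size ranges are assumptions; the claim fixes only the all-channel versus partial-channel behavior.

```python
import numpy as np

def random_rect_noise(img: np.ndarray, rng=np.random.default_rng()) -> np.ndarray:
    h, w = img.shape[:2]
    for _ in range(rng.integers(1, 4)):              # a few rectangles (count assumed)
        rh = rng.integers(max(1, h // 16), h // 4)
        rw = rng.integers(max(1, w // 16), w // 4)
        y, x = rng.integers(0, h - rh), rng.integers(0, w - rw)
        if rng.random() < 0.5:
            img[y:y + rh, x:x + rw, :] = 0           # all channels lost: black block
        else:
            ch = rng.integers(0, img.shape[2])       # one channel lost: color noise
            img[y:y + rh, x:x + rw, ch] = 0
    return img
```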
4.3) setting training completion flag
Setting interval checks of the training accuracy on a validation set; the training completion flag is set once the maximum iteration count and the accuracy requirements are met, and the model structure and parameters are saved after training is completed.
6. The method for recognizing the reading of a pointer instrument based on a neural network as claimed in claim 1, wherein in step 5), the dial area image is obtained from the output of the dial positioning model and is input into the dial information extraction model to obtain the dial range information.
7. The neural-network-based pointer instrument reading identification method of claim 1, wherein in step 6), the pointer position and the dial center position are obtained from the outputs of the dial positioning model and the dial information extraction model through image processing techniques, comprising the following steps:
6.1) Dial correction
A transformation matrix is obtained from the dial center and range positions output by the dial information extraction model together with their known true positions, and the dial image is corrected by affine transformation;
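For illustration: a minimal OpenCV sketch of this correction step using three corresponding points (an assumption, since the claim names center and range positions without fixing the count).

```python
import cv2
import numpy as np

def correct_dial(img, detected_pts, true_pts):
    """detected_pts, true_pts: 3x2 arrays of corresponding points (detected vs. true)."""
    M = cv2.getAffineTransform(np.float32(detected_pts), np.float32(true_pts))
    # warp the dial image so the detected points land on their true positions
    return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
```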
6.2) pointer location segmentation
A binary dial image is obtained by applying the Otsu (OTSU) adaptive threshold segmentation method; the largest contour on the binary image whose aspect ratio and area fall within specified ranges gives the initial pointer position, and the pointer position is then obtained through Hough line detection and least-squares fitting;
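For illustration: a hedged OpenCV sketch of this segmentation step. The aspect-ratio and area thresholds and the Hough parameters are assumptions; the claim states only the specified-range filtering and the two fitting steps.

```python
import cv2
import numpy as np

def find_pointer(gray: np.ndarray):
    # Otsu adaptive thresholding to a binary dial image
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    best = None
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        aspect = max(w, h) / max(1, min(w, h))
        area = cv2.contourArea(c)
        if aspect > 3 and 200 < area < 20000:        # long, thin contour (ranges assumed)
            if best is None or area > cv2.contourArea(best):
                best = c
    if best is None:
        return None
    mask = np.zeros_like(binary)
    cv2.drawContours(mask, [best], -1, 255, -1)
    # Hough line detection proposes candidate segments on the pointer contour
    lines = cv2.HoughLinesP(mask, 1, np.pi / 180, 50, minLineLength=30, maxLineGap=5)
    # least-squares fit over the contour points refines the pointer axis
    pts = best.reshape(-1, 2).astype(np.float32)
    vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
    return (vx, vy, x0, y0), lines
```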
6.3) Dial centering
Candidate dial centers are obtained by Hough circle detection; the Euclidean distance between each candidate center and the pointer is computed, and the dial center position is selected according to the minimum-distance principle.
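For illustration: a minimal sketch of this center-selection step. The HoughCircles parameters are assumptions; the claim fixes only the candidate detection and the minimum-distance selection.

```python
import cv2
import numpy as np

def dial_center(gray: np.ndarray, pointer_pt):
    """pointer_pt: (x, y) of the located pointer; returns the chosen dial center."""
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30, minRadius=10, maxRadius=0)
    if circles is None:
        return None
    centers = circles[0, :, :2]                      # (N, 2) candidate centers
    d = np.linalg.norm(centers - np.float32(pointer_pt), axis=1)
    return tuple(centers[np.argmin(d)])              # minimum-distance principle
```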
CN201910724076.1A 2019-08-07 2019-08-07 Pointer instrument reading identification method based on neural network Active CN110543878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910724076.1A CN110543878B (en) 2019-08-07 2019-08-07 Pointer instrument reading identification method based on neural network

Publications (2)

Publication Number Publication Date
CN110543878A 2019-12-06
CN110543878B CN110543878B (en) 2022-03-29

Family

ID=68710075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910724076.1A Active CN110543878B (en) 2019-08-07 2019-08-07 Pointer instrument reading identification method based on neural network

Country Status (1)

Country Link
CN (1) CN110543878B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant