CN111738264A - Intelligent acquisition method for data of display panel of machine room equipment


Info

Publication number: CN111738264A
Authority: CN (China)
Prior art keywords: convolution, rpn, display panel, image, detection
Prior art date: 2020-05-08
Legal status: Pending
Application number: CN202010380909.XA
Other languages: Chinese (zh)
Inventors: 胡金磊, 何永林
Current Assignee: Hangzhou Youyun Technology Co., Ltd.
Original Assignee: Shanghai Yooden Information Technology Co., Ltd.
Priority date: 2020-05-08
Filing date: 2020-05-08
Publication date: 2020-10-02
Application filed by Shanghai Yooden Information Technology Co., Ltd.
Priority to CN202010380909.XA
Publication of CN111738264A

Classifications

    • G06V30/153 - Character recognition: segmentation of character regions using recognition of characters or words
    • G06F18/214 - Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/044 - Neural networks: recurrent networks, e.g. Hopfield networks
    • G06N3/045 - Neural networks: combinations of networks
    • G06V10/40 - Extraction of image or video features


Abstract

The invention discloses an intelligent acquisition method for data of a display panel of machine room equipment, comprising the following steps: S1, a robot collects images of a plurality of display panels to serve as a training data set; S2, the training data set is fed into an improved Faster R-CNN algorithm, which is trained to obtain a text detection model; S3, the robot collects images of the display panel in real time and inputs them into the text detection model trained in step S2, which automatically marks all text to produce detection boxes and outputs the position coordinates and size of each detection box within the display panel image; S4, the RoI image inside each detection box is extracted and preprocessed, and the extracted digit skeleton images are retained as a training sample set; S5, an SVM classifier is trained on the sample set and used to classify and recognize individual digits; S6, the digits are concatenated into character strings and output to the client for display. The invention collects data automatically, reduces labor cost, and improves the operation and maintenance efficiency of the data center.

Description

Intelligent acquisition method for data of display panel of machine room equipment
Technical Field
The invention relates to the technical field of image recognition, in particular to an intelligent acquisition method for data of a display panel of machine room equipment.
Background
Artificial intelligence is developing rapidly, and traditional industries are gradually shifting from labor-intensive to automated operation. A data center machine room houses a large number of IT devices, including servers and network equipment, fitted with various display panels. Checking the current operating condition of the equipment requires staff to read and record panel values around the clock, which consumes considerable labor, cannot guarantee the accuracy of the collected data, is prone to reading or recording errors, and is tedious and inefficient. Modern measurement should be automated as far as possible, both to save labor and to reduce the influence of human factors on the results.
In view of this, some machine rooms have adopted robots to acquire panel data. However, the existing acquisition methods in the industry are based on traditional image processing: the digit regions in the robot-acquired images are marked manually, the regions are extracted and preprocessed, a projection method is then used to segment individual digits, and finally each segmented digit is recognized by a stroke-crossing (threading) method. This approach has the following disadvantages:
(1) Machine room display panels come in many types, and the relative positions of the digits within each panel differ. Determining the digit positions by manually marking digit regions is cumbersome, falls short of true intelligence and automation, and its accuracy is easily affected by subjective factors, which in turn degrades the final recognition result.
(2) The traditional techniques used in the prior art, such as image preprocessing and the projection method, depend on the quality of image binarization; preprocessing operations such as binarization adapt poorly to illumination, so lighting changes strongly affect the final recognition result.
(3) Other text recognition approaches, such as mainstream OCR libraries, have been considered. They work reasonably well on printed panels, but a seven-segment digit is composed of seven separate segments that do not touch one another, so existing OCR libraries tend to recognize individual segments in isolation, fail to interpret the digit as a whole, and achieve low recognition accuracy.
Therefore, the existing methods offer a low degree of intelligence, still inevitably require manual assistance, and fall short of the automation requirement. Aiming at the seven-segment-display digital panels that exist in large numbers in data center machine rooms, the invention provides an intelligent acquisition method for machine room equipment display panel data.
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides an intelligent acquisition method for data of a display panel of machine room equipment, and in particular an intelligent acquisition method for the seven-segment-display digital panels found on machine room equipment.
The technical solution adopted by the invention to solve the above technical problems is as follows:
An intelligent acquisition method for data of a display panel of machine room equipment comprises the following steps:
S1, a robot collects images of a plurality of display panels to serve as a training data set;
S2, the training data set is fed into an improved Faster R-CNN algorithm, which is trained to obtain a text detection model;
S3, the robot collects images of the display panel in real time and inputs them into the text detection model trained in step S2; all text is automatically marked to produce detection boxes, and the position coordinates and size of each detection box within the display panel image are output;
S4, the RoI image inside each detection box is extracted and preprocessed, and the extracted digit skeleton images are retained as a training sample set;
S5, an SVM classifier is trained on the training sample set and used to classify and recognize individual digits;
S6, the digits are concatenated into character strings and output to the client for display.
Further, step S1 specifically comprises: the robot autonomously navigates to a position directly in front of the display panel of the device to be inspected, invokes a preset camera position, automatically adjusts the shooting angle, and captures at least 1000 images of the display panel as the training data set.
Further, in step S2 the improved Faster R-CNN algorithm comprises at least: a feature extraction module, an RPN module, and a nested LSTM module connecting the feature extraction module to the RPN module.
Further, the feature extraction module adopts an improved VGG16 as its convolutional feature extractor, used to extract the spatial features of text image pixels in the training samples; the feature extraction module comprises at least four convolution blocks Conv1, Conv2, Conv3 and Conv4, a pooling layer pool and a detection-specific layer RoI pooling, where each convolution block is followed by a max pooling layer; the second convolution layer Conv1_2 of the first convolution block contains convolution kernels of two different sizes plus c 1 × 1 convolution kernels, where c ranges from 2 to 5, and the third convolution layer Conv3_3 of the third convolution block contains two pairs of irregular cross convolution kernels of different sizes plus one 1 × 1 convolution kernel.
Further, the process by which the feature extraction module extracts convolution features comprises the following steps:
step one, the panel image is input into the feature extraction module, and convolution kernels of two different sizes are applied in the second convolution layer Conv1_2 of the first convolution block;
step two, after the convolution with the two kernel sizes of step one, c 1 × 1 convolution kernels are appended;
step three, the output of the first convolution block Conv1 enters the second convolution block Conv2; after convolution by Conv2 it is split into two parts, one entering the pooling layer pool and the other entering the third convolution block Conv3 for convolution;
step four, the third convolution layer Conv3_3 of the third convolution block is convolved with two pairs of irregular cross convolution kernels of different sizes to extract features at different scales;
step five, after convolution by the third convolution block Conv3, convolution continues through the fourth convolution block Conv4;
step six, the convolution feature maps produced by the downsampling pooling layer of step three and by the fourth convolution block Conv4 of step five are fused by the detection-specific layer RoI pooling to obtain the final convolution feature map.
Furthermore, the spatial features extracted by the feature extraction module are input into a nested LSTM module, which extracts the relationships among the feature vectors in the spatial features and treats the spatial features frame by frame as a feature sequence, so that the features of a given frame of pixels are continuous with those of the previous and next frames, producing a feature sequence carrying both spatial and temporal characteristics.
Further, the RPN module maps the feature sequence generated by the nested LSTM module back onto the original input image according to fixed reference boxes (anchors) and generates detection boxes in the original image; a detection box represents a detection result of the text detection model and matches the actual size of the text to be detected.
Further, the process by which the RPN module generates detection boxes comprises the following steps:
step one, an output of N × 1024 × H × W is obtained through a 1024-dimensional fully connected layer; the output of the fully connected layer feeds two branches, rpn_bbox_pred and rpn_cls_score, and rpn_cls_score is followed in sequence by rpn_cls_score_reshape, rpn_cls_prob and rpn_cls_prob_reshape; rpn_bbox_pred outputs the position coordinates of the detection boxes, rpn_cls_score outputs the foreground/background classification scores, and rpn_cls_prob computes the probability that a fixed reference box (anchor) is foreground or background;
step two, rpn_bbox_pred outputs N × 20 × H × W and regresses the coordinates of the predicted anchors, comprising the center y coordinate and the height of each anchor; for every pixel of the fully connected layer feature map, a group of anchors of different sizes and positions is predicted;
step three, the output dimension of rpn_cls_score is N × 20 × H × W, a binary background/foreground output for each of the 2 × 10 anchors; rpn_cls_score_reshape converts this output to N × 2 × (10·H) × W, rpn_cls_prob outputs the probability that each anchor is foreground or background, and rpn_cls_prob_reshape converts these probabilities back to N × 20 × H × W;
step four, finally, the Proposal module combines the detection box coordinates with the foreground/background scores to obtain the detection boxes, while discarding boxes that cross the image boundary and boxes smaller than a preset value.
Further, in step S4, before the RoI image within a detection box is extracted, the method further comprises: the text detection model automatically skips invalid information according to the position coordinates and size of the detection box within the display panel.
Further, in step S4 the image preprocessing applied to the RoI image comprises at least:
filtering, using Gaussian filtering or mean filtering to reduce image noise;
adaptive binarization, thresholding the image with a threshold set adaptively from local image features to separate the text from the background;
character segmentation, segmenting the white-pixel characters;
and skeleton extraction, applying a thinning operation to the image.
The invention has the following beneficial effects:
1. The invention improves the traditional Faster R-CNN algorithm for detecting seven-segment display digits, ensures accurate localization of the digit positions, and applies the SVM machine learning method to seven-segment digit recognition. Deep learning replaces the traditional image processing pipeline, lowering the algorithm's demands on image quality, strengthening its adaptability to illumination, handling the reflection problem of glass panels well, and reducing the influence of lighting on image binarization, which makes the panel data acquisition scheme more stable.
2. To handle the specific scenario of seven-segment display digits, the traditional Faster R-CNN algorithm is improved so that deeper image features can be extracted, effectively alleviating the over-segmentation and missed detections that affect seven-segment digit detection. Verified and analyzed against the existing data set, the text detection model achieves a precision of 0.94 and a recall of 0.85, a large improvement over the traditional Faster R-CNN (the metric definitions are recalled after this list).
3. To handle the many panel types and complex data display styles of data center machine rooms, the invention combines deep learning with traditional methods while maintaining recognition accuracy, acquires seven-segment panel data accurately, balances generality and accuracy, greatly saves labor cost, and improves the intelligence and working efficiency of the machine room inspection robot.
4. The robot acquires the display panel data of machine room IT equipment in real time, so data collection proceeds automatically, greatly reducing labor cost and improving the operation and maintenance efficiency of the data center.
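Here precision and recall carry their standard definitions over true positives (TP), false positives (FP) and false negatives (FN) among the detected text boxes:

```latex
\mathrm{precision} = \frac{TP}{TP + FP} = 0.94,
\qquad
\mathrm{recall} = \frac{TP}{TP + FN} = 0.85
```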
Drawings
Fig. 1 is a flowchart of an intelligent acquisition method for data of a display panel of equipment in a machine room according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a robot according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a state of the robot when the robot acquires an image in the machine room according to the embodiment of the present invention.
Fig. 4 is a schematic flow chart of extracting convolution features by the feature extraction module according to the embodiment of the present invention.
Fig. 5 is a schematic flowchart of the rpn module generating the detection box according to the embodiment of the present invention.
In the figures: 1. pan-tilt camera; 2. lifting base; 3. robot housing; 4. device to be inspected; 5. display panel.
Detailed Description
To help those skilled in the art better understand the invention, it is described in further detail below with reference to the accompanying drawings and specific embodiments; the embodiments are given by way of illustration only and do not limit the scope of the invention.
As shown in fig. 1, the method for intelligently collecting data of a display panel of a machine room device according to this embodiment includes the following steps:
s1, the robot collects images of a plurality of display panels as a training data set.
Specifically, in this embodiment, as shown in figs. 2 and 3, the robot autonomously navigates to a position directly in front of the display panel of the device 4 to be inspected. To guarantee a sufficient field of view for collecting panel data, a pan-tilt camera 1 that can rotate up, down, left and right is installed above the robot housing 3; the pan-tilt camera 1 is mounted on a lifting base 2 whose maximum lifting height reaches 1.8 m, ensuring that display panels 5 at all heights on the machine room equipment can be covered. The robot invokes a preset camera position, automatically adjusts the shooting angle, and captures at least 1000 images of the display panel as the training data set.
S2, the training data set is fed into the improved Faster R-CNN algorithm, which is trained to obtain the text detection model.
Accurate localization of the seven-segment digit positions is the basis of recognition accuracy. Because the segments that form a digit do not touch one another, existing text detection models tend to split digits incorrectly or miss them during detection, and they perform poorly on longer seven-segment digit strings. To solve these localization problems, the invention improves the traditional Faster R-CNN algorithm, which contains only a feature extraction module and an RPN module; the improved Faster R-CNN algorithm comprises at least a feature extraction module, an RPN module, and a nested LSTM module connecting the two, with the original feature extraction and RPN modules adapted to the detection of seven-segment displays.
To strengthen the model's fitting ability and extract features more fully, in this embodiment the feature extraction module adopts an improved VGG16 as the convolutional feature extractor, used to extract deep convolution features from the training samples, chiefly the spatial features of text image pixels. The standard VGG16 has five convolution blocks; the improved VGG16 uses four. As shown in fig. 4, the feature extraction module comprises at least four convolution blocks Conv1, Conv2, Conv3 and Conv4, a pooling layer pool and a detection-specific layer RoI pooling, where each convolution block is followed by a max pooling layer. First, weighing model depth, network structural complexity and computational cost, this embodiment extracts features in a differentiated way by using convolution kernels of different sizes within the same convolution block: the second convolution layer Conv1_2 of the first convolution block contains 7 × 7 and 5 × 5 convolution kernels, so the spatial features of the image can be exploited sensibly, followed by c 1 × 1 convolution kernels, where c ranges from 2 to 5; adjusting c reduces the channel dimension, which speeds up computation and saves computational cost. Second, since seven-segment digits have irregular shapes, including rectangular, polygonal and slanted forms, irregular cross convolution kernels are used in the subsequent convolutions: the third convolution layer Conv3_3 of the third convolution block contains a pair of 5 × 1 and 1 × 5 cross kernels, a pair of 3 × 1 and 1 × 3 cross kernels, and one 1 × 1 kernel. The irregular cross kernels strengthen adaptability to multi-scale features and non-linear expressive power, improving the convolution effect.
In this embodiment, the process by which the feature extraction module extracts convolution features comprises the following steps (a sketch in code follows the list):
step one, the panel image is input into the feature extraction module, and 7 × 7 and 5 × 5 convolution kernels are applied in the second convolution layer Conv1_2 of the first convolution block;
step two, after the convolution with the 7 × 7 and 5 × 5 kernels of step one, c 1 × 1 convolution kernels are appended; in this embodiment c = 3, i.e. three 1 × 1 kernels are added;
step three, the output of the first convolution block Conv1 enters the second convolution block Conv2; after convolution by Conv2 it is split into two parts, one entering the pooling layer pool and the other entering the third convolution block Conv3 for convolution;
step four, the third convolution layer Conv3_3 of the third convolution block is convolved with two pairs of irregular cross convolution kernels of different sizes to extract features at different scales, namely a pair of 5 × 1 and 1 × 5 cross kernels and a pair of 3 × 1 and 1 × 3 cross kernels;
step five, after convolution by the third convolution block Conv3, convolution continues through the fourth convolution block Conv4;
step six, the convolution feature maps produced by the downsampling pooling layer of step three and by the fourth convolution block Conv4 of step five are fused by the detection-specific layer RoI pooling, effectively fusing deep and shallow convolution features to obtain the final convolution feature map and extract a richer variety of features.
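The two modified layers can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's implementation: only the kernel sizes come from the description (7 × 7 and 5 × 5 plus c 1 × 1 kernels in Conv1_2; 5 × 1/1 × 5 and 3 × 1/1 × 3 cross pairs plus one 1 × 1 kernel in Conv3_3), while the channel widths, the branch split and the class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixedKernelLayer(nn.Module):
    """Conv1_2: parallel 7x7 and 5x5 convolutions, then c 1x1 convolutions."""
    def __init__(self, in_ch, out_ch, c=3):            # out_ch must be even
        super().__init__()
        self.conv7 = nn.Conv2d(in_ch, out_ch // 2, 7, padding=3)
        self.conv5 = nn.Conv2d(in_ch, out_ch // 2, 5, padding=2)
        # c stacked 1x1 convolutions: cheap channel-dimension adjustment
        self.pointwise = nn.Sequential(
            *[nn.Conv2d(out_ch, out_ch, 1) for _ in range(c)])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = torch.cat([self.conv7(x), self.conv5(x)], dim=1)
        return self.relu(self.pointwise(x))

class CrossKernelLayer(nn.Module):
    """Conv3_3: 5x1/1x5 and 3x1/1x3 cross pairs plus one 1x1 kernel."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch = out_ch // 3
        self.cross5 = nn.Sequential(
            nn.Conv2d(in_ch, branch, (5, 1), padding=(2, 0)),
            nn.Conv2d(branch, branch, (1, 5), padding=(0, 2)))
        self.cross3 = nn.Sequential(
            nn.Conv2d(in_ch, branch, (3, 1), padding=(1, 0)),
            nn.Conv2d(branch, branch, (1, 3), padding=(0, 1)))
        self.point = nn.Conv2d(in_ch, out_ch - 2 * branch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(torch.cat(
            [self.cross5(x), self.cross3(x), self.point(x)], dim=1))
```

Dropping these two layers into a four-block VGG16-style stack, each block followed by max pooling, and fusing the step-three pool branch with the Conv4 output through RoI pooling reproduces the data flow of fig. 4.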
In this embodiment, to handle the detection of longer strings, a nested LSTM module is added between the feature extraction module and the RPN module as a link. The nested LSTM is an improvement on the traditional LSTM: it has a deeper network architecture, its depth can be changed flexibly, its internal memory cells can be accessed freely, and it can remember states further back than the current time step. The spatial features extracted by the feature extraction module are input into the nested LSTM module, which extracts the relationships among the feature vectors (finding feature vectors that are spatially continuous) and treats the spatial features frame by frame as a feature sequence, so that the features of a given frame of pixels are continuous with those of the previous and next frames; this keeps text lines continuous and yields a feature sequence carrying both spatial and temporal characteristics. Combining the nested LSTM module with the feature extraction module extracts both the spatial characteristics of the digit image and the temporal characteristics between the spatially characteristic pixels, and this fused feature improves the performance of the whole digit detection model.
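The frame-by-frame sequence step can be sketched as below: each row of the convolution feature map is read as a sequence of per-column feature vectors and passed through an LSTM, so each column's features gain context from their neighbours. A standard bidirectional nn.LSTM stands in for the nested LSTM, whose cell the patent does not detail; the hidden size and the 1 × 1 projection back to the channel dimension are assumptions.

```python
import torch
import torch.nn as nn

class SequenceContext(nn.Module):
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(channels, hidden,
                           bidirectional=True, batch_first=True)
        self.proj = nn.Conv2d(2 * hidden, channels, 1)

    def forward(self, fmap):                              # fmap: (N, C, H, W)
        n, c, h, w = fmap.shape
        seq = fmap.permute(0, 2, 3, 1).reshape(n * h, w, c)  # rows as sequences
        out, _ = self.rnn(seq)                            # (N*H, W, 2*hidden)
        out = out.reshape(n, h, w, -1).permute(0, 3, 1, 2)
        return self.proj(out)                             # back to (N, C, H, W)
```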
In this embodiment, the RPN module maps the feature sequence generated by the nested LSTM module back onto the original input image according to the fixed reference boxes (anchors) and generates detection boxes in the original image; a detection box represents a detection result of the text detection model and matches the actual size of the text to be detected.
In this embodiment, as shown in fig. 5, the process by which the RPN module generates detection boxes comprises the following steps (the anchor scheme is sketched in code after the list):
step one, an output of N × 1024 × H × W is obtained through a 1024-dimensional fully connected layer; the output of the fully connected layer feeds two branches, rpn_bbox_pred and rpn_cls_score, and rpn_cls_score is followed in sequence by rpn_cls_score_reshape, rpn_cls_prob and rpn_cls_prob_reshape; rpn_bbox_pred outputs the position coordinates of the detection boxes, rpn_cls_score outputs the foreground/background classification scores, and rpn_cls_prob computes the probability that a fixed reference box (anchor) is foreground or background;
step two, rpn_bbox_pred outputs N × 20 × H × W and regresses the coordinates of the predicted anchors, comprising the center y coordinate and the height of each anchor; for every pixel of the fully connected layer feature map a group of anchors of different sizes and positions is predicted (the anchors are the candidates for the final detection boxes), covering almost every position and size; each anchor is responsible for detecting targets whose intersection-over-union with it (the ratio of the intersection to the union of the predicted box and the ground truth box) exceeds a preset training threshold, and the closer the IoU is to 1, the more accurate the detection; in this embodiment 10 anchors of different sizes and positions are predicted: the width of every anchor is fixed at 10 pixels and the heights are 10, 13, 18, 26, 36, 48, 68, 97, 139 and 198 pixels, so digit text of different sizes and shapes can be localized;
step three, the output dimension of rpn_cls_score is N × 20 × H × W, a binary background/foreground output for each of the 2 × 10 anchors; rpn_cls_score_reshape converts this output to N × 2 × (10·H) × W, rpn_cls_prob outputs the probability that each anchor is foreground or background, and rpn_cls_prob_reshape converts these probabilities back to N × 20 × H × W; the two reshape layers keep the output data and the loss function in the same dimensions so that the loss can be computed;
step four, finally, the Proposal module combines the detection box coordinates with the foreground/background scores to obtain the detection boxes, while discarding boxes that cross the image boundary and boxes smaller than a preset value, i.e. extremely small boxes.
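The anchor scheme of step two and the intersection-over-union test can be sketched as follows; the fixed 10 px width and the ten heights come from the description, while the feature stride of 16 and the function names are illustrative assumptions.

```python
import numpy as np

ANCHOR_WIDTH = 10
ANCHOR_HEIGHTS = [10, 13, 18, 26, 36, 48, 68, 97, 139, 198]

def generate_anchors(feat_h, feat_w, stride=16):
    """One group of 10 anchors per feature-map pixel, as (x1, y1, x2, y2)."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for h in ANCHOR_HEIGHTS:
                anchors.append((cx - ANCHOR_WIDTH / 2, cy - h / 2,
                                cx + ANCHOR_WIDTH / 2, cy + h / 2))
    return np.array(anchors)                # shape: (feat_h * feat_w * 10, 4)

def iou(a, b):
    """Intersection-over-union: the closer to 1, the more accurate the box."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union
```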
S3, the robot collects images of the display panel in real time and inputs them into the text detection model trained in step S2; all text is automatically marked to produce detection boxes, and the position coordinates and size of each detection box within the display panel image are output.
S4, the text detection model automatically skips invalid information according to the position coordinates and size of each detection box within the display panel, extracts the RoI image inside the detection box, preprocesses it, and retains the extracted digit skeleton images as a training sample set. The preprocessing applied to the RoI image comprises at least the following operations (a sketch in code follows the list):
filtering, using Gaussian filtering or mean filtering to reduce image noise and improve image quality;
adaptive binarization, thresholding the image with a threshold set adaptively from local image features to separate the text from the background;
character segmentation, operating directly on the adaptively binarized image pixels to segment the white-pixel characters;
and skeleton extraction, applying a thinning operation to skeletonize the binary image; because seven-segment digits come in different sizes and stroke thicknesses, thinning them to a uniform digit skeleton makes recognition easier.
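A sketch of this preprocessing chain with OpenCV is given below. The Gaussian kernel size and the adaptive-threshold block size and constant are illustrative assumptions; blank-column splitting is one plausible way to segment the white-pixel characters, since the patent does not fix the segmentation primitive; and cv2.ximgproc.thinning requires the opencv-contrib-python package.

```python
import cv2
import numpy as np

def preprocess_roi(roi_bgr):
    """Filter and binarize one detected RoI; digits come out white on black."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # noise reduction
    return cv2.adaptiveThreshold(blurred, 255,             # local threshold
                                 cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 31, 10)

def segment_characters(binary):
    """Split the binary image into per-character crops at blank column runs."""
    col_has_ink = binary.max(axis=0) > 0
    chars, start = [], None
    for x, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            chars.append(binary[:, start:x])
            start = None
    if start is not None:
        chars.append(binary[:, start:])
    return chars

def skeletonize(char_img):
    """Thin strokes of any width to a uniform 1-pixel skeleton."""
    return cv2.ximgproc.thinning(char_img)
```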
S5, the extracted digit skeleton images are made into a sample set, an SVM classifier is trained on this training sample set, and individual digits are classified and recognized by the SVM classifier, as sketched below.
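A sketch of steps S5 and S6 with scikit-learn follows: an SVM over flattened skeleton images, plus the splicing of the recognized digits into an output string. The 28 × 28 input size, the RBF kernel and raw pixels as the feature vector are assumptions, since the patent does not specify the SVM's features.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def to_feature(skeleton):
    """Resize one digit skeleton and flatten it into a feature vector."""
    resized = cv2.resize(skeleton, (28, 28), interpolation=cv2.INTER_AREA)
    return (resized.reshape(-1) / 255.0).astype(np.float32)

def train_digit_svm(skeleton_images, labels):
    """Train on the digit skeleton sample set produced in step S4."""
    X = np.stack([to_feature(img) for img in skeleton_images])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, np.asarray(labels))
    return clf

def read_panel_value(clf, char_skeletons):
    """Classify each digit and splice the results into one character string."""
    X = np.stack([to_feature(s) for s in char_skeletons])
    return "".join(str(int(d)) for d in clf.predict(X))
```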
S6, the recognized digits are spliced into character strings and output to the client, for example through a web page, for display to the user, so that the running condition of the equipment can be observed in real time and emergencies can be handled promptly.
The foregoing merely illustrates the principles and preferred embodiments of the invention. Those skilled in the art may make various changes and modifications in light of the above description, and all such changes and modifications fall within the scope of the invention.

Claims (10)

1. An intelligent acquisition method for data of a display panel of machine room equipment, characterized by comprising the following steps:
S1, a robot collects images of a plurality of display panels to serve as a training data set;
S2, the training data set is fed into an improved Faster R-CNN algorithm, which is trained to obtain a text detection model;
S3, the robot collects images of the display panel in real time and inputs them into the text detection model trained in step S2; all text is automatically marked to produce detection boxes, and the position coordinates and size of each detection box within the display panel image are output;
S4, the RoI image inside each detection box is extracted and preprocessed, and the extracted digit skeleton images are retained as a training sample set;
S5, an SVM classifier is trained on the training sample set and used to classify and recognize individual digits;
S6, the digits are concatenated into character strings and output to the client for display.
2. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 1, wherein step S1 specifically comprises: the robot autonomously navigates to a position directly in front of the display panel of the device to be inspected, invokes a preset camera position, automatically adjusts the shooting angle, and captures at least 1000 images of the display panel as the training data set.
3. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 1, wherein in step S2 the improved Faster R-CNN algorithm comprises at least: a feature extraction module, an RPN module, and a nested LSTM module connecting the feature extraction module to the RPN module.
4. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 3, wherein the feature extraction module adopts an improved VGG16 as its convolutional feature extractor, used to extract the spatial features of text image pixels in the training samples; the feature extraction module comprises at least four convolution blocks Conv1, Conv2, Conv3 and Conv4, a pooling layer pool and a detection-specific layer RoI pooling, where each convolution block is followed by a max pooling layer; the second convolution layer Conv1_2 of the first convolution block contains convolution kernels of two different sizes plus c 1 × 1 convolution kernels, where c ranges from 2 to 5, and the third convolution layer Conv3_3 of the third convolution block contains two pairs of irregular cross convolution kernels of different sizes plus one 1 × 1 convolution kernel.
5. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 4, wherein the process by which the feature extraction module extracts convolution features comprises the following steps:
step one, the panel image is input into the feature extraction module, and convolution kernels of two different sizes are applied in the second convolution layer Conv1_2 of the first convolution block;
step two, after the convolution with the two kernel sizes of step one, c 1 × 1 convolution kernels are appended;
step three, the output of the first convolution block Conv1 enters the second convolution block Conv2; after convolution by Conv2 it is split into two parts, one entering the pooling layer pool and the other entering the third convolution block Conv3 for convolution;
step four, the third convolution layer Conv3_3 of the third convolution block is convolved with two pairs of irregular cross convolution kernels of different sizes to extract features at different scales;
step five, after convolution by the third convolution block Conv3, convolution continues through the fourth convolution block Conv4;
step six, the convolution feature maps produced by the downsampling pooling layer of step three and by the fourth convolution block Conv4 of step five are fused by the detection-specific layer RoI pooling to obtain the final convolution feature map.
6. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 4, wherein the spatial features extracted by the feature extraction module are input into a nested LSTM module, which extracts the relationships among the feature vectors in the spatial features and treats the spatial features frame by frame as a feature sequence, so that the features of a given frame of pixels are continuous with those of the previous and next frames, producing a feature sequence carrying both spatial and temporal characteristics.
7. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 6, wherein the RPN module maps the feature sequence generated by the nested LSTM module back onto the original input image according to fixed reference boxes (anchors) and generates detection boxes in the original image; a detection box matches the actual size of the text to be detected and represents a detection result of the text detection model.
8. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 7, wherein the process by which the RPN module generates detection boxes comprises the following steps:
step one, an output of N × 1024 × H × W is obtained through a 1024-dimensional fully connected layer; the output of the fully connected layer feeds two branches, rpn_bbox_pred and rpn_cls_score, and rpn_cls_score is followed in sequence by rpn_cls_score_reshape, rpn_cls_prob and rpn_cls_prob_reshape; rpn_bbox_pred outputs the position coordinates of the detection boxes, rpn_cls_score outputs the foreground/background classification scores, and rpn_cls_prob computes the probability that a fixed reference box (anchor) is foreground or background;
step two, rpn_bbox_pred outputs N × 20 × H × W and regresses the coordinates of the predicted anchors, comprising the center y coordinate and the height of each anchor; for every pixel of the fully connected layer feature map, a group of anchors of different sizes and positions is predicted;
step three, the output dimension of rpn_cls_score is N × 20 × H × W, a binary background/foreground output for each of the 2 × 10 anchors; rpn_cls_score_reshape converts this output to N × 2 × (10·H) × W, rpn_cls_prob outputs the probability that each anchor is foreground or background, and rpn_cls_prob_reshape converts these probabilities back to N × 20 × H × W;
step four, finally, the Proposal module combines the detection box coordinates with the foreground/background scores to obtain the detection boxes, while discarding boxes that cross the image boundary and boxes smaller than a preset value.
9. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 1, wherein in step S4, before the RoI image within a detection box is extracted, the method further comprises: the text detection model automatically skips invalid information according to the position coordinates and size of the detection box within the display panel.
10. The intelligent acquisition method for data of a display panel of machine room equipment according to claim 1, wherein in step S4 the image preprocessing applied to the RoI image comprises at least:
filtering, using Gaussian filtering or mean filtering to reduce image noise;
adaptive binarization, thresholding the image with a threshold set adaptively from local image features to separate the text from the background;
character segmentation, segmenting the white-pixel characters;
and skeleton extraction, applying a thinning operation to the image.
CN202010380909.XA (priority date 2020-05-08, filing date 2020-05-08): Intelligent acquisition method for data of display panel of machine room equipment. Status: Pending. Publication: CN111738264A.

Priority Applications (1)

CN202010380909.XA - Priority date: 2020-05-08 - Filing date: 2020-05-08 - Intelligent acquisition method for data of display panel of machine room equipment

Publications (1)

CN111738264A - Publication date: 2020-10-02

Family ID: 72647025

Family Applications (1)

CN202010380909.XA (Pending) - CN111738264A: Intelligent acquisition method for data of display panel of machine room equipment

Country Status (1)

CN: CN111738264A



Patent Citations (4)

* Cited by examiner, † Cited by third party
US20130138589A1 * 2011-11-28 / 2013-05-30, Microsoft Corporation: Exploiting sparseness in training deep neural networks
CN108573261A * 2018-04-17 / 2018-09-25, 国家电网公司 (State Grid Corporation of China): Instrument reading recognition method suitable for an intelligent mobile robot
CN108764134A * 2018-05-28 / 2018-11-06, 江苏迪伦智能科技有限公司: Automatic positioning and recognition method for multiple types of instruments, suitable for an inspection robot
CN110046617A * 2019-03-15 / 2019-07-23, 西安交通大学 (Xi'an Jiaotong University): Adaptive recognition method for digital electric meter readings based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
杨宏志 et al., "Natural scene text detection algorithm based on improved Faster R-CNN" (基于改进Faster R-CNN的自然场景文字检测算法), Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (重庆邮电大学学报(自然科学版)) *

Cited By (3)

* Cited by examiner, † Cited by third party
WO2022116523A1 * 2020-12-04 / 2022-06-09: Image processing method, image recognition apparatus, electronic device, and medium
CN112861861A * 2021-01-15 / 2021-05-28, 珠海世纪鼎利科技股份有限公司: Method and device for identifying nixie tube text and electronic equipment
CN112861861B * 2021-01-15 / 2024-04-09, 珠海世纪鼎利科技股份有限公司: Method and device for recognizing nixie tube text and electronic equipment

Similar Documents

Publication - Title
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN107977639B (en) Face definition judgment method
CN113658132B (en) Computer vision-based structural part weld joint detection method
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN111652085B (en) Object identification method based on combination of 2D and 3D features
CN110543867A (en) crowd density estimation system and method under condition of multiple cameras
CN111539330B (en) Transformer substation digital display instrument identification method based on double-SVM multi-classifier
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN109886947A (en) The high-tension bus-bar defect inspection method of convolutional neural networks based on region
CN108596102A (en) Indoor scene object segmentation grader building method based on RGB-D
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
CN110929795A (en) Method for quickly identifying and positioning welding spot of high-speed wire welding machine
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN111161295A (en) Background stripping method for dish image
CN111738264A (en) Intelligent acquisition method for data of display panel of machine room equipment
CN116740528A (en) Shadow feature-based side-scan sonar image target detection method and system
CN111178405A (en) Similar object identification method fusing multiple neural networks
CN114037895A (en) Unmanned aerial vehicle pole tower inspection image identification method
CN112561885B (en) YOLOv 4-tiny-based gate valve opening detection method
CN116596838A (en) Component surface defect detection method based on feature perception
CN113420839B (en) Semi-automatic labeling method and segmentation positioning system for stacking planar target objects
CN115995017A (en) Fruit identification and positioning method, device and medium
CN114863199A (en) Target detection method based on optimized anchor frame mechanism

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
  Effective date of registration: 2023-08-03
  Address after: Room 611-612, Zhuoxin Building, No. 3820 South Ring Road, Puyan Street, Binjiang District, Hangzhou, Zhejiang, 310000
  Applicant after: Hangzhou Youyun Technology Co., Ltd.
  Address before: Room 107, Building 10, No. 350 Xianxia Road, Changning District, Shanghai, 200336
  Applicant before: Shanghai Yooden Information Technology Co., Ltd.