CN111461010B - Power equipment identification efficiency optimization method based on template tracking - Google Patents


Info

Publication number
CN111461010B
Authority
CN
China
Prior art keywords
template
power equipment
target
frame
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010248283.7A
Other languages
Chinese (zh)
Other versions
CN111461010A (en)
Inventor
杨凤生
曾惜
王林波
王元峰
王冕
杨金铎
王恩伟
王宏远
刘畅
熊萱
龙思璇
马庭桦
兰雯婷
陈子敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202010248283.7A priority Critical patent/CN111461010B/en
Publication of CN111461010A publication Critical patent/CN111461010A/en
Application granted granted Critical
Publication of CN111461010B publication Critical patent/CN111461010B/en
Legal status: Active

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/23213: Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06N 3/045: Combinations of networks (neural network architectures)
    • G06N 3/08: Learning methods (neural networks)
    • G06V 10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a template tracking-based power equipment identification efficiency optimization method, which comprises the following steps: building a deep classification network and training it; detecting and identifying video key frames with the deep classification network model; extracting the key frame target identification area as a target template and extracting template features; carrying out template matching within a search range twice the size of the target area in non-key frames and calculating a matching rate; judging the relation between the matching rate and a threshold: if the extreme value of the matching rate is greater than or equal to the threshold, updating the template area coordinates and re-extracting the template features, and if the matching rate is less than the threshold, skipping to the deep neural network model for identification; and judging whether the power equipment identification is finished, exiting if so. The method tracks quickly, effectively reduces the invalid area identified by the deep learning classifier, improves the identification frame rate, and reduces the false-detection rate.

Description

Power equipment identification efficiency optimization method based on template tracking
Technical Field
The invention relates to the field of power equipment identification, in particular to a power equipment identification efficiency optimization method based on template tracking.
Background
Currently, target detection algorithms based on deep learning can be roughly divided into two major categories: 1. two-stage algorithms, which first generate candidate regions and then perform CNN classification (the RCNN series); 2. one-stage algorithms, which apply the algorithm directly to the input image and output the categories and corresponding locations (the YOLO series). YOLO combines efficiency with a high recognition rate and is currently the framework of choice for object recognition in industry. Image recognition and processing technology plays an important role in daily life and industrial production, and power equipment monitoring in the power industry cannot do without image processing technology; however, the existing technology is slow and inefficient at recognizing and detecting power equipment.
Disclosure of Invention
This section is intended to summarize some aspects of the embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract, and in the title of the application to avoid obscuring their purpose; such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made keeping in mind the above problems occurring in the prior art.
Therefore, the technical problem to be solved by the invention is the low identification and detection speed of power equipment in the prior art.
In order to solve the above technical problems, the invention provides the following technical scheme: a method for optimizing the recognition efficiency of power equipment based on template tracking, comprising the following steps:
S1: building a deep classification network and training the deep classification network;
S2: detecting and identifying video key frames by using the deep classification network model;
S3: extracting the key frame target identification area as a target template, and extracting template features;
S4: carrying out template matching within a search range twice the size of the target area in non-key frames, and calculating a matching rate;
S5: judging the relation between the matching rate and a threshold: if the extreme value of the matching rate is greater than or equal to the threshold, updating the template area coordinates and re-extracting the template features; if the matching rate is less than the threshold, skipping to the deep neural network model for identification;
S6: judging whether the power equipment identification is finished, and if so, exiting.
In a preferred scheme of the template tracking-based power equipment identification efficiency optimization method of the present invention, S1 includes: using the public image recognition data set ImageNet as a pre-training set; collecting samples of different power equipment under a normal environment to establish a training sample set, a test sample set and a verification sample set; labeling the power equipment in the sample sets and performing data enhancement, taking the enhanced power equipment region images as positive samples and other regions as negative samples; first training a pre-trained model with the pre-training set, and then performing transfer learning on the training samples based on the pre-trained model.
In a preferred scheme, S1 further includes detecting the correct recognition rate of the model with the test sample set; if the correct recognition rate of the model is below the recognition-rate threshold required by the actual scene, model optimization training continues until the required recognition rate is reached, finally yielding the deep learning model for power equipment recognition.
In a preferred scheme, S2 includes the following steps: the camera captures a video frame sequence, and every 8 frames one frame is set as a key frame I; for a key frame I, its image is fed into the deep learning classification model, and the model outputs the coordinates of the target rectangular region of the power equipment, the target category, and the target category weight.
In a preferred scheme, S3 includes extracting the target area image obtained from key frame I in S2 as a template, and extracting gradient features of the template.
In a preferred scheme, S4 includes: every 8th frame of the video frame sequence is a key frame I and the rest are non-key frames B; a non-key frame B matches the target position in the current frame based on the target template coordinate position and template features of the previous frame, that is, in the new video frame, the target region of the previous frame's template is expanded to 1.5 times its original size, the gradient features of this region are extracted, template matching is performed against the gradient features extracted from the previous frame's template, and the matching-rate extremum is calculated.
In a preferred scheme, S5 includes S501: if the matching-rate extremum is greater than or equal to the threshold, matching succeeds, the target area coordinates are updated to the area coordinates with the highest matching rate, and the gradient features of the template are re-extracted.
In a preferred scheme, S5 includes S502: if the matching rate is smaller than the threshold, template matching fails; the current frame is used as a key frame I and fed into the deep neural network model for identification, the target area coordinates are updated to the target area coordinates in the identification result, and the gradient features of the template are re-extracted.
In a preferred scheme, S6 includes judging whether the current frame is the last frame; if so, target tracking ends; otherwise, the updated template features are taken as the template features for the next frame and the process returns to S4.
The invention has the beneficial effects that: the optimization method identifies the labeled test set through the model and tests accuracy by evaluating the area coincidence rate between the detection box and the labeled box, thereby improving the recognition speed of the power equipment, reducing unnecessary workload, effectively reducing the invalid area identified by the deep learning classifier, improving the identification frame rate, and reducing the false-detection rate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor. In the drawings:
fig. 1 is a flowchart of a method for optimizing the identification efficiency of an electrical device based on template tracking according to an embodiment of the present invention;
fig. 2 is a schematic diagram of template tracking in the method for optimizing the identification efficiency of the electrical equipment based on template tracking according to an embodiment of the present invention;
fig. 3 is a schematic diagram of power equipment identification in the method for optimizing power equipment identification efficiency based on template tracking according to an embodiment of the present invention;
fig. 4 is a schematic diagram of power equipment identification in the method for optimizing power equipment identification efficiency based on template tracking according to an embodiment of the present invention;
fig. 5 is a schematic diagram of common template matching in the method for optimizing the identification efficiency of the power equipment based on template tracking according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, the present invention can also be practiced in other ways than those specifically described here, as will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention; therefore, the present invention is not limited to the specific embodiments disclosed below.
Next, the present invention is described in detail with reference to the drawings. For convenience of illustration, the cross-sectional views illustrating the structure of the device are not partially enlarged according to the general scale when the embodiments of the present invention are described; the drawings are only examples and should not limit the scope of the present invention. In addition, the actual fabrication should include the three-dimensional dimensions of length, width and depth.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Examples
Referring to fig. 1 to 5, this embodiment provides a method for optimizing the identification efficiency of power equipment based on template tracking, comprising the following steps:
S1: building a deep classification network and training the deep classification network;
S2: detecting and identifying video key frames by using the deep classification network model;
S3: extracting the key frame target identification area as a target template, and extracting template features;
S4: carrying out template matching within a search range twice the size of the target area in non-key frames, and calculating a matching rate;
S5: judging the relation between the matching rate and a threshold: if the extreme value of the matching rate is greater than or equal to the threshold, updating the template area coordinates and re-extracting the template features; if the matching rate is less than the threshold, skipping to the deep neural network model for identification;
S6: judging whether the power equipment identification is finished, and if so, exiting.
In this embodiment, the deep learning target recognition model YOLOv3 is used, ImageNet is used as the pre-training set, and the deep learning framework is Darknet, so that a single command suffices to train the model. YOLOv3 uses the Darknet-53 backbone network and, borrowing from residual networks, sets shortcut connections between some layers; it then performs object detection on 3 feature maps of different scales. For detection efficiency, the image is divided into a number of cells and prior boxes are set at different scales. Because a purely convolutional neural network is used, the feature map has position invariance, and the features of the prior boxes can be computed directly from the final feature map. Regression training is performed after the feature map is obtained, so as to obtain an efficient classification model; the detection targets are not regular squares or rectangles. YOLO sets 3 prior boxes for each downsampling scale and downsamples three times, clustering 9 prior boxes in total. The prior box sizes can be obtained by K-means clustering; the 9 prior boxes on the COCO data set are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198), (373x326).
In assignment, the larger prior boxes (116x90), (156x198), (373x326) are applied on the smallest 13 x 13 feature map (largest receptive field), suitable for detecting larger objects. The medium boxes (30x61), (62x45), (59x119) are applied on the medium 26 x 26 feature map (medium receptive field), suitable for detecting medium-sized objects. The smaller prior boxes (10x13), (16x30), (33x23) are applied on the larger 52 x 52 feature map (smaller receptive field), suitable for detecting smaller objects.
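As an illustrative sketch of how such prior boxes can be derived (assumed code for exposition, not the patent's implementation; kmeans_anchors and iou_wh are hypothetical helper names, and the toy data stands in for labeled power equipment boxes), K-means with the common 1 - IoU distance can cluster labeled box dimensions into 9 anchors:

```python
import numpy as np

def iou_wh(box, clusters):
    """IoU between one (w, h) pair and k cluster (w, h) pairs, boxes anchored at a common corner."""
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, seed=0):
    """Cluster labeled (w, h) pairs into k prior boxes using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    assignment = np.full(len(boxes), -1)
    while True:
        distances = np.array([1.0 - iou_wh(b, clusters) for b in boxes])  # shape (n, k)
        new_assignment = distances.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            # Sort anchors by area, smallest first, as in the listing above.
            return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
        assignment = new_assignment
        for j in range(k):
            if np.any(assignment == j):  # move each cluster to the median of its members
                clusters[j] = np.median(boxes[assignment == j], axis=0)

# Toy (w, h) data in pixels, standing in for labeled bounding boxes.
boxes = np.abs(np.random.default_rng(1).normal(100.0, 60.0, size=(500, 2))) + 5.0
print(kmeans_anchors(boxes))
```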
YOLO divides the input image into S × S cells and then outputs in units of cells:
1) if the center of an object falls on a cell, then the cell is responsible for predicting the object.
2) Each cell needs to predict B bbox values (x, y, w, h, c), with a confidence score c predicted for each bbox; i.e., each cell needs to predict B × (4+1) values.
3) Therefore, the output dimension of the final network is:
S*S*(B*5+C)
Here, although each cell is responsible for predicting only one kind of object, each cell can predict a plurality of bbox values.
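As a worked check of this dimension (using the classic YOLOv1 setting S = 7, B = 2, C = 20 for illustration; these values are not fixed by the patent): the output is 7 * 7 * (2 * 5 + 20) = 7 * 7 * 30 = 1470 values per image.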
The meaning of each Bbox is as follows:
1. (x, y) is the offset of the bbox center relative to its grid cell. With normalized center coordinates $(x_c, y_c)$ and $(\mathrm{col}, \mathrm{row})$ the index of the cell containing the center, it is calculated as follows:

$$x = S \cdot x_c - \mathrm{col}, \qquad y = S \cdot y_c - \mathrm{row}$$
2. (w, h) is the predicted width and height of the bbox, expressed as a ratio of the bbox to the whole picture, and is calculated as follows:

$$w = \frac{w_{box}}{W_{img}}, \qquad h = \frac{h_{box}}{H_{img}}$$
3. confidence is composed of two parts: whether a target exists in the grid cell, and the accuracy of the bbox. The confidence is defined as:

$$\mathrm{confidence} = P_r(\mathrm{object}) \times \mathrm{IoU}_{pred}^{truth}$$

If there is an object in the grid cell, $P_r(\mathrm{object}) = 1$ and the confidence equals the IoU. If there is no object in the cell, $P_r(\mathrm{object}) = 0$ and the confidence is 0.
4. Conditional class probability for the C classes, defined as $P_r(\mathrm{class}_i \mid \mathrm{object})$: the probability that, given the cell contains an object, the object belongs to the i-th class. The class-specific probability each cell finally outputs is defined as:

$$P_r(\mathrm{class}_i \mid \mathrm{object}) \times P_r(\mathrm{object}) \times \mathrm{IoU}_{pred}^{truth} = P_r(\mathrm{class}_i) \times \mathrm{IoU}_{pred}^{truth}$$
Template matching is one of the methods for finding a specific target in an image. The principle is very simple: traverse every possible position in the image and compare each location with the template to see whether it is "similar"; when the similarity is high enough, the target is considered found.
As shown in fig. 5, the common matching algorithms are as follows:
1. Square difference matching method CV_TM_SQDIFF
2. Normalized square difference matching method CV _ TM _ SQDIFF _ NORMED
3. Correlation matching method CV _ TM _ CCORR
4. Normalized correlation matching method CV _ TM _ CCORR _ NORMED
5. Correlation coefficient matching method CV _ TM _ CCOEFF
6. Normalized correlation coefficient matching method CV _ TM _ CCOEFF _ NORMED
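As an illustrative OpenCV sketch of method 6 above (assumed example code, not from the patent; the file names are hypothetical placeholders):

```python
import cv2

# Hypothetical inputs: any grayscale frame and template crop will do.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Normalized correlation coefficient matching (CV_TM_CCOEFF_NORMED).
result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# For TM_CCOEFF_NORMED the best match is the maximum of the response map;
# max_val plays the role of the matching-rate extremum.
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print("matching-rate extremum:", max_val, "at box", top_left, bottom_right)
```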
Specifically, S1 includes: using the public image recognition data set ImageNet as a pre-training set; collecting samples of different power equipment under a normal environment to establish a training sample set, a test sample set and a verification sample set; labeling the power equipment in the sample sets and performing data enhancement, taking the enhanced power equipment region images as positive samples and other regions as negative samples; first training a pre-trained model with the pre-training set, and then performing transfer learning on the training samples based on the pre-trained model.
In S1, the test sample set is used to detect the correct recognition rate of the model; if the correct recognition rate of the model is below the recognition-rate threshold required by the actual scene, model optimization training continues until the required recognition rate is reached, finally yielding the deep learning model for power equipment recognition. Recognition-rate detection identifies the labeled test set with the model and tests accuracy by evaluating the area coincidence rate between the detection box and the labeled box.
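The area-coincidence test described here is in effect an intersection-over-union (IoU) check. A minimal sketch of such an evaluation follows, under the assumption that boxes are given as [x1, y1, x2, y2] lists and that detections and labels are paired in order (the box format and function names are illustrative, not from the patent):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def recognition_rate(detections, labels, iou_threshold=0.5):
    """Fraction of labeled boxes matched by their detection with IoU above threshold."""
    hits = sum(1 for d, l in zip(detections, labels) if iou(d, l) >= iou_threshold)
    return hits / float(len(labels))

# Example: one detection vs. one ground-truth label.
print(recognition_rate([[48, 52, 198, 210]], [[50, 50, 200, 200]]))
```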
Further, S2 includes the following steps: the camera captures a video frame sequence, and every 8 frames one frame is set as a key frame I; for a key frame I, its image is fed into the deep learning classification model, and the model outputs the coordinates of the target rectangular region of the power equipment, the target category, and the target category weight. Specifically, the first frame is set as an I frame; an I frame is an image sent into the depth model, from which the deep learning model obtains the accurate coordinate position of the target; the obtained coordinate position is used as a template, and subsequent images use template matching to locate the target and update its coordinates.
Further, S3 includes extracting the target area image obtained from key frame I in S2 as a template and extracting the gradient features of the template. The gradient feature is an edge feature, extracted here with the Sobel operator. Taking a grayscale image as an example, the theoretical basis is that wherever an edge occurs, the gray level of the image changes to some degree. For convenience, if a boundary is represented as a gradual transition from black to white, the gray level at the edge can be analyzed as a first-order function y = kx; taking its first derivative yields the slope k. In other words, the first derivative is a constant at an edge and zero at a non-edge, so the edges of an image can be preliminarily judged by taking the first derivatives in the X and Y directions, i.e., the gradient. In theory, this is how a computer obtains the edges of an image. Applied to an actual image, however, this derivative cannot be obtained analytically, since no exact gray-level function exists, and solving an analytic solution is far more cumbersome for a computer than solving a numerical one; the alternative is to approximate the derivative of the image with a 3 × 3 window. Taking the derivative in the X direction as an example, the approximate derivative at a point is the sum of the elements in the third row minus the sum of the elements in the first row. It is easy to see why this approximates a derivative: a derivative represents a rate of change, and subtracting the gray values of the first row from those of the third row measures exactly such a rate of change; this is the so-called Prewitt operator. The Y-direction derivative is computed similarly, except that the sum of the first-column elements is subtracted from the sum of the third-column elements. Once the derivatives in the X and Y directions are obtained, the gradient is obtained, and the edges in an image can be found.
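A minimal sketch of this gradient-feature extraction with OpenCV's Sobel operator (assumed illustration; the file name and threshold value are hypothetical):

```python
import cv2
import numpy as np

# Hypothetical template crop saved from a key frame I.
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# First derivatives in the X and Y directions with a 3x3 Sobel kernel.
gx = cv2.Sobel(template, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(template, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude; thresholding it yields the template's edge points.
magnitude = cv2.magnitude(gx, gy)
edges = (magnitude > 100).astype(np.uint8)        # threshold chosen for illustration
edge_points = np.column_stack(np.nonzero(edges))  # (row, col) edge coordinates
```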
Further, referring to fig. 3 to 4, S4 includes: every 8th frame of the video frame sequence is a key frame I and the rest are non-key frames B. A non-key frame B matches the target position in the current frame based on the target template coordinate position and template features of the previous frame; that is, in the new video frame, the target region of the previous frame's template is expanded to 1.5 times its original size, the gradient features of this region are extracted, template matching is performed against the gradient features extracted from the previous frame's template, and the matching-rate extremum is calculated. S5 includes S501: if the matching-rate extremum is greater than or equal to the threshold, matching succeeds, the target area coordinates are updated to the area coordinates with the highest matching rate, and the gradient features of the template are re-extracted. S5 also includes S502: if the matching rate is smaller than the threshold, template matching fails; the current frame is treated as a key frame I and fed into the deep neural network model for identification, the target area coordinates are updated to the target area coordinates in the identification result, and the gradient features of the template are re-extracted. Specifically, after the template edge points and the image edge points are extracted, the similarity (matching rate) is measured with the Hausdorff distance, a measure describing the degree of similarity between two point sets, i.e., a defined form of distance between two sets of points; preferably, the threshold is the empirical value 0.7.
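A sketch of this Hausdorff-distance similarity between two edge-point sets, using SciPy's directed Hausdorff distance (the mapping from distance to a [0, 1] matching rate and the scale constant are assumptions for illustration, not taken from the patent):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two (n, 2) point sets."""
    return max(directed_hausdorff(points_a, points_b)[0],
               directed_hausdorff(points_b, points_a)[0])

def matching_rate(template_edges, region_edges, scale=50.0):
    """Map the distance to a [0, 1] similarity score; 'scale' is a tuning assumption."""
    return 1.0 / (1.0 + hausdorff(template_edges, region_edges) / scale)

# Toy edge sets: identical point clouds give distance 0 and matching rate 1.
pts = np.random.default_rng(0).uniform(0, 100, size=(40, 2))
print(matching_rate(pts, pts))        # 1.0
print(matching_rate(pts, pts + 5.0))  # lower for a shifted point set
```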
Further, S6 includes judging whether the current frame is the last frame; if so, target tracking ends; otherwise, the updated template features are taken as the template features for the next frame and the process returns to S4. For real-time video power equipment identification, the last frame is reached when the video ends.
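Putting S2 to S6 together, the per-frame loop can be sketched as follows (a hedged outline under the assumptions above; detect_with_yolo, extract_gradient_features and match_template are hypothetical stand-ins for the model inference, the Sobel extraction and the Hausdorff matching described earlier):

```python
KEY_FRAME_INTERVAL = 8   # every 8th frame is a key frame I (per the embodiment)
MATCH_THRESHOLD = 0.7    # empirical threshold from the description

def track_power_equipment(frames, detect_with_yolo, extract_gradient_features,
                          match_template):
    """Yield (frame index, target box) following the S2-S6 loop; helpers are injected."""
    template_box, template_feat = None, None
    for i, frame in enumerate(frames):
        is_key = (i % KEY_FRAME_INTERVAL == 0) or template_box is None
        if not is_key:
            # S4: match inside the previous target area expanded to 1.5x its size.
            rate, box = match_template(frame, template_box, template_feat, expand=1.5)
            if rate >= MATCH_THRESHOLD:
                template_box = box            # S501: matching succeeded
            else:
                is_key = True                 # S502: fall back to the deep model
        if is_key:
            template_box = detect_with_yolo(frame)  # S2: key frame I
        template_feat = extract_gradient_features(frame, template_box)  # S3 / update
        yield i, template_box
```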
It is important to note that the construction and arrangement of the present application as shown in the various exemplary embodiments is illustrative only. Although only a few embodiments have been described in detail in this disclosure, those skilled in the art who review this disclosure will readily appreciate that many modifications are possible (e.g., variations in sizes, dimensions, structures, shapes and proportions of the various elements, values of parameters (e.g., temperatures, pressures, etc.), mounting arrangements, use of materials, colors, orientations, etc.) without materially departing from the novel teachings and advantages of the subject matter recited in this application. For example, elements shown as integrally formed may be constructed of multiple parts or elements, the position of elements may be reversed or otherwise varied, and the nature or number of discrete elements or positions may be altered or varied. Accordingly, all such modifications are intended to be included within the scope of this invention. The order or sequence of any process or method steps may be varied or re-sequenced according to alternative embodiments. In the claims, any means-plus-function clause is intended to cover the structures described herein as performing the recited function and not only structural equivalents but also equivalent structures. Other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of the exemplary embodiments without departing from the scope of the present inventions. Therefore, the present invention is not limited to a particular embodiment, but extends to various modifications that nevertheless fall within the scope of the appended claims.
Moreover, in an effort to provide a concise description of the exemplary embodiments, all features of an actual implementation may not be described (i.e., those unrelated to the presently contemplated best mode of carrying out the invention, or those unrelated to enabling the invention).
It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions may be made. Such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure, without undue experimentation.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (7)

1. A method for optimizing the recognition efficiency of power equipment based on template tracking, characterized by comprising the following steps:
S1: building a deep classification network model and training the deep classification network model;
S2: detecting and identifying video key frames by using the deep classification network model;
S3: extracting the key frame target identification area as a target template, and extracting template features;
S4: carrying out template matching within a search range twice the size of the target area in non-key frames, and calculating a matching rate;
S5: judging the relation between the matching rate and a threshold: if the extreme value of the matching rate is greater than or equal to the threshold, updating the template area coordinates and re-extracting the template features; if the matching rate is less than the threshold, skipping to the deep classification network model for identification;
S6: judging whether the power equipment identification is finished, and if so, exiting;
wherein S5 includes S501: if the extreme value of the matching rate is greater than or equal to the threshold, matching succeeds, the target area coordinates are updated to the area coordinates with the highest matching rate, and the gradient features of the template are re-extracted;
and S5 includes S502: if the matching rate is smaller than the threshold, template matching fails, the current frame is used as a key frame I and imported into the deep classification network model for identification, the target area coordinates are updated to the target area coordinates in the identification result, and the gradient features of the template are re-extracted.
2. The template tracking-based power equipment identification efficiency optimization method according to claim 1, characterized in that S1 includes: using the public image recognition data set ImageNet as a pre-training set; collecting samples of different power equipment under a normal environment to establish a training sample set, a test sample set and a verification sample set; labeling the power equipment in the sample sets and performing data enhancement, taking the enhanced power equipment region images as positive samples and other regions as negative samples; first training a pre-trained model with the pre-training set, and then performing transfer learning on the training samples based on the pre-trained model.
3. The template tracking-based power equipment identification efficiency optimization method according to claim 2, characterized in that in S1, the test sample set is used to detect the correct recognition rate of the model; if the correct recognition rate of the model is below the recognition-rate threshold required by the actual scene, model optimization training continues until the required recognition rate is reached, finally yielding the deep classification network model for power equipment recognition.
4. The template tracking-based power equipment identification efficiency optimization method according to claim 3, characterized in that S2 includes the following steps: the camera captures a video frame sequence, and every 8 frames one frame is set as a key frame I; for a key frame I, its image is fed into the deep classification network model, and the model outputs the coordinates of the target rectangular area of the power equipment, the target category, and the target category weight.
5. The template tracking-based power equipment identification efficiency optimization method according to claim 4, wherein the template tracking-based power equipment identification efficiency optimization method comprises the following steps: the step S3 includes extracting the target area image obtained from the key frame I in the step S2 as a template, and extracting gradient features of the template.
6. The template tracking-based power equipment identification efficiency optimization method according to claim 5, characterized in that S4 includes: every 8th frame of the video frame sequence is a key frame I and the rest are non-key frames B; a non-key frame B matches the target position in the current frame based on the target template coordinate position and template features of the previous frame, that is, in the new video frame, the target region of the previous frame's template is expanded to 1.5 times its original size, the gradient features of this region are extracted, template matching is performed against the gradient features extracted from the previous frame's template, and the matching-rate extremum is calculated.
7. The template tracking-based power equipment identification efficiency optimization method according to claim 6, characterized in that S6 includes judging whether the current frame is the last frame; if so, target tracking ends; otherwise, the updated template features are taken as the template features for the next frame and the process returns to S4.
CN202010248283.7A 2020-04-01 2020-04-01 Power equipment identification efficiency optimization method based on template tracking Active CN111461010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010248283.7A CN111461010B (en) 2020-04-01 2020-04-01 Power equipment identification efficiency optimization method based on template tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010248283.7A CN111461010B (en) 2020-04-01 2020-04-01 Power equipment identification efficiency optimization method based on template tracking

Publications (2)

Publication Number Publication Date
CN111461010A CN111461010A (en) 2020-07-28
CN111461010B (en) 2022-08-12

Family

ID=71680998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010248283.7A Active CN111461010B (en) 2020-04-01 2020-04-01 Power equipment identification efficiency optimization method based on template tracking

Country Status (1)

Country Link
CN (1) CN111461010B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763424B (en) * 2021-08-13 2024-03-29 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Real-time intelligent target detection method and system based on embedded platform
CN116563881A (en) * 2022-01-22 2023-08-08 北京眼神智能科技有限公司 Pedestrian action continuous detection and recognition method, device, storage medium and equipment
CN114579635B (en) * 2022-03-04 2022-11-04 北京三月雨文化传播有限责任公司 Big data information analysis processing system based on cloud computing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105744232A (en) * 2016-03-25 2016-07-06 南京第五十五所技术开发有限公司 Method for preventing power transmission line from being externally broken through video based on behaviour analysis technology
CN107609497A (en) * 2017-08-31 2018-01-19 武汉世纪金桥安全技术有限公司 The real-time video face identification method and system of view-based access control model tracking technique
CN110033473A (en) * 2019-04-15 2019-07-19 西安电子科技大学 Motion target tracking method based on template matching and depth sorting network
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857778B (en) * 2012-09-10 2015-01-21 海信集团有限公司 System and method for 3D (three-dimensional) video conversion and method and device for selecting key frame in 3D video conversion
US9786032B2 (en) * 2015-07-28 2017-10-10 Google Inc. System for parametric generation of custom scalable animated characters on the web
WO2018058595A1 (en) * 2016-09-30 2018-04-05 富士通株式会社 Target detection method and device, and computer system
US10431000B2 (en) * 2017-07-18 2019-10-01 Sony Corporation Robust mesh tracking and fusion by using part-based key frames and priori model
US20190130583A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Still and slow object tracking in a hybrid video analytics system
US10726264B2 (en) * 2018-06-25 2020-07-28 Microsoft Technology Licensing, Llc Object-based localization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105744232A (en) * 2016-03-25 2016-07-06 南京第五十五所技术开发有限公司 Method for preventing power transmission line from being externally broken through video based on behaviour analysis technology
CN107609497A (en) * 2017-08-31 2018-01-19 武汉世纪金桥安全技术有限公司 The real-time video face identification method and system of view-based access control model tracking technique
CN110033473A (en) * 2019-04-15 2019-07-19 西安电子科技大学 Motion target tracking method based on template matching and depth sorting network
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
An Adaptive Point Tracking Method Based on Depth Map for 2D-3D Video Conversion; Yangdong Liu et al.; International Symposium on Optoelectronic Technology and Application 2014: Image Processing and Pattern Recognition; 2014-11-24; pp. 1-9 *
Robust Classification of City Roadway Objects for Traffic Related Applications; Niveditha Bhandary et al.; IEEE SmartWorld Conference; 2017-08-31; pp. 1-6 *
Vehicle multi-target detection method based on the YOLOv2 algorithm under the Darknet framework; 李珣; Journal of Traffic and Transportation Engineering; 2018-12-31; Vol. 18, No. 6; pp. 142-158 *
Research on multi-level classification sign language recognition based on key frames; 姜华强 et al.; Application Research of Computers; 2010-02-15; Vol. 27, No. 2; pp. 491-493 *
Gesture recognition technology based on binocular stereo vision; 何云龙; China Masters' Theses Full-text Database, Information Science and Technology; 2014-06-15; No. 6; pp. I138-853 *
Image recognition method for the state of substation disconnectors based on target tracking; 张礼波 et al.; Machinery & Electronics; 2020-01-31; Vol. 38, No. 1; pp. 36-39 *

Also Published As

Publication number Publication date
CN111461010A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
AU2020100705A4 (en) A helmet detection method with lightweight backbone based on yolov3 network
CN111179251B (en) Defect detection system and method based on twin neural network and by utilizing template comparison
CN111461010B (en) Power equipment identification efficiency optimization method based on template tracking
CN108171184B (en) Method for re-identifying pedestrians based on Simese network
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN111814850A (en) Defect detection model training method, defect detection method and related device
CN108010025B (en) Switch and indicator lamp positioning and state identification method of screen cabinet based on RCNN
CN112307919B (en) Improved YOLOv 3-based digital information area identification method in document image
CN113222982A (en) Wafer surface defect detection method and system based on improved YOLO network
CN111275010A (en) Pedestrian re-identification method based on computer vision
CN111242899A (en) Image-based flaw detection method and computer-readable storage medium
Du et al. Convolutional neural network-based data anomaly detection considering class imbalance with limited data
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN117474918B (en) Abnormality detection method and device, electronic device, and storage medium
CN108520539B (en) Image target detection method based on sparse learning variable model
CN114219753A (en) Power equipment surface defect detection method based on deep learning and terminal
CN111832497B (en) Text detection post-processing method based on geometric features
CN113762151A (en) Fault data processing method and system and fault prediction method
CN111898570A (en) Method for recognizing text in image based on bidirectional feature pyramid network
CN116630301A (en) Strip steel surface small target defect detection method and system based on super resolution and YOLOv8
CN111145314A (en) Method for extracting place name symbol of scanning electronic map by combining place name labeling
CN115830302A (en) Multi-scale feature extraction and fusion power distribution network equipment positioning identification method
CN114596273B (en) Intelligent detection method for multiple defects of ceramic substrate by using YOLOV4 network
CN113139540B (en) Backboard detection method and equipment
Kotyza et al. Detection of directions in an image as a method for circle detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant