CN116823793A - Device defect detection method, device, electronic device and readable storage medium - Google Patents


Info

Publication number
CN116823793A
CN116823793A (application CN202310853041.4A)
Authority
CN
China
Prior art keywords
defect
category
sample image
target
defect detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310853041.4A
Other languages
Chinese (zh)
Inventor
蒋乐
刘洋
叶晓舟
欧阳晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yaxin Technology Co ltd
Original Assignee
Guangzhou Yaxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yaxin Technology Co ltd filed Critical Guangzhou Yaxin Technology Co ltd
Priority to CN202310853041.4A
Publication of CN116823793A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a device defect detection method and apparatus, an electronic device, and a readable storage medium, relating to the technical field of image target detection. The method comprises the following steps: inputting an image of the device to be detected into a first defect detection model to obtain a target defect parent category, and, if a second defect detection model corresponding to the target defect parent category exists, inputting the region-of-interest image corresponding to the target defect parent category into the second defect detection model to obtain a target defect sub-category. The first defect detection model is trained on a sample image set in which each sample image is labeled with a corresponding parent category label; the second defect detection model is trained on a sub-category sample image set corresponding to the target defect parent category, in which each sample image is labeled with a corresponding sub-category label. The scheme makes fuller use of sample image features, improves the target detection precision of the models, and effectively improves the accuracy of device defect detection.

Description

Device defect detection method, device, electronic device and readable storage medium
Technical Field
The present application relates to the field of image target detection technology, and in particular to a device defect detection method and apparatus, an electronic device, and a readable storage medium.
Background
Substation inspection is an effective measure for maintaining the normal, stable operation of a power supply system. By identifying abnormalities such as faulty power transformation equipment, irregular production behaviors, and abnormal equipment operating states, potential safety problems can be found in time and remedial measures taken, thereby improving equipment utilization, reducing maintenance and fault costs, prolonging equipment service life, and ensuring equipment safety and reliability.
At present, substation inspection is carried out mainly by manual inspection or robot inspection. Manual inspection checks whether equipment operates normally chiefly by looking, listening, and smelling; it is inefficient, consumes a large amount of labor, and, because it depends on the inspectors' experience, inevitably suffers from omissions and negligence, so false and missed detections easily occur and the accuracy of the results is affected. Robot inspection mainly applies an image recognition algorithm to detect equipment defects in the images the robot acquires. Although this improves inspection efficiency, the recognition targets (equipment) in a substation scene exhibit unbalanced category counts, a wide range of target sizes, and small differences between some categories, which makes image features difficult to extract and recognize, so the accuracy of the equipment defect detection results is low.
Under such circumstances, it is desirable to provide an apparatus defect detection scheme that improves the accuracy of the apparatus defect detection result.
Disclosure of Invention
The present application aims to solve at least one of the technical defects described above. The technical solutions provided by the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a method for detecting a device defect, including:
inputting an image of the device to be detected into a first defect detection model to obtain a target defect parent category; the first defect detection model is trained on a sample image set, and each sample image in the sample image set is labeled with a corresponding parent category label;
if a second defect detection model corresponding to the target defect parent category exists, inputting the region-of-interest image corresponding to the target defect parent category into the second defect detection model to obtain a target defect sub-category; the second defect detection model is trained on a sub-category sample image set corresponding to the target defect parent category, and each sample image in the sub-category sample image set is labeled with a corresponding sub-category label.
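The two inference steps above can be sketched as follows. All names and interfaces here are illustrative assumptions, including the assumption that the first model also returns the region-of-interest crop; this is not the patent's actual implementation.

```python
# Hedged sketch of the two-stage parent/child inference described above.

def detect_defect(image, first_model, second_models):
    """Return (parent_category, sub_category or None) for a device image."""
    # Stage 1: the first defect detection model yields the target defect
    # parent category and (assumed here) the region-of-interest crop.
    parent_category, roi = first_model(image)

    # Stage 2 runs only when a second defect detection model exists for
    # this parent category, i.e. the parent has fine-grained sub-categories.
    second_model = second_models.get(parent_category)
    if second_model is None:
        return parent_category, None
    return parent_category, second_model(roi)


# Toy stand-in models for illustration.
first = lambda img: ("meter_defect", img)             # parent + ROI crop
seconds = {"meter_defect": lambda roi: "dial_blur"}   # sub-category model

print(detect_defect("img", first, seconds))  # ('meter_defect', 'dial_blur')
print(detect_defect("img", lambda i: ("rust", i), seconds))  # ('rust', None)
```

When no second-stage model is registered for a parent category, the parent category alone is the final result, matching the conditional wording of the claim.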
In an optional embodiment of the present application, the sample image in the sub-category sample image set corresponding to the target defect parent category is a sample image in the sample image set labeled with the target parent category label;
The target parent category label is a defect category label corresponding to the target defect parent category.
In an alternative embodiment of the present application, the first defect detection model comprises: a backbone network, a neck network, and a detection head;
inputting an image of the device to be detected into the first defect detection model to obtain the target defect parent category comprises the following steps:
inputting the image of the device to be detected into the first defect detection model, extracting features of the image through the backbone network, and outputting a plurality of initial feature maps of different sizes;
performing resolution transformation on the plurality of initial feature maps of different sizes through the neck network, and outputting a plurality of target feature maps of different resolutions;
and obtaining a detection result corresponding to each target feature map through the detection head, and outputting the target defect parent category according to all the detection results.
In an alternative embodiment of the application, the neck network comprises: a feature pyramid network layer and a cross-resolution weighting layer;
performing resolution transformation on the plurality of initial feature maps of different sizes through the neck network and outputting a plurality of target feature maps of different resolutions specifically comprises:
inputting the plurality of initial feature maps of different sizes into the neck network, and fusing them through the feature pyramid network layer to obtain a plurality of pyramid feature maps;
and performing an adaptive average pooling operation on the pyramid feature maps through the cross-resolution weighting layer to obtain an average feature map, splitting the average feature map along the channel dimension to obtain a plurality of weight maps, and outputting a plurality of target feature maps of different resolutions according to the weight maps.
In an alternative embodiment of the application, the method further comprises:
performing an adaptive maximum pooling operation on the pyramid feature maps through the cross-resolution weighting layer to obtain a maximum feature map;
splitting the average feature map along the channel dimension to obtain a plurality of weight maps comprises:
adding the average feature map and the maximum feature map element-wise to obtain a fused feature map;
and splitting the fused feature map along the channel dimension to obtain the plurality of weight maps.
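A minimal numpy sketch of the cross-resolution weighting steps above: adaptively pool the pyramid feature maps (average and maximum), add the two pooled results element-wise into a fused map, and split the fused map along the channel dimension into one weight map per pyramid level. The 1x1 pooling target, channel layout, and tensor shapes are assumptions, not the patent's specification.

```python
import numpy as np

def global_pool(feat, mode="avg"):
    # Adaptive pooling with a 1x1 output: reduce over the spatial axes.
    reduce_fn = np.mean if mode == "avg" else np.max
    return reduce_fn(feat, axis=(1, 2), keepdims=True)      # (C, 1, 1)

def cross_resolution_weights(pyramid_feats):
    # Pool every pyramid level and stack the results along the channel axis.
    avg = np.concatenate([global_pool(f, "avg") for f in pyramid_feats])
    mx = np.concatenate([global_pool(f, "max") for f in pyramid_feats])
    fused = avg + mx                                        # element-wise add
    # Split the fused map back into one weight map per pyramid level.
    sizes = np.cumsum([f.shape[0] for f in pyramid_feats])[:-1]
    return np.split(fused, sizes)

# Three pyramid levels with different spatial resolutions (8 channels each).
feats = [np.ones((8, s, s)) for s in (32, 16, 8)]
weights = cross_resolution_weights(feats)
print([w.shape for w in weights])  # [(8, 1, 1), (8, 1, 1), (8, 1, 1)]
```

Because every level is pooled to the same 1x1 spatial size before fusion, the weight maps can later be broadcast back onto feature maps of any resolution, which is what allows one fused map to weight all pyramid levels.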
In an alternative embodiment of the application, the sample image set is acquired by:
acquiring a parent category label list corresponding to an initial sample image set, randomly sampling the parent category labels in the list with replacement a preset number of times to obtain a parent category label sequence, and determining the number of parent category labels of each type in the sequence; each sample image in the initial sample image set is labeled with a corresponding parent category label, and the parent category label list comprises the different types of parent category labels corresponding to the initial sample image set;
screening or amplifying the sample images corresponding to each type of parent category label in the initial sample image set according to the number of parent category labels of that type, to obtain the sample images corresponding to each type of parent category label;
and constructing the sample image set from the sample images corresponding to the parent category labels of all types in the parent category label sequence.
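The sample-set construction above can be sketched as follows. The function and variable names, the uniform random choice of labels, and the concrete screening/amplification policy (subsample without replacement when there are enough images, otherwise repeat images) are illustrative assumptions.

```python
import random
from collections import Counter

def build_balanced_sample_set(images_by_label, num_draws, seed=0):
    rng = random.Random(seed)
    labels = sorted(images_by_label)
    # Randomly sample parent category labels with replacement a preset
    # number of times to obtain the parent category label sequence.
    label_sequence = [rng.choice(labels) for _ in range(num_draws)]
    counts = Counter(label_sequence)  # number of labels of each type
    sample_set = []
    for label, need in counts.items():
        pool = images_by_label[label]
        if need <= len(pool):
            picked = rng.sample(pool, need)                        # screen
        else:
            picked = pool + rng.choices(pool, k=need - len(pool))  # amplify
        sample_set.extend((img, label) for img in picked)
    return sample_set

images = {"rust": ["r1", "r2"], "oil_leak": ["l1", "l2", "l3", "l4", "l5"]}
balanced = build_balanced_sample_set(images, num_draws=8)
print(len(balanced))  # 8
```

Sampling the labels rather than the images is what counteracts class imbalance: each parent category's expected share of the final set is fixed by the label sequence, not by how many raw images it happens to have.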
In an alternative embodiment of the application, the parent category label includes a ground-truth annotation box and a ground-truth parent category;
the first defect detection model is obtained by:
iteratively performing the following training operation on an initial neural network according to the sample image set until a preset training stop condition is met, to obtain the first defect detection model:
inputting each sample image in the sample image set into the initial neural network, and outputting a plurality of prediction results corresponding to each sample image, each prediction result comprising a predicted parent category and a corresponding prediction box;
for each sample image, screening out the prediction boxes whose predicted parent category is the same as the ground-truth parent category, to obtain a prediction box set;
for the prediction box set corresponding to each sample image, acquiring the positional relationship between each prediction box and the ground-truth annotation box, and screening positive sample candidate boxes from the prediction box set according to the positional relationships;
acquiring the training loss between each positive sample candidate box and the corresponding ground-truth annotation box, screening positive sample prediction boxes from the positive sample candidate boxes according to these training losses, and acquiring a total training loss from the training loss of each positive sample prediction box and the corresponding ground-truth annotation box;
and adjusting the network parameters of the initial neural network according to the total training loss.
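The label-assignment steps above can be sketched as follows. The center-inside-box positional criterion, the 1 − IoU stand-in for the training loss, and the top-k value are illustrative assumptions, not the patent's actual choices.

```python
def iou(a, b):
    # Intersection over union of boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_positive_boxes(predictions, gt_box, gt_category, k=2):
    # Step 1: keep prediction boxes whose predicted parent category matches
    # the ground-truth parent category.
    matched = [box for box, category in predictions if category == gt_category]
    # Step 2: positional screening -- here, the box center must fall inside
    # the ground-truth annotation box (an assumed criterion).
    def center_inside(box):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        return gt_box[0] <= cx <= gt_box[2] and gt_box[1] <= cy <= gt_box[3]
    candidates = [b for b in matched if center_inside(b)]
    # Step 3: keep the k candidates with the lowest training loss
    # (1 - IoU is used here as a stand-in loss).
    return sorted(candidates, key=lambda b: 1.0 - iou(b, gt_box))[:k]

gt = (10, 10, 50, 50)
preds = [((12, 12, 48, 48), "rust"), ((60, 60, 90, 90), "rust"),
         ((11, 11, 49, 49), "oil_leak"), ((20, 20, 40, 40), "rust")]
print(select_positive_boxes(preds, gt, "rust"))
```

Only the boxes surviving all three screens contribute to the total training loss, which is what makes this assignment adaptive compared with a single fixed-threshold rule.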
In a second aspect, an embodiment of the present application provides a device defect detection apparatus, including:
a first defect category detection module, configured to input an image of the device to be detected into a first defect detection model to obtain a target defect parent category; the first defect detection model is trained on a sample image set, and each sample image in the sample image set is labeled with a corresponding parent category label;
a second defect category detection module, configured to, after determining that a second defect detection model corresponding to the target defect parent category exists, input the region-of-interest image corresponding to the target defect parent category into the second defect detection model to obtain a target defect sub-category; the second defect detection model is trained on a sub-category sample image set corresponding to the target defect parent category, and each sample image in the sub-category sample image set is labeled with a corresponding sub-category label.
In an optional embodiment of the present application, the sample image in the sub-category sample image set corresponding to the target defect parent category is a sample image in the sample image set labeled with the target parent category label;
the target parent category label is a defect category label corresponding to the target defect parent category.
In an alternative embodiment of the present application, the first defect detection model comprises: a backbone network, a neck network, and a detection head;
the first defect category detection module is specifically configured to:
input the image of the device to be detected into the first defect detection model, extract features of the image through the backbone network, and output a plurality of initial feature maps of different sizes;
perform resolution transformation on the plurality of initial feature maps of different sizes through the neck network, and output a plurality of target feature maps of different resolutions;
and obtain a detection result corresponding to each target feature map through the detection head, and output the target defect parent category according to all the detection results.
In an alternative embodiment of the application, the neck network comprises: a feature pyramid network layer and a cross-resolution weighting layer;
the first defect category detection module is specifically configured to:
input the plurality of initial feature maps of different sizes into the neck network, and fuse them through the feature pyramid network layer to obtain a plurality of pyramid feature maps;
and perform an adaptive average pooling operation on the pyramid feature maps through the cross-resolution weighting layer to obtain an average feature map, split the average feature map along the channel dimension to obtain a plurality of weight maps, and output a plurality of target feature maps of different resolutions according to the weight maps.
In an alternative embodiment of the application, the first defect category detection module is further configured to:
perform an adaptive maximum pooling operation on the pyramid feature maps through the cross-resolution weighting layer to obtain a maximum feature map;
the first defect category detection module is specifically configured to:
add the average feature map and the maximum feature map element-wise to obtain a fused feature map;
and split the fused feature map along the channel dimension to obtain the plurality of weight maps.
In an alternative embodiment of the present application, the device defect detection apparatus further includes a sample image set acquisition module, configured to:
acquire a parent category label list corresponding to an initial sample image set, randomly sample the parent category labels in the list with replacement a preset number of times to obtain a parent category label sequence, and determine the number of parent category labels of each type in the sequence; each sample image in the initial sample image set is labeled with a corresponding parent category label, and the parent category label list comprises the different types of parent category labels corresponding to the initial sample image set;
screen or amplify the sample images corresponding to each type of parent category label in the initial sample image set according to the number of parent category labels of that type, to obtain the sample images corresponding to each type of parent category label;
and construct the sample image set from the sample images corresponding to the parent category labels of all types in the parent category label sequence.
In an alternative embodiment of the application, the parent category label includes a ground-truth annotation box and a ground-truth parent category;
the device defect detection apparatus further includes a defect detection model acquisition module, configured to:
iteratively perform the following training operation on an initial neural network according to the sample image set until a preset training stop condition is met, to obtain the first defect detection model:
input each sample image in the sample image set into the initial neural network, and output a plurality of prediction results corresponding to each sample image, each prediction result comprising a predicted parent category and a corresponding prediction box;
for each sample image, screen out the prediction boxes whose predicted parent category is the same as the ground-truth parent category, to obtain a prediction box set;
for the prediction box set corresponding to each sample image, acquire the positional relationship between each prediction box and the ground-truth annotation box, and screen positive sample candidate boxes from the prediction box set according to the positional relationships;
acquire the training loss between each positive sample candidate box and the corresponding ground-truth annotation box, screen positive sample prediction boxes from the positive sample candidate boxes according to these training losses, and acquire a total training loss from the training loss of each positive sample prediction box and the corresponding ground-truth annotation box;
and adjust the network parameters of the initial neural network according to the total training loss.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored on the memory, and the processor executes the computer program to implement the steps of the device defect detection method provided in any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the device defect detection method provided in any of the above embodiments.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
according to the scheme, the equipment defect category is divided into a parent category and a sub-category corresponding to the parent category, a first defect detection model and a second defect detection model are respectively built for the parent category and the sub-category, identification of the equipment defect parent category of the equipment image equipment to be detected is carried out according to the first defect detection model, and identification of the equipment defect sub-category is carried out according to the second defect detection model. Compared with a single model target identification method, the double-layer defect detection model nesting mode is adopted, equipment defect identification is split into two subtasks and different model processing is adopted, sample image characteristics can be utilized more fully, the target detection precision of the model is improved, and the accuracy of equipment defect detection is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of a patrol robot according to an embodiment of the present application;
FIG. 2 is a schematic scale diagram of an identification target according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a defect class of a meter according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a method for detecting defects of a device according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for detecting a device defect according to an embodiment of the present application;
FIG. 6 is a flowchart of a training method for an equipment defect detection model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an inverted residual module according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an hourglass module according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a first defect detection model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a cross-resolution weighting layer according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an apparatus for detecting defects in a device according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device for detecting a device defect according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it defines, e.g., "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The following description of the terminology and related art related to the application:
because the traditional manual inspection method needs to consume a great deal of labor cost, with the development of the artificial intelligence technology, the automatic robot inspection method based on the inspection robot gradually replaces the manual inspection method.
For example, fig. 1 is a schematic diagram of an inspection robot according to an embodiment of the present application, where the inspection robot shown in fig. 1 improves inspection efficiency, reduces manpower risk, and can timely monitor abnormal conditions, so as to provide support for safe and stable operation of a transformer substation through autonomous navigation and application of various sensors.
At present, for the detection of equipment defects in a substation, a traditional image feature recognition algorithm or a deep-learning-based target detection algorithm is generally applied to the inspection images acquired by the inspection robot to determine the defect categories.
Taking three related technologies as examples, their specific schemes and existing defects are detailed below:
related art 1: carrying out image acquisition on appearance defect conditions of the transformer by using tools such as a camera, a handheld terminal, a camera and the like through operation staff and engineers in the power station; according to the collected images of the transformer and the marked xml files, randomly dividing a training set and a testing set according to a certain proportion, and respectively using the training set and the testing set for training a model and verifying the accuracy of the model; training a Cascade RCNN (Cascade Region-based Convolutional Neural Networks) model by using the training set data; and deploying the model on a centralized control platform of the inspection robot, and acquiring an appearance image of the transformer equipment by a high-definition camera for inspecting the appearance state of the equipment in the transformer area.
Cascade RCNN belongs to a three-stage Cascade detection algorithm, is more suitable for the conditions of large target scale difference and small targets, and can greatly increase model parameters and increase model detection time consumption by cascading multi-stage detectors.
Related art 2: a standardized infrared image of the power equipment is acquired by a substation equipment detection device; an infrared image sample library of the power equipment is established, from which a training set, a verification set, and a test set are extracted; a FASTER-RCNN (Faster Region-based Convolutional Neural Networks) deep target detection neural network is established and trained with the training set of the sample library, and the degree of overfitting of the model is verified with the verification set; and the trained network model performs multi-target recognition and localization on the infrared images in the test set and generates the recognition results.
The FASTER-RCNN target detection network is a two-stage target detection algorithm. Classification accuracy can be effectively improved by detecting before classifying, but the network has an obvious disadvantage in detection speed; meanwhile, in the training stage the FASTER-RCNN algorithm divides positive and negative samples using a manually set IoU (Intersection over Union) threshold, which makes label assignment inflexible, so the accuracy and robustness of target detection are poor.
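The fixed-threshold label assignment criticized above can be illustrated in a few lines; the threshold value and the example boxes are only illustrative, not taken from any of the cited schemes.

```python
def iou(a, b):
    # Intersection over union of boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def assign_labels(anchors, gt_box, threshold=0.5):
    # Every anchor at or above the hand-set IoU threshold becomes a positive
    # sample and the rest negative, regardless of how the IoUs are distributed.
    return ["pos" if iou(a, gt_box) >= threshold else "neg" for a in anchors]

gt = (0, 0, 10, 10)
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
print(assign_labels(anchors, gt))  # ['pos', 'pos', 'neg']
```

The inflexibility is visible here: the division depends entirely on the hand-set threshold, so an anchor at IoU 0.49 is treated the same as one at IoU 0, which is what adaptive, loss-based assignment schemes aim to avoid.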
Related art 3: substation equipment with oil leakage defects is photographed to construct an in-substation equipment oil leakage image data set, and data enhancement is applied to the data set to increase sample richness; model training is carried out based on a MobileNet-SSD (MobileNet Single Shot MultiBox Detector) judging mechanism and the in-substation equipment oil leakage image data set; and the image of the substation equipment to be detected is input into the trained MobileNet-SSD model to diagnose and identify the oil leakage fault.
Using the MobileNet-SSD target detection algorithm can reduce the number of model parameters and improve detection speed, but because the SSD network does not adopt multi-scale feature fusion (such as an FPN (Feature Pyramid Network)), small targets have insufficient semantic information in the high-level features; the method therefore detects small targets poorly and has difficulty adapting to multi-scale targets.
The following difficulties exist in the equipment defect detection task in the substation inspection scene:
(1) Substation inspection is an edge computing scenario with high requirements on computing capacity, latency, and stability, so the training and inference efficiency of the algorithm must be considered.
(2) Fig. 2 is a schematic scale diagram of an identification target provided by an embodiment of the present application. As shown in fig. 2, the scale of the recognition targets in substation inspection varies over a large range, with many large-scale (fig. 2 (a)) and small-scale (fig. 2 (b)) targets, so the algorithm needs a large receptive field and multi-scale detection capability.
(3) The training data exhibits severe class imbalance, so some classes are recognized poorly.
(4) Fig. 3 is a schematic diagram of the defect classes of a meter according to an embodiment of the present application. As shown in fig. 3, the meter defect class in the substation may be further subdivided into three categories: abnormal meter reading (fig. 3 (a)), dial blur (fig. 3 (b)), and dial breakage (fig. 3 (c)). Because fine-grained categories exist under some equipment defect categories and the differences between these categories are small, it is difficult for a single model to judge accurately which specific fine-grained category a recognition target belongs to; confusion and misclassification easily arise, and the defect category recognition effect is poor.
In summary, the existing robot inspection methods of the related art have the following problems. On the one hand, the computing power and storage space of the computing platform carried by the inspection robot are limited, while current inspection image recognition algorithm models are complex, with many parameters and a large amount of computation, so it is difficult to meet the edge computing power requirement. On the other hand, the recognition targets (equipment defects) in a substation scene have unbalanced category counts, large target scale variation, and subtle differences between some categories, and current inspection image recognition algorithms cope with these characteristics poorly, so recognition performance is poor.
In view of at least one of the above-mentioned technical problems or the need for improvement in the related art, the present application proposes a device defect detection method.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
Fig. 4 is a schematic flow chart of a device defect detection method provided in an embodiment of the present application. The method may be executed by a terminal (such as an inspection robot, or a desktop computer, notebook computer, or tablet computer that executes the computing task) or by a server (a physical server, or a cloud server providing cloud computing services). As shown in fig. 4, an embodiment of the present application provides a device defect detection method, including:
Step S401: inputting an image of the device to be detected into a first defect detection model to obtain a target defect parent category; the first defect detection model is trained on a sample image set, and each sample image in the sample image set is labeled with a corresponding parent category label.
Specifically, fine-grained categories exist under some equipment defect categories and the differences between them are small; in the prior art, a single model has difficulty accurately judging which specific fine-grained category a recognition target belongs to, confusion and misclassification arise easily, and the defect category recognition effect is poor.
In this embodiment, the equipment defect categories are divided into defect parent categories and the defect sub-categories corresponding to each parent category (i.e. the fine-grained categories under that parent category). Different defect detection models are used to detect the defect parent category and its corresponding defect sub-categories, and the fine-grained category recognition problem is solved through two-stage category detection, which is better suited to detecting defect targets with small differences between categories.
The defect parent category of the equipment is detected by a trained first defect detection model. The first defect detection model is obtained by training on a sample image set, and each sample image in the set is labeled with a corresponding annotation frame and parent category label.
Before the equipment defect type detection is performed, an equipment image to be detected needs to be determined, and it can be understood that the equipment image to be detected can be acquired by a patrol robot or a monitoring camera corresponding to the equipment to be detected, and the equipment image to be detected comprises at least one equipment to be detected.
Fig. 5 is a flowchart of an apparatus defect detection method according to an embodiment of the present application, where, as shown in fig. 5, a determined image of an apparatus to be detected is input into a first defect detection model, and a target defect parent class of the apparatus to be detected in the image of the apparatus to be detected is output through the first defect detection model. It can be understood that if the image of the device to be detected includes more than one device to be detected, the first defect detection model outputs the target defect parent class corresponding to each device to be detected.
Step S402, if it is determined that there is a second defect detection model corresponding to the target defect parent class, inputting the region-of-interest image corresponding to the target defect parent class into the second defect detection model to obtain a target defect sub-class; the second defect detection model is obtained through training according to a sub-category sample image set corresponding to the target defect father category, and each sample image in the sub-category sample image set is marked with a corresponding sub-category label.
Specifically, referring again to fig. 5, not every defect parent category has corresponding defect sub-categories requiring secondary detection. Therefore, after the target defect parent category is determined, it is further necessary to determine whether a second defect detection model corresponding to the target parent category exists.
If no second defect detection model corresponding to the target parent category exists, no fine-grained categories exist under the target parent category, and the target parent category is taken as the defect category of the equipment to be detected.
If a second defect detection model corresponding to the target defect parent category exists, the image of the region of interest (Region of Interest, ROI) corresponding to the target defect parent category is cropped from the image of the equipment to be detected, according to the position of the prediction frame output by the first defect detection model together with the target defect parent category. For example, the prediction frame in the image of the equipment to be detected may be cropped automatically by a python script to obtain the region-of-interest image.
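A minimal sketch of this cropping step is shown below; the box format (x1, y1, x2, y2) in pixel coordinates and the clamping behavior are illustrative assumptions, since the application only states that a python script crops the prediction frame.

```python
import numpy as np

def crop_roi(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the region of interest given a prediction box (x1, y1, x2, y2).

    Coordinates are clamped to the image bounds before slicing."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1, x2 = max(0, int(x1)), min(w, int(x2))
    y1, y2 = max(0, int(y1)), min(h, int(y2))
    return image[y1:y2, x1:x2]

# Example: crop a 100x80 region from a 640x640 3-channel image.
image = np.zeros((640, 640, 3), dtype=np.uint8)
roi = crop_roi(image, (50, 120, 130, 220))
print(roi.shape)  # (100, 80, 3)
```

In practice the box would come from the first model's prediction output rather than being hard-coded.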
Inputting the region-of-interest image into a trained second defect detection model, classifying the defect categories by the second defect detection model, outputting target defect sub-categories of the equipment to be detected, and taking the target defect sub-categories as the defect categories of the equipment to be detected.
The second defect detection model is obtained through training according to a sub-category sample image set corresponding to the target defect father category, and each sample image in the sub-category sample image set is marked with a corresponding sub-category label. In this embodiment, the specific number of the second defect detection models is the same as the number of defect parent categories having defect child categories.
It can be understood that if the image of the equipment to be detected includes more than one device to be detected, then after the first defect detection model outputs the target defect parent category for each device, the region-of-interest image corresponding to each target defect parent category that has a corresponding second defect detection model is obtained, and each region-of-interest image is input into its corresponding second defect detection model to obtain the target defect sub-category.
When training the first defect detection model and the second defect detection model, the sub-category sample image set used for training the second defect detection model may be a subset of the sample image set used for training the first defect detection model. In this way the sample image features can be fully utilized, and a good detection effect can be obtained even when the amount of sample image data is small.

Alternatively, a separate sub-category sample image set may be constructed independently for training the second defect detection model. When the number of sample images is sufficient, this approach can improve the generalization capability of the model and enhance its robustness.

According to the technical solution provided by this embodiment, the equipment defect categories are divided into parent categories and the sub-categories corresponding to each parent category; a first defect detection model and a second defect detection model are constructed for the parent categories and sub-categories respectively, the parent category of the equipment defect in the image to be detected is recognized by the first defect detection model, and the sub-category is recognized by the second defect detection model. Compared with a single-model target recognition method, this nested two-layer defect detection scheme splits equipment defect recognition into two subtasks handled by different models, makes fuller use of the sample image features, improves the target detection precision of the models, and effectively improves the accuracy of equipment defect detection.
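The two-stage decision flow described above can be sketched as follows; the function names and model interfaces are hypothetical, and the toy category names merely echo the meter example.

```python
# Hypothetical two-stage dispatch; function names and interfaces are assumed.
def detect_defect(image, first_model, second_models):
    """Return the final defect category for one device image.

    first_model(image) -> (parent_category, prediction_box); second_models
    maps a parent category to its sub-category classifier, and only parent
    categories that have fine-grained sub-categories appear as keys."""
    parent, box = first_model(image)
    sub_model = second_models.get(parent)
    if sub_model is None:       # no fine-grained categories under this parent
        return parent
    roi = crop(image, box)      # region-of-interest image for the second stage
    return sub_model(roi)

def crop(image, box):           # placeholder; real code slices out the box
    return image

# Toy usage: the "meter" parent has sub-categories, "oil leak" does not.
first = lambda img: ("meter", (0, 0, 10, 10))
seconds = {"meter": lambda roi: "dial blur"}
print(detect_defect("img", first, seconds))  # dial blur
```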
In an optional embodiment of the present application, the sample image in the sub-category sample image set corresponding to the target defect parent category is a sample image in the sample image set labeled with the target parent category label;
the target parent category label is a defect category label corresponding to the target defect parent category.
Specifically, fig. 6 is a flowchart of a training method for an equipment defect detection model, which is provided by the embodiment of the present application, as shown in fig. 6, images of abnormal power transformation equipment, abnormal production behavior, abnormal equipment operation state and the like can be obtained as sample images in various manners including mobile phone photographing, inspection robot acquisition and monitoring camera acquisition.
After the sample image is acquired, the device coordinates (real labeling frame) and defect type labels (parent type labels and sub-type labels) in the sample image are labeled, and the defect type labels can reflect the real defect types of the devices in the sample image. For example, a parent class label of a certain sample image is "meter", and a child class label is "dial blur".
According to the relation between each defect parent category and its corresponding defect sub-categories, the sample images are classified: the defect sub-categories corresponding to each defect parent category are grouped into one cluster, yielding one sub-category sample image set per cluster. The number of sub-category sample image sets equals the number of defect parent categories that have defect sub-categories.
Furthermore, the images in the sub-category sample image set may be cropped: the real annotation frame in each sample image is cropped automatically by a python script to obtain a region-of-interest sub-sample image; the region-of-interest sub-sample image and its corresponding label are stored in a folder named after the defect parent category, and the preprocessed sub-category image set is thus constructed.
In this embodiment, the first defect detection model is trained from a sample image set constructed from sample images labeled with both the parent category label and the child category label. And inputting the image of the equipment to be detected into a first defect detection model to obtain the target defect father category.
After the target defect father category is obtained, determining a target father category label according to the defect category label corresponding to the target defect father category. Selecting sample images marked with target father category labels in the sample image set to form a sub-category sample image set corresponding to the target defect father category, wherein each sample image in the sub-category sample image set is also marked with a sub-category label. And training a second defect detection model according to the sub-category sample image set corresponding to the target defect parent category.
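Selecting the sub-category sample image set as a subset of the first model's training set can be sketched as below; the record fields ("file", "parent", "sub") are illustrative assumptions about how the labels might be stored.

```python
# Hypothetical sample records; the field names are illustrative assumptions.
samples = [
    {"file": "img_001.jpg", "parent": "meter", "sub": "dial blur"},
    {"file": "img_002.jpg", "parent": "meter", "sub": "dial breakage"},
    {"file": "img_003.jpg", "parent": "oil leak", "sub": None},
]

def sub_category_set(sample_set, target_parent):
    """Select the samples labeled with the target parent category label;
    the result is a subset of the first model's training set and reuses
    its sub-category labels for training the second model."""
    return [s for s in sample_set if s["parent"] == target_parent]

meter_set = sub_category_set(samples, "meter")
print([s["sub"] for s in meter_set])  # ['dial blur', 'dial breakage']
```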
It can be appreciated that in the above embodiment, when the first defect detection model is trained and the second defect detection model is trained, the training sample image is labeled with the parent category label and the sub-category label at the same time.
In another embodiment of the present application, the first defect detection model is trained on a sample image set constructed from sample images labeled with defect category labels (labeled with parent category labels only, or with both parent category labels and sub-category labels).
And training a plurality of second defect detection models according to each sub-sample image set marked with the defect type label (marked with the sub-type label or marked with the parent type label and the sub-type label) to obtain a second defect detection model set.
After the target defect parent class is acquired, a second defect detection model corresponding to the target defect parent class is acquired in the second defect detection model set.
According to the technical solution provided by this embodiment, the sample images labeled with the target parent category label are screened out of the sample image set used for training the first defect detection model, and from them the sub-category sample image set corresponding to the target defect parent category, used for training the second defect detection model, is constructed. This sub-category sample image set is a subset of the sample image set. Building the training set in this way makes full use of the sample image features, achieves a good detection effect even when the amount of training data is small, effectively alleviates the problem that a single model trained on insufficient samples cannot distinguish between fine-grained categories, improves the target detection precision of the model, and effectively improves the accuracy of equipment defect detection.
In an alternative embodiment of the present application, the first defect detection model comprises: backbone network, neck network and detection end;
inputting an image of equipment to be detected into a first defect detection model to obtain a target defect father category, which comprises the following steps:
inputting the image of the equipment to be detected into a first defect detection model, extracting the characteristics of the image of the equipment to be detected through a backbone network, and outputting a plurality of initial characteristic diagrams with different sizes;
performing resolution transformation on a plurality of initial feature images with different sizes through a neck network, and outputting a plurality of target feature images with different resolutions;
and obtaining a detection result corresponding to each target feature map through the detection end, and outputting the target defect father category according to all the detection results.
Specifically, the first defect detection model may be divided into: a Backbone Network, a Neck Network, and a Detection Head (detection end).
Inputting the image of the equipment to be detected into a first defect detection model, extracting the characteristics of the image of the equipment to be detected through a backbone network, and outputting a plurality of initial characteristic diagrams with different sizes.
The initial feature maps of different sizes can capture target information at different scales: lower-level feature maps respond better to small-scale targets, while higher-level feature maps perceive large-scale targets better. Comprehensively utilizing multiple feature maps improves the model's ability to detect targets of different scales.
The plurality of initial feature maps of different sizes is input into the neck network, which performs resolution transformation on them to realize multi-scale feature communication and outputs a plurality of target feature maps of different resolutions. This lets the target feature maps focus on the more important positions, improves the expressive power of the feature maps, and enables the model to better capture the detail features and local features of the target.
The plurality of target feature maps of different resolutions is input into the detection end, which obtains a detection result for each target feature map, selects the most representative detection result among them, and outputs the target defect parent category accordingly.
For example, the detection end may find a plurality of overlapping candidate frames in the image of the equipment to be detected. To eliminate redundant detection results, a non-maximum suppression (Non-Maximum Suppression, NMS) algorithm is adopted: the most representative target frame is screened out according to the predicted class probability scores and the overlap between frames, and the parent category label corresponding to that target frame is output; the defect parent category corresponding to this label is the target defect parent category.
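The frame-screening step can be illustrated with a generic greedy NMS; this is the standard formulation and not necessarily the exact variant used in the application.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    beyond the threshold, repeat. Returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping frames and one separate frame.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```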
According to the technical solution provided by this embodiment, multi-scale feature extraction yields a plurality of initial feature maps of different sizes; resolution transformation of these feature maps yields a plurality of target feature maps of different resolutions, which are then detected to output the target defect parent category. When detecting the defect parent category, extracting and transforming the feature maps along both scale and resolution enables cross-level information fusion, enhances the expressive power of the feature maps, better captures the detail and local information of targets, and provides more comprehensive understanding across scales. The first defect detection model can thus better adapt to targets of different scales, effectively detecting and localizing both large-scale and small-scale targets, which further improves its detection performance and accuracy.
In an alternative embodiment of the application, the backbone network comprises: n feature extraction layers; n is an integer of 2 or more;
each feature extraction layer comprises at least one inverted residual error module; the nth feature extraction layer also comprises at least one hourglass module;
Specifically, taking n as 3 as an example, the backbone network includes: a first feature extraction layer, a second feature extraction layer, and a third feature extraction layer; extracting features of an image of a device to be detected through a backbone network, and outputting a plurality of initial feature images with different sizes, wherein the method specifically comprises the following steps:
inputting the image of the equipment to be detected into a backbone network, extracting the characteristics of the image of the equipment to be detected through a first characteristic extraction layer, and outputting a first initial characteristic diagram;
performing feature extraction on the first initial feature map through a second feature extraction layer, and outputting a second initial feature map;
and carrying out feature extraction on the second initial feature map through a third feature extraction layer, and outputting a third initial feature map.
In this embodiment, the first defect detection network extracts the image features of the image of the equipment to be detected through the inverted residual modules and the hourglass module in the backbone network.
The inverted residual module (Inverted Residual Block) originates from the MobileNet-V2 network and maintains good accuracy while being highly efficient. It is built from point-wise convolution (Pointwise Convolution) and depthwise separable convolution (Depthwise Separable Convolution).
Fig. 7 is a schematic structural diagram of an inverted residual module according to an embodiment of the present application. As shown in fig. 7, the inverted residual module first expands the feature dimension with a 1×1 point-wise convolution (the upper Conv 1×1 in fig. 7), then extracts spatial features with a 3×3 depthwise convolution (Dwise 3×3 in fig. 7), and finally reduces the number of channels with a 1×1 point-wise convolution acting as a bottleneck (the lower Conv 1×1 in fig. 7), reducing the dimension of the feature map. The residual operation is performed in the low dimension: the reduced feature is added to the input feature to form the residual connection.
Fig. 8 is a schematic structural diagram of an hourglass module according to an embodiment of the present application. As shown in fig. 8, the structure of the hourglass module resembles a symmetric hourglass: spatial features are first extracted with a 3×3 depthwise convolution (Dwise 3×3 in fig. 8); the number of channels is then reduced with a 1×1 point-wise convolution acting as a bottleneck (Conv 1×1 in fig. 8), lowering the feature dimension; the feature dimension is then expanded back with a 1×1 point-wise convolution (Conv 1×1 in fig. 8); finally, spatial features are extracted again with a 3×3 depthwise convolution (Dwise 3×3 in fig. 8), and the extracted feature is added to the input feature to form the residual connection.
When features are extracted only with inverted residual modules (as in the MobileNet-V2 network), the features must first be reduced to a low dimension, which may not retain enough useful information and thus limits the expressive power of the features. Consequently, feature extraction with inverted residual modules leads to very high feature dimensions in the deep layers of the network, greatly increases the number of model parameters, and risks information loss and gradient confusion.
To solve the above problem, this embodiment introduces the hourglass module during feature extraction. In the backbone network, the first feature extraction layer consists of at least one inverted residual module, the second feature extraction layer consists of at least one inverted residual module, and the third feature extraction layer consists of at least one inverted residual module and at least one hourglass module. The first and second feature extraction layers extract the first and second initial feature maps respectively, and the third feature extraction layer extracts the third initial feature map.
Replacing part of the deep inverted residual modules with hourglass modules effectively reduces the number of model parameters, better promotes gradient propagation, reduces the risk of gradient confusion, and improves inference efficiency during target recognition, better meeting the requirements of edge computing. It can be understood that the number and positions of the inverted residual modules and hourglass modules may be determined according to actual requirements during feature extraction.
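The claimed parameter saving can be checked with a rough weight count (ignoring biases and BatchNorm terms; the expansion factor t and the bottleneck reduction factor r are illustrative assumptions, not values given in the application):

```python
# Approximate weight counts, ignoring biases and BatchNorm. The expansion
# factor t and reduction factor r below are illustrative assumptions.

def inverted_residual_params(c, t=6):
    """1x1 expand (c -> t*c), 3x3 depthwise on t*c, 1x1 project (t*c -> c)."""
    return c * (t * c) + 9 * (t * c) + (t * c) * c

def hourglass_params(c, r=6):
    """3x3 depthwise, 1x1 reduce (c -> c//r), 1x1 expand back, 3x3 depthwise."""
    b = c // r
    return 9 * c + c * b + b * c + 9 * c

c = 320  # deep-layer channel count in the example backbone
print(inverted_residual_params(c), hourglass_params(c))  # 1246080 39680
```

At the deep-layer channel count of 320 from the example backbone, the hourglass module needs well under a tenth of the weights of an inverted residual module, because its two 1×1 convolutions mix a reduced channel count instead of an expanded one.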
Fig. 9 is a schematic structural diagram of a first defect detection model according to an embodiment of the present application. As shown in fig. 9, the structure of the backbone network of this embodiment is described with a concrete example of the first defect detection model: the backbone network is based on the MobileNet-V2 network, with the third-from-last inverted residual module of the MobileNet-V2 backbone replaced by an hourglass module.
In this embodiment, the input of the backbone network of the first defect detection model is the image P0 of the equipment to be detected, with size 640×640×3. Feature extraction on P0 through 6 inverted residual modules yields the first initial feature map P1, with size 80×80×32. Feature extraction on P1 through 7 inverted residual modules yields the second initial feature map P2, with size 40×40×96. Feature extraction on P2, sequentially through one inverted residual module, one hourglass module and two further inverted residual modules, yields the third initial feature map P3, with size 20×20×320.
Among the three initial feature maps, the spatial size of P1 is one eighth of P0, that of P2 is one sixteenth of P0, and that of P3 is one thirty-second of P0.
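These ratios correspond to backbone strides of 8, 16 and 32 relative to the 640×640 input, which can be checked directly:

```python
# The spatial sizes of P1, P2, P3 follow from backbone strides of 8, 16, 32.
input_size = 640
strides = {"P1": 8, "P2": 16, "P3": 32}
sizes = {name: input_size // s for name, s in strides.items()}
print(sizes)  # {'P1': 80, 'P2': 40, 'P3': 20}
```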
In an alternative embodiment of the application, the neck network comprises: a feature pyramid network layer and a cross resolution weighting layer;
the method comprises the steps of carrying out resolution transformation on a plurality of initial feature images with different sizes through a neck network, and outputting a plurality of target feature images with different resolutions, and specifically comprises the following steps:
Inputting a plurality of initial feature images with different sizes into a neck network, and fusing the plurality of initial feature images with different sizes through a feature pyramid network layer to obtain a plurality of pyramid feature images;
and performing an adaptive average pooling operation on the pyramid feature maps through the cross-resolution weighting layer to obtain an average feature map, splitting the average feature map in the channel dimension to obtain a plurality of weight maps, and outputting a plurality of target feature maps of different resolutions according to the weight maps.
Specifically, the neck network of the first defect detection model includes: a feature pyramid network (Feature Pyramid Network, FPN) layer and a cross-resolution weighting (Cross-Resolution Weight Computation) layer.
After a plurality of initial feature images with different sizes are sequentially obtained, inputting the initial feature images with different sizes into a neck network, and fusing the initial feature images with different sizes through a feature pyramid network layer to obtain a plurality of pyramid feature images. It can be appreciated that the number of pyramid feature maps matches the number of initial feature maps, which can be determined according to the actual requirements during feature extraction.
Referring again to fig. 9, a manner in which the feature pyramid network layer merges the initial feature map will be described, taking the neck network structure of the first defect detection model shown in fig. 9 as an example. The neck network of the first defect detection model is based on a YOLO v5s network, and a cross resolution weighting layer is added in the neck network of the YOLO v5s network.
After the three initial feature maps P1, P2 and P3 of different sizes are acquired in sequence, they are input into the neck network and fused by the feature pyramid network layer: P3 is taken as pyramid feature map P3'; P3' is fused with P2 to obtain pyramid feature map P2'; and P2' is fused with P1 to obtain pyramid feature map P1'.
After the plurality of pyramid feature maps is obtained, they are input into the cross-resolution weighting layer. The cross-resolution weighting layer takes the s pyramid feature maps of different sizes X_1, X_2, ..., X_s in parallel and computes s weight matrices W_1, W_2, ..., W_s through a lightweight mapping function H_s(·), as in the following formula:

(W_1, W_2, ..., W_s) = H_s(X_1, X_2, ..., X_s)

where X_1 corresponds to the pyramid feature map with the largest resolution and X_s denotes the feature map with the s-th largest resolution. The mapping function H_s(·) is implemented as follows:
For the input pyramid feature maps {X_1, X_2, ..., X_(s-1)}, an adaptive average pooling operation (Adaptive Average Pooling, AAP: partition the image into a grid of fixed size and take the average of all pixels within each grid cell) is performed as follows:

X'_1 = AAP(X_1), X'_2 = AAP(X_2), ..., X'_(s-1) = AAP(X_(s-1))

The input pyramid feature maps of different sizes are average-pooled to a given size W_s × H_s. Then {X'_1, X'_2, ..., X'_(s-1)} and X_s are concatenated (Concat) in the channel direction, and 1×1 convolution, ReLU (Rectified Linear Unit) and 1×1 convolution operations are performed to generate the average feature map F_avg.

A sigmoid operation is performed on the average feature map F_avg, which is then split in the channel dimension to generate s weight maps of different resolutions: W'_1, W'_2, ..., W'_s. Each weight map corresponds to one input pyramid feature map.
For each pyramid feature map, the corresponding weight map is upsampled to the same size as the pyramid feature map, and the pyramid feature map is multiplied element-wise by the upsampled weight map to obtain the corresponding target feature map. A plurality of target feature maps of different resolutions is thus obtained from the plurality of pyramid feature maps and output.
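A numpy sketch of the cross-resolution weighting computation, under the simplifying assumptions that every pyramid level has the same channel count, spatial sizes divide evenly, the learned 1×1-ReLU-1×1 mapping uses random weights in place of trained ones, and nearest-neighbor repetition stands in for upsampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_avg_pool(x, out_h, out_w):
    """Average-pool a (C, H, W) map to (C, out_h, out_w); assumes H and W
    are integer multiples of the target size, as with stride-8/16/32 maps."""
    c, h, w = x.shape
    return x.reshape(c, out_h, h // out_h, out_w, w // out_w).mean(axis=(2, 4))

def conv1x1(x, weight):
    """A 1x1 convolution is a channel-mixing matmul: weight is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', weight, x)

def cross_resolution_weights(feats):
    """Pool all levels to the smallest map's size, concatenate on channels,
    apply 1x1 conv - ReLU - 1x1 conv - sigmoid, split into one weight map
    per level, upsample each and reweight the corresponding pyramid map."""
    c = feats[0].shape[0]
    hs, ws = feats[-1].shape[1:]
    pooled = [adaptive_avg_pool(f, hs, ws) for f in feats[:-1]] + [feats[-1]]
    stacked = np.concatenate(pooled, axis=0)
    hidden = np.maximum(conv1x1(stacked, rng.normal(size=(c, len(feats) * c))), 0)
    f_avg = conv1x1(hidden, rng.normal(size=(len(feats) * c, c)))
    weights = 1 / (1 + np.exp(-f_avg))                 # sigmoid -> (0, 1)
    out = []
    for f, w in zip(feats, np.split(weights, len(feats), axis=0)):
        scale = f.shape[1] // hs                       # nearest-neighbor upsample
        w_up = w.repeat(scale, axis=1).repeat(scale, axis=2)
        out.append(f * w_up)
    return out

feats = [rng.normal(size=(8, 32, 32)),
         rng.normal(size=(8, 16, 16)),
         rng.normal(size=(8, 8, 8))]
outs = cross_resolution_weights(feats)
print([o.shape for o in outs])  # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Each output keeps its level's original resolution; the sigmoid bounds every weight in (0, 1), so reweighting attenuates rather than amplifies activations.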
Referring to fig. 9 again, after a plurality of target feature maps with different resolutions are obtained, a detection end obtains a detection result (i.e., prediction) corresponding to each target feature map.
According to the technical solution provided by this embodiment, feature fusion of the plurality of initial feature maps by the feature pyramid layer captures the expression and context information of targets at different scales and fuses shallow and deep features into richer semantic information, so targets can be detected at different scales and the accuracy of the first defect detection model in target detection is improved. Resolution transformation of the pyramid feature maps by the cross-resolution weighting layer realizes cross-resolution and cross-channel information exchange among the pyramid feature maps, better balances the information contributions of the multi-scale feature maps, improves the feature map fusion effect, and further improves the accuracy and robustness of the first defect detection model in target detection.
In an alternative embodiment of the application, the method further comprises:
performing self-adaptive maximum pooling operation on the pyramid feature images through a cross resolution weighting layer to obtain a maximum feature image;
Splitting the average feature map in the channel dimension to obtain a plurality of weight maps, including:
adding the average feature map and the maximum feature map element-wise to obtain a fused feature map;

and splitting the fused feature map in the channel dimension to obtain the plurality of weight maps.
Specifically, the cross resolution weighting layer in this embodiment provides another resolution transformation processing mode, as follows:
For the input pyramid feature maps {X_1, X_2, ..., X_(s-1)}, an adaptive max pooling operation (Adaptive Max Pooling, AMP: partition the image into a grid of fixed size and take the maximum of all pixels within each grid cell) is performed as follows:

X''_1 = AMP(X_1), X''_2 = AMP(X_2), ..., X''_(s-1) = AMP(X_(s-1))

The input pyramid feature maps of different sizes are max-pooled to a given size W_s × H_s. Then {X''_1, X''_2, ..., X''_(s-1)} and X_s are concatenated (Concat) in the channel direction, and 1×1 convolution, ReLU and 1×1 convolution operations are performed to generate the maximum feature map F_max.

F_avg and F_max are added element-wise, and sigmoid and channel-dimension splitting operations are performed in sequence to generate s weight maps of different resolutions: W'_1, W'_2, ..., W'_s. Each weight map corresponds to one input pyramid feature map.
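The max-pooling branch and the element-wise fusion can be sketched in numpy as follows (toy shapes; the 1×1-convolution stages that would produce F_avg and F_max are omitted, and zero maps are used so the sigmoid output is exactly 0.5):

```python
import numpy as np

def adaptive_max_pool(x, out_h, out_w):
    """Max-pool a (C, H, W) map to (C, out_h, out_w); assumes H and W are
    integer multiples of the target size."""
    c, h, w = x.shape
    return x.reshape(c, out_h, h // out_h, out_w, w // out_w).max(axis=(2, 4))

def fuse_weight_maps(f_avg, f_max, num_levels):
    """Element-wise sum of the two branch outputs, sigmoid, then split
    along the channel dimension into one weight map per pyramid level."""
    weights = 1 / (1 + np.exp(-(f_avg + f_max)))
    return np.split(weights, num_levels, axis=0)

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(adaptive_max_pool(x, 2, 2))       # each 2x2 block keeps its maximum

f_avg = np.zeros((6, 4, 4))             # toy branch outputs: 3 levels x 2 channels
f_max = np.zeros((6, 4, 4))
maps = fuse_weight_maps(f_avg, f_max, 3)
print(len(maps), maps[0].shape)         # 3 (2, 4, 4)
```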
For example, fig. 10 is a schematic structural diagram of a cross resolution weighting layer according to an embodiment of the present application, and as shown in fig. 10, a step of obtaining a target feature map in a resolution transformation processing manner according to this embodiment is described by taking three pyramid feature maps as an example.
The input pyramid feature maps {X_1, X_2, X_3} are processed by adaptive average pooling (i.e., AAP in fig. 10) and adaptive max pooling (i.e., AMP in fig. 10), respectively.

The three average-pooled pyramid feature maps are processed sequentially by a Concat layer, Conv 1×1 (i.e., 1×1 convolution), ReLU, and Conv 1×1 to obtain the average feature map F_avg. The three max-pooled pyramid feature maps are processed sequentially by a Concat layer, Conv 1×1, ReLU, and Conv 1×1 to obtain the maximum feature map F_max.

F_avg and F_max are added element-wise via matrix addition (Matrix Sum), and sigmoid and channel-dimension splitting operations are performed in sequence to obtain 3 weight maps W′_1, W′_2, W′_3. Each weight map is up-sampled to a size consistent with its corresponding pyramid feature map and multiplied element-wise (Pointwise Multiplication, also called point-by-point multiplication) with the directly passed (Identity) pyramid feature map, obtaining and outputting (Output) three target feature maps.
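The cross resolution weighting layer described above can be sketched in pytorch as follows (a minimal illustration under stated assumptions, not the patent's reference implementation; the bottleneck width and, for brevity, pooling all s inputs to the smallest level's size — which leaves X_s unchanged — are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossResolutionWeighting(nn.Module):
    """Weights each pyramid level with maps derived from combined
    adaptive average pooling (AAP) and adaptive max pooling (AMP)."""

    def __init__(self, channels):              # channel count per pyramid level
        super().__init__()
        total = sum(channels)
        mid = max(total // 4, 8)               # bottleneck width (assumption)
        def branch():                          # Conv 1x1 -> ReLU -> Conv 1x1
            return nn.Sequential(nn.Conv2d(total, mid, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(mid, total, 1))
        self.avg_branch, self.max_branch = branch(), branch()
        self.channels = channels

    def forward(self, feats):                  # feats: [X1, ..., Xs], shrinking
        h, w = feats[-1].shape[-2:]            # pool to the smallest level's size
        avg = torch.cat([F.adaptive_avg_pool2d(x, (h, w)) for x in feats], dim=1)
        mx = torch.cat([F.adaptive_max_pool2d(x, (h, w)) for x in feats], dim=1)
        # F_avg + F_max element-wise, then sigmoid and channel split
        fused = torch.sigmoid(self.avg_branch(avg) + self.max_branch(mx))
        weights = torch.split(fused, self.channels, dim=1)
        outs = []
        for x, wmap in zip(feats, weights):
            # up-sample each weight map to its level and multiply (Identity branch)
            wmap = F.interpolate(wmap, size=x.shape[-2:], mode='nearest')
            outs.append(x * wmap)
        return outs
```

A call with three pyramid levels returns three weighted target feature maps whose shapes match the corresponding inputs.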
The technical scheme provided by this embodiment helps smooth the feature maps and capture wider context information, while max pooling emphasizes local detail by selecting the most salient features. By combining adaptive average pooling and adaptive max pooling during resolution transformation, the cross resolution weighting layer retains both average values and salient feature values, capturing richer and more diverse feature information and exploiting the complementary advantages of average pooling and max pooling. The resulting target feature maps therefore provide a more comprehensive and fine-grained spatial representation, enrich the feature expression, improve the discriminability of the features, and further improve the robustness and detection accuracy of the first defect detection model.
In an alternative embodiment of the application, the sample image set is acquired by:
acquiring a parent category label list corresponding to the initial sample image set, randomly sampling (with replacement) the parent category labels in the parent category label list a preset number of times to acquire a parent category label sequence, and determining the number of parent category labels of each category in the parent category label list; each sample image in the initial sample image set is marked with a corresponding parent category label, and the parent category label list comprises the different types of parent category labels corresponding to the initial sample image set;
screening or amplifying the sample images corresponding to each category's parent category label in the initial sample image set according to the number of parent category labels of that category, to obtain the sample images corresponding to each category's parent category label;
and constructing the sample image set according to the sample images corresponding to the parent category labels of all the categories in the parent category label sequence.
Specifically, considering that severe class imbalance exists among the defect categories in actually collected sample images, after labeling each sample image with a corresponding parent category label and acquiring the initial sample image set, this embodiment provides a class-balanced sampling method that performs sample equalization processing on the initial sample image set. The specific steps of the sample equalization processing are as follows:
And obtaining a parent category label list corresponding to the initial sample image set according to the parent category labels of different types corresponding to the initial sample image set. For example, when there are m kinds of parent class labels, the corresponding parent class labels are denoted by numbers corresponding to 1-m, and the parent class label list may be denoted as cat_list= {0,1,2 … m }.
The parent category labels in the parent category label list are randomly sampled with replacement a preset number of times to obtain the parent category label sequence, and the number of parent category labels of each category in the list is determined. For example, N label categories may be randomly sampled with replacement from cat_list to obtain the parent category label sequence cat_samples, and the number of occurrences of each label in cat_samples is recorded as the number of parent category labels of that category. It will be appreciated that when N is much greater than m, the numbers of parent category labels of the categories are approximately equal.
The sample images corresponding to each category's parent category label in the initial sample image set are screened or amplified according to the number of parent category labels of that category, obtaining the sample images corresponding to that category's parent category label. For example, when the number of sample images in the class-balanced sample image set is determined to be N, the sample images corresponding to each category's parent category label are screened or amplified to match the number of parent category labels of that category. It is understood that the number of sample images in the class-balanced sample image set may also be another multiple of N.
Since the number X of parent category labels of any category and the number Y of corresponding sample images may differ, there are two cases: X ≤ Y and X > Y. When X ≤ Y, X sample images are screened from the sample images corresponding to that category's parent category label. When X > Y, the sample images corresponding to that category's parent category label are amplified to X sample images.
It is understood that the specific manner of screening can be random sampling, clustered sampling, nearest neighbor sampling, and the like. The amplification may be performed by image transformation (such as flipping, panning, scaling, adding noise, etc.) or copying the sample image.
According to sample images corresponding to the parent category labels of all the categories in the parent category label sequence, a sample image set is constructed, and the number of sample images corresponding to all the parent categories in the sample image set is approximately equal.
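The label sampling, screening, and amplification steps above can be sketched as follows (a hedged illustration: the function name, its arguments, and cyclic duplication standing in for image-transformation augmentation are all assumptions, not the patent's code):

```python
import random
from collections import Counter

def build_balanced_sample_set(initial_samples, n_draws, seed=0):
    """initial_samples: dict mapping parent category label -> list of
    sample image ids. Draws n_draws labels uniformly with replacement,
    then screens (X <= Y) or amplifies (X > Y) each category's images
    to its drawn count, and assembles the balanced sample image set."""
    rng = random.Random(seed)
    cat_list = sorted(initial_samples)                            # label list
    cat_samples = [rng.choice(cat_list) for _ in range(n_draws)]  # label sequence
    counts = Counter(cat_samples)                                 # X per category

    per_cat = {}
    for cat, x in counts.items():
        imgs = initial_samples[cat]
        y = len(imgs)
        if x <= y:
            per_cat[cat] = rng.sample(imgs, x)        # screen down to X images
        else:
            # amplify by cycling (stands in for flip/translate/scale/noise)
            per_cat[cat] = [imgs[i % y] for i in range(x)]

    # assemble the balanced set following the label sequence
    pools = {cat: iter(lst) for cat, lst in per_cat.items()}
    return [next(pools[cat]) for cat in cat_samples]
```

With N much greater than the number of categories, each category contributes roughly N/m images to the resulting set.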
It can be understood that the sample equalization processing method provided by this embodiment may be applied to samples corresponding to the defect parent category, and also may be applied to samples corresponding to the defect child category corresponding to each defect parent category, so as to implement data equalization of the training sample image set of the second defect detection model.
Taking the pytorch framework as an example, an application example of the sample equalization processing in this embodiment is illustrated:
Implementing class-balanced sampling requires overriding the sampler in the DataLoader, whose function is to generate the index sequence used to access the dataset; it is therefore necessary to generate a class-balanced index sequence over the sample image set.
All labels in the initial image sample set are traversed, a mapping relation dictionary of each defect father category cat and a sample image id list corresponding to the defect father category cat is established, and the mapping relation dictionary is marked as cat_img_id_list= { cat_id: [ img_id, … ] … }.
If the number of the images in the mapping relation dictionary is N, randomly and repeatedly sampling N category labels from a parent category label list cat_list= {0,1,2 … m } to obtain a parent category label sequence cat_samples.
And performing label class balancing operation on each parent class label in the parent class label list until all the parent class labels are traversed, obtaining a picture index sequence cat_samples with evenly distributed classes, returning the picture index sequence cat_samples to the DataLoader, realizing sample class balancing of the initial sample image set, and obtaining a balanced sample image set.
The label class balancing operation specifically comprises the following steps:
for any target parent category label in the parent category label list, index corresponding to the target parent category label in the cat_samples is found out, index list index_list is obtained, and the number X of labels in the index list is determined.
The sample image id list img_samples corresponding to the target parent category is obtained from the mapping relation dictionary cat_img_id_list; the number of sample images in img_samples is denoted Y, and the sample images in img_samples are randomly shuffled to obtain IMG_samples.
Starting from next_start_index (initially 0), the ids of X consecutive sample images in IMG_samples are obtained and assigned, via the index list index_list, to the positions given by the indexes in cat_samples. Meanwhile, next_start_index is increased by X; if next_start_index exceeds Y, it is reset to 0, so that X sample images can be extracted from IMG_samples cyclically.
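A minimal pytorch Sampler following the index-sequence steps above might look like this (class and variable names mirror the description but are illustrative; the cyclic extraction is expressed with a modulo instead of an explicit next_start_index counter):

```python
import random
from torch.utils.data import Sampler

class ClassBalancedSampler(Sampler):
    """Generates a class-balanced index sequence for a DataLoader.

    cat_img_id_list maps each defect parent category to the dataset
    indices (img ids) of its sample images, mirroring the mapping
    relation dictionary described above."""

    def __init__(self, cat_img_id_list, seed=0):
        self.cat_img_id_list = {c: list(v) for c, v in cat_img_id_list.items()}
        self.num_samples = sum(len(v) for v in self.cat_img_id_list.values())
        self.rng = random.Random(seed)

    def __len__(self):
        return self.num_samples

    def __iter__(self):
        cat_list = sorted(self.cat_img_id_list)       # parent category label list
        n = self.num_samples
        # randomly sample N category labels with replacement -> cat_samples
        cat_samples = [self.rng.choice(cat_list) for _ in range(n)]
        for cat in cat_list:                          # label class balancing
            index_list = [i for i, c in enumerate(cat_samples) if c == cat]
            x = len(index_list)                       # X labels for this category
            img_samples = self.cat_img_id_list[cat][:]
            self.rng.shuffle(img_samples)             # IMG_samples
            y = len(img_samples)                      # Y images for this category
            # replace label positions with image ids, cycling when X > Y
            for pos, i in zip(index_list, range(x)):
                cat_samples[pos] = img_samples[i % y]
        return iter(cat_samples)
```

Passing an instance via DataLoader(dataset, sampler=ClassBalancedSampler(...)) would then feed the model a class-balanced stream of sample images.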
According to the technical scheme provided by this embodiment, class-balanced sampling of the sample images controls the probability with which samples of each category appear, solving the problem of imbalanced training sample categories when training the defect detection model. This effectively improves the performance of the defect detection model, reduces the overfitting caused by differences in sample counts between categories, enables the model to capture the characteristics of each defect category more comprehensively, improves its recognition of categories with fewer sample images, and further improves its defect detection accuracy.
In an alternative embodiment of the application, the parent class label includes a true annotation box and a true parent class;
the first defect detection model is obtained by:
according to the sample image set, the following training operation is iteratively executed on the initial neural network until a preset training stop condition is met, so as to obtain a first defect detection model:
inputting each sample image in the sample image set into an initial neural network, and outputting a plurality of prediction results corresponding to each sample image; each prediction result comprises a prediction father category and a corresponding prediction frame;
screening out a prediction frame corresponding to the same prediction parent class as the real parent class for each sample image, and obtaining a prediction frame set;
for a prediction frame set corresponding to each sample image, acquiring the position relation between each prediction frame and a real labeling frame, and screening positive sample candidate frames from the prediction frame set according to the position relation;
acquiring training losses of all positive sample candidate frames and corresponding real labeling frames, screening positive sample prediction frames from the positive sample candidate frames according to each training loss, and acquiring total training losses according to the training losses of each positive sample prediction frame and the corresponding real labeling frame;
And adjusting network parameters of the initial neural network according to the training total loss.
Specifically, the first defect detection model in this embodiment implements label assignment based on the Sim-OTA (Simplified Optimal Transport Assignment) strategy.
In the first defect detection model training process, each sample image in the sample image set for training comprises a parent category label, and the parent category label comprises a real annotation frame and a real parent category.
According to the sample image set, the following training operation is iteratively executed on the initial neural network until a preset training stop condition is met, so as to obtain a first defect detection model:
and inputting each sample image in the sample image set into an initial neural network, and outputting a plurality of prediction results corresponding to each sample image. Each prediction result includes a prediction parent class and a corresponding prediction box.
And screening out the prediction frames corresponding to the same prediction parent class as the real parent class for each sample image, and obtaining a prediction frame set.
And for the prediction frame set corresponding to each sample image, acquiring the position relation between each prediction frame and the real annotation frame, and screening positive sample candidate frames from the prediction frame set according to the position relation.
For example, prediction frames whose center points fall within the real labeling frame gt are selected and denoted in_boxes; a region S is defined with the center point of gt as its center and r as its radius, and prediction frames whose center points fall within S are selected and denoted in_centers. When a prediction frame belongs to both in_boxes and in_centers, it is marked as a positive sample candidate frame fg_boxes.
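The candidate screening just described can be sketched as follows (a hedged illustration; defining the half-width of S as r multiplied by the feature-map stride is an assumption borrowed from common Sim-OTA implementations, not stated in this embodiment):

```python
import torch

def select_candidates(pred_centers, gt_box, r=2.5, stride=8.0):
    """Screens positive sample candidate frames: a prediction frame
    qualifies when its center lies inside the real labeling frame gt
    (in_boxes) AND inside the square region S centered on gt's center
    with half-width r * stride (in_centers)."""
    x1, y1, x2, y2 = gt_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    px, py = pred_centers[:, 0], pred_centers[:, 1]
    in_boxes = (px >= x1) & (px <= x2) & (py >= y1) & (py <= y2)
    half = r * stride
    in_centers = ((px - cx).abs() <= half) & ((py - cy).abs() <= half)
    return in_boxes & in_centers               # fg_boxes mask
```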
And obtaining training losses of all positive sample candidate frames and corresponding real labeling frames, screening positive sample prediction frames from the positive sample candidate frames according to each training loss, and obtaining total training losses according to the training losses of each positive sample prediction frame and the corresponding real labeling frames.
For example, the IOU (Intersection over Union) losses between all positive sample candidate frames fg_boxes and all real labeling frames gt are calculated, as are the classification losses between them. It is understood that the IOU loss includes but is not limited to one or more of IoU loss, Smooth L1 loss, DIoU loss (Distance Intersection over Union loss), and BCE loss (Binary Cross Entropy loss), and the classification loss includes but is not limited to one or more of cross-entropy loss (Cross Entropy Loss), KL divergence loss (KL Divergence Loss), and softmax cross-entropy loss (Softmax Cross Entropy Loss).
A dynamic k-matching process is performed using the IOU loss and classification loss of each positive sample candidate frame:
For each real labeling frame gt, the positive sample candidate frames with the top-k largest IoU values and the corresponding top-k IoU values are selected, and these top-k IoU values are summed to obtain the k value.
For each real labeling frame gt, the first k positive sample candidate frames with the smallest classification loss are selected as positive sample prediction frames of the real labeling frame gt.
Obtaining an IOU loss matrix iou_loss_matrix according to the IOU loss of each positive sample prediction frame and the corresponding real labeling frame, and obtaining a classification loss matrix cls_loss_matrix according to the classification loss.
The total training loss is calculated by the total-loss formula cost = cls_loss_matrix + m × iou_loss_matrix. It can be understood that m is a weight coefficient that can be set as needed, for example to 3.
And according to the total training loss, the network parameters of the initial neural network are adjusted, and the loss is optimized.
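The dynamic k-matching and total-loss steps above can be sketched as follows (an illustrative simplification: IoU loss is taken as 1 − IoU, candidates are ranked here by the total cost formula rather than classification loss alone, and all names are assumptions):

```python
import torch

def simota_dynamic_k(iou_matrix, cls_loss_matrix, topk=10, m=3.0):
    """Dynamic k-matching over [num_gt, num_candidates] matrices.

    For each real labeling frame gt: k = sum of its top-k IoUs (at
    least 1), then the k candidates with the smallest total cost are
    assigned. Cost follows cost = cls_loss_matrix + m * iou_loss_matrix,
    with the IoU loss taken as 1 - IoU."""
    num_gt, num_cand = iou_matrix.shape
    cost = cls_loss_matrix + m * (1.0 - iou_matrix)
    assign = torch.zeros_like(cost, dtype=torch.bool)
    for g in range(num_gt):
        topk_ious = iou_matrix[g].topk(min(topk, num_cand)).values
        k = max(int(topk_ious.sum()), 1)               # dynamic k per gt
        idx = cost[g].topk(min(k, num_cand), largest=False).indices
        assign[g, idx] = True
    total_loss = cost[assign].sum()                    # total training loss
    return assign, total_loss
```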
In the technical scheme provided by this embodiment, the first defect detection model performs label assignment based on the dynamic Sim-OTA assignment strategy. Compared with label assignment using a manually set threshold, the Sim-OTA strategy allows labels to be assigned and reassigned dynamically, providing flexibility and adaptability; the assignment process can be scheduled dynamically according to task demands and computing resources, effectively optimizing the network performance of the model, making full use of computing resources, and improving computing efficiency.
The following describes a specific application of the embodiment of the present application in detail by a specific example:
Before detecting the device image to be identified, a lighter target detection model, Light-YOLO, is constructed based on the YOLO v5s network. Compared with YOLO v5s, Light-YOLO is optimized in three respects: network structure, training sample sampling, and label assignment.
The network structure optimization includes optimization of the backbone network and the neck network. The backbone network uses MobileNetV2 as the base network and replaces the third-from-last inverted residual module with an hourglass module. The neck network adds a cross multi-scale weighting layer after the feature pyramid network layer. The specific structure of the Light-YOLO model is shown in fig. 9.
In the way of training sample sampling, a class equalization sampling method is adopted to replace random probability sampling in Yolo v5 s.
In the label assignment mode, the dynamic label assignment method Sim-OTA is adopted to replace the label assignment mode of YOLO v5s in which a threshold is manually set.
Compared with the original YOLO v5s network, Light-YOLO has fewer model parameters and higher inference efficiency, and is therefore better suited to substation inspection scenarios. It also has stronger multi-scale target detection capability and adapts well to imbalanced sample categories.
The sub-category sample image set corresponding to each parent category label is determined from the sample image set processed by the class-balanced sampling method, in which each image carries both a parent category label and a sub-category label. Light-YOLO is trained on the sample image set to obtain the trained first defect detection model. A ResNet-18 classification network (basic architecture ResNet (Residual Networks, residual neural network), with a network depth of 18 layers) is trained on each sub-category sample image set, obtaining a plurality of trained second defect detection models.
The image of the device to be detected is input into the first defect detection model to obtain the target defect parent category of the device, and the corresponding target second defect detection model is determined according to the target defect parent category. The device image is cropped according to the detection frame corresponding to the target defect parent category to obtain the region-of-interest image, which is then input into the target second defect detection model to obtain the target defect sub-category of the device.
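The two-stage inference just described can be sketched as follows (a hedged illustration; the detector/classifier interfaces, confidence threshold, and return format are assumptions, not the patent's API):

```python
import torch

def detect_device_defect(image, first_model, second_models, conf_thres=0.25):
    """Two-stage inference: parent category via the first detection model,
    then sub-category via the matching second (classification) model.

    first_model(image) is assumed to yield detections as
    (x1, y1, x2, y2, score, parent_class); second_models maps a parent
    category id to a classifier over region-of-interest crops."""
    results = []
    for x1, y1, x2, y2, score, parent in first_model(image):
        if score < conf_thres:                 # discard low-confidence detections
            continue
        sub = None
        classifier = second_models.get(int(parent))
        if classifier is not None:             # a second model exists for this parent
            # crop the region of interest given by the detection frame
            roi = image[..., int(y1):int(y2), int(x1):int(x2)]
            sub = int(classifier(roi).argmax(dim=-1))
        results.append({"parent": int(parent), "sub": sub,
                        "box": (x1, y1, x2, y2), "score": float(score)})
    return results
```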
Fig. 11 is a schematic structural diagram of an apparatus for detecting a device defect according to an embodiment of the present application, as shown in fig. 11, the apparatus 11 may include: a first defect class detection module 111 and a second defect class detection module 112;
A first defect type detection module 111, configured to input an image of a device to be detected into a first defect detection model to obtain a target defect parent type; the first defect detection model is obtained through training according to a sample image set, and each sample image in the sample image set is marked with a corresponding father type label;
a second defect class detection module 112, configured to, after determining that a second defect detection model corresponding to the target defect parent class exists, input an image of a region of interest corresponding to the target defect parent class into the second defect detection model to obtain a target defect sub-class; the second defect detection model is obtained through training according to a sub-category sample image set corresponding to the target defect father category, and each sample image in the sub-category sample image set is marked with a corresponding sub-category label.
According to the technical scheme provided by this embodiment, the device defect categories are divided into parent categories and their corresponding sub-categories; a first defect detection model and a second defect detection model are constructed for the parent categories and sub-categories respectively, the device defect parent category of the device image to be detected is identified by the first defect detection model, and the device defect sub-category is identified by the second defect detection model. Compared with a single-model target recognition method, this nested two-layer defect detection scheme splits device defect recognition into two subtasks handled by different models, making fuller use of sample image characteristics, improving the target detection precision of the models, and effectively improving the accuracy of device defect detection and the reliability of the detection results.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
In an optional embodiment of the present application, the sample image in the sub-category sample image set corresponding to the target defect parent category is a sample image in the sample image set labeled with the target parent category label;
the target parent category label is a defect category label corresponding to the target defect parent category.
In an alternative embodiment of the present application, the first defect detection model comprises: backbone network, neck network and detection end;
the first defect type detection module is specifically configured to:
inputting the image of the device to be detected into the first defect detection model, extracting features of the image through the backbone network, and outputting a plurality of initial feature maps of different sizes;
performing resolution transformation on the plurality of initial feature maps of different sizes through the neck network, and outputting a plurality of target feature maps of different resolutions;
And obtaining a detection result corresponding to each target feature map through the detection end, and outputting the target defect parent category according to all the detection results.
In an alternative embodiment of the application, the neck network comprises: a feature pyramid network layer and a cross resolution weighting layer;
the first defect type detection module is specifically configured to:
inputting a plurality of initial feature images with different sizes into a neck network, and fusing the plurality of initial feature images with different sizes through a feature pyramid network layer to obtain a plurality of pyramid feature images;
and performing an adaptive average pooling operation on the pyramid feature maps through the cross resolution weighting layer to obtain an average feature map, splitting the average feature map in the channel dimension to obtain a plurality of weight maps, and outputting a plurality of target feature maps of different resolutions according to the average feature map.
In an alternative embodiment of the application, the first defect class detection module is further configured to:
performing an adaptive max pooling operation on the pyramid feature maps through the cross resolution weighting layer to obtain a maximum feature map;
the first defect type detection module is specifically configured to:
adding the average feature map and the maximum feature map element-wise to obtain a fused feature map;
And splitting the fused feature map in the channel dimension to obtain the plurality of weight maps.
In an alternative embodiment of the present application, the device defect detecting apparatus further includes: a sample image set acquisition module; a sample image set acquisition module for:
acquiring a parent class label list corresponding to the initial sample image set, randomly and repeatedly sampling the parent class labels in the parent class label list for preset times to acquire a parent class label sequence, and determining the number of the parent class labels of each type in the parent class label list; each sample image in the initial sample image set is marked with a corresponding parent type label, and the parent type label list comprises parent type labels of different types corresponding to the initial sample image set;
screening or amplifying sample images corresponding to the parent class labels of each type in the initial sample image set according to the number of the parent class labels of each type, and obtaining sample images corresponding to the parent class labels of each type;
and constructing a sample image set according to sample images corresponding to the parent category labels of all the categories in the parent category label sequence.
In an alternative embodiment of the application, the parent class label includes a true annotation box and a true parent class;
The device defect detection apparatus further includes: a defect detection model acquisition module; the defect detection model acquisition module is used for:
according to the sample image set, the following training operation is iteratively executed on the initial neural network until a preset training stop condition is met, so as to obtain a first defect detection model:
inputting each sample image of the sample image set into an initial neural network, and outputting a plurality of prediction results corresponding to each sample image; each prediction result comprises a prediction father category and a corresponding prediction frame;
screening out a prediction frame corresponding to the same prediction parent class as the real parent class for each sample image, and obtaining a prediction frame set;
for a prediction frame set corresponding to each sample image, acquiring the position relation between each prediction frame and a real labeling frame, and screening positive sample candidate frames from the prediction frame set according to the position relation;
acquiring training losses of all positive sample candidate frames and corresponding real labeling frames, screening positive sample prediction frames from the positive sample candidate frames according to each training loss, and acquiring total training losses according to the training losses of each positive sample prediction frame and the corresponding real labeling frame;
and adjusting network parameters of the initial neural network according to the training total loss.
The embodiment of the application provides an electronic device, which comprises a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to realize the steps of the device defect detection method. Compared with the related art, the following can be realized: the device defect categories are divided into parent categories and their corresponding sub-categories; a first defect detection model and a second defect detection model are constructed for the parent categories and sub-categories respectively, the device defect parent category of the device image to be detected is identified by the first defect detection model, and the device defect sub-category is identified by the second defect detection model. Compared with a single-model target recognition method, this nested two-layer defect detection scheme splits device defect recognition into two subtasks handled by different models, making fuller use of sample image characteristics, improving the target detection precision of the models, and effectively improving the accuracy of device defect detection and the reliability of the detection results.
In an alternative embodiment, an electronic device is provided, fig. 12 is a schematic structural diagram of an electronic device for detecting a device defect according to an embodiment of the present application, and the electronic device 12 shown in fig. 12 includes: a processor 121 and a memory 123. Processor 121 is coupled to memory 123, such as via bus 122. Optionally, the electronic device 120 may further include a transceiver 124, and the transceiver 124 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that, in practical applications, the transceiver 124 is not limited to one, and the structure of the electronic device 120 is not limited to the embodiment of the present application.
The processor 121 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 121 may also be a combination implementing computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 122 may include a path to transfer information between the components. Bus 122 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Bus 122 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 12, but this does not mean there is only one bus or one type of bus.
Memory 123 may be, without limitation, a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 123 is used to store a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 121. The processor 121 is arranged to execute a computer program stored in the memory 123 for carrying out the steps shown in the previous method embodiments.
The electronic device in the embodiment of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a car-mounted terminal (e.g., car navigation terminal), a wearable device, etc., and a fixed terminal such as a digital TV, a desktop computer, etc.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The computer-readable storage medium of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, these steps need not be performed in the order indicated by the arrows. Unless explicitly stated herein, the steps in the flowcharts may be performed in other orders as required by the implementation scenario. Furthermore, some or all of the steps may comprise multiple sub-steps or stages, which may be performed at the same time or at different times; when performed at different times, their execution order may be configured flexibly as needed, and the embodiments of the present application impose no limitation on this.
The foregoing describes only optional implementations of some application scenarios of the present application. It should be noted that other similar implementations adopted by those skilled in the art based on the technical ideas of the present application, without departing from those ideas, also fall within the protection scope of the embodiments of the present application.

Claims (10)

1. A device defect detection method, comprising:
inputting an image of a device to be detected into a first defect detection model to obtain a target defect parent category; the first defect detection model being trained on a sample image set, each sample image in the sample image set being labeled with a corresponding parent category label;
if a second defect detection model corresponding to the target defect parent category exists, inputting a region-of-interest image corresponding to the target defect parent category into the second defect detection model to obtain a target defect sub-category; the second defect detection model being trained on a sub-category sample image set corresponding to the target defect parent category, each sample image in the sub-category sample image set being labeled with a corresponding sub-category label.
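The two-stage dispatch of claim 1 can be sketched as follows. The stub "models", category names, and region-of-interest format are illustrative assumptions standing in for the patent's trained neural networks, not the actual implementation:

```python
# Hypothetical stand-ins for the two trained detectors described in claim 1.
def first_stage_model(image):
    """Return (parent_category, region_of_interest) for the device image."""
    # Assumed output: a coarse defect class plus the box where it was found.
    return "insulator_defect", {"x": 10, "y": 20, "w": 64, "h": 64}

def second_stage_models():
    """Map from parent category to its fine-grained sub-category model, if any."""
    return {"insulator_defect": lambda roi: "insulator_crack"}

def detect_defect(image):
    parent, roi = first_stage_model(image)
    sub_models = second_stage_models()
    if parent in sub_models:          # a second model exists for this parent
        return parent, sub_models[parent](roi)
    return parent, None               # otherwise report only the parent category

parent, sub = detect_defect(object())
```

Parent categories without a registered second-stage model simply fall through with `sub = None`, matching the conditional phrasing of the claim.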
2. The device defect detection method according to claim 1, wherein the sample images in the sub-category sample image set corresponding to the target defect parent category are the sample images in the sample image set labeled with a target parent category label;
the target parent category label being the defect category label corresponding to the target defect parent category.
3. The device defect detection method according to claim 1 or 2, wherein the first defect detection model comprises a backbone network, a neck network, and a detection head;
and wherein inputting the image of the device to be detected into the first defect detection model to obtain the target defect parent category comprises:
inputting the image of the device to be detected into the first defect detection model, extracting features of the image through the backbone network, and outputting a plurality of initial feature maps of different sizes;
performing resolution transformation on the plurality of initial feature maps of different sizes through the neck network, and outputting a plurality of target feature maps of different resolutions;
obtaining, through the detection head, a detection result corresponding to each target feature map, and outputting the target defect parent category according to all the detection results.
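The backbone-neck-head flow of claim 3 can be illustrated at the shape level. The strides 8/16/32 and channel widths are a common detector convention (e.g. YOLO-style networks) assumed here for illustration, not values taken from the patent:

```python
# Shape-level sketch of the backbone -> neck -> head flow in claim 3.
# Shapes are (channels, height, width).
def backbone(image_hw):
    # Assumed: three initial feature maps at strides 8, 16, and 32.
    h, w = image_hw
    return [(256, h // 8, w // 8), (512, h // 16, w // 16), (1024, h // 32, w // 32)]

def neck(feature_shapes):
    # Resolution transform: unify channel width, keep the three resolutions.
    return [(256, h, w) for (_, h, w) in feature_shapes]

def head(target_shapes):
    # One detection result per target feature map resolution.
    return [f"detections@{h}x{w}" for (_, h, w) in target_shapes]

feats = backbone((640, 640))
targets = neck(feats)
results = head(targets)
```

The parent category is then decided from all per-resolution results together, as the last step of the claim states.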
4. The device defect detection method according to claim 3, wherein the neck network comprises a feature pyramid network layer and a cross-resolution weighting layer;
and wherein performing resolution transformation on the plurality of initial feature maps of different sizes through the neck network and outputting a plurality of target feature maps of different resolutions specifically comprises:
inputting the plurality of initial feature maps of different sizes into the neck network, and fusing the plurality of initial feature maps of different sizes through the feature pyramid network layer to obtain a plurality of pyramid feature maps;
performing an adaptive average pooling operation on the plurality of pyramid feature maps through the cross-resolution weighting layer to obtain an average feature map, splitting the average feature map in the channel dimension to obtain a plurality of weight maps, and outputting the plurality of target feature maps of different resolutions according to the plurality of weight maps.
5. The device defect detection method according to claim 4, further comprising:
performing an adaptive maximum pooling operation on the plurality of pyramid feature maps through the cross-resolution weighting layer to obtain a maximum feature map;
wherein splitting the average feature map in the channel dimension to obtain a plurality of weight maps comprises:
adding the average feature map and the maximum feature map element-wise to obtain a fused feature map;
splitting the fused feature map in the channel dimension to obtain the plurality of weight maps.
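The cross-resolution weighting of claims 4 and 5 can be sketched on toy pure-Python "feature maps" (each a list of channels, each channel a 2D grid). Pooling each channel down to a single value and splitting the fused vector into one group per pyramid level are assumptions about the layer's internals, since the claims leave those details open:

```python
# Toy sketch of the cross-resolution weighting in claims 4-5.
def global_avg_pool(fmap):
    # Adaptive average pooling of each channel to a single value.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def global_max_pool(fmap):
    # Adaptive maximum pooling of each channel to a single value.
    return [max(max(row) for row in ch) for ch in fmap]

def cross_resolution_weights(pyramid_maps):
    # Pool the channels of all pyramid levels ...
    avg = [v for m in pyramid_maps for v in global_avg_pool(m)]
    mx = [v for m in pyramid_maps for v in global_max_pool(m)]
    fused = [a + b for a, b in zip(avg, mx)]        # element-wise addition
    # ... then split the fused result back into one weight group per level.
    sizes = [len(m) for m in pyramid_maps]
    out, i = [], 0
    for s in sizes:
        out.append(fused[i:i + s])
        i += s
    return out

level0 = [[[1.0, 3.0], [5.0, 7.0]]]      # one channel, 2x2
level1 = [[[2.0, 2.0], [2.0, 2.0]]]      # one channel, 2x2
weights = cross_resolution_weights([level0, level1])
```

In a real network the resulting weight maps would rescale the pyramid features of each resolution to produce the target feature maps.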
6. The device defect detection method according to claim 2, wherein the sample image set is obtained by:
obtaining a parent category label list corresponding to an initial sample image set, randomly sampling parent category labels from the parent category label list with replacement a preset number of times to obtain a parent category label sequence, and determining the number of parent category labels of each type in the parent category label sequence; each sample image in the initial sample image set being labeled with a corresponding parent category label, and the parent category label list comprising the different types of parent category labels corresponding to the initial sample image set;
screening or amplifying, according to the number of parent category labels of each type, the sample images in the initial sample image set corresponding to that type of parent category label, to obtain the sample images corresponding to each type of parent category label;
constructing the sample image set from the sample images corresponding to all types of parent category labels in the parent category label sequence.
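The resampling of claim 6 can be sketched as follows: draw labels with replacement, count how many of each type were drawn, then trim ("screen") or repeat ("amplify") each label's images to match that count. The dataset contents, the trim/repeat policy, and the draw count are illustrative assumptions:

```python
import random
from collections import Counter

def build_balanced_set(images_by_label, num_draws, rng):
    label_list = sorted(images_by_label)                      # parent label list
    label_seq = [rng.choice(label_list) for _ in range(num_draws)]  # with replacement
    counts = Counter(label_seq)                               # count per label type
    sample_set = []
    for label, need in counts.items():
        pool = images_by_label[label]
        if len(pool) >= need:                 # screen: subsample down to the count
            sample_set.extend(pool[:need])
        else:                                 # amplify: repeat images up to the count
            sample_set.extend(pool[i % len(pool)] for i in range(need))
    return label_seq, sample_set

rng = random.Random(0)
data = {"rust": ["r1", "r2"], "crack": ["c1"]}
seq, balanced = build_balanced_set(data, 6, rng)
```

Because the final set follows the drawn label sequence rather than the raw image counts, over-represented parent categories are down-weighted and rare ones are amplified.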
7. The device defect detection method according to any one of claims 1, 2, or 6, wherein the parent category label comprises a ground-truth annotation box and a ground-truth parent category;
and wherein the first defect detection model is obtained by:
iteratively performing, according to the sample image set, the following training operation on an initial neural network until a preset training stop condition is met, to obtain the first defect detection model:
inputting each sample image in the sample image set into the initial neural network, and outputting a plurality of prediction results corresponding to each sample image; each prediction result comprising a predicted parent category and a corresponding prediction box;
for each sample image, screening out the prediction boxes whose predicted parent category is the same as the ground-truth parent category, to obtain a prediction box set;
for the prediction box set corresponding to each sample image, obtaining the positional relationship between each prediction box and the ground-truth annotation box, and screening positive sample candidate boxes from the prediction box set according to the positional relationship;
obtaining the training loss between each positive sample candidate box and the corresponding ground-truth annotation box, screening positive sample prediction boxes from the positive sample candidate boxes according to each training loss, and obtaining a total training loss from the training losses between each positive sample prediction box and the corresponding ground-truth annotation box;
adjusting network parameters of the initial neural network according to the total training loss.
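The positive-sample screening of claim 7 can be sketched as below. The concrete positional test ("prediction center inside the ground-truth box"), the L1 box loss, and the top-k selection are assumptions standing in for choices the claim leaves unspecified:

```python
# Sketch of the positive-sample screening and loss aggregation in claim 7.
def box_center_inside(box, gt):
    # Positional relationship: is the prediction's center inside the gt box?
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]

def l1_loss(box, gt):
    # Assumed training loss: L1 distance between box corners.
    return sum(abs(a - b) for a, b in zip(box, gt))

def select_positives(predictions, gt_box, gt_class, top_k=2):
    # 1. keep only predictions whose class matches the ground-truth parent class
    same_class = [p for p in predictions if p["cls"] == gt_class]
    # 2. positional screen: candidate if its center lies inside the gt box
    candidates = [p for p in same_class if box_center_inside(p["box"], gt_box)]
    # 3. rank candidates by loss and keep the top_k lowest-loss ones
    candidates.sort(key=lambda p: l1_loss(p["box"], gt_box))
    positives = candidates[:top_k]
    total_loss = sum(l1_loss(p["box"], gt_box) for p in positives)
    return positives, total_loss

preds = [
    {"cls": "rust", "box": (0, 0, 10, 10)},
    {"cls": "rust", "box": (1, 1, 11, 11)},
    {"cls": "crack", "box": (0, 0, 10, 10)},
    {"cls": "rust", "box": (40, 40, 50, 50)},
]
pos, loss = select_positives(preds, gt_box=(0, 0, 10, 10), gt_class="rust")
```

The total loss would then drive the parameter update of the last step of the claim (e.g. by backpropagation in a real network).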
8. A device defect detection apparatus, comprising:
a first defect category detection module, configured to input an image of a device to be detected into a first defect detection model to obtain a target defect parent category; the first defect detection model being trained on a sample image set, each sample image in the sample image set being labeled with a corresponding parent category label;
a second defect category detection module, configured to, after determining that a second defect detection model corresponding to the target defect parent category exists, input a region-of-interest image corresponding to the target defect parent category into the second defect detection model to obtain a target defect sub-category; the second defect detection model being trained on a sub-category sample image set corresponding to the target defect parent category, each sample image in the sub-category sample image set being labeled with a corresponding sub-category label.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202310853041.4A 2023-07-11 2023-07-11 Device defect detection method, device, electronic device and readable storage medium Pending CN116823793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310853041.4A CN116823793A (en) 2023-07-11 2023-07-11 Device defect detection method, device, electronic device and readable storage medium

Publications (1)

Publication Number Publication Date
CN116823793A true CN116823793A (en) 2023-09-29

Family

ID=88127449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310853041.4A Pending CN116823793A (en) 2023-07-11 2023-07-11 Device defect detection method, device, electronic device and readable storage medium

Country Status (1)

Country Link
CN (1) CN116823793A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576042A (en) * 2023-11-22 2024-02-20 魅杰光电科技(上海)有限公司 Wafer defect detection method, system, electronic equipment and storage medium
CN117689962A (en) * 2024-02-02 2024-03-12 誊展精密科技(深圳)有限公司 Device surface defect identification method and device surface defect identification system
CN117689962B (en) * 2024-02-02 2024-04-19 誊展精密科技(深圳)有限公司 Device surface defect identification method and device surface defect identification system

Similar Documents

Publication Publication Date Title
CN111444939B (en) Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN116823793A (en) Device defect detection method, device, electronic device and readable storage medium
CN110598620B (en) Deep neural network model-based recommendation method and device
CN111160469B (en) Active learning method of target detection system
CN108711148B (en) Tire defect intelligent detection method based on deep learning
CN111507370A (en) Method and device for obtaining sample image of inspection label in automatic labeling image
CN112070135A (en) Power equipment image detection method and device, power equipment and storage medium
CN110648310A (en) Weak supervision casting defect identification method based on attention mechanism
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN111815576B (en) Method, device, equipment and storage medium for detecting corrosion condition of metal part
CN115131747A (en) Knowledge distillation-based power transmission channel engineering vehicle target detection method and system
CN111539924A (en) Defect detection method, device and equipment for suspension clamp and storage medium
Lin et al. Integrated circuit board object detection and image augmentation fusion model based on YOLO
CN116805387B (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN114078197A (en) Small sample target detection method and device based on support sample characteristic enhancement
CN111738290B (en) Image detection method, model construction and training method, device, equipment and medium
CN113408630A (en) Transformer substation indicator lamp state identification method
CN116152576B (en) Image processing method, device, equipment and storage medium
CN112529836A (en) High-voltage line defect detection method and device, storage medium and electronic equipment
CN112668365A (en) Material warehousing identification method, device, equipment and storage medium
CN115984640A (en) Target detection method, system and storage medium based on combined distillation technology
Zan et al. Defect Identification of Power Line Insulators Based on a MobileViT‐Yolo Deep Learning Algorithm
CN112801960A (en) Image processing method and device, storage medium and electronic equipment
Zhang et al. AE-FPN: adaptive enhance feature learning for detecting wire defects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination