CN110688925A - Cascade target identification method and system based on deep learning

Info

Publication number: CN110688925A
Application number: CN201910887482.XA
Authority: CN
Inventors: 刘广秀; 王万国; 许玮; 慕世友; 周大洲; 李建祥; 王振利; 刘丕玉; 张旭; 刘越; 贾亚军; 李勇; 郭锐; 赵金龙; 李振宇; 许荣浩
Original assignee: Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; State Grid Intelligent Technology Co Ltd
Current assignee: State Grid Intelligent Technology Co Ltd
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2020-01-14
Anticipated expiration: 2039-09-19
Also published as: CN110688925B

Abstract

The disclosure provides a cascade target identification method and system based on deep learning. Samples to be detected are obtained from inspection images, target detection samples are annotated, and the number of samples is expanded. Multiple features of image-enhanced data are fused for multi-level deep learning detection, so that salient equipment occupying a large proportion of the image is detected first, which removes the interference of complex background noise with the detection algorithm. Fusing multiple features of image-enhanced data across the multi-level detection also improves the accuracy of deep learning on small targets and reduces the influence of image quality on the detection algorithm.

Description

Cascade target identification method and system based on deep learning

Technical Field

The disclosure belongs to the field of artificial intelligence, and particularly relates to a cascade target identification method and system based on deep learning.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

In recent years, target detection has been a research hot spot for scholars at home and abroad and is widely applied in fields such as the military, medicine, traffic and security. However, detecting a small target area in a large field of view has always been a difficult problem: a small target has few pixels and few distinctive features, so its detection rate is low compared with that of a large target. When the target occupies a small proportion of the original image, the recognition algorithm is often misled by noise from other areas; this is particularly obvious in inspection images with complex backgrounds, such as mutual occlusion, forests and mountains, and with multiple interference factors. A target that occupies a small proportion is sometimes also a region of high interest, such as a critical equipment defect, and thus becomes a key detection point. Therefore, research on fine-grained visual classification and recognition algorithms under complex backgrounds can greatly promote the application of vision algorithms in the inspection field. Deep learning has achieved notable results in image processing but remains deficient in detecting targets that occupy a small proportion of the image.

The shallow layers of a convolutional neural network focus more on detail information and the deep layers focus more on semantic information. High-level semantic information helps to detect a target, so the feature map of the last convolutional layer can be used for prediction. This approach is found in most deep networks, such as VGG, ResNet and Inception, which use the last-layer features of the deep network for classification. Its advantages are high speed and a low memory requirement. Its disadvantage is that only the features of the last layer are used, whereas detail information can improve detection accuracy to some extent. As a result, the detection of small targets in a large field of view is poor. Meanwhile, images in many scenes are limited by outdoor acquisition and complex illumination, are affected by lighting and blurring, and are therefore harder to process.

Disclosure of Invention

In order to solve the above problems, the present disclosure provides a method and a system for identifying a cascade target based on deep learning. The method can fully utilize the superiority of the deep learning detection algorithm for detecting the target with larger proportion, and combines the multi-features of the image enhancement data to carry out multistage deep learning algorithm detection, thereby improving the detection accuracy of deep learning in small target detection and reducing the influence of image quality on the detection algorithm.

In order to achieve the purpose, the following technical scheme is adopted in the disclosure:

the cascade target identification method based on deep learning comprises the following steps:

Step (1): Obtain samples to be detected from the inspection images, mark target boxes on the target detection samples, and expand the sample data.

As an alternative embodiment, the images are subjected to rotation, mirroring, defogging, enhancement, deblurring and the like during sample expansion.

As an alternative embodiment, the samples are expanded to four or more times the original number of samples.

Step (2): Use a deep learning detection algorithm to train iterative, stage-by-stage refined target region detection.

The deep learning detection algorithm can be a one-stage (first-level) or a two-stage (second-level) detection algorithm. A one-stage detection algorithm treats target box localization directly as a regression problem without generating candidate boxes, for example SSD, YOLO and RetinaNet. A two-stage detection algorithm first generates a series of candidate boxes as samples and then classifies the samples with a convolutional neural network, for example R-CNN, Fast R-CNN and FPN.

Two data access modes are available for the detection stage of the method and system:

One is that the acquisition system directly acquires data transmitted in real time.

As alternative embodiments, the transmission mode can be wireless transmission, wired transmission and the like.

The other is to read offline data directly.

Step (3): After model training, test the deep learning detection algorithms at each stage and select the models that perform well as the cascade detection models of the detection system. Finally, n sets of detection models are obtained, MODEL = {MODEL₁, MODEL₂, …, MODEL_n}, where n ∈ N, n > 1. Each MODEL_n consists of one or more models, defined as {model_n1, model_n2, …, model_ni}, where i ∈ N, i ≠ 0. The regions detected by the detection models MODEL are defined as D = {D₁, D₂, …, D_n}, with D_n = {d₁, d₂, …, d_j}, where j ∈ N. The region classes predicted by the detection models are defined as CLASS = {CLASS₁, CLASS₂, …, CLASS_n}, with CLASS_n = {class₁, class₂, …, class_k}, where k ∈ N.
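The notation above can be mirrored in a small data structure. The following is only a sketch under stated assumptions: the names `Detection`, `Level` and `build_cascade`, the `(x1, y1, x2, y2)` box format and the confidence field are illustrative and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


@dataclass
class Detection:
    box: Box      # one region d_j
    label: str    # one class_k
    score: float  # confidence (an assumed extra field)


@dataclass
class Level:
    """One cascade level MODEL_n, holding model_n1 ... model_ni and its results."""
    models: List[Callable]                                      # hypothetical detector callables
    detections: List[Detection] = field(default_factory=list)   # D_n together with CLASS_n

    @property
    def regions(self) -> List[Box]:
        return [d.box for d in self.detections]

    @property
    def classes(self) -> List[str]:
        return sorted({d.label for d in self.detections})


def build_cascade(levels: List[Level]) -> List[Level]:
    # The text requires n > 1 levels and at least one model per level (i != 0).
    if len(levels) < 2:
        raise ValueError("a cascade needs at least two levels (n > 1)")
    if any(len(level.models) == 0 for level in levels):
        raise ValueError("each level needs at least one model (i != 0)")
    return levels
```

Keeping the regions and classes on the level that produced them makes it straightforward to hand D_(n-1) to the next level during inference.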

Step (4): Add image preprocessing to the data acquired in step (2) and combine it with the multi-stage detection models to perform target detection and identification.

The specific steps are as follows:

(5.1) Acquire the inspection image of the area to be detected and perform image preprocessing; the preprocessing methods include image rotation, image mirroring, image defogging, image enhancement, image deblurring and other methods.

(5.2) Apply the detection model MODEL₁ obtained in step (3) to detect target regions in the original image to be detected and in the preprocessed images. Filter overlapping boxes in the multi-image parallel detection results using non-maximum suppression to obtain the primary detection result, which gives a preliminary localization of the salient target region and yields the primary target region D₁ and class CLASS₁.

(5.3) Apply the detection model MODEL₂ obtained in step (3): crop the primary target region from the original image and from the preprocessed images and input the crops into the MODEL₂ detection model. The targets are detected by the detection network, overlapping boxes in the multi-image parallel prediction results are filtered using non-maximum suppression, and the prediction region D₂ and class CLASS₂ under MODEL₂ are obtained.

(5.4) Refine the region detection iteratively, stage by stage: the prediction region D_n-1 generated by MODEL_n-1 is used as the input of MODEL_n, finally yielding the target detection region D_n and class CLASS_n.
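The stage-by-stage iteration in (5.2) to (5.4) can be sketched as a loop in which each level only sees the regions found by the previous one. This is a minimal illustration, not the disclosed implementation: the detectors are hypothetical callables that return crop-relative boxes, images are plain nested lists standing in for real image arrays, and preprocessing paths and non-maximum suppression are omitted for brevity.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # assumed (x1, y1, x2, y2) in pixels
Det = Tuple[Box, str, float]         # (box, class label, confidence)
Image = Sequence[Sequence[int]]      # row-major pixel grid, a stand-in for an image array
Detector = Callable[[Image], List[Det]]


def crop(image: Image, box: Box) -> Image:
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def run_cascade(image: Image, detectors: List[Detector]) -> List[Det]:
    """Pass the regions D_(n-1) found by one level to the detector of the next level."""
    height, width = len(image), len(image[0])
    # Before the first level, the only "region" is the whole image.
    current: List[Det] = [((0, 0, width, height), "image", 1.0)]
    for detect in detectors:
        found: List[Det] = []
        for region, _, _ in current:
            x0, y0 = region[0], region[1]
            for (bx1, by1, bx2, by2), label, score in detect(crop(image, region)):
                # Shift crop-relative boxes back into original-image coordinates.
                found.append(((x0 + bx1, y0 + by1, x0 + bx2, y0 + by2), label, score))
        current = found
    return current


# Hypothetical stand-ins for trained MODEL_1 and MODEL_2.
def stage1(img: Image) -> List[Det]:
    return [((10, 20, 60, 80), "hanging_point", 0.9)]


def stage2(img: Image) -> List[Det]:
    return [((5, 5, 15, 15), "missing_pin", 0.8)]


image = [[0] * 100 for _ in range(100)]
result = run_cascade(image, [stage1, stage2])
# result == [((15, 25, 25, 35), "missing_pin", 0.8)]
```

Because every level reports boxes in original-image coordinates, the final regions D_n can be drawn or evaluated directly on the inspection image.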

The cascade target identification system based on deep learning comprises:

the sample processing module is configured to obtain a training sample of the inspection image, perform target frame marking on a target detection sample, and expand the number of samples;

the model building and training module is configured to select a first-level or second-level deep learning target area detection algorithm for model training;

the target sample acquisition module is configured to acquire a sample to be detected of the inspection image, preprocess the sample, input multiple paths of samples into a next-stage detection algorithm, position and classify a target salient region, and acquire multiple paths of multi-stage target regions and target types;

and the identification module is configured to input the multiple target areas into a next-stage detection algorithm, realize intelligent prediction of the target areas step by step, predict the target areas and types through iterative reasoning, and finally acquire the areas and types of the targets.

A computer-readable storage medium, wherein a plurality of instructions are stored, the instructions being adapted to be loaded by a processor of a terminal device and to perform all or part of the steps of the above-mentioned deep learning based cascaded object recognition method.

A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions which are suitable for being loaded by a processor and executing all or part of the steps of the deep learning based cascading object recognition method.

Compared with the prior art, the beneficial effect of this disclosure is:

1. The disclosure alleviates, to a certain extent, the poor detection of small targets in a large field of view. It makes full use of the strength of deep learning in detecting targets that occupy a large proportion of the image and applies a multi-stage detection algorithm to detect defects within the salient target region. This reduces the interference of irrelevant background noise with the deep learning detection algorithm and brings the guiding effect of key-region features on the network model into full play. Combining a first-level and a second-level deep learning detection algorithm in a cascade detection structure greatly improves the detection accuracy of the deep learning detection algorithm.

2. The disclosure alleviates, to a certain extent, the influence on deep learning detection of distortion, blurring and similar problems in acquired inspection images. These problems are caused by weather factors such as lighting, solar radiation and cloud occlusion; by camera factors such as shooting posture, shooting angle and relative motion of the subject during shooting; and by noise introduced during image acquisition. It also alleviates the interference with intelligent prediction caused by complex image backgrounds and by reduced contrast due to environmental changes. The step of blindly restoring low-quality images is skipped, and a parallel target detection structure is added to the multi-stage detection, which provides a new approach to high-quality deep learning detection. Improving the detection rate of defects that occupy a small proportion of the image generally means improving the detection rate of critical defects, so the method can greatly promote the practical application of deep learning in the industrial field.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

FIG. 1 is a cascaded detection structure of a predictive architecture;

FIG. 2 is a diagram of a parallel target detection logical inference mechanism architecture for image pre-processing;

FIG. 3 is a structural diagram of a nth-level model parallel type target detection logical inference mechanism.

Detailed Description

the present disclosure is further described with reference to the following drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Example one

In this embodiment, power transmission and transformation images are used as input, and the targets are defects of power transmission and transformation equipment.

In the cascade target identification method based on deep learning, fig. 1 shows a cascade detection structure of a prediction architecture, fig. 2 shows a parallel target detection logical inference mechanism structure of image preprocessing, and fig. 3 shows an nth-level model parallel target detection logical inference mechanism structure.

The method comprises the following steps:

In this example, unmanned aerial vehicle images of power transmission lines are used as the sample source. Typical power transmission line defect images are acquired by an unmanned aerial vehicle.

This example only illustrates component defects at bolt pins as the sample defect category. Severely blurred and backlit images are manually removed, and higher-quality defect sample images are retained. The detection models MODEL_n are selected with n = 2, MODEL₁ = {model₁₁} and MODEL₂ = {model₂₁}, where model₁₁ is a one-stage (first-level) deep learning network model and model₂₁ is a two-stage (second-level) deep learning network model.

1. Perform data sample label calibration. The calibration categories for training the model₁₁ algorithm model comprise equipment parts that include pins, such as tower connecting hanging plates, insulator hanging points and conductor/ground wire hanging points. The calibration categories for training the model₂₁ algorithm model comprise defects such as bolt missing pin, bolt missing nut, missing bolt, and missing bolt and nut.

2. Expand the number of sample images and labels obtained in step 1. The data sample expansion methods can be image rotation, image mirroring, image defogging, image enhancement, image blurring and the like. The methods are not limited to these; other image processing methods that change the data quality can also be used for sample expansion, provided that the defect can still be identified manually in the processed image. When a single image is expanded, n expansion methods are selected by random sampling to expand the sample n times, where the specific number can be determined according to the number of samples.
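Since both images and labels are expanded, geometric operations must also move the annotated boxes, while photometric operations leave them in place. The sketch below shows only the label side of the random selection of n expansion methods; the method names, the `(x1, y1, x2, y2)` box format and the choice of a 90-degree clockwise rotation are assumptions for illustration, and the pixel-level operations themselves are not shown.

```python
import random
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) label format


def mirror_box(box: Box, width: int) -> Box:
    """Horizontal mirror: x coordinates flip around the image width."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


def rotate90_box(box: Box, height: int) -> Box:
    """Rotate 90 degrees clockwise; the rotated image is `height` pixels wide."""
    x1, y1, x2, y2 = box
    return (height - y2, x1, height - y1, x2)


def identity_box(box: Box) -> Box:
    """Photometric operations (defogging, enhancement, blurring) keep boxes unchanged."""
    return box


def expand_labels(boxes: List[Box], width: int, height: int, n: int,
                  rng: random.Random) -> List[Tuple[str, List[Box]]]:
    """Draw n distinct expansion methods at random and transform the labels for each."""
    methods: Dict[str, Callable[[Box], Box]] = {
        "mirror": lambda b: mirror_box(b, width),
        "rotate90": lambda b: rotate90_box(b, height),
        "defog": identity_box,
        "enhance": identity_box,
        "blur": identity_box,
    }
    chosen = rng.sample(sorted(methods), k=n)
    return [(name, [methods[name](b) for b in boxes]) for name in chosen]


expanded = expand_labels([(10, 20, 30, 40)], width=100, height=50, n=3,
                         rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which helps when the expanded set has to be regenerated for retraining.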

3. The expanded data from step 2 and the model₁₁ labels are used to train the MODEL₁ deep learning detection model. The original and expanded images are cropped according to the MODEL₁ training labels to obtain the training images for the MODEL₂ detection model, and the defect labels are recomputed within the cropped images to serve as the labels for MODEL₂ training.
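Recomputing the defect labels inside a cropped image amounts to clipping each defect box to the part box and shifting it to the crop origin. A minimal sketch, assuming `(x1, y1, x2, y2)` pixel boxes; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) in pixels


def remap_to_crop(defect: Box, part: Box) -> Optional[Box]:
    """Express a defect box in the coordinate frame of the cropped part box.

    Returns None when the defect does not overlap the part region.
    """
    px1, py1, px2, py2 = part
    dx1, dy1, dx2, dy2 = defect
    # Clip the defect box to the part region.
    x1, y1 = max(dx1, px1), max(dy1, py1)
    x2, y2 = min(dx2, px2), min(dy2, py2)
    if x1 >= x2 or y1 >= y2:
        return None
    # Shift so that the top-left corner of the crop becomes the origin.
    return (x1 - px1, y1 - py1, x2 - px1, y2 - py1)


def build_stage2_labels(parts: List[Box],
                        defects: List[Box]) -> List[Tuple[Box, List[Box]]]:
    """For each MODEL_1 part box, collect the defect boxes that fall inside its crop."""
    samples = []
    for part in parts:
        remapped = []
        for defect in defects:
            box = remap_to_crop(defect, part)
            if box is not None:
                remapped.append(box)
        samples.append((part, remapped))
    return samples
```

For example, a defect at `(120, 130, 140, 150)` inside a part box `(100, 100, 200, 200)` becomes `(20, 30, 40, 50)` in the cropped training image.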

4. MODEL₁ detection model training: using the SSD detection algorithm under the Caffe framework, design the anchor box sizes according to the proportion of the target in the image, and train detection of the equipment at connection points on the first-level model samples; the backbone used for training is of the VGG type. The model with the best detection performance is selected through testing and used as the model₁₁ detection model. In the SSD detection algorithm, the overall target loss function is the weighted sum of the localization regression loss and the category confidence loss.

The specific formula is:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of matched default boxes, and if N = 0 the loss is set to 0; x is an indicator that equals 1 when the IoU between a default box and a ground-truth box of a given category exceeds 0.5, and 0 otherwise; l is the predicted box, g is the ground-truth box, and α is the weight balancing the confidence loss and the localization loss. $L_{conf}(x, c)$ is the category confidence loss, a softmax loss over the multi-class confidences c. $L_{loc}(x, l, g)$ is the localization regression loss, computed as the smooth-L1 loss between the parameters of the predicted box l and the ground-truth box g; similar to Faster R-CNN, it regresses the offsets of the center of the default bounding box and of its width and height.
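The weighted sum of confidence and localization losses can be checked numerically with a few lines of plain Python. This is a simplified sketch: it sums the confidence loss over all supplied default boxes and leaves out the hard negative mining used in practical SSD training, class index 0 is assumed to mean background, and the function and argument names are illustrative.

```python
import math
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Smooth-L1: quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax loss for one default box, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def ssd_loss(cls_logits: Sequence[Sequence[float]], cls_targets: Sequence[int],
             loc_preds: Sequence[Sequence[float]], loc_targets: Sequence[Sequence[float]],
             alpha: float = 1.0) -> float:
    """(L_conf + alpha * L_loc) / N, where N counts the matched (non-background) boxes."""
    n_matched = sum(1 for t in cls_targets if t != 0)
    if n_matched == 0:
        return 0.0  # the loss is set to 0 when no default box is matched
    l_conf = sum(softmax_cross_entropy(z, t) for z, t in zip(cls_logits, cls_targets))
    # Localization loss is accumulated only over matched default boxes.
    l_loc = sum(
        smooth_l1(p - g)
        for pred, gt, t in zip(loc_preds, loc_targets, cls_targets) if t != 0
        for p, g in zip(pred, gt)
    )
    return (l_conf + alpha * l_loc) / n_matched
```

With a single matched box, uniform logits over two classes and offset errors of 0.5 and 2.0, the result is ln 2 + 0.125 + 1.5.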

5. MODEL₂ detection model training: using the Faster R-CNN two-stage detection algorithm under the Caffe framework, design the anchor box sizes of the RPN according to the proportion of the target defects in the model₂₁ sample images. Training of the model₂₁ detection model is accomplished by alternately training the RPN and the Fast R-CNN network while sharing network parameters. The backbone used by Fast R-CNN is the residual network ResNet-101. The model with the best detection performance is selected through testing and used as the model₂₁ detection model.

The RPN recommends candidate regions. Anchor boxes with the maximum IoU with a ground-truth box are defined as positive examples, and anchor boxes whose IoU with the ground truth is below 0.3 are defined as negative examples.

The Fast R-CNN network takes samples with IoU greater than 0.5 as positive samples and samples with IoU in the interval [0.1, 0.5) as negative samples. The RPN and the Fast R-CNN network are trained with the same target loss function.

IoU: the ratio of the intersection to the union of the areas of image rectangle T and image rectangle G, that is, $IoU = \frac{area(T \cap G)}{area(T \cup G)}$.
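The IoU definition and the Fast R-CNN sample thresholds stated above can be written out as follows. This is a minimal sketch assuming axis-aligned `(x1, y1, x2, y2)` boxes; the label strings, including "ignored" for proposals that fall in neither interval, are illustrative names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(t: Box, g: Box) -> float:
    """Intersection over union of rectangles T and G."""
    ix1, iy1 = max(t[0], g[0]), max(t[1], g[1])
    ix2, iy2 = min(t[2], g[2]), min(t[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_t = (t[2] - t[0]) * (t[3] - t[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    union = area_t + area_g - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal: Box, truths: Sequence[Box]) -> str:
    """Label a proposal for Fast R-CNN training using the thresholds in the text."""
    best = max((iou(proposal, g) for g in truths), default=0.0)
    if best > 0.5:
        return "positive"
    if 0.1 <= best < 0.5:
        return "negative"
    return "ignored"
```

Two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square, so their IoU is 1/7 and the proposal is labeled negative.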

The target loss function is defined as:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $p_i$ is the predicted probability that the i-th anchor is foreground; $p_i^*$ is 1 when the i-th anchor is foreground and 0 otherwise; $t_i$ denotes the coordinates of the predicted bounding box and $t_i^*$ the coordinates of the ground-truth box. The classification loss $L_{cls}$ is the log loss over two classes (object and non-object). The regression loss uses

$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$$

where R is the smooth-L1 loss function.

The term $p_i^* L_{reg}$ means that the regression loss is activated only for positive anchors ($p_i^* = 1$) and is disabled otherwise ($p_i^* = 0$).

The outputs of the cls and reg layers consist of $\{p_i\}$ and $\{t_i\}$ respectively. The two terms are normalized by $N_{cls}$ and $N_{reg}$ and weighted by a balance parameter λ.
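The combination of a two-class log loss with a smooth-L1 regression term that is active only for foreground anchors can be sketched in plain Python. The function name and argument layout are assumptions for illustration; the text does not give a value for the balance parameter, so it is passed in explicitly.

```python
import math
from typing import Sequence


def smooth_l1(x: float) -> float:
    """R in the text: quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def detection_loss(p: Sequence[float], p_star: Sequence[int],
                   t: Sequence[Sequence[float]], t_star: Sequence[Sequence[float]],
                   n_cls: float, n_reg: float, lam: float) -> float:
    """Normalized classification loss plus lam times normalized regression loss."""
    eps = 1e-12  # guards against log(0)
    cls_sum = 0.0
    reg_sum = 0.0
    for p_i, ps_i, t_i, ts_i in zip(p, p_star, t, t_star):
        p_i = min(max(p_i, eps), 1.0 - eps)
        # Log loss over two classes (object vs. non-object).
        cls_sum += -(ps_i * math.log(p_i) + (1 - ps_i) * math.log(1.0 - p_i))
        # Multiplying by ps_i disables the regression term for background anchors.
        reg_sum += ps_i * sum(smooth_l1(a - b) for a, b in zip(t_i, ts_i))
    return cls_sum / n_cls + lam * reg_sum / n_reg
```

With one foreground and one background anchor, both predicted at probability 0.5, only the foreground anchor contributes to the regression term regardless of how far off the background box is.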

6. Finally, design the cascade detection structure: the MODEL₁ and MODEL₂ detection models are cascaded into a whole to detect defects in images collected by the unmanned aerial vehicle.

6.1. Read the image of the equipment to be detected from local storage, then perform image preprocessing; the preprocessing methods include image rotation, image mirroring, image defogging, image enhancement, image deblurring and other methods.

6.2. Input the original image and the preprocessed multi-path images into the MODEL₁ detection model for prediction. The basic computation flow follows the parallel target detection logical inference structure with image preprocessing shown in FIG. 2, yielding the coordinate regions and categories detected by MODEL₁.

6.3. Crop the original image and the preprocessed image data according to the coordinate boxes detected by MODEL₁ in 6.2, and input the cropped multi-path data into the MODEL₂ detection model; the basic flow is shown in FIG. 2. The equipment defect results for the inspection image are finally obtained.
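When the original image and its preprocessed versions are detected in parallel, boxes found on geometrically transformed paths have to be mapped back to the original frame before overlapping boxes are filtered. The sketch below illustrates this for a horizontally mirrored path followed by non-maximum suppression. The path names, the per-class suppression and the 0.5 overlap threshold are assumptions, not values taken from the disclosure.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)
Det = Tuple[Box, str, float]             # (box, class label, confidence)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], thresh: float = 0.5) -> List[Det]:
    """Greedy per-class non-maximum suppression, highest confidence first."""
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        if all(det[1] != k[1] or iou(det[0], k[0]) <= thresh for k in kept):
            kept.append(det)
    return kept


def merge_paths(per_path: Dict[str, List[Det]], width: float,
                thresh: float = 0.5) -> List[Det]:
    """Merge detections from several preprocessing paths into one filtered result."""
    merged: List[Det] = []
    for name, dets in per_path.items():
        for (x1, y1, x2, y2), label, score in dets:
            if name == "mirror":
                # Undo the horizontal flip so boxes from all paths are comparable.
                x1, x2 = width - x2, width - x1
            merged.append(((x1, y1, x2, y2), label, score))
    return nms(merged, thresh)


paths = {
    "original": [((10, 10, 50, 50), "pin", 0.9)],
    "mirror": [((250, 10, 290, 50), "pin", 0.8)],    # same object seen in the flipped image
    "enhance": [((200, 200, 240, 240), "pin", 0.7)],
}
final = merge_paths(paths, width=300)
```

Photometric paths such as enhancement or defogging need no coordinate correction, so only the geometric paths carry an inverse mapping; rotated paths would need their own inverse in the same place.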

Example two

The difference from the above embodiment is that the input data of this embodiment is an inspection image of a predetermined area acquired by the unmanned aerial vehicle, and the target to be identified is a regional disaster (such as debris flow, fire, etc.).

Acquire an inspection image of the region to be observed and perform image preprocessing; the preprocessing methods include image rotation, image mirroring, image defogging, image enhancement, image deblurring and other methods.

Perform image expansion.

Image expansion can be performed either by traversing all of the image processing methods or by selecting image processing methods at random according to a probability.

Construct the MODEL₁ detection model to detect environmental target regions in the original image to be detected and in the preprocessed images. Filter overlapping boxes in the multi-image parallel detection results using non-maximum suppression to obtain the primary detection result, which gives a preliminary localization of the salient environmental region and yields the MODEL₁ environmental target region.

Construct the MODEL₂ detection model. The MODEL₁ environmental target region is cropped from the original image and from the preprocessed images, and the crops are input into the MODEL₂ detection model. The MODEL₂ detection network detects the environmental regions and their types, overlapping boxes in the multi-image parallel prediction results are filtered using non-maximum suppression, and the final disaster detection result is obtained.

A corresponding product example is provided:

the cascade target recognition system based on deep learning comprises:

a model construction and training module configured to perform MODEL₁ deep learning target region detection model training and MODEL₂ deep learning target type detection model training;

the target sample acquisition module is configured to acquire a sample to be detected of the inspection image, preprocess the sample, input the multi-path samples into the MODEL₁ detection algorithm, and perform primary positioning and classification of the target salient region to obtain multi-path primary target regions and primary target types;

an identification module configured to input the multi-path MODEL₁ target regions into the MODEL₂ detection algorithm, realize refined intelligent prediction on the MODEL₁ target regions, infer the MODEL₂ target regions and determine the target types of the MODEL₂ target regions, finally obtaining the regions and types of the targets.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims

1. A cascade target identification method based on deep learning, characterized in that the method comprises the following steps:

acquiring a training sample of the inspection image, carrying out target frame marking on a target detection sample, and expanding the number of samples;

respectively carrying out multistage deep learning target region detection algorithm model training;

acquiring a sample to be detected of a patrol image, preprocessing the sample, inputting a plurality of paths of samples into a next-stage detection algorithm, and performing stage-by-stage positioning and classification on a target salient region, wherein each stage acquires a plurality of paths of target regions and target types;

and inputting the target area of the previous stage into a detection algorithm of the next stage, realizing step-by-step iterative intelligent prediction, reasoning and predicting the tiny target area, classifying the target type, and finally obtaining the area and the category of the target.

2. The method of claim 1 for cascade object recognition based on deep learning, wherein: sample expansion is performed on the acquired image by a sample expansion method including, but not limited to, at least one of image rotation, mirroring, defogging, enhancement and deblurring image processing operations.

3. The method of claim 1 for cascade object recognition based on deep learning, wherein: cascade detection is performed by using a multi-stage detection algorithm to form an integral algorithm cascade structure.

4. The method of claim 1 for cascade object recognition based on deep learning, wherein: each stage of the multi-stage detection algorithm selects one or more detection models to realize the stage-by-stage detection of the target salient region and the type of the image.

5. The method of claim 1 for cascade object recognition based on deep learning, wherein: the multi-stage detection algorithm can select a one-stage or two-stage deep learning algorithm structure and is used for realizing the detection of the target area and the type of the inspection image.

6. The method of claim 3 for cascade object recognition based on deep learning, wherein: and performing multi-stage network model prediction by using the multi-path images after image processing, and executing a non-maximum value suppression method in each stage of prediction to determine a final detection result.

7. The method of claim 1 for cascade object recognition based on deep learning, wherein: the content predicted by the detection algorithm is the shape deviation degree and the class confidence degree of the target class.

8. A cascade target recognition system based on deep learning, characterized in that the system comprises:

the model building and training module is configured to select a first-level or second-level deep learning target region detection algorithm model for training;

the target sample acquisition module is configured to acquire a sample to be detected of the inspection image, preprocess the sample, input a plurality of paths of samples into a next-stage detection algorithm, perform stage-by-stage positioning and classification on a target salient region, and acquire a plurality of paths of target regions and target types;

and the identification module is configured to input the multipath previous-stage target areas into a next-stage detection algorithm, realize step-by-step iterative intelligent prediction, predict and infer the final target area, classify the target type and finally acquire the area and the category of the target.

9. A computer-readable storage medium characterized by: a plurality of instructions stored therein, the instructions being adapted to be loaded by a processor of a terminal device and to perform all or part of the steps of the deep learning based cascaded object recognition method of any one of claims 1-7.

10. A terminal device, characterized in that: the terminal device comprises a processor and a computer readable storage medium, wherein the processor is used for implementing instructions; the computer readable storage medium is used for storing a plurality of instructions, which are suitable for being loaded by a processor and executing all or part of the steps of the deep learning based cascade target identification method of any one of claims 1-7.