CN114596244A - Infrared image identification method and system based on visual processing and multi-feature fusion


Info

Publication number
CN114596244A
Authority
CN
China
Prior art keywords
infrared image
feature fusion
visual processing
layer
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011404625.6A
Other languages
Chinese (zh)
Inventor
胥明凯
何峰
胡旭冉
刘斌
慕世友
任志刚
周大洲
黄锐
郭锐
王海鹏
张德才
鲍新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Intelligent Technology Co Ltd
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Intelligent Technology Co Ltd
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Intelligent Technology Co Ltd, Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Intelligent Technology Co Ltd
Priority to CN202011404625.6A priority Critical patent/CN114596244A/en
Publication of CN114596244A publication Critical patent/CN114596244A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration using histogram techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an infrared image recognition method and system based on visual processing and multi-feature fusion. An image preprocessing technology is used to process the collected infrared images so as to filter the interference of the background and other factors on equipment recognition; a target detection model based on VGG-Net multi-feature fusion then performs target detection on the infrared images and locates the positions of the electric power equipment in them; finally, the types of the located electric power equipment are recognized with a trained equipment classification model.

Description

Infrared image identification method and system based on visual processing and multi-feature fusion
Technical Field
The invention belongs to the technical field of inspection robots and computer vision, and relates to an infrared image identification method and system based on visual processing and multi-feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Research on detecting electrical equipment and diagnosing faults with infrared thermal imaging is being carried out vigorously at home and abroad, and the technology is already widely used in conventional substation inspection robot systems. Infrared thermal imaging is a non-contact, passive measurement technique. It can detect and diagnose a large number of internal and external defects of electrical equipment; in particular, it can diagnose many faults that are inconvenient or impossible to detect with conventional test methods. However, to the inventors' knowledge, fault diagnosis of substation electrical equipment by inspection robots carrying infrared thermal imaging sensors still has some shortcomings, mainly the following:
1) Infrared thermal imaging reflects temperature field information, in which the shape of an object can be severely distorted. The technology alone cannot effectively locate and identify substation equipment in an infrared image, so the correspondence between the temperature field information and the electrical equipment and its parts cannot be accurately mapped in the image;
2) A large amount of infrared image data has been accumulated for intelligent infrared image identification of substation equipment, but traditional machine learning algorithms cannot make effective use of these data resources, so fault diagnosis of substation equipment remains ineffective;
3) Infrared image analysis usually yields the temperature information of a whole region; the accuracy of the resulting judgment depends on equipment positioning, the related calibration methods still need improvement, and an accurate target positioning and identification method for electrical equipment is needed to reduce the interference of background information.
Infrared images differ from visible-light images, and some existing methods for device detection and localization in visible-light images cannot be applied directly to infrared images. The gray-level distribution of an infrared image is lower overall and more concentrated, and random interference from the surrounding environment together with imperfections of the thermal imaging system reduces its signal-to-noise ratio and contrast. Identifying power equipment from infrared images therefore places high demands on the target detection and localization algorithm, and traditional visual recognition algorithms cannot meet the equipment identification requirements in such complex scenes.
Disclosure of Invention
In order to solve the above problems, the invention provides an infrared image identification method and system based on visual processing and multi-feature fusion.
According to some embodiments, the invention adopts the following technical scheme:
an infrared image recognition method based on visual processing and multi-feature fusion comprises the following steps:
processing the collected infrared images with an image preprocessing technology to filter the interference of the background and other factors on equipment identification; performing target detection on the processed infrared images with a target detection model based on multi-feature fusion to locate the positions of the power equipment; and identifying the types of the located power equipment with a trained equipment classification model.
As an alternative embodiment, the method further comprises diagnosing defects of the classified electric equipment with a trained second classification model and identifying the state of the electric equipment.
As an alternative embodiment, the specific process of processing the acquired infrared image by using the image preprocessing technology includes:
setting R, G, B three-channel gray levels for the collected infrared image of the power equipment, and calculating the normalized histogram of each of the three channels;
respectively carrying out equalization processing on three channels of the image;
and merging the channels of the images respectively equalized by the three channels to obtain a color image, thereby obtaining the final processing result of the infrared image.
As an alternative embodiment, the specific process of the target detection model performing target detection on the processed image includes:
uniformly scaling the training samples to a set size, marking the peripheral frame of the complete target to be detected appearing in each picture in the training set, recording the coordinate position of the peripheral frame, and giving a category label;
performing data enhancement on the training sample;
based on an SSD network, replacing the fully connected layers with convolution layers whose feature maps have different sizes, and constructing a feature pyramid from these feature maps by deconvolution to form a multi-scale feature-fusion target detection model;
and training the target detection model, and detecting the target to be detected by using the trained model.
By way of further limitation, the specific process of forming the multi-scale feature fused target detection model comprises: replacing 3 fully-connected layers with 2 convolutional layers by using VGG-16 as a basic network structure, and halving the resolution of Conv7 by using a convolutional kernel; and constructing a feature pyramid by deconvoluting the Conv4, Conv5, Conv6 and Conv7 feature maps to achieve the multi-scale target positioning detection effect.
By way of further limitation, the specific process of forming the multi-scale feature fused target detection model further comprises: generating target prior on the feature map of each scale, reducing a sample search space, and judging whether the region contains a target or not; the detector of each scale combines with the target prior to generate all detection results, and a non-maximum suppression algorithm is used for screening all detection results.
As an alternative embodiment, the method for constructing the device classification model includes:
building convolution layers, and setting the sizes of convolution kernels of the three convolution layers;
building pooling layers, wherein one pooling layer and a PReLU activation function are connected behind each convolution layer; the first two pooling layers use max pooling with a 3 × 3 window, and the last pooling layer uses max pooling with a 2 × 2 window;
and building a full connection layer, setting the output characteristic dimension of a first full connection layer and the output characteristic dimension of a second full connection layer, and respectively representing the category predicted values of the input images.
As an alternative embodiment, the method for constructing the second classification model includes:
building a convolution layer and a pooling layer; the input layer of the network is first followed by a convolution layer and a pooling layer, wherein the convolution kernel of the convolution layer is 3 × 3 and the pooling layer uses max pooling;
building the Inception modules; each Inception module performs convolution operations on the input using 3 filters of different sizes and performs maximum pooling, an additional 1 × 1 convolution layer is added before the 3 × 3 convolution layer, and the outputs of all sub-layers are then concatenated and passed to the next Inception module;
a Batch Normalization layer is added after each Inception module.
An infrared image recognition system based on visual processing and multi-feature fusion, comprising:
the preprocessing module is configured to process the acquired infrared image by utilizing an image preprocessing technology;
and the target detection and identification module is configured to construct a target detection model based on multi-feature fusion, perform target detection on the processed infrared image, position the power equipment in the infrared image, and identify the type of the positioned power equipment by using the trained equipment classification model.
A computer readable storage medium, wherein a plurality of instructions are stored, the instructions are suitable for being loaded by a processor of a terminal device and executing the infrared image recognition method based on visual processing and multi-feature fusion.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the infrared image recognition method based on visual processing and multi-feature fusion.
An inspection device comprises a processor and a computer readable storage medium, wherein the processor is used for realizing instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the infrared image recognition method based on visual processing and multi-feature fusion.
The inspection equipment provided by the invention comprises but is not limited to an inspection robot, an inspection unmanned aerial vehicle, an automatic inspection vehicle and the like.
Compared with the prior art, the invention has the beneficial effects that:
The invention innovatively provides an infrared image identification method based on visual processing and multi-feature fusion: several convolution layers are constructed whose feature maps have different sizes, a target prior is generated on the feature map of each scale, and the detection results are screened;
According to the method, targets are located in the infrared image with a single-stage target detection algorithm, and the accuracy of target localization is improved by fusing multiple feature-extraction layers. Performing target detection on the infrared image with the multi-feature-fusion SSD detection model takes both detection speed and precision into account, improving the real-time performance and reliability of target detection, reducing manual effort and effectively shortening the detection time.
The invention innovatively combines computer vision and deep learning techniques to construct lightweight infrared image equipment identification and diagnosis models that identify the type and the defects of the equipment respectively, effectively improving the accuracy of infrared image recognition. The method can be practically applied to inspection image recognition tasks in power scenes and addresses the problem of interfering targets being mis-identified by infrared image recognition techniques.
The invention provides a visual equalization processing and noise filtering algorithm, which is used for filtering the interference of the background and other factors on equipment identification, reducing the influence of infrared images with different qualities on target identification, greatly improving the precision of infrared image equipment positioning and identification and solving the problems of poor infrared image quality and overall low gray level distribution.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it.
FIG. 1 is an SSD infrared imaging device location network;
FIG. 2 is a diagram of a cnn-net1 network architecture;
fig. 3 is a diagram of a cnn-net2 network architecture.
Detailed Description
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
An infrared image recognition method based on visual processing and multi-model fusion comprises the following steps:
(1) acquiring an infrared image of substation equipment, calibrating a target area in the image, analyzing the background characteristic of the image, and performing filtering processing on the image by adopting a histogram equalization technology to improve the contrast of the image;
(2) in order to solve the problem of target positioning in the infrared image, a multi-target detection algorithm (SSD) based on deep learning is selected to position a target, and the accuracy of target positioning is improved in a multi-feature extraction network fusion mode;
(3) a convolutional neural network classification model cnn-net1 is designed to identify the type of the power equipment;
(4) a convolutional neural network classification model cnn-net2 is designed to diagnose defects of the power equipment;
(5) the cnn-net1 and cnn-net2 models are trained on the prepared infrared image equipment samples: the network hyper-parameters are set and the training script is run to obtain the trained models.
In step (1), the environments in which power equipment appears in the infrared images collected by the substation inspection robot vary greatly; complex backgrounds, interfering light sources and backgrounds close to the target strongly affect target identification and hamper later equipment positioning and detection. The acquired equipment images are therefore filtered by histogram equalization to reduce, as far as possible, the interference of the image background with equipment identification. Histogram equalization (also known as histogram flattening) is essentially a non-linear stretching of the image that redistributes pixel values so that the number of pixels in each gray-level range is approximately equal. As a result, the contrast near the peaks in the middle of the original histogram is enhanced while the contrast near the valleys on both sides is reduced, and the histogram of the output image becomes a flatter, segmented histogram; if the number of output segments is small, a visually coarse classification effect results. An image of an equipment indicator light in a machine-room scene is sampled and processed by histogram equalization, specifically as follows:
(1-1) For the collected infrared image of the device, assume the ranges of the three channel gray levels (intensity levels) are [0, L_R - 1], [0, L_G - 1] and [0, L_B - 1] respectively, where the dimension of the image is M × N × 3 and MN is the total number of pixels per channel. The normalized histograms of the R, G, B channels of the image are first obtained and are expressed by the following equations:

p_R(k) = n_k / (MN), k = 0, 1, 2, …, L_R - 1
p_G(k) = n_k / (MN), k = 0, 1, 2, …, L_G - 1
p_B(k) = n_k / (MN), k = 0, 1, 2, …, L_B - 1

In the above formulas, p_R, p_G and p_B represent the normalized histograms of the R, G and B channels of one picture respectively, and n_k is the number of pixels whose gray level is k.

(1-2) The three channels of the image are equalized separately; the process is the standard histogram-equalization transform, represented by the following formulas:

S_R(k) = (L_R - 1) · Σ_{j=0}^{k} p_R(j), k = 0, 1, 2, …, L_R - 1
S_G(k) = (L_G - 1) · Σ_{j=0}^{k} p_G(j), k = 0, 1, 2, …, L_G - 1
S_B(k) = (L_B - 1) · Σ_{j=0}^{k} p_B(j), k = 0, 1, 2, …, L_B - 1

In the above formulas, S_R represents the result of equalizing the R channel of the image, S_G the result of equalizing the G channel, and S_B the result of equalizing the B channel.
(1-3) The channels of the three separately equalized images are merged to obtain a color image, which is the final processing result of the infrared image.
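By way of illustration, the per-channel equalization of steps (1-1) to (1-3) can be sketched as follows; OpenCV is assumed as the implementation library and the file names are placeholders, since the embodiment does not prescribe a particular implementation.

```python
import cv2

# Read the collected infrared image of the device (path is a placeholder).
img = cv2.imread("infrared_device.jpg")  # OpenCV loads channels in B, G, R order

# Steps (1-1)/(1-2): split the three channels and equalize each histogram separately.
b, g, r = cv2.split(img)
b_eq = cv2.equalizeHist(b)
g_eq = cv2.equalizeHist(g)
r_eq = cv2.equalizeHist(r)

# Step (1-3): merge the equalized channels back into a color image.
result = cv2.merge((b_eq, g_eq, r_eq))
cv2.imwrite("infrared_device_equalized.jpg", result)
```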
In step (2), taking the characteristics of infrared images and the development of current convolutional neural network models into account, the infrared image device localization model is designed as an SSD-based network model. SSD detects objects in an image with a single deep neural network that discretizes the output space of bounding boxes into a set of default boxes. In the SSD network structure, a series of boxes of fixed sizes, called default boxes, are arranged on feature maps of different scales and are used to frame the positions of target objects; during network training, a ground truth is assigned to each fixed box. The core of the SSD method is the use of convolution filters to predict class scores and position offsets for a fixed set of default boxes on the feature maps of a forward-propagating CNN. The model also offers high accuracy and a high detection speed and is a real-time target detection method. Unlike mainstream region-proposal-based target detection methods, it does not require region proposals; the whole image or video frame can be put directly into the convolutional neural network, which directly regresses the position and class of the target.
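To illustrate the default-box mechanism described above, the following minimal sketch lays boxes of fixed sizes over a feature map grid; the scale and aspect ratios are placeholder assumptions, not the values used by the SSD model of this embodiment.

```python
import itertools
import math

def default_boxes(feature_map_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) default boxes, normalized to [0, 1], for one
    feature map scale, in the spirit of the SSD scheme described above."""
    fm_h, fm_w = feature_map_size
    boxes = []
    for i, j in itertools.product(range(fm_h), range(fm_w)):
        cx = (j + 0.5) / fm_w          # box center at each feature map cell
        cy = (i + 0.5) / fm_h
        for ar in aspect_ratios:       # several fixed box shapes per cell
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes

# Example: default boxes for a 38 x 38 feature map (illustrative values only).
priors = default_boxes((38, 38), scale=0.1)
```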
For example, the specific steps of establishing the indicator lamp positioning model through the SSD include:
(2-1) Preparing a training data set. The data set mainly comprises infrared images shot during robot inspection, and the pictures collected by the robot have a high resolution. The training set contains 3000 training images in total, each of size 6000 × 4000. The training samples are first scaled uniformly to 1200 × 800; then, for each complete indicator-light target appearing in each picture of the training set, its peripheral box is marked, its coordinate position is recorded, and a category label is given.
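A small helper of the kind that could be used in step (2-1) to rescale the recorded peripheral-box coordinates when the pictures are scaled from 6000 × 4000 to 1200 × 800; the (xmin, ymin, xmax, ymax) annotation format is an assumption.

```python
def rescale_box(box, orig_size=(6000, 4000), new_size=(1200, 800)):
    """Rescale an (xmin, ymin, xmax, ymax) peripheral box from the original
    image resolution to the uniformly scaled training resolution."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

# Example: a box annotated on a 6000 x 4000 original picture.
print(rescale_box((1500, 1000, 2500, 2200)))  # -> (300.0, 200.0, 500.0, 440.0)
```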
(2-2) Data enhancement. The images in the training set are subjected to data enhancement operations of mirroring, rotation at different angles and scaling at different scales, yielding 20000 training samples in total and enlarging the training sample set.
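A minimal sketch of the augmentation in step (2-2), assuming torchvision; the angles and scales are illustrative, and for detection the annotated peripheral boxes would have to be transformed consistently, which is not shown here.

```python
from torchvision import transforms

# Mirroring, rotation at different angles and rescaling at different scales,
# as described in step (2-2); the parameters below are illustrative only.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=(-15, 15)),
    transforms.RandomResizedCrop(size=(800, 1200), scale=(0.8, 1.0)),
])
```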
(2-3) Designing an infrared image target detection network based on the SSD network. The network is divided into two parts: the first part is the base network for image classification (with its classification layers removed), and the second part consists of multi-scale feature mapping layers for detection, so that targets of different sizes can be detected. The model is trained on whole images and directly optimizes the detection result; box detection is treated as a regression problem, the complex intermediate data-passing stages are omitted, and the method is an end-to-end detection system.
(2-4) Infrared image equipment positioning by multi-layer feature fusion in the SSD network. The algorithm takes VGG-16 as the base network structure. The VGG-16 network comprises 13 convolutional layers and 3 fully connected layers; the 3 fully connected layers are first replaced with 2 convolutional layers, and the resolution of Conv7 is halved using a 2 × 2 convolution kernel. Relative to the input image, the Conv4, Conv5, Conv6 and Conv7 feature maps are 1/8, 1/16, 1/32 and 1/64 of the input size, respectively. A feature pyramid is then constructed by deconvolving the Conv4, Conv5, Conv6 and Conv7 feature maps to achieve multi-scale target localization and detection. Next, a target prior is generated on the feature map of each scale, reducing the sample search space and judging whether a region contains a target. Finally, the detector at each scale is combined with the target prior to generate all detection results, which are screened with the non-maximum suppression (NMS) algorithm.
(2-5) Model training: the images to be trained and the ground truth data acquired in step (2-1) are taken as input, and the network model is trained with supervision on the target regions. The initial weight parameters of the network are obtained from an SSD model pre-trained on the ImageNet data set.
In the step (3), the convolutional neural network classification model cnn-net1 can be used to identify the type of the power equipment. In this embodiment, the network comprises 3 convolutional layers, 3 pooling layers, and two fully-connected layers, and the model size is 391 k.
Since type identification of power equipment is an image classification problem and the task is relatively simple, the requirement can be met without designing a complex network structure. To ensure the speed and accuracy of the algorithm, a small classifier, cnn-net1, is designed to identify the kind of power equipment.
In this embodiment, the building process of the CNN network includes:
(3-1) building convolutional layers, wherein the sizes of convolution kernels of the first two convolutional layers are 3 multiplied by 3, and the size of convolution kernel of the third convolutional layer is 2 multiplied by 2.
(3-2) Building the pooling layers: each convolution layer is followed by one pooling layer and a PReLU activation function. The first two pooling layers use max pooling (3 × 3) and the last one uses max pooling (2 × 2).
(3-3) Building the fully connected layers: the output feature dimension of the first fully connected layer is 128 and that of the second fully connected layer is 2, representing the category prediction values for the input image.
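A minimal sketch of a cnn-net1-style classifier under the constraints of steps (3-1) to (3-3), assuming PyTorch; the channel counts, pooling strides and layer ordering are assumptions, as the embodiment only fixes the kernel sizes, the PReLU activations and the 128/2 fully connected dimensions.

```python
import torch
import torch.nn as nn

class CnnNet1(nn.Module):
    """Sketch of the small cnn-net1 classifier of steps (3-1) to (3-3).
    Channel counts and pooling strides are assumptions; the flattened size
    of 64 assumes a 32 x 32 input crop as produced in step (5-2)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.PReLU(),
            nn.MaxPool2d(kernel_size=3),                 # max pooling (3 x 3)
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.PReLU(),
            nn.MaxPool2d(kernel_size=3),                 # max pooling (3 x 3)
            nn.Conv2d(64, 64, kernel_size=2), nn.PReLU(),
            nn.MaxPool2d(kernel_size=2),                 # max pooling (2 x 2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 128),            # first fully connected layer, 128-d
            nn.PReLU(),
            nn.Linear(128, num_classes),   # second fully connected layer, class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example on a 32 x 32 crop, as produced in step (5-2).
logits = CnnNet1()(torch.randn(1, 3, 32, 32))
```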
In step (4), the convolutional neural network classification model cnn-net2 may be used to identify defects of the power device and determine whether the device has defects. In this embodiment, the network comprises 1 convolution layer, 1 pooling layer, 2 Inception modules and two fully connected layers.
The problem of equipment defect identification based on infrared images involves fine-grained image classification. On the one hand, the differences between different equipment defects in the temperature field are generally very small, so the features that reveal a defect are not obvious; on the other hand, different devices may occlude and overlap each other, which greatly affects defect identification. For this reason, cnn-net2 is designed as a relatively complex network structure with strong classification capability.
In this embodiment, cnn-net2 adopts an Inception structure, a network with a fine local topology: multiple convolution or pooling operations are performed in parallel on the input, and all output results are concatenated into a very deep feature map. Because different convolution and pooling operations extract different information from the input image, processing these operations in parallel and combining all the results yields a better image representation. The construction flow of cnn-net2 is as follows:
(4-1) building a convolution layer and a pooling layer; after the input layer of the network, a convolution layer and a pooling layer are connected, wherein the convolution kernel size of the convolution layer is 3x3, and the pooling layer uses Max pool (3 x 3).
(4-2) Building the Inception modules. Each Inception module performs convolution operations on the input using 3 filters of different sizes (1 × 1, 3 × 3, 5 × 5) and also performs maximum pooling. To reduce the number of input channels and the computational complexity, an additional 1 × 1 convolution layer is added before the 3 × 3 convolution layer, after which the outputs of all sub-layers are concatenated and passed to the next Inception module.
(4-3) Building Batch Normalization layers: to accelerate the convergence of the network and avoid overfitting, a Batch Normalization layer is added after each Inception module.
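A structural sketch of a cnn-net2-style network following steps (4-1) to (4-3), assuming PyTorch; the branch channel counts, the pooling inside the Inception module, the global average pooling and the fully connected dimensions are assumptions.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Sketch of the Inception module of step (4-2): parallel 1x1, 3x3 (with a
    1x1 reduction in front) and 5x5 convolutions plus max pooling, whose
    outputs are concatenated. Branch channel counts are assumptions."""

    def __init__(self, in_ch, out_ch=32):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),              # 1x1 reduction
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.branch_pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,  # concatenate the outputs of all sub-layers
        )

class CnnNet2(nn.Module):
    """Sketch of cnn-net2 of steps (4-1) to (4-3): convolution + pooling, two
    Inception modules each followed by Batch Normalization, then two fully
    connected layers. Dimensions are assumptions."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),     # max pooling (3x3)
        )
        self.inception1 = InceptionBlock(32)      # outputs 3*32 + 32 = 128 channels
        self.bn1 = nn.BatchNorm2d(128)
        self.inception2 = InceptionBlock(128)     # outputs 3*32 + 128 = 224 channels
        self.bn2 = nn.BatchNorm2d(224)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(224, 64), nn.ReLU(),        # first fully connected layer
            nn.Linear(64, num_classes),           # second fully connected layer
        )

    def forward(self, x):
        x = self.stem(x)
        x = self.bn1(self.inception1(x))
        x = self.bn2(self.inception2(x))
        return self.head(x)

# Example on a 32 x 32 crop.
out = CnnNet2()(torch.randn(1, 3, 32, 32))
```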
In the step (5), the concrete steps include:
(5-1) Making infrared imaging device identification samples. The training samples are divided into two parts: one part contains the device type and is used to train cnn-net1, and the other part contains the defect information of the device and is used to train cnn-net2. Both parts are annotated with an image labeling tool.
(5-2) cutting out a target area from the infrared image according to the image marking information, and uniformly zooming to a size of 32 x 32;
(5-3) Training cnn-net1 and cnn-net2 respectively. During training, 60% of the images are used as the training set, 20% as the validation set and 20% as the test set. The CNN networks are trained with stochastic gradient descent (SGD), the batch size is 256, and the network weights are initialized randomly. The initial learning rate is 0.01 and the momentum is 0.9. The network is trained for a total of 100,000 iterations.
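The training setup of step (5-3) could be sketched as follows, assuming PyTorch; the dataset object, loss function and number of epochs are placeholders (the embodiment specifies 100,000 iterations in total).

```python
from torch import nn, optim
from torch.utils.data import DataLoader

def train(model, train_set, epochs=10):
    """Sketch of the training setup of step (5-3): SGD with batch size 256,
    learning rate 0.01 and momentum 0.9. The dataset object and the number
    of epochs are placeholders."""
    loader = DataLoader(train_set, batch_size=256, shuffle=True)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```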
Of course, in other embodiments, the above parameters may be adjusted according to specific situations.
In the above embodiment, the acquired image is processed with an image preprocessing technology to filter the interference of the background and other factors on device identification; target detection is then performed on the infrared image with the multi-feature-fusion SSD target detection model to locate the position of the power device in the image; and the type of the device is identified with the trained device classification model. Finally, the algorithm was tested on a large number of pictures from substation scenes, and the test results show that the method can accurately locate substation equipment in infrared images and identify its type. The method can be practically applied to inspection image recognition tasks in power scenes.
The following product examples are also provided:
an infrared image recognition system based on visual processing and multi-feature fusion, comprising:
a pre-processing module configured to process the acquired image using image pre-processing techniques to filter interference of background and other factors with device identification;
and the target detection and identification module is configured to construct a target detection model based on multi-feature fusion, and perform target detection on the infrared image detected in real time, so as to position the position of the power equipment in the infrared image, and identify the type of the positioned power equipment by using the trained equipment classification model.
A computer readable storage medium, wherein a plurality of instructions are stored, the instructions are suitable for being loaded by a processor of a terminal device and executing the infrared image recognition method based on visual processing and multi-feature fusion.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the infrared image recognition method based on visual processing and multi-feature fusion.
An inspection device comprises a processor and a computer readable storage medium, wherein the processor is used for realizing instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the infrared image recognition method based on visual processing and multi-feature fusion.
The inspection equipment provided by the invention comprises but is not limited to an inspection robot, an inspection unmanned aerial vehicle, an automatic inspection vehicle and the like.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (13)

1. An infrared image identification method based on visual processing and multi-feature fusion is characterized in that: the method comprises the following steps:
the method comprises the steps of processing collected infrared images by using a visual equalization technology, carrying out target detection on the infrared images by using a target detection model based on VGG-Net multi-feature fusion, positioning the position of the power equipment in the infrared images, and identifying the type of the positioned power equipment by using a trained equipment classification model.
2. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 1, wherein: the specific process of processing the acquired infrared image by utilizing the image preprocessing technology comprises the following steps:
and carrying out nonlinear stretching on the infrared image by utilizing histogram equalization, and redistributing image pixel values to ensure that the number of the pixel values in a certain gray scale range is approximately equal.
3. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 1, wherein: the specific process of processing the acquired infrared image by using the image preprocessing technology comprises the following steps:
r, G, B three-channel gray levels are set for the collected infrared images of the power equipment, and normalized histograms of the three channels are calculated;
respectively carrying out equalization processing on three channels of the image;
and merging the channels of the images respectively equalized by the three channels to obtain a color image, thereby obtaining the final processing result of the infrared image.
4. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 1, wherein: the process of utilizing the target detection model based on VGG-Net multi-feature fusion to carry out target detection on the infrared image comprises the following steps:
uniformly scaling the training samples to a set size, marking the peripheral frame of the complete target to be detected appearing in each picture in the training set, recording the coordinate position of the peripheral frame, and giving a category label;
performing data enhancement on the training sample;
based on an SSD network, replacing the full-connection layer with convolution layers, wherein the feature maps of the convolution layers have different sizes, and constructing a feature pyramid by the feature maps in a deconvolution mode to form a multi-scale feature fusion target detection model;
and training the target detection model, and detecting the target to be detected by using the trained model.
5. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 4, wherein: the specific process for constructing the target detection model based on VGG-Net multi-feature fusion comprises the following steps: replacing 3 fully-connected layers with 2 convolutional layers by using VGG-16 as a basic network structure, and halving the resolution of Conv7 by using a convolutional kernel; and constructing a feature pyramid by deconvoluting the Conv4, Conv5, Conv6 and Conv7 feature maps to achieve the multi-scale target positioning detection effect.
6. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 4, wherein: the specific process of constructing the target detection model based on VGG-Net multi-feature fusion further comprises the following steps: generating a target prior on the feature map of each scale, reducing a sample search space, and judging whether the region contains a target; the detector of each scale combines with the target prior to generate all detection results, and a non-maximum suppression algorithm is used for screening all detection results.
7. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 1, wherein: the method for constructing the equipment classification model comprises the following steps:
building convolution layers, and setting the sizes of convolution kernels of the three convolution layers;
building pooling layers, wherein one pooling layer and a PReLU activation function are connected behind each convolution layer, and each of the pooling layers uses max pooling;
and building a full connection layer, setting the output characteristic dimension of a first full connection layer and the output characteristic dimension of a second full connection layer, and respectively representing the category predicted values of the input images.
8. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 1, wherein: the method further comprises diagnosing the defects of the classified electric equipment by using a trained second classification model, and identifying the state of the electric equipment.
9. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 8, wherein: the construction method of the second classification model comprises the following steps:
building a convolution layer and a pooling layer; the input layer of the network is first followed by a convolution layer and a pooling layer, wherein the convolution kernel of the convolution layer is 3 × 3 and the pooling layer uses max pooling;
building cascaded Inception modules;
a Batch Normalization layer is added after each Inception module.
10. The infrared image recognition method based on visual processing and multi-feature fusion as claimed in claim 9, wherein: the specific steps of building the cascaded Inception modules comprise: each Inception module performs convolution operations on the input using 3 filters of different sizes and performs maximum pooling, an additional 1 × 1 convolution layer is added before the 3 × 3 convolution layer, and the outputs of all the sub-layers are then concatenated and passed to the next Inception module.
11. An infrared image recognition system based on visual processing and multi-feature fusion, characterized by comprising:
the preprocessing module is configured to process the acquired infrared image by utilizing an image preprocessing technology;
and the target detection and identification module is configured to perform target detection on the infrared image by using a target detection model based on VGG-Net multi-feature fusion, position the position of the power equipment in the infrared image and identify the type of the positioned power equipment by using a trained equipment classification model.
12. A computer-readable storage medium, characterized in that: a plurality of instructions are stored therein, the instructions being adapted to be loaded by a processor of a terminal device and to execute the infrared image recognition method based on visual processing and multi-feature fusion according to any one of claims 1-10.
13. A terminal device is characterized in that: the system comprises a processor and a computer readable storage medium, wherein the processor is used for realizing instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform a method for infrared image recognition based on visual processing and multi-feature fusion according to any one of claims 1 to 10.
CN202011404625.6A 2020-12-04 2020-12-04 Infrared image identification method and system based on visual processing and multi-feature fusion Pending CN114596244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011404625.6A CN114596244A (en) 2020-12-04 2020-12-04 Infrared image identification method and system based on visual processing and multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011404625.6A CN114596244A (en) 2020-12-04 2020-12-04 Infrared image identification method and system based on visual processing and multi-feature fusion

Publications (1)

Publication Number Publication Date
CN114596244A true CN114596244A (en) 2022-06-07

Family

ID=81802968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011404625.6A Pending CN114596244A (en) 2020-12-04 2020-12-04 Infrared image identification method and system based on visual processing and multi-feature fusion

Country Status (1)

Country Link
CN (1) CN114596244A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937492A (en) * 2022-12-12 2023-04-07 国网四川省电力公司电力科学研究院 Transformer equipment infrared image identification method based on feature identification
CN115937492B (en) * 2022-12-12 2024-07-09 国网四川省电力公司电力科学研究院 Feature recognition-based infrared image recognition method for power transformation equipment
CN117495681A (en) * 2024-01-03 2024-02-02 国网山东省电力公司济南供电公司 Infrared image super-resolution reconstruction system and method
CN117495681B (en) * 2024-01-03 2024-05-24 国网山东省电力公司济南供电公司 Infrared image super-resolution reconstruction system and method

Similar Documents

Publication Publication Date Title
CN111325713B (en) Neural network-based wood defect detection method, system and storage medium
CN106875373B (en) Mobile phone screen MURA defect detection method based on convolutional neural network pruning algorithm
CN111797890A (en) Method and system for detecting defects of power transmission line equipment
CN110084165B (en) Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation
KR102166458B1 (en) Defect inspection method and apparatus using image segmentation based on artificial neural network
CN111784633B (en) Insulator defect automatic detection algorithm for electric power inspection video
CN111257341B (en) Underwater building crack detection method based on multi-scale features and stacked full convolution network
CN109285139A (en) A kind of x-ray imaging weld inspection method based on deep learning
CN112102229A (en) Intelligent industrial CT detection defect identification method based on deep learning
CN115205274A (en) Cloth flaw detection method based on lightweight cascade network
CN112766110A (en) Training method of object defect recognition model, object defect recognition method and device
CN115439458A (en) Industrial image defect target detection algorithm based on depth map attention
CN114596244A (en) Infrared image identification method and system based on visual processing and multi-feature fusion
CN116012291A (en) Industrial part image defect detection method and system, electronic equipment and storage medium
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN116977907A (en) Image recognition-based power inspection fault diagnosis method and system
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN116703885A (en) Swin transducer-based surface defect detection method and system
CN110321867B (en) Shielded target detection method based on component constraint network
CN111160100A (en) Lightweight depth model aerial photography vehicle detection method based on sample generation
CN114863311A (en) Automatic tracking method and system for inspection target of transformer substation robot
CN114549414A (en) Abnormal change detection method and system for track data
KR20230023263A (en) Deep learning-based sewerage defect detection method and apparatus
CN112132839B (en) Multi-scale rapid face segmentation method based on deep convolution cascade network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination