WO2021046951A1 - Image identification method, system, and storage medium

Info

Publication number: WO2021046951A1
Application number: PCT/CN2019/110026
Authority: WO
Inventors: 徐海青; 陈是同; 徐唯耀; 秦浩; 董媛媛; 吴立刚; 王维佳; 余江斌; 梁翀; 宋杰; 王文清; 程琳; 浦正国; 郭庆; 吴小华; 张彬彬; 胡心颖; 胡丁丁
Original assignee: 安徽继远软件有限公司; 国网信息通信产业集团有限公司; 国家电网有限公司
Priority date: 2019-09-09
Filing date: 2019-10-08
Publication date: 2021-03-18
Also published as: CN110705601A

Abstract

Provided in embodiments of the present application are an image identification method, a system, and a computer storage medium. The method comprises: acquiring an original image, and performing data augmentation on the original image on the basis of optimization classification; performing image denoising on the augmented data on the basis of a non-local mean value of an adaptive Gaussian kernel; constructing an anchor refinement module and a target detection module, and performing network structure optimization on the basis of the constructed modules; on the basis of the anchor refinement module and the target detection module, performing machine training on a deep learning neural network model; and importing an image to be diagnosed into the trained neural network model, and performing image data processing analysis and defect diagnosis.

Description

Image recognition method, system and storage medium

Cross-references to related applications

This application is based on, and claims priority to, Chinese patent application No. 201910846731.0, filed on September 9, 2019. The content of that Chinese patent application is hereby incorporated into this application by reference.

Technical field

This application relates to the technical field of equipment maintenance, in particular to an image recognition method, system and storage medium.

Background art

Traditional oil-insulated, oil-filled substation equipment relies on the good insulating properties of insulating oil, which effectively prevents internal short circuits in the equipment. However, owing to multiple complex factors such as manufacturing quality, transportation, installation, and long-term operation, oil-filled equipment can leak during operation. Oil leakage is fairly common and is difficult to check while the equipment is energized; it can only be checked when the equipment is powered off.

There are many reasons for oil leakage of oil-filled equipment in substations. The radiator bleed plugs currently in use generally lack a stop and therefore cannot seal well. In addition, the structural design of equipment from some manufacturers is unreasonable and very easily causes oil leakage defects. Oil leakage is also related to the load the equipment carries: the higher the load, the higher the oil temperature and the lower the viscosity of the insulating oil, which makes leakage more likely. The quality of the components of oil-filled equipment is another main cause of leakage. Improper lifting, transportation, and installation methods can also cause oil leakage, as can the thermal expansion and contraction caused by changes in ambient temperature and load during long-term operation.

In recent years, China's power system has vigorously promoted the upgrading of substation equipment, but not all substations have been upgraded, and most of the transformers and other equipment still used in those substations are of older designs. Upgrading all substation equipment would require huge manpower, material, and financial resources, and would also have a certain impact on the normal operation of the power grid.

At present, oil leakage defects of substation equipment (including at least the oil-filled equipment of the substation) are mainly inspected manually, and repairs are carried out after a leak is found. With the development of computer technology and image processing technology, artificial intelligence has come into wide use. How to apply deep learning technology effectively to the detection of oil leakage defects in substation equipment is a problem that urgently needs to be solved.

Summary of the invention

To overcome the above shortcomings of the prior art, the embodiments of the present application provide an image recognition method, system, and storage medium. By intelligently identifying and diagnosing inspection images of substation equipment, the speed and precision of diagnosing oil leakage in substation equipment are improved, manual inspection for oil leakage is avoided, and manpower and material resources are greatly saved.

The embodiment of the present application provides an image recognition method, including:

Obtain the original image, and perform data augmentation based on the optimized classification for the original image;

For the augmented data, perform image denoising based on the non-local mean of the adaptive Gaussian kernel;

Construct the anchor refinement module and target detection module, and optimize the network structure based on the constructed module;

Based on the anchor refinement module and the target detection module, perform deep learning neural network model machine training;

Import the image to be diagnosed into the trained neural network model for image data processing analysis and defect diagnosis.

In the above solution, the data augmentation based on the optimized classification for the original image includes:

Calculate the classification accuracy rates of all classes in a test set, where the test set is a set of at least two original images, and classify the at least two original images;

Retrieve the category with the lowest correct rate in the test set;

Data augmentation is performed on the retrieved category with the lowest accuracy rate.

In the above solution, for the augmented data, performing image denoising based on the non-local mean of the adaptive Gaussian kernel includes:

Based on the adaptive Gaussian kernel framework, set the direction angle θ_i as the main direction of the image;

Set the main directions of two original images N_i and N_j as θ_i and θ_j, and set Δθ_{i,j} as the difference between the direction angles θ_i and θ_j;

If the value of Δθ_{i,j} is an integer multiple of 90, exchange pixels in the original image N_j to perform image rotation conversion;

If the value of Δθ_{i,j} is not an integer multiple of 90, expand the image size of the original image N_j to a new image neighborhood, and perform image rotation conversion;

If the value of Δθ_{i,j} does not exceed a predetermined threshold, such as 2, the original images N_i and N_j are matching images, and image rotation conversion is not performed.

In the above solution, the construction of the anchor refinement module and the target detection module to optimize the network structure based on the constructed module includes:

Adjust the position and size of anchors based on the anchor refinement module;

The regression operation is performed through the target detection module, and the accurate target position and size of the anchors are obtained.

In the above solution, the anchor refinement module is used to adjust the position and size of the anchors, and the target detection module is used to perform the regression operation to obtain the accurate target position and size of the anchors, including:

Generate n refined anchor window boxes based on the divided image feature map units;

Through the generated n refined anchor boxes, the anchor boxes are passed to the corresponding feature map of the target detection module to generate the target category and accurate target position and size; where n is a positive integer greater than or equal to 1.

In the above solution, after generating n refined anchor window boxes based on the divided image feature map unit, the method further includes:

Calculate the negative confidence of each anchor box;

Delete anchor boxes with negative confidence greater than the preset confidence threshold;

Correspondingly, through the generated n refined anchor boxes, the anchor boxes are transferred to the corresponding feature map of the target detection module, and the target detection model is used to generate the target category and the accurate target position and size.

In the above solution, the method further includes:

The machine training of the deep learning neural network model is performed based on the calculation of the loss function, where the loss function includes the loss of the anchor refinement module and the loss of the target detection module.

In the above solution, the importing the image to be diagnosed into the trained neural network model to perform image data processing analysis and defect diagnosis includes:

Input the image to be diagnosed into the trained deep neural network model, and identify whether the image to be diagnosed is an oil leakage image.

The anchor refinement module is constructed by eliminating the classification layer of the classifier.

In the above solution, the target detection module is constructed from the output of the transmission connection block.

In the above scheme, the loss function is defined as follows:

Let i be the index of an anchor in a mini-batch;

[symbol not reproduced] is the ground-truth category label of anchor i;

[symbol not reproduced] is the ground-truth position and size of anchor i;

p_i and x_i respectively denote the predicted confidence that anchor i is a target and the coordinates of the refined anchor i in the anchor refinement module;

c_i and t_i respectively denote the object category and the coordinates of the bounding box predicted in the target detection module;

N_arm and N_odm are the numbers of positive-sample anchors in the anchor refinement module and the target detection module, respectively; the classification loss L_b is the two-class cross-entropy loss; the multi-class loss L_m is the softmax (normalized exponential function) loss over the confidences of multiple categories; and the smooth L1 loss is used as the regression loss L_r;

The indicator function [symbol not reproduced] outputs 1 when its condition, [formula not reproduced], is true and 0 otherwise;

[formula not reproduced] indicates that the regression loss of negative-sample anchors is ignored;

When N_arm = 0, set [formula not reproduced] and [formula not reproduced];

When N_odm = 0, set [formula not reproduced] and [formula not reproduced].
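For orientation, the definitions above are consistent with a two-part loss of the general shape below. This is only a sketch: the formula itself is not reproduced in this text, and the symbols l_i^* (ground-truth label of anchor i), g_i^* (ground-truth position and size of anchor i), and the indicator condition l_i^* ≥ 1 are names assumed here for the quantities whose symbols are missing.

```latex
% Sketch under assumptions: l_i^*, g_i^* and the condition [l_i^* \ge 1]
% are assumed names; p_i, x_i, c_i, t_i, N_{arm}, N_{odm}, L_b, L_m, L_r
% are the symbols defined in the text.
\mathcal{L}\big(\{p_i\},\{x_i\},\{c_i\},\{t_i\}\big)
  = \frac{1}{N_{arm}} \Big( \sum_i L_b\big(p_i, [l_i^* \ge 1]\big)
      + \sum_i [l_i^* \ge 1]\, L_r\big(x_i, g_i^*\big) \Big)
  + \frac{1}{N_{odm}} \Big( \sum_i L_m\big(c_i, l_i^*\big)
      + \sum_i [l_i^* \ge 1]\, L_r\big(t_i, g_i^*\big) \Big)
```

The first bracket is the anchor refinement module loss and the second is the target detection module loss; in both, the indicator multiplies the regression term so that negative-sample anchors contribute no regression loss.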

The embodiment of the present application also provides an image recognition system, including:

One or more processors;

Memory, used to store one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors are caused to execute any one of the image recognition methods described above.

A computer-readable storage medium in which a computer program is stored, and when the program is executed by a processor, the image recognition method as described above is realized.

The embodiment of the application adopts the above-mentioned technical solution, and has at least the following technical effects compared with the prior art:

1) In the embodiments of this application, the actual images (original images) collected from the substation equipment are preprocessed: a denoising method based on the adaptive Gaussian kernel eliminates noise and interference information in the original image and improves its clarity, which helps improve the accuracy of subsequent intelligent recognition and classification;

2) In the embodiments of this application, a two-level cascade structure of the anchor refinement module and the target detection module is adopted. The anchor refinement module filters out negative-sample anchors and coarsely adjusts the position and size of the anchors to provide better initialization results; the target detection module takes the refined anchors as input and regresses their position and size to obtain accurate target positions and sizes. This two-level cascade structure improves the accuracy of detecting oil leakage in substation equipment and effectively guarantees the accuracy of the detection results.

Description of the drawings

By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes, and advantages of the present application will become more apparent:

FIG. 1 is a schematic flow diagram of an embodiment of the application;

FIG. 2 is a schematic diagram of a data augmentation process based on optimized classification according to an embodiment of the application;

FIG. 3 is a schematic diagram of a processing flow of image denoising according to an embodiment of the application;

FIG. 4 is a flowchart of similarity comparison of rotating and matching image pieces according to an embodiment of this application;

FIG. 5 is a schematic diagram of the weight coefficient distribution of a noise image according to an embodiment of the application;

FIG. 6 is a schematic diagram of the anchor refinement module adjusting the position and size of anchors according to an embodiment of the application.

Detailed description

The application will be further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the application will be described in detail with reference to the drawings and in conjunction with the embodiments.

As shown in Figures 1-6, the embodiment of the present application discloses an image recognition method, which may specifically be an image recognition method for oil leakage of substation equipment based on single-stage target detection, including the following steps:

Step 1. Obtain the original image, and perform data augmentation based on the optimized classification for the original image;

Collect at least two images of substation equipment, use them as original images, and use these images to train the deep learning neural network model.

Among them, the steps of data augmentation based on optimized classification include:

Step 11. Calculate the classification accuracy rates of all classes in the test set;

The test set is a set of at least two original images, and the at least two original images are classified. Given that the images fall into several known categories and that each original image has been manually classified and labeled, calculate the classification accuracy rate obtained by classifying the original images in the test set;

Step 12. Retrieve the category with the lowest correct rate in the test set;

Extract the original image belonging to the classification result with the lowest classification accuracy rate;

Step 13. Perform data augmentation on the retrieved category with the lowest accuracy rate;

More specifically, the extracted original images belonging to the category with the lowest classification accuracy rate are input to the convolutional neural network as input image samples. The convolutional neural network includes at least one convolutional layer and at least one fully connected layer. The convolutional layer calculates the feature vector of the input image sample and outputs the result to the fully connected layer. The final fully connected layer of the convolutional neural network is connected to the output layer. When machine training is performed on the feature vectors of the input image samples, the following system of multivariate equations is obtained as samples are input:

In the above formula, x_1, x_2, ..., x_n are the inputs of the last fully connected layer, that is, the feature vector of the image sample; a_1, a_2, ..., a_m are the outputs; m is the number of categories; W is the weight parameter of the fully connected layer; and b is the bias of the fully connected layer;

Consider the i-th category (m = i). When k samples are input, each category has a system of multivariate equations as shown in the following formula. The unknowns of the system are the weight parameters w corresponding to the category, and the number of input samples is the number of equations in the system:

When training the network model, the number of samples k in the training set is finite, and the number of input samples in each category does not reach the number of weight parameters of the actual fully connected layer, so k is much smaller than n. A system of linear equations with fewer equations than unknowns is underdetermined; converted into matrix form as follows, the underdetermined system has infinitely many solutions;
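The equations referred to in this passage are not reproduced in the text. As a sketch consistent with the definitions given (inputs x_1, ..., x_n, outputs a_1, ..., a_m, weights W, bias b), they can be written as follows; the index names r and s and the symbols X and w_i are assumptions introduced here for illustration.

```latex
% Last fully connected layer: one equation per category j = 1, ..., m
a_j = \sum_{r=1}^{n} w_{j,r}\, x_r + b_j

% Category i with k input samples: one equation per sample s = 1, ..., k
a_i^{(s)} = \sum_{r=1}^{n} w_{i,r}\, x_r^{(s)} + b_i

% Matrix form, with the k sample feature vectors stacked as the rows of X
X\,\mathbf{w}_i = \mathbf{a}_i - b_i \mathbf{1},
\qquad X \in \mathbb{R}^{k \times n}, \quad k \ll n
```

Because X has fewer rows than columns, a consistent system of this form cannot pin down w_i uniquely; each additional sample of the category adds one row to X.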

For categories whose samples have complex features, increasing the number of input samples increases the number of equations in the system and the number of vectors contained in the fundamental system of solutions, which helps the network model reach the global optimum;

Through data augmentation based on optimized classification, the input samples of classes with poor classification results are increased, that is, the number of equations corresponding to those classes is increased. Data augmentation for a single class can improve the network model's classification accuracy for that class and thereby improve the overall classification accuracy rate;
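Steps 11 to 13 can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function names are hypothetical, labels are assumed to be integers in the range 0 to num_classes - 1, and horizontal flipping stands in for whatever augmentation operations are actually used.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Step 11: classification accuracy of each class on the test set."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)       # NaN marks classes with no samples
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

def augment_weakest_class(images, labels, y_pred, num_classes):
    """Steps 12 and 13: find the lowest-accuracy class and add samples to it."""
    labels = np.asarray(labels)
    acc = per_class_accuracy(labels, y_pred, num_classes)
    worst = int(np.nanargmin(acc))           # Step 12: lowest-accuracy class
    # Step 13: horizontal flip is only a placeholder augmentation (assumption).
    extra = [np.fliplr(img) for img, lab in zip(images, labels) if lab == worst]
    new_images = list(images) + extra
    new_labels = np.concatenate(
        [labels, np.full(len(extra), worst, dtype=labels.dtype)]
    )
    return new_images, new_labels, worst
```

Only the weakest class grows, so the extra samples go where the classifier currently performs worst rather than being spread evenly over all classes.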

Step 2: For the augmented data, perform image denoising based on the non-local mean of the adaptive Gaussian kernel;

Specifically, the processing steps of image denoising include the following:

Step 21: Based on the adaptive Gaussian kernel framework, set the direction angle θ_i as the main direction of the image, and use θ_i to realize the rotation matching of the image;

Step 22: Set the main directions of two images N_i and N_j to be θ_i and θ_j, respectively, and set Δθ_{i,j} to be the difference between the direction angles θ_i and θ_j;

Step 23: If the value of Δθ_{i,j} is an integer multiple of 90, exchange pixels in the image N_j to perform image rotation conversion;

Step 24: If the value of Δθ_{i,j} is not an integer multiple of 90, expand the image size of N_j to a new image neighborhood, and perform image rotation conversion;

Step 25: If the value of Δθ_{i,j} does not exceed a preset threshold, such as 2, the images N_i and N_j are matching images, and no image rotation conversion is performed;

Specifically, the anchor refinement module is constructed by removing the classification layer of the classifier, and the target detection module is constructed from the output of the transmission connection block.

Rotation-based image matching establishes a higher correlation between the current image N_i and the target image N_j, so that more similar pixels can be found; the more similar pixels are found, the better the denoising effect.
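The decision logic of Steps 23 to 25 can be sketched as below. Several points are assumptions made for illustration: angles are taken to be in degrees, the threshold test of Step 25 is applied before the multiple-of-90 test (a difference of 0 would otherwise satisfy both), N_j is rotated counterclockwise by the signed difference, and the expanded rotation uses simple nearest-neighbour sampling. All function names are hypothetical.

```python
import numpy as np

def rotation_action(theta_i, theta_j, threshold=2.0):
    """Classify the rotation needed to align N_j with N_i (degrees)."""
    delta = (theta_i - theta_j + 180.0) % 360.0 - 180.0   # signed, in [-180, 180)
    if abs(delta) <= threshold:
        return "match", 0.0                    # Step 25: already matching
    quarter_turns = delta / 90.0
    if np.isclose(quarter_turns, round(quarter_turns)):
        return "swap", delta                   # Step 23: pure pixel exchange
    return "expand", delta                     # Step 24: enlarge, then rotate

def rotate_expanded(patch, angle_deg):
    """Nearest-neighbour rotation onto an enlarged canvas (Step 24 sketch)."""
    h, w = patch.shape
    a = np.deg2rad(angle_deg)
    cos_a, sin_a = np.cos(a), np.sin(a)
    new_h = int(np.ceil(abs(h * cos_a) + abs(w * sin_a) - 1e-9))
    new_w = int(np.ceil(abs(w * cos_a) + abs(h * sin_a) - 1e-9))
    out = np.zeros((new_h, new_w), dtype=patch.dtype)
    ys, xs = np.mgrid[0:new_h, 0:new_w]
    yc, xc = ys - (new_h - 1) / 2.0, xs - (new_w - 1) / 2.0
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = cos_a * xc - sin_a * yc + (w - 1) / 2.0
    src_y = sin_a * xc + cos_a * yc + (h - 1) / 2.0
    sx, sy = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[valid] = patch[sy[valid], sx[valid]]
    return out

def align_patch(patch_j, theta_i, theta_j, threshold=2.0):
    """Return N_j rotated so that its main direction matches that of N_i."""
    action, delta = rotation_action(theta_i, theta_j, threshold)
    if action == "match":
        return patch_j
    if action == "swap":
        return np.rot90(patch_j, k=int(round(delta / 90.0)))
    return rotate_expanded(patch_j, delta)
```

Quarter-turn rotations only permute pixels, so they need no interpolation and keep the patch size; any other angle moves the corners outside the original footprint, which is why the canvas is enlarged first.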

Specifically, in obtaining a reliable similarity weight coefficient from the similarity distance, the similarity distance is first calculated based on the adaptive Gaussian kernel, and the rotation-matching similarity measure is then combined with the adaptive Gaussian kernel weight coefficient to obtain the new similarity distance d_K(i, j). The specific formula is as follows:

In the above formula, v(N_i) denotes the pixels of image N_i, and v(N'_j) denotes the pixels of image N_j that are similar to v(N_i);

[symbol not reproduced] is the normalized weight coefficient of the adaptive Gaussian kernel; the weight function [symbol not reproduced] is defined as:

In the above formula, h(i) is the locally adaptive similarity weight parameter. The denoising result can be calculated by the following formula:

In the above formula, [symbol not reproduced] is the normalization factor and S(i) is the search area of the image;

Specifically, the method for selecting local adaptive similarity weight parameters in this embodiment is based on image residual estimation, and the specific calculation includes the following:

Calculate the residual r_i of each pixel in the noisy sample image:

where [formula not reproduced] is used to ensure that, in a flat area, [formula not reproduced]; E(·) denotes mathematical expectation;

Use a robust median estimator to estimate the noise standard deviation σ_i of the local area of the image, namely:

where R = {r_1, r_2, ..., r_{|Ω(i)|}} and |Ω(i)| is the total number of pixels in the image; let R′ = {r_1, r_2, ..., r_{|S(i)|}} be the residual estimates of the points in the search area S(i); the noise level in the search area is then defined as σ_i = mean{R′};

The definition of the local adaptive similarity weight parameter h(i) is:

h(i) = η σ_i    (10)

where the constant η is a tuning parameter;

The residual estimation method overestimates the noise level in detailed parts of the image and underestimates it in flat parts, so η is designed as a function of σ_i:

where η_1 = 1.1 and η_2 = 1.6 are empirical values obtained through experimental optimization;
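The role of the locally adaptive parameter h(i) = η·σ_i can be illustrated with a plain non-local means filter whose smoothing strength varies per pixel. This is a simplified sketch and not the method of this application: the patch distance is an ordinary Gaussian-weighted squared difference without rotation matching, the noise map sigma_map is supplied by the caller rather than estimated from residuals, and η is held at a single value instead of the σ_i-dependent function, whose formula is not reproduced here. The function names and the default patch and search sizes are assumptions.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel used to weight pixels inside a patch."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def nlm_adaptive(img, sigma_map, eta=1.1, patch=3, search=7):
    """Non-local means with a per-pixel parameter h(i) = eta * sigma_map[i].

    sigma_map must be strictly positive, and each image side must exceed
    patch // 2 + search // 2 so that reflect padding is valid.
    """
    img = img.astype(np.float64)
    pr, sr = patch // 2, search // 2
    pad = pr + sr
    padded = np.pad(img, pad, mode="reflect")
    kernel = gaussian_kernel(patch, sigma=max(pr, 1))
    out = np.zeros_like(img)
    rows, cols = img.shape
    for y in range(rows):
        for x in range(cols):
            cy, cx = y + pad, x + pad
            ref = padded[cy - pr:cy + pr + 1, cx - pr:cx + pr + 1]
            h = eta * sigma_map[y, x]                  # h(i) = eta * sigma_i
            weight_sum, value_sum = 0.0, 0.0
            for dy in range(-sr, sr + 1):              # search area S(i)
                for dx in range(-sr, sr + 1):
                    ny, nx = cy + dy, cx + dx
                    cand = padded[ny - pr:ny + pr + 1, nx - pr:nx + pr + 1]
                    dist = np.sum(kernel * (ref - cand) ** 2)
                    w = np.exp(-dist / (h * h))
                    weight_sum += w
                    value_sum += w * padded[ny, nx]
            out[y, x] = value_sum / weight_sum         # divide by normalization factor
    return out
```

A larger h(i) flattens the weights and averages more aggressively, which suits noisy flat regions; a smaller h(i) keeps only close matches and preserves detail.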

Refer to FIG. 5, which shows the weight coefficient distributions of the non-local mean denoising method based on the adaptive Gaussian kernel used in this embodiment and of the traditional non-local mean method. The left side of the figure is the noisy image, the middle is the weight coefficient distribution of traditional non-local mean denoising, and the right is the weight coefficient distribution of non-local mean denoising based on the adaptive Gaussian kernel. The image block is contaminated by Gaussian noise with standard deviation σ = 20, and brighter pixels in the figure indicate larger weights. FIG. 5 shows that, in the method based on the adaptive Gaussian kernel, pixels with similar edge or texture structures are given larger weights; that is, the weight function proposed in this embodiment measures the similarity between pixels more effectively;

In texture images, the denoising method based on the non-local mean of the adaptive Gaussian kernel can find more similar pixels, and the rotation-matching similarity comparison can find more image blocks with similar patterns, which ensures that the method has a better denoising effect;

Step 3: Construct an anchor refinement module and a target detection module, and optimize the network structure based on the constructed module;

The network structure includes an anchor refinement module and a target detection module. The anchor refinement module removes negative-sample anchors to reduce the search space of the classifier, and at the same time coarsely adjusts the position and size of the anchors to provide better initialization for the subsequent regression; the target detection module regresses to the accurate target position from the refined anchors and predicts multi-category labels;

The anchor refinement module is constructed by removing the classification layer of the classifier; the target detection module is constructed from the output of the transmission connection block, which is followed by a prediction layer that generates the scores of the target categories and the coordinate offsets relative to the refined anchors;

The anchor refinement module and the target detection module are connected through the transmission connection block, which converts the features of different layers of the anchor refinement module into the form required by the target detection module, so that the target detection module can share the features from the anchor refinement module;

Note that the transmission connection block is used for the feature maps associated with anchors. The transmission connection block also integrates large-scale context by adding high-level features to the transmitted features, which improves detection accuracy. To match their dimensions, a deconvolution operation is used to enlarge the high-level feature maps before summation, and a convolutional layer is added after the summation to ensure the discriminability of the features used for detection;

To improve on the accuracy of single-step regression, which predicts the position and size of the target from feature layers at different scales, this embodiment adopts a two-step cascaded regression strategy to regress the target position and size: the anchor refinement module first adjusts the position and size of the anchors, which provides better initialization for the regression operation of the target detection module.

More specifically, the specific location and size of the anchors are adjusted based on the anchor refinement module, and then the target detection module is used to perform the regression operation. The specific steps include the following:

Step 31: Generate n refined anchor boxes (windows) based on the units of the divided image feature map;

Step 32: Pass the n generated refined anchor boxes to the corresponding feature maps of the target detection module to generate the target category and the accurate target position and size, where n is a positive integer greater than or equal to 1.

In this embodiment, two category scores and four precise offsets relative to the refined anchor box are calculated, so that 6 outputs are generated for each refined anchor box to accomplish the detection task. With the two-step cascaded regression strategy, the anchor refinement module generates refined anchor boxes and the target detection module takes these refined anchor boxes as input for further detection, which makes the detection results more accurate, especially for small targets;

Specifically, this embodiment also designs a negative sample filtering mechanism that rejects negative-sample anchors that have already been accurately classified, which mitigates the sample imbalance problem;

In the training phase, if the negative confidence of a refined anchor box is greater than a preset confidence threshold θ, the anchor box is discarded when training the target detection module (ODM); preferably, θ is set to 0.99 in this embodiment;

Train the target detection module by passing refined negative sample anchor boxes and positive sample anchor boxes;

In the inference stage, if a refined anchor box is given a negative confidence greater than θ, it is discarded in the detection process of the target detection module;
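The filtering rule can be sketched in a few lines. The function name and the (x1, y1, x2, y2) box layout are assumptions for illustration; only the comparison against θ = 0.99 comes from the text.

```python
import numpy as np

def filter_refined_anchors(anchors, negative_conf, theta=0.99):
    """Discard refined anchors whose negative confidence is greater than theta."""
    anchors = np.asarray(anchors, dtype=np.float64)
    negative_conf = np.asarray(negative_conf, dtype=np.float64)
    keep = negative_conf <= theta        # anchors passed on to the detection module
    return anchors[keep], keep
```

For example, with negative confidences 0.995, 0.40 and 0.99, only the first anchor is dropped, because the rule discards an anchor only when its negative confidence is strictly greater than θ.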

Step 4: Based on the anchor refinement module and the target detection module, perform deep learning neural network model machine training;

Specifically, a deep learning neural network model is trained based on the designed anchor refinement module and target detection module. The loss function in this embodiment includes the loss of the anchor refinement module and the loss of the target detection module. In the anchor refinement module, each anchor is assigned a binary class label indicating whether or not it is a target, and its position and size are regressed at the same time to obtain a refined anchor;

Pass the refined anchors with a negative confidence level less than the threshold to the target detection module to further predict the target category and accurate target position and size;

The loss function is defined as follows:

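Let i be the index of an anchor in a mini-batch;

[symbol not reproduced] is the ground-truth category label of anchor i;

[symbol not reproduced] is the ground-truth position and size of anchor i;

The indicator function [symbol not reproduced] outputs 1 when its condition, [formula not reproduced], is true and 0 otherwise;

[formula not reproduced] indicates that the regression loss of negative-sample anchors is ignored;

When N_arm = 0, set [formula not reproduced] and [formula not reproduced];

When N_odm = 0, set [formula not reproduced] and [formula not reproduced].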
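A numerical sketch of a loss with this structure is given below. It is an illustration under stated assumptions rather than the loss of this application: label 0 is assumed to mark a negative-sample anchor, class predictions are assumed to be probabilities that already sum to 1, each module's loss is assumed to be set to 0 when it has no positive anchors, and all function names are hypothetical.

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 loss, summed over the four box coordinates of each anchor."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum(axis=-1)

def cross_entropy(probs, labels):
    """Cross-entropy of predicted class probabilities against integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, 1.0))

def module_loss(probs, labels, boxes, gt_boxes):
    """Classification plus regression loss of one module, per positive anchor."""
    positive = labels >= 1                   # label 0 = negative sample (assumption)
    n_pos = int(positive.sum())
    if n_pos == 0:
        return 0.0                           # assumption: terms set to 0 without positives
    cls = cross_entropy(probs, labels).sum()
    # The indicator zeroes the regression loss of negative-sample anchors.
    reg = (smooth_l1(boxes, gt_boxes) * positive).sum()
    return float((cls + reg) / n_pos)

def total_loss(arm, odm):
    """Sum of the anchor refinement and target detection module losses."""
    return module_loss(*arm) + module_loss(*odm)
```

Here `arm` and `odm` are tuples of (probs, labels, boxes, gt_boxes); for the anchor refinement module the labels are binary (target or not a target), while the target detection module uses the multi-class labels.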

More specifically, to handle targets of different scales, this embodiment uses VGG-16 as the backbone network and selects four feature layers with strides of 8, 16, 32, and 64 pixels, which are associated with anchors of different scales for prediction;

Each of the four selected feature layers is associated with one specific anchor scale and three aspect ratios (i.e., 0.5, 1.0, and 2.0); in particular, the anchor scale is four times the stride of the corresponding layer;
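The anchor shapes implied by these numbers can be tabulated as follows. The convention used to turn a scale and an aspect ratio into a width and height (ratio = width / height, with the area kept equal to scale squared) is an assumption, as are the function and constant names; the text only fixes the strides, the factor of four, and the three ratios.

```python
import math

STRIDES = (8, 16, 32, 64)           # strides of the four feature layers, in pixels
ASPECT_RATIOS = (0.5, 1.0, 2.0)     # width / height (assumed convention)

def anchor_shapes(strides=STRIDES, ratios=ASPECT_RATIOS, scale_factor=4):
    """Return {stride: [(width, height), ...]} for each feature layer."""
    shapes = {}
    for stride in strides:
        scale = scale_factor * stride          # anchor scale = 4 x stride
        shapes[stride] = [
            (scale * math.sqrt(r), scale / math.sqrt(r)) for r in ratios
        ]
    return shapes
```

Under this convention the anchor scales are 32, 64, 128, and 256 pixels, so the stride-8 layer carries the smallest anchors and the stride-64 layer the largest.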

During training, the correspondence between anchors and ground-truth boxes is determined from their Jaccard overlap, and the network model is trained end-to-end accordingly: each ground-truth box is first matched with the anchor box that has the best overlap score, and then anchor boxes are matched with any ground-truth box whose overlap with them is higher than 0.5;
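The matching rule can be sketched as below. The (x1, y1, x2, y2) box layout, the function names, and the use of -1 for unmatched anchors are assumptions for illustration; boxes are assumed to have positive area.

```python
import numpy as np

def jaccard(boxes_a, boxes_b):
    """Pairwise Jaccard overlap (IoU) of boxes given as (x1, y1, x2, y2)."""
    a = np.asarray(boxes_a, dtype=np.float64)[:, None, :]
    b = np.asarray(boxes_b, dtype=np.float64)[None, :, :]
    inter_w = np.clip(
        np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None
    )
    inter_h = np.clip(
        np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None
    )
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def match_anchors(anchors, gt_boxes, threshold=0.5):
    """Return, for each anchor, the index of its matched ground-truth box or -1."""
    iou = jaccard(gt_boxes, anchors)             # shape: (num_gt, num_anchors)
    best_gt = iou.argmax(axis=0)
    # Second rule: an anchor matches a ground-truth box if overlap > threshold.
    matches = np.where(iou.max(axis=0) > threshold, best_gt, -1)
    # First rule: every ground-truth box keeps its best-overlapping anchor.
    matches[iou.argmax(axis=1)] = np.arange(iou.shape[0])
    return matches
```

Applying the best-anchor rule last guarantees that every ground-truth box has at least one positive anchor, even when none of its overlaps exceeds 0.5.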

Step 5: Import the image to be diagnosed into the trained neural network model for image data processing analysis and defect diagnosis;

Specifically, the target image to be diagnosed is imported into the trained neural network model for image data processing analysis and defect diagnosis. The image to be diagnosed (an inspection image of the substation equipment) is input into the trained deep neural network model, which identifies whether it is an oil leakage image, thereby verifying the accuracy of the algorithm in this application.

The image recognition method provided in the embodiments of this application uses deep learning techniques such as deep learning neural network models to intelligently identify and diagnose substation equipment inspection images, which improves the speed and accuracy of diagnosing oil leakage in substation equipment, avoids manual inspection for oil leakage, and greatly saves manpower and material resources. In addition, the original image is preprocessed: the denoising method based on the adaptive Gaussian kernel eliminates noise and interference information in the original image and improves its clarity, which helps improve the accuracy of subsequent intelligent recognition and classification.

An embodiment of the present application also provides an image recognition system, and the system includes:

One or more processors;

Memory, used to store one or more computer programs,

When the one or more computer programs are executed by the one or more processors, the one or more processors are caused to execute the aforementioned image recognition method.

An embodiment of the present application also provides a computer-readable storage medium storing a computer program, wherein the program is executed by a processor to implement the aforementioned image recognition method.

The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this application.

Technical features other than those described in the specification are known to those skilled in the art. To highlight the innovative features of this application, those other technical features are not repeated here.

Industrial applicability

The image recognition method, system, and storage medium provided by the embodiments of this application use deep learning technologies, such as deep learning neural network models, to intelligently identify and diagnose substation equipment inspection images. This improves the speed and accuracy of diagnosing oil leakage in substation equipment and avoids manual inspection for oil leakage, which greatly saves manpower and material resources.

Claims

1. An image recognition method, comprising the following steps:

obtaining an original image, and performing data augmentation based on optimized classification on the original image;

performing, on the augmented data, image denoising based on the non-local mean of an adaptive Gaussian kernel;

constructing an anchor refinement module and a target detection module, and optimizing the network structure based on the constructed modules;

performing machine training of a deep learning neural network model based on the anchor refinement module and the target detection module; and

importing an image to be diagnosed into the trained neural network model for image data processing analysis and defect diagnosis.

2. The method according to claim 1, wherein the data augmentation based on optimized classification comprises:

calculating the classification accuracy of every class in a test set, wherein the test set is a set of at least two original images, image classification is performed on the at least two original images, and the classification accuracy obtained by classifying the original images in the test set is calculated;

retrieving the class with the lowest accuracy in the test set; and

performing data augmentation based on the retrieved class with the lowest accuracy.
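
The selection step described in this claim can be sketched in Python as follows. This is a minimal sketch under stated assumptions: images are nested lists of pixel values, a horizontal flip stands in for whatever augmentation is actually applied, and all function names are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of test images of that class classified correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def weakest_class(accuracy):
    """Return the class with the lowest accuracy (first one on ties)."""
    return min(accuracy, key=accuracy.get)


def horizontal_flip(image):
    """Assumed augmentation: mirror each row of the image."""
    return [row[::-1] for row in image]


def augment_weakest(images, true_labels, predicted_labels):
    """Create extra samples only for the class the classifier handles worst."""
    accuracy = per_class_accuracy(true_labels, predicted_labels)
    target = weakest_class(accuracy)
    extra = [
        (horizontal_flip(img), label)
        for img, label in zip(images, true_labels)
        if label == target
    ]
    return target, extra
```

Concentrating augmentation on the weakest class is a way to rebalance the training data toward the cases where the classifier currently fails most often.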

3. The method according to claim 1, wherein performing image denoising based on the non-local mean of the adaptive Gaussian kernel comprises, based on the adaptive Gaussian kernel framework:

setting the main directions of two original images N_i and N_j as θ_i and θ_j, and setting Δθ_{i,j} as the difference between the direction angles θ_i and θ_j;

if the value of Δθ_{i,j} is an integer multiple of 90°, exchanging pixels in the original image N_j to perform image rotation transformation;

if the value of Δθ_{i,j} is not an integer multiple of 90°, expanding the image size of the original image N_j to a new image neighborhood, and performing image rotation transformation; and

if the value of Δθ_{i,j} does not exceed a preset threshold, matching the images N_i and N_j without performing image rotation transformation.
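
The three cases in this claim can be sketched as a small decision function in Python. This is a minimal sketch under stated assumptions: angles are in degrees, the threshold test is evaluated first (the claim does not state an order), the counterclockwise direction is arbitrary, and all names are hypothetical.

```python
def is_multiple_of_90(delta_deg, tol=1e-9):
    """True when delta_deg is an integer multiple of 90 degrees, within tol."""
    remainder = delta_deg % 90.0
    return min(remainder, 90.0 - remainder) <= tol


def rotate_quarter_turns(patch, turns):
    """Rotate a 2D patch counterclockwise by 90 degrees per turn.

    Only pixel positions are exchanged, so no interpolation is needed.
    """
    for _ in range(turns % 4):
        patch = [list(row) for row in zip(*patch)][::-1]
    return patch


def choose_rotation(theta_i, theta_j, threshold):
    """Pick how neighborhood N_j is aligned with N_i before comparison."""
    delta = theta_i - theta_j
    if abs(delta) <= threshold:
        # Directions already agree closely: match without rotating.
        return "match_without_rotation"
    if is_multiple_of_90(delta):
        # Exact quarter turns can be done by exchanging pixels.
        return "rotate_by_pixel_exchange"
    # Other angles need a larger neighborhood so rotated corners stay covered.
    return "expand_neighborhood_then_rotate"
```

For the pixel-exchange case, the number of quarter turns could be taken as round(delta / 90) % 4 and passed to rotate_quarter_turns.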

4. The method according to claim 1, wherein constructing the anchor refinement module and the target detection module and optimizing the network structure based on the constructed modules comprises:

adjusting the position and size of anchors based on the anchor refinement module; and

performing a regression operation through the target detection module to obtain the accurate target position and size of the anchors.
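
The two-stage adjustment in this claim can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions: boxes are (cx, cy, w, h) tuples, each module outputs (dx, dy, dw, dh) offsets, and the center-shift and log-scale parameterization is a commonly used convention that the claim itself does not specify. The function names are hypothetical.

```python
import math


def apply_offsets(box, offsets):
    """Shift and scale a (cx, cy, w, h) box by (dx, dy, dw, dh) offsets."""
    cx, cy, w, h = box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_regression(anchor, arm_offsets, odm_offsets):
    """Refine an anchor coarsely, then regress the final box from the result."""
    refined = apply_offsets(anchor, arm_offsets)  # anchor refinement module
    return apply_offsets(refined, odm_offsets)    # target detection module
```

Because the second regression starts from the already refined anchor rather than from the original one, it only has to correct a smaller residual error.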

5. The method according to claim 4, wherein adjusting the position and size of the anchors based on the anchor refinement module, and performing the regression operation through the target detection module to obtain the accurate target position and size of the anchors, comprises:

generating n refined anchor boxes based on the divided image feature map units; and

passing the generated n refined anchor boxes to the corresponding feature map of the target detection module to generate the target category and the accurate target position and size, where n is a positive integer greater than or equal to 1.

6. The method according to claim 5, wherein, after generating the n refined anchor boxes based on the divided image feature map units, the method further comprises:

calculating the negative confidence of each anchor box; and

deleting anchor boxes whose negative confidence is greater than a preset confidence threshold;

correspondingly, the generated n refined anchor boxes are passed to the corresponding feature map of the target detection module, and the target detection module is used to generate the target category and the accurate target position and size.
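
The filtering step in this claim can be sketched in Python as follows. This is a minimal sketch; the function name, the data layout (parallel lists), and the threshold value used in the usage note are hypothetical.

```python
def filter_anchors(anchors, negative_confidences, threshold):
    """Keep only anchors whose negative confidence does not exceed threshold.

    anchors: list of boxes, in any representation.
    negative_confidences: per-anchor score that the box contains no target.
    """
    if len(anchors) != len(negative_confidences):
        raise ValueError("each anchor needs exactly one negative confidence")
    return [
        anchor
        for anchor, confidence in zip(anchors, negative_confidences)
        if confidence <= threshold
    ]
```

For example, filter_anchors(boxes, scores, 0.99) would drop only the anchors that are almost certainly background, which reduces the number of easy negatives the target detection module has to process.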

7. The method according to claim 1, wherein the method further comprises:

performing the machine training of the deep learning neural network model based on the calculation of a loss function, where the loss function includes the loss of the anchor refinement module and the loss of the target detection module.
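
A combined loss of this kind can be sketched in Python as follows. It is a minimal sketch under stated assumptions: this claim only says that the loss includes the two module losses, so the choice of cross-entropy for classification, smooth L1 for box regression, and an unweighted sum is an assumption, and all names are hypothetical.

```python
import math


def smooth_l1(predicted, target):
    """Smooth L1 regression loss summed over box coordinates (assumed form)."""
    total = 0.0
    for p, t in zip(predicted, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the true class (assumed form)."""
    return -math.log(max(probabilities[true_index], 1e-12))


def module_loss(class_probs, true_index, box_pred, box_target):
    """Classification plus localization loss for one module and one anchor."""
    return cross_entropy(class_probs, true_index) + smooth_l1(box_pred, box_target)


def total_loss(arm_loss, odm_loss):
    """Sum the anchor refinement and target detection losses (equal weights assumed)."""
    return arm_loss + odm_loss
```

In such a setup the anchor refinement module would typically use two classes (target versus background), while the target detection module would use the full set of defect categories.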

8. The method according to claim 1, wherein importing the image to be diagnosed into the trained neural network model to perform image data processing analysis and defect diagnosis comprises:

inputting the image to be diagnosed into the trained deep neural network model, and identifying whether the image to be diagnosed is an oil leakage image.

9. An image recognition system, wherein the system includes:

one or more processors; and

a memory, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the image recognition method according to any one of claims 1-8.

10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the image recognition method according to any one of claims 1-8.