CN115147385A - Intelligent detection and judgment method for repeated damage in aviation hole exploration video - Google Patents


Info

Publication number
CN115147385A
CN115147385A (application CN202210820374.2A)
Authority
CN
China
Prior art keywords
detection
frame
track
damage
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210820374.2A
Other languages
Chinese (zh)
Inventor
万夕里
黄旭
管昕洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority: CN202210820374.2A
Publication: CN115147385A
Legal status: Pending

Classifications

    • G06T7/0004 Industrial image inspection (G Physics; G06 Computing; G06T Image data processing or generation, in general; G06T7/00 Image analysis; G06T7/0002 Inspection of images, e.g. flaw detection)
    • G06N3/08 Learning methods (G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks)
    • G06T2207/10016 Video; image sequence (G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10 Image acquisition modality)
    • G06T2207/20081 Training; learning (G06T2207/20 Special algorithmic details)


Abstract

The invention discloses an intelligent detection and judgment method for repeated damage in aviation hole-detection (borescope) video. First, target detection is performed on internal damage, and the position of each target detection box in the current frame is read together with the deep features of each detection-box image patch. The detection boxes are then filtered by confidence, and non-maximum suppression is applied to eliminate multiple detection boxes on the same damage. A Kalman filter tracker predicts the target's position in the current frame. A cost matrix between track information and detection information is computed from the Mahalanobis distance and the appearance information, and cascade matching and IOU matching are performed in turn. Finally, the parameters and feature sets of the Kalman filter tracker are updated, and the disappearance of targets and the appearance of new targets are judged. To solve the misjudgment of repeated damage in hole-detection video, the invention proposes an intelligent detection and judgment method for repeated damage in aviation hole-detection video based on the fusion of a twin (Siamese) convolutional neural network and optimal stacked blocks.

Description

Intelligent detection and judgment method for repeated damage in aviation hole exploration video
Technical Field
The invention relates to an intelligent detection and judgment method for repeated damage in aviation hole-detection (borescope) video, based on a twin (Siamese) network and optimal stacked blocks, and belongs to the field of computer vision.
Background
Traditional target-tracking methods have made great progress over the years and can achieve good tracking results to a certain extent: for example, tracking through feature spaces and compressive sampling, filters that mitigate the influence of occlusion during tracking, and multi-target tracking through swarm-optimization algorithms. However, the trackers designed by these methods cannot sufficiently learn the features of the target content in the image, so the tracking effect is unstable, and it is difficult to mark damage and judge in real time whether it is repeated.
In recent years, target detection and tracking technology has developed rapidly, and tracking-by-detection has become the mainstream technique in multi-target tracking. However, such work focuses mainly on detecting and tracking pedestrians and vehicles; research on aero-engine damage is scarce, even though effectively tracking damage in the aero-engine, the core power plant of aviation, is of great significance. The YOLO algorithm is an efficient recognition algorithm with excellent detection performance; it can serve as the target detector in the detection stage of tracking and provide the initial position of the tracked aero-engine damage target.
In practical aero-engine borescope inspection, the internal structure of the engine is complex and the shooting angle of the borescope changes greatly, so problems such as occlusion of the detection target, viewpoint changes, and illumination changes arise; existing techniques are not suited to this practical setting and cannot meet the requirement of detecting and judging repeated damage in real time.
Disclosure of Invention
To solve these problems, the invention provides an intelligent detection and judgment method for repeated damage in aviation hole-detection video based on a twin (Siamese) network and optimal stacked blocks. The method offers high precision and fast processing, reduces the repetition rate of damage detection, and is better suited to targets with large angle changes.
First, target detection is performed on internal damage (for example, damage inside an engine) from an aviation hole-detection video (for example, a borescope video of the engine interior), and the position of each target detection box in the current frame and the deep features of each detection-box image patch are read;
then the detection boxes are filtered by confidence, and non-maximum suppression is applied to eliminate multiple detection boxes on the same damage; a Kalman filter tracker predicts the target's position in the current frame;
a cost matrix between track information and detection information is computed from the Mahalanobis distance and the appearance information, and cascade matching and IOU matching are performed in turn;
finally, the parameters and feature sets of the Kalman filter tracker are updated, and the disappearance of targets and the appearance of new targets are judged.
In this process, to solve the interference of repeated damage in online damage statistics, an intelligent detection and judgment method for repeated damage in aviation hole-detection video is proposed based on the fusion of a twin convolutional neural network and optimal stacked blocks; the specific steps are:
step one, in the acquired borescope video of the aircraft engine, for the original frames of the given video, establishing a corresponding track box for each detection box produced on the first frame;
step two, performing IOU matching, one by one, between the detection boxes of the current frame and the track boxes predicted from the previous frame, and calculating their cost matrix;
step three, taking all the cost matrices obtained in step two as input to the Hungarian algorithm and obtaining a linear matching result;
step four, repeating steps two and three until a matched result yields a track in the confirmed state or the video ends;
step five, predicting, through Kalman filtering, the prediction boxes corresponding to the confirmed-state and unconfirmed-state tracks of the matching result; then performing cascade matching between the prediction boxes of the confirmed-state tracks and the detection boxes of the original video frame, and calculating the corresponding cost matrix from the matching result;
step six, taking all the cost matrices obtained in step five as input to the Hungarian algorithm and obtaining a linear matching result;
step seven, sending the matching result obtained in step six into a model fusing the twin convolutional neural network and the optimal stacked blocks for similarity judgment, and detecting and judging the obtained damage;
step eight, repeating steps five to seven until the video ends.
Here, "detection box" refers to the target detection bounding box.
The invention has the beneficial effects that:
according to the technical scheme, the semantic features of the objects which are large in shooting angle change, shielded and the like can be accurately propagated, a tracking algorithm is added on the basis of target detection, repeated damage is removed, and accuracy of damage detection is improved.
The technical reasons why the scheme achieves these results are as follows:
1) An existing detector such as YOLOv5 is used as the target detector, and its weights and network are modified to address the strong influence of shooting-angle changes on recognizing the same damage.
2) The association between detection results and tracking predictions is solved with the Hungarian algorithm, taking into account both the association of motion information and the association of target appearance information.
3) Cascade matching is introduced. When a target is occluded for a long time, the uncertainty of the Kalman filter prediction grows greatly and the observability in the state space drops sharply. If two trackers compete for the matching right of the same detection result, the track whose position information has not been updated for a long time has the larger covariance; since the Mahalanobis distance uses the inverse of the covariance, that track obtains the smaller distance, so the detection result tends to be associated with the track occluded for longer, an undesirable effect that often destroys the continuity of tracking. Cascade matching counteracts this, and in the final stage IOU-based matching is also performed on tracks in the unconfirmed state and on recently unmatched tracks.
4) The fusion of a twin network and optimal stacked blocks effectively solves the repeated judgment of the same damage caused by occlusion, illumination changes, and the like of the engine damage target.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of the invention as a whole.
FIG. 2 is a fusion twin network and optimal stacked block flow diagram.
Detailed Description
The invention is described in further detail below with reference to the following figures and detailed description:
the specific embodiment of the invention is as follows:
Firstly, in the acquired borescope video of the aircraft engine, for the original frames of the given video, a corresponding track is established for each detection box of the first frame. Specifically, step one includes the following substeps:
step 1.1, detecting a video frame by adopting a target detector to obtain a detection frame;
step 1.2, marking the corresponding damage in all target boxes and extracting features; the features comprise appearance features and motion features;
and 1.3, initializing the motion variables of Kalman filtering, and predicting the corresponding prediction frame through Kalman filtering.
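Step 1.3 (initializing the Kalman motion variables and predicting a prediction box) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a DeepSORT-style 8-dimensional state (box center, aspect ratio, height, and their velocities) with a constant-velocity model, and the noise matrices are arbitrary placeholder values.

```python
import numpy as np

def init_kalman(box):
    """box = (cx, cy, aspect, height); state = [box, velocities]."""
    x = np.zeros(8)
    x[:4] = box
    P = np.eye(8)                  # initial state covariance (placeholder)
    return x, P

def make_F(dt=1.0):
    """Constant-velocity transition matrix: position += velocity * dt."""
    F = np.eye(8)
    for i in range(4):
        F[i, i + 4] = dt
    return F

def predict(x, P, Q=None):
    """Kalman prediction: prior estimate and prior error covariance."""
    F = make_F()
    Q = np.eye(8) * 0.01 if Q is None else Q
    x_pred = F @ x                 # prior estimate
    P_pred = F @ P @ F.T + Q       # prior error covariance
    return x_pred, P_pred

x, P = init_kalman((50.0, 60.0, 0.5, 80.0))
x[4:6] = (2.0, -1.0)               # assume a known velocity for illustration
x1, P1 = predict(x, P)             # predicted box center moves by the velocity
```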
And step two, performing IOU matching on the detection frame and the track frame of the previous frame one by one, and calculating a cost matrix of the detection frame and the track frame.
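The IOU matching and cost matrix of step two can be illustrated with a small sketch; the boxes and the 1 - IOU cost convention are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iou_cost_matrix(tracks, detections):
    """Cost = 1 - IOU, so a perfect overlap has zero cost."""
    return np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])

tracks = [(0, 0, 10, 10)]
dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
C = iou_cost_matrix(tracks, dets)   # identical box -> cost 0, disjoint -> 1
```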
And step three, taking all the cost matrixes obtained in the step two as the input of the Hungarian algorithm, and obtaining a linear matching result through the Hungarian algorithm. Specifically, step three includes the following substeps:
step 3.1, subtracting from each row of the cost matrix its minimum element;
step 3.2, subtracting from each column of the resulting matrix its minimum element;
step 3.3, covering all zeros of the new matrix with the fewest possible row and column lines and checking whether the current assignment is optimal; if the number of lines used is less than the dimension of the matrix, entering step 3.4, otherwise entering step 3.5;
step 3.4, finding the minimum among the elements not covered by any row or column line, subtracting it from all uncovered elements, adding it to the elements at the intersections of row and column lines, and returning to step 3.3;
step 3.5, finding the 0 elements corresponding to each row and column and deriving the optimal assignment from them;
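For illustration, the optimal assignment that steps 3.1-3.5 compute can be verified on a small matrix by exhaustive search; a real implementation would use the polynomial-time reduction steps above (or a library routine), so this brute-force helper is purely a sketch:

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustive search for the minimum-cost one-to-one assignment.
    The Hungarian algorithm finds the same result in polynomial time;
    brute force is used here only to make the objective concrete."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return list(best_perm), best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = optimal_assignment(cost)  # row i matched to column assignment[i]
```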
step 3.6, to incorporate the motion information, the square of the Mahalanobis distance between the predicted Kalman filter state and the newly arrived measurement is used, i.e. the first metric, denoted d^(1):

d^(1)(i,j) = (d_j - x_i)^T S_i^(-1) (d_j - x_i)

wherein d^(1)(i,j) represents the degree of motion match between the j-th detection box and the i-th track, x_i is the predicted observation of the track at the current time, S_i is the covariance matrix of the observation space at the current time predicted by the Kalman filter, and d_j denotes the state of the j-th detection box;
step 3.7, using the metric of step 3.6, low-probability associations are eliminated by setting the threshold of the Mahalanobis distance to the 95% confidence interval computed from the inverse chi-square distribution, with an indicator representing this decision:

b^(1)_(i,j) = 1[ d^(1)(i,j) ≤ t^(1) ]

wherein the corresponding Mahalanobis threshold is t^(1); when d^(1)(i,j) is greater than t^(1), b^(1)_(i,j) is set to 0, and otherwise to 1; a value of 0 indicates that the Mahalanobis distance between the i-th track box and the j-th detection box does not match, and a value of 1 indicates a match;
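Steps 3.6-3.7 can be sketched numerically as follows; the state vectors, the identity covariance, and the 4-degree-of-freedom chi-square threshold are illustrative assumptions:

```python
import numpy as np

CHI2_95_4DOF = 9.4877   # 95% quantile of the chi-square distribution, 4 dof

def mahalanobis_sq(det, track_pred, S):
    """Squared Mahalanobis distance between a detection d_j and a track's
    predicted observation x_i with innovation covariance S_i."""
    diff = det - track_pred
    return float(diff @ np.linalg.inv(S) @ diff)

def gate(d1, threshold=CHI2_95_4DOF):
    """Indicator of step 3.7: 1 if the association is admissible, else 0."""
    return 1 if d1 <= threshold else 0

det = np.array([52.0, 59.0, 0.5, 80.0])
pred = np.array([51.0, 60.0, 0.5, 80.0])
S = np.eye(4)                        # assumed innovation covariance
d1 = mahalanobis_sq(det, pred, S)    # with S = I this is squared Euclidean
```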
step 3.8, a second metric, denoted d^(2), measures the minimum cosine distance between the i-th track box and the j-th detection box:

d^(2)(i,j) = min{ 1 - r_j^T r_i^(k) | r_i^(k) ∈ R_i }

wherein R_i is a container storing the feature vectors of the matched track box, r_j is the feature vector of the j-th detection box, and r_i^(k) denotes the k-th feature vector of the i-th track box;
step 3.9, introduce a binary variable to indicate whether the metric can accept an association:

b^(2)_(i,j) = 1[ d^(2)(i,j) ≤ t^(2) ]

wherein t^(2) is a threshold; if d^(2)(i,j) is greater than t^(2), b^(2)_(i,j) is set to 0, otherwise to 1; a value of 0 indicates that the cosine distance between the i-th track box and the j-th detection box does not match, and a value of 1 indicates a match;
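The appearance metric of steps 3.8-3.9 can be sketched as follows, assuming (as in DeepSORT-style trackers) that the feature vectors are L2-normalized so that the dot product equals the cosine similarity:

```python
import numpy as np

def min_cosine_distance(track_features, det_feature):
    """d2(i,j) = min over the track's stored unit-norm feature vectors
    r_i^(k) of 1 - r_j^T r_i^(k)."""
    R = np.asarray(track_features)
    r = np.asarray(det_feature)
    return float(np.min(1.0 - R @ r))

# Unit vectors for illustration: one stored feature matches the detection.
track_feats = [[1.0, 0.0], [0.0, 1.0]]
det_feat = [1.0, 0.0]
d2 = min_cosine_distance(track_feats, det_feat)
```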
step 3.10, to build the association problem, the two metrics are combined using a weighted sum:

h(i,j) = λ d^(1)(i,j) + (1 - λ) d^(2)(i,j)

wherein d^(1)(i,j) is defined in step 3.6, d^(2)(i,j) is defined in step 3.8, and λ is a weight in the range 0-1;
step 3.11, an association is called acceptable only if it lies within the gating region of both metrics:

b_(i,j) = b^(1)_(i,j) · b^(2)_(i,j)
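Steps 3.10-3.11 then reduce to a weighted sum plus a product of gates; the λ value and the appearance threshold below are placeholders, not values from the patent:

```python
def combined_cost(d1, d2, lam=0.5):
    """Step 3.10: weighted sum of the motion and appearance metrics."""
    return lam * d1 + (1.0 - lam) * d2

def admissible(d1, d2, t1=9.4877, t2=0.2):
    """Step 3.11: accepted only if inside the gating region of BOTH
    metrics (product of the two indicator variables)."""
    return int(d1 <= t1) * int(d2 <= t2)

h = combined_cost(2.0, 0.1, lam=0.5)
ok = admissible(2.0, 0.1)            # both metrics within their thresholds
```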
There are three possible linear-matching results:
the first is an unmatched track, which is deleted directly;
the second is a mismatch between the prediction result and the detection result, in which case the detection result is initialized as a new track;
the third is a successful match between the detection box of the next frame and the prediction box of the previous frame, i.e. the two adjacent frames are successfully tracked; the track variables corresponding to the detection result of the next frame are then updated through Kalman filtering.
And step four, repeating step two and step three until the matching result yields a track in the confirmed state or all frames are processed.
And step five, predicting frames corresponding to the track of the confirmed state and the track of the unconfirmed state through Kalman filtering, and performing cascade matching on the track frame of the confirmed state and the detection frame. Specifically, the step five includes the following substeps:
step 5.1, Kalman filter prediction: the value at the next moment is predicted from the estimate at the previous moment (the prior estimate), and the error at the next moment is predicted at the same time (the prior error);
There are three kinds of cascade-matching results:
first, a track match: the corresponding track variables are updated through Kalman filtering;
second, an unmatched track: the tracks previously in the unconfirmed state, together with the unmatched tracks, are IOU-matched one by one against the unmatched detection boxes, and the cost matrix of the unmatched tracks is calculated from the IOU-matching result;
third, an unmatched detection: likewise, the tracks previously in the unconfirmed state and the unmatched tracks are matched one by one against the unmatched detection boxes, and the cost matrix of the unmatched detection boxes is then calculated from the IOU-matching result.
And step six, taking all the cost matrixes obtained in the step five as input of the Hungarian algorithm, and obtaining a linear matching result through the Hungarian algorithm. Specifically, the sixth step includes the following substeps:
step 6.1, if the tracks are mismatched, directly deleting the mismatched tracks, otherwise, entering step 6.2;
step 6.2, if the detection results are mismatched, initializing the detection results into a new track, otherwise, entering step 6.3;
and 6.3, if the detection frame and the predicted frame are successfully matched, namely the previous frame and the next frame are successfully tracked, updating the corresponding track variables of the detection results through Kalman filtering. The method comprises the following specific steps:
step 6.3.1, updating Kalman filtering, namely firstly calculating Kalman gain, then calculating posterior estimation by using the prior estimation of the previous step, and updating the prior error to obtain a posterior error;
step 6.3.2, extracting features of the damage at this moment and performing re-identification; a de-duplication operation is applied when the same damage appears in two adjacent frames but is assigned different ID values.
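The Kalman update of step 6.3.1 (gain, posterior estimate, posterior error) can be sketched in one dimension; the noise values are illustrative assumptions:

```python
def kalman_update(x_prior, p_prior, z, r=1.0):
    """1-D Kalman update: compute the gain, correct the prior estimate
    with the measurement z, and shrink the prior error into a posterior
    error. r is the (assumed) measurement-noise variance."""
    k = p_prior / (p_prior + r)           # Kalman gain
    x_post = x_prior + k * (z - x_prior)  # posterior estimate
    p_post = (1.0 - k) * p_prior          # posterior error
    return x_post, p_post, k

x_post, p_post, k = kalman_update(x_prior=10.0, p_prior=1.0, z=12.0, r=1.0)
```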
And step seven, sending the matching result obtained in the step six into a model fusing the twin convolutional neural network and the optimal stacking block for similarity judgment, and detecting and judging the obtained damage. Specifically, step seven includes the following substeps:
step 7.1, for two adjacent frames of images, segmenting each of them into several stacked blocks, which serve as sub-images;
step 7.2, processing each sub-image independently into a preliminary damage-similarity verification decision;
step 7.3, judging the damage similarity of the two groups of sub-images and outputting a similarity score; if the score does not exceed the threshold, the two damages are not the same; the sub-images judged with low relevance are discarded, and the sub-images judged with high relevance are kept to obtain the best verification precision;
step 7.4, fusing the twin convolutional neural network with the optimal stacked blocks, training to obtain a pre-trained model, extracting the features of the optimally selected sub-images, and fusing them into a single feature vector;
step 7.5, passing the feature set formed by these feature vectors to a classifier to obtain the final damage-similarity verification decision;
step 7.6, obtaining and comparing the damage-similarity verification results, thereby reducing the misjudgment rate of the same damage in aviation hole-detection video.
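Step 7.1's segmentation into stacked blocks can be sketched as follows; the grid size and the assumption that the image divides evenly are illustrative:

```python
import numpy as np

def split_into_blocks(img, rows, cols):
    """Cut an image into rows*cols equally sized sub-images (the
    'stacked blocks'). Assumes the image dimensions divide evenly."""
    h, w = img.shape[:2]
    bh, bw = h // rows, w // cols
    return [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]

img = np.arange(64).reshape(8, 8)      # stand-in for a grayscale frame
blocks = split_into_blocks(img, 2, 2)  # 4 sub-images of shape (4, 4)
```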
Further, step 7.4 comprises the following sub-steps:
step 7.4.1, adopting a supervised, end-to-end, feature-similarity-based deep twin (Siamese) convolutional neural network; unlike other methods, this feature-similarity architecture uses the twin network to verify damage similarity from the whole image with a single feature extraction, then applies a single similarity measure to the merged feature vectors to compute their similarity and thereby judge whether two damages are the same;
step 7.4.2, training with the twin convolutional neural network and the optimal stacked blocks to obtain the best training model, so as to strengthen the judgment of damage similarity.
Further, step 7.5 comprises the following sub-steps:
The result of the classifier's decision is further processed to improve the performance of damage-similarity judgment:
irrelevant parts are deleted from the sub-image blocks, and only the strongly correlated common sub-images are kept as relational evidence of damage similarity, so as to minimize the distance between sub-image pairs of the same damage and maximize the distance between pairs of different damages;
if the original image is segmented into m blocks, i.e. m sub-images, they form m image pairs with the image to be examined;
The similarity scores of the m image pairs are retrieved. Suppose there are n containers C = (c_1, c_2, c_3, ..., c_n), each container having a parent set F_p and a subset S_c. Given P_a ∈ F_p = (P_{a,1}, P_{a,2}, P_{a,3}, ..., P_{a,n}) and P_b ∈ S_c = (P_{b,1}, P_{b,2}, P_{b,3}, ..., P_{b,n}), P_a and P_b denote sub-images drawn from the m sub-image pairs belonging to F_p and S_c respectively;
for each container c, the sub-image difference between P_a and P_b gives C = (c_1(P_a - P_b), c_2(P_a - P_b), c_3(P_a - P_b), ..., c_n(P_a - P_b));
then the test set T = { e_1(c_1(P_a - P_b)), e_2(c_2(P_a - P_b)), e_3(c_3(P_a - P_b)), ..., e_n(c_n(P_a - P_b)) } is formed, wherein e_n denotes the similarity-accuracy evaluation, c_n denotes a container, and P_a and P_b denote sub-images;
the accuracies e_n obtained in the previous step are used to determine the most similar appearance components between the two damages;
after the accuracy scores are obtained, the damage components are located and selected according to a threshold specifying similarity, so that the sub-image representations sharing the most similar cues are kept and the dissimilar portions are ignored.
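The threshold-based selection of the most similar sub-image pairs described above can be sketched as follows; the scores and the threshold are invented for illustration:

```python
def select_similar_blocks(scores, threshold):
    """Keep only sub-image pairs whose similarity accuracy e_n meets the
    threshold, i.e. discard the dissimilar parts. Returns the indices of
    the retained pairs, highest score first."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return sorted(kept, key=lambda i: scores[i], reverse=True)

scores = [0.92, 0.40, 0.77, 0.15]   # assumed per-pair accuracy scores
best = select_similar_blocks(scores, threshold=0.5)
```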
And step eight, repeating steps five, six and seven until all video frames are processed.
The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and embellishments can be made without departing from the principle of the present invention, and these should also be construed as the scope of the present invention.

Claims (10)

1. An intelligent detection and judgment method for repeated damage in an aviation hole detection video is characterized by comprising the following steps:
firstly, establishing a corresponding track frame for a detection frame of a first frame in an acquired aviation hole detection video for an original frame of a given video;
step two, carrying out IOU matching on the detection frame and the track frame of the previous frame one by one, and calculating a cost matrix of the detection frame and the track frame;
step three, taking all the cost matrixes obtained in the step two as input of the Hungarian algorithm, and obtaining a linear matching result through the Hungarian algorithm;
step four, circulating the step two and the step three until a matched result yields a track in the confirmed state or all frames are processed;
predicting a confirmed state track prediction frame and an unconfirmed state track prediction frame of a matching result through Kalman filtering; then, cascade matching is carried out on the track prediction frame in the confirmed state and the detection frame of the original video frame detection result, and a corresponding cost matrix is calculated according to the matching result;
step six, taking all the cost matrixes obtained in the step five as input of the Hungarian algorithm, and obtaining a linear matching result through the Hungarian algorithm;
step seven, sending the matching result obtained in the step six into a model fusing the twin convolutional neural network and the optimal stacking block for similarity judgment, and detecting and judging the obtained damage;
and step eight, circulating the step five to the step seven until all the frames are processed.
2. The method for intelligently detecting and determining repetitive damage in an aviation hole detection video according to claim 1, wherein in the first step:
step 1.1, detecting a video frame by adopting a target detector to obtain a detection frame;
step 1.2, marking out corresponding damages in all target frames, and performing feature extraction; the characteristics comprise apparent characteristics and motion characteristics;
and 1.3, initializing the motion variables of Kalman filtering, and predicting the corresponding prediction frame through Kalman filtering.
3. The method for intelligently detecting and determining the repetitive damage in the aviation hole detection video according to the claim 1, characterized in that, in the third step:
step 3.1, subtracting the minimum element of each row of the cost matrix;
3.2, generating each column of elements of the matrix and subtracting the minimum element of the column;
3.3, connecting all zeros in the new matrix by using the least row lines and column lines, and checking whether the current distribution is optimal; if the row and column lines do not connect all elements of the matrix, go to step 3.4, otherwise go to step 3.5;
3.4, finding the minimum element in the elements which are not connected with the row lines and the column lines, subtracting the minimum element from the residual elements, and adding the minimum element to the element corresponding to the intersection point of the row lines and the column lines;
step 3.5, finding out 0 elements corresponding to each row and 0 elements corresponding to the columns, and finding out optimal distribution according to the 0 elements;
step 3.6, to incorporate the motion information, the square of the Mahalanobis distance between the predicted Kalman filter state and the newly arrived measurement is used, i.e. the first metric, denoted d^(1):

d^(1)(i,j) = (d_j - x_i)^T S_i^(-1) (d_j - x_i)

wherein d^(1)(i,j) represents the degree of motion match between the j-th detection box and the i-th track, x_i is the predicted observation of the track at the current time, S_i is the covariance matrix of the observation space at the current time predicted by the Kalman filter, and d_j denotes the state of the j-th detection box;
step 3.7, using the index of step 3.6, eliminates low probability associations by setting the threshold for mahalanobis distance to 95% of the confidence interval calculated by the inverse chi-square distribution, with an indicator to indicate this decision:
b^{(1)}_{i,j} = 1[d^{(1)}(i,j) \le t^{(1)}]
wherein the corresponding Mahalanobis threshold is t^{(1)}; when d^{(1)}(i,j) is greater than t^{(1)}, b^{(1)}_{i,j} is set to 0, otherwise to 1; when b^{(1)}_{i,j} is 0, the Mahalanobis distance between the i-th track frame and the j-th detection frame does not match, and when it is 1, it matches;
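A minimal sketch of the Mahalanobis gating of steps 3.6-3.7, for illustration only; the state layout and the diagonal covariance are assumptions, and 9.4877 is the 95% chi-square quantile for four degrees of freedom:

```python
import numpy as np

def mahalanobis_gate(x_pred, S, d_det, t1=9.4877):
    # d^(1)(i,j): squared Mahalanobis distance between the Kalman-predicted
    # observation x_pred and the detection d_det (step 3.6); t1 implements
    # the chi-square gate of step 3.7.
    diff = d_det - x_pred
    d1 = float(diff @ np.linalg.inv(S) @ diff)
    b1 = 1 if d1 <= t1 else 0   # admission indicator b^(1)_{i,j}
    return d1, b1

x_pred = np.array([10., 20., 2., 50.])   # predicted box state (assumed layout)
S = np.eye(4) * 4.0                       # predicted observation covariance
d_close = np.array([11., 21., 2.1, 51.])  # detection near the prediction
d_far = np.array([40., 80., 3.0, 90.])    # detection far from the prediction
assert mahalanobis_gate(x_pred, S, d_close)[1] == 1
assert mahalanobis_gate(x_pred, S, d_far)[1] == 0
```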
step 3.8, measuring the minimum cosine distance between the i-th track frame and the j-th detection frame with a second metric, denoted d^{(2)}:
d^{(2)}(i,j) = min{1 - r_j^T r_i^{(k)} | r_i^{(k)} \in R_i}
wherein R_i is a container storing the feature vectors of the track frames of the matching results, r_j is the feature vector of the j-th detection frame, and r_i^{(k)} denotes the k-th feature vector of the i-th track frame;
step 3.9, a binary variable is introduced to indicate whether the metric admits an association:
b^{(2)}_{i,j} = 1[d^{(2)}(i,j) \le t^{(2)}]
wherein t^{(2)} is a threshold; when d^{(2)}(i,j) is greater than t^{(2)}, b^{(2)}_{i,j} is set to 0, otherwise to 1; when b^{(2)}_{i,j} is 0, the cosine distance between the i-th track frame and the j-th detection frame does not match, and when it is 1, it matches;
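The appearance metric of steps 3.8-3.9 can be sketched as follows; a simplified illustration, assuming the feature vectors produced by the re-identification stage are L2-normalised:

```python
import numpy as np

def min_cosine_distance(R_i, r_j):
    # d^(2)(i,j) = min over stored track features r_i^(k) of 1 - r_j^T r_i^(k)
    # (step 3.8); all feature vectors are assumed L2-normalised.
    return min(1.0 - float(r_j @ r_k) for r_k in R_i)

def unit(v):
    # L2-normalise a feature vector.
    return v / np.linalg.norm(v)

R_i = [unit(np.array([1.0, 0.0])), unit(np.array([1.0, 1.0]))]  # track container
r_same = unit(np.array([1.0, 0.1]))   # appearance close to the track
r_diff = unit(np.array([0.0, 1.0]))   # appearance unlike the track
assert min_cosine_distance(R_i, r_same) < min_cosine_distance(R_i, r_diff)
```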
step 3.10, to build the association problem, the two metrics are combined by a weighted sum:
h_{i,j} = \lambda d^{(1)}(i,j) + (1 - \lambda) d^{(2)}(i,j)
wherein d^{(1)}(i,j) is defined in step 3.6, d^{(2)}(i,j) is defined in step 3.8, and \lambda is a weight ranging from 0 to 1;
step 3.11, an association is called admissible if it lies within the gated region of both metrics:
b_{i,j} = \prod_{m=1}^{2} b^{(m)}_{i,j}
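The weighted combination and the two-gate admissibility test of steps 3.10-3.11 amount to the following trivial sketch, under the notation above:

```python
def combined_cost(d1, d2, lam=0.5):
    # Step 3.10: h_{i,j} = lambda * d^(1) + (1 - lambda) * d^(2);
    # lam = 0.5 here is only an illustrative choice.
    return lam * d1 + (1 - lam) * d2

def admissible(b1, b2):
    # Step 3.11: an association is accepted only if it passes both gates.
    return b1 * b2

assert combined_cost(2.0, 0.4, lam=0.5) == 1.2
assert admissible(1, 1) == 1 and admissible(1, 0) == 0
```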
the linear matching yields three kinds of results:
the first is an unmatched track, which is deleted directly;
the second is a mismatch between the prediction result and the detection result, in which case the detection result is initialized as a new track;
the third is that the detection frame of the next frame is successfully matched with the prediction frame of the previous frame, i.e. the two adjacent frames are tracked successfully; the corresponding track variables of the detection result of the next frame are then updated by Kalman filtering.
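The three outcomes of linear matching can be sketched as a dispatch over the assignment result; this helper is hypothetical, tracks are kept in a plain dict keyed by ID, and the Kalman update is abbreviated to storing the matched detection:

```python
def handle_match_result(kind, track_id, detection, tracks, next_id):
    # Outcome 1: delete unmatched tracks; outcome 2: start new tracks from
    # unmatched detections; outcome 3: update matched tracks.
    if kind == "unmatched_track":
        tracks.pop(track_id, None)      # outcome 1: drop the track
        return next_id
    if kind == "unmatched_detection":
        tracks[next_id] = detection     # outcome 2: initialize a new track
        return next_id + 1
    tracks[track_id] = detection        # outcome 3: Kalman-updated state (abbreviated)
    return next_id

tracks = {0: (1, 1, 3, 3)}
next_id = handle_match_result("unmatched_detection", None, (5, 5, 7, 7), tracks, 1)
assert 1 in tracks and next_id == 2
next_id = handle_match_result("unmatched_track", 0, None, tracks, next_id)
assert 0 not in tracks
```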
4. The intelligent detection and determination method for repetitive damage in an aviation hole detection video according to claim 1, characterized in that in the fifth step:
step 5.1, Kalman filtering prediction: the value at the next moment is predicted from the estimate at the previous moment, i.e. the a priori estimate, and the error at the next moment, i.e. the a priori error, is predicted at the same time;
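Step 5.1 (a priori estimate and a priori error) can be sketched as follows, assuming a linear constant-velocity motion model F with process noise Q:

```python
import numpy as np

def kalman_predict(x_post, P_post, F, Q):
    # A priori estimate and a priori error from the previous posterior,
    # under linear motion model F with process noise Q (step 5.1).
    x_prior = F @ x_post               # a priori estimate
    P_prior = F @ P_post @ F.T + Q     # a priori error covariance
    return x_prior, P_prior

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity model (assumption)
Q = np.eye(2) * 0.01                   # process noise (assumption)
x_prior, P_prior = kalman_predict(np.array([0.0, 1.0]), np.eye(2), F, Q)
assert np.allclose(x_prior, [1.0, 1.0])
assert P_prior[0, 0] > 1.0             # uncertainty grows during prediction
```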
there are three kinds of cascade matching results:
firstly, track matching, namely updating the corresponding track variable of the track by Kalman filtering;
secondly, track mismatch: the tracks previously in an uncertain state and the unmatched tracks are matched one by one against the unmatched detection frames by IOU, and the corresponding cost matrix is computed from the IOU matching result;
and thirdly, detection mismatch: the tracks previously in an uncertain state and the unmatched tracks are likewise matched one by one against the unmatched detection frames by IOU, and the corresponding cost matrix is computed from the IOU matching result.
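The IOU matching applied to the leftover tracks and detections can be sketched as follows; boxes are assumed to be in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2),
    # used to re-match tracks and detections left over after cascade matching.
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def iou_cost_matrix(tracks, detections):
    # Cost matrix fed back to the assignment step: cost = 1 - IOU.
    return [[1.0 - iou(t, d) for d in detections] for t in tracks]

assert iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0   # identical boxes
assert iou((0, 0, 1, 1), (2, 2, 3, 3)) == 0.0   # disjoint boxes
```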
5. The intelligent detection and determination method for repetitive damage in an aviation hole detection video according to claim 1, characterized in that in the sixth step:
6.1, if the tracks are mismatched, directly deleting the mismatched tracks, otherwise, entering the step 6.2;
step 6.2, if the detection results are mismatched, initializing the detection results into a new track, otherwise, entering step 6.3;
and 6.3, if the detection frame and the predicted frame are successfully matched, namely the previous frame and the next frame are successfully tracked, updating the corresponding track variable of the detection result through Kalman filtering.
6. The intelligent detection and determination method for repetitive damage in an aviation hole detection video according to claim 5, characterized in that in step 6.3:
step 6.3.1, kalman filtering updating, namely firstly calculating Kalman gain, then calculating posterior estimation by using the prior estimation of the previous step, and updating the prior error to obtain a posterior error;
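Step 6.3.1 can be sketched as a standard Kalman update; the measurement model H (observing position only) is an assumption made for illustration:

```python
import numpy as np

def kalman_update(x_prior, P_prior, z, H, R):
    # Step 6.3.1: Kalman gain, posterior estimate, posterior error.
    S = H @ P_prior @ H.T + R                  # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_post = x_prior + K @ (z - H @ x_prior)   # a posteriori estimate
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior  # a posteriori error
    return x_post, P_post

x_prior = np.array([0.0, 1.0])       # e.g. position and velocity
P_prior = np.eye(2) * 2.0            # a priori error covariance
H = np.array([[1.0, 0.0]])           # we observe position only (assumption)
R = np.array([[0.5]])                # measurement noise
z = np.array([0.4])                  # new measurement
x_post, P_post = kalman_update(x_prior, P_prior, z, H, R)
# The measurement pulls the estimate toward z and shrinks the error.
assert 0.0 < x_post[0] < 0.4
assert P_post[0, 0] < P_prior[0, 0]
```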
step 6.3.2, performing feature extraction on the damage at this moment and re-identifying it; performing de-duplication for the case where the same damage appears in the previous and next frames but is assigned different ID values.
7. The intelligent detection and judgment method for repetitive damage in an aviation hole detection video according to claim 1, characterized in that in step seven:
step 7.1, for two adjacent frames of images, dividing each of them into a number of stacked blocks, which serve as sub-images;
step 7.2, processing each sub-image independently to produce a preliminary damage similarity verification decision;
step 7.3, judging the damage similarity of the two groups of sub-images and outputting a similarity score;
if the score does not exceed the threshold, the two are not the same damage; the irrelevant sub-images with low scores are discarded, and the sub-images with high scores are selected to obtain the best verification accuracy;
step 7.4, fusing a twin convolutional neural network with the optimal stacked blocks, training it to obtain a pre-trained model, extracting the features of the optimally selected sub-images, and fusing the features into a single feature vector;
7.5, transmitting the feature set formed by the feature vector to a classifier to obtain a final damage similarity verification decision;
and 7.6, obtaining a damage similarity verification result, comparing, and reducing the misjudgment rate of the same damage in the aviation hole detection video.
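The block splitting of step 7.1 can be sketched as follows; a minimal illustration in which the grid size is an assumption:

```python
import numpy as np

def split_into_blocks(img, rows, cols):
    # Step 7.1: split a frame into rows*cols stacked blocks (sub-images).
    h, w = img.shape[:2]
    bh, bw = h // rows, w // cols
    return [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]

frame = np.arange(36).reshape(6, 6)       # toy 6x6 "frame"
blocks = split_into_blocks(frame, 2, 3)   # 2x3 grid of sub-images
assert len(blocks) == 6
assert blocks[0].shape == (3, 2)
```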
8. The method as claimed in claim 7, wherein said step 7.4 comprises:
step 7.4.1, adopting a supervised, end-to-end, deep connected convolutional neural network based on feature similarity;
the convolutional neural network is a connected network architecture based on feature similarity: a twin (Siamese) network verifies damage similarity from the whole image through a single feature extraction pass, and a single similarity metric is then applied to the merged feature vectors to compute the similarity between them and judge whether the two damages are the same damage;
and 7.4.2, training by using a twin convolutional neural network and the optimal stacking block to obtain the best training model so as to enhance the judgment of injury similarity.
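The twin architecture of step 7.4.1 can be sketched as follows; this is an untrained toy illustration in which random projection weights and cosine similarity stand in for the learned feature extractor and similarity metric:

```python
import numpy as np

def shared_embed(x, W):
    # Stand-in for the shared feature extractor: both branches use the
    # same weights W, as in a twin (Siamese) architecture.
    return np.tanh(W @ x.ravel())

def siamese_similarity(img_a, img_b, W):
    # Single similarity measure over the two embeddings; cosine similarity
    # here is a placeholder for the learned metric of step 7.4.1.
    fa, fb = shared_embed(img_a, W), shared_embed(img_b, W)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))             # shared projection weights
a = rng.standard_normal((4, 4))              # a damage patch
b = a + 0.01 * rng.standard_normal((4, 4))   # near-duplicate of the same damage
c = -a                                       # sign-flipped patch, maximally dissimilar here
assert siamese_similarity(a, b, W) > siamese_similarity(a, c, W)
```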
9. The method as claimed in claim 7, wherein said step 7.5 comprises:
the result judged by the classifier is further processed to improve the performance of damage similarity judgment:
the irrelevant parts are deleted from the sub-image blocks, and only the strongly correlated common sub-images are kept as relational evidence of damage similarity, so as to minimize the distance between a pair of sub-images of the same damage, or maximize the distance between a pair of sub-images that are not the same damage;
if the original image is segmented into m blocks, i.e. m sub-images, they form m image pairs with the image to be detected;
calling the similarity scores of the foregoing m image pairs; suppose there are N containers C = (c_1, c_2, c_3, ..., c_n), each container having a parent set F_p and a subset S_c;
given P_a ∈ F_p = (P_{a,1}, P_{a,2}, P_{a,3}, ..., P_{a,n}) and P_b ∈ S_c = (P_{b,1}, P_{b,2}, P_{b,3}, ..., P_{b,n}), wherein P_a ∈ F_p and P_b ∈ S_c respectively indicate membership of the m sub-image pairs in F_p and S_c, and P_a and P_b each represent a sub-image;
for each container C, the sub-images P_a and P_b give C = (c_1(P_a - P_b), c_2(P_a - P_b), c_3(P_a - P_b), ..., c_n(P_a - P_b));
then, based on the test set T = {e_1(c_1(P_a - P_b)), e_2(c_2(P_a - P_b)), e_3(c_3(P_a - P_b)), ..., e_n(c_n(P_a - P_b))}, wherein e_n represents the similarity-accuracy evaluation, c_n denotes a container, and P_a and P_b represent sub-images;
the accuracy e_n obtained in the previous step is used to determine the most similar appearance components between the two damages;
after the accuracy scores are obtained, the damage components are located and selected according to a threshold specifying the similarity, so that the sub-image representations sharing the most similar cues ignore the dissimilar parts.
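The threshold-based selection of the most similar components in step 7.5 can be sketched as follows; the scores and threshold are hypothetical:

```python
def select_similar_components(scores, threshold):
    # Keep only sub-image pairs whose similarity accuracy e_n meets the
    # threshold; the rest are ignored as dissimilar parts.
    return [idx for idx, s in enumerate(scores) if s >= threshold]

scores = [0.92, 0.40, 0.88, 0.15]   # per-block similarity accuracies e_1..e_4
assert select_similar_components(scores, 0.8) == [0, 2]
```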
10. The method as claimed in claim 1, wherein the aviation hole probe video is an aviation engine internal hole probe video.
CN202210820374.2A 2022-07-13 2022-07-13 Intelligent detection and judgment method for repeated damage in aviation hole exploration video Pending CN115147385A (en)

Publications (1)

Publication Number Publication Date
CN115147385A true CN115147385A (en) 2022-10-04

Family

ID=83412847


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117854011A (en) * 2024-03-07 2024-04-09 福建南亿智能科技有限公司 Intelligent AI camera recognition comparison method and system
CN117854011B (en) * 2024-03-07 2024-05-03 福建南亿智能科技有限公司 Intelligent AI camera recognition comparison method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination