WO2019232831A1

WO2019232831A1 - Method and device for recognizing foreign object debris at airport, computer apparatus, and storage medium

Info

Publication number: WO2019232831A1
Application number: PCT/CN2018/092614
Authority: WO
Inventors: 叶明�
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-06-06
Filing date: 2018-06-25
Publication date: 2019-12-12
Also published as: CN108764202B; CN108764202A

Abstract

A method and device for recognizing foreign object debris at airport, a computer apparatus, and a storage medium. The method comprises: if a foreign object debris detection model detects foreign object debris in an image undergoing detection, acquiring a preset number of images adjacent to said image to form a set of recognition images; acquiring a comparison result according to the level of similarity between feature vectors of the foreign object debris at corresponding positions in the set of recognition images and a reference feature vector; and generating a recognition result according to the comparison result. The invention can prevent an image undergoing detection from being affected by changes in the surrounding environment, such as changes in light and shadow, so as to avoid incorrect determination of a recognition result, such that certain incorrectly determined samples can be filtered out in a process of foreign object debris recognition, thereby enhancing precision of foreign object debris recognition.

Description

Airport foreign object identification method, device, computer equipment and storage medium

This application is based on a Chinese invention patent application filed on June 06, 2018 with application number 201810574129.1 and entitled "Airport Foreign Object Identification Method, Device, Computer Equipment, and Storage Medium", and claims its priority.

Technical field

The present application relates to the field of image recognition, and in particular, to a method, a device, a computer device, and a storage medium for identifying foreign objects in an airport.

Background technique

Various anomalous objects often appear in airport runways. They are called FOD (Foreign Object Debris). FOD refers to some kind of foreign material that may damage aircraft or systems. It is often called runway foreign body. There are many types of FOD, such as aircraft and engine connections (nuts, screws, washers, fuses, etc.), machine tools, flying items (nails, personal documents, pens, pencils, etc.), wild animals, leaves, stones and sand, Road surface materials, wooden blocks, plastic or polyethylene materials, paper products, ice cream in the running area, etc. Experiments and cases have shown that foreign objects on the airport pavement can be easily drawn into the engine, causing the engine to fail. Debris can also accumulate in mechanical devices, affecting the normal operation of equipment such as landing gear and flaps.

However, due to the development of artificial intelligence, attempts have been made to use deep learning object detection models to detect foreign objects in airports. However, the existing deep learning object detection models are mainly divided into two types: two-stage detection (FastRCNN, FasterRCNN, etc.) and single-step detection (Single stage detector) models (FCN, SSD, etc.). In the case where the traditional two-step detection model has a very low ratio (less than one thousandth) of the object scene, the region selection is difficult and the calculation speed is slow. It is not suitable for scenarios with certain real-time requirements. The traditional single-step detection model is not sensitive enough for small objects, and for small objects, the final detection position is prone to deviation.

Summary of the Invention

Based on this, it is necessary to provide an airport foreign object recognition method, device, computer equipment, and storage medium that can improve the accuracy of airport foreign object recognition in view of the above technical problems.

An airport foreign body identification method includes:

Acquiring a detection image, detecting the detection image by using a foreign object detection model, and acquiring a detection result;

If the detection result is that there is a foreign object in the detection image, obtaining a position of the foreign object in the detection image as a reference position, and extracting a feature vector of the foreign object according to the reference position as a reference feature vector;

Acquiring successive predetermined frames of images according to the detected images to form a recognition image set;

Extracting a feature vector of each recognition image in the recognition image set according to the reference position, and comparing the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result is generated according to the comparison result, and the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.

An airport foreign body identification device includes:

A detection result acquisition module, configured to acquire a detection image, detect the detection image by using a foreign object detection model, and obtain a detection result;

A reference feature vector acquisition module, configured to obtain a position of the foreign object in the detection image if the detection result is that there is a foreign object in the detection image, and use it as a reference position to extract the foreign object according to the reference position; Eigenvectors as the reference eigenvectors;

A recognition image set composition module, configured to obtain consecutive predetermined frames of images according to the detection image to form a recognition image set;

A comparison result acquisition module, configured to extract a feature vector of each recognition image in the recognition image set according to the reference position, and compare the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result acquisition module is configured to generate a recognition result according to the comparison result, where the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.

A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:

One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below, and other features and advantages of the present application will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments of the application will be briefly introduced below. Obviously, the drawings in the following description are just some embodiments of the application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without paying creative labor.

FIG. 1 is a schematic diagram of an application environment of an airport foreign object identification method according to an embodiment of the present application; FIG.

FIG. 2 is an example diagram of an airport foreign object identification method in an embodiment of the present application; FIG.

FIG. 3 is an example diagram of step S10 of an airport foreign object recognition method according to an embodiment of the present application;

FIG. 4 is an example diagram of step S11 of an airport foreign object recognition method in an embodiment of the present application; FIG.

FIG. 5 is an example diagram of step S12 of the airport foreign object recognition method in an embodiment of the present application; FIG.

FIG. 6 is a principle block diagram of an airport foreign body identification device according to an embodiment of the present application; FIG.

FIG. 7 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed ways

In the following, the technical solutions in the embodiments of the present application will be clearly and completely described with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.

The airport foreign body identification method provided in this application can be applied in the application environment shown in FIG. 1, where a client (computer device) communicates with a server through a network. The client sends a detection image to the server, and the server recognizes the detection image and generates a recognition result. Among them, the client (computer device) can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, video capture devices, and portable wearable devices. The server can be implemented by an independent server or a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a method for identifying foreign objects at an airport is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:

S10: Obtain a detection image, use a foreign object detection model to detect the detection image, and obtain a detection result.

The detection image is formed by dividing the video data in the surveillance video of the airport into images of a predetermined frame at a certain time interval. Preferably, the detection images are sorted in chronological order, and then the detection images are detected using a foreign object detection model to obtain detection results. Optionally, the detection result includes detecting the presence or absence of a foreign object in the image. The foreign object detection model is a pre-trained recognition model. Optionally, the foreign object detection model can be a two-stage detection (FastRCNN, FasterRCNN, etc.) or a single-step detection (Single stage detection detector) model (FCN, SSD Etc.) to achieve. The detection image is detected through a foreign body detection model trained in advance, and the foreign body detection model outputs a detection result.

S20: If the detection result is that there is a foreign object in the detection image, obtain the position of the foreign object in the detection image as a reference position, and extract a feature vector of the foreign object according to the reference position as a reference feature vector.

The foreign object detection model is used to detect the detection image. If the foreign object detection model outputs a foreign object in the detection image, the position of the foreign object in the detection image is obtained, and a corresponding feature vector is extracted as a reference feature vector based on the position. Specifically, the detected foreign objects may be uniformly scaled to a predetermined size (such as 32 * 32), and then a feature vector may be extracted as a reference feature vector. Optionally, a color histogram and a direction gradient histogram (HOG, Histogram of Gradient) of a foreign object may be extracted to form a reference feature vector.

S30: Obtain consecutive predetermined frames of images according to the detected images to form a recognition image set.

Based on the detected image, corresponding consecutive predetermined frames of images are acquired to form a recognition image set. The continuous predetermined frame image refers to a predetermined predetermined frame image adjacent to the detection image in the video data where the detection image is located. For example, the next 20 frames of images corresponding to the detection image are acquired to form a recognition image set.

S40: Extract the feature vector of each recognition image in the recognition image set according to the reference position, and compare the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result.

The feature vector of each recognition image in the recognition image set is obtained according to the reference position, and the feature vector of each recognition image and the reference feature vector are compared to obtain the feature vector similarity. Specifically, algorithms such as the Ming distance, Euclidean distance, or Mahalanobis distance can be used to calculate the feature vector similarity between the feature vector of each recognition image and the reference feature vector. The calculated feature vector similarity is compared with a preset similarity threshold, and a comparison result is obtained, which may specifically be similar and dissimilar. For example: when the feature vector similarity is greater than or equal to the similarity threshold, the comparison result is similar; when the feature vector similarity is less than the similarity threshold, the comparison result is dissimilar.

S50: Generate a recognition result according to the comparison result. The recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.

The comparison result of each recognition image in the statistical recognition image set is statistically recognized. When the number of similarities in the comparison result is greater than or equal to the determination threshold, the recognition result is confirmed as a foreign object. When the number of similarities in the comparison result is less than the determination threshold, the recognition result is confirmed as a non-foreign object. The determination threshold may be set by identifying the number of images in the image set. For example, the determination threshold is 60%, 80%, or 90% of the number of images in the identification image set.

When detecting an image, it is possible that shadows are formed in the detection image due to the influence of changes in the surrounding environment (lighting, shadows, etc.). When using a foreign object detection model to detect a detection image, it is possible to identify the shadow as a foreign object. When the detection result of the foreign object detection model is the presence of foreign objects, the influence of the presence of shadows in the detection image on the recognition results is excluded by further comparing the similarity of the feature vectors at corresponding positions in the consecutive predetermined frames of the detection image.

In this embodiment, when a foreign object is detected in the detection image through the foreign object detection model, a continuous predetermined frame image of the detection image is acquired to form a recognition image set. The recognition result is obtained by identifying the similarity between the feature vector of the corresponding position of the foreign body in the image set and the feature vector of the reference feature vector, and finally generating a recognition result based on the comparison result. It can avoid the misjudgment of the recognition result caused by the changes of the surrounding environment (light, shadow, etc.) in the detection image, so as to filter out some misjudged samples during the foreign object recognition process, thereby improving the accuracy of foreign object recognition at the airport.

In an embodiment, as shown in FIG. 3, detecting a detection image by using a foreign object detection model to obtain a detection result includes the following steps:

S11: Preprocess the detection image to obtain an image to be identified.

Preprocessing the detection image refers to performing enhanced processing on the detection image to improve subsequent detection accuracy. When acquiring a detection image, there are many factors that affect the detection image, such as: uneven illumination, restrictions on acquisition equipment, and different acquisition environments will cause the clarity of the detection image to be insufficient, resulting in a reduction in subsequent recognition accuracy. Therefore, in this step, the detection image is pre-processed to improve subsequent detection accuracy. Optionally, an image enhancement algorithm may be used to perform global enhancement or local enhancement processing on the detected image, and then sharpen processing is performed on the enhanced detected image to obtain an image to be identified. Preferably, the image enhancement algorithm may be a multi-scale retina algorithm, an adaptive histogram equalization algorithm, or an optimized contrast algorithm. After performing AND processing on the detection image, an image to be identified is obtained.

S12: Input the image to be identified into a full-difference-pyramid feature network recognition model for recognition, and obtain a classification confidence map.

Among them, the full-difference-pyramid feature network recognition model refers to a full-difference (DenseNet, Densely Connected, Convolutional Networks) as the coding network and a pyramid feature (RefineNet, Multi-Path, Refinement, Networks) as the decoding network according to the coding-decoding model Neural network recognition model.

Specifically, the full-difference network is a splicing of the network of different layers in the neural network model, so that the input of each layer of the network includes the output of all the layers of the previous layer, which can avoid the loss of small objects during the model upsampling process. The full-difference network can improve the transmission efficiency of information and gradients in the network. Each layer can directly obtain the gradient from the loss function and directly obtain the input signal. This can train a deeper network. This network structure also has regularization. Effect, the full difference network improves network performance from the perspective of feature reuse. Therefore, using a full-difference network not only reduces the phenomenon of small objects missing during the upsampling process of the model, but also improves the training speed and reduces the phenomenon of overfitting.

Pyramid feature network is an improved multi-path network. It extracts all the information during the downsampling process and uses a long-distance network connection to obtain a high-resolution prediction network. The pyramid feature network uses the features of the fine layer, so that the semantic information of the high level can be improved. A large number of RCUs (residual connection units) are used in the pyramid feature network, which makes short-range connections within the pyramid feature network, which is beneficial for training. In addition, the pyramid feature network also forms a long-range connection with the full-difference network, allowing the gradient to be effectively transmitted to the entire network, increasing the impact of the underlying features on the final result, and effectively improving the positioning accuracy of objects (airport foreign objects).

A classification confidence map refers to an image that is labeled and displayed in different ways for different categories in the image after the image to be identified is detected. Optionally, different colors may be used to distinguish different categories in the image to be identified. For example: in the detection image, the possible objects are runways, lawns, airport equipment (non-foreign objects) and airport foreign objects. Therefore, different colors can be given to the above-mentioned different types of objects in advance. After inputting the image to be identified into the full-difference-pyramid feature network recognition model for recognition, the full-difference-pyramid feature network recognition model is based on different judgment results of different regions in the to-be-recognized image and combined with different colors in advance to form a classification confidence map .

In a specific embodiment, foreign objects in the airport can also be marked with more specific objects, such as: engine connections (nuts, screws, washers, fuses, etc.), machine tools, flying items (nails, personal documents, pens, pens, Pencil, etc.) and animals. These are classified into the category of foreign objects at the airport. In this way, when the foreign objects at the airport are identified, the specific foreign object type can be further determined to facilitate the formulation of appropriate treatment measures.

S13: Obtain a detection result according to the classification confidence map, and the detection result includes detecting the presence of a foreign object in the image and the absence of a foreign object in the detection image.

After obtaining the classification confidence map, the detection results can be obtained according to different colors on the classification confidence map. The detection results include detecting the presence of foreign objects in the image and the absence of foreign objects in the image. For example, if the foreign object at the airport is set to red in the preset settings, after obtaining the classification confidence map, it is determined whether a red area exists in the classification confidence map to obtain different detection results. If there is a red area in the classification confidence image, it means that there is a foreign object in the detection image, and at this time, the detection result is that there is a foreign object in the detection image. If there is no red region in the classification confidence map, it means that there is no foreign object in the detection image, and the detection result is that there is a foreign object in the detection image. Optionally, the detection result may be embodied in text, voice, or signal light, or a combination of at least two of text, voice, or signal light. For example, when the detection result is that there is a foreign object in the detected image, a voice prompt may be sent and a warning light may be used as a reminder to better remind relevant personnel to process.

In one embodiment, when there is a red area in the classification confidence map, the location information of the foreign object at the airport can also be obtained. At this time, the detection result also includes the location information of the foreign object at the airport. Specifically, an identification mark may be assigned to each of the images to be identified in advance, and used to locate the image source of the image to be identified, for example, acquired by the camera device where the identification mark is located. In this way, when a red area exists in the classification confidence map, the position of the red area in the image to be identified can be obtained, and the identification of the image to be identified can be combined to obtain the actual airport foreign object corresponding to the red area in the airport. s position.

This embodiment obtains a to-be-recognized image by preprocessing the detection image to improve subsequent detection accuracy. A full-difference-pyramid feature network recognition model is used to recognize the image to be recognized, which ensures the recognition accuracy and positioning accuracy of small objects during the recognition process, and also improves the recognition efficiency.

In an embodiment, as shown in FIG. 4, in step S11, the detection image is preprocessed to obtain an image to be identified, which specifically includes the following steps:

S111: Use a multi-scale retinal algorithm to perform global enhancement processing on the detected image.

Among them, the Multi-Scale Retinex (MSR) algorithm is an image enhancement processing algorithm, which is used to reduce the influence of various factors (such as interference noise, lack of edge details, etc.) on the original unprocessed image. The multi-scale retinal algorithm is used to enhance the detection image. By removing the illumination image of the detection image, retaining the reflection image, and adjusting the gray dynamic range of the detection image, the reflection information of the reflection image corresponding to the detection image is obtained. To achieve enhanced effects.

Preferably, the multi-scale retinal algorithm is used to perform global enhancement processing on the detection image, which specifically includes:

Use the following formula to globally enhance the detection image:

Among them, N is the number of scales, (x, y) is the coordinate value of the detected image pixels, and G (x, y) is the input of the multi-scale retina algorithm, that is, the gray value of the detected image, R (x, y) Is the output of the multi-scale retinal algorithm, that is, the gray value of the detected image after global enhancement processing, w _n is the weight factor of the scale, and its constraint is

F _n (x, y) is the n-th center wrapping function, and the expression is:

In the formula, σ _n is the scale parameter of the n-th center surround function, and the coefficient K _n must satisfy:

Specifically, the gray value G (x, y) of the detected image is obtained by an image information acquisition tool, and the value of the scale parameter σ _n of the input n center surround functions is determined to satisfy

The value of K _n , and then calculate the center surrounding functions F _n (x, y) and G (x, y) according to the following formula to obtain the gray value R (x, y) of the detected image after global enhancement processing :

Among them, σ _n determines the size of the neighborhood of the center surround function, and its size determines the quality of the detected image. When σ _{n is} larger, the selected range of the neighborhood is larger. Detect local details of an image.

In a specific embodiment, the number of selected scales n = 3, and correspondingly set:

σ ₁ = 30, σ ₂ = 110, σ ₃ = 200;

Among them, σ ₁ , σ _2, and σ ₃ correspond to low gray, middle gray, and high gray of the gray value interval [0,255] of the detection image, respectively, and set w ₁ = w ₂ = w ₃ = 1/3. According to the setting of the above parameters, the multi-scale retinal algorithm simultaneously takes into account the three gray scales of low gray, medium gray and high gray, so as to obtain better results. The multi-scale retinal algorithm can achieve good self-adaptability by combining multiple scales, highlighting the texture details of dark areas of the image, and can adjust the dynamic range of the image to achieve the purpose of image enhancement.

S112: The Laplace operator is used to sharpen the detection image after the global enhancement processing to obtain an image to be identified.

Laplacian operator is a second-order differential operator, which is suitable for improving image blur caused by diffuse reflection of light. Laplace operator sharpening transformation on the image can reduce the blur of the image and improve the sharpness of the image. Therefore, by performing a sharpening process on the detection image after the global enhancement processing, the edge detail features of the detection image after the global enhancement processing are highlighted, thereby improving the contour definition of the detection image after the global enhancement processing. Sharpening processing refers to the transformation of sharpening an image to enhance the target boundaries and image details in the image. After the global enhancement processing of the detected image is sharpened by the Laplacian operator, the edge details of the image are enhanced and the halo is weakened, thereby protecting the details of the detected image.

The Laplace operator based on second-order differential is defined as:

For the detection image R (x, y) after global enhancement processing, its second derivative is:

Therefore, Laplace operator ▽ ² R is:

▽ ² R = R (x + 1, y) + R (x-1, y) + R (x, y + 1) + R (x, y-1) -4R (x, y);

After the Laplace operator ▽ ² R is obtained, the gray value of each pixel of the detection image R (x, y) after the global enhancement process is sharpened with the Laplace operator ▽ ² R according to the following formula: To obtain the sharpened pixel gray value, where g (x, y) is the sharpened pixel gray value.

The sharpened pixel gray value is replaced with the gray value at the original (x, y) pixel to obtain the image to be identified.

In a specific embodiment, the Laplace operator ▽ ² R selects a four-neighbor sharpening template matrix H:

A four-neighbor sharpening template matrix H is used to perform Laplace operator sharpening on the detected image after global enhancement processing.

In this embodiment, the multi-scale retinal algorithm is used to perform global enhancement processing on the detection image, and the detection image after the multi-scale retinal algorithm enhancement processing is used to sharpen the Laplacian operator. At the same time, the edge details of the image are enhanced. The halo is also weakened, thereby protecting the details of the detected image. In addition, the above steps are not only simple and convenient. After processing, the edge details of the image to be identified are more prominent, and the texture features of the image to be identified are enhanced, which is beneficial to improving the accuracy of the image to be identified.

In an embodiment, as shown in FIG. 5, before the steps of inputting an image to be identified into a full-difference-pyramid feature network recognition model for recognition and obtaining a classification confidence map, the airport foreign object recognition method further includes:

S121: Obtain a training sample set, and classify and label the training images in the training sample set.

The training sample set includes training images, and the training images refer to the sample images used to train the full-difference-pyramid feature network recognition model. Optionally, the training image may be obtained by setting a video capture device or an image capture device at different locations in the airport to collect corresponding data, and the video capture device or the image capture device collects the corresponding data and sends it to the server. If the server obtains video data, the video data may be framed at a predetermined frame rate to obtain a training image. Classifying and labeling training images refers to classifying different objects in the training images. For example, in the training image, the objects that may appear are runways, lawns, airport equipment (non-foreign objects), and airport foreign objects. By assigning different labeling information to different objects in the training image, the classification labeling of the training image is completed.

S122: Use the training images in the training sample set to classify and label the training network to obtain the target output vector.

In this step, the training network is trained using the training images in the training sample set to label the full difference network. In the full difference network, the training image input is set to x _{0. The} full difference network consists of L-layer structures, and each layer is completely different. The networks all contain a non-linear transformation H _l (·). Optionally, the non-linear transformation may include ReLU (Recified Linear Units) and Pooling, or BN (Batch Normalization), ReLU, and convolutional layers, or BN, ReLU, and Pooling. Among them, BN is to adjust the distribution of the input value of any neuron in each layer of the neural network to a normal normal distribution with a mean value of 0 and a variance of 1 by means of normalization, so that the activation input value falls in a non-linear function that is sensitive to the input Region, make the gradient larger, avoid the problem of gradient disappearance, and greatly speed up the training speed. ReLU is a piecewise linear function and a one-sided suppression function. It can output all the negative values of the input to 0, while the positive values of the input remain unchanged. ReLU can realize the sparse model to better mine related features and fit the training data.

In this embodiment, suppose that the output of the first layer in the full-difference network is x _l , and each layer in the full-difference network is directly connected to all the previous layers, that is:

x _l = H _l ([x ₀ , x ₁ , ..., x _l-1 ]);

The output of the corresponding layer in the full difference network constitutes the target output vector for subsequent training of the pyramid feature network using the target output vector.

S123: Use the target output vector to train the pyramid feature network to obtain a full-difference-pyramid feature network recognition model.

In the pyramid feature network, the output of each layer in the target output vector in the full-difference network is connected to the RCU unit of the pyramid feature network, respectively. That is, there are RCU units in the pyramid feature network that have the same number of layers as the target output vector in the full-difference network.

The RCU unit refers to the unit structure extracted from the full-difference network, and specifically includes ReLU, convolution and summation. The target output vectors of each layer obtained in the full-difference network are respectively subjected to ReLU, convolution and summing operations. The output of each layer of the RCU unit is processed using Multi-resolution fusion to obtain different output feature maps. First, the output feature maps of each layer of the RCU unit are adaptively processed by a convolution layer. Then upsampling to the maximum resolution of this layer. Chained residual pooling samples the output feature maps of different resolutions of the input to the same size as the maximum output feature map and then superimposes them. Finally, the superimposed output feature map is convolved by an RCU to obtain a fine feature map.

The function of the pyramid feature network is to fuse feature maps with different resolutions. First divide the pre-trained full-difference network into several full-difference blocks according to the resolution of the feature map, and then fuse these blocks to the right as several paths to fuse through the pyramid feature network, and finally obtain a fine feature map (subsequent connection softmax layer, and then output through bilinear interpolation).

In the pyramid feature network, the target feature vector of the full-difference network is used to train the pyramid feature network to form a preliminary training network. The verification samples are then used to verify and adjust the pyramid feature network until a preset classification accuracy rate is obtained, and the training ends. . The preset classification accuracy can be set according to the needs of the actual recognition model.

In this embodiment, a full-difference-pyramid feature network recognition model is obtained by training with a training sample set after classification and labeling, which ensures the recognition accuracy and speed of the full-difference-pyramid feature network recognition model.

In an embodiment, training a full-difference network specifically includes:

Set the initial convolution layer of the full-difference network, and use the largest pooling layer in the full-difference network for downsampling.

The convolution layer is used to extract the features of the input image, and the initial convolution layer extracts the features of the training image. Optionally, the initial convolution layer uses a 7 * 7 convolution kernel. The maximum pooling layer in the full-difference network is used for downsampling. During the sampling process, if the new sampling rate is less than the original sampling rate, it is downsampling. Max-pooling refers to the maximum value of all neurons in the area taken by the sampling function. The input image through the initial convolution layer is subjected to maximum pooling processing, feature compression, main features are extracted, and network computation complexity is simplified.

A three-layer full-difference network module is provided. Each full-difference network module includes a full-difference convolution layer and a full-difference activation layer. The activation function in the full-difference activation layer adopts a linear activation function.

In these three layers of fully differential network modules, the output of each fully differential network module is the combination of the outputs of all the previous modules, that is:

x _l = H _l ([x ₀ , x ₁ , ..., x _l-1 ]), l = 1, 2, 3;

Among them, each H _l (·) is a combination of two operations of the convolutional layer and the activation layer: Conv-> ReLU. Optionally, the size of the convolution kernel in the full-difference convolution layer is 3 * 3. The number of features output by each H _l (·) is the feature growth rate. Optionally, if the feature growth rate is set to 16, then the number of output features output by the three-layer fully-differential network module is 48. The linear activation function formula is:

By transforming the linear activation function, the time of the training process can be quickly converged.

A transmission layer is set between the fully differential network modules, and each transmission layer includes a normalization layer, a transmission activation layer, and an average pooling layer.

In the full-difference network module, the output characteristics of each full-difference network module are increasing. In the above setting, if the feature growth rate is 16, the output characteristic of the three-layer full-difference network module output is 48. In this way, the amount of calculation is gradually increased, so the transmission layer is introduced, and the transmission parameters are set to indicate how many times the input of the transmission layer is reduced. For example, the transmission parameter is 0.6, that is, the input of the transmission layer is reduced to the original 0.6.

In this embodiment, by setting the network structure and various parameters in the full-difference network, the training speed and accuracy of the full-difference network are guaranteed.

In an embodiment, in the process of training the full-difference-pyramid feature network recognition model, the loss function is implemented using a Focal Loss function:

FL (p _t ) =-(1-p _t ) ^γ log (p _t );

among them,

p _t is the prediction value of the full difference-pyramid feature network recognition model for the training image, p is the estimated probability of the model for the training image y = 1, p ∈ [0,1], y is the labeled value of the training image, and γ is the adjustment parameter.

Loss function refers to a function that maps an event (an element in a sample space) to a real number that expresses the economic or opportunity cost associated with its event. In this embodiment, when training the full-difference-pyramid feature network recognition model, a loss function is used to measure the prediction of the full-difference-pyramid feature network recognition model. The smaller the loss function, the better the predictive ability of the recognition model. In the embodiment of the present application, the number of sample images of each classification in the training images of the training sample set may be uneven, especially the training images containing foreign objects in the airport may be less. The prediction function was selected for the loss function.

Therefore, the Focal Loss function is used to implement the loss function. The Focal loss function adds an adjustment factor (1-p _t ) ^γ , where the value of the adjustment parameter γ is between [0,5]. y is the labeling value of the training image. For example, for foreign object labels in the training image, if it is a foreign object, it is labeled as y = 1, and for non-foreign objects, it is labeled as y = -1. When a training image is misclassified, P _{t is} small. At this time, the adjustment factor (1-p _t ) ^{γ is} close to 1, and the loss will not have a great impact. When the value of P _t is large, it will approach 1. The value of the adjustment factor approaches 0, so the loss value for the correctly classified samples is reduced.

In this embodiment, the FocalLoss function is used in the process of training the full difference-pyramid feature network recognition model, which can reduce the impact of uneven sample classification on training the full difference-pyramid feature network recognition model, and has also improved the follow-up. The effect of detection accuracy.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

In one embodiment, an airport foreign body identification device is provided. The airport foreign body identification device corresponds to the airport foreign body identification method in the above embodiment in a one-to-one correspondence. As shown in FIG. 6, the airport foreign object recognition device includes a detection result acquisition module 10, a reference feature vector acquisition module 20, a recognition image set composition module 30, a comparison result acquisition module 40, and a recognition result acquisition module 50. The detailed description of each function module is as follows:

The detection result acquiring module 10 is configured to acquire a detection image, detect the detection image by using a foreign object detection model, and obtain a detection result.

A reference feature vector acquisition module 20 is configured to obtain a position of the foreign object in the detection image if the detection result is that there is a foreign object in the detection image, and use the reference position to extract a feature vector of the foreign object according to the reference position as a reference feature vector.

The recognition image set composition module 30 is configured to obtain consecutive predetermined frames of images according to the detection image to form a recognition image set.

The comparison result acquisition module 40 is configured to extract a feature vector of each recognition image in the recognition image set according to the reference position, and compare the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result.

The recognition result acquisition module 50 is configured to generate a recognition result according to the comparison result. The recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.

Preferably, the detection result acquisition module 10 includes an image-to-be-identified image acquisition unit 11, a classification confidence map acquisition unit 12, and a detection result acquisition unit 13.

The to-be-recognized image obtaining unit 11 is configured to preprocess the detection image to obtain the to-be-recognized image.

A classification confidence map acquisition unit 12 is configured to input an image to be identified into a full-difference-pyramid feature network recognition model for recognition, and obtain a classification confidence map.

The detection result obtaining unit 13 is configured to obtain a detection result according to the classification confidence map, and the detection result includes detecting the presence of a foreign object in the image and the absence of a foreign object in the detection image.

Preferably, the image acquisition unit 11 includes a global enhancement processing sub-unit 111 and a sharpening processing sub-unit 112.

The global enhancement processing sub-unit 111 is configured to perform global enhancement processing on the original image by using a multi-scale retinal algorithm.

A sharpening processing unit 112 is configured to use a Laplacian to sharpen the original image after global enhancement processing to obtain an image to be identified.

Preferably, the airport foreign body identification device further includes a training sample set acquisition module 121, a target output vector acquisition module 122, and a recognition model acquisition module 123.

The training sample set obtaining module 121 is configured to obtain a training sample set and classify and label the training images in the training sample set.

A target output vector obtaining module 122 is configured to train a full difference network using training images in which training samples are classified and labeled to obtain a target output vector.

A recognition model acquisition module 123 is used to train a pyramid feature network using a target output vector to obtain a full-difference-pyramid feature network recognition model.

For specific limitations on the airport foreign body identification device, refer to the foregoing limitation on the airport foreign body identification method, which will not be repeated here. Each module in the above-mentioned airport foreign body identification device may be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium. The database of the computer equipment is used to store detection images and foreign object detection model data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by a processor to implement an airport foreign object identification method.

In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:

Obtain the detection image, use the foreign object detection model to detect the detection image, and obtain the detection result.

If the detection result is that there is a foreign object in the detection image, the position of the foreign object in the detection image is obtained as a reference position, and a feature vector of the foreign object is extracted according to the reference position as a reference feature vector.

Consecutive predetermined frame images are acquired according to the detected images to form a recognition image set.

The feature vector of each recognition image in the recognition image set is extracted according to the reference position, and the feature vector of each recognition image and the feature vector similarity of the reference feature vector are compared to obtain a comparison result.

A recognition result is generated based on the comparison result, and the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.

In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more Each processor performs the following steps:

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by using computer-readable instructions to instruct related hardware. The computer-readable instructions can be stored in a non-volatile computer. In the readable storage medium, the computer-readable instructions, when executed, may include the processes of the embodiments of the methods described above. Wherein, any reference to the storage, storage, database, or other media used in the embodiments provided in this application may include non-volatile and / or volatile storage. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for the convenience and brevity of the description, only the above-mentioned division of functional units and modules is used as an example. In practical applications, the above functions can be assigned by different functional units, Module completion, that is, dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above.

The above-mentioned embodiments are only used to describe the technical solution of the present application, but not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still implement the foregoing implementations. The technical solutions described in the examples are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the application, and should be included in Within the scope of this application.

Claims

An airport foreign body identification method, comprising:

Acquiring a detection image, detecting the detection image by using a foreign object detection model, and acquiring a detection result;

If the detection result is that there is a foreign object in the detection image, obtaining a position of the foreign object in the detection image as a reference position, and extracting a feature vector of the foreign object according to the reference position as a reference feature vector;

Acquiring successive predetermined frames of images according to the detected images to form a recognition image set;

Extracting a feature vector of each recognition image in the recognition image set according to the reference position, and comparing the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result is generated according to the comparison result, and the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.
The foreign object identification method of an airport according to claim 1, wherein the detecting the detected image by using a foreign object detection model to obtain a detection result specifically includes:

Preprocessing the detection image to obtain an image to be identified;

Inputting the to-be-recognized image into a full-difference-pyramid feature network recognition model for recognition, and obtaining a classification confidence map;

A detection result is obtained according to the classification confidence map, and the detection result includes detecting the presence of a foreign object in the image and the absence of a foreign object in the detection image.
The method for identifying foreign objects in an airport according to claim 2, wherein the preprocessing the detection image to obtain an image to be identified specifically includes:

Using a multi-scale retinal algorithm to globally enhance the detection image;

A Laplace operator is used to sharpen the detection image after the global enhancement processing to obtain an image to be identified.
The foreign object identification method for an airport according to claim 2, wherein before the step of inputting the image to be identified into a full-difference-pyramid feature network identification model to obtain a classification confidence map, the airport Foreign object identification methods also include:

Acquiring a training sample set, and classifying the training images in the training sample set;

Training a full-difference network by using the classified labeled training images in the training sample set to obtain a target output vector;

The target output vector is used to train a pyramid feature network to obtain the full-difference-pyramid feature network recognition model.
The method for identifying foreign objects in an airport according to claim 4, wherein the training of the full difference network specifically comprises:

Setting an initial convolution layer of the full-difference network, and using the largest pooling layer in the full-difference network for downsampling;

Setting up three layers of full-difference network modules, each of which includes a full-difference convolution layer and a full-difference activation layer, and the activation function in the full-difference activation layer adopts a linear activation function;

A transmission layer is provided between the fully differential network modules, and each of the transmission layers includes a normalization layer, a transmission activation layer, and an average pooling layer.
The method for identifying foreign objects in an airport according to claim 4, characterized in that, in the process of training the full difference-pyramid feature network recognition model, the loss function is implemented using a FocalLoss function:

FL (p t ) =-(1-p t ) γ log (p t );

among them,
p t is the predicted value of the full difference-pyramid feature network recognition model for the training image, p is the estimated probability of the model for the training image y = 1, p ∈ [0,1], and y is the training The label value of the image, γ is the adjustment parameter.
An airport foreign body identification device, comprising:

A detection result acquisition module, configured to acquire a detection image, detect the detection image by using a foreign object detection model, and obtain a detection result;

A reference feature vector acquisition module, configured to obtain a position of the foreign object in the detection image if the detection result is that there is a foreign object in the detection image, and use it as a reference position to extract the foreign object according to the reference position; Eigenvectors as the reference eigenvectors;

A recognition image set composition module, configured to obtain consecutive predetermined frames of images according to the detection image to form a recognition image set;

A comparison result acquisition module, configured to extract a feature vector of each recognition image in the recognition image set according to the reference position, and compare the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result acquisition module is configured to generate a recognition result according to the comparison result, where the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.
The foreign body identification device for an airport according to claim 7, wherein the detection result acquisition module comprises:

An image-to-be-identified obtaining unit, configured to pre-process the detection image to obtain an image to be identified;

A classification confidence map acquisition unit, configured to input the image to be identified into a full-difference-pyramid feature network recognition model for recognition, and obtain a classification confidence map;

A detection result obtaining unit is configured to obtain a detection result according to the classification confidence map, where the detection result includes a detection of the presence of a foreign object in the image and a detection of the absence of a foreign object in the image.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and is characterized in that the processor implements the computer-readable instructions as follows step:

Acquiring a detection image, detecting the detection image by using a foreign object detection model, and acquiring a detection result;

If the detection result is that there is a foreign object in the detection image, obtaining a position of the foreign object in the detection image as a reference position, and extracting a feature vector of the foreign object according to the reference position as a reference feature vector;

Acquiring successive predetermined frames of images according to the detected images to form a recognition image set;

Extracting a feature vector of each recognition image in the recognition image set according to the reference position, and comparing the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result is generated according to the comparison result, and the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.
The computer device according to claim 9, wherein the detecting the detected image by using a foreign object detection model to obtain a detection result specifically comprises:

Preprocessing the detection image to obtain an image to be identified;

Inputting the to-be-recognized image into a full-difference-pyramid feature network recognition model for recognition, and obtaining a classification confidence map;

A detection result is obtained according to the classification confidence map, and the detection result includes detecting the presence of a foreign object in the image and the absence of a foreign object in the detection image.
The computer device according to claim 10, wherein the preprocessing the detected image to obtain an image to be identified specifically comprises:

Using a multi-scale retinal algorithm to globally enhance the detection image;

A Laplace operator is used to sharpen the detection image after the global enhancement processing to obtain an image to be identified.
The computer device according to claim 10, wherein before the step of inputting the to-be-recognized image into a full-difference-pyramid feature network recognition model for recognition and obtaining a classification confidence map, further comprising:

Acquiring a training sample set, and classifying the training images in the training sample set;

Training a full-difference network by using the classified labeled training images in the training sample set to obtain a target output vector;

The target output vector is used to train a pyramid feature network to obtain the full-difference-pyramid feature network recognition model.
The computer device according to claim 12, wherein the training of the full difference network specifically comprises:

Setting an initial convolution layer of the full-difference network, and using the largest pooling layer in the full-difference network for downsampling;

Setting up three layers of full-difference network modules, each of which includes a full-difference convolution layer and a full-difference activation layer, and the activation function in the full-difference activation layer adopts a linear activation function;

A transmission layer is provided between the fully differential network modules, and each of the transmission layers includes a normalization layer, a transmission activation layer, and an average pooling layer.
The computer device according to claim 12, characterized in that, in the process of training the full difference-pyramid feature network recognition model, the loss function is implemented using a FocalLoss function:

FL (p t ) =-(1-p t ) γ log (p t );

among them,
p t is the predicted value of the full difference-pyramid feature network recognition model for the training image, p is the estimated probability of the model for the training image y = 1, p ∈ [0,1], and y is the training The label value of the image, γ is the adjustment parameter.
One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:

Acquiring a detection image, detecting the detection image by using a foreign object detection model, and acquiring a detection result;

If the detection result is that there is a foreign object in the detection image, obtaining a position of the foreign object in the detection image as a reference position, and extracting a feature vector of the foreign object according to the reference position as a reference feature vector;

Acquiring successive predetermined frames of images according to the detected images to form a recognition image set;

Extracting a feature vector of each recognition image in the recognition image set according to the reference position, and comparing the feature vector of each recognition image and the feature vector similarity of the reference feature vector to obtain a comparison result;

A recognition result is generated according to the comparison result, and the recognition result includes confirmation as a foreign object and confirmation as a non-foreign object.
The non-volatile readable storage medium according to claim 15, wherein the detecting the detection image by using a foreign object detection model to obtain a detection result specifically comprises:

Preprocessing the detection image to obtain an image to be identified;

Inputting the to-be-recognized image into a full-difference-pyramid feature network recognition model for recognition, and obtaining a classification confidence map;

A detection result is obtained according to the classification confidence map, and the detection result includes detecting the presence of a foreign object in the image and the absence of a foreign object in the detection image.
The non-volatile readable storage medium according to claim 16, wherein the preprocessing the detection image to obtain an image to be identified specifically comprises:

Using a multi-scale retinal algorithm to globally enhance the detection image;

A Laplace operator is used to sharpen the detection image after the global enhancement processing to obtain an image to be identified.
The non-volatile readable storage medium according to claim 16, characterized in that before the step of inputting the image to be identified into a full-difference-pyramid feature network identification model to obtain a classification confidence map ,Also includes:

Acquiring a training sample set, and classifying the training images in the training sample set;

Training a full-difference network by using the classified labeled training images in the training sample set to obtain a target output vector;

The target output vector is used to train a pyramid feature network to obtain the full-difference-pyramid feature network recognition model.
The non-volatile readable storage medium according to claim 18, wherein the training of the full difference network specifically comprises:

Setting an initial convolution layer of the full-difference network, and using the largest pooling layer in the full-difference network for downsampling;

Setting up three layers of full-difference network modules, each of which includes a full-difference convolution layer and a full-difference activation layer, and the activation function in the full-difference activation layer adopts a linear activation function;

A transmission layer is provided between the fully differential network modules, and each of the transmission layers includes a normalization layer, a transmission activation layer, and an average pooling layer.
The non-volatile readable storage medium according to claim 18, wherein in the process of training the full-difference-pyramid feature network recognition model, the loss function is implemented using a FocalLoss function:

FL (p t ) =-(1-p t ) γ log (p t );

among them,
p t is the predicted value of the full difference-pyramid feature network recognition model for the training image, p is the estimated probability of the model for the training image y = 1, p ∈ [0,1], and y is the training The label value of the image, γ is the adjustment parameter.