CN113763471A - Visual-based bullet hole detection method and system - Google Patents

Visual-based bullet hole detection method and system Download PDF

Info

Publication number
CN113763471A
Authority
CN
China
Prior art keywords
bullet hole
frame
image
layer
bullet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110997123.7A
Other languages
Chinese (zh)
Inventor
张文广
余新洲
陈雨杭
徐晓刚
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Zhejiang Lab
Original Assignee
Zhejiang Gongshang University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University, Zhejiang Lab filed Critical Zhejiang Gongshang University
Priority to CN202110997123.7A
Publication of CN113763471A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G06T7/41 Analysis of texture based on statistical description of texture
    • G06T7/45 Analysis of texture based on statistical description of texture using co-occurrence matrix computation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based bullet hole detection method and system. The method comprises the following steps: S1, acquiring a target shooting bullet hole image data set and labeling the bullet holes; S2, constructing an ultra-lightweight network that deeply fuses convolution features and texture features, and training a detection model on the bullet hole image data; S3, running inference with the trained detection model to obtain a single-frame bullet hole detection result; S4, constructing a bullet hole integral map of the current frame from the multi-frame detection results; S5, registering the current-frame bullet hole integral map against the previous-frame integral map and taking the frame difference to obtain the newly added bullet holes of the current frame. The system comprises a focusing layer, a nested bottleneck layer, a convolution layer, and a feature fusion and single-scale target regression module, where the latter consists of a single-scale target regression subnetwork, a texture feature extraction unit, and a feature fusion and optimization module. The invention reduces resource consumption while maintaining good detection accuracy and robustness.

Description

Visual-based bullet hole detection method and system
Technical Field
The invention relates to the technical field of image recognition, in particular to a visual-based bullet hole detection method and system.
Background
In the technical field of shooting training target scoring equipment, the traditional target scoring approaches are mainly manual scoring, photoelectric sensor scoring and, more recently, image-based target scoring systems.
Manual scoring is extremely inefficient and in most cases cannot be performed in real time. Photoelectric sensor scoring is accurate but expensive, and its routine maintenance is complex. Target scoring methods based on visual algorithms are therefore now common in the industry. The core problem of a visual scoring scheme is how to accurately detect the newly added bullet holes. The mainstream bullet hole detection schemes currently rely on a frame difference or multi-frame difference method, which consumes little computing power but is sensitive to interference such as illumination changes and jitter, so its detection accuracy is low. Bullet hole detection based on deep learning achieves better accuracy, but most detection networks are deep and consume a large amount of computing resources, so actual deployment requires a costly GPU or other AI computing hardware.
Based on this, it is necessary to provide a bullet hole detection method with less resource consumption, high accuracy and high robustness.
Disclosure of Invention
In order to solve the low detection rate and poor robustness of existing methods, and the sensitivity of the frame difference method to illumination and jitter, the invention provides a deep neural detection network that fuses convolution features and texture features. Bullet hole coordinates are regressed with a single-scale Anchor, and the influence of illumination changes on the result is reduced by multi-sample training and by fusing with texture features. A post-processing algorithm based on the bullet hole integral map is also provided: exploiting the irreversibility of bullet hole traces, a quantized gray-scale result map is obtained from the bullet hole integral map, and the newly added bullet hole coordinates are obtained by frame difference after the integral gray-scale maps are registered. The specific technical scheme is as follows:
a visual-based bullet hole detection method comprises the following steps:
S1, acquiring a target shooting bullet hole image data set and labeling the bullet holes;
S2, constructing and training an ultra-lightweight network that deeply fuses convolution features and texture features, comprising:
S21, slicing the image tensor of the target image into multiple layers and then recombining them;
S22, feeding the recombined image tensor into a residual-like structure to extract high-level semantic information;
S23, obtaining a suspected bullet hole region from the coordinates regressed by a single-scale target regression subnetwork, extracting texture features from the suspected bullet hole region, fusing them with the convolution features fed into the single-scale target regression subnetwork, refining the fused features with a channel attention mechanism, performing binary bullet hole / non-bullet hole classification on the refined features, and training the deep fusion network with the binary classification results and the labeled bullet holes;
S3, running inference with the trained deep fusion network to obtain the single-frame bullet hole detection result;
S4, constructing a bullet hole integral map of the current frame based on the detection results of the multi-frame images in the video;
and S5, registering the current-frame bullet hole integral map against the previous-frame integral map and taking the frame difference to finally obtain the newly added bullet holes of the current frame.
Further, in S1, the bullet holes of each target image are labeled with rectangular boxes and marked as category 0. In S23, the coordinates regressed by the single-scale target regression subnetwork determine a suspected bullet hole target box (X, Y, W, H), where X and Y are the center coordinates of the suspected bullet hole target box and W and H are its width and height; texture features are extracted from the suspected bullet hole target box; the refined features are classified into two classes to obtain bullet hole and non-bullet hole confidences, with classification result 0 denoting a bullet hole and 1 denoting a non-bullet hole; the deep fusion network is trained with the predicted bullet hole target boxes obtained from the binary classification results and the labeled bullet hole rectangles.
Further, in S23, the obtained suspected bullet hole region is cropped with an expansion margin, and texture features are extracted from the cropped suspected bullet hole region. The image of the cropped region is defined as f(x, y), x ∈ (0, M-1), y ∈ (0, N-1), where M is the pixel width of the region and N is its pixel height. The gray level co-occurrence matrix of the region is denoted P(i, j, d, θ), where i is the x-axis index of the co-occurrence matrix, j is its y-axis index, d is the distance between two pixels and θ is the angle between them. The extracted texture features are: the texture angular second moment, the texture entropy, the contrast, the uniformity, the X-axis gradient and the Y-axis gradient.

The texture angular second moment is expressed as

f_1 = \sum_i \sum_j P(i, j, d, \theta)^2

the texture entropy as

f_2 = -\sum_i \sum_j P(i, j, d, \theta) \log P(i, j, d, \theta)

the contrast as

f_3 = \sum_i \sum_j (i - j)^2 P(i, j, d, \theta)

and the uniformity by the inverse difference moment

f_4 = \sum_i \sum_j \frac{P(i, j, d, \theta)}{1 + (i - j)^2}

The X-axis gradient of the region is denoted f_5 and the Y-axis gradient f_6.

The extracted texture features are flattened to one dimension and concatenated to obtain the texture feature matrix of the region, f = [f_1, f_2, f_3, f_4, f_5, f_6].
Further, in S23, the GIoU Loss is used as the loss function for the bullet hole region, and the cross entropy loss function L is used to evaluate the bullet hole classification:

\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}

L_{GIoU} = 1 - \mathrm{IoU} + \frac{|C \setminus (A \cup B)|}{|C|}

L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

where A and B represent the predicted and labeled bullet hole regions respectively, C is the smallest rectangle that can enclose A and B, N is the number of samples, y_i is the classification label of the i-th sample and p_i is the predicted probability for the i-th sample. The final total loss is the sum of the two loss functions.
Further, in S2, the target image is scaled to a fixed size and 3-channel gray scale normalization is performed; the image tensor of the target image is then converted from the RGB space to the RGBP space, and the converted image tensor is used as the input of S21.
Further, in S4, a bullet hole integral map is constructed from the detection results of the multi-frame images in the video. The bullet hole integral map is obtained by quantizing and accumulating the results of the current frame and the preceding frames. At initialization, a grid map of fixed size is constructed based on the input image, with every grid cell initialized to 0. The detected bullet hole center coordinates of each frame are quantized into the corresponding grid cell and that cell's value is incremented by 1. The grid integral map is the accumulated sum over the current frame and the preceding frames; if the current frame index is smaller than the accumulation window, the integral map is accumulated from frame 1 up to the current frame. The accumulated sum is multiplied by a fixed gray coefficient to obtain a gridded gray-scale integral map. The integral map of the k-th frame is expressed as:

I_k(a, b) = \lambda \sum_{m=\max(1,\, k-F+1)}^{k} g_m(a, b)

where λ is the fixed gray coefficient, F is the number of accumulated frames, g_m(a, b) is the number of bullet holes of the m-th frame detection result falling in grid cell (a, b), a is the x-axis index in the grid map and b is the y-axis index. This integral map effectively prevents the total number of bullet holes of the k-th frame from being smaller than that of the (k-1)-th frame because of missed detections, and thus avoids recognition errors caused by the network missing a detection.
Further, in S5, the bullet hole integral map of the current frame is registered against that of the previous frame and the frame difference is taken, finally yielding the newly added bullet holes of the current frame.

To match the integral maps of the two frames, features are extracted with the common scale-invariant feature transform (SIFT) and the registration translation matrix T(a, b) between the two maps is computed, so that the maps can be aligned. Experiments show that, because the time interval between the two frames is short, the two integral maps are not distorted relative to each other; the obtained T(a, b) is the grid offset of frame k relative to frame k-1 along the X and Y axes. Taking frame k-1 as the reference frame, the registered integral map of frame k is

\hat{I}_k(a, b) = I_k(a + t_x,\, b + t_y)

where (t_x, t_y) is the grid offset given by T(a, b).

For the integral map frame difference, the (k-1)-th frame integral map is subtracted from the registered k-th frame integral map to obtain a difference map, which is binarized with a fixed threshold δ:

D_k(a, b) = \begin{cases} 1, & \hat{I}_k(a, b) - I_{k-1}(a, b) > \delta \\ 0, & \text{otherwise} \end{cases}

where δ is the binarization threshold, whose choice is related to the number of frames accumulated in the integral map. Grid cells (a, b) with value 1 are taken as the grid positions of newly added bullet holes; these positions are then mapped back to the original image to finally obtain the center coordinates of the newly added bullet holes.
A vision-based bullet hole detection system comprises a deep fusion network of convolution features and texture features and a bullet hole integral map post-processing module. The fusion network is a lightweight network comprising a focusing layer, a nested bottleneck layer, a convolution layer, and a feature fusion and single-scale target regression module;
the focusing layer slices the image tensor of the target image into multiple layers and then recombines them;
the nested bottleneck layer is a residual-like structure that extracts high-level semantic information from the recombined image tensor;
the feature fusion and single-scale target regression module comprises a single-scale target regression subnetwork, a texture feature extraction unit and a feature fusion and optimization module;
the single-scale target regression subnetwork comprises a two-dimensional convolution layer and an activation layer; the two-dimensional convolution layer is connected to the nested bottleneck layer and outputs convolution features to the activation layer, which uses the Sigmoid operator

\sigma(x) = \frac{1}{1 + e^{-x}}

and the suspected bullet hole region is obtained from the coordinates output by the activation layer;
the texture feature extraction unit extracts texture features from the suspected bullet hole area;
the feature fusion and optimization module comprises a fusion module, a pooling layer, a normalization layer and a fully connected layer; the extracted texture features and the convolution features output by the two-dimensional convolution layer are fused by the fusion module, the fused features are refined by a channel attention mechanism formed by the pooling layer and the normalization layer, the refined features are classified into bullet holes and non-bullet holes by the fully connected layer, and the deep fusion network is trained with the classification results and the bullet holes labeled in the training set; inference with the trained deep fusion network yields the single-frame bullet hole detection result.
Furthermore, the focusing layer comprises a slicing unit, a combination unit, a two-dimensional convolution layer, a BN layer and an activation layer connected in sequence; the slicing unit slices the image tensor of the target image into multiple layers, the combination unit recombines them, and the activation layer computes

f(x) = \frac{x \cdot \min(m,\, \max(0,\, x + n))}{m}

where x is the input image tensor, max(0, x + n) keeps the part of x + n that is greater than 0 and sets the rest to 0, min(m, ·) caps values at m, and m and n are empirically chosen constants with m > n.
Furthermore, the nested bottleneck layer comprises a bottleneck module, convolution layers, a concatenation unit, a BN layer and an activation layer. The image tensor passes through a convolution layer and a group of nested bottleneck modules, is concatenated by the concatenation unit with the image tensor that passes through the convolution layer only, and is then output through the BN layer, the activation layer and a convolution layer. In the bottleneck module, the input passes through two one-dimensional convolution layers and is added to the original input, so as to better extract high-level semantic information. Considering that the bullet hole detection scene is relatively simple, only one bottleneck module is used in order to reduce the amount of computation.
The bullet hole integral image post-processing module comprises a bullet hole integral image constructing module, an inter-frame integral image registration module and an integral image frame difference module;
the bullet hole integral image constructing module constructs an integral image according to the bullet hole detection results of the current frame and the previous frames;
the inter-frame integral image registration module is used for registering integral images of two adjacent frames;
and the integral image frame difference module is used for carrying out frame difference on the registered integral image to finally obtain the newly added bullet hole coordinate of the current frame.
The invention has the advantages and beneficial effects that:
the convolution characteristics are extracted by constructing an extremely light convolution network, the parameter quantity of the model is only one fifth of that of a current mainstream detection model Yolov5-s, and finally the mAP is only one percent lower than that of the Yolov 5-s; the false detection rate is effectively reduced by adopting a mode of optimally fusing high-level semantic features such as convolution and the like with feature depths such as texture, gray level statistics and the like; by adopting the coordinate regression subnetwork regression bullet hole coordinate based on the single-scale Anchor, false detection with the maximum size or the minimum size can be effectively filtered, model training is accelerated, and the convergence speed of the model is improved; the 'neglect' of the elastic hole with the confidence coefficient near the detection threshold value can be effectively prevented by the mode of the elastic hole grid integral graph; the interference problem caused by target surface shaking can be effectively reduced by a mode of matching the bullet hole grid integral graph; and accurately acquiring the coordinates of the newly added bullet holes by a grid integral image frame difference method. The detection algorithm consumes less than 5ms in a single frame on NVIDIA 2080Ti, consumes less than 40ms in a single frame on NVIDIA Tx2 embedded platform, and is low in deployment cost.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of a detection network framework in the present invention.
FIG. 3a is a network framework diagram of the Focus module of the present invention.
Fig. 3b is a tensor scribing schematic diagram of the Focus module in the invention.
Fig. 4 is a network framework diagram of the BCSP module of the present invention.
Fig. 5 is a flowchart of the Detect module network in the present invention.
Fig. 6 is a frame diagram of texture feature extraction in the present invention.
FIG. 7 is a graph of the integration effect of the grid results of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, a visual-based bullet hole detection method includes the following steps:
s1: acquiring a target shooting bullet hole image data set and marking the bullet holes; and marking the bullet holes of each target shooting image by using a rectangular frame, and marking the bullet holes as class 0.
S2: an ultra-lightweight network that deeply fuses convolution features and texture features is constructed. As shown in Fig. 2, it comprises a focusing layer (Focus module), a nested bottleneck layer (BCSP module), a convolution layer (Conv module) and a feature fusion and single-scale target regression module (Detect module). Based on the data set obtained in S1, the current image frame is preprocessed and the result is fed sequentially into the Focus module, a Conv module, the BCSP module, another Conv module and the Detect module for training.
For preprocessing, the image is scaled to a fixed size of 960 x 960 and 3-channel gray scale normalization is performed; color space conversion then transforms the image tensor from the RGB space to the RGBP space.
The structure of the Focus module is shown in Fig. 3a: the image tensor in RGBP space is sliced into four layers and recombined, then passed through a 32 x 32 two-dimensional convolution layer, a Batch Normalization layer, and finally a Hardswish activation layer.

The four-layer slicing interleave-downsamples the 960 x 960 x 3 tensor into four sub-tensors, which are recombined along the channel dimension, as shown in Fig. 3b.

The Hardswish activation layer uses an activation function common in detection models:

f(x) = \frac{x \cdot \min(m,\, \max(0,\, x + n))}{m}

where x is the input tensor, max(0, x + n) keeps the part of x + n that is greater than 0 and sets the rest to 0, min(m, ·) caps values at m, and m and n are empirically chosen constants with m > n. In this embodiment, m = 6 and n = 3 give the best results.
The BCSP module, whose structure is shown in Fig. 4, is a residual-like BottleneckCSP (CSP bottleneck layer). The Bottleneck itself is a residual-like structure: the input passes through two one-dimensional convolution layers and is added to itself. In the BottleneckCSP layer, the input passes through a convolution layer and the nested Bottleneck, is concatenated with a branch that passes through a convolution layer only, and is then output through Batch Normalization, a LeakyReLU activation and a final convolution layer. Multiple Bottleneck modules can be nested inside the BottleneckCSP layer to better extract high-level semantic information; since the bullet hole detection scene is relatively simple, a single Bottleneck is used in this example to reduce the amount of computation.
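For illustration, the residual-like Bottleneck and the nested BottleneckCSP described above could be sketched roughly as follows in PyTorch; the channel split, 1 x 1 kernels and LeakyReLU slope are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual-like block: the input passes two 1x1 convolutions and is added to itself."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class BottleneckCSP(nn.Module):
    """Conv + nested Bottleneck branch, concatenated with a plain convolution branch,
    then BN -> LeakyReLU -> output convolution (one nested Bottleneck, as in this example)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = out_ch // 2
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, mid, 1), Bottleneck(mid))
        self.branch2 = nn.Conv2d(in_ch, mid, 1)
        self.bn = nn.BatchNorm2d(2 * mid)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # slope is an assumption
        self.conv_out = nn.Conv2d(2 * mid, out_ch, 1)

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x)], dim=1)  # concatenation unit
        return self.conv_out(self.act(self.bn(y)))
```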
The structure of the feature fusion and single-scale target regression module (Detect) is shown in Fig. 5. A coordinate regression subnetwork based on a single-scale Anchor is used; the region at the regressed coordinates is cropped from the original image to obtain a suspected bullet hole region, which is expanded before texture features are extracted. The texture features are fused with the convolution features, the fused features are refined by a channel attention module, and the classification confidence of the target is finally obtained through a fully connected layer.
The coordinate regression subnetwork based on the single-scale Anchor comprises a two-dimensional convolution layer and an activation layer; the activation layer uses the Sigmoid operator

\sigma(x) = \frac{1}{1 + e^{-x}}

The Anchor uses single-scale anchor boxes. For a 960 x 960 image, prior knowledge indicates that when the camera is within 5 meters of the target surface the bullet holes occupy roughly 5 x 5 to 15 x 15 pixels, so anchor boxes of the three sizes 5 x 5, 10 x 10 and 15 x 15 are used. Through the coordinate regression subnetwork we regress a suspected bullet hole target box, shown in Fig. 5 as (X, Y, W, H), where X and Y are the center coordinates of the bullet hole target box and W and H are its width and height.
As shown in Fig. 6, texture features are extracted from the original image after the suspected bullet hole target box is expanded and cropped with an expansion ratio of 1:1.2. From the cropped region, the texture angular second moment, texture entropy, contrast, uniformity, X-axis gradient and Y-axis gradient are extracted and concatenated into a multi-dimensional feature vector.
Let the image of the expanded suspected bullet hole region be f(x, y), x ∈ (0, M-1), y ∈ (0, N-1), where M is the pixel width of the region and N is its pixel height. The gray level co-occurrence matrix of the region is denoted P(i, j, d, θ), where i is the x-axis index of the co-occurrence matrix, j is its y-axis index, d is the distance between two pixels (x_1, y_1) and (x_2, y_2), and θ is the angle between the two pixels. The texture angular second moment is expressed as

f_1 = \sum_i \sum_j P(i, j, d, \theta)^2

the texture entropy as

f_2 = -\sum_i \sum_j P(i, j, d, \theta) \log P(i, j, d, \theta)

the contrast as

f_3 = \sum_i \sum_j (i - j)^2 P(i, j, d, \theta)

and the uniformity by the inverse difference moment

f_4 = \sum_i \sum_j \frac{P(i, j, d, \theta)}{1 + (i - j)^2}

The X-axis gradient of the region is denoted f_5 and the Y-axis gradient f_6. Finally, the six features are flattened to one dimension and concatenated to obtain the texture feature matrix of the region, f = [f_1, f_2, f_3, f_4, f_5, f_6].
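A rough NumPy sketch of these six texture features is given below for illustration; the quantization to 8 gray levels, the single offset d = 1 with θ = 0°, and the aggregation of the gradient maps into mean absolute gradients are assumptions of this sketch, not details fixed by the description.

```python
import numpy as np

def glcm(region: np.ndarray, levels: int = 8, d: int = 1) -> np.ndarray:
    """Normalized gray-level co-occurrence matrix P(i, j) for offset (d, theta = 0 deg)."""
    q = (region.astype(np.float64) / 256.0 * levels).astype(np.int32).clip(0, levels - 1)
    p = np.zeros((levels, levels), dtype=np.float64)
    for i, j in zip(q[:, :-d].ravel(), q[:, d:].ravel()):
        p[i, j] += 1.0
    return p / max(p.sum(), 1.0)

def texture_features(region: np.ndarray) -> np.ndarray:
    """f = [ASM, entropy, contrast, inverse difference moment, x-gradient, y-gradient]."""
    p = glcm(region)
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                                    # angular second moment
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))          # texture entropy
    contrast = np.sum((i - j) ** 2 * p)                     # contrast
    idm = np.sum(p / (1.0 + (i - j) ** 2))                  # uniformity / inverse difference moment
    img = region.astype(np.float64)
    gx = np.mean(np.abs(np.gradient(img, axis=1)))          # X-axis gradient (mean |grad|, assumed aggregation)
    gy = np.mean(np.abs(np.gradient(img, axis=0)))          # Y-axis gradient
    return np.array([asm, entropy, contrast, idm, gx, gy])
```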
The feature fusion and optimization module is shown in the lower half of Fig. 5. The texture features and the convolution features extracted by the network are concatenated, feature refinement is performed with an AvgPooling + SoftMax channel attention mechanism, and a fully connected layer FC finally performs the binary classification, where result 0 denotes a bullet hole and result 1 a non-bullet hole.
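The fusion and binary classification step could be sketched as follows; the convolution feature dimension and the exact placement of the SoftMax attention over the fused vector are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Concatenate convolution features with texture features, re-weight channels with an
    AvgPool + SoftMax attention, then classify bullet hole (class 0) vs. non-bullet hole (class 1)."""
    def __init__(self, conv_dim: int = 128, tex_dim: int = 6):  # conv_dim is an assumption
        super().__init__()
        self.fc_cls = nn.Linear(conv_dim + tex_dim, 2)

    def forward(self, conv_feat, tex_feat):
        # conv_feat: (B, conv_dim, h, w) region features; tex_feat: (B, tex_dim) texture vector
        pooled = F.adaptive_avg_pool2d(conv_feat, 1).flatten(1)   # AvgPooling
        fused = torch.cat([pooled, tex_feat], dim=1)              # feature fusion by concatenation
        attn = torch.softmax(fused, dim=1)                        # SoftMax channel attention weights
        refined = fused * attn                                    # refined features
        return self.fc_cls(refined)                               # logits for the binary classification
```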
For model training, GIoU Loss is used as the loss function of the target box, and the cross entropy loss function L is used to evaluate the bullet hole classification. The formulas are:

\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}

L_{GIoU} = 1 - \mathrm{IoU} + \frac{|C \setminus (A \cup B)|}{|C|}

L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

where A and B represent the predicted and ground-truth target boxes respectively, C is the smallest rectangle that can enclose A and B, N is the number of samples, y_i is the classification label of the i-th sample and p_i is the predicted probability for the i-th sample. The final total loss is the sum of the two loss functions.
S3: inference is run with the detection model trained in S2 to obtain the single-frame image bullet hole detection result.
The bullet hole detection result is obtained by inference with the classification and detection model. The detection result of the k-th frame image is represented as g_k(i) = (x_i, y_i), 0 ≤ i < N_k, where x_i and y_i are the pixel coordinates of the i-th detected bullet hole center in the image and N_k is the total number of bullet holes detected in frame k.
S4: constructing a bullet hole integral chart according to the detection result of S3 of the multi-frame image in the video, as shown in FIG. 7;
The bullet hole integral map is the quantized accumulation of the results of the current frame and the preceding frames. At algorithm initialization, a grid map of fixed size is constructed based on the input image; for a 960 x 960 resolution, the image is gridded into 96 x 96 cells with a step of 10 x 10, and every cell starts at 0. The detected bullet hole center coordinates of each frame are quantized into the corresponding grid cell, whose value is incremented by one. The grid integral map is the accumulated sum of the current frame and the preceding 19 frames; if the current frame index is less than 19, the integral map is accumulated from frame 1 up to the current frame. The accumulated sum is multiplied by a fixed gray coefficient to obtain a gridded gray-scale integral map. The integral map of the k-th frame is:

I_k(a, b) = \lambda \sum_{m=\max(1,\, k-19)}^{k} g_m(a, b)

where λ is the fixed gray coefficient, g_m(a, b) is the number of bullet holes of the m-th frame detection result falling in grid cell (a, b), a is the x-axis index in the grid map and b is the y-axis index. This integral map effectively prevents the total number of bullet holes of the k-th frame from being smaller than that of the (k-1)-th frame because of missed detections, and thus avoids recognition errors caused by the network missing a detection.
S5: the bullet hole integral map of the current frame obtained in S4 is registered against the bullet hole integral map of the previous frame and the frame difference is taken, finally yielding the newly added bullet holes of the current frame.
The integral maps of the two frames are matched by extracting features with the common SIFT (scale-invariant feature transform) and then computing the registration translation matrix T(a, b) between the two maps, so that they can be aligned. Because the target surface shakes during shooting due to factors such as the shots themselves or wind, the same grid cell of the two integral maps may not correspond to the same position in the original image; registering the two integral maps therefore effectively suppresses false detections caused by image shaking and improves the robustness of the algorithm. Experiments show that, because the time interval between the two frames is short, the two integral maps are not distorted relative to each other; the obtained registration matrix T(a, b) is the grid offset of frame k relative to frame k-1 along the X and Y axes. Taking frame k-1 as the reference frame, the registered integral map of frame k is

\hat{I}_k(a, b) = I_k(a + t_x,\, b + t_y)

where (t_x, t_y) is the grid offset given by T(a, b).

For the integral map frame difference, the (k-1)-th frame integral map is subtracted from the registered k-th frame integral map to obtain a difference map, which is then binarized with a fixed threshold δ whose choice is related to the number of frames accumulated in the integral map:

D_k(a, b) = \begin{cases} 1, & \hat{I}_k(a, b) - I_{k-1}(a, b) > \delta \\ 0, & \text{otherwise} \end{cases}

where δ is the binarization threshold. Grid cells (a, b) with value 1 are the grid positions of newly added bullet holes; these positions are then mapped back to the original image to finally obtain the center coordinates of the newly added bullet holes.
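The registration and frame difference post-processing could be sketched as follows with OpenCV; estimating the grid offset as the median SIFT match displacement and the threshold value are assumptions of this sketch.

```python
import cv2
import numpy as np

def new_hole_cells(prev_map, cur_map, delta=10, step=10):
    """Register the current integral map (frame k) to the previous one (frame k-1) using SIFT,
    binarize the difference with threshold delta, and return original-image coordinates of new cells."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(prev_map, None)
    k2, d2 = sift.detectAndCompute(cur_map, None)
    tx = ty = 0
    if d1 is not None and d2 is not None:
        matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d2, d1)
        if matches:
            shifts = [(k1[m.trainIdx].pt[0] - k2[m.queryIdx].pt[0],
                       k1[m.trainIdx].pt[1] - k2[m.queryIdx].pt[1]) for m in matches]
            tx, ty = np.round(np.median(shifts, axis=0)).astype(int)   # estimated X/Y grid offset
    registered = np.roll(cur_map, shift=(ty, tx), axis=(0, 1))         # shift frame k onto frame k-1
    diff = registered.astype(np.int32) - prev_map.astype(np.int32)     # frame difference
    binary = (diff > delta).astype(np.uint8)                           # 1 marks a newly added bullet hole cell
    rows, cols = np.nonzero(binary)
    # map grid cells back to original-image center coordinates
    return [(c * step + step // 2, r * step + step // 2) for r, c in zip(rows, cols)]
```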
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A visual-based bullet hole detection method is characterized by comprising the following steps:
S1, acquiring a target shooting bullet hole image data set and labeling the bullet holes;
S2, constructing and training a deep fusion network of convolution features and texture features, comprising:
S21, slicing the image tensor of the target image into multiple layers and then recombining them;
S22, feeding the recombined image tensor into a residual-like structure to extract high-level semantic information;
S23, obtaining a suspected bullet hole region from the coordinates regressed by a single-scale target regression subnetwork, extracting texture features from the suspected bullet hole region, fusing them with the convolution features fed into the single-scale target regression subnetwork, refining the fused features with a channel attention mechanism, performing binary bullet hole / non-bullet hole classification on the refined features, and training the deep fusion network with the binary classification results and the labeled bullet holes;
S3, running inference with the trained deep fusion network to obtain the single-frame image bullet hole detection result;
S4, constructing a bullet hole integral map of the current frame image based on the detection results of the multi-frame images;
and S5, registering the bullet hole integral map of the current frame image against the bullet hole integral map of the previous frame image, taking the frame difference, and finally obtaining the newly added bullet holes of the current frame.
2. The vision-based bullet hole detection method according to claim 1, wherein in S1 the bullet holes of the target images are labeled with rectangular boxes; in S23, the coordinates regressed by the single-scale target regression subnetwork determine a suspected bullet hole target box (X, Y, W, H), where X and Y are the center coordinates of the suspected bullet hole target box and W and H are its width and height; texture features are extracted from the suspected bullet hole target box; the refined features are classified into two classes to obtain bullet hole and non-bullet hole confidences; and the deep fusion network is trained with the predicted bullet hole target boxes obtained from the binary classification results and the labeled bullet hole rectangles.
3. The vision-based bullet hole detection method according to claim 1, wherein in S23 the obtained suspected bullet hole region is cropped with an expansion margin and texture features are extracted from the cropped suspected bullet hole region; the image of the cropped region is denoted f(x, y), x ∈ (0, M-1), y ∈ (0, N-1), where M is the pixel width of the region and N is its pixel height; the gray level co-occurrence matrix of the region is denoted P(i, j, d, θ), where i is the x-axis index of the co-occurrence matrix, j is its y-axis index, d is the distance between two pixels and θ is the angle between them; the extracted texture features comprise the texture angular second moment, the texture entropy, the contrast, the uniformity, the X-axis gradient and the Y-axis gradient;

the texture angular second moment is expressed as

f_1 = \sum_i \sum_j P(i, j, d, \theta)^2

the texture entropy as

f_2 = -\sum_i \sum_j P(i, j, d, \theta) \log P(i, j, d, \theta)

the contrast as

f_3 = \sum_i \sum_j (i - j)^2 P(i, j, d, \theta)

and the uniformity by the inverse difference moment

f_4 = \sum_i \sum_j \frac{P(i, j, d, \theta)}{1 + (i - j)^2}

the X-axis gradient of the region is denoted f_5 and the Y-axis gradient f_6;

the extracted texture features are flattened to one dimension and concatenated to obtain the texture feature matrix of the region, f = [f_1, f_2, f_3, f_4, f_5, f_6].
4. The vision-based bullet hole detection method according to claim 1, wherein in S23 the GIoU Loss is used as the loss function of the bullet hole region and the cross entropy loss function L is used to evaluate the bullet hole classification:

\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}

L_{GIoU} = 1 - \mathrm{IoU} + \frac{|C \setminus (A \cup B)|}{|C|}

L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

wherein A and B represent the predicted and labeled bullet hole regions respectively, C is the smallest rectangle that can enclose A and B, N is the number of samples, y_i is the classification label of the i-th sample and p_i is the predicted probability for the i-th sample; the final total loss is the sum of the two loss functions.
5. The vision-based bullet hole detection method of claim 1, wherein in S2 the target image is scaled to a fixed size and subjected to 3-channel gray scale normalization, the image tensor of the target image is then converted from the RGB space to the RGBP space, and the converted image tensor is used as the input of S21.
6. The method according to claim 1, wherein in S4 a bullet hole integral map is constructed from the detection results of the multi-frame images in the video; the bullet hole integral map is obtained by quantizing and accumulating the results of the current frame and the preceding frames; at initialization, a grid map is constructed based on the input image with every grid cell initialized to 0; the detected bullet hole center coordinates of each frame are quantized into the corresponding grid cell and the cell value is incremented by 1; the grid integral map is the accumulated sum of the current frame and the preceding frames, and if the current frame index is smaller than the accumulation window, the integral map is accumulated from frame 1 up to the current frame; the accumulated sum is multiplied by a gray coefficient to obtain a gridded gray-scale integral map:

I_k(a, b) = \lambda \sum_{m=\max(1,\, k-F+1)}^{k} g_m(a, b)

wherein λ is the gray coefficient, F is the number of accumulated frames, g_m(a, b) is the number of bullet holes of the m-th frame detection result falling in grid cell (a, b), a is the x-axis index in the grid map and b is the y-axis index.
7. The vision-based bullet hole detection method according to claim 1, wherein in S5 the bullet hole integral map of the current frame is registered against the bullet hole integral map of the previous frame and the frame difference is taken, finally yielding the newly added bullet holes of the current frame;

to match the integral maps of the two frames, features are extracted with the common scale-invariant feature transform (SIFT) and the registration translation matrix T(a, b) between the two maps is computed, so that the maps can be aligned; the obtained T(a, b) is the grid offset of frame k relative to frame k-1 along the X and Y axes; taking frame k-1 as the reference frame, the registered integral map of frame k is

\hat{I}_k(a, b) = I_k(a + t_x,\, b + t_y)

where (t_x, t_y) is the grid offset given by T(a, b);

for the integral map frame difference, the (k-1)-th frame integral map is subtracted from the registered k-th frame integral map to obtain a difference map, which is binarized with a threshold δ:

D_k(a, b) = \begin{cases} 1, & \hat{I}_k(a, b) - I_{k-1}(a, b) > \delta \\ 0, & \text{otherwise} \end{cases}

wherein δ is the binarization threshold, whose choice is related to the number of frames accumulated in the integral map; grid cells (a, b) with value 1 are taken as the grid positions of newly added bullet holes, and these positions are then mapped back to the original image to finally obtain the center coordinates of the newly added bullet holes.
8. A vision-based bullet hole detection system comprising a deep fusion network of convolution features and texture features and a bullet hole integral map post-processing module, characterized in that the fusion network is a lightweight network comprising a focusing layer, a nested bottleneck layer, a convolution layer, and a feature fusion and single-scale target regression module;
the focusing layer slices the image tensor of the target image into multiple layers and then recombines them;
the nested bottleneck layer is a residual-like structure that extracts high-level semantic information from the recombined image tensor;
the feature fusion and single-scale target regression module comprises a single-scale target regression subnetwork, a texture feature extraction unit and a feature fusion and optimization module;
the single-scale target regression subnetwork comprises a two-dimensional convolution layer and an activation layer, wherein the two-dimensional convolution layer is connected with the nested bottleneck layer and outputs convolution characteristics to the activation layer, and a suspected bullet hole area is obtained through coordinates output by the activation layer;
the texture feature extraction unit extracts texture features from the suspected bullet hole area;
the feature fusion and optimization module comprises a fusion module, a pooling layer, a normalization layer and a fully connected layer; the extracted texture features and the convolution features output by the two-dimensional convolution layer are fused by the fusion module, the fused features are refined by a channel attention mechanism formed by the pooling layer and the normalization layer, the refined features are classified into bullet holes and non-bullet holes by the fully connected layer, and the deep fusion network is trained with the classification results and the bullet holes labeled in the training set; inference with the trained deep fusion network yields the bullet hole detection result;
the bullet hole integral image post-processing module comprises a bullet hole integral image constructing module, an inter-frame integral image registration module and an integral image frame difference module;
the bullet hole integral image constructing module constructs an integral image according to the bullet hole detection results of the current frame and the previous frames;
the inter-frame integral image registration module is used for registering integral images of two adjacent frames;
and the integral image frame difference module is used for carrying out frame difference on the registered integral image to finally obtain the newly added bullet hole coordinate of the current frame.
9. The vision-based bullet hole detection system according to claim 8, wherein the focusing layer comprises a slicing unit, a combination unit, a two-dimensional convolution layer, a BN layer and an activation layer connected in sequence; the slicing unit slices the image tensor of the target image into multiple layers, the combination unit recombines them, and the activation layer computes

f(x) = \frac{x \cdot \min(m,\, \max(0,\, x + n))}{m}

wherein x is the input image tensor, max(0, x + n) keeps the part of x + n that is greater than 0 and sets the rest to 0, min(m, ·) caps values at m, and m and n are empirically chosen constants with m > n.
10. The vision-based bullet hole detection system according to claim 8, wherein the nested bottleneck layer comprises a bottleneck module, convolution layers, a concatenation unit, a BN layer and an activation layer; the image tensor passes through a convolution layer and a group of nested bottleneck modules, is concatenated by the concatenation unit with the image tensor that passes through the convolution layer only, and is then output through the BN layer, the activation layer and a convolution layer; in the bottleneck module, the input passes through two one-dimensional convolution layers and is added to the original input.
CN202110997123.7A 2021-08-27 2021-08-27 Visual-based bullet hole detection method and system Pending CN113763471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110997123.7A CN113763471A (en) 2021-08-27 2021-08-27 Visual-based bullet hole detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110997123.7A CN113763471A (en) 2021-08-27 2021-08-27 Visual-based bullet hole detection method and system

Publications (1)

Publication Number Publication Date
CN113763471A true CN113763471A (en) 2021-12-07

Family

ID=78791660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110997123.7A Pending CN113763471A (en) 2021-08-27 2021-08-27 Visual-based bullet hole detection method and system

Country Status (1)

Country Link
CN (1) CN113763471A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588117A (en) * 2022-10-17 2023-01-10 华南农业大学 Citrus psylla detection method and system based on YOLOv5s-BC


Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110298266B (en) Deep neural network target detection method based on multiscale receptive field feature fusion
CN113705478B (en) Mangrove single wood target detection method based on improved YOLOv5
CN109816695A (en) Target detection and tracking method for infrared small unmanned aerial vehicle under complex background
CN113313703B (en) Unmanned aerial vehicle power transmission line inspection method based on deep learning image recognition
CN112396635B (en) Multi-target detection method based on multiple devices in complex environment
CN112818969A (en) Knowledge distillation-based face pose estimation method and system
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN111178177A (en) Cucumber disease identification method based on convolutional neural network
CN113887472B (en) Remote sensing image cloud detection method based on cascade color and texture feature attention
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN113838064A (en) Cloud removing method using multi-temporal remote sensing data based on branch GAN
CN113569981A (en) Power inspection bird nest detection method based on single-stage target detection network
CN114565824B (en) Single-stage rotating ship detection method based on full convolution network
CN114529583B (en) Power equipment tracking method and tracking system based on residual regression network
CN116740419A (en) Target detection method based on graph regulation network
CN115527159A (en) Counting system and method based on cross-modal scale attention aggregation features
CN113657225B (en) Target detection method
CN113763471A (en) Visual-based bullet hole detection method and system
CN113936034A (en) Apparent motion combined weak and small moving object detection method combined with interframe light stream
CN117765083A (en) Equipment positioning method and device, electronic equipment and storage medium
CN117274967A (en) Multi-mode fusion license plate recognition algorithm based on convolutional neural network
CN117422677A (en) Method, device and system for detecting image defects of power line for airborne terminal
CN111127355A (en) Method for finely complementing defective light flow graph and application thereof
CN111353412A (en) End-to-end 3D-CapsNet flame detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination