CN115410196A - Small target identification system based on complex background - Google Patents

Small target identification system based on complex background

Info

Publication number
CN115410196A
Authority
CN
China
Prior art keywords
picture
neural network
coordinates
network model
layer
Prior art date
Legal status
Granted
Application number
CN202211342007.2A
Other languages
Chinese (zh)
Other versions
CN115410196B (en)
Inventor
杨扬
邓荣春
李柯
李睿
胡燚
周丹
胡荣群
沈克永
Current Assignee
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanchang Institute of Technology
Priority to CN202211342007.2A
Publication of CN115410196A
Application granted
Publication of CN115410196B
Legal status: Active (granted)

Classifications

    • G06V 20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V 20/695 Preprocessing, e.g. image segmentation
    • G06V 20/698 Matching; Classification
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/30 Noise filtering
    • G06V 10/36 Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; non-linear local filtering operations, e.g. median filtering
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Nonlinear Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a small target identification system based on a complex background, comprising a picture acquisition module, an image enhancement module, a picture noise filtering module and a target identification module. Feature maps are screened with a prior frame: when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of a region proposal network (RPN), which reduces the amount of calculation.

Description

Small target identification system based on complex background
Technical Field
The invention relates to the field of target identification, in particular to a small target identification system based on a complex background.
Background
Object recognition is widely applied in many fields. In biological surveys, for example, when the organisms in a region need to be surveyed and counted, hidden cameras are usually installed at the places where they frequently appear, and people then watch the monitoring video to observe them. That scheme is obviously time-consuming and labor-intensive, so object recognition is applied instead: the pictures are processed by a recognition algorithm built into a computer and the biological targets are identified automatically, so that people no longer need to spend time watching the video. This scheme is clearly more practical.
However, during a biological survey some organisms are very small and the regional environment is complex, so the acquired pictures have complicated backgrounds. When a picture containing such a small target is processed by a target recognition algorithm, the target's features are gradually lost as they are aggregated over successive convolutions, and the recognition accuracy drops markedly.
Disclosure of Invention
In order to overcome the drop in recognition accuracy for small targets under existing complex backgrounds, a small target identification system based on a complex background is provided, which reduces the amount of calculation and improves the identification precision.
In order to solve the above technical problem, the present invention provides a small target recognition system based on a complex background, including:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing a background in the picture information through a threshold segmentation technology, and sending the processed picture information to the picture noise filtering module;
the picture noise filtering module is used for receiving the processed picture information and removing noise of the processed picture information by a median filtering method;
and the target identification module is used for acquiring the image information after noise reduction and marking and identifying the target in the image.
Specifically, the target identification module comprises a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers;
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the selected neural network model to the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the labeling unit is used for receiving the coordinates of the positioning frame and labeling the picture according to the coordinates of the positioning frame, and the labeled content comprises the positioning frame and the target type label.
Specifically, the trained neural network model processes the picture information in the following steps:
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 layer and the conv13 layer are selected and each processed by the SE module processing layer; the resulting feature maps are denoted F1 and F2 respectively;
S3: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
Specifically, the system further comprises a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the pictures of the training data set and marking the region as a prior frame;
the prior frame generation module generates the prior frame by the following steps:
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small and select the coordinates ranked in the top e by weight, recording them as a set μ, where e is a weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form.
Specifically, the step T2 includes:
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small, delete the coordinates ranked in the bottom f by weight, and proceed to T3, where f is the negligible-weight threshold.
Specifically, the step S3 comprises the following steps:
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4.
Specifically, the image acquisition module is configured as a camera, and acquires images every 15 s.
Specifically, A is set manually.
Compared with the prior art, the invention has the following advantages: 1. The feature map is screened with the prior frame; when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of the region proposal network (RPN), which reduces the amount of calculation.
2. The invention adopts the OTSU method to enhance the image, removes the background in the image information according to the threshold value, reduces the interference of the complex background to the small target identification, and improves the accuracy of the small target identification.
3. According to the invention, the image noise filtering module is used for carrying out noise reduction processing on the image information, so that the interference of noise on subsequent identification is reduced, the edge characteristics can be kept, and the accuracy of small target identification is further improved.
4. According to the invention, the feature maps of the second and thirteenth convolutional layers are selected, their shallow features are strengthened by the SE module, and the two processed feature maps are spliced, so that the generated feature map retains shallow features and the drop in recognition accuracy caused by the small size of the target is avoided.
Drawings
Fig. 1 is a schematic structural diagram of a small target recognition system based on a complex background adopted in the present invention.
Fig. 2 is a schematic structural diagram of a target identification module used in the present invention.
Fig. 3 is a schematic structural diagram of a neural network model adopted in the present invention.
Fig. 4 is a flowchart of prior frame generation adopted in the present invention.
Fig. 5 is a flowchart illustrating processing of picture information according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The invention provides a small target identification system based on a complex background, which comprises a picture acquisition module, an image enhancement module, a picture noise filtering module and a target identification module, as shown in figure 1:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing the background in the picture information through a threshold segmentation technique, and sending the processed picture information to the picture noise filtering module; the OTSU method (maximum inter-class variance method) is selected as the threshold segmentation technique: the threshold that maximizes the separation between target and background is computed, the background in the picture information is removed according to that threshold, and the interference of the complex background with small target identification is reduced;
the picture noise filtering module is used for receiving the processed picture information and removing its noise by median filtering; the picture information after threshold segmentation still contains considerable noise, and median filtering replaces each pixel's gray value with the median of the gray values in its neighbourhood, which markedly reduces the interference of noise with subsequent identification while preserving edge features (a code sketch of these two preprocessing steps is given after the module descriptions below);
and the target identification module is used for acquiring the image information after noise reduction, and marking and identifying the target in the image.
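The two preprocessing modules above map onto standard OpenCV calls. The following is a minimal sketch under that assumption; the function name, file name and the 3×3 median kernel are illustrative choices, not taken from the patent.

```python
import cv2
import numpy as np

def preprocess(gray: np.ndarray) -> np.ndarray:
    """Sketch of the image enhancement + picture noise filtering modules:
    OTSU thresholding suppresses the complex background, then median
    filtering removes the noise left over by the segmentation."""
    # Image enhancement module: OTSU (maximum inter-class variance) computes
    # the threshold that best separates target from background; background
    # pixels are zeroed out with the resulting mask.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    foreground = cv2.bitwise_and(gray, gray, mask=mask)

    # Picture noise filtering module: each pixel is replaced by the median of
    # its 3x3 neighbourhood, which removes residual noise while keeping edges.
    return cv2.medianBlur(foreground, 3)

# Hypothetical usage on a single grayscale picture file.
picture = cv2.imread("survey_frame.png", cv2.IMREAD_GRAYSCALE)
clean = preprocess(picture)
```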
As shown in fig. 2, the target identification module includes a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit, and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; as shown in fig. 3, the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers; the SE module strengthens the shallow features of a feature map, avoids their loss during recognition, and improves the recognition accuracy for small targets (a minimal sketch of such an SE block is given after the unit descriptions below);
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the specific steps of the parameter training are as follows:
x1: the method comprises the steps of sorting the shot pictures with small targets, carrying out classification and naming according to different types of the targets, and then dividing the pictures into a test data set and a training data set according to a ratio of 1;
X2: carrying out frame marking on the targets in the test data set and training data set pictures, recording the corresponding marking frame coordinates, and storing each marking frame as the combination of its lower left corner coordinates and upper right corner coordinates;
x3: loading a neural network model template by a neural network training unit, and sending training data set picture information, corresponding type information and marking frame coordinate information into the neural network model template;
x4: training internal parameters of the neural network model template through training data set information;
x5: and detecting the trained neural network model through the test data set.
The neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the selected neural network model to the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the marking unit is used for receiving the coordinates of the positioning frame and marking on the picture according to the coordinates of the positioning frame, and the marked content comprises the positioning frame and a target type label;
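The SE processing layer referred to above re-weights feature-map channels through global pooling and a small bottleneck. Below is a minimal PyTorch sketch of such a block under that assumption; the reduction ratio of 16 and all names are illustrative, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: squeeze each channel to one value by global
    average pooling, learn per-channel weights with a small bottleneck, and
    rescale the feature map channel by channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # excitation weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # channel-wise re-weighting

# Hypothetical usage on a conv2-sized feature map (64 channels).
f1 = SEBlock(64)(torch.randn(1, 64, 224, 224))
```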
As shown in fig. 1, the system further comprises a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the pictures of the training data set and marking the region as a prior frame;
As shown in fig. 4, the prior frame generation module generates the prior frame in the following steps (a code sketch is given after the steps):
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ and is set manually; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small; with f = 30%, delete the coordinates ranked in the bottom 30% by weight and proceed to T3, where f is the negligible-weight threshold; when δ contains many elements, deleting some low-weight pixel coordinates reduces the amount of calculation;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n; this accumulates statistics on the regions where small targets appear with the highest probability;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small; with e = 40%, select the coordinates ranked in the top 40% by weight and record them as a set μ, where e is the weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form;
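The T1-T6 procedure amounts to a weighted histogram over the pixel coordinates that fall inside marking frames. The sketch below follows those steps with e defaulting to the 40% used above; the pruning sub-steps T2.2-T2.4 are omitted for brevity, and the data layout (a list of per-picture coordinate lists) is an illustrative assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int]

def generate_prior_frame(marked_coords_per_picture: Iterable[List[Coord]],
                         e: float = 0.40) -> Tuple[Coord, Coord]:
    """Sketch of T1-T6: count how often each pixel coordinate falls inside a
    marking frame over the training pictures, keep the top-e fraction by
    weight (the set mu), and return the bounding box of those coordinates as
    (lower-left corner, upper-right corner)."""
    weights: Dict[Coord, int] = defaultdict(int)   # the set delta with weights w_n

    # T1-T4: traverse every picture; a matched coordinate has its weight
    # incremented, an unseen coordinate is added with an initial weight of 1.
    for coords in marked_coords_per_picture:
        for xy in coords:
            weights[xy] += 1

    # T5: sort by weight, descending, and keep the top-e fraction (set mu).
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    mu = [xy for xy, _ in ranked[:max(1, int(len(ranked) * e))]]

    # T6: the bounding box of the retained coordinates is the prior frame.
    xs = [x for x, _ in mu]
    ys = [y for _, y in mu]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical usage: each inner list holds the pixel coordinates found inside
# one training picture's marking frame.
prior_frame = generate_prior_frame([[(10, 12), (11, 12)], [(10, 12), (30, 40)]])
```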
As shown in fig. 5, the trained neural network model processes the picture information in the following steps (a code sketch of this forward pass is given after the steps):
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 and conv13 layers are selected and processed by the SE module processing layer, which strengthens their shallow features; the resulting feature maps are denoted F1 and F2 respectively;
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
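Assuming a PyTorch implementation, the S1-S5 pipeline can be sketched as below. The truncated-VGG16 split points, the 7×7 ROI pooling size and all names are illustrative assumptions not given in the patent; SEBlock is the block sketched earlier, and the RPN of step S4 and the CNN check of step S3.2 are left out.

```python
import torch
import torch.nn.functional as F
import torchvision
from torchvision.ops import roi_pool

class SmallTargetBackbone(torch.nn.Module):
    """Sketch of steps S1-S5: truncated VGG16 (13 conv + relu, 4 pooling
    layers), SE re-weighting of the conv2 and conv13 feature maps, bilinear
    up-sampling and concatenation into F3, and ROI pooling of candidate
    boxes on F3."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16().features
        self.shallow_stage = vgg[:4]    # conv1-conv2 (+ relu): source of F1, 64 ch
        self.deep_stage = vgg[4:-1]     # conv3-conv13, final pooling removed, 512 ch
        self.se_shallow = SEBlock(64)
        self.se_deep = SEBlock(512)

    def forward(self, image: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        shallow = self.shallow_stage(image)          # S1: conv2 output
        deep = self.deep_stage(shallow)              # S1: conv13 output
        f1 = self.se_shallow(shallow)                # S2: SE-strengthened maps
        f2 = self.se_deep(deep)
        # S3.1: bilinear up-sampling of F2 to F1's size, then channel splicing.
        f2_up = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                              align_corners=False)
        f3 = torch.cat([f1, f2_up], dim=1)           # feature map F3 (576 ch)
        # S5: ROI pooling of candidate boxes (prior frame or RPN prediction
        # frames) on F3; each region is split into a 7x7 grid and max-pooled.
        return roi_pool(f3, [boxes], output_size=(7, 7))

# Hypothetical usage: one picture and one candidate box (x1, y1, x2, y2).
model = SmallTargetBackbone()
pooled = model(torch.randn(1, 3, 224, 224),
               torch.tensor([[30.0, 40.0, 90.0, 120.0]]))   # -> (1, 576, 7, 7)
```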
In this embodiment, the feature map is screened with the prior frame; when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of the region proposal network (RPN), which reduces the amount of calculation.
The following describes the small target recognition system based on a complex background in conjunction with a biological survey (a code sketch of this workflow is given after the steps):
w1: acquiring a picture of a specific region through a picture acquisition module, wherein the picture acquisition module is configured as a camera and acquires the picture every 15 s;
w2: sending the picture into an image enhancement module, and carrying out threshold segmentation processing on the picture to remove the interference of a complex background;
w3: the processed picture information is subjected to noise reduction processing through a picture noise filtering module;
w4: responding to the input of a user through a neural network model selecting unit, selecting a corresponding trained neural network model from a neural network model storage unit, and loading the neural network model to a small target recognition unit;
w5: processing the picture information through a small target identification unit to generate a target type label and a positioning frame coordinate;
w6: and marking the original picture through a marking unit according to the target type label and the coordinate of the positioning frame.
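As a rough end-to-end illustration of W1-W6, the loop below strings the earlier sketches together. The 15 s interval comes from the description; the recognizer object, its return format, the preprocess() helper (sketched earlier) and the output file names are all illustrative assumptions.

```python
import time
import cv2

def run_survey(recognizer, camera_index: int = 0, interval_s: float = 15.0):
    """W1-W6: grab a picture every 15 s, suppress background and noise,
    let the selected trained model label the small targets, and draw the
    returned positioning frames and type labels on the original picture."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()                               # W1: acquisition
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            clean = preprocess(gray)                             # W2 + W3
            for label, (x1, y1, x2, y2) in recognizer(clean):    # W4 + W5
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)   # W6
                cv2.putText(frame, label, (x1, y1 - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imwrite(f"labeled_{int(time.time())}.png", frame)
            time.sleep(interval_s)                               # 15 s interval
    finally:
        cap.release()
```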
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For a person skilled in the art to which the invention pertains, several equivalent alternatives or obvious modifications, all of which have the same properties or uses, without departing from the inventive concept, should be considered as falling within the scope of the patent protection of the invention, as determined by the claims submitted.

Claims (7)

1. A small target recognition system based on a complex background, comprising:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing a background in the picture information through a threshold segmentation technology, and sending the processed picture information to the picture noise filtering module;
the picture noise filtering module is used for receiving the processed picture information and removing noise of the processed picture information by a median filtering method;
the target identification module is used for acquiring the image information after noise reduction and marking and identifying a target in the image;
the target identification module comprises a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers;
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the neural network model into the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the labeling unit is used for receiving the coordinates of the positioning frame and labeling on the picture according to the coordinates of the positioning frame, and the labeled content comprises the positioning frame and the target type label.
2. The small target recognition system based on the complex background as claimed in claim 1, wherein the step of processing the image information by the trained neural network model is as follows:
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 layer and the conv13 layer are selected and each processed by the SE module processing layer; the resulting feature maps are denoted F1 and F2 respectively;
S3: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
3. The system for recognizing the small target based on the complex background as claimed in claim 1, further comprising a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the picture of the training data set and recording the region as a prior frame;
the step of generating the prior frame by the prior frame generating module is as follows:
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small and select the coordinates ranked in the top e by weight, recording them as a set μ, where e is a weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form.
4. The system according to claim 3, wherein the step T2 specifically comprises:
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small, delete the coordinates ranked in the bottom f by weight, and proceed to T3, where f is the negligible-weight threshold.
5. The system for recognizing the small target based on the complex background as claimed in claim 2, wherein the step S3 specifically comprises the steps of:
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4.
6. The system according to claim 1, wherein the image acquisition module is configured as a camera, and the image acquisition module acquires images every 15 s.
7. A small-object recognition system based on complex background as claimed in claim 4, wherein A is set manually.
CN202211342007.2A 2022-10-31 2022-10-31 Small target identification system based on complex background Active CN115410196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342007.2A CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211342007.2A CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Publications (2)

Publication Number Publication Date
CN115410196A true CN115410196A (en) 2022-11-29
CN115410196B CN115410196B (en) 2023-02-07

Family

ID=84167362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342007.2A Active CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Country Status (1)

Country Link
CN (1) CN115410196B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205003729U (en) * 2015-09-25 2016-01-27 南昌理工学院 Automatic product specification identification equipment that stores in a warehouse of commodity circulation
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN110321815A (en) * 2019-06-18 2019-10-11 中国计量大学 A kind of crack on road recognition methods based on deep learning
CN110569754A (en) * 2019-08-26 2019-12-13 江西航天鄱湖云科技有限公司 Image target detection method, device, storage medium and equipment
CN110826476A (en) * 2019-11-02 2020-02-21 国网浙江省电力有限公司杭州供电公司 Image detection method and device for identifying target object, electronic equipment and storage medium
CN112633257A (en) * 2021-01-28 2021-04-09 华东交通大学 Potato disease identification method based on improved convolutional neural network
CN113076804A (en) * 2021-03-09 2021-07-06 武汉理工大学 Target detection method, device and system based on YOLOv4 improved algorithm
CN113553874A (en) * 2021-07-27 2021-10-26 张金翼 Express delivery bar code detection method in complex environment
AU2021107103A4 (en) * 2021-08-25 2021-12-02 Joshi, Sagar Human identification based on biometric radiographs: a forensic approach
CN113920475A (en) * 2021-10-29 2022-01-11 福建技术师范学院 Security protection equipment identification method based on autonomous learning strategy and storage medium
CN114581841A (en) * 2022-01-25 2022-06-03 刘建芳 Method for detecting weak and small targets by using deep learning method in complex traffic environment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DU JIANG et al.: "Semantic segmentation for multiscale target based on object recognition using the improved Faster-RCNN model", Future Generation Computer Systems *
卜华雨 et al.: "Recognition of greening in highway central separation belts based on an improved Faster-RCNN", Agricultural Equipment & Vehicle Engineering *
沈克永 et al.: "Accurate target detection method based on search-region conditional probability CNN", Computer Engineering and Applications *
胡荣群 et al.: "Application of VR technology in ship digital twin operation and maintenance systems", Ship Science and Technology *

Also Published As

Publication number Publication date
CN115410196B (en) 2023-02-07

Similar Documents

Publication Publication Date Title
CN108334848B (en) Tiny face recognition method based on generation countermeasure network
CN110414559B (en) Construction method of intelligent retail cabinet commodity target detection unified framework and commodity identification method
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN109711407B (en) License plate recognition method and related device
US20080193020A1 (en) Method for Facial Features Detection
CN105868708B (en) A kind of images steganalysis method and device
US20210326638A1 (en) Video panoptic segmentation
CN110766017B (en) Mobile terminal text recognition method and system based on deep learning
CN109726678B (en) License plate recognition method and related device
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN113435319B (en) Classification method combining multi-target tracking and pedestrian angle recognition
CN113297956B (en) Gesture recognition method and system based on vision
CN111368632A (en) Signature identification method and device
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN115410196B (en) Small target identification system based on complex background
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
US20230196718A1 (en) Image augmentation device and method
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN114067186B (en) Pedestrian detection method and device, electronic equipment and storage medium
CN111199050A (en) System for automatically desensitizing medical records and application
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN114724175A (en) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN110969602B (en) Image definition detection method and device
CN114550062A (en) Method and device for determining moving object in image, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant