CN115410196A - Small target identification system based on complex background - Google Patents

Small target identification system based on complex background

Info

Publication number
CN115410196A
Authority
CN
China
Prior art keywords
picture
neural network
coordinates
network model
layer
Prior art date
Legal status
Granted
Application number
CN202211342007.2A
Other languages
Chinese (zh)
Other versions
CN115410196B (en)
Inventor
杨扬
邓荣春
李柯
李睿
胡燚
周丹
胡荣群
沈克永
Current Assignee
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanchang Institute of Technology
Priority to CN202211342007.2A
Publication of CN115410196A
Application granted
Publication of CN115410196B
Legal status: Active (granted)

Classifications

    • G06V 20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V 20/695 Preprocessing, e.g. image segmentation
    • G06V 20/698 Matching; Classification
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/30 Noise filtering
    • G06V 10/36 Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; non-linear local filtering operations, e.g. median filtering
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Nonlinear Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a small target identification system based on a complex background, comprising a picture acquisition module, an image enhancement module, a picture noise filtering module and a target identification module. Feature maps are screened with a prior frame: when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of a region proposal network (RPN), which reduces the amount of calculation.

Description

Small target identification system based on complex background
Technical Field
The invention relates to the field of target identification, in particular to a small target identification system based on a complex background.
Background
Object recognition is widely applied in many fields. In biological surveys, for example, when the organisms in a region need to be surveyed and counted, hidden cameras are usually installed at the places where they frequently appear, and people then watch the monitoring video to observe them. That scheme is obviously time-consuming and labor-intensive, so object recognition is applied instead: the pictures are processed by a recognition algorithm built into a computer and the biological targets are identified automatically, so that people no longer need to spend time watching the video. This scheme is clearly more practical.
However, during a biological survey some organisms are very small and the regional environment is complex, so the acquired pictures have complicated backgrounds. When a picture containing such a small target is processed by a target recognition algorithm, the target's features are gradually lost as they are aggregated over successive convolutions, and the recognition accuracy drops markedly.
Disclosure of Invention
In order to overcome the drop in recognition accuracy for small targets under existing complex backgrounds, a small target identification system based on a complex background is provided, which reduces the amount of calculation and improves the identification precision.
In order to solve the above technical problem, the present invention provides a small target recognition system based on a complex background, including:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing a background in the picture information through a threshold segmentation technology, and sending the processed picture information to the picture noise filtering module;
the picture noise filtering module is used for receiving the processed picture information and removing noise of the processed picture information by a median filtering method;
and the target identification module is used for acquiring the image information after noise reduction and marking and identifying the target in the image.
Specifically, the target identification module comprises a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers;
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the selected neural network model to the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the labeling unit is used for receiving the coordinates of the positioning frame and labeling the picture according to the coordinates of the positioning frame, and the labeled content comprises the positioning frame and the target type label.
Specifically, the trained neural network model processes the picture information in the following steps:
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 layer and the conv13 layer are selected and each processed by the SE module processing layer; the resulting feature maps are denoted F1 and F2 respectively;
S3: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
Specifically, the system further comprises a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the pictures of the training data set and marking the region as a prior frame;
the prior frame generation module generates the prior frame by the following steps:
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small and select the coordinates ranked in the top e by weight, recording them as a set μ, where e is a weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form.
Specifically, the step T2 includes:
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small, delete the coordinates ranked in the bottom f by weight, and proceed to T3, where f is the negligible-weight threshold.
Specifically, the step S3 comprises the following steps:
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4.
Specifically, the image acquisition module is configured as a camera, and acquires images every 15 s.
Specifically, A is set manually.
Compared with the prior art, the invention has the following advantages: 1. The feature map is screened with the prior frame; when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of the region proposal network (RPN), which reduces the amount of calculation.
2. The invention adopts the OTSU method to enhance the image, removes the background in the image information according to the threshold value, reduces the interference of the complex background to the small target identification, and improves the accuracy of the small target identification.
3. According to the invention, the image noise filtering module is used for carrying out noise reduction processing on the image information, so that the interference of noise on subsequent identification is reduced, the edge characteristics can be kept, and the accuracy of small target identification is further improved.
4. According to the invention, the feature maps of the second and thirteenth convolutional layers are selected, their shallow features are strengthened by the SE module, and the two processed feature maps are spliced, so that the generated feature map retains shallow features and the drop in recognition accuracy caused by the small size of the target is avoided.
Drawings
Fig. 1 is a schematic structural diagram of a small target recognition system based on a complex background adopted in the present invention.
Fig. 2 is a schematic structural diagram of a target identification module used in the present invention.
Fig. 3 is a schematic structural diagram of a neural network model adopted in the present invention.
Fig. 4 is a flowchart of prior frame generation adopted in the present invention.
Fig. 5 is a flowchart illustrating processing of picture information according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The invention provides a small target identification system based on a complex background, which comprises a picture acquisition module, an image enhancement module, a picture noise filtering module and a target identification module, as shown in figure 1:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing the background in the picture information through a threshold segmentation technique, and sending the processed picture information to the picture noise filtering module; the OTSU method (maximum inter-class variance method) is selected as the threshold segmentation technique: the threshold that maximizes the separation between target and background is computed, the background in the picture information is removed according to that threshold, and the interference of the complex background with small target identification is reduced;
the picture noise filtering module is used for receiving the processed picture information and removing its noise by median filtering; the picture information after threshold segmentation still contains considerable noise, and median filtering replaces each pixel's gray value with the median of the gray values in its neighbourhood, which markedly reduces the interference of noise with subsequent identification while preserving edge features (a code sketch of these two preprocessing steps is given after the module descriptions below);
and the target identification module is used for acquiring the image information after noise reduction, and marking and identifying the target in the image.
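The two preprocessing modules above map onto standard OpenCV calls. The following is a minimal sketch under that assumption; the function name, file name and the 3×3 median kernel are illustrative choices, not taken from the patent.

```python
import cv2
import numpy as np

def preprocess(gray: np.ndarray) -> np.ndarray:
    """Sketch of the image enhancement + picture noise filtering modules:
    OTSU thresholding suppresses the complex background, then median
    filtering removes the noise left over by the segmentation."""
    # Image enhancement module: OTSU (maximum inter-class variance) computes
    # the threshold that best separates target from background; background
    # pixels are zeroed out with the resulting mask.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    foreground = cv2.bitwise_and(gray, gray, mask=mask)

    # Picture noise filtering module: each pixel is replaced by the median of
    # its 3x3 neighbourhood, which removes residual noise while keeping edges.
    return cv2.medianBlur(foreground, 3)

# Hypothetical usage on a single grayscale picture file.
picture = cv2.imread("survey_frame.png", cv2.IMREAD_GRAYSCALE)
clean = preprocess(picture)
```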
As shown in fig. 2, the target identification module includes a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit, and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; as shown in fig. 3, the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers; the SE module strengthens the shallow features of a feature map, avoids their loss during recognition, and improves the recognition accuracy for small targets (a minimal sketch of such an SE block is given after the unit descriptions below);
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the specific steps of the parameter training are as follows:
x1: the method comprises the steps of sorting the shot pictures with small targets, carrying out classification and naming according to different types of the targets, and then dividing the pictures into a test data set and a training data set according to a ratio of 1;
X2: carrying out frame marking on the targets in the test data set and training data set pictures, recording the corresponding marking frame coordinates, and storing each marking frame as the combination of its lower left corner coordinates and upper right corner coordinates;
x3: loading a neural network model template by a neural network training unit, and sending training data set picture information, corresponding type information and marking frame coordinate information into the neural network model template;
x4: training internal parameters of the neural network model template through training data set information;
x5: and detecting the trained neural network model through the test data set.
The neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the selected neural network model to the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the marking unit is used for receiving the coordinates of the positioning frame and marking on the picture according to the coordinates of the positioning frame, and the marked content comprises the positioning frame and a target type label;
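The SE processing layer referred to above re-weights feature-map channels through global pooling and a small bottleneck. Below is a minimal PyTorch sketch of such a block under that assumption; the reduction ratio of 16 and all names are illustrative, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: squeeze each channel to one value by global
    average pooling, learn per-channel weights with a small bottleneck, and
    rescale the feature map channel by channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # excitation weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # channel-wise re-weighting

# Hypothetical usage on a conv2-sized feature map (64 channels).
f1 = SEBlock(64)(torch.randn(1, 64, 224, 224))
```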
As shown in fig. 1, the system further comprises a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the pictures of the training data set and marking the region as a prior frame;
As shown in fig. 4, the prior frame generation module generates the prior frame in the following steps (a code sketch is given after the steps):
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ and is set manually; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small; with f = 30%, delete the coordinates ranked in the bottom 30% by weight and proceed to T3, where f is the negligible-weight threshold; when δ contains many elements, deleting some low-weight pixel coordinates reduces the amount of calculation;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n; this accumulates statistics on the regions where small targets appear with the highest probability;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small; with e = 40%, select the coordinates ranked in the top 40% by weight and record them as a set μ, where e is the weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form;
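The T1-T6 procedure amounts to a weighted histogram over the pixel coordinates that fall inside marking frames. The sketch below follows those steps with e defaulting to the 40% used above; the pruning sub-steps T2.2-T2.4 are omitted for brevity, and the data layout (a list of per-picture coordinate lists) is an illustrative assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int]

def generate_prior_frame(marked_coords_per_picture: Iterable[List[Coord]],
                         e: float = 0.40) -> Tuple[Coord, Coord]:
    """Sketch of T1-T6: count how often each pixel coordinate falls inside a
    marking frame over the training pictures, keep the top-e fraction by
    weight (the set mu), and return the bounding box of those coordinates as
    (lower-left corner, upper-right corner)."""
    weights: Dict[Coord, int] = defaultdict(int)   # the set delta with weights w_n

    # T1-T4: traverse every picture; a matched coordinate has its weight
    # incremented, an unseen coordinate is added with an initial weight of 1.
    for coords in marked_coords_per_picture:
        for xy in coords:
            weights[xy] += 1

    # T5: sort by weight, descending, and keep the top-e fraction (set mu).
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    mu = [xy for xy, _ in ranked[:max(1, int(len(ranked) * e))]]

    # T6: the bounding box of the retained coordinates is the prior frame.
    xs = [x for x, _ in mu]
    ys = [y for _, y in mu]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical usage: each inner list holds the pixel coordinates found inside
# one training picture's marking frame.
prior_frame = generate_prior_frame([[(10, 12), (11, 12)], [(10, 12), (30, 40)]])
```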
As shown in fig. 5, the trained neural network model processes the picture information in the following steps (a code sketch of this forward pass is given after the steps):
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 and conv13 layers are selected and processed by the SE module processing layer, which strengthens their shallow features; the resulting feature maps are denoted F1 and F2 respectively;
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
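Assuming a PyTorch implementation, the S1-S5 pipeline can be sketched as below. The truncated-VGG16 split points, the 7×7 ROI pooling size and all names are illustrative assumptions not given in the patent; SEBlock is the block sketched earlier, and the RPN of step S4 and the CNN check of step S3.2 are left out.

```python
import torch
import torch.nn.functional as F
import torchvision
from torchvision.ops import roi_pool

class SmallTargetBackbone(torch.nn.Module):
    """Sketch of steps S1-S5: truncated VGG16 (13 conv + relu, 4 pooling
    layers), SE re-weighting of the conv2 and conv13 feature maps, bilinear
    up-sampling and concatenation into F3, and ROI pooling of candidate
    boxes on F3."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16().features
        self.shallow_stage = vgg[:4]    # conv1-conv2 (+ relu): source of F1, 64 ch
        self.deep_stage = vgg[4:-1]     # conv3-conv13, final pooling removed, 512 ch
        self.se_shallow = SEBlock(64)
        self.se_deep = SEBlock(512)

    def forward(self, image: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        shallow = self.shallow_stage(image)          # S1: conv2 output
        deep = self.deep_stage(shallow)              # S1: conv13 output
        f1 = self.se_shallow(shallow)                # S2: SE-strengthened maps
        f2 = self.se_deep(deep)
        # S3.1: bilinear up-sampling of F2 to F1's size, then channel splicing.
        f2_up = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                              align_corners=False)
        f3 = torch.cat([f1, f2_up], dim=1)           # feature map F3 (576 ch)
        # S5: ROI pooling of candidate boxes (prior frame or RPN prediction
        # frames) on F3; each region is split into a 7x7 grid and max-pooled.
        return roi_pool(f3, [boxes], output_size=(7, 7))

# Hypothetical usage: one picture and one candidate box (x1, y1, x2, y2).
model = SmallTargetBackbone()
pooled = model(torch.randn(1, 3, 224, 224),
               torch.tensor([[30.0, 40.0, 90.0, 120.0]]))   # -> (1, 576, 7, 7)
```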
In this embodiment, the feature map is screened with the prior frame; when a target exists in the region mapped by the prior frame, classification is performed directly, without the complex computation of the region proposal network (RPN), which reduces the amount of calculation.
The following describes the small target recognition system based on a complex background in conjunction with a biological survey (a code sketch of this workflow is given after the steps):
w1: acquiring a picture of a specific region through a picture acquisition module, wherein the picture acquisition module is configured as a camera and acquires the picture every 15 s;
w2: sending the picture into an image enhancement module, and carrying out threshold segmentation processing on the picture to remove the interference of a complex background;
w3: the processed picture information is subjected to noise reduction processing through a picture noise filtering module;
w4: responding to the input of a user through a neural network model selecting unit, selecting a corresponding trained neural network model from a neural network model storage unit, and loading the neural network model to a small target recognition unit;
w5: processing the picture information through a small target identification unit to generate a target type label and a positioning frame coordinate;
w6: and marking the original picture through a marking unit according to the target type label and the coordinate of the positioning frame.
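As a rough end-to-end illustration of W1-W6, the loop below strings the earlier sketches together. The 15 s interval comes from the description; the recognizer object, its return format, the preprocess() helper (sketched earlier) and the output file names are all illustrative assumptions.

```python
import time
import cv2

def run_survey(recognizer, camera_index: int = 0, interval_s: float = 15.0):
    """W1-W6: grab a picture every 15 s, suppress background and noise,
    let the selected trained model label the small targets, and draw the
    returned positioning frames and type labels on the original picture."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()                               # W1: acquisition
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            clean = preprocess(gray)                             # W2 + W3
            for label, (x1, y1, x2, y2) in recognizer(clean):    # W4 + W5
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)   # W6
                cv2.putText(frame, label, (x1, y1 - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imwrite(f"labeled_{int(time.time())}.png", frame)
            time.sleep(interval_s)                               # 15 s interval
    finally:
        cap.release()
```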
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For a person skilled in the art to which the invention pertains, several equivalent alternatives or obvious modifications, all of which have the same properties or uses, without departing from the inventive concept, should be considered as falling within the scope of the patent protection of the invention, as determined by the claims submitted.

Claims (7)

1. A small target recognition system based on a complex background, comprising:
the image acquisition module is used for acquiring image information and sending the image information to the image enhancement module;
the image enhancement module is used for receiving the picture information from the picture acquisition module, removing a background in the picture information through a threshold segmentation technology, and sending the processed picture information to the picture noise filtering module;
the picture noise filtering module is used for receiving the processed picture information and removing noise of the processed picture information by a median filtering method;
the target identification module is used for acquiring the image information after noise reduction and marking and identifying a target in the image;
the target identification module comprises a neural network model storage unit, a neural network training unit, a neural network model selection unit, a small target identification unit and a labeling unit;
the neural network model storage unit is used for storing a trained neural network model and a neural network model template; the neural network model adopts the Faster-RCNN algorithm and is divided into a convolutional layer sub-template, an RPN (region proposal network), a ROI Pooling layer and a classification layer, wherein the convolutional layer sub-template is formed by taking a VGG16 network as a base and adding an SE (squeeze-and-excitation) module processing layer, a sampling layer and a feature fusion layer; the VGG16 network has its final Pooling layer, fully connected layers and softmax classification layer removed, so that at this point it consists of 13 conv + relu layers and 4 Pooling layers;
the neural network training unit is used for loading a neural network model template and carrying out parameter training on the neural network model template according to a training data set;
the neural network model selecting unit is used for responding to the operation of a user to select a corresponding trained neural network model from the neural network model storage unit and sending the neural network model into the small target recognition unit;
the small target recognition unit is used for loading the trained neural network model and sending the acquired picture information into the trained neural network model, and the trained neural network model processes the picture information to generate a target type label and a positioning frame coordinate;
the labeling unit is used for receiving the coordinates of the positioning frame and labeling on the picture according to the coordinates of the positioning frame, and the labeled content comprises the positioning frame and the target type label.
2. The small target recognition system based on the complex background as claimed in claim 1, wherein the step of processing the image information by the trained neural network model is as follows:
S1: the picture information passes through the 13 conv + relu layers, which perform convolution (all kernels 3×3) and nonlinear transformation on the picture; 4 Pooling layers are passed along the way, each down-sampling with a 2×2 filter;
S2: the feature maps generated by the conv2 layer and the conv13 layer are selected and each processed by the SE module processing layer; the resulting feature maps are denoted F1 and F2 respectively;
S3: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S4: the feature map F3 is sent into the RPN, which generates from F3 the coordinates of prediction frames containing targets;
S5: the ROI Pooling layer maps the prediction-frame coordinates onto the feature map F3, divides the mapped region into parts of equal size, performs max pooling on each part to generate a feature map F4, and sends F4 to the classification layer;
S6: the classification layer classifies the targets and generates the target type labels and positioning-frame coordinates.
3. The system for recognizing the small target based on the complex background as claimed in claim 1, further comprising a prior frame generation module;
the prior frame generation module is used for generating a region with the highest target existence probability according to the picture of the training data set and recording the region as a prior frame;
the step of generating the prior frame by the prior frame generating module is as follows:
T1: let the total number of pictures in the training data set be M; select one picture from the training data set and, from the coordinate position of its marking frame on the picture, obtain the coordinates (x_i, y_j) of all pixel points inside the marking frame and store them as a set δ, where i = 1, 2, 3, …, I, I being the maximum x coordinate of the picture's pixel points, and j = 1, 2, 3, …, J, J being the maximum y coordinate of the picture's pixel points; assign a weight w_n to each pixel coordinate in δ, n = 1, 2, 3, …, N, N being the total number of elements in δ; set k = 1;
T2: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T3: assign (k + 1) to k; for each pixel coordinate (x_i, y_j) obtained, traverse the whole set δ and match the coordinate; if the match succeeds, assign w_n + 1 to the weight w_n of the matched coordinate; if the match fails, add the coordinate to δ and assign it a weight w_n;
T4: check the value of k; if k ≤ M, return to T2; otherwise, proceed to T5;
T5: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small and select the coordinates ranked in the top e by weight, recording them as a set μ, where e is a weight confidence threshold;
T6: take the x_i values of all pixel coordinates in μ, sort them from large to small, and select the maximum x_max and minimum x_min; take the y_j values of all pixel coordinates in μ, sort them from large to small, and select the maximum y_max and minimum y_min; with (x_min, y_min) as the lower-left corner coordinates and (x_max, y_max) as the upper-right corner coordinates, generate the prior frame and store it in coordinate form.
4. The system according to claim 3, wherein the step T2 specifically comprises:
T2.1: select the next picture in the training data set and, from the coordinate position of its marking frame on the picture, obtain one by one the pixel coordinates (x_i, y_j) inside the marking frame;
T2.2: judge whether k ≤ 0.5M; if k ≤ 0.5M, proceed to T3; otherwise, proceed to T2.3;
T2.3: count the number α of pixel coordinate points in the set δ and judge whether α ≥ A, where A is the maximum number of pixel coordinate points allowed in δ; if α ≥ A, proceed to T3; otherwise, proceed to T2.4;
T2.4: sort all pixel coordinates (x_i, y_j) in δ by weight w_n from large to small, delete the coordinates ranked in the bottom f by weight, and proceed to T3, where f is the negligible-weight threshold.
5. The system for recognizing the small target based on the complex background as claimed in claim 2, wherein the step S3 specifically comprises the steps of:
S3.1: the feature fusion layer up-samples F2 by bilinear interpolation and splices it with F1 to generate a feature map F3;
S3.2: the prior frame is mapped onto the feature map F3, and the feature map F5 corresponding to the region selected by the mapping is sent to a CNN-based deep learning model, which judges whether a target exists in the region; if a target exists, F5 is sent to the classification layer and the process goes to S6; if no target exists, the process goes to S4.
6. The system according to claim 1, wherein the image acquisition module is configured as a camera, and the image acquisition module acquires images every 15 s.
7. A small-object recognition system based on complex background as claimed in claim 4, wherein A is set manually.
CN202211342007.2A 2022-10-31 2022-10-31 Small target identification system based on complex background Active CN115410196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342007.2A CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211342007.2A CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Publications (2)

Publication Number Publication Date
CN115410196A true CN115410196A (en) 2022-11-29
CN115410196B CN115410196B (en) 2023-02-07

Family

ID=84167362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342007.2A Active CN115410196B (en) 2022-10-31 2022-10-31 Small target identification system based on complex background

Country Status (1)

Country Link
CN (1) CN115410196B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205003729U (en) * 2015-09-25 2016-01-27 南昌理工学院 Automatic product specification identification equipment that stores in a warehouse of commodity circulation
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN110321815A (en) * 2019-06-18 2019-10-11 中国计量大学 A kind of crack on road recognition methods based on deep learning
CN110569754A (en) * 2019-08-26 2019-12-13 江西航天鄱湖云科技有限公司 Image target detection method, device, storage medium and equipment
CN110826476A (en) * 2019-11-02 2020-02-21 国网浙江省电力有限公司杭州供电公司 Image detection method and device for identifying target object, electronic equipment and storage medium
CN112633257A (en) * 2021-01-28 2021-04-09 华东交通大学 Potato disease identification method based on improved convolutional neural network
CN113076804A (en) * 2021-03-09 2021-07-06 武汉理工大学 Target detection method, device and system based on YOLOv4 improved algorithm
CN113553874A (en) * 2021-07-27 2021-10-26 张金翼 Express delivery bar code detection method in complex environment
AU2021107103A4 (en) * 2021-08-25 2021-12-02 Joshi, Sagar Human identification based on biometric radiographs: a forensic approach
CN113920475A (en) * 2021-10-29 2022-01-11 福建技术师范学院 Security protection equipment identification method based on autonomous learning strategy and storage medium
CN114581841A (en) * 2022-01-25 2022-06-03 刘建芳 Method for detecting weak and small targets by using deep learning method in complex traffic environment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DU JIANG et al.: "Semantic segmentation for multiscale target based on object recognition using the improved Faster-RCNN model", Future Generation Computer Systems *
卜华雨 et al.: "Recognition of greening in highway central separation belts based on an improved Faster-RCNN", Agricultural Equipment & Vehicle Engineering *
沈克永 et al.: "Accurate target detection method based on search-region conditional probability CNN", Computer Engineering and Applications *
胡荣群 et al.: "Application of VR technology in ship digital twin operation and maintenance systems", Ship Science and Technology *

Also Published As

Publication number Publication date
CN115410196B (en) 2023-02-07

Similar Documents

Publication Publication Date Title
CN108334848B (en) Tiny face recognition method based on generation countermeasure network
CN110414559B (en) Construction method of intelligent retail cabinet commodity target detection unified framework and commodity identification method
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN109711407B (en) License plate recognition method and related device
US20080193020A1 (en) Method for Facial Features Detection
CN105868708B (en) A kind of images steganalysis method and device
US20210326638A1 (en) Video panoptic segmentation
CN110766017B (en) Mobile terminal text recognition method and system based on deep learning
CN109726678B (en) License plate recognition method and related device
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN113435319B (en) Classification method combining multi-target tracking and pedestrian angle recognition
CN113297956B (en) Gesture recognition method and system based on vision
CN111368632A (en) Signature identification method and device
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN115410196B (en) Small target identification system based on complex background
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
US20230196718A1 (en) Image augmentation device and method
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN114067186B (en) Pedestrian detection method and device, electronic equipment and storage medium
CN111199050A (en) System for automatically desensitizing medical records and application
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN114724175A (en) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN110969602B (en) Image definition detection method and device
CN114550062A (en) Method and device for determining moving object in image, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant