CN113780110A - Method and device for detecting weak and small targets in image sequence in real time - Google Patents
- Publication number
- CN113780110A (application CN202110980665.3A)
- Authority
- CN
- China
- Prior art keywords
- pixel
- image
- target
- initial background
- image sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30212—Military
Abstract
The invention discloses a method and a device for detecting weak and small targets in an image sequence in real time. The method comprises the following steps: acquiring an image sequence under a fixed field of view; constructing an initial background model from the first frame of the image sequence, and performing foreground pixel segmentation on the subsequent frames based on the initial background model; updating the initial background model with the pixels of all segmented background regions, and performing post-processing and contour extraction on all segmented foreground regions based on the updated initial background model to obtain a target bounding box; and determining the target based on the target bounding box. The invention identifies targets by pixel-by-pixel comparison between images and finally outputs a target position bounding box and a target category. The accuracy of recognizing weak and small targets is improved, the amount of computation in the recognition process is reduced, and the method has wide applicability.
Description
Technical Field
The invention relates to the technical field of computer-vision target detection, and in particular to a method and a device for real-time detection of weak and small targets.
Background
Target detection in computer vision generally refers to the process of acquiring an image of a scene with an image sensor and, using some algorithmic strategy, automatically labeling every target of interest in the image with a position bounding box and a category. Target detection technology is widely applied in military reconnaissance, security monitoring, autonomous driving, industrial automation, intelligent medical care and other fields; it can greatly raise the degree of system intelligence, save human resources, assist human decision-making, and carries great potential economic benefit. In the military field, photoelectric detection and monitoring equipment is evolving from traditional manual target searching toward automatic target detection, which effectively improves reconnaissance capability, and target detection technology is an indispensable key link in this evolution.
At present, most target detection algorithms adopt deep convolutional neural network models whose accuracy is gradually improved by training on large data sets. However, these algorithms place certain requirements on the size and clarity of the target: their detection rate is low for weak and small moving targets with low contrast against the image background, and the models are usually large and computationally expensive, which greatly limits their use on the edge computing devices employed in photoelectric detection systems.
Disclosure of Invention
The embodiments of the invention provide a method and a device for detecting weak and small targets in an image sequence in real time, which address the problems in the prior art of a low detection rate for weak and small targets with low contrast against the image background, high computational cost, and poor applicability to edge computing devices.
According to one aspect of the invention, the real-time detection method for the weak and small targets comprises the following steps:
acquiring an image sequence under a fixed view field;
constructing an initial background model based on a first frame image of the image sequence, and performing foreground pixel segmentation on a subsequent image sequence of the first frame image based on the initial background model;
updating the initial background model with the pixels of all segmented background regions, and performing post-processing and contour extraction on all segmented foreground regions based on the updated initial background model to obtain a target bounding box;
based on the target bounding box, a target is determined.
According to some embodiments of the invention, the method further comprises:
denoising the image sequence before constructing an initial background model based on a first frame image of the image sequence.
According to some embodiments of the invention, the denoising the sequence of images comprises:
performing median filtering on each frame of image in the image sequence.
According to some embodiments of the invention, the constructing an initial background model based on a first frame image of the image sequence comprises:
superposing random numbers uniformly distributed in [-a, a] on all pixel gray values of the first frame image, and repeating the superposition N times to generate N initial background sample images.
According to some embodiments of the invention, the performing foreground pixel segmentation on the subsequent image sequence of the first frame image based on the initial background model comprises:
for each frame after the first, subtracting the gray value of each pixel one by one from the gray value of the corresponding pixel in each of the N initial background sample images and taking the absolute value. If the absolute value is smaller than a gray-difference threshold, the background matching degree of the pixel is increased by one. If the final background matching degree of the pixel is greater than the sample-matching threshold, the pixel is determined to be a background pixel and its gray value in the image is set to 0; otherwise, the pixel is judged to be a foreground pixel and its gray value in the image is set to b.
According to some embodiments of the invention, the updating the initial background model with pixels of all segmented background regions comprises:
starting from the first pixel of the background area, selecting a pixel point at intervals of j pixels as a first pixel point for updating the initial background model;
if the first pixel point is a background point, randomly selecting one of N initial background sample pictures as an initial background sample picture to be updated, and replacing and updating the corresponding pixel point in the initial background sample picture to be updated by using the first pixel point;
and randomly selecting a neighborhood pixel point in the neighborhood around the first pixel point, and replacing and updating one of the N initial background sample pictures by using the neighborhood pixel point.
According to some embodiments of the present invention, the performing post-processing and contour extraction processing on all the segmented foreground regions based on the updated initial background model to obtain the target bounding box includes:
performing erosion and dilation on the foreground region;
performing ghost elimination on the eroded and dilated foreground region based on the updated initial background model, so as to update the foreground region;
performing hole filling on the updated foreground region;
after eroding the hole-filled foreground region, taking the difference between the foreground regions before and after the erosion to obtain the target bounding box.
According to some embodiments of the invention, the determining the target based on the target bounding box comprises:
constructing and training a softmax classification model;
inputting the target bounding box into the trained softmax classification model, which outputs a target position bounding box and a target category.
According to another aspect of the present invention, an embodiment provides a weak and small target real-time detection device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method for detecting weak small objects in a sequence of images in real time as described above.
According to an embodiment of a further aspect of the present invention, there is provided a computer-readable storage medium, on which an information transfer implementation program is stored, which when executed by a processor implements the steps of the method for detecting weak small objects in an image sequence in real time as described above.
With the scheme of the invention, an image sequence under a fixed field of view is first collected and an initial background model is established; the target region in each image is extracted by comparing the pixels of the subsequent images one by one with the corresponding pixels of the background model, and the targets in the image are finally detected and identified. Locating weak and small targets through background estimation improves the algorithm's ability to extract targets under complex backgrounds and low contrast, avoids the time-consuming sliding-window convolution localization process, reduces computational cost, and improves applicability to different computing platforms.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for detecting weak and small targets in an image sequence in real time according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a method for detecting weak and small targets in an image sequence in real time according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An embodiment of an aspect of the present invention provides a method for detecting a small target in an image sequence in real time, which is shown in fig. 1, and the method includes:
s1, collecting an image sequence under a fixed view field; the image sequence is a combination of a plurality of sets of images acquired at different times in a certain scene through image acquisition equipment;
s2, constructing an initial background model based on a first frame of image of the image sequence, and performing foreground pixel segmentation on a subsequent image sequence of the first frame of image based on the initial background model; front background pixel segmentation may be understood as the segmentation of an image into foreground and background regions.
S3, updating the initial background model by using the pixels of all segmented background regions, and performing post-processing and contour extraction on all segmented foreground regions based on the updated initial background model to obtain a target bounding box;
S4, determining the target based on the target bounding box.
The method establishes an initial background model from the first frame of the acquired image sequence, segments foreground and background pixels of each subsequent image by comparing it pixel by pixel with the background model, and extracts targets from the segmented foreground pixels, thereby obtaining a target bounding box and determining the target. Detecting targets by pixel-by-pixel comparison between images improves the accuracy of identifying weak and small targets and raises the target detection probability, while reducing the amount of computation compared with prior-art methods such as sliding-window recognition and improving applicability to various computing platforms.
On the basis of the above-described embodiment, various modified embodiments are further proposed, and it is to be noted herein that, in order to make the description brief, only the differences from the above-described embodiment are described in the various modified embodiments.
According to some embodiments of the invention, the method further comprises:
denoising the image sequence before constructing the initial background model based on the first frame image of the image sequence. The noise reduction lessens the influence of image impulse noise on subsequent operations, further improves the accuracy of target identification, and reduces the subsequent amount of computation.
According to some embodiments of the invention, the denoising the sequence of images comprises:
performing median filtering on each frame of image in the image sequence. For example, the image is denoised with a median filter whose window size is 5×5.
Median filtering is used to denoise the collected images because it preserves image edges while still removing noise, at a small computational cost.
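As an illustration, the median filtering step can be sketched in plain NumPy; the function name and the replicated edge handling below are illustrative choices, and a real implementation would normally call OpenCV's cv2.medianBlur instead:

```python
import numpy as np

def median_filter(img, k=5):
    """Slide a k x k window over a 2-D grayscale image and take the median.
    Edge pixels use replicated padding. Plain-NumPy sketch of the denoising
    step; a real implementation would normally call cv2.medianBlur."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out
```

A single impulse-noise pixel in an otherwise flat image is completely removed by such a filter, which is exactly the property exploited here.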
According to some embodiments of the invention, noise may instead be handled with a mean filter, an adaptive Wiener filter, a morphological noise filter, or the like.
According to some embodiments of the invention, the constructing an initial background model based on a first frame image of the image sequence comprises:
superposing random numbers uniformly distributed in [-a, a] on all pixel gray values of the first frame image, and repeating the superposition N times to generate N initial background sample images. Through this random superposition, an initial background sample set of capacity N is obtained for each pixel, which improves the noise tolerance of the initial background model.
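A minimal sketch of this sample-set construction for 8-bit grayscale frames; clipping the noisy values to [0, 255] is an assumption not stated above, and the function name is illustrative:

```python
import numpy as np

def build_background_model(first_frame, a=10, n=20, rng=None):
    """Generate n initial background sample images by superposing uniform
    random noise in [-a, a] on the first frame (values clipped to [0, 255],
    which the text does not state explicitly). Returns shape (n, H, W)."""
    if rng is None:
        rng = np.random.default_rng(0)
    f = first_frame.astype(np.int16)                       # avoid uint8 wrap
    noise = rng.integers(-a, a + 1, size=(n,) + f.shape)   # uniform in [-a, a]
    return np.clip(f + noise, 0, 255).astype(np.uint8)
```

The defaults a=10 and n=20 follow the parameters of the embodiment described later.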
According to some embodiments of the invention, the performing foreground pixel segmentation on the subsequent image sequence of the first frame image based on the initial background model comprises:
for each frame after the first, subtracting the gray value of each pixel one by one from the gray value of the corresponding pixel in each of the N initial background sample images and taking the absolute value. If the absolute value is smaller than a gray-difference threshold, the background matching degree of the pixel is increased by one. If the final background matching degree of the pixel is greater than the sample-matching threshold, the pixel is determined to be a background pixel and its gray value in the image is set to 0; otherwise, the pixel is judged to be a foreground pixel and its gray value in the image is set to b.
The gray-difference threshold and the sample-matching threshold jointly determine the sensitivity of the foreground segmentation: the higher the sensitivity, the weaker and smaller the targets that can be extracted, at the cost of more noise. In application, these parameters can be adjusted according to scene complexity and target characteristics or requirements, giving the method wide applicability.
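The matching rule above can be sketched as follows, with both thresholds exposed as parameters; the default values (gray-difference threshold 10, matching threshold 5, foreground value 255) follow the embodiment described later, and the function name is illustrative:

```python
import numpy as np

def segment_foreground(frame, samples, g_thresh=10, match_thresh=5, fg_value=255):
    """Pixel-by-pixel matching against the N background samples: a pixel whose
    gray value lies within g_thresh of more than match_thresh samples is
    background (0); otherwise it is foreground (fg_value)."""
    diff = np.abs(samples.astype(np.int16) - frame.astype(np.int16))  # (N, H, W)
    match_count = (diff < g_thresh).sum(axis=0)                       # per pixel
    return np.where(match_count > match_thresh, 0, fg_value).astype(np.uint8)
```

Raising g_thresh or lowering match_thresh increases sensitivity, extracting weaker targets but admitting more noise, exactly as the paragraph above describes.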
According to some embodiments of the invention, the performing foreground pixel segmentation on the subsequent image sequence of the first frame image based on the initial background model further comprises:
the segmentation method based on the threshold sets different thresholds according to the difference of the gray values of the background and the foreground of the image, compares the gray value of each pixel in the image with the threshold, and finally segments the image into the foreground and the background. The threshold value can be set by fixing a certain pixel value as a threshold value point, determining the threshold value point through iterative threshold value calculation, and also can be segmented through an adaptive threshold value image, and the accuracy of segmentation can be increased by using the Otsu method (OTSU) in the adaptive threshold value image segmentation method.
According to some embodiments of the invention, the performing foreground pixel segmentation on the subsequent image sequence of the first frame image based on the initial background model further comprises:
the edge-based segmentation method is a reflection of the discontinuity of local features of an image due to the collection of continuous pixels on the boundary line of two different areas in the image, and reflects the sudden change of image characteristics such as gray scale, color, texture and the like. Generally, edge detection can be performed based on gray values, the gray values of pixels on two sides of an edge have obvious difference, edge detection can be performed by using a differential operator, namely, an edge is determined by using an extreme value of a first derivative and a zero crossing point of a second derivative, and in specific implementation, segmentation of a foreground and a background of an image can be completed by using convolution of the image and a template.
According to some embodiments of the invention, the updating the initial background model with pixels of all segmented background regions comprises:
starting from the first pixel of the background area, selecting a pixel point at intervals of j pixels as a first pixel point for updating the initial background model;
the interval number J is a random number uniformly distributed in the interval of [0, J ], J is a preset updating parameter, and the parameter can be adjusted according to the time change speed of the acquired scene image sequence.
If the first pixel point is a background point, randomly selecting one of N initial background sample pictures as an initial background sample picture to be updated, and replacing and updating the corresponding pixel point in the initial background sample picture to be updated by using the first pixel point;
and randomly selecting a neighborhood pixel point in the neighborhood around the first pixel point, and replacing and updating one of the N initial background sample pictures by using the neighborhood pixel point.
Because the gray level of background pixels usually changes slowly, not all background-region pixels need to be used to update the initial background model. Randomly sampling pixel points greatly reduces computation time and improves efficiency, and since all pixels have the same expected update frequency, the background samples of every pixel are guaranteed to be refreshed after a period of time.
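A sketch of this random time/space-subsampled update, assuming mask value 0 marks background and interpreting the neighbourhood step as writing the current pixel's gray value into one sample of a random 8-neighbour (one possible reading of the description; the exact neighbour rule is ambiguous in the text):

```python
import numpy as np

# 8-neighbourhood offsets (the embodiment draws a random index in [0, 7])
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def update_background(mask, frame, samples, J=20, rng=None):
    """Randomly subsampled background update. Every j-th pixel (j uniform in
    [0, J]) that is background overwrites one random sample at its own
    position, and also one random sample of a random 8-neighbour (clipped to
    the image bounds). Mutates and returns the (N, H, W) sample stack."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, h, w = samples.shape
    coords = [(y, x) for y in range(h) for x in range(w)]
    i = 0
    while i < len(coords):
        y, x = coords[i]
        if mask[y, x] == 0:  # background point
            samples[rng.integers(0, n), y, x] = frame[y, x]
            dy, dx = NEIGHBOURS[rng.integers(0, 8)]
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            samples[rng.integers(0, n), ny, nx] = frame[y, x]
        i += 1 + int(rng.integers(0, J + 1))  # skip a random interval j
    return samples
```

Because only roughly one pixel in J/2 triggers an update per frame, the per-frame cost is a small fraction of a full model rebuild, which is the efficiency argument made above.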
According to some embodiments of the present invention, the performing post-processing and contour extraction processing on all the segmented foreground regions based on the updated initial background model to obtain the target bounding box includes:
performing erosion and dilation on the foreground region; for example, a morphological opening operation can be applied: the foreground region is first eroded with a rectangular template of suitable size and then dilated, which removes part of the foreground noise.
performing ghost elimination on the eroded and dilated foreground region based on the updated initial background model, so as to update the foreground region. For example, exploiting the polarity characteristics of the target: for a given pixel of the foreground region, compute the mean gray value of the N initial background sample images at that pixel and take its difference with the pixel's gray value; if the difference is greater than 0 while the target polarity is white, or less than 0 while the target polarity is black, the foreground pixel is re-judged as a background pixel.
Carrying out hole filling processing on the updated foreground area;
after eroding the hole-filled foreground region, taking the difference between the foreground regions before and after the erosion to obtain the target bounding box.
Erosion and dilation remove part of the foreground noise and reduce the subsequent amount of computation, while ghost elimination makes the pixels representing the target in the foreground region more accurate, improving the accuracy of target identification.
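The opening and contour-difference steps can be illustrated with plain NumPy binary morphology over a 0/255 mask (a 3×3 rectangle, matching the embodiment below; real code would use cv2.erode and cv2.dilate):

```python
import numpy as np

def erode(mask, k=3):
    """Binary erosion of a 0/255 mask with a k x k rectangle."""
    pad = k // 2
    p = np.pad(mask, pad, mode="constant")
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            if p[y:y + k, x:x + k].min() == 255:
                out[y, x] = 255
    return out

def dilate(mask, k=3):
    """Binary dilation of a 0/255 mask with a k x k rectangle."""
    pad = k // 2
    p = np.pad(mask, pad, mode="constant")
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            if p[y:y + k, x:x + k].max() == 255:
                out[y, x] = 255
    return out

def open_then_contour(mask, k=3):
    """Opening (erode, then dilate) removes small noise; XOR-ing the opened
    region with its own erosion keeps only the blob contour, mirroring the
    erode-then-difference step described above."""
    opened = dilate(erode(mask, k), k)
    return opened ^ erode(opened, k)
```

On a solid blob this leaves a one-template-wide ring of contour pixels, from which the bounding box can then be read off.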
According to some embodiments of the present invention, after the difference between the foreground regions before and after the erosion is taken, a coordinate system is established with the horizontal direction of the image as the x axis and the vertical direction as the y axis. A row-and-column traversal of the foreground image then yields the coordinate set of the pixels representing the target, and the maxima and minima of this set in the x and y directions are taken as the position of the target bounding box.
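The row-and-column traversal reduces to taking the extrema of the nonzero-pixel coordinates; a minimal sketch:

```python
import numpy as np

def bounding_box(mask):
    """Extrema of the nonzero-pixel coordinates give the bounding box
    (x_min, y_min, x_max, y_max), with x horizontal and y vertical as in
    the coordinate system described above. Returns None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```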
According to some embodiments of the invention, the determining the target based on the target bounding box comprises:
constructing and training a softmax classification model. The target area, aspect ratio and area ratio are calculated from the target bounding box, and the target category is labeled: the target area is the total number of pixels actually occupied by the target, the aspect ratio is that of the target contour bounding box, and the area ratio is the ratio of the target area to the area of the contour bounding box. The calculated data are normalized and augmented, and a softmax classification model together with a training data set and a verification data set is constructed on this basis.
The target bounding box is then input into the trained softmax classification model, which outputs the target position bounding box and the target category.
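A minimal softmax-regression classifier over the three shape features named above (target area, aspect ratio, area ratio) might be sketched as follows; the normalisation and augmentation steps are omitted, and the class names, learning rate and epoch count are all illustrative assumptions:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class SoftmaxClassifier:
    """Softmax regression trained by full-batch gradient descent on the
    cross-entropy loss. Hyper-parameters are illustrative, not the patent's."""

    def __init__(self, n_features=3, n_classes=2, lr=0.5, epochs=500, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (n_features, n_classes))
        self.b = np.zeros(n_classes)
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        Y = np.eye(self.W.shape[1])[y]       # one-hot labels
        for _ in range(self.epochs):
            P = softmax(X @ self.W + self.b)
            grad = (P - Y) / len(X)          # dL/dlogits for cross-entropy
            self.W -= self.lr * (X.T @ grad)
            self.b -= self.lr * grad.sum(axis=0)
        return self

    def predict(self, X):
        return np.argmax(X @ self.W + self.b, axis=1)
```

Each training row here would be a (area, aspect ratio, area ratio) triple computed from one labeled target bounding box.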
Referring to fig. 2, a detailed description is given of a method for detecting weak and small objects in an image sequence in real time according to an embodiment of the present invention. It is to be understood that the following description is illustrative only and is not intended to be in any way limiting. All similar structures and similar variations thereof adopted by the invention are intended to fall within the scope of the invention.
The method for detecting the weak and small targets in the image sequence in real time comprises the following steps:
s11, acquiring an image sequence under a fixed view field, and marking the image sequence as a first frame image and a second frame image according to an acquisition time sequence;
s12, preprocessing each frame image by using a median filter with a convolution kernel size of 5 x 5, and reducing the influence of image noise on subsequent processing;
s13, constructing an initial background model by using the first frame image, superposing uniformly distributed random numbers between [ -10,10] on all pixel gray values of the first frame image, repeating for 20 times to generate 20 initial background sample images, and obtaining an initial background sample set with the capacity of 20 for each pixel.
S14, starting from the second frame image, the median-filtered image is compared with the current background model pixel by pixel. The gray-difference threshold G for foreground/background segmentation is set to 10 and the sample-matching threshold to 5. The gray value of each pixel is subtracted one by one from the gray values of its 20 samples and the absolute value is taken; if the absolute value is less than 10, the background matching degree M is increased by one. If the final background matching degree of the pixel is greater than 5, the pixel is determined to be a background pixel and its gray value in the segmented image is set to 0; if the background matching degree is less than or equal to 5, the pixel is judged to be a foreground pixel and its gray value in the segmented image is set to 255 (8-bit image). After all pixels of the image have been traversed, a complete foreground/background segmentation image is obtained.
S15, updating the current background model with the pixels of the segmented background region, using random selection and replacement in both the time domain and the spatial domain. Starting from the first pixel of the foreground/background segmentation image, a pixel is selected every j pixels as a pixel of this frame used to update the background model, where j is a random number uniformly distributed in [0, 20] and a new random value of j is generated for each interval. If the selected pixel is a foreground point, it is not used to update the background sample set.
If a pixel selected at a random interval is a background point, a random number uniformly distributed in [0, 19] is generated to select one of the pixel's 20 background samples, whose gray value is replaced by that of the pixel; in addition, a random number uniformly distributed in [0, 7] is generated to select one pixel in the 8-neighborhood of the pixel, and the pixel's gray value replaces one sample in the background model of that neighboring pixel.
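The conservative, randomly subsampled update of steps S15 and the paragraph above can be sketched as follows. The step rule `1 + j` is our assumption to guarantee forward progress, since the text allows j = 0; names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def update_model(model, frame, seg, j_max=20):
    """Walk the image in raster order, stepping a random 1..j_max+1 pixels each
    time; each selected background pixel replaces one random sample at its own
    site and one random sample at a random 8-neighbour (clamped at borders)."""
    n, h, w = model.shape
    idx = 0
    while idx < h * w:
        y, x = idx // w, idx % w
        if seg[y, x] == 0:  # only background points update the model
            model[rng.integers(0, n), y, x] = frame[y, x]
            dy, dx = NEIGHBORS[rng.integers(0, 8)]
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            model[rng.integers(0, n), ny, nx] = frame[y, x]
        idx += 1 + int(rng.integers(0, j_max + 1))  # random interval j in [0, j_max]
    return model

model = np.full((20, 6, 6), 100, dtype=np.uint8)
frame = np.full((6, 6), 120, dtype=np.uint8)   # scene has brightened slightly
seg = np.zeros((6, 6), dtype=np.uint8)          # everything classified background
model = update_model(model, frame, seg)
```

Updating a random neighbour's sample as well is what lets the model absorb slow spatial drift, the same idea as in ViBe-style background subtraction.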
S16, post-processing the foreground/background segmentation image to eliminate noise and ghosts and to fill holes, so that targets of interest can be conveniently extracted in subsequent processing. For noise elimination, the foreground segmentation image is eroded and dilated with a rectangular 3×3 template. For ghost elimination, for each foreground pixel the difference between the mean gray value of its background sample set and the pixel's gray value is computed; if the difference is greater than 0 and the target polarity is white, or smaller than 0 and the target polarity is black, the foreground pixel is re-judged as a background pixel and its gray value in the foreground/background segmentation image is set to 0.
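The noise-elimination part of step S16 (erosion followed by dilation, i.e. a morphological opening with a 3×3 template) can be sketched as follows; the ghost-elimination test is omitted here, and the function names are our own:

```python
import numpy as np

def erode(binary, k=3):
    """3x3 erosion on a 0/255 mask: keep a pixel only if its whole window is set."""
    pad = k // 2
    p = np.pad(binary, pad, mode="constant")
    h, w = binary.shape
    out = np.zeros_like(binary)
    for y in range(h):
        for x in range(w):
            if (p[y:y + k, x:x + k] == 255).all():
                out[y, x] = 255
    return out

def dilate(binary, k=3):
    """3x3 dilation: set a pixel if any pixel in its window is set."""
    pad = k // 2
    p = np.pad(binary, pad, mode="constant")
    h, w = binary.shape
    out = np.zeros_like(binary)
    for y in range(h):
        for x in range(w):
            if (p[y:y + k, x:x + k] == 255).any():
                out[y, x] = 255
    return out

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 255  # a 3x3 blob survives the opening
mask[0, 0] = 255      # an isolated noise pixel does not
opened = dilate(erode(mask))
```

Opening removes speckles smaller than the structuring element while (approximately) preserving larger blobs; note that a 3×3 opening also erases genuine targets smaller than 3×3, which bounds how "small" a detectable target can be at this stage.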
S17, performing contour extraction on the post-processed foreground image and locating the target bounding boxes. Specifically, the boundary of the binary segmentation image is zero-padded by 1 pixel on each side, so that the padded image has size 1922×1082; the image is then eroded with a 3×3 template and differenced with the pre-erosion image to obtain the contour map. The rows and columns of the contour map are traversed to find the first pixel with gray value 255, which serves as the starting point of the contour search; contour points are then traversed clockwise, finding the next contour point within the 8-neighborhood of the current one, until the search returns to the starting point and a complete set of contour points for the target is obtained. The minimum and maximum x and y coordinates of the contour point set give the bounding box of the single target.
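The final localization step reduces to taking coordinate extremes over a point set. A minimal sketch (for brevity the extremes are taken over all foreground pixels of one mask, whereas the text takes them over one traced contour; for a single connected blob the resulting box is the same):

```python
import numpy as np

def bounding_box(mask):
    """Axis-aligned box (x0, y0, x1, y1) from the extreme x/y coordinates of
    the set pixels of a 0/255 mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((10, 10), dtype=np.uint8)
mask[3:6, 4:8] = 255  # a single 4-wide, 3-tall foreground blob
box = bounding_box(mask)
```

With several targets in one frame, the contour tracing of S17 is what separates the point sets, so each traced contour yields its own box instead of one box around everything.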
A sample image sequence is acquired, the target bounding box of each sample image is extracted, the target area, aspect ratio, and target/rectangle area ratio are computed, and the target class is labeled manually; a training data set and a validation data set for a softmax classification model are then constructed through data standardization and augmentation. The softmax classification model is trained and its classification accuracy verified. The labeled data set is split between the training set and the validation set in a ratio of 8:2. Data standardization uses the following formula:
where s is the standard deviation, n is the number of targets labeled as belonging to the same class in the training set, and x_i and y_i are the feature vectors.
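The formula itself is not reproduced in this text. Given the stated symbols (standard deviation s, per-class count n, features x_i, standardized features y_i), it is presumably the standard z-score standardization, which would read:

```latex
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}},
\qquad
y_i = \frac{x_i - \bar{x}}{s}
```

where \bar{x} is the per-class mean of the feature; this form is an assumption consistent with the symbol definitions, not a quotation of the original formula.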
S18, feeding the image sequence to be detected, together with the obtained target bounding boxes, into the trained softmax model.
S19, outputting the target position bounding box and the target class.
According to the invention, weak and small targets are identified by pixel-by-pixel comparison, which improves identification accuracy. By adding steps such as median-filter preprocessing of the acquired image sequence, generation of multiple sample images with perturbed gray values from the first frame, background model updating, and post-processing of the foreground region, the noise that hinders target identification is eliminated while the amount of computation is further reduced, making the detection scheme more widely applicable.
Another aspect of the embodiments of the present invention provides a device for detecting weak and small targets in an image sequence in real time. The detection device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method for detecting weak and small targets in an image sequence in real time as described in any one of the above embodiments or any combination thereof.
According to some embodiments of the invention, the memory and processor may be connected by a bus or other means. The memory is used to store the computer program and transmit it to the processor. The memory may include volatile memory, such as random access memory (RAM); it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk, or a combination of the above. The processor may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like.
Yet another aspect of the present invention provides a computer-readable storage medium. The computer-readable storage medium according to some embodiments of the present invention stores a program for implementing information transfer which, when executed by a processor, implements the steps of the method for detecting weak and small targets in an image sequence in real time as described in any one of the above embodiments or any combination thereof. The computer-readable storage medium may be RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
It should be noted that the above-mentioned embodiments are merely preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may freely combine the embodiments of the present invention and make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope. In the description of this specification, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the description.
Claims (10)
1. A method for detecting weak and small targets in an image sequence in real time is characterized by comprising the following steps:
acquiring an image sequence under a fixed field of view;
constructing an initial background model based on a first frame image of the image sequence, and performing foreground pixel segmentation on a subsequent image sequence of the first frame image based on the initial background model;
updating the initial background model by using the pixels of all segmented background regions, and performing post-processing and contour extraction on all segmented foreground regions based on the updated initial background model to obtain a target bounding box;
based on the target bounding box, a target is determined.
2. The method of claim 1, wherein the method further comprises:
denoising the image sequence before constructing an initial background model based on a first frame image of the image sequence.
3. The method of claim 2, wherein the denoising the sequence of images comprises:
and performing median filtering processing on each frame of image in the image sequence.
4. The method of claim 1, wherein constructing an initial background model based on a first frame image of the sequence of images comprises:
and superposing random numbers uniformly distributed among [ -a, a ] on all pixel gray values of the first frame image, and repeating the superposition for N times to generate N initial background sample images.
5. The method of claim 4, wherein the performing foreground pixel segmentation for the subsequent image sequence of the first frame image based on the initial background model comprises:
subtracting, one by one, the gray value of each pixel of each frame after the first frame image from the gray values of the corresponding pixel in the N initial background sample images and taking the absolute value; if the absolute value is smaller than a gray difference threshold, incrementing the background matching degree of the pixel by one; if the final background matching degree of the pixel is greater than a sample matching threshold, judging the pixel to be a background pixel and setting its gray value in the image to 0; otherwise, judging the pixel to be a foreground pixel and setting its gray value in the image to b.
6. The method of claim 5, wherein said updating the initial background model with pixels of all segmented background regions comprises:
starting from the first pixel of the background area, selecting a pixel point at intervals of j pixels as a first pixel point for updating the initial background model;
if the first pixel point is a background point, randomly selecting one of the N initial background sample images as the sample image to be updated, and replacing and updating the corresponding pixel in that sample image with the first pixel point;
and randomly selecting a neighborhood pixel point in the neighborhood around the first pixel point, and using the neighborhood pixel point to replace and update one sample among the N initial background sample images.
7. The method of claim 1, wherein the performing post-processing and contour extraction processing on all segmented foreground regions based on the updated initial background model to obtain a target bounding box comprises:
performing erosion and dilation processing on the foreground region;
performing, based on the updated initial background model, ghost elimination processing on the eroded and dilated foreground region so as to update the foreground region;
carrying out hole filling processing on the updated foreground area;
and eroding the hole-filled foreground region, then differencing the foreground region before and after the erosion to obtain the target bounding box.
8. The method of claim 1, wherein determining the target based on the target bounding box comprises:
constructing and training a softmax classification model;
and inputting the target bounding box into a trained softmax classification model to output a target position bounding box and a target class.
9. A device for real-time detection of weak and small targets in an image sequence, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method for real-time detection of weak and small targets in an image sequence as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium on which a program for implementing information transfer is stored, the program, when executed by a processor, implementing the steps of the method for detecting weak and small targets in an image sequence in real time as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110980665.3A CN113780110A (en) | 2021-08-25 | 2021-08-25 | Method and device for detecting weak and small targets in image sequence in real time |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113780110A true CN113780110A (en) | 2021-12-10 |
Family
ID=78839247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110980665.3A Pending CN113780110A (en) | 2021-08-25 | 2021-08-25 | Method and device for detecting weak and small targets in image sequence in real time |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113780110A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779271A (en) * | 2022-06-16 | 2022-07-22 | 杭州宏景智驾科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN116385471A (en) * | 2023-06-02 | 2023-07-04 | 中科微至科技股份有限公司 | Laser contour line extraction method based on directional region growth |
CN116912621A (en) * | 2023-07-14 | 2023-10-20 | 浙江大华技术股份有限公司 | Image sample construction method, training method of target recognition model and related device |
CN117576490A (en) * | 2024-01-16 | 2024-02-20 | 口碑(上海)信息技术有限公司 | Kitchen environment detection method and device, storage medium and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090016603A1 (en) * | 2005-12-30 | 2009-01-15 | Telecom Italia S.P.A. | Contour Finding in Segmentation of Video Sequences |
CN105005992A (en) * | 2015-07-07 | 2015-10-28 | 南京华捷艾米软件科技有限公司 | Background modeling and foreground extraction method based on depth map |
CN107767404A (en) * | 2017-06-23 | 2018-03-06 | 北京理工大学 | A kind of remote sensing images sequence moving target detection method based on improvement ViBe background models |
CN108537821A (en) * | 2018-04-18 | 2018-09-14 | 电子科技大学 | A kind of moving target detecting method based on video |
CN109858397A (en) * | 2019-01-14 | 2019-06-07 | 苏州长风航空电子有限公司 | A kind of faint IR target recognition method based on adaptive modeling |
CN111062974A (en) * | 2019-11-27 | 2020-04-24 | 中国电力科学研究院有限公司 | Method and system for extracting foreground target by removing ghost |
CN111524082A (en) * | 2020-04-26 | 2020-08-11 | 上海航天电子通讯设备研究所 | Target ghost eliminating method |
CN111815673A (en) * | 2020-06-23 | 2020-10-23 | 四川虹美智能科技有限公司 | Moving object detection method, device and readable medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779271A (en) * | 2022-06-16 | 2022-07-22 | 杭州宏景智驾科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN116385471A (en) * | 2023-06-02 | 2023-07-04 | 中科微至科技股份有限公司 | Laser contour line extraction method based on directional region growth |
CN116385471B (en) * | 2023-06-02 | 2023-09-01 | 中科微至科技股份有限公司 | Laser contour line extraction method based on directional region growth |
CN116912621A (en) * | 2023-07-14 | 2023-10-20 | 浙江大华技术股份有限公司 | Image sample construction method, training method of target recognition model and related device |
CN116912621B (en) * | 2023-07-14 | 2024-02-20 | 浙江大华技术股份有限公司 | Image sample construction method, training method of target recognition model and related device |
CN117576490A (en) * | 2024-01-16 | 2024-02-20 | 口碑(上海)信息技术有限公司 | Kitchen environment detection method and device, storage medium and electronic equipment |
CN117576490B (en) * | 2024-01-16 | 2024-04-05 | 口碑(上海)信息技术有限公司 | Kitchen environment detection method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414507B (en) | License plate recognition method and device, computer equipment and storage medium | |
CN108388896B (en) | License plate identification method based on dynamic time sequence convolution neural network | |
CN113780110A (en) | Method and device for detecting weak and small targets in image sequence in real time | |
CN108052917B (en) | Method for automatically identifying illegal buildings based on new and old time phase change discovery | |
CN111145209B (en) | Medical image segmentation method, device, equipment and storage medium | |
CN110163219B (en) | Target detection method based on image edge recognition | |
CN108921813B (en) | Unmanned aerial vehicle detection bridge structure crack identification method based on machine vision | |
CN111723644A (en) | Method and system for detecting occlusion of surveillance video | |
CN110415208A (en) | A kind of adaptive targets detection method and its device, equipment, storage medium | |
CN117094975A (en) | Method and device for detecting surface defects of steel and electronic equipment | |
CN108491796B (en) | Time domain periodic point target detection method | |
CN108765463B (en) | Moving target detection method combining region extraction and improved textural features | |
CN114648547A (en) | Weak and small target detection method and device for anti-unmanned aerial vehicle infrared detection system | |
Chen et al. | Image segmentation based on mathematical morphological operator | |
CN107704864B (en) | Salient object detection method based on image object semantic detection | |
CN115083008A (en) | Moving object detection method, device, equipment and storage medium | |
CN109241865B (en) | Vehicle detection segmentation algorithm under weak contrast traffic scene | |
CN110751623A (en) | Joint feature-based defect detection method, device, equipment and storage medium | |
CN113643290B (en) | Straw counting method and device based on image processing and storage medium | |
CN114359332A (en) | Target tracking method, device, equipment and medium based on depth image | |
CN110796684B (en) | Target tracking method and related device | |
CN111027560B (en) | Text detection method and related device | |
CN104573692B (en) | License plate binarization method based on fuzzy degradation model | |
CN111476821B (en) | Target tracking method based on online learning | |
CN109271986B (en) | Digital identification method based on Second-Confirm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||