CN109166106B - Target detection position correction method and device based on sliding window - Google Patents


Info

Publication number
CN109166106B
CN109166106B (application CN201810871600.3A)
Authority
CN
China
Prior art keywords
confidence
sliding window
target
area
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810871600.3A
Other languages
Chinese (zh)
Other versions
CN109166106A (en)
Inventor
赵梦莹
张俊男
李睿豪
潘煜
贾智平
蔡晓军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201810871600.3A priority Critical patent/CN109166106B/en
Publication of CN109166106A publication Critical patent/CN109166106A/en
Application granted granted Critical
Publication of CN109166106B publication Critical patent/CN109166106B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection position correction method and device based on a sliding window. The width and the moving stride of the sliding window are set, and the image of the target to be detected is segmented with the sliding window to obtain a plurality of candidate target areas; all candidate target areas are sent into a CNN neural network for training, yielding the confidence of each candidate target area; the maximum confidence value and its corresponding index area are selected as reference values; and the candidate target areas are cut and combined with a position correction method and the reference values to form a new target area. On the basis of a convolutional neural network and a sliding window, the invention provides a combinable and cuttable localization method for a single target in an image, improving both the accuracy and the speed of target recognition.

Description

Target detection position correction method and device based on sliding window
Technical Field
The invention relates to the field of image processing, in particular to a target detection position correction method and device based on a sliding window.
Background
In the information age, the acquisition, processing, and application of information are all expanding dramatically. Image information is an important source of knowledge: on many occasions, the information conveyed by an image is richer, more faithful, and more specific than other forms of information. The cooperation of the human eye and brain lets people acquire, process, and understand visual information, and vision is a highly efficient channel for perceiving the external environment. According to statistics cited by some scholars, about 80% of the external information humans obtain comes through the eyes as images. Vision is thus the main carrier by which humans obtain external information, and a computer must be able to process image information to achieve intelligence. In recent years in particular, large-volume image data processing involving graphics, images, and video has been widely applied in medicine, transportation, industrial automation, and other fields.
In recent years, machine learning has received a great deal of academic and engineering attention. Within machine learning, the Convolutional Neural Network (CNN) is a deep feedforward artificial neural network, generally comprising convolutional layers, normalization layers, pooling layers, and fully-connected layers, and it has been applied successfully to image recognition. CNNs have become a research hotspot in many scientific fields, especially pattern classification, because the network avoids complex image preprocessing: it can take the original image directly as input and classify it, which has led to its wide application.
Target detection is an important topic in image processing and target recognition; its main task is to locate and classify targets in a given image, and methods based on sliding-window search are widely used for it. However, the conventional sliding-window search technique has the following disadvantages: (1) the window size is fixed, so the size of the segmented image cannot adapt to the size of the target; (2) if several groups of sliding windows of different sizes run simultaneously, the amount of computation inevitably increases and efficiency suffers; (3) when the sliding stride is small, the data volume grows and speed suffers, while when the sliding stride is too large, detection accuracy suffers.
In summary, an effective solution to the problem of low accuracy and efficiency of target detection in the prior art is still lacking.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection position correction method and a target detection position correction device based on a sliding window.
The technical scheme adopted by the invention is as follows:
a first object of the present invention is to provide a target detection position correction method based on a sliding window, the method comprising the steps of:
setting the width and the moving stride of a sliding window, and segmenting the image of the target to be detected by using the sliding window to obtain a plurality of candidate target areas;
sending all candidate target areas into a CNN neural network for training, and obtaining the confidence of each candidate target area;
selecting the maximum confidence value and its corresponding index area as reference values;
and cutting and combining the candidate target areas by using a position correction method and the reference values to form a new target area.
Further, the width of the sliding window is determined according to the average size of all the objects to be detected, and the moving stride of the sliding window is less than or equal to half the width of the sliding window.
Further, the step of sending all candidate target regions into the CNN neural network for training includes:
taking the candidate target areas whose overlap rate with the target area is smaller than a threshold I as noise, taking those whose overlap rate is larger than the threshold I as targets, and inputting both into a CNN neural network for training;
and obtaining the confidence degrees of all candidate target regions by using the trained CNN neural network.
Further, when there are too many noise areas, some noise areas are deleted at random by using a random sampling method, or pictures of the corresponding training set are deleted.
Further, the maximum confidence value is selected from all the confidence values output by the CNN neural network, and the maximum confidence value together with its corresponding index area is used as the reference values.
Further, according to the width of the sliding window and the size of the target to be detected, the breadth-first traversal depth is used as a traversal constraint condition, and position correction is performed when the maximum breadth-first traversal depth is less than or equal to 2.
Further, the method for cutting and combining the candidate target areas by using the position correction method and the reference values comprises:
taking the index area corresponding to the maximum confidence as the center-point area;
setting a region-strength threshold T1, a confidence activation threshold T2, and a confidence suppression threshold T3;
taking the center-point area as the origin, and taking the four adjacent areas, upper, lower, left, and right, as the current candidate diffusion areas;
based on the breadth-first traversal algorithm, computing the difference between the confidence of the current diffusion area and the maximum confidence of the index area's center point, and comparing the diffusion area's confidence with the confidence activation threshold and the confidence suppression threshold respectively;
if the difference between the confidence of a diffusion area in some direction and the maximum confidence is less than T1, and that diffusion area's confidence is greater than T2, expanding the boundary of the central area in the direction of that diffusion area;
if the confidence of the diffusion area in some direction is less than T3, the target area does not extend in that direction; the target to be detected lies within the index area corresponding to the maximum value, and the boundary of the central area in that direction is shrunk inward.
A second object of the present invention is to provide a sliding-window-based object detection position correction apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the following steps, including:
setting the width and the moving stride of a sliding window, and segmenting the image of the target to be detected by using the sliding window to obtain a plurality of candidate target areas;
sending all candidate target areas into a CNN neural network for training, and obtaining the confidence of each candidate target area;
selecting the maximum confidence value and its corresponding index area as reference values;
and cutting and combining the candidate target areas by using a position correction method and the reference values to form a new target area.
Compared with the prior art, the invention has the beneficial effects that:
(1) the size of the sliding window is set according to the average size of the object to be detected, so that the target area is more elastic under combining and cutting, and the area where the target lies can be detected with fewer combining and cutting operations; the moving stride is set to half the size of the sliding window, which speeds up target detection while the windows still keep a large overlap area;
(2) the method is based on breadth-first traversal and adds the traversal depth, derived from the size of the sliding window and the true size of the detected target, as a traversal constraint condition, namely that the maximum breadth-first traversal depth is less than or equal to 2; the combining and cutting thus describe the height and width of the target to be detected instead of the original square, effectively improving the accuracy of target detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a sliding window-based target detection position correction method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a sliding window-based target detection position correction method according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating attribute values of candidate target regions of an image;
FIG. 4 is a schematic view of maximum depth;
FIG. 5 is an exemplary diagram of a cropping candidate region;
fig. 6 is an exemplary diagram of a combination candidate region.
Detailed Description
The invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As noted in the background, the conventional sliding-window search technique has several defects: the window size is fixed, so the size of the segmented image cannot change with the size of the target; running several groups of sliding windows of different sizes simultaneously increases the amount of computation and hurts efficiency; a dense sliding stride increases the data volume and hurts speed; and an overly large sliding stride hurts detection accuracy.
In view of the foregoing disadvantages, an embodiment of the present invention provides a target detection position correction method based on a sliding window. As shown in fig. 1, the method comprises the steps of:
and S101, setting the size and the moving stride of a sliding window, and segmenting the image by using the sliding window.
Firstly, collecting an image of an object to be detected; then the sliding window size and the moving step are set.
When setting the size of the sliding window, the window is square by default; its size is determined according to the average size of the detected object, and it should not be chosen too large.
When choosing the moving stride s, it is kept at no more than half the sliding-window width, i.e. 0 < s ≤ slide.width/2, where slide.width is the width of the sliding window; this ensures that adjacent windows have a large overlap area and improves recognition accuracy.
After the size of the sliding window and the moving stride are set, the sliding window is used to segment the original image for target detection, obtaining a plurality of candidate target areas.
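This segmentation step can be sketched as follows; this is a minimal illustration assuming a square window and pixel coordinates, and the function name and box representation are not from the patent:

```python
def slide_windows(img_w, img_h, win, stride):
    """Enumerate square sliding-window regions (x, y, win, win) over an image."""
    regions = []
    y = 0
    while y + win <= img_h:
        x = 0
        while x + win <= img_w:
            regions.append((x, y, win, win))
            x += stride
        y += stride
    return regions

# 8x8 image, window 4, stride 2 (= win/2): 3 positions per axis, 9 regions
print(len(slide_windows(8, 8, 4, 2)))  # 9
```

With the stride at half the window width, each region overlaps its neighbor by half, which is the overlap condition the text requires.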
And S102, sending all the obtained candidate target regions into a CNN neural network for training, and obtaining confidence degrees of all the candidate target regions.
When training the weights of the CNN neural network, the candidate target areas whose overlap rate IoU (Intersection over Union) with the target area does not exceed the threshold I are used as noise, while those whose IoU with the target area exceeds the threshold I are used as targets, and both are put into the CNN neural network for training.
The expression for the target area overlap rate IoU is:
IoU = Area of Overlap / Area of Union
in the formula, Area of Overlap indicates the intersection of the correct result area and the detection result area; Area of Union represents the union of the correct result area and the detection result area; DetectionResult represents the detected target position; and GroundTruth represents the true position of the target.
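For two axis-aligned boxes this definition can be computed directly; a minimal sketch, where the (x, y, w, h) box representation is an illustrative assumption:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))  # width of the overlap
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))  # height of the overlap
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

# Two 4x4 boxes offset by 2 px in x: overlap 2*4 = 8, union 16 + 16 - 8 = 24
print(iou((0, 0, 4, 4), (2, 0, 4, 4)))  # 8/24 ≈ 0.333
```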
When there are too many noise areas, some noise areas may be deleted at random using a random sampling method, or some pictures of the corresponding training set may be removed.
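A minimal sketch of this noise subsampling; the 3:1 noise-to-target cap, the fixed seed, and the function name are illustrative assumptions, since the patent does not fix a ratio:

```python
import random

def subsample_noise(noise_regions, targets, max_ratio=3.0, seed=0):
    """Randomly drop noise regions until len(noise) <= max_ratio * len(targets)."""
    limit = int(max_ratio * len(targets))
    if len(noise_regions) <= limit:
        return list(noise_regions)
    rng = random.Random(seed)  # fixed seed keeps the training set reproducible
    return rng.sample(noise_regions, limit)

noise = list(range(10))                 # 10 noise region ids
kept = subsample_noise(noise, targets=[0, 1])  # limit = 3.0 * 2 = 6
print(len(kept))  # 6
```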
The threshold I (0 < I < 1) needs to be adjusted according to the actual training situation, i.e. in the direction of higher accuracy.
Finally, the confidences score[1], score[2], ..., score[n] of all candidate target areas obtained by the sliding window are produced as the output of the CNN neural network.
S103, from all the confidences output by the CNN neural network, the maximum value max_score = max{score[1], score[2], ..., score[n]} and its corresponding index area v ∈ [1, n] are selected as the reference values.
In this embodiment, the maximum value is selected from all the confidence values output by the CNN neural network, that is, the most likely position of the target to be detected is selected, and then the position is taken as the center point to perform adjustment.
When the original image is segmented sequentially by the sliding window, a bounds array likewise records the information of each segmented image block in order, in one-to-one correspondence with score[i]. If score[v] is the maximum value, then bounds[v] is the corresponding image block, called the index area.
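The selection of the reference values max_score, v, and bounds[v] described above can be sketched as follows (names follow the description; the tuple return shape is an illustrative choice):

```python
def pick_reference(score, bounds):
    """Return (max_score, v, bounds[v]): the maximum confidence, its index,
    and the corresponding index area, with score[i] and bounds[i] paired."""
    v = max(range(len(score)), key=lambda i: score[i])
    return score[v], v, bounds[v]

scores = [0.10, 0.85, 0.40]
boxes = [(0, 0, 4, 4), (2, 0, 4, 4), (4, 0, 4, 4)]
print(pick_reference(scores, boxes))  # (0.85, 1, (2, 0, 4, 4))
```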
S104, the candidate target areas are cut and combined by using the position correction method BFS_Revise_Bounds() and the reference values max_score and v to form a new target area.
The method is based on BFS (Breadth-First Search) traversal and adds the traversal depth extend, derived from the size of the sliding window and the true size of the detection target, as a traversal constraint condition, namely that the maximum traversal depth extend is less than or equal to 2 during position correction.
The cutting and combining method specifically comprises the following steps:
taking the index area corresponding to the maximum confidence max_score as the center, diffusing outward with the breadth-first traversal (BFS) method;
comparing the diffusion-area confidence score[w] with the maximum index-area confidence max_score. The invention sets three thresholds, T1, T2, and T3. T1 bounds the difference between the diffusion-area confidence score[w] and the maximum confidence max_score, and represents the strength of the link between the two areas; T2 is the confidence activation threshold: if the current diffusion area's confidence score[w] is greater than T2, the current area has high reliability; T3 is the confidence suppression threshold: if the current diffusion area's confidence score[w] is less than T3, the current area's reliability is low. The values of T1, T2, and T3 can be set flexibly according to the relation between the actual confidences obtained by the CNN neural network and the detection target.
If the difference between the current diffusion area's confidence score[w] and the maximum confidence max_score is less than T1, and score[w] is greater than T2, the diffusion area is closely linked to the index area and w is a high-reliability area. In this case the boundary should be enlarged toward the diffusion area w, and the peripheral areas adjacent to w should be considered for inclusion in the target area.
If the current diffusion area's confidence score[w] is less than T3, the area w is weakly linked to the target area and the target area lies within the candidate area corresponding to max_score; the boundary in the direction of w should be narrowed, and the extension toward the diffusion area w should be abandoned.
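The expand/shrink decision for one diffusion direction can be summarized as a small rule; the threshold values below are illustrative, since the patent leaves T1, T2, and T3 tunable:

```python
T1, T2, T3 = 0.15, 0.60, 0.30  # illustrative values, not fixed by the patent

def decide(max_score, w_score):
    """Expand / shrink / keep decision for one diffusion direction."""
    if max_score - w_score < T1 and w_score > T2:
        return "expand"   # strongly linked and reliable: grow boundary toward w
    if w_score < T3:
        return "shrink"   # weakly linked: pull the boundary back from w
    return "keep"         # neither condition met: leave the boundary unchanged

print(decide(0.9, 0.8))  # expand
print(decide(0.9, 0.2))  # shrink
```

Note the middle band (T3 ≤ score[w] and the strong-link test fails) leaves the boundary as-is, which matches the text only specifying the two explicit cases.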
And S105, outputting the corrected target area coordinates.
The embodiment of the invention provides a sliding-window-based target detection position correction method in which the size of the sliding window is set according to the size of the detected object, so the size of the segmented image can change with the size of the target; the moving stride is set to half the sliding-window size, ensuring a large window overlap area and improving recognition accuracy; and the breadth-first traversal depth, derived from the size of the sliding window and the true size of the detected target, is added as a traversal constraint condition, namely position correction is performed when the maximum breadth-first traversal depth is less than or equal to 2, effectively improving the accuracy of target detection.
In order to make those skilled in the art better understand the present invention, the second embodiment of the present invention provides a sliding window based target detection position correction method, as shown in fig. 2, the method includes the following steps:
S201, selecting the width slide.width and the moving stride s of the sliding window, and processing the image of the object to be detected with the sliding window to obtain candidate areas 1 to n.
S202, all the obtained candidate regions are sent to a CNN neural network for training, and the trained CNN neural network is used for obtaining the confidence score [ i ] of the candidate regions, as shown in FIG. 3.
S203, selecting the maximum confidence max_score and its corresponding index area v from all candidate-area confidences score[i] by using the max() method; the maximum confidence max_score and the index area v serve as the reference values.
And S204, calling a position correction method BFS _ Revise _ Bounds () to correct the boundary region of the candidate region.
The BFS_Revise_Bounds() position correction method specifically comprises the following steps:
S2041, initializing a queue Q; initializing the candidate-area depth array extend[1..n] = +∞.
The queue Q is a one-dimensional first-in-first-out queue that stores the center points of the current breadth-first traversal. It initially holds the area v corresponding to the maximum value, whose extend is 0. The first round of the loop then enqueues the left, upper, right, and lower neighbors of v, combining and cutting as it goes; when that round finishes, extend is 1, and the next round begins. The next round takes the left point as a center point and processes its own left, upper, right, and lower neighbors, combining and cutting; then the upper point is taken as a center point in the same way, and so on, until the lower point's left, upper, right, and lower neighbors have been traversed, at which point extend is 2. When no new elements remain in queue Q, the loop exits.
The extend array stores each index point's distance from the maximum-value area v and is initialized to positive infinity.
S2042, accessing the index region point v corresponding to the maximum confidence coefficient; visited [ v ] ═ 1, extend [ v ] ═ 0; the center point v is queued in queue Q.
The visited array indicates whether the area has been visited, and if the visited [ v ] is 1, the area has been visited; visited [ v ] ═ 0; indicating that it has not been accessed.
And S2043, if the queue Q is not empty, continuing to execute, otherwise, jumping to S20413.
S2044, dequeuing the head element of the queue Q, and assigning to temp.
S2045, w = the left/upper/right/lower index point of temp.
Left/upper/right/lower means one neighbor is accessed per iteration, in order: left first, then upper, then right, then lower. This happens inside a loop statement; if a neighbor does not exist, the loop moves on to the next direction.
S2046, if w exists, executing downwards, otherwise, jumping to S2044;
S2047, if w has not been visited, executing downwards, otherwise jumping to S20412;
S2048, accessing w: setting visited[w] = 1 and extend[w] = extend[temp] + 1, as shown in fig. 4;
S2049, if the confidence of the diffusion area w satisfies max_score - score[w] ≤ T1 and score[w] ≥ T2, then the left/upper/right/lower boundary of the area corresponding to the center point v is expanded to the corresponding left/upper/right/lower boundary of the diffusion area w, as shown in fig. 5;
S20410, if extend[w] < 2, w is enqueued in queue Q;
S20411, if the confidence of the diffusion area w satisfies score[w] ≤ T3, the left/right/upper/lower boundary toward the diffusion area w is not enlarged, and the corresponding left/right/upper/lower boundary of the area with maximum confidence max_score is reduced by d1, as shown in fig. 6;
S20412, taking the next (upper/right/lower) adjacent confidence point of temp as w and jumping to S2046; when no further w exists, jumping to S2044;
and S20413, returning to the corrected target area boundary coordinates.
Here, T1, T2, and T3 can be set flexibly according to the relation between the actual confidences obtained by the CNN neural network and the detection target, and d1 is set flexibly according to the size of the sliding window and the size of the target object.
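The loop of steps S2041 to S20413 can be sketched in Python under stated assumptions: the grid layout of window confidences, the mapping of areas to pixel coordinates, and restricting shrinking to the center area's immediate neighbors are illustrative choices, not the patent's exact specification.

```python
from collections import deque

def bfs_revise_bounds(score, v, win, stride, T1, T2, T3, d1):
    """Sketch of BFS_Revise_Bounds() on a grid of sliding-window confidences.
    score: dict {(row, col): confidence}; v: (row, col) of the maximum-confidence
    index area; win/stride in pixels; d1: shrink step.
    Returns the corrected box (x1, y1, x2, y2) in pixels."""
    max_score = score[v]
    r0, c0 = v
    x1, y1 = c0 * stride, r0 * stride        # initial box = the index area itself
    x2, y2 = x1 + win, y1 + win
    visited = {v}
    extend = {v: 0}
    Q = deque([v])
    while Q:
        temp = Q.popleft()
        for dr, dc in ((0, -1), (-1, 0), (0, 1), (1, 0)):  # left, up, right, down
            w = (temp[0] + dr, temp[1] + dc)
            if w not in score or w in visited:
                continue
            visited.add(w)
            extend[w] = extend[temp] + 1
            if max_score - score[w] <= T1 and score[w] >= T2:
                # strongly linked and reliable: enlarge the box toward w
                if dc < 0: x1 = min(x1, w[1] * stride)
                if dc > 0: x2 = max(x2, w[1] * stride + win)
                if dr < 0: y1 = min(y1, w[0] * stride)
                if dr > 0: y2 = max(y2, w[0] * stride + win)
            elif score[w] <= T3 and extend[w] == 1:
                # weakly linked neighbor of the center: shrink that side by d1
                if dc < 0: x1 += d1
                if dc > 0: x2 -= d1
                if dr < 0: y1 += d1
                if dr > 0: y2 -= d1
            if extend[w] < 2:                # traversal depth capped at 2
                Q.append(w)
    return x1, y1, x2, y2

# 3x3 grid of window confidences, win=4, stride=2, center v=(1,1):
score = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.85, (1, 1): 0.90, (1, 2): 0.50,
    (2, 0): 0.30, (2, 1): 0.85, (2, 2): 0.40,
}
print(bfs_revise_bounds(score, (1, 1), win=4, stride=2,
                        T1=0.15, T2=0.60, T3=0.25, d1=1))  # (0, 3, 6, 8)
```

In the example, the box expands left and down toward the two strong neighbors (0.85), shrinks its top edge by d1 away from the weak upper neighbor (0.20), and leaves the right edge unchanged for the middling neighbor (0.50).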
The embodiment of the invention provides a sliding-window-based target detection position correction method: the size of the sliding window is determined according to the average size of the detected object, and the moving stride is less than half the window, ensuring a large window overlap area and improving recognition accuracy; taking the area corresponding to the maximum confidence as the center, the method diffuses outward by the BFS method, with the thresholds T1, T2, and T3 set flexibly according to the relation between the actual confidences obtained by the CNN neural network and the detection target; the boundary area is thereby corrected, improving both the accuracy and the speed of target recognition.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this does not limit the scope of the invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the invention.

Claims (5)

1. A target detection position correction method based on a sliding window is characterized by comprising the following steps:
setting the width and the moving stride of a sliding window, and segmenting the image of the target to be detected by using the sliding window to obtain a plurality of candidate target areas;
determining the width of the sliding window according to the average size of all the objects to be detected; the moving stride of the sliding window being less than or equal to half the width of the sliding window;
sending all candidate target areas into a CNN neural network for training, and obtaining the confidence of each candidate target area;
selecting the maximum confidence value and its corresponding index area as reference values;
cutting and combining the candidate target areas by using a position correction method and the reference values to form a new target area; according to the width of the sliding window and the size of the target to be detected, the breadth-first traversal depth is used as a traversal constraint condition, and position correction is performed when the maximum breadth-first traversal depth is less than or equal to 2;
the method for cutting and combining the candidate target areas by using the position correction method and the reference values comprising:
taking the index area corresponding to the maximum confidence as the center point;
setting a region-strength threshold T1, a confidence activation threshold T2, and a confidence suppression threshold T3;
taking the center-point area as the origin, and taking the four adjacent areas, upper, lower, left, and right, as the current candidate diffusion areas;
based on the breadth-first traversal algorithm, computing the difference between the confidence of the current diffusion area and the maximum confidence of the index area's center point, and comparing the diffusion area's confidence with the confidence activation threshold and the confidence suppression threshold respectively;
if the difference between the confidence of a diffusion area in some direction and the maximum confidence is less than T1, and that diffusion area's confidence is greater than T2, expanding the boundary of the central area in the direction of that diffusion area;
if the confidence of the diffusion area in some direction is less than T3, the target area does not extend in that direction; the target to be detected lies within the index area corresponding to the maximum value, and the boundary of the central area in that direction is shrunk inward.
2. The sliding-window-based object detection position correction method according to claim 1, wherein the step of sending all candidate object regions into a CNN neural network for training comprises:
taking the candidate target areas whose overlap rate with the target area is smaller than a threshold I as noise, taking those whose overlap rate is larger than the threshold I as targets, and inputting both into a CNN neural network for training;
and obtaining the confidence degrees of all candidate target regions by using the trained CNN neural network.
3. The sliding-window based object detection position correction method as claimed in claim 2, wherein when the noise area is excessive, a plurality of noise areas are randomly deleted or pictures of a corresponding training set are deleted by using a random sampling method.
4. The sliding-window-based object detection position correction method according to claim 1, wherein a maximum confidence value is selected from all confidence values output by the CNN neural network, and an index region corresponding to the maximum confidence value and the maximum confidence value is used as a reference value.
5. A sliding-window-based target detection position correction apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program:
setting the width and the moving stride of a sliding window, and segmenting the image of the target to be detected with the sliding window to obtain a plurality of candidate target regions;
determining the width of the sliding window according to the average size of all targets to be detected, the moving stride of the sliding window being less than or equal to half of the sliding window width;
sending all candidate target regions into a CNN neural network for training to obtain the confidences of all candidate target regions;
selecting the maximum confidence value and taking the maximum value together with its corresponding index region as a reference value;
cutting and combining the candidate target regions by using a position correction method and the reference value to form a new target region;
according to the sliding-window width and the size of the target to be detected, using the depth of the breadth-first traversal as a traversal constraint, and performing position correction when the maximum depth of the breadth-first traversal is less than or equal to 2;
wherein the cutting and combining of the candidate target regions by using the position correction method and the reference value comprises:
taking the index region corresponding to the maximum confidence as the center point;
setting a region intensity threshold T1, a confidence activation threshold T2, and a confidence suppression threshold T3;
taking the center-point region as the origin, and taking the four adjacent regions above, below, to the left, and to the right of it as the current candidate diffusion regions;
based on a breadth-first traversal algorithm, computing the difference between the confidence of each current diffusion region and the maximum confidence of the center index region, and comparing the confidence of each diffusion region with the confidence activation threshold and the confidence suppression threshold, respectively;
if the difference between the confidence of a diffusion region in a given direction and the maximum confidence is less than T1, and the confidence of that diffusion region is greater than T2, expanding the boundary of the central region in the direction of that diffusion region;
if the confidence of a diffusion region in a given direction is less than T3, the target region is not extended in that direction; the target to be detected lies within the index region corresponding to the maximum value, and the boundary of the central region is shrunk in the opposite direction.
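The segmentation step in the claims — window width derived from the average target size, moving stride at most half the window width — can be sketched as a simple enumeration of candidate regions. The image and target sizes below are invented for the example, not from the patent:

```python
def sliding_windows(img_w, img_h, avg_target_size):
    """Enumerate square candidate regions (x, y, w, h) over an image."""
    win = avg_target_size          # window width set from the average target size
    stride = max(1, win // 2)      # moving stride <= half the window width
    regions = []
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            regions.append((x, y, win, win))
    return regions
```

With this stride choice, adjacent windows overlap by at least half their width, so a target straddling one window boundary is still fully contained in a neighbouring candidate region — the overlap the position-correction step then resolves.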
CN201810871600.3A 2018-08-02 2018-08-02 Target detection position correction method and device based on sliding window Active CN109166106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810871600.3A CN109166106B (en) 2018-08-02 2018-08-02 Target detection position correction method and device based on sliding window

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810871600.3A CN109166106B (en) 2018-08-02 2018-08-02 Target detection position correction method and device based on sliding window

Publications (2)

Publication Number Publication Date
CN109166106A CN109166106A (en) 2019-01-08
CN109166106B true CN109166106B (en) 2021-07-30

Family

ID=64898741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810871600.3A Active CN109166106B (en) 2018-08-02 2018-08-02 Target detection position correction method and device based on sliding window

Country Status (1)

Country Link
CN (1) CN109166106B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112751633B (en) * 2020-10-26 2022-08-26 中国人民解放军63891部队 Broadband spectrum detection method based on multi-scale window sliding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665336A (en) * 2017-09-20 2018-02-06 Xiamen University of Technology Multi-target detection method based on Faster RCNN for intelligent refrigerators
CN107679469A (en) * 2017-09-22 2018-02-09 Southeast University–Wuxi Institute of Integrated Circuit Technology Non-maximum suppression method based on deep learning
CN108062531A (en) * 2017-12-25 2018-05-22 Nanjing University of Information Science and Technology Video object detection method based on cascaded regression convolutional neural networks
CN108304808A (en) * 2018-02-06 2018-07-20 Research Institute of Xi'an Jiaotong University in Shunde, Guangdong Surveillance video object detection method based on spatio-temporal information and deep networks
CN108319949A (en) * 2018-01-26 2018-07-24 The 15th Research Institute of China Electronics Technology Group Corporation Multi-orientation ship target detection and recognition method in high-resolution remote sensing images

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
KR20170131081A (en) * 2016-05-20 2017-11-29 S-1 Corporation Method and terminal for detecting an object
CN106327507B (en) * 2016-08-10 2019-02-22 Nanjing University of Aeronautics and Astronautics Color image saliency detection method based on background and foreground information
CN107154024A (en) * 2017-05-19 2017-09-12 Nanjing University of Science and Technology Scale-adaptive target tracking method based on deep-feature kernelized correlation filters
CN108154520B (en) * 2017-12-25 2019-01-08 Beihang University Moving target detection method based on optical flow and frame matching


Also Published As

Publication number Publication date
CN109166106A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
US10832039B2 (en) Facial expression detection method, device and system, facial expression driving method, device and system, and storage medium
WO2020037932A1 (en) Image quality assessment method, apparatus, electronic device and computer readable storage medium
WO2019174276A1 (en) Method, device, equipment and medium for locating center of target object region
CN109685037A Real-time action recognition method, device and electronic device
US20210158593A1 (en) Pose selection and animation of characters using video data and training techniques
WO2021238586A1 (en) Training method and apparatus, device, and computer readable storage medium
CN112507918A (en) Gesture recognition method
CN109166106B (en) Target detection position correction method and device based on sliding window
CN114299363A (en) Training method of image processing model, image classification method and device
CN111274919A Method, system, server and medium for facial feature detection based on convolutional neural network
CN110363103A Pest identification method, apparatus, computer device and storage medium
CN109948489A Face recognition system and method based on multi-frame video face feature fusion
CN113496215A (en) Method and device for detecting human face of living body and electronic equipment
CN110223291B (en) Network method for training fundus lesion point segmentation based on loss function
CN111488836A (en) Face contour correction method, device, equipment and storage medium
CN109799905B (en) Hand tracking method and advertising machine
CN109299743A (en) Gesture identification method and device, terminal
CN115457646A (en) Device, method and related product for identifying lesions in the periphery of the ocular fundus
CN111626143B (en) Reverse face detection method, system and equipment based on eye positioning
CN113627410A (en) Method for recognizing and retrieving action semantics in video
CN114445691A (en) Model training method and device, electronic equipment and storage medium
CN113591858A (en) Text recognition method and device, electronic equipment and storage medium
CN108549871B Hand segmentation method based on region growing and machine learning
Ko et al. Automatic object-of-interest segmentation from natural images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant