CN112084886B - Method and device for improving detection performance of neural network target detection

Info

Publication number: CN112084886B
Application number: CN202010835666.4A (also published as CN112084886A)
Authority: CN (China)
Legal status: Active (granted)
Other languages: Chinese (zh)
Prior art keywords: detection, sliding window, sub-map, size
Inventors: 韦虎, 涂治国
Current and original assignee: Mouxin Technology Shanghai Co ltd
Application filed by Mouxin Technology Shanghai Co ltd

Classifications

    • G06V 40/166 Image or video recognition: human faces; detection; localisation; normalisation using acquisition arrangements
    • G06N 3/045 Computing arrangements based on biological models: neural networks; architecture; combinations of networks
    • G06N 3/08 Computing arrangements based on biological models: neural networks; learning methods
    • G06V 10/267 Image preprocessing: segmentation of patterns in the image field by performing operations on regions
    • G06V 40/168 Human faces: feature extraction; face representation
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding: target detection

Abstract

The invention discloses a method and a device for improving the detection performance of neural network target detection, relating to the technical field of digital image processing. The method comprises the following steps: determining the sizes of a plurality of rectangular sliding windows for scanning according to the input size of the target detection neural network algorithm and the size of the original input image; during the detection of each frame, rotating the sliding window sub-image inside each rectangular sliding window by a preset angle and scaling it to generate a sliding window rotation mapping sub-image, and scaling the original input image to generate a full-map mapping sub-image; stitching the full-map mapping sub-image and each sliding window rotation mapping sub-image into a detection input image; and detecting the detection input image through a target detection neural network algorithm corresponding to the input scale. The invention reduces the computing power and bandwidth requirements of the target detection algorithm on monitoring edge computing devices, and optimizes the target detection distance and detection accuracy.

Description

Method and device for improving detection performance of neural network target detection
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a method and a device for improving the detection performance of neural network target detection.
Background
With the rapid development of artificial intelligence and deep learning, target detection methods based on Convolutional Neural Network (CNN) algorithms are widely used in the monitoring field. A common detection procedure is to slide a window across the image from left to right and from top to bottom and identify targets by classification. To detect different target types at different viewing distances, windows of different sizes and aspect ratios (sliding windows) can be used. Among target detection algorithms, commonly used methods such as R-CNN, Fast R-CNN and Faster R-CNN are based on candidate regions and deep learning classification; from R-CNN to Fast R-CNN and then to Faster R-CNN, the mAP (mean Average Precision) records have been continuously refreshed. Methods such as YOLO, SSD and DenseBox are regression methods based on deep learning and raise performance to a very high frame rate. In monitoring edge computing application scenarios, such as IPC, NVR and DVR, the computing power and bandwidth of current devices are very limited. To meet the frame rate requirement of real-time target detection, neural network algorithms with low computational requirements, such as SSD, YOLO, MobileNet-SSD, MTCNN and DenseNet, are often adopted on these devices. However, even with these relatively low-complexity algorithms, at commonly used video resolutions (e.g., 1080P) the real-time computing power and bandwidth requirements still exceed the capabilities of most current hardware.
On one hand, to solve the technical problem of insufficient computing power and bandwidth, the prior art offers the following methods. The first and most common approach is to simplify the neural network employed by the algorithm (pruning and low-bit quantization). However, the detection accuracy of a neural network after pruning and low-bit quantization is often significantly reduced, the rates of missed detection and false detection increase significantly, and low-bit quantization is limited by the hardware's support for the quantization bit width. The second method is to reduce the actual detection frame rate, detect only key frames, and use a low-complexity target tracking algorithm to compensate for the insufficient detection frame rate on non-key frames. However, this method suffers from missed detections and tracking errors when targets move quickly. The third method is to sacrifice the detection distance of the algorithm. For example, deleting the last small-scale feature convolution layer of a network such as SSD reduces the computing power and bandwidth requirements, but also reduces the maximum scale at which a target can be detected, so that a face or figure close to the camera cannot be detected. The fourth method is to shrink the input image and use a neural network algorithm with a small input size. However, this lowers the resolution of the input image and limits the minimum size at which a target can be detected, so that a distant face or figure cannot be detected because it covers too few pixels.
On the other hand, to detect targets at different distances, the common prior-art method is to scale the original input image to multiple scales to generate a multi-scale pyramid image group, and then detect each of the resulting input images separately. A large-size target at close range is detected on a reduced image, while a distant target is detected on a large, high-resolution image. However, this method is complex to design and requires training a neural network for each image scale, which places high demands on the computing power and bandwidth of the device.
Furthermore, to improve the target detection rate, the training sets used by existing target detection neural networks usually include multiple postures or viewing angles; taking a person as the detection target, for example, they include standing and sitting figures, tilted and rotated faces, and the like. However, in a few special monitoring situations, abnormal targets rotated by 90, 180 or 270 degrees may appear, such as a horizontal human body (lying prone or supine) in a swimming pool scene or an inverted human body (face) in a gymnasium scene, and existing target detection algorithms miss such targets at a high rate. This is because most training targets are upright, so the trained detection algorithm performs poorly on horizontal (including lying) and inverted targets. To solve this problem, the conventional practice is to rotate the monitoring input image by 90, 180 and 270 degrees and detect it again, which improves the detection rate of these targets, but the data volume to be processed then increases more than fourfold, placing higher demands on the computing power and bandwidth of the device.
In summary, existing low-complexity optimization methods cannot simultaneously satisfy detection accuracy, frame rate, and the farthest and nearest detection distances. Actual monitoring scenes are complex: the monitoring device must achieve high target detection accuracy, needs a sufficient frame rate to catch targets passing quickly, and must detect both large-scale targets nearby and small-scale targets far away (targets should be detected whether they approach the camera or pass at a distance), especially abnormal targets. How to reduce the computing power and bandwidth requirements of a target detection algorithm on monitoring edge computing devices, improve the target detection rate and optimize the target detection distance is a technical problem that urgently needs to be solved.
Disclosure of Invention
The object of the invention is to provide a method and a device for improving the detection performance of neural network target detection that overcome the defects of the prior art. Exploiting the fact that distant and nearby targets in a monitoring video have motion vectors of different magnitudes, the method combines low-frame-rate detection of small distant targets with high-frame-rate detection of nearby targets, adjusts the angle of the detection input image according to the characteristics of the detection target, and stitches the sub-images into an image matched to a detection neural network with a fixed input scale, thereby reducing the computing power and bandwidth requirements of the target detection algorithm on monitoring edge computing devices and optimizing the target detection distance and detection accuracy.
In order to achieve the above object, the present invention provides the following technical solutions:
a method for improving the detection performance of a target of a neural network comprises the following steps:
determining the sizes of a plurality of rectangular sliding windows for scanning according to the input size of a target detection neural network algorithm and the size of an original input image, wherein the rectangular sliding windows can move to different positions on the original input image according to a preset scanning rule and frames;
during each frame detection, obtaining sliding window subgraphs in each rectangular sliding window, rotating the sliding window subgraphs according to a preset angle, carrying out scaling processing to generate sliding window rotation mapping subgraphs, and carrying out scaling processing on an original input image to generate full-map mapping subgraphs, wherein the resolution of the full-map mapping subgraphs is lower than that of the sliding window rotation mapping subgraphs;
combining and splicing the full map mapping subgraph and the sliding window rotation mapping subgraphs corresponding to the rectangular sliding windows into a rectangular input image which is used as a detection input image;
and detecting the detection input image through a target detection neural network algorithm corresponding to the input scale.
Further, the method also comprises the following step: merging and mapping the detection results of the sub-images onto the original input image.
Further, when the detection target is a horizontal or inverted target, the sliding window sub-image is rotated by 90, 180 or 270 degrees respectively; the aspect ratio of the rectangular sliding window is matched to the size of the detection target.
Further, the step of determining the sizes of the plurality of rectangular sliding windows for scanning according to the input size of the target detection neural network algorithm and the size of the original input image is as follows:
step 110, determining, according to the computing power of the current device, the input size of the target detection neural network algorithm to be adopted, and the minimum and maximum detection sizes of targets that the algorithm can detect at that input size;
step 120, dividing the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being larger than or equal to the minimum detection size of the algorithm; the rectangular sub-image regions comprise a full-map mapping sub-image region and sliding window mapping sub-image regions, the full-map mapping sub-image region having the same aspect ratio as the original input image and holding the full-map mapping sub-image, while each sliding window mapping sub-image region holds the corresponding sliding window rotation mapping sub-image;
wherein the rectangular sub-image regions are divided by the following steps:
step 121, determining the size of the full-map mapping sub-image region on the input rectangle of the detection neural network algorithm: selecting an initial scale according to the size of the target at the nearest detection distance L0 on the original input image, such that the nearest target is scaled to be smaller than or equal to the maximum detection size of the algorithm, so that the nearest target can be detected on the full-map mapping sub-image region while enough space is left for the sliding window mapping sub-image regions; once the scale from the original input image to the full-map mapping sub-image region is determined, the farthest detection distance L1 detectable on the full-map mapping sub-image region is also determined;
step 122, selecting, in the remaining space, a rectangular area as a sliding window mapping sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and reduced, falls into the sliding window mapping sub-image region and can be detected by the detection algorithm; and adjusting the scale so that a target at the farthest detection distance L2 on the original input image can be detected;
step 123, repeating step 122 to determine the sizes of the other sliding window mapping sub-image regions, gradually extending the farthest detection distance until no suitable space remains to be set as a sliding window mapping sub-image region;
and step 124, repeating steps 121 to 123 to adjust the size and corresponding scale of each rectangular sub-image region so that the detection distance is maximized.
Further, in step 122, the target at the farthest detection distance L2 on the original input image stays in the original image for longer than one full scanning cycle of the sliding window.
Further, the plurality of rectangular sliding windows differ in size and/or aspect ratio;
the preset scanning rule is to scan the whole image at a constant speed from left to right and from top to bottom, to scan the whole image according to a random movement rule, or to scan the whole image in an order set by the user.
Further, the detection result of each sliding window sub-image is obtained, and the moving speed and/or dwell time of the rectangular sliding window during scanning is adaptively adjusted according to the detection result.
The invention also provides a device for improving the detection performance of neural network target detection, comprising the following structures:
a sliding window setting module for determining the sizes of a plurality of rectangular sliding windows for scanning according to the input size of the target detection neural network algorithm and the size of the original input image, the rectangular sliding windows moving frame by frame to different positions on the original input image according to a preset scanning rule;
an image preprocessing module, connected to the sliding window setting module, for obtaining the sliding window sub-image inside each rectangular sliding window during the detection of each frame, rotating it by a preset angle and scaling it to generate a sliding window rotation mapping sub-image, and scaling the original input image to generate a full-map mapping sub-image, the resolution of the full-map mapping sub-image being lower than that of the sliding window rotation mapping sub-images, and for stitching the full-map mapping sub-image and the sliding window rotation mapping sub-images corresponding to the rectangular sliding windows into one rectangular input image serving as the detection input image;
and a target detection module, connected to the sliding window setting module and the image preprocessing module, for detecting the detection input image with a target detection neural network algorithm corresponding to the input scale.
Further, the device also comprises a result display module for merging and mapping the detection results of the sub-images onto the original input image for display and output.
Further, the sliding window setting module comprises an input size determination unit and a rectangular sub-image division unit;
the input size determination unit is configured to: determine, according to the computing power of the current device, the input size of the target detection neural network algorithm to be adopted, and the minimum and maximum detection sizes of targets that the algorithm can detect at that input size;
the rectangular sub-image division unit is configured to: divide the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being larger than or equal to the minimum detection size of the algorithm, wherein the rectangular sub-image regions comprise a full-map mapping sub-image region and sliding window mapping sub-image regions, the full-map mapping sub-image region having the same aspect ratio as the original input image and holding the full-map mapping sub-image, while each sliding window mapping sub-image region holds the corresponding sliding window rotation mapping sub-image;
wherein the rectangular sub-image regions are divided by the following steps:
step 121, determining the size of the full-map mapping sub-image region on the input rectangle of the detection neural network algorithm: selecting an initial scale according to the size of the target at the nearest detection distance L0 on the original input image, such that the nearest target is scaled to be smaller than or equal to the maximum detection size of the algorithm, so that the nearest target can be detected on the full-map mapping sub-image region while enough space is left for the sliding window mapping sub-image regions; once the scale from the original input image to the full-map mapping sub-image region is determined, the farthest detection distance L1 detectable on the full-map mapping sub-image region is also determined;
step 122, selecting, in the remaining space, a rectangular area as a sliding window mapping sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and reduced, falls into the sliding window mapping sub-image region and can be detected by the detection algorithm; and adjusting the scale so that a target at the farthest detection distance L2 on the original input image can be detected;
step 123, repeating step 122 to determine the sizes of the other sliding window mapping sub-image regions, gradually extending the farthest detection distance until no suitable space remains to be set as a sliding window mapping sub-image region;
and step 124, repeating steps 121 to 123 to adjust the size and corresponding scale of each rectangular sub-image region so that the detection distance is maximized.
Owing to the above technical solutions, compared with the prior art, the invention has the following advantages and positive effects, given by way of example: by using the fact that distant and nearby detection targets in a monitoring video have motion vectors of different magnitudes, low-frame-rate detection of small distant targets is combined with high-frame-rate detection of nearby targets; the angle of the detection input image can be adjusted according to the characteristics of the detection target; and the stitched image matches a detection neural network with a fixed input scale, which reduces the computing power and bandwidth requirements of the target detection algorithm on monitoring edge computing devices and optimizes the target detection distance and detection accuracy. The method is particularly suitable for improving target detection performance in multiple orientations (such as inverted and horizontal targets) or under different degrees of depth-of-field blur.
Compared with existing multi-scale target detection methods: first, a multi-scale method usually needs to detect input images at multiple resolution levels from high to low, which requires large computing power and bandwidth. Second, a common neural-network-based multi-scale detection method needs a dedicated network designed for each input scale, so the complexity of designing and training the networks is high, whereas the present method uses only one fixed input scale, which significantly simplifies the design and training of the neural network. Third, when the aspect ratio of the original input image does not match the aspect ratio of the neural network input, the existing practice of padding with black borders wastes computing power and bandwidth, while the method of the invention makes full use of the hardware, improving the detection capability of the target detection device and the utilization efficiency of its computing power.
Drawings
Fig. 1 is a diagram illustrating the relationship between the size and the distance of a target on an input image according to an embodiment of the present invention.
Fig. 2 is an example of filling black borders into the input image of a detection algorithm in the prior art.
Fig. 3 is a flowchart of a method for improving the detection performance of neural network target detection according to an embodiment of the present invention.
Fig. 4 is a diagram illustrating the rotation processing that generates a sliding window rotation mapping sub-image according to an embodiment of the present invention.
Fig. 5 is a diagram illustrating the stitching operation that generates a detection input image according to an embodiment of the present invention.
Description of reference numerals:
a large-size face 10, a medium-size face 20, and a small-size face 30;
original input image 100, detection input image 200, black border 300.
Detailed Description
The following describes the method and device for improving the detection performance of neural network target detection in detail with reference to the accompanying drawings and specific embodiments. It should be noted that the technical features or combinations of technical features described in the following embodiments should not be considered isolated; they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components and may be applied in different embodiments; thus, once an item is defined in one drawing, it need not be discussed again in subsequent drawings.
It should be noted that the structures, proportions and sizes shown in the drawings and described in the specification are only intended to aid the understanding and reading of the present disclosure and are not intended to limit the scope of the invention, which is defined by the claims; any modification of structure, change of proportion or adjustment of size shall still fall within the scope of the invention as long as the function and objectives of the invention are not affected. The scope of the preferred embodiments of the present invention also includes implementations in which functions are executed out of the order described or discussed, including in a substantially concurrent manner or in reverse order, as would be understood by those skilled in the art.
Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are, where appropriate, intended to be part of the specification. In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative rather than limiting; other examples of the exemplary embodiments may therefore use different values.
Embodiments
As shown in fig. 1, in a monitoring video a detection target near the camera is large, its motion vector in the video is relatively large, and it moves quickly across the picture, so a high detection frame rate is required to avoid missed detection; a distant detection target is relatively small in the image, its motion vector is relatively small, and it moves slowly across the picture, so it can be detected at a low detection frame rate.
The detection target may be, for example and without limitation, a face, a figure or a vehicle type. Fig. 1 illustrates the case of a face as the detection target: the monitoring input image contains 3 detection targets, a large-size face 10 at distance L0, a medium-size face 20 at distance L1 and a small-size face 30 at distance L2, whose distances from the camera increase in that order.
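To make the motion-vector argument concrete: under a simple pinhole-camera assumption, a target's on-screen size and apparent speed both fall off roughly as 1/distance. The following sketch uses made-up numbers, not values from the patent, to show why nearby targets demand a high detection frame rate while distant ones tolerate a low one.

```python
# Illustrative only: under a pinhole model, apparent speed scales with
# focal_px / distance, so near targets sweep many pixels per frame and far
# targets only a few. All constants are assumed example values.

def apparent_px_per_frame(speed_mps, distance_m, focal_px=1200.0, fps=25.0):
    """Pixels a laterally moving target shifts per frame at a given distance."""
    return speed_mps * focal_px / (distance_m * fps)

for d in (2.0, 10.0, 40.0):      # near (like L0) to far (like L2), in metres
    print(f"{d:5.1f} m -> {apparent_px_per_frame(1.5, d):5.1f} px/frame")
# 2 m: 36 px/frame, 10 m: 7.2 px/frame, 40 m: 1.8 px/frame; the far target can
# safely be revisited by a sliding window once every few dozen frames.
```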
On the other hand, the parameters and network structure of a neural-network-based detection algorithm usually correspond to a specified input size; the size of the detection input image cannot be adjusted at will, and changing the input size may require redesigning and retraining the network. In most cases the original input image and the detection input image do not match, and forcibly changing the aspect ratio when scaling may reduce detection accuracy. It is therefore often necessary to pad the scaled original input image with black borders to generate a detection input image (also called the detection algorithm input image) that satisfies the aspect ratio required by the detection network, as shown in fig. 2: the size of the original input image 100 in fig. 2 is 1920 × 1080 (width × height), the size of the detection input image 200 is 320 × 240 (width × height), and the black border 300 lies below the scaled image inside the detection input image 200. The black border area still participates in the computation, wasting computing power and bandwidth; if the black border area is replaced with image content, the computing power can be fully used to improve detection performance.
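The waste caused by letterboxing can be quantified with a short calculation. The numbers below match the fig. 2 example; the code itself is an illustration, not part of the patent.

```python
# How much of a 320 x 240 network input is wasted when a 16:9 frame is
# letterboxed into it, as in fig. 2.
src_w, src_h = 1920, 1080    # original input image 100
dst_w, dst_h = 320, 240      # detection input image 200

scale = min(dst_w / src_w, dst_h / src_h)                    # 1/6, keeps aspect ratio
used_w, used_h = round(src_w * scale), round(src_h * scale)  # 320 x 180
wasted = 1.0 - (used_w * used_h) / (dst_w * dst_h)

print(f"scaled image {used_w} x {used_h}; black border wastes {wasted:.0%} of the input")
# -> the 320 x 60 black border 300 wastes 25% of the computed input area,
#    which the invention fills with sliding-window sub-images instead.
```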
Based on the above principles, the invention provides a method for improving the detection performance of neural network target detection. Referring to fig. 3, the method comprises the following steps:
S100, determining the sizes of a plurality of rectangular sliding windows for scanning according to the input size of the target detection neural network algorithm and the size of the original input image, wherein the rectangular sliding windows move frame by frame to different positions on the original input image according to a preset scanning rule.
S200, during the detection of each frame, obtaining the sliding window sub-image inside each rectangular sliding window, rotating it by a preset angle and scaling it to generate a sliding window rotation mapping sub-image, and scaling the original input image to generate a full-map mapping sub-image, wherein the resolution of the full-map mapping sub-image is lower than that of the sliding window rotation mapping sub-images.
The preset angle may be a specific angle set by the user according to the detection task; for example, when performing face detection on a swimming pool monitoring video, the user may set the preset rotation angle to 90 degrees to improve the face detection rate. Alternatively, the detection system may set the preset angle for the current detection according to the rotation angle used in the previous detection.
S300, stitching the full-map mapping sub-image and the sliding window rotation mapping sub-images corresponding to the rectangular sliding windows into one rectangular input image, which serves as the detection input image.
S400, detecting the detection input image through a target detection neural network algorithm corresponding to the input scale.
Step S400 may be followed by the following step: merging and mapping the detection results of the full-map mapping sub-image and the sliding window rotation mapping sub-images onto the original input image for output.
Preferably, when the detection target is a horizontal or inverted target, the sliding window sub-image is rotated by 90, 180 or 270 degrees respectively, and the aspect ratio of the rectangular sliding window is matched to the size of the detection target. Since such horizontal or inverted targets do not move, or move only slowly, in the video, they are suitable for detection at a low frame rate. The detection rate of roughly horizontal or inverted targets can thus be increased at low computational cost.
According to the technical solution of the invention, rectangular sliding windows with different sizes or aspect ratios are preset, and each rectangular sliding window moves frame by frame to different positions on the original input image according to the set rule. When each frame is detected, the regions selected by the rectangular sliding windows on the original image (the sliding window sub-images) are first rotated by a preset angle, such as 90, 180 or 270 degrees, and then scaled to generate the corresponding sliding window rotation mapping sub-images, as shown in fig. 4; the original input image is then reduced to a lower-resolution full-map mapping sub-image; the full-map mapping sub-image and the sliding window rotation mapping sub-images are stitched into a rectangular input image smaller than the original input image, which is used as the detection input image; finally, the detection input image is detected with a neural network target detection algorithm corresponding to the input scale, and the detection results of all sub-images are merged and mapped back onto the original input image. This technical solution allows a detection neural network algorithm with a small input size to be used while still detecting very far and very near monitored targets in real time. In particular, it can improve the detection rate of the neural network for horizontal and inverted targets, thus improving the accuracy, detection distance and frame rate of the target detection function on monitoring edge computing devices and reducing the missed and false detection rates.
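The per-frame preprocessing described above (crop, rotate, scale, stitch) can be sketched as follows. This is a minimal illustration: the 320 x 240 input size, the 320 x 180 full-map region and the window-list format are assumptions, not the patent's concrete configuration.

```python
# Minimal sketch of steps S200/S300 with OpenCV and NumPy (assumed layout).
import cv2
import numpy as np

NET_W, NET_H = 320, 240          # detection network input size (assumed)
ROT = {90: cv2.ROTATE_90_CLOCKWISE, 180: cv2.ROTATE_180,
       270: cv2.ROTATE_90_COUNTERCLOCKWISE}

def build_detection_input(frame, windows):
    """frame: HxWx3 original image. windows: dicts with 'rect'=(x, y, w, h) on the
    original image, 'angle' in {0, 90, 180, 270}, and 'dst'=(x, y, w, h) inside
    the stitched canvas. Returns the stitched detection input image."""
    canvas = np.zeros((NET_H, NET_W, 3), dtype=frame.dtype)

    # Full-map mapping sub-image: the whole frame reduced to low resolution.
    full_w, full_h = 320, 180    # same aspect ratio as a 16:9 original (assumed)
    canvas[:full_h, :full_w] = cv2.resize(frame, (full_w, full_h))

    # Sliding window rotation mapping sub-images: crop, rotate, scale, paste.
    for win in windows:
        x, y, w, h = win["rect"]
        sub = frame[y:y + h, x:x + w]
        if win["angle"] in ROT:
            sub = cv2.rotate(sub, ROT[win["angle"]])
        dx, dy, dw, dh = win["dst"]
        canvas[dy:dy + dh, dx:dx + dw] = cv2.resize(sub, (dw, dh))
    return canvas
```

A fixed-input-size detector can then run on the stitched canvas every frame, while each window's rect advances per frame according to the scanning rule; the remaining 320 x 60 strip below the full-map region holds the window sub-images, in the spirit of the three-region division of fig. 4.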
In this embodiment, preferably, the sizes of the plurality of rectangular sliding windows for scanning are determined according to the input size of the target detection neural network algorithm and the size of the original input image as follows:
Step 110, determining, according to the computing power of the current device, the input size of the target detection neural network algorithm to be adopted, and the minimum and maximum detection sizes of targets that the algorithm can detect at that input size.
Step 120, dividing the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being larger than or equal to the minimum detection size of the algorithm. The rectangular sub-image regions comprise a full-map mapping sub-image region and sliding window mapping sub-image regions; the full-map mapping sub-image region has the same aspect ratio as the original input image and holds the original input image reduced to low resolution, i.e., the full-map mapping sub-image, while each sliding window mapping sub-image region holds the corresponding sliding window rotation mapping sub-image.
The rectangular sub-image regions are divided by the following steps:
Step 121, determining the size of the full-map mapping sub-image region on the input rectangle of the detection neural network algorithm: an initial scale is selected according to the size of the target at the nearest detection distance L0 on the original input image, such that the nearest target is scaled to be smaller than or equal to the maximum detection size of the algorithm; the nearest target can thus be detected on the full-map mapping sub-image region while enough space is left for the sliding window mapping sub-image regions. Once the scale from the original input image to the full-map mapping sub-image region is determined, the farthest detection distance L1 detectable on the full-map mapping sub-image region is also determined.
Step 122, selecting, in the remaining space, a rectangular area as a sliding window mapping sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and reduced, falls into the sliding window mapping sub-image region and can be detected by the detection algorithm; the scale is adjusted so that a target at the farthest detection distance L2 on the original input image can be detected. The target at the farthest detection distance L2 stays in the original input image for longer than one full scanning cycle of the sliding window.
Step 123, repeating step 122 to determine the sizes of the other sliding window mapping sub-image regions, gradually extending the farthest detection distance until no suitable space remains to be set as a sliding window mapping sub-image region.
Step 124, repeating steps 121 to 123 to adjust the size and corresponding scale of each rectangular sub-image region so that the detection distance is maximized.
The rectangular sub-image region division method provided by the invention determines, step by step from near to far, the sub-image region size and scale corresponding to targets in each distance range, according to the correspondence between target size and distance on the original input image and the constraint between a target's dwell time in the image and the scanning period of the sliding window.
In this embodiment, the plurality of rectangular sliding windows have different sizes and/or aspect ratios.
The scanning rule can be a system default or personalized by the user as needed. Preferably, the preset scanning rule is to scan the whole image at a constant speed from left to right and from top to bottom, to scan the whole image according to a random movement rule, or to scan the whole image in an order set by the user. Furthermore, the detection result of each sliding window sub-image can be obtained, and the moving speed and/or dwell time of the rectangular sliding window during scanning can be adaptively adjusted according to the result.
The implementation steps of the invention are described in detail below with reference to fig. 4 and fig. 5, taking the setting of 2 rectangular sliding windows as an example.
Step 1: determine, according to the computing power of the current device, the input size of the target detection neural network algorithm to be adopted, and the minimum and maximum detection sizes of targets that the algorithm can detect at that input size.
Step 2: divide the input rectangle of the detection network into a plurality of rectangular sub-image regions according to the given input size of the detection neural network; fig. 4 illustrates a division into 3 rectangular sub-image regions, each of which is larger than or equal to the minimum detection size of the algorithm. One sub-image region keeps the same aspect ratio as the original input image; it is the full-map mapping sub-image region and holds the full-map mapping sub-image. The remaining 2 rectangular sub-image regions, called sliding window mapping sub-image regions, hold the sliding window rotation mapping sub-images obtained after the sliding window sub-images are rotated and scaled; the process of generating a sliding window rotation mapping sub-image from a sliding window sub-image is shown in fig. 4.
The size of the full-map mapping sub-image region corresponds to a certain reduction ratio of the original input image. It therefore corresponds to a certain range of target sizes detectable on the original input image, i.e., to targets within a certain range of distances from the camera.
The rectangular sub-image regions are divided by the following steps:
Step 21: first, determine the size of the full-map mapping sub-image region on the input rectangle of the detection algorithm. According to the size of the target at the nearest detection distance L0 on the original input image, select an appropriate initial scale so that the nearest target is scaled to be smaller than or equal to the maximum target size detectable by the algorithm; the nearest target can then be detected on the full-map mapping sub-image region while enough space is left for the sliding window mapping sub-image regions. Once the scale from the original input image to the full-map mapping sub-image region is determined, the farthest detection distance L1 detectable on the full-map mapping sub-image region is also determined, i.e., the size on the original input image of a target that maps to the minimum detection size on the full-map mapping sub-image region.
Step 22: next, in the remaining space, select a suitable rectangular area as a sliding window mapping sub-image region, so that a target at distance L1 on the original input image, after being rotated by the preset angle and reduced, falls into the sliding window mapping sub-image region and can still be detected by the detection algorithm, i.e., remains larger than or equal to the minimum detection size. According to the scale, this sliding window mapping sub-image region corresponds to a sliding window region of a certain size on the original input image. The sliding window moves every frame according to the set rule and scans the whole original input image within a certain period. The scale is adjusted so that a target at the farthest detection distance L2 on the original input image can be detected, i.e., a target at distance L2 is reduced to no smaller than the minimum detection size, and a target at distance L2 stays in the original input image for longer than one full scanning cycle of the sliding window (for example, if a target at L2 remains in the picture for 10 seconds at 25 fps, the sliding window must complete a full scan within 250 frames).
The sliding window movement rule (i.e., the scanning rule) may be to scan the whole image at a constant speed from left to right and from top to bottom, to scan the whole image in a specific order, or to scan the whole image according to a random movement rule. Furthermore, the moving speed or dwell time of the sliding window can be adjusted adaptively according to the detection result.
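A possible form of such a scanning rule is sketched below: a raster-scan generator whose step sizes determine the scanning period. The dimensions are example values, not values fixed by the patent.

```python
# Assumed raster scanning rule: the window visits a grid of positions, one
# step per frame, and the generator repeats once a full scan is complete.
from typing import Iterator, Tuple

def raster_positions(img_w: int, img_h: int, win_w: int, win_h: int,
                     step_x: int, step_y: int) -> Iterator[Tuple[int, int]]:
    """Yield top-left window positions left-to-right, top-to-bottom, forever."""
    while True:
        for y in range(0, img_h - win_h + 1, step_y):
            for x in range(0, img_w - win_w + 1, step_x):
                yield x, y

# Example: a 640 x 360 window over a 1920 x 1080 frame in half-window steps
# gives 5 x 5 = 25 positions, i.e. one full scanning cycle every 25 frames.
scanner = raster_positions(1920, 1080, 640, 360, 320, 180)
x, y = next(scanner)    # window position for the current frame
```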
Step 23: determine the sizes of the other sliding window mapping sub-image regions in turn, repeating step 22 to extend the farthest detection distance step by step until no suitable space remains to be set as a sliding window mapping sub-image region.
Step 24: return to steps 21 to 23 and adjust the size and corresponding scale of each sub-image region so that the detection distance is maximized.
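The following numeric sketch walks through steps 21 to 24 for a single window region. The pinhole size-distance model and every constant are illustrative assumptions; the patent fixes only the constraints, not concrete values.

```python
# Greatly simplified region planning (steps 21-24); all numbers are assumed.
NET_W, NET_H = 320, 240        # network input size
MIN_DET, MAX_DET = 12, 160     # detectable target size range in network pixels
SRC_W, SRC_H = 1920, 1080      # original input image
FACE_AT_1M = 600.0             # face height in source pixels at 1 m (assumed optics)

def face_px(dist_m: float) -> float:
    return FACE_AT_1M / dist_m  # pinhole model: size falls off as 1/distance

# Step 21: the full-map scale must fit the network input and keep the nearest
# target (at L0) at or below MAX_DET.
L0 = 1.0
full_scale = min(NET_W / SRC_W, NET_H / SRC_H, MAX_DET / face_px(L0))   # 1/6 here
full_w, full_h = round(SRC_W * full_scale), round(SRC_H * full_scale)   # 320 x 180
L1 = FACE_AT_1M * full_scale / MIN_DET       # ~8.3 m: full-map detection limit

# Step 22: a sliding-window region uses a milder reduction, so a target at L1
# still exceeds MIN_DET, extending coverage to a farther distance L2.
win_scale = 0.5                              # assumed; must keep L1 targets detectable
assert face_px(L1) * win_scale >= MIN_DET
L2 = FACE_AT_1M * win_scale / MIN_DET        # 25 m

print(f"full map {full_w}x{full_h} covers {L0:.1f}-{L1:.1f} m; "
      f"window region extends coverage to {L2:.1f} m")
# Steps 23-24: repeat for further window regions and re-balance sizes and
# scales until the farthest detection distance can no longer be extended.
```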
Step 3: during the detection of each frame, obtain the sliding window sub-image inside each rectangular sliding window, rotate it by the preset angle and scale it to generate the sliding window rotation mapping sub-image, and scale the original input image to generate the full-map mapping sub-image. Stitch the full-map mapping sub-image and the sliding window rotation mapping sub-images corresponding to the rectangular sliding windows into one rectangular input image used as the detection input image, as shown in fig. 5.
Step 4: finally, detect the detection input image with the corresponding neural network target detection algorithm, and merge and map the detection results of all sub-images onto the original input image.
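Mapping a box detected inside a sliding window rotation mapping sub-image back onto the original input image must undo the stitching offset, the scaling and the rotation, in the reverse order of the preprocessing. A minimal sketch follows; the (x, y, w, h) box convention and clockwise rotations are assumptions for illustration.

```python
# Inverse of the preprocessing for one detection box (assumed conventions).
def box_to_original(box, dst, rect, angle):
    """box: (x, y, w, h) in the stitched detection input image.
    dst: (x, y, w, h) of the sub-image region inside the stitched image.
    rect: (x, y, w, h) of the sliding window on the original image.
    angle: clockwise rotation applied in preprocessing (0, 90, 180 or 270)."""
    bx, by, bw, bh = box
    dx, dy, dw, dh = dst
    rx, ry, rw, rh = rect

    # 1) into sub-image coordinates, 2) undo the scale (rotated window size).
    rot_w, rot_h = (rh, rw) if angle in (90, 270) else (rw, rh)
    sx, sy = rot_w / dw, rot_h / dh
    x, y, w, h = (bx - dx) * sx, (by - dy) * sy, bw * sx, bh * sy

    # 3) undo the clockwise rotation inside the window.
    if angle == 90:
        x, y, w, h = y, rot_w - x - w, h, w
    elif angle == 180:
        x, y = rot_w - x - w, rot_h - y - h
    elif angle == 270:
        x, y, w, h = rot_h - y - h, x, h, w

    # 4) back to original-image coordinates via the window position.
    return rx + x, ry + y, w, h
```

Boxes from the full-map mapping sub-image only need the scale and offset steps, since the full map is never rotated.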
According to the method, the original input image is reduced to a lower-resolution full-map mapping sub-image, so nearby targets are detected at a high frame rate while small distant targets are detected at a lower frame rate, and each sliding window sub-image retains a higher resolution. Therefore, when a monitoring edge computing device uses a neural network target detection algorithm with a small input size, both near and far targets can still be detected as far as possible. In particular, the method can be applied to scenarios in which the sliding window sub-image is generated after rotation, scaling or other color processing of a local region of the original input image, and can significantly improve target detection performance in multiple orientations (such as inverted and horizontal targets) or under different degrees of depth-of-field blur.
The invention further provides a device for improving the detection performance of neural network target detection.
The device comprises a sliding window setting module, an image preprocessing module and a target detection module.
The sliding window setting module is used to determine the sizes of a plurality of rectangular sliding windows for scanning according to the input size of the target detection neural network algorithm and the size of the original input image; the rectangular sliding windows move frame by frame to different positions on the original input image according to a preset scanning rule.
The image preprocessing module is connected to the sliding window setting module and is used to obtain the sliding window sub-image inside each rectangular sliding window during the detection of each frame, rotate it by a preset angle and scale it to generate a sliding window rotation mapping sub-image, and scale the original input image to generate a full-map mapping sub-image, the resolution of the full-map mapping sub-image being lower than that of the sliding window rotation mapping sub-images; and to stitch the full-map mapping sub-image and the sliding window rotation mapping sub-images corresponding to the rectangular sliding windows into one rectangular input image serving as the detection input image.
The target detection module is connected to the sliding window setting module and the image preprocessing module and is used to detect the detection input image with a target detection neural network algorithm corresponding to the input scale.
Further, the device may also include a result display module for merging and mapping the detection results of the sub-images onto the original input image and displaying the result.
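As an illustration of how these modules cooperate, the skeleton below wires them into a per-frame loop. All class and method names are hypothetical, chosen for readability rather than taken from the patent.

```python
# Assumed module skeleton; bodies are elided with "..." placeholders.
class SlidingWindowSetting:
    def __init__(self, net_size, src_size):
        self.windows = self.plan_windows(net_size, src_size)   # sizes + scan rule
    def plan_windows(self, net_size, src_size): ...
    def positions(self, frame_idx): ...        # per-frame window placement

class ImagePreprocessing:
    def __init__(self, setting): self.setting = setting
    def build_input(self, frame, frame_idx): ...   # rotate, scale, stitch sub-images

class TargetDetection:
    def __init__(self, net): self.net = net
    def detect(self, det_input): ...           # fixed-input-scale neural network

class ResultDisplay:
    def show(self, frame, boxes): ...          # draw boxes mapped back to the frame

def run(frames, net, net_size, src_size):
    setting = SlidingWindowSetting(net_size, src_size)
    pre, det, out = ImagePreprocessing(setting), TargetDetection(net), ResultDisplay()
    for i, frame in enumerate(frames):
        boxes = det.detect(pre.build_input(frame, i))   # steps S200-S400
        out.show(frame, boxes)                 # results merged per the display module
```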
In this embodiment, the sliding window setting module may include an input size determination unit and a rectangular sub-image division unit.
The input size determination unit is configured to: determine, according to the computing power of the current device, the input size of the target detection neural network algorithm to be adopted, and the minimum and maximum detection sizes of targets that the algorithm can detect at that input size.
The rectangular sub-image division unit is configured to: divide the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being larger than or equal to the minimum detection size of the algorithm; the rectangular sub-image regions comprise a full-map mapping sub-image region and sliding window mapping sub-image regions, the full-map mapping sub-image region having the same aspect ratio as the original input image and holding the full-map mapping sub-image, while each sliding window mapping sub-image region holds the corresponding sliding window rotation mapping sub-image.
The rectangular sub-image regions are divided by the following steps:
Step 121, determining the size of the full-map mapping sub-image region on the input rectangle of the detection neural network algorithm: an initial scale is selected according to the size of the target at the nearest detection distance L0 on the original input image, such that the nearest target is scaled to be smaller than or equal to the maximum detection size of the algorithm, so that the nearest target can be detected on the full-map mapping sub-image region while enough space is left for the sliding window mapping sub-image regions; once the scale from the original input image to the full-map mapping sub-image region is determined, the farthest detection distance L1 detectable on the full-map mapping sub-image region is also determined.
Step 122, selecting, in the remaining space, a rectangular area as a sliding window mapping sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and reduced, falls into the sliding window mapping sub-image region and can be detected by the detection algorithm; the scale is adjusted so that a target at the farthest detection distance L2 on the original input image can be detected.
Step 123, repeating step 122 to determine the sizes of the other sliding window mapping sub-image regions, gradually extending the farthest detection distance until no suitable space remains to be set as a sliding window mapping sub-image region.
Step 124, repeating steps 121 to 123 to adjust the size and corresponding scale of each rectangular sub-image region so that the detection distance is maximized.
In this embodiment, the plurality of rectangular sliding windows have different sizes and/or aspect ratios.
The preset scanning rule may be to scan the whole image at a constant speed from left to right and from top to bottom, to scan the whole image according to a random movement rule, or to scan the whole image in an order set by the user.
Other technical features are described in the previous embodiment and are not described in detail herein.
The foregoing disclosure is not intended to limit the invention to the aspects described. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising", "including" and "having" should by default be interpreted as inclusive or open-ended rather than exclusive or closed-ended, unless expressly defined to the contrary. All technical, scientific or other terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. Common terms found in dictionaries should not be interpreted too ideally or too impractically in the context of related art documents unless the present disclosure expressly limits them to that. Any changes and modifications of the present invention based on the above disclosure fall within the scope of the appended claims.

Claims (8)

1. A method for improving the detection performance of a target of a neural network is characterized by comprising the following steps:
determining the sizes of a plurality of rectangular sliding windows for scanning according to the input size of a target detection neural network algorithm and the size of an original input image, wherein the rectangular sliding windows can move to different positions on the original input image according to a preset scanning rule and frames;
during each frame detection, obtaining sliding window subgraphs in each rectangular sliding window, rotating the sliding window subgraphs according to a preset angle, carrying out scaling processing to generate sliding window rotation mapping subgraphs, and carrying out scaling processing on an original input image to generate full-map mapping subgraphs, wherein the resolution of the full-map mapping subgraphs is lower than that of the sliding window rotation mapping subgraphs;
combining and splicing the full map mapping subgraph and the sliding window rotation mapping subgraphs corresponding to the rectangular sliding windows into a rectangular input image which is used as a detection input image;
detecting the detection input image through a target detection neural network algorithm corresponding to the input scale;
wherein determining the sizes of the plurality of rectangular sliding windows used for scanning comprises the following steps:
step 110, determining, according to the computing power of the current device, the input size of the target detection neural network algorithm to be used, together with the minimum detection size and maximum detection size of targets the algorithm can detect at that input size;
step 120, dividing the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being at least the minimum detection size the algorithm can detect; the rectangular sub-image regions comprise a full-image mapped sub-image region and sliding-window mapped sub-image regions, the full-image mapped sub-image region having the same aspect ratio as the original input image and carrying the full-image mapped sub-image, and each sliding-window mapped sub-image region carrying the corresponding sliding-window rotation-mapped sub-image;
wherein the rectangular sub-image regions are divided as follows:
step 121, determining the size of the full-image mapped sub-image region on the input rectangle of the detection neural network algorithm: selecting an initial scale according to the size, on the original input image, of a target at the nearest detection distance L0, such that this nearest target is scaled down to at most the maximum detection size the algorithm can detect; this allows the nearest target to be detected on the full-image mapped sub-image region while leaving enough space for the sliding-window mapped sub-image regions; once the scale from the original input image to the full-image mapped sub-image region is fixed, determining the farthest detection distance L1 detectable on the full-image mapped sub-image region;
step 122, selecting a rectangular area in the remaining space as a sliding-window mapped sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and scaled down, falls within this sliding-window mapped sub-image region and can be detected by the detection algorithm; and adjusting the scaling so that a target at the farthest detection distance L2 on the original input image can also be detected;
step 123, repeating step 122 to determine the sizes of the remaining sliding-window mapped sub-image regions, progressively extending the farthest detection distance, until no suitable space is left to serve as a sliding-window mapped sub-image region;
and step 124, repeating steps 121 to 123 to adjust the size and scaling of each rectangular sub-image region so that the detection distance is maximized.
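For concreteness, the following minimal Python sketch illustrates one way the layout procedure of steps 110-124 could be realized. It is not the patented implementation: the function names, the half-width cap on the full-image region, and all numeric values are assumptions made for illustration.

def full_image_scale(nearest_target_px, max_det):
    # Step 121: choose the initial scale so the nearest (hence largest)
    # target shrinks to at most the maximum detectable size.
    return min(1.0, max_det / nearest_target_px)

def layout_submaps(net_w, net_h, img_w, img_h, nearest_target_px, min_det, max_det):
    scale = full_image_scale(nearest_target_px, max_det)
    # Cap the full-image region (here: half the network input width) so
    # space is left over for sliding-window regions -- an assumed heuristic.
    full_w = min(int(img_w * scale), net_w // 2)
    full_h = min(int(full_w * img_h / img_w), net_h)  # keep original aspect ratio
    regions = [("full", 0, 0, full_w, full_h)]
    # Steps 122-123: greedily fill the remaining horizontal strip with
    # sliding-window regions, each at least the minimum detectable size.
    # Each added region extends the farthest detection distance (L1, L2, ...)
    # because its crop is mapped at higher resolution than the full image.
    x = full_w
    while net_w - x >= min_det:
        w = min(net_w - x, full_w)
        regions.append(("window", x, 0, w, min(net_h, w)))
        x += w
    return regions

# e.g. a 1280x720 network input fed from a 1920x1080 camera:
print(layout_submaps(1280, 720, 1920, 1080, nearest_target_px=300,
                     min_det=32, max_det=320))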
2. The method of claim 1, further comprising the step of: merging the detection results of the sub-images and mapping them onto the original input image.
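A hedged sketch of the merge-and-map step of claim 2: each detection box is translated out of its sub-image region, the rotation and scaling are undone, and the box is re-expressed in original-image coordinates. The region dictionary keys are hypothetical names, not claim language.

def box_to_original(box, region):
    # box = (x, y, w, h) in detection-input coordinates; 'region' describes
    # the sub-image the box fell in (all keys are assumed names).
    x, y, w, h = box
    x -= region["off_x"]                 # 1. make coordinates region-local
    y -= region["off_y"]
    if region["rot"] == 90:              # 2. undo a 90-degree CW rotation
        x, y, w, h = y, region["reg_w"] - x - w, h, w
    # (180- and 270-degree cases are analogous and omitted here)
    s = region["scale"]                  # 3. undo scaling, add window origin
    return (x / s + region["win_x"], y / s + region["win_y"], w / s, h / s)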
3. The method according to claim 1 or 2, wherein: when the target to be detected appears horizontal or inverted, the sliding-window image is rotated by 90, 180 or 270 degrees accordingly; and the aspect ratio of the rectangular sliding window is matched to the size of the detection target.
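The rotation in claim 3 maps a horizontal or inverted target back to the upright pose the detector was trained on (a person lying sideways, rotated 90 degrees, presents the expected upright silhouette). A minimal sketch using NumPy, an assumed choice of library:

import numpy as np

def rotate_crop(crop: np.ndarray, angle: int) -> np.ndarray:
    # np.rot90 rotates counter-clockwise; k is the number of 90-degree turns.
    assert angle in (0, 90, 180, 270)
    return np.rot90(crop, k=angle // 90)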
4. The method of claim 1, wherein: in step 122, a target at the farthest detection distance L2 on the original input image remains in the original image for longer than one round of sliding-window scanning, so that the window passes over the target before it leaves the scene.
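The timing condition of claim 4 is simple arithmetic: one scan round must complete before a target at distance L2 can leave the scene. All numbers below are illustrative assumptions.

num_window_positions = 12        # assumed positions in one scan round
dwell_per_position_s = 0.5       # assumed dwell time at each position
scan_round_s = num_window_positions * dwell_per_position_s   # 6.0 s

target_time_in_scene_s = 8.0     # assumed for a distant, slow-moving target
assert target_time_in_scene_s > scan_round_s   # the window reaches it in time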
5. The method of claim 4, wherein: the plurality of rectangular sliding windows differ in size and/or aspect ratio;
and the preset scanning rule is one of: scanning the whole image at a constant speed from left to right and from top to bottom, scanning the whole image according to a random movement rule, or scanning the whole image in an order set by the user.
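The three preset scanning rules of claim 5 could be generated as below; the grid granularity and the use of shuffling are illustrative assumptions, not claim language.

import random

def scan_positions(cols, rows, rule="raster", user_order=None):
    grid = [(c, r) for r in range(rows) for c in range(cols)]
    if rule == "raster":     # left to right, top to bottom, constant speed
        return grid
    if rule == "random":     # random movement rule
        random.shuffle(grid)
        return grid
    if rule == "user":       # order set by the user
        return list(user_order or [])
    raise ValueError(f"unknown scanning rule: {rule}")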
6. The method of claim 5, wherein: the detection result of each sliding-window sub-image is obtained, and the movement speed and/or dwell time of the rectangular sliding windows during scanning is adaptively adjusted according to the detection results.
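One plausible reading of claim 6's feedback loop, sketched with assumed factors and bounds: the window moves slower and dwells longer where sub-images yield detections, and speeds up over empty areas.

def adjust_schedule(num_detections, speed, dwell_s):
    # All factors and bounds are assumptions chosen for illustration.
    if num_detections > 0:
        return max(speed * 0.5, 1.0), dwell_s * 2.0    # linger on busy areas
    return min(speed * 1.5, 8.0), max(dwell_s * 0.5, 0.1)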
7. An apparatus for improving the detection performance of neural network target detection, comprising:
a sliding window setting module, configured to determine, according to the input size of the target detection neural network algorithm and the size of the original input image, the sizes of a plurality of rectangular sliding windows used for scanning, wherein each rectangular sliding window moves, frame by frame, to a different position on the original input image according to a preset scanning rule;
an image preprocessing module, connected with the sliding window setting module and configured to, during detection of each frame, obtain the sliding-window sub-image inside each rectangular sliding window, rotate each sliding-window sub-image by a preset angle and scale it to generate a sliding-window rotation-mapped sub-image, and scale the original input image to generate a full-image mapped sub-image, wherein the resolution of the full-image mapped sub-image is lower than that of the sliding-window rotation-mapped sub-images; and to stitch the full-image mapped sub-image and the sliding-window rotation-mapped sub-images corresponding to the rectangular sliding windows into a single rectangular image serving as the detection input image;
a target detection module, connected with the sliding window setting module and the image preprocessing module and configured to detect targets in the detection input image with the target detection neural network algorithm corresponding to that input size;
wherein the sliding window setting module comprises an input size determination unit and a rectangular sub-image division unit;
the input size determination unit is configured to: determine, according to the computing power of the current device, the input size of the target detection neural network algorithm to be used, together with the minimum detection size and maximum detection size of targets the algorithm can detect at that input size;
the rectangular sub-image division unit is configured to: divide the input rectangle of the detection neural network algorithm into a plurality of rectangular sub-image regions according to the input size, each rectangular sub-image region being at least the minimum detection size the algorithm can detect; the rectangular sub-image regions comprise a full-image mapped sub-image region and sliding-window mapped sub-image regions, the full-image mapped sub-image region having the same aspect ratio as the original input image and carrying the full-image mapped sub-image, and each sliding-window mapped sub-image region carrying the corresponding sliding-window rotation-mapped sub-image;
wherein the rectangular sub-image regions are divided as follows:
step 121, determining the size of the full-image mapped sub-image region on the input rectangle of the detection neural network algorithm: selecting an initial scale according to the size, on the original input image, of a target at the nearest detection distance L0, such that this nearest target is scaled down to at most the maximum detection size the algorithm can detect; this allows the nearest target to be detected on the full-image mapped sub-image region while leaving enough space for the sliding-window mapped sub-image regions; once the scale from the original input image to the full-image mapped sub-image region is fixed, determining the farthest detection distance L1 detectable on the full-image mapped sub-image region;
step 122, selecting a rectangular area in the remaining space as a sliding-window mapped sub-image region, such that a target at distance L1 on the original input image, after being rotated by the preset angle and scaled down, falls within this sliding-window mapped sub-image region and can be detected by the detection algorithm; and adjusting the scaling so that a target at the farthest detection distance L2 on the original input image can also be detected;
step 123, repeating step 122 to determine the sizes of the remaining sliding-window mapped sub-image regions, progressively extending the farthest detection distance, until no suitable space is left to serve as a sliding-window mapped sub-image region;
and step 124, repeating steps 121 to 123 to adjust the size and scaling of each rectangular sub-image region so that the detection distance is maximized.
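To show how the three claimed modules cooperate per frame, here is a structural sketch; the class and method names are invented for illustration, and only the data flow follows claim 7.

class SlidingWindowSetting:
    def windows_for_frame(self, frame_idx):
        """Return the rectangle positions for this frame per the scanning rule."""
        ...

class ImagePreprocessing:
    def build_detection_input(self, image, windows):
        """Crop each window, rotate and scale the crops, downscale the full
        image, and stitch everything into one rectangular detection input."""
        ...

class TargetDetection:
    def detect(self, detection_input):
        """Run the fixed-input-size detection network and return boxes."""
        ...

def process_frame(image, frame_idx, sw, pre, det):
    windows = sw.windows_for_frame(frame_idx)
    return det.detect(pre.build_detection_input(image, windows))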
8. The apparatus of claim 7, further comprising: a result display module, configured to merge the detection results of the sub-images and map them onto the original input image for display and output.
CN202010835666.4A 2020-08-18 2020-08-18 Method and device for improving detection performance of neural network target detection Active CN112084886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010835666.4A CN112084886B (en) 2020-08-18 2020-08-18 Method and device for improving detection performance of neural network target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010835666.4A CN112084886B (en) 2020-08-18 2020-08-18 Method and device for improving detection performance of neural network target detection

Publications (2)

Publication Number Publication Date
CN112084886A CN112084886A (en) 2020-12-15
CN112084886B (en) 2022-03-15

Family

ID=73727913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010835666.4A Active CN112084886B (en) 2020-08-18 2020-08-18 Method and device for improving detection performance of neural network target detection

Country Status (1)

Country Link
CN (1) CN112084886B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709407B (en) * 2020-08-18 2020-11-13 眸芯科技(上海)有限公司 Method and device for improving video target detection performance in monitoring edge calculation
CN112613570A (en) * 2020-12-29 2021-04-06 深圳云天励飞技术股份有限公司 Image detection method, image detection device, equipment and storage medium
TWI779626B (en) * 2021-05-25 2022-10-01 宏碁股份有限公司 Method for loading artificial intelligence module


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361589A (en) * 2014-11-12 2015-02-18 河海大学 High-resolution remote sensing image segmentation method based on inter-scale mapping
CN108229281B (en) * 2017-04-25 2020-07-17 北京市商汤科技开发有限公司 Neural network generation method, face detection device and electronic equipment
CN107679250B (en) * 2017-11-01 2020-12-01 浙江工业大学 Multi-task layered image retrieval method based on deep self-coding convolutional neural network
CN108830285B (en) * 2018-03-14 2021-09-21 江南大学 Target detection method for reinforcement learning based on fast-RCNN

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894375A (en) * 2009-05-21 2010-11-24 FUJIFILM Corporation Person tracking method and person tracking apparatus
WO2017219263A1 (en) * 2016-06-22 2017-12-28 Institute of Automation, Chinese Academy of Sciences Image super-resolution enhancement method based on a bidirectional recurrent convolutional neural network
CN106127204A (en) * 2016-06-30 2016-11-16 South China University of Technology Multi-direction meter-reading region detection algorithm based on fully convolutional neural networks
WO2018003212A1 (en) * 2016-06-30 2018-01-04 Clarion Co., Ltd. Object detection device and object detection method
CN107590456A (en) * 2017-09-06 2018-01-16 Zhang Qihan Small-target detection method for high-altitude video surveillance
CN109671058A (en) * 2018-12-05 2019-04-23 Wuhan Jingli Electronic Technology Co., Ltd. Defect detection method and system for large high-resolution images
CN110348312A (en) * 2019-06-14 2019-10-18 Wuhan University Real-time recognition method for human action behavior in regional video
CN111413702A (en) * 2020-05-13 2020-07-14 Fishery Machinery and Instrument Research Institute, Chinese Academy of Fishery Sciences Efficient target segmentation method for a broadband fish finder

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
X-LineNet: Detecting Aircraft in Remote Sensing Images by a Pair of Intersecting Line Segments; Haoran Wei et al.; arXiv:1907.12474v3; Dec. 31, 2019; pp. 1-12 *
Survey of the application of convolutional neural networks in object detection; Yu Jinyong et al.; Computer Science; Nov. 30, 2018; Vol. 45, No. 11A; pp. 17-26 *
Aircraft target detection method for high-resolution SAR images based on convolutional neural networks; Wang Siyu et al.; Journal of Radars; Apr. 30, 2017; Vol. 6, No. 2; pp. 195-203 *
Progress and prospects of deep learning in video object tracking; Guan Hao et al.; Acta Automatica Sinica; Jun. 30, 2016; Vol. 42, No. 6; pp. 834-847 *

Also Published As

Publication number Publication date
CN112084886A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN111709407B (en) Method and device for improving video target detection performance in monitoring edge calculation
CN112084886B (en) Method and device for improving detection performance of neural network target detection
TWI709107B (en) Image feature extraction method and saliency prediction method including the same
US20240078646A1 (en) Image processing method, image processing apparatus, and non-transitory storage medium
US20220222776A1 (en) Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution
US10977802B2 (en) Motion assisted image segmentation
US20220417590A1 (en) Electronic device, contents searching system and searching method thereof
US9615039B2 (en) Systems and methods for reducing noise in video streams
US20200327334A1 (en) Video frame segmentation using reduced resolution neural network and masks from previous frames
US10121229B2 (en) Self-portrait enhancement techniques
US10708525B2 (en) Systems and methods for processing low light images
WO2020192483A1 (en) Image display method and device
US20220222786A1 (en) Image processing method, smart device, and computer readable storage medium
EP3576017A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
US9019426B2 (en) Method of generating image data by an image device including a plurality of lenses and apparatus for generating image data
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
WO2019074601A1 (en) Object tracking for neural network systems
WO2020253618A1 (en) Video jitter detection method and device
CN107945111B (en) Image stitching method based on SURF (speeded up robust features) feature extraction and CS-LBP (local binary Pattern) descriptor
CN111507333B (en) Image correction method and device, electronic equipment and storage medium
CN111898668A (en) Small target object detection method based on deep learning
CN110503002B (en) Face detection method and storage medium
US20210150666A1 (en) Selective allocation of processing resources for processing image data
WO2022193132A1 (en) Image detection method and apparatus, and electronic device
CN111080543A (en) Image processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201210 8th floor, building 1, 298 Xiangke Road, Pudong New Area, Shanghai

Applicant after: MOUXIN TECHNOLOGY (SHANGHAI) Co.,Ltd.

Address before: Room 507, building 1, No. 800, Naxian Road, pilot Free Trade Zone, Pudong New Area, Shanghai 201210

Applicant before: MOUXIN TECHNOLOGY (SHANGHAI) Co.,Ltd.

GR01 Patent grant