CN111242066B - Large-size image target detection method, device and computer readable storage medium - Google Patents

Large-size image target detection method, device and computer readable storage medium

Info

Publication number
CN111242066B
CN111242066B (application numbers CN202010053891.2A / CN202010053891A)
Authority
CN
China
Prior art keywords
image
small
original image
detection result
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010053891.2A
Other languages
Chinese (zh)
Other versions
CN111242066A (en)
Inventor
李荣春
周鑫
窦勇
姜晶菲
牛新
苏华友
乔鹏
潘衡岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010053891.2A priority Critical patent/CN111242066B/en
Publication of CN111242066A publication Critical patent/CN111242066A/en
Application granted granted Critical
Publication of CN111242066B publication Critical patent/CN111242066B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a large-size image target detection method, a device, and a computer-readable storage medium. The method comprises the following steps: preprocessing an original image to obtain a plurality of groups of small images; inputting the groups of small images into a target detection network for batch target detection to obtain a detection result for each small image; screening the detection results of the small images; and mapping the screened detection results back to the original image to obtain the target detection result of the original image. Because the method divides the image into groups of small images, detects the small images group by group with the target detection network, and maps the small-image detection results back to the original image to obtain the original image's target detection result, image target detection is faster and the detection results are accurate, so the method is especially suitable for target detection on large-size images.

Description

Large-size image target detection method, device and computer readable storage medium
Technical Field
The application relates to the technical field of computer vision, and in particular to a large-size image target detection method and device, and a computer-readable storage medium.
Background
Object detection is one of the main applications of computer vision: objects of interest in an image can be accurately identified with object detection techniques. Object detection based on deep neural networks is fast, accurate, and efficient, and is widely applied in fields such as intelligent security, satellite remote sensing, and military reconnaissance.
In practical object detection scenarios, an object of interest often has to be found in a large-size image. A large-size image here means an image containing more than one million pixels, where the pixel count is width times height; for example, a 1000 x 1000 image contains exactly one million pixels. Object detection tasks on large-size pictures generally have the following characteristics:
(1) the images are large, typically thousands to tens of thousands of pixels on a side; (2) the targets are small, typically tens of pixels; (3) the number of images is large, with detection often required over hundreds or even thousands of images.
Owing to its good performance on large-size images in practice, Faster R-CNN is usually chosen as the detection network. A Faster R-CNN detection network consists of a backbone network and a region proposal network: the backbone extracts image features, while the region proposal network generates target candidate regions, screens the target candidate boxes, and produces the final detection result. Detection proceeds in four basic steps: (1) extract image features; (2) screen candidate regions; (3) apply regression correction to the candidate regions; (4) classify the targets.
Currently, deep-neural-network target detection tasks generally run on a GPU platform. For large-size images, feeding the whole image into the detection network at once runs into memory exhaustion and poor detection quality. On one hand, because the image is large, forward inference needs a large amount of memory, and the whole computation is hard to complete within limited GPU memory. On the other hand, if the whole picture is fed into the detection network, pooling and similar operations progressively shrink the feature map after feature extraction, so the number of pixels a real target occupies on the feature map also shrinks, and accuracy drops sharply when candidate regions are generated. Conventional detection methods therefore cannot be used for target detection on large-size images. In addition, because target detection on large-size images is slow, existing detection methods struggle to meet practical requirements in application scenarios with strict real-time response demands as image sizes and image counts grow.
Disclosure of Invention
The application aims to provide a large-size image target detection method, a large-size image target detection device and a computer readable storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to an aspect of an embodiment of the present application, there is provided a large-size image object detection method including:
preprocessing an original image to obtain a plurality of groups of small images;
inputting the groups of small images into a target detection network to perform batch target detection to obtain a detection result of each small image;
screening the detection result of the small image;
and mapping the screened detection result back to the original image to obtain a target detection result of the original image.
Further, the preprocessing the original image to obtain a plurality of groups of small images includes:
dividing an original image into a plurality of small images;
grouping the plurality of small images to obtain a plurality of groups of small images.
Further, the dividing the original image into a plurality of small images includes:
dividing the original image into a plurality of small images with the same size according to a preset dividing size and a preset overlapping area size.
Further, the target detection network comprises a faster region-based convolutional neural network (Faster R-CNN); the Faster R-CNN comprises a backbone network, a region proposal network, a Proposal layer, a region-of-interest pooling layer, and a regression correction classification layer connected in sequence, wherein the region-of-interest pooling layer is also connected with the backbone network.
Further, inputting the plurality of groups of small images into a target detection network for batch target detection to obtain a detection result of each small image, including:
inputting each group of small images into the backbone network to obtain a feature map of each small image;
inputting the feature maps into the region proposal network and the region-of-interest pooling layer respectively;
obtaining candidate frames of each small image through the region proposal network;
screening the candidate frames through the Proposal layer;
inputting the screened candidate boxes into the region-of-interest pooling layer and pooling the feature maps of the small images at the same time;
and sending the pooled feature maps and the candidate frames into the regression correction classification layer together to obtain the detection result of each small image.
Further, the screening the candidate frame through the Proposal layer includes:
removing candidate frames with confidence scores lower than a preset threshold value, the confidence scores being obtained through the region proposal network;
sorting the remaining candidate frames by confidence score, and selecting the top preset number of candidate frames;
performing boundary processing on the selected candidate frames, and eliminating the parts exceeding the image boundaries;
screening the boundary-processed candidate frames through a non-maximum suppression algorithm;
and storing the candidate frames obtained after screening.
Further, screening the detection results of the small images comprises:
deleting candidate frames with confidence scores lower than a preset threshold value from the detection results of the small images, the confidence scores being obtained through the region proposal network;
sorting the remaining candidate frames in the detection results of the small images by confidence score, and selecting the top preset number of candidate frames;
performing boundary processing on the selected candidate frames, and eliminating the parts exceeding the image boundaries;
and screening the boundary-processed candidate frames according to the confidence scores and intersection-over-union (IoU) values through a non-maximum suppression algorithm, retaining the candidate frames meeting a preset score threshold condition and a preset IoU threshold condition.
Further, mapping the screened detection results back to the original image to obtain the target detection result of the original image comprises:
mapping the candidate frames that meet the preset score threshold condition and the preset IoU threshold condition back to the original image, according to the pre-segmentation position in the original image of the small image from which each candidate frame was obtained, to obtain the coordinates of the candidate frames in the original image; the target detection result of the original image comprises the coordinates of all the candidate frames in the original image.
Further, before the screened candidate boxes are input into the region-of-interest pooling layer and the feature maps of the small images are pooled simultaneously, the method further comprises: storing the detection results of the small images contiguously, and setting an index label for the detection result of each small image.
According to another aspect of an embodiment of the present application, there is provided a large-size image object detection apparatus including:
the preprocessing module is used for preprocessing the original image to obtain a plurality of groups of small images;
the detection module is used for inputting the plurality of groups of small images into a target detection network to carry out batch target detection to obtain a detection result of each small image;
the screening module is used for screening the detection result of the small image;
and the mapping module is used for mapping the screened detection result back to the original image to obtain a target detection result of the original image.
According to another aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored thereon a computer program that is executed by a processor to implement the above-described large-size image object detection method.
One of the technical solutions provided in one aspect of the embodiments of the present application may include the following beneficial effects:
according to the large-size image target detection method provided by the embodiment of the application, the image is divided into a plurality of groups of small images, the small images are detected by the target detection network group, the detection result of the small images is mapped back to the original image, the target detection result of the original image is obtained, the target detection speed of the image is higher, and the accuracy of the detection result is high, so that the method is particularly suitable for target detection of the large-size image.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.
FIG. 1 shows a flow chart of a large-size image object detection method of an embodiment of the present application;
FIG. 2 shows a block diagram of a large-size image object detection apparatus according to an embodiment of the present application;
FIG. 3 is a flowchart showing a large-size image object detection method according to another embodiment of the present application;
FIG. 4 is a schematic diagram showing steps of the ROI pooling layer and regression-modified classification layer processing candidate boxes and feature maps according to another embodiment of the present application.
Detailed Description
The present application will be further described with reference to the drawings and the specific embodiments in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When a Faster R-CNN network is used for target detection, the common practice is to feed pictures into the network one at a time, but some application scenarios require a large number of pictures to be processed quickly at the same time. In the design of Faster R-CNN, data is stored and organized essentially in NCHW form, where H and W are the height and width of the image, C is the number of channels, and N is the number of pictures fed into the network. This storage layout is widely adopted by frameworks that run inference on GPUs, such as Caffe, TensorFlow, and PyTorch. After the ROI Pooling layer in a Faster R-CNN network, however, the leading dimension takes on a different meaning, which we denote here as N_O. Because the ROI Pooling layer outputs the pooled feature map of each candidate region, the data after this layer has shape N_O × C × H × W with N_O = N × R, where N is the number of input pictures, R is the number of candidate regions, and H and W are now the dimensions of the pooled candidate-region feature maps.
The current way to implement batch detection is to feed multiple pictures into the detection network, extract features with the backbone, pass the feature maps through the RPN to obtain a score for the box corresponding to each anchor, screen each picture's candidate boxes in the Proposal layer according to the scores and IoU, and then send the screened results into the ROI Pooling layer for subsequent processing. When the Proposal layer screens the different images of a batch, the positions and numbers of the resulting candidate boxes necessarily differ, since the image content of each picture differs. The usual practice is to set a maximum candidate-box count per picture, guaranteeing that the number of candidate boxes obtained for every picture stays within this maximum. Since the coordinate values of the candidate boxes are generally stored in a contiguous memory region, this has the advantage that the ROI Pooling layer can conveniently locate the address of each picture's candidate-box coordinates from this maximum and then perform the ROI pooling computation. Following the earlier notation, the feature-map dimension after ROI pooling is N_O × C × H × W with N_O = N × ROI_max. This style of batch detection guarantees that every candidate box in an image can be fed unimpeded into the subsequent network. In batch detection of large-size images, however, it is also quite wasteful of memory and time: not every picture yields ROI_max candidate boxes, and the redundant invalid candidate boxes sent into the subsequent network waste storage space and computing resources. Although reducing ROI_max saves storage and computation, the number of candidate boxes in an image is unknown before the result is obtained, so an unreasonable ROI_max value easily causes missed detections and thus lowers detection accuracy. This batch detection method therefore needs improvement to meet the needs of large-size image batch detection.
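To make the cost of the padded layout concrete, the following back-of-envelope sketch in Python compares the padded N × ROI_max storage with packed storage that keeps only each picture's actual candidate boxes. All numeric values are hypothetical, chosen only to illustrate the scaling; none of them come from the patent.

```python
# Hypothetical batch: 16 tiles, padded maximum of 300 candidate boxes each.
N = 16
ROI_max = 300
# Hypothetical actual per-picture candidate-box counts after screening.
r = [35, 12, 80, 5, 60, 22, 41, 9, 17, 73, 28, 3, 55, 19, 64, 31]
C, pool_h, pool_w = 512, 7, 7   # pooled feature-map shape (VGG-16-style head)

padded_rois = N * ROI_max       # boxes carried forward under padding
packed_rois = sum(r)            # boxes carried forward under packed storage

padded_feats = padded_rois * C * pool_h * pool_w   # floats after ROI pooling
packed_feats = packed_rois * C * pool_h * pool_w

print(f"padded: {padded_rois} ROIs -> {padded_feats:,} floats after pooling")
print(f"packed: {packed_rois} ROIs -> {packed_feats:,} floats after pooling")
print(f"downstream storage/compute reduced ~{padded_rois / packed_rois:.1f}x")
```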
As shown in fig. 1, an embodiment of the present application provides a large-size image object detection method 01, including:
s10, preprocessing the original image to obtain a plurality of groups of small images.
In certain embodiments, step S10 comprises:
s101, dividing an original image into a plurality of small images;
s102, grouping the plurality of small images to obtain a plurality of groups of small images.
The segmenting the original image into a plurality of small images includes:
dividing the original image into a plurality of small images with the same size according to a preset dividing size and a preset overlapping area size.
In some embodiments, preprocessing the original image mainly comprises segmenting and grouping the large-size image. The original image is a large-size image, meaning an image with more than one million pixels. Segmentation cuts one large-size image into a plurality of equally sized small images according to a fixed size; a suitable segmentation size and overlap area must be chosen. The overlap area is set during segmentation to prevent a real object from being truncated at a segmentation boundary, which would cause missed detections: when segmenting along the horizontal and vertical directions, each segmented region overlaps the previous one to a certain extent. Grouping then partitions the segmented images into groups, where the number of small images per group is determined by how many images the GPU can detect simultaneously. Each group of small images is sent to the target detection network as one batch.
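A minimal sketch of this segmentation and grouping step is given below in Python with NumPy. The function names and the tile size, overlap, and batch size are illustrative assumptions; the patent only requires that a suitable segmentation size, overlap area, and per-group count (matched to GPU capacity) be chosen.

```python
import numpy as np

def split_into_tiles(image, tile_size, overlap):
    """Split an H x W x C image into equal-size tiles with a fixed overlap.

    Returns (tile, (x_offset, y_offset)) pairs; the offsets are kept so that
    detections can later be mapped back to the original image.
    """
    h, w = image.shape[:2]
    stride = tile_size - overlap
    tiles = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # Clamp border tiles so every tile keeps the full tile_size.
            y0 = min(y, max(h - tile_size, 0))
            x0 = min(x, max(w - tile_size, 0))
            tiles.append((image[y0:y0 + tile_size, x0:x0 + tile_size], (x0, y0)))
    return tiles

def group_tiles(tiles, batch_size):
    """Group tiles into batches sized to what the GPU can detect at once."""
    return [tiles[i:i + batch_size] for i in range(0, len(tiles), batch_size)]

# Usage: a 4000 x 6000 image cut into 1000 x 1000 tiles with 100 px overlap,
# grouped into batches of 8 tiles.
image = np.zeros((4000, 6000, 3), dtype=np.uint8)
batches = group_tiles(split_into_tiles(image, 1000, 100), batch_size=8)
```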
S20, inputting the small images, group by group, into the deep neural network for batch target detection to obtain the detection result of each small image.
The deep neural network is a target detection network based on Faster R-CNN (the faster region-based convolutional neural network), a class of target detection networks that may use different backbones.
The Faster R-CNN of this embodiment comprises a backbone network, an RPN (region proposal network), a Proposal layer, an ROI pooling layer (region-of-interest pooling layer), and a regression correction classification layer connected in sequence; the ROI pooling layer is also connected with the backbone network. The backbone network may be a VGG-16 network. The regression correction classification layer includes a number of convolution layers.
Batch detection processes multiple images simultaneously and obtains their detection results simultaneously. At the start of detection, a group of small images is sent to the detection network: the image data of the small images is loaded into GPU memory at the same time and participates in neural-network operations such as convolution and pooling, including the operations of the Proposal layer and the ROI Pooling layer. On the GPU, these computations occur within the same execution cycle of the forward inference pass.
In certain embodiments, step S20 comprises:
s201, inputting each group of small images into a backbone network to obtain a feature map of each small image;
s202, respectively inputting the feature map into an RPN layer and an ROI Pooling layer;
s203, obtaining a candidate frame of each small image and a confidence score of each candidate frame through RPN;
s204, primarily screening candidate frames corresponding to each anchor point through a Propos layer;
s205, inputting the residual candidate frames after primary screening into an ROI Pooling layer of the faster region convolutional neural network, and simultaneously performing ROI Pooling operation on the candidate frames of the small images and the feature images of the small images;
and S206, sending the feature images of the candidate areas after the ROI pooling operation into a regression correction classification layer together, and finally obtaining the detection result of each small image.
The detection result of each small image comprises a feature map, a candidate frame and a confidence score of the candidate frame corresponding to each small image.
In the Proposal layer of the Faster R-CNN network, the candidate frames of each small image are screened according to the confidence scores of the candidate frames obtained from the RPN layer. The preliminary screening of the candidate frames corresponding to each anchor through the Proposal layer mainly comprises the following steps:
and firstly, removing candidate frames with confidence scores lower than a preset threshold value. The preset threshold may be preset in advance before detection begins.
Step two, sorting the rest candidate frames according to the confidence score, and selecting the S before the confidence score ranking max Candidate boxes, S therein max The value may be preset before detection begins.
Third step, for the front S max And carrying out boundary processing on the candidate frames, and eliminating the part exceeding the image boundary in the candidate frames.
Fourth, the confidence scores and IoU values (IoU, intersectional-over-Union) of all candidate frames of each thumbnail processed by the above steps are screened using nms (Non-Maximum Suppression, nonlinear maximum suppression) algorithm. Both the score threshold for screening and the IoU threshold may be preset prior to detection.
And fifthly, storing the candidate frames obtained in the fourth step into a continuous storage space for the subsequent use of the calculation process.
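The five steps can be sketched as follows with PyTorch and torchvision's nms and clip_boxes_to_image operators. The function name screen_proposals and all threshold values are illustrative assumptions, not values fixed by the patent.

```python
import torch
from torchvision.ops import nms, clip_boxes_to_image

def screen_proposals(boxes, scores, image_size,
                     score_thresh=0.05, s_max=2000, nms_iou=0.7):
    """Sketch of the five Proposal-layer screening steps.

    boxes:      (K, 4) tensor of (x1, y1, x2, y2) boxes from the RPN
    scores:     (K,) tensor of RPN confidence scores
    image_size: (H, W) of the input tile
    """
    # Step 1: drop boxes whose confidence score is below the threshold.
    keep = scores >= score_thresh
    boxes, scores = boxes[keep], scores[keep]

    # Step 2: sort by score and keep only the top S_max boxes.
    order = scores.argsort(descending=True)[:s_max]
    boxes, scores = boxes[order], scores[order]

    # Step 3: boundary processing, clipping the parts outside the image.
    boxes = clip_boxes_to_image(boxes, image_size)

    # Step 4: non-maximum suppression by score and IoU.
    keep = nms(boxes, scores, nms_iou)
    boxes, scores = boxes[keep], scores[keep]

    # Step 5: the caller stores the survivors contiguously for later layers.
    return boxes, scores
```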
In some embodiments, screening the candidate boxes through the Proposal layer comprises:
S2041, removing candidate frames whose confidence scores are lower than a preset threshold;
S2042, sorting the remaining candidate frames by confidence score and selecting the top preset number of candidate frames;
S2043, performing boundary processing on the selected candidate frames and eliminating the parts that exceed the image boundaries;
S2044, screening the boundary-processed candidate frames with a non-maximum suppression algorithm;
S2045, storing the candidate frames obtained after screening.
In the ROI Pooling layer (region-of-interest pooling layer) of the Faster R-CNN network, all the feature maps of each small image participate together in the ROI (region-of-interest) pooling operation. The feature maps of each small picture pooled by the ROI Pooling layer are then sent together into the subsequent regression correction network and classification network, and the detection results are finally obtained simultaneously.
S30, screening the detection result of the small image.
The screening in step S30 screens the candidate frames in the detection results once more by score against a preset threshold, removes the candidate frames below the score threshold, and then screens the candidate frames again with the NMS algorithm. The method of this embodiment thus performs two rounds of screening in total: the primary screening is performed on the candidate frames by the Proposal layer, and the secondary screening is performed on the detection results of the small images.
The screening of the detection result of the small image comprises the following steps:
s301, deleting candidate frames with confidence scores lower than a preset threshold value from the detection result of the small image; the confidence score is obtained through the regional generation network;
s302, sorting the rest candidate frames in the detection result of the small image according to confidence scores, and selecting a plurality of candidate frames with the scores of the preset numerical values before ranking;
s303, carrying out boundary processing on the candidate frames with the preset numerical values, and eliminating parts exceeding the image boundaries;
s304, screening the candidate frames subjected to boundary processing according to the confidence score and the cross ratio by using a nonlinear maximum suppression algorithm, and screening the candidate frames meeting a preset score threshold condition and a preset cross ratio threshold condition.
And S40, mapping the screened detection result back to the original image to obtain a target detection result of the original image.
The mapping is a coordinate mapping: the coordinates of the candidate regions obtained from each picture are mapped back into the original image according to that picture's position before segmentation, which yields the coordinates of the candidate frames in the original image and thus the target detection result of the original image.
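A minimal sketch of this coordinate mapping, assuming the top-left offset of each tile in the original image was recorded during segmentation (as in the splitting sketch earlier):

```python
def map_boxes_to_original(boxes, tile_offset):
    """Map tile-local detection boxes back to original-image coordinates.

    boxes:       list of (x1, y1, x2, y2) in the tile's coordinate frame
    tile_offset: (x_off, y_off), the tile's top-left corner position in
                 the original image, recorded when the image was split
    """
    x_off, y_off = tile_offset
    return [(x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off)
            for (x1, y1, x2, y2) in boxes]

# Usage: a box found at (10, 20, 60, 90) in a tile cut at offset (3000, 900)
# lies at (3010, 920, 3060, 990) in the original image.
print(map_boxes_to_original([(10, 20, 60, 90)], (3000, 900)))
```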
According to the large-size image target detection method described above, the image is divided into groups of small images, the small images are detected group by group through the target detection network, and the detection results of the small images are mapped back to the original image to obtain its target detection result; image target detection is fast and the detection results are accurate, so the method is particularly suitable for target detection on large-size images.
In some embodiments, before the screened candidate boxes are input into the region-of-interest pooling layer and the feature maps of the small images are pooled simultaneously, the large-size image target detection method further comprises: storing the detection results of the small images contiguously, and setting an index label for the detection result of each small image.
As shown in fig. 2, the present embodiment further provides a large-size image object detection apparatus, including:
the preprocessing module 100 is used for preprocessing the original image to obtain a plurality of groups of small images;
the detection module 200 is configured to input the plurality of groups of small images into a target detection network to perform batch target detection, so as to obtain a detection result of each small image;
the screening module 300 is configured to screen the detection result of the small image;
and the mapping module 400 is configured to map the screened detection result back to the original image, so as to obtain a target detection result of the original image.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program that is executed by a processor to implement the above-described large-size image object detection method.
As shown in fig. 3, another embodiment of the present application provides a large-size image object detection method 02, including:
First, the large-size image is preprocessed: the large image is divided into small images, and each small image is input into the deep neural network. Then the detection results of multiple small images are obtained simultaneously using the batch detection method. Finally, in a post-processing step, the coordinates of the results are mapped back onto the original image to obtain the final detection result of the large image.
After a small picture is input into the backbone network for processing, a feature map is output; the feature map is input into the RPN (Region Proposal Network) to obtain candidate frames and the confidence scores of the candidate frames.
The Proposal layer screens the candidate frames corresponding to each anchor according to the confidence scores of the candidate frames obtained from the RPN layer, selecting the candidate frames most likely to contain targets. This comprises several steps:
1) one round of screening against a confidence-score threshold;
2) one round of sorting the remaining candidate frames by confidence score, selecting the first N items according to the maximum candidate-box count;
3) screening overlapping candidate boxes with the NMS algorithm;
4) sending the candidate frames of the different small pictures to the next layer for computation.
As shown in fig. 4, the ROI Pooling layer works as follows: given the known candidate frames and the picture feature maps, the position in the corresponding picture feature map is located through each candidate frame and a pooling operation is performed there. Different candidate frames have different sizes, and ROI Pooling turns the feature-map region corresponding to each candidate frame into a feature map of the same fixed size, which facilitates the subsequent computation.
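For reference, torchvision's roi_pool operator implements this kind of fixed-size pooling and accepts candidate boxes packed as a single (R, 5) matrix carrying a source-picture index per box; note that torchvision places the index in the first column, whereas the matrix described later in this document keeps it in the last column. A small usage sketch with hypothetical shapes:

```python
import torch
from torchvision.ops import roi_pool

# Feature maps for a batch of N = 2 tiles (NCHW), e.g. backbone outputs.
feats = torch.randn(2, 512, 64, 64)

# Candidate boxes packed as one (R, 5) matrix: (index, x1, y1, x2, y2).
# The index tells ROI pooling which tile's feature map each box uses.
rois = torch.tensor([
    [0.0,  10.0,  10.0, 200.0, 180.0],   # box from tile 0
    [0.0,  50.0,  60.0, 300.0, 400.0],   # another box from tile 0
    [1.0, 120.0,  40.0, 500.0, 350.0],   # box from tile 1
])

# spatial_scale maps box coordinates (given in input-image pixels, here a
# hypothetical 1024 x 1024 tile) onto the 64 x 64 feature map: 64 / 1024.
pooled = roi_pool(feats, rois, output_size=(7, 7), spatial_scale=64 / 1024)
print(pooled.shape)   # torch.Size([3, 512, 7, 7]) = R x C x pool_h x pool_w
```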
In some embodiments, the detection results of the different small images in the batch detection process are stored contiguously, and an index label is set for the detection result of each small image to distinguish the detection results. This step is done after the screening results of the small pictures are obtained from the Proposal layer.
In general, the storage scheme for the candidate frames of batch-detected pictures sets a maximum candidate-frame count. Since the number of candidate frames obtained for each small picture differs, setting the maximum according to the small picture with the most candidate frames wastes space, and during computation this redundancy occupies computing resources for the small pictures whose candidate-frame counts do not reach the maximum. If the candidate-frame maximum is set too small, objects in small pictures whose candidate-frame counts exceed the value are easily missed, causing missed detections.
With the improved contiguous-storage method of this embodiment, this situation is well avoided: candidate frames are stored according to the actual count of each picture, and an index identifying the small picture it belongs to is set for each candidate frame. This method reduces the amount of computation and the storage space, accelerates batch detection, and achieves a higher detection throughput under the same computing resources. At the same time, it avoids the adverse effect on detection accuracy caused by setting a candidate-frame maximum.
Finally, when mapping back to the original image, the index label of each candidate frame is used to find the position of the corresponding small picture in the original image.
In the Proposal layer, candidate frames are generated for each input picture; let the number of candidate frames generated for picture n be r_n, where n = 1, 2, …, N. The inputs of the ROI Pooling layer are the ROIs (regions of interest) and the feature maps, with dimensions R × 4 and N × C × H × W respectively, and its output is R × C × pool_h × pool_w. ROI pooling generates, from the position of each candidate frame, the pooled feature map of the corresponding region; these feature maps have dimensions pool_h × pool_w, and R denotes the number of candidate regions.
Under batch detection, the computation after the Proposal layer involves the candidate frames generated by multiple pictures, and the number of candidate frames per picture differs; if all candidate frames are sent to the ROI Pooling layer at once, the ROI pooling operation cannot determine which picture's feature map a given box should be mapped onto. The simplest implementation is to set a maximum, ROI_max, on the candidate frames generated per picture; ROI Pooling can then use ROI_max to locate the corresponding feature map. Let the output of the Proposal layer be an ROI matrix of size N × ROI_max × 4; to locate the ROIs belonging to feature map n, ROI Pooling can use the starting address
Addr = n × ROI_max × 4.
The output of ROI Pooling is then N × ROI_max × C × pool_h × pool_w. Although this makes address calculation convenient for ROI Pooling and guarantees a consistent number of ROI outputs per picture, it wastes storage and computing resources, because if the number of ROIs in a picture is much smaller than ROI_max, the excess outputs are still sent into the subsequent network for computation.
This way of giving the output a uniform size has little impact for a Faster R-CNN with VGG-16 as the backbone, since VGG-16 has only two fully connected layers after ROI Pooling. For a Faster R-CNN with a backbone like ResNet-101, however, the superfluous outputs significantly increase the amount of computation and thus severely impact the computation speed, because a significant portion of ResNet-101's convolution computation lies after the ROI Pooling layer. If all of these convolutions take the input's leading dimension of N × ROI_max, the resulting amount of computation is considerable. Although unnecessary computation could be avoided by reducing ROI_max, an unreasonable ROI_max, as noted above, easily causes missed detections.
To address this, the implementation of the Proposal layer is modified in the batch target detection algorithm. Since the number of candidate frames generated per picture differs, it would obviously be inefficient to send these candidate frames separately into the subsequent network for computation. Therefore, the candidate frames that survive the NMS and score-sorting screening are labeled, recording the picture index corresponding to each picture's candidate frames. The candidate frames are then placed in a candidate-frame matrix of dimensions R × 5, where the first 4 entries in a row are the coordinates of the candidate frame and the last is the index of the picture containing it, with
R = r_1 + r_2 + … + r_N.
After this matrix is obtained, the data fed into ROI Pooling is computed further. Besides generating the candidate-frame matrix, the Proposal layer also maintains an index vector recording the number of candidate frames of each picture. In the post-processing of the results, a final NMS screening can be applied to the results per picture according to this index vector, and the results mapped back to the original picture via the computed positions.
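The per-picture final NMS over such a packed matrix can be sketched with torchvision's batched_nms, which suppresses overlaps only among boxes sharing the same index and therefore matches the index-label scheme described above. The tensors below are illustrative values:

```python
import torch
from torchvision.ops import batched_nms

# Packed results after the regression/classification head: boxes, scores,
# and the per-box picture index recorded by the modified Proposal layer.
boxes  = torch.tensor([[10., 10., 100., 100.],
                       [12., 12., 102., 102.],
                       [10., 10., 100., 100.]])
scores = torch.tensor([0.9, 0.8, 0.7])
idxs   = torch.tensor([0, 0, 1])   # picture index of each box

# One call performs the final per-picture NMS over the whole packed matrix:
# overlapping boxes are suppressed only within the same picture index.
keep = batched_nms(boxes, scores, idxs, iou_threshold=0.5)
print(keep)   # tensor([0, 2]): best box of picture 0, plus picture 1's box
```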
In the implementation of the ROI Pooling layer, since every candidate box carries the index of its source picture, the corresponding feature map can be determined from this index during the pooling operation. After this optimization, the output of the ROI Pooling layer becomes R × C × pool_h × pool_w, where in general
R < N × ROI_max.
This optimization therefore greatly increases the computation speed of batch detection.
It should be noted that:
the term "module" is not intended to be limited to a particular physical form. Depending on the particular application, modules may be implemented as hardware, firmware, software, and/or combinations thereof. Furthermore, different modules may share common components or even be implemented by the same components. There may or may not be clear boundaries between different modules.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the teachings herein. The required structure for the construction of such devices is apparent from the description above. In addition, the present application is not directed to any particular programming language. It will be appreciated that the teachings of the present application described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in the creation means of a virtual machine according to an embodiment of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing examples merely illustrate embodiments of the application and are described in more detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (6)

1. A large-size image object detection method, characterized by comprising:
preprocessing an original image to obtain a plurality of groups of small images; the original image is a large-size image, and a large-size image refers to an image with more than one million pixels;
inputting the groups of small images into a target detection network to perform batch target detection to obtain a detection result of each small image;
screening the detection result of the small image;
mapping the screened detection result back to the original image to obtain a target detection result of the original image;
the preprocessing of the original image to obtain a plurality of groups of small images comprises the following steps: dividing an original image into a plurality of small images; grouping the plurality of small images to obtain a plurality of groups of small images;
the segmenting the original image into a plurality of small images includes: dividing the original image into a plurality of small images with the same size according to a preset dividing size and a preset overlapping area size;
the target detection network comprises a faster region-based convolutional neural network (Faster R-CNN); the Faster R-CNN comprises a backbone network, a region proposal network, a Proposal layer, a region-of-interest pooling layer and a regression correction classification layer which are connected in sequence, wherein the region-of-interest pooling layer is connected with the backbone network;
inputting the plurality of groups of small images into a target detection network for batch target detection to obtain a detection result of each small image, wherein the detection result comprises the following steps:
inputting each group of small images into the backbone network to obtain a feature map of each small image;
inputting the feature map into the region proposal network and the region-of-interest pooling layer respectively;
obtaining candidate frames of each small image through the region proposal network;
screening the candidate frames through the Proposal layer;
inputting the screened candidate boxes into the region-of-interest pooling layer and pooling the feature maps of the small images at the same time;
sending the pooled feature maps and the candidate frames into the regression correction classification layer together to obtain a detection result of each small image;
before the screened candidate boxes are input into the region-of-interest pooling layer and the feature maps of the small images are pooled simultaneously, the method further comprises: storing the detection results of the small images contiguously, and setting an index label for the detection result of each small image.
2. The method of claim 1, wherein the screening the candidate boxes by the Proposal layer comprises:
removing candidate frames with confidence scores lower than a preset threshold value, the confidence scores being obtained through the region proposal network;
sorting the remaining candidate frames by confidence score, and selecting the top preset number of candidate frames;
performing boundary processing on the selected candidate frames, and eliminating the parts exceeding the image boundaries;
screening the boundary-processed candidate frames through a non-maximum suppression algorithm;
and storing the candidate frames obtained after screening.
3. The method of claim 1, wherein the screening the detection results of the small image comprises:
deleting candidate frames with confidence scores lower than a preset threshold value from the detection results of the small images, the confidence scores being obtained through the region proposal network;
sorting the remaining candidate frames in the detection results of the small images by confidence score, and selecting the top preset number of candidate frames;
performing boundary processing on the selected candidate frames, and eliminating the parts exceeding the image boundaries;
and screening the boundary-processed candidate frames according to the confidence scores and intersection-over-union (IoU) values through a non-maximum suppression algorithm, retaining the candidate frames meeting a preset score threshold condition and a preset IoU threshold condition.
4. A method according to claim 3, wherein mapping the screened detection result back to the original image to obtain a target detection result of the original image comprises:
mapping the candidate frames meeting a preset score threshold condition and a preset IoU threshold condition back to the original image according to the pre-segmentation position of each candidate frame in the original image, to obtain the coordinates of the candidate frames in the original image; the target detection result of the original image comprises the coordinates of all the candidate frames in the original image.
5. A large-size image object detection apparatus, characterized by comprising:
the preprocessing module is used for preprocessing the original image to obtain a plurality of groups of small images; the original image is a large-size image, and a large-size image refers to an image with more than one million pixels;
the detection module is used for inputting the plurality of groups of small images into a target detection network to carry out batch target detection to obtain a detection result of each small image;
the screening module is used for screening the detection result of the small image;
the mapping module is used for mapping the screened detection result back to the original image to obtain a target detection result of the original image;
the preprocessing of the original image to obtain a plurality of groups of small images comprises the following steps: dividing an original image into a plurality of small images; grouping the plurality of small images to obtain a plurality of groups of small images;
the segmenting the original image into a plurality of small images includes: dividing the original image into a plurality of small images with the same size according to a preset dividing size and a preset overlapping area size;
the target detection network comprises a faster region-based convolutional neural network (Faster R-CNN); the Faster R-CNN comprises a backbone network, a region proposal network, a Proposal layer, a region-of-interest pooling layer and a regression correction classification layer which are connected in sequence, wherein the region-of-interest pooling layer is connected with the backbone network;
inputting the plurality of groups of small images into a target detection network for batch target detection to obtain a detection result of each small image, wherein the detection result comprises the following steps:
inputting each group of small images into the backbone network to obtain a feature map of each small image;
inputting the feature map into the region proposal network and the region-of-interest pooling layer respectively;
obtaining candidate frames of each small image through the region proposal network;
screening the candidate frames through the Proposal layer;
inputting the screened candidate boxes into the region-of-interest pooling layer and pooling the feature maps of the small images at the same time;
sending the pooled feature maps and the candidate frames into the regression correction classification layer together to obtain a detection result of each small image;
before the screened candidate boxes are input into the region-of-interest pooling layer and the feature maps of the small images are pooled simultaneously, the method further comprises: storing the detection results of the small images contiguously, and setting an index label for the detection result of each small image.
6. A computer-readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to implement the large-size image object detection method according to any one of claims 1 to 4.
CN202010053891.2A 2020-01-17 2020-01-17 Large-size image target detection method, device and computer readable storage medium Active CN111242066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010053891.2A CN111242066B (en) 2020-01-17 2020-01-17 Large-size image target detection method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010053891.2A CN111242066B (en) 2020-01-17 2020-01-17 Large-size image target detection method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111242066A CN111242066A (en) 2020-06-05
CN111242066B true CN111242066B (en) 2023-09-05

Family

ID=70879519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010053891.2A Active CN111242066B (en) 2020-01-17 2020-01-17 Large-size image target detection method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111242066B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419451B (en) * 2020-12-04 2022-09-16 上海联影医疗科技股份有限公司 Image reconstruction method, device and equipment and storage medium
CN113643364A (en) * 2021-07-05 2021-11-12 珠海格力电器股份有限公司 Image target detection method, device and equipment
CN117218515B (en) * 2023-09-19 2024-05-03 人民网股份有限公司 Target detection method, device, computing equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960195A (en) * 2017-03-27 2017-07-18 深圳市丰巨泰科电子有限公司 A kind of people counting method and device based on deep learning
CN107368845A (en) * 2017-06-15 2017-11-21 华南理工大学 A kind of Faster R CNN object detection methods based on optimization candidate region
CN109145872A (en) * 2018-09-20 2019-01-04 北京遥感设备研究所 A kind of SAR image Ship Target Detection method merged based on CFAR with Fast-RCNN
CN109583369A (en) * 2018-11-29 2019-04-05 北京邮电大学 A kind of target identification method and device based on target area segmentation network
CN109815868A (en) * 2019-01-15 2019-05-28 腾讯科技(深圳)有限公司 A kind of image object detection method, device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10111632B2 (en) * 2017-01-31 2018-10-30 Siemens Healthcare Gmbh System and method for breast cancer detection in X-ray images
CN109426782B (en) * 2017-08-29 2023-09-19 北京三星通信技术研究有限公司 Object detection method and neural network system for object detection
US10740647B2 (en) * 2018-03-14 2020-08-11 Adobe Inc. Detecting objects using a weakly supervised model
CN108694401B (en) * 2018-05-09 2021-01-12 北京旷视科技有限公司 Target detection method, device and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960195A (en) * 2017-03-27 2017-07-18 深圳市丰巨泰科电子有限公司 A kind of people counting method and device based on deep learning
CN107368845A (en) * 2017-06-15 2017-11-21 华南理工大学 A kind of Faster R CNN object detection methods based on optimization candidate region
CN109145872A (en) * 2018-09-20 2019-01-04 北京遥感设备研究所 A kind of SAR image Ship Target Detection method merged based on CFAR with Fast-RCNN
CN109583369A (en) * 2018-11-29 2019-04-05 北京邮电大学 A kind of target identification method and device based on target area segmentation network
CN109815868A (en) * 2019-01-15 2019-05-28 腾讯科技(深圳)有限公司 A kind of image object detection method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An aircraft target detection algorithm combining non-top-layer feature maps and adaptive thresholds; Tan Zhenyu et al.; Journal of Geomatics Science and Technology; Vol. 36, No. 4; pp. 382-387 *

Also Published As

Publication number Publication date
CN111242066A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN109859190B (en) Target area detection method based on deep learning
CN111242066B (en) Large-size image target detection method, device and computer readable storage medium
CN112734641A (en) Training method and device of target detection model, computer equipment and medium
CN107358262B (en) High-resolution image classification method and classification device
CN109685145B (en) Small object detection method based on deep learning and image processing
CN110991560B (en) Target detection method and system combining context information
CN110443140B (en) Text positioning method, device, computer equipment and storage medium
CN110533022B (en) Target detection method, system, device and storage medium
US20240078680A1 (en) Image segmentation method, network training method, electronic equipment and storage medium
CN111310746B (en) Text line detection method, model training method, device, server and medium
CN109409376B (en) Image segmentation method for solid waste object, computer terminal and storage medium
CN113673338A (en) Natural scene text image character pixel weak supervision automatic labeling method, system and medium
CN113642576B (en) Method and device for generating training image set in target detection and semantic segmentation tasks
CN111768415A (en) Image instance segmentation method without quantization pooling
CN110807362A (en) Image detection method and device and computer readable storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111739024B (en) Image recognition method, electronic device and readable storage medium
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN111091122B (en) Training and detecting method and device for multi-scale characteristic convolutional neural network
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN113506305B (en) Image enhancement method, semantic segmentation method and device for three-dimensional point cloud data
CN117612153A (en) Three-dimensional target identification and positioning method based on image and point cloud information completion
CN111985488B (en) Target detection segmentation method and system based on offline Gaussian model
CN112907750A (en) Indoor scene layout estimation method and system based on convolutional neural network
CN113221731A (en) Multi-scale remote sensing image target detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant