CN114419428A - Target detection method, target detection device and computer readable storage medium - Google Patents

Target detection method, target detection device and computer readable storage medium

Info

Publication number
CN114419428A
CN114419428A (application CN202111478778.XA)
Authority
CN
China
Prior art keywords
sub
target
detection result
detection
image
Prior art date
Legal status
Pending
Application number
CN202111478778.XA
Other languages
Chinese (zh)
Inventor
宋忠浩
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111478778.XA
Publication of CN114419428A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target detection method, a target detection device and a computer readable storage medium, wherein the method comprises the following steps: acquiring an image to be processed, wherein the image to be processed comprises at least one target to be detected; splitting an image to be processed into a plurality of sub-images, and performing target detection processing on the sub-images by adopting a target detection model to obtain a sub-detection result, wherein at least two sub-images have an overlapping area, and each target to be detected exists in at least one sub-image; classifying all the sub-detection results based on whether the same target to be detected exists in at least two sub-images to obtain at least two detection result sets, wherein the detection result sets comprise sub-detection results; and performing duplicate removal processing on the detection result set to obtain a target detection result, wherein the target detection result is the position information of all targets to be detected in the image to be processed. By means of the method, the problem of insufficient video memory can be solved, target omission can be prevented, and the accuracy of target detection is improved.

Description

Target detection method, target detection device and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target detection method, a target detection apparatus, and a computer-readable storage medium.
Background
A large-size aerial image is an image of large or extra-large dimensions captured by a device such as a remote sensing satellite, a high-altitude unmanned aerial vehicle, or an aerial camera, and it covers a wide shooting range. Performing target detection on such images, however, presents difficulties. When the whole large-size aerial image is fed into a target detection model for model inference, the large image size and the large number of model parameters to be computed cause insufficient video memory on the device; this inference mode is especially problematic on embedded devices or mobile-end devices with extremely limited computing power. In addition, because the large-size aerial image is large, the pixel area occupied by a target of interest is small relative to the whole image, so the target of interest is difficult to detect accurately; missed detections occur, and the performance of the target detection model suffers.
Disclosure of Invention
The application provides a target detection method, a target detection device and a computer readable storage medium, which can solve the problem of insufficient video memory for detecting a large-size image by a computing power limited device, prevent target omission and improve the accuracy of target detection.
In order to solve the technical problem, the technical scheme adopted by the application is as follows: there is provided a method of object detection, the method comprising: acquiring an image to be processed, wherein the image to be processed comprises at least one target to be detected; splitting an image to be processed into a plurality of sub-images, and performing target detection processing on the sub-images by adopting a target detection model to obtain a sub-detection result, wherein at least two sub-images have an overlapping area, and each target to be detected exists in at least one sub-image; classifying all the sub-detection results based on whether the same target to be detected exists in at least two sub-images to obtain at least two detection result sets, wherein the detection result sets comprise sub-detection results; and performing duplicate removal processing on the detection result set to obtain a target detection result, wherein the target detection result is the position information of all targets to be detected in the image to be processed.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an object detection apparatus comprising a memory and a processor connected to each other, wherein the memory is used for storing a computer program, and the computer program is used for implementing the object detection method in the above technical solution when being executed by the processor.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium for storing a computer program for implementing the object detection method of the above technical solution when the computer program is executed by a processor.
Through the scheme, the beneficial effects of the application are that: firstly, splitting an image to be processed into a plurality of sub-images, and then inputting each sub-image into a pre-trained target detection model to generate a sub-detection result; classifying all the sub-detection results by detecting whether each target to be detected appears in different sub-images to generate a detection result set; then, carrying out duplicate removal processing on the detection result set to generate a final target detection result; according to the method and the device, the image to be processed is cut, so that the size of the image processed by the target detection model is reduced, the time spent by the target detection model in directly processing the large image is shortened, the target detection efficiency is improved, and the method and the device are convenient to apply to embedded equipment or mobile terminal equipment with limited computing power; moreover, when the image is cut, each target to be detected in the image to be processed can be detected, the condition that the target is missed to be detected is effectively avoided, and the accuracy of target detection is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a target detection method provided herein;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a target detection method provided herein;
FIG. 3 is a diagram of the cutting result corresponding to Stride > W′ - max(z_i) provided herein;
FIG. 4 is a diagram of the cutting result corresponding to Stride ≤ W′ - max(z_i) provided herein;
FIG. 5 is a schematic diagram of a rectangular planar coordinate system provided herein;
FIG. 6 is a schematic diagram of a de-duplication detection block provided herein;
FIG. 7 is a schematic diagram of the implementation of the Soft-NMS provided in the present application;
FIG. 8 is a schematic structural diagram of an embodiment of an object detection apparatus provided in the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
It should be noted that the terms "first", "second" and "third" in the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of indicated technical features. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a target detection method provided in the present application, the method including:
step 11: and acquiring an image to be processed.
The image to be processed comprises at least one target to be detected; a large-size image may be read from stored image data to serve as the image to be processed, or the current scene may be photographed by a camera device to generate the image to be processed. Specifically, the image to be processed may be a large-size aerial image, i.e., an image captured by a device such as a remote sensing satellite, a high-altitude unmanned aerial vehicle, or an aerial camera.
Step 12: and splitting the image to be processed into a plurality of sub-images, and performing target detection processing on the sub-images by adopting a target detection model to obtain a sub-detection result.
After the image to be processed is acquired, it is split to generate a plurality of sub-images, where at least two sub-images have an overlapping area and each target to be detected exists in at least one sub-image; that is, any target to be detected in the image to be processed appears in at least one sub-image. The sub-images are input into a target detection model to generate corresponding sub-detection results, where a sub-detection result records whether the sub-image contains a target to be detected. When the sub-image does contain a target to be detected, the sub-detection result further includes the position information of the target. Taking a rectangular detection frame as an example, the position information may be the coordinates of the upper-left corner of the detection frame together with the length and width of the detection frame, or the coordinates of the upper-left corner and of the lower-right corner of the detection frame. The sub-detection result may also include other information about the target to be detected, such as category information.
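For illustration only, a sub-detection result of this kind might be represented as follows. This is a minimal Python sketch; the type names and fields (SubDetection, SubResult, etc.) are hypothetical and not taken from the patent, which does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SubDetection:
    """One detected target in a sub-image (hypothetical structure).

    The position is stored here as the upper-left plus lower-right corner of
    the rectangular detection frame; the alternative representation mentioned
    above (upper-left corner plus length and width) is trivially convertible.
    """
    x0: float                       # upper-left x, relative to the sub-image
    y0: float                       # upper-left y, relative to the sub-image
    x1: float                       # lower-right x
    y1: float                       # lower-right y
    score: float                    # detection confidence
    category: Optional[str] = None  # optional category information

@dataclass
class SubResult:
    sub_image_index: int            # which sub-image this result came from
    detections: List[SubDetection] = field(default_factory=list)  # empty if no target
```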
In a specific embodiment, the following steps may be employed to train the target detection model:
firstly, training data is obtained, wherein the training data comprises a plurality of training images and position labels of labeling targets in the training images (namely position information of target frames corresponding to the labeling targets).
A large number of training images are manually labeled to generate labeling information, where the labeling information includes the position information of the targets of interest (i.e., the labeled targets) and the category information of the labeled targets.
Further, the training data can be preprocessed to obtain preprocessed training data, and the preprocessed training data is input into the target detection model for training, wherein the preprocessing operation comprises geometric transformation, color enhancement, data enhancement or data augmentation and the like.
Secondly, selecting a training image from the training data, and detecting the training image by adopting a target detection model to obtain a target frame.
And thirdly, calculating the current loss value based on the target frame and the position label corresponding to the target frame.
And fourthly, judging whether the target detection model meets the preset training ending condition or not based on the current loss value or the current training times.
And if the target detection model does not meet the preset training end condition, returning to the step of selecting a training image from the training data until the target detection model meets the preset training end condition.
Further, the preset stop condition includes: the loss value is converged, namely the difference value between the last loss value and the current loss value is smaller than a set value; judging whether the current loss value is smaller than a preset loss value, wherein the preset loss value is a preset loss threshold value, and if the current loss value is smaller than the preset loss value, determining that a preset stop condition is reached; training times reach a set value (for example: 10000 times of training); or the accuracy obtained when the test set is used for testing reaches a set condition (for example, the preset accuracy is exceeded), and the like.
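To make the training procedure concrete, here is a minimal Python sketch of the loop with the end conditions just described (loss convergence, loss below a preset threshold, or a set number of iterations). The model interface (model.detect, model.loss, model.update) and the numeric defaults are assumptions for illustration, not part of the patent.

```python
import random

def train_detector(model, training_data, max_iters=10000,
                   loss_threshold=0.05, converge_eps=1e-4):
    """Sketch of the training loop with the preset end conditions above."""
    prev_loss = float("inf")
    for step in range(1, max_iters + 1):             # end condition: set iteration count
        image, boxes = random.choice(training_data)  # select a training image
        pred_boxes = model.detect(image)             # detect to obtain target frames
        loss = model.loss(pred_boxes, boxes)         # current loss vs. position labels
        model.update(loss)                           # back-propagate and update weights

        # End conditions: loss converged (difference between the last loss and
        # the current loss below a set value) or loss below the preset threshold.
        if abs(prev_loss - loss) < converge_eps or loss < loss_threshold:
            break
        prev_loss = loss
    return model
```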
It is to be understood that the above-mentioned object detection model may be a commonly used object detection model, such as: Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), or YOLO (You Only Look Once); the structure and loss function of the target detection model can be designed according to the scene requirements, so as to perform deep learning training.
Step 13: and classifying all the sub-detection results based on whether the same target to be detected exists in at least two sub-images to obtain at least two detection result sets.
After all the sub-images are detected with the target detection model, whether each detected target to be detected appears in at least two sub-images is judged to generate an overlap detection result. The overlap detection result either records, for each target to be detected, whether it appears in at least two sub-images, or comprises a plurality of subsets, where each subset contains the sub-images corresponding to one and the same target to be detected (i.e., the sub-images in a subset contain the same target). All the sub-detection results are then classified using the overlap detection result to generate at least two types of detection result sets, each detection result set containing sub-detection results of the same type.
Further, the specific number of detection result sets may be set according to the application requirements. For example, the at least two types of detection result sets may include two sets, a first detection result set and a second detection result set: the first detection result set contains the sub-images satisfying a first preset condition, namely that the sub-image shares an overlapping area with at least one of the remaining sub-images in the first detection result set (i.e., the sub-image and a remaining sub-image contain the same target to be detected); the second detection result set contains the sub-images satisfying a second preset condition, namely that the sub-image shares no overlapping area with any of the remaining sub-images in the second detection result set. Alternatively, the at least two types of detection result sets may include three sets: a first detection result set containing the sub-images satisfying a third preset condition, namely that the sub-image shares an overlapping area with exactly one remaining sub-image in the first detection result set; a second detection result set containing the sub-images satisfying the second preset condition; and a third detection result set containing the sub-images satisfying a fourth preset condition, namely that the sub-image shares an overlapping area with two remaining sub-images in the third detection result set. It is understood that the at least two types of detection result sets may further include four sets, which is not detailed here.
Step 14: and carrying out duplicate removal processing on the detection result set to obtain a target detection result.
After the detection result sets are obtained, at least one detection result set is processed with a de-duplication method to obtain a de-duplicated detection result set; the de-duplicated detection result sets and the detection result sets that did not require de-duplication are then fused to generate the target detection result, which is the position information of all targets to be detected in the image to be processed. For example, taking three detection result sets, denoted A1-A3: detection result set A1 and detection result set A2 may each be de-duplicated to generate detection result set B1 and detection result set B2, and then detection result set B1, detection result set B2, and detection result set A3 are superimposed to form one large detection result set (i.e., the target detection result).
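A minimal sketch of this fusion step, using the A1-A3 example above; dedup stands for whatever de-duplication routine is chosen (e.g., the Soft-NMS sketch later in this document), and the function name is hypothetical.

```python
def fuse_results(a1, a2, a3, dedup):
    """De-duplicate A1 and A2, keep A3 as-is, then superimpose all three."""
    b1 = dedup(a1)       # de-duplicated version of detection result set A1
    b2 = dedup(a2)       # de-duplicated version of detection result set A2
    return b1 + b2 + a3  # one large set: the target detection result
```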
The embodiment provides a target detection method based on a large-size aerial image, which adopts a preprocessing method of cutting the large-size aerial image into a plurality of sub-images, and then inputs each sub-image into a pre-trained target detection model to generate a sub-detection result; then classifying all sub-detection results by utilizing whether each target to be detected appears in different sub-images; carrying out duplicate removal processing on the classified detection result set to obtain a final target detection result; the large-size aerial image is cut, so that the size of the image processed by the target detection model is greatly reduced, the time spent by the target detection model for directly detecting the large-size aerial image can be shortened, the efficiency of target detection on the large-size aerial image is improved, and the method is convenient to apply to embedded equipment or mobile terminal equipment with limited computing power; moreover, when the image is cut, each target to be detected in the large-size aerial image is ensured to appear in at least one subimage, so that target missing detection is effectively avoided, and the accuracy of target detection on the large-size aerial image is improved.
Referring to fig. 2, fig. 2 is a schematic flowchart of another embodiment of a target detection method provided in the present application, the method including:
step 21: and acquiring an image to be processed.
Step 21 is the same as step 11 in the above embodiment, and is not described again here.
Step 22: and cutting the image to be processed by adopting a sliding window with a preset size and a preset step length to obtain a plurality of sub-images.
It is difficult to directly perform model inference on the large-size image to be processed, so the image to be processed is cut and split into a plurality of small-size sub-images, which are then separately fed into the target detection model for inference. Specifically, the image to be processed is cut using a sliding-window method from the related art; the size of the sliding window is the same as the size of a sub-image, i.e., the sub-image size is the preset size, and the sliding step is a preset step length that satisfies the preset condition.
In a specific embodiment, the sizes of the labeled targets in the training data can be counted and their size distribution analyzed to find the maximum size. Specifically, in an application scenario where the target frame of a labeled target is rectangular: the maximum of the diagonal lengths of all labeled targets in the training data is calculated to obtain a preset length; the difference between the length of the sub-image and the preset length is calculated; whether the preset step length is smaller than the difference is judged; and if the preset step length is smaller than the difference, it is determined that the preset step length satisfies the preset condition.
For example, suppose the total number of labeled targets in the training data is P, the length of the i-th target frame is w_i (i = 0, 1, 2, ..., P), and the width of the i-th target frame is h_i (i = 0, 1, 2, ..., P). The diagonal length of each target frame is calculated using the following formula:

z_i = √(w_i² + h_i²)   (1)
Assume that the cutting step size is Stride and that the size of the image to be processed is W × H × C, where W denotes the width of the image to be processed, H denotes its height, and C denotes its number of channels. The size of the sub-images to be cut out is determined according to the target distribution and the target sizes; assume the size of a sub-image is W′ × H′ × C′. The cutting operation is performed on the image to be processed with cutting step size Stride, where Stride needs to satisfy the following constraint:

Stride ≤ W′ - max(z_i)   (2)
This ensures that every object to be detected is fully contained in at least one sub-image, so objects distributed near the cutting boundaries of the sub-images are not missed, as shown in FIG. 3 and FIG. 4.
As can be seen from FIG. 3, when Stride > W′ - max(z_i), an object to be detected (Object) lying on the boundary of the two sub-images (sub-image 1 and sub-image 2) whose diagonal length is greater than the width of the overlap region is fully contained in neither sub-image 1 nor sub-image 2, cannot be detected from either, and is ultimately missed.

As can be seen from FIG. 4, when Stride ≤ W′ - max(z_i), three target distributions can occur, which are discussed separately below:

1) If the target to be detected (e.g., Object1) is located on the left border of sub-image 2, it lies within the cut area of sub-image 1 and will be detectable from sub-image 1.

2) If the target to be detected (e.g., Object2) is located in the overlapping area of sub-image 1 and sub-image 2, it lies within the cut areas of both sub-images and can be detected from both sub-image 1 and sub-image 2. In this case the target has two position detection results, expressed relative to the upper-left vertices of sub-image 1 and sub-image 2 respectively, and de-duplication is required when the object information is finally fused; the specific operation is described in the following scheme.

3) If the target to be detected (e.g., Object3) is located on the right border of sub-image 1, it lies within the cut area of sub-image 2 and can be detected from sub-image 2.
In conclusion, no matter which position the target to be detected is located, the target to be detected can be ensured to be detected at least once, and the condition that the target is missed to be detected is effectively avoided.
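Combining step 22 with the constraint of formula (2), the sliding-window cutting might look like the following Python sketch. It assumes a NumPy-style H × W × C array, uses formula (1) for the diagonal lengths, and deliberately simplifies border handling (the last window in each row/column is not re-aligned to the image edge); the function names are illustrative.

```python
import math

def max_diagonal(train_boxes):
    """Formula (1): the largest diagonal length over all labeled target
    frames, given as (width, height) pairs."""
    return max(math.sqrt(w * w + h * h) for (w, h) in train_boxes)

def crop_sub_images(image, sub_w, sub_h, stride, train_boxes):
    """Cut an H x W x C image into overlapping sub-images with a sliding
    window, enforcing the constraint of formula (2): Stride <= W' - max(z_i)."""
    z_max = max_diagonal(train_boxes)
    if stride > sub_w - z_max:
        raise ValueError("stride violates formula (2); targets may be missed")

    h, w = image.shape[:2]
    sub_images = []  # list of ((col, row) grid index, sub-image) pairs
    for row, top in enumerate(range(0, h - sub_h + 1, stride)):
        for col, left in enumerate(range(0, w - sub_w + 1, stride)):
            sub_images.append(((col, row),
                               image[top:top + sub_h, left:left + sub_w]))
    return sub_images
```

The grid index recorded for each sub-image is what the later coordinate conversion (step 24) needs to map detections back to the large image.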
Step 23: and carrying out target detection processing on the sub-images to obtain sub-detection results.
Step 23 is the same as step 12 of the previous embodiment, and is not described again here.
After the detection of all the sub-images is completed, since each sub-detection result is independent of the others, the sub-detection results of the sub-images need to be fused and output as the target detection result of the image to be processed, as described in detail below.
Step 24: the first sub-location information is converted into second sub-location information.
The sub-detection result includes first sub-position information. The first sub-position information of each sub-image is the detection result in the sub-image's own coordinate system: the position of the target to be detected is expressed relative to the upper-left vertex of that sub-image. It therefore needs to be converted into position information relative to the large image (i.e., the image to be processed); that is, the first sub-position information undergoes a coordinate conversion to generate second sub-position information, which is the position, in the image to be processed, of the target to be detected that corresponds to the first sub-position information.
In a specific embodiment, the adopted plane rectangular coordinate system is shown in FIG. 5; since the detection frame is rectangular, only the upper-left coordinates (x0, y0) and the lower-right coordinates (x1, y1) of the target to be detected need to be determined.
Assuming that the image size of the image to be processed is W × H × C, the image size of a sub-image is W′ × H′ × C′, and the cutting step is Stride, the number of sub-images obtained by cutting along the X-axis direction is:

nx = INT((W - W′) / Stride) + 1   (3)

where INT denotes a rounding-down operation. The number of sub-images obtained by cutting along the Y-axis direction is:

ny = INT((H - H′) / Stride) + 1   (4)
Assume that the position of the object to be detected in a sub-image is denoted [(x0, y0), (x1, y1)]. For the object to be detected in each sub-image, its position in the large image, denoted [(xmin, ymin), (xmax, ymax)], is calculated as follows:

xmin = i × Stride + x0   (i = 0, 1, 2, ..., nx)   (5)

ymin = j × Stride + y0   (j = 0, 1, 2, ..., ny)   (6)

xmax = i × Stride + x1   (i = 0, 1, 2, ..., nx)   (7)

ymax = j × Stride + y1   (j = 0, 1, 2, ..., ny)   (8)

where i and j are the column and row indices of the sub-image in the sliding-window grid.
Finally, the position information of all targets to be detected relative to the large image is obtained through the above operations; the position information is unified, being calculated with the upper-left corner of the whole large image as the origin. The detection frame of each object to be detected may be marked and assigned an object identifier (id) to distinguish the different detection frames.
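A small Python sketch of the coordinate conversion of formulas (5)-(8); (col, row) is the sliding-window grid index of the sub-image the detection came from (col in 0..nx, row in 0..ny), and the names are illustrative.

```python
def to_global(box, col, row, stride):
    """Convert a detection box [(x0, y0), (x1, y1)], expressed relative to the
    sub-image's upper-left vertex, into coordinates of the whole large image."""
    (x0, y0), (x1, y1) = box
    x_min = col * stride + x0  # formula (5)
    y_min = row * stride + y0  # formula (6)
    x_max = col * stride + x1  # formula (7)
    y_max = row * stride + y1  # formula (8)
    return [(x_min, y_min), (x_max, y_max)]
```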
Step 25: and judging whether the at least two sub-images have the same target to be detected or not based on the second sub-position information.
The position of the target to be detected in each sub-image can thus be converted into position information relative to the large image. For the large image, once the cutting step length is determined, the position of each overlapping area in the large image is determined and can be calculated. Therefore, when classifying the targets to be detected in the sub-images, it can be judged whether the second sub-position information corresponding to a target falls in an overlapping area. If it does, the target to be detected is located in an overlapping area, and it can be determined that the same target exists in at least two sub-images; if it does not, the target is not located in an overlapping area, and it is determined that the target appears in only one sub-image.
In other embodiments, the similarity between the second sub-position information corresponding to the target to be detected in the sub-image and the second sub-position information corresponding to the target to be detected in other sub-images can be calculated; judging whether the similarity is smaller than a preset similarity or not; and if the similarity is smaller than the preset similarity, determining that the at least two sub-images have the same target to be detected. Or judging whether the sub-image corresponding to the second sub-position information has an overlapping area with the sub-images corresponding to other second sub-position information; and if the sub-images corresponding to the second sub-position information and the sub-images corresponding to other second sub-position information have an overlapping area, determining that the at least two sub-images have the same target to be detected.
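The overlap-area judgment of step 25 might be implemented as follows. This is an assumption-laden sketch: it treats any intersection between a converted box and an overlap strip as falling in the overlapping area, and it ignores image-border effects.

```python
def in_overlap_region(box, stride, sub_w, sub_h, img_w, img_h):
    """Check whether a large-image-coordinate box [(x_min, y_min), (x_max, y_max)]
    intersects an overlap strip between neighbouring sub-images.

    With window size W' x H' and step Stride, the horizontal overlap strips
    are [k*Stride, k*Stride + (W' - Stride)) for k = 1, 2, ...; the vertical
    strips are analogous.
    """
    (x_min, y_min), (x_max, y_max) = box
    ov_w, ov_h = sub_w - stride, sub_h - stride

    def hits_strip(lo, hi, step, width, limit):
        k = 1
        while k * step < limit:
            s0, s1 = k * step, k * step + width
            if lo < s1 and hi > s0:  # interval intersection test
                return True
            k += 1
        return False

    return (hits_strip(x_min, x_max, stride, ov_w, img_w)
            or hits_strip(y_min, y_max, stride, ov_h, img_h))
```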
Step 26: and if the same target to be detected exists in at least two sub-images, putting the sub-detection result corresponding to the target to be detected into the first detection result set.
The at least two detection result sets comprise a first detection result set and a second detection result set, and if a certain target to be detected appears in at least two sub-images, the sub-detection results of the sub-images comprising the target to be detected are all put into the first detection result set.
Step 27: and if the same target to be detected does not appear in the at least two sub-images, putting the sub-detection results corresponding to the target to be detected into a second detection result set.
Through the processing of steps 25 to 27, all targets to be detected are classified into two categories: overlap-area targets and non-overlap-area targets. Overlap-area targets are targets located in the overlapping areas of different sub-images, such as targets similar to Object2 depicted in FIG. 4; their target frame results (i.e., sub-detection results) are saved in the first detection result set for subsequent de-duplication. Non-overlap-area targets are targets not located in any overlapping area of the sub-images, such as targets similar to Object3 depicted in FIG. 4; their sub-detection results are saved in the second detection result set.
Step 28: and carrying out duplicate removal processing on the first detection result set to obtain a third detection result set.
Compared with a general de-duplication operation, this embodiment does not perform de-duplication on the entire image to be processed, but performs the non-maximum suppression operation only on the overlapping areas of different sub-images, for the following reason: the targets to be detected in the image to be processed may be densely distributed; for example, when an image of a parking lot shot by a high-altitude unmanned aerial vehicle is cut, the number of sub-images generated by cutting is large, so the number of detected targets to be detected is also large. The core idea of the non-maximum suppression algorithm is iterate-traverse-eliminate, so the whole process is time-consuming. By reducing the area that needs de-duplication, this embodiment effectively reduces the post-processing time of the target detection model and improves the real-time performance index.
The targets to be detected located in the overlapping areas of different sub-images each have a unique id and are separately saved in the first detection result set. Since the commonly used Non-Maximum Suppression (NMS) algorithm tends to remove low-scoring targets when processing dense targets, causing target loss and reducing the average detection rate of the algorithm, this embodiment uses Soft-NMS for de-duplication. Specifically, the first detection result set includes the detection frames of at least two targets to be detected, and Soft Non-Maximum Suppression (Soft-NMS) is used to de-duplicate all detection frames in the first detection result set, removing repeatedly detected position information and yielding a third detection result set, i.e., the set of detection frames in the overlapping region after duplicate removal. For example, FIG. 6 is a schematic diagram of removing duplicate detection frames: Object1 and Object2 are targets to be detected, and Soft-NMS is used to de-duplicate the detection frames in the overlapping region, where frames H2 and H3 are detection frames generated by detecting sub-image 1, and frames H1 and H4 are detection frames generated by detecting sub-image 2.
Further, the implementation principle of Soft-NMS is shown in FIG. 7, where B represents the set of initial detection box results, S represents the detection confidence scores, Nt represents the Intersection over Union (IoU) threshold, M represents the detection box with the highest score, and f represents the Gaussian weight function, whose mathematical expression is as follows:

f(iou(M, b_i)) = exp(-iou(M, b_i)² / σ)   (9)
the Soft-NMS algorithm multiplies the confidence score of the current detection box by a weighting function, the weighting function will attenuate the score of the adjacent detection box overlapping with the highest scoring detection box M, the more the attenuation of the score of the detection box highly overlapping with the detection box M is, the specific principle of Soft-NMS is the same as in the related art, and will not be described in detail herein.
Step 29: and fusing the second detection result set and the third detection result set to obtain a target detection result.
The set of sub-detection results for the overlapping region after de-duplication obtained in step 28 and the set of sub-detection results for the non-overlapping region obtained in step 27 are added together and output as the target detection result of the entire large image.
In the preprocessing method of cutting the large image into sub-images in this embodiment, the size distribution of the target frames of labeled targets in the training data is fully considered when devising the cutting strategy: the sizes of the labeled target frames in the training data are used to estimate the sizes of the detection frames of targets to be detected in the image to be processed, so a reasonable cutting step length can be chosen that prevents a target to be detected from lying on the cutting line of the sub-images, effectively avoiding missed detections. In addition, this embodiment narrows the post-processing area, performing the non-maximum suppression de-duplication only on the overlapping areas of the sub-images, which reduces the amount of calculation, effectively shortens the post-processing time of the target detection model, and improves the efficiency of target detection on large images.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of the object detection apparatus provided in the present application, an object detection apparatus 80 includes a memory 81 and a processor 82 connected to each other, the memory 81 is used for storing a computer program, and the computer program is used for implementing the object detection method in the foregoing embodiment when being executed by the processor 82.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium 90 provided in the present application, where the computer-readable storage medium 90 is used for storing a computer program 91, and the computer program 91 is used for implementing the object detection method in the foregoing embodiment when being executed by a processor.
The computer-readable storage medium 90 may be a server, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various media capable of storing program codes.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules or units is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A method of object detection, comprising:
acquiring an image to be processed, wherein the image to be processed comprises at least one target to be detected;
splitting the image to be processed into a plurality of sub-images, and performing target detection processing on the sub-images by adopting a target detection model to obtain a sub-detection result, wherein at least two sub-images have an overlapping area, and each target to be detected exists in at least one sub-image;
classifying all the sub-detection results based on whether the same target to be detected exists in at least two sub-images to obtain at least two detection result sets, wherein the detection result sets comprise the sub-detection results;
and performing duplicate removal processing on the detection result set to obtain a target detection result, wherein the target detection result is the position information of all targets to be detected in the image to be processed.
2. The object detection method of claim 1, wherein the step of splitting the image to be processed into a plurality of sub-images comprises:
cutting the image to be processed by adopting a sliding window with a preset size and a preset step length to obtain a plurality of sub-images;
and the size of the subimage is the preset size, and the preset step length meets the preset condition.
3. The object detection method of claim 2, further comprising:
calculating the maximum value of the diagonal lengths of all the marked targets in the training data to obtain a preset length;
calculating a difference between the length of the sub-image and the preset length;
judging whether the preset step length is smaller than the difference value or not;
if so, determining that the preset step length meets the preset condition.
4. The object detection method of claim 2, wherein the sub-detection result comprises first sub-location information, the method further comprising:
converting the first sub-position information into second sub-position information, wherein the second sub-position information is position information of a target to be detected corresponding to the first sub-position information in the image to be processed;
and judging whether the at least two sub-images have the same target to be detected or not based on the second sub-position information.
5. The target detection method according to claim 4, wherein the step of determining whether the same target to be detected exists in at least two sub-images based on the second sub-position information comprises:
judging whether second sub-position information corresponding to the target to be detected in the sub-images falls in the overlapping area; and if so, determining that the at least two sub-images have the same target to be detected.
6. The target detection method of claim 2, wherein the at least two types of detection result sets include a first detection result set and a second detection result set, and the step of performing de-duplication processing on the detection result set to obtain the target detection result comprises:
carrying out duplicate removal processing on the first detection result set to obtain a third detection result set;
and fusing the second detection result set and the third detection result set to obtain the target detection result.
7. The target detection method according to claim 6, wherein the step of classifying all the sub-detection results based on whether the same target to be detected exists in at least two of the sub-images to obtain at least two types of detection result sets comprises:
judging whether the at least two sub-images have the same target to be detected or not;
if so, putting the sub-detection result corresponding to the target to be detected into the first detection result set;
and if not, putting the sub-detection result corresponding to the target to be detected into the second detection result set.
8. The target detection method according to claim 6, wherein the first detection result set includes at least two detection frames of the target to be detected, and the step of performing deduplication processing on the first detection result set to obtain a third detection result set includes:
and performing duplicate removal treatment on all the detection frames by adopting a softening non-maximum value inhibition method to obtain the third detection result set.
9. An object detection apparatus, comprising a memory and a processor connected to each other, wherein the memory is configured to store a computer program, which when executed by the processor is configured to implement the object detection method of any one of claims 1-8.
10. A computer-readable storage medium for storing a computer program, the computer program, when being executed by a processor, is adapted to carry out the object detection method of any one of claims 1 to 8.
CN202111478778.XA 2021-12-06 2021-12-06 Target detection method, target detection device and computer readable storage medium Pending CN114419428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111478778.XA CN114419428A (en) 2021-12-06 2021-12-06 Target detection method, target detection device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN114419428A true CN114419428A (en) 2022-04-29

Family

ID=81265143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111478778.XA Pending CN114419428A (en) 2021-12-06 2021-12-06 Target detection method, target detection device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114419428A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880672A (en) * 2023-02-08 2023-03-31 中国第一汽车股份有限公司 Target detection method, device, storage medium and equipment


Similar Documents

Publication Publication Date Title
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
US20190311223A1 (en) Image processing methods and apparatus, and electronic devices
CN108388879B (en) Target detection method, device and storage medium
US9552642B2 (en) Apparatus and method for tracking object using feature descriptor, and apparatus and method for removing garbage feature
WO2018166116A1 (en) Car damage recognition method, electronic apparatus and computer-readable storage medium
CN111507965A (en) Novel coronavirus pneumonia focus detection method, system, device and storage medium
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN111914835A (en) Bill element extraction method and device, electronic equipment and readable storage medium
CN111310746B (en) Text line detection method, model training method, device, server and medium
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112906794A (en) Target detection method, device, storage medium and terminal
CN112036400B (en) Method for constructing network for target detection and target detection method and system
CN111767915A (en) License plate detection method, device, equipment and storage medium
Gu et al. Embedded and real-time vehicle detection system for challenging on-road scenes
CN114169425B (en) Training target tracking model and target tracking method and device
CN115797336A (en) Fault detection method and device of photovoltaic module, electronic equipment and storage medium
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114419428A (en) Target detection method, target detection device and computer readable storage medium
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN115131826B (en) Article detection and identification method, and network model training method and device
CN111144361A (en) Road lane detection method based on binaryzation CGAN network
CN113065379A (en) Image detection method and device fusing image quality and electronic equipment
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
CN114842482A (en) Image classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination