WO2020258793A1 - Target detection and target detection network training - Google Patents
Target detection and target detection network training
- Publication number
- WO2020258793A1 (PCT/CN2019/128383)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bounding box
- target
- candidate
- foreground
- network
- Prior art date
Classifications
- G06T7/11: Region-based segmentation
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06T7/12: Edge-based segmentation
- G06T7/187: Segmentation or edge detection involving region growing, region merging or connected component labelling
- G06T7/194: Segmentation or edge detection involving foreground-background segmentation
- G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/267: Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/13: Satellite images
- G06V20/40: Scene-specific elements in video content
- G06T2207/10016: Video; image sequence
- G06T2210/12: Bounding box
Definitions
- the present disclosure relates to the field of image processing technology, and in particular to a method, device and equipment for target detection and target detection network training.
- Target detection is an important problem in computer vision, especially the detection of military targets such as airplanes and ships. Because the images are large while the targets are small, such targets are difficult to detect, and the detection accuracy is low for densely arranged targets such as ships.
- the embodiments of the present disclosure provide a method, device and equipment for target detection and target detection network training.
- a target detection method including:
- Obtaining the target detection result of the input image according to the plurality of candidate bounding boxes and the foreground segmentation result includes: selecting at least one target bounding box from the plurality of candidate bounding boxes according to the overlapping area between each candidate bounding box and the foreground image area corresponding to the foreground segmentation result; and obtaining the target detection result of the input image based on the at least one target bounding box.
- Selecting the at least one target bounding box includes: for each candidate bounding box of the plurality of candidate bounding boxes, if the proportion of the overlapping area between the candidate bounding box and the corresponding foreground image area in the candidate bounding box is greater than a first threshold, using the candidate bounding box as a target bounding box.
- The at least one target bounding box may include a first bounding box and a second bounding box.
- Obtaining the target detection result of the input image based on the at least one target bounding box includes: determining an overlap parameter of the first bounding box and the second bounding box based on the angle between the first bounding box and the second bounding box; and determining the target object positions corresponding to the first bounding box and the second bounding box based on the overlap parameter of the first bounding box and the second bounding box.
- Determining the overlap parameter of the first bounding box and the second bounding box based on the angle between them includes: obtaining an angle factor according to the angle between the first bounding box and the second bounding box; and obtaining the overlap parameter according to the angle factor and the intersection-over-union between the first bounding box and the second bounding box.
- The overlap parameter of the first bounding box and the second bounding box is the product of the intersection-over-union and the angle factor, where the angle factor increases as the angle between the first bounding box and the second bounding box increases.
- The overlap parameter of the first bounding box and the second bounding box increases as the angle between the two bounding boxes increases.
- Determining the target object positions corresponding to the first bounding box and the second bounding box based on their overlap parameter includes: when the overlap parameter of the first bounding box and the second bounding box is greater than a second threshold, using one of the first bounding box and the second bounding box as the target object position.
- Using one of the first bounding box and the second bounding box as the target object position includes: determining the overlap parameter between the first bounding box and the foreground image area corresponding to the foreground segmentation result and the overlap parameter between the second bounding box and that foreground image area; and using, of the first bounding box and the second bounding box, the one with the larger overlap parameter with the foreground image area as the target object position.
- Determining the target object positions corresponding to the first bounding box and the second bounding box based on their overlap parameter also includes: when the overlap parameter of the first bounding box and the second bounding box is less than or equal to the second threshold, using both the first bounding box and the second bounding box as target object positions.
- the aspect ratio of the target object to be detected in the input image is greater than a specific value.
- A method for training a target detection network, where the target detection network includes a feature extraction network, a target prediction network, and a foreground segmentation network.
- the method includes:
- A sample foreground segmentation result of the sample image is obtained through the foreground segmentation network, where the sample foreground segmentation result includes indication information indicating whether each of a plurality of pixels of the sample image belongs to the foreground; a network loss value is determined according to a plurality of sample candidate bounding boxes, the sample foreground segmentation result, and the annotation information of the sample image; and network parameters of the target detection network are adjusted based on the network loss value.
- The annotation information includes the true bounding box of at least one target object contained in the sample image, and determining the network loss value according to the plurality of sample candidate bounding boxes, the sample foreground segmentation result, and the annotation information of the sample image includes: for each candidate bounding box of the plurality of sample candidate bounding boxes, determining the intersection-over-union between the candidate bounding box and each true target bounding box annotated in the sample image; and determining a first network loss value according to the intersection-over-union determined for each of the plurality of candidate bounding boxes.
- The intersection-over-union between a candidate bounding box and a true target bounding box is obtained based on the circumscribed circles of the candidate bounding box and of the true target bounding box.
- the weight corresponding to the width of the candidate bounding box is higher than the weight corresponding to the length of the candidate bounding box.
- Obtaining the sample foreground segmentation result of the sample image through the foreground segmentation network according to the feature data includes: performing up-sampling on the feature data so that the size of the processed feature data is the same as the size of the sample image; and performing pixel segmentation based on the processed feature data to obtain the sample foreground segmentation result of the sample image.
- the aspect ratio of the target object included in the sample image is higher than a set value.
- a target detection device including:
- The feature extraction unit is configured to obtain feature data of the input image; the target prediction unit is configured to determine multiple candidate bounding boxes of the input image according to the feature data; the foreground segmentation unit is configured to obtain the foreground segmentation result of the input image, where the foreground segmentation result includes indication information indicating whether each of a plurality of pixels of the input image belongs to the foreground; and the target determination unit is configured to obtain the target detection result of the input image according to the multiple candidate bounding boxes and the foreground segmentation result.
- A training device for a target detection network, where the target detection network includes a feature extraction network, a target prediction network, and a foreground segmentation network.
- the device includes:
- the feature extraction unit is configured to perform feature extraction on the sample image through the feature extraction network to obtain feature data of the sample image;
- the target prediction unit is configured to obtain a plurality of sample candidate bounding boxes through the target prediction network according to the feature data;
- the foreground segmentation unit is configured to obtain a sample foreground segmentation result of the sample image through the foreground segmentation network according to the feature data, where the sample foreground segmentation result includes indication information indicating whether each of a plurality of pixels of the sample image belongs to the foreground;
- the loss value determining unit is configured to determine a network loss value according to the plurality of sample candidate bounding boxes, the sample foreground segmentation result, and the annotation information of the sample image;
- the parameter adjustment unit is configured to adjust the network parameters of the target detection network based on the network loss value.
- In a fifth aspect, a target detection device includes a memory and a processor, where the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the target detection method described above when executing the computer instructions.
- a training device for a target detection network includes a memory and a processor.
- the memory is configured to store computer instructions that can run on the processor.
- the processor is configured to implement the training method of the above-mentioned target detection network when executing the computer instructions.
- The target detection and target detection network training methods, devices, and equipment described above determine multiple candidate bounding boxes according to the feature data of the input image and obtain a foreground segmentation result from the same feature data.
- Combining the multiple candidate bounding boxes with the foreground segmentation result allows the detected target object to be determined more accurately.
- Fig. 1 is a flowchart of a target detection method shown in an embodiment of the present application.
- Fig. 2 is a schematic diagram of a target detection method shown in an embodiment of the present application.
- FIG. 3A and FIG. 3B are respectively a diagram showing the detection result of a ship according to an exemplary embodiment of the present application.
- Figure 4 is a schematic diagram of a target bounding box in the related art.
- Figs. 5A and 5B are schematic diagrams of overlap parameter calculation methods according to exemplary embodiments of the present application.
- Fig. 6 is a flowchart of a method for training a target detection network shown in an embodiment of the present application.
- FIG. 7 is a schematic diagram of a method for calculating the intersection ratio shown in an embodiment of the present application.
- Fig. 8 is a network structure diagram of a target detection network shown in an embodiment of the present application.
- Fig. 9 is a schematic diagram of a method for training a target detection network shown in an embodiment of the present application.
- FIG. 10 is a flowchart of a method for predicting a candidate bounding box shown in an embodiment of the present application.
- FIG. 11 is a schematic diagram of an anchor point frame shown in an embodiment of the present application.
- Fig. 12 is a flowchart of a method for predicting a foreground image area according to an exemplary embodiment of the present application.
- Fig. 13 is a schematic structural diagram of a target detection device shown in an exemplary embodiment of the present application.
- Fig. 14 is a schematic structural diagram of a training device for a target detection network according to an exemplary embodiment of the present application.
- Fig. 15 is a structural diagram of a target detection device shown in an exemplary embodiment of the present application.
- Fig. 16 is a structural diagram of a training device for a target detection network shown in an exemplary embodiment of the present application.
- Fig. 1 shows a target detection method, which may include the following steps.
- In step 101, feature data of the input image (for example, a feature map) is obtained.
- the input image may be a remote sensing image.
- the remote sensing image may be an image obtained by the electromagnetic radiation characteristic signal of the ground object detected by a sensor mounted on, for example, an artificial satellite or an aerial photography aircraft.
- the input image may also be other types of images and is not limited to remote sensing images.
- The feature data of the input image may be extracted through a feature extraction network, such as a convolutional neural network; the specific structure of the feature extraction network is not limited in the embodiments of the present disclosure.
- the extracted feature data is multi-channel feature data, and the size of the feature data and the number of channels are determined by the specific structure of the feature extraction network.
- The feature data of the input image may also be obtained from another device, for example by receiving feature data sent by a terminal, which is not limited in the embodiments of the present disclosure.
- In step 102, a plurality of candidate bounding boxes of the input image are determined according to the feature data.
- Determining a candidate bounding box includes obtaining its parameter information; the parameters may include one or any combination of the length, width, center point coordinates, and angle of the candidate bounding box.
- In step 103, a foreground segmentation result of the input image is obtained according to the feature data, where the foreground segmentation result includes indication information indicating whether each of a plurality of pixels of the input image belongs to the foreground.
- The foreground segmentation result obtained from the feature data includes, for each of the multiple pixels of the input image, the probability that the pixel belongs to the foreground and/or the background; it therefore provides a pixel-level prediction.
- In step 104, a target detection result of the input image is obtained according to the multiple candidate bounding boxes and the foreground segmentation result.
- the multiple candidate bounding boxes determined according to the feature data of the input image and the foreground segmentation result obtained through the feature data have a corresponding relationship.
- the multiple candidate bounding boxes are mapped to the foreground segmentation result.
- the target detection result may include information such as the position and number of target objects included in the input image.
- At least one target bounding box may be selected from the plurality of candidate bounding boxes according to the overlapping area between each candidate bounding box and the foreground image area corresponding to the foreground segmentation result, and the target detection result of the input image is obtained based on the at least one target bounding box.
- A candidate bounding box whose overlapping area with the foreground image area accounts for a proportion of the entire candidate bounding box greater than a first threshold may be used as a target bounding box.
- the present disclosure does not limit the specific value of the first threshold, which can be determined according to actual needs.
- The target detection method of the embodiments of the present disclosure is applicable to target objects with a large aspect ratio, such as military targets including airplanes, ships, and vehicles.
- A large aspect ratio means that the aspect ratio is greater than a certain value, for example, greater than 5.
- the specific value can be specifically determined according to the detection target.
- the target object may be a ship.
- the following takes the input image as a remote sensing image and the detection target as a ship as an example to illustrate the target detection process. Those skilled in the art should understand that the target detection method can also be applied to other target objects. Refer to the schematic diagram of the target detection method shown in FIG. 2.
- the above-mentioned feature data is input into the first branch (upper branch 230 in FIG. 2) and the second branch (lower branch 240 in FIG. 2) respectively, and the following processing is performed respectively.
- a confidence score is generated for each anchor box.
- the confidence score is related to the probability that the anchor point frame is the foreground or the background. For example, the higher the probability that the anchor point frame is the foreground, the higher the confidence score.
- the anchor box is a rectangular box based on prior knowledge.
- the specific implementation method of the anchor box can be referred to the description in the subsequent training target detection network, which is not detailed here.
- The anchor box can be evaluated as a whole to calculate the probability that it belongs to the foreground or the background, that is, to predict whether the anchor box contains an object or a specific target. If the anchor box contains an object or a specific target, the anchor box is judged as foreground.
- The anchor boxes judged as foreground can be selected as foreground anchor boxes.
- By predicting the offsets of the foreground anchor boxes, the candidate bounding boxes can be obtained, and the parameters of the candidate bounding boxes can be derived from these offsets.
- the anchor point frame may include direction information, and various aspect ratios may be set to cover the target object to be detected.
- the number of specific directions and the value of the aspect ratio can be set according to actual needs.
- The constructed anchor boxes correspond to 6 directions, where w represents the width of the anchor box, l represents its length, θ represents its angle (the rotation angle of the anchor box relative to the horizontal direction), and (x, y) represents the coordinates of its center point.
- θ may take the values 0°, 30°, 60°, 90°, -30°, and -60°.
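- As a non-authoritative illustration of how such rotated anchor boxes could be enumerated, the sketch below generates anchors with the six angles listed above at a single anchor point; the base size and the aspect-ratio values are assumptions, not values taken from the patent.

```python
import numpy as np

def make_rotated_anchors(cx, cy, base_size=16,
                         aspect_ratios=(5.0, 7.0),            # assumed values; only "large aspect ratio" is specified
                         angles_deg=(0, 30, 60, 90, -30, -60)):
    """Enumerate rotated anchor boxes (x, y, w, l, theta) centred at one anchor point."""
    anchors = []
    for ratio in aspect_ratios:
        w = base_size                  # width of the anchor box
        l = base_size * ratio          # length; l/w is large for ship-like targets
        for theta in angles_deg:
            anchors.append((cx, cy, w, l, float(theta)))
    return np.array(anchors, dtype=np.float32)

# Anchors at an anchor point mapped to image coordinates (100, 80):
print(make_rotated_anchors(100, 80).shape)   # (12, 5): 2 aspect ratios x 6 angles
```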
- Overlapping detection boxes may be further removed by non-maximum suppression (NMS). For example, all candidate bounding boxes are first traversed and the one with the highest confidence score is selected; the remaining candidate bounding boxes are then traversed, and any box whose intersection-over-union (IoU) with the current highest-scoring box is greater than a certain threshold is deleted. Next, the highest-scoring box among the unprocessed candidate bounding boxes is selected and the process is repeated. After several iterations, the boxes that were never suppressed are retained as the determined candidate bounding boxes. Taking FIG. 2 as an example, after NMS processing, three candidate bounding boxes labeled 1, 2, and 3 in the candidate bounding box diagram 231 are obtained.
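- A minimal sketch of the greedy NMS procedure described above is given below; the IoU function is passed in as a parameter so that any rotated-box IoU (for example, the circumscribed-circle approximation introduced later) can be plugged in. The threshold value and box parameterisation are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_fn, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 5) array of (x, y, w, l, theta) candidate bounding boxes
    scores: (N,) confidence scores
    iou_fn: callable(box_a, box_b) -> IoU between two boxes
    """
    order = np.argsort(scores)[::-1]              # highest confidence first
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for idx in order:
        if suppressed[idx]:
            continue
        keep.append(int(idx))                     # retain the current highest-scoring box
        for other in order:
            if other != idx and not suppressed[other] \
                    and iou_fn(boxes[idx], boxes[other]) > iou_thresh:
                suppressed[other] = True          # overlaps too much with a retained box
    return keep
```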
- In the second branch, the probability of each pixel in the input image being foreground or background is predicted from the feature data, and a pixel-level foreground segmentation result 241 is generated by taking pixels whose foreground probability is higher than a set value as foreground pixels.
- The candidate bounding boxes can be mapped onto the pixel segmentation result, and the target bounding boxes can be determined according to the overlapping area between each candidate bounding box and the foreground image area corresponding to the foreground segmentation result. For example, a candidate bounding box whose overlapping area accounts for a proportion of the entire candidate bounding box greater than a first threshold may be used as a target bounding box.
- Taking Figure 2 as an example, the three candidate bounding boxes labeled 1, 2, and 3 are mapped onto the foreground segmentation result, and the proportion of the overlapping area between each candidate bounding box and the foreground image area in the entire candidate bounding box can be calculated.
- For candidate bounding box 1 the ratio is 92%, for candidate bounding box 2 the ratio is 86%, and for candidate bounding box 3 the ratio is 65%.
- If the first threshold is 70%, candidate bounding box 3 is excluded as a possible target bounding box, and the target bounding boxes are candidate bounding box 1 and candidate bounding box 2.
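- The filtering step in this example can be sketched as follows. The sketch rasterises each rotated candidate box into a binary mask with OpenCV and keeps boxes whose overlap with the foreground mask exceeds the first threshold (70% in the example above); the use of cv2 and the angle convention are implementation assumptions.

```python
import numpy as np
import cv2

def box_mask(box, shape):
    """Rasterise a rotated box (cx, cy, w, l, theta_deg) into a binary mask of the given (H, W) shape."""
    cx, cy, w, l, theta = box
    pts = cv2.boxPoints(((cx, cy), (w, l), theta)).astype(np.int32)
    mask = np.zeros(shape, dtype=np.uint8)
    cv2.fillPoly(mask, [pts], 1)
    return mask.astype(bool)

def keep_boxes_on_foreground(boxes, foreground, first_threshold=0.7):
    """Keep candidate boxes whose overlap with the foreground area exceeds the first threshold."""
    kept = []
    for box in boxes:
        m = box_mask(box, foreground.shape)
        ratio = (m & foreground).sum() / max(m.sum(), 1)   # overlapping area / box area
        if ratio > first_threshold:                        # e.g. 0.92 and 0.86 pass, 0.65 is rejected
            kept.append(box)
    return kept
```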
- The output target bounding boxes may still overlap: if the suppression threshold is set too high, overlapping candidate bounding boxes may not be suppressed, so the final output target bounding boxes may also include overlapping boxes.
- the embodiment of the present disclosure may determine the final target object by the following method.
- The method is not limited to processing two overlapping bounding boxes; multiple overlapping bounding boxes can also be processed by first handling two of them and then handling the retained box together with each remaining bounding box.
- The overlap parameter of the first bounding box and the second bounding box is determined based on the angle between the first bounding box and the second bounding box; then, based on the overlap parameter of the first bounding box and the second bounding box, the target object positions corresponding to the first bounding box and the second bounding box are determined.
- When two target objects are densely arranged, their target bounding boxes (the first bounding box and the second bounding box) may overlap, but in this case the intersection of the first bounding box and the second bounding box is smaller than when both boxes cover the same object. Therefore, the present disclosure uses the overlap parameter of the first bounding box and the second bounding box to determine whether the objects detected in the two bounding boxes are both target objects.
- the overlap parameter when the overlap parameter is greater than the second threshold, it means that there may be only one target object in the first bounding box and the second bounding box, so one of the bounding boxes is used as the target object position. Since the result of foreground segmentation includes the pixel-level foreground image area, the foreground image area can be used to determine which bounding box to keep as the bounding box of the target object. For example, the first overlap parameter between the first bounding box and the corresponding foreground image area and the second overlap parameter between the second bounding box and the corresponding foreground image area can be calculated separately, and the first overlap parameter and the second overlap parameter can be compared. The target bounding box corresponding to the larger value is determined as the target object, and the target bounding box corresponding to the smaller value is removed. Through the above method, two or more bounding boxes overlapping on a target object are removed.
- When the overlap parameter of the first bounding box and the second bounding box is less than or equal to the second threshold, both the first bounding box and the second bounding box are used as target object positions.
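- A sketch of this decision logic is shown below; it reuses box_mask from the previous sketch and takes the angle-aware overlap parameter (described in the following paragraphs) as a function argument. The second threshold of 0.3 is the value used in the example that follows.

```python
def foreground_ratio(box, foreground):
    """Proportion of a box covered by the foreground mask (uses box_mask from the sketch above)."""
    m = box_mask(box, foreground.shape)
    return (m & foreground).sum() / max(m.sum(), 1)

def resolve_pair(box_a, box_b, foreground, overlap_param_fn, second_threshold=0.3):
    """Decide whether two overlapping target bounding boxes mark one object or two."""
    if overlap_param_fn(box_a, box_b) <= second_threshold:
        return [box_a, box_b]                   # likely two different objects: keep both
    # Likely the same object: keep the box that agrees better with the foreground mask
    if foreground_ratio(box_a, foreground) >= foreground_ratio(box_b, foreground):
        return [box_a]
    return [box_b]
```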
- the following example illustrates the process of determining the final target object.
- Suppose bounding boxes A and B are ship detection results, where bounding box A and bounding box B overlap and the overlap parameter of the two is calculated to be 0.1, and the second threshold is 0.3.
- Mapping the bounding boxes onto the pixel segmentation result shows that bounding box A and bounding box B correspond to different ships.
- Since the overlap parameter of the two bounding boxes is less than the second threshold, no additional step of mapping the bounding boxes onto the pixel segmentation result is actually required; the mapping here is only for verification.
- Suppose bounding boxes C and D are the detection results for another ship, where bounding box C and bounding box D overlap and the overlap parameter of the two is calculated to be 0.8, which is greater than the second threshold of 0.3. Based on this overlap parameter, it can be determined that bounding box C and bounding box D are actually bounding boxes of the same ship. In this case, by mapping bounding box C and bounding box D onto the pixel segmentation result, the corresponding foreground image area can be used to further determine the final target object: the first overlap parameter between bounding box C and the foreground image area and the second overlap parameter between bounding box D and the foreground image area are calculated.
- If the first overlap parameter has the larger value, bounding box C is determined to contain the ship, bounding box D corresponding to the second overlap parameter is removed, and bounding box C is finally output as the target bounding box of the ship.
- In this way, the foreground image area corresponding to the pixel segmentation result is used to assist in determining the target object among overlapping bounding boxes. Since the pixel segmentation result corresponds to a pixel-level foreground image area with high spatial accuracy, using the overlap parameter between each overlapping bounding box and the foreground image area to further determine the target bounding box containing the target object improves the accuracy of target detection.
- In the related art, the anchor box used is usually a rectangular box without an angle parameter. For target objects with a large aspect ratio, such as ships, when the target object is inclined, the target bounding box determined from such anchor boxes is the circumscribed rectangle of the target object, whose area differs greatly from the real area of the target object.
- As shown in Figure 4, the target bounding box 403 corresponding to target object 401 is its circumscribed rectangle, and the target bounding box 404 corresponding to target object 402 is also its circumscribed rectangle.
- In that case, the overlap parameter between the target bounding boxes is the intersection-over-union (IoU) between the two circumscribed rectangles. Due to the difference in area between each target bounding box and its target object, the error of the calculated IoU is very large, which decreases the recall of target detection.
- the anchor point frame of the present disclosure may introduce the angle parameter of the anchor point frame to increase the calculation accuracy of the intersection ratio.
- the angles of different target bounding boxes calculated from the anchor point box may also be different from each other.
- The present disclosure therefore proposes the following method for calculating the overlap parameter: obtain an angle factor according to the angle between the first bounding box and the second bounding box, and obtain the overlap parameter according to the angle factor and the intersection-over-union between the first bounding box and the second bounding box.
- the overlap parameter is the product of the intersection ratio and the angle factor, where the angle factor can be obtained according to the angle between the first bounding box and the second bounding box, and its value is less than 1, and increases as the angle between the first bounding box and the second bounding box increases.
- The angle factor can be expressed by formula (1):
- where θ is the angle between the first bounding box and the second bounding box.
- the overlap parameter increases as the angle between the first bounding box and the second bounding box increases.
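- Since formula (1) itself is not reproduced in this text, the sketch below uses a simple placeholder angle factor (a linear ramp over 0 to 90 degrees) that satisfies the stated properties: it stays below 1 and grows with the angle. The numeric values in the usage example are illustrative only.

```python
import numpy as np

def angle_factor(theta_deg):
    """Placeholder for formula (1): a value below 1 that grows with the angle between the boxes."""
    return np.clip(abs(theta_deg) / 90.0, 0.0, 1.0)

def overlap_parameter(iou, theta_deg):
    """Overlap parameter = area intersection-over-union multiplied by the angle factor."""
    return iou * angle_factor(theta_deg)

# A smaller area IoU combined with a larger angle can still give the larger overlap parameter,
# mirroring the comparison between FIG. 5A and FIG. 5B.
print(round(overlap_parameter(0.30, 80), 3))   # 0.267
print(round(overlap_parameter(0.40, 20), 3))   # 0.089
```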
- FIGS. 5A and 5B are examples to illustrate the influence of the above overlap parameter calculation method on target detection.
- In FIG. 5A, the area intersection-over-union of the two bounding boxes is AIoU1, and the angle between the two is θ1.
- In FIG. 5B, the area intersection-over-union of the two bounding boxes is AIoU2, and the angle between the two is θ2.
- In the embodiments of the present disclosure, an angle factor is introduced into the calculation of the overlap parameter; for example, the overlap parameter is obtained by multiplying the area intersection-over-union of the two bounding boxes by the value of the angle factor.
- The overlap parameter of bounding box 501 and bounding box 502 can be calculated using formula (2):
- The overlap parameter of bounding box 503 and bounding box 504 can be calculated using formula (3):
- The ordering of the overlap parameters in FIG. 5A and FIG. 5B is the reverse of the ordering of their area intersection-over-union values. This is because in FIG. 5A the angle between the two bounding boxes is larger, so the value of the angle factor is larger and the resulting overlap parameter becomes larger; correspondingly, in FIG. 5B the angle between the two bounding boxes is small, so the value of the angle factor is small and the resulting overlap parameter becomes small.
- When two elongated target objects lie nearly parallel to each other, the angle between their bounding boxes may be very small while the area overlap between the two detected bounding boxes may be large. If only the area is used to calculate the intersection-over-union, the result is likely to be large, making it easy to misjudge the two bounding boxes as containing the same target object.
- The above overlap parameter calculation method is not limited to calculating the overlap parameter between target bounding boxes; it can also be used to calculate the overlap parameter between candidate bounding boxes, foreground anchor boxes, true bounding boxes, anchor boxes, and other boxes with angle parameters.
- other methods may also be used to calculate the overlap parameter, which is not limited in the embodiment of the present disclosure.
- the aforementioned target detection method may be implemented by a trained target detection network, which may be a neural network. Before using the target detection network, it needs to be trained first to obtain optimized parameter values.
- the target detection network may include a feature extraction network, a target prediction network, and a foreground segmentation network. Referring to the flowchart of the training method embodiment shown in FIG. 6, the following steps may be included.
- In step 601, feature extraction is performed on a sample image through the feature extraction network to obtain feature data of the sample image.
- the sample image can be a remote sensing image.
- the remote sensing image is an image obtained through the electromagnetic radiation characteristic signal of the ground object detected by sensors mounted on, for example, artificial satellites and aerial photography aircraft.
- the sample image can also be other types of images and is not limited to remote sensing images.
- the sample image includes pre-labeled target object labeling information.
- The annotation information may include the ground-truth bounding box of the annotated target object.
- The annotation information may be, for example, the coordinates of the four vertices of the annotated ground-truth bounding box.
- the feature extraction network may be a convolutional neural network, and the embodiment of the present disclosure does not limit the specific structure of the feature extraction network.
- In step 602, a plurality of sample candidate bounding boxes are obtained through the target prediction network according to the feature data.
- multiple candidate bounding boxes of the target object are predicted and generated according to the feature data of the sample image.
- the information contained in the candidate bounding box may include at least one of the following: the probability of foreground and background in the bounding box, and parameters of the bounding box, for example, the size, angle, and position of the bounding box.
- In step 603, the foreground segmentation result of the sample image is obtained according to the feature data.
- the sample foreground segmentation result of the sample image is obtained through the foreground segmentation network according to the feature data.
- the sample foreground segmentation result includes indication information indicating whether each pixel of the plurality of pixels of the sample image belongs to the foreground. That is, the corresponding foreground image area can be obtained through the result of foreground segmentation, and the foreground image area includes all pixels predicted to be foreground.
- In step 604, a network loss value is determined according to the multiple sample candidate bounding boxes, the sample foreground segmentation result, and the annotation information of the sample image.
- the network loss value may include a first network loss value corresponding to the target prediction network and a second network loss value corresponding to the foreground segmentation network.
- the first network loss value is obtained according to the annotation information in the sample image and the information of the sample candidate bounding box.
- The annotation information of the target object may be the coordinates of the four vertices of the true bounding box of the target object, while the predicted parameters of a sample candidate bounding box may be its length, width, rotation angle relative to the horizontal direction, and center point coordinates.
- From the coordinates of the four vertices of the true bounding box, the length, width, rotation angle relative to the horizontal direction, and center point coordinates of the true bounding box can be calculated accordingly. Therefore, based on the predicted parameters of the sample candidate bounding boxes and the real parameters of the true bounding boxes, the first network loss value, which reflects the difference between the annotation information and the prediction information, can be obtained.
- the second network loss value is obtained based on the sample foreground segmentation result and the real foreground image area. Based on the pre-labeled real bounding box of the target object, the region containing the target object labeled in the original sample image can be obtained, and the pixels contained in the region are real foreground pixels, which are real foreground image regions. Therefore, the second network loss value can be obtained based on the sample foreground segmentation result and the annotation information, that is, through the comparison between the predicted foreground image area and the real foreground image area.
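- A hedged sketch of how the two loss terms could be computed is given below; the tensor shapes, the use of cross-entropy for classification, smooth L1 for box regression, and binary cross-entropy for the foreground mask are assumptions in the style of common detectors, not the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def detection_losses(pred_cls, pred_reg, anchor_labels, reg_targets, mask_logits, gt_foreground):
    """Compute the first network loss (target prediction branch) and the second
    network loss (foreground segmentation branch), then their sum.

    pred_cls:      (N, 2) foreground/background scores per anchor
    pred_reg:      (N, 5) predicted box offsets per anchor
    anchor_labels: (N,)   1 = positive, 0 = negative, -1 = ignored
    reg_targets:   (N, 5) regression targets for positive anchors
    mask_logits:   (B, 1, H, W) predicted foreground logits
    gt_foreground: (B, 1, H, W) real foreground mask (0/1, float)
    """
    valid = anchor_labels >= 0
    pos = anchor_labels == 1
    loss_cls = F.cross_entropy(pred_cls[valid], anchor_labels[valid])
    loss_reg = F.smooth_l1_loss(pred_reg[pos], reg_targets[pos]) if pos.any() else pred_reg.sum() * 0
    loss1 = loss_cls + loss_reg                                                # first network loss value
    loss2 = F.binary_cross_entropy_with_logits(mask_logits, gt_foreground)     # second network loss value
    return loss1 + loss2, loss1, loss2
```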
- In step 605, the network parameters of the target detection network are adjusted based on the network loss value.
- the above-mentioned network parameters can be adjusted through a gradient back propagation method.
- The parameters of each sub-network can be adjusted jointly through the differences between the prediction results of the two branches and the annotated target objects, which provides object-level supervision information and pixel-level supervision information at the same time and improves the quality of the features extracted by the feature extraction network.
- The networks used to predict the candidate bounding boxes and the foreground image in the embodiments of the present disclosure are both one-stage detectors, which achieves higher detection efficiency.
- the first network loss value may be determined based on the intersection ratio between the plurality of sample candidate bounding boxes and at least one real target bounding box labeled by the sample image.
- the calculation result of the cross-union ratio can be used to select positive samples and/or negative samples from multiple anchor point boxes.
- An anchor box whose intersection-over-union with the true bounding box is greater than a certain value, such as 0.5, can be regarded as a candidate bounding box containing the foreground and used as a positive sample to train the target detection network, while an anchor box whose intersection-over-union with the true bounding box is less than a certain value, such as 0.1, is used as a negative sample to train the network. The first network loss value is then determined based on the selected positive samples and/or negative samples.
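- The sketch below labels anchors as positive, negative, or ignored using the 0.5 and 0.1 thresholds mentioned above; the IoU function is again a parameter so that the circumscribed-circle calculation described next can be used.

```python
def assign_anchor_labels(anchors, gt_boxes, iou_fn, pos_thresh=0.5, neg_thresh=0.1):
    """Label each anchor by its best IoU against the annotated true bounding boxes:
    1 = positive sample, 0 = negative sample, -1 = ignored during training."""
    labels = []
    for anchor in anchors:
        best_iou = max(iou_fn(anchor, gt) for gt in gt_boxes)
        if best_iou > pos_thresh:
            labels.append(1)
        elif best_iou < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels
```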
- the embodiments of the present disclosure adopt an anchor point frame with directional parameters.
- The present disclosure proposes a method for calculating the intersection-over-union that can be used for the calculation between an anchor box and a true bounding box, and can also be used to calculate the intersection-over-union between a candidate bounding box and a true bounding box. Specifically, the ratio of the intersection to the union of the circumscribed areas of the anchor box and the true bounding box can be used as the intersection-over-union.
- Take FIG. 7 as an example.
- the bounding box 701 and the bounding box 702 are rectangular boxes with different aspect ratios and angle parameters.
- the aspect ratio of the two is 5, for example.
- the circumscribed circle of the bounding box 701 is 703, and the circumscribed circle of the bounding box 702 is 704.
- The ratio of the intersection of the areas of circumscribed circle 703 and circumscribed circle 704 (shaded in the figure) to their union can be used as the intersection-over-union.
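- A sketch of this circumscribed-circle intersection-over-union is given below. It takes the half-diagonal of each rotated box as the circle radius and uses the standard circle-circle intersection area formula; treating the circumscribed circle this way is an interpretation of the figure, not a formula quoted from the patent.

```python
import numpy as np

def circle_of_box(box):
    """Circumscribed circle of a rotated box (cx, cy, w, l, theta): centre and radius (half diagonal)."""
    cx, cy, w, l, _ = box
    return np.array([cx, cy], dtype=float), 0.5 * np.hypot(w, l)

def circle_iou(box_a, box_b):
    """Intersection-over-union of the circumscribed circles of two rotated boxes."""
    (ca, ra), (cb, rb) = circle_of_box(box_a), circle_of_box(box_b)
    d = np.linalg.norm(ca - cb)
    if d >= ra + rb:                                  # circles do not intersect
        inter = 0.0
    elif d <= abs(ra - rb):                           # one circle lies inside the other
        inter = np.pi * min(ra, rb) ** 2
    else:                                             # standard lens-area formula
        a1 = np.arccos((d**2 + ra**2 - rb**2) / (2 * d * ra))
        a2 = np.arccos((d**2 + rb**2 - ra**2) / (2 * d * rb))
        inter = (ra**2 * a1 + rb**2 * a2
                 - 0.5 * np.sqrt((-d + ra + rb) * (d + ra - rb) * (d - ra + rb) * (d + ra + rb)))
    union = np.pi * ra**2 + np.pi * rb**2 - inter
    return inter / union
```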
- For the calculation of the intersection-over-union between the anchor box and the true bounding box, other methods can also be used, which are not limited in the embodiments of the present disclosure.
- The intersection-over-union calculation method proposed in the above embodiment retains more samples that are similar in shape but differ in direction, which increases the number and proportion of the selected positive samples, strengthens the supervision and learning of direction information, and thus further improves the accuracy of direction prediction.
- Below, the training method of the target detection network is described in more detail, taking a ship as an example of the detected target object. It should be understood that the target objects detected by the present disclosure are not limited to ships and may also be other objects with a large difference between length and width.
- Before training the neural network, a sample set may be prepared first; the sample set may include multiple training samples for training the target detection network.
- training samples can be obtained in the following manner.
- the remote sensing image may include multiple ships, and the true bounding box of each ship needs to be marked.
- the parameter information of each true bounding box needs to be marked, such as the coordinates of the four vertices of the bounding box.
- the pixels in the true bounding box can be determined as the real foreground pixels, that is, while marking the true bounding box of the ship, the real foreground image of the ship is also obtained.
- The pixels in the true bounding box also include the pixels on the true bounding box itself.
- the target detection network may include a feature extraction network, and a target prediction network and a foreground segmentation network respectively cascaded with the feature extraction network.
- The feature extraction network is used to extract the features of the sample image and may be a convolutional neural network.
- Existing networks such as VGG (Visual Geometry Group), ResNet, and DenseNet can be used, and other convolutional neural network structures can also be used. This application does not limit the specific structure of the feature extraction network.
- The feature extraction network may include network units such as convolutional layers, activation layers, and pooling layers, and is formed by stacking these network units in a certain manner.
- The target prediction network is used to predict the bounding boxes of target objects, that is, to predict the prediction information of the candidate bounding boxes. This application does not limit the specific structure of the target prediction network.
- The target prediction network may include network units such as convolutional layers, a classification layer, and a regression layer, which are stacked in a certain manner.
- the foreground segmentation network is used to predict the foreground image in the sample image, that is, predict the pixel area containing the target object. This application does not limit the specific structure of the foreground segmentation network.
- The foreground segmentation network may include an up-sampling layer and a mask layer, and is formed by stacking these network units in a certain manner.
- FIG. 8 shows a network structure of a target detection network to which the embodiments of the present disclosure can be applied. It should be noted that FIG. 8 only shows an example of a target detection network, and the actual implementation is not limited to this.
- The target detection network includes a feature extraction network 810, and a target prediction network 820 and a foreground segmentation network 830 each cascaded with the feature extraction network 810.
- The feature extraction network 810 includes a first convolutional layer (C1) 811, a first pooling layer (P1) 812, a second convolutional layer (C2) 813, a second pooling layer (P2) 814, and a third convolutional layer (C3) 815; that is, in the feature extraction network 810, convolutional layers and pooling layers are connected alternately.
- the convolutional layer can extract different features in the image through multiple convolution kernels to obtain multiple feature maps.
- The pooling layer is located after the convolutional layer and can perform local averaging and down-sampling operations on the feature map data to reduce the resolution of the feature data. As the number of convolutional layers and pooling layers increases, the number of feature maps gradually increases while the resolution of the feature maps gradually decreases.
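- A minimal PyTorch sketch of the alternating convolution/pooling backbone of FIG. 8 (C1-P1-C2-P2-C3) is shown below; the channel counts and kernel sizes are assumptions, since the patent leaves them as configurable parameters.

```python
import torch.nn as nn

class FeatureExtractionNet(nn.Module):
    """Feature extraction network 810: convolutional and pooling layers connected alternately."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),   # C1
            nn.MaxPool2d(2),                                                   # P1
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),           # C2
            nn.MaxPool2d(2),                                                   # P2
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),          # C3
        )

    def forward(self, x):
        return self.body(x)    # multi-channel feature data shared by both branches
```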
- the multi-channel feature data output by the feature extraction network 810 is input to the target prediction network 820 and the foreground segmentation network 830 respectively.
- the target prediction network 820 includes a fourth convolutional layer (C4) 821, a classification layer 822, and a regression layer 823. Among them, the classification layer 822 and the regression layer 823 are cascaded with the fourth convolution layer 821 respectively.
- The fourth convolutional layer 821 convolves the input feature data with a sliding window (for example, 3×3); each window position corresponds to multiple anchor boxes, and each window generates a fully connected vector for the classification layer 822 and the regression layer 823.
- two or more convolutional layers can also be used to convolve the input feature data.
- the classification layer 822 is used to determine whether the bounding box generated by the anchor point box is the foreground or the background.
- The regression layer 823 is used to obtain the approximate position of the candidate bounding box. Based on the outputs of the classification layer 822 and the regression layer 823, the candidate bounding boxes of the target object can be predicted, and the foreground/background probabilities and the parameters of each candidate bounding box can be output.
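- The target prediction branch can be sketched as below: a shared 3x3 convolution (C4) followed by a 1x1 classification head and a 1x1 regression head. The number of anchors per location (k) and the five regression parameters (x, y, w, l, theta) follow the anchor definition above; the channel widths are assumptions.

```python
import torch.nn as nn

class TargetPredictionNet(nn.Module):
    """Target prediction network 820: C4 sliding-window convolution, classification layer, regression layer."""
    def __init__(self, in_channels=256, k=12):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, 3, padding=1)   # C4: 3x3 sliding window over the features
        self.cls = nn.Conv2d(256, k * 2, 1)                     # foreground/background score per anchor
        self.reg = nn.Conv2d(256, k * 5, 1)                     # offsets for (x, y, w, l, theta) per anchor

    def forward(self, feats):
        x = self.conv(feats).relu()
        return self.cls(x), self.reg(x)
```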
- the foreground segmentation network 830 includes an up-sampling layer 831 and a mask layer 832.
- the up-sampling layer 831 is used to convert the input feature data into the original sample image size;
- the mask layer 832 is used to generate a binary mask of the foreground, that is, output 1 for foreground pixels and 0 for background pixels.
- The fourth convolutional layer 821 and the mask layer 832 can align image sizes with feature positions, so that the outputs of the target prediction network 820 and the foreground segmentation network 830 predict information for the same positions on the image, which allows the overlapping area to be calculated.
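- The foreground segmentation branch can likewise be sketched as an up-sampling layer followed by a mask layer; the up-sampling factor of 4 matches the two pooling layers assumed in the backbone sketch above and is not a value stated in the patent.

```python
import torch.nn as nn

class ForegroundSegmentationNet(nn.Module):
    """Foreground segmentation network 830: up-sampling layer 831 followed by mask layer 832."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.up = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)   # back to image size
        self.mask = nn.Conv2d(in_channels, 1, 1)                                       # per-pixel foreground logit

    def forward(self, feats):
        logits = self.mask(self.up(feats))
        binary_mask = (logits.sigmoid() > 0.5).float()   # 1 for foreground pixels, 0 for background pixels
        return binary_mask, logits
```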
- Before training the target detection network, some network parameters can be set. For example, the number of convolution kernels used by each convolutional layer in the feature extraction network 810 and by the convolutional layer in the target prediction network can be set, as can the size of the convolution kernels. Parameter values such as the values of the convolution kernels and the weights of other layers are learned through iterative training.
- the training of the target detection network can be started.
- the following will list the specific training methods of the target detection network.
- the structure of the target detection network can be seen in FIG. 8.
- the sample image input to the target detection network may be a remote sensing image containing ship images.
- the real bounding box of the contained ship is marked, and the marking information may be parameter information of the real bounding box, for example, the coordinates of the four vertices of the bounding box.
- the input sample image first passes through the feature extraction network to extract the features of the sample image, and output the multi-channel feature data of the sample image.
- the size of the output feature data and the number of channels are determined by the convolutional layer structure and the pooling layer structure of the feature extraction network.
- On the one hand, the multi-channel feature data enters the target prediction network, which, under the current network parameter settings, predicts the candidate bounding boxes containing ships from the input feature data and generates prediction information for the candidate bounding boxes.
- the prediction information may include the probability that the bounding box is the foreground or the background, and parameter information of the bounding box, for example, the size, position, and angle of the bounding box.
- Based on the annotation information and the prediction information, the value LOSS1 of the first network loss function, that is, the first network loss value, can be obtained.
- The value of the first network loss function reflects the difference between the annotation information and the prediction information.
- On the other hand, the multi-channel feature data enters the foreground segmentation network, which, under the current network parameter settings, predicts the foreground image area containing ships in the sample image. For example, the probability of each pixel being foreground or background is predicted from the feature data, pixels whose foreground probability is greater than a set value are taken as foreground pixels, and pixel segmentation is performed to obtain the predicted foreground image area.
- From the annotation information, the real foreground pixels in the sample image, that is, the pixels inside the true bounding boxes, can be obtained as the real foreground image.
- Based on the predicted foreground image and the real foreground image obtained through the annotation information, the value LOSS2 of the second network loss function, that is, the second network loss value, can be obtained.
- The value of the second network loss function reflects the difference between the predicted foreground image and the annotation information.
- The total loss value determined from the value of the first network loss function and the value of the second network loss function can be back-propagated through the target detection network to adjust the network parameter values, such as the values of the convolution kernels and the weights of other layers.
- the sum of the first network loss function and the second network loss function may be determined as the total loss function, and the total loss function may be used to adjust the parameters.
- the training sample set can be divided into multiple image subsets (batch), and each image subset includes one or more training samples.
- each image subset is sequentially input to the network, and the network parameters are adjusted in combination with the loss value of each sample prediction result in the training samples included in the image subset.
- After that, the next image subset is input to the network for the next training iteration.
- the training samples included in different image subsets are at least partially different.
- The predetermined ending condition may be, for example, that the total loss value (LOSS value) has fallen below a certain threshold, or that a predetermined number of training iterations of the target detection network has been reached.
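- The iterative procedure above can be sketched as the loop below, where `detector` is assumed to be a callable that returns (total_loss, LOSS1, LOSS2) for one image subset, as in the loss sketch earlier; the stopping threshold and iteration cap are placeholders.

```python
def train(detector, optimizer, image_subsets, loss_threshold=0.05, max_iters=10000):
    """Feed one image subset (batch) at a time, back-propagate the total loss,
    and stop once the loss falls below a threshold or the iteration budget is used up."""
    iteration = 0
    for batch in image_subsets:                  # image subsets may be cycled over several epochs
        total_loss, loss1, loss2 = detector(batch)
        optimizer.zero_grad()
        total_loss.backward()                    # gradient back-propagation adjusts all sub-networks jointly
        optimizer.step()
        iteration += 1
        if total_loss.item() < loss_threshold or iteration >= max_iters:
            break
```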
- During training, the target prediction network provides object-level supervision information and the foreground segmentation network provides pixel-level supervision information, which improves the quality of the features extracted by the feature extraction network.
- the target prediction network can predict the candidate bounding box of the target object in the following manner.
- the structure of the target prediction network can be seen in Figure 8.
- Fig. 10 is a flowchart of a method for predicting a candidate bounding box. As shown in Fig. 10, the process may include the following steps.
- each point of the feature data is used as an anchor point, and multiple anchor boxes are constructed centered on each anchor point.
- for feature data of size [H×W], a total of H×W×k anchor boxes are constructed, where k is the number of anchor boxes generated at each anchor point.
- the multiple anchor boxes constructed at one anchor point are given different aspect ratios so that they can cover the target object to be detected.
- a priori anchor boxes can first be generated directly through hyperparameter settings based on prior knowledge (for example, the statistical size distribution of most targets), and the anchor boxes can then be predicted from the features.
- in step 1002, the anchor points are mapped back to the sample image to obtain the area contained by each anchor box on the sample image.
- all the anchor points are mapped back to the sample image, that is, the feature data is mapped back to the sample image, and the area of the sample image framed by each anchor box generated around an anchor point can then be obtained.
- the a priori anchor boxes, the predicted values, and the current feature resolution can be combined to compute the position and size of each anchor box mapped back onto the sample image, giving the area contained by each anchor box on the sample image.
- the above process is equivalent to using a convolution kernel (sliding window) to perform a sliding operation on the input feature data.
- when the convolution kernel slides to a certain position of the feature data, that position maps back to an area of the sample image centered on the center of the current sliding window; the center of this area on the sample image is the corresponding anchor point, and the anchor boxes are then framed around that anchor point. That is to say, although the anchor points are defined on the feature data, they are ultimately relative to the original sample image.
- the process of extracting features can be implemented through the fourth convolution layer 821, and the convolution kernel of the fourth convolution layer 821 can be, for example, 3 ⁇ 3 in size.
- an anchor box whose intersection-over-union (IoU) with the ground-truth bounding box is greater than a first set value, for example 0.5, may be regarded as a candidate bounding box containing the foreground.
- the probability that an anchor box is foreground or background can also be determined by performing binary classification on the anchor box.
- an image subset can be made to contain multiple anchor boxes with foreground labels randomly sampled from one sample image, for example 256, which are used as positive samples for training.
- negative samples can also be used to train the target detection network.
- a negative sample may be, for example, an anchor box whose intersection-over-union with the ground-truth bounding box is smaller than a second set value, such as 0.1.
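A minimal sketch of this positive/negative anchor selection, assuming the IoU values have already been computed; the 0.5 and 0.1 thresholds come from the text, while the 256-anchor budget and the roughly 1:1 positive/negative split are illustrative assumptions.

```python
import numpy as np

# Label anchors as positive (foreground), negative (background) or ignored,
# then sample a balanced training batch from them.

def label_anchors(iou, pos_thr=0.5, neg_thr=0.1, batch_size=256, rng=None):
    """iou: array of shape (num_anchors, num_gt) with anchor / ground-truth IoU values."""
    rng = rng or np.random.default_rng(0)
    max_iou = iou.max(axis=1)                      # best overlap of each anchor with any GT box
    labels = np.full(len(max_iou), -1, dtype=int)  # -1 = ignored during training
    labels[max_iou >= pos_thr] = 1                 # foreground (positive) anchors
    labels[max_iou < neg_thr] = 0                  # background (negative) anchors

    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    num_pos = min(len(pos), batch_size // 2)
    num_neg = min(len(neg), batch_size - num_pos)  # top up with negatives if positives are scarce
    keep_pos = rng.choice(pos, size=num_pos, replace=False)
    keep_neg = rng.choice(neg, size=num_neg, replace=False)

    sampled = np.full_like(labels, -1)
    sampled[keep_pos] = 1
    sampled[keep_neg] = 0
    return sampled

# Example: 6 anchors, 2 ground-truth boxes
iou = np.array([[0.7, 0.1], [0.05, 0.02], [0.6, 0.3],
                [0.0, 0.08], [0.2, 0.2], [0.55, 0.0]])
print(label_anchors(iou, batch_size=4))
```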
- since the aspect ratio of a foreground anchor box may differ from the aspect ratio of the ship in the sample image, and its position and angle may also differ from those of the sample ship, the offsets between the foreground anchor box and its corresponding true bounding box need to be used for regression training, so that the target prediction network acquires the ability to predict, from a foreground anchor box, its offset to the candidate bounding box, and thereby to obtain the parameters of the candidate bounding box.
- through step 1003 and step 1004, the information of the candidate bounding boxes can be obtained: the probability that each candidate bounding box is foreground or background, and the parameters of each candidate bounding box. Based on this candidate bounding box information and the label information of the sample image (the true bounding boxes corresponding to the target objects), the first network loss can be obtained.
- the target prediction network is a one-stage network: after the candidate bounding boxes are obtained from the first prediction, their prediction results are output directly, which improves the detection efficiency of the network.
- in the related art, the parameters of the anchor box corresponding to each anchor point usually include only the length, the width, and the coordinates of the center point.
- for this reason, a method for setting rotated anchor boxes is proposed.
- anchor boxes in multiple directions are constructed centered on each anchor point, and multiple aspect ratios can be set to cover the target object to be detected.
- the specific number of directions and the aspect ratio values can be set according to actual needs.
- as shown in Fig. 11, the constructed anchor boxes correspond to 6 directions, where w represents the width of the anchor box, l represents the length of the anchor box, θ represents the angle of the anchor box (the rotation angle of the anchor box relative to the horizontal), and (x, y) represents the coordinates of the center point of the anchor box.
- corresponding to the 6 anchor boxes uniformly distributed in direction, θ is 0°, 30°, 60°, 90°, -30°, and -60°, respectively.
- the parameters of the anchor box can be expressed as (x, y, w, l, ⁇ ).
- the aspect ratio can be set to 1, 3, or 5, or to other values according to the target object to be detected.
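The following sketch shows how such rotated anchors (x, y, w, l, θ) could be enumerated over an H×W feature map; the six angles and the 1/3/5 aspect ratios are taken from the text, while the base width and the feature stride used to map anchor points back to the image are assumptions.

```python
import numpy as np

# Enumerate rotated anchors (x, y, w, l, theta) for every point of an H x W feature map.

def build_rotated_anchors(feat_h, feat_w, stride=16, base_width=8,
                          aspect_ratios=(1, 3, 5),
                          angles_deg=(0, 30, 60, 90, -30, -60)):
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # anchor point mapped back to the sample image
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for ratio in aspect_ratios:
                w = base_width
                l = base_width * ratio          # length = width * aspect ratio
                for theta in angles_deg:
                    anchors.append((cx, cy, w, l, theta))
    return np.array(anchors, dtype=float)       # shape (H*W*k, 5), k = len(ratios) * len(angles)

anchors = build_rotated_anchors(feat_h=2, feat_w=3)
print(anchors.shape)   # (2 * 3 * 3 * 6, 5) = (108, 5)
```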
- the parameters of the candidate bounding box can also be expressed as (x, y, w, l, ⁇ ), and the parameters can be calculated by using the regression layer 823 in FIG. 8.
- the method of regression calculation is as follows.
- for example, the parameter values of a foreground anchor box are [A_x, A_y, A_w, A_l, A_θ], where A_x, A_y, A_w, A_l, A_θ respectively represent the center-point x coordinate, center-point y coordinate, width, length, and angle of the foreground anchor box; the five corresponding values of the true bounding box are [G_x, G_y, G_w, G_l, G_θ], where G_x, G_y, G_w, G_l, G_θ respectively represent the center-point x coordinate, center-point y coordinate, width, length, and angle of the true bounding box.
- based on these values, the offsets between the foreground anchor box and the true bounding box can be determined as [d_x(A), d_y(A), d_w(A), d_l(A), d_θ(A)], where d_x(A), d_y(A), d_w(A), d_l(A), d_θ(A) respectively represent the offsets of the center-point x coordinate, center-point y coordinate, width, length, and angle.
- each offset can be calculated by formulas (4)-(8), for example:
  d_x(A) = (G_x - A_x) / A_w    (4)
  d_y(A) = (G_y - A_y) / A_l    (5)
  d_w(A) = log(G_w / A_w)    (6)
  d_l(A) = log(G_l / A_l)    (7)
  d_θ(A) = G_θ - A_θ    (8)
- formulas (6) and (7) use logarithms to express the width and length offsets so that training can converge quickly when the difference is large.
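A direct transcription of formulas (4)-(8) as a small helper, with a numeric check:

```python
import math

# Offsets mapping a foreground anchor A = (Ax, Ay, Aw, Al, Atheta)
# to its ground-truth box G = (Gx, Gy, Gw, Gl, Gtheta).

def encode_offsets(anchor, gt):
    ax, ay, aw, al, atheta = anchor
    gx, gy, gw, gl, gtheta = gt
    dx = (gx - ax) / aw                  # formula (4)
    dy = (gy - ay) / al                  # formula (5)
    dw = math.log(gw / aw)               # formula (6): log keeps large gaps converging quickly
    dl = math.log(gl / al)               # formula (7)
    dtheta = gtheta - atheta             # formula (8)
    return dx, dy, dw, dl, dtheta

print(encode_offsets((100.0, 80.0, 8.0, 40.0, 30.0), (104.0, 90.0, 10.0, 50.0, 45.0)))
# -> (0.5, 0.25, 0.223..., 0.223..., 15.0)
```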
- when the input feature data contains multiple true bounding boxes, each foreground anchor box selects the true bounding box with which it has the highest overlap to calculate the offsets.
- the regression layer 823 can be trained using the above offset.
- after training is completed, the target prediction network has the ability to identify, for each anchor box, the offset to its corresponding optimal candidate bounding box [d_x'(A), d_y'(A), d_w'(A), d_l'(A), d_θ'(A)]; that is to say, based on the parameter values of the anchor box, the parameter values of the candidate bounding box can be determined, including the center-point x coordinate, center-point y coordinate, width, length, and angle.
- during training, the regression layer can be used to first calculate the offset from the foreground anchor box to the candidate bounding box. Since the optimization of the network parameters is not yet complete, this offset may differ considerably from the actual offset [d_x(A), d_y(A), d_w(A), d_l(A), d_θ(A)].
- finally, the foreground anchor box is shifted according to the offset to obtain the candidate bounding box, and the parameters of the candidate bounding box are obtained.
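A sketch of shifting a foreground anchor box by a set of offsets to recover candidate-box parameters; the decoding below is simply the inverse of formulas (4)-(8), which the text does not spell out, so it should be read as an assumption.

```python
import math

# Apply predicted offsets [dx', dy', dw', dl', dtheta'] to a foreground anchor
# to obtain the candidate bounding box parameters (x, y, w, l, theta).

def decode_offsets(anchor, offsets):
    ax, ay, aw, al, atheta = anchor
    dx, dy, dw, dl, dtheta = offsets
    cx = ax + dx * aw
    cy = ay + dy * al
    w = aw * math.exp(dw)
    l = al * math.exp(dl)
    theta = atheta + dtheta
    return cx, cy, w, l, theta

# Round trip with the offsets from the previous example recovers the ground-truth box:
print(decode_offsets((100.0, 80.0, 8.0, 40.0, 30.0),
                     (0.5, 0.25, math.log(1.25), math.log(1.25), 15.0)))
# -> (104.0, 90.0, 10.0..., 50.0..., 45.0)
```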
- when calculating the value of the first network loss function, the offset from the foreground anchor box to the candidate bounding box [d_x'(A), d_y'(A), d_w'(A), d_l'(A), d_θ'(A)] and the offset between the foreground anchor box and the true bounding box obtained during training [d_x(A), d_y(A), d_w(A), d_l(A), d_θ(A)] can be used to calculate the regression loss.
- the previously predicted probability that a foreground anchor box is foreground or background becomes, after the foreground anchor box is regressed to a candidate bounding box, the probability that the candidate bounding box is foreground or background; based on this probability, the classification loss of foreground and background for the predicted candidate bounding box can be determined.
- the sum of the classification loss and the regression loss of the parameters of the predicted candidate bounding box constitutes the value of the first network loss function.
- network parameters can be adjusted based on the value of the first network loss function of all candidate bounding boxes.
- when obtaining the value of the first network loss function from the annotation information and the candidate bounding box information, weight ratios can be set for the individual parameters of the anchor box so that the weight ratio of the width is higher than that of the other parameters, and the value of the first network loss function is then calculated according to the set weight ratios.
- a parameter with a higher weight ratio contributes more to the final calculated loss function value.
- for a target object with an extreme aspect ratio, such as a ship, the width is very small compared to the length; therefore, setting the weight of the width higher than the weights of the other parameters can improve the prediction accuracy of the width.
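A sketch of a regression loss with a higher weight on the width term; the smooth-L1 form and the concrete weight values are illustrative assumptions, the text only requires the width weight to exceed the others.

```python
import numpy as np

# Weighted regression loss over the offsets (dx, dy, dw, dl, dtheta),
# with the width term weighted more heavily than the other parameters.

def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

def weighted_regression_loss(pred_offsets, target_offsets,
                             weights=(1.0, 1.0, 2.0, 1.0, 1.0)):  # (dx, dy, dw, dl, dtheta)
    diff = np.asarray(pred_offsets) - np.asarray(target_offsets)
    return float(np.sum(np.asarray(weights) * smooth_l1(diff)))

pred = (0.40, 0.20, 0.30, 0.25, 12.0)
target = (0.50, 0.25, 0.22, 0.22, 15.0)
print(weighted_regression_loss(pred, target))
```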
- the foreground image area in the sample image can be predicted in the following manner.
- the structure of the foreground segmentation network can be seen in Figure 8.
- Fig. 12 is a flowchart of an embodiment of a method for predicting a foreground image area. As shown in Fig. 12, the process may include the following steps.
- step 1201 an up-sampling process is performed on the feature data, so that the size of the processed feature data is the same as the size of the sample image.
- the feature data can be up-sampled through a deconvolution layer or by bilinear interpolation to enlarge it back to the sample image size. Since the input to the pixel segmentation network is multi-channel feature data, the up-sampling yields feature data with the corresponding number of channels and the same size as the sample image, and each position of the feature data corresponds one-to-one to a position on the original image.
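A minimal bilinear up-sampling sketch for multi-channel feature data (a deconvolution layer would be the learned alternative mentioned above); the align-to-corners sampling scheme is an implementation choice, not something specified in the text.

```python
import numpy as np

# Up-sample (C, H, W) feature data to (C, out_h, out_w) by bilinear interpolation.

def bilinear_upsample(feat, out_h, out_w):
    c, h, w = feat.shape
    ys = np.linspace(0, h - 1, out_h)          # source row coordinates for each output row
    xs = np.linspace(0, w - 1, out_w)          # source column coordinates for each output column
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]              # vertical interpolation weights
    wx = (xs - x0)[None, None, :]              # horizontal interpolation weights
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bottom = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

feat = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)   # 2 channels, 2x3 feature map
print(bilinear_upsample(feat, 4, 6).shape)                  # (2, 4, 6)
```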
- step 1202 pixel segmentation is performed based on the processed feature data to obtain a sample foreground segmentation result of the sample image.
- for each pixel of the feature data, the probability that it belongs to the foreground or the background can be predicted.
- a threshold can be set so that a pixel whose probability of belonging to the foreground is greater than the threshold is determined to be a foreground pixel; mask information can then be generated for each pixel, usually represented by 0 or 1, where 0 represents the background and 1 represents the foreground.
- based on the mask information, the foreground pixels can be determined, so that a pixel-level foreground segmentation result is obtained.
- since each pixel of the feature data corresponds to an area of the sample image, and the true bounding box of the target object has already been annotated in the sample image, the difference between the classification result of each pixel and the true bounding box can be determined according to the annotation information, giving the classification loss.
- since the pixel segmentation network does not involve locating bounding boxes, the value of its corresponding second network loss function can be determined as the sum of the classification losses of all pixels.
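A sketch of thresholding per-pixel foreground probabilities into a 0/1 mask and summing a per-pixel classification loss into LOSS2; binary cross-entropy is an illustrative choice, the text only states that LOSS2 is the sum of the per-pixel classification losses.

```python
import numpy as np

# Turn per-pixel foreground probabilities into a binary mask, and compute a
# pixel-wise classification loss summed into LOSS2.

def foreground_mask(fg_prob, threshold=0.5):
    return (fg_prob > threshold).astype(np.uint8)     # 1 = foreground, 0 = background

def pixel_classification_loss(fg_prob, gt_mask, eps=1e-7):
    p = np.clip(fg_prob, eps, 1 - eps)
    bce = -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
    return float(bce.sum())                           # LOSS2: sum over all pixels

fg_prob = np.array([[0.9, 0.2], [0.6, 0.1]])
gt_mask = np.array([[1, 0], [1, 0]])
print(foreground_mask(fg_prob))
print(pixel_classification_loss(fg_prob, gt_mask))
```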
- by up-sampling the feature data and generating mask information for each pixel, a pixel-level foreground image area can be obtained, which improves the accuracy of target detection.
- FIG. 13 provides a target detection device.
- the device may include: a feature extraction unit 1301, a target prediction unit 1302, a foreground segmentation unit 1303, and a target determination unit 1304.
- the feature extraction unit 1301 is used to obtain feature data of the input image.
- the target prediction unit 1302 is configured to determine multiple candidate bounding boxes of the input image according to the feature data.
- the foreground segmentation unit 1303 is configured to obtain a foreground segmentation result of the input image according to the feature data, where the foreground segmentation result includes indication information indicating whether each pixel of the plurality of pixels of the input image belongs to the foreground.
- the target determination unit 1304 is configured to obtain a target detection result of the input image according to the plurality of candidate bounding boxes and the foreground segmentation result.
- the target determining unit 1304 is specifically configured to: according to the overlapping area between each candidate bounding box of the plurality of candidate bounding boxes and the foreground image area corresponding to the foreground segmentation result, the target At least one target bounding box is selected from the candidate bounding boxes; based on the at least one target bounding box, a target detection result of the input image is obtained.
- the target determining unit 1304, when selecting at least one target bounding box from the plurality of candidate bounding boxes according to the overlapping area between each candidate bounding box and the foreground image area corresponding to the foreground segmentation result, is specifically configured to: for each candidate bounding box of the plurality of candidate bounding boxes, if the proportion of the overlapping area between the candidate bounding box and the corresponding foreground image area in the candidate bounding box is greater than a first threshold, use the candidate bounding box as a target bounding box.
- the at least one target bounding box includes a first bounding box and a second bounding box
- the target determining unit 1304, when obtaining the target detection result of the input image based on the at least one target bounding box, is specifically configured to: determine the overlap parameter of the first bounding box and the second bounding box based on the angle between the first bounding box and the second bounding box; and determine, based on the overlap parameter of the first bounding box and the second bounding box, the target object positions corresponding to the first bounding box and the second bounding box.
- the target determining unit 1304, when determining the overlap parameter of the first bounding box and the second bounding box based on the angle between them, is specifically configured to: obtain an angle factor according to the angle between the first bounding box and the second bounding box; and obtain the overlap parameter according to the intersection-over-union between the first bounding box and the second bounding box and the angle factor.
- the overlap parameter of the first bounding box and the second bounding box is the product of the intersection-over-union and the angle factor, wherein the angle factor increases as the angle between the first bounding box and the second bounding box increases.
- under the condition that the intersection-over-union remains constant, the overlap parameter of the first bounding box and the second bounding box increases as the angle between the first bounding box and the second bounding box increases.
- the determining the position of the target object corresponding to the first bounding box and the second bounding box based on the overlap parameter of the first bounding box and the second bounding box includes: In a case where the overlap parameter of the first bounding box and the second bounding box is greater than a second threshold, use one of the first bounding box and the second bounding box as the target object position.
- using one of the first bounding box and the second bounding box as the target object position includes: determining the foreground corresponding to the first bounding box and the foreground segmentation result The overlap parameter between the image areas and the overlap parameter between the second bounding box and the foreground image area; the difference between the first bounding box and the second bounding box and the foreground image area The bounding box with the larger overlap parameter is used as the target object position.
- the determining the position of the target object corresponding to the first bounding box and the second bounding box based on the overlap parameter of the first bounding box and the second bounding box includes: In the case where the overlap parameter of the first bounding box and the second bounding box is less than or equal to a second threshold, both the first bounding box and the second bounding box are used as the target object position.
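A sketch of this duplicate-removal logic; the concrete angle factor sin(|θ|) and the 0.3 threshold are assumptions (the text only requires the factor to grow with the angle and the overlap parameter to be the product of the IoU and the factor), and `fg_overlap_1` / `fg_overlap_2` stand in for the overlap parameters between each box and the foreground image area.

```python
import math

# Decide whether two overlapping target boxes are one object or two closely
# packed objects, using the angle-weighted overlap parameter described above.

def angle_factor(theta_deg):
    return math.sin(math.radians(abs(theta_deg)))     # in [0, 1], increasing on 0..90 degrees

def resolve_pair(iou_boxes, theta_deg, fg_overlap_1, fg_overlap_2, second_threshold=0.3):
    overlap_param = iou_boxes * angle_factor(theta_deg)
    if overlap_param > second_threshold:
        # likely the same target: keep the box that agrees better with the foreground mask
        return ["box1"] if fg_overlap_1 >= fg_overlap_2 else ["box2"]
    # likely two closely packed targets: keep both boxes
    return ["box1", "box2"]

print(resolve_pair(iou_boxes=0.4, theta_deg=80, fg_overlap_1=0.9, fg_overlap_2=0.8))  # ['box1']
print(resolve_pair(iou_boxes=0.4, theta_deg=10, fg_overlap_1=0.9, fg_overlap_2=0.8))  # ['box1', 'box2']
```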
- the aspect ratio of the target object to be detected in the input image is greater than a specific value.
- Figure 14 provides a training device for a target detection network, which includes a feature extraction network, a target prediction network, and a foreground segmentation network.
- the apparatus may include: a feature extraction unit 1401, a target prediction unit 1402, a foreground segmentation unit 1403, a loss value determination unit 1404, and a parameter adjustment unit 1405.
- the feature extraction unit 1401 is configured to perform feature extraction processing on a sample image through the feature extraction network to obtain feature data of the sample image.
- the target prediction unit 1402 is configured to obtain multiple sample candidate bounding boxes through the target prediction network according to the feature data.
- the foreground segmentation unit 1403 is configured to obtain a sample foreground segmentation result of the sample image through the foreground segmentation network according to the feature data, wherein the sample foreground segmentation result includes a plurality of pixels indicating the sample image Information indicating whether each pixel belongs to the foreground.
- the loss value determining unit 1404 is configured to determine the network loss value according to the multiple sample candidate bounding boxes, the sample foreground segmentation result, and the label information of the sample image.
- the parameter adjustment unit 1405 is configured to adjust the network parameters of the target detection network based on the network loss value.
- the annotation information includes the true bounding box of at least one target object contained in the sample image
- the loss value determining unit 1404 is specifically configured to: for each candidate bounding box of the plurality of candidate bounding boxes, determine the intersection-over-union between the candidate bounding box and each real target bounding box of the at least one real target bounding box annotated in the sample image; and determine the first network loss value according to the determined intersection-over-union of each of the plurality of candidate bounding boxes.
- intersection ratio between the candidate bounding box and the real target bounding box is obtained based on a circumscribed circle containing the candidate bounding box and the real target bounding box.
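A sketch of an IoU computed from the circumscribed circles of two rotated boxes, under the assumption that each box (x, y, w, l, θ) is replaced by the circle centred at (x, y) with radius sqrt(w² + l²)/2 and that the ratio of the circles' intersection area to their union area is taken.

```python
import math

# IoU of two rotated boxes approximated through their circumscribed circles.

def circumscribed_circle(box):
    x, y, w, l, _theta = box                           # the circle ignores the rotation angle
    return x, y, math.hypot(w, l) / 2.0

def circle_intersection_area(c1, c2):
    x1, y1, r1 = c1
    x2, y2, r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:
        return 0.0                                     # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2              # one circle inside the other
    a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))   # half-angles at each centre
    a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    return (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
            + r2 * r2 * (a2 - math.sin(2 * a2) / 2))   # sum of the two circular segments

def circle_iou(box1, box2):
    c1, c2 = circumscribed_circle(box1), circumscribed_circle(box2)
    inter = circle_intersection_area(c1, c2)
    union = math.pi * c1[2] ** 2 + math.pi * c2[2] ** 2 - inter
    return inter / union

print(circle_iou((0, 0, 8, 40, 30), (5, 5, 8, 40, 60)))
```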
- the weight corresponding to the width of the candidate bounding box is higher than the weight corresponding to the length of the candidate bounding box.
- the foreground segmentation unit 1403 is specifically configured to: perform up-sampling processing on the feature data so that the size of the processed feature data is the same as the size of the sample image; and perform pixel segmentation based on the processed feature data to obtain the sample foreground segmentation result of the sample image.
- the aspect ratio of the target object contained in the sample image is higher than a set value.
- FIG. 15 is a target detection device provided by at least one embodiment of the present disclosure.
- the device includes a memory 1501 and a processor 1502.
- the memory is used to store computer instructions that can be run on a processor.
- the computer instructions are used to implement the target detection method described in any embodiment of this specification.
- the device may also include a network interface 1503 and an internal bus 1504.
- the memory 1501, the processor 1502, and the network interface 1503 communicate with each other through the internal bus 1504.
- Fig. 16 is a training device for a target detection network provided by at least one embodiment of the present disclosure.
- the device includes a memory 1601 and a processor 1602.
- the memory is used to store computer instructions that can be run on a processor.
- the method for training the target detection network described in any embodiment of this specification is implemented when the computer instructions are executed.
- the device may also include a network interface 1603 and an internal bus 1604.
- the memory 1601, the processor 1602, and the network interface 1603 communicate with each other through the internal bus 1604.
- At least one embodiment of this specification also provides a non-volatile computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the target detection method described in any embodiment of this specification is implemented, And/or, implement the method for training a target detection network described in any embodiment of this specification.
- the computer-readable storage medium may be in various forms.
- the machine-readable storage medium may be: a non-volatile memory, a flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disk (such as an optical disc or a DVD), a similar storage medium, or a combination thereof.
- the computer-readable medium may also be paper or another suitable medium on which the program can be printed. Using these media, the programs can be obtained electrically (for example, by optical scanning), compiled, interpreted, and processed in a suitable manner, and then stored in a computer medium.
Abstract
A target detection method and a training method, apparatus, and device for a target detection network. The target detection method includes: obtaining feature data of an input image (101); determining a plurality of candidate bounding boxes of the input image according to the feature data (102); obtaining a foreground segmentation result of the input image according to the feature data, wherein the foreground segmentation result contains indication information indicating whether each pixel of a plurality of pixels of the input image belongs to the foreground (103); and obtaining a target detection result of the input image according to the plurality of candidate bounding boxes and the foreground segmentation result (104).
Description
本公开涉及图像处理技术领域,尤其涉及一种目标检测及目标检测网络的训练方法、装置及设备。
目标检测是计算机视觉领域的重要问题,尤其对于飞机、舰船等军事目标的检测,由于其具有影像尺寸大、目标尺寸小的特点,导致检测难度较大。而且,对于具有密集排列状态的舰船等目标,检测精度较低。
发明内容
本公开实施例提供了一种目标检测及目标检测网络的训练方法、装置及设备。
第一方面,提供一种目标检测方法,包括:
获得输入图像的特征数据;根据所述特征数据,确定所述输入图像的多个候选边界框;根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息;根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
结合本公开提供的任一实施方式,所述根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果,包括:根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框;基于所述至少一个目标边界框,得到所述输入图像的目标检测结果。
结合本公开提供的任一实施方式,所述根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框,包括:对于所述多个候选边界框中每个候选边界框,若该候选边界框与对应的前景图像区域之间的重叠区域在该候选边界框中所占的比例大于第一阈值,则将该候选边界框作为所述目标边界框。
结合本公开提供的任一实施方式,所述至少一个目标边界框包括第一边界框和第二边界框,所述基于所述至少一个目标边界框,得到所述输入图像的目标检测结果,包括:基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数;基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置。
结合本公开提供的任一实施方式,所述基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数,包括:根据所述第一边界框和所述第二边界框之间的夹角,获得角度因子;根据所述第一边界框和所述第二边界框之间的交并比和所述角度因子,获得所述重叠参数。
结合本公开提供的任一实施方式,所述第一边界框和所述第二边界框的重叠参数为所述交并比 与所述角度因子的乘积,其中,所述角度因子随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
结合本公开提供的任一实施方式,在所述交并比保持一定的条件下,所述第一边界框和所述第二边界框的重叠参数随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
结合本公开提供的任一实施方式,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数大于第二阈值的情况下,将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置。
结合本公开提供的任一实施方式,所述将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置,包括:确定所述第一边界框与所述前景分割结果对应的前景图像区域之间的重叠参数和所述第二边界框与所述前景图像区域之间的重叠参数;将所述第一边界框和所述第二边界框中与所述前景图像区域之间的重叠参数较大的边界框作为所述目标对象位置。
结合本公开提供的任一实施方式,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数小于或等于第二阈值的情况下,将所述第一边界框和所述第二边界框均作为所述目标对象位置。
结合本公开提供的任一实施方式,所述输入图像中待检测的目标对象的长宽比大于特定数值。
第二方面,提供一种目标检测网络的训练方法,所述目标检测网络包括特征提取网络、目标预测网络和前景分割网络,所述方法包括:
通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据;根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框;根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息;根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值;基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
结合本公开提供的任一实施方式,所述标注信息包括所述样本图像包含的至少一个目标对象的真实边界框,所述根据所述多个样本候选边界框和所述样本前景图像区域以及所述样本图像的标注信息,确定网络损失值,包括:对于所述多个候选边界框中的每个候选边界框,确定该候选边界框与所述样本图像标注的至少一个真实目标边界框中的每个真实目标边界框之间的交并比;根据确定的所述多个候选边界框中每个候选边界框的所述交并比,确定第一网络损失值。
结合本公开提供的任一实施方式,所述候选边界框和所述真实目标边界框之间的交并比是基于包含所述候选边界框与所述真实目标边界框的外接圆得到的。
结合本公开提供的任一实施方式,在确定所述网络损失值的过程中,所述候选边界框的宽度所对应的权重高于所述候选边界框的长度所对应的权重。
结合本公开提供的任一实施方式,所述根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,包括:对所述特征数据进行上采样处理,以使得处理后的所述特征数 据的大小与样本图像的大小相同;基于所述处理后的所述特征数据进行像素分割,获得所述样本图像的样本前景分割结果。
结合本公开提供的任一实施方式,所述样本图像包含的目标对象的长宽比高于设定值。
第三方面,提供一种目标检测装置,包括:
特征提取单元,用于获得输入图像的特征数据;目标预测单元,用于根据所述特征数据,确定所述输入图像的多个候选边界框;前景分割单元,用于根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息;目标确定单元,用于根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
第四方面,提供一种目标检测网络的训练装置,所述目标检测网络包括特征提取网络、目标预测网络和前景分割网络,所述装置包括:
特征提取单元,用于通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据;目标预测单元,用于根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框;前景分割单元,用于根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息;损失值确定单元,用于根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值;参数调整单元,用于基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
第五方面,提供一种目标测检设备,所述设备包括存储器、处理器,所述存储器用于存储可在所述处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现以上所述的目标检测方法。
第六方面,提供一种目标检测网络的训练设备,所述设备包括存储器、处理器,所述存储器用于存储可在所述处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现以上所述的目标检测网络的训练方法。
第七方面,提供一种非易失性计算机可读存储介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时,促使所述处理器实现以上所述的目标检测方法,和/或,实现以上所述的目标检测网络的训练方法。
本公开一个或多个实施例的目标检测及目标检测网络的训练方法、装置及设备,根据输入图像的特征数据确定多个候选边界框,并根据所述特征数据得到前景分割结果,通过结合所述多个候选边界框和前景分割结果,能够更准确地确定所检测的目标对象。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本说明书的实施例,并与说明书一起用于解释本说明书的原理。
图1是本申请实施例示出的一种目标检测方法的流程图。
图2是本申请实施例示出的一种目标检测方法的示意图。
图3A和图3B分别是本申请示例性实施例示出的船舰检测结果图。
图4是相关技术中的一种目标边界框的示意图。
图5A和图5B分别是本申请示例性实施例示出的重叠参数计算方法示意图。
图6是本申请实施例示出的一种目标检测网络的训练方法的流程图。
图7是本申请实施例示出的一种交并比计算方法示意图。
图8是本申请实施例示出的一种目标检测网络的网络结构图。
图9是本申请实施例示出的一种目标检测网络的训练方法的示意图。
图10是本申请实施例示出的一种预测候选边界框方法的流程图。
图11是本申请实施例示出的一种锚点框的示意图。
图12是本申请一示例性实施例示出的一种预测前景图像区域方法的流程图。
图13是本申请一示例性实施例示出的一种目标检测装置的结构示意图。
图14是本申请一示例性实施例示出的一种目标检测网络的训练装置的结构示意图。
图15是本申请一示例性实施例示出的一种目标检测设备的结构图。
图16是本申请一示例性实施例示出的一种目标检测网络的训练设备的结构图。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
应理解,本公开实施例提供的技术方案主要应用于图像中细长小目标的检测,但本公开实施例对此不做限定。
图1示出了一种目标检测方法,该方法可以包括以下步骤。
在步骤101中,获得输入图像的特征数据(例如特征图feature map)。
在一些实施例中,输入图像可以是遥感图像。遥感图像可以是通过搭载在例如人造卫星、航拍飞机上的传感器探测的地物电磁辐射特征信号等所获得的图像。本领域技术人员应当理解,输入图像也可以是其他类型的图像,并不限于遥感图像。
在一个示例中,可以通过特征提取网络提取样本图像的特征数据,例如卷积神经网络,本公开实施例不限制特征提取网络的具体结构。所提取的特征数据是多通道的特征数据,特征数据的大小和通道数目由特征提取网络的具体结构确定。
在另一个示例中,可以从其他设备处获取输入图像的特征数据,例如,接收终端发送的特征数据,本公开实施例对此不作限制。
在步骤102中,根据所述特征数据,确定所述输入图像的多个候选边界框。
在本步骤中,利用例如感兴趣区域(Region Of Interest,ROI)等技术预测得到候选边界框,包括了获得候选边界框的参数信息,参数可以包括候选边界框的长度、宽度、中心点坐标、角度等一种或任意组合。
在步骤103中,根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息。
基于特征数据所获得的前景分割结果,包含了所述输入图像的多个像素中,每个像素属于前景和/或背景的概率,前景分割结果给出像素级的预测结果。
在步骤104,根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
在一些实施例中,根据输入图像的特征数据所确定的多个候选边界框,和通过所述特征数据得到的前景分割结果,具有对应关系。将多个候选边界框映射到前景分割结果,与目标对象的轮廓拟合越好的候选边界框,与前景分割结果对应的前景图像区域越接近重叠。因此,可以结合所确定的多个候选边界框和所得到前景分割结果,可以更准确地确定所检测的目标对象。在一些实施例中,目标检测结果可以包括输入图像包括的目标对象的位置、数量等信息。
在一个示例中,可以根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框;并基于所述至少一个目标边界框,得到所述输入图像的目标检测结果。
在所述多个候选边界框中,与前景图像区域之间的重叠区域越大,也即候选边界框与前景图像区域越接近重叠,说明该候选边界框与目标对象的轮廓拟合的越好,也说明该候选边界框的预测结果越准确。因此,根据候选边界框与前景图像之间的重叠区域,可以从所述多个候选边界框中选取出至少一个候选边界框作为目标边界框,将所选取的目标边界框中作为检测到的目标对象,获得所述输入图像的目标检测结果。
例如,可以将所述多个候选边界框中与所述前景图像区域之间的重叠区域在整个候选边界框中所占的比例大于第一阈值的候选边界框作为所述目标边界框。重叠区域在整个候选边界框中所占的比例越高,说明该候选边界框与前景图像区域的重叠程度越高。本领域技术人员应当理解,本公开不限定第一阈值的具体数值,其可以根据实际需求来确定。
本公开实施例的目标检测方法可以应用于长宽比悬殊的待检测目标对象,例如飞机、船舰、车辆等军事目标。在一个示例中,长宽比悬殊指长宽比大于特定数值,例如大于5。本领域技术人员应当理解,该特定数值可以依据检测目标而具体确定。在一个示例中,目标对象可以是船舰。
下面以输入图像为遥感图像且检测目标为船舰为例,说明目标检测的过程。本领域技术人员应当理解,对于其他的目标对象,也可以应用该目标检测方法。参见图2所示的目标检测方法示意图。
首先,获得该遥感图像(也即图2中的输入图像210)的多通道特征数据(也即图2中的特征图220)。
将上述特征数据分别输入到第一分支(图2中上部分支230)和第二分支(图2中下部分支240),分别进行如下处理。
对于第一分支
对每个锚点(anchor)框生成一个置信度得分。该置信度得分与锚点框内为前景或背景的概率相关,例如,锚点框为前景的概率越高,置信度得分就越高。
在一些实施例中,锚点框是基于先验知识的矩形框。锚点框的具体实现方法可以参见后续训练目标检测网络中的描述,在此暂不详述。可以将锚点框作为一个整体进行预测,以计算锚点框内属于前景或背景的概率,即预测该锚点框内是否含有物体或特定目标,其中,若锚点框含有物体或特定目标,则将该锚点框判断为前景。
在一些实施例中,按照置信度得分,可以选出得分最高或超过一定阈值的若干锚点框作为前景锚点框,通过预测前景锚点框到候选边界框的偏移量,对前景锚点框进行偏移可以得到候选边界框,并且基于该偏移量可以获得候选边界框的参数。
在一个示例中,锚点框可以包括方向信息,并且可以设置多种长宽比,以覆盖待检测的目标对象。具体的方向个数以及长宽比的数值可以根据实际需求进行设置。如图11所示,所构造的锚点框对应6个方向,其中,w表示锚点框的宽度,l表示锚点框的长度,θ表示锚点框的角度(锚点框相对于水平的旋转角度),(x,y)表示锚点框中心点的坐标。对应于方向上均匀分布的6个锚点框,θ分别为0°、30°、60°、90°、-30°、-60°。
在一个示例中,在生成候选边界框之后,可以进一步通过非极大值抑制方法(Non-Maximum Suppression,NMS)去除重叠的检测框。例如可以首先遍历所有候选边界框,选择置信度得分最高的候选边界框,遍历其余的候选边界框,如果和当前最高分边界框的交并比(Intersection over Union,IoU)大于一定阈值,则将该边界框删除。之后,从未处理的候选边界框中继续选取得分最高的,重复上述过程。多次迭代后,得最终未被抑制的保留下来,作为所确定的候选边界框。以图2为例,经NMS处理后,得到候选边界框图231中的标号为1、2、3的三个候选边界框。
对于第二分支
根据所述特征数据,对于输入图像中的每个像素,预测其为前景、背景的概率,通过将为前景概率高于设定值的像素作为前景像素,生成像素级的前景分割结果241。
由于第一分支和第二分支输出的结果尺寸是一致的,因此可以将候选边界框映射到像素分割结果中,据候选边界框与前景分割结果对应的前景图像区域之间的重叠区域,来确定目标边界框。例如,可以将重叠区域在整个候选边界框中所占的比例大于第一阈值的候选边界框作为所述目标边界框。
以图2为例,将标号为1、2、3的三个候选边界框映射至前景分割结果中,可以计算得出每个候选边界框与前景图像区域重叠区域在整个候选边界框中所占的比例,例如,针对候选边界框1,该比例为92%,针对候选边界框2,该比例为86%,针对候选边界框3,该比例为65%。在第一阈值为70%的情况下,则排除了候选边界框3为目标边界框的可能性,在最终检测输出结果图250中,目标边界框为候选边界框1和候选边界框2。
通过以上方法进行检测,输出的目标边界框仍有重叠的可能性。例如,在进行NMS处理时,如果阈值设置的过高,则有可能没有抑制掉重叠的候选边界框。在候选边界框与前景图像区域重叠区域在整个候选边界框中所占的比例都超过第一阈值的情况下,最终输出的目标边界框还有可能包括重叠的边界框。
在所选取的至少一个目标边界框包括第一边界框和第二边界框的情况下,本公开实施例可以通过以下方法确定最终目标对象。本领域技术人员应当理解,该方法不限于处理两个重叠边界框,也可以通过先处理两个,再处理保留的一个与其他边界框的方法,处理多个重叠边界框。
在一些实施例中,基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数;基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置。
在两个待检测目标对象紧密排列的情况下,二者的目标边界框(第一边界框和第二边界框)有可能是重复的。但这种情况,第一边界框和第二边界框的交并比通常是比较小的。因此,本公开通过设置第一边界框和第二边界框的重叠参数,来确定两个边界框中的检测物体是否均为目标对象。
在一些实施例中,在所述重叠参数大于第二阈值的情况下,则表示第一边界框和第二边界框中有可能只有一个目标对象,因此将其中的一个边界框作为目标对象位置。由于前景分割结果包括了像素级的前景图像区域,因此可以利用该前景图像区域来确定保留哪一个边界框,作为目标对象的边界框。例如,可以分别计算第一边界框与对应的前景图像区域的第一重叠参数以及第二边界框与对应的前景图像区域的第二重叠参数,将第一重叠参数和第二重叠参数中的较大值对应的目标边界框内确定为目标对象,并移除较小值对应的目标边界框。通过以上方法,则移除了在一个目标对象上重叠的两个或多个边界框。
在一些实施例中,在所述重叠参数小于或等于第二阈值的情况下,将所述第一边界框和所述第二边界框均作为目标对象位置。
以下示例性地说明确定最终目标对象的过程。
在一个实施例中,如图3A所示,边界框A、B为船舰检测结果,其中,边界框A和边界框B是重叠的,计算得出二者的重叠参数为0.1。在第二阈值为0.3的情况下,确定边界框A和边界框B是两个不同船舰的检测。将边界框映射到像素分割结果中可见,边界框A和边界框B分别对应着不同的船舰。在判断出两个边界框的重叠参数小于第二阈值的情况下,并不需要额外的将边界框映射到像素分割结果的过程,以上仅出于验证的目的。
在另一个实施例中,如图3B所示,边界框C、D为另一种船舰检测结果,其中,边界框C和边界框D是重叠的,计算得出二者的重叠参数为0.8,也即大于第二阈值0.3。基于该重叠参数计算结果,可以确定边界框C和边界框D实际上是同一船舰的边界框。在这种情况下,可以通过将边界框C和边界框D映射到像素分割结果中,利用对应的前景图像区域来进一步确定最终目标对象。计算边界框C与前景图像区域的第一重叠参数,以及计算边界框D与前景图像区域的第二重叠参数。例如,第一重叠参数为0.9,第二重叠参数为0.8,则确定数值较大的第一重叠参数所对应的边界框C包含船舰,并同时移除第二重叠参数所对应的边界框D,最终输出边界框C作为船舰的目标边界框。
在一些实施例中,利用像素分割结果对应的前景图像区域辅助确定重叠边界框的目标对象,由于像素分割结果对应的是像素级的前景图像区域,空间精度较高,因此通过重叠的边界框与前景图像区域的重叠参数进一步确定包含目标对象的目标边界框,提升了目标检测的精度。
相关技术中,由于采用的锚点框通常是不含角度参数的矩形框,对于长宽比悬殊的目标对象,例如船舰,当目标对象处于倾斜的状态,利用这种锚点框所确定的目标边界框是目标对象的外接矩 形框,其面积与目标对象的真实面积相差是非常大的。对于两个紧密排列的目标对象,如图4所示,其中目标对象401对应的目标边界框403是其外接矩形框,目标对象402对应的目标边界框404也是其外接矩形框,这两个目标对象的目标边界框之间的重叠参数即是两个外接矩形框之间的交并比IoU。由于目标边界框与目标对象之间面积的差异,使得计算得到的交并比的误差是非常大的,因此导致了目标测检的召回率(recall)降低。
为此,如前所述,在一些实施例中,本公开的锚点框可以引入锚点框的角度参数,以增加交并比的计算准确性。由锚点框经过计算得到的不同的目标边界框的角度也可能互不相同。
基于此,本公开提出了如下计算重叠参数的方法:根据所述第一边界框和所述第二边界框之间的夹角,获得角度因子;根据所述第一边界框和所述第二边界框之间的交并比和所述角度因子,获得所述重叠参数。
在一个示例中,所述重叠参数为所述交并比与所述角度因子的乘积,其中,所述角度因子可以根据第一边界框和第二边界框之间的夹角得到,其值小于1,并且随着第一边界框和第二边界框之间的角度的增大而增大。
例如,该角度因子可以用公式(1)表示:
其中,θ为第一边界框和第二边界框之间的夹角。
在另一个示例中,在所述交并比保持一定的条件下,所述重叠参数随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
以下以图5A和图5B为例,说明以上重叠参数计算方法对目标检测的影响。
对于图5A中的边界框501和边界框502,二者面积的交并比为AIoU1,二者之间的角度为θ
1。对于图5B中的边界框503和边界框504,二者面积的交并比为AIoU2,二者之间的角度为θ
2。其中,AIoU1<AIoU2。
利用上述重叠参数计算方法,增加角度因子γ进行重叠参数计算。例如,通过将两个边界框面积的交并比值与角度因子的值相乘,得到重叠参数。
例如,边界框501和边界框502的重叠参数β1可利用公式(2)进行计算:
边界框503和边界框504的重叠参数β2可利用公式(3)进行计算:
经计算可得,β1>β2。
在加入了角度因子后,图5A和图5B的重叠参数计算结果相较于面积交并比的计算结果,在大小关系上是相反的。这是由于在图5A中,两个边界框之间的角度较大,使得角度因子的值也较大,因此得到的重叠参数变大。相应地,在图5B中,两个边界框之间的角度较小,使得角度因子的值也较小,因此得到的重叠参数变小。
对于两个紧密排列的目标对象来说,二者之间的角度可能是很小的。但是由于其排列紧密,检测得到的二者的边界框之间,面积重叠部分可能较大,如果仅以面积计算交并比的话,很可能交并比结果较大,使得容易被误判为两个边界框包含的是同一个目标对象。通过本公开实施例所提出的重叠参数计算方法,通过引入角度因子,使得排列紧密的目标对象之间的重叠参数计算结果变小,有利于准确地检测出目标对象,提升对紧密排列目标的召回率。
本领域技术人员应当理解,以上重叠参数计算方法不限于对目标边界框之间的重叠参数进行计算,也可用于候选边界框、前景锚点框、真实边界框、锚点框等带有角度参数的框之间的重叠参数计算。此外,也可以采用其他方式计算重叠参数,本公开实施例对此不做限定。
在一些例子中,上述目标检测方法可以由已训练好的目标检测网络实现,该目标检测网络可以为神经网络。在使用目标检测网络之前,需要先对其进行训练,以得到优化的参数值。
下面仍以船舰检测目标为例,说明目标检测网络的训练过程。所述目标检测网络可以包括特征提取网络、目标预测网络和前景分割网络。参见图6所示的训练方法实施例流程图,可以包括如下步骤。
在步骤601中,通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据。
在本步骤中,所述的样本图像可以是遥感图像。遥感图像是通过搭载在例如人造卫星、航拍飞机上的传感器探测的地物电磁辐射特征信号,所获得的图像。样本图像也可以是其他类型的图像,并不限于遥感图像。此外,所述样本图像包括预先标注的目标对象的标注信息。该标注信息可以包括标定的目标对象的真实边界框(ground truth),在一个示例中,该标注信息可以是标定的真实边界框的四个顶点的坐标。特征提取网络可以是卷积神经网络,本公开实施例不限制特征提取网络的具体结构。
在步骤602中,根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框。
在本步骤中,根据所述样本图像的特征数据,预测生成目标对象的多个候选边界框。所述候选边界框所包含的信息可以包括以下中的至少一种:该边界框内是前景、背景的概率,该边界框的参数,例如,该边界框的尺寸、角度、位置等。
在步骤603中,根据所述特征数据获得所述样本图像中的前景分割结果。
在本步骤中,根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果。其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息。也即,通过前景分割结果可以获得对应的前景图像区域,该前景图像区域包括所有被预测为前景的像素。
在步骤604,根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值。
所述网络损失值可以包括所述目标预测网络对应的第一网络损失值,和所述前景分割网络对应的第二网络损失值。
在一些例子中,所述第一网络损失值根据样本图像中的标注信息与所述样本候选边界框的信息得到。在一个示例中,目标对象的标注信息可以是目标对象的真实边界框的四个顶点的坐标,而预测得到的样本候选边界框的预测参数可以是候选边界框的长度、宽度、相对于水平的旋转角度、中心点的坐标。基于真实边界框的四个顶点的坐标,可以相应地计算出真实边界框的长度、宽度、相对于水平的旋转角度、中心点的坐标。因此,基于样本候选边界框的预测参数和真实边界框的真实参数,可以得到体现标注信息与预测信息之间的差异的第一网络损失值。
在一些例子中,所述第二网络损失值根据样本前景分割结果与真实的前景图像区域得到。基于预先标注的目标对象的真实边界框,可以获得在原始的样本图像中所标注的包含目标对象的区域,该区域中所包含的像素为真实的前景像素,为真实的前景图像区域。因此,基于样本前景分割结果与标注信息,也即通过预测的前景图像区域与真实的前景图像区域之间的比较,可以得到第二网络损失值。
在步骤605中,基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
在一个示例中,可以通过梯度反向传播方法调整上述网络参数。
由于候选边界框的预测和前景图像区域的预测共享特征提取网络所提取的特征数据,通过两个分支的预测结果与标注的真实目标对象之间的差异来共同调整各个网络的参数,能够同时提供对象级的监督信息和像素级的监督信息,使特征提取网络所提取特征的质量得到提高。并且,本公开实施例用于预测候选边界框和前景图像的网络皆为one-stage检测器,能够实现较高的检测效率。
在一个示例中,可以基于所述多个样本候选边界框与所述样本图像标注的至少一个真实目标边界框之间的交并比,确定第一网络损失值。
在一个示例中,可以利用交并比的计算结果,从多个锚点框中选择正样本和/或负样本。例如,可以将与真实边界框的交并比大于一定数值,例如0.5,的锚点框,视为包含前景的候选边界框,将其作为正样本来训练目标检测网络;并且可以将与真实边界框的交并比小于一定数值,例如0.1,的锚点框,作为负样本来训练网络。基于所选择的正样本和/或负样本,确定第一网络损失值。
在计算第一网络损失值的过程中,由于目标对象长宽比悬殊,相关技术中计算得到的锚点框与真实边界框的交并比值可能较小,容易导致所选择的进行损失值计算的正样本变少,从而影响了训练精度。此外,本公开实施例采用的是带方向参数的锚点框,为了适应于该锚点框并提高交并比计算的准确性本公开提出了一种交并比计算方法,该方法可用于锚点框与真实边界框的交并比计算,也可用于候选边界框与真实边界框之间的交并比计算。
在该方法中,可以根据锚点框与真实边界框的外接圆面积的交集与并集的比值作为交并比。以下以图7为例进行说明。
边界框701和边界框702是长宽比悬殊、具有角度参数的矩形框,二者的长宽比例如为5。边界框701的外接圆为703,边界框702的外接圆为704,可以利用外接圆703和外接圆704面积的交集(图中阴影部分)与并集的比值,作为交并比。
对于锚点框与真实边界框的交并比计算,也可以采用其他方式,本公开实施例对此不做限 定。
以上实施例中提出的计算交并比的方法,通过方向信息的约束,保留了更多在形状上类似但是方向上有差异的样本,提升了所选取的正样本的数量和比例,因此加强了对方向信息的监督与学习,进而提升了方向预测精度。
如下的描述中,将对目标检测网络的训练方法进行更详细的描述。其中,下文以检测的目标对象是船舰为例描述该训练方法。应当理解的是,本公开检测的目标对象不局限于船舰,也可以是其他长宽比较为悬殊的对象。
准备样本
在训练神经网络之前,首先可以先准备样本集,该样本集可以包括:用于训练目标检测网络的多个训练样本。
例如,可以按照下述方式获得训练样本。
在作为样本图像的遥感图像上,标注出船舰的真实边界框。在该遥感图像上,可能包括多个船舰,则需要标注出每一个船舰的真实边界框。同时,需要标注出每一个真实边界框的参数信息,例如该边界框的四个顶点的坐标。
在标注出船舰的真实边界框的同时,可以将该真实边界框内的像素确定为真实的前景像素,也即,标注船舰的真实边界框的同时也获得了船舰的真实前景图像。本领域技术人员应当理解,真实边界框内的像素也包括真实边界框本身所包括的像素。
确定目标检测网络结构
本公开一个实施例中,目标检测网络可以包括特征提取网络、以及分别与该特征提取网络级联的目标预测网络和前景分割网络。
其中,特征提取网络用于提取样本图像的特征,其可以是卷积神经网络,例如可以采用已有的VGG(Visual Geometry Group)网络、ResNet、DenseNet等等,也可以采用其他的卷积神经网络结构。本申请对特征提取网络的具体结构不做限定,在一种可选的实现方式中,特征提取网络可以包括卷积层、激励层、池化层等网络单元,由上述网络单元按照一定方式堆叠而成。
目标预测网络用于预测目标对象的边界框,也即预测生成候选边界框的预测信息。本申请对目标预测网络的具体结构不做限定,一种可选的实现方式中,目标预测网络可以包括卷积层、分类层、回归层等网络单元,由上述网络单元按照一定方式堆叠而成。
前景分割网络用于预测样本图像中的前景图像,也即预测包含目标对象的像素区域。本申请对前景分割网络的具体结构不做限定,一种可选的实现方式中,前景分割网络可以包括上采样层、掩膜(mask)层,由上述网络单元按照一定方式堆叠而成。
图8示出了本公开实施例可以应用的一种目标检测网络的网络结构,需要说明的是,图8仅是示例性示出了一种目标检测网络,实际实施中不局限于此。
如图8所示,目标提取网络包括特征提取网络810和分别与特征提取网络810级联的目标预测网络820和前景分割网络830。
其中,特征提取网络810包括依次连接的第一卷积层(C1)811、第一池化层(P1)812、 第二卷积层(C2)813、第二池化层(P2)814和第三卷积层(C3)815,也即,在特征提取网络810中,卷积层和池化层交替连接在一起。卷积层可以通过多个卷积核分别提取图像中的不同特征,得到多幅特征图,池化层位于卷积层之后,可以对特征图的数据进行局部平均和降采样的操作,降低特征数据的分辨率。随着卷积层和池化层数量的增加,特征图的数目逐渐增多,并且特征图的分辨率逐渐降低。
特征提取网络810输出的多通道的特征数据分别输入至目标预测网络820和前景分割网络830。
目标预测网络820包括第四卷积层(C4)821、分类层822和回归层823。其中,分类层822和回归层823分别与第四卷积层821级联。
第四卷积层821利用滑动窗口(例如,3*3)对输入的特征数据进行卷积,每个窗口对应多个锚点框,每个窗口产生一个用于与分类层823和回归层824全连接的向量。此处还可以使用二个或多个卷积层,对输入的特征数据进行卷积。
分类层822用于判断锚点框所生成的边界框内是前景还是背景,回归层823用于得出候选边界框的大致位置,基于分类层822和回归层823的输出结果,可以预测出包含目标对象的候选边界框,并且输出该候选边界框内为前景、背景的概率以及该候选边界框的参数。
前景分割网络830包括上采样层831和掩膜层832。上采样层831用于将输入的特征数据转换为原始的样本图像大小;掩膜层832用于生成前景的二进制掩膜,即对于前景像素输出1,对于背景像素输出0。
此外,在计算候选边界框与前景图像区域重叠区域时,可以由第四卷积层821和掩膜层832进行图像尺寸的转换,使特征位置得到对应,即目标预测网络820和前景分割网络830的输出可以预测图像上同一位置的信息,进而计算重叠区域。
在训练该目标检测网络之前,可以设定一些网络参数,例如,可以设定特征提取网络810中每一个卷积层以及目标预测网络中卷积层使用的卷积核的数量,还可以设定卷积核的尺寸大小,等。而对于卷积核的取值、其他层的权重等参数值,可以通过迭代训练进行自学习。
在准备了训练样本和初始化目标检测网络结构的基础上,可以开始进行目标检测网络的训练。以下将列举目标检测网络的具体训练方法。
训练目标检测网络一
在一些实施例中,目标检测网络的结构可以参见图8所示。
参见图9的示例,输入目标检测网络的样本图像可以是包含船舰图像的遥感图像。并且在该样本图像上,标注出了所包含的船舰的真实边界框,标注信息可以是真实边界框的参数信息,例如该边界框的四个顶点的坐标。
输入的样本图像首先通过特征提取网络,提取样本图像的特征,输出该样本图像的多通道特征数据。输出特征数据的大小和通道数目由特征提取网络的卷积层结构和池化层结构确定。
该多通道特征数据一方面进入目标预测网络,目标预测网络基于当前的网络参数设置,基于输入的特征数据预测包含船舰的候选边界框,并生成该候选边界框的预测信息。该预测信息可以包括该边界框为前景、背景的概率,以及该边界框的参数信息,例如,该边界框的尺寸、位置、角 度等。基于预先标注的目标对象的标注信息和预测得到的候选边界框的预测信息,可以得到第一网络损失函数的数值LOSS1,也即第一网络损失值。该第一网络损失函数的数值体现标注信息与预测信息之间的差异。
另一方面,该多通道特征数据进入前景分割网络,前景分割网络基于当前的网络参数设置,预测样本图像中包含船舰的前景图像区域。例如可以通过特征数据中每个像素为前景、背景的概率,通过将为前景概率大于设定值的像素都作为前景像素,进行像素分割,则可以得出预测的前景图像区域。
由于在样本图像中已经预先标注了船舰的真实边界框,通过该真实边界框的参数,例如四个顶点的坐标,可以得出样本图像中为前景的像素,即得知样本图像中的真实前景图像。基于预测的前景图像与通过标注信息得到的真实前景图像,可以得到第二网络损失函数的数值LOSS2,也即第二网络损失值。该第二网络损失函数的数值体现了预测的前景图像与标注信息之间的差异。
可以基于第一网络损失函数的数值和第二网络损失函数的数值共同确定的总损失值反向回传目标检测网络,以调整网络参数的取值,例如调整卷积核的取值、其他层的权重。在一个示例中,可以将第一网络损失函数和第二网络损失函数之和确定为总损失函数,利用总损失函数进行参数调整。
在训练目标检测网络时,可以将训练样本集分成多个图像子集(batch),每个图像子集包括一个或多个训练样本。每次迭代训练时,向网络依次输入一个图像子集,结合该图像子集包括的训练样本中各个样本预测结果的损失值进行网络参数的调整。本次迭代训练完成后,向网络输入下一个图像子集,以进行下一次迭代训练。不同图像子集包括的训练样本至少部分不同。当达到预定结束条件时,则可以完成目标检测网络的训练。所述预定训练结束条件,例如可以是总损失值(LOSS值)降低到了一定阈值,或者达到了预定的目标检测网络迭代次数。
本实施的目标检测网络训练方法,由目标预测网络提供对象级的监督信息,通过像素分割网络提供像素级的监督信息,通过两种不同层次的监督信息,使特征提取网络所提取特征的质量得到提高,并且,利用one-stage的目标预测网络和像素分割网络进行检测,使检测效率得到了提高。
训练目标检测网络二
在一些实施例中,目标预测网络可以通过以下方式预测得到目标对象的候选边界框。目标预测网络的结构可以参见图8所示。
图10是预测候选边界框的方法的流程图,如图10所示,该流程可以包括以下步骤。
在步骤1001中,将所述特征数据的每一点作为锚点,以每一个锚点为中心构造多个锚点框。
例如,对于大小为[H×W]的特征层,共构造H×W×k个锚点框,其中,k是在每一个锚点生成的锚点框的个数。其中,对在一个锚点构造的多个锚点框设置不同的长宽比,以能够覆盖待检测的目标对象。首先,可以基于先验知识,例如统计大部分目标的尺寸分布,通过超参数设置直接生成先验锚点框,然后通过特征预测出锚点框。
在步骤1002中,将所述锚点映射回所述样本图像,得到每个锚点框在所述样本图像上包含的区域。
在本步骤中,将所有锚点映射回样本图像,也即将特征数据映射回样本图像,则可以得到 以锚点为中心所生成的锚点框在样本图像中所框的区域。可以通过先验锚点框、预测值并结合当前的特征分辨率共同进行计算,将锚点框映射回样本图像的位置和大小,得到每个锚点框在样本图像上包含的区域。
以上过程相当于用一个卷积核(滑动窗口)在输入的特征数据上进行滑动操作,当卷积核滑动到特征数据的某一个位置时,以当前滑动窗口中心为中心映射回样本图像的一个区域,以样本图像上这个区域的中心即是对应的锚点,再以锚点为中心框出锚点框。也就是说,虽然锚点是基于特征数据定义的,但最终其是相对于原始的样本图像的。
对于图8所示的目标预测网络结构,可以通过第四卷积层821来实现提取特征的过程,第四卷积层821的卷积核例如可以是3×3大小。
在步骤1003中,基于映射回样本图像的锚点框与真实边界框的交并比确定前景锚点框,并获得所述前景锚点框内为前景、背景的概率。
在本步骤中,通过比较锚点框在所述样本图像上包含的区域与真实边界框的重叠情况来确定哪些锚点框内是前景,那些锚点框内是背景,也即给每一个锚点框都打上前景或背景的标签(label),具有前景标签的锚点框即为前景锚点框,具有背景标签的锚点框即为背景锚点框。
在一个示例中,可以将与真实边界框的交并比大于第一设定值,例如0.5,的锚点框,视为包含前景的候选边界框。并且,还可以通过对锚点框进行二分类,确定锚点框内为前景、背景的概率。
可以利用前景锚点框来训练目标检测网络,例如将其作为正样本来训练网络,使这些前景锚点框参与损失函数的计算,而这一部分的损失通常被称为分类损失,其是基于前景锚点框的二分类概率与前景锚点框的标签进行比较得到的。
对于一个图像子集,可以使其包含从一张样本图像中随机提取的多个标签为前景的锚点框,例如256个,作为正样本用于训练。
在一个示例中,在正样本数量不足的情况下,还可以利用负样本来训练目标检测网络。负样本例如可以是与真实边界框的交并比小于第二设定值,例如0.1,的锚点框。
在该示例中,可以使一个图像子集包含从一张样本图像中随机提取的256个锚点框,其中128个标签为前景的锚点框,作为正样本,另外128个是与真实边界框的交并比小于第二设定值,例如0.1,的锚点框,作为负样本,使正负样本的比例达到1:1。如果一个图像中的正样本数小于128,则可以多用一些负样本以满足256个锚点框用于训练。
在步骤1004中,对所述前景锚点框进行边界框回归,得到候选边界框,并获得所述候选边界框的参数。
在本步骤中,前景锚点框、候选边界框的参数类型与锚点框的参数类型是一致的,也即,所构造的锚点框包含哪些参数,所生成的候选边界框也包含哪些参数。
在步骤1003中所获得的前景锚点框,由于长宽比可能与样本图像中的船舰的长宽比有差距,并且前景锚点框的位置、角度也可能与样本船舰有差距,因此,需要利用前景锚点框和与其对应的真实边界框之间的偏移量进行回归训练,使得目标预测网络具备通过前景点框预测其到候选边界框的偏移量的能力,从而获得候选边界框的参数。
通过步骤1003和步骤1004,可以获得候选边界框的信息:候选边界框内为前景、背景的概率,以及候选边界框的参数。基于上述候选边界框的信息,以及样本图像中的标注信息(目标对象对应的真实边界框),可以得到第一网络损失。
在本公开实施例中,目标预测网络为one stage网络,在第一次预测得到候选边界框后,即输出候选边界框的预测结果,提高了网络的检测效率。
训练目标检测网络三
相关技术中,每一个锚点所对应的锚点框的参数通常包括长度、宽度和中心点的坐标。在本实例中,提出了一种旋转锚点框设置方法。
在一个示例中,以每一个锚点为中心构造多个方向的锚点框,并且可以设置多种长宽比,以覆盖待检测的目标对象。具体的方向个数以及长宽比的数值可以根据实际需求进行设置。如图11所示,所构造的锚点框对应6个方向,其中,w表示锚点框的宽度,l表示锚点框的长度,θ表示锚点框的角度(锚点框相对于水平的旋转角度),(x,y)表示锚点框中心点的坐标。对应于方向上均匀分布的6个锚点框,θ分别为0°、30°、60°、90°、-30°、-60°。相应地,在该示例中,锚点框的参数可以表示为(x,y,w,l,θ)。其中,长宽比例可以设置为1、3、5,也可以针对检测的目标对象设置为其他数值。
在一些实施例中,候选边界框的参数也同样可以表示为(x,y,w,l,θ),该参数可以利用图8中的回归层823进行回归计算。回归计算的方法如下。
首先,计算得到前景锚点框到真实边界框的偏移量。
例如,前景锚点框的参数值为[A
x,A
y,A
w,A
l,A
θ],其中,A
x,A
y,A
w,A
l,A
θ分别表示前景锚点框的中心点x坐标、中心点y坐标、宽度、长度、角度;对应真实边界框的五个值为[G
x,G
y,G
w,G
l,G
θ],其中,G
x,G
y,G
w,G
l,G
θ分别表示真实边界框的中心点x坐标、中心点y坐标、宽度、长度、角度
基于前景锚点框的参数值和真实边界框的值可以确定前景锚点框与真实边界框之间的偏移量[d
x(A),d
y(A),d
w(A),d
l(A),d
θ(A)],其中,dx(A),dy(A),dw(A),dl(A),dθ(A)分别表示中心点x坐标、中心点y坐标、宽度、长度、角度的偏移量。各个偏移量例如可以分别通过公式(4)-(8)进行计算:
d
x(A)=(G
x-A
x)/A
w (4)
d
y(A)=(G
y-A
y)/A
l (5)
d
w(A)=log(G
w/A
w) (6)
d
l(A)=log(G
l/A
l) (7)
d
θ(A)=G
θ-A
θ (8)
其中,公式(6)和公式(7)采用对数来表示长和宽的偏移,是为了在差别大时能快速收敛。
在一个示例中,在输入的多通道特征数据中有多个真实边界框的情况下,每个前景锚点框 选择与它重叠度最高的真实边界框来计算偏移量。
接下来,得到前景锚点框到候选边界框的偏移量。
此处为寻找表达式建立锚点框与真实边界框的关系的过程,可以使用回归来实现。以图8中的网络结构为例,可以利用上述偏移量训练回归层823。在完成训练后,目标预测网络具备了识别每一个锚点框到与之对应的最优候选边界框的偏移量[d
x’(A),d
y’(A),d
w’(A),d
l’(A),d
θ’(A)]的能力,也就是说,基于锚点框的参数值即可以确定候选边界框的参数值,包括中心点x坐标、中心点y坐标、宽度、长度、角度。在训练时,可以利用回归层先算出前景锚点框到候选边界框的偏移量。由于训练时网络参数的优化还没有完成,所以该偏移量可能和实际的偏移量[d
x(A),d
y(A),d
w(A),d
l(A),d
θ(A)]的差距较大。
最后,基于所述偏移量对所述前景锚点框进行偏移,得到所述候选边界框,并获得所述候选边界框的参数。
在计算第一网络损失函数的数值时,可以利用前景锚点框到候选边界框的偏移量[d
x’(A),d
y’(A),d
w’(A),d
l’(A),d
θ’(A)]与训练时前景锚点框与真实边界框的偏移量[d
x(A),d
y(A),d
w(A),d
l(A),d
θ(A)]来计算回归损失。
前述预测的前景锚点框内为前景、背景的概率,在对该前景锚点框进行回归得到候选边界框后,该概率即为候选边界框内为前景、背景的概率,基于该概率则可以确定预测候选边界框内为前景、背景的分类损失。该分类损失与预测候选边界框的参数的回归损失之和,组成了第一网络损失函数的数值。对于一个图像子集,可以基于所有候选边界框的第一网络损失函数的数值,进行网络参数的调整。
通过设置具有方向的锚点框,可以生成更符合目标对象位姿的外接矩形边界框,使边界框之间的重叠部分的计算更加严格与精确。
训练目标检测网络四
在基于标准信息与候选边界框的信息得到第一网络损失函数的数值时,可以设置锚点框的各个参数的权重比例,使宽度的权重比例高于其他参数的权重比例,并根据设置的权重比例,计算第一网络损失函数的数值。
权重比例越高的参数,对于最终计算得到的损失函数值贡献越大,在进行网络参数调整时,会更注重调整的结果对该参数值的影响,从而使得该参数的计算精度高于其他参数。对于长宽比悬殊的目标对象,例如船舰,其宽度相较于长度来说非常小,因此将宽度的权重设置为高于其他参数的权重,可以提高宽度的预测精度。
训练目标检测网络五
在一些实施例中,可以通过以下方式预测得到样本图像中的前景图像区域。前景分割网络的结构可以参见图8所示。
图12是预测前景图像区域方法的实施例流程图,如图12所示,该流程可以包括如下步骤。
在步骤1201中,对所述特征数据进行上采样处理,以使处理后的特征数据的大小与样本图像的大小相同。
例如,可以通过反卷积层,或者双线性差值对特征数据进行上采样处理,将特征数据放大回样本图像大小。由于输入像素分割网络的是多通道特征数据,在经过上采样处理后,得到的是相应通道数目的、与样本图像大小一致的特征数据。特征数据上的每个位置都与原始图像位置一一对应。
在步骤1202中,基于所述处理后的所述特征数据进行像素分割,获得所述样本图像的样本前景分割结果。
对于特征数据的每个像素,可以判断出其属于前景、背景的概率。可以通过设定阈值,将属于前景的概率大于设定阈值的像素确定为前景像素,则对于每个像素都能够生成掩膜信息,通常可以用0、1表示,其中可以用0表示背景,1表示前景。基于该掩膜信息,可以确定为前景的像素,从而得到了像素级的前景分割结果。由于特征数据上的每个像素都与样本图像上的区域相对应,而样本图像中已经标注出了目标对象的真实边界框,因此根据标注信息,确定每个像素的分类结果与真实边界框的差异,得到分类损失。
由于该像素分割网络不涉及边界框的位置确定,因此其所对应的第二网络损失函数的数值,可以通过每个像素的分类损失之和确定。通过不断地调整网络参数,使得第二网络损失值达到最小,可以使得每个像素的分类更加准确,从而更准确地确定目标对象的前景图像。
在一些实施例中,通过对特征数据进行上采样处理,以及对于每个像素生成掩膜信息,可以得到像素级的前景图像区域,使目标检测的精确度得到了提高。
图13提供了一种目标检测装置,如图13所示,该装置可以包括:特征提取单元1301、目标预测单元1302、前景分割单元1303和目标确定单元1304。
特征提取单元1301,用于获得输入图像的特征数据。
目标预测单元1302,用于根据所述特征数据,确定所述输入图像的多个候选边界框。
前景分割单元1303,用于根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息。
目标确定单元1304,用于根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
在另一个实施例中,所述目标确定单元1304具体用于:根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框;基于所述至少一个目标边界框,得到所述输入图像的目标检测结果。
在另一个实施例中,所述目标确定单元1304在用于所述根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框时,具体用于:对于所述多个候选边界框中每个候选边界框,若该候选边界框与对应的前景图像区域之间的重叠区域在该候选边界框中所占的比例大于第一阈值,则将该候选边界框作为所述目标边界框。
在另一个实施例中,所述至少一个目标边界框包括第一边界框和第二边界框,所述目标确定单元1304在用于基于所述至少一个目标边界框,得到所述输入图像的目标检测结果时,具体用于:基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠 参数;基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置。
在另一个实施例中,所述目标确定单元1304在用于基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数时,具体用于:根据所述第一边界框和所述第二边界框之间的夹角,获得角度因子;根据所述第一边界框和所述第二边界框之间的交并比和所述角度因子,获得所述重叠参数。
在另一个实施例中,所述第一边界框和所述第二边界框的重叠参数为所述交并比与所述角度因子的乘积,其中,所述角度因子随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
在另一个实施例中,在所述交并比保持一定的条件下,所述第一边界框和所述第二边界框的重叠参数随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
在另一个实施例中,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数大于第二阈值的情况下,将所述第一边界框和所述第二边界框中的其中一个边界框作为目标对象位置。
在另一个实施例中,将所述第一边界框和所述第二边界框中的其中一个边界框作为目标对象位置,包括:确定所述第一边界框与所述前景分割结果对应的前景图像区域之间的重叠参数和所述第二边界框与所述前景图像区域之间的重叠参数;将所述第一边界框和所述第二边界框中与所述前景图像区域之间的重叠参数较大的边界框作为目标对象位置。
在另一个实施例中,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数小于或等于第二阈值的情况下,将所述第一边界框和第二边界框均作为目标对象位置。
在另一个实施例中,所述输入图像中待检测的目标对象的长宽比大于特定数值。
图14提供了一种目标检测网络的训练装置,所述目标检测网络包括特征提取网络、目标预测网络和前景分割网络。如图14所示,该装置可以包括:特征提取单元1401、目标预测单元1402、前景分割单元1403、损失值确定单元1404和参数调整单元1405。
特征提取单元1401,用于通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据。
目标预测单元1402,用于根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框。
前景分割单元1403,用于根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息。
损失值确定单元1404,用于根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值。
参数调整单元1405,用于基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
在另一个实施例中,所述标注信息包括所述样本图像包含的至少一个目标对象的真实边界框,所述损失值确定单元1404具体用于:对于所述多个候选边界框中的每个候选边界框,确定该候选边界框与所述样本图像标注的至少一个真实目标边界框中的每个真实目标边界框之间的交并比;根据确定的所述多个候选边界框中每个候选边界框的所述交并比,确定第一网络损失值。
在另一个实施例中,所述候选边界框和所述真实目标边界框之间的交并比是基于包含所述候选边界框与所述真实目标边界框的外接圆得到的。
在另一个实施例中,在确定所述网络损失值的过程中,所述候选边界框的宽度所对应的权重高于所述候选边界框的长度所对应的权重。
在另一个实施例中,所述前景分割单元1403具体用于:对所述特征数据进行上采样处理,以使得处理后的所述特征数据的大小与样本图像的大小相同;基于所述处理后的所述特征数据进行像素分割,获得所述样本图像的样本前景分割结果。
在另一个实施例中,所述样本图像包含的目标对象的长宽比高于设定值。
图15为本公开至少一个实施例提供的目标检测设备,所述设备包括存储器1501、处理器1502,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现本说明书任一实施例所述的目标检测方法。所述设备还可能包括网络接口1503及内部总线1504。存储器1501、处理器1502和网络接口1503通过内部总线1504进行相互之间的通信。
图16为本公开至少一个实施例提供的目标检测网络的训练设备,所述设备包括存储器1601、处理器1602,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现本说明书任一实施例所述的目标检测网络的训练方法。所述设备还可能包括网络接口1603及内部总线1604。存储器1601、处理器1602和网络接口1603通过内部总线1604进行相互之间的通信。
本说明书至少一个实施例还提供了一种非易失性计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现本说明书任一实施例所述的目标检测方法,和/或,实现本说明书任一实施例所述的目标检测网络的训练方法。
在本申请实施例中,计算机可读存储介质可以是多种形式,比如,在不同的例子中,所述机器可读存储介质可以是:非易失性存储器、闪存、存储驱动器(如硬盘驱动器)、固态硬盘、任何类型的存储盘(如光盘、DVD等),或者类似的存储介质,或者它们的组合。特殊的,所述的计算机可读介质还可以是纸张或者其他合适的能够打印程序的介质。使用这些介质,这些程序可以被通过电学的方式获取到(例如,光学扫描)、可以被以合适的方式编译、解释和处理,然后可以被存储到计算机介质中。
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。
Claims (37)
- 一种目标检测方法,其特征在于,所述方法包括:获得输入图像的特征数据;根据所述特征数据,确定所述输入图像的多个候选边界框;根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息;根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
- 根据权利要求1所述的方法,其特征在于,所述根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果,包括:根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框;基于所述至少一个目标边界框,得到所述输入图像的目标检测结果。
- 根据权利要求2所述的方法,其特征在于,所述根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框,包括:对于所述多个候选边界框中每个候选边界框,若该候选边界框与对应的前景图像区域之间的重叠区域在该候选边界框中所占的比例大于第一阈值,则将该候选边界框作为所述目标边界框。
- 根据权利要求2或3所述的方法,其特征在于,所述至少一个目标边界框包括第一边界框和第二边界框,所述基于所述至少一个目标边界框,得到所述输入图像的目标检测结果,包括:基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数;基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置。
- 根据权利要求4所述的方法,其特征在于,所述基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数,包括:根据所述第一边界框和所述第二边界框之间的夹角,获得角度因子;根据所述第一边界框和所述第二边界框之间的交并比和所述角度因子,获得所述重叠参数。
- 根据权利要求5所述的方法,其特征在于,所述第一边界框和所述第二边界框的重叠参数为所述交并比与所述角度因子的乘积,其中,所述角度因子随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
- 根据权利要求5或6所述的方法,其特征在于,在所述交并比保持一定的条件下,所述第一边界框和所述第二边界框的重叠参数随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
- 根据权利要求4至7中任一项所述的方法,其特征在于,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数大于第二阈值的情况下,将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置。
- 根据权利要求8所述的方法,其特征在于,所述将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置,包括:确定所述第一边界框与所述前景分割结果对应的前景图像区域之间的重叠参数和所述第二边界框与所述前景图像区域之间的重叠参数;将所述第一边界框和所述第二边界框中与所述前景图像区域之间的重叠参数较大的边界框作为所述目标对象位置。
- 根据权利要求4至9中任一项所述的方法,其特征在于,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数小于或等于第二阈值的情况下,将所述第一边界框和所述第二边界框均作为所述目标对象位置。
- 根据权利要求1至10中任一项所述的方法,其特征在于,所述输入图像中待检测的目标对象的长宽比大于特定数值。
- 一种目标检测网络的训练方法,其特征在于,所述目标检测网络包括特征提取网络、目标预测网络和前景分割网络,所述方法包括:通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据;根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框;根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息;根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值;基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
- 根据权利要求12所述的方法,其特征在于,所述标注信息包括所述样本图像包含的至少一个目标对象的真实边界框,所述根据所述多个样本候选边界框和所述样本前景图像区域以及所述样本图像的标注信息,确定网络损失值,包括:对于所述多个候选边界框中的每个候选边界框,确定该候选边界框与所述样本图像标注的至少一个真实目标边界框中的每个真实目标边界框之间的交并比;根据确定的所述多个候选边界框中每个候选边界框的所述交并比,确定第一网络损失值。
- 根据权利要求13所述的方法,其特征在于,所述候选边界框和所述真实目标边界框之间的交并比是基于包含所述候选边界框与所述真实目标边界框的外接圆得到的。
- 根据权利要求12至14中任一项所述的方法,其特征在于,在确定所述网络损失值的过程中,所述候选边界框的宽度所对应的权重高于所述候选边界框的长度所对应的权重。
- 根据权利要求12至15中任一项所述的方法,其特征在于,所述根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,包括:对所述特征数据进行上采样处理,以使得处理后的所述特征数据的大小与样本图像的大小相同;基于所述处理后的所述特征数据进行像素分割,获得所述样本图像的样本前景分割结果。
- 根据权利要求12-16中任一项所述的方法,其特征在于,所述样本图像包含的目标对象的长宽比高于设定值。
- 一种目标检测装置,其特征在于,所述装置包括:特征提取单元,用于获得输入图像的特征数据;目标预测单元,用于根据所述特征数据,确定所述输入图像的多个候选边界框;前景分割单元,用于根据所述特征数据,获得所述输入图像的前景分割结果,其中,前景分割结果包含指示所述输入图像的多个像素中每个像素是否属于前景的指示信息;目标确定单元,用于根据所述多个候选边界框与所述前景分割结果,得到所述输入图像的目标检测结果。
- 根据权利要求18所述的装置,其特征在于,所述目标确定单元具体用于:根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框;基于所述至少一个目标边界框,得到所述输入图像的目标检测结果。
- 根据权利要求19所述的装置,其特征在于,所述目标确定单元在用于所述根据所述多个候选边界框中每个候选边界框与所述前景分割结果对应的前景图像区域之间的重叠区域,从多个候选边界框中选取至少一个目标边界框时,具体用于:对于所述多个候选边界框中每个候选边界框,若该候选边界框与对应的前景图像区域之间的重叠区域在该候选边界框中所占的比例大于第一阈值,则将该候选边界框作为所述目标边界框。
- 根据权利要求19或20所述的装置,其特征在于,所述至少一个目标边界框包括第一边界框和第二边界框,所述目标确定单元在用于基于所述至少一个目标边界框,得到所述输入图像的目标检测结果时,具体用于:基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数;基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置。
- 根据权利要求21所述的装置,其特征在于,所述目标确定单元在用于基于所述第一边界框和所述第二边界框之间的夹角,确定所述第一边界框和所述第二边界框的重叠参数时,具体用于:根据所述第一边界框和所述第二边界框之间的夹角,获得角度因子;根据所述第一边界框和所述第二边界框之间的交并比和所述角度因子,获得所述重叠参数。
- 根据权利要求22所述的装置,其特征在于,所述第一边界框和所述第二边界框的重叠参数为所述交并比与所述角度因子的乘积,其中,所述角度因子随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
- 根据权利要求22或23所述的装置,其特征在于,在所述交并比保持一定的条件下,所述第一边界框和所述第二边界框的重叠参数随着所述第一边界框和所述第二边界框之间的角度的增大而增大。
- 根据权利要求21至24中任一项所述的装置,其特征在于,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数大于第二阈值的情况下,将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置。
- 根据权利要求25所述的装置,其特征在于,将所述第一边界框和所述第二边界框中的其中一个边界框作为所述目标对象位置,包括:确定所述第一边界框与所述前景分割结果对应的前景图像区域之间的重叠参数和所述第二边界框与所述前景图像区域之间的重叠参数;将所述第一边界框和所述第二边界框中与所述前景图像区域之间的重叠参数较大的边界框作为所述目标对象位置。
- 根据权利要求21至26中任一项所述的装置,其特征在于,所述基于所述第一边界框和所述第二边界框的重叠参数,确定所述第一边界框和所述第二边界框所对应的目标对象位置,包括:在所述第一边界框和所述第二边界框的重叠参数小于或等于第二阈值的情况下,将所述第一边界框和所述第二边界框均作为所述目标对象位置。
- 根据权利要求18至27中任一项所述的装置,其特征在于,所述输入图像中待检测的目标对象的长宽比大于特定数值。
- 一种目标检测网络的训练装置,其特征在于,所述目标检测网络包括特征提取网络、目标预测网络和前景分割网络,所述装置包括:特征提取单元,用于通过所述特征提取网络对样本图像进行特征提取处理,获得所述样本图像的特征数据;目标预测单元,用于根据所述特征数据,通过所述目标预测网络获得多个样本候选边界框;前景分割单元,用于根据所述特征数据,通过所述前景分割网络获得所述样本图像的样本前景分割结果,其中,所述样本前景分割结果包含指示所述样本图像的多个像素点中每个像素点是否属于前景的指示信息;损失值确定单元,用于根据所述多个样本候选边界框和所述样本前景分割结果以及所述样本图像的标注信息,确定网络损失值;参数调整单元,用于基于所述网络损失值,对所述目标检测网络的网络参数进行调整。
- 根据权利要求29所述的装置,其特征在于,所述标注信息包括所述样本图像包含的至少一个目标对象的真实边界框,所述损失值确定单元具体用于:对于所述多个候选边界框中的每个候选边界框,确定该候选边界框与所述样本图像标注的至少一个真实目标边界框中的每个真实目标边界框之间的交并比;根据确定的所述多个候选边界框中每个候选边界框的所述交并比,确定第一网络损失值。
- 根据权利要求30所述的装置,其特征在于,所述候选边界框和所述真实目标边界框之间的交并比是基于包含所述候选边界框与所述真实目标边界框的外接圆得到的。
- 根据权利要求29至31中任一项所述的装置,其特征在于,在确定所述网络损失值的过程中,所述候选边界框的宽度所对应的权重高于所述候选边界框的长度所对应的权重。
- 根据权利要求29至32中任一项所述的装置,其特征在于,所述前景分割单元具体用于:对所述特征数据进行上采样处理,以使得处理后的所述特征数据的大小与样本图像的大小相同;基于所述处理后的所述特征数据进行像素分割,获得所述样本图像的样本前景分割结果。
- 根据权利要求29-33中任一项所述的装置,其特征在于,所述样本图像包含的目标对象的长宽比高于设定值。
- 一种目标检测设备,其特征在于,所述设备包括存储器、处理器,所述存储器用于存储可在所述处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现权利要求1至11任一所述的方法。
- 一种目标检测网络的训练设备,其特征在于,所述设备包括存储器、处理器,所述存储器用于存储可在所述处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现权利要求12至17任一所述的方法。
- 一种非易失性计算机可读存储介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时,促使所述处理器实现权利要求1至11任一所述的方法,或实现权利要求12至17任一所述的方法。
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020207030752A KR102414452B1 (ko) | 2019-06-26 | 2019-12-25 | 목표 검출 및 목표 검출 네트워크의 훈련 |
SG11202010475SA SG11202010475SA (en) | 2019-06-26 | 2019-12-25 | Target detection and training for target detection network |
JP2020561707A JP7096365B2 (ja) | 2019-06-26 | 2019-12-25 | 目標検出および目標検出ネットワークのトレーニング |
US17/076,136 US20210056708A1 (en) | 2019-06-26 | 2020-10-21 | Target detection and training for target detection network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910563005.8 | 2019-06-26 | ||
CN201910563005.8A CN110298298B (zh) | 2019-06-26 | 2019-06-26 | 目标检测及目标检测网络的训练方法、装置及设备 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/076,136 Continuation US20210056708A1 (en) | 2019-06-26 | 2020-10-21 | Target detection and training for target detection network |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020258793A1 true WO2020258793A1 (zh) | 2020-12-30 |
Family
ID=68028948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/128383 WO2020258793A1 (zh) | 2019-06-26 | 2019-12-25 | 目标检测及目标检测网络的训练 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20210056708A1 (zh) |
JP (1) | JP7096365B2 (zh) |
KR (1) | KR102414452B1 (zh) |
CN (1) | CN110298298B (zh) |
SG (1) | SG11202010475SA (zh) |
TW (1) | TWI762860B (zh) |
WO (1) | WO2020258793A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113537342A (zh) * | 2021-07-14 | 2021-10-22 | 浙江智慧视频安防创新中心有限公司 | 一种图像中物体检测方法、装置、存储介质及终端 |
CN113657482A (zh) * | 2021-08-14 | 2021-11-16 | 北京百度网讯科技有限公司 | 模型训练方法、目标检测方法、装置、设备以及存储介质 |
CN114359561A (zh) * | 2022-01-10 | 2022-04-15 | 北京百度网讯科技有限公司 | 一种目标检测方法及目标检测模型的训练方法、装置 |
CN114387492A (zh) * | 2021-11-19 | 2022-04-22 | 西北工业大学 | 一种基于深度学习的近岸水面区域舰船检测方法及装置 |
CN114842510A (zh) * | 2022-05-27 | 2022-08-02 | 澜途集思生态科技集团有限公司 | 基于ScratchDet算法的生态生物识别方法 |
CN115496917A (zh) * | 2022-11-01 | 2022-12-20 | 中南大学 | 一种GPR B-Scan图像中的多目标检测方法及装置 |
Families Citing this family (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110298298B (zh) * | 2019-06-26 | 2022-03-08 | 北京市商汤科技开发有限公司 | 目标检测及目标检测网络的训练方法、装置及设备 |
CN110781819A (zh) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | 一种图像目标检测方法、系统、电子设备及存储介质 |
CN110866928B (zh) * | 2019-10-28 | 2021-07-16 | 中科智云科技有限公司 | 基于神经网络的目标边界分割及背景噪声抑制方法及设备 |
CN112784638B (zh) * | 2019-11-07 | 2023-12-08 | 北京京东乾石科技有限公司 | 训练样本获取方法和装置、行人检测方法和装置 |
CN110930420B (zh) * | 2019-11-11 | 2022-09-30 | 中科智云科技有限公司 | 基于神经网络的稠密目标背景噪声抑制方法及设备 |
CN110880182B (zh) * | 2019-11-18 | 2022-08-26 | 东声(苏州)智能科技有限公司 | 图像分割模型训练方法、图像分割方法、装置及电子设备 |
US11200455B2 (en) * | 2019-11-22 | 2021-12-14 | International Business Machines Corporation | Generating training data for object detection |
CN111027602B (zh) * | 2019-11-25 | 2023-04-07 | 清华大学深圳国际研究生院 | 一种多级结构目标检测方法及系统 |
CN112886996B (zh) * | 2019-11-29 | 2024-08-20 | 北京三星通信技术研究有限公司 | 信号接收方法、用户设备、电子设备及计算机存储介质 |
WO2021111622A1 (ja) * | 2019-12-06 | 2021-06-10 | 日本電気株式会社 | パラメータ決定装置、パラメータ決定方法、及び、非一時的なコンピュータ可読媒体 |
CN111079638A (zh) * | 2019-12-13 | 2020-04-28 | 河北爱尔工业互联网科技有限公司 | 基于卷积神经网络的目标检测模型训练方法、设备和介质 |
CN111179300A (zh) * | 2019-12-16 | 2020-05-19 | 新奇点企业管理集团有限公司 | 障碍物检测的方法、装置、系统、设备以及存储介质 |
CN113051969A (zh) * | 2019-12-26 | 2021-06-29 | 深圳市超捷通讯有限公司 | 物件识别模型训练方法及车载装置 |
SG10201913754XA (en) * | 2019-12-30 | 2020-12-30 | Sensetime Int Pte Ltd | Image processing method and apparatus, electronic device, and storage medium |
CN111105411B (zh) * | 2019-12-30 | 2023-06-23 | 创新奇智(青岛)科技有限公司 | 一种磁瓦表面缺陷检测方法 |
CN111079707B (zh) * | 2019-12-31 | 2023-06-13 | 深圳云天励飞技术有限公司 | 人脸检测方法及相关装置 |
CN111241947B (zh) * | 2019-12-31 | 2023-07-18 | 深圳奇迹智慧网络有限公司 | 目标检测模型的训练方法、装置、存储介质和计算机设备 |
CN111260666B (zh) * | 2020-01-19 | 2022-05-24 | 上海商汤临港智能科技有限公司 | 图像处理方法及装置、电子设备、计算机可读存储介质 |
CN111508019A (zh) * | 2020-03-11 | 2020-08-07 | 上海商汤智能科技有限公司 | 目标检测方法及其模型的训练方法及相关装置、设备 |
CN111353464B (zh) * | 2020-03-12 | 2023-07-21 | 北京迈格威科技有限公司 | 一种物体检测模型训练、物体检测方法及装置 |
CN113496513A (zh) * | 2020-03-20 | 2021-10-12 | 阿里巴巴集团控股有限公司 | 一种目标对象检测方法及装置 |
US11847771B2 (en) * | 2020-05-01 | 2023-12-19 | Samsung Electronics Co., Ltd. | Systems and methods for quantitative evaluation of optical map quality and for data augmentation automation |
CN111582265A (zh) * | 2020-05-14 | 2020-08-25 | 上海商汤智能科技有限公司 | 一种文本检测方法及装置、电子设备和存储介质 |
CN111738112B (zh) * | 2020-06-10 | 2023-07-07 | 杭州电子科技大学 | 基于深度神经网络和自注意力机制的遥感船舶图像目标检测方法 |
CN111797704B (zh) * | 2020-06-11 | 2023-05-02 | 同济大学 | 一种基于相关物体感知的动作识别方法 |
CN111797993B (zh) * | 2020-06-16 | 2024-02-27 | 东软睿驰汽车技术(沈阳)有限公司 | 深度学习模型的评价方法、装置、电子设备及存储介质 |
CN112001247B (zh) * | 2020-07-17 | 2024-08-06 | 浙江大华技术股份有限公司 | 多目标检测方法、设备及存储装置 |
CN111967595B (zh) * | 2020-08-17 | 2023-06-06 | 成都数之联科技股份有限公司 | 候选框标注方法及系统及模型训练方法及目标检测方法 |
US11657373B2 (en) * | 2020-08-21 | 2023-05-23 | Accenture Global Solutions Limited | System and method for identifying structural asset features and damage |
CN112508848B (zh) * | 2020-11-06 | 2024-03-26 | 上海亨临光电科技有限公司 | Deep-learning multi-task end-to-end rotated ship target detection method for remote sensing images |
KR20220068357A (ko) * | 2020-11-19 | 2022-05-26 | 한국전자기술연구원 | Deep learning object detection processing apparatus |
CN112597837B (zh) * | 2020-12-11 | 2024-05-28 | 北京百度网讯科技有限公司 | Image detection method, apparatus, device, storage medium, and computer program product |
CN112906732B (zh) * | 2020-12-31 | 2023-12-15 | 杭州旷云金智科技有限公司 | Target detection method, apparatus, electronic device, and storage medium |
CN112862761B (zh) * | 2021-01-20 | 2023-01-17 | 清华大学深圳国际研究生院 | Deep-neural-network-based brain tumor MRI image segmentation method and system |
KR102378887B1 (ko) * | 2021-02-15 | 2022-03-25 | 인하대학교 산학협력단 | Efficient bounding box regression learning method and apparatus using a perimeter-based IoU loss function in object detection |
CN112966587B (zh) * | 2021-03-02 | 2022-12-20 | 北京百度网讯科技有限公司 | Target detection model training method, target detection method, and related device |
CN113780270B (zh) * | 2021-03-23 | 2024-06-21 | 京东鲲鹏(江苏)科技有限公司 | Target detection method and apparatus |
CN112967322B (zh) * | 2021-04-07 | 2023-04-18 | 深圳创维-Rgb电子有限公司 | Moving target detection model building method and moving target detection method |
CN113095257A (zh) * | 2021-04-20 | 2021-07-09 | 上海商汤智能科技有限公司 | Abnormal behavior detection method, apparatus, device, and storage medium |
CN113160201B (zh) * | 2021-04-30 | 2024-04-12 | 聚时科技(上海)有限公司 | Target detection method using polar-coordinate-based annular bounding boxes |
CN112990204B (zh) * | 2021-05-11 | 2021-08-24 | 北京世纪好未来教育科技有限公司 | Target detection method, apparatus, electronic device, and storage medium |
CN113706450A (zh) * | 2021-05-18 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image registration method, apparatus, device, and readable storage medium |
CN113313697B (zh) * | 2021-06-08 | 2023-04-07 | 青岛商汤科技有限公司 | Image segmentation and classification method, model training method therefor, related apparatus, and medium |
CN113284185B (zh) * | 2021-06-16 | 2022-03-15 | 河北工业大学 | Rotated target detection method for remote sensing target detection |
CN113536986B (zh) * | 2021-06-29 | 2024-06-14 | 南京逸智网络空间技术创新研究院有限公司 | Dense target detection method for remote sensing images based on representative features |
CN113627421B (zh) * | 2021-06-30 | 2024-09-06 | 华为技术有限公司 | Image processing method, model training method, and related devices |
CN113505256B (zh) * | 2021-07-02 | 2022-09-02 | 北京达佳互联信息技术有限公司 | Feature extraction network training method, image processing method, and apparatus |
CN113610764A (zh) * | 2021-07-12 | 2021-11-05 | 深圳市银星智能科技股份有限公司 | Carpet recognition method, apparatus, smart device, and storage medium |
CN113361662B (zh) * | 2021-07-22 | 2023-08-29 | 全图通位置网络有限公司 | Processing system and method for urban rail transit remote sensing image data |
CN113658199B (zh) * | 2021-09-02 | 2023-11-03 | 中国矿业大学 | Chromosome instance segmentation network based on regression correction |
CN113469302A (zh) * | 2021-09-06 | 2021-10-01 | 南昌工学院 | Multi-circular target recognition method and system for video images |
US11900643B2 (en) * | 2021-09-17 | 2024-02-13 | Himax Technologies Limited | Object detection method and object detection system |
CN113850783B (zh) * | 2021-09-27 | 2022-08-30 | 清华大学深圳国际研究生院 | Sea surface ship detection method and system |
CN114037865B (zh) * | 2021-11-02 | 2023-08-22 | 北京百度网讯科技有限公司 | Image processing method, apparatus, device, storage medium, and program product |
CN114118408A (zh) * | 2021-11-11 | 2022-03-01 | 北京达佳互联信息技术有限公司 | Image processing model training method, image processing method, apparatus, and device |
CN114399697A (zh) * | 2021-11-25 | 2022-04-26 | 北京航空航天大学杭州创新研究院 | Scene-adaptive target detection method based on moving foreground |
WO2023128323A1 (ko) * | 2021-12-28 | 2023-07-06 | 삼성전자 주식회사 | Electronic device and method for detecting a target object |
WO2023178542A1 (en) * | 2022-03-23 | 2023-09-28 | Robert Bosch Gmbh | Image processing apparatus and method |
CN114492210B (zh) * | 2022-04-13 | 2022-07-19 | 潍坊绘圆地理信息有限公司 | Intelligent interpretation system for hyperspectral satellite onboard data and implementation method thereof |
CN114463603B (zh) * | 2022-04-14 | 2022-08-23 | 浙江啄云智能科技有限公司 | Image detection model training method, apparatus, electronic device, and storage medium |
CN115131552A (zh) * | 2022-07-20 | 2022-09-30 | 上海联影智能医疗科技有限公司 | Target detection method, computer device, and storage medium |
CN117036670B (zh) * | 2022-10-20 | 2024-06-07 | 腾讯科技(深圳)有限公司 | Quality detection model training method, apparatus, device, medium, and program product |
CN116152487A (zh) * | 2023-04-17 | 2023-05-23 | 广东广物互联网科技有限公司 | Target detection method, apparatus, device, and medium based on a deep IoU network |
CN116721093B (zh) * | 2023-08-03 | 2023-10-31 | 克伦斯(天津)轨道交通技术有限公司 | Neural-network-based subway track obstacle detection method and system |
CN117876384B (zh) * | 2023-12-21 | 2024-08-20 | 珠海横琴圣澳云智科技有限公司 | Target object instance segmentation and model training method, and related products |
CN117854211B (zh) * | 2024-03-07 | 2024-05-28 | 南京奥看信息科技有限公司 | Target object recognition method and apparatus based on intelligent vision |
CN118397256B (zh) * | 2024-06-28 | 2024-08-30 | 武汉卓目科技股份有限公司 | SAR image ship target detection method and apparatus |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9665767B2 (en) * | 2011-02-28 | 2017-05-30 | Aic Innovations Group, Inc. | Method and apparatus for pattern tracking |
KR20140134505A (ko) * | 2013-05-14 | 2014-11-24 | 경성대학교 산학협력단 | Video object tracking method |
CN103530613B (zh) * | 2013-10-15 | 2017-02-01 | 易视腾科技股份有限公司 | Target person gesture interaction method based on monocular video sequences |
CN105046721B (zh) * | 2015-08-03 | 2018-08-17 | 南昌大学 | Camshift algorithm based on Grabcut and an LBP-tracking centroid correction model |
CN107872644B (zh) * | 2016-09-23 | 2020-10-09 | 亿阳信通股份有限公司 | Video surveillance method and apparatus |
US10657364B2 (en) * | 2016-09-23 | 2020-05-19 | Samsung Electronics Co., Ltd | System and method for deep network fusion for fast and robust object detection |
KR20180107988A (ko) * | 2017-03-23 | 2018-10-04 | 한국전자통신연구원 | Object detection apparatus and method |
KR101837482B1 (ko) * | 2017-03-28 | 2018-03-13 | (주)이더블유비엠 | Image processing method and apparatus, and gesture recognition interface method and apparatus using the same |
JP2019061505A (ja) | 2017-09-27 | 2019-04-18 | 株式会社デンソー | Information processing system, control system, and learning method |
US10037610B1 (en) | 2017-10-03 | 2018-07-31 | StradVision, Inc. | Method for tracking and segmenting a target object in an image using Markov Chain, and device using the same |
CN108717693A (zh) * | 2018-04-24 | 2018-10-30 | 浙江工业大学 | RPN-based optic disc localization method |
CN109214353B (zh) * | 2018-09-27 | 2021-11-23 | 云南大学 | Fast face image detection training method and apparatus based on a pruned model |
2019
- 2019-06-26: CN application CN201910563005.8A, patent CN110298298B (zh), legal status: Active
- 2019-12-25: JP application 2020561707, patent JP7096365B2 (ja), legal status: Active
- 2019-12-25: SG application 11202010475SA, patent SG11202010475SA (en), legal status: Unknown
- 2019-12-25: WO application PCT/CN2019/128383, patent WO2020258793A1 (zh), legal status: Active (Application Filing)
- 2019-12-25: KR application 1020207030752, patent KR102414452B1 (ko), legal status: Active (IP Right Grant)
2020
- 2020-01-17: TW application 109101702, patent TWI762860B (zh), legal status: Active
- 2020-10-21: US application 17/076,136, patent US20210056708A1 (en), legal status: Abandoned
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106898005A (zh) * | 2017-01-04 | 2017-06-27 | 努比亚技术有限公司 | Method, apparatus, and terminal for implementing interactive image segmentation |
CN107369158A (zh) * | 2017-06-13 | 2017-11-21 | 南京邮电大学 | Indoor scene layout estimation and target region extraction method based on RGB-D images |
CN107862262A (zh) * | 2017-10-27 | 2018-03-30 | 中国航空无线电电子研究所 | Fast visible-light image ship detection method suitable for high-altitude reconnaissance |
CN108513131A (zh) * | 2018-03-28 | 2018-09-07 | 浙江工业大学 | Region-of-interest coding method for free-viewpoint video depth maps |
CN110298298A (zh) * | 2019-06-26 | 2019-10-01 | 北京市商汤科技开发有限公司 | Target detection and target detection network training method, apparatus, and device |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113537342A (zh) * | 2021-07-14 | 2021-10-22 | 浙江智慧视频安防创新中心有限公司 | Object detection method for images, apparatus, storage medium, and terminal |
CN113657482A (zh) * | 2021-08-14 | 2021-11-16 | 北京百度网讯科技有限公司 | Model training method, target detection method, apparatus, device, and storage medium |
CN114387492A (zh) * | 2021-11-19 | 2022-04-22 | 西北工业大学 | Deep-learning-based ship detection method and apparatus for nearshore water surface regions |
CN114387492B (zh) * | 2021-11-19 | 2024-10-15 | 西北工业大学 | Deep-learning-based ship detection method and apparatus for nearshore water surface regions |
CN114359561A (zh) * | 2022-01-10 | 2022-04-15 | 北京百度网讯科技有限公司 | Target detection method and target detection model training method and apparatus |
CN114842510A (zh) * | 2022-05-27 | 2022-08-02 | 澜途集思生态科技集团有限公司 | Ecological organism recognition method based on the ScratchDet algorithm |
CN115496917A (zh) * | 2022-11-01 | 2022-12-20 | 中南大学 | Multi-target detection method and apparatus for GPR B-Scan images |
CN115496917B (zh) * | 2022-11-01 | 2023-09-26 | 中南大学 | Multi-target detection method and apparatus for GPR B-Scan images |
Also Published As
Publication number | Publication date |
---|---|
US20210056708A1 (en) | 2021-02-25 |
TWI762860B (zh) | 2022-05-01 |
KR20210002104A (ko) | 2021-01-06 |
SG11202010475SA (en) | 2021-01-28 |
TW202101377A (zh) | 2021-01-01 |
CN110298298A (zh) | 2019-10-01 |
CN110298298B (zh) | 2022-03-08 |
KR102414452B1 (ko) | 2022-06-29 |
JP7096365B2 (ja) | 2022-07-05 |
JP2021532435A (ja) | 2021-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020258793A1 (zh) | Target detection and training of a target detection network | |
CN111507335B (zh) | Method and apparatus for automatically labeling training images used in deep learning networks | |
WO2023116631A1 (zh) | Training method, training apparatus, and storage medium for a rotated ship target detection model | |
CN109598241B (zh) | Maritime ship recognition method in satellite images based on Faster R-CNN | |
US20130155235A1 (en) | Image processing method | |
CN109858547A (zh) | BSSD-based target detection method and apparatus | |
AU2020272936B2 (en) | Methods and systems for crack detection using a fully convolutional network | |
CN114612835A (zh) | UAV target detection model based on the YOLOv5 network | |
CN113850761B (zh) | Remote sensing image target detection method based on multi-angle detection boxes | |
CN114627173A (zh) | Data augmentation for object detection via differentiable neural rendering | |
US20220114396A1 (en) | Methods, apparatuses, electronic devices and storage media for controlling image acquisition | |
CN115953371A (zh) | Insulator defect detection method, apparatus, device, and storage medium | |
CN114332633B (zh) | Radar image target detection and recognition method, device, and storage medium | |
Song et al. | Fine-grained object detection in remote sensing images via adaptive label assignment and refined-balanced feature pyramid network | |
CN112215217A (zh) | Digital image recognition method and apparatus simulating physician image reading | |
CN115100616A (zh) | Point cloud target detection method, apparatus, electronic device, and storage medium | |
CN114565824B (zh) | Single-stage rotated ship detection method based on a fully convolutional network | |
CN112884795A (zh) | Foreground and background segmentation method for power transmission line inspection based on multi-feature saliency fusion | |
CN114359286A (zh) | Artificial-intelligence-based insulator defect recognition method, device, and medium | |
CN113610178A (zh) | Inland waterway ship target detection method and apparatus based on video surveillance images | |
CN117011688B (zh) | Identification method, system, and storage medium for underwater structural defects | |
CN115035429A (zh) | Aerial target detection method based on a composite backbone network and multiple prediction heads | |
CN118379696B (zh) | Ship target detection method, apparatus, and readable storage medium | |
CN114004980B (zh) | Vehicle three-dimensional size information extraction method based on CEMGM-FE-FCN | |
CN118247513B (zh) | Photovoltaic panel module segmentation method, apparatus, electronic device, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2020561707; Country of ref document: JP; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19935279; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19935279; Country of ref document: EP; Kind code of ref document: A1 |