WO2022237153A1 - Object detection method and model training method therefor, related apparatus, medium and program product - Google Patents

Object detection method and model training method therefor, related apparatus, medium and program product

Info

Publication number
WO2022237153A1
WO2022237153A1 · PCT/CN2021/137203 · CN2021137203W
Authority
WO
WIPO (PCT)
Prior art keywords
detection
frame
loss
detection frame
target
Application number
PCT/CN2021/137203
Other languages
English (en)
French (fr)
Inventor
宋涛 (Song Tao)
Original Assignee
上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Publication of WO2022237153A1 publication Critical patent/WO2022237153A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to an object detection method and its model training method, related devices, media and program products.
  • the method for training the network model may include the following steps: first, input the sample image into the network model and obtain the output result of the network model; then obtain the error between the output result and the annotation information in the sample image; and then adjust the parameters in the network model according to this error.
  • the problem with this approach is that training the network model requires a large number of labeled sample images. Generally, sample images are labeled manually, and the process of obtaining a large number of labeled sample images is time-consuming and labor-intensive.
  • Embodiments of the present disclosure provide a target detection method and its model training method, related devices, media and program products.
  • an embodiment of the present disclosure provides a method for training a target detection model, including: using the target detection model to respectively detect targets in a first sample image and a second sample image, and obtaining a first detection result of the first sample image and a second detection result of the second sample image; determining at least one first detection frame in the first sample image based on the first detection result, and determining at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are used to represent the position of the target; obtaining a first loss of the target detection model based on difference information between a first detection frame and a second detection frame representing the same target; and adjusting the network parameters of the target detection model using the first loss.
  • In this way, the first loss of the target detection model is obtained from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and the first loss can then be used to adjust the network parameters of the target detection model without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Moreover, since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiments of the present disclosure obtain the first loss from the difference information between the first detection frame and the second detection frame rather than from the difference information between the first detection result and the second detection result, which improves the accuracy of the first loss.
  • In some embodiments, before obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame representing the same target, the method further includes: matching at least one first detection frame with at least one second detection frame to obtain several matching frame pairs, wherein each matching frame pair includes a first detection frame and a second detection frame matching that first detection frame, and different matching frame pairs include different detection frames; and obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame representing the same target includes: obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matching frame pair.
  • In this way, the first detection frame and the second detection frame representing the same target can be determined, and the difference information between the two can be used to obtain the first loss, so that the accuracy of the obtained first loss is higher.
  • In some embodiments, the matching method is a bipartite graph matching method; and/or, matching at least one first detection frame with at least one second detection frame to obtain several matching frame pairs includes: composing each first detection frame and each second detection frame into multiple candidate frame pairs, and selecting a part of the candidate frame pairs as matching frame pairs based on the matching degree of each candidate frame pair; and/or, obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matching frame pair includes: in the case of multiple matching frame pairs, accumulating the difference information of each matching frame pair to obtain the first loss.
  • multiple matching methods can be selected, so that any detection frame in the matching result has at most one detection frame matching it, thereby improving the accuracy of obtaining the first detection frame and the second detection frame representing the same target.
  • In some embodiments, both the first detection frame and the second detection frame correspond to first position information and/or first category information about the target, and the difference information includes at least one of the following: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
  • In this way, the differences between the first detection frame and the second detection frame in terms of position and category are taken as the difference information, so that the obtained difference is more accurate.
  • In some embodiments, obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame representing the same target includes: obtaining a position loss using the first difference and a category loss using the second difference, and obtaining the first loss based on the position loss and the category loss, wherein the position loss includes a distance loss and/or a detection frame intersection-over-union loss.
  • In this way, the position loss is further divided into a distance loss and a detection frame intersection-over-union loss, so that the acquired position loss is more accurate.
  • the first sample image and the second sample image come from a sample set
  • the sample images in the sample set include labeled sample images and/or unlabeled sample images
  • the labeled sample images are marked with ground-truth frame information of the target, wherein the ground-truth frame information includes second position information and second category information about the target
  • In the case where the first sample image and the second sample image are labeled sample images, adjusting the network parameters of the target detection model includes: using at least one of the first detection frame and the second detection frame as a target detection frame, and obtaining a second loss using the difference between the target detection frame and the corresponding ground-truth frame information; and adjusting the network parameters of the target detection model using the first loss and the second loss.
  • In this way, when labeled sample images are available, the difference between the target detection frame and the ground-truth frame is also used to determine the second loss, and the network parameters of the target detection model are then adjusted according to both the second loss and the first loss, so that the target detection model is adjusted more effectively.
  • In some embodiments, adjusting the network parameters of the target detection model using the first loss and the second loss includes: obtaining the total loss of the target detection model using the weighted first loss and the second loss, wherein the more times the target detection model has been trained, the greater the weight of the first loss; and adjusting the network parameters of the target detection model based on the total loss.
  • In this way, the initial training of the target detection model is mainly fully supervised, and the weight of the first loss is then gradually increased, ensuring the stability of the training process.
  • In some embodiments, determining at least one first detection frame in the first sample image based on the first detection result and determining at least one second detection frame in the second sample image based on the second detection result include: decoding the first detection result to obtain several first candidate detection frames, and decoding the second detection result to obtain several second candidate detection frames, wherein both the first candidate detection frames and the second candidate detection frames correspond to third position information and/or third category information about the target; and screening the first candidate detection frames and the second candidate detection frames using the third position information and/or the third category information of the first candidate detection frames and the second candidate detection frames to obtain the first detection frame and the second detection frame.
  • Some candidate detection frames may have a low category probability, and multiple candidate detection frames may represent the same target. By screening the candidate detection frames, the overlap between detection frames can be reduced and detection frames with a low category probability can be kept from participating in the subsequent loss calculation, thereby improving the accuracy of the obtained first loss.
  • In some embodiments, screening the first candidate detection frames and the second candidate detection frames using the third position information and/or the third category information to obtain the first detection frame and the second detection frame includes: using the third category information to select, from the several first candidate detection frames, a preset number of first candidate detection frames whose category probability meets a first requirement, and to select, from the several second candidate detection frames, a preset number of second candidate detection frames whose category probability meets the first requirement; and, based on the third position information, screening the preset number of first candidate detection frames and second candidate detection frames in a non-maximum suppression manner to obtain the first detection frame and the second detection frame.
  • In some embodiments, the first detection result includes a first classification map and a first positioning map corresponding to the first sample image, and the second detection result includes a second classification map and a second positioning map corresponding to the second sample image; and/or, the second sample image is obtained by processing the first sample image; and/or, the first sample image and the second sample image are medical images.
  • the corresponding second sample image is obtained by processing the first sample image, so that the first sample image and the second sample image include at least one same object.
  • an embodiment of the present disclosure provides a target detection method, including: acquiring an image to be detected; and performing target detection on the image to be detected using a target detection model to obtain a target detection result, wherein the target detection model is trained with the training method of the target detection model described in the first aspect above.
  • the target detection result on the image to be detected can be obtained without manual detection, thereby improving the target detection efficiency of the image to be detected.
  • an embodiment of the present disclosure provides a training device for a target detection model, including: a detection module configured to use the target detection model to respectively detect targets in a first sample image and a second sample image, and obtain a first detection result of the first sample image and a second detection result of the second sample image; a detection frame acquisition module configured to determine at least one first detection frame in the first sample image based on the first detection result and determine at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are configured to represent the position of the target; a loss acquisition module configured to obtain a first loss of the target detection model based on the difference information between the first detection frame and the second detection frame representing the same target; and a parameter adjustment module configured to adjust the network parameters of the target detection model using the first loss.
  • an embodiment of the present disclosure provides a target detection device, including: an image acquisition module configured to acquire an image to be detected; and a target detection module configured to perform target detection on the image to be detected using a target detection model to obtain a target detection result, wherein the target detection model is trained with the training method of the target detection model described in the first aspect above.
  • an embodiment of the present disclosure provides an electronic device, including a memory and a processor, and the processor is configured to execute program instructions stored in the memory, so as to implement the object detection model training method described in the first aspect above.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which program instructions are stored, and when the program instructions are executed by a processor, the method for training an object detection model described in the first aspect above is implemented.
  • an embodiment of the present disclosure provides a computer program product, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the training method of the target detection model described in the first aspect above.
  • In this way, the first loss of the target detection model is obtained from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and the first loss can then be used to adjust the network parameters of the target detection model without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Moreover, since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiments of the present disclosure obtain the first loss from the difference information between the first detection frame and the second detection frame rather than from the difference information between the first detection result and the second detection result, which improves the accuracy of the first loss.
  • FIG. 1 is a first schematic flow diagram of a method for training a target detection model provided by an embodiment of the present disclosure
  • FIG. 2 is a second schematic flow diagram of a method for training a target detection model provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of a target detection method provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a target detection device provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart of a method for training a target detection model provided by an embodiment of the present disclosure.
  • the method for training a target detection model may include the following steps:
  • Step S11 Use the target detection model to detect the target in the first sample image and the second sample image respectively, and obtain a first detection result of the first sample image and a second detection result of the second sample image.
  • the target detection model to be trained in the embodiments of the present disclosure may be any network model that can be used for target detection. Therefore, the embodiments of the present disclosure do not limit the target detection model.
  • the first sample image and the second sample image may include at least one same target; that is, the first sample image and the second sample image may be obtained by photographing the same target. In some implementations, the first sample image may be obtained by performing a series of image processing operations on the second sample image, or the second sample image may be obtained by performing a series of image processing operations on the first sample image; the processing can be cropping, rotation, data augmentation, and so on, and the method of image processing is not limited here. Moreover, the first sample image and the second sample image in the embodiments of the present disclosure may be two-dimensional (2D) images or three-dimensional (3D) images.
  • Step S12 Determine at least one first detection frame in the first sample image based on the first detection result, and determine at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are used to represent the position of the target.
  • both the obtained first detection frame and the second detection frame are used to represent the position of the target in the first sample image or the second sample image.
  • the manner of determining at least one first detection frame in the first sample image based on the first detection result may be to obtain the corresponding first detection frame through operations such as decoding the first detection result.
  • the manner of determining at least one second detection frame in the second sample image based on the second detection result may be to obtain the corresponding second detection frame through operations such as decoding the second detection result.
  • Step S13 Obtain the first loss of the object detection model based on the difference information between the first detection frame and the second detection frame representing the same object.
  • the target detection results generally include classification results and positioning results; that is, it is necessary to find both the category of the target in the sample image and the location of the target. Accordingly, the difference information between the first detection frame and the second detection frame representing the same target described in the embodiments of the present disclosure may be the classification difference, the positioning difference, or both, and the difference information is therefore not limited here.
  • the first loss may also be referred to as consistency loss.
  • Step S14 Using the first loss, adjust the network parameters of the object detection model.
  • because the first sample image and the second sample image contain the same target, the object detection results of the object detection model on the two images should be the same, or the error between the two should be less than or equal to a preset error; that is to say, the loss between the two should be very small and within an acceptable range.
  • In some embodiments, the network parameters of the target detection model can be adjusted according to the first loss, so that for the trained target detection model the loss between the first sample image and the second sample image is less than the preset loss; that is, the object detection results of the object detection model on the first sample image and the second sample image are basically the same.
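  • For illustration only, steps S11 to S14 can be sketched in Python as below; the helper names (decode_boxes, match_pairs, consistency_loss) and the PyTorch-style optimizer calls are assumptions, not names used by the disclosure.

```python
# Minimal sketch of one training step (S11-S14) on an unlabeled image pair.
# All helper functions are hypothetical placeholders for the operations
# described above; the optimizer API follows PyTorch conventions.
def train_step(model, first_image, second_image, optimizer):
    # S11: detect targets in the first and second sample images
    first_result = model(first_image)
    second_result = model(second_image)
    # S12: decode and screen the raw results into detection frames
    first_boxes = decode_boxes(first_result)
    second_boxes = decode_boxes(second_result)
    # S13: pair frames that represent the same target and compute the
    #      first (consistency) loss from their difference information
    pairs = match_pairs(first_boxes, second_boxes)
    loss = consistency_loss(pairs, first_boxes, second_boxes)
    # S14: adjust the network parameters of the detection model
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```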
  • The above solution obtains the first loss of the target detection model from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and then uses the first loss to adjust the network parameters of the target detection model, without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame rather than the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
  • In some embodiments, the target detection model is first trained using labeled sample images, and unlabeled sample images are then gradually added to train the target detection model; that is, the target detection model is trained using a semi-supervised training method.
  • the general semi-supervised training method includes the following steps: first, train the target detection model using the labeled sample images; after a period of training, use the detection network to predict the unlabeled sample images to obtain prediction results; then add the images carrying the prediction results to the remaining labeled sample images and input them into the target detection model to continue training the target detection model.
  • In the embodiments of the present disclosure, the consistency loss between two sample images containing at least one same target can be used to train the target detection model; that is, there is no need to input images carrying prediction results into the detection network again for training, which simplifies the training steps.
  • the first sample image and the second sample image are from a sample set.
  • the sample images in the sample set include labeled sample images and/or unlabeled sample images.
  • the marked sample image is marked with ground-truth frame information of the target, wherein the ground-truth frame information includes second position information and second category information about the target.
  • the second position information refers to the position information of the target in the sample image
  • the second category information refers to the category to which the target belongs.
  • the first sample image and the second sample image are medical images. For example, it may be a CT image, or an image obtained through other medical equipment.
  • the category of the target can be a common lesion, such as a pulmonary nodule.
  • In some embodiments, the second sample image is obtained by processing the first sample image, for example by cropping, rotating, or adjusting the brightness of the first sample image.
  • the cropping here may be cropping any region of the first sample image, for example, it may be cropping an edge region of the first sample image, or it may be cropping a middle region of the first sample image.
  • the clipped area can be filled with preset pixel values, for example, filled with 0 pixels, or filled with a certain area from other sample images; this embodiment of the present disclosure does not limit how the clipped area is filled.
  • the corresponding second sample image is obtained by processing the first sample image, so that the first sample image and the second sample image include at least one same object.
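  • As a concrete illustration of such processing, the following sketch derives a second sample image from the first by zero-filling a cropped edge band and jittering the brightness; the particular transforms and parameter values are assumptions, not prescribed by the disclosure.

```python
import numpy as np

def make_second_sample(first_image, rng=None):
    """Derive a second sample image from the first sample image by
    cropping (zero-filling) an edge band and adjusting brightness.
    Both the choice of transforms and their magnitudes are illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    second = first_image.astype(np.float32).copy()
    band = max(1, int(0.1 * second.shape[0]))
    if rng.random() < 0.5:
        second[:band] = 0.0    # fill the clipped top band with 0 pixels
    else:
        second[-band:] = 0.0   # or the bottom band
    second *= rng.uniform(0.9, 1.1)  # brightness adjustment
    return np.clip(second, 0.0, 255.0)
```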
  • In some embodiments, the first detection result includes a first classification map and a first positioning map corresponding to the first sample image, and the second detection result includes a second classification map and a second positioning map corresponding to the second sample image.
  • the first classification map and the second classification map respectively include the target category information for each cell of the output grid.
  • the first positioning map and the second positioning map respectively include the upper boundary, the lower boundary, the left boundary and the right boundary of the first detection frame or the second detection frame.
  • each detection frame can be represented by a four-dimensional vector.
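  • One possible decoding of such maps into candidate detection frames is sketched below; interpreting the positioning map as per-point distances to the four boundaries (an FCOS-style convention) is an assumption consistent with, but not stated by, the description above.

```python
import numpy as np

def decode_maps(cls_map, loc_map, stride=1.0):
    """Decode a classification map of shape (H, W, C) and a positioning map
    of shape (H, W, 4) holding distances from each grid point to the upper,
    lower, left and right boundaries into candidate boxes (x1, y1, x2, y2)."""
    h, w, _ = cls_map.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride
    top, bottom, left, right = np.moveaxis(loc_map, -1, 0)
    boxes = np.stack([cx - left, cy - top, cx + right, cy + bottom], axis=-1)
    probs = cls_map.max(axis=-1)      # probability of the predicted class
    labels = cls_map.argmax(axis=-1)  # predicted class per grid point
    return boxes.reshape(-1, 4), probs.ravel(), labels.ravel()
```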
  • both the first detection frame and the second detection frame correspond to first position information and/or first category information about the target.
  • the category information may also include the probability of belonging to the category. For example, assuming that the probability that the first detection frame belongs to the first target is 0.7, and the probability that it belongs to the background is 0.3, in this case, the first category information of the first detection frame may include that the first detection frame belongs to the first target, and The probability of belonging to the first target is 0.7.
  • In some embodiments, the steps of determining at least one first detection frame in the first sample image based on the first detection result and determining at least one second detection frame in the second sample image based on the second detection result are as follows. First, the first detection result is decoded to obtain several first candidate detection frames, and the second detection result is decoded to obtain several second candidate detection frames.
  • "several" in the embodiments of the present disclosure may mean one, two, or more.
  • both the first candidate detection frame and the second candidate detection frame correspond to third position information and/or third category information about the target.
  • For the decoding method, refer to the general method of decoding the output result of a target detection model. Second, the third position information and/or the third category information of the first candidate detection frames and the second candidate detection frames is used to screen the first candidate detection frames and the second candidate detection frames to obtain the first detection frame and the second detection frame.
  • Some candidate detection frames may have a low category probability, and multiple candidate detection frames may represent the same target. By screening the first candidate detection frames and the second candidate detection frames, the overlap of detection frames can be reduced and detection frames with a lower category probability can be kept from participating in the subsequent loss calculation, thereby improving the accuracy of the obtained first loss.
  • the screening of the first candidate detection frame and the second candidate detection frame can be divided into two steps.
  • First, the third category information is used to select, from the several first candidate detection frames, a preset number of first candidate detection frames whose category probability meets a first requirement, and to select, from the several second candidate detection frames, a preset number of second candidate detection frames whose category probability meets the first requirement. For example, a top-K strategy may be applied first, that is, the K largest (or smallest) elements are found from the score set and detection frames with low confidence are deleted; K can be 10, 20, 25, and so on. When K equals 20, 20 detection frames are selected from the first candidate detection frames and from the second candidate detection frames respectively.
  • the first candidate detection frames and the second candidate detection frames are respectively sorted in order of the probability of the category they belong to, and the 20 first candidate detection frames and the 20 second candidate detection frames with the highest category probabilities are selected respectively.
  • Then, based on the third position information, the non-maximum suppression method is used to screen the preset number of first candidate detection frames and second candidate detection frames to obtain the first detection frames and second detection frames. Following the above example, the remaining 20 first candidate detection frames and 20 second candidate detection frames are screened in a non-maximum suppression manner to obtain the final first detection frames and second detection frames.
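  • A minimal sketch of this two-step screening (top-K by category probability, then non-maximum suppression on positions) follows; K = 20 matches the example above, while the IoU threshold is an assumption.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, (x1, y1, x2, y2) form."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def screen(boxes, probs, k=20, iou_thr=0.5):
    """Keep the top-K boxes by category probability, then apply NMS."""
    order = np.argsort(probs)[::-1][:k]   # top-K strategy
    boxes, probs = boxes[order], probs[order]
    keep, idx = [], np.arange(len(boxes))
    while idx.size:
        keep.append(idx[0])               # highest remaining score
        idx = idx[1:][iou(boxes[idx[0]], boxes[idx[1:]]) < iou_thr]
    return boxes[keep], probs[keep]
```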
  • the first detection frame and the second detection frame are matched to obtain the first detection frame and the second detection frame representing the same object.
  • at least one first detection frame is matched with at least one second detection frame to obtain several matching frame pairs.
  • each matching frame pair includes a first detection frame and a second detection frame that matches it, and different matching frame pairs include different detection frames; that is, any detection frame has at most one detection frame matching it, and no detection frame has multiple detection frames matching it.
  • In this way, the first detection frame and the second detection frame representing the same target can be determined, and the difference information between them can be used to obtain the first loss, so that the accuracy of the obtained first loss is higher.
  • the matching method between the detection frames may be a bipartite graph matching method.
  • the process of the bipartite graph matching method is as follows: let C_{i,j} be the matching cost between each first detection frame in the first detection frame group (composed of the first detection frames) and each second detection frame in the second detection frame group (composed of the second detection frames); C_{i,j} can be obtained by formula (1).
  • In formula (1), GIOU is used to represent the generalized intersection over union between two detection frames.
  • the goal of bipartite graph matching is to find the optimal Boolean matching matrix M* for C_{i,j}, with M ∈ {0,1}^{N1×N2}, where N1 represents the number of first detection frames and N2 represents the number of second detection frames. M_{i,j} = 1 indicates that the first detection frame b_i in the first detection frame group matches the second detection frame b_j in the second detection frame group.
  • the optimal matching matrix M* is obtained by solving the cost-minimization problem for the fixed cost matrix C in the following formula (2): M* = argmin_M Σ_{i,j} C_{i,j}·M_{i,j}   (2)
  • For M*, denote the sets of row and column indices with non-zero values as M′ and N′ respectively, and store the non-zero index pairs of M* as tuples of the form <m, n>, where m ∈ M′ and n ∈ N′; in this way, N index tuples can be obtained and expressed as a set T of size N, that is, the set T is the set of matching frame pairs.
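  • This matching can be realized with the Hungarian algorithm, as in the sketch below; using the negative generalized IoU as the cost C[i, j] is an assumption consistent with the role of GIOU in formula (1), not a form quoted from the disclosure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pairs(first_boxes, second_boxes, giou):
    """Bipartite matching between the two detection frame groups.
    Returns the set T of matching frame pairs as (m, n) index tuples."""
    cost = np.array([[-giou(b1, b2) for b2 in second_boxes]
                     for b1 in first_boxes])
    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    return list(zip(rows.tolist(), cols.tolist()))
```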
  • the manner of matching at least one first detection frame with at least one second detection frame to obtain several matching frame pairs may also be: respectively forming multiple candidate frame pairs with each first detection frame and each second detection frame. Based on the matching degree of each candidate frame pair, some candidate frame pairs are selected as matching frame pairs.
  • the matching degree may be the similarity between the first detection frame and the second detection frame. For example, if the first detection frame and the second detection frame each include the first position information and the first category information, the matching degree may be the similarity of the first position information and/or the similarity of the first category information between the first detection frame and the second detection frame; the similarity of the first position information and the similarity of the first category information may be considered comprehensively to determine the final similarity.
  • the matching degree between each first detection frame and each second detection frame may be obtained respectively, and some candidate frame pairs with high matching degree are selected as a detection frame matching group.
  • multiple matching methods can be selected, so that any detection frame in the matching result has at most one detection frame matching it, thereby improving the accuracy of obtaining the first detection frame and the second detection frame representing the same target.
  • the first loss of the target detection model is obtained by using the difference information between the first detection frame and the second detection frame in each matching frame pair.
  • In the case of multiple matching frame pairs, the difference information of each matching frame pair is accumulated to obtain the first loss; that is, the difference information of all matching frame pairs is accumulated, and the first loss is then determined according to the accumulated difference information.
  • the difference information between the first detection frame and the second detection frame in each matching frame pair includes at least one of the following: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
  • the first difference is used to get the position loss
  • the second difference is used to get the class loss.
  • the position loss includes a distance loss and/or a detection frame intersection-over-union loss.
  • the formula for obtaining the category loss L_cls may be the following formula (3): L_cls = (1/N) Σ_{<m,n>∈T} ||p_m - p_n||_2   (3), where N is the number of matching frame pairs, p_m represents the category probability of the first detection frame, p_n represents the category probability of the second detection frame, ||·||_2 is the L2 norm, and T represents the set of matching frame pairs.
  • the formula for obtaining the distance loss L_dis may be the following formula (4): L_dis = (1/N) Σ_{<m,n>∈T} ||b_m - b_n||_2   (4), where N is the number of matching frame pairs, b_m represents the position of the first detection frame in the first sample image, b_n represents the position of the second detection frame in the second sample image, ||·||_2 is the L2 norm, and T represents the set of matching frame pairs.
  • the formula for obtaining the detection frame intersection-over-union loss L_giou can be the following formula (5): L_giou = (1/N) Σ_{<m,n>∈T} (1 - GIOU(b_m, b_n))   (5). The detection frame intersection-over-union loss described in the embodiments of the present disclosure is a generalized intersection-over-union loss between detection frames; GIOU(b_m, b_n) represents the generalized intersection over union between the first detection frame m and the second detection frame n, and is calculated with the general algorithm for the generalized intersection over union.
  • the first loss (consistency loss) L_cons may then be obtained by combining the above losses over the sample set, as in formula (6): L_cons = Σ_D (L_cls + λ1·L_dis + λ2·L_giou)   (6), where D is the sample set, and λ1 and λ2 are weights.
  • In some embodiments, in the case where the first sample image and the second sample image are labeled sample images, the step of adjusting the network parameters of the target detection model further includes: taking at least one of the first detection frame and the second detection frame as a target detection frame.
  • the first detection frame can be used as the target detection frame, or the second detection frame can be used as the target detection frame, or both the first detection frame and the second detection frame can be used as the target detection frame.
  • the first detection frame is selected as the target detection frame.
  • the first detection frame corresponds to real frame information
  • the second detection frame also has corresponding real frame information.
  • Then, the second loss is obtained using the difference between the target detection frame and the corresponding ground-truth frame information; the second loss can also be called a fully supervised loss.
  • the formula for obtaining the second loss L_sup may be the following formula (7): L_sup = Σ_{i∈L} (L_cls(P_i) + L_loc(B_i))   (7), where L_loc refers to the location loss between a target detection frame and its corresponding ground-truth frame, L_cls refers to the category loss between a target detection frame and its corresponding ground-truth frame, L is the set of labeled sample images, P_i represents the category probability of the i-th target detection frame, and B_i represents the position of the i-th target detection frame.
  • In this way, the second loss is determined, and the network parameters of the target detection model are then adjusted according to the second loss and the first loss, so that the adjustment of the target detection model is better.
  • In some embodiments, the weighted first loss and the second loss are used to obtain the total loss of the target detection model, wherein the more times the target detection model has been trained, the greater the weight of the first loss.
  • the formula for calculating the total loss using the first loss and the second loss may be the following formula (8): L_total = L_sup + γ(t)·L_cons   (8), where γ(t) is a time-varying weight used to control the balance between L_sup and L_cons; γ(t) is a Gaussian warm-up function, t represents the current training step, and t_max is the maximum training step.
  • Based on the total loss, the network parameters of the target detection model are adjusted.
  • In this way, the initial training of the target detection model is mainly fully supervised, and the weight of the first loss is then gradually increased to ensure the stability of the training process.
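  • A sketch of this weighting is given below; the Gaussian warm-up form exp(-5·(1 - t/t_max)²), common in consistency-training literature, is an assumption, since the disclosure only names the function type.

```python
import math

def total_loss(l_sup, l_cons, t, t_max):
    """Total loss of formula (8): the fully supervised loss plus the
    consistency loss scaled by a time-varying weight gamma(t) that
    grows with the number of training steps (assumed Gaussian warm-up)."""
    gamma = math.exp(-5.0 * (1.0 - min(t, t_max) / t_max) ** 2)
    return l_sup + gamma * l_cons
```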
  • FIG. 2 is a second schematic flowchart of a method for training a target detection model provided by an embodiment of the present disclosure.
  • In the case where the first sample image 201 and the second sample image 202 are labeled images, the second loss 210 is calculated using the first classification map 204 and the first localization map 205; that is, a fully supervised loss is calculated. In the case of unlabeled sample images, the fully supervised loss is not calculated.
  • the second loss 210 may also be calculated by using the second classification map 206 and the second positioning map 207 .
  • the first detection frame 212 and the second detection frame 213 are matched using a bipartite graph matching method to obtain several matching frame pairs, and then the first loss 211 is obtained by using the matching frame pairs, that is, the consistency loss is obtained. The first loss 211 and the second loss 210 are then combined to obtain the total loss. Finally, the network parameters in the object detection model 203 are adjusted using the total loss. Of course, if the first sample image 201 and the second sample image 202 are unlabeled sample images, the first loss 211 is used to adjust the network parameters in the object detection model 203 .
  • The above solution obtains the first loss of the target detection model from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and then uses the first loss to adjust the network parameters of the target detection model, without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame rather than the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
  • the executing subject of the training method of the target detection model may be a training device of the target detection model; for example, the training method may be executed by a terminal device, a server, or other processing equipment, where the terminal device may be a medical image analysis device, user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, a self-driving car, a robot with positioning and mapping requirements, a medical imaging system with registration requirements, glasses, helmets and other augmented-reality or virtual-reality products, and so on.
  • the method for training an object detection model may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 3 is a schematic flowchart of a target detection method provided by an embodiment of the present disclosure.
  • the target detection method may include the following steps:
  • Step S21 Obtain an image to be detected.
  • There are many ways to obtain the image to be detected.
  • For example, it can be captured by the execution device that executes the target detection method proposed in the embodiments of the present disclosure, or captured by another device and transmitted to the execution device by communication; it may be an image that has not undergone image processing, or an image that has undergone image processing.
  • the way of image processing can be to adjust the brightness, resolution and so on.
  • the image to be detected may be a medical image.
  • the image to be detected may be a CT image or an X-ray image.
  • the target in the image to be detected may be a lung nodule or the like.
  • Step S22 Use the target detection model to perform target detection on the image to be detected, and obtain a target detection result.
  • the target detection model is trained by the above-mentioned target detection model training method.
  • the target detection result on the image to be detected can be obtained without manual detection, thereby improving the target detection efficiency of the image to be detected.
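  • Reusing the decoding and screening sketches above, inference on an image to be detected can be illustrated as follows; load_image and the model's output convention are hypothetical placeholders, not part of the disclosure.

```python
def detect(model, image_path):
    """Minimal inference sketch for the target detection method."""
    image = load_image(image_path)       # acquire the image to be detected
    cls_map, loc_map = model(image)      # classification and positioning maps
    boxes, probs, labels = decode_maps(cls_map, loc_map)
    return screen(boxes, probs)          # screened target detection result
```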
  • FIG. 4 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the present disclosure.
  • the training device 30 of the target detection model includes a detection module 31, a detection frame acquisition module 32, a loss acquisition module 33 and a parameter adjustment module 34;
  • the detection module 31 is configured to use a target detection model to respectively detect targets in the first sample image and the second sample image, and obtain a first detection result of the first sample image and a second detection result of the second sample image;
  • the detection frame acquisition module 32 is configured to determine at least one first detection frame in the first sample image based on the first detection result, and determine at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are configured to represent the location of the target;
  • the loss acquisition module 33 is configured to obtain the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame representing the same target;
  • the parameter adjustment module 34 is configured to use the first loss to adjust network parameters of the target detection model.
  • In this way, the first loss of the target detection model is obtained from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and the first loss can then be used to adjust the network parameters of the target detection model without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame rather than the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
  • In some embodiments, the training device of the target detection model further includes a detection frame matching module. Before the loss acquisition module 33 obtains the first loss based on the difference information between the first detection frame and the second detection frame representing the same target, the detection frame matching module is configured to match at least one first detection frame with at least one second detection frame to obtain several matching frame pairs, wherein each matching frame pair includes a first detection frame and a second detection frame matching the first detection frame, and different matching frame pairs include different detection frames;
  • the loss acquiring module 33 is further configured to use the difference information between the first detection frame and the second detection frame in each matching frame pair to obtain the first loss.
  • In this way, the first detection frame and the second detection frame representing the same target can be determined, and the difference information between them is used to obtain the first loss, so that the accuracy of the acquired first loss is higher.
  • In some embodiments, the matching method is a bipartite graph matching method; and/or, the loss acquisition module 33 is further configured to: respectively compose each first detection frame and each second detection frame into multiple candidate frame pairs, and select some candidate frame pairs as matching frame pairs based on the matching degree of each candidate frame pair; and/or, in the case of multiple matching frame pairs, accumulate the difference information of each matching frame pair to obtain the first loss.
  • multiple matching methods can be selected so that there is at most one detection frame matching any detection frame in the matching result, thereby improving the accuracy of obtaining the first detection frame and the second detection frame representing the same target.
  • In some embodiments, both the first detection frame and the second detection frame correspond to first position information and/or first category information about the target, and the difference information includes at least one of the following: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
  • the difference between the first detection frame and the second detection frame with regard to position and category is obtained as the difference between the first detection frame and the second detection frame, so that the acquired difference is more accurate.
  • the loss acquisition module 33 is further configured to use the first difference to obtain the position loss, and use the second difference to obtain the category loss; based on the position loss and the category loss, the first loss is obtained, wherein the position loss Include distance loss and/or detection box intersection loss.
  • In this way, the position loss is further divided into a distance loss and a detection frame intersection-over-union loss, so that the obtained position loss is more accurate.
  • the first sample image and the second sample image come from a sample set
  • the sample images in the sample set include labeled sample images and/or unlabeled sample images
  • the labeled sample images are marked with ground-truth frame information of the target, wherein the ground-truth frame information includes second position information and second category information about the target; and, when the first sample image and the second sample image are labeled sample images, the parameter adjustment module 34 is further configured to: use at least one of the first detection frame and the second detection frame as the target detection frame, and use the difference between the target detection frame and the corresponding ground-truth frame information to obtain the second loss; and use the first loss and the second loss to adjust the network parameters of the target detection model.
  • In this way, when labeled sample images are available, the difference between the target detection frame and the ground-truth frame is also used to determine the second loss, and the network parameters of the target detection model are then adjusted according to both the second loss and the first loss, so that the adjustment effect of the target detection model is better.
  • In some embodiments, the parameter adjustment module 34 is further configured to: use the weighted first loss and the second loss to obtain the total loss of the target detection model, wherein the more times the target detection model has been trained, the greater the weight of the first loss; and, based on the total loss, adjust the network parameters of the target detection model.
  • In this way, the initial training of the target detection model is mainly fully supervised, and the weight of the first loss is then gradually increased to ensure the stability of the training process.
  • In some embodiments, the detection frame acquisition module 32 is further configured to: decode the first detection result to obtain several first candidate detection frames, and decode the second detection result to obtain several second candidate detection frames, wherein both the first candidate detection frames and the second candidate detection frames correspond to third position information and/or third category information about the target; and screen the first candidate detection frames and the second candidate detection frames using the third position information and/or the third category information to obtain the first detection frame and the second detection frame.
  • Some candidate detection frames may have a low category probability, and multiple candidate detection frames may represent the same target. By screening the candidate detection frames, the overlap of detection frames can be reduced and detection frames with a lower category probability can be kept from participating in the subsequent loss calculation, thereby improving the accuracy of the acquired first loss.
  • In some embodiments, the detection frame acquisition module 32 is further configured to: use the third category information to select, from the several first candidate detection frames, a preset number of first candidate detection frames whose category probability meets the first requirement, and select, from the several second candidate detection frames, a preset number of second candidate detection frames whose category probability meets the first requirement; and, based on the third position information, use the non-maximum suppression method to screen the preset number of first candidate detection frames and second candidate detection frames to obtain the first detection frame and the second detection frame.
  • By first filtering the candidate detection frames using the category probability and then performing secondary screening using the position information, the overlap of detection frames can be reduced and the participation of detection frames with a lower category probability in subsequent loss calculations can be reduced, thereby improving the accuracy of the obtained first loss.
  • In some embodiments, the first detection result includes a first classification map and a first positioning map corresponding to the first sample image, and the second detection result includes a second classification map and a second positioning map corresponding to the second sample image; and/or, the second sample image is obtained by processing the first sample image; and/or, the first sample image and the second sample image are medical images.
  • the corresponding second sample image is obtained by processing the first sample image, so that the first sample image and the second sample image include at least one same object.
  • In this way, the first loss of the target detection model is obtained from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and the first loss can then be used to adjust the network parameters of the target detection model without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame rather than the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
  • FIG. 5 is a schematic structural diagram of an object detection device provided by an embodiment of the present disclosure.
  • the target detection device 40 includes an image acquisition module 41 and a target detection module 42; wherein,
  • An image acquisition module 41 configured to acquire an image to be detected
  • the target detection module 42 is configured to use a target detection model to perform target detection on the image to be detected to obtain a target detection result, wherein the target detection model is trained with the above training method of the target detection model.
  • the target detection result on the image to be detected can be obtained without manual detection, thereby improving the target detection efficiency of the image to be detected.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 50 includes a memory 51 and a processor 52, and the processor 52 is configured to execute the program instructions stored in the memory 51, so as to realize the above-mentioned steps in the embodiment of the method for training the object detection model and/or the embodiment of the method for object detection.
  • the electronic device 50 may include, but is not limited to: a microcomputer and a server.
  • the electronic device 50 may also include mobile devices such as notebook computers and tablet computers, which are not limited herein.
  • the processor 52 is configured to control itself and the memory 51 to implement the steps in the above embodiment of the method for training an object detection model.
  • the processor 52 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 52 may be an integrated circuit chip with signal processing capability.
  • the processor 52 can also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 52 may also be jointly implemented by multiple integrated circuit chips.
  • The above solution obtains the first loss of the target detection model from the difference information between the detection frames that contain the same target in the first sample image and the second sample image, and then uses the first loss to adjust the network parameters of the target detection model, without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since there may be a large amount of invalid information in the first detection result and the second detection result, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame rather than the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
  • FIG. 7 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • the computer-readable storage medium 60 stores program instructions 601 that can be executed by the processor, and the program instructions 601 are configured to implement the steps in the above-mentioned embodiment of the method for training the object detection model and/or the embodiment of the method for object detection.
  • the above solution obtains the first loss of the target detection model from the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, and then uses the first loss to adjust the network parameters of the target detection model without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model.
  • since the first detection result and the second detection result may contain a large amount of invalid information, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the detection results themselves, which can improve the accuracy of the first loss.
  • An embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps in the above embodiments of the method for training the target detection model and/or the target detection method.
  • the functions or included modules of the apparatus provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the implementation may refer to the descriptions of the above method embodiments.
  • the disclosed methods and devices may be implemented in other ways.
  • the device implementations described above are only illustrative.
  • the division of modules or units is only a division by logical function; in actual implementation, there may be other ways of division.
  • units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • based on this understanding, the essence of the technical solution of the present disclosure, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present disclosure.
  • the aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
  • by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model.
  • since the detection results may contain a large amount of invalid information, the embodiment of the present disclosure obtains the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a target detection method, a training method for its model, a related apparatus, a medium, and a program product. The training method for the target detection model includes: using the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image; determining at least one first detection frame in the first sample image based on the first detection result, and determining at least one second detection frame in the second sample image based on the second detection result, where the first detection frame and the second detection frame are used to represent the position of a target; obtaining a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target; and adjusting network parameters of the target detection model using the first loss. The above solution can reduce the training cost of the target detection model.

Description

Target detection method, model training method therefor, related apparatus, medium, and program product
Cross-reference to related applications
The present disclosure is based on, and claims priority to, Chinese patent application No. 202110529645.4, filed on May 14, 2021 and entitled "Target detection method, training method for its model, and related apparatus and medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a target detection method, a training method for its model, a related apparatus, a medium, and a program product.
Background
Generally, a method for training a network model may include the following steps: first, a sample image is input into the network model and an output result of the network model is obtained; then the error between the output result and the annotation information of the sample image is obtained; and the parameters of the network model are adjusted according to this error. The problem with this approach is that training the network model requires building a large number of labeled sample images. Sample images are usually labeled manually, and obtaining a large number of labeled sample images is time-consuming and labor-intensive.
Summary
Embodiments of the present disclosure provide a target detection method, a training method for its model, a related apparatus, a medium, and a program product.
In a first aspect, an embodiment of the present disclosure provides a training method for a target detection model, including: using the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image; determining at least one first detection frame in the first sample image based on the first detection result, and determining at least one second detection frame in the second sample image based on the second detection result, where the first detection frame and the second detection frame are used to represent the position of the target; obtaining a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target; and adjusting network parameters of the target detection model using the first loss.
By obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the embodiments of the present disclosure obtain the first loss of the target detection model; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
In a possible implementation, before obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target, the method further includes: matching the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs, where each matched frame pair includes a first detection frame and a second detection frame matched with the first detection frame, and different matched frame pairs include different detection frames; and obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target includes: obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair.
Therefore, by matching the at least one first detection frame with the at least one second detection frame to obtain matched frame pairs, the first detection frame and the second detection frame that represent the same target can be determined, and the first loss is obtained from the difference information between the two, so that the obtained first loss is more accurate.
In a possible implementation, the matching is performed by bipartite graph matching; and/or matching the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs includes: forming multiple candidate frame pairs from each first detection frame and each second detection frame, and selecting some of the candidate frame pairs as matched frame pairs based on the matching degree of each candidate frame pair; and/or obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair includes: in the case of multiple matched frame pairs, accumulating the difference information of each matched frame pair to obtain the first loss.
Therefore, multiple matching methods can be selected so that any detection frame in the matching result is matched by at most one other detection frame, which improves the accuracy of obtaining the first detection frame and the second detection frame that represent the same target.
In a possible implementation, the first detection frame and the second detection frame each correspond to first position information and/or first category information about the target; the difference information includes at least one of: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
Therefore, the differences between the first detection frame and the second detection frame in position and category are taken as the difference between the two frames, which makes the obtained difference more accurate.
In a possible implementation, obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target includes: obtaining a position loss using the first difference and a category loss using the second difference; and obtaining the first loss based on the position loss and the category loss, where the position loss includes a distance loss and/or a detection-frame intersection-over-union loss.
Therefore, the position loss is further divided into a distance loss and a detection-frame intersection-over-union loss, which makes the obtained position loss more accurate.
In a possible implementation, the first sample image and the second sample image come from a sample set, the sample images in the sample set include labeled sample images and/or unlabeled sample images, and a labeled sample image is annotated with real frame information of the target, where the real frame information includes second position information and second category information about the target; and, in the case where the first sample image and the second sample image are labeled sample images, adjusting the network parameters of the target detection model using the first loss includes: taking at least one of the first detection frame and the second detection frame as a target detection frame and obtaining a second loss using the difference between the target detection frame and the corresponding real frame information; and adjusting the network parameters of the target detection model using the first loss and the second loss.
Therefore, in the case where the first sample image and the second sample image are labeled sample images, the difference between the target detection frame and the real frame is also obtained to determine the second loss, and the network parameters of the target detection model are then adjusted according to the second loss and the first loss, so that the target detection model is adjusted more effectively.
In a possible implementation, adjusting the network parameters of the target detection model using the first loss and the second loss includes: obtaining a total loss of the target detection model using the weighted first loss and the second loss, where the more times the target detection model has been trained, the greater the weight of the first loss; and adjusting the network parameters of the target detection model based on the total loss.
Therefore, by relating the weight of the first loss to the number of training iterations of the target detection model, fully supervised training dominates the early stage of training, and the weight of the first loss is then gradually increased to keep the training process stable.
In a possible implementation, determining at least one first detection frame in the first sample image based on the first detection result and determining at least one second detection frame in the second sample image based on the second detection result includes: decoding the first detection result to obtain several first candidate detection frames and decoding the second detection result to obtain several second candidate detection frames, where the first candidate detection frames and the second candidate detection frames each correspond to third position information and/or third category information about the target; and filtering the first candidate detection frames and the second candidate detection frames using their third position information and/or third category information to obtain the first detection frame and the second detection frame.
Therefore, when the first candidate detection frames and the second candidate detection frames are obtained by decoding the first detection result and the second detection result, some candidate detection frames may have low category probabilities, and multiple candidate detection frames may represent the same target; filtering the first candidate detection frames and the second candidate detection frames reduces the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation, thereby improving the accuracy of the obtained first loss.
In a possible implementation, filtering the first candidate detection frames and the second candidate detection frames using their third position information and/or third category information to obtain the first detection frame and the second detection frame includes: using the third category information, selecting from the several first candidate detection frames a preset number of first candidate detection frames whose category probabilities meet a first requirement, and selecting from the several second candidate detection frames a preset number of second candidate detection frames whose category probabilities meet the first requirement; and, based on the third position information, filtering the preset number of first candidate detection frames and second candidate detection frames by non-maximum suppression to obtain the first detection frame and the second detection frame.
Therefore, by first filtering the candidate detection frames once using the category probabilities and then filtering them a second time using the position information, the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation are reduced, thereby improving the accuracy of the obtained first loss.
In a possible implementation, the first detection result includes a first classification map and a first localization map corresponding to the first sample image, and the second detection result includes a second classification map and a second localization map corresponding to the second sample image; and/or the second sample image is obtained by processing the first sample image; and/or the first sample image and the second sample image are medical images.
Therefore, the corresponding second sample image is obtained by processing the first sample image, so that the first sample image and the second sample image include at least one identical target.
In a second aspect, an embodiment of the present disclosure provides a target detection method, including: acquiring an image to be detected; and performing target detection on the image to be detected using a target detection model to obtain a target detection result, where the target detection model is trained by the training method for a target detection model described in the first aspect.
In the embodiments of the present disclosure, by performing target detection on the image to be detected using the target detection model, a target detection result for the image to be detected can be obtained without manual detection, thereby improving the efficiency of target detection on the image to be detected.
In a third aspect, an embodiment of the present disclosure provides a training apparatus for a target detection model, including: a detection module configured to use the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image; a detection frame acquisition module configured to determine at least one first detection frame in the first sample image based on the first detection result and determine at least one second detection frame in the second sample image based on the second detection result, where the first detection frame and the second detection frame are configured to represent the position of the target; a loss acquisition module configured to obtain a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target; and a parameter adjustment module configured to use the first loss to adjust network parameters of the target detection model.
In a fourth aspect, an embodiment of the present disclosure provides a target detection apparatus, including: an image acquisition module configured to acquire an image to be detected; and a target detection module configured to perform target detection on the image to be detected using a target detection model to obtain a target detection result, where the target detection model is trained by the training method for a target detection model described in the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the processor is configured to execute program instructions stored in the memory to implement the training method for a target detection model described in the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer-readable storage medium having program instructions stored thereon, where the program instructions, when executed by a processor, implement the training method for a target detection model described in the first aspect.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the training method for a target detection model described in the first aspect.
In the embodiments of the present disclosure, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of the present disclosure.
To make the above objectives, features, and advantages of the embodiments of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the following briefly introduces the drawings required in the embodiments. The drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show certain embodiments of the present disclosure and should therefore not be regarded as limiting the scope; those of ordinary skill in the art may obtain other related drawings from these drawings without creative effort.
FIG. 1 is a first schematic flowchart of a training method for a target detection model provided by an embodiment of the present disclosure;
FIG. 2 is a second schematic flowchart of a training method for a target detection model provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a target detection method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a training apparatus for a target detection model provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure generally described and shown in the drawings herein may be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the claimed scope of the present disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Referring to FIG. 1, FIG. 1 is a first schematic flowchart of a training method for a target detection model provided by an embodiment of the present disclosure. The training method for the target detection model may include the following steps.
Step S11: use the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image.
The target detection model to be trained in the embodiments of the present disclosure may be any network model that can be used for target detection; therefore, the embodiments of the present disclosure do not limit the target detection model.
In the embodiments of the present disclosure, the first sample image and the second sample image may include at least one identical target; that is, the first sample image and the second sample image may be captured of the same target. In some implementations, the first sample image may be obtained from the second sample image through a series of image processing operations, or the second sample image may be obtained from the first sample image through a series of image processing operations. The image processing may be cropping, rotation, data augmentation, and so on; the manner of image processing is not limited here. Moreover, the first sample image and the second sample image in the embodiments of the present disclosure may be two-dimensional (2D) or three-dimensional (3D) images.
Step S12: determine at least one first detection frame in the first sample image based on the first detection result, and determine at least one second detection frame in the second sample image based on the second detection result, where the first detection frame and the second detection frame are used to represent the position of the target.
In some implementations, both the obtained first detection frame and the obtained second detection frame are used to represent the position of the target in the first sample image or the second sample image.
The at least one first detection frame in the first sample image may be determined based on the first detection result by operations such as decoding the first detection result to obtain the corresponding first detection frame. Similarly, the at least one second detection frame in the second sample image may be determined by operations such as decoding the second detection result to obtain the corresponding second detection frame.
Step S13: obtain a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target.
Generally, a target detection result includes a classification result and a localization result; that is, both the category of a target in the sample image and the location of the target need to be found. Accordingly, the difference information between the first detection frame and the second detection frame that represent the same target described in the embodiments of the present disclosure may be one or both of a classification difference and a localization difference, so the difference information is not limited here. The first loss may also be called a consistency loss.
Step S14: use the first loss to adjust the network parameters of the target detection model.
In the embodiments of the present disclosure, because the first sample image and the second sample image are related (that is, one sample image is obtained from the other), the target detection results of the target detection model for the two images should theoretically be the same, or the error between them should be less than or equal to a preset error; in other words, the loss between them should be small and within an acceptable range. If the first loss is large and outside the acceptable range, the network parameters of the target detection model can be adjusted according to the first loss so that the subsequent loss between the first sample image and the second sample image is less than a preset loss, that is, the target detection results of the target detection model for the first sample image and the second sample image are substantially the same.
With the above solution, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
In some scenarios, the target detection model is first trained using labeled sample images, and unlabeled sample images are then gradually added to train the model; that is, the target detection model is trained in a semi-supervised manner. A general semi-supervised training procedure includes the following steps: first, the target detection model is trained with labeled sample images; after training for some time, the detection network is used to predict unlabeled sample images to obtain prediction results, and the images annotated with these prediction results are added to the remaining labeled sample images and input into the target detection model for training. In the technical solution provided by the embodiments of the present disclosure, if an unlabeled sample image is input into the target detection model, the consistency loss between the unlabeled sample image and a sample image sharing at least one target with it can be used to train the target detection model; that is, there is no need to input the images carrying prediction results into the detection network again for training, which simplifies the training procedure.
In some disclosed embodiments, the first sample image and the second sample image come from a sample set. The sample images in the sample set include labeled sample images and/or unlabeled sample images. A labeled sample image is annotated with real frame information of the target, where the real frame information includes second position information and second category information about the target. The second position information refers to the position of the target in the sample image, and the second category information refers to the category to which the target belongs. The first sample image and the second sample image are medical images, for example CT images or images acquired by other medical equipment. In some application scenarios, if the sample images are medical images, the category of the target may be a common lesion, such as a pulmonary nodule. The second sample image is obtained by processing the first sample image, for example by cropping the first sample image, rotating it, adjusting its brightness, and so on. The cropping here may be applied to any region of the first sample image, for example an edge region or a central region of the first sample image. The cropped region may be filled with a preset pixel value, for example with 0-valued pixels, or with a region taken from another sample image; the embodiments of the present disclosure do not specify how the cropped region is filled. By processing the first sample image to obtain the corresponding second sample image, the first sample image and the second sample image include at least one identical target.
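As an illustration of this processing step, the following is a minimal sketch (a hypothetical helper, not the patent's implementation) that derives a second sample image from a first by cutting out a random region, filling it with 0-valued pixels, and adjusting brightness, assuming images are numpy arrays with values in [0, 255]:

```python
import numpy as np

def make_second_sample(image: np.ndarray, crop_frac: float = 0.2,
                       brightness: float = 1.1, seed: int = 0) -> np.ndarray:
    """Derive a second sample image from `image` by cutting out a random
    region (filled with 0-valued pixels) and adjusting brightness, so that
    both images still contain the same targets outside the cut region."""
    rng = np.random.default_rng(seed)
    out = image.astype(np.float32) * brightness          # brightness adjustment
    h, w = out.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)      # size of the cut region
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    out[y:y + ch, x:x + cw] = 0                          # fill the cropped area with 0 pixels
    return np.clip(out, 0, 255).astype(image.dtype)
```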
In some disclosed embodiments, the first detection result includes a first classification map and a first localization map corresponding to the first sample image, and the second detection result includes a second classification map and a second localization map corresponding to the second sample image. The first classification map and the second classification map each contain the target category number of each output grid cell. The first localization map and the second localization map each contain the upper, lower, left, and right boundaries of a first or second detection frame. In the embodiments of the present disclosure, each detection frame can be represented by a four-dimensional vector.
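One plausible reading of these maps (an assumption about the detection head, which the source does not spell out) is a dense prediction in which each grid cell carries class probabilities and the distances to the four box boundaries; under that assumption, decoding the maps into four-dimensional box vectors might look like the following sketch:

```python
import numpy as np

def decode_maps(cls_map: np.ndarray, loc_map: np.ndarray, score_thr: float = 0.05):
    """cls_map: (H, W, C) per-cell class probabilities;
    loc_map: (H, W, 4) per-cell distances to the (top, bottom, left, right) boundaries.
    Returns candidate boxes (N, 4) as (y1, y2, x1, x2), scores (N,), labels (N,)."""
    H, W, _ = cls_map.shape
    ys, xs = np.mgrid[0:H, 0:W]                          # cell-center coordinates
    top, bottom, left, right = np.moveaxis(loc_map, -1, 0)
    boxes = np.stack([ys - top, ys + bottom, xs - left, xs + right], axis=-1)
    scores = cls_map.max(axis=-1)                        # probability of the best class
    labels = cls_map.argmax(axis=-1)
    keep = scores > score_thr                            # drop near-empty cells
    return boxes[keep], scores[keep], labels[keep]
```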
In some disclosed embodiments, the first detection frame and the second detection frame each correspond to first position information and/or first category information about the target. The category information may further include the probability of the category. For example, suppose the probability that a first detection frame belongs to a first target is 0.7 and the probability that it belongs to the background is 0.3; in this case, the first category information of the first detection frame may record that the frame belongs to the first target with a probability of 0.7.
Determining at least one first detection frame in the first sample image based on the first detection result and determining at least one second detection frame in the second sample image based on the second detection result includes the following steps. First, the first detection result is decoded to obtain several first candidate detection frames, and the second detection result is decoded to obtain several second candidate detection frames; "several" in the embodiments of the present disclosure may be one, two, or more. The first candidate detection frames and the second candidate detection frames each correspond to third position information and/or third category information about the target; the decoding may follow the general way of decoding the output of a target detection model. Second, the first candidate detection frames and the second candidate detection frames are filtered using their third position information and/or third category information to obtain the first detection frame and the second detection frame.
When the first candidate detection frames and the second candidate detection frames are obtained by decoding the first detection result and the second detection result, some candidate detection frames may have low category probabilities, and multiple candidate detection frames may represent the same target; filtering the first candidate detection frames and the second candidate detection frames reduces the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation, thereby improving the accuracy of the obtained first loss.
In some implementations, the first candidate detection frames and the second candidate detection frames can be filtered in two steps. In some implementations, using the third category information, a preset number of first candidate detection frames whose category probabilities meet a first requirement are selected from the several first candidate detection frames, and a preset number of second candidate detection frames whose category probabilities meet the first requirement are selected from the several second candidate detection frames. For example, low-confidence detection frames are first removed according to a top-K strategy, that is, by finding the K largest or smallest elements in a score set; K may be 10, 20, 25, and so on. When K equals 20, 20 detection frames are selected from the first candidate detection frames and from the second candidate detection frames respectively. In some implementations, the first candidate detection frames and the second candidate detection frames are each sorted by category probability from high to low, and the first or second candidate detection frames corresponding to the top-20 category probabilities are selected. Then, based on the third position information, the preset number of first candidate detection frames and second candidate detection frames are filtered by non-maximum suppression to obtain the first detection frame and the second detection frame. Continuing the above example, the remaining 20 first candidate detection frames and 20 second candidate detection frames are each filtered by non-maximum suppression to obtain the final first detection frames and second detection frames.
By first filtering the candidate detection frames once using the category probabilities and then filtering them a second time using the position information, the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation are reduced, thereby improving the accuracy of the obtained first loss.
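A minimal sketch of this two-step filtering, assuming boxes are (y1, y2, x1, x2) arrays with per-box category probabilities (the helper names are illustrative, not from the source):

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (y1, y2, x1, x2)."""
    ih = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = ih * iw
    area = lambda t: (t[1] - t[0]) * (t[3] - t[2])
    return inter / (area(a) + area(b) - inter + 1e-9)

def filter_candidates(boxes: np.ndarray, scores: np.ndarray, k: int = 20,
                      iou_thr: float = 0.5):
    """Step 1: keep the top-K boxes by category probability (top-K strategy).
    Step 2: non-maximum suppression on the survivors using position information."""
    order = np.argsort(scores)[::-1][:k]                 # top-K by class probability
    boxes, scores = boxes[order], scores[order]
    keep = []
    for i in range(len(boxes)):                          # NMS: greedily keep high-score boxes
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return boxes[keep], scores[keep]
```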
In some disclosed embodiments, before obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target, the first detection frame and the second detection frame that represent the same target need to be obtained. In some implementations, the first detection frames and the second detection frames are matched to obtain the first detection frame and the second detection frame that represent the same target. In some implementations, the at least one first detection frame and the at least one second detection frame are matched to obtain several matched frame pairs, where each matched frame pair includes a first detection frame and a second detection frame matched with it, and different matched frame pairs include different detection frames; that is, any detection frame is matched by at most one other detection frame, and no detection frame is matched by multiple detection frames. By matching the at least one first detection frame with the at least one second detection frame to obtain matched frame pairs, the first detection frame and the second detection frame that represent the same target can be determined, and the first loss is obtained from the difference information between the two, so that the obtained first loss is more accurate.
The detection frames may be matched by bipartite graph matching.
The bipartite graph matching process is as follows. Let $C_{i,j}$ be the matching cost between each first detection frame in the first detection frame group (formed by the first detection frames) and each second detection frame in the second detection frame group (formed by the second detection frames). $C_{i,j}$ can be obtained by the following formula (1):

$C_{i,j} = \lVert p_i - p_j \rVert_2 + \lVert b_i - b_j \rVert_2 + \bigl(1 - \mathrm{GIOU}(b_i, b_j)\bigr)$    (1)

where $p_i$ and $b_i$ are the category probability and position of the $i$-th first detection frame, and $p_j$ and $b_j$ are the category probability and position of the $j$-th second detection frame. As above, GIOU is used to represent the generalized intersection-over-union of two detection frames.
The goal of bipartite graph matching is to find the optimal Boolean matching matrix $M^{*}$ for $C_{i,j}$, with $M \in \{0,1\}^{N1 \times N2}$, where $N1$ is the number of first detection frames and $N2$ is the number of second detection frames. $M_{i,j} = 1$ indicates that the first detection frame $b_i$ in the first detection frame group matches the second detection frame $b_j$ in the second detection frame group. The optimal matching matrix $M^{*}$ is obtained by solving the following cost-minimization problem (2) with $C$ fixed:

$M^{*} = \arg\min_{M} \sum_{i,j} C_{i,j} M_{i,j}$    (2)

where $M^{*}$ is constrained to have $N = \min(N1, N2)$ non-zero values, and each row and each column has at most one non-zero value. For the optimal result $M^{*}$, the row and column index sets of the non-zero values are denoted $M'$ and $N'$ respectively, and the index pairs of the non-zero values in $M^{*}$ are stored as tuples in the format $\langle m, n \rangle$, where $m \in M'$ and $n \in N'$; $N$ index tuples can thus be obtained and are denoted as a set $T$ of size $N$, that is, $T$ is the set formed by the matched frame pairs.
The at least one first detection frame and the at least one second detection frame may also be matched to obtain several matched frame pairs as follows: each first detection frame and each second detection frame are formed into multiple candidate frame pairs, and some candidate frame pairs are selected as matched frame pairs based on the matching degree of each candidate frame pair. The matching degree may be the similarity between the first detection frame and the second detection frame. For example, if the first detection frame and the second detection frame each include first position information and first category information, the matching degree may be the similarity of the first position information and/or the similarity of the first category information between the two frames. In some implementations, to make the matching between the first detection frame and the second detection frame more accurate, the final similarity may be determined by combining the similarity of the first position information with the similarity of the first category information. The matching degree between each first detection frame and each second detection frame may be obtained separately, and some candidate frame pairs with high matching degrees are selected as detection frame matching groups; here, too, each detection frame may belong to only one detection frame group.
Therefore, multiple matching methods can be selected so that any detection frame in the matching result is matched by at most one other detection frame, which improves the accuracy of obtaining the first detection frame and the second detection frame that represent the same target.
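As a sketch of the bipartite matching step, the cost can be minimized with the Hungarian algorithm; `scipy.optimize.linear_sum_assignment` handles rectangular cost matrices and returns exactly min(N1, N2) one-to-one pairs. The cost weighting below follows the reconstruction of formula (1) above and should be treated as an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(p1, b1, p2, b2, giou_fn):
    """p1: (N1, C) class probabilities and b1: (N1, 4) boxes of the first image;
    p2, b2: likewise for the second image; giou_fn(boxA, boxB) -> generalized IoU.
    Returns the set T of matched index pairs <m, n>."""
    n1, n2 = len(b1), len(b2)
    cost = np.zeros((n1, n2))
    for i in range(n1):
        for j in range(n2):
            cost[i, j] = (np.linalg.norm(p1[i] - p2[j])          # category term
                          + np.linalg.norm(b1[i] - b2[j])        # distance term
                          + (1.0 - giou_fn(b1[i], b2[j])))       # GIoU term
    rows, cols = linear_sum_assignment(cost)   # at most one match per row and column
    return list(zip(rows.tolist(), cols.tolist()))
```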
Obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target includes: obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair. In some implementations, when there are multiple matched frame pairs, the difference information of each matched frame pair is accumulated to obtain the first loss; that is, the difference information of all matched frame pairs is accumulated, and the first loss is then determined from the accumulated difference information. In other embodiments, the first loss of each matched frame pair may instead be calculated separately, and the first losses of all matched frame pairs accumulated to obtain the total first loss. The first detection frame and the second detection frame each correspond to first position information and/or first category information about the target. The difference information between the first detection frame and the second detection frame in each matched frame pair includes at least one of: a first difference with respect to the first position information and a second difference with respect to the first category information. Taking the differences in position and category between the first detection frame and the second detection frame as their difference makes the obtained difference more accurate. A position loss is then obtained using the first difference, and a category loss using the second difference, where the position loss includes a distance loss and/or a detection-frame intersection-over-union loss; dividing the position loss into a distance loss and a detection-frame intersection-over-union loss makes the obtained position loss more accurate.
In some implementations, the category loss $L_{cls}$ can be obtained by the following formula (3):

$L_{cls} = \frac{1}{N} \sum_{\langle m,n \rangle \in T} \lVert p_m - p_n \rVert_2$    (3)

where $N$ is the number of matched frame pairs, $p_m$ is the category probability of the first detection frame, $p_n$ is the category probability of the second detection frame, $\lVert \cdot \rVert_2$ is the L2 norm, and $T$ is the set formed by the matched frame pairs.
The distance loss $L_{dis}$ can be obtained by the following formula (4):

$L_{dis} = \frac{1}{N} \sum_{\langle m,n \rangle \in T} \lVert b_m - b_n \rVert_2$    (4)

where $N$ is the number of matched frame pairs, $b_m$ is the position of the first detection frame in the first sample image, $b_n$ is the position of the second detection frame in the second sample image, $\lVert \cdot \rVert_2$ is the L2 norm, and $T$ is the set of matched frame pairs.
The detection-frame intersection-over-union loss $L_{giou}$ can be obtained by the following formula (5):

$L_{giou} = \frac{1}{N} \sum_{\langle m,n \rangle \in T} \bigl(1 - \mathrm{GIOU}(b_m, b_n)\bigr)$    (5)

The detection-frame intersection-over-union loss described in the embodiments of the present disclosure is the generalized intersection-over-union loss between detection frames, where $\mathrm{GIOU}(b_m, b_n)$ represents the generalized intersection-over-union of the first detection frame $m$ and the second detection frame $n$, computed by the usual generalized IoU algorithm.
From this, the first loss is obtained by the following formula (6):

$L_{cons} = \frac{1}{|D|} \sum_{D} \bigl(L_{cls} + \beta_1 L_{dis} + \beta_2 L_{giou}\bigr)$    (6)

where $D$ is the sample set, and $\beta_1$ and $\beta_2$ are weights.
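A sketch of formulas (3)-(6) over one image pair, assuming the matched pairs T come from the bipartite matching above; the generalized IoU helper follows the standard definition, and β1, β2 are free weights:

```python
import numpy as np

def giou(a, b):
    """Generalized IoU of boxes (y1, y2, x1, x2): IoU minus the fraction of the
    smallest enclosing box not covered by the union."""
    ih = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = ih * iw
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    union = area_a + area_b - inter
    eh = max(a[1], b[1]) - min(a[0], b[0])               # enclosing box height
    ew = max(a[3], b[3]) - min(a[2], b[2])               # enclosing box width
    enclose = eh * ew
    return inter / (union + 1e-9) - (enclose - union) / (enclose + 1e-9)

def consistency_loss(p1, b1, p2, b2, pairs, beta1=1.0, beta2=1.0):
    """Formulas (3)-(5) averaged over the matched pairs, combined as in (6)
    for a single image pair."""
    if not pairs:
        return 0.0
    num = len(pairs)
    l_cls = sum(np.linalg.norm(p1[m] - p2[n]) for m, n in pairs) / num    # (3)
    l_dis = sum(np.linalg.norm(b1[m] - b2[n]) for m, n in pairs) / num    # (4)
    l_giou = sum(1.0 - giou(b1[m], b2[n]) for m, n in pairs) / num        # (5)
    return l_cls + beta1 * l_dis + beta2 * l_giou                         # (6)
```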
In some disclosed embodiments, in the case where the first sample image and the second sample image are labeled sample images, the step of adjusting the network parameters of the target detection model using the first loss further includes: taking at least one of the first detection frame and the second detection frame as the target detection frame. For example, the first detection frame may be taken as the target detection frame, the second detection frame may be taken as the target detection frame, or both may be taken as target detection frames; the embodiments of the present disclosure choose the first detection frame as the target detection frame. When both the first detection frame and the second detection frame are taken as target detection frames, the first detection frame has corresponding real frame information, and so does the second detection frame. The second loss is obtained using the difference between the target detection frame and the corresponding real frame information; the second loss may also be called the fully supervised loss. In some implementations, the second loss $L_{sup}$ can be obtained by the following formula (7):

$L_{sup} = \frac{1}{|L|} \sum_{i} \bigl(L_{cls}(P_i, P_i^{*}) + L_{loc}(B_i, B_i^{*})\bigr)$    (7)

where $L_{loc}$ is the position loss between a target detection frame and its corresponding real frame, $L_{cls}$ is the category loss between a target detection frame and its corresponding real frame, $L$ is the set of labeled sample images, $P_i$ is the category probability of the $i$-th target detection frame, $B_i$ is the position of the $i$-th target detection frame, $P_i^{*}$ is the category probability of the real frame corresponding to the $i$-th target detection frame, and $B_i^{*}$ is the position of the real frame corresponding to the $i$-th target detection frame.
The network parameters of the target detection model are adjusted using the first loss and the second loss. In the case where the first sample image and the second sample image are labeled sample images, the difference between the target detection frame and the real frame is also obtained to determine the second loss, and the network parameters of the target detection model are then adjusted according to the second loss and the first loss, so that the target detection model is adjusted more effectively.
In some implementations, the total loss of the target detection model is obtained using the weighted first loss and the second loss, where the more times the target detection model has been trained, the greater the weight of the first loss.
The total loss may be calculated from the first loss and the second loss by the following formula (8):

$L_{total} = L_{sup} + \sigma(t) \cdot L_{cons}$    (8)

where $\sigma(t)$ is a time-dependent weight used to control the balance between $L_{sup}$ and $L_{cons}$; $\sigma(t)$ is a Gaussian warm-up function. In some implementations,

$\sigma(t) = \exp\bigl(-5\,(1 - t/t_{max})^{2}\bigr)$

where $t$ is the current training step and $t_{max}$ is the maximum training step.
The network parameters of the target model are adjusted based on the total loss. By relating the number of training iterations of the target detection model to the weight of the first loss, fully supervised training dominates the early stage of training, and the weight of the first loss is then gradually increased to keep the training process stable.
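A sketch of the total-loss combination with the warm-up weight; the exponential form below is the commonly used Gaussian ramp-up and is an assumption, since the source only states that σ(t) is a Gaussian warm-up function over training steps t up to t_max:

```python
import math

def sigma(t: int, t_max: int) -> float:
    """Gaussian warm-up weight: ~0 at the start of training and -> 1 as t -> t_max,
    so fully supervised training dominates early and the consistency loss is
    phased in gradually."""
    return math.exp(-5.0 * (1.0 - min(t, t_max) / t_max) ** 2)

def total_loss(l_sup: float, l_cons: float, t: int, t_max: int) -> float:
    """Formula (8): L_total = L_sup + sigma(t) * L_cons."""
    return l_sup + sigma(t, t_max) * l_cons
```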
For a better understanding of the training method for the target detection model proposed by the embodiments of the present disclosure, refer to the following example and to FIG. 2, which is a second schematic flowchart of the training method for the target detection model provided by an embodiment of the present disclosure.
As shown in FIG. 2, before the first sample image 201 is input into the target detection model 203, image processing is performed on the first sample image 201 to obtain the second sample image 202; the first sample image 201 and the second sample image 202 are then input into the target detection model 203 to obtain the first detection result and the second detection result, where the first detection result includes a first classification map 204 and a first localization map 205, and the second detection result includes a second classification map 206 and a second localization map 207. When the first sample image 201 and the second sample image 202 are labeled images, the second loss 210, that is, the fully supervised loss, is calculated using the first classification map 204 and the first localization map 205; when the first sample image 201 and the second sample image 202 are unlabeled images, the fully supervised loss is not calculated. Of course, in other embodiments the second loss 210 may also be calculated using the second classification map 206 and the second localization map 207. The first detection result is decoded to obtain several first candidate detection frames 208, and the second detection result is decoded to obtain several second candidate detection frames 209. The several first candidate detection frames 208 and the several second candidate detection frames 209 are then filtered to obtain several first detection frames 212 and several second detection frames 213. The first detection frames 212 and the second detection frames 213 are matched by bipartite graph matching to obtain several matched frame pairs, and the matched frame pairs are used to obtain the first loss 211, that is, the consistency loss. The first loss 211 and the second loss 210 are then combined to obtain the total loss. Finally, the total loss is used to adjust the network parameters of the target detection model 203. Of course, if the first sample image 201 and the second sample image 202 are unlabeled sample images, the first loss 211 is used to adjust the network parameters of the target detection model 203.
With the above solution, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
Those skilled in the art can understand that, in the above methods of the specific implementations, the written order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.
The training method for the target detection model may be executed by a training apparatus for the target detection model; for example, the training method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a device for medical image analysis, user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, an autonomous vehicle, a robot with localization and mapping requirements, a medical imaging system with registration requirements, glasses or helmets for augmented or virtual reality, and the like. In some possible implementations, the training method for the target detection model may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the target detection method provided by an embodiment of the present disclosure. The target detection method may include the following steps.
Step S21: acquire an image to be detected.
The image to be detected may be acquired in many ways: for example, it may be captured by the device executing the target detection method proposed by the embodiments of the present disclosure, or captured by another device and transmitted to the executing device by communication; it may be an image that has not undergone image processing or one that has, where the image processing may be adjusting brightness, resolution, and so on.
In some application scenarios, the image to be detected may be a medical image, for example a CT image or an X-ray image; the target in the image to be detected may be a pulmonary nodule or the like.
Step S22: perform target detection on the image to be detected using the target detection model to obtain a target detection result.
The target detection model is trained by the above training method for the target detection model.
With the above solution, by performing target detection on the image to be detected using the target detection model, a target detection result for the image to be detected can be obtained without manual detection, thereby improving the efficiency of target detection on the image to be detected.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of the training apparatus for the target detection model provided by an embodiment of the present disclosure. In FIG. 4, the training apparatus 30 for the target detection model includes a detection module 31, a detection frame acquisition module 32, a loss acquisition module 33, and a parameter adjustment module 34;
the detection module 31 is configured to use the target detection model to separately detect targets in the first sample image and the second sample image to obtain the first detection result of the first sample image and the second detection result of the second sample image;
the detection frame acquisition module 32 is configured to determine at least one first detection frame in the first sample image based on the first detection result and determine at least one second detection frame in the second sample image based on the second detection result, where the first detection frame and the second detection frame are configured to represent the position of the target;
the loss acquisition module 33 is configured to obtain the first loss of the target detection model based on the difference information between a first detection frame and a second detection frame that represent the same target;
the parameter adjustment module 34 is configured to use the first loss to adjust the network parameters of the target detection model.
With the solution of the above disclosed embodiment, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the detection results, which can improve the accuracy of the first loss.
In a possible implementation, the training apparatus for the target detection model further includes a detection frame matching module. Before the loss acquisition module 33 obtains the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target, the detection frame matching module is configured to: match the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs, where each matched frame pair includes a first detection frame and a second detection frame matched with the first detection frame, and different matched frame pairs include different detection frames;
the loss acquisition module 33 is further configured to obtain the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair.
In this implementation, by matching the at least one first detection frame with the at least one second detection frame to obtain matched frame pairs, the first detection frame and the second detection frame that represent the same target can be determined, and the first loss is obtained from the difference information between the two, so that the obtained first loss is more accurate.
In a possible implementation, the matching is performed by bipartite graph matching; and/or the loss acquisition module 33 is further configured to: form multiple candidate frame pairs from each first detection frame and each second detection frame; select some of the candidate frame pairs as matched frame pairs based on the matching degree of each candidate frame pair; and/or, in the case of multiple matched frame pairs, accumulate the difference information of each matched frame pair to obtain the first loss.
In this implementation, multiple matching methods can be selected so that any detection frame in the matching result is matched by at most one other detection frame, which improves the accuracy of obtaining the first detection frame and the second detection frame that represent the same target.
In a possible implementation, the first detection frame and the second detection frame each correspond to first position information and/or first category information about the target; the difference information includes at least one of: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
In this implementation, the differences in position and category between the first detection frame and the second detection frame are taken as the difference between the two frames, which makes the obtained difference more accurate.
In a possible implementation, the loss acquisition module 33 is further configured to obtain a position loss using the first difference and a category loss using the second difference, and to obtain the first loss based on the position loss and the category loss, where the position loss includes a distance loss and/or a detection-frame intersection-over-union loss.
In this implementation, the position loss is further divided into a distance loss and a detection-frame intersection-over-union loss, which makes the obtained position loss more accurate.
In a possible implementation, the first sample image and the second sample image come from a sample set, the sample images in the sample set include labeled sample images and/or unlabeled sample images, and a labeled sample image is annotated with real frame information of the target, where the real frame information includes second position information and second category information about the target; and, in the case where the first sample image and the second sample image are labeled sample images, the parameter adjustment module 34 is further configured to: take at least one of the first detection frame and the second detection frame as a target detection frame, and obtain a second loss using the difference between the target detection frame and the corresponding real frame information; and adjust the network parameters of the target detection model using the first loss and the second loss.
In this implementation, in the case where the first sample image and the second sample image are labeled sample images, the difference between the target detection frame and the real frame is also obtained to determine the second loss, and the network parameters of the target detection model are then adjusted according to the second loss and the first loss, so that the target detection model is adjusted more effectively.
In a possible implementation, the parameter adjustment module 34 is further configured to obtain the total loss of the target detection model using the weighted first loss and the second loss, where the more times the target detection model has been trained, the greater the weight of the first loss, and to adjust the network parameters of the target detection model based on the total loss.
In this implementation, by relating the number of training iterations of the target detection model to the weight of the first loss, fully supervised training dominates the early stage of training, and the weight of the first loss is then gradually increased to keep the training process stable.
In a possible implementation, the detection frame acquisition module 32 is further configured to: decode the first detection result to obtain several first candidate detection frames and decode the second detection result to obtain several second candidate detection frames, where the first candidate detection frames and the second candidate detection frames each correspond to third position information and/or third category information about the target; and filter the first candidate detection frames and the second candidate detection frames using their third position information and/or third category information to obtain the first detection frame and the second detection frame.
In this implementation, when the first candidate detection frames and the second candidate detection frames are obtained by decoding the first detection result and the second detection result, some candidate detection frames may have low category probabilities and multiple candidate detection frames may represent the same target; filtering the first candidate detection frames and the second candidate detection frames reduces the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation, thereby improving the accuracy of the obtained first loss.
In a possible implementation, the detection frame acquisition module 32 is further configured to: using the third category information, select from the several first candidate detection frames a preset number of first candidate detection frames whose category probabilities meet a first requirement, and select from the several second candidate detection frames a preset number of second candidate detection frames whose category probabilities meet the first requirement; and, based on the third position information, filter the preset number of first candidate detection frames and second candidate detection frames by non-maximum suppression to obtain the first detection frame and the second detection frame.
In this implementation, by first filtering the candidate detection frames once using the category probabilities and then filtering them a second time using the position information, the cases where detection frames overlap each other and where detection frames with low category probabilities take part in the subsequent loss calculation are reduced, thereby improving the accuracy of the obtained first loss.
In a possible implementation, the first detection result includes a first classification map and a first localization map corresponding to the first sample image, and the second detection result includes a second classification map and a second localization map corresponding to the second sample image; and/or the second sample image is obtained by processing the first sample image; and/or the first sample image and the second sample image are medical images.
With the above solution, the corresponding second sample image is obtained by processing the first sample image, so that the first sample image and the second sample image include at least one identical target.
In this implementation, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the target detection apparatus provided by an embodiment of the present disclosure. In FIG. 5, the target detection apparatus 40 includes an image acquisition module 41 and a target detection module 42, wherein:
the image acquisition module 41 is configured to acquire an image to be detected;
the target detection module 42 is configured to perform target detection on the image to be detected using a target detection model to obtain a target detection result, where the target detection model is trained by the training method for the target detection model.
With the solution of the above disclosed embodiment, by performing target detection on the image to be detected using the target detection model, a target detection result for the image to be detected can be obtained without manual detection, thereby improving the efficiency of target detection on the image to be detected.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. The electronic device 50 includes a memory 51 and a processor 52, and the processor 52 is configured to execute program instructions stored in the memory 51 to implement the steps in the above embodiments of the method for training the target detection model and/or the target detection method. In one implementation scenario, the electronic device 50 may include, but is not limited to, a microcomputer and a server; in addition, the electronic device 50 may also include mobile devices such as notebook computers and tablet computers, which is not limited here.
In some implementations, the processor 52 is configured to control itself and the memory 51 to implement the steps in the above embodiments of the method for training the target detection model. The processor 52 may also be called a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capability. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 52 may be implemented jointly by multiple integrated circuit chips.
With the above solution, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of the computer-readable storage medium provided by an embodiment of the present disclosure. The computer-readable storage medium 60 stores program instructions 601 executable by a processor, and the program instructions 601 are configured to implement the steps in the above embodiments of the method for training the target detection model and/or the target detection method.
With the above solution, by obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.
An embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps in the above embodiments of the method for training the target detection model and/or the target detection method.
In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and their implementation may refer to the descriptions of the above method embodiments.
The above descriptions of the embodiments tend to emphasize the differences between the embodiments; for their identical or similar parts, reference may be made to one another.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus implementations described above are only illustrative: the division of modules or units is only a division by logical function, and in actual implementation there may be other ways of division; for example, units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of apparatuses or units may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present disclosure, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present disclosure. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely implementations of the present disclosure used to illustrate, not limit, its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial applicability
By obtaining the difference information between detection frames, corresponding to the first sample image and the second sample image, that contain the same target, the first loss of the target detection model is obtained; the network parameters of the target detection model can then be adjusted using the first loss without requiring a large number of labeled sample images, thereby reducing the training cost of the target detection model. Since the first detection result and the second detection result may contain a large amount of invalid information, the embodiments of the present disclosure obtain the first loss based on the difference information between the first detection frame and the second detection frame, rather than based on the difference information between the first detection result and the second detection result, which can improve the accuracy of the first loss.

Claims (25)

  1. A training method for a target detection model, comprising:
    using the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image;
    determining at least one first detection frame in the first sample image based on the first detection result, and determining at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are used to represent the position of the target;
    obtaining a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target; and
    adjusting network parameters of the target detection model using the first loss.
  2. The method according to claim 1, wherein before obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target, the method further comprises:
    matching the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs, wherein each matched frame pair comprises a first detection frame and a second detection frame matched with the first detection frame, and different matched frame pairs comprise different detection frames; and
    obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target comprises:
    obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair.
  3. The method according to claim 2, wherein the matching is performed by bipartite graph matching; and/or
    matching the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs comprises:
    forming multiple candidate frame pairs from each first detection frame and each second detection frame; and
    selecting some of the candidate frame pairs as the matched frame pairs based on the matching degree of each candidate frame pair; and/or
    obtaining the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair comprises:
    in the case of multiple matched frame pairs, accumulating the difference information of each matched frame pair to obtain the first loss.
  4. The method according to any one of claims 1 to 3, wherein the first detection frame and the second detection frame each correspond to first position information and/or first category information about the target; and
    the difference information comprises at least one of: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
  5. The method according to claim 4, wherein obtaining the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target comprises:
    obtaining a position loss using the first difference, and obtaining a category loss using the second difference; and
    obtaining the first loss based on the position loss and the category loss, wherein the position loss comprises a distance loss and/or a detection-frame intersection-over-union loss.
  6. The method according to any one of claims 1 to 5, wherein the first sample image and the second sample image come from a sample set, the sample images in the sample set comprise labeled sample images and/or unlabeled sample images, and a labeled sample image is annotated with real frame information of the target, wherein the real frame information comprises second position information and second category information about the target; and
    in the case where the first sample image and the second sample image are labeled sample images, adjusting the network parameters of the target detection model using the first loss comprises:
    taking at least one of the first detection frame and the second detection frame as a target detection frame, and obtaining a second loss using the difference between the target detection frame and the corresponding real frame information; and
    adjusting the network parameters of the target detection model using the first loss and the second loss.
  7. The method according to claim 6, wherein adjusting the network parameters of the target detection model using the first loss and the second loss comprises:
    obtaining a total loss of the target detection model using the weighted first loss and the second loss, wherein the more times the target detection model has been trained, the greater the weight of the first loss; and
    adjusting the network parameters of the target detection model based on the total loss.
  8. The method according to any one of claims 1 to 7, wherein determining at least one first detection frame in the first sample image based on the first detection result and determining at least one second detection frame in the second sample image based on the second detection result comprises:
    decoding the first detection result to obtain several first candidate detection frames, and decoding the second detection result to obtain several second candidate detection frames, wherein the first candidate detection frames and the second candidate detection frames each correspond to third position information and/or third category information about the target; and
    filtering the first candidate detection frames and the second candidate detection frames using the third position information and/or third category information of the first candidate detection frames and the second candidate detection frames to obtain the first detection frame and the second detection frame.
  9. The method according to claim 8, wherein filtering the first candidate detection frames and the second candidate detection frames using the third position information and/or third category information of the first candidate detection frames and the second candidate detection frames to obtain the first detection frame and the second detection frame comprises:
    using the third category information, selecting from the several first candidate detection frames a preset number of first candidate detection frames whose category probabilities meet a first requirement, and selecting from the several second candidate detection frames a preset number of second candidate detection frames whose category probabilities meet the first requirement; and
    based on the third position information, filtering the preset number of first candidate detection frames and second candidate detection frames by non-maximum suppression to obtain the first detection frame and the second detection frame.
  10. The method according to any one of claims 1 to 9, wherein the first detection result comprises a first classification map and a first localization map corresponding to the first sample image, and the second detection result comprises a second classification map and a second localization map corresponding to the second sample image;
    and/or the second sample image is obtained by processing the first sample image;
    and/or the first sample image and the second sample image are medical images.
  11. A target detection method, comprising:
    acquiring an image to be detected; and
    performing target detection on the image to be detected using a target detection model to obtain a target detection result, wherein the target detection model is trained by the method of any one of claims 1 to 10.
  12. A training apparatus for a target detection model, comprising:
    a detection module configured to use the target detection model to separately detect targets in a first sample image and a second sample image to obtain a first detection result of the first sample image and a second detection result of the second sample image;
    a detection frame acquisition module configured to determine at least one first detection frame in the first sample image based on the first detection result, and determine at least one second detection frame in the second sample image based on the second detection result, wherein the first detection frame and the second detection frame are used to represent the position of the target;
    a loss acquisition module configured to obtain a first loss of the target detection model based on difference information between a first detection frame and a second detection frame that represent the same target; and
    a parameter adjustment module configured to use the first loss to adjust network parameters of the target detection model.
  13. The apparatus according to claim 12, wherein the training apparatus for the target detection model further comprises a detection frame matching module; before the loss acquisition module obtains the first loss of the target detection model based on the difference information between the first detection frame and the second detection frame that represent the same target, the detection frame matching module is configured to:
    match the at least one first detection frame with the at least one second detection frame to obtain several matched frame pairs, wherein each matched frame pair comprises a first detection frame and a second detection frame matched with the first detection frame, and different matched frame pairs comprise different detection frames; and
    the loss acquisition module is further configured to obtain the first loss using the difference information between the first detection frame and the second detection frame in each matched frame pair.
  14. The apparatus according to claim 13, wherein the matching is performed by bipartite graph matching; and/or
    the loss acquisition module is further configured to:
    form multiple candidate frame pairs from each first detection frame and each second detection frame; and
    select some of the candidate frame pairs as the matched frame pairs based on the matching degree of each candidate frame pair; and/or
    in the case of multiple matched frame pairs, accumulate the difference information of each matched frame pair to obtain the first loss.
  15. The apparatus according to any one of claims 12 to 14, wherein the first detection frame and the second detection frame each correspond to first position information and/or first category information about the target; and
    the difference information comprises at least one of: a first difference between the first detection frame and the second detection frame with respect to the first position information, and a second difference with respect to the first category information.
  16. The apparatus according to claim 15, wherein the loss acquisition module is further configured to:
    obtain a position loss using the first difference, and obtain a category loss using the second difference; and
    obtain the first loss based on the position loss and the category loss, wherein the position loss comprises a distance loss and/or a detection-frame intersection-over-union loss.
  17. The apparatus according to any one of claims 12 to 16, wherein the first sample image and the second sample image come from a sample set, the sample images in the sample set comprise labeled sample images and/or unlabeled sample images, and a labeled sample image is annotated with real frame information of the target, wherein the real frame information comprises second position information and second category information about the target; and
    in the case where the first sample image and the second sample image are labeled sample images, the parameter adjustment module is further configured to:
    take at least one of the first detection frame and the second detection frame as a target detection frame, and obtain a second loss using the difference between the target detection frame and the corresponding real frame information; and
    adjust the network parameters of the target detection model using the first loss and the second loss.
  18. The apparatus according to claim 17, wherein the parameter adjustment module is further configured to:
    obtain a total loss of the target detection model using the weighted first loss and the second loss, wherein the more times the target detection model has been trained, the greater the weight of the first loss; and
    adjust the network parameters of the target detection model based on the total loss.
  19. The apparatus according to any one of claims 12 to 18, wherein the detection frame acquisition module is further configured to:
    decode the first detection result to obtain several first candidate detection frames, and decode the second detection result to obtain several second candidate detection frames, wherein the first candidate detection frames and the second candidate detection frames each correspond to third position information and/or third category information about the target; and
    filter the first candidate detection frames and the second candidate detection frames using the third position information and/or third category information of the first candidate detection frames and the second candidate detection frames to obtain the first detection frame and the second detection frame.
  20. The apparatus according to claim 19, wherein the detection frame acquisition module is further configured to:
    using the third category information, select from the several first candidate detection frames a preset number of first candidate detection frames whose category probabilities meet a first requirement, and select from the several second candidate detection frames a preset number of second candidate detection frames whose category probabilities meet the first requirement; and
    based on the third position information, filter the preset number of first candidate detection frames and second candidate detection frames by non-maximum suppression to obtain the first detection frame and the second detection frame.
  21. The apparatus according to any one of claims 12 to 20, wherein the first detection result comprises a first classification map and a first localization map corresponding to the first sample image, and the second detection result comprises a second classification map and a second localization map corresponding to the second sample image;
    and/or the second sample image is obtained by processing the first sample image;
    and/or the first sample image and the second sample image are medical images.
  22. A target detection apparatus, comprising:
    an image acquisition module configured to acquire an image to be detected; and
    a target detection module configured to perform target detection on the image to be detected using a target detection model to obtain a target detection result, wherein the target detection model is trained by the method of any one of claims 1 to 10.
  23. An electronic device, comprising a memory and a processor, wherein the processor is configured to execute program instructions stored in the memory to implement the method of any one of claims 1 to 11.
  24. A computer-readable storage medium having program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the method of any one of claims 1 to 11.
  25. A computer program product, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method of any one of claims 1 to 11.
PCT/CN2021/137203 2021-05-14 2021-12-10 Target detection method, model training method therefor, related apparatus, medium, and program product WO2022237153A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110529645.4 2021-05-14
CN202110529645.4A CN113283485A (zh) Target detection method, training method for its model, and related apparatus and medium

Publications (1)

Publication Number Publication Date
WO2022237153A1 true WO2022237153A1 (zh) 2022-11-17

Family

ID=77279249

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137203 WO2022237153A1 (zh) Target detection method, model training method therefor, related apparatus, medium, and program product

Country Status (2)

Country Link
CN (1) CN113283485A (zh)
WO (1) WO2022237153A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468967A (zh) * 2023-04-18 2023-07-21 北京百度网讯科技有限公司 Sample image screening method and apparatus, electronic device, and storage medium
CN117115568A (zh) * 2023-10-24 2023-11-24 浙江啄云智能科技有限公司 Data screening method, apparatus, device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283485A (zh) * 2021-05-14 2021-08-20 上海商汤智能科技有限公司 Target detection method, training method for its model, and related apparatus and medium
CN114187488B (zh) * 2021-12-10 2023-11-17 北京百度网讯科技有限公司 Image processing method, apparatus, device, and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190325259A1 (en) * 2018-04-12 2019-10-24 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
CN111814850A (zh) * 2020-06-22 2020-10-23 浙江大华技术股份有限公司 Defect detection model training method, defect detection method, and related apparatus
CN112733864A (zh) * 2019-09-16 2021-04-30 北京迈格威科技有限公司 Model training method, target detection method, apparatus, device, and storage medium
CN112749726A (zh) * 2020-02-26 2021-05-04 腾讯科技(深圳)有限公司 Training method and apparatus for target detection model, computer device, and storage medium
CN113283485A (zh) * 2021-05-14 2021-08-20 上海商汤智能科技有限公司 Target detection method, training method for its model, and related apparatus and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190325259A1 (en) * 2018-04-12 2019-10-24 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
CN112733864A (zh) * 2019-09-16 2021-04-30 北京迈格威科技有限公司 Model training method, target detection method, apparatus, device, and storage medium
CN112749726A (zh) * 2020-02-26 2021-05-04 腾讯科技(深圳)有限公司 Training method and apparatus for target detection model, computer device, and storage medium
CN111814850A (zh) * 2020-06-22 2020-10-23 浙江大华技术股份有限公司 Defect detection model training method, defect detection method, and related apparatus
CN113283485A (zh) * 2021-05-14 2021-08-20 上海商汤智能科技有限公司 Target detection method, training method for its model, and related apparatus and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468967A (zh) * 2023-04-18 2023-07-21 北京百度网讯科技有限公司 Sample image screening method and apparatus, electronic device, and storage medium
CN116468967B (zh) * 2023-04-18 2024-04-16 北京百度网讯科技有限公司 Sample image screening method and apparatus, electronic device, and storage medium
CN117115568A (zh) * 2023-10-24 2023-11-24 浙江啄云智能科技有限公司 Data screening method, apparatus, device, and storage medium
CN117115568B (zh) * 2023-10-24 2024-01-16 浙江啄云智能科技有限公司 Data screening method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN113283485A (zh) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2022237153A1 (zh) Target detection method, model training method therefor, related apparatus, medium, and program product
US20220108546A1 (en) Object detection method and apparatus, and computer storage medium
US10796452B2 (en) Optimizations for structure mapping and up-sampling
US11232286B2 (en) Method and apparatus for generating face rotation image
CN110348294B (zh) Method and apparatus for locating charts in PDF documents, and computer device
JP2019032773A (ja) Image processing apparatus and image processing method
CN111583220B (zh) Image data detection method and apparatus
US20230237771A1 (en) Self-supervised learning method and apparatus for image features, device, and storage medium
WO2022000862A1 (zh) Object detection method and apparatus in fisheye images, and storage medium
US20150012472A1 (en) Systems, methods, and media for updating a classifier
WO2022188315A1 (zh) Video detection method and apparatus, electronic device, and storage medium
JPWO2019026104A1 (ja) Information processing apparatus, information processing program, and information processing method
TW202125415A Three-dimensional target detection and model training method, device, and storage medium
WO2022217876A1 (zh) Instance segmentation method and apparatus, electronic device, and storage medium
JP2018072938A (ja) Target object count estimation apparatus, target object count estimation method, and program
CN110222718A (zh) Image processing method and apparatus
WO2023221790A1 (zh) Training method and apparatus for an image encoder, device, and medium
CN115797736B (zh) Training of a target detection model, target detection method, apparatus, device, and medium
WO2023221713A1 (zh) Training method and apparatus for an image encoder, device, and medium
WO2021217937A1 (zh) Training method and device for a pose recognition model, and pose recognition method and device
WO2023109361A1 (zh) Method, system, device, medium, and product for video processing
WO2023165033A1 (zh) Model training for recognizing a target in medical images, method, device, and medium
Zhang et al. Fine segmentation on faces with masks based on a multistep iterative segmentation algorithm
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
US20170177977A1 (en) Control of Computer Vision Pre-Processing Based on Image Matching Using Structural Similarity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21941715

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE