WO2022257314A1 - Image detection method, related training method, and related apparatus, device and medium - Google Patents

Image detection method, related training method, and related apparatus, device and medium

Info

Publication number
WO2022257314A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection model
predicted
sample
preset
Prior art date
Application number
PCT/CN2021/123753
Other languages
English (en)
French (fr)
Inventor
郭佳
谢帅宁
赵亮
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2022257314A1 publication Critical patent/WO2022257314A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, and in particular to an image detection method, a related training method, and related devices, equipment, and media.
  • neural network models have been gradually applied to many application scenarios such as medicine and transportation to assist relevant staff in their work and improve work efficiency.
  • the performance of neural network models is closely related to the annotation quality of sample images.
  • manual annotation often has missing labels, which damages the performance of neural network models.
  • Embodiments of the present disclosure provide an image detection method, a related training method, and related devices, equipment, and media.
  • the first aspect of the embodiments of the present disclosure provides a training method for an image detection model, including: obtaining a sample image, in which several sample positions of target objects are annotated; detecting the sample image with a preset detection model and with the image detection model respectively to obtain first prediction information and second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values, and the second prediction information includes several second predicted positions and their second predicted probability values; filtering out of the second prediction information the second predicted positions corresponding to first reference positions, together with their second predicted probability values, to obtain third prediction information, where a first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies a first preset condition; and adjusting network parameters of the image detection model based on the third prediction information.
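As an illustrative sketch of the filtering step described above (not the disclosed implementation; the list-based prediction format, the distance-based matching rule, and the threshold values are all assumptions):

```python
def filter_potential_missed_labels(sample_positions, first_preds, second_preds,
                                   prob_threshold=0.5, match_dist=8.0):
    """A "first reference position" is a first predicted position that matches
    no annotated sample position yet has a high predicted probability; second
    predictions near such positions are dropped, yielding the "third
    prediction information".

    first_preds / second_preds: lists of ((x, y, z), probability) pairs.
    prob_threshold and match_dist are assumed values.
    """
    def close(p, q):
        # Euclidean-distance matching rule (an assumption).
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 <= match_dist

    # First reference positions: unmatched, high-confidence first predictions.
    reference = [pos for pos, prob in first_preds
                 if prob > prob_threshold
                 and not any(close(pos, s) for s in sample_positions)]

    # Third prediction information: second predictions away from every
    # potential missed-label position.
    return [(pos, prob) for pos, prob in second_preds
            if not any(close(pos, r) for r in reference)]
```

The same helper, with the roles of the two models swapped, covers the "fourth prediction information" used when training the preset detection model.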
  • in this way, the first prediction information and the second prediction information of the target object are obtained, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; the second predicted positions corresponding to the first reference positions, together with their second predicted probability values, are filtered out of the second prediction information to obtain the third prediction information, where a first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies the first preset condition; and the network parameters of the image detection model are adjusted based on the third prediction information.
  • that is, during training of the image detection model, the first predicted positions produced by the preset detection model can be screened for first reference positions. Because a predicted position that corresponds to no sample position can be regarded as a false positive, and the predicted probability value is related to the prediction loss, a first reference position can be regarded as a potential missed-label position.
  • since the third prediction information is obtained by filtering out the second predicted positions (and their second predicted probability values) corresponding to these potential missed-label positions, adjusting the network parameters of the image detection model through the third prediction information reduces the impact of missing labels on model performance as much as possible.
  • the image detection model is obtained through several rounds of training. When the current training round satisfies the second preset condition, the step of detecting the sample image with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object, and the subsequent steps, are performed to train the image detection model; when the current training round does not satisfy the second preset condition, the preset detection model is trained.
  • in this way, the image detection model is trained over several rounds: the image detection model is trained when the current training round satisfies the second preset condition, and the preset detection model is trained when it does not.
  • that is, the preset detection model is trained alternately during the training of the image detection model, so the two models can teach each other during training; compared with training the image detection model alone, this helps reduce accumulated error, and thus further reduces the impact of missing labels on model performance as much as possible.
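The alternating (mutual-teaching) schedule described above can be sketched as a round loop; the two training callables and the condition predicate are illustrative assumptions, not the disclosed implementation:

```python
def co_train(num_rounds, condition, train_image_step, train_preset_step):
    """Run `num_rounds` rounds; `condition(round_idx)` is the "second preset
    condition" deciding which model is updated this round.  Each callable is
    assumed to run one round of training for its model (detecting with both
    models and filtering potential missed labels internally)."""
    schedule = []
    for round_idx in range(1, num_rounds + 1):
        if condition(round_idx):
            train_image_step(round_idx)    # update the image detection model
            schedule.append("image")
        else:
            train_preset_step(round_idx)   # update the preset detection model
            schedule.append("preset")
    return schedule
```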
  • the step of training the preset detection model includes: performing the step of detecting the sample image with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object; filtering out of the first prediction information the first predicted positions corresponding to second reference positions, together with their first predicted probability values, to obtain fourth prediction information, where a second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition; and adjusting network parameters of the preset detection model based on the fourth prediction information.
  • in this way, when training the preset detection model, the first prediction information and the second prediction information of the target object are likewise obtained first, and the first predicted positions (and their first predicted probability values) corresponding to second reference positions are filtered out of the first prediction information to obtain the fourth prediction information.
  • a second reference position is a second predicted position, predicted by the image detection model, that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition; because it corresponds to no sample position it can be regarded as a false positive, and because the predicted probability value is related to the prediction loss, it can be regarded as a potential missed-label position.
  • adjusting the network parameters of the preset detection model through the fourth prediction information therefore also reduces the impact of missing labels on model performance as much as possible; since each model reduces the impact of missing labels on its own performance, the alternating training of the preset detection model and the image detection model improves the quality of their mutual teaching and the performance of the models.
  • the second preset condition includes any of the following: the current training round is odd; the current training round is even; or the remainder of the current training round divided by a first value is greater than a second value, where the second value is less than the first value.
  • in this way, by setting the second preset condition to "the current training round is odd" or "the current training round is even", the image detection model and the preset detection model can be trained in a balanced, one-for-one manner; by setting the second preset condition to "the remainder of the current training round divided by the first value is greater than the second value" (with the second value smaller than the first value), one of the two models can be trained with emphasis, i.e. receive more training rounds than the other.
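The three variants of the second preset condition can be written as one predicate; the parameter names, default values, and mode labels below are assumptions for illustration:

```python
def train_image_model_this_round(round_idx, mode="odd",
                                 first_value=3, second_value=1):
    """Return True if the image detection model should be trained this round.

    mode="odd" / "even": alternate one-for-one between the two models.
    mode="remainder": train the image detection model only when
    round_idx % first_value > second_value, so one model receives more
    rounds than the other (second_value must be less than first_value).
    """
    if mode == "odd":
        return round_idx % 2 == 1
    if mode == "even":
        return round_idx % 2 == 0
    return round_idx % first_value > second_value
```

With `first_value=3` and `second_value=1`, for example, the image detection model is trained only when the remainder is 2, i.e. one round in three.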
  • when training the image detection model, the first preset condition includes: the first predicted probability value is higher than a preset probability threshold; when training the preset detection model, the first preset condition includes: the second predicted probability value is higher than the preset probability threshold.
  • in this way, by requiring the predicted probability value to exceed the preset probability threshold, false-positive positions with a large loss can be screened out as potential missed-label positions and their influence on training ignored, which helps further reduce the impact of missing labels on model performance as much as possible.
  • the preset detection model and the image detection model have the same network structure; and/or, the sample image includes a bone region, and the target object includes a fracture.
  • in this way, the preset detection model used to teach the image detection model has the same network structure as the image detection model, which can help reduce the training difficulty of the image detection model; and by setting the sample image to include bone regions and the target object to include fractures, the image detection model can be applied to fracture detection.
  • the image detection model includes a feature extraction network and a result prediction network
  • the feature extraction network is used to extract the sample feature map of the sample image
  • the result prediction network is used to use the sample feature map to predict and obtain the second prediction information
  • the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range.
  • the image detection model includes a feature extraction network and a result prediction network
  • the feature extraction network is used to extract the sample feature map of the sample image
  • the result prediction network is used to use the sample feature map to predict and obtain the second prediction information
  • the ratio between the resolution of the sample feature map and the resolution of the sample image is within the preset range; that is, the resolution of the sample feature map will not be too large, which reduces the load on the hardware equipment used to train the image detection model, and it will not be too small, so that it can provide enough feature information for the result prediction network, which is conducive to improving prediction accuracy.
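The resolution-ratio constraint can be illustrated as below; the downsampling stride and the preset range are assumed values, not taken from the disclosure:

```python
def downsample_within_range(image_shape, stride, preset_range=(1 / 16, 1 / 4)):
    """Compute the feature-map shape produced by a given downsampling stride
    and check that the per-axis resolution ratio (feature map / image) stays
    inside the preset range: large enough to keep sufficient feature
    information, small enough not to overload the training hardware."""
    ratio = 1.0 / stride  # per-axis resolution ratio after downsampling
    lo, hi = preset_range
    feature_shape = tuple(max(1, s // stride) for s in image_shape)
    return feature_shape, lo <= ratio <= hi
```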
  • the second aspect of the embodiments of the present disclosure provides an image detection method, including: acquiring an image to be detected; and using an image detection model to detect the image to be detected to obtain detection information of a target object in the image to be detected, where the detection information includes the target position of the target object in the image to be detected, and the image detection model is obtained by the image detection model training method of the first aspect above.
  • in this way, the detection information of the target object in the image to be detected is obtained, the detection information includes the target position of the target object in the image to be detected, and the image detection model is obtained by the training method of the first aspect above; since that training reduces the impact of missing labels on model performance as much as possible, it helps improve the accuracy of the detection information of the target object.
  • the image to be detected is a medical image that includes a bone region, and the target object includes a fracture; before using the image detection model to detect the image to be detected to obtain the detection information of the target object, the method further includes: identifying the bone region in the image to be detected, and intercepting several sub-images from the image to be detected, where the set of sub-images covers the bone region; using the image detection model to detect the image to be detected to obtain the detection information of the target object then includes: using the image detection model to detect the several sub-images respectively to obtain the detection information of the target object in each sub-image.
  • in this way, the image to be detected is a medical image that includes a bone region, the target object includes a fracture, several sub-images are intercepted from the image to be detected, and the set of sub-images covers the bone region.
  • the image detection model can then detect the several sub-images respectively to obtain the detection information of the target object in each sub-image. Since a bone region usually appears as a strip-shaped area, detecting only these sub-images, rather than traversing the whole image to be detected (including non-bone regions), can greatly reduce the time required for image detection and improve image detection efficiency.
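The interception of sub-images covering the bone region can be sketched as a 3-D sliding crop over the region's bounding box; the (z, y, x) coordinate convention and the cubic patch size are assumptions:

```python
def crop_covering_region(region_bbox, patch_size, image_shape):
    """Return corner offsets of cubic crops whose union covers the bone
    region's bounding box, clamped so every crop lies inside the image.

    region_bbox: ((z0, y0, x0), (z1, y1, x1)) with exclusive upper corner.
    """
    (z0, y0, x0), (z1, y1, x1) = region_bbox
    crops = []
    for z in range(z0, z1, patch_size):
        for y in range(y0, y1, patch_size):
            for x in range(x0, x1, patch_size):
                # clamp so the crop stays inside the image bounds
                zs = max(0, min(z, image_shape[0] - patch_size))
                ys = max(0, min(y, image_shape[1] - patch_size))
                xs = max(0, min(x, image_shape[2] - patch_size))
                crops.append((zs, ys, xs))
    return crops
```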
  • the third aspect of the present disclosure provides a training device for an image detection model, including: an image acquisition module, a target detection module, an information filtering module, and a parameter adjustment module.
  • the image acquisition module is configured to acquire a sample image, in which several sample positions of the target object are annotated;
  • the target detection module is configured to detect the sample image with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values, and the second prediction information includes several second predicted positions and their second predicted probability values;
  • the information filtering module is configured to filter out of the second prediction information the second predicted positions corresponding to the first reference positions and their second predicted probability values to obtain third prediction information, where a first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies the first preset condition; the parameter adjustment module is configured to adjust network parameters of the image detection model based on the third prediction information.
  • the image detection model is obtained through several rounds of training. When the current training round satisfies the second preset condition, the step of detecting the sample image with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object, and the subsequent steps, are performed to train the image detection model; when the current training round does not satisfy the second preset condition, the preset detection model is trained.
  • the training device for the image detection model further includes a model training module configured to train the preset detection model
  • the model training module includes a detection sub-module configured to detect the sample image with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object
  • the model training module further includes a filtering sub-module configured to filter out of the first prediction information the first predicted positions corresponding to the second reference positions and their first predicted probability values to obtain fourth prediction information, where a second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition
  • the model training module also includes an adjustment sub-module configured to adjust the network parameters of the preset detection model based on the fourth prediction information.
  • the second preset condition includes any of the following: the current training round is odd; the current training round is even; or the remainder of the current training round divided by a first value is greater than a second value, where the second value is less than the first value.
  • when training the image detection model, the first preset condition includes: the first predicted probability value is higher than a preset probability threshold; when training the preset detection model, the first preset condition includes: the second predicted probability value is higher than the preset probability threshold.
  • the preset detection model and the image detection model have the same network structure; and/or, the sample image is a medical image, and the sample image includes a bone region, and the target object includes a fracture.
  • the image detection model includes a feature extraction network and a result prediction network
  • the feature extraction network is used to extract the sample feature map of the sample image
  • the result prediction network is used to use the sample feature map to predict and obtain the second prediction information
  • the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range.
  • the fourth aspect of the embodiments of the present disclosure provides an image detection device, including an image acquisition module and a target detection module. The image acquisition module is configured to acquire an image to be detected; the target detection module is configured to use an image detection model to detect the image to be detected and obtain the detection information of a target object in the image to be detected, where the detection information includes the target position of the target object in the image to be detected, and the image detection model is obtained by the image detection model training device of the third aspect above.
  • the image to be detected is a medical image
  • the image to be detected includes a bone region
  • the target object includes a fracture
  • the image detection device further includes a region identification module configured to identify the bone region in the image to be detected
  • the image detection device further includes an image interception module configured to intercept several sub-images from the image to be detected, where the set of several sub-images covers the bone region
  • the target detection module is further configured to use the image detection model to detect the several sub-images respectively to obtain the detection information of the target object in each sub-image.
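Running the model on each sub-image and mapping its detections back to whole-image coordinates can be sketched as follows; `detect_fn`, the `(z, y, x)` corner offsets, and the returned detection format are assumptions:

```python
def detect_in_subimages(detect_fn, crops, patch_size):
    """Apply a detector to each sub-image and shift its detections into
    whole-image coordinates.

    crops: (z, y, x) corner offsets of the sub-images.
    detect_fn(corner, patch_size) is assumed to return
    [(position, probability), ...] in sub-image coordinates.
    """
    detections = []
    for corner in crops:
        for pos, prob in detect_fn(corner, patch_size):
            # shift sub-image coordinates by the crop's corner offset
            global_pos = tuple(c + p for c, p in zip(corner, pos))
            detections.append((global_pos, prob))
    return detections
```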
  • the fifth aspect of the embodiments of the present disclosure provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute the program instructions stored in the memory to implement the image detection model training method of the first aspect above or the image detection method of the second aspect above.
  • the sixth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which program instructions are stored; when the program instructions are executed by a processor, the image detection model training method of the first aspect above or the image detection method of the second aspect above is implemented.
  • in the above scheme, the first prediction information and the second prediction information of the target object are obtained, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; the second predicted positions corresponding to the first reference positions, together with their second predicted probability values, are filtered out to obtain the third prediction information, where a first reference position does not correspond to any sample position and its first predicted probability value satisfies the first preset condition; and the network parameters of the image detection model are adjusted based on the third prediction information.
  • that is, during training of the image detection model, the first predicted positions produced by the preset detection model can be screened for first reference positions. Because a first predicted position that corresponds to no sample position can be regarded as a false positive, and the predicted probability value is related to the prediction loss, a first reference position can be regarded as a potential missed-label position.
  • the third prediction information is obtained by filtering out the second predicted positions (and their second predicted probability values) corresponding to these potential missed-label positions, so adjusting the network parameters of the image detection model through the third prediction information reduces the impact of missing labels on model performance as much as possible.
  • FIG. 1A is a schematic diagram of a system architecture to which the image detection method of an embodiment of the present disclosure can be applied;
  • FIG. 1B is a schematic flowchart of an embodiment of a training method for an image detection model of the present disclosure;
  • FIG. 2 is a schematic framework diagram of an embodiment of a detection model;
  • FIG. 3 is a schematic diagram of the process of an embodiment of the training method for an image detection model of the present disclosure;
  • FIG. 4 is a schematic flowchart of an embodiment of the image detection method of the present disclosure;
  • FIG. 5 is a schematic diagram of an embodiment of intercepting sub-images;
  • FIG. 6 is a schematic framework diagram of an embodiment of a training device for an image detection model of the present disclosure;
  • FIG. 7 is a schematic framework diagram of an embodiment of an image detection device of the present disclosure;
  • FIG. 8 is a schematic framework diagram of an embodiment of an electronic device of the present disclosure;
  • FIG. 9 is a schematic diagram of an embodiment of a computer-readable storage medium of the present disclosure.
  • the terms "system" and "network" are often used interchangeably herein.
  • the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • the character "/" herein generally indicates that the contextual objects are in an "or" relationship.
  • "many" herein means two or more.
  • Rib fracture is one of the common injuries after the chest is subjected to external force.
  • the diagnosis of rib fracture is very important for the evaluation of trauma severity and treatment.
  • Quantitative measurement of the number of fractures is directly related to injury identification, compensation, and sentencing.
  • Computed Tomography (CT) has higher sensitivity and specificity than plain chest X-ray. Evaluation of CT images is a time-consuming task that requires the patience and experience of clinicians. Therefore, an automatic fracture detection system for chest CT can help clinicians improve the efficiency and accuracy of fracture assessment.
  • a neural network is often used to automatically detect rib fractures.
  • a convolutional neural network is used to train a neural network detection model on the axial slices of CT.
  • the problem is that the appearance of a rib fracture is three-dimensional information, and its changes across consecutive slices are an important basis for diagnosis; using only two-dimensional axial slices limits detection accuracy.
  • a three-dimensional convolutional neural network has also been used to perform sliding-window segmentation on chest CT images to segment the rib fracture area; since the 3D sliding windows cover a large number of non-rib regions, the computation time is long and the accuracy is not high.
  • Neural network-based methods require a large amount of human-labeled training data.
  • the quantity and quality of datasets greatly affect the performance of detection models.
  • due to the sparseness and small size of rib fractures, it is tedious to mark the fractures in 3D CT.
  • labeling 3D bounding boxes or segmentation masks for fractures takes a lot of time. This inevitably leads to the omission of fracture annotations, resulting in datasets with noisy annotations, which in turn impairs model performance.
  • FIG. 1A shows a schematic diagram of a system architecture that can apply the image detection method of the embodiment of the present disclosure
  • the system architecture includes: an image acquisition terminal 11 , a network 12 and an image detection terminal 13 .
  • the image acquisition terminal 11 and the image detection terminal 13 establish a communication connection through the network 12, and the image acquisition terminal 11 reports the acquired sample image to the image detection terminal 13 through the network 12.
  • in response to receiving the sample image, the image detection terminal 13 detects it with the preset detection model and the image detection model respectively to obtain the first prediction information and the second prediction information of the target object.
  • the image detection terminal 13 then filters out of the second prediction information the second predicted positions corresponding to the first reference positions and their second predicted probability values to obtain the third prediction information, and adjusts the network parameters of the image detection model through the third prediction information.
  • the image acquisition terminal 11 may include an image acquisition device, and the image detection terminal 13 may include a vision processing device or a remote server capable of processing visual information.
  • the network 12 can be a wired or wireless connection.
  • the image detection terminal 13 can communicate with the vision processing device through a wired connection, for example performing data communication through a bus; when the image detection terminal 13 includes a remote server, it can exchange data with the remote server through a wireless network.
  • the image detection terminal 13 may be a vision processing device with a video capture module, a mobile phone or glasses with an image capture function, and the like.
  • the image detection method of the embodiment of the present disclosure may also be executed by the image detection terminal 13 alone, in which case the above-mentioned system architecture may not include the network 12 and the image acquisition terminal 11.
  • FIG. 1B is a schematic flowchart of an embodiment of a training method for an image detection model of the present disclosure. The method may include the steps of:
  • Step S11 Obtain a sample image.
  • the target object may exist at several places in the sample image, and at least one of them (for example one, two, or three) may not be annotated, that is, annotations may be missing; of course, all of them may also be annotated, which is not limited here.
  • in a medical application scenario, sample images may include but are not limited to CT images, MR images, etc.; in a traffic application scenario, sample images may include but are not limited to surveillance images captured by surveillance cameras, which is not limited here.
  • Other scenarios can be deduced in the same way, and examples are not given here.
  • the target object can be set according to actual application needs.
  • the sample image may contain a bone region (such as a rib region, a femur region, etc.), and the target object may include a fracture (such as a rib fracture, a femur fracture, etc.);
  • the sample image may contain organ regions (such as the stomach, large intestine, small intestine, etc.), and the target object may include a hematoma (such as a gastric, large-intestine, or small-intestine hematoma);
  • taking the traffic application scenario as an example, where non-motor vehicles need to be detected, the sample image may include road areas, and the target objects may include non-motor vehicles (such as bicycles, electric vehicles, and scooters).
  • Other scenarios can be deduced in the same way, and examples are not given here.
  • the sample position may specifically be the center position of the image area of the target object in the sample image.
  • the image area of the target object can be a square area containing the target object (where the sample image is a three-dimensional image, the square area can be a cuboid area; where the sample image is a two-dimensional image, it can be a rectangular area, which is not limited here), and the sample position may then be the center of that area.
  • the sample position may specifically include a pixel point position belonging to the target object in the sample image.
  • the pixel points belonging to the target object in the sample image can be marked as sample positions, so as to distinguish the pixel points not belonging to the target object in the sample image.
• Step S12: Use the preset detection model and the image detection model to detect the sample image respectively, to obtain the first prediction information and the second prediction information of the target object.
  • the sample image can be a three-dimensional image
  • the image detection model can be a three-dimensional fully convolutional neural network.
• The convolution kernel used to extract image features in the image detection model may be a three-dimensional convolution kernel.
• The size of the three-dimensional convolution kernel may include but is not limited to 3*3*3, 5*5*5, etc., which is not limited here.
  • the image detection model is a three-dimensional fully convolutional neural network, which can capture the spatial information of the three-dimensional image, and is conducive to improving the sensitivity and specificity of the image detection model.
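As an illustration of how the 3*3*3 and 5*5*5 kernel sizes mentioned above behave on volumetric inputs, the output size of a 3-D convolution along each spatial axis follows the standard convolution formula. The sketch below is plain Python written for this edit (not code from the disclosure); it shows that with suitable padding these kernels preserve the volume size.

```python
def conv3d_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a 3-D convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def conv3d_out_shape(shape, kernel, stride=1, padding=0):
    """Apply the formula independently to the three spatial axes (D, H, W)."""
    return tuple(conv3d_out_size(s, kernel, stride, padding) for s in shape)

# A 3*3*3 kernel with padding 1 and stride 1 preserves the volume size,
# while a 5*5*5 kernel needs padding 2 to do the same.
print(conv3d_out_shape((64, 128, 128), kernel=3, stride=1, padding=1))  # (64, 128, 128)
print(conv3d_out_shape((64, 128, 128), kernel=5, stride=1, padding=2))  # (64, 128, 128)
```

The stride and padding values here are illustrative assumptions; the disclosure only specifies the kernel sizes.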
  • the convolution kernel used to extract image features in the preset detection model may also be a three-dimensional convolution kernel, which is not limited here.
  • the size of the three-dimensional convolution kernel may include but not limited to: 3*3*3, 5*5*5, etc., which is not limited here.
  • the preset detection model and the image detection model may have the same network structure.
  • the preset detection model may also be a 3D fully convolutional neural network, and so on for other cases, and no further examples are given here.
  • the preset detection model and the image detection model may also have different network structures, and both the preset detection model and the image detection model have the ability to detect the target object.
  • both the preset detection model and the image detection model may include a feature extraction network for extracting image features and a result prediction network for obtaining prediction information of the target object based on the image features.
  • the image detection model may include a feature extraction network and a result prediction network
  • the feature extraction network is used to extract the sample feature map of the sample image
  • the result prediction network is used to predict the second prediction information by using the sample feature map
  • the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range.
• The preset range can be set to 1/8 to 1/2, or can also be set to 1/6 to 1/3, or 1/7 to 1/2, or 1/5 to 1/3, etc., which is not limited here.
  • the ratio between the resolution of the sample feature map and the resolution of the sample image may be 1/4, 1/5, etc., which is not limited here.
  • the image detection model includes a feature extraction network and a result prediction network
  • the feature extraction network is used to extract the sample feature map of the sample image
  • the result prediction network is used to use the sample feature map to predict and obtain the second prediction information
• The ratio between the resolution of the sample feature map and the resolution of the sample image is within the preset range; that is, the resolution of the sample feature map will not be too large, thereby reducing the load on the hardware device used to train the image detection model, and the resolution of the sample feature map will not be too small, so that it can provide enough feature information for the result prediction network, which is conducive to improving the prediction accuracy.
  • the first prediction information includes several first prediction locations and their first prediction probability values
  • the second prediction information includes several second prediction locations and their second prediction probability values.
• The first predicted probability value and the second predicted probability value represent the likelihood that the corresponding model prediction is accurate.
• The first predicted probability value of a first predicted position may represent the probability that the preset detection model accurately predicts that position, and the second predicted probability value of a second predicted position may represent the probability that the image detection model accurately predicts that position.
  • FIG. 2 is a schematic framework diagram of an embodiment of a detection model.
• The detection model shown in Figure 2 can be either the image detection model or the preset detection model; both models can use the network structure shown in Figure 2, which is not limited here.
• The detection model adopts an encoder-decoder network structure: the encoder part is used to capture context information, and the decoder part is used to precisely locate the target.
  • the detection model starts with a preprocessing layer to preprocess the input image.
• The preprocessing layer may contain a convolutional block of size 5*5*5 (the convolutional block contains a 5*5*5 convolutional layer followed by a batch normalization layer and an activation layer), after which max pooling with a stride of 2 may be performed.
  • the encoder part consists of 4 layers, and each layer contains 3 residual units.
  • the output of each layer is downsampled to half the resolution by a convolutional block with a stride of 2 and a size of 2*2*2.
  • the decoder part consists of 2 layers, each layer contains a transposed convolutional block of size 2*2*2 for upsampling, and the transposed convolutional block is connected after the codec bridge and 3 residual units.
  • the codec bridge contains 2 residual units, which are used to alleviate the semantic divergence between the encoder part and the decoder part.
• The feature map output by the decoder part can be input into the probability prediction branch and the position prediction branch respectively.
• The probability prediction branch may contain 3 convolutional blocks of size 3*3*3 and a convolutional layer with a single output channel, and the position prediction branch may contain 3 convolutional blocks of size 3*3*3 and a convolutional layer with three output channels. That is, the feature map output by the decoder part is used by the probability prediction branch to obtain a prediction probability matrix, and by the position prediction branch to obtain a position prediction matrix.
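Following the structure described above (one stride-2 max pooling in the preprocessing layer, four stride-2 downsampling blocks in the encoder, and two stride-2 upsampling blocks in the decoder), the ratio between the output feature-map resolution and the input resolution can be traced as a quick sanity check. This is an illustrative sketch written for this edit, not code from the disclosure:

```python
from fractions import Fraction

def output_ratio(pre_pool_stride=2, encoder_layers=4, decoder_layers=2):
    """Track the resolution ratio through the detection model described above:
    one stride-2 max pooling, then `encoder_layers` stride-2 downsamplings,
    then `decoder_layers` stride-2 transposed-convolution upsamplings."""
    ratio = Fraction(1, pre_pool_stride)
    for _ in range(encoder_layers):
        ratio /= 2          # each encoder layer halves the resolution
    for _ in range(decoder_layers):
        ratio *= 2          # each decoder layer doubles it back
    return ratio

print(output_ratio())  # 1/8
```

The resulting 1/8 ratio lies within, e.g., the 1/8-to-1/2 preset range mentioned earlier for the sample feature map.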
  • FIG. 3 is a schematic diagram of a process of an embodiment of a training method for an image detection model of the present disclosure. As shown in FIG. 3 , the sample image is respectively detected by the preset detection model M2 and the image detection model M1 to obtain the first prediction information and the second prediction information.
• The white-filled circles and the white-filled X marks in the first prediction information both indicate first predicted positions. A first predicted position shown by a white-filled circle corresponds to a sample position; that is, the preset detection model M2 also predicts the target object at a sample position marked in the sample image. A first predicted position shown by a white-filled X does not correspond to any sample position; that is, the preset detection model M2 predicts the target object in an area of the sample image where no sample position is marked. A first predicted position that does not correspond to any sample position can be regarded as a false positive (False Positive, FP) target.
• When a predicted position corresponds to a sample position, it may mean that the predicted position and the sample position are at the same position, or that the distance between them does not exceed a preset distance (e.g., 2 pixels, 3 pixels, etc.).
• When the sample position represents the center position of the image area where the target object exists in the sample image, a predicted position corresponding to the sample position may mean that the distance between them does not exceed a preset distance, or that they are at the same position; when the sample position includes the positions of the pixel points belonging to the target object in the sample image, a predicted position corresponding to the sample position may mean that they are at the same position, which is not limited here.
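The correspondence rule above can be sketched as a small matching helper; the coordinates and the 2-pixel tolerance below are illustrative assumptions, not values fixed by the disclosure:

```python
import math

def matches_sample(pred, samples, max_dist=2.0):
    """Return True if `pred` corresponds to some sample position, i.e. lies
    within `max_dist` (in pixels/voxels) of it; max_dist=0 demands identity."""
    return any(math.dist(pred, s) <= max_dist for s in samples)

samples = [(10, 10, 10), (40, 25, 8)]
print(matches_sample((11, 10, 10), samples, max_dist=2.0))  # True: one voxel away
print(matches_sample((20, 20, 20), samples, max_dist=2.0))  # False: far from all samples
print(matches_sample((10, 10, 10), samples, max_dist=0.0))  # True: exact match
```

A predicted position for which this returns False is the kind of unmatched prediction treated as a false positive target below.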
• Step S13: Filter out the second predicted positions corresponding to the first reference positions and their second predicted probability values in the second prediction information, to obtain the third prediction information.
  • the first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies a first preset condition.
• As mentioned above, a predicted position corresponding to a sample position may mean that they are at the same position, or that the distance between them does not exceed a preset distance. Accordingly, in the former case, a first reference position not corresponding to any sample position means that it is not at the same position as any sample position; in the latter case, it means that its distance from every sample position exceeds the preset distance.
  • the first preset condition may include that the first predicted probability value is higher than a preset probability threshold, and the preset probability threshold may be set to 0.5, 0.6, 0.7, etc. according to actual conditions, which is not limited herein.
• The prediction loss of a first predicted position that does not correspond to any sample position can also be calculated based on its first predicted probability value, and the prediction loss is positively correlated with the first predicted probability value; that is, the larger the first predicted probability value of a first predicted position that does not correspond to any sample position, the greater its prediction loss.
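The positive correlation between the predicted probability and the loss of an unmatched (false positive) position can be illustrated with the negative-class term of binary cross-entropy. The disclosure does not specify the exact loss (its notation suggests a focal-style loss), so this is only an assumed stand-in that shows the monotonic relationship:

```python
import math

def fp_loss(p, eps=1e-7):
    """Loss of a predicted position that matches no annotation, treated as a
    negative sample under binary cross-entropy: -log(1 - p).
    Grows monotonically with the predicted probability p."""
    return -math.log(max(1.0 - p, eps))

probs = [0.1, 0.5, 0.9, 0.99]
losses = [fp_loss(p) for p in probs]
print([round(l, 2) for l in losses])  # increasing: [0.11, 0.69, 2.3, 4.61]
```

The higher the confidence with which the model asserts an unlabeled position, the larger this loss, which is what makes loss a useful signal for spotting potential missed labels.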
• Specifically, a preset loss threshold θ can be set: a first predicted position that does not correspond to any sample position is regarded as a false positive target, and a false positive target whose prediction loss is greater than the preset loss threshold θ is regarded as a potential missed-label position. For the setting of the preset loss threshold θ, reference may be made to the calculation process of the reference loss value described below, which will not be repeated here.
• As shown in FIG. 3, the first predicted positions shown by the white-filled X marks do not correspond to any sample position; that is, they can be regarded as false positive targets, and for convenience of distinction can be recorded as fp1, fp2, fp3 and fp4 respectively.
• Suppose the prediction loss L(fp1) of the false positive target fp1 is calculated as 7.8, the prediction loss L(fp2) of fp2 is 0.4, the prediction loss L(fp3) of fp3 is 5.2, and the prediction loss L(fp4) of fp4 is 2.2. If only L(fp1) and L(fp3) are greater than the preset loss threshold θ, then both the false positive target fp1 and the false positive target fp3 can be regarded as potential missed-label positions, that is, the first reference positions described in the embodiments of the present disclosure.
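The thresholding step in this example can be sketched directly; θ = 3.0 below is an assumed value (any θ between 2.2 and 5.2 selects the same set), since the disclosure derives θ from the reference loss value rather than fixing it:

```python
def potential_missed_labels(fp_losses, theta):
    """Keep the false positive targets whose prediction loss exceeds the
    preset loss threshold theta; these are treated as potential missed labels."""
    return {name for name, loss in fp_losses.items() if loss > theta}

fp_losses = {"fp1": 7.8, "fp2": 0.4, "fp3": 5.2, "fp4": 2.2}
theta = 3.0  # assumed value; any theta in (2.2, 5.2) selects the same set
print(sorted(potential_missed_labels(fp_losses, theta)))  # ['fp1', 'fp3']
```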
• The third prediction information is obtained by filtering out, from the second prediction information, the second predicted positions corresponding to the first reference positions and their second predicted probability values; in the subsequent training process, the network parameters of the image detection model are adjusted based on the third prediction information, so that the influence of potential missed-label positions on training the image detection model can be ignored as much as possible.
• For example, both the false positive target fp1 and the false positive target fp3 are regarded as potential missed-label positions, that is, the first reference positions described in the embodiments of the present disclosure. As shown in Figure 3, the second prediction information contains a second predicted position corresponding to the first reference position fp3 (that is, the white-filled X position in the second prediction information output by the forward propagation of the image detection model M1 in Figure 3); this second predicted position and its second predicted probability value are filtered out to obtain the third prediction information.
  • Other situations can be deduced by analogy, and no more examples will be given here.
• In an implementation scenario, a first filtering mechanism can be further introduced to select, from the first predicted positions that do not correspond to any sample position and whose first predicted probability values satisfy the first preset condition, those that meet a first filter condition as the first reference positions.
• The first filter condition may include: the first predicted probability value ranks within the top preset number when the first predicted positions are sorted by first predicted probability value from high to low.
  • the preset value can be set to be positively correlated with training rounds.
• The first reference position set FP_ex can be expressed as:
• FP_ex = argmax_{|FP| = n} L_focal(FP) ......(1)
• where n represents the number of first reference positions in FP_ex, and L_focal(FP) represents the total loss value of the n first predicted positions; the loss value is positively correlated with the first predicted probability value.
• That is, the first predicted positions ranking in the top n when sorted by first predicted probability value from high to low can be selected as the first reference positions.
• The preset value n can be expressed as:
• n = α(e) × batchsize ......(2)
• where e represents the training round, batchsize represents the batch sample size during batch training (that is, the number of samples input for one training), and α(e) is used to control the preset value. For example, in the initial training stage, α can be 0 so that the model learns common patterns, and as the training rounds increase, α can be gradually increased.
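The schedule n = α(e) × batchsize can be sketched as follows; the warm-up length, ramp, and α ceiling below are assumptions for illustration, since the disclosure only states that α starts at 0 and grows with the training round:

```python
def preset_value_n(epoch, batchsize, warmup=5, alpha_max=0.1):
    """Assumed ramp for alpha(e) in n = alpha(e) * batchsize: zero during the
    first `warmup` rounds so the model learns common patterns first, then
    growing linearly up to `alpha_max`."""
    alpha = 0.0 if epoch < warmup else min(alpha_max, alpha_max * (epoch - warmup) / 20)
    return int(alpha * batchsize)

# Early rounds filter nothing; later rounds filter progressively more positions.
print([preset_value_n(e, batchsize=64) for e in (0, 4, 10, 25, 60)])  # [0, 0, 1, 6, 6]
```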
• The above-mentioned first filtering mechanism performs better when the batch sample size is large; however, in practice the batch sample size is usually small.
• In view of this, a second filtering mechanism can also be introduced. Specifically, for the current training round and several previous training rounds, the following is executed: obtain the loss values of the first predicted positions that do not correspond to any sample position and whose first predicted probability values satisfy the first preset condition; from these loss values of the current training round and the several previous training rounds, select the loss value meeting a second filter condition as the reference loss value; and in the current training round, from the first predicted positions that do not correspond to any sample position and whose first predicted probability values satisfy the first preset condition, select those whose loss value is greater than the reference loss value as the first reference positions.
• The second filter condition may include: the loss value ranks at the preset ordinal value when the loss values are sorted from high to low.
• In this case, the first reference position set FP_ex can be expressed as:
• FP_ex = { fp_i | L_focal(fp_i) > θ } ......(3)
• where L_focal(fp_i) represents the loss value of the i-th first predicted position among the first predicted positions that do not correspond to any sample position and whose first predicted probability value satisfies the first preset condition, and θ represents the reference loss value.
• The reference loss value θ can be expressed as:
• θ = max_n L_focal(FP_h) ......(4)
• where FP_h represents the first predicted positions, predicted in the current training round and the several previous training rounds, that do not correspond to any sample position and whose first predicted probability values satisfy the first preset condition, n represents the preset value, and max_n denotes taking the n-th largest loss value among the positions in FP_h.
• FP_h can be expressed as:
• FP_h = FP^(e) ∪ FP^(e−1) ∪ … ∪ FP^(e−k+1) ......(5)
• where k represents the total number of rounds covered by the current training round and the several previous training rounds, and FP^(j) denotes the above first predicted positions obtained in round j.
• The preset value n can be positively correlated with the total number of rounds k and with the training round, and can be expressed as:
• n = α(e) × batchsize × k ......(6)
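The second filtering mechanism can be sketched end to end; the pooled losses, the current-round losses, and the value of n below are illustrative assumptions chosen to mirror the fp1–fp4 example above:

```python
def select_first_reference_positions(history_losses, current_losses, n):
    """Second filtering mechanism (sketch): pool the losses of unmatched,
    high-probability predictions from the current and previous rounds
    (the FP_h pool), take the n-th largest pooled loss as the reference loss
    value theta, and keep the current-round positions whose loss exceeds theta."""
    pool = sorted(history_losses, reverse=True)
    if not pool:
        return []
    theta = pool[min(n, len(pool)) - 1]  # n-th largest loss in the pool
    return [name for name, loss in current_losses if loss > theta]

history = [7.8, 0.4, 5.2, 2.2, 3.1, 6.0, 1.5]   # losses pooled over k rounds
current = [("fp1", 7.8), ("fp2", 0.4), ("fp3", 5.2), ("fp4", 2.2)]
print(select_first_reference_positions(history, current, n=4))  # ['fp1', 'fp3']
```

With n = 4 the fourth-largest pooled loss (3.1) becomes θ, so fp1 and fp3 are selected, matching the earlier example; a smaller n raises θ and selects fewer positions.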
• Step S14: Adjust the network parameters of the image detection model based on the third prediction information.
  • the prediction loss of the image detection model may be calculated based on the third prediction information, and the network parameters of the image detection model may be adjusted based on the prediction loss.
• Loss functions such as the L1 loss function and the L2 loss function can be used to calculate the prediction loss; for the specific calculation process, please refer to the relevant formulas of the L1 and L2 loss functions, which will not be repeated here.
• On this basis, gradient backpropagation can be performed and the network parameters of the image detection model optimized through gradient descent, while the gradient contribution of the above-mentioned potential missed-label positions is ignored.
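Ignoring the gradient contribution of a filtered position amounts to excluding its term from the loss before backpropagation. A minimal sketch with an L1-style loss (the index set and values below are hypothetical, not from the disclosure):

```python
def masked_l1_loss(preds, targets, ignore):
    """Average L1 loss over predicted positions, skipping the indices that were
    filtered out as potential missed labels so they contribute no gradient."""
    terms = [abs(p - t) for i, (p, t) in enumerate(zip(preds, targets))
             if i not in ignore]
    return sum(terms) / len(terms) if terms else 0.0

preds   = [0.9, 0.2, 0.8, 0.4]
targets = [1.0, 0.0, 0.0, 0.0]
print(masked_l1_loss(preds, targets, ignore=set()))   # all four terms count
print(masked_l1_loss(preds, targets, ignore={2}))     # position 2 is ignored
```

In an autodiff framework the same effect is obtained by masking the per-position loss tensor before the backward pass, so the ignored positions produce zero gradient.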
• In an implementation scenario, the preset detection model can be pre-trained on a conventional target detection task, and the pre-trained preset detection model can then be fine-tuned by using a small number of sample images without missed labels.
• On the one hand, through pre-training, the preset detection model can be adapted to the target detection task of the sample images; on the other hand, through fine-tuning on a small number of sample images without missed labels, the preset detection model can attain a certain detection accuracy. After that, the steps in the embodiments of the present disclosure can be executed to train the image detection model.
• The pre-training step on the conventional target detection task may include: obtaining conventional medical images, wherein the actual positions of a medical object are marked in the conventional medical images, and the medical object may be different from the target object. For example, in the case that the target object includes rib fractures, the medical object may include femur fractures; other cases can be deduced by analogy, and no more examples are given here.
  • the number of routine medical images can be as large as possible, for example, thousands of routine medical images can be collected.
• During pre-training, the preset detection model can be used to detect the conventional medical images to obtain the predicted positions of the medical object; based on the difference between the predicted positions and the actual positions, the network parameters of the preset detection model can be adjusted, and the above step of using the preset detection model to detect the conventional medical images and the subsequent steps can be re-performed until the preset detection model converges on the conventional target detection task.
• Convergence may mean, for example, that the prediction loss calculated based on the predicted positions and the actual positions is smaller than a preset loss threshold.
• After pre-training, a small number of sample images can be used for transfer learning, where the sample positions of the target object are marked in the sample images and no target objects are left unmarked in these sample images.
• Specifically, the pre-trained preset detection model can be used to detect the sample images to obtain the predicted positions of the target object; based on the difference between the sample positions and the predicted positions, the network parameters of the preset detection model can be adjusted again, and the above step of using the pre-trained preset detection model to detect the sample images and the subsequent steps can be re-performed until the preset detection model converges on the small number of sample images.
• In the embodiments of the present disclosure, the image detection model can be obtained through several rounds of training: when the current training round satisfies the second preset condition, the above step S12 and the subsequent steps are executed to train the image detection model in the current round, and when the current training round does not satisfy the second preset condition, the preset detection model is trained.
• That is, during the training of the image detection model, the preset detection model is also trained alternately, so the image detection model and the preset detection model can teach each other during the training process; compared with training the image detection model alone, this helps reduce the cumulative error, so as to further reduce the impact of missed labels on model performance as much as possible.
  • the alternate training strategy of the preset detection model and the image detection model can be adjusted by setting the above-mentioned second preset condition.
• For example, the second preset condition may include that the current training round is an odd number; that is, when the current training round is odd, the image detection model is trained, and when it is even, the preset detection model is trained. Alternatively, the second preset condition may include that the current training round is an even number; that is, when the current training round is even, the image detection model is trained, and when it is odd, the preset detection model is trained.
  • the image detection model and the preset detection model can be trained in a balanced manner according to a one-by-one training strategy.
• The second preset condition may also be set to include: the remainder of dividing the current training round by a first value is greater than a second value, where the second value is smaller than the first value. Taking the first value equal to 4 and the second value equal to 0 as an example, when the remainder of the current training round divided by 4 is greater than 0 (that is, when the current training round is 1, 2, 3, 5, 6, 7, 9, 10, 11, etc.), the image detection model is trained; when the remainder is not greater than 0 (that is, when the current training round is 4, 8, 12, etc., an integer multiple of 4), the preset detection model is trained. Other situations can be deduced by analogy, and no more examples are given here.
• In this way, the image detection model and the preset detection model can be trained according to a many-to-one (or one-to-many) strategy, so that one of the two models is trained with more emphasis.
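The remainder-based alternation rule can be sketched as a tiny scheduler; M1/M2 follow the naming in FIG. 3, and the values first=4, second=0 reproduce the example above:

```python
def trains_image_model(epoch, first=4, second=0):
    """Second preset condition (sketch): train the image detection model when
    (epoch % first) > second, otherwise train the preset detection model."""
    return epoch % first > second

schedule = ["M1" if trains_image_model(e) else "M2" for e in range(1, 13)]
print(schedule)
# M1 (image model) in rounds 1,2,3,5,6,7,9,10,11; M2 (preset model) in 4,8,12
```

Setting first=2, second=0 recovers the simple odd/even alternation mentioned earlier.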
• In the case where the preset detection model needs to be trained, the above step S12 can also be performed first; that is, the preset detection model and the image detection model are used to perform target detection on the sample image respectively to obtain the first prediction information and the second prediction information of the target object. Then, the first predicted positions corresponding to the second reference positions and their first predicted probability values are filtered out of the first prediction information to obtain fourth prediction information, where a second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition. Based on the fourth prediction information, the network parameters of the preset detection model are adjusted.
  • the above-mentioned first preset condition includes that the second predicted probability value is higher than the preset probability threshold.
• That is, when the preset detection model is trained, the steps of using the preset detection model and the image detection model to perform target detection on the sample image to obtain the first prediction information and the second prediction information are also performed first; then the first predicted positions corresponding to the second reference positions and their first predicted probability values are filtered out of the first prediction information to obtain the fourth prediction information, where a second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition; and the network parameters of the preset detection model are adjusted based on the fourth prediction information. In other words, during the training of the preset detection model, the second reference positions can be screened out of the second predicted positions predicted by the image detection model. Since a second predicted position that does not correspond to any sample position can be regarded as a false positive position, and the predicted probability value is related to the prediction loss, the second reference position can be regarded as a potential missed-label position. The fourth prediction information is obtained by filtering out such positions and their second predicted probability values, and adjusting the network parameters of the preset detection model through the fourth prediction information can reduce the impact of missed labels on model performance as much as possible. Because both models can thus reduce the impact of missed labels on themselves, in the process of alternating training between the preset detection model and the image detection model, the quality of mutual teaching between the two is improved, which improves model performance.
  • the first filtering mechanism or the second filtering mechanism described in the foregoing disclosed embodiments may also be used.
• When the first filtering mechanism is used, the second predicted positions that meet the first filter condition can be further selected, from the second predicted positions that do not correspond to any sample position and whose second predicted probability values satisfy the first preset condition, as the second reference positions.
• Here, the first filter condition may include: the second predicted probability value ranks within the top preset number when the second predicted positions are sorted from high to low. When the second filtering mechanism is used, the following can be executed for the current training round and several previous training rounds: obtain the loss values of the second predicted positions that do not correspond to any sample position and whose second predicted probability values satisfy the first preset condition; from these loss values of the current training round and the several previous training rounds, select the loss value meeting the second filter condition as the reference loss value; and in the current training round, from the second predicted positions that do not correspond to any sample position and whose second predicted probability values satisfy the first preset condition, select those whose loss value is greater than the reference loss value as the second reference positions.
• The second filter condition may include: the loss value ranks at the preset ordinal value when the loss values are sorted from high to low. For details, reference may be made to the foregoing related descriptions, and details are not repeated here.
• Thus, when training the image detection model M1, the preset detection model M2 can select false positive (FP) targets with large prediction losses as potential missed-label positions, and the contribution of these positions to the gradient update of the image detection model M1 is ignored; conversely, when training the preset detection model M2, the image detection model M1 can select false positive targets with large prediction losses as potential missed-label positions, and the contribution of these positions to the gradient update of the preset detection model M2 is ignored.
• In the above manner, the memorization characteristics of neural networks are effectively used: a network fits common patterns before fitting specific samples, and a missed-label position yields a large loss in the early stage of training because it does not conform to the common patterns. By ignoring its contribution to the gradient update, its influence can be eliminated, so that the network can focus on learning the common patterns; moreover, the mutual teaching during the alternating training of the two models can effectively reduce the cumulative error that training a single model may cause.
• In some disclosed embodiments, the sample image may include a bone region. Since a bone region usually appears as a long strip (such as a rib) in the sample image, directly detecting the whole sample image means that many of the image areas traversed during detection may not be bone areas, which slows down detection. In view of this, in order to improve detection efficiency, after the sample image is acquired, the bone region in the sample image can be identified first, and several sample sub-images can be intercepted from the sample image such that the set of sample sub-images covers the bone region.
• In this way, when the preset detection model and the image detection model are used to detect the target in the sample image, they can be used to detect the several sample sub-images, so that target detection is performed only on the bone region, which can greatly improve detection efficiency.
• For the manners of identifying the rib region and intercepting the sample sub-images, reference may be made to relevant descriptions in the following disclosed embodiments, which will not be repeated here.
  • By detecting the sample image with the preset detection model and the image detection model respectively, the first prediction information and the second prediction information of the target object are obtained; the first prediction information includes several first predicted positions and their first predicted probability values, and the second prediction information includes several second predicted positions and their second predicted probability values.
  • The second predicted positions and their second predicted probability values that correspond to a first reference position are filtered out of the second prediction information to obtain the third prediction information, where a first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies the first preset condition; the network parameters of the image detection model are then adjusted based on the third prediction information. That is, during training of the image detection model, the first predicted positions predicted by the preset detection model can be screened for first reference positions that correspond to no sample position and whose first predicted probability value satisfies the first preset condition: a first predicted position that corresponds to no sample position can be regarded as a false-positive position, and since the predicted probability value is related to the prediction loss, such a first reference position can be regarded as a potential missed-label position.
  • The third prediction information is thus obtained by filtering out of the second prediction information the second predicted positions and their second predicted probability values that correspond to potential missed-label positions, and adjusting the network parameters of the image detection model with the third prediction information can reduce the impact of missed labels on model performance as much as possible.
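The screening-and-filtering step described above can be sketched as follows. This is a minimal illustration only: the disclosure does not specify how predicted positions are matched to annotated sample positions, so a Euclidean distance test is assumed here, and all names and values (`find_reference_positions`, `prob_threshold=0.8`, `match_tolerance=4.0`) are hypothetical.

```python
import numpy as np

def find_reference_positions(pred_positions, pred_probs, sample_positions,
                             prob_threshold=0.8, match_tolerance=4.0):
    """Return predicted positions that match no annotated sample position and
    have a high predicted probability: potential missed-label positions."""
    refs = []
    for pos, prob in zip(pred_positions, pred_probs):
        matched = any(
            np.linalg.norm(np.asarray(pos) - np.asarray(s)) <= match_tolerance
            for s in sample_positions)
        if not matched and prob > prob_threshold:
            refs.append(pos)
    return refs

def filter_predictions(positions, probs, reference_positions, match_tolerance=4.0):
    """Drop predictions that correspond to any reference (missed-label)
    position, so they contribute nothing to the loss or gradient update."""
    kept_positions, kept_probs = [], []
    for pos, prob in zip(positions, probs):
        near_ref = any(
            np.linalg.norm(np.asarray(pos) - np.asarray(r)) <= match_tolerance
            for r in reference_positions)
        if not near_ref:
            kept_positions.append(pos)
            kept_probs.append(prob)
    return kept_positions, kept_probs
```

During a round that trains the image detection model, the first function would be applied to the preset model's predictions and the second to the image detection model's predictions; the filtered (third) prediction information then drives the loss, so potential missed-label positions contribute nothing to the gradient update.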
  • FIG. 4 is a schematic flowchart of an embodiment of an image detection method of the present disclosure. As shown in FIG. 4, the embodiment of the present disclosure may include the following steps:
  • Step S41: Obtain an image to be detected.
  • the image to be detected may be a medical image, including but not limited to CT images, MR images, etc., which is not limited here.
  • Other application scenarios can be deduced in the same way, so we will not give examples one by one here.
  • the target objects contained in the image to be detected may include but are not limited to fractures, hematomas, etc., which is not limited here.
  • Other application scenarios can be deduced in the same way, so we will not give examples one by one here.
  • Step S42: Use the image detection model to detect the image to be detected, and obtain the detection information of the target object in the image to be detected.
  • the detection information includes the target position of the target object in the image to be detected, and the image detection model is obtained by using the steps in any of the above embodiments of the image detection model training method.
  • the target position may be the center of the image area containing the target object in the image to be detected.
  • when the image to be detected is a two-dimensional image, the image area containing the target object can be a rectangular area and the target position the center of that rectangle; or, when the image to be detected is a three-dimensional image, the image area containing the target object can be a cuboid area and the target position the center of that cuboid.
  • taking the image to be detected being a medical image as an example, in the case that the image to be detected includes a bone region and the target object includes a fracture, before the image detection model is used for detection, the bone region in the image to be detected can first be identified, and several sub-images intercepted from the image to be detected such that the set of sub-images covers the bone region.
  • the image detection model is used to detect several sub-images respectively, and the detection information of the target object in each sub-image is obtained.
  • the detection information of the target object in a sub-image may include the target position of a fracture, or may include no target position, which means there is no fracture in that sub-image.
  • therefore, the image to be detected is a medical image that includes a bone region, and the target object includes a fracture. Before the image detection model is used for detection, the bone region in the image to be detected is first identified, and several sub-images are intercepted from the image to be detected such that the set of sub-images covers the bone region; on this basis, the image detection model can be used to detect the several sub-images respectively and obtain the detection information of the target object in each sub-image. Because the bone region usually appears as a strip-shaped area, detecting only the several sub-images, instead of traversing the whole image to be detected (which may traverse non-bone regions), can greatly reduce the time required for image detection and improve detection efficiency.
  • an image recognition model for identifying the bone region may be pre-trained; the image recognition model may include but is not limited to U-Net, etc., which is not limited here.
  • the image to be detected can be identified by using the image recognition model to obtain the bone region in the image to be detected.
  • alternatively, the image to be detected can first be binarized. Because the density of the bone region differs from that of soft tissue, bone appears at different gray levels in the image to be detected, so a grayscale threshold can be preset to distinguish bone from non-bone regions. After that, to improve the accuracy of the bone region, the identified region can be denoised based on the shape characteristics of bone (such as long strips with a certain curvature) to obtain the final bone region.
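The grayscale-threshold route can be sketched as below. The threshold value and the neighbor-based cleanup are assumptions for illustration; a real pipeline would use morphological denoising tuned to the strip-like shape of bone.

```python
import numpy as np

def bone_mask(image, threshold=200):
    """Binarize a CT-like image: pixels at or above the preset grayscale
    threshold are treated as candidate bone."""
    return image >= threshold

def remove_isolated_pixels(mask):
    """Toy denoising step: clear pixels with no 4-connected neighbor in the
    mask, a stand-in for shape-based cleanup (bone appears as connected
    strips, so isolated responses are likely noise)."""
    padded = np.pad(mask, 1)  # pads with False
    neighbors = (padded[:-2, 1:-1] | padded[2:, 1:-1]
                 | padded[1:-1, :-2] | padded[1:-1, 2:])
    return mask & neighbors
```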
  • the size of each sub-image can be the same; for example, for a two-dimensional image, the size of a sub-image may be 16*16, 32*32, or 48*48, etc.; for a three-dimensional image, the size of a sub-image may be 16*16*16, 32*32*32, or 48*48*48, etc., which is not limited here.
  • FIG. 5 is a schematic diagram of an embodiment of intercepting a sub-image.
  • the bar area represents the rib area, and from left to right are shown the view of the anterior rib area, the view of the middle rib area, and the view of the posterior rib area.
  • the embodiment of the present disclosure first identifies the rib area and, based on the identified rib area, intercepts several sub-images from the image to be detected such that the set of sub-images covers the rib area, so that the number of intercepted sub-images can be greatly reduced.
  • for example, 4 sub-images can be intercepted in the anterior rib area (shown as cubes in FIG. 5), 4 sub-images in the middle rib area (shown as cubes in FIG. 5), and 5 sub-images in the posterior rib area (shown as cubes in FIG. 5).
  • Other situations can be deduced by analogy, and no more examples will be given here.
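The interception can be sketched as tiling the bounding box of the identified bone region with fixed-size patches; the patch size and the clamp-at-the-border behavior are assumptions of this sketch, not details given in the disclosure.

```python
import numpy as np

def tile_region(mask, patch=16):
    """Return (row, col) corners of patch*patch sub-images whose union covers
    the bounding box of the True pixels in `mask`."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return []
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    corners = []
    for r in range(r0, r1, patch):
        for c in range(c0, c1, patch):
            # Clamp so patches near the border stay inside the image.
            rr = max(min(r, mask.shape[0] - patch), 0)
            cc = max(min(c, mask.shape[1] - patch), 0)
            corners.append((rr, cc))
    return corners
```

Each returned corner defines one sub-image `image[r:r+patch, c:c+patch]` to feed to the detection model; for a three-dimensional image the same idea applies with a third axis.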
  • therefore, the detection information of the target object in the image to be detected is obtained, and the detection information includes the target position of the target object in the image to be detected; since the image detection model is obtained by using the training method of the image detection model in the first aspect above, i.e., the impact of missed labels on model performance can be reduced as much as possible during training, this helps improve the accuracy of the detection information of the target object.
  • FIG. 6 is a schematic diagram of an embodiment of a training device 60 for an image detection model in the present disclosure.
  • the training device 60 of the image detection model includes: an image acquisition module 61, a target detection module 62, an information filtering module 63 and a parameter adjustment module 64. The image acquisition module 61 is configured to acquire a sample image, in which several sample positions of a target object are annotated; the target detection module 62 is configured to use a preset detection model and an image detection model to detect the sample image respectively and obtain the first prediction information and the second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; the information filtering module 63 is configured to filter the second predicted positions and second predicted probability values corresponding to a first reference position out of the second prediction information to obtain the third prediction information, where the first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies the first preset condition; the parameter adjustment module 64 is configured to adjust the network parameters of the image detection model based on the third prediction information.
  • therefore, by acquiring a sample image in which several sample positions of the target object are annotated, and detecting it with the preset detection model and the image detection model respectively, the first prediction information and the second prediction information of the target object are obtained, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values. On this basis, the second predicted positions and second predicted probability values corresponding to a first reference position are filtered out of the second prediction information to obtain the third prediction information, where the first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies the first preset condition, and the network parameters of the image detection model are adjusted based on the third prediction information. That is, during training of the image detection model, the first predicted positions predicted by the preset detection model can be screened for first reference positions that correspond to no sample position and whose first predicted probability value satisfies the first preset condition: a first predicted position corresponding to no sample position can be regarded as a false-positive position, and since the predicted probability value is related to the prediction loss, the first reference position can be regarded as a potential missed-label position. The third prediction information is obtained by filtering the second predicted positions and second predicted probability values corresponding to potential missed-label positions out of the second prediction information, and adjusting the network parameters of the image detection model with the third prediction information can reduce the impact of missed labels on model performance as much as possible.
  • the image detection model is obtained through several rounds of training: when the current training round satisfies the second preset condition, the step of using the preset detection model and the image detection model to detect the sample image respectively to obtain the first prediction information and the second prediction information of the target object, together with the subsequent steps, is performed to train the image detection model; when the current training round does not satisfy the second preset condition, the preset detection model is trained.
  • therefore, the image detection model is trained over several rounds: the image detection model is trained when the current round satisfies the second preset condition, and the preset detection model is trained when it does not. That is, during training of the image detection model, the preset detection model is also trained alternately, so the two models can teach each other during training; compared with training the image detection model alone, this helps reduce cumulative error and thus further reduce the impact of missed labels on model performance as much as possible.
  • the image detection model training device 60 further includes a model training module configured to train the preset detection model, and the model training module includes a detection sub-module configured to use the preset detection model and the image detection model to detect the sample image respectively and obtain the first prediction information and the second prediction information of the target object.
  • the model training module also includes a filtering sub-module configured to filter the first predicted positions and first predicted probability values corresponding to a second reference position out of the first prediction information to obtain the fourth prediction information, where the second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition; the model training module also includes an adjustment sub-module configured to adjust the network parameters of the preset detection model based on the fourth prediction information.
  • therefore, during training of the preset detection model, the step of using the preset detection model and the image detection model to detect the sample image respectively and obtain the first prediction information and the second prediction information of the target object is likewise performed first. On this basis, the first predicted positions and first predicted probability values corresponding to a second reference position are filtered out of the first prediction information to obtain the fourth prediction information, where the second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition, and the network parameters of the preset detection model are adjusted based on the fourth prediction information. That is, during training of the preset detection model, the second predicted positions predicted by the image detection model can be screened for second reference positions that correspond to no sample position and whose second predicted probability value satisfies the first preset condition: a second predicted position corresponding to no sample position can be regarded as a false-positive position, and since the predicted probability value is related to the prediction loss, the second reference position can be regarded as a potential missed-label position. The fourth prediction information obtained in this way is used to adjust the network parameters of the preset detection model, which can reduce the impact of missed labels on model performance as much as possible; and because both models can reduce the impact of missed labels on their own performance, the alternating training of the preset detection model and the image detection model helps improve the quality of their mutual teaching and the performance of the models.
  • the second preset condition includes any of the following: the current training round is odd; the current training round is even; the remainder of the current training round divided by a first value is greater than a second value, where the second value is less than the first value.
  • therefore, by setting the second preset condition to the current training round being odd or even, the image detection model and the preset detection model can be trained in a balanced, one-round-each manner; by setting the second preset condition to the remainder of the current training round divided by the first value being greater than the second value, with the second value smaller than the first value, the image detection model and the preset detection model can be trained with different emphasis according to a many-rounds-to-one (or one-round-to-many) training strategy.
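The round-scheduling conditions above can be expressed as a single predicate; the function name and the example first/second values are illustrative only.

```python
def trains_image_model(round_idx, mode="odd", first_value=3, second_value=0):
    """Decide whether the current round trains the image detection model
    (True) or the preset detection model (False)."""
    if mode == "odd":
        return round_idx % 2 == 1
    if mode == "even":
        return round_idx % 2 == 0
    # Remainder strategy: train the image model whenever the remainder of
    # the round divided by first_value exceeds second_value
    # (with second_value < first_value).
    return round_idx % first_value > second_value
```

With `first_value=3, second_value=0`, every round whose index is not a multiple of 3 trains the image detection model, giving a two-rounds-to-one emphasis over the preset detection model.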
  • in the case of training the image detection model, the first preset condition includes: the first predicted probability value is higher than a preset probability threshold; in the case of training the preset detection model, the first preset condition includes: the second predicted probability value is higher than the preset probability threshold.
  • therefore, by setting the first preset condition to the first predicted probability value being higher than the preset probability threshold when training the image detection model, and to the second predicted probability value being higher than the preset probability threshold when training the preset detection model, false-positive positions with large losses can be screened out as potential missed-label positions during the training of both models and their influence on training ignored, which helps further reduce the impact of missed labels on model performance as much as possible.
  • the preset detection model and the image detection model have the same network structure; and/or, the sample image is a medical image that includes a bone region, and the target object includes a fracture.
  • therefore, the preset detection model and the image detection model have the same network structure, i.e., the preset detection model used to teach the image detection model shares the image detection model's network structure, which helps reduce the training difficulty of the image detection model; and by setting the sample image to include a bone region and the target object to include a fracture, the image detection model can be applied to fracture detection.
  • the image detection model includes a feature extraction network and a result prediction network: the feature extraction network is used to extract a sample feature map of the sample image, the result prediction network is used to predict the second prediction information from the sample feature map, and the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range.
  • therefore, the image detection model includes a feature extraction network and a result prediction network, where the feature extraction network extracts the sample feature map of the sample image and the result prediction network predicts the second prediction information from it, and the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range. That is, the resolution of the sample feature map will not be too large, which reduces the load on the hardware used to train the image detection model, and it will not be too small, so it can provide enough feature information for the result prediction network, which helps improve prediction accuracy.
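The resolution constraint can be checked numerically; the downsampling factor and the preset range below are illustrative assumptions, not values given in the disclosure.

```python
def feature_map_resolution(image_size, downsample_factor):
    """Resolution of the feature map produced by a feature extraction network
    that downsamples each spatial axis by `downsample_factor`."""
    return tuple(s // downsample_factor for s in image_size)

def ratio_in_range(image_size, downsample_factor, low=1 / 8, high=1 / 2):
    """Check that the per-axis feature-map/image resolution ratio lies in a
    preset range, so the map is neither too costly nor too coarse."""
    fmap = feature_map_resolution(image_size, downsample_factor)
    return all(low <= f / s <= high for f, s in zip(fmap, image_size))
```

For a 48*48*48 sample image, a downsampling factor of 4 yields a 12*12*12 feature map (ratio 1/4 per axis), which falls inside the assumed range, while a factor of 16 would not.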
  • FIG. 7 is a schematic frame diagram of an embodiment of an image detection device 70 of the present disclosure.
  • the image detection device 70 includes an image acquisition module 71 and a target detection module 72. The image acquisition module 71 is configured to acquire an image to be detected; the target detection module 72 is configured to use an image detection model to detect the image to be detected and obtain the detection information of the target object in it, where the detection information includes the target position of the target object in the image to be detected and the image detection model is obtained by using the image detection model training device in any of the above training device embodiments.
  • the image to be detected is a medical image that includes a bone region, and the target object includes a fracture. The image detection device 70 further includes a region identification module configured to identify the bone region in the image to be detected, and an image interception module configured to intercept several sub-images from the image to be detected such that the set of sub-images covers the bone region; the target detection module 72 is specifically configured to use the image detection model to detect the several sub-images respectively and obtain the detection information of the target object in each sub-image.
  • therefore, the image to be detected is a medical image that includes a bone region, and the target object includes a fracture. Before detection, the bone region in the image to be detected is identified and several sub-images are intercepted from it such that the set of sub-images covers the bone region; on this basis, the image detection model can detect the several sub-images respectively to obtain the detection information of the target object in each sub-image. Since the bone region usually appears as a strip-shaped area, detecting only the several sub-images, instead of traversing the whole image to be detected (which may traverse non-bone regions), helps greatly reduce the time required for image detection and improves detection efficiency.
  • FIG. 8 is a schematic frame diagram of an embodiment of an electronic device 80 of the present disclosure.
  • the electronic device 80 includes a memory 81 and a processor 82 coupled to each other.
  • the processor 82 is used to execute the program instructions stored in the memory 81, so as to implement the steps in any of the above embodiments of the image detection model training method, or the steps in any of the above embodiments of the image detection method.
  • the electronic device 80 may include, but is not limited to: a microcomputer and a server.
  • the electronic device 80 may also include mobile devices such as notebook computers and tablet computers, which are not limited here.
  • the processor 82 is used to control itself and the memory 81 to implement the steps in any of the above embodiments of the image detection model training method, or to realize the steps in any of the above embodiments of the image detection method.
  • the processor 82 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 82 may be an integrated circuit chip with signal processing capabilities.
  • the processor 82 can also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 82 may also be jointly implemented by multiple integrated circuit chips.
  • the above solution can reduce the impact of missing labels on the performance of the model as much as possible, improve the performance of the image detection model, and improve the accuracy of image detection.
  • FIG. 9 is a schematic diagram of an embodiment of a computer-readable storage medium 90 in the present disclosure.
  • the computer-readable storage medium 90 stores program instructions 901 executable by the processor, and the program instructions 901 are used to implement the steps in any of the above embodiments of the image detection model training method, or the steps in any of the above embodiments of the image detection method.
  • the above solution can reduce the impact of missing labels on the performance of the model as much as possible, improve the performance of the image detection model, and improve the accuracy of image detection.
  • the disclosed methods and devices may be implemented in other ways.
  • the device implementations described above are only illustrative.
  • the division into modules or units is only a division by logical function; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit; that is, it may be located in one place or distributed over multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute all or part of the steps of the methods in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program code, such as a U disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
  • the present disclosure discloses an image detection method, a related training method, and related devices, equipment, and media. The image detection model training method includes: acquiring a sample image; using a preset detection model and an image detection model to detect the sample image respectively, obtaining the first prediction information and the second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; filtering the second predicted positions and their second predicted probability values corresponding to a first reference position out of the second prediction information to obtain third prediction information; and adjusting the network parameters of the image detection model based on the third prediction information.
  • the above solution can reduce the impact of missing labels on the performance of the model as much as possible, improve the performance of the image detection model, and improve the accuracy of image detection.


Abstract

The embodiments of the present disclosure disclose an image detection method, a related training method, and related devices, equipment, and media. The training method of the image detection model includes: acquiring a sample image; using a preset detection model and an image detection model to detect the sample image respectively, obtaining first prediction information and second prediction information of a target object, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; filtering the second predicted positions and their second predicted probability values corresponding to a first reference position out of the second prediction information to obtain third prediction information; and adjusting network parameters of the image detection model based on the third prediction information. The above solution can mitigate the impact of missed labels on model performance and improve the performance of the image detection model, thereby improving the accuracy of image detection.

Description

Image detection method, related training method, and related devices, equipment, and media
Cross-reference to related applications
This patent application claims priority to Chinese Patent Application No. 202110633460.8, filed on June 7, 2021 by Shanghai SenseTime Intelligent Technology Co., Ltd. under the title "Image detection method and related training method and related devices, equipment and media", the entire content of which is incorporated into the present disclosure by reference.
Technical field
The present disclosure relates to the technical field of artificial intelligence, and in particular to an image detection method, a related training method, and related devices, equipment, and media.
Background
In recent years, thanks to the rapid development of artificial intelligence technology, neural network models have gradually been applied in many scenarios such as medicine and transportation to assist relevant staff and improve work efficiency. Generally speaking, the performance of a neural network model is closely related to the annotation quality of the sample images. In real-world scenarios, however, manual annotation often suffers from missed labels, which harms the performance of the neural network model.
Summary
The embodiments of the present disclosure provide an image detection method, a related training method, and related devices, equipment, and media.
A first aspect of the embodiments of the present disclosure provides a training method for an image detection model, including: acquiring a sample image, in which several sample positions of a target object are annotated; using a preset detection model and an image detection model to detect the sample image respectively, obtaining first prediction information and second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values; filtering the second predicted positions and their second predicted probability values corresponding to a first reference position out of the second prediction information to obtain third prediction information, where the first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies a first preset condition; and adjusting network parameters of the image detection model based on the third prediction information.
Therefore, a sample image in which several sample positions of a target object are annotated is acquired, and a preset detection model and an image detection model are used to detect it respectively, obtaining first prediction information and second prediction information of the target object, where the first prediction information includes several first predicted positions and their first predicted probability values and the second prediction information includes several second predicted positions and their second predicted probability values. On this basis, the second predicted positions and their second predicted probability values corresponding to a first reference position are filtered out of the second prediction information to obtain third prediction information, where the first reference position is a first predicted position that does not correspond to any sample position and whose first predicted probability value satisfies a first preset condition, and the network parameters of the image detection model are further adjusted based on the third prediction information. That is, during training of the image detection model, the first predicted positions predicted by the preset detection model can be screened for first reference positions that correspond to no sample position and whose first predicted probability value satisfies the first preset condition. Since a first predicted position that corresponds to no sample position can be regarded as a false-positive position, and the predicted probability value is related to the prediction loss, the first reference position can be regarded as a potential missed-label position. On this basis, the third prediction information is obtained by filtering the second predicted positions and their second predicted probability values corresponding to potential missed-label positions out of the second prediction information, and adjusting the network parameters of the image detection model with the third prediction information can mitigate the impact of missed labels on model performance as much as possible, improving the performance of the image detection model and thus the accuracy of image detection.
In some embodiments, the image detection model is obtained through several rounds of training: when the current training round satisfies a second preset condition, the step of using the preset detection model and the image detection model to detect the sample image respectively to obtain the first prediction information and the second prediction information of the target object, together with the subsequent steps, is performed to train the image detection model; when the current training round does not satisfy the second preset condition, the preset detection model is trained.
Therefore, the image detection model is trained over several rounds: the image detection model is trained when the current training round satisfies the second preset condition, and the preset detection model is trained when it does not. That is, during training of the image detection model, the preset detection model is also trained alternately, so the two models can teach each other during training; compared with training the image detection model alone, this helps reduce cumulative error and thus further mitigate the impact of missed labels on model performance.
In some embodiments, training the preset detection model includes: performing the step of using the preset detection model and the image detection model to detect the sample image respectively to obtain the first prediction information and the second prediction information of the target object; filtering the first predicted positions and their first predicted probability values corresponding to a second reference position out of the first prediction information to obtain fourth prediction information, where the second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition; and adjusting network parameters of the preset detection model based on the fourth prediction information.
Therefore, during training of the preset detection model, the step of using the preset detection model and the image detection model to detect the sample image respectively to obtain the first prediction information and the second prediction information of the target object is likewise performed first. On this basis, the first predicted positions and their first predicted probability values corresponding to a second reference position are filtered out of the first prediction information to obtain fourth prediction information, where the second reference position is a second predicted position that does not correspond to any sample position and whose second predicted probability value satisfies the first preset condition, and the network parameters of the preset detection model are adjusted based on the fourth prediction information. That is, during training of the preset detection model, the second predicted positions predicted by the image detection model can be screened for second reference positions that correspond to no sample position and whose second predicted probability value satisfies the first preset condition. Since a second predicted position corresponding to no sample position can be regarded as a false-positive position, and the predicted probability value is related to the prediction loss, the second reference position can be regarded as a potential missed-label position. The fourth prediction information is obtained by filtering the first predicted positions and their first predicted probability values corresponding to potential missed-label positions out of the first prediction information, and adjusting the network parameters of the preset detection model with the fourth prediction information can mitigate the impact of missed labels on model performance as much as possible. Because both models can mitigate the impact of missed labels on their own performance, the alternating training of the preset detection model and the image detection model helps improve the quality of their mutual teaching and the performance of the models.
In some embodiments, the second preset condition includes any of the following: the current training round is odd; the current training round is even; the remainder of the current training round divided by a first value is greater than a second value, where the second value is less than the first value.
Therefore, by setting the second preset condition to the current training round being odd or even, the image detection model and the preset detection model can be trained in a balanced, one-round-each manner; by setting the second preset condition to the remainder of the current training round divided by the first value being greater than the second value, with the second value smaller than the first value, the image detection model and the preset detection model can be trained with different emphasis according to a many-rounds-to-one (or one-round-to-many) training strategy.
In some embodiments, in the case of training the image detection model, the first preset condition includes: the first predicted probability value is higher than a preset probability threshold; in the case of training the preset detection model, the first preset condition includes: the second predicted probability value is higher than the preset probability threshold.
Therefore, by setting the first preset condition to the first predicted probability value being higher than the preset probability threshold when training the image detection model, and to the second predicted probability value being higher than the preset probability threshold when training the preset detection model, false-positive positions with large losses can be screened out as potential missed-label positions during the training of both models and their influence on training ignored, which helps further mitigate the impact of missed labels on model performance.
In some embodiments, the preset detection model and the image detection model have the same network structure; and/or, the sample image includes a bone region and the target object includes a fracture.
Therefore, the preset detection model and the image detection model have the same network structure, i.e., the preset detection model used to teach the image detection model shares the image detection model's network structure, which helps reduce the training difficulty of the image detection model; and by setting the sample image to include a bone region and the target object to include a fracture, the image detection model can be applied to fracture detection.
In some embodiments, the image detection model includes a feature extraction network and a result prediction network; the feature extraction network is used to extract a sample feature map of the sample image, the result prediction network is used to predict the second prediction information from the sample feature map, and the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range.
Therefore, the image detection model includes a feature extraction network and a result prediction network, where the feature extraction network extracts the sample feature map of the sample image and the result prediction network predicts the second prediction information from it, and the ratio between the resolution of the sample feature map and the resolution of the sample image is within a preset range. That is, the resolution of the sample feature map will not be too large, which reduces the load on the hardware used to train the image detection model, and it will not be too small, so it can provide enough feature information for the result prediction network, which helps improve prediction accuracy.
A second aspect of the embodiments of the present disclosure provides an image detection method, including: acquiring an image to be detected; and using an image detection model to detect the image to be detected, obtaining detection information of a target object in the image to be detected, where the detection information includes the target position of the target object in the image to be detected and the image detection model is obtained by the training method of the image detection model in the first aspect above.
Therefore, an image to be detected is acquired and detected with the image detection model, obtaining detection information of the target object in the image to be detected, where the detection information includes the target position of the target object in the image to be detected. Since the image detection model is obtained by the training method of the first aspect above, i.e., the impact of missed labels on model performance can be mitigated as much as possible during its training, this helps improve the accuracy of the detection information of the target object.
In some embodiments, the image to be detected is a medical image that includes a bone region, and the target object includes a fracture. Before using the image detection model to detect the image to be detected and obtain the detection information of the target object, the method further includes: identifying the bone region in the image to be detected and intercepting several sub-images from the image to be detected, where the set of sub-images covers the bone region. Using the image detection model to detect the image to be detected and obtain the detection information of the target object includes: using the image detection model to detect the several sub-images respectively and obtaining the detection information of the target object in each sub-image.
Therefore, the image to be detected is a medical image that includes a bone region, and the target object includes a fracture. Before the image detection model is used for detection, the bone region in the image to be detected is first identified and several sub-images are intercepted from it such that the set of sub-images covers the bone region; on this basis, the image detection model can detect the several sub-images respectively to obtain the detection information of the target object in each sub-image. Since the bone region usually appears as a strip-shaped area, detecting only the several sub-images, instead of traversing the whole image to be detected (which may traverse non-bone regions), helps greatly reduce the time required for image detection and improves detection efficiency.
本公开第三方面提供了一种图像检测模型的训练装置,包括:图像获取模块、目标检测模块、信息过滤模块和参数调整模块,图像获取模块配置为获取样本图像;其中,样本图像中标注有目标对象的若干样本位置;目标检测模块配置为利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息;其中,第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值;信息过滤模块配置为过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,其中,第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置;参数调整模块配置为基于第三预测信息,调整图像检测模型的网络参数。
在一些实施例中,图像检测模型是经过若干轮训练得到的,其中,在当前训练轮次满足第二预设条件的情况下,执行利用预设检测模型和图像检测模型分别对样本图像进行目标检测,得到目标对象的第一预测信息和第二预测信息的步骤以及后续步骤,以对图像检测模型进行训练,并在当前训练轮次不满足第二预设条件的情况下,训练预设检测模型。
在一些实施例中,图像检测模型的训练装置还包括配置为训练预设检测模型的模型训练模块,且模型训练模块包括检测子模块,配置为利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,模型训练模块还包括过滤子模块,配置为过滤第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息;其中,第二参考位置为不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置,模型训练模块还包括调整子模块,配置为基于第四预测信息,调整预设检测模型的网络参数。
在一些实施例中,第二预设条件包括以下任一者:当前训练轮次为奇数、当前训练轮次为偶数、当前训练轮次除以第一数值的余数大于第二数值;其中,第二数值小于第一数值。
在一些实施例中,在训练图像检测模型的情况下,第一预设条件包括:第一预测概率值高于预设概率阈值,在训练预设检测模型的情况下,第一预设条件包括:第二预测概率值高于预设概率阈值。
在一些实施例中,预设检测模型和图像检测模型具有相同网络结构;和/或,样本图像为医学图像,且样本图像包括骨区域,目标对象包括骨折。
在一些实施例中,图像检测模型包括特征提取网络和结果预测网络,特征提取网络用于提取样本图像的样本特征图,结果预测网络用于利用样本特征图预测得到第二预测信息,且样本特征图的分辨率与样本图像的分辨率之间的比值在预设范围内。
本公开实施例第四方面提供了一种图像检测装置,包括:图像获取模块和目标检测模块,图像获取模块配置为获取待检测图像;目标检测模块配置为利用图像检测模型对待检测图像进行检测,得到待检测图像中目标对象的检测信息;其中,检测信息包括目标对象在待检测图像中的目标位置,且图像检测模型是利用上述第三方面中的图像检测模型的训练装置得到的。
在一些实施例中,待检测图像为医学图像,待检测图像包括骨区域,且目标对象包括骨折,图像检测装置还包括区域识别模块,配置为识别待检测图像中的骨区域,图像检测装置还包括图像截取模块,配置为从待检测图像中截取若干子图像;其中,若干子图像的集合覆盖骨区域,目标检测模块还配置为利用图像检测模型分别对若干子图像进行检测,得到每一子图像中目标对象的检测信息。
本公开实施例第五方面提供了一种电子设备,包括相互耦接的存储器和处理器,处理器用于执行存储器中存储的程序指令,以实现上述第一方面中的图像检测模型的训练方法,或实现上述第二方面中的图像检测方法。
本公开实施例第六方面提供了一种计算机可读存储介质,其上存储有程序指令,程序指令被处理器执行时实现上述第一方面中的图像检测模型的训练方法,或实现上述第二方面中的图像检测方法。
上述方案,通过获取样本图像,且样本图像中标注有目标对象的若干样本位置,并利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,且第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值,在此基础上,过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,且第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置,并进一步基于第三预测信息,调整图像检测模型的网络参数,即在图像检测模型的训练过程中,可以通过筛选预设检测模型所预测到的第一预测位置中不与任一样本位置对应且第一预测概率值满足第一预设条件的第一参考位置,由于不与任一样本位置对应的第一预测位置可视为假阳位置,而预测概率值又与预测损失相关,故第一参考位置可视为潜在的漏标位置,基于此通过过滤第二预测信息中与潜在的漏标位置对应的第二预测位置及其第二预测概率值得到第三预测信息,以通过第三预测信息调整图像检测模型的网络参数,能够尽可能地减轻漏标对模型性能的影响。
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1A示出可以应用本公开实施例的图像检测方法的一种系统架构示意图;
图1B是本公开图像检测模型的训练方法一实施例的流程示意图;
图2是检测模型一实施例的框架示意图;
图3是本公开图像检测模型的训练方法一实施例的过程示意图;
图4是本公开图像检测方法一实施例的流程示意图;
图5是截取子图像一实施例的示意图;
图6是本公开图像检测模型的训练装置一实施例的框架示意图;
图7是本公开图像检测装置一实施例的框架示意图;
图8是本公开电子设备一实施例的框架示意图;
图9是本公开计算机可读存储介质一实施例的框架示意图。
具体实施方式
下面结合说明书附图,对本公开实施例的方案进行详细说明。
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、接口、技术之类的具体细节,以便透彻理解本公开。
本文中术语“系统”和“网络”常被互换使用。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。此外,本文中的“多”表示两个或者多于两个。
肋骨骨折是胸部遭受外力冲击后的常见伤害之一,肋骨骨折诊断对于评估创伤严重程度和治疗十分重要。骨折数量的定量测量直接与伤情鉴定、赔偿、量刑相关。计算机断层扫描(Computed Tomography,CT)相比胸部X线平片有更高的敏感性与特异性。CT图像的评估是一项耗时的工作,需要临床医生的耐心和经验,因此,胸部CT的自动骨折检测系统可以帮助临床医生提高骨折评估的效率和准确性。相关技术中,多采用神经网络对肋骨骨折进行自动检测。例如,使用卷积神经网络在CT的轴位片层上训练检测模型,然而肋骨骨折的表现是一种三维信息,其在连续片层上的变化是诊断的重要依据,仅使用二维轴位片层会限制检测准确率;又如,使用三维卷积神经网络对胸部CT图像进行滑窗分片、分割肋骨骨折区域,由于三维滑窗会遍历大量非肋骨区域,存在计算时间长、准确率不高的问题。
基于神经网络的方法需要大量人工标注的训练数据。数据集的数量与质量极大地影响检测模型的性能。然而,由于肋骨骨折的稀疏和微小,在三维CT中标记骨折十分繁琐。此外,为骨折标记三维边界框或分割蒙版需要大量的时间。这都不可避免地导致了骨折标注的遗漏,产生具有噪声标注的数据集,进而损害模型性能。
基于此,本申请实施例提供一种图像检测方法,实现该图像检测方法的网络架构如图1A所示,图1A示出可以应用本公开实施例的图像检测方法的一种系统架构示意图;如图1A所示,该系统架构中包括:图像采集终端11、网络12和图像检测终端13。为实现支撑一个示例性应用,图像采集终端11和图像检测终端13通过网络12建立通信连接,图像采集终端11通过网络12向图像检测终端13上报获取的样本图像;图像检测终端13响应于接收到的图像,利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,通过过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息;图像检测终端13基于第三预测信息,调整图像检测模型的网络参数。
作为示例,图像采集终端11可以包括图像采集设备,图像检测终端13可以包括具有视觉信息处理能力的视觉处理设备或远程服务器。网络12可以采用有线或无线连接方式。其中,当图像检测终端13为视觉处理设备时,图像采集终端11可以通过有线连接的方式与视觉处理设备通信连接,例如通过总线进行数据通信;当图像检测终端13为远程服务器时,图像采集终端11可以通过无线网络与远程服务器进行数据交互。
或者,在一些场景中,图像检测终端13可以是带有视频采集模组的视觉处理设备,也可以是带有图像采集功能的手机或眼镜等。这时,本公开实施例的图像检测方法可以由图像检测终端13执行,上述系统架构可以不包含网络12和图像采集终端11。
请参阅图1B,图1B是本公开图像检测模型的训练方法一实施例的流程示意图。所述方法可以包括如下步骤:
步骤S11:获取样本图像。
本公开实施例中,样本图像中标注有目标对象的若干样本位置。需要说明的是,本公开实施例中,样本图像可以在若干处均存在目标对象,其中至少一处(如,一处、两处、三处等等)可以未标注,即样本图像中可以存在漏标注的情况,当然也可以均做标注,在此不做限定。
在一个实施场景中,在医学应用场景中,样本图像可以包括但不限于:CT图像、MR图像等;或者,在交通应用场景中,样本图像可以包括但不限于监控相机拍摄到的监控图像等,在此不做限定。其他场景可以以此类推,在此不再一一举例。
在另一个实施场景中,目标对象可以根据实际应用需要进行设置。以医学应用场景为例,在需要对骨折进行检测的情况下,样本图像可以包含骨区域(如,肋骨区域、股骨区域等),目标对象可以包括骨折(如,肋骨骨折、股骨骨折等);或者,在需要对血肿进行检测的情况下,样本图像可以包含脏器区域(如,胃部、大肠、小肠等),目标对象可以包括血肿(如,胃部血肿、大肠血肿、小肠血肿等);或者,以交通应用场景为例,在需要对非机动车进行检测的情况下,样本图像可以包括道路区域,目标对象可以包括非机动车(如,自行车、电动车、滑板车等)。其他场景可以以此类推,在此不再一一举例。
在又一个实施场景中,样本位置具体可以为样本图像中目标对象的图像区域的中心位置。例如,目标对象的图像区域可以为包含目标对象的方形区域(如,在样本图像为三维图像的情况下,方形区域可以为长方体区域,而在样本图像为二维图像的情况下,方形区域可以为矩形区域,在此不做限定),则样本位置可以为该方形区域的中心位置。
在又一个实施场景中,样本位置具体可以包括样本图像中属于目标对象的像素点位置。例如,样本图像中属于目标对象的像素点均可以标注为样本位置,以此可以区分样本图像中不属于目标对象的像素点。
步骤S12:利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息。
在一个实施场景中,如前所述,样本图像可以为三维图像,则图像检测模型可以为三维全卷积神经网络,在此情况下,图像检测模型中用于提取图像特征的卷积核可以为三维卷积核。具体地,三维卷积核的尺寸可以包括但不限于:3*3*3、5*5*5等,在此不做限定。上述方式,在样本图像为三维图像的情况下,图像检测模型为三维全卷积神经网络,能够捕获三维图像的空间信息,有利于提高图像检测模型的敏感性和特异性。
在一个实施场景中,在样本图像为三维图像的情况下,预设检测模型中用于提取图像特征的卷积核也可以为三维卷积核,在此不做限定。此外,如前所述,三维卷积核的尺寸可以包括但不限于:3*3*3、5*5*5等,在此不做限定。
在另一个实施场景中,预设检测模型和图像检测模型可以具有相同的网络结构。例如,在图像检测模型为三维全卷积神经网络的情况下,预设检测模型也可以为三维全卷积神经网络,其他情况可以以此类推,在此不再一一举例。或者,预设检测模型和图像检测模型也可以具有不同的网络结构,且预设检测模型和图像检测模型均具备检测目标对象的能力。例如,预设检测模型和图像检测模型均可以包括用于提取图像特征的特征提取网络以及用于基于图像特征得到目标对象的预测信息的结果预测网络。
在又一个实施场景中,图像检测模型可以包括特征提取网络和结果预测网络,特征提取网络用于提取样本图像的样本特征图,结果预测网络用于利用样本特征图预测得到上述第二预测信息,且样本特征图的分辨率与样本图像的分辨率之间的比值在预设范围内。具体地,预设范围可以设置为1/8至1/2,或者,也可以设置为1/6至1/3,或者也可以设置为1/7至1/2,或者也可以设置为1/5至1/3等等,在此不做限定。例如,样本特征图的分辨率与样本图像的分辨率之间的比值可以为1/4、1/5等等,在此不做限定。上述方式,图像检测模型包括特征提取网络和结果预测网络,且特征提取网络用于提取样本图像的样本特征图,结果预测网络用于利用样本特征图预测得到第二预测信息,而样本特征图的分辨率与样本图像的分辨率之间的比值在预设范围内,即样本特征图的分辨率不会太大,从而能够减轻用于训练图像检测模型的硬件设备的负荷,且样本特征图的分辨率也不会太小,从而能够为结果预测网络提供足够的特征信息,有利于提高预测准确性。
本公开实施例中,第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值。需要说明的是,上述第一预测概率值和第二预测概率值表示模型预测准确的可能性。例如,第一预测概率值可以表示预设检测模型准确预测第一预测位置的可能性,而第二预测概率值可以表示图像检测模型准确预测第二预测位置的可能性。
在一个实施场景中,请结合参阅图2,图2是检测模型一实施例的框架示意图。需要说明的是,图2所示的检测模型既可以是图像检测模型,也可以是预设检测模型,还可以是图像检测模型和预设检测模型均采用图2所示网络结构,在此不做限定。如图2所示,检测模型采用编码器-解码器(即encoder-decoder)的网络结构,编码器部分用于捕获上下文信息,而解码器部分用于精确定位目标。需要说明的是,图中矩形框内数字表示特征图的通道数,每一层左侧数字h³表示在给定尺寸为128*128*128的输入图像的情况下,三维特征图的尺寸(即边长为h的立方体)。请继续参阅图2,检测模型以预处理层开始对输入图像进行预处理,预处理层可以包含尺寸为5*5*5的卷积块(卷积块包含5*5*5的卷积层以及位于卷积层之后的批归一化层、激活层),之后可以以步长(stride)2进行最大池化处理。编码器部分包含4层,每层包含3个残差单元,残差单元的具体结构可以参阅残差网络相关技术细节,在此不再赘述。在编码器部分,除最后一层之外,各层输出经步长为2且尺寸为2*2*2的卷积块下采样至一半分辨率。解码器部分包含2层,每层包含尺寸为2*2*2的转置卷积块以实现上采样,该转置卷积块连接于编解码桥接以及3个残差单元之后。此外,区别于一般的编解码跳接,编解码桥接包含2个残差单元,用于缓解编码器部分和解码器部分两者之间的语义分歧。
请继续参阅图2,对于给定三维输入图像X∈R^(D×H×W),由解码器部分输出的特征图尺寸为(D/4)×(H/4)×(W/4)。该特征图可以分别输入概率预测分支和位置预测分支,概率预测分支可以包含3个尺寸为3*3*3的卷积块以及一个具有单输出通道的卷积层,而位置预测分支可以包含3个尺寸为3*3*3的卷积块以及一个具有三输出通道的卷积层。即解码器部分所输出的特征图被概率预测分支用于预测得到尺寸为1×(D/4)×(H/4)×(W/4)的预测概率矩阵,并被位置预测分支用于预测得到尺寸为3×(D/4)×(H/4)×(W/4)的位置预测矩阵。
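结合上文对图2结构的说明,可用如下示意性的Python片段核算特征图尺寸的缩放倍数(下采样/上采样的次数按上文描述假设,仅作演示,非本公开方案的限定实现):

```python
def feature_map_size(input_size):
    """按图2所述结构估算三维特征图的边长:
    预处理层以步长2最大池化(÷2),编码器除最后一层外三次下采样(÷8),
    解码器两层转置卷积各上采样一倍(×4),整体为输入的1/4。"""
    size = input_size // 2        # 预处理:步长2最大池化
    for _ in range(3):            # 编码器:除最后一层外各层下采样一半
        size //= 2
    for _ in range(2):            # 解码器:两层转置卷积上采样
        size *= 2
    return size
```

例如输入尺寸为128*128*128时,特征图边长为32,分辨率比值为1/4,落在正文所述的预设范围(如1/8至1/2)内。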
在一个实施场景中,请结合参阅图3,图3是本公开图像检测模型的训练方法一实施例的过程示意图。如图3所示,样本图像经过预设检测模型M2和图像检测模型M1分别检测,可以得到第一预测信息和第二预测信息。此外,第一预测信息中白色填充圆形和以白色填充的X均表示第一预测位置,且白色填充圆形所示的第一预测位置对应有样本位置,即预设检测模型M2在样本图像所标注的样本位置上也预测出了目标对象,而白色填充的X所示的第一预测位置与任一样本位置均不对应,即预设检测模型M2在样本图像未标注样本位置的区域预测出了目标对象,在此情况下,可以将与任一样本位置均不对应的第一预测位置(即,白色填充的X所示的第一预测位置)视为假阳(即False Positive,FP)目标。
需要说明的是,本公开实施例以及下述公开实施例中,如无特殊说明,若预测位置与样本位置对应,可以表示预测位置与样本位置处于相同位置,或者,也可以表示预测位置与样本位置相距不超过预设距离(如,2个像素距离、3个像素距离等等)。例如,在样本位置表示样本图像中存在目标对象的图像区域的中心位置的情况下,若预测位置与样本位置对应,则可以表示预测位置与样本相距不超过预设距离,或者,也可以表示预测位置与样本位置处于相同位置;或者,在样本位置包括样本图像中属于目标对象的像素点位置的情况下,若预测位置与样本位置对应,则可以表示预测位置与样本位置处于相同位置,在此不做限定。
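上述“预测位置与样本位置对应”的判定可用如下示意代码表达(其中max_dist为假设的预设距离,非本公开的限定取值):

```python
from math import dist

def split_predictions(pred_positions, sample_positions, max_dist=2.0):
    """与任一样本位置相距不超过 max_dist 的预测位置视为对应(命中标注),
    其余预测位置视为假阳(FP)目标。"""
    matched, false_positives = [], []
    for p in pred_positions:
        if sample_positions and min(dist(p, s) for s in sample_positions) <= max_dist:
            matched.append(p)
        else:
            false_positives.append(p)
    return matched, false_positives
```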
步骤S13:过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息。
本公开实施例中,第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置。
在一个实施场景中,如前所述,若预测位置与样本位置对应,可以表示预测位置与样本位置处于相同位置,或者,也可以表示预测位置与样本位置相距不超过预设距离。故此,在预测位置与样本位置对应表示预测位置与样本位置处于相同位置的情况下,第一参考位置不与任一样本位置对应,可以表示第一参考位置不与任一样本位置处于相同位置;或者,在预测位置与样本位置对应表示预测位置与样本位置相距不超过预设距离的情况下,第一参考位置不与任一样本位置对应,可以表示第一参考位置与任一样本位置相距均超过预设距离。
在一个实施场景中,第一预设条件可以包括第一预测概率值高于预设概率阈值,预设概率阈值可以根据实际情况设置为0.5、0.6、0.7等等,在此不做限定。上述方式,通过将第一预设条件设置为第一预测概率值高于预设概率阈值,能够在图像检测模型的训练过程中,筛选出损失较大的假阳位置作为潜在的漏标位置,并忽略其对训练图像检测模型的影响,从而能够有利于进一步尽可能地减轻漏标对模型性能的影响。
在一个实施场景中,如前所述,通过将第一预设条件设置为第一预测概率值高于预设概率阈值,能够筛选出损失较大的假阳位置作为潜在的漏标位置。也就是说,也可以基于不与任一样本位置对应的第一预测位置的第一预测概率值,计算第一预测位置的预测损失,且预测损失与第一预测概率值为正相关关系,即不与任一样本位置对应的第一预测位置的第一预测概率值越大,其预测损失也越大,在此情况下,可以设置一个预设损失阈值ε,并将不与任一样本位置对应的第一预测位置视为假阳目标,并将预测损失大于预设损失阈值ε的假阳目标,视为潜在的漏标位置。需要说明的是,预设损失阈值ε可以参考下述关于参考损失值的计算过程,在此暂不赘述。
在另一个实施场景中,请结合参阅图3,如前所述,白色填充的X所示的第一预测位置与任一样本位置均不对应,即白色填充的X所示的第一预测位置可以视为假阳目标,为了便于区分,可以分别记为fp1、fp2、fp3和fp4。在此基础上,可以分别基于假阳目标fp1、假阳目标fp2、假阳目标fp3和假阳目标fp4的第一预测概率值,计算得到假阳目标fp1的预测损失L(fp1)为7.8,假阳目标fp2的预测损失L(fp2)为0.4,假阳目标fp3的预测损失L(fp3)为5.2,假阳目标fp4的预测损失L(fp4)为2.2,且L(fp1)和L(fp3)均大于预设损失阈值ε,故可以将假阳目标fp1和假阳目标fp3均视为潜在的漏标位置,即本公开实施例所述的第一参考位置。其他情况可以以此类推,在此不再一一举例。需要 说明的是,如前所述,预设损失阈值ε可以参考下述关于参考损失值的计算过程,在此暂不赘述。
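上文fp1~fp4的筛选逻辑可示意如下(focal形式的损失函数与ε=3.0均为演示假设,非本公开限定):

```python
import math

def fp_focal_loss(p, gamma=2.0):
    """假阳位置(标注为背景)的 focal 形式损失:预测概率值 p 越大损失越大,
    体现正文所述"预测损失与预测概率值为正相关关系"。"""
    return -(p ** gamma) * math.log(max(1.0 - p, 1e-7))

def potential_missing(fp_losses, eps):
    """预测损失大于预设损失阈值 ε 的假阳目标,视为潜在的漏标位置,返回其下标。"""
    return [i for i, loss in enumerate(fp_losses) if loss > eps]
```

以正文示例L(fp1)=7.8、L(fp2)=0.4、L(fp3)=5.2、L(fp4)=2.2为例,若假设ε=3.0,则potential_missing([7.8, 0.4, 5.2, 2.2], 3.0)返回下标0和2,即fp1与fp3被视为潜在的漏标位置。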
在一个实施场景中,通过过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,并在后续训练过程中,基于第三预测信息,调整图像检测模型的网络参数,即能够尽可能地忽略潜在的漏标位置对于训练图像检测模型的影响。
在一个实施场景,请继续结合参阅图3,如前所述,假阳目标fp1和假阳目标fp3均视为潜在的漏标位置,即本公开实施例所述的第一参考位置,而如图3所示,第二预测信息中包含一个与第一参考位置fp3对应的第二预测位置(即图3图像检测模型M1向前传播输出的第二预测信息中以白色填充的X所示的位置),并将该第二预测位置及其第二预测概率值过滤出第二预测信息。其他情况可以以此类推,在此不再一一举例。
在一个实施场景中,还可以进一步引入第一过滤机制,以进一步从不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置中挑选出符合第一过滤条件的第一预测位置,作为第一参考位置。第一过滤条件可以包括:第一预测概率值按照从高到低排序位于前预设序数值位的第一预测位置。此外,考虑到神经网络的记忆效应,预设数值可以设置为与训练轮次正相关。为了便于描述,不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置可以记为FP={fp_1, fp_2, …, fp_i},第一参考位置FP_ex可以表示为:
FP_ex = argmax_{FP:|FP|<n} L_focal(FP)……(1);
上述公式(1)中,n表示第一参考位置FP_ex的数量,L_focal(FP)表示n个第一预测位置的总损失值,argmax表示求取最大值,即需要在上述第一预测位置FP={fp_1, fp_2, …, fp_i}中选出n个第一预测位置,作为第一参考位置,且能够确保这n个第一预测位置的总损失值L_focal(FP)最大。如前所述,损失值与第一预测概率值可以为正相关关系,在此情况下,可以选取第一预测概率值按照从高到低排序位于前n位的第一预测位置,作为第一参考位置。具体地,预设数值n可以表示为:
n=λ(e)×batchsize……(2);
上述公式(2)中,e表示训练轮次,batchsize表示批训练时的批样本大小(即一次训练所输入的样本数量),λ用于控制预设数值。例如,在初始训练阶段,λ可以为0,以使模型学习共有模式,随着训练轮次的增加,λ可以逐渐增大。
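第一过滤机制(式(1)(2))可示意如下,其中λ的具体函数形式仅为演示假设:

```python
def first_filter(fp_losses, epoch, batchsize, lam=lambda e: min(e / 10.0, 1.0)):
    """按损失从高到低选取前 n 个假阳位置作为第一参考位置 FP_ex:
    n = λ(e) × batchsize(式(2)),λ 随训练轮次 e 从 0 逐渐增大(此处形式为假设),
    使初始训练阶段先学习共有模式,之后逐步加大过滤力度。"""
    n = int(lam(epoch) * batchsize)
    order = sorted(range(len(fp_losses)), key=lambda i: fp_losses[i], reverse=True)
    return sorted(order[:n])   # 返回被选为 FP_ex 的假阳位置下标
```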
在另一个实施场景中,上述第一过滤机制在批样本较大时具有较优性能,然而在三维场景中,由于三维图像对于计算资源的高消耗,通常批样本较小,为了适应于三维场景,还可以引入第二过滤机制,具体可以对于当前训练轮次及其之前若干训练轮次分别执行:获取不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置的总损失值;并从当前训练轮次及其之前若干训练轮次的总损失值中,选取预设数值个满足第二筛选条件的总损失值,作为参考损失值;在当前训练轮次,可以从不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置中,挑选出损失值大于参考损失值的第一预测位置,作为第一参考位置。第二筛选条件可以包括:总损失值按照从高到低排序排在前预设数值位。为了便于描述,如前所述,当前轮次中,不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置可以记为FP={fp_1, fp_2, …, fp_i},第一参考位置FP_ex可以表示为:
FP_ex = {fp_i | L_focal(fp_i) > ε, fp_i∈FP}……(3);
上述公式(3)中,L_focal(fp_i)表示不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置中第i个第一预测位置的损失值,ε表示参考损失值。其中,参考损失值ε可以表示为:
ε = max_n(L_focal(FP_h))……(4);
上述公式(4)中,FP_h表示当前训练轮次及其之前若干训练轮次各自分别所预测得到的不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置,n表示预设数值,max_n表示取其中第n大的损失值。为了便于描述,FP_h可以表示为:
FP_h = (FP, FP_{-1}, …, FP_{-k+1})……(5);
上述公式(5)中,k表示当前训练轮次及其之前若干训练轮次的总轮次数,预设数值可以与总轮次数正相关、且与训练轮次正相关。具体地,预设数值n可以表示为:
n=λ(e)×batchsize×k……(6);
上述公式(6)中,λ、e以及batchsize的含义,可以参阅前述相关描述,在此不再赘述。
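第二过滤机制(式(3)~(6))可示意为如下演示性实现(接口与取值均为假设):

```python
from collections import deque

class SecondFilter:
    """维护最近 k 个训练轮次的假阳损失(式(5)的 FP_h),
    取其中第 n 大的损失作为参考损失值 ε(式(4)),
    当前轮次中损失大于 ε 的假阳位置作为第一参考位置(式(3))。"""
    def __init__(self, k):
        self.history = deque(maxlen=k)   # 仅保留最近 k 轮的假阳损失

    def select(self, fp_losses, n):
        self.history.append(list(fp_losses))
        pooled = sorted((l for losses in self.history for l in losses), reverse=True)
        if n > len(pooled):              # 历史损失不足 n 个时暂不过滤
            return []
        eps = pooled[n - 1]              # 第 n 大的损失作为 ε
        return [i for i, l in enumerate(fp_losses) if l > eps]
```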
步骤S14:基于第三预测信息,调整图像检测模型的网络参数。
在一个实施场景中,可以基于第三预测信息,计算图像检测模型的预测损失,并基于该预测损失,调整图像检测模型的网络参数。具体地,可以采用诸如L1损失函数、L2损失函数等损失函数计算预测损失,具体计算过程可以参阅L1损失函数、L2损失函数的相关公式,在此不再赘述。此外,基于预测损失,可以进行梯度回传,通过梯度下降优化图像检测模型的网络参数,故可以忽略上述潜在的漏标位置的梯度贡献。
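基于第三预测信息计算损失并忽略潜在漏标位置梯度贡献的过程,可示意为在损失求和时跳过被过滤的位置(仅为示意;实际实现中通常通过损失掩码完成):

```python
def masked_total_loss(position_losses, ignored):
    """仅累加未被过滤位置的损失:被视为潜在漏标位置的下标(ignored)
    不参与求和,从而不向梯度回传贡献。"""
    return sum(loss for i, loss in enumerate(position_losses) if i not in ignored)
```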
在一个实施场景中,预设检测模型可以预先在常规目标检测任务上进行预训练,再利用少量不存在漏标的样本图像对预训练之后的预设检测模型进行微调。此时,预设检测模型一方面可以适应于对样本图像的目标检测任务,另一方面通过少量不存在漏标的样本图像的微调训练,预设检测模型也能够具备一定的检测准确性。在此之后即可执行本公开实施例中的步骤,以对图像检测模型进行训练。需要说明的是,以医学应用场景为例,在常规目标检测任务进行预训练的步骤可以包括:获取常规医学图像,其中,常规医学图像中标注有存在医学对象的实际位置,该医学对象可以不同于目标对象,如在目标对象包括肋骨骨折的情况下,医学对象可以包括股骨骨折,其他情况可以以此类推,在此不再一一举例。此外,为了提高预设检测模型在常规目标检测任务的准确性,常规医学图像的数量可以尽可能地多,如,可以采集上千张常规医学图像。在此基础上,可以利用预设检测模型对常规医学图像进行检测,得到医学对象的预测位置,从而可以基于预测位置与实际位置之间的差异,调整预设检测模型的网络参数,并重复上述利用预设检测模型对常规医学图像进行检测的步骤以及后续步骤,直至预设检测模型在常规目标检测任务上收敛。例如,基于预测位置与实际位置所计算得到的预测损失小于预设损失阈值。在此基础上,可以利用少量样本图像,进行迁移学习,其中样本图像中标注有存在目标对象的样本位置,且这些少量的样本图像中不存在漏标的目标对象。具体地,可以利用预训练之后的预设检测模型对样本图像进行检测,得到目标对象的预测位置,并基于目标对象的样本位置和预测位置之间的差异,再次调整预设检测模型的网络参数,以及重新执行上述利用预训练之后的预设检测模型对样本图像进行检测的步骤以及后续,直至预设检测模型在少量样本图像上训练收敛。
在另一个实施场景中,也可以不必预先对预设检测模型进行训练。如前所述,图像检测模型可以是经过若干轮训练得到的,且在当前训练轮次满足第二预设条件的情况下,可以执行上述步骤S12以及后续步骤,以在当前训练轮次对图像检测模型进行训练,并在当前训练轮次不满足第二预设条件的情况下,训练预设检测模型。上述方式,通过若干轮训练图像检测模型,且在当前训练轮次满足第二预设条件时,对图像检测模型进行训练,而在当前训练轮次不满足第二预设条件时,对预设检测模型进行训练,即在图像检测模型的训练过程中,预设检测模型也在其中交替地进行训练,故能够在训练过程中,使图像检测模型和预设检测模型相互教学,相较于单一地训练图像检测模型,能够有利于减少累积误差,从而能够进一步尽可能地减轻漏标对模型性能的影响。
在一个实施场景中,可以通过设置上述第二预设条件,来调整预设检测模型和图像检测模型的交替训练策略。例如,第二预设条件可以包括当前训练轮次为奇数,即在当前训练轮次为奇数时,可以对图像检测模型进行训练,而在当前训练轮次不为奇数时,可以对预设检测模型进行训练;或者,第二预设条件也可以包括当前训练轮次为偶数,即在当前训练轮次为偶数时,可以对图像检测模型进行训练,而在当前训练轮次为奇数时,可以对预设检测模型进行训练。上述方式,通过将第二预设条件设置为当前训练轮次为奇数或者当前训练轮次为偶数,能够使得图像检测模型和预设检测模型按照一次比一次的训练策略均衡地进行训练。此外,第二预设条件也可以设置为包括:当前训练轮次除以第一数值的余数大于第二数值,且第二数值小于第一数值。以第一数值等于4且第二数值等于0为例,在当前训练轮次除以4的余数大于0的情况下,可以训练图像检测模型,即在当前训练轮次为1、2、3、5、6、7、9、10、11等不为4的整数倍时,可以训练图像检测模型,而在当前训练轮次除以4的余数不大于0的情况下,可以训练预设检测模型,即在当前训练轮次为4、8、12等为4的整数倍时,可以训练预设检测模型,在此情况下,图像检测模型和预设检测模型可以按照三次比一次的训练策略进行训练,其他情况可以以此类推,在此不再一一举例。上述方式,通过将第二预设条件设置为当前训练轮次除以第一数值的余数大于第二数值,且第二数值小于第一数值,能够使得图像检测模型和预设检测模型按照多次比一次(或一次比多次)的训练策略有侧重地进行训练。
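上述几种第二预设条件对应的交替训练策略,可用如下示意函数表达(模式名称与默认取值均为演示假设):

```python
def trains_image_model(epoch, mode="mod", first=4, second=0):
    """判断当前训练轮次 epoch 是否训练图像检测模型(否则训练预设检测模型):
    "odd"/"even" 对应一次比一次的均衡交替;
    "mod" 对应余数条件:epoch 除以第一数值的余数大于第二数值(second < first)。"""
    if mode == "odd":
        return epoch % 2 == 1
    if mode == "even":
        return epoch % 2 == 0
    return epoch % first > second
```

例如first=4、second=0时,轮次1、2、3训练图像检测模型,轮次4训练预设检测模型,即三次比一次的训练策略。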
在另一个实施场景中,图像检测模型和预设检测模型交替训练的过程中,若当前训练轮次需训练预设检测模型,则可以执行上述步骤S12,即利用预设检测模型和图像检测模型分别对样本图像进行目标检测,得到目标对象的第一预测信息和第二预测信息,之后再过滤第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息,且第二参考位置为不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置,并基于第四预测信息,调整预设检测模型的网络参数。关于第二参考位置、调整网络参数的相关细节,可以参考前述训练图像检测模型的相关描述,在此不再赘述。需要说明的是,在训练预设检测模型的过程中,上述第一预设条件包括第二预测概率值高于预设概率阈值。上述方式,在训练预设检测模型的过程中,同样先执行利用预设检测模型和图像检测模型分别对样本图像进行目标检测,得到目标对象的第一预测信息和第二预测信息的步骤,基于此再过滤第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息,且第二参考位置为不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置,并基于第四预测信息,调整预设检测模型的网络参数,即在预设检测模型的训练过程中,可以通过筛选图像检测模型所预测到的第二预测位置中不与任一样本位置对应且第二预测概率值满足第一预设条件的第二参考位置,由于不与任一样本位置对应的第二预测位置可视为假阳位置,而预测概率值又与预测损失相关,故第二参考位置可视为潜在的漏标位置,基于此通过过滤第一预测信息中与潜在的漏标位置对应的第一预测位置及其第一预测概率值得到第四预测信息,以通过第四预测信息调整预设检测模型的网络参数,能够尽可能地减轻漏标对模型性能的影响,且由于两者均能够尽可能减轻漏标对自身性能的影响,进而在预设检测模型与图像检测模型的交替训练过程中,能够有利于提高两者相互教学的质量,有利于提高模型性能。
需要说明的是,在训练预设检测模型的过程中,也可以采用前述公开实施例所述的第一过滤机制或第二过滤机制。例如,在采用第一过滤机制时,可以进一步从不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置中挑选出符合第一过滤条件的第二预测位置,作为第二参考位置。第一过滤条件可以包括:第二预测概率值按照从高到低排序位于前预设序数值位的第二预测位置;或者,在采用第二过滤机制时,可以对于当前训练轮次及其之前若干训练轮次分别执行:获取不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置的总损失值;并从当前训练轮次及其之前若干训练轮次的总损失值中,选取预设数值个满足第二筛选条件的总损失值,作为参考损失值;在当前训练轮次,可以从不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置中,挑选出损失值大于参考损失值的第二预测位置,作为第二参考位置。第二筛选条件可以包括:总损失值按照从高到低排序排在前预设数值位。具体可以参阅前述相关描述,在此不再赘述。
在又一个实施场景中,请结合参阅图3,如图3所示,在训练图像检测模型M1时,预设检测模型M2可以选择预测损失较大的假阳(即FP)目标作为潜在的漏标位置,并忽略这些潜在的漏标位置对图像检测模型M1梯度更新的贡献;反之,在训练预设检测模型M2时,图像检测模型M1可以选择预测损失较大的假阳目标作为潜在的漏标位置,并忽略这些潜在的漏标位置对预设检测模型M2梯度更新的贡献,在图像检测模型M1和预设检测模型M2交替训练过程中,有效利用了神经网络的记忆特征,即网络在拟合特定样本前先拟合共有模式,而漏标位置在训练早期因不符合共有模式而具有较大的损失,故通过忽略其对梯度更新的贡献,即可排除其影响,使得网络专注于对共有模式的学习,并在两个模型的交替训练过程中,通过相互教学可以有效减少单一模型训练而可能带来的累积误差。
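图3所示的交替相互教学流程可概括为如下示意伪实现(模型以带fp_losses/updates字段的字典代替,接口均为假设,仅演示"教师挑选、学生忽略"的交替关系):

```python
def coteaching_round(epoch, image_model, preset_model, select_missing):
    """奇数轮训练图像检测模型、偶数轮训练预设检测模型(对应一次比一次策略):
    由对方(教师)模型的假阳损失挑选潜在漏标位置,学生模型本轮更新时忽略之。"""
    if epoch % 2 == 1:
        student, teacher = image_model, preset_model
    else:
        student, teacher = preset_model, image_model
    student["ignored"] = select_missing(teacher["fp_losses"])  # 教师挑选潜在漏标位置
    student["updates"] += 1                                    # 仅更新学生模型
    return student
```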
在一个实施场景中,如前所述,样本图像中可以包括骨区域,由于骨区域在样本图像中通常表现为长条形(如肋骨等),故直接对样本图像进行检测,会导致在检测过程中所遍历的图像区域有可能并非骨区域,从而减慢检测效率。有鉴于此,为了提高检测效率,在获取到样本图像之后,可以先识别样本图像中的骨区域,并从样本图像中截取若干样本子图像,且若干样本子图像的集合可以覆盖骨区域,在此基础上,在利用预设检测模型和图像检测模型分别对样本图像进行目标检测时,可以利用预设检测模型和图像检测模型分别对若干样本子图像进行目标检测,从而能够仅对骨区域进行目标检测,进而能够大大提升检测效率。此外,识别肋骨区域和截取样本子图像的方式可以参阅下述公开实施例中相关描述,在此暂不赘述。
上述方案,通过获取样本图像,且样本图像中标注有目标对象的若干样本位置,并利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,且第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值,在此基础上,过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,且第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置,并进一步基于第三预测信息,调整图像检测模型的网络参数,即在图像检测模型的训练过程中,可以通过筛选预设检测模型所预测到的第一预测位置中不与任一样本位置对应且第一预测概率值满足第一预设条件的第一参考位置,由于不与任一样本位置对应的第一预测位置可视为假阳位置,而预测概率值又与预测损失相关,故第一参考位置可视为潜在的漏标位置,基于此通过过滤第二预测信息中与潜在的漏标位置对应的第二预测位置及其第二预测概率值得到第三预测信息,以通过第三预测信息调整图像检测模型的网络参数,能够尽可能地减轻漏标对模型性能的影响。
请参阅图4,图4是本公开图像检测方法一实施例的流程示意图。如图4所示,本公开实施例具体可以包括如下步骤:
步骤S41:获取待检测图像。
在一个实施场景中,与前述公开实施例中样本图像类似地,在医学应用场景中,待检测图像可以为医学图像,具体可以包括但不限于:CT图像、MR图像等,在此不做限定。其他应用场景可以以此类推,在此不再一一举例。
在另一个实施场景中,在医学应用场景中,与前述公开实施例中样本图像类似地,待检测图像中所包含的目标对象可以包括但不限于:骨折、血肿等等,在此不做限定。其他应用场景可以以此类推,在此不再一一举例。
步骤S42:利用图像检测模型对待检测图像进行检测,得到待检测图像中目标对象的检测信息。
本公开实施例中,检测信息包括目标对象在待检测图像中的目标位置,且图像检测模型是利用上述任一图像检测模型的训练方法实施例中的步骤训练得到的,具体训练过程可以参阅前述公开实施例中相关描述,在此不再赘述。
在一个实施场景中,目标位置可以为待检测图像中包含目标对象的图像区域的中心位置。例如,在待检测图像为二维图像的情况下,包含目标对象的图像区域可以为一矩形区域,则目标位置可以为该矩形区域的中心位置;或者,在待检测图像为三维图像的情况下,包含目标对象的图像区域可以为一长方体区域,则目标位置可以为该长方体区域的中心位置。
在另一个实施场景中,以待检测图像是医学图像为例,在待检测图像包括骨区域,且目标对象包括骨折的情况下,在利用图像检测模型对待检测图像进行检测之前,可以先识别待检测图像中的骨区域,并从待检测图像中截取若干子图像,且若干子图像的集合可以覆盖骨区域。在此基础上,再利用图像检测模型分别对若干子图像进行检测,得到每一子图像中目标对象的检测信息。需要说明的是,子图像中目标对象的检测信息可以包括骨折的目标位置,也可以不包括目标位置,即子图像中不存在骨折。上述方式,待检测图像为医学图像,待检测图像包括骨区域,且目标对象包括骨折,在利用图像检测模型进行检测之前,先识别待检测图像中的骨区域,并在待检测图像中截取若干子图像,且若干子图像的集合可以覆盖骨区域,基于此可以利用图像检测模型分别对若干子图像进行检测,得到每一子图像中目标对象的检测信息,由于骨区域通常表现为条状区域,相较于遍历待检测图像进行检测可能会遍历非骨区域,仅对若干子图像进行检测,能够有利于大大减少图像检测所需时间,提高图像检测效率。
在一个实施场景中,为了提高识别肋骨区域的效率,可以预先训练一个用于识别骨区域的图像识别模型,该图像识别模型可以包括但不限于:U-Net等,在此不做限定。在此基础上,可以利用图像识别模型对待检测图像进行识别,得到待检测图像中的骨区域。
在另一个实施场景中,也可以先对待检测图像进行二值化处理,由于待检测图像中骨区域的密度与组织的密度不同,故在待检测图像中表现为不同灰度,从而可以预先设置一个灰度阈值,以对骨区域和非骨区域进行区分。之后,为了提高骨区域的准确性,可以基于骨区域的形状特征(如,长条状、带有一定弧度等),对识别出来的骨区域进行降噪等处理,以识别出最终的骨区域。
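上述通过灰度阈值区分骨与组织的做法可示意如下(阈值300为假设值,实际需按图像灰度分布选取;降噪等后处理在此从略):

```python
def bone_mask(image, threshold=300):
    """对二维灰度图像做阈值二值化:灰度不低于 threshold 的像素记为骨区域(1),
    其余记为非骨区域(0)。"""
    return [[1 if v >= threshold else 0 for v in row] for row in image]
```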
在又一个实施场景中,各个子图像的尺寸可以相同,如,对于二维图像而言,子图像的尺寸可以为16*16,或者32*32,或者48*48等等,在此不做限定;或者,对于三维图像而言,子图像的尺寸可以为16*16*16、或者32*32*32,或者48*48*48等等,在此不做限定。
在又一个实施场景中,以骨区域具体是肋骨区域为例,请参阅图5,图5是截取子图像一实施例的示意图。如图5所示,条形区域表示肋骨区域,从左往右分别表示前部肋骨区域视图,中部肋骨区域视图和后部肋骨区域视图。如前所述,区别于通过滑窗截取子图像可能造成部分子图像中并不存在肋骨区域,本公开实施例先识别得到肋骨区域,并基于识别得到的肋骨区域,从待检测图像中截取若干子图像,且若干子图像能够覆盖肋骨区域,从而可以大大减少截取出来的子图像。如图5所示,在前部肋骨区域可以截取4个子图像(图5中立方体所示),在中部肋骨区域可以截取4个子图像(图5中立方体所示),在后部肋骨区域可以截取5个子图像(图5中立方体所示)。其他情况可以以此类推,在此不再一一举例。
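沿识别出的骨区域截取若干固定尺寸子图像、使其集合覆盖该区域的过程,可示意如下(以二维包围盒为例,patch=32为假设尺寸;实际实现中子图像也可按骨区域形状重叠铺设):

```python
def cover_region(box, patch=32):
    """返回覆盖区域包围盒所需的子图像左上角坐标列表:
    以 patch 为步长铺设不重叠窗口,使窗口集合覆盖整个包围盒。"""
    (x0, y0), (x1, y1) = box
    return [(x, y) for x in range(x0, x1, patch) for y in range(y0, y1, patch)]
```

例如包围盒为((0, 0), (64, 32))、patch=32时,仅需两个子图像即可覆盖,起点分别为(0, 0)与(32, 0)。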
上述方案,通过获取待检测图像,并利用图像检测模型对待检测图像进行检测,得到待检测图像中目标对象的检测信息,且检测信息包括目标对象在待检测图像中的目标位置,而图像检测模型是利用上述第一方面中的图像检测模型的训练方法得到的,即在对图像检测模型的训练过程中,能够尽可能地减轻漏标对模型性能的影响,故能够有利于提高目标对象的检测信息的准确性。
请参阅图6,图6是本公开图像检测模型的训练装置60一实施例的框架示意图。图像检测模型的训练装置60包括:图像获取模块61、目标检测模块62、信息过滤模块63和参数调整模块64,图像获取模块61配置为获取样本图像;其中,样本图像中标注有目标对象的若干样本位置;目标检测模块62配置为利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息;其中,第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值;信息过滤模块63配置为过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,其中,第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置;参数调整模块64配置为基于第三预测信息,调整图像检测模型的网络参数。
因此,通过获取样本图像,且样本图像中标注有目标对象的若干样本位置,并利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,且第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值,在此基础上,过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,且第一参考位置为不与任一样本位置对应且第一预测概率值满足第一预设条件的第一预测位置,并进一步基于第三预测信息,调整图像检测模型的网络参数,即在图像检测模型的训练过程中,可以通过筛选预设检测模型所预测到的第一预测位置中不与任一样本位置对应且第一预测概率值满足第一预设条件的第一参考位置,由于不与任一样本位置对应的第一预测位置可视为假阳位置,而预测概率值又与预测损失相关,故第一参考位置可视为潜在的漏标位置,基于此通过过滤第二预测信息中与潜在的漏标位置对应的第二预测位置及其第二预测概率值得到第三预测信息,以通过第三预测信息调整图像检测模型的网络参数,能够尽可能地减轻漏标对模型性能的影响。
在一些公开实施例中,图像检测模型是经过若干轮训练得到的,其中,在当前训练轮次满足第二预设条件的情况下,执行利用预设检测模型和图像检测模型分别对样本图像进行目标检测,得到目标对象的第一预测信息和第二预测信息的步骤以及后续步骤,以对图像检测模型进行训练,并在当前训练轮次不满足第二预设条件的情况下,训练预设检测模型。
因此,通过若干轮训练图像检测模型,且在当前训练轮次满足第二预设条件时,对图像检测模型进行训练,而在当前训练轮次不满足第二预设条件时,对预设检测模型进行训练,即在图像检测模型的训练过程中,预设检测模型也在其中交替地进行训练,故能够在训练过程中,使图像检测模型和预设检测模型相互教学,相较于单一地训练图像检测模型,能够有利于减少累积误差,从而能够进一步尽可能地减轻漏标对模型性能的影响。
在一些公开实施例中,图像检测模型的训练装置60还包括配置为训练预设检测模型的模型训练模块,且模型训练模块包括检测子模块,配置为利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,模型训练模块还包括过滤子模块,配置为过滤第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息;其中,第二参考位置为不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置,模型训练模块还包括调整子模块,配置为基于第四预测信息,调整预设检测模型的网络参数。
因此,在训练预设检测模型的过程中,同样先执行利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息的步骤,基于此再过滤第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息,且第二参考位置为不与任一样本位置对应且第二预测概率值满足第一预设条件的第二预测位置,并基于第四预测信息,调整预设检测模型的网络参数,即在预设检测模型的训练过程中,可以通过筛选图像检测模型所预测到的第二预测位置中不与任一样本位置对应且第二预测概率值满足第一预设条件的第二参考位置,由于不与任一样本位置对应的第二预测位置可视为假阳位置,而预测概率值又与预测损失相关,故第二参考位置可视为潜在的漏标位置,基于此通过过滤第一预测信息中与潜在的漏标位置对应的第一预测位置及其第一预测概率值得到第四预测信息,以通过第四预测信息调整预设检测模型的网络参数,能够尽可能地减轻漏标对模型性能的影响,且由于两者均能够尽可能减轻漏标对自身性能的影响,进而在预设图像检测模型与图像检测模型的交替训练过程中,能够有利于提高两者相互教学的质量,有利于提高模型性能。
在一些公开实施例中,第二预设条件包括以下任一者:当前训练轮次为奇数、当前训练轮次为偶数、当前训练轮次除以第一数值的余数大于第二数值;其中,第二数值小于第一数值。
因此,通过将第二预设条件设置为当前训练轮次为奇数或者当前训练轮次为偶数,能够使得图像检测模型和预设检测模型按照一次比一次的训练策略均衡地进行训练,而通过将第二预设条件设置为当前训练轮次除以第一数值的余数大于第二数值,且第二数值小于第一数值,能够使得图像检测模型和预设检测模型按照多次比一次(或一次比多次)的训练策略有侧重地进行训练。
在一些公开实施例中,在训练图像检测模型的情况下,第一预设条件包括:第一预测概率值高于预设概率阈值,在训练预设检测模型的情况下,第一预设条件包括:第二预测概率值高于预设概率阈值。
因此,在训练图像检测模型的情况下,通过将第一预设条件设置为第一预测概率值高于预设概率阈值,而在训练预设检测模型的情况下,通过将第一预设条件设置为第二预测概率值高于预设概率阈值,能够在图像检测模型和预设检测模型的训练过程中,筛选出损失较大的假阳位置作为潜在的漏标位置,并忽略其对相应模型训练的影响,从而能够有利于进一步尽可能地减轻漏标对模型性能的影响。
在一些公开实施例中,预设检测模型和图像检测模型具有相同网络结构;和/或,样本图像为医学图像,且样本图像包括骨区域,目标对象包括骨折。
因此,预设检测模型和图像检测模型具有相同的网络结构,即用于教学图像检测模型的预设检测模型具有与图像检测模型相同的网络结构,能够有利于降低图像检测模型的训练难度;而通过将样本图像设置为医学图像,且样本图像包括骨区域,且目标对象设置为包括骨折,能够使图像检测模型应用于骨折检测。
在一些公开实施例中,图像检测模型包括特征提取网络和结果预测网络,特征提取网络用于提取样本图像的样本特征图,结果预测网络用于利用样本特征图预测得到第二预测信息,且样本特征图的分辨率与样本图像的分辨率之间的比值在预设范围内。
因此,图像检测模型包括特征提取网络和结果预测网络,且特征提取网络用于提取样本图像的样本特征图,结果预测网络用于利用样本特征图预测得到第二预测信息,而样本特征图的分辨率与样本图像的分辨率之间的比值在预设范围内,即样本特征图的分辨率不会太大,从而能够减轻用于训练图像检测模型的硬件设备的负荷,且样本特征图的分辨率也不会太小,从而能够为结果预测网络提供足够的特征信息,有利于提高预测准确性。
请参阅图7,图7是本公开图像检测装置70一实施例的框架示意图。图像检测装置70包括:图像获取模块71和目标检测模块72,图像获取模块71配置为获取待检测图像;目标检测模块72配置为利用图像检测模型对待检测图像进行检测,得到待检测图像中目标对象的检测信息;其中,检测信息包括目标对象在待检测图像中的目标位置,且图像检测模型是利用上述任一图像检测模型的训练装置实施例中的图像检测模型的训练装置得到的。
在一些公开实施例中,待检测图像为医学图像,待检测图像包括骨区域,且目标对象包括骨折,图像检测装置70还包括区域识别模块,配置为识别待检测图像中的骨区域,图像检测装置70还包括图像截取模块,配置为从待检测图像中截取若干子图像;其中,若干子图像的集合覆盖骨区域,目标检测模块72具体配置为利用图像检测模型分别对若干子图像进行检测,得到每一子图像中目标对象的检测信息。
因此,待检测图像为医学图像,待检测图像包括骨区域,且目标对象包括骨折,在利用图像检测模型进行检测之前,先识别待检测图像中的骨区域,并在待检测图像中截取若干子图像,且若干子图像的集合可以覆盖骨区域,基于此可以利用图像检测模型分别对若干子图像进行检测,得到每一子图像中目标对象的检测信息,由于骨区域通常表现为条状区域,相较于遍历待检测图像进行检测可能会遍历非骨区域,仅对若干子图像进行检测,能够有利于大大减少图像检测所需时间,提高图像检测效率。
请参阅图8,图8是本公开电子设备80一实施例的框架示意图。电子设备80包括相互耦接的存储器81和处理器82,处理器82用于执行存储器81中存储的程序指令,以实现上述任一图像检测模型的训练方法实施例中的步骤,或实现上述任一图像检测方法实施例中的步骤。在一个具体的实施场景中,电子设备80可以包括但不限于:微型计算机、服务器,此外,电子设备80还可以包括笔记本电脑、平板电脑等移动设备,在此不做限定。
具体而言,处理器82用于控制其自身以及存储器81以实现上述任一图像检测模型的训练方法实施例中的步骤,或实现上述任一图像检测方法实施例中的步骤。处理器82还可以称为CPU(Central Processing Unit,中央处理单元)。处理器82可能是一种集成电路芯片,具有信号的处理能力。处理器82还可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。另外,处理器82可以由多个集成电路芯片共同实现。
上述方案,能够尽可能地减轻漏标对模型性能的影响,提高图像检测模型的性能,以提高图像检测的准确性。
请参阅图9,图9为本公开计算机可读存储介质90一实施例的框架示意图。计算机可读存储介质90存储有能够被处理器运行的程序指令901,程序指令901用于实现上述任一图像检测模型的训练方法实施例中的步骤,或实现上述任一图像检测方法实施例中的步骤。
上述方案,能够尽可能地减轻漏标对模型性能的影响,提高图像检测模型的性能,以提高图像检测的准确性。
在本公开所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施方式仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施方式方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本公开各个实施方式方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
工业实用性
本公开公开了一种图像检测方法和相关训练方法及相关装置、设备及介质,其中,图像检测模型的训练方法包括:获取样本图像;利用预设检测模型和图像检测模型分别对样本图像进行检测,得到目标对象的第一预测信息和第二预测信息,且第一预测信息包括若干第一预测位置及其第一预测概率值,第二预测信息包括若干第二预测位置及其第二预测概率值;过滤第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息;基于第三预测信息,调整图像检测模型的网络参数。上述方案,能够尽可能地减轻漏标对模型性能的影响,提高图像检测模型的性能,以提高图像检测的准确性。

Claims (14)

  1. 一种图像检测模型的训练方法,所述方法由电子设备执行,所述方法包括:
    获取样本图像;其中,所述样本图像中标注有目标对象的若干样本位置;
    利用预设检测模型和图像检测模型分别对所述样本图像进行检测,得到所述目标对象的第一预测信息和第二预测信息;其中,所述第一预测信息包括若干第一预测位置及其第一预测概率值,所述第二预测信息包括若干第二预测位置及其第二预测概率值;
    过滤所述第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,其中,所述第一参考位置为不与任一所述样本位置对应且所述第一预测概率值满足第一预设条件的第一预测位置;
    基于所述第三预测信息,调整所述图像检测模型的网络参数。
  2. 根据权利要求1所述的方法,其中,所述图像检测模型是经过若干轮训练得到的,其中,在当前训练轮次满足第二预设条件的情况下,执行所述利用预设检测模型和图像检测模型分别对所述样本图像进行检测,得到所述目标对象的第一预测信息和第二预测信息的步骤以及后续步骤,以对所述图像检测模型进行训练;以及
    在所述当前训练轮次不满足所述第二预设条件的情况下,训练所述预设检测模型。
  3. 根据权利要求2所述的方法,其中,训练所述预设检测模型的步骤包括:
    执行所述利用预设检测模型和图像检测模型分别对所述样本图像进行检测,得到所述目标对象的第一预测信息和第二预测信息的步骤;
    过滤所述第一预测信息中与第二参考位置对应的第一预测位置及其第一预测概率值,得到第四预测信息;其中,所述第二参考位置为不与任一所述样本位置对应且所述第二预测概率值满足所述第一预设条件的第二预测位置;
    基于所述第四预测信息,调整所述预设检测模型的网络参数。
  4. 根据权利要求2或3所述的方法,其中,所述第二预设条件包括以下任一者:所述当前训练轮次为奇数、所述当前训练轮次为偶数、所述当前训练轮次除以第一数值的余数大于第二数值;其中,所述第二数值小于所述第一数值。
  5. 根据权利要求1至4任一项所述的方法,其中,在训练所述图像检测模型的情况下,所述第一预设条件包括:所述第一预测概率值高于预设概率阈值;
    在训练所述预设检测模型的情况下,所述第一预设条件包括:所述第二预测概率值高于所述预设概率阈值。
  6. 根据权利要求1至5任一项所述的方法,其中,
    所述预设检测模型和所述图像检测模型具有相同网络结构;
    和/或,所述样本图像为医学图像,且所述样本图像包括骨区域,所述目标对象包括骨折。
  7. 根据权利要求1至6任一项所述的方法,其中,所述图像检测模型包括特征提取网络和结果预测网络,所述特征提取网络用于提取所述样本图像的样本特征图,所述结果预测网络用于利用所述样本特征图预测得到所述第二预测信息,且所述样本特征图的分辨率与所述样本图像的分辨率之间的比值在预设范围内。
  8. 一种图像检测方法,其中,包括:
    获取待检测图像;
    利用图像检测模型对所述待检测图像进行检测,得到所述待检测图像中目标对象的检测信息;
    其中,所述检测信息包括所述目标对象在所述待检测图像中的目标位置,且所述图像检测模型是利用权利要求1至7任一项所述的图像检测模型的训练方法得到的。
  9. 根据权利要求8所述的方法,其中,所述待检测图像为医学图像,所述待检测图像包括骨区域,且所述目标对象包括骨折;在所述利用图像检测模型对所述待检测图像进行检测,得到所述待检测图像中目标对象的检测信息之前,所述方法还包括:
    识别所述待检测图像中的所述骨区域,并从所述待检测图像中截取若干子图像;其中,所述若干子图像的集合覆盖所述骨区域;
    所述利用图像检测模型对所述待检测图像进行检测,得到所述待检测图像中目标对象的检测信息,包括:
    利用所述图像检测模型分别对所述若干子图像进行检测,得到每一所述子图像中所述目标对象的检测信息。
  10. 一种图像检测模型的训练装置,其中,包括:
    图像获取模块,配置为获取样本图像;其中,所述样本图像中标注有目标对象的若干样本位置;
    目标检测模块,配置为利用预设检测模型和图像检测模型分别对所述样本图像进行检测,得到所述目标对象的第一预测信息和第二预测信息;其中,所述第一预测信息包括若干第一预测位置及其第一预测概率值,所述第二预测信息包括若干第二预测位置及其第二预测概率值;
    信息过滤模块,配置为过滤所述第二预测信息中与第一参考位置对应的第二预测位置及其第二预测概率值,得到第三预测信息,其中,所述第一参考位置为不与任一所述样本位置对应且所述第一预测概率值满足第一预设条件的第一预测位置;
    参数调整模块,配置为基于所述第三预测信息,调整所述图像检测模型的网络参数。
  11. 一种图像检测装置,其中,包括:
    图像获取模块,配置为获取待检测图像;
    目标检测模块,配置为利用图像检测模型对所述待检测图像进行检测,得到所述待检测图像中目标对象的检测信息;
    其中,所述检测信息包括所述目标对象在所述待检测图像中的目标位置,且所述图像检测模型是利用权利要求10所述的图像检测模型的训练装置得到的。
  12. 一种电子设备,其中,包括相互耦接的存储器和处理器,所述处理器用于执行所述存储器中存储的程序指令,以实现权利要求1至7任一项所述的图像检测模型的训练方法,或实现权利要求8或9所述的图像检测方法。
  13. 一种计算机可读存储介质,其上存储有程序指令,其中,所述程序指令被处理器执行时实现权利要求1至7任一项所述的图像检测模型的训练方法,或实现权利要求8或9所述的图像检测方法。
  14. 一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行用于实现如权利要求1至7任一项所述的图像检测模型的训练方法,或实现权利要求8或9所述的图像检测方法。
PCT/CN2021/123753 2021-06-07 2021-10-14 图像检测方法和相关训练方法及相关装置、设备及介质 WO2022257314A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110633460.8 2021-06-07
CN202110633460.8A CN113435260A (zh) 2021-06-07 2021-06-07 图像检测方法和相关训练方法及相关装置、设备及介质



Also Published As

Publication number Publication date
CN113435260A (zh) 2021-09-24

