CN114429459A - Training method of target detection model and corresponding detection method


Info

Publication number
CN114429459A
CN114429459A
Authority
CN
China
Prior art keywords
target; region; detection model; feature map; feature
Prior art date
Legal status
Withdrawn
Application number
CN202210080240.1A
Other languages
Chinese (zh)
Inventor
王娜
刘星龙
黄宁
陈翼男
Current Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202210080240.1A
Publication of CN114429459A
PCT application PCT/CN2022/131716 (published as WO2023138190A1)
Status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G06T 7/0014 Biomedical image inspection using an image reference approach
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G06T 2207/10116 X-ray image
    • G06T 2207/10132 Ultrasound image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30061 Lung
    • G06T 2207/30064 Lung nodule

Abstract

The application discloses a training method of a target detection model, and a corresponding detection method, device, equipment and storage medium. The training method includes: acquiring a sample medical image containing a preset organ, wherein the sample medical image is labeled with a labeling result of at least one target located on the preset organ, and the labeling result includes an actual region where the target is located; respectively matching at least one first candidate region for each target according to a matching sequence by using the target detection model, and obtaining a final prediction result about the target based on the first candidate regions, wherein the matching sequence is determined based on the size of the actual region where each target is located; and adjusting parameters of the target detection model by using the final prediction result and the labeling result. By this method, the recall rate of the target detection model during training can be improved.

Description

Training method of target detection model and corresponding detection method
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a target detection model and a corresponding detection method.
Background
By detecting organs, possible lesions of the organs can be found, which improves the effect of diagnosis and treatment. At present, by training a target detection model so that it can subsequently be used to detect organs, the workload of medical staff can be greatly reduced and the efficiency of diagnosis and treatment improved.
However, because targets in organs differ in size, existing training methods for target detection models yield a low recall rate when detecting targets of different sizes, so the target detection effect is poor, which limits the further development of the technology.
Therefore, how to improve the training method of the target detection model to improve the recall rate of the detection result and improve the target detection effect has very important significance.
Disclosure of Invention
The application at least provides a training method of a target detection model, a corresponding detection method, a device, equipment and a storage medium.
The first aspect of the present application provides a training method for a target detection model, where the training method includes: acquiring a sample medical image containing a preset organ, wherein the sample medical image is labeled with a labeling result of at least one target located on the preset organ, and the labeling result includes an actual region where the target is located; respectively matching at least one first candidate region for each target according to a matching sequence by using a target detection model, and obtaining a final prediction result about the target based on the first candidate regions, wherein the matching sequence is determined based on the size of the actual region where each target is located; and adjusting parameters of the target detection model by using the final prediction result and the labeling result.
Therefore, in the training process, at least one first candidate region is matched for each target according to the matching sequence, and the matching sequence is determined according to the size of the actual region where the target on the preset organ is located, so that the matching sequence can be adjusted according to the size of the actual region where the target is located, the matching sequence can be more suitable for the condition that the targets are different in size, the recall rate during training of the target detection model is improved, and the target detection effect is improved.
Wherein, the matching sequence is as follows: the smaller the size of the actual area, the earlier the target is matched.
Therefore, by setting that the smaller the size of the actual region, the earlier the matching is performed, targets whose actual regions are smaller can be matched preferentially and thus matched to more suitable first candidate regions. This can improve the recall rate during training of the target detection model, particularly the recall rate for small targets, and contributes to improving the target detection effect.
Wherein, the above using the target detection model to match at least one first candidate region for each target according to the matching sequence respectively includes: dividing the targets into different target groups based on the size of the actual area where the targets are located, wherein the size range corresponding to each target group is different; determining the matching sequence corresponding to different target groups based on the size ranges of the different target groups; and respectively matching at least one first candidate region for the targets in each target group according to the matching sequence.
Therefore, by dividing the targets into groups based on the size of the actual region where each target is located, and determining the matching sequence of the different target groups based on their size ranges, the matching sequence is in effect determined based on the size of the actual region where each target is located.
Wherein, the matching at least one first candidate region for the targets in each target group according to the matching sequence respectively includes: and (3) carrying out the following matching steps on each target group according to the matching sequence: obtaining the matching degree between each target in the target group and different anchor point areas of the sample medical image; based on the matching degree, at least one anchor point area is selected for each target of the target group to serve as a first candidate area of the target.
Therefore, by acquiring the matching degree between each target in the target group and different anchor point regions of the sample medical image, at least one anchor point region can be selected for each target in the target group as a first candidate region of the target based on the matching degree, and the determination of the first candidate region for each target in the target group is realized.
Wherein, the number of the first candidate regions of the targets in the different target groups is different, and the smaller the size range is, the larger the number of the first candidate regions of the targets in the target group is; and/or the degree of match between the target and the anchor region is the degree of overlap between the target and the anchor region.
Therefore, the degree of coincidence between the target and the anchor region is used as the degree of matching between the target and the anchor region, so that the degree of matching is determined according to the degree of coincidence. In addition, the larger the number of the first candidate regions of the targets in the target group with the smaller size range is, the more first candidate regions matched with the targets with the smaller size range can be used for training the target detection model during training, so that the sensitivity of the target detection model on the detection of the small targets and the accuracy of the detection of the small targets during practical application can be improved.
Wherein, the above-mentioned obtaining of the matching degree between each target in the target group and different anchor point regions of the sample medical image includes: selecting anchor point regions that have not been taken as first candidate regions from a plurality of anchor point regions generated for the sample medical image as anchor point regions to be matched, and acquiring the matching degree between each target in the target group and each anchor point region to be matched; and/or, before matching at least one first candidate region for the targets in each target group respectively according to the matching order, the method further comprises: generating a preset number of anchor point regions with different sizes for each position point of the sample medical image, wherein the sizes of the preset number of anchor point regions are determined respectively based on the preset number of first feature maps with different scales of the sample medical image.
Therefore, by selecting an anchor point region that is not used as the first candidate region as an anchor point region to be matched and obtaining the matching degree between each target in the target group and each anchor point region to be matched, more anchor point regions can be selected as the first candidate region. In addition, a plurality of anchor regions may be generated by determining the size of a preset number of anchor regions based on a preset number of differently scaled first feature maps of the sample medical image.
Before the above using the target detection model to match at least one first candidate region for each target according to the matching order and obtaining the prediction result about the target based on the first candidate region, the method further includes: acquiring a preset number of first feature maps with different scales of a sample medical image by using a target detection model, wherein the preset number is greater than or equal to 1; deriving a final prediction result about the target based on the first candidate region, including: and predicting to obtain a final prediction result based on the first candidate region and the first feature map.
Therefore, by generating a preset number of first feature maps with different scales, the target detection model can be trained by using the first feature maps with different sizes, so that the detection effect of the target detection model on targets with different sizes can be improved.
The above-mentioned obtaining of the preset number of first feature maps with different scales of the sample medical image includes: carrying out feature extraction on the sample medical image to obtain a preset number of second feature maps with different scales; and for each second feature map, performing preset attention processing on the second feature map to obtain a first feature map corresponding to the second feature map, wherein the preset attention processing comprises one or more of dimension attention processing and feature channel attention processing.
Therefore, by performing the preset attention processing on the second feature map, the target detection model is facilitated to extract more accurate feature information about the target, so that the accuracy and recall rate of target detection can be improved.
The obtaining of the first feature map corresponding to the second feature map by performing the predetermined attention processing on the second feature map includes: obtaining dimension weights corresponding to all dimensions of the second feature map, and performing weighting processing on all dimension features in the second feature map by using the dimension weights corresponding to all dimensions to obtain a spatial focusing feature map; dividing the characteristics of different channels in the spatial focusing characteristic diagram into a plurality of channel characteristic groups, acquiring the channel weight corresponding to each channel characteristic group, and performing weighting processing on the plurality of channel characteristic groups by using the channel weight to obtain a first characteristic diagram obtained by the characteristic channel attention processing.
Therefore, the second feature map is used for obtaining the spatial focusing feature map, and the spatial focusing feature map is further used for obtaining the first feature map obtained through feature channel attention processing, so that the target detection model is facilitated to extract more accurate feature information about the target in the spatial dimension and the channel dimension.
The obtaining of the dimension weight corresponding to each dimension of the second feature map includes: taking each dimension of the second feature map as a target dimension, and performing average pooling on the second feature map for the remaining dimensions except the target dimension to obtain a third feature map on the target dimension; determining the dimension weight corresponding to each dimension of the second feature map by using the third feature maps on different target dimensions; and/or, acquiring the channel weight corresponding to each channel feature group, including: and performing cosine transform on each channel feature group to obtain channel weight corresponding to each channel feature group.
Therefore, each dimension of the second feature map is taken as a target dimension, the second feature map is subjected to average pooling of the remaining dimensions except the target dimension to obtain a third feature map on the target dimension, and the third feature maps on different target dimensions are utilized, so that the dimension weight corresponding to each dimension of the second feature map can be determined, and the target detection model is facilitated to extract more accurate feature information about the target.
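By way of illustration only, the following PyTorch sketch implements the preset attention processing described above for a three-dimensional second feature map of shape (N, C, D, H, W); the 1 x 1 convolution gates, the sigmoid activations, and the single cosine-transform frequency are illustrative assumptions rather than the application's exact construction:

```python
import math
import torch
import torch.nn as nn

class PresetAttention(nn.Module):
    """Sketch of dimension attention followed by feature channel attention."""

    def __init__(self, channels: int, num_groups: int = 4, freq: int = 1):
        super().__init__()
        assert channels % num_groups == 0
        self.num_groups = num_groups
        cg = channels // num_groups
        # Cosine transform (DCT-II) basis over each group's channels;
        # using a single frequency index is an assumption.
        basis = torch.cos(math.pi * freq * (2 * torch.arange(cg) + 1) / (2 * cg))
        self.register_buffer("dct_basis", basis)
        # One learnable gate per spatial dimension (D, H, W); assumed form.
        self.dim_gates = nn.ModuleList(
            nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
            for _ in range(3)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, d, h, w = x.shape
        out = x
        # Dimension attention: for each target dimension, average-pool the
        # remaining dimensions (the "third feature map"), derive a dimension
        # weight, and reweight -> spatial focusing feature map.
        for i, gate in enumerate(self.dim_gates):
            target_dim = i + 2
            other = tuple(dd for dd in (2, 3, 4) if dd != target_dim)
            third = x.mean(dim=other)          # (N, C, size of target dim)
            weight = gate(third)               # dimension weights in (0, 1)
            shape = [n, c, 1, 1, 1]
            shape[target_dim] = x.shape[target_dim]
            out = out * weight.view(*shape)
        # Feature channel attention: split channels into groups and weight
        # each group by a cosine-transform coefficient of its pooled features.
        g = self.num_groups
        grouped = out.view(n, g, c // g, d, h, w)
        pooled = grouped.mean(dim=(3, 4, 5))                               # (N, G, C//G)
        channel_weight = torch.sigmoid((pooled * self.dct_basis).sum(-1))  # (N, G)
        out = grouped * channel_weight.view(n, g, 1, 1, 1, 1)
        return out.view(n, c, d, h, w)
```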
The at least one first candidate region is selected from a plurality of anchor point regions of different sizes of the sample medical image, and first candidate regions of different sizes respectively correspond to first feature maps of different scales; and the predicting to obtain a final prediction result based on the first candidate region and the first feature map includes the following steps: for each first candidate region, obtaining feature information of the first candidate region based on the first feature map corresponding to the size of the first candidate region; and predicting to obtain a final prediction result about the target by using the feature information of the first candidate region.
Therefore, the first feature map corresponding to the size of the first candidate region is determined, and the feature information of the first candidate region is obtained, so that the feature information of the first candidate region can be utilized to perform target detection to obtain a final prediction result.
The predicting to obtain a final prediction result about the target by using the feature information of the first candidate region includes: adjusting the first candidate regions by utilizing the characteristic information of the first candidate regions to obtain initial prediction results corresponding to the first candidate regions, wherein the initial prediction results corresponding to the first candidate regions comprise the initial prediction regions of the target obtained by adjustment based on the first candidate regions; and performing optimization prediction by using the initial prediction results corresponding to the first candidate regions to obtain a final prediction result related to the target.
Therefore, the first candidate regions are adjusted by using the feature information of the first candidate regions to obtain initial prediction results corresponding to the first candidate regions, and optimized prediction is then performed by using these initial prediction results to obtain the final prediction result related to the target, so that the sensitivity of the target detection model in detecting targets and the accuracy of target detection can be improved.
Wherein, the final prediction result further includes a final confidence of the category to which the final prediction region belongs; and the adjusting of the parameters of the target detection model by using the final prediction result and the labeling result includes: obtaining a first category loss based on the final confidence; obtaining a first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio of the final prediction region and the actual region; and adjusting the parameters of the target detection model by using the first category loss and the first regression loss.
Therefore, by obtaining the first category loss and the first regression loss, training of the target detection model can be realized based on both the classification loss of the target detection model and the regression loss of the prediction result.
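Assuming, for illustration, that the final confidence is a probability and that binary cross-entropy is used for the first category loss (the application does not fix the classification loss function):

```python
import torch
import torch.nn.functional as F

def first_category_loss(final_confidence: torch.Tensor,
                        labels: torch.Tensor) -> torch.Tensor:
    """First category loss from the final confidence.

    BCE is an assumed choice; final_confidence must hold probabilities
    and labels must be float targets in {0, 1}.
    """
    return F.binary_cross_entropy(final_confidence, labels)
```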
The final prediction result is obtained by predicting the target detection model by using the first candidate region to obtain an initial prediction result and performing optimization prediction on the initial prediction result, and the initial prediction result comprises an initial prediction region of the target and an initial confidence coefficient of a category to which the initial prediction region belongs; the adjusting the parameters of the target detection model by using the final prediction result and the labeling result further includes: obtaining a second category loss based on the initial confidence; obtaining a second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio of the initial prediction region and the actual region; utilizing the first class loss and the first regression loss to adjust parameters of the target detection model, comprising: and adjusting parameters of the target detection model by utilizing the first category loss, the first regression loss, the second category loss and the second regression loss.
Therefore, by obtaining the first class loss, the first regression loss, the second class loss and the second regression loss, the parameters of the target detection model can be adjusted, so as to train the target detection model.
Wherein, the obtaining a first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio between the final prediction region and the actual region, or obtaining a second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio between the initial prediction region and the actual region, includes: obtaining a first offset loss of the corresponding prediction region by using the offset corresponding to the corresponding prediction region; obtaining a loss weight of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region, and multiplying the first offset loss of the corresponding prediction region by using the loss weight of the corresponding prediction region to obtain a second offset loss of the corresponding prediction region, wherein the larger the intersection ratio is, the smaller the loss weight is; obtaining the GIOU loss of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region; and obtaining the corresponding regression loss by using the second offset loss and the GIOU loss.
Therefore, the loss weight is set so that the larger the intersection ratio, the smaller the loss weight. In this way, results in which the final prediction region corresponding to the target coincides poorly with the actual region where the target is located are penalized more heavily, so that the parameters of the target detection model are updated more strongly during optimized positioning, which helps improve the accuracy of target detection. In addition, obtaining the corresponding regression loss from the second offset loss and the GIOU loss enables the prediction region of the trained target detection model to be located more accurately.
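A hedged sketch of this regression loss follows, using two-dimensional boxes in (x1, y1, x2, y2) format for brevity; smooth-L1 for the first offset loss and (1 - IoU) as the loss-weight form are assumptions that merely satisfy the stated property (larger intersection ratio, smaller weight):

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou, generalized_box_iou

def regression_loss(pred_offsets: torch.Tensor, true_offsets: torch.Tensor,
                    pred_boxes: torch.Tensor, true_boxes: torch.Tensor) -> torch.Tensor:
    """Weighted offset loss plus GIoU loss for matched prediction/target pairs."""
    # First offset loss: smooth-L1 over the offsets (assumed choice).
    first_offset_loss = F.smooth_l1_loss(
        pred_offsets, true_offsets, reduction="none"
    ).sum(-1)
    # Loss weight decreases as the intersection-over-union grows;
    # (1 - IoU) is an assumed functional form.
    iou = box_iou(pred_boxes, true_boxes).diagonal()
    second_offset_loss = (1.0 - iou) * first_offset_loss
    # GIoU loss from the same box pairs.
    giou = generalized_box_iou(pred_boxes, true_boxes).diagonal()
    giou_loss = 1.0 - giou
    return (second_offset_loss + giou_loss).mean()
```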
Wherein, the sample medical image is a three-dimensional medical image; and/or the predetermined organ is a lung and the target is a nodule.
Therefore, by defining the preset organ as a lung and the target as a nodule, the trained target detection model can perform targeted detection on the nodule in the lung.
A second aspect of the present application provides a target detection method, including: acquiring a target medical image containing a preset organ; obtaining a first feature map of a target medical image by using a target detection model, determining at least one first candidate region of the target, and obtaining a final prediction result about the target based on the first candidate region and the first feature map; the target detection model is obtained by training with the above-mentioned training method for the target detection model of the first aspect, and/or the first feature map is obtained by performing preset attention processing on a second feature map obtained by extracting features of the target medical image, where the preset attention processing includes one or more of dimension attention processing and feature channel attention processing.
Therefore, the target detection is performed by using the target detection model obtained by training through the training method of the target detection model, so that the accuracy and the recall rate of the target detection can be improved. In addition, by performing the preset attention processing on the second feature map, the target detection model is facilitated to extract more accurate feature information about the target, so that the accuracy and recall rate of target detection can be improved.
The third aspect of the present application provides a training apparatus for a target detection model, the apparatus including an obtaining module, a detecting module and an adjusting module, wherein the obtaining module is configured to obtain a sample medical image including a preset organ, the sample medical image is labeled with a labeling result of at least one target located on the preset organ, and the labeling result includes an actual region where the target is located; the detection module is used for respectively matching at least one first candidate region for each target according to the matching sequence by using the target detection model and obtaining a final prediction result about the target based on the first candidate region, wherein the matching sequence is determined based on the size of an actual region where each target is located; and the adjusting module is used for adjusting the parameters of the target detection model by utilizing the final prediction result and the labeling result.
A fourth aspect of the present application provides an object detection apparatus, comprising an acquisition module and a detection module, wherein the acquisition module is used for acquiring a target medical image containing a preset organ; the detection module is used for obtaining a first feature map of the target medical image by using a target detection model, determining at least one first candidate region of the target, and obtaining a final prediction result about the target based on the first candidate region and the first feature map; the target detection model is obtained by training with the above-mentioned training method for the target detection model of the first aspect, and/or the first feature map is obtained by performing preset attention processing on a second feature map obtained by extracting features of the target medical image, where the preset attention processing includes one or more of dimension attention processing and feature channel attention processing.
A fifth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the method for training an object detection model described in the above first aspect, or to implement the method for object detection described in the above second aspect.
A sixth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions that, when executed by a processor, implement the method of training an object detection model described in the above first aspect, or implement the method of object detection described in the above second aspect.
According to the scheme, in the training process, at least one first candidate region is matched for each target according to the matching sequence, the matching sequence is determined according to the size of the actual region where the target on the preset organ is located, so that the matching sequence can be adjusted according to the size of the actual region where the target is located, the matching sequence can be more suitable for the condition that the targets are different in size, the recall rate during training of a target detection model is improved, and the target detection effect is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
FIG. 1 is a first flowchart of an embodiment of a method for training a target detection model according to the present application;
FIG. 2 is a second flowchart of an embodiment of a method for training a target detection model according to the present application;
FIG. 3 is a first flowchart of another embodiment of a method for training an object detection model according to the present application;
FIG. 4 is a second flowchart of another embodiment of a method for training an object detection model according to the present application;
FIG. 5 is a third flowchart of another embodiment of a training method for an object detection model according to the present application;
FIG. 6 is a schematic flowchart of a training method for object detection models according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of the target detection model in the training method of the target detection model of the present application;
FIG. 8 is a schematic flow chart diagram of an embodiment of a target detection method of the present application;
FIG. 9 is a schematic diagram of a structure of the training apparatus for the object detection model of the present application;
FIG. 10 is a schematic view of the structure of the object detecting device of the present application;
FIG. 11 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 12 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Referring to fig. 1, fig. 1 is a first flowchart illustrating a training method of a target detection model according to an embodiment of the present application. Specifically, the following steps may be included:
step S11: a sample medical image containing a predetermined organ is acquired.
In the present application, the predetermined organ may be an organ of an animal or a human body. The animal organ is, for example, a kidney, a heart, etc. of a dog. Examples of organs of the human body are the kidneys, lungs, heart, etc. In one embodiment, the predetermined organ is a lung of a human body.
The sample medical image may be a two-dimensional image or a three-dimensional image. The three-dimensional image may be obtained by scanning an organ; for example, the sample medical image can be obtained by three-dimensional imaging using Computed Tomography (CT). A two-dimensional sample medical image can be obtained, for example, by ultrasonic imaging or X-ray imaging. It will be appreciated that the imaging method of the sample medical image is not limited.
In the application, the sample medical image is labeled with a labeling result of at least one target located on a preset organ, and the labeling result includes an actual region where the target is located. The target on the predetermined organ may be a specific substance present on the organ. Such as nodules in the lungs, cysts in the kidneys, etc. The actual region where the object is located is the region where the object is present on the sample medical image. Such as the area of nodules in the lungs, the area of cysts in the kidneys, etc.
In one embodiment, the predetermined organ is a lung and the target on the predetermined organ is a nodule. By defining the preset organ as a lung and the target as a nodule, the trained target detection model can perform targeted detection on the nodule in the lung.
In one embodiment, the sample medical image may be obtained by resampling an initial sample medical image. By resampling the initial sample medical image, the resolution of the sample medical image can meet the requirement, which helps to improve the accuracy of target detection. Furthermore, a normalization operation can be performed on the pixel values in the sample medical image, which facilitates subsequent training of the target detection model.
In one embodiment, after the sample medical image is obtained, the sample medical image may be further subjected to operations such as rotation, translation, mirroring, scaling, and the like, so as to achieve data enhancement, balance positive and negative samples in the sample medical image, achieve the purpose of data volume amplification, and contribute to improving the generalization of the target detection model and reducing the possibility of overfitting.
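By way of illustration, a minimal NumPy sketch of such preprocessing and data amplification; the intensity clipping window and the exact set of transforms are assumptions, not the application's prescription:

```python
import numpy as np

def preprocess(volume: np.ndarray) -> np.ndarray:
    """Normalize voxel intensities to [0, 1]; the CT lung window is assumed."""
    v = np.clip(volume, -1000.0, 400.0)
    return (v - v.min()) / (v.max() - v.min() + 1e-8)

def augment(volume: np.ndarray) -> np.ndarray:
    """Random mirroring and rotation for data amplification (transform set assumed)."""
    if np.random.rand() < 0.5:
        volume = np.flip(volume, axis=np.random.randint(volume.ndim)).copy()
    if np.random.rand() < 0.5:
        volume = np.rot90(volume, k=np.random.randint(1, 4), axes=(1, 2)).copy()
    return volume
```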
Step S12: and respectively matching at least one first candidate region for each target according to the matching sequence by using a target detection model, and obtaining a final prediction result about the target based on the first candidate regions.
For target detection, in a candidate region-based target detection algorithm, regardless of one-stage detection or two-stage detection, at least one first candidate region needs to be matched for a target during training, so as to take the first candidate region as a sample region where the target exists, and obtain a final prediction result of the target by using the first candidate region. The first candidate region is, for example, an anchor region (anchor) matching the target in the one-stage detection or two-stage detection algorithm.
In the present application, the matching order is determined based on the size of the actual area in which each object is located. The matching sequence is determined based on the size of the actual region where each target is located, for example, the target to be preferentially matched is determined according to the size of the actual region where the target is located, or the number of first candidate regions matched with the targets with different sizes is determined according to the size of the actual region where the target is located.
In one embodiment, the matching order is: the smaller the size of the actual area, the earlier the target is matched. Therefore, by setting the smaller the size of the actual region, the earlier the matching is performed, it may be that the smaller size of the actual region can be preferentially matched, so that the smaller size of the actual region can be matched to the more suitable first candidate region, thereby being able to improve the recall rate in the training of the target detection model, particularly the recall rate of the small target, and contributing to the improvement of the target detection effect.
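A minimal sketch of this matching order, assuming each target record carries the size (for example, the volume) of its actual region under a hypothetical region_size key:

```python
def matching_order(targets: list) -> list:
    """Return targets sorted so smaller actual regions are matched earlier."""
    return sorted(targets, key=lambda t: t["region_size"])
```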
After obtaining at least one first candidate region for each target, a final prediction result for the target may be obtained based on the first candidate region. In one embodiment, the final prediction result includes a final prediction region of the predicted target.
The process of specifically obtaining the final prediction result of the target may be a specific process of a one-stage detection algorithm commonly used in the art, or a specific process of a two-stage detection algorithm, which is not described herein again.
Step S13: and adjusting parameters of the target detection model by using the final prediction result and the labeling result.
After the final prediction result is obtained, the corresponding loss value can be determined by using the loss function according to the difference between the final prediction result and the labeling result, and the parameters of the target detection model are adjusted according to the loss value.
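Assuming a PyTorch implementation, the parameter adjustment would take the usual gradient-descent form; this is a generic sketch rather than the application's exact procedure:

```python
import torch

def adjust_parameters(optimizer: torch.optim.Optimizer,
                      loss: torch.Tensor) -> None:
    """One parameter update of the target detection model from the loss value."""
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```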
Therefore, in the training process, at least one first candidate region is matched for each target according to the matching sequence, and the matching sequence is determined according to the size of the actual region where the target on the preset organ is located, so that the matching sequence can be adjusted according to the size of the actual region where the target is located, the matching sequence can be more suitable for the condition that the targets are different in size, the recall rate during training of the target detection model is improved, and the target detection effect is improved.
Referring to fig. 2, fig. 2 is a second flowchart illustrating a training method of a target detection model according to an embodiment of the present application. In this embodiment, the "matching at least one first candidate region for each target in the matching order by using the target detection model" mentioned in the above steps specifically includes steps S121 to S123.
Step S121: and dividing the targets into different target groups based on the size of the actual area where the targets are located.
In this embodiment, the size range corresponding to each target group is different. That is, the size ranges to which the sizes of the actual regions where the targets belonging to different target groups belong are different from each other. Thus, the targets can be divided into groups based on the size of the actual area where the targets are located.
For example, for nodules in the lung, each nodule may be classified into a different target group according to the size of the actual region of the nodule. Specifically, the nodules may first be divided into different classes by size: for example, nodules smaller than 6 mm are small nodules, nodules from 6 mm to 12 mm are medium nodules, and nodules larger than 12 mm are large nodules. Then, based on the sizes of their actual regions, the small nodules are grouped together, the medium nodules are grouped together, and the large nodules are grouped together.
For another example, each target may be directly divided into different target groups according to the size of the actual region where the target is located. In the specific grouping, if the sample medical image is a two-dimensional image, the size of the actual region where the target is located may be measured as an area, and if the sample medical image is a three-dimensional image, it may be measured as a volume. Then, targets whose actual region size falls within a first preset range are divided into one group, targets whose actual region size falls within a second preset range are divided into another group, and so on. A sketch of such grouping appears below.
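A minimal sketch of the grouping described above for lung nodules, using the example thresholds from the preceding paragraphs; the diameter_mm field name is a hypothetical label:

```python
def group_targets(nodules: list) -> list:
    """Divide nodules into small/medium/large groups by diameter in mm,
    returned in matching order: smaller size range first."""
    small, medium, large = [], [], []
    for nodule in nodules:
        d = nodule["diameter_mm"]
        if d < 6:
            small.append(nodule)
        elif d <= 12:
            medium.append(nodule)
        else:
            large.append(nodule)
    return [small, medium, large]
```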
Step S122: and determining the matching sequence corresponding to the different target groups based on the size ranges of the different target groups.
By dividing each target into different target groups, the targets are divided in groups based on the size of the actual area where the targets are located. On the basis, the matching sequence corresponding to different target groups is determined based on the size range of the different target groups, so that the matching sequence can be determined based on the size of the actual area where each target is located.
For example, suppose there are 4 target groups in total: the first target group has a size range of less than 6 mm, the second target group has a size range of 6 mm to 10 mm, the third target group has a size range of greater than 10 mm to 15 mm, and the fourth target group has a size range of greater than 15 mm. Then, according to the size range of each target group, the matching sequence can be determined as follows: the first target group is matched first, the second target group second, the third target group third, and the fourth target group fourth.
Step S123: and respectively matching at least one first candidate region for the targets in each target group according to the matching sequence.
After the matching sequence is determined, at least one first candidate region can be matched for the targets in each target group according to the matching sequence. Specifically, when at least one first candidate region is matched for the targets of each target group, the matching may proceed in order of the sizes of the actual regions where the targets in the target group are located, for example from small to large, or from large to small.
Therefore, by dividing the targets into groups based on the size of the actual region where each target is located, and determining the matching sequence of the different target groups based on their size ranges, the matching sequence is in effect determined based on the size of the actual region where each target is located.
In one embodiment, when the step "match at least one first candidate region for the targets in each target group respectively according to the matching order" is performed, for each target group, the following matching step S1231 and step S1232 (not shown) may be specifically performed on each target group according to the matching order.
Step S1231: and acquiring the matching degree between each target in the target group and different anchor point areas of the sample medical image.
The anchor region may be a default region generated in the sample medical image. For example, when the sample medical image is a two-dimensional image, 4 anchor regions of 4 × 4, 8 × 8, 16 × 16, and 32 × 32 may be generated at each pixel point of the sample medical image. As another example, when the sample medical image is a three-dimensional image, 4 anchor regions of 4 × 4 × 4, 8 × 8 × 8, 16 × 16 × 16, and 32 × 32 × 32 may be generated at each voxel of the sample medical image. It is understood that the size of the anchor region can be set as required, and is not limited herein.
After the anchor point regions are obtained, the matching degree between each anchor point region and the target can be obtained by utilizing the anchor point regions, and further the matching degree between each target in the target group and different anchor point regions of the sample medical image is obtained.
In one embodiment, the degree of matching between the target and the anchor region is the degree of overlap between the target and the anchor region. For example, the Intersection over Union (IoU) between the actual region where the target is located and the anchor region may be used as the degree of overlap between the target and the anchor region. As another example, the proportion of the overlap between the actual region where the target is located and the anchor region relative to the anchor region may be used directly as the degree of overlap. Thus, the degree of overlap between the target and the anchor region serves as the degree of matching between them, so that the matching degree is determined according to the degree of overlap.
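By way of illustration, a minimal sketch of computing such a degree of overlap for axis-aligned three-dimensional boxes; the (z1, y1, x1, z2, y2, x2) coordinate convention is an assumption:

```python
import numpy as np

def iou_3d(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (z1, y1, x1, z2, y2, x2)."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return float(inter / (vol_a + vol_b - inter))
```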
Step S1232: based on the matching degree, at least one anchor point area is selected for each target of the target group to serve as a first candidate area of the target.
Generally speaking, the higher the matching degree, the better the anchor region reflects the real situation of the actual region where the target is located. Thus, at least one anchor region may be selected for each target of the target group as a first candidate region of the target based on the matching degree. For example, for each target of the target group, the several anchor regions with the highest matching degree may be selected as its first candidate regions. For instance, if 1000 anchor regions are generated on the sample medical image, then by calculating the matching degree between each of the 1000 anchor regions and a given target, the 6 anchor regions with the highest matching degree can be selected as the first candidate regions of that target.
In one embodiment, anchor regions that are not yet first candidate regions may be selected, from the plurality of anchor regions generated for the sample medical image, as the anchor regions to be matched, and the matching degree between each target in the target group and each anchor region to be matched may then be obtained. For example, if, among 1000 anchor regions generated on the sample medical image, 50 anchor regions have already been used as first candidate regions for some targets, then when selecting first candidate regions for other targets, several anchor regions may be selected from the remaining 950 anchor regions. Therefore, by selecting anchor regions that have not been used as first candidate regions as the anchor regions to be matched and obtaining the matching degree between each target in the target group and each anchor region to be matched, more anchor regions can be selected as first candidate regions.
Therefore, by acquiring the matching degree between each target in the target group and different anchor point regions of the sample medical image, at least one anchor point region can be selected for each target in the target group as a first candidate region of the target based on the matching degree, and the determination of the first candidate region for each target in the target group is realized.
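As a minimal illustration of steps S1231 and S1232 combined, the following sketch selects, for each target, the k unused anchor regions with the highest matching degree. It builds on the iou_3d sketch above; the list-based scoring and the per-group value of k (see the next paragraph) are illustrative assumptions:

```python
def select_first_candidates(target_box, anchor_boxes, k: int, taken: set) -> list:
    """Pick the k highest-matching anchors not already used as first candidate regions."""
    scored = [
        (iou_3d(target_box, anchor), idx)
        for idx, anchor in enumerate(anchor_boxes)
        if idx not in taken
    ]
    scored.sort(reverse=True)                  # highest matching degree first
    chosen = [idx for _, idx in scored[:k]]
    taken.update(chosen)                       # exclude from later matching
    return chosen
```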
In one embodiment, the number of first candidate regions of the targets in different target groups is different, and the smaller the size range, the greater the number of first candidate regions of the targets in the target group. For example, suppose there are 4 target groups in total: the first target group has a size range of less than 6 mm, the second of 6 mm to 10 mm, the third of greater than 10 mm to 15 mm, and the fourth of greater than 15 mm. It may then be determined that the number of first candidate regions of the targets of the first target group is at most 6; that of the second target group is 4; that of the third target group is 3; and that of the fourth target group is 2. Therefore, by assigning a larger number of first candidate regions to the targets in target groups with smaller size ranges, more first candidate regions matched with smaller targets can be used to train the target detection model, which helps improve the sensitivity of the target detection model in detecting small targets and the accuracy of small-target detection in practical application.
In one embodiment, in addition to matching at least one first candidate region for each target, some anchor regions may be selected as second candidate regions, and these second candidate regions may be used as regions where no target exists to train the target detection model. For example, anchor regions whose matching degree with a target falls within a certain range may be selected as second candidate regions. When the matching degree between the target and the anchor region is the intersection ratio between the actual region where the target is located and the anchor region, anchor regions with an intersection ratio of 0.02-0.2 can be used as second candidate regions to train the target detection model, thereby balancing the numbers of positive and negative samples and improving the training effect of the target detection model.
In one embodiment, before the step "matching at least one first candidate region for the targets in each target group respectively according to the matching order", the following steps may be performed: generating a preset number of anchor point regions with different sizes for each position point of the sample medical image, wherein the sizes of the anchor point regions with the preset number are determined respectively based on the preset number of first feature maps with different scales of the sample medical image.
To obtain the preset number of first feature maps with different scales of the sample medical image, the feature extraction network of the target detection model can be used to extract features of the sample medical image. For example, the feature extraction network in a Feature Pyramid Network (FPN) or an SSD (Single Shot MultiBox Detector) model can be used to obtain the preset number of first feature maps with different scales. It is to be understood that the method of obtaining the preset number of first feature maps of different scales is not limited.
In one embodiment, anchor regions of one size may be generated on the sample medical image corresponding to each first feature map, so that a preset number of anchor regions of different sizes may be generated on the sample medical image. In one embodiment, larger anchor regions are determined based on smaller first feature maps. For example, if the generated first feature maps are 48 × 48, 24 × 24, 12 × 12 and 6 × 6, then anchor regions of 4 × 4 may be generated on the sample medical image based on the 48 × 48 first feature map, anchor regions of 8 × 8 based on the 24 × 24 first feature map, anchor regions of 16 × 16 based on the 12 × 12 first feature map, and anchor regions of 32 × 32 based on the 6 × 6 first feature map.
In a specific embodiment, when the feature information corresponding to a first candidate region needs to be acquired, the feature information on the first feature map corresponding to the first candidate region may be used as the feature information corresponding to the first candidate region. Since the sizes of the preset number of anchor regions are respectively determined based on the preset number of first feature maps with different scales of the sample medical image, the first feature map from which an anchor region was determined is the first feature map corresponding to that anchor region, so the first feature map corresponding to the size of a first candidate region can be determined accordingly. For example, if the size of the anchor region is 16 × 16, the first feature map corresponding to that size is the 12 × 12 one, and the feature information corresponding to the anchor region is the feature information of the region on the 12 × 12 first feature map that corresponds to the anchor region.
Thus, by determining the size of a preset number of anchor point regions based on a preset number of differently scaled first feature maps of the sample medical image, a plurality of anchor point regions may be generated.
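A minimal sketch of generating a preset number of anchor region sizes at every position point, using the example correspondence above between first-feature-map scales and anchor sizes; the image size and stride are assumptions:

```python
import itertools
import numpy as np

# Example mapping from first-feature-map scale to anchor size (from the text above).
SCALE_TO_ANCHOR_SIZE = {48: 4, 24: 8, 12: 16, 6: 32}

def generate_anchor_regions(image_size: int = 96, stride: int = 4) -> np.ndarray:
    """Generate one cubic anchor of each size, centred at every position point."""
    anchors = []
    centers = np.arange(stride / 2, image_size, stride)
    for size in SCALE_TO_ANCHOR_SIZE.values():
        half = size / 2
        for z, y, x in itertools.product(centers, repeat=3):
            anchors.append((z - half, y - half, x - half,
                            z + half, y + half, x + half))
    return np.asarray(anchors)
```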
In an embodiment, in an actual application process, when the target detection model is used to perform target detection on the target medical image, the anchor point region generated in the sample medical image may be directly used as the first candidate region to perform target detection.
Referring to fig. 3, fig. 3 is a first flowchart illustrating a training method of a target detection model according to another embodiment of the present application. In this embodiment, steps S21 to S24 are specifically included.
Step S21: a sample medical image containing a predetermined organ is acquired.
For a detailed description of this step, please refer to step S11, which is not described herein again.
Step S22: and acquiring a preset number of first feature maps with different scales of the sample medical image by using the target detection model.
In this embodiment, the preset number is greater than or equal to 1.
To obtain the preset number of first feature maps with different scales of the sample medical image using the target detection model, a feature extraction network of the target detection model may be used to perform feature extraction on the sample medical image. For example, the feature extraction network in a Feature Pyramid Network (FPN) or an SSD (Single Shot MultiBox Detector) model can be used to obtain the preset number of first feature maps with different scales. It is to be understood that the method of obtaining the preset number of first feature maps of different scales is not limited. In one embodiment, the bottom-up portion of the feature pyramid network is a Residual Network (ResNet), for example ResNet18.
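By way of illustration, a minimal two-dimensional sketch of such a feature extraction path with torchvision (torchvision's ResNet18 is 2D; a 3D variant would follow the same pattern), assuming torchvision 0.13 or later for the weights argument:

```python
from collections import OrderedDict

import torch
from torchvision.models import resnet18
from torchvision.ops import FeaturePyramidNetwork

backbone = resnet18(weights=None)   # bottom-up residual network
fpn = FeaturePyramidNetwork(in_channels_list=[64, 128, 256, 512], out_channels=64)

def first_feature_maps(x: torch.Tensor):
    """Return a preset number (here four) of first feature maps at different scales."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    feats = OrderedDict()
    for name in ("layer1", "layer2", "layer3", "layer4"):
        x = getattr(backbone, name)(x)
        feats[name] = x
    return fpn(feats)

# For a 192 x 192 input, the maps have spatial sizes 48, 24, 12 and 6,
# matching the scales used in the anchor example above.
maps = first_feature_maps(torch.randn(1, 3, 192, 192))
```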
Step S23: and respectively matching at least one first candidate region for each target according to the matching sequence by using a target detection model, and obtaining a final prediction result about the target based on the first candidate regions.
For a detailed description of this step, please refer to step S12, which is not described herein again.
In one embodiment, step S23 specifically includes: and predicting to obtain a final prediction result based on the first candidate region and the first feature map. For each first candidate region, a corresponding region on the first feature map may be determined, and feature information of the first candidate region on the first feature map may be determined accordingly, and then a final prediction result may be obtained based on the first candidate region and the feature information of the first candidate region on the first feature map.
In one embodiment, because a preset number of anchor regions are generated on the sample medical image, the first candidate region may be selected from a number of anchor regions of different sizes of the sample medical image, and first candidate regions of different sizes respectively correspond to first feature maps of different scales. For the specific determination method and the correspondence between first candidate regions and first feature maps of different scales, please refer to the related description of the above steps, which is not repeated here.
Therefore, by generating a preset number of first feature maps with different scales, the target detection model can be trained by using the first feature maps with different sizes, so that the detection effect of the target detection model on targets with different sizes can be improved.
In one embodiment, the step of "predicting to obtain a final prediction result based on the first candidate region and the first feature map" specifically includes step S231 and step S232 (not shown).
Step S231: for each first candidate region, feature information of the first candidate region is obtained based on a first feature map corresponding to the size of the first candidate region.
Please refer to the related description of the above steps, which will not be described herein, for a specific determination method of the first feature map corresponding to the size of the first candidate region and a specific method for obtaining the feature information of the first candidate region.
Step S232: and predicting to obtain a final prediction result about the target by using the characteristic information of the first candidate region.
After determining the feature information of the first candidate region, specifically, target detection may be performed by using a target detection algorithm to predict a final prediction result about the target. The target detection algorithm is, for example, a one-stage detection algorithm or a two-stage detection algorithm, which is not described herein again.
Therefore, the first feature map corresponding to the size of the first candidate region is determined, and the feature information of the first candidate region is obtained, so that the feature information of the first candidate region can be utilized to perform target detection to obtain a final prediction result.
In one embodiment, the step of "predicting to obtain a final prediction result about the target by using the feature information of the first candidate region" specifically includes step S2321 and step S2322 (not shown).
Step S2321: and adjusting the first candidate regions by using the characteristic information of the first candidate regions to obtain initial prediction results corresponding to the first candidate regions.
In the present embodiment, the initial prediction result corresponding to the first candidate region includes the initial prediction region of the target obtained by adjusting the first candidate region. The initial prediction region is, for example, the proposal obtained in a two-stage detection algorithm. That is, the target detection model may perform regression (adjustment) on the first candidate region based on the feature information of the first candidate region, so as to obtain the initial prediction result corresponding to the first candidate region.
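As an illustration of this regression (adjustment) step, the following hedged sketch decodes predicted offsets against a first candidate region to produce an initial prediction region; the (dx, dy, dw, dh) parameterization is the common two-stage detector convention and is an assumption, not a detail taken from this application.

```python
# Illustrative sketch: decode predicted offsets (dx, dy, dw, dh) against a
# candidate region (x1, y1, x2, y2) to obtain an initial prediction region.
import torch

def adjust_candidates(candidates: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    w = candidates[:, 2] - candidates[:, 0]
    h = candidates[:, 3] - candidates[:, 1]
    cx = candidates[:, 0] + 0.5 * w
    cy = candidates[:, 1] + 0.5 * h
    # Shift the center and rescale width/height by the predicted offsets.
    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * torch.exp(deltas[:, 2])
    new_h = h * torch.exp(deltas[:, 3])
    return torch.stack([new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
                        new_cx + 0.5 * new_w, new_cy + 0.5 * new_h], dim=1)

boxes = torch.tensor([[10., 10., 50., 50.]])
deltas = torch.tensor([[0.1, -0.1, 0.2, 0.0]])
print(adjust_candidates(boxes, deltas))
```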
In one embodiment, the target detection model may further perform regression on the second candidate region based on the feature information of the second candidate region, so as to obtain an initial prediction result corresponding to the second candidate region.
Step S2322: and performing optimization prediction by using the initial prediction results corresponding to the first candidate regions to obtain a final prediction result related to the target.
Performing optimization prediction by using the initial prediction result corresponding to each first candidate region, specifically, taking the initial prediction region of the initial prediction result as a region of interest (RoI), and performing prediction again, where the specific process may refer to a related process in a two-stage detection algorithm, and details are not described here.
Therefore, the first candidate regions are adjusted by using the feature information of the first candidate regions to obtain the initial prediction results corresponding to the first candidate regions, and optimized prediction is then performed using these initial prediction results to obtain the final prediction result about the target, so that the sensitivity of the target detection model in detecting the target, and hence the accuracy of target detection, can be improved.
Step S24: and adjusting parameters of the target detection model by using the final prediction result and the labeling result.
For detailed description of this step, please refer to step S13, which is not repeated herein.
Referring to fig. 4, fig. 4 is a second flowchart illustrating a training method of a target detection model according to another embodiment of the present application. In this embodiment, the step of "acquiring a preset number of first feature maps of different scales of the sample medical image" specifically includes step S221 and step S222.
Step S221: and performing feature extraction on the sample medical image to obtain a preset number of second feature maps with different scales.
The preset number of second feature maps of different scales may be obtained by performing feature extraction on the sample medical image, specifically by using the feature extraction network in a Feature Pyramid Network (FPN) or an SSD (Single Shot MultiBox Detector) model, which is not described herein again.
Step S222: and for each second feature map, performing preset attention processing on the second feature map to obtain a first feature map corresponding to the second feature map.
In this embodiment, the preset attention processing includes one or more of dimension attention processing and feature channel attention processing. The dimension attention processing is, for example, a coordinate attention process in an object detection algorithm, and the feature channel attention processing is, for example, a channel attention process in an object detection algorithm, which are not described in detail herein.
Therefore, by performing the preset attention processing on the second feature map, the target detection model is facilitated to extract more accurate feature information about the target, so that the accuracy and recall rate of target detection can be improved.
In a specific embodiment, the step S222 specifically includes a step S2221 and a step S2222 (not shown).
Step S2221: and obtaining the dimension weight corresponding to each dimension of the second feature map, and performing weighting processing on each dimension feature in the second feature map by using the dimension weight corresponding to each dimension to obtain the spatial focusing feature map.
In the case that the sample medical image is a two-dimensional image, dimension weights corresponding to the X and Y dimensions of the second feature map may be obtained, and the X- and Y-dimension features in the second feature map are then weighted with the corresponding dimension weights to obtain the spatial focusing feature map. In the case that the sample medical image is a three-dimensional image, dimension weights corresponding to the X, Y and Z dimensions of the second feature map may be obtained, and the X-, Y- and Z-dimension features in the second feature map are weighted accordingly to obtain the spatial focusing feature map. The dimension weight corresponding to each dimension may be obtained by processing the feature information of each dimension of the second feature map with coordinate attention.
In one embodiment, the dimension weight corresponding to each dimension of the second feature map may be obtained through the following steps 1 and 2 (not shown).
Step 1: and taking each dimension of the second feature map as a target dimension, and performing average pooling on the second feature map for the remaining dimensions except the target dimension to obtain a third feature map on the target dimension.
For example, if the sample medical image is a three-dimensional image, then the second feature map is also a three-dimensional feature map. At this time, the X, Y, Z dimensions of the second feature map may be respectively regarded as target dimensions. And taking the X dimension of the second feature map as a target dimension, and then performing average pooling on Y, Z dimensions to obtain a third feature map in the X dimension. Similarly, a third feature map in Y, Z dimension can be obtained in the same manner.
Step 2: And determining the dimension weight corresponding to each dimension of the second feature map by using the third feature maps on different target dimensions.
And after the third feature maps in all dimensions are obtained, determining the dimension weight corresponding to each dimension of the second feature map by using the third feature maps in all dimensions.
In a specific embodiment, the third feature maps in each dimension may be spliced, then a convolutional layer with batch normalization and nonlinear activation is used for processing, then the output after convolutional layer processing is subdivided into feature maps in each dimension, and then a dimension weight corresponding to each dimension of the second feature map is obtained after one layer of convolution and activation. For example, after obtaining the X, Y, Z-dimensional third feature map, the X, Y, Z-dimensional third feature map may be spliced, then processed by using a convolutional layer with batch normalization and nonlinear activation, and then the output of the convolutional layer after processing is subdivided into X, Y, Z-dimensional feature maps, and then after one-layer convolution and activation, the dimension weight corresponding to X, Y, Z-dimensional feature of the second feature map is obtained.
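The following is a minimal 2D PyTorch sketch of steps 1 and 2 above, assuming a coordinate-attention-style implementation; the channel counts and reduction ratio are illustrative assumptions, and a three-dimensional image would add a Z branch in the same way.

```python
# Illustrative 2D sketch of dimension (coordinate) attention: pool over the
# remaining dimension, concatenate, run a shared conv with batch norm and a
# nonlinearity, split back per dimension, then a final conv with sigmoid
# yields the dimension weights. All sizes are assumed values.
import torch
import torch.nn as nn

class DimensionAttention2D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 4)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.wx = nn.Conv2d(mid, channels, 1)
        self.wy = nn.Conv2d(mid, channels, 1)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        # Third feature maps: average-pool over the non-target dimension.
        fx = feat.mean(dim=2, keepdim=True)        # pool over Y -> (B, C, 1, W)
        fy = feat.mean(dim=3, keepdim=True)        # pool over X -> (B, C, H, 1)
        cat = torch.cat([fx, fy.transpose(2, 3)], dim=3)    # (B, C, 1, W+H)
        cat = self.shared(cat)
        gx, gy = cat[..., :w], cat[..., w:].transpose(2, 3)
        ax = torch.sigmoid(self.wx(gx))            # dimension weight along X
        ay = torch.sigmoid(self.wy(gy))            # dimension weight along Y
        return feat * ax * ay                      # spatial focusing feature map

out = DimensionAttention2D(64)(torch.randn(2, 64, 32, 40))
print(out.shape)  # torch.Size([2, 64, 32, 40])
```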
In an embodiment, the second feature map may be cut into a plurality of parts in the channel dimension, then the processing of step 1 and step 2 is performed on each part, the dimension weight corresponding to each dimension of the second feature map of each part is obtained, and then the results of each part are combined, so that the dimension weight corresponding to each dimension of the complete second feature map can be obtained. By dividing the second profile into several parts in the channel dimension, the amount of data per process can be reduced.
Therefore, the dimensions of the second feature map are respectively used as target dimensions, the second feature map is subjected to average pooling of the remaining dimensions except the target dimensions to obtain a third feature map on the target dimensions, and the third feature maps on different target dimensions are utilized, so that the dimension weight corresponding to each dimension of the second feature map can be determined, and the target detection model is facilitated to extract more accurate feature information about the target.
Step S2222: dividing the characteristics of different channels in the spatial focusing characteristic diagram into a plurality of channel characteristic groups, acquiring the channel weight corresponding to each channel characteristic group, and performing weighting processing on the plurality of channel characteristic groups by using the channel weight to obtain a first characteristic diagram obtained by the characteristic channel attention processing.
The features of different channels in the spatial focusing feature map are divided into a plurality of channel feature groups, for example, 256-dimensional channel features in the spatial focusing feature map are divided into four channel feature groups, and each channel feature group is 64-dimensional channel features.
In a specific embodiment, the channel weight corresponding to each channel feature group may be obtained by performing a cosine transform on each channel feature group, that is, by performing frequency channel attention processing on each channel feature group. Specifically, the features of the spatial focusing feature map may be divided into a plurality of equal parts in the channel dimension, and each part may be multiplied by a cosine series to obtain the frequency after the cosine transform. The frequencies are then combined in the channel dimension and passed through a fully connected layer with sigmoid activation to obtain the channel weights, as sketched below. Therefore, by performing a cosine transform on each channel feature group, the utilization of the feature information can be improved, which is beneficial to improving the accuracy of target detection.
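A minimal sketch of this cosine-transform-based channel weighting follows, under the assumption of a DCT-style basis per channel feature group; the frequency indices, group count and layer sizes are illustrative assumptions.

```python
# Illustrative sketch: split the spatial focusing feature map into channel
# feature groups, multiply each group by a cosine (DCT) basis and sum over
# space to get one frequency response per channel, then a fully connected
# layer with sigmoid turns the responses into channel weights.
import math
import torch
import torch.nn as nn

def dct_basis(h, w, u, v):
    ys = torch.cos(math.pi * u * (torch.arange(h) + 0.5) / h)
    xs = torch.cos(math.pi * v * (torch.arange(w) + 0.5) / w)
    return ys[:, None] * xs[None, :]                      # (H, W)

class FrequencyChannelAttention(nn.Module):
    def __init__(self, channels, h, w, freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
        super().__init__()
        self.groups = len(freqs)
        bases = torch.stack([dct_basis(h, w, u, v) for u, v in freqs])
        self.register_buffer("bases", bases)              # (G, H, W)
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feat):                              # (B, C, H, W)
        b, c, h, w = feat.shape
        grouped = feat.view(b, self.groups, c // self.groups, h, w)
        # Cosine transform: weight each group by its basis, sum over space.
        resp = (grouped * self.bases[None, :, None]).sum(dim=(3, 4))
        weights = self.fc(resp.reshape(b, c))             # channel weights
        return feat * weights.view(b, c, 1, 1)

# 256 channels split into four groups of 64, matching the example above.
out = FrequencyChannelAttention(256, 16, 16)(torch.randn(2, 256, 16, 16))
print(out.shape)
```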
Therefore, the second feature map is used for obtaining the spatial focusing feature map, and the spatial focusing feature map is further used for obtaining the first feature map obtained through feature channel attention processing, so that the target detection model is facilitated to extract more accurate feature information about the target in the spatial dimension and the channel dimension.
Referring to fig. 5, fig. 5 is a third flowchart illustrating a training method of a target detection model according to another embodiment of the present application. In this embodiment, the final prediction result further includes a final confidence of the category to which the final prediction region belongs. The step of "adjusting the parameters of the target detection model by using the final prediction result and the labeling result" specifically includes steps S241 to S243.
Step S241: based on the final confidence, a first category loss is obtained.
In target detection, a target may be matched with a plurality of first candidate regions, and a plurality of final prediction results can accordingly be obtained based on these first candidate regions. Thus, one target may correspond to multiple final prediction results. In this case, the optimal final prediction result can be selected from the several final prediction results based on the final confidence, so as to realize detection of the target. For example, Non-Maximum Suppression (NMS) may be performed on the final confidences of the final prediction results to obtain the optimal final prediction result, and the classification score of the optimal final prediction result can be determined accordingly.
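For illustration, a short sketch of this NMS-based selection using torchvision's nms utility; the boxes, scores and the IoU threshold of 0.5 are assumed example values.

```python
# Illustrative sketch: non-maximum suppression keeps, per target, the final
# prediction with the highest final confidence among overlapping predictions.
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 11., 52., 49.],    # overlaps the first box
                      [80., 80., 120., 120.]])
scores = torch.tensor([0.9, 0.6, 0.8])         # final confidences
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)                                    # tensor([0, 2]): optimal results
```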
Then, a first category loss may be calculated based on the classification scores of the corresponding optimal final predictors for the several targets and the classification scores of the corresponding final predictors for the second candidate region. For example, a first class loss may be obtained using the Focal loss function.
In one embodiment, the first category loss may be calculated using the following focal loss equation (1):

FL(y') = -α·(1 - y')^γ·log(y'), if y = 1;  FL(y') = -(1 - α)·y'^γ·log(1 - y'), if y = 0    (1)

where y represents the true classification information of the obtained optimal final prediction result, y = 1 represents that the optimal final prediction result is the target, y = 0 represents that the optimal final prediction result is the background, y' represents the classification score of the optimal final prediction result, γ is the loss weight, and α is the adjustment weight.
In one embodiment, the labels {0, 1} of the real classification information can be softened to {0.1, 0.9}, so that the generalization performance of the target detection model can be enhanced.
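A hedged sketch of equation (1) combined with the label softening above; the gamma and alpha values are assumed defaults, not values specified by the application.

```python
# Illustrative sketch: focal loss with hard labels {0, 1} softened to
# {0.1, 0.9} before the loss is computed.
import torch

def focal_loss(score, y, gamma=2.0, alpha=0.25, soften=True):
    # score: predicted classification score y' in (0, 1); y: hard label 0 or 1.
    t = y * 0.8 + 0.1 if soften else y           # soften [0, 1] -> [0.1, 0.9]
    loss_pos = -alpha * (1 - score) ** gamma * torch.log(score)
    loss_neg = -(1 - alpha) * score ** gamma * torch.log(1 - score)
    return t * loss_pos + (1 - t) * loss_neg

score = torch.tensor([0.8, 0.2])
y = torch.tensor([1.0, 0.0])
print(focal_loss(score, y))
```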
Step S242: And obtaining a first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio of the final prediction region and the actual region.
The final prediction region of this step may be the final prediction region of the optimal final prediction result of one target determined in step S241.
Determining the offset between the final prediction region and the actual region may be a common method in the art, for example, using a smooth-L1 loss function to determine the offset between the final prediction region and the actual region. The method of determining the intersection ratio of the final prediction region and the actual region may be a general calculation method, and will not be described herein.
Specifically, the intersection ratio of the final prediction region and the actual region may be used as an adjustment weight to weight the offset between the final prediction region and the actual region; alternatively, a loss value may first be calculated from the intersection ratio of the final prediction region and the actual region, and then weighted and summed with the offset between the final prediction region and the actual region, so as to obtain the first regression loss.
Step S243: and adjusting parameters of the target detection model by utilizing the first class loss and the first regression loss.
After obtaining the first class loss and the first regression loss, respectively, a final loss may be determined based on the two losses, for example, by means of weighted summation or the like, so as to obtain the final loss. And adjusting parameters of the target detection model based on the final loss.
Therefore, by obtaining the first class loss and the first regression loss, the training of the target detection model can be realized based on the classification loss of the target detection model and the regression loss of the prediction result.
In one embodiment, the final prediction result is obtained by predicting with the target detection model using the first candidate region to obtain an initial prediction result and performing optimized prediction on the initial prediction result. The initial prediction result includes an initial prediction region of the target and an initial confidence of the category to which the initial prediction region belongs. In a particular embodiment, the initial prediction result further comprises an initial prediction result corresponding to the second candidate region. Correspondingly, the step "adjusting the parameters of the target detection model by using the final prediction result and the labeling result" further includes step S244 and step S245 (not shown).
Step S244: based on the initial confidence, a second category loss is derived.
Step S245: and obtaining a second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio of the initial prediction region and the actual region.
For a detailed description of step S244 and step S245, please refer to step S241 and step S242, which are not described herein again.
In this case, the step of "adjusting the parameters of the target detection model using the first class loss and the first regression loss" specifically includes: and adjusting parameters of the target detection model by utilizing the first category loss, the first regression loss, the second category loss and the second regression loss. For example, a first loss may be obtained by using a first class loss and a first regression loss, a second loss may be obtained by using a second class loss and a second regression loss, a final loss value may be obtained based on the first loss and the second loss, and a parameter of the target detection model may be adjusted according to the final loss value.
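As a sketch of this weighted combination, assuming scalar loss tensors and illustrative weights:

```python
# Illustrative sketch only: the first loss combines the first class and first
# regression losses (final prediction stage), the second loss combines the
# second class and second regression losses (initial prediction stage), and
# the final loss is their weighted sum. All weights are assumed values.
import torch

def final_loss(first_cls: torch.Tensor, first_reg: torch.Tensor,
               second_cls: torch.Tensor, second_reg: torch.Tensor,
               w_first: float = 1.0, w_second: float = 1.0) -> torch.Tensor:
    first = first_cls + first_reg        # loss of the final prediction stage
    second = second_cls + second_reg     # loss of the initial prediction stage
    return w_first * first + w_second * second

print(final_loss(torch.tensor(0.3), torch.tensor(0.5),
                 torch.tensor(0.2), torch.tensor(0.4)))  # tensor(1.4000)
```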
Therefore, by obtaining the first class loss, the first regression loss, the second class loss and the second regression loss, the parameters of the target detection model can be adjusted, so as to train the target detection model.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating a training method of a target detection model according to another embodiment of the present application. In this embodiment, obtaining the first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio of the final prediction region and the actual region, or obtaining the second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio of the initial prediction region and the actual region, includes steps S31 to S33.
Step S31: obtaining a first offset loss of the corresponding prediction region by using the offset corresponding to the corresponding prediction region; and obtaining the loss weight of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region, and multiplying the first offset loss of the corresponding prediction region by using the loss weight of the corresponding prediction region to obtain the second offset loss of the corresponding prediction region.
In the present embodiment, the larger the cross-over ratio, the smaller the loss weight. Therefore, the result with lower coincidence degree between the final prediction region corresponding to the target and the actual region where the target is located can be punished greatly, so that the parameter updating strength of the target detection model during optimized positioning is higher, and the accuracy of target detection is improved.
In this embodiment, the corresponding prediction region may be an initial prediction region corresponding to the target in the initial prediction result, or a final prediction region corresponding to the target in the final prediction result. The intersection ratio corresponding to the corresponding prediction region may be an intersection ratio of the initial prediction region and the actual region, and an intersection ratio of the final prediction region and the actual region.
In one embodiment, the second offset loss may be determined by the following equations (2) and (3):

W_iou = e^(-iou) + 0.4    (2)

L_offset2 = W_iou · smooth-L1(x)    (3)

where iou is the intersection ratio corresponding to the corresponding prediction region, W_iou is the loss weight of the corresponding prediction region, and x is the offset corresponding to the corresponding prediction region; that is, equation (3) is a smooth-L1 loss function weighted by the loss weight W_iou.
The second offset loss of the initial prediction region and the final prediction region can be obtained by the above equations (2) and (3), respectively.
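A minimal sketch of equations (2) and (3), assuming per-region offset vectors; the function and variable names are illustrative.

```python
# Illustrative sketch: the smooth-L1 offset loss of a prediction region is
# weighted by W_iou = e^(-iou) + 0.4, so predictions that overlap the actual
# region less are penalized more.
import torch
import torch.nn.functional as F

def second_offset_loss(pred_offsets, target_offsets, iou):
    w_iou = torch.exp(-iou) + 0.4                        # equation (2)
    first_offset = F.smooth_l1_loss(pred_offsets, target_offsets,
                                    reduction="none").sum(dim=1)
    return w_iou * first_offset                          # equation (3)

pred = torch.tensor([[0.1, 0.2, -0.1, 0.0]])
target = torch.tensor([[0.0, 0.1, 0.0, 0.0]])
print(second_offset_loss(pred, target, iou=torch.tensor([0.6])))
```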
Step S32: and obtaining the GIOU loss of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region.
In this embodiment, the GIOU (Generalized Intersection over Union) loss can be obtained based on the following formula (4):

L_GIOU = 1 - |A ∩ B| / |A ∪ B| + |C \ (A ∪ B)| / |C|    (4)

where A is the corresponding prediction region, B is the actual region, and C is the minimum closed region of the corresponding prediction region and the actual region.
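A hedged sketch of formula (4) for axis-aligned 2D boxes; box coordinates are assumed to be (x1, y1, x2, y2).

```python
# Illustrative sketch of the GIOU loss: 1 - IoU plus the fraction of the
# minimum enclosing region C not covered by the union of A and B.
import torch

def giou_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a, b: (N, 4) boxes as (x1, y1, x2, y2).
    inter_w = (torch.min(a[:, 2], b[:, 2]) - torch.max(a[:, 0], b[:, 0])).clamp(min=0)
    inter_h = (torch.min(a[:, 3], b[:, 3]) - torch.max(a[:, 1], b[:, 1])).clamp(min=0)
    inter = inter_w * inter_h
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a + area_b - inter
    iou = inter / union
    # Minimum closed (enclosing) region C of the two boxes.
    c_w = torch.max(a[:, 2], b[:, 2]) - torch.min(a[:, 0], b[:, 0])
    c_h = torch.max(a[:, 3], b[:, 3]) - torch.min(a[:, 1], b[:, 1])
    c_area = c_w * c_h
    return 1 - iou + (c_area - union) / c_area

print(giou_loss(torch.tensor([[0., 0., 2., 2.]]),
                torch.tensor([[1., 1., 3., 3.]])))
```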
Step S33: and obtaining the corresponding regression loss by using the second offset loss and the GIOU loss.
Specifically, the corresponding regression loss may be obtained by performing weighted addition on the second offset loss and the GIOU loss. For example, a first regression loss or a second regression loss may be obtained, respectively.
Therefore, the corresponding regression loss is obtained by utilizing the second offset loss and the GIOU loss, and the prediction region of the trained target detection model can be positioned more accurately.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a target detection model in the training method of the target detection model of the present application. In the present embodiment, the object detection model 10 includes a feature extraction module 11, an attention module 12, and a detection module 13. The following describes a training method of the target detection model 10 in brief with reference to a specific structure of the target detection model 10.
The feature extraction module 11 is, for example, a feature map pyramid network, a feature extraction network in an SSD model, or the like. The feature extraction module 11 can perform feature extraction on the input sample medical image to obtain a preset number of second feature maps with different scales. In this embodiment, the sample medical image is labeled with a labeling result of at least one target located on a predetermined organ, and the labeling result includes an actual region where the target is located.
The attention module 12 may perform a predetermined attention process on each of the second feature maps. The preset attention processing includes one or more of dimension attention processing and feature channel attention processing, so that the first feature map can be obtained. In this embodiment, the attention module 12 includes a spatial attention submodule 121 and a feature attention submodule 122, where the spatial attention submodule 121 may perform dimensional attention processing on the second feature map to obtain a spatial focusing feature map, and the feature attention submodule 122 may perform feature channel attention processing on the spatial focusing feature map to obtain a first feature map.
The detection module 13 is capable of matching at least one first candidate region for each target in the matching order, and obtaining a final prediction result about the target based on the first candidate regions. In the present embodiment, the detection module 13 includes an initial prediction sub-module 131 and a final prediction sub-module 132. The initial prediction sub-module 131 can adjust the first candidate regions by using the feature information of the first candidate regions to obtain the initial prediction results corresponding to the first candidate regions. Then, the final prediction sub-module 132 can perform optimized prediction by using the initial prediction result corresponding to each first candidate region, so as to obtain the final prediction result about the target. For example, the final prediction sub-module 132 performs optimized prediction on the initial prediction result by performing RoI pooling with its RoI pooling layer, then applying two fully connected layers with nonlinear activation, and then performing detection, so as to obtain the final prediction result about the target.
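For illustration, a minimal PyTorch sketch of such a final prediction sub-module, using torchvision's roi_pool followed by two fully connected layers with nonlinear activation and separate classification and regression heads; all layer sizes are assumptions for demonstration.

```python
# Illustrative sketch: RoI pooling over the initial prediction regions, two
# fully connected layers with nonlinear activation, then classification and
# regression heads producing the final prediction.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class FinalPredictionHead(nn.Module):
    def __init__(self, channels=64, pool=7, num_classes=2):
        super().__init__()
        self.pool = pool
        self.fc = nn.Sequential(
            nn.Linear(channels * pool * pool, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU())
        self.cls = nn.Linear(256, num_classes)   # final confidence logits
        self.reg = nn.Linear(256, 4)             # final region offsets

    def forward(self, feature_map, rois):
        # rois: list with one (K, 4) tensor of initial prediction regions.
        x = roi_pool(feature_map, rois, output_size=self.pool, spatial_scale=1.0)
        x = self.fc(x.flatten(1))
        return self.cls(x), self.reg(x)

head = FinalPredictionHead()
feat = torch.randn(1, 64, 32, 32)
rois = [torch.tensor([[4., 4., 20., 20.]])]
print([t.shape for t in head(feat, rois)])
```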
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating an embodiment of a target detection method according to the present application. In this embodiment, the target detection method specifically includes:
step S41: a target medical image containing a predetermined organ is acquired.
For obtaining the target medical image including the predetermined organ, please refer to step S11, which is not repeated herein.
Step S42: obtaining a first feature map of the target medical image by using a target detection model, determining at least one first candidate region of the target, and obtaining a final prediction result about the target based on the first candidate region and the first feature map.
In this embodiment, the target detection model is obtained by training using the above-mentioned training method for the target detection model.
In this embodiment, when the target detection is performed by using the target detection model, the first candidate region may be an anchor point region directly generated on the target medical image. In addition, the final prediction result about the target is obtained based on the first candidate region and the first feature map, and specifically, the final prediction result about the target may be obtained by performing detection according to feature information of the first candidate region on the first feature map. Please refer to the related description of the above embodiment, which will not be repeated herein.
Step S43: and performing optimization prediction by using the initial prediction results corresponding to the first candidate regions to obtain a final prediction result related to the target.
For detailed description of this step, please refer to step S2322 and other related descriptions, which are not repeated herein.
Therefore, the target detection is performed by using the target detection model obtained by training through the training method of the target detection model, so that the accuracy and the recall rate of the target detection can be improved.
In a specific embodiment, the first feature map is obtained by performing preset attention processing on a second feature map obtained by feature extraction of the target medical image, and the preset attention processing includes one or more of dimension attention processing and feature channel attention processing. Please refer to the related description of the above embodiments, which will not be described herein again. Therefore, by performing the preset attention processing on the second feature map, the target detection model is facilitated to extract more accurate feature information about the target, so that the accuracy and recall rate of target detection can be improved.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a training apparatus for an object detection model according to the present application. The training device 90 comprises an acquisition module 91, a detection module 92 and an adjustment module 96. The acquiring module 91 is configured to acquire a sample medical image including a preset organ, where the sample medical image is marked with a marking result of at least one target located on the preset organ, and the marking result includes an actual region where the target is located; the detection module 92 is configured to match at least one first candidate region for each target according to a matching sequence by using the target detection model, and obtain a final prediction result about the target based on the first candidate region, where the matching sequence is determined based on the size of an actual region where each target is located; the adjusting module 96 is configured to adjust parameters of the target detection model by using the final prediction result and the labeling result.
Wherein, the matching sequence is as follows: the smaller the size of the actual area, the earlier the target is matched.
The detecting module 92 is configured to match at least one first candidate region for each target according to a matching sequence by using a target detection model, and includes: dividing the targets into different target groups based on the size of the actual area where the targets are located, wherein the size range corresponding to each target group is different; determining the matching sequence corresponding to different target groups based on the size ranges of the different target groups; and respectively matching at least one first candidate region for the targets in each target group according to the matching sequence.
The detecting module 92 is configured to match at least one first candidate region for the targets in each target group according to the matching order, and includes: and performing the following matching steps on each target group according to the matching sequence: obtaining the matching degree between each target in the target group and different anchor point areas of the sample medical image; based on the matching degree, at least one anchor point area is selected for each target of the target group to serve as a first candidate area of the target.
Wherein, the number of the first candidate regions of the targets in the different target groups is different, and the smaller the size range is, the larger the number of the first candidate regions of the targets in the target group is; and/or the matching degree between the target and the anchor point region is the contact ratio between the target and the anchor point region.
The detecting module 92 is configured to obtain matching degrees between each target in the target group and different anchor point regions of the sample medical image, and includes: selecting an anchor point region which is not taken as a first candidate region from a plurality of anchor point regions generated for the sample medical image as an anchor point region to be matched, and acquiring the matching degree between each target in the target group and each anchor point region to be matched; and/or before the detection module 92 is configured to match at least one first candidate region for the targets in each target group respectively according to the matching order, the detection module 92 is further configured to generate a preset number of anchor point regions with different sizes for each position point of the sample medical image, where the sizes of the preset number of anchor point regions are determined based on the preset number of different-scale first feature maps of the sample medical image respectively.
Before the detection module 92 is configured to match at least one first candidate region for each target according to the matching sequence by using the target detection model and obtain a prediction result about the target based on the first candidate region, the detection module 92 is further configured to obtain a preset number of first feature maps with different scales of the sample medical image by using the target detection model, where the preset number is greater than or equal to 1; the detection module 92 is configured to obtain a final prediction result about the target based on the first candidate region, and includes: and predicting to obtain a final prediction result based on the first candidate region and the first feature map.
The detection module 92 is configured to obtain a preset number of first feature maps with different scales from a sample medical image, and includes: carrying out feature extraction on the sample medical image to obtain a preset number of second feature maps with different scales; and for each second feature map, performing preset attention processing on the second feature map to obtain a first feature map corresponding to the second feature map, wherein the preset attention processing comprises one or more of dimension attention processing and feature channel attention processing.
The detecting module 92 is configured to perform a preset attention process on the second feature map to obtain a first feature map corresponding to the second feature map, and includes: obtaining dimension weights corresponding to all dimensions of the second feature map, and performing weighting processing on all dimension features in the second feature map by using the dimension weights corresponding to all dimensions to obtain a spatial focusing feature map; dividing the characteristics of different channels in the spatial focusing characteristic diagram into a plurality of channel characteristic groups, acquiring the channel weight corresponding to each channel characteristic group, and performing weighting processing on the plurality of channel characteristic groups by using the channel weight to obtain a first characteristic diagram obtained by the characteristic channel attention processing.
The detection module 92 is configured to obtain a dimension weight corresponding to each dimension of the second feature map, and includes: taking each dimension of the second feature map as a target dimension, and performing average pooling on the second feature map for the remaining dimensions except the target dimension to obtain a third feature map on the target dimension; determining the dimension weight corresponding to each dimension of the second feature map by using the third feature maps on different target dimensions; and/or, acquiring the channel weight corresponding to each channel feature group, including: and performing cosine transform on each channel feature group to obtain channel weight corresponding to each channel feature group.
The at least one first candidate region is selected from a plurality of anchor point regions of different sizes of the sample medical image, and the first candidate regions of different sizes respectively correspond to first feature maps of different scales; the detection module 92 is configured to predict a final prediction result based on the first candidate region and the first feature map, and includes: for each first candidate region, obtaining feature information of the first candidate region based on the first feature map corresponding to the size of the first candidate region; and predicting to obtain a final prediction result about the target by using the feature information of the first candidate region.
The detecting module 92 is configured to predict a final prediction result regarding the target by using the feature information of the first candidate region, and includes: adjusting the first candidate regions by utilizing the characteristic information of the first candidate regions to obtain initial prediction results corresponding to the first candidate regions, wherein the initial prediction results corresponding to the first candidate regions comprise the initial prediction regions of the target obtained by adjustment based on the first candidate regions; and performing optimization prediction by using the initial prediction results corresponding to the first candidate regions to obtain a final prediction result related to the target.
Wherein, the final prediction result further includes a final confidence of the category to which the final prediction region belongs; the adjusting module 96 is configured to adjust parameters of the target detection model by using the final prediction result and the labeling result, and includes: obtaining a first category loss based on the final confidence; obtaining a first return loss based on the offset between the final prediction region and the actual region and the intersection ratio of the final prediction region and the actual region; and adjusting parameters of the target detection model by utilizing the first class loss and the first regression loss.
The final prediction result is obtained by predicting the target detection model by using the first candidate region to obtain an initial prediction result and performing optimization prediction on the initial prediction result, and the initial prediction result comprises an initial prediction region of the target and an initial confidence coefficient of a category to which the initial prediction region belongs; the adjusting module 96 is configured to adjust parameters of the target detection model by using the final prediction result and the labeling result, and further includes: obtaining a second category loss based on the initial confidence; obtaining a second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio of the initial prediction region and the actual region; the adjusting module 96 is configured to adjust parameters of the target detection model by using the first class loss and the first regression loss, and includes: and adjusting parameters of the target detection model by utilizing the first category loss, the first regression loss, the second category loss and the second regression loss.
The adjusting module 96 is configured to obtain a first regression loss based on an offset between the final prediction region and the actual region and an intersection ratio between the final prediction region and the actual region, or obtain a second regression loss based on an offset between the initial prediction region and the actual region and an intersection ratio between the initial prediction region and the actual region, and includes: obtaining a first offset loss of the corresponding prediction region by using the offset corresponding to the corresponding prediction region; obtaining a loss weight of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region, and multiplying the first offset loss of the corresponding prediction region by using the loss weight of the corresponding prediction region to obtain a second offset loss of the corresponding prediction region, wherein the larger the intersection ratio is, the smaller the loss weight is; obtaining the GIOU loss of the corresponding prediction region based on the corresponding intersection ratio of the corresponding prediction region; and obtaining the corresponding regression loss by using the second offset loss and the GIOU loss.
Wherein, the sample medical image is a three-dimensional medical image; and/or the predetermined organ is a lung and the target is a nodule.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an object detection device according to the present application. The object detection device 100 includes an acquisition module 101 and a detection module 102. The acquisition module 101 is configured to acquire a target medical image including a preset organ; the detection module 102 is configured to obtain a first feature map of a medical image of a target by using a target detection model, determine at least one first candidate region of the target, and obtain a final prediction result about the target based on the first candidate region and the first feature map; the target detection model 100 is obtained by training using the above training method for the target detection model, and/or the first feature map is obtained by performing preset attention processing on a second feature map obtained by extracting features of the target medical image, where the preset attention processing includes one or more of dimension attention processing and feature channel attention processing.
Referring to fig. 11, fig. 11 is a schematic frame diagram of an electronic device according to an embodiment of the present application. The electronic device 110 comprises a memory 111 and a processor 112 coupled to each other, and the processor 112 is configured to execute program instructions stored in the memory 111 to implement the steps of any of the above embodiments of the training method of the target detection model, or to implement the steps of any of the above embodiments of the target detection method. In one particular implementation scenario, the electronic device 110 may include, but is not limited to, a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 112 is configured to control itself and the memory 111 to implement the steps of any of the above-described embodiments of the training method of the target detection model, or to implement the steps of any of the above-described embodiments of the target detection method. The processor 112 may also be referred to as a CPU (Central Processing Unit). The processor 112 may be an integrated circuit chip having signal processing capabilities. The processor 112 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 112 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 12, fig. 12 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 120 stores program instructions 121 that can be executed by the processor, and the program instructions 121 are used for implementing the steps of any of the above-described embodiments of the training method of the target detection model, or implementing the steps of any of the above-described embodiments of the target detection method.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, a product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'express consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is considered as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization by modes of popping window information or asking a person to upload personal information of the person by himself, and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.

Claims (21)

1. A training method of an object detection model, the training method comprising:
acquiring a sample medical image containing a preset organ, wherein the sample medical image is marked with a marking result of at least one target on the preset organ, and the marking result comprises an actual area where the target is located;
respectively matching at least one first candidate region for each target according to a matching sequence by using the target detection model, and obtaining a final prediction result about the target based on the first candidate regions, wherein the matching sequence is determined based on the size of an actual region where each target is located;
and adjusting parameters of the target detection model by using the final prediction result and the labeling result.
2. The method of claim 1, wherein the matching order is: the smaller the size of the actual area, the earlier the matching is performed on the object.
3. The method according to claim 1 or 2, wherein the matching, by using the object detection model, of at least one first candidate region for each of the objects in a matching order respectively comprises:
dividing the targets into different target groups based on the size of an actual area where the targets are located, wherein the size range corresponding to each target group is different;
determining the matching sequence corresponding to different target groups based on the size ranges of the different target groups;
and respectively matching at least one first candidate region for the targets in each target group according to the matching sequence.
4. The method of claim 3, wherein said matching at least one first candidate region for objects in each of said object groups in said matching order comprises:
and performing the following matching steps on each target group according to the matching sequence:
obtaining the matching degree between each target in the target group and different anchor point areas of the sample medical image;
and selecting at least one anchor point region for each target of the target group as a first candidate region of the target based on the matching degree.
5. The method of claim 4, wherein the number of the first candidate regions of the targets in different target groups is different, and the smaller the size range, the greater the number of the first candidate regions of the targets in the target group;
and/or the matching degree between the target and the anchor point region is the contact ratio between the target and the anchor point region.
6. The method of claim 4, wherein obtaining the degree of matching between each of the targets in the target group and the different anchor point regions of the sample medical image comprises: selecting anchor point regions which are not used as the first candidate regions from a plurality of anchor point regions generated for the sample medical image as anchor point regions to be matched, and acquiring the matching degree between each target in the target group and each anchor point region to be matched;
and/or, before said matching at least one first candidate region for the objects in each of said object groups in said matching order, respectively, said method further comprises: generating a preset number of anchor point regions with different sizes for each position point of the sample medical image, wherein the sizes of the anchor point regions with the preset number are determined respectively based on the first feature maps with the preset number and different sizes of the sample medical image.
7. The method according to any one of claims 1 to 6, wherein before matching at least one first candidate region for each of the targets in the matching order by using the target detection model and obtaining a prediction result about the target based on the first candidate region, the method further comprises:
acquiring a preset number of first feature maps with different scales of the sample medical image by using the target detection model, wherein the preset number is greater than or equal to 1;
the deriving a final prediction result about the target based on the first candidate region comprises:
and predicting to obtain the final prediction result based on the first candidate region and the first feature map.
8. The method of claim 7, wherein the obtaining a preset number of first feature maps of different scales of the sample medical image comprises:
performing feature extraction on the sample medical image to obtain a preset number of second feature maps with different scales;
and for each second feature map, performing preset attention processing on the second feature map to obtain a first feature map corresponding to the second feature map, wherein the preset attention processing includes one or more of dimension attention processing and feature channel attention processing.
9. The method according to claim 8, wherein the performing the predetermined attention processing on the second feature map to obtain a first feature map corresponding to the second feature map comprises:
obtaining dimension weights corresponding to all dimensions of the second feature map, and performing weighting processing on all dimension features in the second feature map by using the dimension weights corresponding to all dimensions to obtain a spatial focusing feature map;
dividing the characteristics of different channels in the spatial focusing characteristic diagram into a plurality of channel characteristic groups, acquiring the channel weight corresponding to each channel characteristic group, and weighting the plurality of channel characteristic groups by using the channel weight to obtain the first characteristic diagram obtained by the characteristic channel attention processing.
10. The method according to claim 9, wherein the obtaining of the dimension weight corresponding to each dimension of the second feature map comprises:
taking each dimension of the second feature map as a target dimension, and performing average pooling on the second feature map for the remaining dimensions except the target dimension to obtain a third feature map on the target dimension;
determining dimension weights corresponding to all dimensions of the second feature map by using the third feature maps on different target dimensions;
and/or, the obtaining of the channel weight corresponding to each channel feature group includes:
and performing cosine transform on each channel feature group to obtain channel weight corresponding to each channel feature group.
11. The method according to claim 7, wherein the at least one first candidate region is selected from anchor regions of different sizes of the sample medical image, the first candidate regions of different sizes corresponding to the first feature maps of different scales, respectively;
the predicting to obtain the final prediction result based on the first candidate region and the first feature map includes:
for each first candidate region, obtaining feature information of the first candidate region based on the first feature map corresponding to the size of the first candidate region;
and predicting to obtain a final prediction result about the target by using the characteristic information of the first candidate region.
12. The method according to claim 11, wherein the predicting a final prediction result about the target by using the feature information of the first candidate region comprises:
adjusting the first candidate regions by using the feature information of the first candidate regions to obtain initial prediction results corresponding to the first candidate regions, wherein the initial prediction results corresponding to the first candidate regions comprise the initial prediction regions of the target obtained by adjusting the first candidate regions;
and performing optimization prediction by using the initial prediction results corresponding to the first candidate areas to obtain a final prediction result related to the target.
13. The method of claim 1, wherein the final prediction result further comprises a final confidence level for a category to which the final prediction region belongs; the adjusting the parameters of the target detection model by using the final prediction result and the labeling result includes:
obtaining a first category loss based on the final confidence;
obtaining a first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio of the final prediction region and the actual region;
and adjusting parameters of the target detection model by using the first class loss and the first regression loss.
14. The method according to claim 13, wherein the final prediction result is obtained by predicting the target detection model by using the first candidate region to obtain an initial prediction result and performing optimization prediction on the initial prediction result, and the initial prediction result comprises an initial prediction region of the target and an initial confidence of a category to which the initial prediction region belongs;
the adjusting the parameters of the target detection model by using the final prediction result and the labeling result further comprises:
obtaining a second category loss based on the initial confidence;
obtaining a second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio of the initial prediction region and the actual region;
the adjusting of the parameters of the target detection model by using the first category loss and the first regression loss comprises:
and adjusting parameters of the target detection model by using the first category loss, the first regression loss, the second category loss and the second regression loss.
15. The method according to claim 13 or 14, wherein the obtaining of the first regression loss based on the offset between the final prediction region and the actual region and the intersection ratio between the final prediction region and the actual region, or the obtaining of the second regression loss based on the offset between the initial prediction region and the actual region and the intersection ratio between the initial prediction region and the actual region, comprises:
obtaining a first offset loss of the corresponding prediction region by using the offset corresponding to the corresponding prediction region; obtaining a loss weight of the corresponding prediction region based on the intersection ratio corresponding to the corresponding prediction region, and multiplying the first offset loss of the corresponding prediction region by the loss weight of the corresponding prediction region to obtain a second offset loss of the corresponding prediction region, wherein the larger the intersection ratio is, the smaller the loss weight is;
obtaining the GIOU loss of the corresponding prediction region based on the intersection ratio corresponding to the corresponding prediction region;
and obtaining the corresponding regression loss by using the second offset loss and the GIOU loss.
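Claims 13 to 15 combine an offset loss, an intersection-ratio-dependent loss weight, and a GIOU loss into the regression loss. A minimal sketch follows, assuming axis-aligned 2-D boxes in (x1, y1, x2, y2) form, a smooth-L1 first offset loss, and a (1 - IoU) weight mapping, which matches "the larger the intersection ratio, the smaller the loss weight" but is otherwise an assumption.

    import torch
    import torch.nn.functional as F

    def regression_loss(pred, target, eps=1e-6):
        # Intersection, union and IoU of predicted and actual regions.
        lt = torch.max(pred[:, :2], target[:, :2])
        rb = torch.min(pred[:, 2:], target[:, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
        area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
        union = area_p + area_t - inter
        iou = inter / (union + eps)

        # GIOU loss from the smallest enclosing box.
        enc_lt = torch.min(pred[:, :2], target[:, :2])
        enc_rb = torch.max(pred[:, 2:], target[:, 2:])
        enclose = (enc_rb - enc_lt).clamp(min=0).prod(dim=1)
        giou_loss = 1.0 - (iou - (enclose - union) / (enclose + eps))

        # First offset loss, down-weighted so well-overlapping
        # predictions contribute less (the second offset loss).
        first_offset = F.smooth_l1_loss(pred, target, reduction="none").sum(dim=1)
        second_offset = (1.0 - iou.detach()) * first_offset

        return (second_offset + giou_loss).mean()

Under claim 14, such a regression loss would be evaluated once for the initial prediction regions and once for the final ones, then summed with the two category losses.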
16. The method according to any one of claims 1 to 15, wherein the sample medical image is a three-dimensional medical image; and/or,
the preset organ is a lung and the target is a nodule.
17. A target detection method, comprising:
acquiring a target medical image containing a preset organ;
obtaining a first feature map of the target medical image by using a target detection model, determining at least one first candidate region of the target, and obtaining a final prediction result about the target based on the first candidate region and the first feature map;
wherein the target detection model is trained with the method of any one of claims 1 to 16, and/or the first feature map is obtained by performing a preset attention process on a second feature map obtained by feature extraction of the target medical image, wherein the preset attention process includes one or more of a dimension attention process and a feature channel attention process.
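Tying the pieces together, the detection flow of claim 17 might be composed as below; every component name is an assumed placeholder, not an API defined by the patent.

    import torch.nn as nn

    class TargetDetector(nn.Module):
        # Illustrative composition: feature extraction, preset
        # attention, then prediction over candidate regions.
        def __init__(self, backbone, attention, head):
            super().__init__()
            self.backbone = backbone    # produces the second feature map
            self.attention = attention  # dimension / channel attention
            self.head = head            # predicts from candidate regions

        def forward(self, image, candidate_regions):
            second = self.backbone(image)
            first = self.attention(second)               # the first feature map
            return self.head(first, candidate_regions)   # final prediction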
18. An apparatus for training a target detection model, comprising:
an acquisition module, configured to acquire a sample medical image containing a preset organ, wherein the sample medical image is annotated with a labeling result of at least one target located on the preset organ, and the labeling result comprises an actual region where the target is located;
a detection module, configured to match at least one first candidate region for each target according to a matching sequence by using the target detection model, and to obtain a final prediction result about the target based on the first candidate region, wherein the matching sequence is determined based on the size of the actual region where each target is located;
and an adjustment module, configured to adjust the parameters of the target detection model by using the final prediction result and the labeling result.
19. A target detection device, comprising:
an acquisition module, configured to acquire a target medical image containing a preset organ;
a detection module, configured to obtain a first feature map of the target medical image by using a target detection model, determine at least one first candidate region of the target, and obtain a final prediction result about the target based on the first candidate region and the first feature map;
wherein the target detection model is trained with the method of any one of claims 1 to 16, and/or the first feature map is obtained by performing a preset attention process on a second feature map obtained by feature extraction of the target medical image, wherein the preset attention process includes one or more of a dimension attention process and a feature channel attention process.
20. An electronic device, comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method for training a target detection model according to any one of claims 1 to 16, or to implement the target detection method according to claim 17.
21. A computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method for training a target detection model according to any one of claims 1 to 16, or implement the target detection method according to claim 17.
CN202210080240.1A 2022-01-24 2022-01-24 Training method of target detection model and corresponding detection method Withdrawn CN114429459A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210080240.1A CN114429459A (en) 2022-01-24 2022-01-24 Training method of target detection model and corresponding detection method
PCT/CN2022/131716 WO2023138190A1 (en) 2022-01-24 2022-11-14 Training method for target detection model, and corresponding detection method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210080240.1A CN114429459A (en) 2022-01-24 2022-01-24 Training method of target detection model and corresponding detection method

Publications (1)

Publication Number Publication Date
CN114429459A 2022-05-03

Family

ID=81313424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210080240.1A Withdrawn CN114429459A (en) 2022-01-24 2022-01-24 Training method of target detection model and corresponding detection method

Country Status (2)

Country Link
CN (1) CN114429459A (en)
WO (1) WO2023138190A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036985B (en) * 2023-10-09 2024-02-06 武汉工程大学 Small target detection method and device for video satellite image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658419B (en) * 2018-11-15 2020-06-19 浙江大学 Method for segmenting small organs in medical image
CN110490850B (en) * 2019-02-14 2021-01-08 腾讯科技(深圳)有限公司 Lump region detection method and device and medical image processing equipment
CN110136809B (en) * 2019-05-22 2022-12-27 腾讯科技(深圳)有限公司 Medical image processing method and device, electronic medical equipment and storage medium
CN111539947B (en) * 2020-04-30 2024-03-29 上海商汤智能科技有限公司 Image detection method, related model training method, related device and equipment
CN114429459A (en) * 2022-01-24 2022-05-03 上海商汤智能科技有限公司 Training method of target detection model and corresponding detection method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023138190A1 (en) * 2022-01-24 2023-07-27 上海商汤智能科技有限公司 Training method for target detection model, and corresponding detection method therefor
CN115131655A (en) * 2022-09-01 2022-09-30 浙江啄云智能科技有限公司 Training method and device of target detection model and target detection method
CN115131655B (en) * 2022-09-01 2022-11-22 浙江啄云智能科技有限公司 Training method and device of target detection model and target detection method

Also Published As

Publication number Publication date
WO2023138190A1 (en) 2023-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20220503)