CN114549445A - Image detection and related model training method, related device, equipment and medium - Google Patents


Info

Publication number
CN114549445A
CN114549445A
Authority
CN
China
Prior art keywords
pixel point
target pixel
target
distance
detection result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210141456.4A
Other languages
Chinese (zh)
Inventor
叶宇翔
陈翼男
朱雅靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202210141456.4A priority Critical patent/CN114549445A/en
Publication of CN114549445A publication Critical patent/CN114549445A/en
Priority to PCT/CN2022/130924 priority patent/WO2023155494A1/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0012 - Biomedical image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/12 - Edge-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10072 - Tomographic images
    • G06T 2207/10081 - Computed x-ray tomography [CT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10072 - Tomographic images
    • G06T 2207/10088 - Magnetic resonance imaging [MRI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The application discloses an image detection method, a training method for a related model, and a related apparatus, device, and medium. The training method for the image detection model includes: acquiring a sample medical image and reference information thereof, where the reference information includes a first distance corresponding to a target pixel point, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the center position of the sample object to which the target pixel point belongs; processing the sample medical image with the image detection model to obtain a first processing result, where the first processing result includes a second distance corresponding to the target pixel point, representing the predicted distance from the target pixel point to the first reference position; and obtaining difference information based on the first distance and the second distance corresponding to the target pixel point, and adjusting the network parameters of the image detection model based on the difference information. With this scheme, objects can be accurately distinguished and detected even when different objects are very close to, or even adhered to, each other.

Description

Image detection and related model training method, related device, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, a related apparatus, a device, and a medium for image detection and training of a related model.
Background
Medical images such as CT (Computed Tomography) images and MR (Magnetic Resonance) images are of great importance in the context of assisted diagnosis, surgical planning, and the like.
In many of the applications described above, accurate detection of objects such as lesions is often the basis on which such applications can be implemented. In practice, however, such objects may lie very close to one another or even adhere to each other, which makes it difficult for existing methods to distinguish them accurately and thus hinders subsequent applications. In view of this, how to accurately distinguish and detect each object has become an urgent problem to be solved.
Disclosure of Invention
The application provides an image detection and related model training method, a related device, equipment and a medium.
The first aspect of the present application provides a training method for an image detection model, including: acquiring a sample medical image and reference information thereof; the reference information comprises a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the central position of a sample object to which the target pixel point belongs; processing the sample medical image by using the image detection model to obtain a first processing result; the first processing result comprises a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position; and obtaining difference information based on the first distance and the second distance corresponding to the target pixel point, and adjusting the network parameters of the image detection model based on the difference information.
Therefore, a sample medical image and its reference information are acquired, where the reference information includes a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the center position of the sample object to which the target pixel point belongs. On this basis, the sample medical image is processed with the image detection model to obtain a first processing result, which includes a second distance corresponding to the target pixel point, representing the predicted distance from the target pixel point to the first reference position. Difference information is then obtained based on the first distance and the second distance, and the network parameters of the image detection model are adjusted based on the difference information. By constraining the difference between the first distance and the second distance, the image detection model learns the center-distance characteristics of the target pixel points, which improves the model's perception of the sample object and thus helps to accurately distinguish and detect each object.
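The constraint between the first (actual) and second (predicted) distances can be sketched in a few lines. This is a minimal illustration only: the source does not fix a concrete loss function, so a mean absolute error over the target pixel points is assumed here, and the function name is hypothetical.

```python
def center_distance_loss(first_distances, second_distances):
    """Difference information between the actual (first) and predicted
    (second) centre distances of the target pixel points, sketched as a
    mean absolute error. The concrete loss form is an assumption."""
    diffs = [abs(f - s) for f, s in zip(first_distances, second_distances)]
    return sum(diffs) / len(diffs)
```

In training, this scalar would drive the adjustment of the network parameters, e.g. through gradient descent in a deep-learning framework.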
Wherein, the difference information includes first difference information, and the difference information is obtained based on the first distance and the second distance corresponding to the target pixel point, and includes: for each target pixel point, obtaining a first sub-loss based on a difference value between a first distance and a second distance corresponding to the target pixel point, obtaining a first weight based on the first distance corresponding to the target pixel point, and obtaining a first loss of the target pixel point based on the first sub-loss and the first weight; and obtaining first difference information based on the first loss of each target pixel point.
Therefore, the difference information includes first difference information. For each target pixel point, a first sub-loss is obtained based on the difference between the first distance and the second distance corresponding to the target pixel point, a first weight is obtained based on the first distance corresponding to the target pixel point, and a first loss of the target pixel point is obtained based on the first sub-loss and the first weight; the first difference information is then obtained based on the first losses of the target pixel points. In this way, the first sub-loss is adaptively weighted by the first distance during loss calculation, which further improves detection precision.
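The distance-dependent weighting described above can be illustrated as follows. This sketch assumes the first weight is simply the first distance itself, so pixels with larger (more central) normalised distances count more; the source only requires the weight to be derived from the first distance, and the names are illustrative.

```python
def first_difference_info(first_distances, second_distances):
    """First difference information: each pixel's first sub-loss
    (absolute error) scaled by a first weight, taken here to be the
    pixel's first distance (an assumed weighting scheme)."""
    losses = []
    for f, s in zip(first_distances, second_distances):
        sub_loss = abs(f - s)             # first sub-loss
        weight = f                        # first weight from first distance
        losses.append(weight * sub_loss)  # first loss of this pixel
    return sum(losses) / len(losses)      # aggregate over all target pixels
```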
The target pixel points are located in a target area of the sample medical image, the target area comprises a central area and an edge area of the sample object, the target pixel points comprise at least one of first target pixel points and second target pixel points, the first target pixel points are located in the central area, the second target pixel points are located in the edge area, first weights corresponding to the first target pixel points are positively correlated with first distances corresponding to the first target pixel points, and first weights corresponding to the second target pixel points are positively correlated with first distances corresponding to the second target pixel points.
Therefore, the target pixel points are located in a target area of the sample medical image, the target area includes a center area and an edge area of the sample object, and the target pixel points include at least one of first target pixel points located in the center area and second target pixel points located in the edge area, where the first weight corresponding to each first target pixel point is positively correlated with its first distance, and likewise for each second target pixel point. On the one hand, this allows the image detection model to focus on learning the center-distance characteristics of the target area during training; on the other hand, the attention paid by the model to individual target pixel points during training can be flexibly controlled.
The target area further comprises a background area irrelevant to the sample object, the target pixel points further comprise third target pixel points, the third target pixel points are located in the background area, and the first weights corresponding to the third target pixel points are preset numerical values.
Therefore, the target area further includes a background area unrelated to the sample object, the target pixel points further include third target pixel points located in the background area, and the first weights corresponding to the third target pixel points are preset values. This further improves the model's perception of the background area in addition to the target area, which benefits the detection performance of the image detection model.
The first distance is within a preset range, and the preset range comprises a lower limit value and an upper limit value; under the condition that the first distance corresponding to the target pixel point is between the lower limit value and the first numerical value, the target pixel point is located in the edge area, under the condition that the first distance corresponding to the target pixel point is between the second numerical value and the upper limit value, the target pixel point is located in the center area, and the second numerical value is not smaller than the first numerical value.
Therefore, the first distance lies within a preset range having a lower limit and an upper limit. When the first distance corresponding to a target pixel point is between the lower limit and the first value, the pixel point is located in the edge area; when it is between the second value and the upper limit, the pixel point is located in the center area, where the second value is not smaller than the first value. That is, the closer the first distance is to the lower limit, the closer the target pixel point is to the edge, and the closer the first distance is to the upper limit, the closer the target pixel point is to the center. The selected target pixel points can therefore be flexibly controlled by adjusting the first value and the second value, which helps broaden the image detection model's perception of the target area.
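The thresholding above can be sketched directly. The [0, 1] range and the two threshold values below are illustrative assumptions; the source requires only that the second value be not smaller than the first value.

```python
def classify_target_pixel(first_distance, lower=0.0, upper=1.0,
                          first_value=0.3, second_value=0.7):
    """Assign a target pixel to the edge or centre area from its first
    distance. Range and thresholds are illustrative assumptions."""
    if lower <= first_distance <= first_value:
        return "edge"       # near the lower limit: edge area
    if second_value <= first_distance <= upper:
        return "center"     # near the upper limit: centre area
    return "unselected"     # between the two thresholds
```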
Wherein, the step of obtaining the first distance comprises: counting the maximum value and the minimum value in the actual distance, and acquiring a first difference value between the maximum value and the minimum value; for each target pixel point belonging to the sample object, acquiring a second difference value between the actual distance and the minimum value, and obtaining a first distance based on a ratio between the second difference value and the first difference value; wherein the first distance is inversely related to the ratio.
Therefore, the maximum value and the minimum value of the actual distances are counted and a first difference between them is obtained; for each target pixel point belonging to the sample object, a second difference between its actual distance and the minimum value is obtained, and the first distance is derived from the ratio of the second difference to the first difference, with the first distance negatively correlated with that ratio. The first distance can thus be obtained through simple operations, reducing the computational complexity.
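The normalisation described above can be written out as a short sketch. The inversion `1 - ratio` is one simple way to realise the stated negative correlation, so the first distance is largest at the object centre; the exact mapping is an assumption.

```python
def first_distances_from_actual(actual_distances):
    """Normalise raw pixel-to-centre distances so that the first
    distance equals 1 at the object centre and 0 at the farthest
    pixel, i.e. it is negatively correlated with the ratio of the
    second difference to the first difference."""
    d_min = min(actual_distances)
    d_max = max(actual_distances)
    span = d_max - d_min                 # first difference value
    return [1.0 - (d - d_min) / span     # 1 minus (second diff / first diff)
            for d in actual_distances]
```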
The reference information further includes a first mark of the target pixel point, the first mark representing whether the target pixel point actually belongs to the sample object; the difference information further includes second difference information; and the image detection model includes a feature extraction network and a semantic segmentation network. Before adjusting the network parameters of the image detection model based on the difference information, the method further includes: obtaining a plurality of sample feature maps extracted by the feature extraction network; processing the sample feature maps with the semantic segmentation network to obtain a second processing result corresponding to each sample feature map, where the second processing result includes a second mark of the target pixel point, the second mark representing whether the target pixel point is predicted to belong to the sample object; for each sample feature map, obtaining a second sub-loss based on the difference between the first mark and the second mark, obtaining a second weight based on the resolution of the sample feature map, and obtaining a second loss corresponding to the sample feature map based on the second sub-loss and the second weight, where the second weight is positively correlated with the resolution; and obtaining the second difference information based on the second losses corresponding to the respective sample feature maps.
Therefore, the reference information further includes a first mark of the target pixel point representing whether it actually belongs to the sample object, the difference information further includes second difference information, and the image detection model includes a feature extraction network and a semantic segmentation network. Before the network parameters are adjusted, a plurality of sample feature maps extracted by the feature extraction network are obtained and processed by the semantic segmentation network to obtain a second processing result for each, where the second processing result includes a second mark representing whether the target pixel point is predicted to belong to the sample object. For each sample feature map, a second sub-loss is obtained based on the difference between the first mark and the second mark, a second weight is obtained based on the resolution of the feature map, and a second loss is obtained based on the second sub-loss and the second weight, with the second weight positively correlated with the resolution; the second difference information is then obtained from the second losses of the respective sample feature maps. In this way, semantic supervision is applied at multiple resolutions, with higher-resolution feature maps weighted more heavily.
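The multi-scale weighting can be sketched as follows. Binary cross-entropy is assumed for the second sub-loss, and `resolution / max(resolutions)` for the second weight; the source only requires the weight to be positively correlated with resolution, and all names are illustrative.

```python
import math

def second_difference_info(first_marks, predicted_probs, resolutions):
    """Second difference information: a per-scale binary cross-entropy
    (second sub-loss) scaled by a second weight that grows with the
    feature-map resolution. Loss form and weight scheme are assumed."""
    max_res = max(resolutions)
    total = 0.0
    for marks, probs, res in zip(first_marks, predicted_probs, resolutions):
        eps = 1e-7
        bce = 0.0
        for y, p in zip(marks, probs):
            p = min(max(p, eps), 1.0 - eps)   # clamp for numerical safety
            bce += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        sub_loss = bce / len(marks)           # second sub-loss for this scale
        total += (res / max_res) * sub_loss   # second weight times sub-loss
    return total
```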
The reference information further comprises first deviation information corresponding to the target pixel point, the first deviation information comprises actual distances from the target pixel point to a plurality of first reference boundaries respectively, a first reference region formed by the first reference boundaries surrounds a sample object to which the target pixel point belongs, the difference information further comprises third difference information, and the image detection model comprises a feature extraction network and a deviation detection network; before adjusting the network parameters of the image detection model based on the difference information, the method further comprises: acquiring a target feature map extracted by a feature extraction network; wherein the target feature map has the same resolution as the sample medical image; processing the target characteristic diagram by using a deviation detection network to obtain a third processing result; the third processing result comprises second deviation information corresponding to the target pixel point, and the second deviation information comprises the prediction distances from the target pixel point to the plurality of first reference boundaries respectively; for each target pixel point belonging to the sample object, obtaining a third sub-loss based on the difference between the first deviation information and the second deviation information corresponding to the target pixel point, obtaining a third weight based on the first distance corresponding to the target pixel point, and obtaining a third loss of the target pixel point based on the third sub-loss and the third weight; third difference information is obtained based on the respective third losses.
Therefore, the reference information further includes first deviation information corresponding to the target pixel point, which contains the actual distances from the target pixel point to a plurality of first reference boundaries, a first reference region formed by those boundaries surrounding the sample object to which the target pixel point belongs; the difference information further includes third difference information; and the image detection model includes a feature extraction network and a deviation detection network. Before the network parameters are adjusted, a target feature map extracted by the feature extraction network, having the same resolution as the sample medical image, is obtained and processed by the deviation detection network to obtain a third processing result, which includes second deviation information containing the predicted distances from the target pixel point to the respective first reference boundaries. On this basis, for each target pixel point belonging to the sample object, a third sub-loss is obtained based on the difference between its first deviation information and second deviation information, a third weight is obtained based on its first distance, and a third loss is obtained based on the third sub-loss and the third weight; the third difference information is finally obtained from the third losses.
A second aspect of the present application provides an image detection method, including: acquiring a medical image to be detected; the medical image to be detected comprises a plurality of target objects; processing the medical image to be detected by using the image detection model to obtain the detection result of each target object; the image detection model is obtained by using the training method of the image detection model in the first aspect.
Therefore, a medical image to be detected containing a plurality of target objects is acquired and processed with the image detection model to obtain the detection result of each target object. Since the image detection model is obtained by the training method of the image detection model in the first aspect, the accuracy with which the individual target objects in the medical image are distinguished can be improved.
The processing of the medical image to be detected with the image detection model to obtain the detection result of each target object includes: processing the medical image to be detected with the image detection model to obtain a first detection result, a second detection result and a third detection result; obtaining estimated sizes of the target objects based on the first detection result and the third detection result; and, for each target object, selecting either the second detection result or the third detection result as a reference detection result based on the estimated size of the target object, and analyzing the first detection result and the reference detection result to obtain the final detection result of the target object. The first detection result includes a center distance corresponding to a pixel point in the medical image to be detected, the center distance representing the predicted distance from the pixel point to a second reference position, which is the center position of the target object to which the pixel point belongs; the second detection result includes an attribute mark of the pixel point, representing whether the pixel point is predicted to belong to the target object; and the third detection result includes prediction deviation information corresponding to the pixel point, containing the predicted distances from the pixel point to a plurality of second reference boundaries, a second reference region formed by those boundaries surrounding the target object to which the pixel point belongs.
Therefore, the medical image to be detected is processed with the image detection model to obtain a first, a second and a third detection result, the estimated sizes of the target objects are obtained based on the first and third detection results, and for each target object either the second or the third detection result is selected as the reference detection result based on its estimated size, so that the final detection result is obtained by analyzing the first detection result together with the reference detection result. Different analysis strategies can thus be adopted according to the estimated size of each target object, which accommodates target objects of different sizes and further improves detection precision.
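The size-based routing described above can be sketched as a simple dispatch. The function name and the tuple return shape are hypothetical; only the selection rule (large objects use the second detection result, others the third) comes from the text.

```python
def pick_reference_result(estimated_size, preset_size,
                          second_result, third_result):
    """Route a target object to the analysis branch suited to its
    estimated size: the segmentation-style second detection result
    for large objects, the boundary-offset third detection result
    for small ones. Names and return shape are illustrative."""
    if estimated_size > preset_size:
        return "first_object", second_result   # large-object branch
    return "second_object", third_result       # small-object branch
```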
Wherein selecting either one of the second detection result and the third detection result as the reference detection result based on the estimated size of the target object includes: when the estimated size of the target object is larger than a preset size, selecting the second detection result as the reference detection result and treating the target object as a first object. Analyzing the first detection result and the reference detection result to obtain the final detection result of the target object then includes: determining the center position of the first object based on the first detection result; taking the pixel points predicted to belong to the first object, according to the reference detection result, as first reference pixel points and obtaining their center distances from the first detection result; and obtaining the contour region of the first object based on its center position and the center distances of the first reference pixel points, the final detection result including the contour region of the first object.
Therefore, when the estimated size of the target object is larger than the preset size, the second detection result is selected as the reference detection result and the target object is treated as the first object. On this basis, the center position of the first object is determined from the first detection result, the pixel points predicted to belong to the first object are taken as first reference pixel points according to the reference detection result, their center distances are obtained from the first detection result, and the contour region of the first object is obtained from its center position and those center distances, the final detection result including this contour region. The contour region of a large target object can thus be obtained quickly by combining the first and second detection results, which improves detection speed.
Wherein selecting either one of the second detection result and the third detection result as the reference detection result based on the estimated size of the target object includes: when the estimated size of the target object is not larger than the preset size, selecting the third detection result as the reference detection result and treating the target object as a second object. Analyzing the first detection result and the reference detection result to obtain the final detection result of the target object then includes: determining the center position of the second object based on the first detection result; taking the pixel point at that center position as a second reference pixel point and obtaining its prediction deviation information from the reference detection result; and obtaining a boundary region surrounding the second object based on that prediction deviation information, the final detection result including the boundary region of the second object.
Therefore, when the estimated size of the target object is not larger than the preset size, the third detection result is selected as the reference detection result and the target object is treated as the second object. On this basis, the center position of the second object is determined from the first detection result, the pixel point at that center position is taken as the second reference pixel point, its prediction deviation information is obtained from the reference detection result, and the boundary region surrounding the second object is obtained from that information, the final detection result including this boundary region. The boundary region of a small target object can thus be detected accurately by combining the first and third detection results, which improves detection accuracy.
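Recovering the boundary region from the centre pixel's predicted deviations can be sketched for the common 2-D, four-offset case. This is an assumption: the source allows an arbitrary number of second reference boundaries, and the (left, top, right, bottom) layout is illustrative.

```python
def boundary_region_from_offsets(cx, cy, left, top, right, bottom):
    """Recover the boundary region of a small (second) object from the
    predicted distances of its centre pixel to four second reference
    boundaries. A 2-D image with four offsets is assumed."""
    # Box corners: subtract the left/top offsets, add the right/bottom ones.
    return (cx - left, cy - top, cx + right, cy + bottom)
```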
A third aspect of the present application provides an image detection model training apparatus, including: the system comprises an acquisition module, a processing module and an adjusting module, wherein the acquisition module is used for acquiring a sample medical image and reference information thereof; the reference information comprises a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the central position of a sample object to which the target pixel point belongs; the processing module is used for processing the sample medical image by using the image detection model to obtain a first processing result; the first processing result comprises a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position; the adjusting module is used for obtaining difference information based on the first distance and the second distance corresponding to the target pixel point, and adjusting network parameters of the image detection model based on the difference information.
A fourth aspect of the present application provides an image detection apparatus, including an acquisition module and a processing module. The acquisition module is configured to acquire a medical image to be detected, where the medical image to be detected includes a plurality of target objects; the processing module is configured to process the medical image to be detected by using an image detection model to obtain a detection result for each target object, where the image detection model is obtained by the image detection model training apparatus of the third aspect.
A fifth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the method for training an image detection model in the first aspect or to implement the method for image detection in the second aspect.
A sixth aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the method for training an image detection model in the above first aspect, or implement the method for image detection in the above second aspect.
According to the above scheme, a sample medical image and its reference information are acquired, where the reference information includes a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the center position of the sample object to which the target pixel point belongs. On this basis, the sample medical image is processed with an image detection model to obtain a first processing result, which includes a second distance corresponding to the target pixel point, the second distance representing the predicted distance from the target pixel point to the first reference position. Difference information is then obtained based on the first and second distances corresponding to the target pixel point, and network parameters of the image detection model are adjusted based on the difference information. By constraining the difference between the first distance and the second distance, the image detection model can learn the center-distance characteristics of target pixel points, which helps improve the model's perception of sample objects and, in turn, helps accurately distinguish and detect individual objects.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a training method for an image detection model according to the present application;
FIG. 2 is a block diagram of an embodiment of an image inspection model;
FIG. 3 is a schematic flow chart diagram illustrating an embodiment of an image detection method according to the present application;
FIG. 4 is a process diagram of an embodiment of the image detection method of the present application;
FIG. 5 is a block diagram of an embodiment of an image inspection model training apparatus according to the present application;
FIG. 6 is a block diagram of an embodiment of an image detection apparatus according to the present application;
FIG. 7 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 8 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a training method for an image detection model according to the present application. Specifically, the method may include the steps of:
step S11: a sample medical image and its reference information are acquired.
In the embodiment of the disclosure, the reference information includes a first distance corresponding to a target pixel point in the sample medical image, the first distance represents an actual distance from the target pixel point to a first reference position, and the first reference position represents a center position of a sample object to which the target pixel point belongs.
In one implementation scenario, the sample objects may be arranged according to actual needs. For example, in an application scenario of lesion detection, the sample object may be a lesion; alternatively, in an application scenario of organ detection, the sample object may be an organ, which is not limited herein.
In one implementation scenario, during training, a first distance corresponding to each sample pixel point in the sample medical image may be obtained, and the first distances may be represented in the form of a distance field. On this basis, sample pixel points may be selected as target pixel points based on their corresponding first distances; the selection process of the target pixel points is described in the subsequent related description and not repeated here. Specifically, the distance field may be represented as an image with the same resolution as the sample medical image, referred to for convenience as the first distance field image, where the pixel value of each pixel point in the first distance field image is the first distance corresponding to the sample pixel point at the same position in the sample medical image. Taking as an example a three-dimensional sample medical image such as CT or MR, the pixel value at coordinates (i, j, k) in the first distance field image is the first distance corresponding to the sample pixel point at coordinates (i, j, k) in the sample medical image. Other cases can be deduced by analogy and are not exemplified here.
In an implementation scenario, the target pixel points are located in a target region of the sample medical image, and the target region includes a central region and an edge region of the sample object. The first distance lies within a preset range (e.g., 0 to 1) having an upper limit value and a lower limit value. When the first distance corresponding to a target pixel point is between the lower limit value and a first value, the target pixel point can be determined to be located in the edge region of the sample object; that is, after the first distance corresponding to each sample pixel point is obtained, sample pixel points whose first distance is between the lower limit value and the first value may be selected as target pixel points located in the edge region. In addition, when the first distance corresponding to a target pixel point is between a second value and the upper limit value, the target pixel point can be determined to be located in the central region of the sample object, the second value being not less than the first value; that is, sample pixel points whose first distance is between the second value and the upper limit value may be selected as target pixel points located in the central region. In this way, the selected target pixel points can be flexibly controlled by adjusting the first value and the second value, which helps broaden the perception range of the image detection model over the target region.
In a specific implementation scenario, it should be noted that the edge region and the center region respectively represent different image regions, the edge region is a region located inside the sample object and near the edge of the sample object, and the center region is a region located inside the sample object and near the center of the sample object.
In a specific implementation scenario, the preset range may be set according to actual needs. For example, for convenience of numerical calculation, the preset range may be 0 to 1, i.e., the lower limit value is 0 and the upper limit value is 1, which is not limited herein. Correspondingly, the first value and the second value may also be set according to actual needs. For example, the first value may be set to 0.1, i.e., when the first distance corresponding to a target pixel point is between 0 and 0.1, the target pixel point may be determined to be located in the edge region; or the second value may be set to 0.9, i.e., when the first distance corresponding to a target pixel point is between 0.9 and 1, the target pixel point may be determined to be located in the central region; other cases can be deduced by analogy and are not exemplified one by one here. It should be noted that when the first distance corresponding to a sample pixel point equals the lower limit value, the sample pixel point may be determined to belong to a background region unrelated to the sample object, and when the first distance corresponding to a sample pixel point equals the upper limit value, the sample pixel point may be determined to be located at the center position of the sample object. That is, unlike the central region, the center position is the position in the sample medical image of the sample pixel point whose first distance equals the upper limit value; the central region can thus be regarded as a connected domain formed by a number of sample pixel points close to the center position.
In a specific implementation scenario, for the convenience of distinguishing, the target pixel points may include at least one of a first target pixel point and a second target pixel point, where the first target pixel point is located in the central region and the second target pixel point is located in the edge region. As described above, the first distance is within a preset range, the preset range includes the upper limit value and the lower limit value, the sample pixel point can be used as the second target pixel point when the first distance corresponding to the target pixel point is between the lower limit value and the first value, and the sample pixel point can be used as the first target pixel point when the first distance corresponding to the target pixel point is between the second value and the upper limit value.
In a specific implementation scenario, the target region may further include a background region unrelated to the sample object, and the target pixel may further include a third target pixel located in the background region.
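As a concrete illustration of the selection rules above, the following minimal Python sketch classifies a sample pixel point by its first distance, using the example values given in this description (preset range 0 to 1, first value 0.1, second value 0.9); the function name and the returned labels are hypothetical.

```python
def classify_target_pixel(first_distance, first_value=0.1, second_value=0.9,
                          lower=0.0, upper=1.0):
    """Classify a sample pixel point by its first distance, or return None
    if it is not selected as a target pixel point."""
    if first_distance == lower:
        return "third"   # background region unrelated to the sample object
    if lower < first_distance < first_value:
        return "second"  # edge region of the sample object
    if second_value < first_distance <= upper:
        return "first"   # central region of the sample object
    return None          # intermediate pixels are not selected
```

For instance, a sample pixel point with first distance 0.95 would be selected as a first target pixel point in the central region, while one with first distance 0.5 would not be selected at all.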
In a specific implementation scenario, the maximum value and the minimum value among the actual distances of the sample pixel points may be counted, and a first difference between the maximum value and the minimum value may be obtained; then, for each target pixel point, a second difference between its actual distance and the minimum value may be obtained, and the first distance may be obtained based on the ratio of the second difference to the first difference, the first distance being negatively correlated with the ratio. Taking the preset range of 0 to 1 and a three-dimensional sample medical image as an example, the actual distance corresponding to the target pixel point at position coordinates (i, j, k) may be denoted as dismap(i, j, k), the maximum value as max(dismap) and the minimum value as min(dismap); then the first distance corresponding to the target pixel point at coordinates (i, j, k) in the first distance field image may be denoted as GT_center|(i,j,k) and expressed as:
GT_center|(i,j,k) = 1 - (dismap(i,j,k) - min(dismap)) / (max(dismap) - min(dismap))......(1)
In the above formula (1), max(dismap) - min(dismap) represents the first difference, and dismap(i, j, k) - min(dismap) represents the second difference. As can be seen from formula (1), for a target pixel point located at the center of the sample object, the corresponding actual distance dismap(i, j, k) is the minimum of all actual distances (i.e., min(dismap)), so the corresponding first distance computed by formula (1) is 1; other target pixel points can be deduced by analogy and are not exemplified here. It should be noted that, since a target pixel point located in the background region unrelated to the sample object does not belong to any sample object, its first distance may be directly set to the lower limit value (e.g., 0) of the preset range. In addition, formula (1) merely shows one possible way of calculating the first distance in practical applications, and the specific calculation manner is not limited thereby.
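The normalization of formula (1) can be sketched in Python as follows. This is an illustrative fragment operating on a flat list of actual distances; the name first_distance_field is hypothetical, and it assumes at least two distinct distance values so the first difference is nonzero.

```python
def first_distance_field(dismap):
    """Map actual distances to first distances in [0, 1] per formula (1):
    the pixel with the minimum actual distance (the object center) gets 1,
    and the pixel with the maximum actual distance gets 0."""
    lo, hi = min(dismap), max(dismap)
    first_diff = hi - lo  # max(dismap) - min(dismap)
    return [1.0 - (d - lo) / first_diff for d in dismap]
```

Background pixels unrelated to any sample object would simply be assigned the lower limit value (e.g., 0) directly, as noted above, rather than passed through this normalization.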
In an implementation scenario, in order to further improve the detection performance of the image detection model, the reference information may further include a first mark of the target pixel point, where the first mark characterizes whether the target pixel point actually belongs to the sample object. For example, 0 may indicate that the target pixel point does not belong to the sample object and 1 that it does, which is not limited herein. By labeling whether the target pixel point belongs to the sample object and subsequently having the image detection model predict this membership, the model can further learn the contour features of the sample object; reference may be made to the subsequent related description, which is not repeated here.
In an implementation scenario, in order to further improve the detection performance of the image detection model, the reference information may further include first deviation information corresponding to the target pixel point, where the first deviation information specifically may include actual distances from the target pixel point to a plurality of first reference boundaries, and a first reference region formed by the plurality of first reference boundaries surrounds a sample object to which the target pixel point belongs. For example, for a two-dimensional image, a rectangular frame surrounding the sample object may be used as a first reference region, and four sides of the rectangular frame are respectively used as first reference boundaries, so that the first deviation information may specifically include actual distances from sample pixel points belonging to the sample object to the four first reference boundaries, respectively; or, for the three-dimensional image, a hexahedron (e.g., a cuboid, a cube, etc.) surrounding the sample object may be used as the first reference region, and six faces of the hexahedron are respectively used as the first reference boundaries, and the first deviation information may specifically include actual distances from sample pixel points belonging to the sample object to the six first reference boundaries, respectively. Other cases may be analogized, and no one example is given here. By labeling the actual distances from the sample pixel points to the first reference boundaries of the first reference regions surrounding the sample objects to which the sample pixel points belong, the image detection model can be facilitated to further learn the boundary characteristics of the sample objects, and specifically refer to the following related descriptions, which are not repeated herein.
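For the two-dimensional case described above, the first deviation information of a pixel can be sketched as its distances to the four sides of a rectangular first reference region surrounding the sample object. The function below is a hypothetical illustration, with the box given as (x0, y0, x1, y1) corner coordinates.

```python
def first_deviation_info(x, y, box):
    """Actual distances from pixel (x, y) to the four first reference
    boundaries of a rectangular first reference region box = (x0, y0, x1, y1)
    surrounding the sample object (2D case).  Returns (left, top, right,
    bottom) distances; the 3D hexahedron case adds two more terms."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)
```

For example, a pixel at (3, 4) inside the box (1, 1, 10, 8) lies 2 from the left boundary, 3 from the top, 7 from the right and 4 from the bottom.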
Furthermore, it should be noted that, when computing resources are sufficient, the whole scanned medical image can be used as the sample medical image, e.g., a whole MR or CT image; alternatively, when computing resources are limited, the scanned medical image can be divided into several sub-medical images, which are used as sample medical images. For example, a whole medical image may have extremely high resolution, and a conventional computer may exhaust its computing resources and be unable to process it smoothly. In addition, the size of the sub-medical images may be set according to actual needs, such as 96 × 96 or 192 × 192, which is not limited herein.
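The division into sub-medical images under limited computing resources can be sketched as simple tiling. The sketch below enumerates patch corners for a 2D scan (a 3D version adds one more loop), with patch sizes such as 96 or 192 taken from the examples above; the function name is illustrative.

```python
def split_into_patches(height, width, patch=96):
    """Enumerate corner boxes (top, left, bottom, right) of non-overlapping
    sub-medical images covering a 2D scan; the last patch in each axis is
    clipped to the image boundary."""
    corners = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            corners.append((top, left,
                            min(top + patch, height),
                            min(left + patch, width)))
    return corners
```

A 192 × 192 image split with patch size 96 yields four sub-images; a 100 × 96 image yields two, the second clipped to rows 96 to 100.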
Step S12: and processing the sample medical image by using the image detection model to obtain a first processing result.
In the embodiment of the present disclosure, the first processing result includes a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position.
In one implementation scenario, the second distance may also be represented in the form of a distance field, similar to the first distance. Specifically, the distance field may be represented as an image having the same resolution as the sample medical image, and for convenience of description, it may be referred to as a second distance field image, which is specifically referred to the above description about the first distance field image, and will not be described herein again.
In an implementation scenario, similar to the first distance, the second distance and the first distance are located in the same preset range (e.g., 0 to 1), and for the specific meaning of the preset range, reference may be made to the foregoing related description, which is not repeated herein.
In one implementation scenario, please refer to fig. 2, which is a schematic diagram of the framework of an embodiment of an image detection model. As shown in fig. 2, the whole medical image on the left side of fig. 2 may be used as the sample medical image, or a sub-medical image obtained by dividing the whole medical image may be used, as described above and not repeated here. Further, the image detection model may include a feature extraction network and a distance prediction network: the feature extraction network may be used to perform feature extraction on the sample medical image to obtain a sample feature map, and the distance prediction network may be used to perform prediction processing on the sample feature map to obtain the first processing result, i.e., the second distance field image output by the distance prediction network in fig. 2. As shown in fig. 2, the brighter a sample pixel point, the closer it is to the center of the sample object to which it belongs; conversely, the darker a sample pixel point, the farther it is from that center. Black sample pixel points belong to a background region unrelated to the sample object.
In a specific implementation scenario, the feature extraction network may include, but is not limited to, convolutional layers, pooling layers, normalization layers, activation layers, and the like. Illustratively, the feature extraction network may adopt an encoder-decoder structure: the encoder may include multiple encoding layers and the decoder multiple decoding layers, both of which may include, but are not limited to, convolutional layers, normalization layers, activation layers, etc.; in particular, an encoding layer may also include a downsampling layer, and a decoding layer an upsampling layer. In addition, taking an encoder with N encoding layers and a decoder with N decoding layers as an example, a skip connection may be formed between the i-th encoding layer and the (N+1-i)-th decoding layer, so that pyramid features combining high-level semantic information and pixel-level resolution can be taken into account, which helps improve the accuracy of feature extraction.
In a specific implementation scenario, the distance prediction network may include, but is not limited to, several convolution layers, and as described above, the distance prediction network may perform prediction processing on the sample feature map to obtain the first processing result.
Step S13: and obtaining difference information based on the first distance and the second distance corresponding to the target pixel point, and adjusting the network parameters of the image detection model based on the difference information.
In an implementation scenario, the difference information may include first difference information. For each target pixel point, a first sub-loss is obtained based on the difference between the first distance and the second distance corresponding to the target pixel point, a first weight is obtained based on the first distance corresponding to the target pixel point, and a first loss of the target pixel point is obtained based on the first sub-loss and the first weight; the first difference information may then be obtained based on the first losses of the individual target pixel points. In this way, the first sub-losses are adaptively weighted by the first distance during loss calculation, which further improves detection precision.
In a specific implementation scenario, when the target pixel points include first target pixel points, the above processing operation may be performed on each first target pixel point to obtain its first loss, with the first weight corresponding to a first target pixel point positively correlated with its first distance; the first difference information corresponding to the first target pixel points may then be obtained based on the first losses of the individual first target pixel points. Taking the preset range of 0 to 1 and the second value of 0.9 as an example, a sample pixel point whose first distance is greater than 0.9 may be taken as a first target pixel point. For convenience of description, the first difference information corresponding to the first target pixel points may be denoted as l_focal|0.9<th≤1, the first distance corresponding to a first target pixel point as GT_center|0.9<th≤1, and the corresponding second distance as Pred_center; then the first difference information l_focal|0.9<th≤1 can be expressed as:
l_focal|0.9<th≤1 = -GT_center|0.9<th≤1 × log(Pred_center) × |GT_center|0.9<th≤1 - Pred_center|^2......(2)
In the above formula (2), GT_center|0.9<th≤1 represents the first weight, -log(Pred_center) × |GT_center|0.9<th≤1 - Pred_center|^2 represents the first sub-loss, and th denotes the threshold set for the first distance. It should be noted that, for simplicity of description, formula (2) does not show the operation (e.g., summation) performed on the first losses of the individual first target pixel points to obtain the first difference information. In addition, cases where the second value is set to other values can be deduced by analogy and are not exemplified here.
In a specific implementation scenario, when the target pixel points include second target pixel points, the above processing operation may be performed on each second target pixel point to obtain its first loss, with the first weight corresponding to a second target pixel point positively correlated with its first distance; the first difference information corresponding to the second target pixel points may then be obtained based on the first losses of the individual second target pixel points. Taking the preset range of 0 to 1 and the first value of 0.1 as an example, a sample pixel point whose first distance is greater than 0 and less than 0.1 may be taken as a second target pixel point. For convenience of description, the first difference information corresponding to the second target pixel points may be denoted as l_focal|0<th<0.1, the first distance corresponding to a second target pixel point as GT_center|0<th<0.1, and the corresponding second distance as Pred_center; then the first difference information l_focal|0<th<0.1 can be expressed as:
l_focal|0<th<0.1 = -GT_center|0<th<0.1 × log(Pred_center) × |GT_center|0<th<0.1 - Pred_center|^2......(3)
In the above formula (3), GT_center|0<th<0.1 represents the first weight, and -log(Pred_center) × |GT_center|0<th<0.1 - Pred_center|^2 represents the first sub-loss. It should be noted that, for simplicity of description, formula (3) does not show the operation (e.g., summation) performed on the first losses of the individual second target pixel points to obtain the first difference information. In addition, cases where the first value is set to other values can be deduced by analogy and are not exemplified here.
In a specific implementation scenario, when the target pixel points include third target pixel points, the above processing operation may be performed on each third target pixel point to obtain its first loss, with the first weight corresponding to a third target pixel point set to a preset value; the first difference information corresponding to the third target pixel points may then be obtained based on the first losses of the individual third target pixel points. Taking the preset range of 0 to 1 as an example, a sample pixel point whose first distance equals 0 may be taken as a third target pixel point. For convenience of description, the first difference information corresponding to the third target pixel points may be denoted as l_focal|th=0, the first distance corresponding to a third target pixel point as GT_center|th=0, and the corresponding second distance as Pred_center; then the first difference information l_focal|th=0 can be expressed as:
l_focal|th=0 = -(1 - GT_center|th=0) × log(1 - Pred_center) × |GT_center|th=0 - Pred_center|^2......(4)
In the above formula (4), (1 - GT_center|th=0) represents the first weight (since GT_center|th=0 equals 0, this is the preset value 1), and -log(1 - Pred_center) × |GT_center|th=0 - Pred_center|^2 represents the first sub-loss. It should be noted that, for simplicity of description, formula (4) does not show the operation (e.g., summation) performed on the first losses of the individual third target pixel points to obtain the first difference information. In addition, cases where the preset range is set to another range can be deduced by analogy and are not exemplified here.
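The three branches of formulas (2)-(4) share one per-pixel pattern, a first weight times a first sub-loss; the sketch below illustrates it in Python, under the assumption that the preset range is 0 to 1 and that gt equal to 0 marks a third (background) target pixel point. The summation over pixels, omitted in the formulas, is likewise omitted here; the function name is hypothetical.

```python
import math

def first_loss(gt, pred):
    """Per-pixel first loss per formulas (2)-(4): first weight times first
    sub-loss.  gt is the first distance (0 marks background), pred is the
    second distance; pred is assumed strictly between the branch's log
    singularities."""
    if gt == 0.0:
        # Third target pixel: weight (1 - gt) equals the preset value 1.
        return -(1.0 - gt) * math.log(1.0 - pred) * abs(gt - pred) ** 2
    # First/second target pixels: the weight is gt itself,
    # positively correlated with the first distance.
    return -gt * math.log(pred) * abs(gt - pred) ** 2
```

A perfect prediction (pred equal to gt at the object center, or pred near 0 on background) drives the |gt - pred|^2 factor, and hence the loss, toward zero.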
In an implementation scenario, the first difference information may be calculated according to the inclusion range of the target pixel points. For example, when center prediction accuracy is of greater concern, the center features of the sample object need to be learned with emphasis, and the target pixel points may include the first target pixel points; when edge prediction accuracy is of greater concern, the edge features of the sample object need to be learned with emphasis, and the target pixel points may include the second target pixel points; when both center and edge prediction accuracy are of concern, both the center and edge features need to be learned with emphasis, and the target pixel points may include the first and second target pixel points, and so on, which is not exemplified one by one here. In addition, a smoothing loss may further be obtained based on the difference between the second distances corresponding to adjacent target pixel points, and the first difference information may further include the smoothing loss. For example, when the sample medical image is a three-dimensional image, the second distances may be smoothly constrained along the three dimensions x, y and z, i.e., the difference between the second distances corresponding to neighboring target pixel points is measured separately in the x direction, the y direction and the z direction.
It should be noted that the larger the difference between the second distances corresponding to adjacent target pixel points is, the larger the smoothing loss is; for the specific computation of the smoothing loss, reference may be made to relevant technical details, which are not repeated here. For convenience of description, the smoothing loss may be denoted as smoothness. Taking as an example the case where the target pixel points simultaneously include the first, second and third target pixel points, the first difference information may be expressed as:
l_focal = l_focal|0<th<0.1 + l_focal|0.9<th≤1 + l_focal|th=0 + smoothness......(5)
It should be noted that formula (5) shows the representation of the first difference information when the target pixel points simultaneously include the first, second and third target pixel points, with the preset range being 0 to 1, the first value equal to 0.1 and the second value equal to 0.9; cases where the preset range is another range, the first and second values are other values, or the inclusion range of the target pixel points differs can be deduced by analogy and are not exemplified here.
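A minimal sketch of the smoothing term and of the sum in formula (5) follows. The squared-difference form of the smoothing loss is an assumption for illustration, since the text only requires that the loss grow with the difference between second distances of adjacent target pixel points (shown here along a single axis); both function names are hypothetical.

```python
def smoothness_loss(second_distances):
    """Smoothing loss along one axis: larger differences between the second
    distances of adjacent target pixel points yield a larger loss
    (squared-difference form assumed for illustration)."""
    return sum((b - a) ** 2
               for a, b in zip(second_distances, second_distances[1:]))

def first_difference_info(l_edge, l_center, l_background, smooth):
    """Formula (5): sum of the three branch losses and the smoothing loss."""
    return l_edge + l_center + l_background + smooth
```

A constant run of second distances contributes no smoothing loss, while an abrupt jump between neighbors is penalized quadratically under this assumed form.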
In one implementation scenario, as described above, the reference information may further include the first mark of the target pixel point, so that the image detection model further learns the contour features of the sample object. The image detection model may further include a semantic segmentation network. On this basis, a plurality of sample feature maps extracted by the feature extraction network may be obtained, and the semantic segmentation network may be used to process each sample feature map to obtain a corresponding second processing result, where the second processing result includes a second mark of the target pixel point characterizing whether the target pixel point is predicted to belong to the sample object. For each sample feature map, a second sub-loss may be obtained based on the difference between the first mark and the second mark, a second weight may be obtained based on the resolution of the sample feature map, and a second loss corresponding to the sample feature map may be obtained based on the second sub-loss and the second weight, the second weight being positively correlated with the resolution; the second difference information may then be obtained based on the second losses corresponding to the respective sample feature maps.
In the above manner, on the one hand, by constraining the difference between the first mark and the second mark, the image detection model can learn the contour features of the sample object; on the other hand, the loss calculation is weighted based on the resolution of the sample feature map, with the weight positively correlated with the resolution, so that the accuracy of the high-resolution sample feature map is continuously supervised, which improves the perception of the image detection model on the contour region and further improves the accuracy of distinguishing each object. It should be noted that, in a real scene, the reference information may include a first mark of each sample pixel point in the sample medical image, and the second processing result may include a second mark of each sample pixel point; the second difference information may then be calculated based on the first mark and the second mark of each sample pixel point with reference to the above measurement manner, and the specific process is not described herein again.
In a specific implementation scenario, please refer to fig. 2 in combination, the output of the semantic segmentation network in fig. 2 is the second processing result, wherein the sample pixel in the diagonally shaded area represents the sample pixel predicted to belong to the sample object.
In a specific implementation scenario, as described above, the feature extraction network may adopt a network structure of an encoder and a decoder, and the feature maps respectively output by each decoding layer of the decoder may be obtained as sample feature maps with different resolutions. For example, if the decoder includes 3 decoding layers, sample feature maps of three different resolutions may be output. Other cases may be deduced by analogy, and examples are not given one by one here. Furthermore, the semantic segmentation network may include several (e.g., 2, 3, etc.) convolutional layers, which is not limited herein.
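The multi-resolution sample feature maps described above can be illustrated with a toy decoder in which each decoding layer doubles the spatial resolution and its output is kept as one sample feature map. The nearest-neighbour upsampling and all names below are assumptions for illustration, not the patent's architecture.

```python
import numpy as np

def decoder_outputs(encoded, num_layers=3):
    """Toy decoder stand-in: each decoding layer doubles the spatial
    resolution, and the per-layer outputs serve as the multi-resolution
    sample feature maps fed to the semantic segmentation network."""
    maps = []
    x = encoded
    for _ in range(num_layers):
        # nearest-neighbour 2x upsampling along both spatial axes
        x = x.repeat(2, axis=0).repeat(2, axis=1)
        maps.append(x)
    return maps
```

With 3 decoding layers and a 4×4 encoded map, this yields 8×8, 16×16, and 32×32 sample feature maps, i.e., three different resolutions.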
In a specific implementation scenario, according to the first mark, a connected domain formed by sample pixel points actually belonging to a sample object may be used as a first contour region, and for the second processing result corresponding to each sample feature map, according to the second mark, a connected domain formed by sample pixel points predicted to belong to the sample object may be used as a second contour region. The first contour region and the second contour region may then be processed with a set similarity loss function, such as the Dice loss, to obtain the second sub-loss corresponding to each sample feature map. For example, in the case of three sample feature maps with different resolutions, the second sub-loss corresponding to the low-resolution sample feature map can be denoted as l_dice_low, that of the medium-resolution sample feature map as l_dice_mid, and that of the high-resolution sample feature map as l_dice_high. Other cases may be deduced by analogy, and examples are not given one by one here.
In a specific implementation scenario, still taking sample feature maps with three different resolutions (high, medium, and low) as an example, the second weight corresponding to the low-resolution sample feature map may be denoted as w_low, that of the medium-resolution sample feature map as w_mid, and that of the high-resolution sample feature map as w_high. In this case, the second difference information l_dice_all can be expressed as:
l_dice_all = w_low × l_dice_low + w_mid × l_dice_mid + w_high × l_dice_high......(6)
It should be noted that the above formula (6) is only an exemplary representation of the second difference information in the case of sample feature maps with three different resolutions (high, medium, and low); other cases may be deduced by analogy, and examples are not given one by one here.
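Formula (6) above can be sketched as a resolution-weighted sum of per-scale Dice losses. The `dice_loss` helper below (1 minus the Dice coefficient of two binary masks) and the weight values used are illustrative assumptions; the patent only requires that the weight grow with resolution.

```python
import numpy as np

def dice_loss(pred_mask, true_mask, eps=1e-6):
    """1 - Dice coefficient between two binary masks (contour regions)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    return 1.0 - (2.0 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

def second_difference(preds, trues, weights):
    """Formula (6): weighted sum of per-scale Dice losses, with `weights`
    ordered low -> high resolution and larger for higher resolution."""
    return sum(w * dice_loss(p, t) for w, p, t in zip(weights, preds, trues))
```

Perfectly matching contour regions at every scale give a second difference close to zero, while mismatches at the high-resolution scale are penalised most.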
In one implementation scenario, as mentioned above, the reference information may further include first deviation information corresponding to the target pixel point, so that the image detection model further learns the boundary features of the sample object. The image detection model may further include a deviation detection network. On this basis, a target feature map extracted by the feature extraction network may be obtained, where the target feature map has the same resolution as the sample medical image, and the target feature map is processed by the deviation detection network to obtain a third processing result. The third processing result includes second deviation information corresponding to the target pixel point, and the second deviation information includes the predicted distances from the target pixel point to a plurality of first reference boundaries. For each target pixel point belonging to the sample object, a third sub-loss is obtained based on the difference between the first deviation information and the second deviation information corresponding to the target pixel point, a third weight is obtained based on the first distance corresponding to the target pixel point, and a third loss of the target pixel point is obtained based on the third sub-loss and the third weight. Finally, third difference information may be obtained based on the respective third losses. In the above manner, on the one hand, by constraining the difference between the first deviation information and the second deviation information, the image detection model can learn the size features of the sample object; on the other hand, adaptive weighting based on the first distance is performed in the loss calculation, which further improves the detection accuracy of the image detection model.
It should be noted that, in a real scene, the reference information may include first deviation information corresponding to each sample pixel point in the sample medical image, and the third processing result may include second deviation information corresponding to each sample pixel point; the third difference information may then be calculated based on the first deviation information and the second deviation information of each sample pixel point with reference to the above measurement manner, and the specific process is not described herein again.
In one specific implementation scenario, please refer to fig. 2 in combination, the output of the deviation detecting network in fig. 2 is the third processing result, wherein the rectangular box represents the boundary area of the sample object determined according to the second deviation information.
In a specific implementation scenario, as described above, the feature extraction network may adopt a network structure of an encoder and a decoder, and the feature map output by the last decoding layer of the decoder may be obtained as the target feature map. The deviation detection network may specifically perform deviation detection on sample pixel points belonging to a sample object using an idea similar to FCOS (Fully Convolutional One-Stage Object Detection), so that the detection process does not need to consider anchor box sizes, which are difficult to set when sample objects vary greatly in size.
In a specific implementation scenario, for each sample pixel point belonging to a sample object, the difference between the actual distance and the predicted distance from the sample pixel point to the same first reference boundary may be obtained based on the corresponding first deviation information and second deviation information, and the third sub-loss corresponding to the sample pixel point may be obtained based on the sum of the squares of the differences corresponding to the different first reference boundaries. Taking a three-dimensional sample medical image as an example, for a sample pixel point (x, y, z) belonging to a sample object, the first deviation information includes d_up, d_down, d_left, d_right, d_front, d_back, which respectively represent the actual distances from the sample pixel point to the upper, lower, left, right, front, and rear boundaries, and the second deviation information includes the corresponding predicted distances d'_up, d'_down, d'_left, d'_right, d'_front, d'_back. The third sub-loss corresponding to the sample pixel point can then be expressed as (d_up − d'_up)² + (d_down − d'_down)² + (d_left − d'_left)² + (d_right − d'_right)² + (d_front − d'_front)² + (d_back − d'_back)², or the square root of this sum of squares can be used as the third sub-loss. Other cases may be deduced by analogy, and examples are not given one by one here.
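The squared-difference third sub-loss just described can be sketched for one voxel of a three-dimensional image as follows; the tuple ordering (d_up, d_down, d_left, d_right, d_front, d_back) and the function name are assumptions for illustration.

```python
import math

def third_sub_loss(actual, predicted, use_sqrt=False):
    """Third sub-loss for one voxel: sum of squared differences between
    the actual and predicted distances to the six first reference
    boundaries; optionally its square root, as the text allows."""
    s = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(s) if use_sqrt else s
```

The third loss of the voxel is then this value multiplied by the third weight derived from its first distance.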
In a specific implementation scenario, the third weight corresponding to a sample pixel point may be set to be positively correlated with the first distance corresponding to that sample pixel point. For example, in the case where the preset range is set to 0 to 1, the closer the sample pixel point is to the central position of the sample object, the greater the corresponding first distance; setting the third weight to be positively correlated with the first distance therefore makes the image detection model focus on the accuracy of deviation detection for sample pixel points near the central position. It should be noted that, when the accuracy of deviation detection for sample pixel points at other positions is to be emphasized, other weight setting manners may be adopted by analogy, and examples are not given one by one here.
In a specific implementation scenario, an operation such as summing may be performed on the third loss corresponding to each sample pixel point belonging to the sample object, so as to obtain third difference information.
In an implementation scenario, after the first difference information, the second difference information, and the third difference information are obtained, the final difference information may be obtained, for example, by summing the first, second, and third difference information; alternatively, the first, second, and third difference information may be weighted and summed to obtain the final difference information, which is not limited herein.
In an implementation scenario, network parameters of the image detection model may be adjusted by using an optimization manner such as gradient descent based on the difference information, which may specifically refer to technical details of the optimization manner such as gradient descent, and is not described herein again.
According to the above scheme, a sample medical image and its reference information are obtained, where the reference information includes a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the central position of the sample object to which the target pixel point belongs. On this basis, the sample medical image is processed with the image detection model to obtain a first processing result, which includes a second distance corresponding to the target pixel point, the second distance representing the predicted distance from the target pixel point to the first reference position. Difference information is obtained based on the first distance and the second distance corresponding to the target pixel point, and the network parameters of the image detection model are adjusted based on the difference information. By constraining the difference between the first distance and the second distance, the image detection model can learn the center-distance feature of the target pixel point, which helps improve the perception of the image detection model on the sample object and, in turn, helps distinguish and detect each object accurately.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment of an image detection method according to the present application.
Specifically, the method may include the steps of:
step S31: and acquiring a medical image to be detected.
In the embodiment of the present disclosure, the medical image to be detected includes a plurality of target objects, and specific meanings of the target objects may refer to the related descriptions about the sample objects in the foregoing embodiment, which are not described herein again.
It should be noted that, similar to the sample medical image in the foregoing disclosed embodiments, in the case of sufficient computing resources, the scanned whole medical image may be used as the medical image to be detected, for example, the whole MR image or CT image may be used as the medical image to be detected; or, in the case of limited computing resources, the scanned medical image may be divided into a plurality of sub-medical images, and the sub-medical images are used as the medical images to be detected, for which reference may be made to the related description in the foregoing disclosed embodiments, not repeated here.
Step S32: and processing the medical image to be detected by using the image detection model to obtain the detection result of each target object.
In the embodiment of the present disclosure, the image detection model is obtained by training in the steps of the training method embodiment of any one of the image detection models, and the specific training process of the image detection model may refer to the foregoing embodiment, which is not described herein again.
In an implementation scenario, the medical image to be detected may be processed with the image detection model to obtain a first detection result, a second detection result, and a third detection result. Based on the first detection result and the third detection result, the estimated sizes of the plurality of target objects may be obtained. For each target object, based on its estimated size, either the second detection result or the third detection result may be selected as a reference detection result, and based on the first detection result and the reference detection result, the final detection result of the target object may be obtained by analysis. Here, the first detection result includes a center distance corresponding to a pixel point in the medical image to be detected, the center distance represents the predicted distance from the pixel point to a second reference position, and the second reference position represents the central position of the target object to which the pixel point belongs; the second detection result includes an attribute mark of the pixel point, which represents whether the pixel point is predicted to belong to the target object; the third detection result includes prediction deviation information corresponding to the pixel point, which includes the predicted distances from the pixel point to a plurality of second reference boundaries, where a second reference region formed by the second reference boundaries surrounds the target object to which the pixel point belongs. In this way, different analysis strategies can be adopted to obtain the final detection result based on the estimated size of the target object, so that target objects of different sizes are all taken into account, further improving the detection precision.
In a specific implementation scenario, please refer to fig. 2 and the related description in the foregoing disclosed embodiments: the image detection model may specifically include a feature extraction network, a distance prediction network, a deviation detection network, and a semantic segmentation network, where the feature extraction network is configured to extract feature maps from the medical image, the distance prediction network is configured to predict the center distance to obtain the first detection result, the semantic segmentation network is configured to predict the object contour to obtain the second detection result, and the deviation detection network is configured to detect deviation information to obtain the third detection result. The network structure and processing procedure of each network may refer to the related description in the foregoing disclosed embodiments, and are not described herein again.
In a specific implementation scenario, the specific meanings of the first detection result, the second detection result, and the third detection result may refer to the specific meanings of the first processing result, the second processing result, and the third processing result in the foregoing disclosed embodiments, respectively, and are not described herein again.
In a specific implementation scenario, the center position of each target object may be determined based on the first detection result. Referring to fig. 4, fig. 4 is a schematic process diagram of an embodiment of the image detection method of the present application. As shown in fig. 4, taking a center distance within the preset range as an example, the preset range may include an upper limit value and a lower limit value, and the closer the center distance is to the upper limit value, the closer the pixel point is to the center position. In this case, a reference threshold for determining the center position may be set (for example, when the preset range is 0 to 1, the reference threshold may be set to 0.8, and the like), the pixel points whose center distance is greater than the reference threshold are used as candidate pixel points where the center position may be located, a plurality of connected domains are obtained based on the candidate pixel points, and the center of each connected domain is used as the center position of each target object.
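The centre-determination step described above (threshold the centre-distance map, group candidates into connected domains, take each domain's centre) can be sketched as follows for a 2-D map. The 4-connectivity, the use of the centroid as the domain centre, and all names are illustrative assumptions; the 0.8 threshold follows the example in the text.

```python
import numpy as np
from collections import deque

def object_centres(centre_dist, threshold=0.8):
    """Threshold the predicted centre-distance map, form 4-connected
    components of candidate pixels, and return each component's
    centroid as an object centre position."""
    mask = centre_dist > threshold
    seen = np.zeros_like(mask, dtype=bool)
    centres = []
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comp, q = [], deque([(i, j)])  # breadth-first flood fill
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                ys, xs = zip(*comp)
                centres.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centres
```

Each returned coordinate pair then serves as the second reference position of one target object.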
In a specific implementation scenario, after the center position of each target object is determined based on the first detection result, the prediction deviation information at each center position may be acquired based on the third detection result, and the estimated size of the target object may be analyzed based on the prediction deviation information. Taking a three-dimensional medical image to be detected as an example, as described in the foregoing disclosed embodiments, the prediction deviation information may include d_up, d_down, d_left, d_right, d_front, d_back, which respectively represent the offset ratio of the center position to the upper boundary (i.e., the ratio of the offset value from the center position to the upper boundary to the image height), the offset ratio to the lower boundary (i.e., the ratio of the offset value to the lower boundary to the image height), the offset ratio to the left boundary (i.e., the ratio of the offset value to the left boundary to the image length), the offset ratio to the right boundary (i.e., the ratio of the offset value to the right boundary to the image length), the offset ratio to the front boundary (i.e., the ratio of the offset value to the front boundary to the image width), and the offset ratio to the rear boundary (i.e., the ratio of the offset value to the rear boundary to the image width), which are not described herein again. On this basis, the product of the sum of d_left and d_right and the image length may be used as the length of the second reference region, the product of the sum of d_up and d_down and the image height as the height of the second reference region, and the product of the sum of d_front and d_back and the image width as the width of the second reference region, so that the estimated size of the target object can be derived from the length, width, and height.
For example, the maximum (longest) among the three may be used as the estimated size, or the average of the three may be used as the estimated size, which is not limited herein.
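The size estimation described above (offset ratios at the centre scaled by the matching image dimension, then the maximum or the mean of the three) can be sketched as follows; the tuple ordering and helper names are assumptions for illustration.

```python
def estimated_size(dev, image_dims, mode="max"):
    """Estimated object size from the predicted offsets at the centre:
    length = (d_left + d_right) * image length,
    height = (d_up + d_down) * image height,
    width  = (d_front + d_back) * image width.
    `dev` is (d_up, d_down, d_left, d_right, d_front, d_back) as offset
    ratios; `image_dims` is (length, height, width)."""
    d_up, d_down, d_left, d_right, d_front, d_back = dev
    length = (d_left + d_right) * image_dims[0]
    height = (d_up + d_down) * image_dims[1]
    width = (d_front + d_back) * image_dims[2]
    if mode == "max":
        return max(length, height, width)   # longest extent as the size
    return (length + height + width) / 3.0  # or the average of the three
```

Comparing this value against the preset size (e.g., 10 mm) then decides whether the object is treated as a first (large) or second (small) object.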
In a specific implementation scenario, when the estimated size of the target object is larger than a preset size (e.g., the longest diameter is larger than 10 mm), the second detection result may be selected as the reference detection result, and the target object may be regarded as a first object (i.e., a large-size object; if the target object is a lesion, the first object is a large-size lesion). On this basis, the center position of the first object may be determined based on the first detection result; based on the reference detection result, the pixel points predicted to belong to the first object may be used as first reference pixel points; based on the first detection result, the center distances corresponding to the first reference pixel points may be obtained; and finally, based on the center position of the first object and the center distances corresponding to the first reference pixel points, the contour region of the first object may be obtained, with the final detection result including the contour region of the first object. Specifically, a processing method such as the watershed algorithm may be adopted to obtain the contour region of the first object based on the center position of the first object and the center distances corresponding to the first reference pixel points, so as to implement instance segmentation of each first object. With continued reference to fig. 4, in the image representing the contour regions, black represents the background region, and image regions of different gray levels represent the contour regions of different target objects. In this way, the contour region of a large-size target object can be quickly obtained by combining the first detection result and the second detection result, which helps improve the detection speed.
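As a rough stand-in for the watershed-based instance split described above, the following sketch assigns every foreground pixel to the nearest detected centre; a real implementation would run a watershed on the centre-distance map, so this is only an illustration of how the merged foreground mask is separated into per-object contour regions. All names are assumptions.

```python
import numpy as np

def split_instances(fg_mask, centres):
    """Assign each foreground pixel (predicted to belong to some first
    object) to the nearest object centre, yielding one integer label per
    instance; 0 marks the background, labels start at 1."""
    labels = np.zeros(fg_mask.shape, dtype=int)
    ys, xs = np.nonzero(fg_mask)
    for y, x in zip(ys, xs):
        dists = [(y - cy) ** 2 + (x - cx) ** 2 for cy, cx in centres]
        labels[y, x] = int(np.argmin(dists)) + 1
    return labels
```

Two touching objects with distinct centres are thereby split into two labelled contour regions, which is the effect the watershed step achieves on the centre-distance map.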
In a specific implementation scenario, considering that small-size objects may produce no semantic segmentation response, when the estimated size of the target object is not larger than the preset size (e.g., the longest diameter is not larger than 10 mm), the third detection result may be selected as the reference detection result, and the target object may be regarded as a second object (i.e., a small-size object; if the target object is a lesion, the second object is a small-size lesion). On this basis, the center position of the second object may be determined based on the first detection result, the pixel point at the center position of the second object may be used as a second reference pixel point, and the prediction deviation information corresponding to the second reference pixel point may be acquired based on the reference detection result. Finally, based on the prediction deviation information corresponding to the second reference pixel point, a boundary region surrounding the second object may be obtained, with the final detection result including the boundary region of the second object. With continued reference to fig. 4, in the image showing the boundary regions, each set of criss-cross double arrows indicates the boundary region of a second object, and the positions indicated by the arrows indicate its boundary positions; for example, the left arrow indicates the left boundary position, the right arrow the right boundary position, the upper arrow the upper boundary position, and the lower arrow the lower boundary position of the second object. Other cases may be deduced by analogy, and examples are not given one by one here. In this way, the boundary region of a small-size target object can be accurately detected by combining the first detection result and the third detection result, which helps improve the detection precision.
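The boundary region of a small object can be recovered from the predicted offsets at its centre pixel as sketched below: the box extends from the centre minus the up/left/front offsets to the centre plus the down/right/back offsets, each offset ratio scaled by the matching image dimension. The axis conventions, the (z, y, x) centre ordering, and the dictionary layout are illustrative assumptions.

```python
def boundary_region(centre, dev, image_dims):
    """Boundary region surrounding a second (small) object, built from
    the prediction deviation information at its centre pixel.
    `centre` is (cz, cy, cx); `dev` is the offset-ratio tuple
    (d_up, d_down, d_left, d_right, d_front, d_back);
    `image_dims` is (length, height, width)."""
    cz, cy, cx = centre
    d_up, d_down, d_left, d_right, d_front, d_back = dev
    length, height, width = image_dims
    return {
        "y": (cy - d_up * height, cy + d_down * height),    # up/down span
        "x": (cx - d_left * length, cx + d_right * length),  # left/right span
        "z": (cz - d_front * width, cz + d_back * width),    # front/back span
    }
```

The returned spans correspond to the criss-cross double arrows in fig. 4, one pair per axis.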
According to the above scheme, a medical image to be detected is acquired, the medical image to be detected includes a plurality of target objects, and the medical image to be detected is processed with the image detection model to obtain the detection result of each target object. Since the image detection model is trained through the steps of any one of the above embodiments of the training method for an image detection model, the precision of distinguishing each target object in the medical image to be detected can be improved.
Referring to fig. 5, fig. 5 is a schematic diagram of a framework of an embodiment of a training apparatus 50 for image detection models according to the present application. The training apparatus 50 for the image detection model includes: the system comprises an acquisition module 51, a processing module 52 and an adjusting module 53, wherein the acquisition module 51 is used for acquiring a sample medical image and reference information thereof; the reference information comprises a first distance corresponding to a target pixel point in the sample medical image, the first distance represents the actual distance from the target pixel point to a first reference position, and the first reference position represents the central position of a sample object to which the target pixel point belongs; the processing module 52 is configured to process the sample medical image by using the image detection model to obtain a first processing result; the first processing result comprises a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position; the adjusting module 53 is configured to obtain difference information based on the first distance and the second distance corresponding to the target pixel point, and adjust a network parameter of the image detection model based on the difference information.
According to the scheme, the difference between the first distance and the second distance is restrained, so that the image detection model can learn the central distance characteristic of the target pixel point, the perception of the image detection model on the sample object is favorably improved, and then each object can be favorably and accurately distinguished and detected.
In some disclosed embodiments, the difference information includes first difference information, and the adjusting module 53 includes a first loss calculating submodule, configured to, for each target pixel point, obtain a first sub-loss based on a difference between a first distance and a second distance corresponding to the target pixel point, obtain a first weight based on the first distance corresponding to the target pixel point, and obtain a first loss of the target pixel point based on the first sub-loss and the first weight; the adjusting module 53 includes a first difference statistic submodule, configured to obtain first difference information based on the first loss of each target pixel.
In some disclosed embodiments, the target pixel point is located in a target region of the sample medical image, the target region includes a central region and an edge region of the sample object, the target pixel point includes at least one of a first target pixel point and a second target pixel point, the first target pixel point is located in the central region, the second target pixel point is located in the edge region, a first weight corresponding to the first target pixel point is positively correlated with a first distance corresponding to the first target pixel point, and a first weight corresponding to the second target pixel point is positively correlated with a first distance corresponding to the second target pixel point.
In some disclosed embodiments, the target region further includes a background region unrelated to the sample object, the target pixel further includes a third target pixel, the third target pixel is located in the background region, and the first weight corresponding to the third target pixel is a preset value.
In some disclosed embodiments, the first distance is within a preset range, the preset range including a lower limit and an upper limit; under the condition that the first distance corresponding to the target pixel point is between the lower limit value and the first numerical value, the target pixel point is located in the edge area, under the condition that the first distance corresponding to the target pixel point is between the second numerical value and the upper limit value, the target pixel point is located in the center area, and the second numerical value is not smaller than the first numerical value.
In some disclosed embodiments, the obtaining module 51 includes a distance statistics submodule, configured to count a maximum value and a minimum value in the actual distance, and obtain a first difference value between the maximum value and the minimum value; the obtaining module 51 includes a distance calculating submodule, configured to obtain a second difference between the actual distance and the minimum value for each target pixel point belonging to the sample object, and obtain a first distance based on a ratio between the second difference and the first difference; wherein the first distance is inversely related to the ratio.
In some disclosed embodiments, the reference information further includes a first mark of the target pixel, the first mark represents whether the target pixel actually belongs to the sample object, the difference information further includes second difference information, and the image detection model includes a feature extraction network and a semantic segmentation network; the training device 50 for the image detection model further comprises a sample feature obtaining module, which is used for obtaining a plurality of sample feature maps extracted by the feature extraction network; the training device 50 for the image detection model further comprises a semantic segmentation module, which is used for respectively processing the plurality of sample feature maps by using a semantic segmentation network to obtain a second processing result corresponding to each sample feature map; the second processing result comprises a second mark of the target pixel point, and the second mark represents whether the target pixel point prediction belongs to the sample object; the training device 50 for the image detection model further includes a second loss module, configured to obtain, for each sample feature map, a second sub-loss based on a difference between the first label and the second label, obtain a second weight based on a resolution of the sample feature map, and obtain a second loss corresponding to the sample feature map based on the second sub-loss and the second weight; wherein the second weight is positively correlated with the resolution; the training apparatus 50 for the image detection model further includes a second difference module, configured to obtain second difference information based on second losses respectively corresponding to the plurality of sample feature maps.
In some disclosed embodiments, the reference information further includes first deviation information corresponding to the target pixel point, the first deviation information includes actual distances from the target pixel point to a plurality of first reference boundaries, respectively, a first reference region formed by the plurality of first reference boundaries surrounds the sample object to which the target pixel point belongs, the difference information further includes third difference information, and the image detection model includes a feature extraction network and a deviation detection network. The training device 50 for the image detection model further comprises a target feature obtaining module, configured to obtain a target feature map extracted by the feature extraction network, wherein the target feature map has the same resolution as the sample medical image; a deviation detection module, configured to process the target feature map by using the deviation detection network to obtain a third processing result, wherein the third processing result comprises second deviation information corresponding to the target pixel point, and the second deviation information comprises the predicted distances from the target pixel point to the plurality of first reference boundaries, respectively; a third loss module, configured to, for each target pixel point belonging to the sample object, obtain a third sub-loss based on the difference between the first deviation information and the second deviation information corresponding to the target pixel point, obtain a third weight based on the first distance corresponding to the target pixel point, and obtain a third loss of the target pixel point based on the third sub-loss and the third weight; and a third difference module, configured to obtain the third difference information based on each third loss.
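The weighted deviation loss described above can be sketched as follows. This is an illustrative reading only, not the fixed formulation of this disclosure: the function and parameter names, the L1 sub-loss, and the identity mapping from first distance to third weight are all assumptions.

```python
import numpy as np

def third_loss(first_dev, second_dev, first_dist):
    """Sketch of the third loss over target pixel points of a sample object.

    first_dev, second_dev: (N, 4) arrays of actual / predicted distances
        from each target pixel point to the first reference boundaries.
    first_dist: (N,) normalized centre distance per target pixel point.
    """
    sub_loss = np.abs(first_dev - second_dev).sum(axis=1)  # third sub-loss (assumed L1)
    weight = first_dist                                    # third weight (assumed identity map)
    per_pixel = weight * sub_loss                          # third loss per target pixel point
    return per_pixel.mean()                                # aggregated third difference information
```

A perfect prediction yields zero loss; pixels with a larger first distance (nearer the object centre under the normalization of claim 6) contribute more to the gradient under this assumed weighting.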
Referring to fig. 6, fig. 6 is a schematic diagram of a framework of an embodiment of an image detection apparatus 60 according to the present application. The image detection apparatus 60 comprises an acquisition module 61 and a processing module 62. The acquisition module 61 is used for acquiring a medical image to be detected, wherein the medical image to be detected comprises a plurality of target objects; the processing module 62 is configured to process the medical image to be detected by using an image detection model to obtain a detection result of each target object, wherein the image detection model is obtained by the training device of the image detection model in any one of the foregoing embodiments.
According to the above scheme, a medical image to be detected, which comprises a plurality of target objects, is obtained and processed by the image detection model to obtain the detection result of each target object. Because the image detection model is obtained by the training device of the image detection model in any one of the foregoing embodiments, the precision with which each target object in the medical image to be detected is distinguished can be improved.
In some disclosed embodiments, the processing module 62 includes a detection sub-module, configured to process the medical image to be detected by using the image detection model to obtain a first detection result, a second detection result, and a third detection result; an estimation sub-module, configured to obtain estimated sizes of the plurality of target objects based on the first detection result and the third detection result, respectively; a selection sub-module, configured to select, for each target object, either one of the second detection result and the third detection result as a reference detection result based on the estimated size of the target object; and an analysis sub-module, configured to obtain a final detection result of the target object through analysis based on the first detection result and the reference detection result. The first detection result comprises a central distance corresponding to a pixel point in the medical image to be detected, the central distance represents a predicted distance from the pixel point to a second reference position, and the second reference position represents the central position of the target object to which the pixel point belongs; the second detection result comprises an attribute mark of the pixel point, and the attribute mark represents whether the pixel point is predicted to belong to the target object; the third detection result comprises prediction deviation information corresponding to the pixel point, the prediction deviation information comprises predicted distances from the pixel point to a plurality of second reference boundaries, respectively, and a second reference region formed by the plurality of second reference boundaries surrounds the target object to which the pixel point belongs.
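The size-based routing between the two candidate reference results can be sketched as follows. This is an illustrative reading; the function name, the string tags, and treating the second and third detection results as opaque values are assumptions, not fixed by this disclosure.

```python
def select_reference(estimated_size, preset_size, second_result, third_result):
    """Sketch of the per-object selection between the second and third
    detection results, keyed on the object's estimated size.

    Large objects fall back on the segmentation mask (second result);
    small objects fall back on the boundary-deviation map (third result).
    """
    if estimated_size > preset_size:
        # first object: contour later reconstructed from mask + centre distances
        return second_result, "first_object"
    # second object: boundary region later reconstructed from per-pixel deviations
    return third_result, "second_object"
```

The intuition is that a dense mask is more reliable for large objects, while a per-pixel regression to four boundaries degrades less for objects only a few pixels across.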
In some disclosed embodiments, the selection sub-module includes a first selection unit, configured to select the second detection result as the reference detection result and take the target object as a first object when the estimated size of the target object is larger than a preset size. The analysis sub-module includes a first analysis unit, configured to determine the central position of the first object based on the first detection result, take the pixel points predicted to belong to the first object as first reference pixel points based on the reference detection result, and acquire the central distances corresponding to the first reference pixel points based on the first detection result; and a contour acquisition unit, configured to acquire a contour region of the first object based on the central position of the first object and the central distances corresponding to the first reference pixel points, wherein the final detection result comprises the contour region of the first object.
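One plausible reading of this contour extraction, assuming the normalization of claim 6 under which the first distance becomes small near the object edge, is the following sketch; the threshold `edge_thresh`, the names, and the thresholding itself are assumptions, not the disclosure's fixed procedure.

```python
import numpy as np

def contour_region(mask, center_dist, edge_thresh=0.2):
    """Sketch of contour extraction for a large (first) object.

    mask: boolean array marking the first reference pixel points, i.e. the
        pixels predicted to belong to the first object (reference result).
    center_dist: predicted normalized centre-distance map (first result);
        under the assumed normalization it is small near the object edge.
    """
    # keep only predicted-object pixels whose centre distance marks them as edge
    return mask & (center_dist <= edge_thresh)
```

Used this way, the centre position determined from the first detection result anchors which connected component of `mask` belongs to the object being analyzed.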
In some disclosed embodiments, the selection sub-module includes a second selection unit, configured to select the third detection result as the reference detection result and take the target object as a second object when the estimated size of the target object is not larger than the preset size. The analysis sub-module includes a second analysis unit, configured to determine the central position of the second object based on the first detection result, take the pixel point at the central position of the second object as a second reference pixel point, and acquire the prediction deviation information corresponding to the second reference pixel point based on the reference detection result; and a boundary acquisition unit, configured to acquire a boundary region surrounding the second object based on the prediction deviation information corresponding to the second reference pixel point, wherein the final detection result comprises the boundary region of the second object.
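The boundary-region recovery for a small object can be sketched as follows, assuming the four deviations are ordered (left, right, top, bottom) and measured in pixels from the second reference pixel point; this ordering and the axis-aligned box form are assumptions, not fixed by the disclosure.

```python
def boundary_region(center_yx, deviation):
    """Sketch: recover the second reference region of a small (second)
    object from the predicted deviations at its centre pixel.

    center_yx: (row, col) of the second reference pixel point.
    deviation: (left, right, top, bottom) predicted distances to the
        second reference boundaries.
    Returns an axis-aligned box as (x0, y0, x1, y1).
    """
    cy, cx = center_yx
    left, right, top, bottom = deviation
    return (cx - left, cy - top, cx + right, cy + bottom)
```

Because only one pixel (the centre) is read out, this path avoids relying on a segmentation mask that may be unstable for objects a few pixels wide.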
Referring to fig. 7, fig. 7 is a schematic frame diagram of an embodiment of an electronic device 70 of the present application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, and the processor 72 is configured to execute program instructions stored in the memory 71 to implement the steps of any of the above-described embodiments of the training method of the image detection model, or to implement the steps of any of the above-described embodiments of the image detection method. In one particular implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server; the electronic device 70 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
Specifically, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the above-described embodiments of the training method of the image detection model, or to implement the steps of any of the above-described embodiments of the image detection method. The processor 72 may also be referred to as a CPU (Central Processing Unit) and may be an integrated circuit chip having signal processing capabilities. The processor 72 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. Additionally, the processor 72 may be implemented jointly by a plurality of integrated circuit chips.
According to the above scheme, by constraining the difference between the first distance and the second distance, the image detection model can learn the central-distance characteristic of the target pixel points, which helps improve the model's perception of the sample object and, in turn, helps distinguish and detect each object accurately.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer readable storage medium 80 according to the present application. The computer readable storage medium 80 stores program instructions 801 that can be executed by the processor, where the program instructions 801 are used to implement the steps of any of the above-described embodiments of the image detection model training method, or to implement the steps of any of the above-described embodiments of the image detection method.
According to the above scheme, by constraining the difference between the first distance and the second distance, the image detection model can learn the central-distance characteristic of the target pixel points, which helps improve the model's perception of the sample object and, in turn, helps distinguish and detect each object accurately.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical function division, and an actual implementation may use another division; for example, units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs users of the personal information processing rules and obtains their individual consent before processing the personal information. If the technical solution involves sensitive personal information, the product obtains individual consent before processing such information and additionally meets the requirement of "express consent". For example, a personal information collection device such as a camera may be provided with a clear and prominent notice informing people that they are entering a personal information collection range and that personal information will be collected; a person who voluntarily enters the collection range is regarded as consenting to the collection of his or her personal information. Alternatively, on a device that processes personal information, with the personal information processing rules announced by a prominent notice, personal authorization may be obtained through a pop-up message or by asking the person to upload his or her own personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information to be processed.

Claims (16)

1. A training method of an image detection model is characterized by comprising the following steps:
acquiring a sample medical image and reference information thereof; the reference information comprises a first distance corresponding to a target pixel point in the sample medical image, the first distance represents an actual distance from the target pixel point to a first reference position, and the first reference position is a central position of a sample object to which the target pixel point belongs;
processing the sample medical image by using an image detection model to obtain a first processing result; the first processing result comprises a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position;
and obtaining difference information based on the first distance and the second distance corresponding to the target pixel point, and adjusting the network parameters of the image detection model based on the difference information.
2. The method of claim 1, wherein the difference information comprises first difference information, and wherein obtaining difference information based on the first distance and the second distance corresponding to the target pixel point comprises:
for each target pixel point, obtaining a first sub-loss based on a difference value between a first distance and a second distance corresponding to the target pixel point, obtaining a first weight based on the first distance corresponding to the target pixel point, and obtaining a first loss of the target pixel point based on the first sub-loss and the first weight;
and obtaining the first difference information based on the first loss of each target pixel point.
3. The method of claim 2, wherein the target pixel point is located in a target region of the sample medical image, the target region includes a center region and an edge region of the sample object, the target pixel point includes at least one of a first target pixel point and a second target pixel point, the first target pixel point is located in the center region, the second target pixel point is located in the edge region, a first weight corresponding to the first target pixel point is positively correlated with a first distance corresponding to the first target pixel point, and a first weight corresponding to the second target pixel point is positively correlated with a first distance corresponding to the second target pixel point.
4. The method of claim 3, wherein the target region further comprises a background region unrelated to the sample object, the target pixel further comprises a third target pixel, the third target pixel is located in the background region, and the first weight corresponding to the third target pixel is a preset value.
5. The method of claim 3, wherein the first distance is within a preset range, the preset range including a lower limit value and an upper limit value;
the target pixel point is located in the edge region under the condition that a first distance corresponding to the target pixel point is located between the lower limit value and the first numerical value, and the target pixel point is located in the central region under the condition that the first distance corresponding to the target pixel point is located between the second numerical value and the upper limit value, wherein the second numerical value is not smaller than the first numerical value.
6. The method of claim 5, wherein the step of obtaining the first distance comprises:
counting the maximum value and the minimum value among the actual distances, and acquiring a first difference value between the maximum value and the minimum value;
for each target pixel point belonging to the sample object, obtaining a second difference value between the actual distance of the target pixel point and the minimum value, and obtaining the first distance based on a ratio between the second difference value and the first difference value; wherein the first distance is inversely related to the ratio.
7. The method according to any one of claims 1 to 6, wherein the reference information further includes a first label of the target pixel, the first label characterizes whether the target pixel actually belongs to the sample object, the difference information further includes second difference information, and the image detection model includes a feature extraction network and a semantic segmentation network; before the adjusting network parameters of the image detection model based on the difference information, the method further comprises:
obtaining a plurality of sample feature maps extracted by the feature extraction network;
respectively processing the plurality of sample feature maps by using the semantic segmentation network to obtain a second processing result corresponding to each sample feature map; wherein the second processing result comprises a second label of the target pixel, and the second label characterizes whether the target pixel is predicted to belong to the sample object;
for each sample feature map, obtaining a second sub-loss based on the difference between the first mark and the second mark, obtaining a second weight based on the resolution of the sample feature map, and obtaining a second loss corresponding to the sample feature map based on the second sub-loss and the second weight; wherein the second weight is positively correlated with the resolution;
and obtaining the second difference information based on the second losses respectively corresponding to the plurality of sample feature maps.
8. The method according to any one of claims 1 to 7, wherein the reference information further includes first deviation information corresponding to the target pixel point, the first deviation information includes actual distances from the target pixel point to a plurality of first reference boundaries, respectively, a first reference region formed by the plurality of first reference boundaries surrounds a sample object to which the target pixel point belongs, the difference information further includes third difference information, and the image detection model includes a feature extraction network and a deviation detection network; before the adjusting network parameters of the image detection model based on the difference information, the method further comprises:
acquiring a target feature map extracted by the feature extraction network; wherein the target feature map has the same resolution as the sample medical image;
processing the target characteristic diagram by using the deviation detection network to obtain a third processing result; the third processing result comprises second deviation information corresponding to the target pixel point, and the second deviation information comprises predicted distances from the target pixel point to the plurality of first reference boundaries respectively;
for each target pixel point belonging to the sample object, obtaining a third sub-loss based on a difference between first deviation information and second deviation information corresponding to the target pixel point, obtaining a third weight based on a first distance corresponding to the target pixel point, and obtaining a third loss of the target pixel point based on the third sub-loss and the third weight;
and obtaining the third difference information based on each third loss.
9. An image detection method, comprising:
acquiring a medical image to be detected; wherein the medical image to be detected comprises a plurality of target objects;
processing the medical image to be detected by using an image detection model to obtain a detection result of each target object; wherein the image detection model is trained using the method of any one of claims 1 to 8.
10. The method according to claim 9, wherein the processing the medical image to be detected by using the image detection model to obtain the detection result of each target object comprises:
processing the medical image to be detected by using the image detection model to obtain a first detection result, a second detection result and a third detection result;
respectively acquiring estimated sizes of the target objects based on the first detection result and the third detection result;
for each target object, selecting any one of the second detection result and the third detection result as a reference detection result based on the estimated size of the target object, and analyzing to obtain a final detection result of the target object based on the first detection result and the reference detection result;
the first detection result comprises a central distance corresponding to a pixel point in the medical image to be detected, the central distance represents a predicted distance from the pixel point to a second reference position, the second reference position represents a central position of the target object to which the pixel point belongs, the second detection result comprises an attribute mark of the pixel point, the attribute mark represents whether the pixel point is predicted to belong to the target object, the third detection result comprises prediction deviation information corresponding to the pixel point, the prediction deviation information comprises predicted distances from the pixel point to a plurality of second reference boundaries, respectively, and a second reference region formed by the plurality of second reference boundaries surrounds the target object to which the pixel point belongs.
11. The method of claim 10, wherein selecting either the second detection result or the third detection result as a reference detection result based on the estimated size of the target object comprises:
selecting the second detection result as the reference detection result based on the estimated size of the target object being larger than a preset size, and taking the target object as the first object;
analyzing to obtain a final detection result of the target object based on the first detection result and the reference detection result, including:
determining the central position of the first object based on the first detection result, taking the pixel point predicted to belong to the first object as a first reference pixel point based on the reference detection result, and acquiring the central distance corresponding to the first reference pixel point based on the first detection result;
acquiring a contour region of the first object based on the central position of the first object and the central distance corresponding to the first reference pixel point; wherein the final detection result comprises a contour region of the first object.
12. The method of claim 10, wherein selecting either the second detection result or the third detection result as a reference detection result based on the estimated size of the target object comprises:
selecting the third detection result as the reference detection result based on the estimated size of the target object being not larger than a preset size, and taking the target object as a second object;
analyzing to obtain a final detection result of the target object based on the first detection result and the reference detection result, including:
determining the center position of the second object based on the first detection result, taking a pixel point at the center position of the second object as a second reference pixel point, and acquiring prediction deviation information corresponding to the second reference pixel point based on the reference detection result;
acquiring a boundary region surrounding the second object based on the prediction deviation information corresponding to the second reference pixel point; wherein the final detection result comprises a boundary region of the second object.
13. An apparatus for training an image detection model, comprising:
the acquisition module is used for acquiring a sample medical image and reference information thereof; the reference information comprises a first distance corresponding to a target pixel point in the sample medical image, the first distance represents an actual distance from the target pixel point to a first reference position, and the first reference position represents a central position of a sample object to which the target pixel point belongs;
the processing module is used for processing the sample medical image by using the image detection model to obtain a first processing result; the first processing result comprises a second distance corresponding to the target pixel point, and the second distance represents a predicted distance from the target pixel point to the first reference position;
and the adjusting module is used for obtaining difference information based on the first distance and the second distance corresponding to the target pixel point and adjusting the network parameters of the image detection model based on the difference information.
14. An image detection apparatus, characterized by comprising:
the acquisition module is used for acquiring a medical image to be detected; wherein the medical image to be detected comprises a plurality of target objects;
the processing module is used for processing the medical image to be detected by using an image detection model to obtain a detection result of each target object; wherein the image detection model is trained using the apparatus of claim 13.
15. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method for training an image detection model according to any one of claims 1 to 8, or to implement the method for image detection according to any one of claims 9 to 12.
16. A computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the method of training an image detection model according to any one of claims 1 to 8, or implement the method of image detection according to any one of claims 9 to 12.
CN202210141456.4A 2022-02-16 2022-02-16 Image detection and related model training method, related device, equipment and medium Withdrawn CN114549445A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210141456.4A CN114549445A (en) 2022-02-16 2022-02-16 Image detection and related model training method, related device, equipment and medium
PCT/CN2022/130924 WO2023155494A1 (en) 2022-02-16 2022-11-09 Image detection and training method, related apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210141456.4A CN114549445A (en) 2022-02-16 2022-02-16 Image detection and related model training method, related device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114549445A true CN114549445A (en) 2022-05-27

Family

ID=81676304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210141456.4A Withdrawn CN114549445A (en) 2022-02-16 2022-02-16 Image detection and related model training method, related device, equipment and medium

Country Status (2)

Country Link
CN (1) CN114549445A (en)
WO (1) WO2023155494A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155494A1 (en) * 2022-02-16 2023-08-24 上海商汤智能科技有限公司 Image detection and training method, related apparatus, device, medium, and program product
CN116958176A (en) * 2023-09-21 2023-10-27 腾讯科技(深圳)有限公司 Image segmentation method, device, computer equipment and medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095067B (en) * 2023-10-17 2024-02-02 山东虹纬纺织有限公司 Textile color difference detection method based on artificial intelligence
CN117115173B (en) * 2023-10-25 2024-01-16 泰安金冠宏油脂工业有限公司 Oil stirring detection method based on visual characteristics
CN117649394A (en) * 2023-12-14 2024-03-05 广州欣贝医疗科技有限公司 Thulium-doped fiber laser therapeutic machine precision adjusting method and system
CN117474924B (en) * 2023-12-28 2024-03-15 山东鲁抗医药集团赛特有限责任公司 Label defect detection method based on machine vision

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200802B (en) * 2020-10-30 2022-04-26 上海商汤智能科技有限公司 Training method of image detection model, related device, equipment and storage medium
CN112614586B (en) * 2020-12-15 2022-04-22 广东德澳智慧医疗科技有限公司 Remote disease intelligent diagnosis system based on medical images and block chains
CN113435260A (en) * 2021-06-07 2021-09-24 上海商汤智能科技有限公司 Image detection method, related training method, related device, equipment and medium
CN113688889A (en) * 2021-08-13 2021-11-23 上海商汤智能科技有限公司 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium
CN113469295B (en) * 2021-09-02 2021-12-03 北京字节跳动网络技术有限公司 Training method for generating model, polyp recognition method, device, medium, and apparatus
CN114549445A (en) * 2022-02-16 2022-05-27 上海商汤智能科技有限公司 Image detection and related model training method, related device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155494A1 (en) * 2022-02-16 2023-08-24 上海商汤智能科技有限公司 Image detection and training method, related apparatus, device, medium, and program product
CN116958176A (en) * 2023-09-21 2023-10-27 腾讯科技(深圳)有限公司 Image segmentation method, device, computer equipment and medium
CN116958176B (en) * 2023-09-21 2024-01-09 腾讯科技(深圳)有限公司 Image segmentation method, device, computer equipment and medium

Also Published As

Publication number Publication date
WO2023155494A1 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
CN114549445A (en) Image detection and related model training method, related device, equipment and medium
CN111126242B (en) Semantic segmentation method, device and equipment for lung image and storage medium
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN111524137B (en) Cell identification counting method and device based on image identification and computer equipment
US20180039856A1 (en) Image analyzing apparatus, image analyzing method, and recording medium
CN111583220B (en) Image data detection method and device
TW201629904A (en) Method and apparatus for target acquisition
KR20180065889A (en) Method and apparatus for detecting target
CN114155365B (en) Model training method, image processing method and related device
CN112633354B (en) Pavement crack detection method, device, computer equipment and storage medium
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN112703532A (en) Image processing method, device, equipment and storage medium
CN114429459A (en) Training method of target detection model and corresponding detection method
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image
CN112802108A (en) Target object positioning method and device, electronic equipment and readable storage medium
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN112204957A (en) White balance processing method and device, movable platform and camera
CN114155278A (en) Target tracking and related model training method, related device, equipment and medium
CN113887699A (en) Knowledge distillation method, electronic device and storage medium
CN117830611A (en) Target detection method and device and electronic equipment
CN106778822B (en) Image straight line detection method based on funnel transformation
CN116977895A (en) Stain detection method and device for universal camera lens and computer equipment
JP7334058B2 (en) Geometric parameter estimation device, geometric parameter estimation system, geometric parameter estimation method, and computer program
CN117520581A (en) Land mapping information management method, system, equipment and medium
CN115294439B (en) Method, system, equipment and storage medium for detecting air weak and small moving target

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220527

WW01 Invention patent application withdrawn after publication