CN117710317A - Training method and detection method of a detection model

Training method and detection method of a detection model

Info

Publication number
CN117710317A
Authority
CN
China
Prior art keywords
feature map
feature
detection model
enhancement
depth
Prior art date
Legal status
Pending
Application number
CN202311726223.1A
Other languages
Chinese (zh)
Inventor
Shi Luyue (石鲁越)
Liu Zhou (刘周)
Wan Xiang (万翔)
Han Xiaoguang (韩晓光)
Current Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Research Institute of Big Data SRIBD
Original Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Research Institute of Big Data SRIBD
Application filed by Chinese University of Hong Kong Shenzhen and Shenzhen Research Institute of Big Data SRIBD
Priority claimed from CN202311726223.1A
Publication of CN117710317A

Abstract

The invention provides a training method and a detection method for a detection model, relating to the technical field of medical image processing. The training method comprises the following steps: acquiring a plurality of training image samples and preprocessing the training image samples; for any training image sample, inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps of different scales; respectively performing feature enhancement processing on the depth feature maps of different scales to obtain enhancement feature maps of different scales; inputting the depth feature maps and the enhancement feature maps of different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample; and performing iterative training on the initial detection model based on the detection result of each training image sample to obtain a target detection model. The invention reduces the complexity of model training while maintaining detection accuracy.

Description

Training method and detection method of detection model
Technical Field
The invention relates to the technical field of medical image processing, in particular to a training method and a detection method of a detection model.
Background
The detection of lung nodules is an important task in lung cancer analysis and can help doctors screen for lung cancer. Common deep-learning-based lung nodule detection algorithms often suffer from excessive false positives: the detected lung nodule results contain many non-nodule tissues, such as blood vessels. This reduces the precision of the model and forces physicians to spend considerably more time screening out false detections. To solve this problem, a two-stage detection method is generally adopted: a detection model first detects suspected lung nodules, and a false-positive-removal model then performs a secondary classification of the detected candidates to filter out erroneous results. However, this approach relies on multiple stages of model training and on many manually defined hyper-parameters, which significantly increases the complexity of model training.
Disclosure of Invention
The invention provides a training method and a detection method for a detection model, and aims to solve the technical problem that training pipelines relying on multiple stages and on many manually defined hyper-parameters significantly increase the complexity of model training.
The invention provides a training method of a detection model, which comprises the following steps:
acquiring a plurality of training image samples, and preprocessing the training image samples;
for any of the training image samples: inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps with different scales;
respectively performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales;
inputting the depth feature maps and the enhancement feature maps with different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample;
and performing iterative training on the initial detection model based on the detection result of each training image sample to obtain a target detection model.
According to the training method of the detection model provided by the invention, the performing feature enhancement processing on the depth feature maps with different scales to obtain the enhancement feature maps with different scales comprises the following steps:
depth feature maps for any scale:
dividing the depth feature map into two-dimensional feature sequences of different views;
respectively inputting the two-dimensional feature sequences of the different views to a multi-view prediction module in the initial detection model for prediction to obtain prediction results corresponding to the two-dimensional feature sequences of the different views;
constructing first weight maps of the different views based on the prediction results of the different views;
and generating the enhancement feature map based on the depth feature map and the first weight maps of the different views.
According to the training method of the detection model provided by the invention, the generating the enhancement feature map based on the depth feature map and the first weight maps of the different views comprises the following steps:
carrying out average operation on the first weight maps of the different views to obtain a target weight map;
and carrying out feature fusion on the target weight map and the depth feature map to obtain the enhanced feature map.
According to the training method of the detection model provided by the invention, after the two-dimensional feature sequences of the different views are respectively input to the multi-view prediction module in the initial detection model to be predicted, the method further comprises the following steps:
for the prediction result corresponding to any two-dimensional feature sequence: calculating a first loss value based on the prediction result and preset position labeling information, wherein the first loss value is used for optimizing model parameters of the initial detection model.
According to the training method of the detection model provided by the invention, the performing feature enhancement processing on the depth feature maps with different scales to obtain the enhancement feature maps with different scales comprises the following steps:
depth feature maps for any scale:
performing convolution processing on the depth feature map to obtain a segmentation result;
constructing a second weight map based on the segmentation result;
and generating the enhancement feature map based on the second weight map and the depth feature map.
According to the training method of the detection model provided by the invention, after the convolution processing is performed on the depth feature map by using the convolution layer to obtain a segmentation result, the training method further comprises the following steps:
generating three-dimensional mask information based on preset position marking information;
determining a second loss value based on the three-dimensional mask information and the segmentation result;
calculating a third loss value by using a contrast loss function based on the position labeling information and the segmentation result;
wherein the second loss value and the third loss value are used to optimize model parameters of the initial detection model.
According to the training method of the detection model provided by the invention, the depth feature maps and the enhancement feature maps of different scales comprise a first-scale depth feature map, a second-scale depth feature map, a third-scale depth feature map, a first-scale enhancement feature map, a second-scale enhancement feature map and a third-scale enhancement feature map;
wherein the first scale is smaller than the second scale, and the second scale is smaller than the third scale.
According to the training method of the detection model provided by the invention, the inputting the depth feature maps and the enhancement feature maps with different scales into the decoding module in the initial detection model to obtain the detection result corresponding to the training image sample comprises the following steps:
upsampling the depth feature map and the enhancement feature map of the first scale to obtain a first upsampled feature map;
performing convolution processing on the first up-sampling feature map to obtain a first convolution feature map;
performing feature fusion processing on the first convolution feature map, the depth feature map of the second scale and the enhancement feature map to obtain a first fusion feature map;
upsampling the first fusion feature map to obtain a second upsampled feature map;
performing convolution processing on the second up-sampling feature map to obtain a second convolution feature map;
performing feature fusion processing on the second convolution feature map, the depth feature map of the third scale and the enhancement feature map to obtain a second fusion feature map;
and carrying out convolution processing on the second fusion feature map to obtain the detection result.
According to the training method of the detection model provided by the invention, the inputting the preprocessed training image sample into the coding module in the initial detection model to extract a plurality of depth feature maps with different scales comprises the following steps:
cropping the training image sample with a preset sliding window to obtain a plurality of three-dimensional image blocks;
and extracting features from each three-dimensional image block to obtain depth feature maps with different scales.
The invention also provides a detection method, which comprises the following steps:
acquiring an image to be detected, and preprocessing the image to be detected;
inputting the preprocessed image to be detected into a coding module in a target detection model to extract a plurality of target depth feature maps with different scales;
respectively performing feature enhancement processing on the target depth feature maps with different scales to obtain target enhancement feature maps with different scales;
and inputting the target depth feature maps and the target enhancement feature maps with different scales into a decoding module in the target detection model to obtain a lung nodule detection result output by the target detection model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a training method of the detection model as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a training method of a detection model as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of training a detection model as described in any of the above.
The invention provides a training method and a detection method of a detection model, comprising the following steps: acquiring a plurality of training image samples, and preprocessing the training image samples; for any of the training image samples: inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps with different scales; respectively performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales; inputting the depth feature maps and the enhancement feature maps with different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample; and performing iterative training on the initial detection model based on the detection result of each training image sample to obtain a target detection model. According to the invention, depth feature maps of different scales are extracted by the coding module and then subjected to feature enhancement processing, so that more image features are extracted and the accuracy of the model in detecting lung nodules is improved. It is therefore unnecessary to first train a detection model to detect suspected lung nodules and then train a separate false-positive-removal model to classify the detected candidates a second time, which greatly reduces the complexity of model training and effectively improves the efficiency of the detection model in detecting images.
Drawings
In order to more clearly illustrate the invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a training method of a detection model provided by the invention;
FIG. 2 is a flow chart of generating an enhanced feature map by a multi-view prediction module provided by the present invention;
FIG. 3 is a schematic flow chart of generating an enhanced feature map by a three-dimensional segmentation module according to the present invention;
FIG. 4 is a schematic flow chart of the detection method provided by the invention;
FIG. 5 is a schematic structural diagram of a training device for a detection model provided by the invention;
fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in one or more embodiments of the invention, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the invention. The word "if" as used herein may be interpreted as "when" or "upon," depending on the context.
Fig. 1 is a schematic flow chart of a training method of a detection model provided by the invention. As shown in fig. 1, the training method of the detection model includes:
Step S11, acquiring a plurality of training image samples, and preprocessing the training image samples;
The training image samples are publicly available lung nodule CT image data and clinically collected lung nodule image data. Optionally, each training image sample is associated with manual annotation information, such as the position and size of the lung nodules in the image data.
Specifically, the training image samples are first resampled, optionally to a specific resolution, to eliminate resolution differences between scans from different devices; the data are then normalized according to the distribution range of the HU values of the training image samples. It should be noted that in medical image processing, HU values (Hounsfield units) are typically used to measure the relative density of tissue in CT (computed tomography) images. Further, an image segmentation algorithm is applied to the training image sample to obtain the lung region of the image; suitable image segmentation algorithms include watershed, threshold, and edge-based segmentation algorithms. Finally, data enhancement processing is performed on the training image sample; optionally, the data enhancement includes methods such as random flipping and rotation.
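As an illustration only, the resampling and HU normalization described above might be implemented as follows. This is a minimal Python sketch: the target spacing and HU window are assumed values chosen for illustration, and the lung-region segmentation and data enhancement steps are omitted.

```python
import numpy as np
from scipy import ndimage

def preprocess_ct(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
                  hu_window=(-1200.0, 600.0)):
    """Resample a CT volume to a common resolution and normalize HU values.
    The target spacing and HU window are illustrative assumptions."""
    # Resample so that scans from different devices share one resolution.
    zoom_factors = [s / t for s, t in zip(spacing, target_spacing)]
    volume = ndimage.zoom(volume, zoom_factors, order=1)
    # Clip to the assumed HU distribution range and scale to [0, 1].
    lo, hi = hu_window
    volume = np.clip(volume, lo, hi)
    return ((volume - lo) / (hi - lo)).astype(np.float32)
```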
Step S12, for any one of the training image samples: inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps with different scales;
Specifically, the following steps are performed for any one of the training image samples: the training image sample is cropped with a preset sliding window to obtain a plurality of three-dimensional image blocks, and depth feature maps of different scales are then extracted from each image block. For example, with a sliding-window step size of 48, the cropped three-dimensional image blocks have a size of 96×96×96, and the depth feature maps of different scales extracted by a three-dimensional feature pyramid network have sizes of 48×48×48, 24×24×24, and 12×12×12, respectively.
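A minimal sketch of the sliding-window cropping with the example numbers above (step size 48, 96×96×96 blocks); boundary padding and the feature pyramid network itself are omitted, and the function signature is an assumption:

```python
import torch

def extract_patches(volume: torch.Tensor, patch: int = 96, stride: int = 48):
    """Crop overlapping three-dimensional image blocks with a preset
    sliding window (volumes smaller than one patch are not handled)."""
    d, h, w = volume.shape
    patches, origins = [], []
    for z in range(0, d - patch + 1, stride):
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                patches.append(volume[z:z + patch, y:y + patch, x:x + patch])
                origins.append((z, y, x))
    return torch.stack(patches), origins
```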
Step S13, respectively performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales;
specifically, in an embodiment, the initial detection model includes a multi-view prediction module. The following steps are performed for depth feature maps of any scale: dividing the depth feature map into a plurality of groups of two-dimensional feature sequences of different views, further respectively inputting the two-dimensional feature sequences of the different views into a multi-view prediction module in the initial detection model for prediction to obtain prediction results corresponding to the two-dimensional feature sequences of the different views, and further constructing a first weight map of the different views based on the prediction results of the different views; the construction process of the first weight map is specifically described in the following embodiments, which are not described herein. And further carrying out average operation on the first weight graphs of the different views to obtain a target weight graph, further carrying out feature multiplication on the depth feature graph and the target weight graph, and further carrying out feature addition on the multiplied feature graph and the depth feature graph to obtain the enhancement feature graph.
In another embodiment, the initial detection model further comprises a three-dimensional segmentation module that includes a three-dimensional convolution layer, and the following steps are performed for the depth feature map of any scale. The depth feature map is convolved by the three-dimensional convolution layer to obtain a segmentation result, and a second weight map is constructed based on the segmentation result; the construction of this weight map is described in detail in the following embodiments and is not repeated here. The second weight map is then fused with the depth feature map: optionally, the second weight map is multiplied by the depth feature map, and the multiplied feature map is added to the depth feature map to obtain the enhancement feature map.
Step S14, inputting the depth feature maps and the enhancement feature maps with different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample;
the depth feature map and the enhancement feature map of different scales comprise a first-scale depth feature map, a second-scale depth feature map, a third-scale depth feature map, a first-scale enhancement feature map, a second-scale enhancement feature map and a third-scale enhancement feature map; wherein the first dimension is smaller than the second dimension, and the second dimension is smaller than the third dimension.
Specifically, the depth feature map and the enhancement feature map of the first scale are upsampled to obtain a first upsampled feature map, the first upsampled feature map is convolved to obtain a first convolution feature map, and feature fusion processing is performed on the first convolution feature map and the depth feature map and enhancement feature map of the second scale to obtain a first fusion feature map. The first fusion feature map is then upsampled to obtain a second upsampled feature map; the second upsampled feature map is convolved to obtain a second convolution feature map; feature fusion processing is performed on the second convolution feature map and the depth feature map and enhancement feature map of the third scale to obtain a second fusion feature map; and finally the second fusion feature map is convolved to obtain the detection result.
Optionally, the number of upsampling, convolution, and feature fusion operations depends on the number of scale levels. For example, if only feature maps of the first scale and the second scale are obtained, the first fusion feature map is convolved directly after it is obtained to produce the detection result.
Step S15, performing iterative training on the initial detection model based on the detection result of each training image sample, so as to obtain a target detection model.
Specifically, based on the detection result of each training image sample and the position labeling information corresponding to each training image sample, a model loss value is calculated, so that model parameters of an initial detection model are iteratively optimized based on the model loss value, and a well-trained target detection model is obtained.
In an embodiment, after the two-dimensional feature sequences of the different views are respectively input to the multi-view prediction module in the initial detection model for prediction, prediction results corresponding to the two-dimensional feature sequences of the different views are obtained, and a first loss value may then be calculated based on the prediction results and preset position labeling information; the loss function may be chosen according to the actual situation and is not limited herein. The model parameters of the initial detection model can thus be iteratively optimized by combining the first loss value and the model loss value to obtain the well-trained target detection model.
In an embodiment, a convolution layer in the three-dimensional segmentation module performs convolution processing on the depth feature map to obtain a segmentation result; three-dimensional mask information is then generated based on preset position labeling information; a second loss value is determined based on the three-dimensional mask information and the segmentation result; and a third loss value is obtained using a contrastive loss function based on the position labeling information and the segmentation result. The calculation of the second loss value and the third loss value is described in detail in the following embodiments and is not repeated here. The model parameters of the initial detection model can then be iteratively optimized by combining the second loss value, the third loss value, and the model loss value to obtain the well-trained target detection model.
In an embodiment, the model parameters of the initial detection model may be iteratively optimized to obtain a well-trained target detection model by combining the first loss value, the second loss value, the third loss value, and the model loss value.
In the process of optimizing the model parameters by combining a plurality of loss values, the loss values can be weighted and summed to obtain a total loss value, so that the model parameters of the initial detection model are optimized based on the total loss value to obtain the well-trained target detection model.
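As a simple illustration, the weighted combination of loss values might look as follows; the weight values are assumed hyper-parameters, since the invention does not prescribe them:

```python
def combine_losses(model_loss, aux_losses, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the model loss and the first/second/third loss
    values; the relative weights are assumed hyper-parameters."""
    total = model_loss
    for weight, loss in zip(weights, aux_losses):
        total = total + weight * loss
    return total
```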
The embodiment of the invention comprises the following steps: acquiring a plurality of training image samples, and preprocessing the training image samples; for any of the training image samples: inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps with different scales; respectively performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales; inputting the depth feature maps and the enhancement feature maps with different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample; and performing iterative training on the initial detection model based on the detection result of each training image sample to obtain a target detection model. According to the embodiment of the invention, depth feature maps of different scales are extracted by the coding module and then subjected to feature enhancement processing, so that more image features are extracted and the accuracy of the model in detecting lung nodules is improved. It is therefore unnecessary to detect suspected lung nodules with one trained detection model and then secondarily classify the detected candidates with a trained false-positive-removal model, which greatly reduces the complexity of model training and effectively improves the efficiency of the detection model in detecting images.
In one embodiment of the present invention, the performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales includes:
depth feature maps for any scale:
dividing the depth feature map into two-dimensional feature sequences of different views;
respectively inputting the two-dimensional feature sequences of the different views to a multi-view prediction module in the initial detection model for prediction to obtain prediction results corresponding to the two-dimensional feature sequences of the different views;
constructing first weight maps of the different views based on the prediction results of the different views;
and generating the enhancement feature map based on the depth feature map and the first weight maps of the different views.
Specifically, the following steps are performed for the depth feature map of any scale. First, the depth feature map is divided into two-dimensional feature sequences of different views; the two-dimensional feature sequences of the different views are then detected and predicted by the multi-view prediction module in the initial detection model, giving a prediction result corresponding to the two-dimensional feature sequence of each view; optionally, the prediction result includes the prediction probability of a lung nodule bounding box at each position; and a first weight map is constructed for any view based on the prediction probabilities at the positions corresponding to that view's two-dimensional feature sequence. Further, the first weight maps of the different views are averaged to obtain the final target weight map, and feature fusion is performed based on the depth feature map and the target weight map to obtain the enhancement feature map.
As will be appreciated, referring to fig. 2, fig. 2 is a flow chart illustrating the generation of an enhanced feature map by the multi-view prediction module according to the present invention. For an input three-dimensional depth feature map of size z×h×w, where z, h, and w denote the depth, height, and width of the feature map and c is the number of channels, the depth feature map is first divided into three groups of multi-view two-dimensional feature sequences: an axial feature sequence of z slices of size (h, w, c), a coronal feature sequence of h slices of size (z, w, c), and a sagittal feature sequence of w slices of size (z, h, c). Optionally, CT (computed tomography) images typically show human body structures through three main anatomical planes, the axial, coronal, and sagittal planes; these three planes provide slices in different directions for a more comprehensive understanding of the anatomy.

Axial plane: a plane perpendicular to the longitudinal (head-to-foot) axis of the body. In CT scanning, the axial plane is typically a transverse slice, parallel to the ground. Such planes provide cross-sectional information about body structures and are suitable for viewing internal organs, pulmonary nodules, and the like.

Coronal plane: a plane along the anterior-posterior direction of the body. In CT scanning, the coronal plane is typically a longitudinal slice, perpendicular to the axial plane. Through the coronal plane, the body can be observed from front to back, which is suitable for examining sections of the head, neck, thoracic cavity, and abdominal cavity.

Sagittal plane: a plane along the left-right direction of the body. In CT scanning, the sagittal plane is also typically a longitudinal slice, perpendicular to the axial plane. Through the sagittal plane, the body can be observed from left to right, which is suitable for examining sections of the head, neck, spine, and joints.

Further, each group of two-dimensional feature sequences is passed through a two-dimensional detection module with shared parameters, which predicts lung nodules for each two-dimensional feature sequence, yielding prediction results for the three groups. For example, for the axial two-dimensional feature sequence of z slices of size (h, w, c), the prediction result has size z×h×w×7: for each of the z slices, a 7-dimensional output is predicted at each of the h×w positions, namely the prediction probability of a lung nodule bounding box at that position together with regression values for the lung nodule center and size. Taking the maximum prediction probability at each position constructs a weight map of size h×w for each slice, i.e. a weight map of size z×h×w for the view. The three groups of feature sequences finally yield three weight maps of size z×h×w.
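The following PyTorch sketch illustrates the view splitting and per-view weight-map construction just described. The 1×1 convolution standing in for the shared-parameter two-dimensional detection module, and the anchor count, are assumptions, not the module defined by the invention; the batch dimension is dropped for clarity.

```python
import torch
import torch.nn as nn

class MultiViewWeights(nn.Module):
    """Slice a 3D feature map into axial/coronal/sagittal 2D sequences,
    run a shared 2D head on each slice, and take the maximum nodule
    probability at each position to build one z*h*w weight map per view."""

    def __init__(self, channels: int, num_anchors: int = 1):
        super().__init__()
        # 7 outputs per anchor: bounding-box probability plus center and
        # size regression values, matching the z*h*w*7 prediction above.
        self.head = nn.Conv2d(channels, num_anchors * 7, kernel_size=1)

    def _weight_map(self, slices):
        # slices: (n, c, a, b) -> per-position probability map (n, a, b)
        pred = self.head(slices)               # (n, num_anchors*7, a, b)
        prob = pred[:, 0::7].sigmoid()         # probability channel per anchor
        return prob.max(dim=1).values          # max probability at each position

    def forward(self, f):
        # f: (c, z, h, w) depth feature map
        w_ax = self._weight_map(f.permute(1, 0, 2, 3))                   # z slices of (c, h, w)
        w_co = self._weight_map(f.permute(2, 0, 1, 3)).permute(1, 0, 2)  # h slices -> (z, h, w)
        w_sa = self._weight_map(f.permute(3, 0, 1, 2)).permute(1, 2, 0)  # w slices -> (z, h, w)
        return w_ax, w_co, w_sa                # three first weight maps, each (z, h, w)
```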
According to the embodiment of the invention, the depth feature map of any scale is divided into two-dimensional feature sequences of different views so as to extract feature information from the different views; weight maps are then constructed from the prediction results corresponding to the two-dimensional feature sequences of the different views, and the enhancement feature map is generated based on the depth feature map and the weight maps of the different views, so that more lung nodule feature information is extracted and the accuracy of model detection is improved.
In one embodiment of the present invention, the generating the enhancement feature map based on the depth feature map and the first weight maps of the different views includes:
carrying out average operation on the first weight maps of the different views to obtain a target weight map;
and carrying out feature fusion on the target weight map and the depth feature map to obtain the enhanced feature map.
Specifically, for the depth feature map of any scale: the first weight maps of the different views are averaged to obtain a target weight map, the target weight map is multiplied by the depth feature map to obtain a target feature map, and the target feature map is added to the depth feature map to obtain the enhancement feature map. For example, the three groups of feature sequences yield three first weight maps of size z×h×w; a mean operation over the three first weight maps gives a final target weight map of size z×h×w; multiplying the target weight map by the original input depth feature map of size z×h×w gives the target feature map; and adding the target feature map to the input depth feature map gives the final enhancement feature. Enhancement feature maps of different scales are thereby obtained.
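A minimal sketch of this averaging and multiply-then-add fusion, under the shape conventions of the previous example:

```python
import torch

def enhance(depth_map, weight_maps):
    """Average the first weight maps of the different views into the
    target weight map, multiply it into the depth feature map, and add
    the product back to the depth feature map.
    depth_map: (c, z, h, w); each weight map: (z, h, w)."""
    target = torch.stack(list(weight_maps)).mean(dim=0)  # target weight map
    return depth_map * target + depth_map                # broadcasts over channels
```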
According to the embodiment of the invention, the first weight map and the depth feature map of the different views are fused to obtain the enhanced feature map, so that more feature information of lung nodules is extracted, and the accuracy of model detection is improved.
In one embodiment of the present invention, the performing feature enhancement processing on the depth feature maps with different scales to obtain enhancement feature maps with different scales includes:
depth feature maps for any scale:
performing convolution processing on the depth feature map to obtain a segmentation result;
constructing a second weight map based on the segmentation result;
and generating the enhancement feature map based on the second weight map and the depth feature map.
It should be noted that, referring to fig. 3, fig. 3 is a schematic flow chart of generating an enhanced feature map by the three-dimensional segmentation module according to the present invention. The initial detection model comprises a three-dimensional segmentation module, which includes a three-dimensional convolution layer. Specifically, the depth feature map is convolved by the three-dimensional segmentation module to obtain a segmentation result that contains the probability that each pixel belongs to a lung nodule; a second weight map is constructed based on the per-pixel probabilities in the segmentation result; the second weight map is then multiplied by the depth feature map to obtain a target feature map; and the target feature map is added to the depth feature map to obtain the enhancement feature map.
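A sketch of this branch is below; the single 1×1×1 convolution head is an assumption about the form of the three-dimensional convolution layer:

```python
import torch.nn as nn

class SegmentationEnhance(nn.Module):
    """Sketch of the three-dimensional segmentation branch: a 3D
    convolution predicts a per-voxel lung nodule probability, used as
    the second weight map; fusion is the same multiply-then-add scheme."""

    def __init__(self, channels: int):
        super().__init__()
        self.seg = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x):
        # x: (b, c, z, h, w) depth feature map
        seg_logits = self.seg(x)           # segmentation result
        weight = seg_logits.sigmoid()      # second weight map
        return x * weight + x, seg_logits  # enhancement map + logits for the losses
```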
According to the embodiment of the invention, the three-dimensional segmentation module is utilized to carry out convolution processing on the depth feature map, so that a weight map is constructed according to a segmentation result, and the enhancement feature map is generated according to the weight map and the depth feature map, so that more feature information of lung nodules is extracted, and the accuracy of model detection is improved.
In one embodiment of the present invention, after the performing convolution processing on the depth feature map by using the convolution layer to obtain a segmentation result, the method further includes:
generating three-dimensional mask information based on preset position marking information;
determining a second loss value based on the three-dimensional mask information and the segmentation result;
calculating a third loss value by using a contrast loss function based on the position labeling information and the segmentation result;
wherein the second loss value and the third loss value are used to optimize model parameters of the initial detection model.
Specifically, in one embodiment, three-dimensional mask information is generated based on the lung nodule bounding boxes noted in the position annotation information; this mask is used to distinguish foreground (i.e., lung nodules) from background in the image. The segmentation result of the predicted lung nodules is then compared with the three-dimensional mask information, and the second loss value may be calculated using, for example, an intersection-over-union (IoU) or binary cross-entropy loss function.
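One plausible form of this computation, with an assumed corner-coordinate box format, is sketched below:

```python
import torch
import torch.nn.functional as F

def second_loss(seg_logits, boxes):
    """Rasterize the annotated lung nodule bounding boxes into
    three-dimensional mask information and compare it with the
    segmentation result via binary cross-entropy (an IoU-based loss is
    the alternative mentioned above)."""
    mask = torch.zeros_like(seg_logits)
    for z0, y0, x0, z1, y1, x1 in boxes:      # assumed corner format
        mask[..., z0:z1, y0:y1, x0:x1] = 1.0  # foreground = lung nodule
    return F.binary_cross_entropy_with_logits(seg_logits, mask)
```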
In another embodiment, to improve the model's ability to distinguish background from foreground features, feature extraction is performed on different regions of the image (i.e., the background within the bounding boxes, the background outside the bounding boxes, and the foreground within the bounding boxes) to obtain feature representations of these regions. Then, based on the position labeling information and the segmentation result, the distance between the positive sample pair (the feature representations of the background inside and outside a bounding box) and the distance between the negative sample pair (the feature representations of the background and the foreground inside a bounding box) are calculated. Typically, such distances are computed with some metric, such as the Euclidean distance or cosine similarity. The third loss value is then calculated using a contrastive loss function; contrastive loss functions take various forms, but the basic principle is to make the distance of the positive pair as small as possible and the distance of the negative pair as large as possible.
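As an illustration, a margin-based variant of such a contrastive loss over region-mean features might look as follows; the Euclidean metric, the region-mean pooling, and the margin value are all assumptions:

```python
import torch.nn.functional as F

def third_loss(feats, fg_mask, bg_in_mask, bg_out_mask, margin=1.0):
    """Pull the positive pair (background inside vs. outside the boxes)
    together and push the negative pair (background vs. foreground) apart.
    feats: (c, z, h, w); masks: boolean tensors of shape (z, h, w)."""
    def region_mean(mask):
        return feats[:, mask].mean(dim=1)  # mean feature of a region, shape (c,)

    fg = region_mean(fg_mask)
    bg_in = region_mean(bg_in_mask)
    bg_out = region_mean(bg_out_mask)
    d_pos = (bg_in - bg_out).norm()        # positive-pair distance, driven small
    d_neg = (bg_in - fg).norm()            # negative-pair distance, driven large
    return d_pos + F.relu(margin - d_neg)
```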
Further, after the second loss value and the third loss value are obtained through calculation, parameters of the model can be optimized by combining the second loss value and the third loss value in the training process, so that understanding and distinguishing capabilities of the model on different areas in an image are improved, and accuracy of model detection is improved.
In one embodiment of the present invention, the inputting the depth feature map and the enhancement feature map of different scales into the decoding module in the initial detection model to obtain the detection result corresponding to the training image sample includes:
upsampling the depth feature map and the enhancement feature map of the first scale to obtain a first upsampled feature map;
performing convolution processing on the first up-sampling feature map to obtain a first convolution feature map;
performing feature fusion processing on the first convolution feature map, the depth feature map of the second scale and the enhancement feature map to obtain a first fusion feature map;
upsampling the first fusion feature map to obtain a second upsampled feature map;
performing convolution processing on the second up-sampling feature map to obtain a second convolution feature map;
performing feature fusion processing on the second convolution feature map, the depth feature map of the third scale and the enhancement feature map to obtain a second fusion feature map;
and carrying out convolution processing on the second fusion feature map to obtain the detection result.
Specifically, the following steps are executed for the feature maps of different scales corresponding to any three-dimensional image block in each training image sample: the depth feature map and the enhancement feature map of the first scale are upsampled to obtain a first upsampled feature map; optionally, the first upsampled feature map has the same scale as the second scale. The first upsampled feature map is then convolved to obtain a first convolution feature map, and the first convolution feature map is added feature-wise to the depth feature map and the enhancement feature map of the second scale to obtain a first fusion feature map. The first fusion feature map is upsampled to obtain a second upsampled feature map; optionally, the scale of the second upsampled feature map is the same as the third scale. The second upsampled feature map is convolved to obtain a second convolution feature map; the second convolution feature map is added feature-wise to the depth feature map and the enhancement feature map of the third scale to obtain a second fusion feature map; and finally the second fusion feature map is convolved to obtain the detection result.
It will be appreciated that, for example: assume the multi-scale feature maps have sizes 48×48×48, 24×24×24, and 12×12×12. In the decoding module, the 12×12×12 feature map is first upsampled to obtain a 24×24×24 feature map; this map is convolved by a three-dimensional convolution layer and then added to the 24×24×24 feature maps to obtain a 24×24×24 fusion feature map. The fusion feature map is upsampled to obtain a 48×48×48 feature map, which is convolved and then added to the 48×48×48 feature maps to obtain a 48×48×48 fusion feature map; the final detection result is output after a convolution block.
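The decoding path in this example can be sketched as follows; the channel counts, convolution sizes, and the output head are assumptions:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Sketch of the decoding module for the 12/24/48 example: upsample,
    convolve, add the matching-scale depth and enhancement feature maps,
    repeat, then a final convolution gives the detection output."""

    def __init__(self, c: int, out_channels: int = 7):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='trilinear', align_corners=False)
        self.conv1 = nn.Conv3d(c, c, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(c, c, kernel_size=3, padding=1)
        self.out = nn.Conv3d(c, out_channels, kernel_size=1)

    def forward(self, f12, e12, f24, e24, f48, e48):
        x = self.conv1(self.up(f12 + e12))  # 12^3 -> 24^3, then convolve
        x = x + f24 + e24                   # first fusion feature map
        x = self.conv2(self.up(x))          # 24^3 -> 48^3, then convolve
        x = x + f48 + e48                   # second fusion feature map
        return self.out(x)                  # detection result
```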
According to the embodiment of the invention, through the scheme, the detection result of the training image sample is determined by combining the depth feature images with different scales and the enhancement feature images with different scales after feature enhancement treatment, so that the accuracy of model detection is improved.
Fig. 4 is a schematic flow chart of the detection method provided by the invention. As shown in fig. 4, the detection method includes:
Step S21, acquiring an image to be detected, and preprocessing the image to be detected;
Step S22, inputting the preprocessed image to be detected into a coding module in a target detection model to extract a plurality of target depth feature maps with different scales;
Step S23, respectively performing feature enhancement processing on the target depth feature maps with different scales to obtain target enhancement feature maps with different scales;
Step S24, inputting the target depth feature maps and the target enhancement feature maps with different scales into a decoding module in the target detection model to obtain a lung nodule detection result output by the target detection model.
Specifically, the complete detection process of the present embodiment is as follows. First, an image to be detected is acquired, and preprocessing and data enhancement are applied to it in the same manner as in the model training stage, which is not repeated here. The preprocessed image to be detected is input into the coding module in the target detection model to extract a plurality of target depth feature maps with different scales. Further, for the target depth feature map of any scale: the target depth feature map is divided into two-dimensional feature sequences of different views; the two-dimensional feature sequences of the different views are respectively input to the multi-view prediction module in the target detection model for prediction, giving prediction results corresponding to the two-dimensional feature sequences of the different views; weight maps of the different views are constructed based on the prediction results; and the target enhancement feature map is generated based on the depth feature map and the weight maps of the different views. Alternatively, the target depth feature map can be convolved to obtain a segmentation result, a weight map constructed based on the segmentation result, and the target enhancement feature map generated based on the weight map and the target depth feature map. Finally, the target depth feature maps and the target enhancement feature maps with different scales are input into the decoding module in the target detection model to obtain the lung nodule detection result output by the target detection model.
According to the embodiment of the invention, lung nodule detection is performed with the target detection model, so that target depth feature maps and target enhancement feature maps of different scales are extracted and the accuracy of the model in detecting lung nodules is improved. There is no need to detect suspected lung nodules with one detection model and then secondarily classify the detected candidates with a false-positive-removal model, which effectively improves the image detection efficiency of the detection model.
The training device of the detection model provided by the invention is described below, and the training device of the detection model described below and the training method of the detection model described above can be referred to correspondingly.
Fig. 5 is a schematic structural diagram of a training device for a detection model according to the present invention, and as shown in fig. 5, the training device for a detection model according to an embodiment of the present invention includes:
the acquisition module 21 is used for acquiring a plurality of training image samples and preprocessing the training image samples;
a feature extraction module 22, configured to, for any of the training image samples: input the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps with different scales;
The feature enhancement module 23 is configured to perform feature enhancement processing on the depth feature maps with different scales, so as to obtain enhanced feature maps with different scales;
the detection module 24 is configured to input the depth feature maps and the enhancement feature maps with different scales into a decoding module in the initial detection model, so as to obtain a detection result corresponding to the training image sample;
the training module 25 is configured to iteratively train the initial detection model based on the detection result of each training image sample, so as to obtain a target detection model.
The training device of the detection model further comprises:
depth feature maps for any scale:
dividing the depth feature map into two-dimensional feature sequences of different views;
respectively inputting the two-dimensional feature sequences of the different views to a multi-view prediction module in the initial detection model for prediction to obtain prediction results corresponding to the two-dimensional feature sequences of the different views;
constructing first weight maps of the different views based on the prediction results of the different views;
and generating the enhancement feature map based on the depth feature map and the first weight maps of the different views.
The training device of the detection model further comprises:
carrying out average operation on the first weight maps of the different views to obtain a target weight map;
and carrying out feature fusion on the target weight map and the depth feature map to obtain the enhanced feature map.
The training device of the detection model further comprises:
for the prediction result corresponding to any two-dimensional feature sequence: calculating a first loss value based on the prediction result and preset position labeling information, wherein the first loss value is used for optimizing model parameters of the initial detection model.
The training device of the detection model further comprises:
depth feature maps for any scale:
performing convolution processing on the depth feature map to obtain a segmentation result;
constructing a second weight map based on the segmentation result;
and generating the enhancement feature map based on the second weight map and the depth feature map.
The training device of the detection model further comprises:
generating three-dimensional mask information based on preset position marking information;
determining a second loss value based on the three-dimensional mask information and the segmentation result;
calculating a third loss value by using a contrast loss function based on the position labeling information and the segmentation result;
Wherein the second loss value and the third loss value are used to optimize model parameters of the initial detection model.
The training device of the detection model further comprises:
the depth feature map and the enhancement feature map of different scales comprise a depth feature map of a first scale, a depth feature map of a second scale, a depth feature map of a third scale, an enhancement feature map of the first scale, an enhancement feature map of the second scale and an enhancement feature map of the third scale;
the training device of the detection model further comprises:
upsampling the depth feature map and the enhancement feature map of the first scale to obtain a first upsampled feature map;
performing convolution processing on the first up-sampling feature map to obtain a first convolution feature map;
performing feature fusion processing on the first convolution feature map, the depth feature map of the second scale and the enhancement feature map to obtain a first fusion feature map;
upsampling the first fusion feature map to obtain a second upsampled feature map;
performing convolution processing on the second up-sampling feature map to obtain a second convolution feature map;
performing feature fusion processing on the second convolution feature map, the depth feature map of the third scale and the enhancement feature map to obtain a second fusion feature map;
And carrying out convolution processing on the second fusion feature map to obtain the detection result.
The training device of the detection model further comprises:
cropping the training image sample with a preset sliding window to obtain a plurality of three-dimensional image blocks;
and extracting features from each three-dimensional image block to obtain depth feature maps with different scales.
It should be noted that, the above device provided in the embodiment of the present invention can implement all the method steps implemented in the method embodiment and achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as those of the method embodiment in the embodiment are omitted.
Fig. 6 is a schematic structural diagram of an electronic device according to the present invention, and as shown in fig. 6, the electronic device may include: processor 310, memory 320, communication interface 330 and communication bus 340, wherein processor 310, memory 320, communication interface 330 accomplish communication with each other through communication bus 340. The processor 310 may invoke logic instructions in the memory 320 to perform the training method of the detection model.
Further, the logic instructions in the memory 320 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied, essentially or in the part contributing to the prior art, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the training method of the detection model provided by the above methods.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the training method of the detection model provided by the above methods.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of training a detection model, comprising:
acquiring a plurality of training image samples, and preprocessing the training image samples;
for any of the training image samples: inputting the preprocessed training image sample into a coding module in an initial detection model to extract a plurality of depth feature maps of different scales;
respectively performing feature enhancement processing on the depth feature maps of different scales to obtain enhancement feature maps of different scales;
inputting the depth feature maps and the enhancement feature maps of different scales into a decoding module in the initial detection model to obtain a detection result corresponding to the training image sample;
and performing iterative training on the initial detection model based on the detection result of each training image sample to obtain a target detection model.
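For illustration only, and not part of the claimed subject matter: the following is a minimal PyTorch-style sketch of one training step following the structure of claim 1. The names encoder, enhancer, decoder, and criterion are hypothetical stand-ins for the coding module, the feature enhancement step, the decoding module, and the training loss; none of these names comes from the specification.

    def train_step(encoder, enhancer, decoder, criterion, optimizer, sample, label):
        # Extract depth feature maps at several scales from the preprocessed sample.
        depth_maps = encoder(sample)
        # Perform feature enhancement on each scale independently.
        enhanced_maps = [enhancer(f) for f in depth_maps]
        # Decode the depth and enhancement feature maps into a detection result.
        detection = decoder(depth_maps, enhanced_maps)
        # Iterative training: back-propagate a loss computed on the detection result.
        loss = criterion(detection, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()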
2. The method for training a detection model according to claim 1, wherein respectively performing feature enhancement processing on the depth feature maps of different scales to obtain enhancement feature maps of different scales comprises:
for the depth feature map of any scale:
dividing the depth feature map into two-dimensional feature sequences of different views;
respectively inputting the two-dimensional feature sequences of the different views into a multi-view prediction module in the initial detection model for prediction, to obtain prediction results corresponding to the two-dimensional feature sequences of the different views;
constructing first weight maps of the different views based on the prediction results of the different views;
and generating the enhancement feature map based on the depth feature map and the first weight maps of the different views.
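An illustrative sketch of the view split in claim 2. The claim does not fix the number or identity of the views; slicing a 3-D feature map along each spatial axis (axial, coronal, and sagittal in CT terminology) is an assumption made here for concreteness, and a batch dimension is omitted for simplicity.

    import torch

    def split_views(feat: torch.Tensor) -> dict:
        # feat: a 3-D depth feature map of shape (C, D, H, W).
        # Each entry below is a stack of 2-D feature slices, i.e. a
        # two-dimensional feature sequence for one view.
        return {
            "axial":    feat.permute(1, 0, 2, 3),  # D slices of shape (C, H, W)
            "coronal":  feat.permute(2, 0, 1, 3),  # H slices of shape (C, D, W)
            "sagittal": feat.permute(3, 0, 1, 2),  # W slices of shape (C, D, H)
        }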
3. The method of training a detection model according to claim 2, wherein generating the enhancement feature map based on the depth feature map and the first weight maps of the different views comprises:
averaging the first weight maps of the different views to obtain a target weight map;
and performing feature fusion on the target weight map and the depth feature map to obtain the enhancement feature map.
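A minimal sketch of claim 3. The averaging step follows the claim directly; element-wise re-weighting with a residual connection is one plausible reading of "feature fusion", assumed here for illustration only (the weight maps are assumed already resampled to the shape of the depth feature map).

    import torch

    def enhance(depth_map: torch.Tensor, first_weight_maps: list) -> torch.Tensor:
        # Average the per-view first weight maps into a target weight map.
        target_w = torch.stack(first_weight_maps, dim=0).mean(dim=0)
        # Fuse the target weight map with the depth feature map
        # (assumed fusion: element-wise re-weighting plus a residual).
        return depth_map * target_w + depth_map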
4. The method for training a detection model according to claim 2, wherein, after respectively inputting the two-dimensional feature sequences of the different views into the multi-view prediction module in the initial detection model for prediction to obtain the prediction results corresponding to the two-dimensional feature sequences of the different views, the method further comprises:
for the prediction result corresponding to any two-dimensional feature sequence: calculating a first loss value based on the prediction result and preset position labeling information, wherein the first loss value is used to optimize model parameters of the initial detection model.
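A sketch of the first loss value of claim 4, assuming one loss term per view. Binary cross-entropy is an assumed choice of criterion; the claim only requires a loss between each prediction result and the preset position labeling information.

    import torch.nn.functional as F

    def first_loss(per_view_predictions, per_view_targets):
        # One loss term per view, summed over all views; targets are assumed
        # to be derived from the preset position labeling information.
        return sum(F.binary_cross_entropy_with_logits(pred, target)
                   for pred, target in zip(per_view_predictions, per_view_targets))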
5. The method for training a detection model according to claim 1, wherein respectively performing feature enhancement processing on the depth feature maps of different scales to obtain enhancement feature maps of different scales comprises:
for the depth feature map of any scale:
performing convolution processing on the depth feature map to obtain a segmentation result;
constructing a second weight map based on the segmentation result;
and generating the enhancement feature map based on the second weight map and the depth feature map.
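An illustrative sketch of the segmentation-based enhancement of claim 5. A 1x1x1 convolution producing the segmentation result, the sigmoid construction of the second weight map, and the residual fusion are all assumptions, not claim language.

    import torch
    import torch.nn as nn

    class SegEnhancer(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # Convolution that maps the depth feature map to a segmentation result.
            self.seg_head = nn.Conv3d(channels, 1, kernel_size=1)

        def forward(self, depth_map):            # depth_map: (N, C, D, H, W)
            seg = self.seg_head(depth_map)       # segmentation logits, (N, 1, D, H, W)
            second_w = torch.sigmoid(seg)        # second weight map in [0, 1] (assumed)
            # Enhancement feature map from the second weight map and the depth map.
            return depth_map * second_w + depth_map, seg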
6. The method for training a detection model according to claim 5, wherein, after the convolution processing is performed on the depth feature map to obtain the segmentation result, the method further comprises:
generating three-dimensional mask information based on preset position marking information;
determining a second loss value based on the three-dimensional mask information and the segmentation result;
calculating a third loss value by using a contrastive loss function based on the position labeling information and the segmentation result;
wherein the second loss value and the third loss value are used to optimize model parameters of the initial detection model.
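A sketch of the two loss terms of claim 6, under stated assumptions: a soft Dice loss as the second loss (mask versus segmentation result) and a simple margin form as the contrastive third loss. Both concrete formulas are illustrative choices; the claim fixes only the inputs to each loss.

    import torch
    import torch.nn.functional as F

    def seg_losses(seg_logits, mask3d, margin: float = 0.5):
        # Second loss: compare the segmentation result with the 3-D mask
        # generated from the position labeling information (soft Dice, assumed).
        probs = torch.sigmoid(seg_logits)
        inter = (probs * mask3d).sum()
        second = 1.0 - (2.0 * inter + 1e-5) / (probs.sum() + mask3d.sum() + 1e-5)
        # Third loss: an assumed contrastive margin term that pushes foreground
        # probabilities above background ones; assumes both classes are present.
        fg = probs[mask3d > 0.5]
        bg = probs[mask3d <= 0.5]
        third = F.relu(margin - fg.mean() + bg.mean())
        return second, third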
7. The method of training a detection model according to claim 1, wherein the depth feature maps and the enhancement feature maps of different scales comprise a depth feature map of a first scale, a depth feature map of a second scale, and a depth feature map of a third scale, as well as an enhancement feature map of the first scale, an enhancement feature map of the second scale, and an enhancement feature map of the third scale;
wherein the first scale is smaller than the second scale, and the second scale is smaller than the third scale.
8. The method for training a detection model according to claim 7, wherein inputting the depth feature maps and the enhancement feature maps of different scales into the decoding module in the initial detection model to obtain the detection result corresponding to the training image sample comprises:
upsampling the depth feature map and the enhancement feature map of the first scale to obtain a first upsampled feature map;
performing convolution processing on the first upsampled feature map to obtain a first convolution feature map;
performing feature fusion processing on the first convolution feature map, the depth feature map of the second scale, and the enhancement feature map of the second scale to obtain a first fusion feature map;
upsampling the first fusion feature map to obtain a second upsampled feature map;
performing convolution processing on the second upsampled feature map to obtain a second convolution feature map;
performing feature fusion processing on the second convolution feature map, the depth feature map of the third scale, and the enhancement feature map of the third scale to obtain a second fusion feature map;
and carrying out convolution processing on the second fusion feature map to obtain the detection result.
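An illustrative decoder following the step order of claim 8. Channel concatenation as the fusion operation, nearest-neighbour upsampling, and the channel widths are assumptions made so the sketch type-checks; the claim does not fix any of them.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Decoder(nn.Module):
        def __init__(self, c1: int, c2: int, c3: int, out_ch: int = 1):
            super().__init__()
            self.conv1 = nn.Conv3d(2 * c1, c2, kernel_size=3, padding=1)
            self.conv2 = nn.Conv3d(3 * c2, c3, kernel_size=3, padding=1)
            self.head = nn.Conv3d(3 * c3, out_ch, kernel_size=1)

        def forward(self, d1, e1, d2, e2, d3, e3):
            # d1/e1: first-scale (coarsest) depth and enhancement maps, and so on.
            x = F.interpolate(torch.cat([d1, e1], dim=1), scale_factor=2)  # first upsampled map
            x = self.conv1(x)                                              # first convolution map
            x = torch.cat([x, d2, e2], dim=1)                              # first fusion map
            x = F.interpolate(x, scale_factor=2)                           # second upsampled map
            x = self.conv2(x)                                              # second convolution map
            x = torch.cat([x, d3, e3], dim=1)                              # second fusion map
            return self.head(x)                                            # detection result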
9. The method for training a detection model according to claim 1, wherein inputting the preprocessed training image sample into the coding module in the initial detection model to extract a plurality of depth feature maps of different scales comprises:
cropping the training image sample with a preset sliding window to obtain a plurality of three-dimensional image blocks;
and extracting features from each three-dimensional image block to obtain the depth feature maps of different scales.
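A sketch of the sliding-window cropping of claim 9. The window and stride sizes below are assumptions for illustration; the specification's preset values are not given in the claims.

    import torch

    def sliding_window_patches(volume: torch.Tensor,
                               win=(64, 64, 64), stride=(32, 32, 32)):
        # volume: (C, D, H, W). Crop overlapping three-dimensional image
        # blocks by unfolding each spatial axis with the preset window.
        p = (volume.unfold(1, win[0], stride[0])
                   .unfold(2, win[1], stride[1])
                   .unfold(3, win[2], stride[2]))
        # p: (C, nD, nH, nW, win_d, win_h, win_w) -> (num_blocks, C, d, h, w)
        c = volume.shape[0]
        return p.reshape(c, -1, *win).permute(1, 0, 2, 3, 4)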
10. A detection method, comprising:
acquiring an image to be detected, and preprocessing the image to be detected;
inputting the preprocessed image to be detected into a coding module in a target detection model to extract a plurality of target depth feature maps of different scales;
respectively performing feature enhancement processing on the target depth feature maps of different scales to obtain target enhancement feature maps of different scales;
inputting the target depth feature maps and the target enhancement feature maps of different scales into a decoding module in the target detection model to obtain a lung nodule detection result output by the target detection model;
wherein the target detection model is obtained by the training method of the detection model according to any one of claims 1 to 9.
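For illustration only: an inference sketch mirroring claim 10, reusing the hypothetical module names from the training sketch after claim 1. Gradients are disabled since no training occurs at detection time.

    import torch

    @torch.no_grad()
    def detect(encoder, enhancer, decoder, image: torch.Tensor):
        depth_maps = encoder(image)                    # target depth feature maps
        enhanced = [enhancer(f) for f in depth_maps]   # target enhancement feature maps
        return decoder(depth_maps, enhanced)           # lung nodule detection result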

Priority Applications (1)

Application Number: CN202311726223.1A
Priority Date / Filing Date: 2023-12-13
Title: Training method and detection method of detection model


Publications (1)

Publication Number: CN117710317A
Publication Date: 2024-03-15

Family

ID: 90152890

Family Applications (1)

Application Number: CN202311726223.1A
Status: Pending
Title: Training method and detection method of detection model

Country Status (1)

Country: CN
Publication: CN117710317A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination