CN116245892B - Image processing model generation method, image processing method and device - Google Patents


Info

Publication number
CN116245892B
CN116245892B (application CN202211552144.9A)
Authority
CN
China
Prior art keywords
image
image processing
sample
target object
target
Prior art date
Legal status
Active
Application number
CN202211552144.9A
Other languages
Chinese (zh)
Other versions
CN116245892A (en)
Inventor
隋栋
刘伟峰
郭茂祖
Current Assignee
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202211552144.9A
Publication of CN116245892A
Application granted
Publication of CN116245892B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Using classification, e.g. of video objects
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10072 - Tomographic images
    • G06T 2207/10088 - Magnetic resonance imaging [MRI]
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30004 - Biomedical image processing
    • G06T 2207/30056 - Liver; Hepatic
    • G06T 2207/30096 - Tumor; Lesion
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

An image processing model generation method, an image processing method and an image processing device acquire a plurality of sample images containing the same target object; preprocess the plurality of sample images; acquire an actual target object obtained by segmenting the target object on each sample image and actual category information obtained by classifying the target object on each sample image according to a specified attribute; and train a deep learning model, using the preprocessed sample images, the actual target object corresponding to each sample image and the actual category information as a training sample set, to generate an image processing model. The deep learning model is constructed in advance using the structure of a UNet model, and each convolution block in the deep learning model comprises one depth-wise convolution layer and two point-wise convolution layers. The embodiments of the disclosure improve the model representation capability, so that the trained image processing model can improve image segmentation precision and refine image segmentation details.

Description

Image processing model generation method, image processing method and device
Technical Field
The embodiment of the disclosure relates to the field of image processing, in particular to an image processing model generation method and an image processing method.
Background
With the development of artificial intelligence technology, deep learning models are widely applied to systems requiring intelligent recognition and intelligent decision, such as image segmentation and image recognition.
However, deep learning methods have insufficient representation capability for images with complex structures, and this insufficiency leads to problems such as low segmentation precision and poor segmentation detail.
Disclosure of Invention
The embodiments of the disclosure provide an image processing model generation method that improves the representation capability of a model at both the depth and the width level, so that the representation capability of the model is greatly improved and the trained image processing model can improve image segmentation precision and refine image segmentation details even when the required data set is small.
In one aspect, an embodiment of the present disclosure provides an image processing model generating method, including:
acquiring a plurality of sample images containing the same target object; the plurality of sample images containing the same target object include: a plurality of abdominal plain-scan magnetic resonance imaging (MRI) sample images containing a liver tumor, or a plurality of medical scan sample images containing the same human tissue, or a plurality of abdominal MRI sample images containing a pancreatic tumor, or a plurality of abdominal MRI sample images containing a colorectal tumor, or a plurality of abdominal MRI sample images containing a prostate tumor, or a plurality of pathological section medical scan sample images containing the same human tissue, or a plurality of pathological section medical scan sample images containing the same animal tissue, or a plurality of hematoxylin-eosin (H&E) stained section medical scan sample images containing the same human tissue, or a plurality of H&E stained section medical scan sample images containing the same animal tissue, or a plurality of immunohistochemical section medical scan sample images containing the same human tissue, or a plurality of immunohistochemical section medical scan sample images containing the same animal tissue, or autonomous driving scene sample images containing the target object;
Preprocessing the plurality of sample images;
acquiring an actual target object obtained by segmenting the target object on each sample image, and actual category information obtained by classifying the target object on each sample image according to a specified attribute;
training a deep learning model, using the preprocessed sample images, the actual target object corresponding to each sample image and the actual category information corresponding to each sample image as a training sample set, to generate an image processing model; the deep learning model is pre-constructed using the structure of a UNet model, and each convolution block in the deep learning model comprises: one depth-wise convolution layer that improves the model representation capability in depth and two point-wise convolution layers that improve the model representation capability in width.
Illustratively, a depth-wise layer is a depthwise separable convolution layer, and a point-wise layer is a pointwise convolution layer, i.e., a convolution layer with a 1×1 convolution kernel.
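As an illustration only, the following minimal sketch shows how a depth-wise and a point-wise convolution are typically declared in PyTorch; the channel counts and kernel size are assumptions and are not taken from the embodiments of the disclosure.

```python
import torch
import torch.nn as nn

channels = 96  # example channel count, not specified at this point in the text

# Depth-wise convolution: one filter per input channel (groups == channels),
# so spatial information is mixed but channels are processed independently.
depthwise = nn.Conv2d(channels, channels, kernel_size=7, padding=3, groups=channels)

# Point-wise convolution: a 1x1 convolution that mixes information across channels.
pointwise = nn.Conv2d(channels, 4 * channels, kernel_size=1)

x = torch.randn(1, channels, 64, 64)
print(depthwise(x).shape, pointwise(x).shape)  # both keep the 64x64 spatial size
```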
In another aspect, an embodiment of the present disclosure further provides an image processing method, including:
acquiring a target image containing a target object;
preprocessing the target image;
inputting the preprocessed target image into an image processing model generated by the image processing model generation method described above, segmenting the target object on the preprocessed target image by the image processing model to obtain a predicted target object, and classifying the target object on the preprocessed target image by the image processing model according to the specified attribute to obtain predicted category information.
In still another aspect, an embodiment of the present disclosure further provides an image processing model generating apparatus, including: a first memory and a first processor, the first memory for storing an executable program;
the first processor is configured to read and execute the executable program to implement the image processing model generation method as described above.
In still another aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: a second memory and a second processor, the second memory for storing an executable program;
the second processor is configured to read and execute the executable program to implement the image processing method as described above.
Compared with the related art, the image processing model generation method provided by the embodiments of the disclosure builds the model on the structure of the UNet model and designs each convolution block to comprise one depth-wise convolution layer that improves the model representation capability in depth and two point-wise convolution layers that improve the model representation capability in width, so that the representation capability of the model is greatly improved even when the required data set is small, the trained image processing model can improve image segmentation precision, and image segmentation details are refined.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the disclosure. Other advantages of the present disclosure may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the technical solutions of the present disclosure, are incorporated in and constitute a part of this specification, and serve, together with the embodiments of the disclosure, to explain the technical solutions of the present disclosure rather than to limit them.
Fig. 1 is a flowchart of an image processing model generating method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of an image processing method according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of an image processing model according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram showing a comparison of a segmentation result of an image processing model and an existing model in a liver region according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating a comparison of a segmentation result of an image processing model and an existing model in a liver tumor region according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes several embodiments, but the description is illustrative and not limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described in the present disclosure. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or in place of any other feature or element of any other embodiment unless specifically limited.
The present disclosure includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements of the present disclosure that have been disclosed may also be combined with any conventional features or elements to form a unique arrangement as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other aspects to form another unique aspect as defined in the claims. Thus, it should be understood that any of the features shown and/or discussed in this disclosure may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
The embodiment of the disclosure provides an image processing model generating method, as shown in fig. 1, including:
step 101, acquiring a plurality of sample images containing the same target object;
the plurality of sample images containing the same target object include: a plurality of abdominal plain-scan MRI sample images containing a liver tumor, or a plurality of abdominal MRI sample images containing the same human tissue, or a plurality of abdominal MRI sample images containing a pancreatic tumor, or a plurality of abdominal MRI sample images containing a colorectal tumor, or a plurality of abdominal MRI sample images containing a prostate tumor, or a plurality of pathological section medical scan sample images containing the same human tissue, or a plurality of pathological section medical scan sample images containing the same animal tissue, or a plurality of H&E stained section medical scan sample images containing the same human tissue, or a plurality of H&E stained section medical scan sample images containing the same animal tissue, or a plurality of immunohistochemical section medical scan sample images containing the same human tissue, or a plurality of immunohistochemical section medical scan sample images containing the same animal tissue, or autonomous driving scene sample images containing the target object;
step 102, preprocessing the plurality of sample images;
step 103, obtaining an actual target object obtained by segmenting the target object on each sample image, and obtaining actual category information obtained by classifying the target object on each sample image according to the specified attribute;
step 104, training a deep learning model, using the preprocessed sample images, the actual target object corresponding to each sample image and the actual category information corresponding to each sample image as a training sample set, to generate an image processing model; the deep learning model is pre-constructed using the structure of a UNet model, and each convolution block in the deep learning model comprises: one depth-wise convolution layer that improves the model representation capability in depth and two point-wise convolution layers that improve the model representation capability in width.
Illustratively, when the plurality of sample images containing the same target object are a plurality of abdominal plain-scan magnetic resonance imaging (MRI) sample images containing a liver tumor, the specified attribute may be whether the liver tumor is benign or malignant; when the plurality of sample images containing the same target object are medical scan sample images containing the same human tissue, the specified attribute may be whether the human tissue is benign or malignant; when the plurality of sample images containing the same target object are autonomous driving scene sample images containing a pedestrian, the specified attribute may be the sex of the pedestrian.
In the related art, liver tumor is one of the serious diseases endangering human life. In current clinical practice, the diagnosis of liver tumor still requires taking CT or magnetic resonance images of a patient, after which an experienced clinician manually marks the cancerous region and designs a treatment plan. Manual localization and segmentation of cancerous regions require the physician to have a great deal of experience, are highly subjective, and produce results that depend strongly on the physician's level and experience. With the continuous progress of computer imaging technology and artificial intelligence technology in recent years, many researchers have tried to segment medical images automatically using deep learning technology, and medical image imaging and analysis methods have made tremendous progress in the last decade, from early shallow machine learning methods based on hand-crafted features to deep learning methods based on convolutional neural networks. An efficient visual representation system for medical images can extract the key features required by downstream tasks, which has long been a major need in clinical research and diagnosis. Researchers have focused on designing medical image learning frameworks for segmentation and classification, such as the segmentation networks U-Net, U-Net++, Dense-UNet and DeepLabV3, for medical image analysis. The U-Net method gradually downsamples the input image to enlarge the receptive field and capture the region of interest, then upsamples the extracted features in turn, and fuses multi-level features through skip connections to obtain the segmentation result. The U-Net++ method introduces long and short connections so that the dimension difference between feature maps is smaller during fusion. The Dense-UNet method introduces the idea of dense connection into U-Net and increases the representation capability of the network, so that the segmentation model performs better. The DeepLabV3 method observes that continuous downsampling reduces feature resolution and is unfavorable for localization, that global and contextual information is beneficial for semantic segmentation, and proposes the atrous spatial pyramid pooling technique. Recently, the Transformer model has replaced the traditional convolutional neural network model on many visual tasks and, owing to its ability to capture global information, is the best model on many computer vision tasks; in particular, the Swin Transformer model, by introducing the sliding-window concept from convolutional neural networks, achieves the best results on many visual tasks. A series of methods such as Swin-UNet and TransUNet replace the original convolution-based encoder with a Transformer encoder on the UNet basis and obtain good results on some medical image segmentation tasks.
Compared with conventional medical image segmentation tasks, segmentation of lesion regions in MRI images, especially in liver cancer MRI image sequences, has the following difficulties. The morphology of the lesion regions to be segmented is irregular and varies greatly: in abdominal plain-scan MRI medical images, the liver tumor location and the size of the tumor region are morphologically irregular, which is mainly reflected in two respects. (1) The size differences are large: in slices at different positions of the same patient, the liver tumor region is larger in some slices and smaller in others; and among the MRI images of different patients, the size distribution of the tumor region at different stages of liver cancer is also uneven. (2) The shape differences are large: the shape and contour of the liver tumor region vary greatly between different cases and different slices. Because of these two points, the irregular morphology and large variation of the lesion regions to be segmented lead to low segmentation precision and unclear edge contours.
Some lesion regions are small targets, and the segmentation effect on them is poor: the size of the liver tumor region differs greatly between different stages of liver cancer, but a common difficulty is that the target region to be segmented is small. The whole-abdomen MRI image has high resolution and contains all the organs in the abdominal cavity, while the liver tumor region occupies only a small part of the whole MRI image. Therefore, segmentation of liver tumor regions remains a challenging problem, and small targets tend to cause over-segmentation or under-segmentation in segmentation models.
The prior art methods, especially deep learning methods, have insufficient representation capability for the lesion region: methods based on traditional feature engineering suffer from insufficient representation capability and poor generalization. With the continuous development of deep learning technology, liver tumor segmentation methods based on deep convolutional neural networks have gradually become mainstream, but the problem of insufficient representation capability remains. Some approaches address this problem by introducing self-attention mechanisms, but self-attention models require large amounts of training data, which is difficult to obtain for medical image segmentation tasks. The low precision and poor detail of segmentation results caused by insufficient representation capability therefore remain a technical difficulty.
U-Net networks are important in medical image segmentation tasks, but they do not perform well in segmenting complex structures because of insufficient representation capability. The U-Net++ network, by introducing long and short connections, still cannot overcome the insufficient representation capability of the U-Net network in complex-structure segmentation. The Transformer model based on the attention mechanism has extremely strong representation capability and surpasses convolutional neural networks in this respect, but its training cost is extremely high; in addition, because it lacks the inductive bias and translation invariance of convolutional neural networks, training the network requires a particularly large data set. Although a few studies have tried to train Transformer models with small data sets, a sufficient number of labelled medical images is still difficult to acquire for training, since labelling medical images requires experienced clinical specialists.
According to the image processing model generation method provided by the embodiment of the disclosure, a deep learning model is built by adopting the structure of the UNet model, each convolution block is designed to comprise a depth-wise convolution layer capable of improving the representation capacity of the model in depth and two point-wise convolution layers capable of improving the representation capacity of the model in width, so that the representation capacity of the model is greatly improved under the condition that a required data set is small, the trained image processing model can improve the precision of an image segmentation result, and the image segmentation detail is optimized.
From the perspective of network model representation capability, the image processing model generation method provided by the embodiments of the disclosure introduces a new neural network architecture design idea into the original UNet model, improves the representation capability of the network model, and extracts sufficiently good and robust features for downstream segmentation and recognition tasks. At the same time, the original UNet basic structure, namely convolution, is retained, avoiding the loss of the inherent inductive bias of convolution that would result from using a Transformer as the encoder, so that the effectiveness of the model is guaranteed without a large amount of training data, i.e., the problem of data dependence is avoided. The image processing model generation method provided by the embodiments of the disclosure can train for two task targets at the same time.
In an exemplary embodiment, one depth-wise convolution layer and two point-wise convolution layers included in each convolution block are sequentially arranged from front to back according to a data flow direction;
each convolution block further comprises: a normalization layer and an activation layer;
the normalization layer is arranged after the depth-wise convolution layer,
the activation layer is arranged behind a first point-wise convolution layer in the two point-wise convolution layers, or is arranged behind a second point-wise convolution layer in the two point-wise convolution layers, or is respectively arranged behind the two point-wise convolution layers;
The normalization layer is implemented using a LayerNorm function, and the activation layer is implemented using a Gaussian error linear unit (GELU) function.
The feature extraction module, i.e., the encoder part, is improved on the basis of the original structure of the UNet model. By introducing the latest convolutional neural network architecture design ideas, in each basic convolution block one depth-wise convolution layer and two point-wise convolution layers form an "inverted bottleneck" structure, while LayerNorm normalization and the GELU activation function are used, which effectively improves the model representation capability.
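For illustration, the following is a minimal PyTorch sketch of one such convolution block as described above (depth-wise convolution, LayerNorm, point-wise expansion, GELU, point-wise reduction). The kernel size, the 4x expansion ratio of the inverted bottleneck, and the residual connection are assumptions in line with common ConvNeXt-style designs, not values stated in this description.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: depth-wise conv -> LayerNorm -> point-wise conv
    (expand) -> GELU -> point-wise conv (reduce). The point-wise (1x1)
    convolutions are written as Linear layers applied to the channel dimension."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)                    # normalizes over channels
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # first point-wise layer
        self.act = nn.GELU()                             # the single activation
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # second point-wise layer

    def forward(self, x):                  # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)          # to (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)          # back to (N, C, H, W)
        return x + shortcut                # residual connection (assumption)
```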
In the design of the basic convolution block of the encoder, the image processing model generated by the image processing model generation method provided by the embodiments of the disclosure uses the LayerNorm method for feature normalization, whereas related-art methods for generating image processing models, such as UNet, DeepLab and UNet++, use the BatchNorm method for feature normalization. LayerNorm normalizes over layers while BatchNorm normalizes over batches: BatchNorm normalizes each feature within the same batch and is therefore very sensitive to the batch size used during training, because the mean and variance are computed over one batch each time; if the batch size is too small, the computed mean and variance are insufficient to represent the distribution of the entire data, while too large a batch size causes extremely heavy computational overhead and cannot work properly on edge and small computing devices. LayerNorm computes the mean and variance within a sample, can pull the data distribution into the unsaturated region of the activation function, is invariant to scaling of the data, and helps alleviate gradient vanishing and gradient explosion.
In addition, the image processing model generated by the image processing model generation method provided by the embodiments of the disclosure uses the GELU activation function, which differs markedly from previous methods: related-art methods for generating image processing models, such as UNet, UNet++ and UNet3+, all use the rectified linear unit (ReLU) activation function.
In addition, in each convolution block of the image processing model generated by the image processing model generation method provided by the embodiments of the disclosure, although there are three different convolution layers, only one activation function may be used (applied after the first point-wise convolution layer). In related-art methods for generating image processing models, such as UNet, Dense-UNet and PSPNet, there is an activation function after every convolution layer in each convolution block. The role of the activation function is to improve the nonlinear representation capability of the model, because the other operations in a neural network are linear transformations, and linear transformations alone cannot fit complex problems adequately.
In one illustrative example, the deep learning model includes: four convolution block sets sequentially arranged according to a data flow direction, wherein the first convolution set comprises: three of the convolution blocks, the second convolution set comprising: three of the convolution blocks, the third convolution set comprising: nine of the convolution blocks, the fourth convolution set comprising: three of the convolution blocks.
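A minimal sketch of this stage layout is given below, reusing the ConvBlock sketched earlier. The channel widths, the initial 4x stem and the 2x downsampling between stages are assumptions following common ConvNeXt-style configurations; the description fixes only the block counts (3, 3, 9, 3).

```python
import torch.nn as nn

depths = (3, 3, 9, 3)
dims = (96, 192, 384, 768)                               # assumed channel widths per stage

stem = nn.Conv2d(3, dims[0], kernel_size=4, stride=4)    # initial 4x downsampling

stages = nn.ModuleList()
for i, (depth, dim) in enumerate(zip(depths, dims)):
    blocks = [ConvBlock(dim) for _ in range(depth)]      # ConvBlock as sketched above
    if i < len(dims) - 1:
        blocks.append(nn.Conv2d(dim, dims[i + 1], kernel_size=2, stride=2))  # assumed downsampling
    stages.append(nn.Sequential(*blocks))
```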
In an exemplary embodiment, generating an image processing model by training a deep learning model using the preprocessed plurality of sample images, the actual target object corresponding to each sample image, and the actual category information corresponding to each sample image as a training sample set includes:
And taking the preprocessed sample images as input, taking an actual target object corresponding to each sample image and actual category information on each sample image as output, performing iterative training on the deep learning model according to a preset loss function, and adjusting parameters in the deep learning model by using a back propagation algorithm until the loss value of the loss function converges to obtain the image processing model.
In one illustrative example, the predetermined loss function includes: a first loss function and a second loss function;
taking the preprocessed sample images as input, taking an actual target object corresponding to each sample image and actual category information on each sample image as output, performing iterative training on the deep learning model according to a preset loss function, and adjusting parameters in the deep learning model by using a back propagation algorithm until loss values of the loss function are converged to obtain the image processing model, wherein the method comprises the following steps:
dividing the preprocessed sample images into batches, taking the first batch of preprocessed sample images as the current batch of sample images, taking an initial deep learning model as the current deep learning model, and executing the following iterative training process:
firstly, inputting the current batch of sample images into the current deep learning model, segmenting the target object on each sample image in the current batch of sample images by the current deep learning model to obtain a predicted target object, and classifying the target object on each sample image in the current batch of sample images by the current deep learning model according to the specified attribute to obtain predicted category information;
secondly, calculating a first difference between a predicted target object and an actual target object corresponding to each sample image in the current batch of sample images according to the first loss function and the second loss function; calculating a second difference between the predicted category information and the actual category information corresponding to each sample image in the current batch of sample images according to the first loss function;
thirdly, based on the first difference and the second difference corresponding to each sample image in the current batch of sample images, adjusting parameters in the current deep learning model by utilizing the back propagation algorithm to obtain an updated model;
and finally, taking the updated model as a new current deep learning model, taking the preprocessed next batch of sample images as new current batch of sample images, continuing to execute the iterative training process until the first loss function and the second loss function are converged, and taking the finally obtained updated deep learning model as the image processing model.
In one illustrative example, the first loss function includes: a weighted cross-entropy loss function, and the second loss function includes: a weighted intersection-over-union (IoU) loss function.
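As an illustration only, the following sketch shows one plausible realization of these two losses in PyTorch; the exact weighting schemes are assumptions, since the description only names a weighted cross-entropy loss and a weighted IoU loss.

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits, target, pos_weight=2.0):
    """Weighted cross-entropy over the segmentation map; the foreground weight
    (pos_weight) is an assumption."""
    weight = 1.0 + (pos_weight - 1.0) * target        # heavier weight on target pixels
    return F.binary_cross_entropy_with_logits(logits, target, weight=weight)

def weighted_iou_loss(logits, target, eps=1e-6):
    """Weighted intersection-over-union (IoU) loss; this soft-IoU formulation
    is an assumption."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = (prob + target - prob * target).sum(dim=(2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def weighted_ce_cls_loss(logits, labels, class_weights=None):
    """Weighted cross-entropy for the classification (recognition) branch."""
    return F.cross_entropy(logits, labels, weight=class_weights)
```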
In an exemplary embodiment, the initialization parameters of the convolution block are parameters pre-trained based on ImageNet.
Illustratively, pre-training based on ImageNet is prior art, and model convergence can be achieved more quickly by directly using the parameters obtained by pre-training based on ImageNet as the initialization parameters of ConvNeXt blocks.
In one illustrative example, each time parameters in the current deep learning model are adjusted during the iterative training process, an AdamW optimizer is employed.
The prior art is often tuned using Adam optimizers or stochastic gradient descent methods, both of which are less effective than the AdamW optimizer.
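A minimal sketch of one training iteration combining the AdamW optimizer, the two differences and back-propagation is given below. The model interface (returning a segmentation map and class logits), the equal weighting of the two differences, and the optimizer hyper-parameters are assumptions; the loss helpers are those sketched above.

```python
import torch
import torch.nn.functional as F

def train_one_batch(model, optimizer, images, seg_gt, cls_gt, class_weights=None):
    seg_pred, cls_pred = model(images)
    # first difference: predicted vs. actual target object (segmentation)
    first_diff = weighted_bce_loss(seg_pred, seg_gt) + weighted_iou_loss(seg_pred, seg_gt)
    # second difference: predicted vs. actual category information (classification)
    second_diff = F.cross_entropy(cls_pred, cls_gt, weight=class_weights)
    loss = first_diff + second_diff
    optimizer.zero_grad()
    loss.backward()          # back-propagation
    optimizer.step()
    return loss.item()

# Assumed hyper-parameters; the description only names the AdamW optimizer.
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```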
In an exemplary embodiment, the preprocessing the plurality of sample images includes:
and performing image enhancement on the plurality of sample images by using an automatic data enhancement technology.
Exemplary automatic enhancement techniques include conventional image enhancement means such as color space enhancement, flipping, affine transformation, and the like. The automatic enhancement method is used for replacing a manually designed data enhancement strategy, so that generalization of the model is ensured to a certain extent.
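As an illustration only, the following sketch shows one possible way to apply an automatic augmentation policy during preprocessing, using torchvision's AutoAugment as a stand-in; the description does not name a specific library or policy, only that an automatic strategy replaces manually designed ones.

```python
from torchvision import transforms

augment = transforms.Compose([
    # Learned policy covering colour-space, flip and affine-style operations.
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])
```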
An embodiment of the present disclosure provides an image processing method, as shown in fig. 2, including:
step 201, obtaining a target image containing a target object;
step 202, preprocessing the target image;
step 203, inputting the preprocessed target image into an image processing model generated by the image processing model generation method described in any of the above embodiments, segmenting the target object on the preprocessed target image by the image processing model to obtain a predicted target object, and classifying the target object on the preprocessed target image by the image processing model according to the specified attribute to obtain predicted category information.
According to the image processing method provided by the embodiment of the disclosure, the image processing is performed by using the image processing model capable of greatly improving the model representation capability under the condition that the required data set is small, so that the image segmentation precision is improved, and the segmentation details of the image are optimized.
In an exemplary embodiment, when there are multiple target images, after the image processing model segments the target objects on the preprocessed target images to obtain predicted target objects and classifies the target objects on the preprocessed target images according to the specified attribute to obtain predicted category information, the method further includes:
firstly, obtaining an actual target object obtained by segmenting the target object on each target image, and obtaining actual category information obtained by classifying the target object on each target image according to a specified attribute;
and secondly, training the image processing model, using the preprocessed target images, the actual target object corresponding to each target image and the actual category information corresponding to each target image as a training sample set, to obtain an updated image processing model.
In an exemplary embodiment, training the image processing model using the preprocessed target images, the actual target object corresponding to each target image, and the actual category information corresponding to each target image as a training sample set to obtain an updated image processing model includes:
and taking the preprocessed target images as input, taking the actual target object corresponding to each target image and the actual category information corresponding to each target image as output, performing iterative training on the image processing model according to a preset loss function, and adjusting parameters in the image processing model by using a back propagation algorithm until the loss value of the loss function converges, so as to obtain an updated image processing model.
In one illustrative example, the predetermined loss function includes: a first loss function and a second loss function;
taking the preprocessed target images as input, taking the actual target object corresponding to each target image and the actual category information corresponding to each target image as output, performing iterative training on the image processing model according to a preset loss function, and adjusting parameters in the image processing model by using a back propagation algorithm until the loss value of the loss function converges, so as to obtain an updated image processing model, wherein the method comprises the following steps:
dividing the preprocessed target images into batches, taking the first batch of preprocessed target images as the current batch of target images, taking the initial image processing model as the current image processing model, and executing the following iterative training process:
firstly, inputting the current batch of target images into the current image processing model, segmenting the target object on each target image in the current batch of target images by the current image processing model to obtain a predicted target object, and classifying the target object on each target image in the current batch of target images by the current image processing model according to the specified attribute to obtain predicted category information;
Secondly, calculating a first difference between a predicted target object and an actual target object corresponding to each target image in the current batch of target images according to the first loss function and the second loss function; calculating a second difference between the predicted category information and the actual category information corresponding to each target image in the current batch of target images according to the first loss function;
thirdly, based on the first difference and the second difference corresponding to each target image in the current batch of target images, adjusting parameters in the current image processing model by utilizing the back propagation algorithm to obtain an updated model;
and finally, taking the updated model as a new current image processing model, taking the preprocessed next batch of target images as new current batch of target images, and continuing to execute the iterative training process until the first loss function and the second loss function are converged, so as to obtain a final updated image processing model.
The embodiments of the disclosure provide a generation method for a liver medical image processing model and a corresponding image processing method, which together form a liver medical image analysis framework comprising a data processing stage, a model building stage, a model training stage and a test/inference stage, wherein:
The data processing stage comprises the operations of format conversion, data cleaning, normalization and the like of the abdominal medical images acquired between different devices, and aims to alleviate the distribution difference of the data from the different devices.
In the model building stage, Fig. 3 is a schematic diagram of the image processing model structure provided by the embodiments of the present disclosure. As shown in Fig. 3, a new neural network architecture design idea is introduced on the basis of the original UNet model: in each basic block of the encoder, one depth-wise convolution layer and two point-wise convolution layers form an "inverted bottleneck" structure, a LayerNorm normalization operation is added after the depth-wise convolution layer, and the GELU activation function is used after the first point-wise convolution layer to improve the nonlinear expression capability of the model. The encoder stage has a total of 18 such convolution blocks, called ConvNeXt blocks, distributed among the four convolution stages of the encoder: the first convolution stage E1 contains three convolution blocks, the second convolution stage E2 contains three convolution blocks, the third convolution stage E3 contains nine convolution blocks, and the fourth convolution stage E4 contains three convolution blocks. The data is first downsampled by a factor of 4 and then fed into the convolution blocks, extracting information-rich multi-scale image features. After the encoder extracts the features, the different features are sent to different task-specific decoders. For the image tumor recognition task, the decoder consists of three fully connected layers with a GELU activation function. For the tumor region segmentation task, the decoder consists of several upsampling layers, progressively restores the original resolution of the image, and outputs the segmentation result.
In the segmentation task decoder, namely the segmentation module (Segmentation Block), the multi-scale image features obtained in the encoder stage are input into the segmentation block: the finally output feature f4 is first upsampled by a factor of 2, concatenated with the image feature f3, and passed through two convolution layers with a 3×3 kernel; after another 2× upsampling it is concatenated with the image feature f2 and passed through two 3×3 convolutions; after a further 2× upsampling it is concatenated with the image feature f1 and passed through two 3×3 convolutions, and is then restored to the original size by a 4× upsampling operation. Finally, a convolution with a 1×1 kernel and an output dimension of 1 is applied to obtain the segmentation result. This mechanism is called skip connection; skip connections fuse multi-scale feature information, and through multi-scale feature fusion a tumor region segmentation result with better detail can be obtained.
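For illustration, the following is a minimal PyTorch sketch of one decoder step of the segmentation block and of its final layers; the channel widths and the activation between the two 3×3 convolutions are assumptions, while the layer sequence follows the description above.

```python
import torch
import torch.nn as nn

class UpFuse(nn.Module):
    """One segmentation-decoder step: 2x upsampling, skip connection
    (concatenation with the encoder feature), then two 3x3 convolutions."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.GELU(),                                     # activation here is an assumption
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

# Tail of the segmentation head: 4x upsampling back to the input resolution,
# then a 1x1 convolution with a single output channel (assumed input width 96).
segmentation_tail = nn.Sequential(
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(96, 1, kernel_size=1),
)
```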
In the prediction task decoder, namely the prediction module (Prediction Block), only the f4 feature finally output by the encoder stage is used. After the f4 feature is input into the Prediction Block, it passes through a first hidden layer, which reduces the feature dimension from 1000 to 512; a GELU activation function is then used to increase the nonlinear transformation capability of the model, and the Dropout technique of randomly discarding neurons is used to avoid overfitting. It then passes through a second hidden layer, which reduces the feature dimension from 512 to 256, again followed by the GELU activation function and Dropout. Finally, a third hidden layer reduces the feature dimension to 2 to obtain the final classification and recognition result.
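A minimal sketch of this prediction block is given below; the dropout probability and the pooling/projection that produces the 1000-dimensional input vector are assumptions, while the layer dimensions follow the description above.

```python
import torch.nn as nn

prediction_block = nn.Sequential(
    nn.Linear(1000, 512),   # first hidden layer
    nn.GELU(),
    nn.Dropout(p=0.5),      # assumed dropout probability
    nn.Linear(512, 256),    # second hidden layer
    nn.GELU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 2),      # third hidden layer: two classes, e.g. benign vs. malignant
)
```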
In the model training stage, weights pre-trained on ImageNet are used to initialize the parameters of the encoder part, the decoders use randomly initialized parameters, an AdamW optimizer is used for the whole training stage, and a weighted cross-entropy loss function and a weighted intersection-over-union (IoU) loss function are used as optimization targets. A data generator is designed according to the batch size, and all training data are fed into the network model batch by batch to obtain the recognition and segmentation results respectively. The difference between the prediction of the recognition task decoder and the real label is calculated based on the weighted cross-entropy loss function, the difference between the prediction of the segmentation task decoder and the real tumor region is calculated based on the weighted cross-entropy loss function and the weighted IoU loss function, and the parameters of the whole model framework are updated based on a back-propagation algorithm. When the network model has iterated for the preset number of iterations, the parameters of the model are saved and training stops.
In the test/inference stage, the model parameters obtained through the above training are loaded into the network model, and preprocessed liver images are directly input into the CXNet framework to obtain its recognition and segmentation predictions.
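As an illustration only, the following sketch shows the test/inference stage just described; the weights file name is a placeholder and the two-output model interface is an assumption consistent with the text.

```python
import torch

def run_inference(model, preprocessed_image, weights_path="cxnet_weights.pth"):
    """Load the trained parameters and predict a segmentation map and a class
    for one pre-processed image."""
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()
    with torch.no_grad():
        seg_logits, cls_logits = model(preprocessed_image.unsqueeze(0))
    return torch.sigmoid(seg_logits), cls_logits.argmax(dim=1)
```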
The image processing model provided by the embodiments of the disclosure was compared experimentally with common tumor classification and recognition models on the liver tumor recognition task; the results are shown in Table 1 below:
TABLE 1
As can be seen from Table 1, the image processing model of the present disclosure is superior to existing models in the tumor classification and recognition task. The second column of the table lists convolutional neural network models, which are less effective than the image processing model of the present disclosure because of their insufficient representation capability. The third column of the table lists Transformer models; consistent with the previous analysis, because they lack the inductive bias inherent to convolutional models, they require a large amount of training data to compensate for this deficiency, and such data are difficult to obtain for medical tasks.
The image processing model provided by the embodiments of the disclosure was compared experimentally with common segmentation models on the liver segmentation task; the results are shown in Table 2 below:
TABLE 2
As can be seen from Table 2, for the segmentation of the liver region the image processing model of the present disclosure is significantly superior to existing models on all five evaluation metrics.
The embodiments of the disclosure also provide a schematic comparison of the segmentation results of the image processing model and existing models on the liver region. As shown in Fig. 4, the segmentation result of the image processing model of the disclosure is obviously better than the segmentation results of the existing models and is very close to the reference standard (ground truth, GT), which may be a manually labelled sample used for model training and final result comparison.
The image processing model provided by the embodiments of the disclosure was compared experimentally with common segmentation models on the liver tumor region segmentation task; the results are shown in Table 3 below:
TABLE 3
As can be seen from Table 3, the segmentation results of the image processing model of the present disclosure on the liver tumor region are significantly better than those of previous models on all five evaluation metrics.
The embodiment of the disclosure also provides a comparison schematic diagram of the segmentation result of the image processing model and the existing model in the liver tumor area, as shown in fig. 5, on the relatively small target of the tumor area, the segmentation result of the image processing model of the disclosure is obviously superior to the segmentation result of other models and is very close to GT.
The embodiment of the disclosure provides an image processing model generating device, which comprises: a first memory and a first processor, the first memory for storing an executable program;
the first processor is configured to read and execute the executable program to implement the image processing model generating method described in any one of the foregoing embodiments.
An embodiment of the present disclosure provides an image processing apparatus including: a second memory and a second processor, the second memory for storing an executable program;
The second processor is configured to read and execute the executable program to implement the image processing method described in any of the foregoing embodiments.
It should be appreciated that the processor may be a central processing unit (CPU), but may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In an implementation, the processing performed by the terminal device may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. That is, the steps of the methods disclosed in the embodiments of the present disclosure may be embodied as hardware processor execution or as a combination of hardware and software modules in a processor. The software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, and other storage media. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
The present application describes a number of embodiments, but the description is illustrative and not limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or in place of any other feature or element of any other embodiment unless specifically limited.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements of the present disclosure may also be combined with any conventional features or elements to form a unique inventive arrangement as defined in the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive arrangements to form another unique inventive arrangement as defined in the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.

Claims (12)

1. An image processing model generation method, characterized by comprising:
acquiring a plurality of sample images containing the same target object; the plurality of sample images containing the same target object include: a plurality of abdominal plain-scan MRI sample images containing a liver tumor, or a plurality of abdominal MRI sample images containing a pancreatic tumor, or a plurality of abdominal MRI sample images containing a colorectal tumor, or a plurality of abdominal MRI sample images containing a prostate tumor, or a plurality of pathological section medical scan sample images containing the same human tissue, or a plurality of pathological section medical scan sample images containing the same animal tissue, or a plurality of hematoxylin-eosin (H&E) stained section medical scan sample images containing the same human tissue, or a plurality of H&E stained section medical scan sample images containing the same animal tissue, or a plurality of immunohistochemical section medical scan sample images containing the same human tissue, or a plurality of immunohistochemical section medical scan sample images containing the same animal tissue, or autonomous driving scene sample images containing the target object;
Preprocessing the plurality of sample images;
acquiring an actual target object obtained by segmenting the target object on each sample image, and classifying the target object on each sample image according to a specified attribute to obtain actual category information;
taking the preprocessed sample images, the actual target object corresponding to each sample image and the actual category information corresponding to each sample image as a training sample set training deep learning model to generate an image processing model; the deep learning model is pre-constructed by adopting a structure of a UNet model, and each convolution block in the deep learning model comprises: a depth-wise convolution layer that elevates the model representation capability in depth and two point-wise convolution layers that elevates the model representation capability in width;
wherein the generating of an image processing model by taking the preprocessed sample images, the actual target object corresponding to each sample image and the actual category information corresponding to each sample image as a training sample set to train the deep learning model comprises the following steps:
taking the preprocessed sample images as input, taking the actual target object corresponding to each sample image and the actual category information on each sample image as output, performing iterative training on the deep learning model according to a predetermined loss function, and adjusting parameters in the deep learning model by using a back propagation algorithm until the loss value of the loss function converges, so as to obtain the image processing model;
the predetermined loss function includes: a weighted cross-entropy loss function and a weighted intersection-over-union (IoU) loss function;
wherein taking the preprocessed sample images as input, taking the actual target object corresponding to each sample image and the actual category information on each sample image as output, performing iterative training on the deep learning model according to the predetermined loss function, and adjusting parameters in the deep learning model by using the back propagation algorithm until the loss value of the loss function converges to obtain the image processing model comprises the following steps:
dividing the preprocessed sample images into batches, taking the first batch of preprocessed sample images as the current batch of sample images, taking an initial deep learning model as the current deep learning model, and executing the following iterative training process:
inputting the current batch of sample images into the current deep learning model, segmenting the target object on each sample image in the current batch of sample images by the current deep learning model to obtain a predicted target object, and classifying the target object on each sample image in the current batch of sample images by the current deep learning model according to the specified attribute to obtain predicted category information;
calculating a first difference between the predicted target object and the actual target object corresponding to each sample image in the current batch of sample images according to the weighted cross-entropy loss function and the weighted intersection-over-union loss function; calculating a second difference between the predicted category information and the actual category information corresponding to each sample image in the current batch of sample images according to the weighted cross-entropy loss function;
based on the first difference and the second difference corresponding to each sample image in the current batch of sample images, adjusting the parameters in the current deep learning model by using the back propagation algorithm to obtain an updated model; and
taking the updated model as the new current deep learning model, taking the next batch of preprocessed sample images as the new current batch of sample images, and continuing to execute the iterative training process until the weighted cross-entropy loss function and the weighted intersection-over-union loss function converge, and taking the finally obtained updated deep learning model as the image processing model.
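For illustration only and not as part of the claimed subject matter, the following is a minimal PyTorch-style sketch of the batch-wise iterative training recited in claim 1. It assumes a model that returns both a segmentation map and a class prediction, and uses hypothetical helper functions weighted_cross_entropy and weighted_iou_loss; the weighting scheme, hyperparameters, and fixed epoch count (standing in for the convergence test of the claim) are assumptions, not taken from the patent.

import torch
import torch.nn.functional as F

def weighted_cross_entropy(pred, target, weight):
    # weighted cross-entropy; `weight` is a per-class weight vector (an assumption)
    return F.cross_entropy(pred, target, weight=weight)

def weighted_iou_loss(pred_mask, true_mask, eps=1e-6):
    # soft intersection-over-union loss on the foreground channel (assumes binary segmentation)
    prob = torch.softmax(pred_mask, dim=1)[:, 1]
    true = (true_mask == 1).float()
    inter = (prob * true).sum(dim=(1, 2))
    union = (prob + true - prob * true).sum(dim=(1, 2))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def train(model, loader, seg_weights, cls_weights, epochs=10, lr=1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)   # AdamW, as in claim 5
    for _ in range(epochs):                                    # fixed epochs stand in for convergence
        for images, true_masks, true_labels in loader:         # batch-by-batch iteration
            pred_masks, pred_labels = model(images)            # segmentation and classification outputs
            # first difference: weighted cross-entropy + weighted IoU on the segmentation
            seg_loss = (weighted_cross_entropy(pred_masks, true_masks, seg_weights)
                        + weighted_iou_loss(pred_masks, true_masks))
            # second difference: weighted cross-entropy on the predicted category
            cls_loss = weighted_cross_entropy(pred_labels, true_labels, cls_weights)
            optimizer.zero_grad()
            (seg_loss + cls_loss).backward()                   # back propagation
            optimizer.step()                                   # parameter adjustment
    return model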
2. The method of claim 1, wherein the one depth-wise convolution layer and the two point-wise convolution layers included in each convolution block are arranged sequentially from front to back according to the data flow direction;
each convolution block further comprises: a normalization layer and an activation layer;
the normalization layer is arranged after the depth-wise convolution layer;
the activation layer is arranged after the first of the two point-wise convolution layers, or after the second of the two point-wise convolution layers, or after each of the two point-wise convolution layers respectively; and
the normalization layer is implemented using a LayerNorm function, and the activation layer is implemented using a GELU function.
3. The method of claim 1, wherein the deep learning model comprises: four convolution block sets arranged sequentially according to the data flow direction, wherein the first convolution block set comprises three of the convolution blocks, the second convolution block set comprises three of the convolution blocks, the third convolution block set comprises nine of the convolution blocks, and the fourth convolution block set comprises three of the convolution blocks.
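For illustration only, a minimal PyTorch sketch of a convolution block as recited in claims 1-2 (depth-wise convolution, LayerNorm, two point-wise convolutions with a GELU activation) and of the 3-3-9-3 stage layout of claim 3. The kernel size, channel widths, expansion ratio, and residual connection are assumptions not fixed by the claims, and the downsampling layers between stages are omitted.

import torch
from torch import nn

class ConvBlock(nn.Module):
    # depth-wise conv -> LayerNorm -> point-wise conv -> GELU -> point-wise conv
    def __init__(self, dim, expansion=4, kernel_size=7):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)  # depth-wise
        self.norm = nn.LayerNorm(dim)                 # normalization layer (claim 2)
        self.pw1 = nn.Linear(dim, expansion * dim)    # first point-wise (1x1) convolution
        self.act = nn.GELU()                          # activation layer (claim 2)
        self.pw2 = nn.Linear(expansion * dim, dim)    # second point-wise (1x1) convolution

    def forward(self, x):
        shortcut = x
        x = self.dw(x)
        x = x.permute(0, 2, 3, 1)                     # N,C,H,W -> N,H,W,C for LayerNorm/Linear
        x = self.pw2(self.act(self.pw1(self.norm(x))))
        return shortcut + x.permute(0, 3, 1, 2)       # residual connection (an assumption)

# Four convolution block sets along the data flow with 3, 3, 9 and 3 blocks (claim 3).
stage_depths = (3, 3, 9, 3)
stages = nn.ModuleList(
    nn.Sequential(*[ConvBlock(dim) for _ in range(depth)])
    for depth, dim in zip(stage_depths, (96, 192, 384, 768))   # channel widths are assumptions
)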
4. The method of claim 1, wherein the initialization parameters of the convolution block are parameters pre-trained on ImageNet.
5. The method of claim 1, wherein an AdamW optimizer is employed each time the parameters in the current deep learning model are adjusted during the iterative training process.
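One possible way (an assumption, not specified by the patent) to obtain ImageNet-pre-trained initialization parameters for the convolution blocks (claim 4) and to create the AdamW optimizer used when adjusting parameters (claim 5) is to load a ConvNeXt-Tiny backbone from the timm library, whose stages also follow a 3-3-9-3 layout; the learning rate and weight decay below are illustrative.

import timm
import torch

backbone = timm.create_model("convnext_tiny", pretrained=True)   # ImageNet-pre-trained weights
optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4, weight_decay=0.05)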
6. The method of claim 1, wherein the preprocessing of the plurality of sample images comprises:
performing data augmentation on the plurality of sample images by using an automatic data augmentation technique.
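A sketch of one possible realization (an assumption) of the automatic data augmentation of claim 6, using torchvision's AutoAugment policy inside a preprocessing pipeline; the policy choice and target image size are not taken from the patent.

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                                   # target size is an assumption
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),  # automatic data augmentation
    transforms.ToTensor(),                                           # convert to model input tensor
])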
7. An image processing method, comprising:
acquiring a target image containing a target object;
preprocessing the target image;
inputting the preprocessed target image into an image processing model generated by the image processing model generation method according to any one of claims 1-6, segmenting the target object on the preprocessed target image by the image processing model to obtain a predicted target object, and classifying the target object on the preprocessed target image by the image processing model according to the specified attribute to obtain prediction category information.
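An illustrative inference flow for claim 7; the names `preprocess` and the two-headed `model` output are assumptions carried over from the sketches above, not part of the claimed method.

import torch

def process_image(model, image, preprocess):
    model.eval()
    x = preprocess(image).unsqueeze(0)              # preprocess the target image, add batch dim
    with torch.no_grad():
        pred_mask, pred_label = model(x)            # segmentation output and classification output
    predicted_object = pred_mask.argmax(dim=1)      # per-pixel class map -> predicted target object
    predicted_category = pred_label.argmax(dim=1)   # prediction category information
    return predicted_object, predicted_category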
8. The method according to claim 7, wherein, when the number of target images is plural, after segmenting the target object on the preprocessed target image by the image processing model to obtain the predicted target object and classifying the target object on the preprocessed target image by the image processing model according to the specified attribute to obtain the prediction category information, the method further comprises:
acquiring an actual target object obtained by segmenting the target object on each target image, and classifying the target object on each target image according to the specified attribute to obtain actual category information;
taking the preprocessed target images, the actual target object corresponding to each target image and the actual category information corresponding to each target image as a training sample set to train the image processing model, so as to obtain an updated image processing model.
9. The method according to claim 8, wherein taking the preprocessed target images, the actual target object corresponding to each target image, and the actual category information corresponding to each target image as a training sample set to train the image processing model to obtain an updated image processing model comprises:
taking the preprocessed target images as input, taking the actual target object corresponding to each target image and the actual category information corresponding to each target image as output, performing iterative training on the image processing model according to a predetermined loss function, and adjusting parameters in the image processing model by using a back propagation algorithm until the loss value of the loss function converges, so as to obtain the updated image processing model.
10. The method of claim 9, wherein the predetermined loss function comprises: a first loss function and a second loss function;
wherein taking the preprocessed target images as input, taking the actual target object corresponding to each target image and the actual category information corresponding to each target image as output, performing iterative training on the image processing model according to the predetermined loss function, and adjusting parameters in the image processing model by using the back propagation algorithm until the loss value of the loss function converges to obtain the updated image processing model comprises the following steps:
dividing the preprocessed target images into batches, taking the first batch of preprocessed target images as the current batch of target images, taking the initial image processing model as the current image processing model, and executing the following iterative training process:
inputting the current batch of target images into the current image processing model, segmenting the target object on each target image in the current batch of target images by the current image processing model to obtain a predicted target object, and classifying the target object on each target image in the current batch of target images by the current image processing model according to the specified attribute to obtain predicted category information;
calculating a first difference between the predicted target object and the actual target object corresponding to each target image in the current batch of target images according to the first loss function and the second loss function; calculating a second difference between the predicted category information and the actual category information corresponding to each target image in the current batch of target images according to the first loss function;
based on the first difference and the second difference corresponding to each target image in the current batch of target images, adjusting parameters in the current image processing model by using the back propagation algorithm to obtain an updated model;
taking the updated model as the new current image processing model, taking the next batch of preprocessed target images as the new current batch of target images, and continuing to execute the iterative training process until the first loss function and the second loss function converge, so as to obtain the final updated image processing model.
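Claims 8-10 essentially reuse the batch-wise training procedure of claim 1 on newly collected and labeled target images to refresh the deployed model. Under the same assumptions as the training sketch shown after claim 1 (the hypothetical `train` helper and loss weights), the update can be written as:

# fine-tune the generated image processing model on newly labeled target images (claims 8-10)
updated_model = train(model, new_target_image_loader, seg_weights, cls_weights, epochs=1)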
11. An image processing model generation apparatus, comprising: a first memory and a first processor, the first memory for storing an executable program;
the first processor is configured to read and execute the executable program to implement the image processing model generation method according to any one of claims 1 to 6.
12. An image processing apparatus, comprising: a second memory and a second processor, the second memory for storing an executable program;
the second processor is configured to read and execute the executable program to implement the image processing method according to any one of claims 7 to 10.
CN202211552144.9A 2022-12-05 2022-12-05 Image processing model generation method, image processing method and device Active CN116245892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211552144.9A CN116245892B (en) 2022-12-05 2022-12-05 Image processing model generation method, image processing method and device

Publications (2)

Publication Number Publication Date
CN116245892A CN116245892A (en) 2023-06-09
CN116245892B true CN116245892B (en) 2024-04-12

Family

ID=86624953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211552144.9A Active CN116245892B (en) 2022-12-05 2022-12-05 Image processing model generation method, image processing method and device

Country Status (1)

Country Link
CN (1) CN116245892B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543911A (en) * 2019-08-31 2019-12-06 华南理工大学 weak supervision target segmentation method combined with classification task
CN111192245A (en) * 2019-12-26 2020-05-22 河南工业大学 Brain tumor segmentation network and method based on U-Net network
CN112132784A (en) * 2020-08-22 2020-12-25 安徽大学 Method for classifying and segmenting industrial magnetic tile defect image based on small sample deep convolution neural network
CN112802025A (en) * 2021-02-10 2021-05-14 深圳华中科技大学研究院 Liver tumor segmentation method and device under CT image
CN113129281A (en) * 2021-04-13 2021-07-16 广西大学 Wheat stem section parameter detection method based on deep learning
CN113255995A (en) * 2021-05-24 2021-08-13 北京建筑大学 Air pollution prediction method
CN113378933A (en) * 2021-06-11 2021-09-10 合肥合滨智能机器人有限公司 Thyroid ultrasound image classification and segmentation network, training method, device and medium
CN114066964A (en) * 2021-11-17 2022-02-18 江南大学 Aquatic product real-time size detection method based on deep learning
CN115393584A (en) * 2022-08-02 2022-11-25 哈尔滨理工大学 Establishment method based on multi-task ultrasonic thyroid nodule segmentation and classification model, segmentation and classification method and computer equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ConvUNeXt: An efficient convolution neural network for medical image segmentation; Zhimeng Han et al.; Knowledge-Based Systems; pp. 1-7 *
SegNet: A deep convolutional encoder-decoder architecture for image segmentation; Badrinarayanan V. et al.; IEEE; 2017-12-31; Vol. 39, No. 12; full text *
Research on semantic segmentation of images with imbalanced samples based on DeepLabv3; Wang Hanpu et al.; Journal of Chengdu Technological University; 2022-09-30; Vol. 25, No. 3; full text *

Also Published As

Publication number Publication date
CN116245892A (en) 2023-06-09

Similar Documents

Publication Publication Date Title
Roth et al. Hierarchical 3D fully convolutional networks for multi-organ segmentation
Fan et al. Multichannel fully convolutional network for coronary artery segmentation in X-ray angiograms
Cheng et al. Contour-aware semantic segmentation network with spatial attention mechanism for medical image
Solanki et al. Brain tumor detection and classification using intelligence techniques: An overview
Song et al. Kidney segmentation in CT sequences using SKFCM and improved GrowCut algorithm
Wu et al. Ultrasound image segmentation method for thyroid nodules using ASPP fusion features
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
Randhawa et al. Deep learning for liver tumour classification: enhanced loss function
Sowparnika et al. Innovative method for detecting liver cancer using auto encoder and single feed forward neural network
Bai et al. A novel framework for improving pulse-coupled neural networks with fuzzy connectedness for medical image segmentation
Kumaraswamy et al. Automatic prostate segmentation of magnetic resonance imaging using Res-Net
Imtiaz et al. BAWGNet: Boundary aware wavelet guided network for the nuclei segmentation in histopathology images
CN116245892B (en) Image processing model generation method, image processing method and device
Wen et al. Towards better semantic consistency of 2D medical image segmentation
Hwang et al. An adaptive regularization approach to colonoscopic polyp detection using a cascaded structure of encoder–decoders
Huang et al. Automatic tissue segmentation by deep learning: from colorectal polyps in colonoscopy to abdominal organs in CT exam
Asma-Ull et al. Data efficient segmentation of various 3d medical images using guided generative adversarial networks
Hossain et al. Residual semantic segmentation of the prostate from magnetic resonance images
Samudrala et al. Semantic Segmentation in Medical Image Based on Hybrid Dlinknet and Unet
Wang et al. RFPNet: Reorganizing feature pyramid networks for medical image segmentation
Jin et al. Boundary regression-based deep neural network for thyroid nodule segmentation in ultrasound images
Wang et al. Nuclei instance segmentation using a transformer-based graph convolutional network and contextual information augmentation
Debnath et al. Complete 3D brain tumour detection using a two-phase method along with confidence function evaluation
Liu et al. Image Segmentation of Bladder Cancer Based on DeepLabv3+
Tummala et al. Curriculum learning based overcomplete U-Net for liver tumor segmentation from computed tomography images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant