CN114782768A - Training method of pre-training network model, medical image processing method and equipment - Google Patents
- Publication number
- CN114782768A CN114782768A CN202210239706.8A CN202210239706A CN114782768A CN 114782768 A CN114782768 A CN 114782768A CN 202210239706 A CN202210239706 A CN 202210239706A CN 114782768 A CN114782768 A CN 114782768A
- Authority
- CN
- China
- Prior art keywords
- training
- image
- image processing
- network model
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
Abstract
The application provides a training method for a pre-training network model, a medical image processing method, and equipment, relating to the technical field of image processing. Single-channel feature calculation is performed on a plurality of pre-training images to obtain a feature image corresponding to each pre-training image, and an initial pre-training network model is trained according to each pre-training image and its corresponding feature image to obtain the pre-training network model. Because the feature images serve as training targets, the general features of the pre-training images are easier to learn, the generalization capability of the pre-trained model is enhanced, and the visual network parameters of the obtained pre-training network model are more accurate. The visual network parameters of the pre-training network model are then used as initial parameters of the to-be-trained visual network of an image processing model, which simplifies the training process of the image processing model and improves training efficiency.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a pre-training network model, a medical image processing method, and a medical image processing apparatus.
Background
Deep learning is widely applied in the field of medical images, and the success of a new generation of vision-based deep learning models has steadily increased the demand for resources such as the computing power and the labeled medical image data required to train such models.
Because the data of patients with a specific disease are relatively fixed and patient data are restricted by privacy, the medical image data that can be collected for training are scarce; moreover, a high-performance deep learning model usually requires manual labeling by professional doctors, which increases doctors' workload and further squeezes medical resources. High-quality data available for deep neural network training therefore remain in short supply, and the poor training effect of visual neural networks on medical image tasks under low data volume is still a problem to be solved.
Disclosure of Invention
The present application aims to provide a training method for a pre-training network model, a medical image processing method, and corresponding equipment, which are used to solve the above technical problems.
In a first aspect, a training method for pre-training a network model is provided, which includes:
acquiring an initial pre-training network model and a plurality of pre-training images, wherein the initial pre-training network model comprises a visual network to be trained and a first decoding network to be trained;
performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image; training the initial pre-training network model according to each pre-training image and the corresponding characteristic image to obtain the pre-training network model;
the visual network parameters of the pre-training network model can be used as initial parameters of a to-be-trained visual network trained by an image processing model, and the image processing model is used for performing image processing on a to-be-processed two-dimensional medical image.
Based on this scheme, the method performs single-channel feature calculation on the plurality of pre-training images to obtain a feature image corresponding to each pre-training image, and trains the initial pre-training network model according to each pre-training image and its corresponding feature image to obtain the pre-training network model. Because the feature images are used as training targets, the general features of the pre-training images are learned more easily, the generalization capability of the pre-trained model is enhanced, and the visual network parameters of the obtained pre-training network model are more accurate. Using these visual network parameters as the initial parameters of the to-be-trained visual network of the image processing model can simplify the training process of the image processing model and improve training efficiency.
In a possible implementation manner, the training the initial pre-training network model according to each pre-training image and each corresponding feature image to obtain the pre-training network model includes:
carrying out mask processing on each pre-training image to generate a mask image corresponding to each pre-training image;
inputting each mask image into the initial pre-training network model, and outputting an image to be compared corresponding to each mask image; calculating, using a masking mean square error loss function, a loss value from the image to be compared corresponding to each mask image and the feature image corresponding to each mask image;
determining whether the initial pre-training network model converges according to the loss value; if so, determining the current initial pre-training network model as a pre-training network model after training; if not, performing iterative training until the initial pre-training network model converges to obtain the pre-training network model.
In a possible implementation manner, the performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image includes:
performing grid division on each pre-training image based on a preset grid size to obtain a plurality of blocks;
calculating single-channel characteristics of each block of each pre-training image to obtain a characteristic diagram corresponding to each pre-training image;
correspondingly, the performing mask processing on each pre-training image to generate a mask image corresponding to each pre-training image includes:
generating a mask for each of the pre-training images based on a preset grid size;
and carrying out mask processing on each pre-training image according to the mask to generate a mask image corresponding to each pre-training image.
In one possible implementation, the pre-training image is a pre-training two-dimensional medical image and/or a natural image; the pre-trained two-dimensional medical image comprises any one or more of: CT images, MRI images.
In a possible implementation manner, the performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image includes:
performing target feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image, wherein the target feature includes any one of a Haar feature, a Gabor feature, and an LBP feature.
In a second aspect, a medical image processing method is provided, including:
acquiring an initial image processing model to be trained, wherein the initial image processing model comprises a visual network to be trained and a second decoding network to be trained, and initial parameters of the visual network to be trained of the initial image processing model are parameters of the visual network of the pre-training network model; the pre-training network model is obtained according to a plurality of pre-training images and respective corresponding characteristic images;
acquiring a plurality of two-dimensional medical images and corresponding labeling information;
training the initial image processing model according to a plurality of two-dimensional medical images and the corresponding labeling information to obtain the image processing model;
and acquiring a two-dimensional medical image to be processed, and performing image processing on the two-dimensional medical image to be processed by using the image processing model.
In one possible implementation, the image processing, using the image processing model, on the two-dimensional medical image to be processed includes:
inputting the two-dimensional medical image to be processed into a visual network of the image processing model to obtain a coding feature map;
and inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result.
In a possible implementation, the image processing includes any one of object recognition, image segmentation, and image classification.
In a third aspect, a training apparatus for pre-training a network model is provided, including:
the pre-training information acquisition module is used for acquiring an initial pre-training network model and a plurality of pre-training images, wherein the initial pre-training network model comprises a visual network to be trained and a first decoding network to be trained;
the feature calculation module is used for performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image;
the pre-training module is used for training the initial pre-training network model according to each pre-training image and the characteristic image corresponding to each pre-training image to obtain the pre-training network model;
the visual network parameters of the pre-training network model can be used as initial parameters of a to-be-trained visual network trained by an image processing model, and the image processing model is used for performing image processing on the to-be-processed two-dimensional medical image.
In one possible implementation, the pre-training module includes:
the mask processing unit is used for performing mask processing on each pre-training image to generate a mask image corresponding to each pre-training image;
the output unit is used for inputting each mask image into the initial pre-training network model and outputting the image to be compared corresponding to each mask image;
the loss value calculation unit is used for calculating, using a masking mean square error loss function, a loss value from the image to be compared corresponding to each mask image and the feature image corresponding to each mask image;
the pre-training network model obtaining unit is used for determining whether the initial pre-training network model converges according to the loss value; if so, determining the current initial pre-training network model as a pre-training network model after training; if not, performing iterative training until the initial pre-training network model converges to obtain the pre-training network model.
In one possible implementation, the feature calculation module includes:
the grid division unit is used for carrying out grid division on each pre-training image based on a preset grid size to obtain a plurality of blocks; the feature calculation unit is used for calculating single-channel features of each block of each pre-training image to obtain a feature map corresponding to each pre-training image;
accordingly, the mask processing unit includes:
a mask generation subunit, configured to generate a mask for each of the pre-training images based on a preset grid size;
and the mask processing subunit is used for performing mask processing on each pre-training image according to the mask to generate a mask image corresponding to each pre-training image.
In one possible implementation, the pre-training image is a pre-training two-dimensional medical image and/or a natural image; the pre-trained two-dimensional medical image comprises any one or more of: CT images, MRI images.
In one possible implementation, the feature calculation module includes:
a feature calculation unit, configured to perform target feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image, where the target feature includes any one of a Haar feature, a Gabor feature, and an LBP feature.
In a fourth aspect, there is provided a medical image processing apparatus comprising:
the initial image processing model acquisition module is used for acquiring an initial image processing model to be trained, the initial image processing model comprises a visual network to be trained and a second decoding network to be trained, and initial parameters of the visual network to be trained of the initial image processing model are parameters of the visual network of the pre-training network model; the pre-training network model is obtained according to a plurality of pre-training images and the characteristic images corresponding to the pre-training images;
the training set acquisition module is used for acquiring a plurality of two-dimensional medical images and the labeling information corresponding to the two-dimensional medical images;
the image processing model obtaining module is used for training the initial image processing model according to a plurality of two-dimensional medical images and the corresponding labeling information to obtain the image processing model;
and the image processing module is used for acquiring the two-dimensional medical image to be processed and processing the two-dimensional medical image to be processed by utilizing the image processing model.
In one possible implementation, the image processing module, when performing image processing on the two-dimensional medical image to be processed using the image processing model, is configured to:
inputting the two-dimensional medical image to be processed into a visual network of the image processing model to obtain a coding feature map;
and inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result.
In a possible implementation, the image processing includes any one of object recognition, image segmentation, and image classification.
In a fifth aspect, an electronic device is provided, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the training method for the pre-training network model according to any possible implementation of the first aspect, or the medical image processing method according to any possible implementation of the second aspect.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for training a pre-trained network model according to any one of the possible implementations of the first aspect, or implements the method for processing medical images according to any one of the possible implementations of the second aspect.
In summary, the present application includes at least one of the following beneficial technical effects:
the application provides a training method of a pre-training network model, a medical image processing method and equipment, in the application, single-channel feature calculation is carried out on a plurality of pre-training images to obtain feature images corresponding to the pre-training images, the initial pre-training network model is trained according to the pre-training images and the feature images corresponding to the pre-training images to obtain the pre-training network model, the feature images are used as training targets, general features of the pre-training images are learned more easily, the generalization capability of the pre-training model is enhanced, the obtained visual network parameters of the pre-training network model are more accurate, the visual network parameters of the pre-training network model are used as initial parameters of a visual network to be trained of the image processing model, the training process of the image processing model can be simplified, and the training efficiency is improved.
Drawings
Fig. 1 is a schematic flowchart of a training method for a pre-training network model according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a pre-training process provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a medical image processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a training process of an image processing model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of pre-training provided in an embodiment of the present application;
fig. 6 is a block diagram of a training apparatus for pre-training a network model according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a medical image processing apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of a structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to fig. 1-8.
This embodiment is intended only to explain the present application and does not limit it. After reading this specification, those skilled in the art may, as needed, make modifications to this embodiment that involve no inventive contribution, and all such modifications are protected by patent law within the scope of the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship, unless otherwise specified.
When deep learning is adopted to train an image processing model in the field of medical images, the data of patients with a specific disease are relatively fixed and patient data are restricted by privacy, so the medical image data that can be acquired for training are scarce; in addition, a high-performance deep learning model often requires manual labeling by professional doctors, which increases doctors' workload and further squeezes medical resources. High-quality data available for deep neural network training remain in short supply. Therefore, the poor training effect of visual neural networks on medical image tasks under low data volume is still a problem to be solved.
In order to solve the above technical problems, embodiments of the present application are described in further detail below with reference to the drawings of the specification.
The embodiment of the application provides a training method for a pre-training network model, executed by an electronic device. The electronic device may be a server or a terminal device, and the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. As shown in fig. 1, fig. 1 is a schematic flow chart of the training method for the pre-training network model provided in the embodiment of the present application, and the method includes:
s110, obtaining an initial pre-training network model and a plurality of pre-training images, wherein the initial pre-training network model comprises a visual network to be trained and a first decoding network to be trained;
In the embodiment of the present application, a large number of pre-training images are required to train the pre-training network model. The pre-training images may be two-dimensional medical images, specifically any one or more of CT images, magnetic resonance imaging (MRI) images, and ultrasound images; however, because two-dimensional medical images are difficult to collect, the pre-training images may also be natural images formed by natural-light imaging. If the plurality of pre-training images comprise both two-dimensional medical images and natural images, this embodiment does not limit the respective numbers of each, and a user can set them according to actual requirements. Training the initial pre-training network model with natural images allows it to learn general information, including but not limited to edge information, contrast information, shape information, texture information, and hue information; training it with two-dimensional medical images allows it to learn relevant medical information.
Preferably, the modality of the pre-training images is consistent with the modality of the input images of the image processing model in the actual application scene. This improves the accuracy of the parameters of the trained pre-training network model, so that when the visual network parameters of the pre-training network model are used as the initial parameters of the visual network of the image processing model, the image processing model trained on the plurality of two-dimensional medical images and their corresponding labeling information can be obtained markedly more efficiently. For example, if the image processing model is an MRI artifact-removal model, the corresponding pre-training images are MRI images; likewise, if the image processing model is a CT disease-recognition model, the pre-training images are CT images.
Further, the manner of acquiring the plurality of pre-training images may include: and acquiring a plurality of pre-training images from a local storage, or crawling a plurality of pre-training images from a network, or acquiring a plurality of pre-training images input by a user.
Further, before step S110, the method may further include: acquiring a plurality of initial pre-training images; and performing image size processing and image enhancement processing on the plurality of initial pre-training images to obtain corresponding pre-training images, all of which have the same size. Image enhancement processing includes, but is not limited to, any one or more of contrast enhancement, denoising, filtering, edge sharpening, and the like, so as to improve the visual effect of the image, highlight its features, and facilitate further analysis of the image by a machine.
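The size-unification and enhancement preprocessing described above can be sketched as follows. This is only an illustrative assumption, not the patent's implementation: the function names, the nearest-neighbour resize, and the min-max contrast stretch are stand-ins for whatever resizing and enhancement operations an implementation actually chooses.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize so all pre-training images share one size."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols]

def stretch_contrast(img):
    """Min-max contrast stretch to [0, 1]; one simple enhancement step."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return np.zeros_like(img) if hi == lo else (img - lo) / (hi - lo)

# a toy single-channel "image"
raw = np.arange(16, dtype=np.float64).reshape(4, 4)
prep = stretch_contrast(resize_nearest(raw, 8, 8))
```

In practice a real pipeline would add denoising or edge sharpening here as well; the point is simply that every image leaves this stage with the same shape and a normalized intensity range.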
Further, before step S110, the method may further include: acquiring a plurality of initial pre-training images, wherein the initial pre-training images comprise pre-training two-dimensional medical images; and carrying out affine change and/or elastic change on the plurality of pre-training two-dimensional medical images to obtain a plurality of expanded pre-training two-dimensional medical images, wherein the plurality of pre-training images comprise all pre-training two-dimensional medical images and all expanded pre-training two-dimensional medical images.
Specifically, due to the particularity of medical images, two-dimensional medical images are difficult to obtain, so the available pre-training two-dimensional medical images are limited. A pre-training network model obtained by simply adopting these images as pre-training images therefore has poor robustness, and accordingly the accuracy of its visual network parameters is insufficient. In this embodiment, the pre-training set is expanded by affine transformation and elastic transformation. Affine transformation performs translation, rotation, scaling, shearing, and symmetry on the initial pre-training images. Elastic transformation expands the samples by elastically deforming the initial pre-training images or the affine-transformed images. After affine and elastic transformation, the resulting final pre-training images increase the diversity of the pre-training set, so that the pre-training network model can learn diverse features.
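The affine part of this expansion can be sketched in plain NumPy. The sketch below is a minimal inverse-mapped rotation-plus-translation warp with nearest-neighbour sampling and is only a stand-in for a full affine (and elastic) augmentation pipeline; the function name and the clamping-at-the-border behaviour are assumptions.

```python
import numpy as np

def affine_augment(img, angle_deg=0.0, tx=0, ty=0):
    """Produce an expanded sample: rotate by angle_deg about the image centre,
    then translate by (tx, ty), using inverse mapping and nearest neighbours."""
    h, w = img.shape
    a = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse-map each output pixel back into the source image
    x0, y0 = xs - cx - tx, ys - cy - ty
    sx = np.cos(-a) * x0 - np.sin(-a) * y0 + cx
    sy = np.sin(-a) * x0 + np.cos(-a) * y0 + cy
    sx = np.clip(np.round(sx).astype(int), 0, w - 1)
    sy = np.clip(np.round(sy).astype(int), 0, h - 1)
    return img[sy, sx]

img = np.arange(25, dtype=np.float64).reshape(5, 5)
rot180 = affine_augment(img, angle_deg=180.0)   # same image flipped both ways
```

Elastic deformation would add a smooth random displacement field on top of this grid; libraries such as scipy or torchvision are normally used for both in practice.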
Specifically, in this embodiment of the present application, the initial pre-training network model includes a to-be-trained visual network, which serves as the encoder of the initial pre-training network model, and a to-be-trained first decoding network connected to the visual network; the first decoding network may be a fully connected layer with linear activation. The visual network may be a ResNet18 with the output layer removed, but other configurations are of course possible; this embodiment imposes no limitation, as long as the purpose of the embodiment can be achieved.
S120, performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image. A medical image is generally a single-channel grayscale image; therefore, this embodiment performs single-channel feature calculation on the pre-training images to obtain their corresponding feature images.
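The grid division used for the per-block feature calculation in S120 (and reused to generate masks in S130) can be sketched as a reshape into non-overlapping blocks. This is an illustrative sketch, not the patent's code; the `patch` argument stands in for the preset grid size, and image sides are assumed to be exact multiples of it.

```python
import numpy as np

def grid_partition(img, patch):
    """Split a single-channel image into non-overlapping patch x patch blocks,
    returned in row-major grid order."""
    h, w = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)      # gather each block's rows together
               .reshape(-1, patch, patch))

img = np.arange(64, dtype=np.float64).reshape(8, 8)
blocks = grid_partition(img, 4)            # 2x2 grid -> 4 blocks of 4x4
```

A single-channel feature (or a mask decision) is then computed per block, which is what lets the mask grid and the feature grid line up exactly.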
Specifically, the target feature calculation may be performed on each pre-training image to obtain a feature image corresponding to each pre-training image, where the target feature includes any one of: the Haar feature, the Gabor feature, and the LBP feature. In this embodiment, local features are extracted through the target feature calculation, so that local similarity within the pre-training image can be exploited to improve the utilization of the data.
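As one concrete instance of the listed target features, a basic 8-neighbour LBP over a single-channel image might look like the following sketch (border pixels are simply dropped; the patent does not fix such details):

```python
import numpy as np

def lbp_feature(img):
    """8-neighbour local binary pattern for a single-channel image.
    Each interior pixel gets an 8-bit code: bit b is set when the b-th
    neighbour is >= the centre pixel."""
    c = img[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes
```

On a constant image every neighbour ties with the centre, so every bit is set and all codes are 255, reflecting that LBP responds to local contrast rather than absolute brightness.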
Generally, model training with transfer learning proceeds as follows: pre-train on a large data set so that the model learns general image features, then fine-tune the parameters on the data set of a specific task so that the natural-image knowledge learned by the model is transferred to the task images for classification and diagnosis. However, the pre-training data required by this method must be labeled, i.e. the pre-training is supervised, and the model can only learn the features specific to the labels. Using the feature map obtained in this step as the training target of the initial pre-training network model makes it easier for the initial pre-training network model to learn general features of the pre-training images (their spatial structure, brightness, contrast and so on), avoids restricting the model to the label-specific features that manual data annotation would impose, and improves the generalization capability of the model.
S130, training the initial pre-training network model according to each pre-training image and the corresponding characteristic image to obtain a pre-training network model, wherein the visual network parameters of the pre-training network model can be used as the initial parameters of the visual network to be trained by the image processing model.
Specifically, the image to be compared can be obtained from each pre-training image and the initial pre-training network model; a preset loss function is then applied to the image to be compared and the feature image to obtain a loss value, and the initial pre-training network model is iteratively trained based on that loss value. The initial pre-training network model is determined to have converged when the loss value reaches a preset loss threshold, when all pre-training images have been trained on, or when a preset number of training periods has completed, at which point the current initial pre-training network model is taken as the trained pre-training network model. The preset loss threshold can be set by the user, either as a custom value or from experience.
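The stopping criteria above can be expressed as a small loop. `train_step`, a callable that performs one update and returns the current loss, is an assumed interface, not something the patent specifies:

```python
def pretrain_until_converged(train_step, images, feature_maps,
                             loss_threshold, max_periods):
    """Sketch of the stopping logic: stop when the loss reaches the
    preset threshold, and otherwise run through the pre-training images
    for the preset number of training periods."""
    loss = float("inf")
    for period in range(max_periods):
        for img, feat in zip(images, feature_maps):
            loss = train_step(img, feat)
            if loss <= loss_threshold:
                return loss, period  # converged early
    return loss, max_periods - 1     # all periods completed
```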
The parameters of the pre-training network model in this embodiment include: the visual network parameters and the first decoder parameters are used as initial parameters of a visual network to be trained for training of the image processing model.
In conclusion, compared with most existing neural network pre-training methods, the pre-training data set requires no annotation, i.e. the pre-training is self-supervised. Moreover, because contrast is the dominant source of variation in two-dimensional medical images, using the feature map as the pre-training target, unlike other self-supervised pre-training methods, trains the neural network to predict local contrast information of the image, which yields better generalization.
According to the technical scheme, single-channel feature calculation is performed on the plurality of pre-training images to obtain the feature image corresponding to each pre-training image, and the initial pre-training network model is trained on each pre-training image and its corresponding feature image to obtain the pre-training network model. With the feature images as training targets, the general features of the pre-training images are learned more easily, the generalization capability of the pre-trained model is enhanced, and the visual network parameters of the resulting pre-training network model are more accurate. Using those visual network parameters as the initial parameters of the to-be-trained visual network of the image processing model simplifies the training process of the image processing model and improves training efficiency and effectiveness.
Further, the pre-training network model introduced in the above embodiment is obtained by training the initial pre-training network model according to each pre-training image and each corresponding feature image.
The embodiment of the present application provides a pre-training mode with simple operation. Specifically, S130 may include: inputting the pre-training image into the initial pre-training network model to obtain an image to be compared; applying a preset loss function to the image to be compared and the feature image to obtain a loss value; and iteratively training the initial pre-training network model based on the loss value to obtain the pre-training network model. In this method, the pre-training image is input directly into the initial pre-training network model for model training, which is simple and highly operable.
Referring to fig. 2, fig. 2 is a schematic diagram of a pre-training process provided in the embodiment of the present application, and specifically, S130 may include: s131, S132, S133, S134, wherein:
S131, performing mask processing on each pre-training image to generate a mask image corresponding to each pre-training image;
in order to further mine the internal relation of the pre-training image and enable the feature extraction obtained by the initial pre-training network model to be more accurate, the pre-training image is subjected to mask processing to obtain a mask image.
Masking hides part of the pre-training image; the mask commonly used for this may be a multi-valued image or a binary matrix. In the present application, a mask is used to occlude part of each pre-training image to obtain a mask image, and the mask images are used to train the initial pre-training model. This raises the difficulty of local feature extraction for the initial pre-training model, so that during training it learns to predict the local contrast information of the image. The mask of each pre-training image can be set according to actual requirements, as long as the purpose of the present embodiment can be achieved.
S132, inputting each mask image into the initial pre-training network model, and outputting the image to be compared corresponding to each mask image; s133, calculating by using a masking mean square error loss function according to the image to be compared corresponding to each masking image and the characteristic image corresponding to each masking image to obtain a loss value;
In this embodiment, the masking mean square error loss function is used to obtain the loss value between the image to be compared and the feature map corresponding to each mask image. It can be understood that, in the embodiment of the present application, the masking mean square error loss function computes as the loss value only the mean square error between the output features of the masked part and the corresponding part of the feature image; the loss value represents the difference between the image to be compared and the feature map corresponding to the mask image.
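A hedged sketch of the masking mean square error described here, assuming the feature maps are arranged as (channels, grid rows, grid columns) and averaging over the masked elements (the patent text does not state the exact normalization):

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Masking mean square error: only grid cells with mask == 1
    contribute. pred and target are (n_f, n_h, n_w) feature maps,
    mask is an (n_h, n_w) binary matrix."""
    masked_sq = mask[None, :, :] * (pred - target) ** 2
    n_masked = pred.shape[0] * mask.sum()
    return masked_sq.sum() / max(n_masked, 1)
```

With `pred == target` the loss is exactly zero, and unmasked positions never influence it, which is the property the text emphasizes.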
S134, determining whether the initial pre-training network model is converged or not according to the loss value; if so, determining the current initial pre-training network model as a pre-training network model after training; if not, performing iterative training until the initial pre-training network model converges to obtain the pre-training network model.
Specifically, when the loss value is within the first range, the initial pre-training network model is determined to have converged; when the loss value is not within the first range, the initial pre-training network model is determined not to have converged, back propagation is performed based on the loss value to update the weight parameters of the initial pre-training network model, and iterative training continues until the pre-training network model is obtained. The first range may be set by the user according to actual requirements or empirical values, as long as the purpose of the embodiment can be achieved.
It can be seen that, in the embodiment of the present application, each pre-training image is masked to obtain a corresponding mask image; the mask images serve as input data of the initial pre-training model, which outputs the image to be compared for each mask image. With each image to be compared as the prediction and the feature map as the target image, the masking mean square error loss function is computed, and the convergence of the current initial pre-training model is judged from the resulting loss value to obtain the final trained pre-training network model.
Further, the present embodiment provides a way of specifically calculating a single-channel feature, specifically, S120, including: s121 (not shown in the drawings), S122 (not shown in the drawings), wherein:
S121, performing grid division on each pre-training image based on a preset grid size to obtain a plurality of blocks;
the preset grid size is based onThe size of the pre-training image is set, specifically, the size can be set by a user self-definition, and can also be set according to the experience of the user. For example, when the pre-training image size is h × w, h is the height value of the pre-training image, w is the width value of the pre-training image, and if the predetermined grid size is hg×wgThen n can be dividedh×nwA grid of which And, the content in each grid is called a partition, and the partition size is hg×wg。
S122, calculating single-channel characteristics of each block of each pre-training image to obtain a characteristic diagram corresponding to each pre-training image;
In this embodiment, single-channel feature calculation is performed on each block; the number of features calculated per block can be set by the user according to the actual situation. Once the number of features per block is determined, the feature count of the feature map is also determined, namely the product of the number of blocks and the number of features per block; correspondingly, the number of neurons in the output layer of the first decoding network matches the feature count of the feature map.
Accordingly, S131 includes: s131-1 (not shown in the drawings), S131-2 (not shown in the drawings), wherein:
S131-1, generating a mask for each pre-training image based on the preset grid size;
the method for generating the mask is not limited, and the user can customize the setting. In particular, it may be for each pre-training imageGenerating a maskWherein,the binary matrix is a binary matrix, each matrix element in the binary matrix corresponds to a block, 1 represents covering, and 0 represents uncovering.
In particular, a random mask may be generated for each pre-training image based on the preset grid size. Specifically, when the training images include MRI images: in actual acquisition, parallel imaging and undersampling techniques are often adopted to accelerate the imaging speed of the magnetic resonance scanner, where the undersampling techniques mainly include Cartesian sampling modes such as uniform sampling and random traversal, non-Cartesian sampling modes such as spiral and radial, and active sampling. Besides generating the mask randomly, the mask generation scheme may be set according to these data acquisition modes.
S131-2, performing mask processing on each pre-training image according to the mask to generate a mask image corresponding to each pre-training image.
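Steps S131-1 and S131-2 might be sketched as follows. The uniform random masking ratio and the constant fill value are assumptions: the detailed scheme later in this document replaces the fill with a trainable patch, and the mask distribution may instead follow the acquisition pattern discussed above.

```python
import numpy as np

def random_block_mask(nh, nw, ratio=0.5, rng=None):
    """n_h x n_w binary matrix; 1 means the block is covered
    (hypothetical uniform random masking)."""
    rng = np.random.default_rng(rng)
    return (rng.random((nh, nw)) < ratio).astype(np.uint8)

def apply_mask(img, mask, hg, wg, fill=0.0):
    """Replace every covered hg x wg block by a constant `fill`
    (a stand-in for the trainable patch T used later)."""
    out = img.astype(float).copy()
    for k in range(mask.shape[0]):
        for l in range(mask.shape[1]):
            if mask[k, l]:
                out[k * hg:(k + 1) * hg, l * wg:(l + 1) * wg] = fill
    return out
```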
Therefore, in the embodiment of the application, each pre-training image is divided into a plurality of blocks based on the preset grid size, and single-channel feature calculation is performed per block to obtain the feature image corresponding to the pre-training image. This simplifies the calculation and avoids the computational load on the electronic equipment that computing features over the whole pre-training image at once would cause.
Further, the pre-training image is a pre-training two-dimensional medical image and/or a natural image; the pre-training two-dimensional medical image includes any one or more of: CT images and MRI images. Because two-dimensional medical images are difficult to collect, the pre-training images may also be natural images formed under natural light, so that a large number of pre-training images can be obtained for training and the model parameters of the trained pre-training network model are improved.
It can be understood that the model parameters of the pre-training network model provided in the embodiment of the present application may be used with two-dimensional medical images and may also be applied to natural images.
For application of two-dimensional medical images, please refer to fig. 3, fig. 3 is a schematic flowchart of a medical image processing method according to an embodiment of the present application, including:
s210, obtaining an initial image processing model to be trained, wherein the initial image processing model comprises a visual network to be trained and a second decoding network to be trained, and initial parameters of the visual network to be trained of the initial image processing model are parameters of the visual network of the pre-training network model; the pre-training network model is obtained according to a plurality of pre-training images and the characteristic images corresponding to the pre-training images; the structure of the visual network of the initial image processing model is the same as that of the visual network of the initial pre-training network model, but a second decoder of the initial image processing model is different from a first decoder of the initial pre-training network model, and the second decoder of the initial image processing model can be selected according to a specific image processing task.
S220, acquiring a plurality of two-dimensional medical images and corresponding labeling information;
the type of the two-dimensional medical image and the corresponding annotation information may be determined specifically according to the task of image processing. For example, when the image processing is object recognition of an MIR image, the type of the corresponding two-dimensional medical image is the MIR image, and the annotation information is specifically object information of the two-dimensional medical image.
The plurality of two-dimensional medical images may be acquired from local storage, crawled from a network, or input by a user. Moreover, image enhancement and affine and/or elastic transformation may be applied to the two-dimensional medical images; the embodiment does not limit this, and the user can configure it according to actual requirements.
S230, training the initial image processing model according to the plurality of two-dimensional medical images and the corresponding labeling information to obtain an image processing model;
Specifically, the plurality of two-dimensional medical images are input into the initial image processing model for image processing to obtain a plurality of two-dimensional medical images to be compared; a loss value of the initial image processing model is obtained from the two-dimensional medical images to be compared and the corresponding annotation information, and the initial image processing model is iteratively trained based on that loss value until the loss value reaches a preset loss threshold or a preset number of training iterations completes, at which point the current initial image processing model is taken as the trained image processing model.
Further, inputting the two-dimensional medical image into the initial image processing model for image processing may include: inputting the two-dimensional medical image into the visual network of the initial image processing model and outputting a training coding feature map; then inputting the training coding feature map into the decoder for decoding and outputting the two-dimensional medical image to be compared.
S240, acquiring the two-dimensional medical image to be processed, and performing image processing on the two-dimensional medical image to be processed by using the image processing model.
The two-dimensional medical image to be processed is the medical image actually to be processed; the obtained image processing model processes it accordingly and outputs the corresponding result, improving processing efficiency.
Specifically, in this embodiment of the present application, the performing image processing on the two-dimensional medical image to be processed by using the image processing model may include: inputting a two-dimensional medical image to be processed into a visual network of an image processing model to obtain a coding feature map; and inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result.
Further, inputting the coding feature map into a second decoder of the image processing model, and before obtaining the image processing result, the method may further include: inputting the coding feature map into a feature pyramid module for processing to obtain a processed coding feature map; correspondingly, the method for inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result comprises the following steps: and inputting the processed coding feature map into a decoder for decoding to obtain an image processing result.
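The inference steps above amount to a simple pipeline. Here `encoder`, `pyramid`, and `decoder` are assumed callables standing in for the visual network, the optional feature pyramid module, and the second decoder:

```python
def process_medical_image(image, encoder, decoder, pyramid=None):
    """Hypothetical wiring of the described inference steps: the visual
    network (encoder) produces the coding feature map, an optional
    feature pyramid module refines it, and the second decoder turns it
    into the image processing result."""
    features = encoder(image)          # coding feature map
    if pyramid is not None:
        features = pyramid(features)   # processed coding feature map
    return decoder(features)           # image processing result
```

The same wiring serves recognition, segmentation, and classification; only the decoder changes with the task.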
Therefore, the visual network parameters of the pre-trained network model are used as the initial parameters of the visual network to be trained for the image processing model training, the training process of the image processing model can be simplified, and the training efficiency is improved.
Further, the image processing includes any one of target recognition, image segmentation, and image classification.
For example, taking MRI images as an example, when performing object recognition, the medical image recognition method may include:
S1A, and a target recognition model training process. Specifically, an initial image processing model to be trained is obtained; acquiring a plurality of MRI images and corresponding target labeling information; and training the initial image processing model according to the plurality of MRI images and the corresponding target labeling information to obtain a target recognition model.
S2A, and carrying out an object recognition process. Specifically, acquiring an MRI image to be identified; inputting the MRI image to be identified into a target identification model for target identification, and outputting an identification result, wherein the step of inputting the MRI image to be identified into the target identification model for target identification comprises the following steps: inputting an MRI image to be identified into a visual network for coding to obtain a coding characteristic diagram; and inputting the coding feature map into a second decoder for decoding processing to obtain an identification result.
For another example, taking MRI images as an example, when performing image segmentation, the medical image processing method may include:
S1B, and carrying out a segmentation model training process. Specifically, an initial image processing model to be trained is obtained; acquiring a plurality of MRI images and corresponding segmentation marking information; and training the initial image processing model according to the plurality of MRI images and the corresponding segmentation marking information to obtain a segmentation model.
S2B, image segmentation process. Specifically, an MRI image to be segmented is acquired; inputting an MRI image to be segmented into a segmentation model for segmentation, and outputting a segmentation result, wherein the step of inputting the MRI image to be segmented into the segmentation model for segmentation comprises the following steps: inputting an MRI image to be segmented into a visual network for coding to obtain a coding characteristic diagram; and inputting the coding feature map into a second decoder for decoding processing to obtain a segmentation result.
For another example, taking MRI images as an example, when performing image classification, the medical image processing method may include:
S1C, and a classification model training process. Specifically, an initial image processing model to be trained is obtained; acquiring a plurality of MRI images and corresponding category marking information; and training the initial image processing model according to the plurality of MRI images and the corresponding class marking information to obtain a classification model.
S2C, classification process. Specifically, MRI images to be classified are acquired; the MRI images to be classified are input into the classification model for classification, and a classification result is output, wherein inputting the MRI images to be classified into the classification model for classification includes: inputting the MRI images to be classified into the visual network for coding to obtain a coding feature map; and inputting the coding feature map into the second decoder for decoding to obtain a classification result.
Referring to fig. 4, fig. 4 is a schematic diagram of a training process of an image processing model according to an embodiment of the present application, where:
first, data set collection
The embodiment of the application aims at deep learning neural network training of medical images, and the steps comprise pre-training and formal training, so that data of corresponding steps need to be collected.
First, a pre-training data set is collected. The data set may contain a large number of non-repeating two-dimensional medical images, preferably of the same modality as the formal training data, with no manual labeling required. Because two-dimensional medical images are difficult to collect, two-dimensional medical images of different modalities and/or natural images may also be collected. The pre-training data set comprises a plurality of pre-training images, specifically n_p images; each pre-training image needs to be scaled to a fixed h × w size and simply pre-processed.
Second, a formal training data set is collected. The formal training data set includes a plurality of two-dimensional medical images, and accurate manual annotations of corresponding tasks. And (4) zooming each two-dimensional medical image in the formal training data set to a fixed h multiplied by w size and performing simple preprocessing. It will be appreciated that in the case of segmentation tasks, it is also necessary to scale the annotation using nearest neighbor interpolation.
And II, pre-training.
Step one, grid division. First, the preset grid size h_g × w_g of a single grid is determined, so that each pre-training image can be divided into n_h × n_w grids, where n_h = h / h_g and n_w = w / w_g. Within a pre-training image, the content of one grid is called a block, and the block size is h_g × w_g.
Step two, Haar feature calculation. For each of the blocks of a pre-training image, n_f Haar features are calculated, yielding an n_f × n_h × n_w feature map Y^(i).
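As an illustration of per-block Haar features, a minimal assumed set of n_f = 3 rectangle-difference responses could be computed like this (the patent does not specify which Haar features are used):

```python
import numpy as np

def haar_features(block):
    """Three simple Haar-like responses on one block: left-right edge,
    top-bottom edge, and centre-surround. An assumed minimal n_f = 3
    set; real Haar feature banks are larger."""
    h, w = block.shape
    left_right = block[:, :w // 2].sum() - block[:, w // 2:].sum()
    top_bottom = block[:h // 2, :].sum() - block[h // 2:, :].sum()
    q = h // 4
    centre = block[q:h - q, q:w - q]
    centre_surround = 2 * centre.sum() - block.sum()
    return np.array([left_right, top_bottom, centre_surround], dtype=float)
```

Applying this to every block of an image yields the n_f × n_h × n_w feature map the step describes.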
Step three, pre-training preparation. One visual network, e.g. ResNet18 with its output layer removed, is selected as the model encoder, denoted E. After E there is a linearly activated fully connected layer D_0 as the decoder output layer; the number of neurons in D_0 is n_out = n_f × n_h × n_w.
Step four, pre-training. In the pre-training process, a random mask M^(i) is generated for each pre-training image, where M^(i) is an n_h × n_w binary matrix; each matrix element corresponds to a block, 1 indicates covering, and 0 indicates no covering. After M^(i) is generated, mask processing is applied to each image block whose mask value is 1: the block content is replaced with a matrix T, where T is a trainable parameter of size h_g × w_g (the "mask" in fig. 5, a pre-training schematic diagram provided in an embodiment of the present application). The image after masking is denoted J^(i). J^(i) is input into the initial pre-training network model to obtain an output vector, which is reshaped into an n_f × n_h × n_w tensor, denoted Y'^(i). The pre-training target vector is the result of step two, namely the feature map Y^(i). The pre-training loss function L^(i) is the masking mean square error between the network output Y'^(i) and the target vector Y^(i):

L^(i) = ( Σ_{j,k,l} M^(i)_{k,l} · (Y'^(i)_{j,k,l} − Y^(i)_{j,k,l})² ) / ( n_f · Σ_{k,l} M^(i)_{k,l} ),

where j is the channel index of the feature map, k is the pixel row index in the feature map and the mask, and l is the pixel column index in the feature map and the mask; the multiplication by M^(i)_{k,l} performs the element-by-element multiplication with the mask at corresponding positions, so that only the masked positions contribute to the loss.
According to this loss function, the back propagation operation is executed, and step four is repeated until the whole pre-training set has been traversed, which counts as one pre-training period; this continues until the pre-training network model is obtained.
Specifically, with reference to fig. 5: Haar features are calculated for the pre-training image (the original image) to obtain the target vector (feature map); a random mask is applied to the pre-training image, and the masked image is input as the input image into the initial pre-training network model comprising a visual network and a linear fully connected layer to obtain the output vector (the image to be compared); training based on the target vector, the output vector and the loss function then yields the pre-training network model.
And step five, storing parameters, and facilitating formal training calling.
Wherein, the pre-training network model comprises: a visual network parameter and a first decoder parameter. The visual network parameters of the pre-training network model can be used as initial parameters of the visual network to be trained by the image processing model.
Three, formal training
The parameters of the pre-trained model are loaded for formal training. A suitable second decoder, denoted D_1, is selected according to the image processing task, and the neural network forward propagation becomes D_1(E(J^(i))). Finally, formal training is performed based on the formal training data and the initial image processing model to obtain the image processing model.
Referring to fig. 6, fig. 6 is a block diagram of a structure of a training apparatus for pre-training a network model according to an embodiment of the present disclosure, including:
a pre-training information obtaining module 610, configured to obtain an initial pre-training network model and a plurality of pre-training images, where the initial pre-training network model includes a visual network to be trained and a first decoding network to be trained;
the feature calculation module 620 is configured to perform single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image;
a pre-training module 630, configured to train the initial pre-training network model according to each pre-training image and each corresponding feature image, to obtain a pre-training network model;
the visual network parameters of the pre-training network model can be used as initial parameters of a to-be-trained visual network trained by the image processing model, and the image processing model is used for performing image processing on the to-be-processed two-dimensional medical image.
In one possible implementation, the pre-training module includes:
the mask processing unit is used for performing mask processing on each pre-training image to generate a mask image corresponding to each pre-training image; the output unit is used for inputting each mask image into the initial pre-training network model and outputting the image to be compared corresponding to each mask image; the loss value calculation unit is used for calculating by using a masking mean square error loss function according to the image to be compared corresponding to each masking image and the characteristic images corresponding to each masking image to obtain a loss value;
the pre-training network model obtaining unit is used for determining whether the initial pre-training network model converges according to the loss value; if so, determining the current initial pre-training network model as a pre-training network model after training; and if not, performing iterative training until the initial pre-training network model converges to obtain the pre-training network model.
In one possible implementation, the feature calculation module includes:
the grid division unit is used for carrying out grid division on each pre-training image based on the preset grid size to obtain a plurality of blocks; the feature calculation unit is used for calculating the single-channel feature of each block of each pre-training image to obtain a feature map corresponding to each pre-training image;
accordingly, the mask processing unit includes:
a mask generation subunit, configured to generate a mask for each pre-training image based on a preset grid size;
and the mask processing subunit is used for performing mask processing on each pre-training image according to the mask to generate a mask image corresponding to each pre-training image.
In one possible implementation, the pre-training images are pre-training two-dimensional medical images and/or natural images; the pre-trained two-dimensional medical image comprises any one or more of: CT images, MRI images.
In one possible implementation, the feature calculation module includes:
the feature calculation unit is configured to perform target feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image, where the target feature includes: any one of haar feature, Gabor feature, LBP feature.
In the following, a medical image processing apparatus provided in an embodiment of the present application is described, and a medical image processing apparatus described below and a medical image processing method described above may be referred to correspondingly, where the medical image processing apparatus of the present embodiment is disposed in an electronic device, and with reference to fig. 7, fig. 7 is a block diagram of a medical image processing apparatus provided in an embodiment of the present application, and includes:
an initial image processing model obtaining module 710, configured to obtain an initial image processing model to be trained, where the initial image processing model includes a visual network to be trained and a second decoding network to be trained, and initial parameters of the visual network to be trained of the initial image processing model are parameters of a visual network of a pre-training network model; the pre-training network model is obtained according to a plurality of pre-training images and the characteristic images corresponding to the pre-training images;
a training set obtaining module 720, configured to obtain a plurality of two-dimensional medical images and corresponding labeling information;
an image processing model obtaining module 730, configured to train the initial image processing model according to the multiple two-dimensional medical images and the respective corresponding labeling information to obtain an image processing model;
and the image processing module 740 is configured to acquire a two-dimensional medical image to be processed, and perform image processing on the two-dimensional medical image to be processed by using the image processing model.
In one possible implementation, the image processing module 740, when performing image processing on the two-dimensional medical image to be processed using the image processing model, is configured to:
inputting a two-dimensional medical image to be processed into a visual network of an image processing model to obtain a coding feature map;
and inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result.
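The two-stage inference described above (input image, visual network, coding feature map, second decoder, result) can be sketched with toy stand-ins. All class and function names here are hypothetical placeholders, and the pooling and thresholding operations merely stand in for the real trained networks of the application.

```python
import numpy as np

class ToyVisualNetwork:
    """Stand-in encoder: downsamples the input into a feature map."""
    def __call__(self, image: np.ndarray) -> np.ndarray:
        h, w = image.shape
        # 2x2 average pooling as a placeholder for real encoding.
        return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

class ToySecondDecoder:
    """Stand-in decoder: maps the feature map to a per-pixel result."""
    def __call__(self, feature_map: np.ndarray) -> np.ndarray:
        # Thresholding as a placeholder for e.g. a segmentation head.
        return (feature_map > feature_map.mean()).astype(np.uint8)

def run_image_processing(image, visual_network, second_decoder):
    """Two-stage inference mirroring the text: the image is encoded
    into a coding feature map, which the second decoder turns into
    the image processing result."""
    coding_feature_map = visual_network(image)
    return second_decoder(coding_feature_map)
```

The point of the sketch is the data flow, not the operators: in the application, the visual network carries the pre-trained parameters and the second decoder is trained on the labeled two-dimensional medical images.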
In one possible implementation, the image processing includes any one of target recognition, image segmentation, and image classification.
In the following, an electronic device provided by an embodiment of the present application is introduced, and the electronic device described below and the method described above may be referred to correspondingly.
In an embodiment of the present application, an electronic device is provided. As shown in fig. 8, which is a structural block diagram of an electronic device provided in an embodiment of the present application, the electronic device 800 includes: a processor 801 and a memory 803, where the processor 801 is coupled to the memory 803, for example via a bus 802. Optionally, the electronic device 800 may further include a transceiver 804. It should be noted that, in practical applications, the number of transceivers 804 is not limited to one, and the structure of the electronic device 800 does not limit the embodiments of the present application.
The processor 801 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 801 may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 803 may be a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 803 is used for storing application program code for executing the present solution, and its execution is controlled by the processor 801. The processor 801 is configured to execute the application program code stored in the memory 803 to implement the training method of the pre-training network model shown in the foregoing method embodiments, or to implement the medical image processing method shown in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.
The following describes a computer-readable storage medium provided by embodiments of the present application, and the computer-readable storage medium described below and the method described above may be referred to correspondingly.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the training method of the pre-training network model shown in the foregoing method embodiments, or implements the medical image processing method shown in the foregoing method embodiments.
Since the embodiment of the computer-readable storage medium portion and the embodiment of the method portion correspond to each other, please refer to the description of the embodiment of the method portion for the embodiment of the computer-readable storage medium portion, which is not repeated here.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and whose execution order is not necessarily sequential, but may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.
Claims (10)
1. A training method of a pre-training network model, characterized by comprising:
acquiring an initial pre-training network model and a plurality of pre-training images, wherein the initial pre-training network model comprises a visual network to be trained and a first decoding network to be trained;
performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image;
training the initial pre-training network model according to each pre-training image and the corresponding characteristic image to obtain the pre-training network model;
wherein visual network parameters of the pre-training network model can be used as initial parameters of a to-be-trained visual network of an image processing model, and the image processing model is used for performing image processing on a to-be-processed two-dimensional medical image.
2. The method according to claim 1, wherein the training of the initial pre-trained network model according to each pre-trained image and its corresponding feature image to obtain the pre-trained network model comprises:
carrying out mask processing on each pre-training image to generate a mask image corresponding to each pre-training image;
inputting each mask image into the initial pre-training network model, and outputting an image to be compared corresponding to each mask image;
calculating a loss value by using a masked mean square error loss function according to the to-be-compared image corresponding to each mask image and the feature image corresponding to each mask image;
determining whether the initial pre-training network model converges according to the loss value; if so, determining the current initial pre-training network model as the trained pre-training network model; if not, performing iterative training until the initial pre-training network model converges, to obtain the pre-training network model.
3. The method for training the pre-training network model according to claim 2, wherein the performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image comprises:
performing mesh division on each pre-training image based on a preset mesh size to obtain a plurality of blocks;
calculating a single-channel feature for each block of each pre-training image to obtain the feature image corresponding to each pre-training image;
correspondingly, the masking each pre-training image to generate a mask map corresponding to each pre-training image includes:
generating a mask for each of the pre-training images based on a preset grid size;
and carrying out mask processing on each pre-training image according to the mask to generate a mask image corresponding to each pre-training image.
4. The training method of the pre-trained network model according to claim 1, wherein the pre-trained image is a pre-trained two-dimensional medical image and/or a natural image;
the pre-trained two-dimensional medical image comprises any one or more of: CT images, MRI images.
5. The training method of the pre-training network model according to any one of claims 1 to 4, wherein the performing single-channel feature calculation on each pre-training image to obtain a feature image corresponding to each pre-training image comprises:
performing target feature calculation on each pre-training image to obtain the feature image corresponding to each pre-training image, wherein the target feature comprises any one of: a Haar feature, a Gabor feature, or an LBP feature.
6. A medical image processing method, characterized by comprising:
acquiring an initial image processing model to be trained, wherein the initial image processing model comprises a visual network to be trained and a second decoding network to be trained, and initial parameters of the visual network to be trained of the initial image processing model are parameters of the visual network of the pre-trained network model; the pre-training network model is obtained according to a plurality of pre-training images and the characteristic images corresponding to the pre-training images;
acquiring a plurality of two-dimensional medical images and corresponding labeling information;
training the initial image processing model according to a plurality of two-dimensional medical images and the corresponding labeling information to obtain the image processing model;
and acquiring a two-dimensional medical image to be processed, and performing image processing on the two-dimensional medical image to be processed by using the image processing model.
7. The medical image processing method according to claim 6, wherein the image processing the two-dimensional medical image to be processed using the image processing model comprises:
inputting the two-dimensional medical image to be processed into a visual network of the image processing model to obtain a coding feature map;
and inputting the coding feature map into a second decoder of the image processing model to obtain an image processing result.
8. The medical image processing method according to claim 6, wherein the image processing includes any one of object recognition, image segmentation, and image classification.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to: perform the training method of the pre-training network model according to any one of claims 1 to 5, or perform the medical image processing method according to any one of claims 6 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, implements the training method of the pre-trained network model according to any one of claims 1 to 5 or the medical image processing method according to any one of claims 6 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210239706.8A CN114782768A (en) | 2022-03-12 | 2022-03-12 | Training method of pre-training network model, medical image processing method and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114782768A true CN114782768A (en) | 2022-07-22 |
Family
ID=82423289
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210239706.8A Pending CN114782768A (en) | 2022-03-12 | 2022-03-12 | Training method of pre-training network model, medical image processing method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114782768A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024148607A1 (en) * | 2023-01-13 | 2024-07-18 | 深圳先进技术研究院 | Neural network training method, medical image processing method, electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11798132B2 (en) | Image inpainting method and apparatus, computer device, and storage medium | |
US20200160124A1 (en) | Fine-grained image recognition | |
CN111476719B (en) | Image processing method, device, computer equipment and storage medium | |
An et al. | Medical image segmentation algorithm based on multilayer boundary perception-self attention deep learning model | |
EP4404148A1 (en) | Image processing method and apparatus, and computer-readable storage medium | |
US20220375211A1 (en) | Multi-layer perceptron-based computer vision neural networks | |
CN113326851B (en) | Image feature extraction method and device, electronic equipment and storage medium | |
CN111507403B (en) | Image classification method, apparatus, computer device and storage medium | |
Tang et al. | Point scene understanding via disentangled instance mesh reconstruction | |
CN111915555B (en) | 3D network model pre-training method, system, terminal and storage medium | |
CN113569891A (en) | Training data processing device, electronic equipment and storage medium of neural network model | |
CN117437423A (en) | Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement | |
Lim et al. | A convolutional decoder for point clouds using adaptive instance normalization | |
Rajagopal et al. | A hybrid Cycle GAN-based lightweight road perception pipeline for road dataset generation for Urban mobility | |
CN114782768A (en) | Training method of pre-training network model, medical image processing method and equipment | |
CN113139617B (en) | Power transmission line autonomous positioning method and device and terminal equipment | |
CN113807354B (en) | Image semantic segmentation method, device, equipment and storage medium | |
Shi et al. | Multi‐similarity based hyperrelation network for few‐shot segmentation | |
CN111914949B (en) | Zero sample learning model training method and device based on reinforcement learning | |
CN116051984B (en) | Weak and small target detection method based on Transformer | |
CN111582449A (en) | Training method, device, equipment and storage medium for target domain detection network | |
CN117746018A (en) | Customized intention understanding method and system for plane scanning image | |
CN116342887A (en) | Method, apparatus, device and storage medium for image segmentation | |
CN116934591A (en) | Image stitching method, device and equipment for multi-scale feature extraction and storage medium | |
CN115439713A (en) | Model training method and device, image segmentation method, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||