WO2023109651A1 - Method for training a model, image processing method, apparatus, and electronic device - Google Patents

Method for training a model, image processing method, apparatus, and electronic device

Info

Publication number
WO2023109651A1
WO2023109651A1 (application PCT/CN2022/137661, CN2022137661W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
processed
target
human tissues
Prior art date
Application number
PCT/CN2022/137661
Other languages
English (en)
French (fr)
Inventor
申浩
刘周
胡战利
何品
任雅
罗德红
梁栋
刘新
郑海荣
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2023109651A1 publication Critical patent/WO2023109651A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • The present application belongs to the field of image processing, and in particular relates to a method for training a model, an image processing method, an apparatus, and an electronic device.
  • The embodiments of the present application provide a method for training a model, an image processing method, an apparatus, and an electronic device, which can automatically segment the distribution areas of human tissues such as muscle and fat and improve accuracy.
  • An embodiment of the present application provides a method for training a model, the method comprising:
  • performing convolution processing on a first image using multiple target weights through the current convolution layer of a preset model to obtain a second image, where the first image includes multiple human tissues and the second image is marked with the areas occupied by the multiple human tissues in the second image;
  • the multiple target weights correspond one-to-one to the multiple human tissues, and some of the multiple target weights are greater than the corresponding preset weights;
  • the preset weights are the weights used by the previous convolution layer when performing convolution processing;
  • performing model training according to the second image and a third image to obtain a target model, where the third image is obtained after manual processing and is marked with the areas occupied by the multiple human tissues in the third image.
  • In particular, the target weights of specified types of human tissue among the multiple human tissues are greater than the corresponding preset weights.
  • The specified types of human tissue include abdominal fat and abdominal muscle.
  • The preset model uses the Unet model as its base model.
  • The current convolution layer includes a first convolution module and a second convolution module, the first convolution module being located before the second convolution module; the first convolution module is used to extract information of the multiple human tissues in the first image, and the second convolution module is used to adjust the preset weights to obtain the target weights.
  • An embodiment of the present application provides an apparatus for training a model, which includes:
  • a processing unit configured to perform convolution processing on a first image using multiple target weights through the current convolution layer of a preset model to obtain a second image, where the first image includes multiple human tissues, the second image is marked with the areas occupied by the multiple human tissues in the second image, the multiple target weights correspond one-to-one to the multiple human tissues, some of the multiple target weights are greater than the corresponding preset weights, and the preset weights are the weights used by the previous convolution layer during convolution processing; the processing unit is further configured to perform model training according to the second image and a third image to obtain a target model, where the third image is obtained after manual processing and is marked with the areas occupied by the multiple human tissues in the third image.
  • An embodiment of the present application provides an image processing method, the method comprising:
  • acquiring an image to be processed, the image to be processed including multiple human tissues; and inputting the image to be processed into a target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed, where the target model is the target model obtained in the aforementioned first aspect.
  • An embodiment of the present application provides an image processing apparatus, which includes:
  • an acquisition module configured to acquire an image to be processed, the image to be processed including multiple human tissues;
  • a processing module configured to input the image to be processed into a target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed, where the target model is the target model obtained in the aforementioned first aspect.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the image processing method described in the third aspect when executing the computer program.
  • An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the method for training a model described in the first aspect, or the image processing method described in the third aspect.
  • An embodiment of the present application further provides a computer program product including a computer program which, when the computer program product is run on a computer, implements the method for training a model described in the first aspect, or the image processing method described in the third aspect.
  • The embodiments of the present application provide a method for training a model and an image processing method.
  • During model training, an attention mechanism is added so that some target weights of the current convolutional layer are greater than the preset weights of the previous convolutional layer, thereby improving the processing effect of the target model.
  • During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of multiple human tissues in CT images.
  • The method provided in the embodiments of the present application can segment multiple human tissues in one pass, improving accuracy.
  • FIG. 1 is a schematic flowchart of a method for training a model provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a preset model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a second convolution module provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
  • Part (a) of FIG. 6 is a schematic diagram of an original image provided by an embodiment of the present application.
  • Part (b) of FIG. 6 is a schematic diagram of an annotated image provided by an embodiment of the present application.
  • Part (c) of FIG. 6 is a schematic diagram of an output image provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features.
  • "Plural" means two or more, and "at least one" and "one or more" mean one, two, or more.
  • Reference to "one embodiment" or "some embodiments" and the like in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • The phrases "in one embodiment," "in some embodiments," "in some other embodiments," "in still other embodiments," and the like appearing in various places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments," unless specifically stated otherwise.
  • The terms "including," "comprising," "having," and their variants mean "including but not limited to," unless specifically stated otherwise.
  • Computed tomography (CT) obtains tomographic images by passing X-rays through body substances with different attenuation coefficients.
  • The tomographic images can clearly show the distribution areas of tissues such as human muscle and fat.
  • By analyzing body composition, the nutritional state of the human body can be evaluated, and diseases such as obesity, metabolic disease, cardiovascular disease, hypertension, and cancer can be diagnosed.
  • The embodiments of the present application provide a method for training a model and an image processing method; by training the model, end-to-end automatic segmentation of human tissues such as muscle and fat is realized.
  • During model training, an attention mechanism is added to re-calibrate the features (human tissues) in the form of weights, so that some target weights of the current convolutional layer are greater than the preset weights of the previous convolutional layer, thereby improving the processing effect of the target model.
  • During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of multiple human tissues in CT images and improving accuracy.
  • The method provided in the embodiments of the present application can automatically segment multiple human tissues in one pass and has the advantages of a high Dice coefficient, fast convergence, strong generalization, and high algorithmic robustness.
  • FIG. 1 shows a method 100 for training a model provided by an embodiment of the present application.
  • The method 100 includes at least the following steps:
  • S101: Acquire a training set. Multiple original images and their corresponding annotated images are acquired, for example, abdominal CT images at the level of the third lumbar vertebra. An original image is a CT image scanned by an instrument, which may also be called a CT slice, and includes multiple human tissues such as subcutaneous fat, abdominal muscle, visceral fat, and other tissues.
  • The annotated image marks the areas occupied by subcutaneous fat, abdominal muscle, visceral fat, and other tissues in the original image; it can be manually annotated by a hospital physician.
  • The annotated image corresponds to the third image.
  • CT images use CT values to represent the absorption coefficient of each tissue and organ of the human body.
  • Hounsfield Units (HU) are the unit of CT values; the values on a CT image indicate whether the imaged parts are normal or have lesions.
  • Abdominal muscle can be calibrated using a first predetermined threshold of -29 HU to +150 HU, and abdominal adipose tissue can be calibrated using a second predetermined threshold of -190 HU to -30 HU.
  • Physicians can annotate the original image according to the pixel values in the CT image, as sketched below.
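For illustration only, here is a minimal sketch of how these HU windows could seed coarse tissue masks; the function name is hypothetical, and in the workflow described here a physician would still refine the annotation by hand:

```python
import numpy as np

def threshold_masks(ct_hu: np.ndarray):
    """Coarse tissue masks from the HU windows quoted above.

    ct_hu: 2-D array of Hounsfield Unit values for one CT slice.
    Returns boolean masks that a physician would still refine manually.
    """
    muscle = (ct_hu >= -29) & (ct_hu <= 150)   # first predetermined threshold
    fat = (ct_hu >= -190) & (ct_hu <= -30)     # second predetermined threshold
    return muscle, fat
```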
  • The original images are preprocessed before being fed into the preset model: values below -128 HU are set to -128 and values above 150 HU are set to 150, to reduce the amount of information input into the network model.
  • Because CT images may be scanned by different instruments, z-score normalization is performed on the original images to reduce the gap between instruments:

$$\hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \varepsilon}}$$

  • where x_i is the HU value of each pixel, μ_i = E(x_i), σ_i² is the corresponding variance, and ε is a constant. After processing, the HU values of the pixels in the original image have a mean of 0 and a standard deviation of 1.
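A minimal sketch of this preprocessing, assuming per-image statistics and ε = 1e-8 (the text does not give the value of the constant):

```python
import numpy as np

def preprocess(ct_hu: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Clip HU values to [-128, 150], then z-score normalize as described above."""
    x = np.clip(ct_hu.astype(np.float32), -128.0, 150.0)
    return (x - x.mean()) / np.sqrt(x.var() + eps)  # mean 0, std 1 afterwards
```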
  • The annotated image is a single-channel grayscale image.
  • During model training, the single-channel grayscale image can be divided into a multi-channel image according to human tissue.
  • For example, using one-hot encoding, the single-channel grayscale image is encoded into four channel images according to subcutaneous fat, abdominal muscle, visceral fat, and other tissues.
  • In the channel image corresponding to subcutaneous fat, the pixels at subcutaneous fat locations are set to 1 and the pixels of other parts are set to 0;
  • in the channel image corresponding to abdominal muscle, the abdominal muscle locations are set to 1 and other parts are set to 0;
  • in the channel image corresponding to visceral fat, the visceral fat locations are set to 1 and other parts are set to 0;
  • in the channel image corresponding to other tissues, the locations of other tissues are set to 1 and other parts are set to 0. The image output by the model can then be a four-channel image.
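A sketch of this encoding; the class-index assignment (0 = subcutaneous fat, 1 = abdominal muscle, 2 = visceral fat, 3 = other tissues) is an assumption for illustration:

```python
import numpy as np

def one_hot_labels(label_map: np.ndarray, num_classes: int = 4) -> np.ndarray:
    """Turn a single-channel label map (H, W) into a (num_classes, H, W) stack
    of binary masks, one channel per tissue as described above."""
    channels = [(label_map == c).astype(np.float32) for c in range(num_classes)]
    return np.stack(channels, axis=0)
```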
  • All acquired original images are divided into a training set and a validation set; for example, out of a total of 545 original images, 436 are used as the training set and 109 as the validation set.
  • The number of original images in the training set is increased by applying data augmentation such as random rotation, random horizontal flipping, and random vertical flipping to the original images (see the sketch below).
  • The original images in the training set are used as the model input; their size can be determined according to the actual situation, for example, 512×512 pixels per image.
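One possible augmentation pipeline for the 512×512 training slices; the rotation range is not specified in the text, so 15 degrees below is an assumption:

```python
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomRotation(degrees=15),   # random rotation
    T.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    T.RandomVerticalFlip(p=0.5),    # random vertical flip
])
```

In practice the same geometric transform must be applied jointly to an image and its label map so that the two stay aligned.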
  • S102: Construct a preset model. The Unet model is used as the base model, and the preset model is constructed in combination with an attention mechanism.
  • FIG. 2 shows a schematic structural diagram of a preset model provided by an embodiment of the present application.
  • The preset model has a depth of 5, comprising five convolutional layers from top to bottom in a U-shaped structure.
  • The preset model includes two parts, encoding and decoding. Encoding is performed on the left side of the U shape to obtain information about human tissues, reduce the spatial dimensions of the image, and increase the number of channels. Decoding is performed on the right side of the U shape to precisely localize human tissues.
  • Each convolutional layer is provided with a first convolution module (ResConv) and a second convolution module (SECA); the first convolution module is used to extract the information of multiple human tissues in the input image, and the second convolution module is used to determine the weight corresponding to each human tissue and to increase or decrease that weight.
  • The numbers of first and second convolution modules can be set as needed, and this embodiment of the present application imposes no limitation on this.
  • In FIG. 2, each convolutional layer is provided with one first convolution module and one second convolution module in the encoding stage, and one first convolution module and one second convolution module in the decoding stage.
  • In the encoding stage, the images in the training set (the first image) are input to the first convolutional layer, whose first convolution module increases the channels of the input image to obtain multiple channel maps, which is equivalent to amplifying the features of the input image, and extracts the information of the multiple human tissues.
  • Here, the single-channel input image 1×512×512 is converted into an 8-channel image 8×512×512, and the preset weights are set manually. The second convolution module then adjusts the weight of each human tissue in the 8-channel image 8×512×512 to obtain the target weights, for example, increasing the weights of subcutaneous fat, abdominal muscle, and visceral fat and decreasing the weights of other tissues.
  • In the figure, the numbers above a module (e.g., 1, 8, 16) indicate the number of channels of the feature map, and the numbers beside a module (e.g., 512×512) indicate the size of the feature map.
  • The feature information output by the second convolution module of the first convolutional layer (the 8-channel image 8×512×512) is downsampled by convolution to obtain an 8×256×256 image, which is then input into the first convolution module of the second convolutional layer to further increase the channels and extract feature information, yielding a 16-channel image 16×256×256.
  • The weights used by the second convolutional layer are the weights output by the second convolution module of the first convolutional layer (the target weights of the current convolution layer serve as the preset weights of the next convolution layer).
  • The second convolution module of the second convolutional layer then continues to adjust the weights of each human tissue. It should be understood that downsampling by convolution reduces the image size and captures more local information of the image.
  • The feature information output by the second convolution module of the second convolutional layer is downsampled by convolution to 128×128 and then input into the first convolution module of the third convolutional layer, which increases the channels to 32 and extracts feature information.
  • The feature information output by the second convolution module of the third convolutional layer is further downsampled by convolution and then input into the first convolution module of the fourth convolutional layer, which increases the channels to 64 and extracts feature information.
  • The feature information output by the second convolution module of the fourth convolutional layer is further downsampled by convolution and then input into the first convolution module of the fifth convolutional layer, which increases the channels to 128 and extracts feature information; at this point the image is 128×32×32. Adding convolutional layers makes the convolution deeper, so the extracted features are higher-dimensional and more abstract and more functions are fitted, which ultimately improves the image result.
  • After each feature extraction by a first convolution module, the second convolution module adjusts the weight of each human tissue.
  • In the first and second convolutional layers, the kernel size and padding of the first convolution module are 7 and 3, respectively; in the remaining convolutional layers, the kernel size of the first convolution module is 3 and the padding is 1, and the stride of all convolution modules is 1.
  • Downsampling can be accomplished in various ways, such as max pooling (MaxPooling).
  • In the decoding stage, the feature information output by the second convolution module of the fifth convolutional layer is upsampled by deconvolution and then concatenated, via a skip connection (Skip-Connection), with the feature information extracted during downsampling by the second convolution module of the fourth convolutional layer.
  • The concatenated feature information is passed sequentially to the first convolution module and the second convolution module: the first convolution module in the upsampling path reduces the channels while extracting features, and the second convolution module likewise adjusts the weights.
  • The upsampling operation is used to enlarge the size of the image features and improve the image resolution; for example, it may use interpolation, i.e., inserting new elements between the original image pixels using a suitable interpolation algorithm.
  • The fourth convolutional layer passes its output to the third convolutional layer in turn, until the first convolutional layer is reached, where a 1×1 convolution (1×1 Conv2D) performs channel adjustment to obtain an output image of the same size as the input image; in other words, a 4-channel grayscale image of 4×512×512 is restored as the output image.
  • Upsampling uses transposed convolution (UpConv) to restore spatial resolution.
  • The kernel size, stride, and padding in the upsampling process are the same as the corresponding ones in the downsampling process. Since the numbers of downsampling and upsampling operations are equal, the output image obtained has the same size as the input image.
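To make the architecture walk-through above concrete, here is a heavily simplified PyTorch sketch of such a five-layer U-shaped network. The channel counts (8/16/32/64/128), the kernel-size/padding choices, stride-2 convolutional downsampling, transposed-convolution upsampling, and the 1×1 output convolution follow the text; the `ResConv` internals and the `SECA` class below (a plain SE-style gate) are assumptions for illustration and not the full module of FIG. 3:

```python
import torch
import torch.nn as nn

class SECA(nn.Module):
    """Placeholder channel attention (SE-style); FIG. 3 is sketched separately below."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        h = max(c // r, 1)
        self.fc = nn.Sequential(nn.Linear(c, h), nn.ReLU(inplace=True),
                                nn.Linear(h, c), nn.Sigmoid())
    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average pool
        return x * w[:, :, None, None]           # re-weight each channel

class ResConv(nn.Module):
    """Stand-in for the first convolution module (conv + BN + ReLU)."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, p: int = 1):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=p),
                                   nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.block(x)

class UNetLike(nn.Module):
    """Five-layer U-shaped encoder-decoder with the channel counts of FIG. 2."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        chs = [8, 16, 32, 64, 128]
        self.enc_conv, self.enc_att = nn.ModuleList(), nn.ModuleList()
        c_prev = 1
        for i, c in enumerate(chs):
            k, p = (7, 3) if i < 2 else (3, 1)   # kernel/padding per the text
            self.enc_conv.append(ResConv(c_prev, c, k, p))
            self.enc_att.append(SECA(c))
            c_prev = c
        # stride-2 convolutions for downsampling between layers
        self.down = nn.ModuleList([nn.Conv2d(c, c, 3, stride=2, padding=1)
                                   for c in chs[:-1]])
        # transposed convolutions (UpConv) for upsampling
        self.up = nn.ModuleList([nn.ConvTranspose2d(chs[i], chs[i - 1], 2, stride=2)
                                 for i in range(4, 0, -1)])
        self.dec_conv = nn.ModuleList([ResConv(2 * chs[i - 1], chs[i - 1])
                                       for i in range(4, 0, -1)])
        self.dec_att = nn.ModuleList([SECA(chs[i - 1]) for i in range(4, 0, -1)])
        self.head = nn.Conv2d(chs[0], n_classes, kernel_size=1)  # 1x1 Conv2D

    def forward(self, x):
        skips = []
        for i in range(5):
            x = self.enc_att[i](self.enc_conv[i](x))
            if i < 4:
                skips.append(x)
                x = self.down[i](x)
        for i in range(4):
            x = self.up[i](x)
            x = torch.cat([x, skips.pop()], dim=1)  # skip connection
            x = self.dec_att[i](self.dec_conv[i](x))
        return self.head(x)
```

Calling `UNetLike()(torch.randn(1, 1, 512, 512))` returns a `(1, 4, 512, 512)` tensor, matching the 4-channel, input-sized output described above.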
  • FIG. 3 shows a schematic structural diagram of the second convolution module in FIG. 2.
  • The second convolution module is a channel attention module, which adds an attention mechanism to the preset model: the features extracted by the first convolution module are sent to the channel attention module, which treats the features of each channel differently, so that the preset model pays more attention to important features, suppresses other irrelevant features, and gives important features greater weight during processing.
  • The feature map input to the second convolution module is convolved with kernels of different sizes to extract features from the input feature map; the differing parts are strengthened and the other parts are weakened.
  • For example, two convolutions are applied to a feature map of size H×W×C to obtain images U_1 and U_2, using kernel sizes of 3 and 5, i.e., kernel 3×3 and kernel 5×5.
  • The images obtained by the two convolutions are added (element-wise sum) to obtain the image U, and a squeeze operation (Squeeze) is then performed on U: from the C×H×W feature map, a C×1×1 channel-level vector S is obtained, which compresses each spatial map into a single real number.
  • Squeeze can be implemented as global average pooling (Fgp), which averages all pixel values of a feature map into one value that represents the feature map; this reduces the number of parameters, the amount of computation, and overfitting.
  • One-dimensional convolution (Conv1D) is then used to avoid dimensionality reduction, and a vector Z is obtained by learning channel attention; after two fully connected layers (F_fc) and a softmax activation function, the channel attention vectors are output, i.e., two matrices α and β. α and β are multiplied element-wise with the feature maps U_1 and U_2 obtained by convolving the original feature map, and the results are summed to output the final vector V, which corresponds to the weight-adjusted multi-channel image output by the second convolution module.
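A sketch of this module following the FIG. 3 description (two branches with 3×3 and 5×5 kernels, element-wise sum, global-average-pool squeeze, a Conv1D that avoids dimensionality reduction, two fully connected layers, and a softmax producing the branch weights α and β); the Conv1d kernel size and the exact shapes of the fully connected layers are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SECAModule(nn.Module):
    """Selective-kernel-style channel attention, per the FIG. 3 description."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # kernel 3x3 branch -> U1
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)  # kernel 5x5 branch -> U2
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2)  # avoids dim. reduction
        self.fc = nn.ModuleList([nn.Linear(channels, channels) for _ in range(2)])

    def forward(self, x):
        u1, u2 = self.conv3(x), self.conv5(x)
        u = u1 + u2                                   # element-wise sum -> U
        s = u.mean(dim=(2, 3))                        # squeeze (global average pooling) -> S
        z = self.conv1d(s.unsqueeze(1)).squeeze(1)    # 1-D conv across channels -> Z
        logits = torch.stack([fc(z) for fc in self.fc], dim=-1)  # two FC layers
        attn = F.softmax(logits, dim=-1)              # softmax over the two branches
        a = attn[..., 0][:, :, None, None]            # matrix alpha
        b = attn[..., 1][:, :, None, None]            # matrix beta
        return a * u1 + b * u2                        # weighted sum -> V
```

The softmax couples the two branch weights so that, per channel, α + β = 1, which is what lets the module strengthen the differing parts contributed by one kernel size while weakening the other.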
  • S103: Perform convolution processing on the first image using multiple target weights through the current convolution layer of the preset model to obtain a second image, and perform model training according to the second image and the third image to obtain a target model.
  • In the first convolutional layer, the first image is an image from the training set: the first image is input, and the second image obtained after convolution processing by the first convolutional layer is output.
  • The second convolutional layer takes the second image output by the first convolutional layer as its first image and performs convolution processing to obtain the second image output by the second convolutional layer. The other convolutional layers work in the same way.
  • After the output image is obtained through the preset model, it is compared with the third image; the difference between the two, i.e., the loss value, is calculated, and the optimization direction of the preset model is determined according to the loss value.
  • This embodiment of the application can use the cross-entropy loss function to calculate the loss value. For a given set, let y_i denote the predicted label of the image and ŷ_i the true label, let M be the number of classes and N the total number of samples; the loss function is then

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{i,c}\,\log\left(p_{i,c}\right)$$

  • where y_{i,c} is an indicator function (0 or 1) that takes 1 if the true class of pixel i equals c and 0 otherwise, and p_{i,c} is the probability that the i-th pixel is predicted to be of class c.
  • The Adam optimizer can be used to optimize the loss function during training, and the cosine annealing algorithm can be used to decay the learning rate. It should be understood that other methods can also be used to optimize the loss function and the learning rate; this embodiment of the present application does not limit this.
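A sketch of this optimization setup; the learning rate, epoch count, and T_max are assumptions, `train_loader` is a hypothetical DataLoader of (image, label) pairs, and `UNetLike` refers to the architecture sketch above:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

model = UNetLike()                       # any preset model with a 4-class output
criterion = torch.nn.CrossEntropyLoss()  # the cross-entropy loss above
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=100)  # cosine-annealed decay

for epoch in range(100):
    for images, labels in train_loader:  # labels: (B, H, W) per-pixel class indices
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that `CrossEntropyLoss` expects per-pixel class indices; if the one-hot masks from the earlier sketch are used as labels, they would first be collapsed with an `argmax` over the channel dimension.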
  • In another implementation, five-fold cross-validation can be used to divide all the acquired original images, that is, all acquired original images are divided evenly into 5 equal parts for training.
  • One part is used as the training set and the other 4 parts as the validation set, with the same training method as the embodiment shown in FIG. 1; five models can thus be trained, yielding 5 trained models.
  • These 5 trained models are combined using a certain data transformation, such as summation, to obtain the target model; a target model trained in this way can alleviate overfitting in the case of sparse annotation.
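A sketch of combining the five trained models; `models` is a hypothetical list of the five networks, and summing their outputs is one way to read the "data transformation such as summation" mentioned above:

```python
import torch

@torch.no_grad()
def ensemble_predict(models, image):
    """Sum the five models' logits and take the per-pixel argmax."""
    logits = sum(m(image) for m in models)
    return logits.argmax(dim=1)  # (B, H, W) tissue class map
```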
  • FIG. 4 shows an image processing method 400 provided by an embodiment of the present application.
  • The method 400 includes at least the following steps:
  • S401: Acquire an image to be processed, the image to be processed including multiple human tissues.
  • S402: Input the image to be processed into the target model to obtain a processed image, the processed image being marked with the areas occupied by multiple human tissues in the image to be processed.
  • That is, CT images taken in medical practice are input into the target model to obtain processed images.
  • The processed images are marked with the areas occupied by multiple human tissues in the images to be processed, that is, the images are segmented; physicians can then perform body composition analysis on the segmented images.
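A minimal end-to-end inference sketch for method 400, reusing the `preprocess` sketch above; the model argument and the class ordering in the comment are hypothetical:

```python
import torch

@torch.no_grad()
def segment(ct_slice_hu, model):
    """Run one CT slice through the target model and return a label map."""
    x = torch.from_numpy(preprocess(ct_slice_hu))[None, None]  # (1, 1, H, W)
    return model(x).argmax(dim=1)[0]  # (H, W): e.g. 0=subcut. fat, 1=muscle, ...
```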
  • The original images above may be contrast-enhanced CT images or plain-scan CT images.
  • A target model trained on enhanced CT images can also process plain-scan CT images, which enhances the generalization of the network model.
  • In addition to CT image segmentation, the method provided in the embodiments of the present application can, after appropriate adaptation, also be applied to other types of image segmentation such as positron emission tomography (Positron Emission Computed Tomography, PET) and magnetic resonance imaging (Magnetic Resonance Imaging, MRI).
  • The embodiments of the present application provide a network model training method and an image processing method. As shown in FIG. 5, a training set and a test set are acquired, and medical experts annotate each image in the training set and the test set to obtain the labels used during model training; data augmentation is used to increase the number of images in the training set, and after normalization and cross-validation, the training set is used to train the model and obtain the target model.
  • During model training, an attention mechanism is added to explore the diversity of feature information, so that the network pays more attention to important features and suppresses the weights of other features, thereby improving the processing effect of the target model.
  • To test the target model, the test set is input into the target model to obtain segmentation results, and the segmentation results are compared with the labels of the test set to analyze the performance of the target model.
  • During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of each human tissue in CT images.
  • The method provided in the embodiments of the present application can segment multiple human tissues in one pass, improving accuracy.
  • The embodiments of the present application can obtain good plain-scan CT segmentation results without training on plain-scan CT images.
  • The method provided by the embodiments of the present application effectively improves model performance, accelerates model convergence, and enhances model generalization.
  • FIG. 7 shows an image processing device 700 provided by an embodiment of the present application.
  • The device 700 includes an acquisition module 701 and a processing module 702.
  • The acquisition module 701 is configured to acquire an image to be processed, the image to be processed including a plurality of human tissues.
  • The processing module 702 is configured to input the image to be processed into the target model to obtain a processed image, the processed image being marked with the areas occupied by multiple human tissues in the image to be processed.
  • The device 700 of the embodiments of the present application can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The method shown in FIG. 4 can also be implemented by software; in that case, the device 700 and its modules can also be software modules.
  • FIG. 8 is a schematic structural diagram of an electronic device 800 provided by an embodiment of the present application.
  • the device 800 includes a processor 801 , a memory 802 , a communication interface 803 and a bus 804 .
  • the processor 801, the memory 802, and the communication interface 803 communicate through the bus 804, or communicate through other means such as wireless transmission.
  • the memory 802 is used to store instructions, and the processor 801 is used to execute the instructions stored in the memory 802 .
  • the memory 802 stores a program code 8021, and the processor 801 can call the program code 8021 stored in the memory 802 to execute the image processing method shown in FIG. 4 .
  • The processor 801 may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the memory 802 may include read-only memory and random-access memory, and provides instructions and data to the processor 801 .
  • Memory 802 may also include non-volatile random access memory.
  • the memory 802 can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
  • The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • In addition to a data bus, the bus 804 may also include a power bus, a control bus, a status signal bus, and the like; for clarity of illustration, however, the various buses are all labeled as bus 804 in FIG. 8.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application provide a method for training a model and an image processing method, applied in the field of image processing, in particular medical image processing. During model training, an attention mechanism is added to re-calibrate the region of each human tissue in the form of weights, so that some target weights of the current convolutional layer are greater than the preset weights of the previous convolutional layer, thereby improving the processing effect of the target model. During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of multiple human tissues in CT images. The method provided in the embodiments of the present application can segment multiple human tissues in one pass and improves accuracy.

Description

Method for training a model, image processing method, apparatus, and electronic device — TECHNICAL FIELD
The present application belongs to the field of image processing, and in particular relates to a method for training a model, an image processing method, an apparatus, and an electronic device.
BACKGROUND
In view of the above technical problems, embodiments of the present application provide a method for training a model, an image processing method, an apparatus, and an electronic device, which can automatically segment the distribution areas of human tissues such as muscle and fat and improve accuracy.
In a first aspect, an embodiment of the present application provides a method for training a model, the method comprising:
performing convolution processing on a first image using multiple target weights through the current convolution layer of a preset model to obtain a second image, where the first image includes multiple human tissues, the second image is marked with the areas occupied by the multiple human tissues in the second image, the multiple target weights correspond one-to-one to the multiple human tissues, some of the multiple target weights are greater than the corresponding preset weights, and the preset weights are the weights used by the previous convolution layer when performing convolution processing;
performing model training according to the second image and a third image to obtain a target model, where the third image is obtained after manual processing and is marked with the areas occupied by the multiple human tissues in the third image.
In particular, the target weights of specified types of human tissue among the multiple human tissues are greater than the corresponding preset weights.
The specified types of human tissue include abdominal fat and abdominal muscle.
In particular, the preset model uses the Unet model as its base model.
In particular, the current convolution layer includes a first convolution module and a second convolution module, the first convolution module being located before the second convolution module; the first convolution module is used to extract information of the multiple human tissues in the first image, and the second convolution module is used to adjust the preset weights to obtain the target weights.
In a second aspect, an embodiment of the present application provides an apparatus for training a model, the apparatus comprising:
a processing unit configured to perform convolution processing on a first image using multiple target weights through the current convolution layer of a preset model to obtain a second image, where the first image includes multiple human tissues, the second image is marked with the areas occupied by the multiple human tissues in the second image, the multiple target weights correspond one-to-one to the multiple human tissues, some of the multiple target weights are greater than the corresponding preset weights, and the preset weights are the weights used by the previous convolution layer when performing convolution processing; and to perform model training according to the second image and a third image to obtain a target model, where the third image is obtained after manual processing and is marked with the areas occupied by the multiple human tissues in the third image.
In a third aspect, an embodiment of the present application provides an image processing method, the method comprising:
acquiring an image to be processed, the image to be processed including multiple human tissues;
inputting the image to be processed into a target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed, where the target model is the target model obtained in the aforementioned first aspect.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus, the apparatus comprising:
an acquisition module configured to acquire an image to be processed, the image to be processed including multiple human tissues;
a processing module configured to input the image to be processed into a target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed, where the target model is the target model obtained in the aforementioned first aspect.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the image processing method described in the third aspect when executing the computer program.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the method for training a model described in the first aspect, or the image processing method described in the third aspect.
In a seventh aspect, an embodiment of the present application further provides a computer program product including a computer program which, when the computer program product is run on a computer, implements the method for training a model described in the first aspect, or the image processing method described in the third aspect.
Embodiments of the present application provide a method for training a model and an image processing method. During model training, an attention mechanism is added so that some target weights of the current convolutional layer are greater than the preset weights of the previous convolutional layer, thereby improving the processing effect of the target model. During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of multiple human tissues in CT images. The method provided in the embodiments of the present application can segment multiple human tissues in one pass and improves accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a method for training a model provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a preset model provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a second convolution module provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present application;
Part (a) of FIG. 6 is a schematic diagram of an original image provided by an embodiment of the present application;
Part (b) of FIG. 6 is a schematic diagram of an annotated image provided by an embodiment of the present application;
Part (c) of FIG. 6 is a schematic diagram of an output image provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more, and "at least one" and "one or more" mean one, two, or more.
Reference to "one embodiment" or "some embodiments" and the like in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment," "in some embodiments," "in some other embodiments," "in still other embodiments," and the like appearing in various places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments," unless specifically emphasized otherwise. The terms "including," "comprising," "having," and their variants all mean "including but not limited to," unless specifically emphasized otherwise.
Computed tomography (CT) obtains tomographic images by passing X-rays through body substances with different attenuation coefficients. The tomographic images can clearly show the distribution areas of tissues such as human muscle and fat; by analyzing body composition, the nutritional state of the human body can be evaluated, and diseases such as obesity, metabolic disease, cardiovascular disease, hypertension, and cancer can be diagnosed.
However, at present, after tomographic images are obtained, physicians need to manually segment the distribution areas of tissues such as muscle and fat, which costs considerable time and labor; moreover, physicians rely on subjective experience, which affects the accuracy and objectivity of segmentation and greatly limits the wide application of CT-based body composition analysis in medicine. Alternatively, hand-crafted feature segmentation algorithms can be designed for specific tasks, but this approach likewise depends on physicians' prior knowledge, and annotated medical image data are difficult to obtain; since most such algorithms are designed for a few body parts, they cannot be applied to other parts and have poor robustness and generalization performance. Therefore, how to segment human tissues such as muscle and fat on CT images accurately, quickly, and objectively is a problem that currently needs to be solved.
To this end, embodiments of the present application provide a method for training a model and an image processing method that realize end-to-end automatic segmentation of human tissues such as muscle and fat by training a model. During model training, an attention mechanism is added to re-calibrate the features (human tissues) in the form of weights, so that some target weights of the current convolutional layer are greater than the preset weights of the previous convolutional layer, thereby improving the processing effect of the target model. During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of multiple human tissues in CT images and improving accuracy. The method provided in the embodiments of the present application can automatically segment multiple human tissues in one pass and has the advantages of a high Dice coefficient, fast convergence, strong generalization, and high algorithmic robustness.
First, the method for training a model provided by the embodiments of the present application is described. FIG. 1 shows a method 100 for training a model provided by an embodiment of the present application. The method 100 includes at least the following steps:
S101: Acquire a training set.
Multiple original images and the annotated image corresponding to each original image are acquired, for example, multiple original images of abdominal CT images at the level of the third lumbar vertebra and the corresponding annotated images. An original image is a CT image scanned by an instrument, which may also be called a CT slice, and includes multiple human tissues such as subcutaneous fat, abdominal muscle, visceral fat, and other tissues. The annotated image marks the areas occupied by subcutaneous fat, abdominal muscle, visceral fat, and other tissues in the original image, and may be manually annotated by a hospital physician. The annotated image corresponds to the third image.
CT images use CT values to represent the absorption coefficient of each tissue and organ of the human body; Hounsfield Units (HU) are the unit of CT values, and the values on a CT image indicate whether the imaged parts are normal or have lesions. Abdominal muscle can be calibrated using a first predetermined threshold of -29 HU to +150 HU, and abdominal adipose tissue can be calibrated using a second predetermined threshold of -190 HU to -30 HU. Physicians can annotate the original image according to the pixel values in the CT image.
The original images are preprocessed before being input into the preset model: values below -128 HU in the original image are set to -128 and values above 150 HU are set to 150, to reduce the amount of information input into the network model. Because CT images may be scanned by different instruments, z-score normalization is performed on the original images to reduce the gap between instruments:

$$\hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \varepsilon}}$$

where x_i is the HU value of each pixel, μ_i = E(x_i), σ_i² is the corresponding variance, and ε is a constant. After processing, the HU values of the pixels in the original image have a mean of 0 and a standard deviation of 1.
The annotated image is a single-channel grayscale image. In the embodiments of the present application, during model training, the single-channel grayscale image can be divided into a multi-channel image according to human tissue. For example, using one-hot encoding, the single-channel grayscale image is encoded into four channel images according to subcutaneous fat, abdominal muscle, visceral fat, and other tissues. In the channel image corresponding to subcutaneous fat, the pixels at subcutaneous fat locations are set to 1 and the pixels of other parts are set to 0; in the channel image corresponding to abdominal muscle, the abdominal muscle locations are set to 1 and other parts to 0; in the channel image corresponding to visceral fat, the visceral fat locations are set to 1 and other parts to 0; in the channel image corresponding to other tissues, the locations of other tissues are set to 1 and other parts to 0. The image output by the model can then be a four-channel image.
All acquired original images are divided into a training set and a validation set; for example, out of a total of 545 original images, 436 are used as the training set and 109 as the validation set.
For the training set, the number of original images is increased by applying data augmentation such as random rotation, random horizontal flipping, and random vertical flipping to the original images.
The original images in the training set are used as the model input, and their size can be determined according to the actual situation; for example, each original image is 512×512 pixels.
S102: Construct a preset model.
The embodiments of the present application use the Unet model as the base model and construct the preset model in combination with an attention mechanism.
Exemplarily, FIG. 2 shows a schematic structural diagram of a preset model provided by an embodiment of the present application. As shown in FIG. 2, the preset model has a depth of 5, comprising five convolutional layers from top to bottom in a U-shaped structure, and includes two parts: encoding and decoding. Encoding is performed on the left side of the U shape to obtain information about human tissues, reduce the spatial dimensions of the image, and increase the number of channels; decoding is performed on the right side of the U shape to precisely localize human tissues.
Each convolutional layer is provided with a first convolution module (ResConv) and a second convolution module (SECA). The first convolution module is used to extract the information of the multiple human tissues in the input image, and the second convolution module is used to determine the weight corresponding to each human tissue and to increase or decrease that weight. It should be understood that the numbers of first and second convolution modules can be set as needed, and the embodiments of the present application impose no limitation on this. Exemplarily, in FIG. 2, each convolutional layer is provided with one first convolution module and one second convolution module in the encoding stage, and one first convolution module and one second convolution module in the decoding stage.
In the encoding stage on the left, the images in the training set (the first images) are first input to the first convolutional layer, whose first convolution module increases the channels of the input image to obtain multiple channel maps, which is equivalent to amplifying the features of the input image, and extracts the information of the multiple human tissues. Here, the single-channel input image 1×512×512 is converted into an 8-channel image 8×512×512, and the preset weights are set manually. The second convolution module then adjusts the weight of each human tissue in the 8-channel image 8×512×512 to obtain the target weights, for example, increasing the weights of subcutaneous fat, abdominal muscle, and visceral fat and decreasing the weights of other tissues. In the figure, the numbers above a module (e.g., 1, 8, 16) indicate the number of channels of the feature map, and the numbers beside a module (e.g., 512×512) indicate the size of the feature map.
Then, the feature information output by the second convolution module of the first convolutional layer (the 8-channel image 8×512×512) is downsampled by convolution to obtain an 8×256×256 image, which is input into the first convolution module of the second convolutional layer to further increase the channels and extract feature information, yielding a 16-channel image 16×256×256. At this point, the weights used by the second convolutional layer are the weights output by the second convolution module of the first convolutional layer (the target weights of the current convolution layer serve as the preset weights of the next convolution layer). The second convolution module of the second convolutional layer then continues to adjust the weight of each human tissue. It should be understood that downsampling by convolution reduces the image size and captures more local information of the image.
Similarly, the feature information output by the second convolution module of the second convolutional layer is downsampled by convolution to 128×128 and input into the first convolution module of the third convolutional layer, which increases the channels to 32 and extracts feature information. The feature information output by the second convolution module of the third convolutional layer is further downsampled by convolution and input into the first convolution module of the fourth convolutional layer, which increases the channels to 64 and extracts feature information. The feature information output by the second convolution module of the fourth convolutional layer is further downsampled by convolution and input into the first convolution module of the fifth convolutional layer, which increases the channels to 128 and extracts feature information; at this point the image is 128×32×32. Adding convolutional layers makes the convolution deeper, so the obtained features are higher-dimensional and more abstract and more functions are fitted, which ultimately improves the image result.
Moreover, each time the first convolution module extracts feature information, the second convolution module adjusts the weight of each human tissue.
In the first and second convolutional layers, the kernel size and padding of the first convolution module are 7 and 3, respectively; in the remaining convolutional layers, the kernel size of the first convolution module is 3 and the padding is 1, and the stride of all convolution modules is 1. Downsampling can be accomplished in various ways, such as max pooling (MaxPooling).
In the decoding stage on the right, the feature information output by the second convolution module of the fifth convolutional layer is first upsampled by deconvolution and then concatenated, via a skip connection (Skip-Connection), with the feature information extracted during downsampling by the second convolution module of the fourth convolutional layer. The concatenated feature information is passed sequentially to the first convolution module and the second convolution module: the first convolution module in the upsampling path reduces the channels while extracting features, and the second convolution module likewise adjusts the weights. It should be understood that the upsampling operation is used to enlarge the size of the image features and improve the image resolution; for example, it may use interpolation, i.e., inserting new elements between the original image pixels using a suitable interpolation algorithm.
The fourth convolutional layer passes its output to the third convolutional layer in turn, until the first convolutional layer is reached, where a 1×1 convolution (1×1 Conv2D) performs channel adjustment to obtain an output image of the same size as the input image; in other words, a 4-channel grayscale image of 4×512×512 is restored as the output image.
In addition, upsampling uses transposed convolution (UpConv) to restore spatial resolution, and the kernel size, stride, and padding during upsampling are the same as the corresponding ones during downsampling. Since the numbers of downsampling and upsampling operations are equal, the output image obtained has the same size as the input image.
Exemplarily, FIG. 3 shows a schematic structural diagram of the second convolution module in FIG. 2.
The second convolution module is a channel attention module that adds an attention mechanism to the preset model. The features extracted by the first convolution module are sent to the channel attention module, which treats the features of each channel differently, so that the preset model pays more attention to important features, suppresses other irrelevant features, and gives important features greater weight during processing.
As shown in FIG. 3, the feature map input to the second convolution module is convolved with kernels of different sizes to extract features from the input feature map; the differing parts are strengthened and the other parts are weakened. For example, two convolutions are applied to a feature map of size H×W×C to obtain images U_1 and U_2, using kernel sizes of 3 and 5, i.e., kernel 3×3 and kernel 5×5.
The images obtained by the two convolutions are added (element-wise sum) to obtain the image U, and a squeeze operation (Squeeze) is then performed on U: from the C×H×W feature map, a C×1×1 channel-level vector S is obtained, which compresses each spatial map into a single real number. Squeeze can be implemented as global average pooling (Fgp), which averages all pixel values of a feature map into one value that represents the feature map; this reduces the number of parameters, the amount of computation, and overfitting.
Then one-dimensional convolution (Conv1D) is used to avoid dimensionality reduction, and a vector Z is obtained by learning channel attention. After two fully connected layers (F_fc) and a softmax activation function, the channel attention vectors are output, i.e., two matrices α and β. α and β are multiplied element-wise with the feature maps U_1 and U_2 obtained by convolving the original feature map, and the results are summed to output the final vector V, which corresponds to the weight-adjusted multi-channel image output by the second convolution module.
S103: Perform convolution processing on the first image using multiple target weights through the current convolution layer of the preset model to obtain a second image, and perform model training according to the second image and the third image to obtain a target model.
In the first convolutional layer, the first image is an image from the training set: the first image is input, and the second image obtained after convolution processing by the first convolutional layer is output.
The second convolutional layer takes the second image output by the first convolutional layer as its first image and performs convolution processing to obtain the second image output by the second convolutional layer. The other convolutional layers work in the same way.
After the output image is obtained through the preset model, it is compared with the third image, and the difference between them, i.e., the loss value, is calculated; the optimization direction of the preset model is determined according to the loss value.
Exemplarily, the embodiments of the present application may use the cross-entropy loss function to calculate the loss value. For a given set, let y_i denote the predicted label of the image and ŷ_i the true label, let M be the number of classes and N the total number of samples; the loss function is then

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{i,c}\,\log\left(p_{i,c}\right)$$

where y_{i,c} is an indicator function (0 or 1) that takes 1 if the true class of pixel i equals c and 0 otherwise, and p_{i,c} is the probability that the i-th pixel is predicted to be of class c.
During training, the embodiments of the present application may use the Adam optimizer to optimize the loss function and a cosine annealing algorithm to decay the learning rate. It should be understood that other methods may also be used to optimize the loss function and the learning rate; the embodiments of the present application do not limit this.
Besides training the target model as above, in another implementation, five-fold cross-validation may be used to divide all acquired original images, that is, all acquired original images are divided evenly into 5 equal parts; when training a model, one part is used as the training set and the other 4 parts as the validation set, with the same training method as the embodiment shown in FIG. 1. Five models can thus be trained, yielding 5 trained models, which are combined using a certain data transformation, such as summation, to obtain the target model; a target model trained in this way can alleviate overfitting in the case of sparse annotation.
After the target model is obtained, it can be applied in actual medical practice. FIG. 4 shows an image processing method 400 provided by an embodiment of the present application. The method 400 includes at least the following steps:
S401: Acquire an image to be processed, the image to be processed including multiple human tissues.
S402: Input the image to be processed into the target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed.
That is, a CT image taken in medical practice is input into the target model to obtain a processed image marked with the areas occupied by the multiple human tissues in the image to be processed, i.e., the image is segmented; physicians can then analyze body composition from the segmented image.
The original images above may be contrast-enhanced CT images or plain-scan CT images. A target model trained on enhanced CT images can likewise process plain-scan CT images, which enhances the generalization of the network model.
It should be understood that, in addition to CT image segmentation, the method provided in the embodiments of the present application can, after appropriate adaptation, also be applied to other types of image segmentation such as positron emission tomography (Positron Emission Computed Tomography, PET) and magnetic resonance imaging (Magnetic Resonance Imaging, MRI).
In summary, the embodiments of the present application provide a method for training a network model and an image processing method. As shown in FIG. 5, a training set and a test set are acquired, and medical experts annotate each image in the training set and the test set to obtain the labels used during model training. Data augmentation is used to increase the number of images in the training set, and after normalization, cross-validation, and other processing, the training set is used to train the model and obtain the target model. During model training, an attention mechanism is added to explore the diversity of feature information, so that the network pays more attention to important features and suppresses the weights of other features, thereby improving the processing effect of the target model.
The target model is tested with the test set: the test set is input into the target model to obtain segmentation results, and the segmentation results are compared with the labels of the test set to analyze the performance of the target model.
During image processing, the target model is used in place of traditional manual segmentation, thereby realizing automatic segmentation of each human tissue in CT images. The method provided in the embodiments of the present application can segment multiple human tissues in one pass and improves accuracy. Furthermore, the embodiments of the present application can obtain good plain-scan CT segmentation results without training on plain-scan CT images. The method provided by the embodiments of the present application effectively improves model performance, accelerates model convergence, and enhances model generalization.
The effectiveness of the embodiments of the present application is illustrated below with an example. As shown in FIG. 6, part (a) of FIG. 6 is an original image, part (b) of FIG. 6 is a manually annotated image, and part (c) of FIG. 6 is the segmented image output by the target model when the original image is input into it. Comparing parts (b) and (c) of FIG. 6, the image output by the target model is highly similar to the manually annotated image, indicating that the segmentation accuracy of the target model is high and verifying the feasibility and effectiveness of the image processing method provided by the embodiments of the present application.
The apparatus and electronic device provided by the embodiments of the present application are described below.
FIG. 7 shows an image processing apparatus 700 provided by an embodiment of the present application. The apparatus 700 includes an acquisition module 701 and a processing module 702.
The acquisition module 701 is configured to acquire an image to be processed, the image to be processed including multiple human tissues.
The processing module 702 is configured to input the image to be processed into the target model to obtain a processed image, the processed image being marked with the areas occupied by the multiple human tissues in the image to be processed.
It should be understood that the apparatus 700 of the embodiments of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The method shown in FIG. 4 may also be implemented by software; when it is implemented by software, the apparatus 700 and its modules may also be software modules.
FIG. 8 is a schematic structural diagram of an electronic device 800 provided by an embodiment of the present application. As shown in FIG. 8, the device 800 includes a processor 801, a memory 802, a communication interface 803, and a bus 804. The processor 801, the memory 802, and the communication interface 803 communicate through the bus 804, or by other means such as wireless transmission. The memory 802 is used to store instructions, and the processor 801 is used to execute the instructions stored in the memory 802. The memory 802 stores program code 8021, and the processor 801 can call the program code 8021 stored in the memory 802 to execute the image processing method shown in FIG. 4.
It should be understood that in the embodiments of the present application the processor 801 may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 802 may include read-only memory and random-access memory and provides instructions and data to the processor 801. The memory 802 may also include non-volatile random-access memory. The memory 802 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random-access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In addition to a data bus, the bus 804 may also include a power bus, a control bus, a status signal bus, and the like; for clarity of illustration, however, the various buses are all labeled as bus 804 in FIG. 8.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center containing one or more sets of available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media. The semiconductor media may be a solid-state drive (solid state drive, SSD).
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (8)

  1. A method for training a model, characterized in that the method comprises:
    performing convolution processing on a first image using multiple target weights through the current convolution layer of a preset model to obtain a second image, wherein the first image includes multiple human tissues, the second image is marked with the areas occupied by the multiple human tissues in the second image, the multiple target weights correspond one-to-one to the multiple human tissues, some of the multiple target weights are greater than the corresponding preset weights, and the preset weights are the weights used by the previous convolution layer when performing convolution processing;
    performing model training according to the second image and a third image to obtain a target model, wherein the third image is obtained after manual processing and is marked with the areas occupied by the multiple human tissues in the third image.
  2. The method according to claim 1, characterized in that the target weights of specified types of human tissue among the multiple human tissues are greater than the corresponding preset weights.
  3. The method according to claim 1 or 2, characterized in that the preset model uses the Unet model as its base model.
  4. The method according to claim 3, characterized in that the current convolution layer comprises a first convolution module and a second convolution module, the first convolution module is used to extract information of the multiple human tissues in the first image, and the second convolution module is used to adjust the preset weights to obtain the target weights.
  5. An image processing method, characterized in that the method comprises:
    acquiring an image to be processed, the image to be processed including multiple human tissues;
    inputting the image to be processed into a target model to obtain a processed image, wherein the processed image is marked with the areas occupied by the multiple human tissues in the image to be processed, and the target model is a target model obtained by the method according to any one of claims 1 to 4.
  6. An image processing apparatus, characterized in that the apparatus comprises:
    an acquisition module configured to acquire an image to be processed, the image to be processed including multiple human tissues;
    a processing module configured to input the image to be processed into a target model to obtain a processed image, wherein the processed image is marked with the areas occupied by the multiple human tissues in the image to be processed, and the target model is a target model obtained by the method according to any one of claims 1 to 4.
  7. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the method according to claim 5 when executing the computer program.
  8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when run on an electronic device, cause the electronic device to execute the method for training a model according to any one of claims 1 to 4, and/or the image processing method according to claim 5.
PCT/CN2022/137661 2021-12-15 2022-12-08 Method for training a model, image processing method, apparatus, and electronic device WO2023109651A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111534785.7 2021-12-15
CN202111534785.7A CN114359169A (zh) Method for training a model, image processing method, apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2023109651A1 true WO2023109651A1 (zh) 2023-06-22

Family

ID=81099934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137661 WO2023109651A1 (zh) Method for training a model, image processing method, apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114359169A (zh)
WO (1) WO2023109651A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359169A (zh) Method for training a model, image processing method, apparatus, and electronic device
CN116503932B (zh) Key-region-weighted periocular feature extraction method, system, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651438A * 2020-12-24 2021-04-13 世纪龙信息网络有限责任公司 Multi-category image classification method and apparatus, terminal device, and storage medium
CN113129310A * 2021-03-04 2021-07-16 同济大学 Medical image segmentation system based on attention routing
US20210248747A1 (en) * 2020-02-11 2021-08-12 DeepVoxel, Inc. Organs at risk auto-contouring system and methods
CN114359169A * 2021-12-15 2022-04-15 深圳先进技术研究院 Method for training a model, image processing method, apparatus, and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127487B * 2019-12-27 2022-04-19 电子科技大学 Real-time multi-tissue medical image segmentation method
CN111951274A * 2020-07-24 2020-11-17 上海联影智能医疗科技有限公司 Image segmentation method and system, readable storage medium, and device
CN112651979B * 2021-01-11 2023-10-10 华南农业大学 Lung X-ray image segmentation method and system, computer device, and storage medium
CN113012172B * 2021-04-09 2023-10-03 杭州师范大学 AS-UNet-based medical image segmentation method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210248747A1 (en) * 2020-02-11 2021-08-12 DeepVoxel, Inc. Organs at risk auto-contouring system and methods
CN112651438A * 2020-12-24 2021-04-13 世纪龙信息网络有限责任公司 Multi-category image classification method and apparatus, terminal device, and storage medium
CN113129310A * 2021-03-04 2021-07-16 同济大学 Medical image segmentation system based on attention routing
CN114359169A * 2021-12-15 2022-04-15 深圳先进技术研究院 Method for training a model, image processing method, apparatus, and electronic device

Also Published As

Publication number Publication date
CN114359169A (zh) 2022-04-15

Similar Documents

Publication Publication Date Title
WO2023109651A1 (zh) Method for training a model, image processing method, apparatus, and electronic device
Yi et al. Generative adversarial network in medical imaging: A review
US11491350B2 (en) Decision support system for individualizing radiotherapy dose
EP3979198A1 (en) Image segmentation model training method and apparatus, computer device, and storage medium
Nemoto et al. Efficacy evaluation of 2D, 3D U-Net semantic segmentation and atlas-based segmentation of normal lungs excluding the trachea and main bronchi
WO2023221954A1 (zh) 基于强化学习和注意力的胰腺肿瘤图像分割方法及系统
US11810292B2 (en) Disease characterization and response estimation through spatially-invoked radiomics and deep learning fusion
CN110889853A (zh) 基于残差-注意力深度神经网络的肿瘤分割方法
Li et al. DenseX-net: an end-to-end model for lymphoma segmentation in whole-body PET/CT images
Lee et al. Applying artificial intelligence to longitudinal imaging analysis of vestibular schwannoma following radiosurgery
CN111583246A (zh) 利用ct切片图像对肝脏肿瘤进行分类的方法
Hänsch et al. Improving automatic liver tumor segmentation in late-phase MRI using multi-model training and 3D convolutional neural networks
JP2024035070A (ja) マルチビューサブ空間クラスタリングに基づくマルチモード医学データ融合システム
Tang et al. Improving splenomegaly segmentation by learning from heterogeneous multi-source labels
Tummala et al. Liver tumor segmentation from computed tomography images using multiscale residual dilated encoder‐decoder network
Yagi et al. Abdominal organ area segmentation using u-net for cancer radiotherapy support
CN111738975B (zh) 图像辨识方法及图像辨识装置
Abdikerimova et al. Detection of chest pathologies using autocorrelation functions
CN113450306B (zh) 提供骨折检测工具的方法
Pu et al. Automated segmentation of 3-d body composition on computed tomography
Tong et al. Robust and efficient abdominal CT segmentation using shape constrained multi-scale attention network
Hsu et al. A comprehensive study of age-related macular degeneration detection
US20230267607A1 (en) Hybrid convolutional wavelet networks for predicting treatment response via radiological images of bowel disease
CN112884759B (zh) 一种乳腺癌腋窝淋巴结转移状态的检测方法及相关装置
Jiménez-Gaona et al. Breast mass regions classification from mammograms using convolutional neural networks and transfer learning.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906399

Country of ref document: EP

Kind code of ref document: A1